Course Description

This course introduces Python, a programming language widely regarded as the standard language for data science. It covers the fundamentals of Python programming, along with data exploration and visualisation using Python. The course also covers the design, analysis, and implementation of core data structures in Python, as well as common algorithms that operate on those data structures. Key data science tools, including NumPy, SciPy, Pandas, Matplotlib, Seaborn, Vega-Altair, and Scikit-learn, will be introduced. Students will use Jupyter Notebooks for interactive Python programming and Visual Studio Code for code editing. Course assignments will be released weekly via the "assignments" folder on the DSCD 611 Students' Stuffs Google Drive. Students are expected to check the folder, download each assignment, and submit it before the deadline. Other course materials can also be accessed via the same drive.


Learning Outcomes

This class emphasises practical skills for real-world data science applications. By the end of this course, students are expected to be able

  • to programme effectively in Python for scientific and statistical data analysis projects;
  • to leverage open-source libraries and tools to enhance their skills;
  • to select and implement Python workflows that suit their project needs; and
  • to contribute to and participate in the worldwide Python and data science community.


Instructors

Clifford Broni-Bediako (Email: bronibbc@yahoo.co.uk) and Michael Soli (Email: msoli@ug.edu.gh/agbotettey@gmail.com)


Prerequisites

There are no formal prerequisites for this course, although some prior programming experience is beneficial. Students are expected to know how to use a text editor and a terminal, and to have a basic understanding of how computer programs work. Students need access to a computer with a recent operating system of their choice for the course.


Student Responsibilities & Professional Conduct

Students are expected to read the assigned materials before each scheduled class, attend and actively participate in discussions, complete assignments and projects on time, and seek help as soon as they encounter difficulties. Attendance will not be taken; nevertheless, you are expected to attend class regularly, be prepared to participate in the learning process, and contribute to your group project. During class, students are expected to turn off their mobile phones and refrain from sending emails, texts, and other messages. Students should also be respectful and courteous towards other students and the instructors.


Course Textbooks

The course readings will rely on the following textbooks:

  • [MW] McKinney, W. (2022). Python for Data Analysis, 3rd ed. O'Reilly Media.
  • [LB] Lubanovic, B. (2019). Introducing Python: Modern Computing in Simple Packages, 2nd ed. O'Reilly Media.
  • [BE] Bressert, E. (2013). SciPy and NumPy. O'Reilly Media.
  • [UJ] Unpingco, J. (2021). Python Programming for Data Analysis. Springer Cham.


Course Schedule

Week | Topics | Readings (see the textbooks) | Remarks

Part 1: Python programming
1 | Introducing the Python language and tools, and setting up a Python environment. | MW, Chapter 1; LB, Chapter 1 | Assignment 0 (non-examinable)
2 | Syntax, data types, variables, reserved keywords, operators, numbers, commenting, and control structures (loops and conditionals). | MW, Chapter 2; LB, Chapters 2, 3, 4, & 6; UJ, Chapter 1 | Assignment 1 is released
3 | Data structures (containers): lists, tuples, sets, and dictionaries; handling basic text strings; and basic debugging with print(). | MW, Chapter 3; LB, Chapters 5, 7, & 8; UJ, Chapter 1 | Assignment 2 is released; Assignment 1 is due; Project groups formed
4 | Functions, generators, decorators, iterators, iterables, code readability, and docstrings. | MW, Chapter 3; LB, Chapter 9; UJ, Chapter 1 | Assignment 3 is released; Assignment 2 is due
5 | Tabular (CSV, JSON, Excel) and text file handling; file directories; error handling (try and except, and assertions); challenging text strings (regular expressions, Unicode, bytes); and dates and times. | MW, Chapters 6 & 7.4; LB, Chapters 12, 13, & 14; UJ, Chapter 1 | Assignment 4 is released; Assignment 3 is due; Project proposal is due
6 | Object-Oriented Programming (OOP). | LB, Chapter 10; UJ, Chapter 2 | Assignment 5 is released; Assignment 4 is due
7 | Modules and packages, unittest and doctest, advanced debugging with pdb and breakpoints, and measuring time complexity. | LB, Chapter 11; UJ, Chapter 3 | Assignment 6 is released; Assignment 5 is due; Quiz 1

Part 2: Python tools for data science
8 | NumPy and SciPy for numerical computing. | MW, Chapter 6; LB, Chapters 16 & 18 | Assignment 7 is released; Assignment 6 is due
9 | Data manipulation with Pandas (handling tabular files, missing data, duplicates, and outliers; merging, reshaping, filtering, etc.). | MW, Chapter 4 & Appendix A; LB, Chapter 22; BE, Chapters 2 & 3; UJ, Chapter 4 | Assignment 8 is released; Assignment 7 is due
10 | Data visualisation with Matplotlib, Seaborn, Bokeh, Vega-Altair, and Holoviews. | MW, Chapters 7 & 8; LB, Chapter 22; UJ, Chapter 5 | Assignment 9 is released; Assignment 8 is due
11 | Data analysis with Pandas, statsmodels, and SciPy (descriptive statistics, correlation analysis, grouping, aggregation). | MW, Chapters 10 & 12; LB, Chapter 22; BE, Chapter 3; UJ, Chapter 5 | Assignment 10 is released; Assignment 9 is due
12 | Scraping information from web pages: APIs, HTTP requests, and JSON responses; handling HTML, XML, and YAML data; BeautifulSoup for HTML data. | MW, Chapter 9; UJ, Chapter 6 | Assignment 10 is due; Quiz 2
13 | Statistical machine learning algorithms with Scikit-learn. | MW, Chapter 12; BE, Chapter 4 | No Assignment/Quiz

Note: Assignments and quizzes are completed in Jupyter Notebook, and only a single .ipynb file is submitted per assignment or quiz for auto-grading. The file naming format is <StudentID>_assignment<number>.ipynb, for example 355683_assignment1.ipynb (note the assignment number). Files that do not follow this naming format will not be graded. Each assignment is worth 10%, for a total of 100% across the ten assignments, and each quiz is worth 50%, for a total of 100% across the two quizzes.
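
Because a wrongly named file is not graded, it can help to check the name before submitting. The sketch below builds and checks a name in the <StudentID>_assignment<number>.ipynb format. It is only an illustration: the function names are hypothetical, the assumption that student IDs are purely numeric is taken from the single example above, and the check the auto-grader actually applies may differ.

```python
import re

# Assumption: student IDs are purely numeric, as in the example 355683.
# The pattern below is illustrative and is not the auto-grader's own rule.
FILENAME_PATTERN = re.compile(r"(?P<student_id>\d+)_assignment(?P<number>\d+)\.ipynb")


def assignment_filename(student_id: str, number: int) -> str:
    """Build a file name in the <StudentID>_assignment<number>.ipynb format."""
    return f"{student_id}_assignment{number}.ipynb"


def is_valid_assignment_filename(filename: str) -> bool:
    """Return True if the whole file name matches the assumed pattern."""
    return FILENAME_PATTERN.fullmatch(filename) is not None


print(assignment_filename("355683", 1))                          # 355683_assignment1.ipynb
print(is_valid_assignment_filename("355683_assignment1.ipynb"))  # True
print(is_valid_assignment_filename("355683-Assignment1.ipynb"))  # False
```

The check is case-sensitive and requires the underscore, so common slips such as a capital "A" or a hyphen are caught.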


Final Project

The final project offers students the chance to practically apply their Python skills by extracting, transforming, and analysing a public dataset of their choice while working in teams to tackle a meaningful social or scientific question. Students will be randomly assigned to groups (3-5 students per team) in the third week of the class. Each group will prepare and submit a 1-page proposal. Once the proposal is approved, the team starts work and arranges a meeting with instructors to review progress. Each team will present their analysis and results, as well as submit a 3-page project report in the week after the last class.

The 1-page (excluding references and any appendices) proposal should address the following: (1) the topic to be studied; (2) what is known about the topic; (3) why the topic is interesting, relevant, or important; (4) a description of the data to be used; (5) expected results and impact; and (6) how the project will be done (tools/methods to be used, project plan, and what each team member will do).

The final 3-page (excluding references and any appendices) project report should address the following: (1) the topic that was studied; (2) what is known about the topic; (3) why the topic is interesting, relevant, or important; (4) a description of the data used; (5) how the project was done (tools/methods that were used); (6) the results and societal impact; (7) what each team member did to complete the project; and (8) reflections on the project.

The final project will be examined based on the following criteria:

  • importance/relevance of the topic/research question (5%)
  • dataset/task complexity (5%)
  • code quality and reproducibility (40%)
  • insightful data/results graphics (20%)
  • final presentation and teamwork (15%)
  • project report (15%)

Students can check the following GitHub repos to appreciate the kind of project expected from each team:

Publicly available datasets for research:


Grading Policy

There is no midterm or final exam for this course. The final week of the class is reserved for project presentations and the submission of project reports.

The course grade is computed with the model \(h(x_1, x_2, x_3, x_4) = 0.3x_1 + 0.2x_2 + 0.4x_3 + 0.1x_4\),
where \(x_1\) is the mark for the ten weekly assignments, \(x_2\) is the mark for the two quizzes, \(x_3\) is the mark for the final project, and \(x_4\) is the mark for student responsibility and professional conduct. The model \(h\) predicts (grades) each student's performance in this course as follows:

Grade | Marks (%) | Interpretation | Grade Point
A | 80 - 100 | Excellent | 4.00
B+ | 70 - 79 | Very Good | 3.50
B | 60 - 69 | Good | 3.00
C | 50 - 59 | Pass | 2.00
D | 30 - 49 | Fail | 1.50
F | 0 - 29 | Fail | 1.00
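
As a worked illustration of the grading model \(h\) and the grade table, the sketch below computes a weighted mark and looks up the corresponding grade. It assumes that each of \(x_1\) to \(x_4\) is expressed as a percentage between 0 and 100 and that fractional marks fall into the band of the nearest lower bound (for example, 79.7 is treated as B+); neither rounding rule is stated in this syllabus. The function names and the sample marks are hypothetical.

```python
# Weights for x1 (assignments), x2 (quizzes), x3 (final project),
# and x4 (responsibility and professional conduct), from the model h.
WEIGHTS = (0.3, 0.2, 0.4, 0.1)

# (lower bound in %, grade, grade point), from the grade table.
GRADE_BANDS = [
    (80, "A", 4.00),
    (70, "B+", 3.50),
    (60, "B", 3.00),
    (50, "C", 2.00),
    (30, "D", 1.50),
    (0, "F", 1.00),
]


def final_mark(x1: float, x2: float, x3: float, x4: float) -> float:
    """Return h(x1, x2, x3, x4); each input is assumed to be a percentage."""
    components = (x1, x2, x3, x4)
    if any(not 0 <= x <= 100 for x in components):
        raise ValueError("each component must be a percentage between 0 and 100")
    return sum(weight * x for weight, x in zip(WEIGHTS, components))


def letter_grade(mark: float) -> tuple[str, float]:
    """Return (grade, grade point) for the first band whose lower bound is met."""
    for lower_bound, grade, grade_point in GRADE_BANDS:
        if mark >= lower_bound:
            return grade, grade_point
    raise ValueError("mark must not be negative")


# Hypothetical marks: 85% assignments, 70% quizzes, 78% project, 90% conduct.
mark = final_mark(85, 70, 78, 90)
grade, point = letter_grade(mark)
print(f"{mark:.1f}% -> {grade} ({point:.2f})")  # 79.7% -> B+ (3.50)
```

With these sample marks, the project contributes the most (0.4 × 78 = 31.2 points), which reflects its 40% weight in the model.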

Academic Honesty, Collaboration, AI Tools Usage

UG students and faculty are expected to uphold the highest standards of academic honesty. Students can find the core principles and standards in the University's policy on academic integrity, the University of Ghana Plagiarism Policy. Plagiarism includes, but is not limited to:

  • copying code or text with little or no reformatting (e.g., just changing variable names);
  • just translating an algorithm or a script from one language to another; and
  • reproducing (copying and pasting) code or text generated by an automatic code/text generation tool.

Students must ensure that full acknowledgement (citation) is given to the authors of any source code or text that is reused. Students' work may be checked using similarity-detection software. At a minimum, violating the plagiarism policy and examination rules will result in a grade of 0 for the affected assignment or quiz. Repeated violations may be referred to Academic Affairs for a more severe penalty.

This class offers a collaborative learning experience. Students are expected to discuss ideas with their peers but must write their own solutions independently, without referring to other students' solutions. The aim is for students to share ideas on how to test their implementations and evaluate their own work rather than claim other students' work as their own, which is a crucial part of integrity in their future careers. Sharing your solutions with another student breaches the University's examination rules. Students should therefore not make their assignment solutions publicly available, for example by posting them on social networks or in a public Git repository (the final project work can be hosted in a public Git repository after submission).

Students are permitted to use generative AI tools, such as Gemini, GPT-4, and Copilot, in ways that would be acceptable for collaboration with another person. You are not allowed to ask these tools for solutions or to copy code directly from them, and you should indicate when a generative AI tool has been used. As with help from human collaborators, students remain ultimately responsible and accountable for their own work.


Additional Reference Materials

The following texts also provide helpful information:


