Course Description
This course introduces Python, a programming language widely regarded as the standard for data scientists. It covers the fundamentals of Python programming, along with data exploration and visualisation using Python. The course also covers the design, analysis, and implementation of core data structures in Python, as well as common algorithms that operate on those data structures. Key data science tools, including NumPy, SciPy, Pandas, Matplotlib, Seaborn, Vega-Altair, and Scikit-learn, will be introduced. Students will use Jupyter Notebooks for interactive Python programming and Visual Studio Code for code editing. Course assignments will be released weekly via the "assignments" folder on the DSCD 611 Students' Stuffs Google Drive. Students are expected to check the folder, download each assignment, and submit it before the deadline. Other course materials can also be accessed via the same drive.
Learning Outcomes
This class emphasises practical skills for real-world data science applications. It is expected that at the end of this course, students will be able
- to programme effectively in Python for scientific and statistical data analysis projects;
- to leverage open-source libraries and tools to enhance their skills;
- to select and implement Python workflows that suit their project needs; and
- to contribute to and participate in the worldwide Python and data science community.
Instructors
Clifford Broni-Bediako (Email: bronibbc@yahoo.co.uk) and Michael Soli (Email: msoli@ug.edu.gh/agbotettey@gmail.com)
Prerequisites
There are no prerequisites for this course; however, some prior programming experience would be beneficial. Students are expected to know how to use a text editor and a terminal, and to have a basic understanding of how computer programs work. Students need access to a computer with a recent operating system of their choice for the course.
Student Responsibilities & Professional Conduct
Students are expected to read the assigned materials before each scheduled class, attend and actively participate in discussions, complete assignments and projects on time, and seek help as soon as they encounter difficulties. Attendance will not be taken; nevertheless, you are expected to attend class regularly and be prepared to participate in the learning process and contribute to your group project. During class, students are expected to turn off their mobile phones and refrain from sending emails, texts, and other messages. Students should also be respectful and courteous towards other students and the instructors.
Course Textbook
The course readings will rely on the following textbooks:
- [MW] McKinney, W. (2022). Python for Data Analysis, 3rd ed. O'Reilly Media.
- [LB] Lubanovic, B. (2019). Introducing Python: Modern Computing in Simple Packages, 2nd ed. O'Reilly Media.
- [BE] Bressert, E. (2013). SciPy and NumPy. O'Reilly Media.
- [UJ] Unpingco, J. (2021). Python Programming for Data Analysis. Springer Cham.
Course Schedule
| Week | Topics | Readings (see the textbooks) | Remarks |
|---|---|---|---|
| Part 1: Python programming | | | |
| 1 | Introducing the Python language and tools, and setting up a Python environment. | MW, Chapter 1; LB, Chapter 1 | Assignment 0 (non-examinable) |
| 2 | Syntax, data types, variables, reserved keywords, operators, numbers, commenting, and control structures (loops and conditionals). | MW, Chapter 2; LB, Chapters 2, 3, 4, & 6; UJ, Chapter 1 | Assignment 1 is released |
| 3 | Data structures (containers): lists, tuples, sets, and dictionaries; handling basic text strings; and basic debugging with print(). | MW, Chapter 3; LB, Chapters 5, 7, & 8; UJ, Chapter 1 | Assignment 2 is released; Assignment 1 is due; Project groups formed |
| 4 | Functions, generators, decorators, iterators, iterables, code readability, and docstrings. | MW, Chapter 3; LB, Chapter 9; UJ, Chapter 1 | Assignment 3 is released; Assignment 2 is due |
| 5 | Tabular (CSV, JSON, Excel) and text file handling; file directories; error handling (try and except, and assertions); challenging text strings (regular expressions, Unicode, bytes); and date and time. | MW, Chapters 6 & 7.4; LB, Chapters 12, 13, & 14; UJ, Chapter 1 | Assignment 4 is released; Assignment 3 is due; Project proposal is due |
| 6 | Object-Oriented Programming (OOP) | LB, Chapter 10; UJ, Chapter 2 | Assignment 5 is released; Assignment 4 is due |
| 7 | Modules and packages, unittest and doctest, advanced debugging with pdb and breakpoints, and measuring time complexity. | LB, Chapter 11; UJ, Chapter 3 | Assignment 6 is released; Assignment 5 is due; Quiz 1 |
| Part 2: Python tools for data science | | | |
| 8 | NumPy and SciPy for numerical computing | MW, Chapter 6; LB, Chapters 16 & 18 | Assignment 7 is released; Assignment 6 is due |
| 9 | Data manipulation with Pandas (handling tabular files, missing data, duplicates, and outliers; merging, reshaping, filtering, etc.). | MW, Chapter 4 & Appendix A; LB, Chapter 22; BE, Chapters 2 & 3; UJ, Chapter 4 | Assignment 8 is released; Assignment 7 is due |
| 10 | Data visualisation with Matplotlib, Seaborn, Bokeh, Vega-Altair, and Holoviews. | MW, Chapters 7 & 8; LB, Chapter 22; UJ, Chapter 5 | Assignment 9 is released; Assignment 8 is due |
| 11 | Data analysis with Pandas, statsmodels, and SciPy (descriptive statistics, correlation analysis, grouping, aggregation). | MW, Chapters 10 & 12; LB, Chapter 22; BE, Chapter 3; UJ, Chapter 5 | Assignment 10 is released; Assignment 9 is due |
| 12 | Scraping information from web pages: APIs, HTTP requests, JSON responses; handling HTML, XML, and YAML data; BeautifulSoup for HTML data. | MW, Chapter 9; UJ, Chapter 6 | Assignment 10 is due; Quiz 2 |
| 13 | Statistical machine learning algorithms with Scikit-learn. | MW, Chapter 12; BE, Chapter 4 | No Assignment/Quiz |
Note: Assignments and quizzes are completed in Jupyter Notebook, and only a single .ipynb file is submitted per assignment or quiz for auto-grading. The file naming format is <StudentID>_assignment<number>.ipynb, for example 355683_assignment1.ipynb (take note of the assignment number). Submissions that do not follow this file naming format will not be graded. Each assignment is worth 10%, for a total of 100% across the ten assignments, and each quiz is worth 50%, for a total of 100% across the two quizzes.
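Before submitting, the file name can be checked against the stated format with a short script. The following is only a minimal sketch: the function name `parse_submission_name` is hypothetical, and it assumes the student ID consists of digits only, as in the example 355683_assignment1.ipynb. The naming format for quiz files is not stated above, so the sketch covers assignments only and does not reproduce the actual auto-grader.

```python
import re

# Pattern inferred from the stated format <StudentID>_assignment<number>.ipynb.
# Assumption: the student ID is made of digits only, as in the syllabus example.
FILENAME_PATTERN = re.compile(
    r"(?P<student_id>\d+)_assignment(?P<number>\d+)\.ipynb"
)


def parse_submission_name(filename):
    """Return (student_id, assignment_number), or None if the name does not match."""
    match = FILENAME_PATTERN.fullmatch(filename)
    if match is None:
        return None
    return match.group("student_id"), int(match.group("number"))


if __name__ == "__main__":
    for name in ("355683_assignment1.ipynb", "355683_Assignment1.ipynb"):
        result = parse_submission_name(name)
        status = "OK" if result is not None else "does not match the format"
        print(f"{name}: {status}")
```

The match is case-sensitive and must cover the whole file name, so variants such as a capitalised "Assignment", extra spaces, or a different extension are reported as not matching.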
Final Project
The final project offers students the chance to practically apply their Python skills by extracting, transforming, and analysing a public dataset of their choice while working in teams to tackle a meaningful social or scientific question. Students will be randomly assigned to groups (3-5 students per team) in the third week of the class. Each group will prepare and submit a 1-page proposal. Once the proposal is approved, the team starts work and arranges a meeting with instructors to review progress. Each team will present their analysis and results, as well as submit a 3-page project report in the week after the last class.
The 1-page (excluding references and any appendices) proposal should address the following: (1) the topic to be studied; (2) what is known about the topic; (3) why the topic is interesting, relevant, or important; (4) a description of the data to be used; (5) expected results and impact; and (6) how the project will be done (tools/methods to be used, project plan, and what each team member will do).
The final 3-page (excluding references and any appendices) project report should address the following: (1) the topic that was studied; (2) what is known about the topic; (3) why the topic is interesting, relevant, or important; (4) a description of the data used; (5) how the project was done (tools/methods that were used); (6) the results and societal impact; (7) what each team member did to complete the project; and (8) reflections on the project.
The final project will be examined based on the following criteria:
- importance/relevance of the topic/research question (5%)
- dataset/task complexity (5%)
- code quality and reproducibility (40%)
- insightful data/results graphics (20%)
- final presentation and teamwork (15%)
- project report (15%)
Students can check the following GitHub repos to appreciate the kind of project expected from each team:
- https://github.com/michaelbaluja/ECE143_WI22_FinalProject_Group14
- https://github.com/ParamChordiya/ECE143-UCSD-GROUP-5
- https://github.com/TRBG/ECE-143-Project
Publicly available datasets for research:
Grading Policy
There is no midterm or final exam for this course. The final week of the class is reserved for project presentations and the submission of project reports.
The course grade is computed with the model \(h(x_1, x_2, x_3, x_4) = 0.3x_1 + 0.2x_2 + 0.4x_3 + 0.1x_4\), where \(x_1\) is the mark for the ten weekly assignments, \(x_2\) is the mark for the two quizzes, \(x_3\) is the mark for the final project, and \(x_4\) is the mark for the student's responsibility and professional conduct. The output of \(h\) determines the student's grade in this course as follows:
| Grade | Marks (%) | Interpretation | Grade Point |
|---|---|---|---|
| A | 80 - 100 | Excellent | 4.00 |
| B+ | 70 - 79 | Very Good | 3.50 |
| B | 60 - 69 | Good | 3.00 |
| C | 50 - 59 | Pass | 2.00 |
| D | 30 - 49 | Fail | 1.50 |
| F | 0 - 29 | Fail | 1.00 |
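The weighted model and the grade bands above can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: each component mark is taken to be a percentage between 0 and 100, the function names `course_mark` and `letter_grade` are hypothetical, and because the table does not say how fractional marks (for example 79.5) are handled, the sketch simply compares the unrounded mark with each band's lower bound.

```python
# Weights from the grading model h: assignments, quizzes, final project, conduct.
WEIGHTS = (0.3, 0.2, 0.4, 0.1)

# (lower bound in %, letter grade, grade point), taken from the table above.
GRADE_BANDS = [
    (80, "A", 4.00),
    (70, "B+", 3.50),
    (60, "B", 3.00),
    (50, "C", 2.00),
    (30, "D", 1.50),
    (0, "F", 1.00),
]


def course_mark(assignments, quizzes, project, conduct):
    """Weighted course mark; each argument is assumed to be a percentage (0-100)."""
    components = (assignments, quizzes, project, conduct)
    for value in components:
        if not 0 <= value <= 100:
            raise ValueError("each component mark must be between 0 and 100")
    return sum(weight * value for weight, value in zip(WEIGHTS, components))


def letter_grade(mark):
    """Map a course mark to (letter, grade point) using the band lower bounds."""
    for lower_bound, letter, grade_point in GRADE_BANDS:
        if mark >= lower_bound:
            return letter, grade_point
    raise ValueError("mark must not be negative")


if __name__ == "__main__":
    # Hypothetical student: 85% assignments, 70% quizzes, 90% project, 100% conduct.
    mark = course_mark(85, 70, 90, 100)
    print(round(mark, 1), letter_grade(mark))
```

For the hypothetical marks in the example, the weighted sum is 0.3(85) + 0.2(70) + 0.4(90) + 0.1(100) = 85.5, which falls in the A band.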
Academic Honesty, Collaboration, AI Tools Usage
UG students and faculty are expected to uphold the highest standards of academic honesty. Students can find information on the core principles and standards in the University's policy on academic integrity, the University of Ghana Plagiarism Policy. Plagiarism includes, but is not limited to:
- copying code or text with no or minimal reformatting (e.g., just changing variable names);
- just translating an algorithm or a script from one language to another; and
- reproducing (copying and pasting) code or text generated by an automatic code/text generation tool.
Students must ensure that full acknowledgement (citation) is given to the authors of any source code or text that is reused. Students' work may be checked using similarity-detection software. At a minimum, violating the plagiarism policy and examination rules will result in a grade of 0 for the assignment or quiz concerned. Repeated violations may be referred to Academic Affairs for a more severe penalty.
This class offers a collaborative learning experience. Students are expected to discuss ideas with their peers but should write their own solutions independently, without referencing solutions from other students. This encourages students to work independently while sharing ideas on how to test their implementation and evaluate their own work, rather than claiming other students' work as their own, which is a crucial part of integrity in their future careers. Sharing your solutions with another student breaches the University's examination rules. Therefore, students should not make their assignment solutions publicly available, such as posting them on social networks or a public git repository (the final project work can be hosted on a public git repository after submission).
Students are permitted to use generative AI tools, such as Gemini, GPT-4, and Copilot, in the same way that collaboration with another person is considered acceptable. You are not allowed to ask these tools for solutions or copy code directly from them, and you should indicate when a generative AI tool has been used. As with human collaboration, students are ultimately responsible and accountable for their own work.
Additional Reference Materials
The following texts also provide helpful information:
- Python documentation
- For environment setup, check a guide at ECE 313H (Jon Tamir)
- Think Python: How to Think Like a Computer Scientist
- A Crash Course in Python for Scientists
- A Whirlwind Tour of Python
- Python Data Science Handbook
- Kaggle Learn Python Tutorials
- Scientific Python Lectures
- Automate the Boring Stuff with Python
- Python Cheatsheet
- Conda Cheatsheet
- Git Cheatsheet
- Python for Data Scientists (University at Buffalo)
- Getting Started with Google Colab
- Getting Started with Python in Google Colab
- Jupyter Notebooks in VS Code
- Jupyter Notebook Basics
- NumPy quickstart
- Python Numpy Tutorial (with Jupyter and Colab)
- How to Use Kaggle
Website by CBB
