Distributed Systems
Tuesday/Thursday 3:30 PM to 5:00 PM
Venue: MCLD-Floor 2-Room 2002
Instructor: Arpan Gujarati
Teaching Assistants: Aida Aminian, Heng Zhao, Philip Schowitz, Rachel Li, Wyatt Zhang
Online Forums: Piazza, Canvas, Discord
Leslie Lamport, the 2013 ACM Turing Award winner, offered a memorable definition of a distributed system:
A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.
Despite this sobering description, distributed systems offer numerous benefits. They can be more fault tolerant by avoiding single points of failure and eliminating centralized components. By adding more physical nodes, a system can also improve its performance and become more scalable, capable of handling increased load. Geographic diversity, i.e., placing resources closer to clients, can reduce latency as well. However, achieving these benefits is not easy. As Lamport's quote illustrates, distributed systems can fail in subtle and complex ways. They are often more difficult to design, build, test, and reason about than centralized systems.
This course introduces a broad range of topics in distributed systems, as outlined in the tentative schedule below. While much of the material will be covered in lecture format, distributed systems are notoriously difficult to fully grasp without hands-on experience. Therefore, a significant emphasis of the course will be on designing and building distributed system prototypes, both small and large.
Overview
- Prerequisites
- Computer networking basics (CPSC 317 or ELEC 331)
- Operating system basics (CPSC 313 or CPEN 331)
- Learning objectives
- Understand key principles behind the design and implementation of distributed systems
- Analyze and reason about problems involving distributed components
- Become familiar with fundamental techniques to solve challenges that arise in distributed environments
- Design and build distributed system prototypes using the Go programming language
- Communication
- We will use Piazza for all course-related communication. URL and access code will be shared in the first lecture.
- We have also set up a Discord server to facilitate discussion among students. The invitation link will be shared in the first lecture.
- In addition, the teaching staff will hold at least one in-person office hour session each week; times will be announced in class.
- Lectures
- Lectures will focus primarily on understanding core distributed systems concepts and how they apply to real-world systems. See the Schedule for details.
- Research papers, sometimes textbook chapters, and lecture notes will serve as the primary reading materials. We will make these available on Canvas.
- Lectures will not be recorded.
- Programming Assignments / Project / Labs
- There is no open-ended project component in this course. There are three well-structured programming labs (assignments).
- All programming labs are to be done in teams of up to 3 students and will require Go programming skills.
- The assignments are released and due (at 6 PM, Vancouver time) on the dates specified in the Schedule.
- More information will be provided when each programming lab is released.
- Please review the No Copying Policy and the late submission policy in the Accommodations section.
- Grading: Exams will constitute 75% of the total grade, and programming labs will constitute 25% of the total grade.
- Quiz 1 (20%)
- Quiz 2 (20%)
- Final Exam (35%)
- LAB 1 (5%)
- LAB 2 (15%)
- LAB 3 (5%)
Exams may be organized using the Computer Science department's Computer Based Testing Facilities (CBTF).
Each quiz will cover topics from the start of the class (or since the previous quiz) up to the last lecture before the quiz.
The final exam will have two parts: Part I will cover the last few lectures of the course not covered by the quizzes, and Part II will cover the entire course.
The exams will evaluate your understanding of the lecture material and also your understanding of the programming labs.
You do not need to pass the Final Exam to pass the course. We will look at your total grade across all components.
Note that sharing quiz or exam questions and answers on any external site, or with people outside the course section, now or at any later point in time, is forbidden.
We will evaluate your solutions to the labs using automated tests as well as through oral examination, in addition to questions from the labs in the exams.
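As a quick sanity check on the weights above, the following Go sketch combines per-component scores into a total. It assumes each component is scored as a fraction between 0 and 1 and that the total is a plain weighted sum; the syllabus does not say how raw scores are scaled, and the function name `totalGrade` and the scores in `main` are hypothetical.

```go
package main

import "fmt"

// weights maps each graded component to its share of the final grade, in
// percent, as listed in the grading breakdown.
var weights = map[string]float64{
	"Quiz 1":     20,
	"Quiz 2":     20,
	"Final Exam": 35,
	"LAB 1":      5,
	"LAB 2":      15,
	"LAB 3":      5,
}

// totalGrade combines per-component scores into an overall percentage.
// Assumption: each score is a fraction in [0, 1] and components are combined
// linearly; a component missing from scores contributes 0.
func totalGrade(scores map[string]float64) float64 {
	total := 0.0
	for name, w := range weights {
		total += w * scores[name]
	}
	return total
}

func main() {
	// The exam weights (20 + 20 + 35) and lab weights (5 + 15 + 5) add up to 100.
	sum := 0.0
	for _, w := range weights {
		sum += w
	}
	fmt.Printf("weights sum to %.0f%%\n", sum)

	// Hypothetical scores, chosen only to illustrate the arithmetic.
	scores := map[string]float64{
		"Quiz 1": 0.75, "Quiz 2": 0.5, "Final Exam": 0.5,
		"LAB 1": 1.0, "LAB 2": 0.75, "LAB 3": 1.0,
	}
	fmt.Printf("total grade: %.2f%%\n", totalGrade(scores)) // prints "total grade: 63.75%"
}
```

Under this linear assumption, no single component acts as a pass/fail gate, which is consistent with the note that the total grade across all components is what counts.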
- Waiting List
- Do not contact the instructor or course staff about the waiting list or about admission into the class; instructors have no control over who gets into the course.
- Waitlists are processed in priority order by the department.
- Unfortunately, we cannot sign course registration forms and have no knowledge of or control over the class composition or waitlists.
- If you have any questions about registration, please contact the CS advisors.
- If you are on the waiting list, you are required to keep up with all the course work.
- Amanda and Stewart led an in-class Go tutorial in the Winter 2017 version of the course. The recordings are available in two parts: part 1 and part 2. They cover an earlier version of Go, so they are still useful but should be taken with a grain of salt.
- Late submissions for programming assignments.
- Other Absences: We will deal with absences from quizzes and the final exam on a case-by-case basis.
- IMPORTANT: DO NOT copy/share any portion of the programming assignments. Copying any portion (even from your own project from a prior semester) means you cheated on the whole project. Cheating is copying or sharing solutions. All work must be done by you. A tutor must not help you with project content. You are not allowed to use (look at, copy) any online source that is specifically related to this project beyond those explicitly provided by us. If you are using GitHub or a similar source control system, your repository must remain private and must not be shared publicly on the internet.
- Specifically, you are NOT allowed to
- look at prior solutions to this project, or current students' code for the project
- use, look at, etc. any aspects of your own prior solution, if you have taken a similar course previously
- copy code or algorithms from others or the web without appropriate attribution
- discuss code-level details or pseudocode-level algorithms with students past or present, or take other students' help in any detailed way
- share or discuss details of code or algorithms with students past, present or future or with tutors
- post code to the web during the course, or after completing this course
Make sure every line of code you submit is either provided by us, or written by you.
If you see snippets of code on Stack Overflow or some general resource, you can use them but you must cite your source, and indicate the extent of the code used.
This policy is bidirectional: whether you copied some other code, or your code was copied, the penalties are the same.
- It is straightforward to avoid violating this policy!
- Always make sure your work is your own.
- Never look at or copy another student's solution, either from this term or past terms, either from UBC or from other schools.
- If you have a tutor, don't ask them project-specific questions; use them to help you generally on skill building.
- Do not share your code with other students or post your solution online.
After each lab deadline, we may examine all solutions using automated analyses, and the course staff may manually examine every case of derivative code.
We will contact you if we have any questions about whether you violated the policy at some point during the semester.
- Use of Generative AI: You may use generative AI tools (e.g., ChatGPT, Claude, Gemini) in this course, but you must use them responsibly. Remember, you are here to learn. These tools can speed up tasks (e.g., brainstorming, debugging, clarifying concepts), but they can also short-circuit your learning if misused (e.g., copy-pasting solutions without understanding them). Always ask yourself: Is this tool helping me learn, or harming my learning? Use it to support, not substitute, your effort.
- Key Guidelines for Using Generative AI
- Work must be your own: All assessments must reflect your own thinking unless otherwise specified. Submitting AI-generated work without understanding it is academic dishonesty.
- If you use GenAI on an assessment, cite it (i.e., name the tool) and annotate it (i.e., briefly explain how you used it). For example, "I asked ChatGPT for a simple explanation of Raft's leader election process. I compared its response with my notes, clarified one detail about the protocol implementation, and then wrote my own code."
- Do not share sensitive content: Never paste student information (names, IDs, personal data) or quiz questions or answers into AI tools.
- For group work: All team members must know about and agree to any generative AI use. The group is collectively responsible for ensuring the final work complies with this policy.
- If your instructor suspects overuse, you may be asked to explain your work orally in detail. If you cannot, penalties will apply.
- Failure to follow these rules will be treated as a violation of UBC's academic integrity policy.
- The official policies for Academic Misconduct at UBC can be found at the following links.
Piazza and office hours are intended to be the primary mechanism to communicate with the teaching team. You may use private posts on Piazza to communicate with the instructor and the TAs. We will monitor the Discord server but not moderate it or answer questions on it. While we encourage participation on both Piazza and Discord, your activity (or lack thereof) on these platforms will not affect your grade in any way.
Schedule (a work in progress; will change)
Date / Day | Topic | Slides | Additional Reading Material | Milestones
09/02, Tuesday | No Lectures; Imagine Day | | |
09/04, Thursday | Overview | L1 | |
09/09, Tuesday | Programming with Threads | L2 | OSTEP (Chapters 25-34), Go for C programmers, A Tour of Go |
09/11, Thursday | Implementing Remote Procedure Calls | L3 | TOCS 1984 (optional), VST (Chapter 4) (optional) |
09/16, Tuesday | Distributed Computing with MapReduce | | OSDI 2004 | LAB 1 Released!
09/18, Thursday | Lab 1 Design Discussion | | |
09/23, Tuesday | Crash Consistency with FSCK and Journaling | | OSTEP (Chapter 42) |
09/25, Thursday | Two-phase Commit | | VST (Section 8.5) |
09/30, Tuesday | Statutory Holiday: National Day for Truth and Reconciliation | | |
10/02, Thursday | No Lectures; QUIZ 1 | | | LAB 1 Due!
10/07, Tuesday | The Raft Consensus Protocol | | ATC 2014 | LAB 2 Released!
10/09, Thursday | The Raft Consensus Protocol (continued) | | ATC 2014 |
10/14, Tuesday | The Raft Consensus Protocol (continued) | | ATC 2014 |
10/16, Thursday | Lab 2 Design Discussion | | |
10/21, Tuesday | Practical Byzantine Fault Tolerance | | OSDI 1999, DS (Section 8.2.5) |
10/23, Thursday | Sequential Consistency | | DS (Section 7.2.1), TOCS 1989 |
10/28, Tuesday | File Synchronization with Vector Time Pairs | | MIT-CSAIL-TR 2005 |
10/30, Thursday | Eventual Consistency | | DS (Section 5.2), SOSP 1995 |
11/04, Tuesday | No Lectures; QUIZ 2 | | | LAB 2 Due!
11/06, Thursday | Mutual Exclusion | | DS (Section 5.3), CACM 1981, CACM 2022 | LAB 3 Released!
11/11, Tuesday | Lab 3 Design Discussion | | |
11/13, Thursday | No Lectures; Midterm Break | | |
11/18, Tuesday | Distributed Snapshots | | Coulouris (Section 14.5), TOCS 1985 |
11/20, Thursday | Distributed Stream Processing in Apache Flink | | VLDB 2017 |
11/25, Tuesday | Distributed Hashing with Chord | | SIGCOMM 2001, DS (Chapter 6) |
11/27, Thursday | Cryptocurrencies | | Bitcoin Blog 2013, Bitcoin Paper 2008 |
12/02, Tuesday | TBD | | |
12/04, Thursday | TBD | | | LAB 3 Due!
TBA | FINAL EXAM | | |
Go Programming
In this course we will exclusively use the Go programming language for all project labs.
Learning a new programming language is an important skill, and you will practice it in this course.
For the most part, I expect you to learn Go on your own.
We will be using Go version 1.24.5 (available at /cs/local/bin/go on the ugrad servers).
If you use a personal machine, make sure to install this exact version.
Note, however, that all lab solutions will be tested on the ugrad server machines.
Go is a systems language originally introduced by Google. It is especially well suited to building distributed systems. As with any language, the fastest way to become proficient at Go is to put in the time writing programs in Go. Here are some resources to get you started:
Accommodations
For all labs, we will apply a flexible slip date policy for late submissions. Each student will be allocated an automatic extension of 3 calendar days for the entire course. You can use the extension in daily increments. For instance, you can hand in Lab 1 three days late, or Lab 1 two days late and Lab 2 one day late. Since we measure extensions in daily increments, submitting an assignment 1 hour late is equivalent to submitting it 1 day late. The time allowed for each lab is already quite generous, so you are expected to use the extensions only for extraordinary circumstances! We will provide more instructions on how to seek extensions along with the submission instructions.
No Copying Policy
How to Do Well in This Course
Learn Go early and practice it regularly. Learning a new language while being time constrained is stressful and not fun. Since the assignments rapidly increase in their difficulty, it will be to your advantage to learn Go as quickly as possible and to learn it well. The posted Go resources are a great starting point, but reading is no substitute for practice, bug, debug, practice, practice, bug, coffee, debug, practice, ...
Do not skimp on software engineering. Distributed systems are hard. They are hard to understand, to build, to debug, to run, to trace, to document, etc. Do not make your life any more difficult. Use best practices from software engineering to help you in this course. Write unit and integration tests, use version control, document your code with comments, write small prototypes, refactor your code, make your code readable and easy to run and debug. If you fail to follow best practices, they will come back to bite you later on. Unfortunately, this course will not explicitly teach you these best practices, but you probably took a course that introduced you to these concepts. If you have any questions, just ask us on Piazza.
In the lectures, we will focus on fundamental concepts of distributed systems. The reading material primarily consists of papers describing well-known systems that rely on these fundamental concepts. To do well in exams, prioritize understanding the key concepts discussed in the class, and, when reading papers, understand how these are applied to practical systems. The exams will not test your knowledge of details covered in the papers that are not related to the concepts covered in class.
Reach out for success: This is intended to be a challenging fourth-year course, but that does not mean that you have to work through it on your own! The course Piazza should be your first stop for all technical questions. The course has specific office hours (see top of page), but the TAs and I are flexible. Send any of us an email to schedule a time to discuss the course, the assignments, etc. University students often encounter setbacks that can impact academic performance. Discuss your situation with us or an academic advisor as early as possible. For help in addressing mental or physical health concerns, including seeing a UBC counselor or doctor, visit this link.
Acknowledgements
This course is based on the graduate course on Distributed Systems (6.584) developed by Robert Morris, Frans Kaashoek, and Nickolai Zeldovich at MIT, with permission from the content authors. Many aspects are also inspired by the graduate course on Distributed Systems taught by Peter Druschel and others at MPI-SWS, which I audited during my PhD.