Distributed Systems meets Real-Time Computing
2023 Winter Term 1
Tuesday/Thursday 2:30 PM to 4:00 PM
1009 Ponderosa Commons North - Oak/Cedar House
Instructor: Arpan Gujarati
Discussion forum: UBC Canvas
Office hours: Wednesday 12:15 PM to 1:00 PM, ICCS 333
Cyber-Physical Systems (CPS) are ubiquitous, with examples ranging from industrial robot arms to autonomous home vacuum cleaners. Unlike a personal computer or a cloud server, CPS interact directly with the physical world and control physical processes. Their correctness criteria are determined as much by the physics of their environment as by the internal state of their program. For example, an engine control unit in a car, which decides the amount of fuel to be injected, must adapt its decision logic and frequency to the engine RPM.
Many such CPS must be inherently distributed, are subject to stringent real-time constraints, and must remain fully functional in the face of transient and, to some extent, permanent subsystem failures. In addition to satisfying the highest reliability expectations, safety-critical CPS are also often subject to certification requirements and/or formal validation efforts. That is, not only must they work in practice, but it must also be possible to formally establish their correctness a priori. Going forward, the reliability of learning-enabled components and the security of networked CPS also ought to be first-order safety concerns, as timing and fault tolerance have been so far.
The focus of this course is to explore the design principles that allow the construction of analytically sound and functionally safe distributed real-time systems, which are integral to a plethora of contemporary safety-critical CPS. To this end, we will reason about, compare, and contrast the techniques from both the distributed systems and real-time systems literature.
- Learning objectives
-
- Become familiar with key techniques in the field of real-time computing and distributed systems
- Compare and contrast design principles for general-purpose systems with those for CPS, which are special-purpose systems
- Design and conduct a research project in distributed systems and/or real-time computing (broadly construed)
- Learn to read, critique, write about, and present systems (or theory!) research
- Prerequisites
- This is a research-oriented course intended for Master's and Ph.D. students in Computer Science. Students are expected to have at least an undergraduate-level understanding of systems, such as operating systems, computer networks, or distributed systems. Prior exposure to real-time computing, embedded systems, or cyber-physical systems is recommended but not required. Enterprising Bachelor's students who fulfill the above prerequisites, i.e., CPSC 313 and CPSC 317, are welcome to participate. In addition, a certain level of proficiency in discrete mathematics and probability, and some background in algorithms and complexity of computation, typically at the level of CPSC 320, are also expected.
-
- If you do not satisfy the necessary prerequisites but would still like to take the course, send me an email summarizing your interest in the course, your experience working on any related topic, and, if you are from another department, a list of courses related to computer science that you have taken.
- Class participation (35%)
- Students are expected to read the assigned reading material before each class and come prepared with questions and a critical analysis of the papers. In class, we will discuss the reading material and applications of the ideas beyond the papers. Each paper will also be assigned a discussion lead who is responsible for presenting a short overview of the paper to kickstart the discussion (more on this later). You will be graded based on the quality of your participation and your presentation. See the Schedule section for the reading material.
- Project (65%)
- The course project must be done in teams of 1-3.
The goal of the project is to conduct original research.
You may talk to the instructor for some ideas that are well-scoped for a
course project.
You may also undertake a project overlapping with your own research, if you can
demonstrate how it is related to and/or influenced by some topic from this
course.
The project deliverables and their relative weights are given below.
See the Project section for detailed instructions.
-
- Research proposal (20%)
- Proposal presentation (10%)
- Final presentation (20%)
- Final report (50%)
-
- Research Proposal
- The proposal should be at least four pages long (excluding references)
and must include the following sections.
-
- Introduction: Explain what the problem is, why it is important, and why you are interested in it. Describe the background and motivation for the problem, and pose a concrete research question.
- Related work: Do a brief survey of related work in the problem space. This includes papers that solve the same problem but with a different approach and papers whose ideas you build upon in your own work. Compare and contrast your own proposed solution with each related work. If you have not read all the related work by the time of proposal submission, make a list of papers that you will be covering by the final report submission.
- Proposed solution: Describe your proposed solution and planned methodology to answer the research question at a high level.
- Evaluation: Describe how you plan to evaluate your proposed solution. What kind of data, plots, or proof artifacts would you generate from the evaluation?
-
- Proposal Presentation
- Each group will give a short presentation followed by Q&A and feedback from the class. Focus on presenting:
-
- Problem: What is the problem you are working on? (Include the basic context that is required to understand the problem statement.)
- Motivation: Why is it an important problem and why did you choose to work on it?
- Key idea: What is the key idea of your solution?
- Deliverable: What is the expected final deliverable?
-
- Final Presentation
- Each group will give a short presentation followed by Q&A from the class. The presentation should be similar to a conference or workshop talk. Focus on presenting the motivation, the problem, one key idea of your project, and the results.
- Final Report
- The final report must include the same kinds of sections as the proposal, but written to describe what has been done. Additionally, add a section describing the limitations of your work and how the research can be extended (by you or someone else) in the future.
Grading
This is a seminar-style graduate course. The primary goal is to prepare you to do research by encouraging you to read and discuss research papers and by giving you an opportunity to carry out an open-ended course project in the broad areas of distributed and real-time computing. Therefore, the evaluation for this course consists of two main components: class participation and the course project. The grading scheme is still tentative.
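As a rough illustration of how the stated weights combine, here is a minimal sketch. It assumes that the deliverable percentages (20/10/20/50) are relative to the project's 65% share and that every component is scored on a 0 to 100 scale; both are assumptions, and the function names and sample scores are hypothetical. Since the grading scheme is tentative, treat this only as arithmetic, not as the official formula.

```python
# Weights as stated in this syllabus (tentative).
COURSE_WEIGHTS = {"participation": 0.35, "project": 0.65}

# Assumed to be relative weights *within* the project component.
PROJECT_WEIGHTS = {
    "research_proposal": 0.20,
    "proposal_presentation": 0.10,
    "final_presentation": 0.20,
    "final_report": 0.50,
}


def project_score(scores):
    """Weighted average of the four project deliverables (0-100 scale)."""
    return sum(PROJECT_WEIGHTS[name] * scores[name] for name in PROJECT_WEIGHTS)


def course_score(participation, scores):
    """Combine participation and project scores into one course score."""
    return (COURSE_WEIGHTS["participation"] * participation
            + COURSE_WEIGHTS["project"] * project_score(scores))


# Hypothetical scores, for illustration only.
example_scores = {
    "research_proposal": 80,
    "proposal_presentation": 90,
    "final_presentation": 85,
    "final_report": 75,
}

if __name__ == "__main__":
    print(f"Project score: {project_score(example_scores):.2f}")
    print(f"Course score:  {course_score(90, example_scores):.2f}")
```

Under this reading, the final report alone would account for 0.65 × 0.50 = 32.5% of the course grade.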
Schedule
Here is a tentative list of papers to be discussed in class. The goal is to discuss two papers every week, typically one from the real-time computing literature and one from the distributed systems literature, that follow a common theme (e.g., that focus on similar types of faults or on similar safety properties). The URLs take you to the official publisher pages, such as IEEE or ACM. You should be able to get the PDFs from there using your UBC credentials. Nonetheless, I will also upload all papers to the UBC Canvas course page.
Date | Theme | Reading Material | Discussion Lead |
05/09 | Grad Orientation 2023: Main Orientation | ||
07/09 | OVERVIEW | The Real-Time Environment. Springer Nature 2022; How to Read a Paper. ACM SIGCOMM Computer Communication Review 2007 (Optional); How to Give a Good Research Talk. ACM SIGPLAN Notices 1993 (Optional); How to Write a Great Research Paper. MRConnections PhD Summer School 2016 (Optional) | Arpan |
12/09 | ^ | Abstract PRET Machines. IEEE RTSS 2017 | Arpan |
14/09 | Well SCHEDULED is Half Done | FIRM: An Intelligent Fine-Grained Resource Management Framework for SLO-Oriented Microservices. USENIX OSDI 2020 | William |
19/09 | ^ | System Level Performance Analysis – The SymTA/S Approach. IEE Proceedings 2005 | Nima |
21/09 | Everybody Loves MACHINE LEARNING | Serving DNNs like Clockwork: Performance Predictability from the Bottom Up. USENIX OSDI 2020 | Philip |
26/09 | ^ | DeepRT: A Soft Real Time Scheduler for Computer Vision Applications on the Edge. ACM/IEEE SEC 2021 | Grady |
28/09 | ^ | R-TOD: Real-Time Object Detector with Minimized End-to-End Delay for Autonomous Driving. IEEE RTSS 2020; Demand Layering for Real-Time DNN Inference with Minimized Memory Usage. IEEE RTSS 2022 (Optional) | Brandon |
03/10 | OVERVIEW | The Consensus Problem in Fault-Tolerant Computing. ACM Computing Surveys 1993 | Brandon |
05/10 | NETWORKING is Key | PCSPOOF: Compromising the Safety of Time-Triggered Ethernet. IEEE Security & Privacy 2023 | Niloo |
10/10 | ^ | Timing Analysis of Real-Time Communication Under Electromagnetic Interference. Real-Time Systems 2005; Solar Superstorms: Planning for an Internet Apocalypse. ACM SIGCOMM 2021 (Optional) | Rut |
12/10 | Make-up Monday. | ||
13/10 | Project proposal reports due today by 6 PM! | ||
17/10 | Beg, Borrow, or COORDINATE | ZooKeeper: Wait-Free Coordination for Internet-Scale Systems. USENIX ATC 2010 | Nima |
19/10 | ^ | RT-ZooKeeper: Taming the Recovery Latency of a Coordination Service. ACM EMSOFT 2022 | Niloo |
24/10 | ^ | Right on Time Distributed Shared Memory. IEEE RTSS 2016 | William |
26/10 | ^ | Communication Centric Design in Complex Automotive Embedded Systems. ECRTS 2017 | Zainab |
31/10 | Every team/individual gives a lightning talk based on their project proposal! | ||
02/11 | Whose FAULT is it Anyway? | Byzantine Fault Tolerance, from Theory to Reality. SAFECOMP 2003; Tolerating Arbitrary Node Failures in the Time-Triggered Architecture. SAE Transactions 2001 | Jackson, Philip |
07/11 | ^ | Practical Byzantine Fault Tolerance and Proactive Recovery. ACM TOCS 2002 | Arpan |
09/11 | ^ | IGOR: Accelerating Byzantine Fault Tolerance for Real-Time Systems with Eager Execution. RTAS 2021 | Zainab |
14/11 | Mid-term break | ||
16/11 | Class cancelled, replaced by office hours to discuss your projects | ||
21/11 | Take CONTROL | In-ConcReTeS: Interactive Consistency meets Distributed Real-Time Systems, Again! RTSS 2022 | Rudransh |
23/11 | ^ | Consensual Resilient Control: Stateless Recovery of Stateful Controllers. ECRTS 2023 | Jackson |
28/11 | ^ | Consistency vs. Availability in Distributed Cyber-Physical Systems. EMSOFT 2023; CAP Twelve Years Later: How the “Rules” Have Changed. IEEE Computer Magazine 2012 (Optional); Perspectives on the CAP Theorem. IEEE Computer Magazine 2012 (Optional); The CAP Theorem’s Growing Impact. IEEE Computer Magazine 2012 (Optional) | Rut |
30/11 | Don’t be too FORMAL | Static and Dynamic Analysis of Timed Distributed Traces. IEEE RTSS 2012 | Rudransh |
05/12 | ^ | Priority Scheduling of Distributed Systems Based on Model Checking. Formal Methods in System Design 2011 | Grady |
07/12 | ^ | Real Time is Really Simple. Microsoft Research Technical Report 2005 | Arpan |
12/12 | Every team/individual gives a lightning talk based on their final project! | ||
15/12 | Final project reports due today by 6 PM! | ||
Project
You must propose a research project with a problem statement and a research plan, conduct the research, and write up your research results and experience. If you need any specific equipment, software, or tools for your project, please speak to us as soon as possible.
It is okay if you do not complete a full-fledged project by the end of the term. The goal is to learn how to go from a one-line problem to a fully scoped research problem, and then to try to identify potential solutions. If a topic is difficult and you do not reach the practical implementation stage, that is fine.
There are four deliverables in the project. Each written component must be formatted using the LIPIcs style template. You are strongly encouraged to use LaTeX for typesetting.