CSE 223A: Principles of Distributed Computing
Winter 2004
Professor Keith Marzullo
Class meets MWF 2:00-2:50 in WLH 2206


You will be asked to do four homeworks, a project and a final. Together the homework is worth one third of your grade; the project is worth a third and the final is worth a third.



Homework:

  Homework 1, due 16 January 2004.
  Homework 2, due 30 January 2004.
  Homework 3, due 17 February 2004.
  Homework 4, due 12 March 2004.



Project:

You will need to do a class project which is due at the end of the tenth week. You can either work by yourself or with one other person. I'll expect more from a team project than an individual one. Once you decide what you'd like to do, please let me know before you start; I'd like to have the chance to help you refine your idea.

You project can be:



Papers:

For many of these, you will need to be using a campus-linked computer to retrieve the paper.

Implementing fault-tolerant services using the state machine approach: a tutorial by Fred B. Schneider.
Distributed snapshots: determining global states of distributed systems by K. Mani Chandy and Leslie Lamport.
Virtual time and global states of distributed systems by Friedemann Mattern.
The Byzantine General's Problem by Leslie Lamport.
Determining the Last Process to Fail by Dale Skeen.
A low-cost processor group membership protocol for a hard real-time distributed system by Matthew Clegg and Keith Marzullo.
Primary-backup protocols: Lower Bounds and Optimal Implementations system by Navin Budhiraja, Keith Marzullo, Fred B. Schneider and Sam Toueg.
A Survey of Rollback-Recovery Protocols in Message-Passing Systems by L Alvisi, E. Elnozahy, Y.M. Wang and D.B. Johnson.
Synchronous Atomic Broadcast for Redundant Broadcast Channels by Flaviu Cristian.
Impossibility of distributed consensus with one faulty process by M.J. Fischer, N. A. Lynch and M.S. Paterson.
A How to assign votes in a distributed system by Hector Garcia-Molina and Daniel Barbara
Byzatine Quorum Systems by Dahlia Malkhi and Michael Reiter
Simulating Fail-Stop in Asynchronous Distributed Systems by Laura S. Sabel and Keith Marzullo.
Unreliable failure detectors for reliable distributed systems by Tushar Deepak Chandra and Sam Toueg.
Paxos Made Simple by Leslie Lamport.
Minimal Byzantine Storage by Jean-Philippe Martin, Lorenzo Alvisi and Michael Dahlin.



Syllabus (subject to change):

 1/5   Introduction: distributed computing and fault tolerance. (ppt, pdf)
 
1/7   Derivation of snapshot protocol. (ppt, pdf)
 1/9   Continuation of snapshots. Vector clocks. (ppt, pdf)

 1/12  Vector clocks; derivation of deadlock detection protocol.
 1/14  State machine approach. (ppt, pdf)
 1/16  Synchronous systems. Consensus for crash failures. n > 3t for Byz. (ppt, pdf)

 1/19  Martin Luther King day
 1/21  Early stopping consensus. Consensus for broadcast busses. (ppt, pdf)
 1/23  (continuation of last lecture)

 1/26  ACP and consensus. 2 phase commit. (ppt, pdf)
 1/28  Traditional nonblocking commit protocols. (ppt, pdf)
 1/30  Homework discussion.

  2/2   Group membership in a synchronous system (ppt, pdf).
  2/4   Nondeterminism: primary-backup protocols (ppt, pdf).
  2/6   Message logging. (ppt, pdf).

 2/9   More on message logging.
 2/11  Invited Lecture
 2/13  No class -  professor at PC meeting

 2/16  President's day
 2/18  Communications induced checkpointing..
 2/20  FLP (ppt, pdf).

 2/23  FLP
 2/25  Quorums (ppt, pdf)
 2/27  Guest lecturer Marc Shapiro, MSR Cambridge.

  3/1   Byzantine Quorum Systems (ppt, pdf)
  3/3   Group membership (ppt, pdf)
  3/5   <>S Consensus (ppt, pdf)

 3/8   Paxos (ppt, pdf).
 3/10 Byzantine Paxos
 3/12 (continuation)