CSE 262 Class Lecture Schedule (Spring 2007)

Many of the readings are on-line, but some may be handed out in class. Be sure to read the assigned papers before lecture, and be prepared to discuss them in class.

Note: your grade will be based in part on 1-page write-ups that you must submit before lecture. For 10 of the 20 lectures, you must submit a 1-page write-up for each paper assigned for that lecture. Submit these write-ups via a web portal, which will be described soon. Write-ups will be posted publicly and made available after class.


In an effort to comply with prevailing copyright restrictions, most links to ACM and IEEE papers refer to the ACM or IEEE digital library, respectively. Both digital libraries are subscription services. However, UCSD has a campus-wide subscription to each, and you should be able to obtain the papers from any campus machine. If you are a member of ACM and/or IEEE, then you should also be able to access the on-line papers using your member web account(s).

If you have a UCSD network account, then you may use the UCSD Web proxy, which enables you to access restricted content from non-UCSD Internet service providers.

Please email me at baden AT ucsd DOT edu if you have any difficulties.

Current lecture readings


Tuesday, 4/3/2007: Introduction

  • Lecture Slides
Thursday, 4/5/2007: Fundamental programming models

  • Lecture Slides

Tuesday, 4/10/2007: The Cray-1

  • "The CRAY-1 computer system," R. M. Russell, CACM 21(1):63-72 (Jan. 1978). DOI     [Cliff Rhyne]
  • "Vector Processors," Computer Architecture: A Quantitative Approach, 4th Ed., Appendix F, by J. L. Hennessy and D. A. Patterson, Morgan Kaufmann, 2007. (45 pages) Also available as Appendix G in the 3rd Edition on-line.
Thursday, 4/12/2007: The Connection Machine

  • "Architecture and Applications of the Connection Machine," L. W. Tucker and G. G. Robertson, Computer 21(8):26-38 (Aug. 1988). DOI     [Joey Hammer]
  • "Data parallel algorithms," W. D. Hillis and G. L. Steele, CACM 29(12):1170-1183 (Dec. 1986). DOI     [Scott Ricketts]
  • Getting Started in CM Fortran, Thinking Machines, Inc. (For reference)


Tuesday, 4/17/2007: Data Parallel Programming

  • Reader on Data Parallel Programming
  • "Algorithmic Techniques for Computer Vision on a Fine-Grained Parallel Machine," J. J. Little, G. E. Blelloch, and T. A. Cass, IEEE Trans. Pattern Anal. Mach. Intell. 11(3):244-257 (Mar. 1989). DOI
  • Lecture Slides
Thursday, 4/19/2007: Tera Multithreaded Architecture

  • "The Tera computer system," R. Alverson et al., Proc. 4th Intl. Conf. on Supercomputing (Amsterdam), pp. 1-6, 1990. DOI     [Cliff Rhyne]
  • S. Brunett, J. Thornley, M. Ellenbecker, "An Initial Evaluation of the Tera Multithreaded Architecture and Programming System Using the C3I Parallel Benchmark Suite," Supercomputing 1998 (SC98). HTML     [Jerry Fu]


Tuesday, 4/24/2007: CELL (I) [Joey Hammer]

  • Synergistic Processing in Cell's Multicore Architecture, by M. Gschwind et al., IEEE Micro, 26(2):10-24, March-April 2006.
  • The Cell project at IBM Research
  • Cell Broadband Engine resource center (IBM)
Wednesday, 4/25/2007, 6004 CalIT (6th floor)

  • Project proposal presentations
Thursday, 4/26/2007: CELL (II) [Didem Unat]

  • "Optimizing Compiler for a Cell Processor," Alexandre E. Eichenberger et al., 14th Int'l. Conf. Parallel Architectures and Compilation Techniques (PACT 2005), 17-21 Sept. 2005, pp. 161-172. DOI
  • "The potential of the cell processor for scientific computing," S. Williams et al., Proc. 3rd Conf. on Computing Frontiers, pp 9-20, May 2006.   DOI
  • For further reading about CELL
  • Summit on Software and Algorithms for the Cell Processor, October 25th and 26th, 2006

Tuesday, 5/1/2007

  • Phil Colella, Lawrence Berkeley Lab: "Software design for structured-grid numerical methods"
Thursday, 5/3/2007: NVIDIA [Bruce Carneal]

  • CUDA Programming Guide version 0.8.2. Read pages 1-18 and 43-56. Read over the Matrix Multiplication example in Chapter 7, referring to the manual as needed.

Tuesday, 5/8/2007: Stream architecture and programming

  • "Merrimac: Supercomputing with Streams," W. J. Dally et al., Proc. SC 2003, Washington, DC.   PDF     [Scott Ricketts]
  • "Brook For GPUs: Stream Computing on Graphics Hardware," by Ian Buck et al., ACM Trans. Graphics 23(3):777-786 (August 2004). Special Issue: Proc. 2004 SIGGRAPH Conference.    DOI     [Jerry Fu]
  • For further reading: General-Purpose computation on GPUs
Thursday, 5/10/2007: Distributed shared memory

  • "The SGI Origin: a ccNUMA highly scalable server," J. Laudon and D. Lenoski, Proc. 24th ISCA, pp 241-251, 1997. DOI
  • Notes on shared memory
  • Origin 2000 and Onyx2 Performance Tuning and Optimization Guide, Document No. 007-3430-003, SGI, 2001.
  • Chapter 1. Understanding SN0 Architecture
  • Chapter 2. SN0 Memory Management
  • Chapter 8. Tuning for Parallel Processing: read sections "Tuning Parallel Code for SN0," "Scalability and Data Placement," and "Using Data Distribution Directives," but only read through "Understanding the AFFINITY clauses for threads" (Example 8-11). There is a convenient table of contents at the beginning of the section.
  • Background on shared memory: Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 3rd Ed., Morgan Kaufmann: Chapter 6, esp. §6.1 (Introduction), §6.3 (SMPs), §6.5 (Distributed shared memory), §6.7 (Synchronization), §6.8 (Memory Consistency models), §6.10 (Crosscutting issues).
  • Presentation Materials on the Origin 2000 (CS 258, David Culler, UC Berkeley)
  • Lecture Slides

Thursday, 5/17/2007: Architecture cognizance (Guest Lecture by Larry Carter), Room 6004, CalIT (6th floor)

  • "Architecture-cognizant divide and conquer algorithms," by K.S. Gatlin and L. Carter. Proc. 1999 ACM/IEEE Conference on Supercomputing, Portland, Oregon, Nov. 14 - 19, 1999. DOI
  • Sequoia: Programming the Memory Hierarchy, by K. Fatahalian et al., Proc ACM/IEEE SC 2006 Conf. Nov. 2006. PDF

Tuesday, 5/22/2007: PGAS Languages [Alex Brugh and Dave Allen]

  • "Optimizing bandwidth limited problems using one-sided communication and overlap," C. Bell, D. Bonachea, R. Nishtala, and K. Yelick. Proc 20th IPDPS, April 2006. PDF
  • There is no definitive UPC paper. Either read the UPC Manual or look over some of the presentations from the Language Tutorials page in the UPC Wiki.
  • For Reference
  • Berkeley UPC web site
  • UPC community web site
  • UPC Tutorials
Wednesday, 5/23/2007: Project Progress Report Presentations (EBU3B 1202)


Tuesday, 5/29/2007: Cilk [Alex Brugh and Dave Allen]

  • M. Frigo, C. E. Leiserson, and K. H. Randall. The implementation of the Cilk-5 multithreaded language. SIGPLAN Not. 33(5):212-223 (May. 1998).     DOI
  • HPCC06 Challenge Class 2 in Cilk (Most Productivity)
Thursday, 5/31/2007: GPU programming and course wrap up [Didem Unat]

  • M.D. McCool, "Data-Parallel Programming on the Cell BE and the GPU using the RapidMind Development Platform," GSPx Multicore Applications Conf., Santa Clara, CA (2006).     PDF

  • Lecture Slides PDF


Tuesday, 6/5/2007: Project Presentations (I)

  • 6004 CalIT (6th floor)
  • Joey Hammer
  • Alex Brugh and Dave Allen
  • Jerry Fu and Cliff Rhyne
Thursday, 6/7/2007: Project Presentations (II)

  • 6004 CalIT (6th floor)
  • Didem Unat and Scott Ricketts
  • Bruce Carneal

Maintained by baden @ ucsd.edu   [Wed May 23 23:00:36 PDT 2007]