CSE.240b Mini-project warmup

    
Due: January 30th

For this assignment and your mini-project, you will use the DataStar supercomputing cluster at the San Diego Supercomputer Center.

DataStar has 2464 processors and is the 68th most powerful supercomputer in the world, and you get to use it. Frankly, that's awesome. Since this is a computer architecture class (rather than a "high performance computing" class), our focus will be on a single DataStar node: the IBM p655+.

The p655+ employs an 8-processor Power4+-based multichip module (see figure), which consists of four integrated dual-core chips.

[DataStar]
San Diego Supercomputer Center's DataStar
[IBM MCM]
     The p655 Multichip Module (MCM).

The goal of this assignment is to get comfortable with the DataStar machines and to do some general hacking.

Please browse the SDSC DataStar Userguide so that you understand what information is available.


Running on the DataStar machines.

CPU Time

Each student has been allocated 100 SUs of CPU time; it is up to you to
manage this time and make sure you do not run over. 1 SU is equal to
between one-half and one CPU-hour, depending on the priority of the machine
or job queue you use. You can type:

reslist -a csd366

to see how much time you have left. Note that when you submit to the
batch queues, you are actually being billed for all 8 of the processors
on the machine.
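
To make the budgeting concrete, here is a rough sketch of the arithmetic in C. It assumes only what is stated above: batch jobs are billed for all 8 processors, and 1 SU buys between 0.5 and 1 CPU-hour. The function names are hypothetical, and which queue maps to which rate is not specified here, so treat the output as a range rather than an exact charge.

```c
#include <assert.h>
#include <stdio.h>

/* CPU-hours billed for a batch job: wall-clock time multiplied by every
 * processor on the node, since the whole machine is charged. */
double billed_cpu_hours(double wall_hours, int processors_per_node)
{
    assert(wall_hours >= 0.0 && processors_per_node > 0);
    return wall_hours * processors_per_node;
}

/* Convert CPU-hours to SUs. cpu_hours_per_su is between 0.5 and 1.0
 * according to the text above; the exact value depends on the queue. */
double su_cost(double cpu_hours, double cpu_hours_per_su)
{
    assert(cpu_hours_per_su > 0.0);
    return cpu_hours / cpu_hours_per_su;
}

/* Print the cheapest and most expensive SU charge for one batch job. */
void print_su_estimate(double wall_hours)
{
    double cpu_hours = billed_cpu_hours(wall_hours, 8);
    printf("%.2f wall-clock hours -> %.2f to %.2f SUs\n",
           wall_hours, su_cost(cpu_hours, 1.0), su_cost(cpu_hours, 0.5));
}
```

Under these assumptions, a 15-minute batch job is billed 0.25 x 8 = 2 CPU-hours, which is 2 to 4 SUs, so 100 SUs covers roughly 25 to 50 such runs.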

Types of nodes

There are several types of DataStar nodes. These are enumerated
in the SDSC DataStar Userguide: Running Jobs.

dslogin.sdsc.edu

This machine is for editing, compiling, running very short programs,
and submitting jobs to the batch nodes.

batch nodes

These machines are for running compute-intensive programs and for measuring runtime accurately. All official timing runs for the class must be run on these machines. You can submit jobs to the "Normal" or "High" job queues, which dispatch to one of 265 8-processor machines. Once a job has been submitted, it waits in the queue until one of the 8-processor machines is available, and then it runs. Your program has the whole machine to itself. The output will be directed to a file LL_out.XXXXX.
You should receive an email telling you when the job has been launched.

dsdirect.sdsc.edu

These machines are for running, testing, and debugging short compute-intensive jobs. These
machines are shared, which means timing numbers will not be accurate.


Provided Code

Some helper code for this assignment has been provided in:

/rmount/users02/ucsd/swanson/prog1

You can use cp to copy it to your local directory. This code is provided to help you with parts of the assignment.

I recommend you read through the code (including the Makefile, but not the file run-script.template, which is specific to the DataStar system) and make sure you understand all of it. (Use man pages and the web to look things up. Note: some of the Unix man pages are missing on the DataStar machines; you may want to try a Linux machine if you can't find a function.) Unless you are already an experienced hacker, you will learn advanced features of POSIX and more by reading and understanding the code. Almost everything included demonstrates some useful feature or programming idiom.

This code includes:

Makefile

This file contains some example rules for building your code with xlc, the IBM C compiler, and gcc, the GNU C compiler. xlc probably generates better code, but gcc may be easier to use and less buggy (especially at high optimization levels). The default rule builds the code with both gcc and xlc (each binary's name ends in the compiler used to create it, e.g., binary.xlc or binary.gcc). The Makefile also has a rule that lets you submit programs to the batch queue by typing:

gmake binary_name.submit

e.g.,

gmake binary.xlc.submit


run-script.template

This file contains the configuration settings for submitting jobs to the batch nodes. The Makefile fills in some of its fields with a sed script in order to submit jobs. You can refer to the DataStar LoadLeveler guide to learn how to modify this. (Keep in mind that if you modify this without renaming it, it will affect the behaviour of the .submit rule.)

helper.c, helper.h

These files contain helper code for timing sections of your programs, including
get_processor_freq(), which determines the speed of the processor, and diff_cycles(), which computes the number of cycles between two calls to read_real_time().
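
The general pattern for timing a section of code looks roughly like the sketch below. The actual signatures of get_processor_freq() and diff_cycles() are not shown in this handout, so this sketch substitutes the POSIX clock_gettime() call for read_real_time() and takes the processor frequency as a plain parameter. All function names here are hypothetical stand-ins, not the contents of helper.c.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds between two timestamps; end must not precede start. */
int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    int64_t ns = (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
               + (end.tv_nsec - start.tv_nsec);
    assert(ns >= 0);
    return ns;
}

/* Convert elapsed nanoseconds to processor cycles, given a clock
 * frequency in Hz (an assumed input; helper.c obtains it differently). */
double ns_to_cycles(int64_t ns, double freq_hz)
{
    return (double)ns * freq_hz / 1e9;
}

/* Example workload: the volatile sink keeps the compiler from deleting
 * the loop at high optimization levels. */
void example_work(void)
{
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000000UL; i++)
        sink += i;
}

/* Read the clock, run the section under test, read the clock again. */
int64_t time_section_ns(void (*fn)(void))
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    fn();
    clock_gettime(CLOCK_MONOTONIC, &end);
    return elapsed_ns(start, end);
}
```

Calling ns_to_cycles(time_section_ns(example_work), freq) with a measured frequency gives an approximate cycle count. On a shared machine the result will vary from run to run, which is why official timings belong on the batch nodes.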

objdump

An executable (from GNU binutils) that lets you disassemble binaries. The Makefile contains a rule that invokes it to disassemble a binary, e.g.,

gmake binary.xlc.dis
gmake binary.gcc.dis

Aside: You can also look at the assembly generated by the C compiler (-S) using a pattern rule in the Makefile. This usually has more information about local labels than the output of objdump; however, in some cases it doesn't show what was actually generated in the end. For instance, xlc -O5 performs inter-procedural analysis at link time, and the result is reflected only in the object dump. Here is an example of generating assembly using the Makefile pattern rule:

gmake <.c file prefix>.gcc.s
gmake <.c file prefix>.xlc.s

e.g.,

gmake helper.gcc.s

main.c

Some sample code that performs timing, uses inline asm, and creates POSIX threads.
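
If you have not used POSIX threads before, the sketch below shows the basic create/join idiom with a per-thread argument. It is an illustration written for this handout, not an excerpt from main.c; the struct, the function names, and the choice of four workers are all hypothetical.

```c
#include <assert.h>
#include <pthread.h>

#define NUM_WORKERS 4

/* Per-thread argument: a half-open range [begin, end) plus a result slot. */
struct range_task {
    long begin;
    long end;
    long sum;
};

/* Thread entry point: sums the integers in the task's range. Each thread
 * writes only to its own task, so no locking is needed here. */
static void *sum_range(void *arg)
{
    struct range_task *task = arg;
    task->sum = 0;
    for (long i = task->begin; i < task->end; i++)
        task->sum += i;
    return NULL;
}

/* Sum 0..n-1 by splitting the range across NUM_WORKERS threads. */
long parallel_sum(long n)
{
    pthread_t threads[NUM_WORKERS];
    struct range_task tasks[NUM_WORKERS];
    long chunk = n / NUM_WORKERS;
    long total = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        tasks[i].begin = i * chunk;
        /* The last worker also takes the remainder of the range. */
        tasks[i].end = (i == NUM_WORKERS - 1) ? n : (i + 1) * chunk;
        int rc = pthread_create(&threads[i], NULL, sum_range, &tasks[i]);
        assert(rc == 0);
        (void)rc;
    }
    /* pthread_join waits for each worker, so its result is safe to read. */
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_join(threads[i], NULL);
        total += tasks[i].sum;
    }
    return total;
}
```

The argument struct must outlive the thread that uses it, which is why tasks lives in the function that also performs the joins.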


The Assignment

Collaboration Policy: For this assignment, you may not share code with other students. However, you may discuss the problems you are having abstractly, or inquire/suggest as to what function, macro or instruction you should use to overcome the problem. You may also supply weblinks for others to use to address an issue.


Deliverables: In the following parts of the assignment, certain parts are marked as "Deliverables".  These are the parts to be printed out and submitted. Please keep these concise and do not include excess
material. Excessively unclear writeups will not be given full credit.


I. The file increment.c contains a simple program that uses two identical threads to increment a shared variable.  Examine the code carefully to understand what it's doing, especially the various pthreads calls and the volatile variable declarations.  Use gmake to build increment.gcc and run it.  Verify that the program generates the correct result.

The code uses a lock to protect shared, which is declared volatile.  Experiment with removing the calls to lock/unlock the variable and removing  the volatile declaration.  How does the program's output change in each case (if at all)? 
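
For orientation, the baseline configuration (lock and volatile both in place) follows a pattern like the sketch below. This is not the contents of increment.c: it assumes the lock is a pthread mutex, and the names shared_lock, increment_worker, and run_increment are hypothetical. The variations the assignment asks about are deliberately left out.

```c
#include <assert.h>
#include <pthread.h>

/* The shared counter; volatile forces a real load and store per access. */
static volatile long shared;
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread increments the counter the requested number of times,
 * holding the lock around every read-modify-write. */
static void *increment_worker(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&shared_lock);
        shared++;
        pthread_mutex_unlock(&shared_lock);
    }
    return NULL;
}

/* Run two identical threads and return the final counter value. */
long run_increment(long iterations)
{
    pthread_t first, second;
    int rc;

    assert(iterations >= 0);
    shared = 0;
    rc = pthread_create(&first, NULL, increment_worker, &iterations);
    assert(rc == 0);
    rc = pthread_create(&second, NULL, increment_worker, &iterations);
    assert(rc == 0);
    pthread_join(first, NULL);
    pthread_join(second, NULL);
    (void)rc;
    return shared;
}
```

With the lock held around every increment, the result should equal twice the iteration count on every run.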

The Makefile contains a rule to dump the assembly from the executable.  To use it, run:

gmake increment.gcc.dis

and then examine increment.gcc.dis.  Using this rule on each version of the program, explain the differences in program output based on the assembly code the compiler generated in each case.

Deliverable:

Your descriptions and explanation of how the output changed for each version.


II.  Modify increment.c to demonstrate that DataStar is not sequentially consistent.  Try to make your code as short and readable as possible.  Then place as few memory barrier instructions (see helper.h and the links to the Power 4 ISA manuals below) as possible to make the program behave in a sequentially consistent manner.

Deliverable:

A copy of your code.  A description of its output with and without barriers, and why the difference demonstrates that the machine is not sequentially consistent.  Also include a discussion of why your placement of the barriers is correct.  This discussion is critical:  Your code is not necessarily correct just because the program works correctly every time you have tried it (this is one reason parallel programming is so difficult).  You must be able to explain why the code is correct for all possible executions.



Since this assignment is newly created, please do not hesitate to post for clarifications on the assignment, subject to the restriction that you should not give away any answers. Students are free to respond to these clarification requests.


Resources

The SDSC DataStar Userguide contains a wealth of information about using and programming DataStar.

The GNU Make manual may be useful to understand the Makefile used in the provided code.

Power 4 Manuals: Book 1, Book 2, Book 3.