CSE.240b Mini-project warmup
Due: January 30th
For this assignment and your mini-project, you will use the DataStar
supercomputing cluster at the San Diego Supercomputer Center (SDSC).
DataStar has 2464 processors and is the 68th most powerful
supercomputer in the world, and you get to use it. Frankly, that's awesome.
Since this is a computer architecture class (rather than a "high
performance computing" class), our focus will be on a single DataStar
node: the IBM p655+.
The p655+ employs an 8-processor, Power4+-based multichip
module (see figure), which consists of four integrated dual-core
chips.
![DataStar](DataStar.gif)
San Diego Supercomputer Center's DataStar

![IBM MCM](IBM_MCM_crop.jpg)
The p655 Multichip Module (MCM).
The goal of this assignment is to get comfortable with the DataStar
machines and to do some general hacking.
Please browse the SDSC DataStar Userguide so that you understand
what information is available.
Running on the DataStar machines
CPU Time
Each student has been allocated 100 SUs (service units) of CPU time; it is
up to you to manage this time and make sure you do not run over. One SU buys
between one-half and one CPU-hour, depending on the priority of the
machine or job queue you use. To see how much time you have left, type:
reslist -a csd366
Note that when you submit to the batch queues, you are billed for
all 8 processors on the machine, even if your program uses only one.
Types of nodes
There are several types of DataStar nodes. These are described
in the SDSC DataStar Userguide: Running Jobs.
dslogin.sdsc.edu
This machine is for editing, compiling, running
very short programs,
and submitting jobs to the batch nodes.
batch nodes
These machines are for running compute-intensive
programs and for timing their runtime accurately. All official
timing runs for the class must be run on these machines. You can
submit jobs to the "Normal" or "High" job queues, which dispatch them to
one of 265 8-processor machines. Once a job has been submitted, it
waits in the queue until one of these machines becomes available, and
then it runs. Your program has the whole machine to itself. The output
is written to a file named LL_out.XXXXX.
You should receive an email telling you when the job has been launched.
dsdirect.sdsc.edu
These machines are for running, testing, and debugging short
compute-intensive jobs. They are shared with other users, so timing
numbers measured on them will not be accurate.
Provided Code
Helper code for this assignment is provided in:
/rmount/users02/ucsd/swanson/prog1
You can use cp to copy it to your own directory
(e.g., cp -r /rmount/users02/ucsd/swanson/prog1 ~/).
This code is provided to help you with parts of the assignment.
I recommend that you read through the code (including the Makefile, but
not the file run-script.template, which is specific
to the DataStar system) and make sure you understand all of it.
Use man pages and the web to look things up. (Note: some of the Unix man
pages are missing on the DataStar machines; if you can't find a function,
you may want to try a Linux machine.) Unless you are a big hacker, you will
learn advanced features of POSIX and more by reading and understanding the
code. Almost everything included demonstrates some useful feature or
programming idiom.
This code includes:
Makefile
This file contains some example rules for building
your code with xlc, the IBM C compiler, and gcc, the GNU C
compiler. xlc probably generates better code, but gcc
may be easier to use and less buggy (especially at high optimization
levels). The default rule builds the code with both gcc
and xlc; each executable's name ends in the compiler used to
create it (e.g., binary.xlc or binary.gcc). The Makefile
also has a rule that lets you submit programs to the batch queue
by typing:
gmake binary_name.submit
e.g.,
gmake binary.xlc.submit
run-script.template
This file contains the configuration settings for
submitting jobs to the batch nodes. The Makefile
fills in some of its fields with a sed
script in order to submit jobs. You can refer to the DataStar
LoadLeveler
guide to learn how to modify this. (Keep in mind that if you modify
this without renaming it,
it will affect the behaviour of the .submit
rule.)
helper.c, helper.h
These files contain helper code for timing sections of
your programs, including get_processor_freq(), which
determines the clock speed of the processor, and diff_cycles(),
which computes the number of cycles between two calls to read_real_time().
objdump
An executable (from GNU binutils) that disassembles
binaries. The Makefile contains a rule that invokes it to
disassemble a binary, e.g.,
gmake binary.xlc.dis
gmake binary.gcc.dis
Aside: You can also look at the assembly generated by the C
compiler (-S) using a pattern rule in the Makefile.
This output usually has more information about local labels than the
output of objdump; however, in some cases it doesn't tell you what was
actually generated in the end. For instance, xlc -O5
performs interprocedural analysis at link time, which is reflected only in
the objdump output. Here is an example of generating
assembly using the Makefile pattern rule:
gmake <.c file prefix>.gcc.s
gmake <.c file prefix>.xlc.s
e.g.,
gmake helper.gcc.s
main.c
Some sample code that performs timing, uses inline asm,
and creates POSIX threads.
The Assignment
Collaboration Policy:
For this assignment, you may not share code with other students.
However, you may discuss the problems you are having in the abstract,
or ask or suggest which function, macro, or instruction to use to
overcome a problem. You may also share web links that others can use
to address an issue.
Deliverables: In the following parts of the assignment, certain
items are marked as "Deliverables". These are the items to
print out and submit. Please keep them concise and do not
include excess material. Excessively unclear writeups
will not receive full credit.
I. The file increment.c contains a simple program that uses
two identical threads to increment a shared variable. Examine the
code carefully to understand what it's doing, especially the various
pthreads calls and the volatile variable declarations. Use gmake to
build increment.gcc and run it. Verify that the program generates the
correct result.
The code uses a lock to protect the variable shared,
which is declared volatile.
Experiment with removing the calls that lock/unlock the variable, and
with removing the volatile declaration. How does the program's
output change in each case (if at all)?
The Makefile contains a rule to dump the assembly from the
executable. To use it, run
gmake increment.gcc.dis
and then examine increment.gcc.dis. Using this rule on each
version of the program, explain the differences in program output
based on the assembly code the compiler generated in each case.
Deliverable:
Your descriptions of how the output changed for each version, and
your explanation of each change.
II. Modify increment.c to demonstrate that DataStar is
not sequentially consistent. Try to make your code as short and
readable as possible. Then place as few memory barrier
instructions (see helper.h and the links to the Power 4 ISA manuals
below) as possible to make the program behave in a sequentially
consistent manner.
Deliverable:
A copy of your code. A description of its output with and without
barriers, and why the difference demonstrates that the machine is not
sequentially consistent. Also include a discussion of why your
placement of the barriers is correct. This discussion is critical: your
code is not necessarily correct just because the program has worked
correctly every time you tried it (this is one reason parallel
programming is so difficult). You must be able to explain why the
code is correct for all possible executions.
Since this assignment is newly created, please do not hesitate to post
requests for clarification, subject to the restriction that you should
not give away any answers. Students are free to respond to these
clarification requests.
Resources
The SDSC
DataStar Userguide contains a wealth of information about using and
programming DataStar.
The GNU Make manual may be useful for understanding the Makefile used
in the provided code.
Power 4 Manuals: Book 1, Book 2, Book 3.