Getting started with Valkyrie

Change Log

  • Changed the path for MPI utilities to /opt/mpich/myrinet/gnu/bin to enable the fast switch. [10/14/05 08:01]

  • Valkyrie

    The hardware platform for the course is a Beowulf cluster named Valkyrie. Valkyrie is managed by the ACS and runs the Rocks software developed at the San Diego Supercomputer Center. The system consists of 16 nodes, each with two 1 GHz Pentium III CPUs and 1 GB of RAM, running Linux. (Here is a web page telling you more about Valkyrie's CPUs.) A Myrinet switch provides low-latency connectivity between the nodes.

    Valkyrie should only be used for parallel program development and measurement. If you need to use a Linux or UNIX system, please use your student account on the Solaris machines in the Advanced Programming Environment (APE) lab, located in APM 6426.

    For the time being, consult this web page for using Valkyrie. (ACS has not yet updated their web page since a recent software upgrade.)

     


    Issues

  • To run with two processors per node, you'll need to specify a machine file. See the link below for instructions. The -g 2 option is no longer available. [Sun Oct 23 14:05:30 PDT 2005]
  • Valkyrie is running, but with one caveat:
  • Node 6 is down. You may run with no more than 15 processes unless you use a machinefile. See below. [10/14/05, 6:27 PM]
  • Runaway processes: see the section below.
  • The Intel C/C++ compiler and the Intel Math Kernel Library MKL should be available soon.


  • SSH

    If you haven't set up your environment for the secure shell, then you'll get the following message printed on your screen:

        It doesn't appear that you have set up your ssh key.
        This process will make the files:
        /home/cs160s/<your account>/.ssh/identity.pub
        /home/cs160s/<your account>/.ssh/identity
        /home/cs160s/<your account>/.ssh/authorized_keys


        Generating public/private rsa1 key pair.

    You will then be asked the 3 questions shown below. Be sure to hit carriage return (entering no other input) in response to each question:

        Enter file in which to save the key (/home/cs160s/<your account>/.ssh/identity):

        Created directory '/home/cs160s/<your account>/.ssh'.
        Enter passphrase (empty for no passphrase):
        Enter same passphrase again:
        Your identification has been saved in /home/cs160s/<your account>/.ssh/identity.
        Your public key has been saved in /home/cs160s/<your account>/.ssh/identity.pub.
        The key fingerprint is:
        <several 2 digit hex numbers separated by :> <your account>@valkyrie.ucsd.edu

    Environment

    We'll be using the bash shell. Modify your .bash_profile using information found in /export/home/cs260x-public/bash_profile. (From now on we'll refer to the directory /export/home/cs260x-public as $(PUB).)

    Eventually we'll use the Intel C++ compiler, but for now we'll use a special version of the Gnu C++ compiler that incorporates the MPI libraries. To use these compilers, you must have the following environment variable set:

    export PATH=/opt/mpich/myrinet/gnu/bin:$PATH
    
    

    You may also want to set the MANPATH so you can access the MPI manual pages:

    export MANPATH=/opt/mpich/gnu/man:$MANPATH
    

    Both of these are set for you in the bash_profile file provided. (The default profile also defines the PATH to include $(PUB)/bin.) Note: the ACS web page is out of date. The correct PATH and MANPATH for MPI are as described here.


    Getting started

    We've set up some code to get you started in the directory $(PUB)/examples. These examples will help acquaint you with the process of running an MPI program. Compile and run the two programs in the subdirectory called Basic. Be sure to use the Makefile that we've supplied so you'll get the correct compiler and loader flags. The Makefile includes an "arch" file that defines appropriate command line flags for the compiler; the arch file you currently have provides the settings needed for Valkyrie. (If you want to run on other machines, let us know.)

    To compile your programs, use the makefiles provided for you. These makefiles include an architecture ("arch") file containing the appropriate compiler settings.

    Running a program with mpirun

    Run your program with the mpirun command. The command provides the -np flag so you can specify how many processes to run. There are 16 nodes, numbered 0 through 15. Be sure to use the "-1" flag (the number one) so that you don't run on the front end. Also be sure to run your program from a subdirectory of your home directory, as it is not possible to run in $(PUB).

    To establish that your environment has been set up correctly, compile and run the parallel "hello world" program. This program prints "Hello world" from each process along with the process ID. It also reports the total number of processes in the run. The hello world program is found in $(PUB)/examples/Basic/hello. To run the program, use mpirun as follows:

    mpirun -np 2 -1 ./hello
    

    Here is some sample output:

    # processes: 2
    Hello world from node 0
    Hello world from node 1
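
    The hello world source itself lives in $(PUB)/examples/Basic/hello. For reference, a minimal MPI program of this kind looks roughly like the sketch below; this is an illustration against the standard MPI C interface, not a copy of the course file:

        #include <stdio.h>
        #include <mpi.h>

        int main(int argc, char *argv[])
        {
            int rank, nprocs;

            MPI_Init(&argc, &argv);                  /* start up MPI */
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's ID */
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);  /* total number of processes */

            if (rank == 0)
                printf("# processes: %d\n", nprocs);
            printf("Hello world from node %d\n", rank);

            MPI_Finalize();
            return 0;
        }

    If you build something like this yourself, compile it with the supplied Makefile rather than invoking the compiler directly, so that the correct MPI flags are picked up.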
    
    You must prefix the executable with "./". Note that any command line arguments come in the usual position, after the name of the executable. Thus, to run the Ring program (found in $(PUB)/examples/Ring) on 4 processes with command line arguments -t 5 and -s 1024, type:
    mpirun -np 4 -1 $(PUB)/examples/Ring/ring -t 5 -s 1024
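
    We don't reproduce the Ring program here, but as a rough sketch of the communication pattern its name suggests (an illustration only; the real program's behavior and the meaning of its -t and -s flags may differ), a message can be passed once around a ring of processes like this:

        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <mpi.h>

        /* Pass a fixed-size message once around a ring of processes. */
        int main(int argc, char *argv[])
        {
            int rank, nprocs, next, prev;
            int size = 1024;                 /* message size in bytes (fixed in this sketch) */
            char *buf;
            MPI_Status status;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

            buf = (char *) malloc(size);
            memset(buf, rank, size);
            next = (rank + 1) % nprocs;              /* right neighbor */
            prev = (rank + nprocs - 1) % nprocs;     /* left neighbor  */

            if (nprocs < 2) {
                printf("run with at least 2 processes\n");
            } else if (rank == 0) {
                /* rank 0 starts the message and receives it back at the end */
                MPI_Send(buf, size, MPI_CHAR, next, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, size, MPI_CHAR, prev, 0, MPI_COMM_WORLD, &status);
                printf("message made it around %d processes\n", nprocs);
            } else {
                MPI_Recv(buf, size, MPI_CHAR, prev, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, size, MPI_CHAR, next, 0, MPI_COMM_WORLD);
            }

            free(buf);
            MPI_Finalize();
            return 0;
        }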

    Using machine files

    Sometimes you'll want to specify particular nodes to run on. To do this you need to specify a machine file listing the names of the physical nodes. The command line takes the following form:

       mpirun -np <# PROCESSES> -1 -machinefile <MACHINE_FILE> <PROGRAM> [<ARGS>]
    

    The machine file contains a list of physical node names, one per line. The nodes are numbered from 0 to 15, and are named compute-0-0 through compute-0-15. (Each node contains 2 CPUs, but by default you use only 1 CPU per node; see "Running with 2 CPUs per node" below.) Thus, to run the ring program with nodes 6, 7, 11, and 14 as logical processes 0-3, create the following file, say mfile:

      compute-0-6
      compute-0-7
      compute-0-11
      compute-0-14
    
    To run, type
    mpirun -np 4 -1 -machinefile mfile ./ring -t 5 -s 1024
    

    We have provided a python script to generate randomized machine files: $(PUB)/bin/randMach.py. The command line argument specifies the number of processors in the machine file. For example, the command randMach.py 7 > mach was used to generate the following 7-line machine file:

    compute-0-8
    compute-0-0
    compute-0-1
    compute-0-15
    compute-0-4
    compute-0-9
    compute-0-2
    

    Running with 2 CPUs per node

    If you want to run with 2 CPUs per node, you'll need to use a machine file. (The -g 2 option does not work on the fast Myrinet installation of MPI.)

    Generate a machine file with ONE entry for each node you want to use; do not list each machine entry twice. Then specify the number of processes you want to run along with the machine file. For example, if you want to run with 6 processes using this machine file, p3:

    compute-0-1
    compute-0-2
    compute-0-8
    
    you enter
    mpirun -np 6 -machinefile p3 ./a.out
    

    Intel Math kernel library

    The Intel Math Kernel Library (MKL) provides high performance implementations of common numerical kernels like Matrix Multiplication and FFT.

    THIS IS SUBJECT TO CHANGE, as MKL is not yet available. To link with MKL, use the following on your load line:

    ${MKLPATH}/libmkl_lapack.a ${MKLPATH}/libmkl_ia32.a ${MKLPATH}/libguide.a -lpthread
    
    where ${MKLPATH} has been set as follows (e.g. in your .bash_profile file):
    export MKLPATH=/opt/intel/mkl61/lib/32
    

    If you want to use the FFT or DFT, add -lm to your link line.

    Documentation can be found at http://developer.intel.com/software/products/mkl/docs/mklqref/index.htm .
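
    Once MKL is available, a small test program like the sketch below can be linked with the load line shown above. It multiplies two 2x2 matrices with dgemm through MKL's CBLAS interface; the header name and interface details shown here are assumptions about the installed MKL version, so adjust them as needed:

        #include <stdio.h>
        #include <mkl_cblas.h>   /* header name is an assumption; may differ by MKL version */

        int main(void)
        {
            /* Compute C = 1.0 * A * B + 0.0 * C, with 2x2 row-major matrices */
            double A[] = { 1.0, 2.0,
                           3.0, 4.0 };
            double B[] = { 5.0, 6.0,
                           7.0, 8.0 };
            double C[] = { 0.0, 0.0,
                           0.0, 0.0 };

            cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                        2, 2, 2,       /* M, N, K         */
                        1.0, A, 2,     /* alpha, A, lda   */
                        B, 2,          /* B, ldb          */
                        0.0, C, 2);    /* beta, C, ldc    */

            printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);
            return 0;
        }

    The expected output is 19 22 on the first row and 43 50 on the second.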


    Runaway processes

    Sometimes runaway processes will persist after a run. This can occur if you interrupt a run with control-C. If you feel that the machine is slowing down, run the command

                ganglia load_one | sort -n -k 2

    which displays the load on each node. Since there are 2 CPUs per node, the load should be no more than
    2.5 or so (i.e., at most 2 PBS jobs running per node) unless there are other processes running on the node.
    A high load could be due to valid user jobs or to runaway processes.

    Even so, it's a good idea to check from time to time if there are any runaway processes on the machine. To do this run the following command, which will display all of your processes sorted by node:
    	cluster-ps <username>
    
    (If any nodes are down, you'll be notified.)

    If you see that you have processes running:

        compute-0-13: 
        cs260x 12208  0.0  0.1  5172 1228 ?        S    Oct14   0:00 ./parallel_jacobi 4 .001 100000
        cs260x 12209  0.0  0.1  5904 1228 ?        S    Oct14   0:00 ./parallel_jacobi 4 .001 100000
    
    use the cluster-kill command to kill them:
    cluster-kill <username>
    Ignore messages of the following form:
         compute-0-13:
         kill 9363: Operation not permitted
         Connection to compute-0-13 closed by remote host.
         compute-0-14:
         kill 9904: Operation not permitted
         Connection to compute-0-14 closed by remote host.
         compute-0-15:
         kill 9662: Operation not permitted
         Connection to compute-0-15 closed by remote host.
    

    When done, re-run the cluster-ps command to make sure all is clear, but specify the user "cs260x" in order to search all course user IDs (including your instructor's!). The following command filters out extraneous output, making it easier to locate runaway processes:

    cluster-fork 'ps aux' | egrep "cs260x|compute-" | sed -f $PUB/bin/cl.sed
    
    If you find other running processes, and the user is not logged in (you can find that out with the who command), then notify the user by email. Since email doesn't work on Valkyrie, you'll need to look up the user's real name with finger (e.g. finger cs260x) and then check the UCSD database, as in finger username@ucsd.edu.

    As a matter of etiquette, be sure to run cluster-ps before logging out. If you plan to be on the machine for a long time, it's a good idea to run this command occasionally, and before you start a long series of benchmark runs.


    MPI

    MPI documentation is found at http://www.cse.ucsd.edu/classes/fa05/cse260/testbeds.html#MPI. You can obtain man pages for the MPI calls used in the example programs described here at http://www-unix.mcs.anl.gov/mpi/www/www3/.



     Copyright © 2005 Scott B. Baden. Last modified: Wed Sep 14 18:34:49 PDT 2005