Yes, documentation for new users is available. Those with little or no experience with "unix-like" environments such as Linux should read:
For instructions on compiling and running programs on Kodiak, see

What computer systems are available?
ARCS currently supports one computer system for academic and research purposes: Kodiak, a Linux cluster with 128 nodes (1024 cores) for general use plus 13 reserved nodes (208 cores). For details on ARCS supported systems, please see the ARCS Systems page.

How do I get a computer account?
To get a computer account on one of the ARCS systems, contact Mike Hutcheson.

How do I connect and log in?
Use ssh to log in to the ARCS systems. Telnet is not supported.
Windows users should use PuTTY. Faculty and staff can download PuTTY from Baylor's AppCenter page. Students can get PuTTY from the official site. PuTTY is also installed on many of the public access PCs on campus.
Mac OS X users should connect with the ssh command in a Terminal window.
Linux users should connect with the ssh command in an xterm window or the console.
To connect to Kodiak with ssh, enter the following:
    % ssh -l username kodiak.baylor.edu
or
    % ssh username@kodiak.baylor.edu
If your username on the ARCS system happens to be the same as that on your local system, you can omit the "-l username".
Occasionally we update the operating system on the ARCS systems, so you may get the following message when you try to connect:
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
To fix this, edit the file $HOME/.ssh/known_hosts and delete the line for the system and then reconnect. If you are concerned, you can contact us for assistance.
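As an alternative to editing the file by hand, OpenSSH's ssh-keygen can remove the stale entry with its -R option. The sketch below wraps that in a small helper. The function name forget_host and its optional second argument are illustrative only and are not something provided on the ARCS systems.

```shell
# forget_host HOST [KNOWN_HOSTS_FILE]
# Remove every key recorded for HOST from a known_hosts file.
# "forget_host" is a hypothetical helper name; -R and -f are standard
# ssh-keygen options. ssh-keygen keeps the previous contents in a
# backup file with ".old" appended to the name.
forget_host() {
    ssh-keygen -R "$1" -f "${2:-$HOME/.ssh/known_hosts}"
}

# Typical use on your local machine, followed by reconnecting:
#   forget_host kodiak.baylor.edu
#   ssh -l username kodiak.baylor.edu
```

Run this on the machine you connect from, not on Kodiak itself, since that is where the known_hosts file in question lives.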
The first time you access your account you should change your password. To do so, type the command passwd and follow the prompts.
On Linux systems, the "bash" shell is the default shell. If you require a different shell, please let the system administrator know.
To compile parallel MPI programs on Kodiak, use mpicc (for C/C++) and mpif90 or mpif77 (for Fortran).

How do I run my program?
Although you technically can run your program from the command line on the Kodiak "head node", don't. Instead, use the batch system and the qsub command to run your program on one or more of Kodiak's compute nodes. To do so, first create a shell script containing the command that runs your program. Below is a sample shell script called run_myprog.sh. This assumes that your program is serial, i.e., not parallel (MPI), is named "myprog", and is in the directory "/home/username/myprog/".
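A minimal sketch of what such a script might contain is shown here, written as a command that creates run_myprog.sh in the current directory. It assumes only the program name and directory given above; the comments and the choice of /bin/sh are illustrative.

```shell
# Create run_myprog.sh in the current directory. Assumed layout:
# the serial executable is /home/username/myprog/myprog.
cat > run_myprog.sh <<'EOF'
#!/bin/sh
# Change to the program's directory, then run the serial program there.
cd /home/username/myprog || exit 1
./myprog
EOF
```

The quoted 'EOF' keeps the shell from expanding anything inside the script while it is being written out.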
To submit the job to the batch system, use the qsub command. Note that you cannot qsub your binary directly but instead must use a shell script.
    % qsub -N myprog ./run_myprog.sh
    1234.n131
The qsub command returns the job identifier assigned to the job by the batch system, e.g., 1234 above. The "n131" suffix (sometimes seen as "n131.localdomain") is the name of the batch server rather than part of the job number, and it can be omitted on most commands.
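If you submit jobs from your own scripts, it can be handy to keep only the numeric part of the identifier that qsub prints. A minimal sketch, assuming the identifier always has the form shown above (digits, a dot, then a name); job_number is a hypothetical helper, not a batch system command.

```shell
# job_number ID
# Print the part of a job identifier before the first dot,
# e.g. "1234.n131" -> "1234".
job_number() {
    printf '%s\n' "${1%%.*}"
}

# On Kodiak you might capture the identifier at submit time:
#   jobid=$(qsub -N myprog ./run_myprog.sh)
#   echo "submitted job $(job_number "$jobid")"
job_number 1234.n131
job_number 1234.n131.localdomain
```

Both calls print 1234, because the expansion removes everything from the first dot onward.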
When the job finishes, you should find myprog.e1234 and myprog.o1234 in the myprog directory. These correspond to your program's stderr and stdout, respectively. If you want to see what is currently running in batch, use the qstat command:
    % qstat
    Job id              Name             User             Time Use S Queue
    ------------------- ---------------- ---------------- -------- - -----
    1234.n130.loca      myprog           me                      0 R batch
If your program is running (or waiting in the batch queue) and you want to stop it, use the qdel command:
    % qdel 1234

How do I run my parallel (MPI) program?
To run your parallel program, you should submit it to the batch system with the qsub command as described above. To do so, you will first need to create a shell script and use the mpiexec command inside to run your program.
You will need to specify the number of processes you want to run, i.e., the number of nodes and the number of processes per node. You do this by adding the "-l" option (that's a lower case L and not a one) to the qsub command. So to run your program on 2 nodes and 8 processes per node (for a total of 16 processes) you could do the following:
    % qsub -l nodes=2:ppn=8 ./run_myprog.sh
    1234.n131
You can also specify the requested resources along with other qsub options, e.g., the name of the job, as batch system "directives" at the top of your shell script. Below is a sample shell script called run_myprog.sh. This assumes that your program is parallel, is named "myprog" and is in a directory, "/home/username/myprog/". A description of the shell script follows.
    #!/bin/sh
    #PBS -l nodes=2:ppn=8
    #PBS -N test1
    #PBS -o test1.out
    #PBS -e test1.err
    num=`cat $PBS_NODEFILE | wc -l`
    mpiexec -np $num -machinefile $PBS_NODEFILE /home/username/myprog/myprog
To submit the job to the batch system, use the qsub command as above but without the "-l" option (because that is now specified within the script).
    % qsub ./run_myprog.sh
    1234.n131
The batch system directives, i.e., the "#PBS" lines, appear at the top of the shell script. The most important directive is the "-l" directive, which specifies the number of nodes and processes per node. (Actually, the -l directive can specify other resources as well, such as particular hosts and maximum cpu time.) If your program is memory intensive, you might want to run fewer processes per node, so instead you could use "-l nodes=4:ppn=4" (4 nodes x 4 processes per node = 16 processes). If your program requires a number of processes that is not divisible by 8, you would add the remainder to the directive with a "+". So if your job requires 20 processes, the -l directive would be "-l nodes=2:ppn=8+1:ppn=4". That would be 2 nodes x 8 processes per node + 1 node x 4 processes per node, i.e., 16 + 4 = 20 processes.
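The arithmetic behind these resource requests can be checked with a few lines of shell. The sketch below is illustrative only: total_procs is a hypothetical helper, it understands just the count:ppn=N form used in this section, and it assumes that a chunk without ppn means one process per node.

```shell
# total_procs SPEC
# Add up nodes x ppn for each "+"-separated chunk of a resource request
# such as "nodes=2:ppn=8+1:ppn=4". Hypothetical helper for checking the
# arithmetic; it is not part of the batch system.
total_procs() {
    spec=${1#nodes=}          # drop the leading "nodes="
    total=0
    old_ifs=$IFS
    IFS='+'                   # split the request into chunks at each "+"
    for chunk in $spec; do
        count=${chunk%%:*}    # node count is the text before the first ":"
        case $chunk in
            *ppn=*) ppn=${chunk##*ppn=}; ppn=${ppn%%:*} ;;
            *)      ppn=1 ;;  # assumption: no ppn means one process per node
        esac
        total=$((total + count * ppn))
    done
    IFS=$old_ifs
    echo "$total"
}

total_procs nodes=2:ppn=8          # 2 x 8
total_procs nodes=4:ppn=4          # 4 x 4
total_procs nodes=2:ppn=8+1:ppn=4  # 2 x 8 + 1 x 4
```

The three calls print 16, 16 and 20, matching the totals worked out in the text.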
The other batch directives in the example are -N, which specifies the name of the job, and -o and -e, which specify the output and error files respectively. These directives are optional.
The next line in the shell script dynamically calculates the total number of processes to be specified in the mpiexec command. The way it works is that the batch system reads the -l directive, figures out how many (and which) nodes to use, and creates the mpiexec command's "machine file" ($PBS_NODEFILE) that contains the list of nodes to use, one node per line, with a node listed once for each process it will run. Add cat $PBS_NODEFILE to your script if you want to see what it looks like. The "cat $PBS_NODEFILE | wc -l" simply counts the number of lines. Yes, since you already know the number of processes (because you specified them with the -l directive) you can just use that value in the mpiexec command. However, when/if you change the directive, it's easy to forget to change the mpiexec command as well. It is safer to simply calculate it dynamically.
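To see the counting step in isolation, the sketch below builds a stand-in machine file and counts its lines the same way the script does. The file contents and the node names n001 and n002 are made up for illustration; on Kodiak the real file is created by the batch system and named by $PBS_NODEFILE.

```shell
# count_procs FILE
# Count the lines in a machine file. Reading the file with "<" gives
# the same number as "cat FILE | wc -l"; the arithmetic expansion
# strips any padding that wc adds on some systems.
count_procs() {
    echo $(( $(wc -l < "$1") ))
}

# Stand-in for $PBS_NODEFILE (hypothetical node names, two lines each).
nodefile=$(mktemp)
printf '%s\n' n001 n001 n002 n002 > "$nodefile"

count_procs "$nodefile"
rm -f "$nodefile"
```

With four lines in the stand-in file, the call prints 4, which is the value that would be passed to mpiexec with -np.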
The last line of the shell script is the call to mpiexec itself. This command specifies the number of processes calculated above as well as the machine file. Note that here we use the full path to your executable. You can also cd /home/username/myprog first and run the program from there.
There are several different MPI implementations installed on Kodiak. To specify which one to use, create a file in your home directory named .mpi-selector. That file should contain a single line with the "name" of the MPI implementation to use for compilation and execution of your programs. The supported MPI versions (and .mpi-selector text) are listed below.
When you log in, the system checks for this file and sets up your MPI environment for you. To make sure your .mpi-selector file is correct, after logging in you can check to see which MPI tools, e.g., mpicc, will be run.
    % cat $HOME/.mpi-selector
    openmpi-1.6.5
    % which mpicc
    /usr/local/openmpi-1.6.5/bin/mpicc
If which mpicc returns "/usr/bin/which: no mpicc in (...)", then the .mpi-selector file is incorrect.
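The same check can be scripted, for example at the top of a build script. A small sketch, assuming a POSIX shell; check_mpi is a hypothetical helper name, and command -v is used in place of which because it is built into the shell.

```shell
# check_mpi
# Report which mpicc is first on PATH, or complain if none is found.
# Returns 0 when mpicc is available and 1 otherwise.
check_mpi() {
    if path=$(command -v mpicc); then
        echo "mpicc found: $path"
    else
        echo "mpicc not found; check \$HOME/.mpi-selector" >&2
        return 1
    fi
}

# Usage, e.g. before compiling:
#   check_mpi || exit 1
```

Because the function returns a status, a script can stop early instead of failing later with a less obvious compiler error.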
The maximum number of jobs allowed to run on Kodiak is 256. This is actually the maximum number of processors/cores, so if your parallel program uses 32 processes, you would only be allowed to run 8 jobs. Jobs running on Kodiak have a maximum time limit of 5000 hours. Most accounts do not have a quota on disk space.
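Because the limit counts cores rather than jobs, the number of jobs you can have running at once depends on how many processes each one uses. A quick sketch of that arithmetic, using the 256-core figure above; max_jobs is a hypothetical helper, and integer division rounds down.

```shell
# max_jobs PROCS_PER_JOB
# How many jobs of a given size fit under the 256-core limit.
max_jobs() {
    echo $((256 / $1))
}

max_jobs 32   # the 32-process case from the text
max_jobs 16
```

The two calls print 8 and 16 respectively.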
Note: We reserve the right to add or modify runtime limits and quotas at any time.

How do I print from the ARCS systems?
You don't. ITS does not allow printing from the ARCS systems. To print a file, you need to transfer it to your local desktop computer and print it from there.

How do I move files between my computer and the ARCS systems?
To transfer files between a Windows system and an ARCS system, you should use WinSCP. Faculty and staff can download WinSCP from Baylor's AppCenter page. Students can get WinSCP from the official site. WinSCP is also installed on many of the PCs in the Electronic Library Compute Facilities.
Mac OS X and Linux users should use the sftp (or scp) command from a Terminal window or console.

How can I get Linux installed on my computer?
Contact ARCS staff for assistance.