ARCS System Support
What computer systems are available?
ARCS supports several computer systems for academic and research use. These include a 128-node (1024-core) Linux cluster, a 32-node (64-core) Linux cluster, a 4-processor (dual-core) Linux system, and a 4-processor OpenVMS system. For details on these systems, please see the ARCS Systems page.
How do I get a computer account?
To get a computer account on one of the ARCS systems, contact Mike Hutcheson.
How do I connect and log in?
Use ssh to log in to the ARCS systems. Telnet is not supported.
Windows users should use PuTTY. Faculty and staff can download PuTTY from Baylor's AppCenter page. Students can get PuTTY from the official site. PuTTY is also installed on many of the PCs in the Electronic Library Compute Facilities.
Mac OS X users should connect with the ssh command in a Terminal window. Classic MacOS users should upgrade to Mac OS X.
Linux users should connect with the ssh command in an xterm window or the console.
To connect to Kodiak with ssh, enter the following:
% ssh -l username kodiak.baylor.edu
or
% ssh username@kodiak.baylor.edu
If your username on the ARCS system happens to be the same as that on your local system, you can omit the "-l username".
Occasionally we update the operating system on the ARCS systems, so you may get the following message when you try to connect:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
To fix this, edit the file $HOME/.ssh/known_hosts and delete the line for the system and then reconnect. If you are concerned, you can contact us for assistance.
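The edit described above can also be scripted. The following is a minimal sketch, assuming plain (unhashed) known_hosts entries in which the first field is a comma-separated list of host names. forget_host is a hypothetical helper name, not a command provided on the ARCS systems. OpenSSH's own "ssh-keygen -R hostname" does the same job and also handles hashed entries.

```shell
# forget_host HOST [FILE]: drop every entry for HOST from a known_hosts file.
# Hypothetical helper; assumes unhashed entries (host names readable in field 1).
forget_host() (
    host="$1"
    file="${2:-$HOME/.ssh/known_hosts}"
    tmp="$(mktemp)" || exit 1
    # Keep a line unless HOST appears in its comma-separated host list.
    awk -v h="$host" '{
        n = split($1, names, ",")
        keep = 1
        for (i = 1; i <= n; i++) if (names[i] == h) keep = 0
        if (keep) print
    }' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)
```

For example, "forget_host kodiak.baylor.edu" removes the stale entry from $HOME/.ssh/known_hosts, after which you can reconnect and accept the new host key.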
The first time you access your account you should change your password. To do so, type the command passwd and follow the prompts. (On the VMS system, use the SET PASSWORD command.) On Linux systems, the "bash" shell is the default shell. If you require a different shell please let the system administrator know.
How do I compile my program?
To compile serial (non-parallel) programs, use the Intel C/C++ compiler (icc), the GNU C/C++ compiler (gcc), or the Intel Fortran compiler (ifort). On Rush, you can compile OpenMP programs with the Intel compilers.
To compile parallel MPI programs on the Cluster, use mpicc (for C/C++) and mpifort (for Fortran).
How do I run my program on Rush?
Although you can technically run your program from the command line, you should use the batch system. To do so, first create a shell script containing the command that runs your binary. Below is a sample shell script called run_myprog.sh. It assumes that your program is named "myprog" and is in the directory "/home/username/myprog/".
#!/bin/sh
/home/username/myprog/myprog
To submit the job to the batch system, use the qsub command. Note that you cannot qsub your binary directly but instead must use a shell script.
% qsub -N myprog ./run_myprog.sh
1234.rush.baylor.edu
The qsub command will return with the job identifier assigned to the job by the batch system, e.g., 1234 above.
When the job finishes, you should find myprog.e1234 and myprog.o1234 in the myprog directory. These correspond to your program's stderr and stdout. If you want to see what is currently running in batch, use the qstat command:
% qstat
Job id Name User Time Use S Queue
------------------- ---------------- ---------------- -------- - -----
1234.rush myprog me 0 R batch
If your program is running (or waiting in the batch queue) and you want to stop it, use the qdel command:
% qdel 1234
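The pieces of the workflow above (the job identifier returned by qsub, the output file names, and the S column of qstat) can be tied together with a few small helpers. This is only a sketch: the function names are hypothetical, and job_state assumes the qstat layout shown above, with the state in the second-to-last column. In the listing above, R means the job is running.

```shell
# Hypothetical helpers for the qsub/qstat workflow; none are ARCS-provided.

# job_number JOBID: strip the server suffix, e.g. 1234.rush.baylor.edu -> 1234
job_number() {
    printf '%s\n' "${1%%.*}"
}

# job_output_files NAME JOBID: print the expected stdout and stderr file names.
job_output_files() (
    num="$(job_number "$2")"
    printf '%s.o%s\n%s.e%s\n' "$1" "$num" "$1" "$num"
)

# job_state JOBID: read qstat output on stdin and print the S column for the job.
# Prints nothing if the job is not listed (for example, finished or deleted).
job_state() (
    num="$(job_number "$1")"
    awk -v id="$num" 'index($1, id ".") == 1 { print $(NF - 1) }'
)

# Typical use on the system itself (not run here):
#   jobid="$(qsub -N myprog ./run_myprog.sh)"
#   qstat | job_state "$jobid"
#   job_output_files myprog "$jobid"
```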
Note for MATLAB users: The "matlab" command requires specific arguments in order to function correctly in batch. There is a sample shell script called run_matlab.sh in /usr/local/examples that you can copy to your account and use to submit your MATLAB jobs to the batch system.
Here's an example of how to submit a MATLAB program to the batch system:
% qsub -N MyMatlabProgram ./run_matlab.sh
How do I run my program on the Cluster?
To run your program on the Cluster, submit it to the batch system, even if it is a serial (non-parallel) program. Use the scasub command, which is a wrapper script for qsub. To submit a parallel (MPI) program, include the -mpimon option and specify both the number of processes per node (for example, -npn 2) and the total number of processes you want to run (for example, -np 4).
% scasub -mpimon -npn 2 -np 4 ./myprog
5678.el-hpc-cluster1a.baylor.edu
The scasub command will return with the job identifier assigned to the job by the batch system, e.g., 5678 above.
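The -np and -npn values together determine how many nodes a job occupies. The sketch below computes that number, assuming the batch system places -npn processes on each node before moving to the next one. nodes_needed is a hypothetical helper, not part of scasub.

```shell
# nodes_needed NP NPN: number of nodes required for NP processes at NPN per node
# (integer ceiling of NP / NPN).
nodes_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# The example above (-np 4 -npn 2) would occupy two nodes:
nodes_needed 4 2
```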
When the job finishes, you should find myprog.e5678 and myprog.o5678 in the program's directory. These correspond to your program's stderr and stdout. Note that output from multiple MPI processes may be interleaved. If you want to see what is currently running in batch, use the qstat command:
% qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
5677.el-hpc-clus someprog someuser 0 R scali_exec
5678.el-hpc-clus myprog me 0 R scali_exec
You can also use the scatop command to get CPU and memory usage of programs running on the compute nodes:
% scatop
Node : PID USER PRI NI SZ RSS STAT %CPU %MEM TIME COMMAND
n01 : 11613 someuser 14 0 330624 2660 R 94.0 0.0 0:02 someprog
n01 : 11614 someuser 14 0 330624 2660 R 93.0 0.0 0:02 someprog
n02 : 11537 someuser 14 0 330624 2660 R 93.6 0.0 0:02 someprog
n02 : 11538 someuser 14 0 330624 2660 R 94.3 0.0 0:02 someprog
n03 : 11202 someuser 14 0 330624 2660 R 75.0 0.0 0:03 someprog
n03 : 11203 someuser 14 0 330624 2660 R 75.0 0.0 0:03 someprog
n04 : 11361 someuser 14 0 330588 2440 R 72.2 0.0 0:02 someprog
n04 : 11362 someuser 14 0 330624 2668 R 72.5 0.0 0:02 someprog
n05 : 11351 someuser 14 0 330624 2660 R 79.7 0.0 0:03 someprog
n05 : 11352 someuser 14 0 330624 2668 R 80.0 0.0 0:03 someprog
n06 : 11479 someuser 14 0 330624 2660 R 78.0 0.0 0:03 someprog
n06 : 11480 someuser 14 0 330624 2660 R 77.5 0.0 0:03 someprog
n07 : 11415 someuser 14 0 330624 2660 R 77.7 0.0 0:03 someprog
n07 : 11416 someuser 14 0 330624 2660 R 77.7 0.0 0:03 someprog
n08 : 11468 someuser 14 0 330624 2660 R 80.2 0.0 0:03 someprog
n08 : 11469 someuser 14 0 330624 2668 R 80.5 0.0 0:03 someprog
n28 : 9554 me 16 0 1717 3452 R 79.0 0.0 0:00 myprog
n28 : 9555 me 16 0 1764 3648 R 80.0 0.0 0:00 myprog
n29 : 9449 me 17 0 1476 3648 R 75.0 0.0 0:00 myprog
n29 : 9450 me 17 0 1734 3452 R 75.0 0.0 0:00 myprog
If your program is running (or waiting in the batch queue) and you want to stop it, use the qdel command:
% qdel 5678
How do I use MPICH (mpirun) instead of Scali MPI?
By default, when you log on to the Cluster, your environment is set up to use Scali for MPI. If you want to switch to MPICH, run the program /usr/local/bin/usempich. You must run it with the source command so that it executes in the current shell and not a sub-shell:
% source usempich
Now using MPICH.
If you want, you can check your $PATH variable to make sure it worked:
% echo $PATH
/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin
% source usempich
Now using MPICH.
% echo $PATH
/usr/local/mpich/bin:/usr/local/bin:/bin:/usr/bin:/usr/X11R6/bin
Notice that the MPICH directory is now the first item in your PATH. Commands such as mpicc and mpirun will now run from the MPICH directory instead of Scali MPI. If you log out and log back in, your environment will be reset to Scali MPI, so you will need to source usempich again. There is no "usescali" command. If you wish to switch back to Scali MPI, log out and log back in.
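The PATH check can be wrapped in a small test so that scripts can confirm the MPICH environment is active before compiling or running. This is a sketch only: mpich_active and MPICH_BIN are hypothetical names, and the directory is taken from the PATH listing above.

```shell
# Directory that usempich puts at the front of PATH (from the listing above).
MPICH_BIN="/usr/local/mpich/bin"

# mpich_active [PATH_STRING]: succeed if the MPICH directory is the first entry
# of the given PATH-style string (defaults to the current $PATH).
mpich_active() {
    case "${1:-$PATH}" in
        "$MPICH_BIN"|"$MPICH_BIN":*) return 0 ;;
        *) return 1 ;;
    esac
}

# Typical use in an interactive shell (not run here):
#   mpich_active || source usempich
```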
A program must be compiled against the MPI implementation it runs under. If your program was compiled for Scali MPI, recompile it to use MPICH; if it was compiled for MPICH and you switch back to Scali MPI, recompile it again. The compile commands have the same names but are now found in the MPICH directory.
With MPICH, you do not use scasub to submit your program to batch as you do with Scali MPI. Instead, you use mpirun. However, you cannot simply mpirun your MPICH program, because it may interfere with programs already running on the compute nodes. First reserve nodes with scasub, and then use mpirun on those nodes. To reserve nodes, use the -r option to specify the number of minutes you want them for. For example, to reserve 4 nodes for 3 hours (180 minutes):
% scasub -r 180 -np 4
167.el-hpc-cluster1a.baylor.edu
% qstat
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
167.el-hpc-clust RESERVED_180 me 0 R scali_exec
Note the R in the job status. This means your job is running (or in this case, nodes are reserved). If the status is Q, then there are currently not enough nodes available and your request is waiting in the queue. When the nodes are allocated to you, you will see a file in your directory named "reserved_nodes.jobid". This is a list of nodes that you should use as the "machinefile" for the mpirun command:
% cat reserved_nodes.167
n04
n03
n02
n01
Keep in mind that this is a list of reserved nodes, not processors. Because the nodes on the cluster have two processors each, you can run 8 processes on these 4 nodes. If you prefer, you can reserve the nodes by specifying the number of processors and the processors per node, e.g., "scasub -np 8 -npn 2", as in a regular scasub command.
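The number of processes that fit on a reservation can be computed from the reserved_nodes file rather than by hand. The following sketch assumes one node name per line and two processors per node, as described above. reserved_slots is a hypothetical helper name.

```shell
# reserved_slots FILE [PER_NODE]: number of MPI processes that fit on the nodes
# listed in FILE, assuming PER_NODE processors per node (default 2).
reserved_slots() (
    per_node="${2:-2}"
    # Count non-empty lines so a trailing blank line is not treated as a node.
    nodes="$(grep -c . "$1")"
    echo $(( nodes * per_node ))
)

# Typical use (not run here): pass the result to mpirun as the -np value.
#   mpirun -machinefile reserved_nodes.167 \
#       -np "$(reserved_slots reserved_nodes.167)" -nolocal ./myprog
```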
These nodes are reserved for your use for the time requested. After the time expires, the job is removed from the queue and the nodes become available for others to use. Programs you are running do not quit automatically when the time expires, so make sure they do not continue to run by stopping them with the kill command. If you know approximately how long your program will take to finish, request somewhat more time than that. You can always free the nodes early by using qdel to remove the job from the queue.
Because the Scali batch system does not support MPICH programs, you will need to use mpirun to run your program and specify the reserved_nodes.jobid file as the machinefile. Also, be sure to include -nolocal so that the job won't run on the head node.
% mpirun -machinefile reserved_nodes.167 -np 8 -nolocal ./myprog
This runs the program interactively (although on the compute nodes). You can run mpirun in the background with a & after the command. Note that unlike the batch system, standard output (and stderr) is not redirected to a file and will still appear on the terminal. You will need to add "> outfile" if you wish to save stdout. You also need to redirect stdin or the program will stop.
% mpirun -machinefile reserved_nodes.167 -np 8 -nolocal ./myprog > outfile 2> errfile < /dev/null &
[1] 18037
% tail -f outfile
data
^C
% [1]+ Done mpirun -machinefile reserved_nodes.167 -np 8 -nolocal ./myprog > outfile 2> errfile < /dev/null
What are the runtime limits, quotas, etc.?
How do I move files between my computer and the ARCS systems?
Use sftp (Secure FTP). You can also use scp (secure copy) but sftp is recommended. "Classic" FTP is not enabled on the ARCS systems.
To transfer files between a Windows system and an ARCS system, you should use WinSCP. Faculty and staff can download WinSCP from Baylor's AppCenter page. Students can get WinSCP from the official site. WinSCP is also installed on many of the PCs in the Electronic Library Compute Facilities.
Mac OS X and Linux users should use the sftp (or scp) command from a Terminal window or console.
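For routine one-file transfers from a Terminal window, the scp command can be wrapped in two short functions. This is a sketch under stated assumptions: put_file, get_file, ARCS_HOST, ARCS_USER, and SCP are hypothetical names introduced for illustration, and the host name is the one used in the ssh examples earlier on this page.

```shell
# Hypothetical wrappers around scp; ARCS_HOST, ARCS_USER, and SCP are
# illustrative variable names, not ARCS-defined settings.
ARCS_HOST="${ARCS_HOST:-kodiak.baylor.edu}"
ARCS_USER="${ARCS_USER:-${USER:-username}}"
SCP="${SCP:-scp}"

# put_file LOCAL [REMOTE]: copy a local file to the ARCS system.
# With no REMOTE path, the file lands in your home directory there.
put_file() {
    "$SCP" "$1" "$ARCS_USER@$ARCS_HOST:${2:-}"
}

# get_file REMOTE [LOCAL]: copy a file from the ARCS system
# (LOCAL defaults to the current directory).
get_file() {
    "$SCP" "$ARCS_USER@$ARCS_HOST:$1" "${2:-.}"
}
```

For example, "put_file results.dat" uploads a file and "get_file myprog/outfile" fetches one. Setting SCP=echo prints the command that would run without transferring anything.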
How can I get Linux installed on my computer?
How can I get the Tivoli backup client installed on my Linux system?
I need help with some other Linux/Unix/VMS issue.