This guide provides the basic information a user needs to access and use the Hyalite HPC resource.  For additional help, or to schedule a training session, email us at [email protected].


 Logging in to Hyalite

The Hyalite HPC system has two login/head nodes.  These login/head nodes are to be used for file transfer and for launching jobs on the cluster using the SLURM job scheduler.  DO NOT execute computation on the login nodes!  Doing so can make them inaccessible for other users and can bring the cluster down for EVERYONE, which will upset other users and make you unpopular.

To access the login/head nodes you must use ssh (Secure Shell) to open a terminal on a login/head node, using your MSU-NETID and password as your credentials.  The command should look like this:

ssh [email protected]

You will be asked to input your NETID password to authenticate yourself. 

Note: You cannot access the login node from off-campus without using the MSU VPN.  


 

Transfer Files To and From Hyalite

Moving data to and from the cluster can be done two ways.

  • The first is Globus, which is the fastest and most robust way to transfer data to and from the cluster – it does require that you sign up for a Globus account first and install the Globus Connect software on your device. Our Globus endpoint on the cluster is montana#hyalite, and you will be asked to enter your username and password when you access it – your MSU-NETID and password.
  • The second is sftp/scp. We recommend CyberDuck as it is cross-platform, easy to use, and includes support for remote editing.  Use hyalite.rci.montana.edu as the “Host”, your MSU-NETID and password as the username and password, and 22 as the “Port”.  A command-line example is shown after this list.
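
If you prefer the command line for sftp/scp, the same host and credentials work with scp.  A minimal sketch follows – the file name and destination directory are placeholders, and the scratch path is the one described under File Locations below:

  scp myfile.txt [email protected]:/mnt/lustrefs/scratch/NETID/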

Globus is recommended as it is rather “fire and forget”: it will retry transfers if the connections are broken and will email you when transfers are complete. More information can be found here (Globus Quick Start Guide).

File Locations

  • /home/netID
    • System home directory – Environment settings only.  Space is very limited and a quota is enforced on it.
  • /mnt/lustrefs/scratch/netID
    • Temporary storage for your working files (Lustre) - this is where you should launch sbatch from and deposit computational output
 


The system home directory space is very limited, so all data should be stored on the Lustre (/mnt/lustrefs) filesystem.  


 

Using SLURM – an example slurm script

Tools that you want to run are embedded in a command script, and the script is submitted to the job control system using the appropriate SLURM command (additional information about sbatch can be found at http://www.schedmd.com/slurmdocs/sbatch.html).  For a simple example that just prints the hostname of a compute host to both standard out and standard error, create a file called example.slurm with the following content:

#!/bin/bash
#SBATCH -J hello.R # Name for your job
#SBATCH -n 8 # Number of tasks when using MPI. Default is 1, and SLURM assumes the usage of 1 cpu per task.
#SBATCH -N 1 # Number of nodes to spread cores across - default is 1 - if you are not using MPI this should likely be 1
#SBATCH --mem 16000 # Megabytes of memory requested. Default is 2000/task.
#SBATCH -t 0-00:05:00 # Runtime in days-hours:minutes:seconds.
#SBATCH -p defq # Partition to submit to: the standard compute node partition (defq) or the express node partition (express)
#SBATCH -o example-%j.out.txt # Standard output (stdout) goes to this file (what would print to the screen if you were running the command locally)
#SBATCH -e example-%j.err.txt # Standard error (stderr) goes to this file (errors that would print to the screen if you were running the command locally)
#SBATCH --mail-user [email protected] # the email address to which job notifications are sent.
#SBATCH --mail-type ALL # the events you want to be emailed about; ALL will alert you of job beginning, completion, failure, etc.

module load somemodule # to load a module, use this command with the name of the module you want loaded.
myscript-or-command #put the commands you want to run - the same way you run it from the command line normally

The SLURM script you wish to use should be located in the same place you want your output, and sbatch should be launched from that location. On Hyalite, that location should be somewhere within your /mnt/lustrefs/scratch/MSU-NETID/ directory (the Lustre file system, which provides 700TB of scratch space on the cluster).  If you run jobs from outside the Lustre scratch directory, on the normal file system, you could fill up the limited usable disk space, which would bring the cluster down for everyone using it.  Also, the Lustre file system is a parallel file system that has been optimized to provide a high rate of I/O for superior read and write performance.
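
For example (the project directory name below is only a placeholder), you would change to your scratch space and submit from there:

cd /mnt/lustrefs/scratch/MSU-NETID/myproject # a working directory you created on the Lustre scratch file system
sbatch example.slurm # submit the batch script; its output files will be written to this directory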

MPI Specific Example Slurm batch script:
#!/bin/bash
#SBATCH --time=1:00:00 # walltime, abbreviated by -t
#SBATCH --nodes=2 # number of cluster nodes, abbreviated by -N
#SBATCH -o slurm-%j.out-%N # name of the stdout, using the job number (%j) and the first node (%N)
#SBATCH --ntasks=40 # number of MPI tasks, abbreviated by -n
# additional information for allocated clusters
#SBATCH --partition=defq # partition, abbreviated by -p
# load appropriate modules if necessary (an MPI module in this case)
module load mpi/OpenMPI/1.8.4-GCC-4.8.4
# run the program NOTE the command "mpirun" is required along with "-n" the number of tasks 
mpirun -n 40 my_mpi_program > my_program.out

Submit a job script to SLURM

  sbatch example.slurm

When command scripts are submitted, SLURM looks at the resources you’ve requested and waits until an acceptable compute node or nodes are available on which to run it. Once the resources are available, it runs the script as a background process (i.e. you don’t need to keep your terminal open while it is running), returning the output and error streams to the locations designated by the script.

You can monitor the progress of your job using the squeue -j JOBID command, where JOBID is the ID returned by SLURM when you submit the script. The output of this command will indicate if your job is PENDING, RUNNING, COMPLETED, FAILED, etc. If the job is completed, you can get the output from the file specified by the -o option. If there are errors, they should appear in the file specified by the -e option.
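
Put together, a submit-and-monitor session looks roughly like this (the job ID and node name below are made up for illustration):

$ sbatch example.slurm
Submitted batch job 12345
$ squeue -j 12345
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
12345 defq      hello.R   x12345   R   0:04      1 compute003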


To run an interactive SLURM Session

  srun -I -p express -N 1 -n 1 --pty -t 0-00:05 /bin/bash

This will try to run an interactive session in the express partition for 5 minutes on one node with 1 core.  It runs the bash shell.  For this to work, resources must be immediately available.  (Try defq instead of express if that is not the case.)  This is useful if you need to explore the environment prior to starting a large job.  Additional information about srun can be found at http://www.schedmd.com/slurmdocs/srun.html


To run an interactive session with X11 Forwarding enabled to be able to use GUI applications

When sshing to the login/head nodes, use

  ssh -Y [email protected]

Then run the interactive x11 slurm session by loading the module and running the following:

module load srun.x11/1  
srun.x11 -I -N 1 -n 1 -t 0-00:05  -p defq

This will launch you into a bash session on the compute node SLURM has assigned you (provided one is immediately available), with X11 forwarding enabled, on the defq partition.  From there you can launch things like web browsers and other GUI tools.  For example, to launch RStudio run:

module load rstudio/0.98.1103 R/3.2.5
rstudio

This will load RStudio and version 3.2.5 of R to run in an X11 GUI window.


 

 

Requesting Memory and using Large Memory Nodes

Most of the compute nodes have 64GB of available memory; there are 16 compute nodes with 256GB of memory, and the xlarge node has 1500GB of memory. 

To have your job run on a node with large memory, just make that a requirement of the job in your batch file.  For example, if the following were present in the batch file:

#SBATCH -p defq
#SBATCH --mem 256000

... then the job would be scheduled to one of the 16 compute nodes with 256GB of RAM.  If the queue were "unsafe" instead of "defq", the job might also be scheduled to the xlarge node.
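
As a rough sketch of the second case (the memory figure below is illustrative), requesting the unsafe queue with more memory than any 256GB node can provide means only the xlarge node can satisfy the request:

#SBATCH -p unsafe
#SBATCH --mem 1000000 # roughly 1TB requested; only the xlarge node has enough memory for this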

 


Available Queues (partitions)

Each queue (partition) is listed below with its time limit, CPUs per node, memory per node (GB), number of nodes available, and description.

  • defq – Time limit: 24 hours; CPU/node: 32, 40; Memory: 64, 256 GB; Nodes available: 62. The default queue; if you do not specify a queue, this is where the job will run.
  • express – Time limit: 30 minutes; CPU/node: 32; Memory: 64 GB; Nodes available: 2. For testing and training, this queue has a maximum run time of only 30 minutes.  If you are testing a new batch script, this is a good place to test it first.  It is almost always available.
  • priority – Time limit: 72 hours; CPU/node: 32, 40; Memory: 64, 256 GB; Nodes available: 46. A high-priority queue for contributors to the cluster; jobs here will jump in front of the line if the cluster is full.
  • bigjob – Time limit: 30 days; CPU/node: 32, 40; Memory: 64, 256 GB; Nodes available: 24. A queue for long-running jobs.  It is available by request only; please contact [email protected] if you need to run jobs longer than 24 or 72 hours and the unsafe queue is inappropriate for the type of job you are running.
  • xlarge – Time limit: 7 days; CPU/node: 40 (fast); Memory: 1550 GB; Nodes available: 1. Special compute nodes with more CPU/RAM than the standard compute nodes, contributed by labs that need exceptional resources.  They are available for use by everyone through the "unsafe" queue.
  • unsafe – Time limit: 3 days; CPU/node: 32, 40; Memory: 64, 256, 1550 GB; Nodes available: 77. These jobs will run anywhere, but if the cluster becomes full, jobs on this queue will be killed and requeued by jobs that need the resources.  For example, a job on this queue could be scheduled to one of the "express" queue nodes.  In that case, if someone else ran a job on the "express" queue and both nodes were in use, the "unsafe" job would be killed to make room for the "express" job.

 

Note that if you submit a job to SLURM that requests more wall-time or nodes than the maximum, it will not run unless an admin intervenes on your behalf.  Jobs that require more nodes or longer wall-times can be requested by contacting [email protected].

The time limits for the queues are designed to keep the wait time for jobs at less than 12 hours for default queue jobs, and less than 2 hours for priority queue jobs.  If you are experiencing wait times that are longer than these, please let us know at [email protected].

 


 

Available Modules

Certain software applications have been installed on the cluster partitions and are available to use.  To view the current list of software, you need to enter an interactive session and then type 

module avail

This will show all the software modules available for the nodes on that partition.  

The "avail" command can also be used to search for modules, but it is case-sensitive, and only searches from the beginning of the module name.  For a more flexible search, use the "module-search" (aliased to just "ms") command.  Since it searches the entire name, it can be problematic for searching for tools like "R" (which will return every module with the letter 'r' in it).

For example:

[user@hyalite ~]$ ms fftw
FFTW/3.3.4-gompi-1.7.20
FFTW/3.3.4-gompi-2015b
FFTW/3.3.4-gompi-2016a
FFTW/3.3.4-gompi-2016b
fftw3/openmpi/gcc/64/3.3.4
numlib/FFTW/3.3.4-gompi-1.7.20
numlib/FFTW/3.3.4-gompi-2015b
numlib/FFTW/3.3.4-gompi-2016a
numlib/FFTW/3.3.4-gompi-2016b

To use the software you need to load the module.  For example, we can load a module called R/3.2.5 by typing:

module load R/3.2.5

After that, the R programming software is available in your current path and can be called and executed.  When using the module commands, you can use the tab-complete feature of the command line (partially type the name of a module and press tab to autocomplete the name if it is unambiguous).

If you are using software installed on the nodes via these modules, you will need to put the “module load” command into your SLURM script before the line where you call the application.
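
For example, a minimal sketch of a batch script fragment (my_analysis.R is just a placeholder for your own script):

module load R/3.2.5 # make R available on the compute node
Rscript my_analysis.R # then call the application as you normally would on the command line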


 

Other useful SLURM Commands

 

LIST JOBS:

$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
106 defq      slurm-jo  x12345   R   0:04      1 compute001

This command will only show the jobs that are on queues that you have permission to use.

Use options to get more information:

  • Jobs for one user: "squeue -u username" will show all jobs for a particular user (squeue -u `whoami` will show your jobs).
  • Jobs for all queues: "squeue -a" will show all jobs, regardless of the queues you are allowed to use.

 

GET JOB DETAILS:

$ scontrol show job 106
JobId=106 Name=slurm-job.sh
UserId=x12345(1001) GroupId=x12345(1001)
Priority=4294901717 Account=(null) QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
RunTime=00:00:07 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2013-01-26T12:55:02 EligibleTime=2013-01-26T12:55:02
StartTime=2013-01-26T12:55:02 EndTime=Unknown
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=defq AllocNode:Sid=atom-head1:3526
ReqNodeList=(null) ExcNodeList=(null)
NodeList=compute001
BatchHost=compute001
NumNodes=1 NumCPUs=32 CPUs/Task=1 ReqS:C:T=*:*:*
MinCPUsNode=1 MinMemoryNode=0 MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=0 Contiguous=0 Licenses=(null) Network=(null)
Command=/home/x12345/slurm-job.sh
WorkDir=/mnt/lustrefs/scratch/x12345/

KILL A JOB:

$ scancel 106
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)

Users can only kill their own jobs.

HOLD A JOB:

$ squeue
JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
139      defq   simple  x12345  PD       0:00      1 (Dependency)
138      defq   simple  x12345   R       0:16      1 compute001
$ scontrol hold 139
$ squeue
JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
139      defq   simple  x12345  PD       0:00      1 (JobHeldUser)
138      defq   simple  x12345   R       0:32      1 compute001

RELEASE A JOB:

$ scontrol release 139

$ squeue
JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
139      defq   simple  x12345  PD       0:00      1 (Dependency)
138      defq   simple  x12345   R       0:46      1 compute001

LIST PARTITIONS:

$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq* up 1-00:00:00 1 down* compute050
defq* up 1-00:00:00 1 drain compute026
defq* up 1-00:00:00 2 mix compute[007,038]
defq* up 1-00:00:00 54 alloc compute[003-006,008-025,027-037,039-049,051-060]
express up 30:00 2 idle compute[001-002]
priority up 3-00:00:00 1 down* compute050
priority up 3-00:00:00 1 drain compute026
priority up 3-00:00:00 1 mix compute038
priority up 3-00:00:00 46 alloc compute[012-025,027-037,039-049,051-060]
bigjob up 30-00:00:0 1 down* compute050
bigjob up 30-00:00:0 1 drain compute026
bigjob up 30-00:00:0 2 mix compute[007,038]
bigjob up 30-00:00:0 54 alloc compute[003-006,008-025,027-037,039-049,051-060]

LIST PARTITION INFORMATION:

$ scontrol show partition defq
 AllowGroups=ALL AllowAccounts=ALL AllowQos=ALL
AllocNodes=ALL Default=YES
DefaultTime=00:15:00 DisableRootJobs=NO GraceTime=0 Hidden=NO
MaxNodes=8 MaxTime=1-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
Nodes=compute[003-060]
Priority=1 RootOnly=NO ReqResv=NO Shared=NO PreemptMode=OFF
State=UP TotalCPUs=1856 TotalNodes=58 SelectTypeParameters=N/A
DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED

LIST NODE INFORMATION:

$ scontrol show node compute001
 NodeName=compute001 Arch=x86_64 CoresPerSocket=8
CPUAlloc=0 CPUErr=0 CPUTot=32 CPULoad=0.00 Features=(null)
Gres=(null)
NodeAddr=compute001 NodeHostName=compute001 Version=14.11
OS=Linux RealMemory=64523 AllocMem=0 Sockets=2 Boards=1
State=IDLE ThreadsPerCore=2 TmpDisk=2015 Weight=1
BootTime=2016-07-14T16:18:46 SlurmdStartTime=2016-07-14T16:21:22
CurrentWatts=0 LowestJoules=0 ConsumedJoules=0
ExtSensorsJoules=n/s ExtSensorsWatts=0 ExtSensorsTemp=n/s