# What type of compute nodes?
qhost -F arch | tail -n +4 | xargs -l2 | grep -v ^sge |
  awk '{print $12,$3}' | awk -F = '{print $2}' | sort | uniq -c |
  awk 'BEGIN {print "CPU-type\t\t# nodes\t\t#cores/node\t# tot. cores"}
       {SUM_NODES += $1; SUM_CORES += $1*$3
        printf "%-16s %8d\t %8d\t\t %8d\n", $2, $1, $3, $1*$3}
       END {print "TOTALS\t\t\t" SUM_NODES "\t\t-\t\t\t" SUM_CORES}'
qhost -F arch | tail -n +4 | xargs -l2 | grep -v ^sge |
  awk '{print $12,$3,$8}' | awk -F = '{print $2}' | sort | uniq -c |
  awk 'BEGIN {print "CPU-type\t\t# nodes\t\t#cores/node\t# tot. cores\t\tmemory/core (GB)\ttot memory (GB)"}
       {SUM_NODES += $1; SUM_CORES += $1*$3; SUM_MEM += $4
        printf "%-16s %8d\t %8d\t\t %8d\t\t %.3f\t\t\t %.3f\n", $2, $1, $3, $1*$3, $4/$3, $4}
       END {print "TOTALS\t\t\t" SUM_NODES "\t\t-\t\t\t" SUM_CORES "\t\t -\t\t\t" SUM_MEM}'
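The one-liners above pack the whole computation into a pipeline. Their core counting logic (group nodes by CPU type, multiply node count by cores per node) can be sketched on canned data; the hostnames and CPU types below are made up for illustration:

```shell
# Count nodes and total cores per CPU type; input columns: host, cpu-type, cores
# (made-up stand-in for cleaned-up `qhost` output)
printf '%s\n' \
  'n1 intel-gold-6240 36' \
  'n2 intel-gold-6240 36' \
  'n3 amd-epyc-7642 48' |
awk '{nodes[$2]++; cores[$2]=$3}
     END {for (t in nodes)
            printf "%s %d nodes %d cores/node %d total\n", t, nodes[t], cores[t], nodes[t]*cores[t]}'
```

The real pipelines do the same thing, except they first reshape the `qhost -F arch` output into this host/type/cores form with `tail`, `xargs`, and the first two `awk` calls.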
In this presentation we assume that:

- the `$` character indicates the unix (or terminal) prompt
- if you will be running this presentation as a jupyter notebook, you can follow the steps listed further below

If you use a terminal and SSH to connect to the Hoffman2 Cluster:

- open a terminal on your local computer and SSH into the Hoffman2 Cluster with the command (substitute joebruin w/ your Hoffman2 user name):

ssh joebruin@hoffman2.idre.ucla.edu

- when applicable, cut and paste the commands from the slides, omitting the `$` character, which is included to indicate the unix (or terminal) prompt
A summary of the commands is also available as a text file in:
/u/project/systems/PUBLIC_SHARED/dauria/F2023-INTRO-TO-H2/running-jobs.txt
If you use a remote desktop (NoMachine or X2Go) to connect to the Hoffman2 Cluster:
This presentation is a jupyter notebook; if you so choose, you can run it by following these steps:
download the python script h2jupy with the command:
$ curl -O https://raw.githubusercontent.com/rdauria/jupyter-notebook/main/h2jupynb
or:
$ wget https://raw.githubusercontent.com/rdauria/jupyter-notebook/main/h2jupynb
run the script; for example, if your Hoffman2 Cluster account is joebruin (substitute joebruin w/ your user name):
$ python h2jupynb -u joebruin -t 2 -m 5
or:
$ python3 h2jupynb -u joebruin -t 2 -m 5
from the Jupyter Notebook homepage click on the New button and select Terminal, then issue in the terminal (omitting the $ sign):
$ cp /u/project/systems/PUBLIC_SHARED/dauria/F2023-INTRO-TO-H2/H2HH-jobs.ipynb ./
navigate back to the Jupyter Notebook homepage and search for and double click on:
H2HH-Running-jobs.ipynb
- highp refers to the use of group-owned compute nodes
- shared refers to the use of temporarily unused group-owned compute nodes
- campus refers to compute nodes owned by OARC/IDRE and made available to the UCLA community

Open a terminal on the Hoffman2 Cluster and issue:
$ myresources
if the first line of your output contains:
User joebruin is in the following resource group(s): campus
you do NOT have access to group-owned compute nodes and can only run for up to 24 hours on nodes owned by OARC/IDRE
if the first line of your output contains:
User joebruin is in the following resource group(s): gobruins evebruin
you have access to the nodes purchased by the groups gobruins and evebruin, and you can run for up to 24 hours on shared queues and for up to 14 days when requesting to run on owned resources (highp mode)
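The way to read that first line can be sketched in shell; the sample `myresources` line below is hypothetical, with made-up group names:

```shell
# Hypothetical first line of `myresources` output:
line="User joebruin is in the following resource group(s): gobruins evebruin"
groups=${line#*: }   # keep everything after the colon
if [ "$groups" = "campus" ]; then
  echo "campus only: jobs limited to 24 hours, no highp"
else
  echo "group-owned (highp) access via: $groups"
fi
```

If the only group listed is `campus`, the 24-hour limit applies; any other group name indicates purchased nodes you can target with `-l highp`.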
# Do I have access to highp resources?
# if you are running this presentation as a jupyter notebook you can test your resources by running this cell:
myresources -u rdtest
To find out, paste in a terminal connected to the cluster the following command (omitting the $ character indicative of the unix prompt):
$ myresources
what do you see?
Any work that will use substantial computational resources should be run on compute nodes and not on the login nodes.
To get an interactive session on one core of a compute node, from a terminal issue the following command (omitting the $ character indicative of the unix prompt):
$ qrsh
What happens?
(To terminate your interactive session, after the prompt returns, type: Control + d or logout)
Customizing your interactive session. To request:
a specific runtime of, for example, 12 hours, use:
$ qrsh -l h_rt=12:00:00
a specific amount of memory, for example 4GB, use:
$ qrsh -l h_data=4G
an entire node in exclusive mode (i.e., all of its cores and memory):
$ qrsh -l exclusive
a session on group-owned nodes (check first if you have access with the command myresources):
$ qrsh -l highp
access to a GPU card:
$ qrsh -l gpu,cuda=1
Customizing your interactive session. To request multiple computing cores:
from the same node (server); for example, to request 8 cores:
$ qrsh -pe shared 8
across multiple nodes (servers); for example, to request 42 cores:
$ qrsh -pe dc* 42
See also: https://www.hoffman2.idre.ucla.edu/Using-H2/Computing/Computing.html#requesting-multiple-cores
Putting it all together, a few examples:
To request an interactive session for 1 hour with 4GB per core and 6 cores on the same node:
$ qrsh -l h_rt=1:00:00,h_data=4G -pe shared 6
To request an interactive session for 2 hours with 3GB per core and 48 cores across any node:
$ qrsh -l h_rt=2:00:00,h_data=3G -pe dc* 48
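Keep in mind that `h_data` is requested per core, so the total memory the scheduler reserves is `h_data` times the number of cores. A quick sanity check for the first example above (4G per core, 6 shared cores):

```shell
# h_data is per core: 4 GB x 6 shared cores = 24 GB reserved on one node
h_data_gb=4
cores=6
echo "total memory reserved: $(( h_data_gb * cores )) GB"   # prints: total memory reserved: 24 GB
```

Requesting more total memory than any single node offers (with `-pe shared`) will leave the job queued forever, so it pays to do this arithmetic before submitting.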
https://www.hoffman2.idre.ucla.edu/Using-H2/Computing/Computing.html#gpu-access
GPU cards available to all Hoffman2 users:
| GPU type | Compute capability | No. of CUDA cores | Global memory size |
|---|---|---|---|
| A100 | 8.0 | 6912 | 80 GB |
| V100 | 7.0 | 5120 | 32 GB |
| RTX2080Ti | 7.5 | 4352 | 11 GB |
| P4 | 6.1 | 2560 | 8 GB |
https://www.hoffman2.idre.ucla.edu/Using-H2/Computing/Computing.html#gpu-access
Scheduler options to request specific GPU cards:
| GPU type | scheduler options |
|---|---|
| A100 | -l gpu,A100,cuda=1 |
| V100 | -l gpu,V100,cuda=1 |
| RTX2080Ti | -l gpu,RTX2080Ti,cuda=1 |
| P4 | -l gpu,P4,cuda=1 |
E.g. to request a session on a specific GPU card issue at the command prompt:
$ qrsh -l gpu,P4,cuda=1,h_rt=3:00:00
NOTE: GPU cards are a hot commodity and you may need to wait for a while!
To see all the CUDA GPU nodes (you may not have access to all) and their running jobs, issue at the command line:
$ qhost -l cuda.0.name=* -q -j
qhost -l cuda.0.name=* -q -j
Refer to: https://www.hoffman2.idre.ucla.edu/Using-H2/Software/Software.html
To see what applications are available in the current hierarchy, at a terminal connected to Hoffman2 issue the command:
$ module av # press enter (or space) to scroll down; press q to exit the view
To look for a specific software, for example R, issue the command:
$ modules_lookup -m R
## Most centrally installed apps are available via `modulefiles`
## (if you are running this presentation as a jupyter notebook execute this cell):
module av --no-pager
## Most centrally installed apps are available via `modulefiles`; to look for a specific software use `modules_lookup`
## (if you are running this presentation as a jupyter notebook execute this cell):
modules_lookup
## To look for a specific application, say R
## (if you are running this presentation as a jupyter notebook execute this cell
## or paste the command in your terminal):
modules_lookup -m R
## Load an application in your environment
## (if you are running this presentation as a jupyter notebook execute this cell
## or paste the command in your terminal):
which R
## Load an application in your environment - continued:
## (if you are running this presentation as a jupyter notebook execute this cell
## or paste the command in your terminal):
which R
module load gcc/10.2.0; module load R/4.3.0
which R
## Submitting non-interactive (batch) jobs with `qsub`
# create a time-stamped directory, cd into it, and copy in the submission script:
# /u/local/apps/submit_scripts/submit_job.sh
timestamp=`date "+%F"`
mkdir $HOME/H2HH_$timestamp; cd $HOME/H2HH_$timestamp; pwd
if [ ! -f "submit_job.sh" ]; then
cp /u/local/apps/submit_scripts/submit_job.sh ./submit_job.sh
else
echo "File: submit_job.sh already present";
fi
# check that the submission script has been copied in the current directory:
ls -l submit_job.sh
# now submit the job:
qsub submit_job.sh
# is my job running?
myjobs
# save the job ID number into the variable $JOB_ID for later use:
JOB_ID=`myjobs | grep submit_job | awk '{print $1}'`
# echo the JOB_ID:
echo "JOB_ID=$JOB_ID"
Many jobs are constantly running on the cluster... how many?
# first four jobs queuing (status "p", pending):
qstat -s p | head -n 6
# tot. no. of jobs currently pending
qstat -s p | grep qw | wc -l
# let's count the total number of compute cores requested, using some handy command-line expressions:
count=0; qstat -s p | grep qw | awk -v count=$count '{count=count+$8} END {print "Total no. of cores requested: "count}'
# first four jobs running (status "r"):
qstat -s r | head -n 6
# tot. no. of jobs running
qstat -s r | grep " r " | wc -l
# let's count the total number of compute cores used by currently running jobs, using some handy command-line expressions:
count=0; qstat -s r | grep @ | awk -v count=$count '{count=count+$9} END {print "Total no. of cores in use: "count}'
# let's take a look at the submission script:
cat submit_job.sh
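If you want to write a submission script from scratch, a minimal sketch with typical UGE `#$` directives looks like the following; the resource values are illustrative, and the actual `submit_job.sh` shipped on the cluster may differ:

```shell
#!/bin/bash
#$ -cwd                        # run from the directory the job was submitted from
#$ -o joblog.$JOB_ID           # write scheduler output to joblog.<job number>
#$ -j y                        # merge stderr into the joblog
#$ -l h_rt=1:00:00,h_data=1G   # illustrative: 1 hour runtime, 1 GB per core
# load any modules the job needs here, e.g.: module load gcc/10.2.0
echo "hello from $(hostname)"
```

The `#$` lines are comments to the shell but are read by the scheduler at submission time, so the same file both declares the resource request and runs the work.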
# let's take a look at the joblog file:
cat joblog.${JOB_ID}
Under: https://www.hoffman2.idre.ucla.edu/Using-H2/Software/Software.html
Look for a specific software and navigate to the Batch use tab:
use, for example, the nano editor:
$ nano stata_submit.sh
paste the script into the editor, edit as needed, then save and exit (Control + x)
submit the job w/:
$ qsub stata_submit.sh
# or look in:
ls /u/local/apps/submit_scripts
# Submit R jobs with R_job_submitter.sh:
# create temporary directory in your $SCRATCH and change directory to it:
if [ ! -d $SCRATCH/R_tests ]; then mkdir $SCRATCH/R_tests; fi; cd $SCRATCH/R_tests
# copy the R file R-benchmark-25.R:
if [ ! -f R-benchmark-25.R ]; then cp /u/local/apps/submit_scripts/R/R-benchmark-25.R ./;fi
# submit the R script R-benchmark-25.R to the queues using R_job_submitter.sh:
/u/local/apps/submit_scripts/R_job_submitter.sh -n R-benchmark-25.R -m 1 -t 1 -s 4 -v 4.0.2 -nts
JOB_ID2=`myjobs | grep R-benchmar | awk '{print $1}'`
# echo JOB_ID:
echo "JOB_ID=$JOB_ID2"
# check the submission status of the job(s):
myjobs
# check if output has been generated:
ls -ltr
# check the submission script generated by `/u/local/apps/submit_scripts/R_job_submitter.sh`:
cat R-benchmark-25.cmd
# let's check the joblog file (one of the last two files in the list above):
cat R-benchmark-25.joblog.$JOB_ID2
Job R-benchmark-25, ID no. 735109 started on: n1020
Job R-benchmark-25, ID no. 735109 started at: Wed Nov 22 11:55:23 PST 2023
Loading R/4.0.2
  Loading requirement: intel/.2019.2 curl/8.4.0
Currently Loaded Modulefiles:
 1) intel/.2019.2 <aL>   2) curl/8.4.0 <aL>   3) R/4.0.2
Key: <module-tag>  <aL>=auto-loaded
R CMD BATCH --no-save --no-restore R-benchmark-25.R R-benchmark-25.out.735109
real 28.07
user 32.88
sys 2.31
Job R-benchmark-25, ID no. 735109 finished at: Wed Nov 22 11:55:51 PST 2023
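Handy bits such as the elapsed time can be pulled out of a joblog with a short awk filter; a sketch using a canned timing excerpt (the three sample lines mimic the `time` output embedded in the joblog):

```shell
# Extract the wall-clock ("real") time from joblog-style timing lines
# (the three lines below are a canned sample):
printf 'real 28.07\nuser 32.88\nsys 2.31\n' |
  awk '$1 == "real" {print "elapsed: " $2 " s"}'
```

On a real joblog you would replace the `printf` with `cat R-benchmark-25.joblog.$JOB_ID2`.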
# let's check the output file:
cat R-benchmark-25.out.$JOB_ID2