Introduction to Baskerville

Advanced Research Computing (ARC) Team

Baskerville Hackathon

Baskerville Content

  1. What is Baskerville
  2. Baskerville resources
  3. How to access Baskerville
  4. How to submit jobs to Baskerville

What is Baskerville

  • Baskerville is a tier 2 HPC (High Performance Computer)
  • It is a GPU-focused system with 228 GPUs (57 nodes) in total
    • 46 nodes with A100 GPUs
    • 11 nodes with A100-80 GPUs
  • Ice Lake CPUs with 72 cores per node
  • 5.4 PB storage

Note:

Baskerville has hyperthreading enabled, which means that 1 core = 2 threads/tasks
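The figures above can be cross-checked with a little shell arithmetic. This is only a sketch: the value of 4 GPUs per node is an assumption inferred from 228 GPUs across 57 nodes, and the variable names are illustrative.

```shell
# Figures quoted on the slide above.
nodes=57
cores_per_node=72
threads_per_core=2   # hyperthreading: 1 core = 2 threads
gpus_per_node=4      # assumption: 228 GPUs / 57 nodes

total_gpus=$((nodes * gpus_per_node))
threads_per_node=$((cores_per_node * threads_per_core))

echo "Total GPUs: ${total_gpus}"
echo "Threads per node: ${threads_per_node}"
```

When sizing a job, the thread count is the number to keep in mind, since each task occupies one thread rather than one full core.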

Baskerville Resources

Baskerville has a number of resources available to you:

Baskerville Hackathon

  • You will have access to Baskerville from 24th March to 26th March
  • Reservation of 2 nodes (8 GPUs, 144 cores and 288 threads)
    • A reservation restricts those nodes to users from selected projects; jobs use it through the --reservation flag
  • Everyone is part of the project ranaaaa-hackathon
  • Other projects have more storage (5 TB) and contain the data.

Baskerville Login

  • Everyone will receive a welcome email when first added to Baskerville
  • The email points to our documentation and instructions for first-time access: https://docs.baskerville.ac.uk/logging-on/
    • The instructions and a video show both what to do and what not to do
  • To set up your one-time passcode (OTP) you need an authenticator app. Please have this open and ready to scan your OTP
  • To log in to Baskerville you will need your username, password and OTP
  • Whilst an SSH key option is available, it is not recommended during the hackathon

Baskerville Login Example

Baskerville Storage

  • Home directory /bask/homes/_initial_/_username_
    • When you first log in, this is where you start
    • Only 20 GB of space - this is not where you run jobs
    • Should be used for local and cached files
  • Project directory /bask/projects/_initial_/_projectname_
    • Has 5 TB of space - this is where you run jobs
    • Contains your data, job scripts and results
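As a rough illustration of the two path patterns, the sketch below builds them from a name. It assumes that _initial_ is the first character of the username or project name, and it uses the /bask/projects form that appears in the example paths later in this deck. The function names and the username jbloggs are hypothetical.

```shell
# Build the home directory path for a username.
# Assumption: _initial_ is the first character of the name.
bask_home_dir() {
    initial=$(printf '%s' "$1" | cut -c1)
    printf '/bask/homes/%s/%s\n' "$initial" "$1"
}

# Build the project directory path for a project name (same assumption).
bask_project_dir() {
    initial=$(printf '%s' "$1" | cut -c1)
    printf '/bask/projects/%s/%s\n' "$initial" "$1"
}

bask_home_dir "jbloggs"                # hypothetical username
bask_project_dir "ranaaaa-hackathon"   # the hackathon project
```

On Baskerville itself, cd into the project path before submitting jobs so that job scripts and results land in project storage rather than in your home directory.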

Baskerville User Details

  • https://docs.baskerville.ac.uk/storage/
  • You can inspect your details either in the admin site or in the terminal:
    • my_quota - This shows how much of your home directory is utilised
    • my_baskerville - This gives your project account and QoS details

Baskerville User Details Example

Baskerville jobs

Login and Compute nodes

  • There are 2 types of nodes on Baskerville
  • A login node is where you start. Baskerville has 3: bask-pg-login01, bask-pg-login02 and bask-pg-login03
    • Accessible to all Baskerville users
    • Intended for simple tasks like managing files
    • Does not have access to GPUs
    • A submitted job goes from a login node to a compute node
  • A compute node is a node that does the work. Compute nodes do not have "login" in their name
    • When you submit jobs, this is where they go
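The naming rule above gives a quick way to check which kind of node a shell is running on. The helper below is a minimal sketch with a hypothetical name (node_type); it relies only on the rule that login nodes have "login" in their hostname, so its answer is only meaningful on Baskerville itself.

```shell
# Classify a hostname using the naming rule from the slide:
# login nodes contain "login", compute nodes do not.
node_type() {
    case "$1" in
        *login*) echo "login" ;;
        *)       echo "compute" ;;
    esac
}

# uname -n prints the hostname of the current machine.
echo "$(uname -n) looks like a $(node_type "$(uname -n)") node"
```

If this reports a login node, avoid starting heavy work there and submit a job instead.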

Job script

  • Job account
  • Job time
  • Job compute resources (CPUs, GPUs etc.)
  • Quality of Service (QoS): for everyone in this hackathon this is bham
  • Other options are also available: https://slurm.schedmd.com/sbatch.html

Job script example

#!/bin/bash
#SBATCH --qos=bham
#SBATCH --account=ranaaaa-hackathon
#SBATCH --time=1:0:0
#SBATCH --ntasks=4
#SBATCH --gres=gpu:1
#SBATCH --reservation=ranaaaa-hackathon

module purge
module load baskerville

source activate_env

# your commands

Job script example

#!/bin/bash
# The line above runs the job using Bash (the GNU Bourne Again Shell)

# Use the bham job queue (QoS)
#SBATCH --qos=bham
# Hackathon project account
#SBATCH --account=ranaaaa-hackathon
# Job to run for 1 hour
#SBATCH --time=1:0:0
# Job requests 4 tasks
#SBATCH --ntasks=4
# Request 1 GPU
#SBATCH --gres=gpu:1
# Use the reserved resources
#SBATCH --reservation=ranaaaa-hackathon

# Purge the environment
module purge
# Load the default Baskerville modules
module load baskerville

# Activate the conda environment
source activate_env

# your commands (the commands you want to run)
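Before submitting an edited copy of the job script, it can help to confirm that the directives everyone needs are still present. The check below is a sketch, not part of the hackathon tooling: check_job_script is a hypothetical helper, and it only recognises the long option form (--qos, --account, --time), not Slurm's short options.

```shell
# Report any of the required #SBATCH directives missing from a job script.
# Returns 0 if all are present, 1 otherwise.
check_job_script() {
    script=$1
    missing=0
    for opt in qos account time; do
        # Match lines such as "#SBATCH --time=1:0:0" or "#SBATCH --time 1:0:0".
        if ! grep -Eq "^#SBATCH[[:space:]]+--${opt}[= ]" "$script"; then
            echo "missing: --${opt}"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (in the directory that holds your script):
#   check_job_script job.sh && sbatch job.sh
```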

Submitting a job

When you submit a job you will see:

sbatch job.sh
Submitted batch job 992337

The number (in this case 992337) is your job ID, a unique number that identifies the job
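If you want to reuse the job ID in later commands, it can be pulled out of that message. The helper below is a small sketch with a hypothetical name (extract_job_id); it is demonstrated on the sample message rather than on a real submission.

```shell
# Read sbatch's confirmation message on stdin and print only the job ID.
extract_job_id() {
    awk '/^Submitted batch job/ { print $4 }'
}

# Demonstration using the message shown above.
echo "Submitted batch job 992337" | extract_job_id
```

On Baskerville this could be used as job_id=$(sbatch job.sh | extract_job_id). Slurm also provides sbatch --parsable, which prints the job ID without the surrounding text.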

SLURM

  • SLURM = Simple Linux Utility for Resource Management https://docs.baskerville.ac.uk/jobs/#slurm-jobs
    • SLURM is our job scheduler: it decides the priority of jobs and when they start
    • Since the hackathon has a reservation, jobs submitted to it will run outside the general queue
    • Batch jobs are submitted with the sbatch command and you can monitor them with squeue
    • If your job is in the queue you can use squeue --start to see an estimate of when it will start
    • If you want to cancel a job you can use scancel XXXX, where XXXX is the job ID
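One common pattern built on squeue is to wait until a job has left the queue before looking at its output. The function below is a sketch under that assumption: wait_for_job is a hypothetical helper, the 30-second default interval is arbitrary, and it must run where the squeue command is available (a Baskerville login node).

```shell
# Poll squeue until the given job is no longer listed.
#   $1 = job ID, $2 = seconds between checks (default 30)
wait_for_job() {
    job_id=$1
    interval=${2:-30}
    # --noheader suppresses the column titles, so empty output means
    # squeue no longer lists the job.
    while [ -n "$(squeue --noheader --jobs "$job_id" 2>/dev/null)" ]; do
        sleep "$interval"
    done
    echo "Job ${job_id} is no longer listed by squeue"
}

# Usage on Baskerville:
#   wait_for_job 992337
```

Keep the polling interval generous, because frequent squeue calls add load to the scheduler for everyone.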

Job results file

  • A file slurm-XXXXX.out is created where XXXXX is the job id
  • This file contains the standard output of whatever you ran
  • The example below is the output of checking that PyTorch was built with CUDA support, using the command python -c "import torch; print(torch.backends.cuda.is_built())"
Miniforge3/24.1.2-0
True
  • This shows that PyTorch was built with CUDA support; torch.cuda.is_available() additionally confirms that a GPU is visible to the job
  • A file slurm-XXXXX.stats is created where XXXXX is the job ID
  • It contains information on the resources used (time, memory, number of CPUs and GPUs)
  • It also contains the exit code so you can see whether the job ended correctly
+--------------------------------------------------------------------------+
| Job on the Baskerville cluster:
| Starting at Fri Mar 21 15:00:30 2025 for user(123456)
| Identity jobid 12345 jobname job.sh
| Running against project ace-project and in partition baskerville-shared
| Requested cpu=4,mem=12G,node=1,billing=4,gres/gpu=1 - 00:10:00 walltime 
| Allocated cpu=4,mem=12G,node=1,billing=4,gres/gpu=1
| Assigned to nodes bask-pg0308u37a
| Command /bask/projects/r/ranaaaa-hackathon/job.sh  
| WorkDir /bask/projects/r/ranaaaa-hackathon    
+--------------------------------------------------------------------------+
+--------------------------------------------------------------------------+
| Finished at Fri Mar 21 15:00:35 2025 for user(123456) on the Baskerville Cluster
| Required (00:01.942 cputime, 3580K memory used) - 00:00:05 walltime
| JobState COMPLETING - Reason None
| Exitcode 0:0
+--------------------------------------------------------------------------+
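The exit code can be read from the stats file without opening it. The helper below is a sketch that assumes the layout shown above, where the line starts with "| Exitcode"; job_exit_code is a hypothetical name. Slurm reports the value as exit_code:signal, so 0:0 means the job exited normally with status 0.

```shell
# Print the exit code recorded in a .stats file (or on stdin).
job_exit_code() {
    awk '$1 == "|" && $2 == "Exitcode" { print $3 }' "$@"
}

# Demonstration on two lines taken from the example above.
printf '| JobState COMPLETING - Reason None\n| Exitcode 0:0\n' | job_exit_code
```

With a real job this would be called as job_exit_code slurm-XXXXX.stats, where XXXXX is the job ID.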

Baskerville Conda Environment

  • In the project location /bask/projects/r/ranaaaa-hackathon there are the following:
    • Conda environment conda_env - this contains all the software you will need for this hackathon
    • create_conda_env.sh - This file has 2 purposes: to create conda_env and to point all users to the already existing conda environment
    • job.sh - An example job script that can be copied and edited by all
    • activate_env - loads modules and activates the conda environment
    • interactive.sh - starts an interactive job placing users directly onto a compute node

Baskerville interactive jobs

  • An interactive job follows the same general format as a batch job script - https://docs.baskerville.ac.uk/interactive-jobs/
  • It uses srun instead of sbatch and does not need #SBATCH in front of each option
  • You will be placed on a compute node. The first thing to do is source activate_env to get access to all the software required for this hackathon
srun --export=USER,HOME,PATH,TERM,DISPLAY --account=ranaaaa-hackathon --qos=bham --nodes=1 --ntasks=4 --time=0-01:0:0 --reservation=ranaaaa-hackathon --pty /bin/bash

Baskerville Portal

  • Baskerville Portal provides web-based access to a range of Baskerville services and some GUI applications https://docs.baskerville.ac.uk/portal/portal/, such as JupyterLab:
    • JupyterLab will submit jobs on the reservation for ranaaaa-hackathon
    • You can access the conda environment in Jupyter https://docs.baskerville.ac.uk/portal/jupyter-conda/ by ticking “Show Conda Environments”
    • Make sure you have run create_conda_env.sh which adds the environment to your ~/.conda/environments.txt
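If the environment does not appear in JupyterLab, one thing to check is whether it is listed in ~/.conda/environments.txt, which holds one environment path per line. The sketch below does that check. The function name conda_env_registered is hypothetical, and the environment path is an assumption (the project location followed by conda_env); the path that create_conda_env.sh actually writes may differ.

```shell
# Return 0 if the given environment path is listed in the environments file.
#   $1 = environment path, $2 = file to search (default ~/.conda/environments.txt)
conda_env_registered() {
    env_path=$1
    env_file=${2:-$HOME/.conda/environments.txt}
    # -x matches whole lines only, -F treats the path as a literal string.
    [ -f "$env_file" ] && grep -qxF "$env_path" "$env_file"
}

# Assumed location of the hackathon environment.
if conda_env_registered /bask/projects/r/ranaaaa-hackathon/conda_env; then
    echo "conda_env is registered"
else
    echo "conda_env not found: run create_conda_env.sh first"
fi
```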

Warning:

Each JupyterLab session will use a quarter of a node (1 GPU and 36 threads) so be considerate of others
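To see why this matters, the arithmetic below works out how many JupyterLab sessions fit in the hackathon reservation. It uses only figures quoted in this deck (2 reserved nodes with 8 GPUs, 72 cores per node, 2 threads per core); the variable names are illustrative.

```shell
reserved_nodes=2
reserved_gpus=8
threads_per_node=$((72 * 2))   # 72 cores, 2 threads per core
sessions_per_node=4            # each session uses a quarter of a node

threads_per_session=$((threads_per_node / sessions_per_node))
gpus_per_session=$((reserved_gpus / reserved_nodes / sessions_per_node))
max_sessions=$((reserved_nodes * sessions_per_node))

echo "Each session: ${gpus_per_session} GPU, ${threads_per_session} threads"
echo "Sessions that fit in the reservation: ${max_sessions}"
```

The reservation therefore fills up after 8 concurrent sessions, leaving nothing for batch or interactive jobs, so close sessions you are not using.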

Baskerville Tips

Thank You