
Cluster Etiquette

A cluster is a shared commons. Its performance and fairness depend on every user acting as a responsible steward of shared resources. The following principles follow directly from treating your colleagues' time as seriously as your own.


🚫 The Login Node: No Exceptions

🚫 Never run computations on the login node.

The login node enforces a hard per-user limit of 1 CPU core and 1 GB of RAM via kernel cgroups. Processes that exceed these limits are killed automatically, without warning and without saving state. More importantly, the login node is shared simultaneously by every connected user: a runaway process degrades the experience for everyone. When in doubt, use salloc.


🚫 Do Not Attempt to SSH Directly into Compute Nodes

Direct SSH access to compute nodes is not a policy request; it is technically enforced. The cluster uses FreeIPA Host-Based Access Control (HBAC) rules, which whitelist exactly which users may authenticate to which hosts. Regular user accounts are not permitted to SSH to compute nodes under any circumstances; only the Slurm daemon and system administrators are.

If you find yourself wanting to SSH into a node (to check on a running job, inspect memory usage, or debug interactively), the correct tools are:

# Attach to a running job's context and open a shell inside it
# (on recent Slurm versions you may also need srun's --overlap flag)
srun --jobid=JOB_ID --pty bash
# and run btop, htop or whatever you prefer

# Or, for a fresh interactive session on a compute node
# (salloc opens a shell inside the new allocation; --pty belongs to srun, not salloc)
salloc --ntasks=1 --cpus-per-task=4 --mem=8G --time=01:00:00


๐Ÿ“ Use $SLURM_TMPDIR for Temporary Job I/O

The cluster does not provide a shared /scratch filesystem. Instead, each job is given a private temporary directory on the local disk of the compute node, exposed via the environment variable $SLURM_TMPDIR. This directory offers fast local I/O with no network overhead, making it ideal for intermediate files, temporary datasets, and anything that does not need to outlive the job.

#!/bin/bash
#SBATCH --job-name=my_analysis
#SBATCH ...

# $SLURM_TMPDIR is set automatically by Slurm for every job
# It is fast local storage, private to this job
echo "Temporary workspace: $SLURM_TMPDIR"

# Copy input data from /home or /project into local tmp
cp /home/your_username/data/input.hdf5 "$SLURM_TMPDIR"/

# Run your code, writing outputs to local tmp
cd "$SLURM_TMPDIR" || exit 1
./my_calculator --input input.hdf5 --output result.dat

# Copy results you want to keep back to permanent storage before the job ends
cp "$SLURM_TMPDIR"/result.dat /home/your_username/results/

โš ๏ธ $SLURM_TMPDIR is destroyed when your job ends.

Slurm wipes this directory automatically at job completion, whether the job succeeded, failed, or was cancelled. Any file you want to keep must be copied to /home/user or /project before your script exits. Make this the last step in every batch script that writes intermediate files.
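One way to harden the copy-back step is a shell trap that runs when the script exits, even after an error. A minimal sketch, assuming your job writes .dat files in $SLURM_TMPDIR and that $HOME/results is where you keep outputs (both the pattern and the path are illustrative). Note that a trap cannot help if Slurm kills the job outright at the time limit, so a realistic time request still matters.

```shell
#!/bin/bash
# Sketch: copy results back even if the script fails partway.
# RESULTS_DIR and the *.dat pattern are illustrative - adapt to your layout.
RESULTS_DIR="$HOME/results"
mkdir -p "$RESULTS_DIR"

# The EXIT trap fires on normal completion and on most script failures
trap 'cp "$SLURM_TMPDIR"/*.dat "$RESULTS_DIR"/ 2>/dev/null' EXIT

cd "${SLURM_TMPDIR:-/tmp}" || exit 1
# ... your computation here, writing *.dat files ...
```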

The size of $SLURM_TMPDIR depends on the local disk capacity of the assigned node. For jobs generating very large temporary files, check with the support team on per-partition limits before designing your I/O strategy around it.


โฑ๏ธ Request Only the Resources You Will Actually Use

Over-requesting CPUs, memory, or time is one of the most common and costly forms of cluster misuse. Resources held by your job are unavailable to everyone else, including your own future jobs, since many schedulers factor in historical efficiency.

  • CPUs: If your code is single-threaded, request --cpus-per-task=1. Requesting 32 cores for a serial job removes 31 cores from the shared pool for no gain.
  • Memory: Use sacct to inspect the actual MaxRSS of past jobs and calibrate accordingly. A 20–30% margin above your typical peak is reasonable; 10× is not.
  • Time: Jobs that run over their time limit are killed by Slurm. But requesting 48 hours for a 3-hour job prevents the backfill scheduler from slotting your job into short gaps, lengthening your own queue wait. Profile first, then budget with a modest margin.
    # Review actual resource usage of your past jobs
    sacct --user=your_username \
          --starttime=$(date -d '30 days ago' +%Y-%m-%d) \
          --format=JobID,JobName,Elapsed,CPUTime,MaxRSS,ReqMem,State
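The 20–30% margin can be turned into a concrete request with quick arithmetic; a sketch assuming a measured MaxRSS peak of 6200 MB (the number is illustrative):

```shell
# Convert an observed peak of 6200 MB into a --mem request with a 25% margin
awk 'BEGIN { printf "--mem=%dM\n", 6200 * 1.25 }'
# prints: --mem=7750M
```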
    

🧪 Always Test at Small Scale First

Before submitting a large job array or a multi-day run, submit a single short test job with a small time limit and reduced problem size. Verify that it:

  • Completes without errors
  • Produces output of the expected format and magnitude
  • Uses approximately the resources you requested

Discovering a path typo, a missing module, or a segfault after one task costs you one short queue wait. Discovering it after 200 tasks have run (or failed) costs the cluster far more, and costs you the embarrassment of filing a bug report with no useful data.
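Slurm applies command-line options on top of the #SBATCH directives inside the script, so a scaled-down test run needs no edits to the batch file. A sketch (the script name and limits are illustrative):

```shell
# Command-line options override the #SBATCH directives inside the script,
# so this runs the same script with a short time limit and reduced resources
sbatch --time=00:15:00 --mem=2G my_analysis.sh
```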


🔢 Throttle Job Arrays

A job array with hundreds of tasks submitted without a concurrency cap can flood the queue and starve other users. Always use the %N modifier to limit simultaneous running tasks:

# Run at most 20 tasks at a time from a 500-task array
#SBATCH --array=1-500%20

There is no hard rule on the right cap; it depends on the array size, the partition, and the current load. As a rough guide: if your array would occupy more than roughly 20–25% of a partition's total cores on its own, reduce the concurrency.
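If load conditions change after submission, recent Slurm releases let you adjust the cap on an already-submitted array without resubmitting (ARRAY_JOB_ID is a placeholder for the array's job ID):

```shell
# Tighten the concurrency cap of a submitted array to 10 running tasks
scontrol update JobId=ARRAY_JOB_ID ArrayTaskThrottle=10
```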


๐Ÿ—‚๏ธ Keep /home and /project Tidy

/home and /project are backed-up, shared network filesystems. They are not unlimited, and heavy parallel I/O from many jobs writing simultaneously to these paths can saturate the storage network and slow down the entire cluster.

  • Use $SLURM_TMPDIR for all intermediate, temporary, and throwaway files during a job run.
  • Write final outputs (the results you actually care about) back to /home or /project at the end of your job.
  • Audit your usage periodically and remove data you no longer need. Storage quotas exist and will be enforced.
    # Find and list your largest files under /home
    du -ah /home/your_username/ | sort -rh | head -20
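A per-directory summary is often more actionable than a flat list of large files; a sketch using only standard du and sort options:

```shell
# Summarize space per top-level directory in your home, largest first
du -h -d 1 "$HOME" | sort -rh | head -n 20
```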
    

💤 Release Interactive Sessions When Done

An salloc session holds a real compute allocation for its entire duration, even if you are idle, have gone for lunch, or forgotten the terminal is open. Other users' jobs are queued behind the resources your session is holding.

  • Exit your session with exit as soon as your interactive work is complete.
  • Request realistic time limits for salloc: --time=01:00:00 for a debugging session, not --time=24:00:00.
  • If you realise you no longer need an allocation, cancel it explicitly:
    # Cancel an interactive allocation by job ID
    scancel JOB_ID
    

📣 Do Not Submit the Same Job Repeatedly Without Reading the Error

If a job fails, read the .err log before resubmitting. Submitting the same broken job five times in a row wastes queue slots and contributes to unnecessary scheduler churn. The error log almost always contains enough information to diagnose the problem โ€” check Troubleshooting if you are unsure how to interpret it.
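It helps to give the logs predictable names in the first place; %x (job name) and %j (job ID) are standard Slurm filename patterns, and the tail target below is illustrative:

```shell
#SBATCH --output=%x-%j.out
#SBATCH --error=%x-%j.err

# After a failure, read the log before resubmitting, e.g.:
# tail -n 50 my_analysis-123456.err
```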


📬 Contact HPC Admins

For account issues, software requests, quota increases, or anything not covered here, email admins@lcm.mi.infn.it. Please include your username, relevant Job ID(s), and the full text of any error messages.