Troubleshooting & FAQ

My job is stuck in PENDING — why?

Run squeue -j JOB_ID and check the NODELIST(REASON) column. Common reasons:

| Reason | Meaning |
| --- | --- |
| Resources | The job is waiting for the requested resources to become free and will start once they are |
| Priority | Higher-priority jobs are ahead of yours in the queue |
| QOSMaxCpuPerUserLimit | You have reached the per-user CPU limit set by your QOS |
| ReqNodeNotAvail | A node you specifically requested is down, drained, or reserved |
| InvalidAccount | The account on your job is not a valid Slurm account; contact support |
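
When checking many jobs, it can help to print only the reason and attach a short explanation to it. The sketch below assumes the squeue format field %r (the pending reason); explain_reason is a hypothetical helper name, and its messages are short paraphrases for the reasons listed above.

```shell
# explain_reason: hypothetical helper that maps a Slurm pending reason
# to a one-line explanation.
explain_reason() {
  case "$1" in
    Resources)             echo "Waiting for resources to become free" ;;
    Priority)              echo "Higher-priority jobs are ahead in the queue" ;;
    QOSMaxCpuPerUserLimit) echo "Per-user CPU limit of the QOS reached" ;;
    # squeue may append a node list to this reason, hence the wildcard
    ReqNodeNotAvail*)      echo "A requested node is down, drained, or reserved" ;;
    InvalidAccount)        echo "Account not valid in Slurm; contact support" ;;
    *)                     echo "Not in the list; see the squeue man page" ;;
  esac
}

# Usage on the cluster (JOB_ID is a placeholder):
#   explain_reason "$(squeue -h -j JOB_ID -o %r)"
```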

My job failed immediately — where do I look?

# Check the job's exit state
sacct -j JOB_ID --format=JobID,State,ExitCode

# Read the error log (you specified this with --error in your script)
cat logs/job_JOB_ID.err

Common causes: a missing module load in the script, incorrect file paths, or insufficient memory (the job state shows OUT_OF_MEMORY).
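
The ExitCode field printed by sacct has the form "exit status:signal", which is easy to misread. A minimal sketch for decoding it is shown here; describe_exit is a hypothetical helper name, not part of Slurm.

```shell
# describe_exit: hypothetical helper that interprets the sacct ExitCode
# field, which has the form "<exit status>:<signal>".
describe_exit() {
  status="${1%%:*}"   # part before the colon
  signal="${1##*:}"   # part after the colon
  if [ "$signal" -ne 0 ]; then
    echo "terminated by signal $signal"
  elif [ "$status" -ne 0 ]; then
    echo "exited with status $status"
  else
    echo "completed successfully"
  fi
}

# Usage on the cluster (JOB_ID is a placeholder):
#   describe_exit "$(sacct -n -X -j JOB_ID --format=ExitCode | tr -d ' ')"
```

An exit status of 127 is the shell's convention for "command not found", which often points to a missing module load.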

How do I know how much memory my job actually used?

sacct -j JOB_ID --format=JobID,MaxRSS,AveRSS,Elapsed

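MaxRSS is the peak resident memory of the largest task in each job step, so read it from the step rows (for example JOB_ID.batch); the top-level job row is often blank. Use it to set --mem for future jobs.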
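
One way to turn a MaxRSS reading into a --mem request is sketched below. It assumes MaxRSS carries a K, M, G, or T suffix as sacct usually prints it, and it adds 20% headroom, which is an arbitrary choice rather than a site policy; suggest_mem is a hypothetical helper name.

```shell
# suggest_mem: hypothetical helper that converts a MaxRSS value such as
# "3145728K" or "2.50G" into a --mem value in MiB with 20% headroom.
suggest_mem() {
  awk -v v="$1" 'BEGIN {
    unit = substr(v, length(v), 1)
    num  = substr(v, 1, length(v) - 1) + 0
    if      (unit == "K") mib = num / 1024
    else if (unit == "M") mib = num
    else if (unit == "G") mib = num * 1024
    else if (unit == "T") mib = num * 1024 * 1024
    else                  mib = (v + 0) / 1048576  # no suffix: assume bytes
    want = mib * 1.2                    # assumed 20% safety margin
    r = int(want); if (r < want) r++    # round up to a whole MiB
    printf "%dM\n", r
  }'
}

# Usage: sbatch --mem="$(suggest_mem 3145728K)" job.sh
```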

I need to run a Jupyter notebook — how?

Launch Jupyter on a compute node with srun, then forward its port to your machine over SSH:

# Step 1: On the cluster login node, start an interactive session
srun --ntasks=1 --cpus-per-task=2 --mem=8G --time=04:00:00 --pty bash

# Step 2: On the compute node (note the hostname, e.g., node07)
source ~/venvs/myenv/bin/activate
jupyter lab --no-browser --port=8888 --ip=0.0.0.0

# Step 3: On your local machine, open an SSH tunnel (replace node07 with the hostname from Step 2)
ssh -N -L 8888:node07:8888 your_username@galileo.mi.infn.it

# Step 4: Open http://localhost:8888 in your browser
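
If port 8888 is already in use on the compute node, Jupyter falls back to another port and prints it in its startup URL, so the tunnel has to use that port instead. The sketch below only prints the matching tunnel command; tunnel_cmd is a hypothetical helper name, and the login host is the one used in the steps above.

```shell
# tunnel_cmd: hypothetical helper that prints the SSH tunnel command for
# a given compute node, port, and username (it does not run ssh itself).
tunnel_cmd() {
  node="$1"
  port="${2:-8888}"            # default to the port used in Step 2
  user="${3:-your_username}"   # placeholder username
  echo "ssh -N -L ${port}:${node}:${port} ${user}@galileo.mi.infn.it"
}

# Example: tunnel_cmd node07 8890 your_username
```

With a non-default port, open http://localhost:PORT in the browser, using the same port number.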

I accidentally ran something heavy on the login node — what do I do?

Kill the process immediately:

# List only your own processes (the PID is in the first column)
ps -u your_username -o pid,pcpu,pmem,etime,cmd

# Kill it by PID (-9 forces termination and gives the process no chance to clean up)
kill -9 PID

Then submit the work as a proper Slurm job. If you are unsure whether a task is "too heavy" for the login node, the answer is almost always: use salloc to be safe.
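
As a starting point for moving such work off the login node, the following sketch prints a minimal batch script that wraps a command. make_job_script is a hypothetical helper name, and the resource values are placeholders copied from the interactive example in this FAQ, not recommendations. It assumes a logs/ directory already exists, since Slurm does not create it.

```shell
# make_job_script: hypothetical helper that prints a minimal batch script
# wrapping the given command. Resource values are placeholders.
make_job_script() {
  cat <<EOF
#!/bin/bash
#SBATCH --job-name=heavy_task
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G
#SBATCH --time=04:00:00
#SBATCH --output=logs/job_%j.out
#SBATCH --error=logs/job_%j.err

$*
EOF
}

# Usage: make_job_script python analysis.py > job.sh && sbatch job.sh
```

In the --output and --error patterns, %j is replaced by the job ID.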


📬 Need help?

For account issues, software requests, or anything not covered here, contact the HPC support team at admins@lcm.mi.infn.it. Please include your username, job ID(s), and the relevant error output in your message.