Case Study: Running EERAD3 on the Cluster¶
This guide walks through a complete, realistic workflow for running the EERAD3 parton-level Monte Carlo generator on the cluster, from first checkout to publication-quality distributions. The physics target is the thrust (τ = 1 − T) and C-parameter distributions in e⁺e⁻ → hadrons via γ*/Z → qq̄ at NNLO QCD — a textbook application for αs determinations and event-shape phenomenology.
Every step follows the cluster's golden rules: no compiling on the login node, no running on the login node, everything through Slurm.
Table of Contents¶
- Overview and Strategy
- Step 1 — Get the Code
- Step 2 — Compile EERAD3 on a Compute Node
- Step 3 — Understand the Run Card
- Step 4 — Start Small: a Single LO Test Job
- Step 5 — Inspect the Output
- Step 6 — Profile Resource Usage
- Step 7 — Design the Full NNLO Campaign
- Step 8 — Submit the Full Campaign as Job Arrays
- Step 9 — Post-Processing with eerad3hist
- Step 10 — Scale Up
- Directory Layout Reference
- Troubleshooting
Overview and Strategy¶
Why does NNLO require six separate channels?¶
EERAD3 uses antenna subtraction to handle infrared singularities. At NNLO, the full cross section is assembled from six independent perturbative contributions — referred to as channels — which are infrared-singular individually but finite when combined:
| Channel | Content | Order |
|---|---|---|
| LO | Born-level, three partons | αs⁰ (relative) |
| V | One-loop virtual | NLO |
| R | Real emission, four partons | NLO |
| VV | Two-loop double-virtual | NNLO |
| RV | One-loop real-virtual | NNLO |
| RR | Double-real, five partons | NNLO |
Each channel is a separate EERAD3 run. The eerad3hist tool combines them
into physical distributions afterward.
The recommended workflow¶
Clone → Compile → Test (LO, few shots) → Profile → Full campaign (6 channels × N seeds)
→ merge (per channel) → combine → makedist → plots
Start small. Profile. Then scale. The Vegas integrator inside EERAD3 requires a warmup phase to learn the phase space. Running too few shots in warmup gives a poorly adapted grid; running too many wastes time. This guide teaches you how to calibrate both.
Step 1 — Get the Code¶
Do this on the login node — cloning a repository is exactly the kind of lightweight task that belongs there.
# Navigate to your home directory (or /project if your group has one)
cd
# Clone the EERAD3 release repository
git clone https://gitlab.com/eerad-team/releases.git eerad3
cd eerad3
Take a moment to look at the top-level structure before doing anything else. You will see:
bin/ ← executables are placed here after compilation
examples/ ← example run cards for all six available processes
src/
core/ ← main program, phase space, antenna functions
analyses/ ← default and custom analyses (Fortran 90 modules)
Zqq/ ← matrix elements for γ*/Z → qq̄
Hbb/ ← matrix elements for H → bb̄
Hgg/ ← matrix elements for H → gg (HTL)
Makefile
README
The run cards in examples/ are the authoritative reference for each
process. Read them before writing your own.
Step 2 — Compile EERAD3 on a Compute Node¶
Compilation requires gfortran ≥ 9.0. Do not run make on the login
node. Submit a dedicated build job.
Create the file jobs/build.sh:
#!/bin/bash
# =============================================================
# EERAD3 — Build job
# Compiles the main executable and eerad3hist.
# =============================================================
#SBATCH --job-name=eerad3_build
#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=4G
#SBATCH --time=00:20:00
#SBATCH --output=logs/build_%j.out
#SBATCH --error=logs/build_%j.err
module purge
module load gcc/13
EERAD3_DIR=~/eerad3
cd ${EERAD3_DIR}
echo "Compiling EERAD3 on $(hostname) at $(date)"
echo "gfortran version: $(gfortran --version | head -1)"
# -j uses all allocated CPUs for parallel compilation
make -j ${SLURM_CPUS_PER_TASK}
echo "Build finished at $(date)"
echo "Executables:"
ls -lh bin/
Submit it:
# Create the logs and jobs directories if they do not exist
mkdir -p logs jobs
# Submit and save the job ID
BUILD_ID=$(sbatch --parsable jobs/build.sh)
echo "Build job submitted: ${BUILD_ID}"
# Watch it in the queue
watch -n 5 squeue --job=${BUILD_ID}
Once it finishes, verify:
ls -lh bin/
# You should see:
#   eerad3      ← the main Monte Carlo executable
#   eerad3hist  ← the histogram post-processing tool
If the build fails, the first place to look is logs/build_JOBID.err.
The most common cause is a missing or wrong version of gfortran — check
the module is loaded in the script.
Step 3 — Understand the Run Card¶
A run card is a plain-text file that tells EERAD3 everything about a single run: which process, which perturbative channel, how many phase-space points, and what technical cuts to apply.
Here is the anatomy of a run card for our target: the LO contribution to thrust and C-parameter in Z → qq̄ → 3 jets.
! ─────────────────────────────────────────────
! run_Zqq_3j_LO.card
! Process: gamma*/Z -> qq-bar, 3-jet production
! Channel: LO (Born-level, three partons)
! ─────────────────────────────────────────────
! Process settings
process = 1 ! 1 = Z->qqbar | 21 = H->bbbar | 22 = H->gg
njets = 3 ! Number of hard jets in the final state
channel = LO ! LO | V | R | VV | RV | RR
! Technical settings
y0 = 1d-6 ! IR cut-off on kinematic invariants.
! Default is 1e-6; use 1e-8 for precision runs.
! Observable cuts
cut = 1d-5 ! Minimum value of the event-shape observable.
! Events below this are discarded.
sigma_obs = 0 ! Observable used to weight the integration.
! 0 = use the cross section itself (standard choice).
moment = 1 ! Power of the observable in the integrand weight.
! Vegas (phase-space integration) settings
warmup = 5 ! Number of warmup iterations (grid adaptation)
production = 5 ! Number of production iterations
shots = 100K ! Phase-space points per iteration.
! Use 100K for tests, 1M–10M for production.
The key Vegas settings are shots, warmup, and production.
EERAD3 uses the Vegas adaptive Monte Carlo integrator. The warmup phase adapts the importance-sampling grid to the integrand. The production phase uses the frozen grid to accumulate statistics. A good starting point is:
- Test runs: warmup=5, production=5, shots=100K
- Production runs: warmup=5, production=5, shots=1M to 5M
For the numerically harder channels (RV, RR),
more shots are needed for comparable statistical precision. Profile
first (see Step 6).
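The total cost of a run follows directly from these three settings, a point the profiling section below relies on. As a quick sanity check:

```shell
# Total phase-space points evaluated in one run:
#   (warmup iterations + production iterations) × shots per iteration
WARMUP=5
PRODUCTION=5
SHOTS=1000000   # "1M" in run-card notation
echo "Total points: $(( (WARMUP + PRODUCTION) * SHOTS ))"
```

This is why moving from the 3+3 profiling settings to 5+5 production settings multiplies the runtime by 10/6 at fixed shots.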
Create a dedicated directory for run cards:
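A minimal sketch (the production/ subdirectory matches the production cards introduced in Step 7):

```shell
# Keep test cards and production cards separate
mkdir -p runcards/production
```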
Step 4 — Start Small: a Single LO Test Job¶
Before running anything at scale, verify that the executable runs correctly and produces sensible output. Use the smallest possible settings.
Save the following as runcards/Zqq_3j_LO_test.card:
! Test run card — small statistics, LO only
process = 1
njets = 3
channel = LO
y0 = 1d-6
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 3
production = 3
shots = 100K
Save the following as jobs/test_LO.sh:
#!/bin/bash
# =============================================================
# EERAD3 — LO test job
# Small statistics; use this to verify the build and setup.
# =============================================================
#SBATCH --job-name=eerad3_LO_test
#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1 # EERAD3 is single-threaded per run
#SBATCH --mem=2G
#SBATCH --time=00:30:00
#SBATCH --output=logs/LO_test_%j.out
#SBATCH --error=logs/LO_test_%j.err
module purge
module load gcc/13
EERAD3_DIR=~/eerad3
cd ${EERAD3_DIR}
echo "─────────────────────────────────────────────"
echo " EERAD3 LO test"
echo " Host: $(hostname)"
echo " Started: $(date)"
echo " Job ID: ${SLURM_JOB_ID}"
echo "─────────────────────────────────────────────"
./bin/eerad3 \
-i runcards/Zqq_3j_LO_test.card \
-s 0
echo "─────────────────────────────────────────────"
echo " Finished: $(date)"
echo "─────────────────────────────────────────────"
Submit and monitor:
TEST_ID=$(sbatch --parsable jobs/test_LO.sh)
echo "Test job submitted: ${TEST_ID}"
# Monitor in real time
tail -f logs/LO_test_${TEST_ID}.out
A successful run prints Vegas iteration statistics — convergent estimates
with decreasing uncertainties — and terminates cleanly. A failed run
prints a Fortran runtime error or exits silently; in that case check the
.err log.
Step 5 — Inspect the Output¶
By default EERAD3 writes histogram files to a results/ subdirectory, one file per observable, named by process, jet multiplicity, seed, channel, and observable. For our test run, look for files like:
Zqq.3j.0000.LO.0.T.dat      ← thrust τ = 1-T
Zqq.3j.0000.LO.0.C.dat      ← C-parameter
Zqq.3j.0000.LO.0.LogT.dat   ← dσ/d log(τ)
Zqq.3j.0000.LO.0.LogC.dat   ← dσ/d log(C)
Each file contains one histogram in plain-text format.
Have a quick look at the thrust histogram:
# Print the first few bins of the thrust distribution
head -20 results/Zqq.3j.0000.LO.0.T.dat
# Count how many bins are non-empty
awk '$5 > 0' results/Zqq.3j.0000.LO.0.T.dat | wc -l
# Rough sanity check: <weight> is a per-bin sum of event weights (not
# divided by the bin width), so the plain sum over bins should be close
# to the LO cross section (up to the cut)
awk '{sum += $3} END {print "Integral:", sum}' \
    results/Zqq.3j.0000.LO.0.T.dat
<weight> is the sum of event weights in the
bin — not normalised to the bin width. The eerad3hist makedist
command handles the normalisation, scale-variation assembly, and K-factor
division for you. Do not attempt to plot these raw files directly as
physical distributions.
Step 6 — Profile Resource Usage¶
Before launching a large campaign you need to know how long each channel
actually takes and how much memory it uses. Submit one short job per channel
with modest statistics and inspect sacct afterward.
Save as jobs/profile_all_channels.sh:
#!/bin/bash
# =============================================================
# EERAD3 — Profile all six NNLO channels.
# Runs once per channel with small statistics.
# After completion, use sacct to read Elapsed and MaxRSS.
# =============================================================
#SBATCH --job-name=eerad3_profile
#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=02:00:00
#SBATCH --output=logs/profile_%x_%j.out
#SBATCH --error=logs/profile_%x_%j.err
# Channel is passed at submission time via --export=CHANNEL=...
# e.g. sbatch --export=CHANNEL=RR jobs/profile_all_channels.sh
module purge
module load gcc/13
EERAD3_DIR=~/eerad3
cd ${EERAD3_DIR}
# Write a temporary run card for this channel
TMPCARD=$(mktemp runcards/profile_XXXXXX.card)
cat > ${TMPCARD} << EOF
process = 1
njets = 3
channel = ${CHANNEL}
y0 = 1d-6
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 3
production = 3
shots = 500K
EOF
echo "Profiling channel: ${CHANNEL}"
echo "Run card: ${TMPCARD}"
echo "Started: $(date)"
./bin/eerad3 -i ${TMPCARD} -s 42
echo "Finished: $(date)"
rm ${TMPCARD}
Submit one job per channel:
for CHAN in LO V R VV RV RR; do
JID=$(sbatch --parsable \
--job-name="profile_${CHAN}" \
--export=CHANNEL=${CHAN} \
jobs/profile_all_channels.sh)
echo "Submitted ${CHAN}: job ${JID}"
done
Once all six jobs complete, read the resource usage:
sacct \
--user=your_username \
--starttime=$(date -d '3 hours ago' +%Y-%m-%dT%H:%M) \
--format=JobID,JobName%20,Elapsed,CPUTime,MaxRSS,State \
| grep profile
Timing analysis:¶
The profiling used warmup=3, production=3, shots=500K — so 6 iterations × 500K = 3M total phase-space points per job.
| Channel | Wall time (500K shots) | Relative runtime | Notes |
|---|---|---|---|
| LO | 00:00:11 | ~1× | Fast — Born-level only |
| V | 00:00:11 | ~1× | One-loop special functions (HPLs) |
| R | 00:08:02 | ~44× | Four-parton phase space |
| VV | 00:03:14 | ~18× | Two-loop amplitudes |
| RV | 00:55:33 | ~303× | Mixed real-virtual, largest integrand variance |
| RR | 1-15:03:25 | ~12800× | Five-parton phase space, dominant at high multiplicity |
Use these ratios to set shots and request appropriate wall times for
production runs. RV and RR generally need 3–5× more shots than LO
to reach comparable relative statistical precision.
Step 7 — Design the Full NNLO Campaign¶
Physics reminder: njets=3 is the complete NNLO prediction for thrust and C-parameter¶
Thrust and C-parameter are event-shape observables defined over the full hadronic
final state regardless of jet multiplicity. In EERAD3's framework,
njets=3 does not mean "events with exactly 3 jets": it specifies the Born multiplicity
from which the perturbative expansion is built. The six channels LO, V, R, VV, RV, RR with
njets=3 together give the complete NNLO prediction for these observables, already including
3-, 4- and 5-parton final states through the antenna subtraction framework.
The njets=4,5 settings in EERAD3 compute entirely different observables (jet rates and
jet-resolution scales R_4, R_5, y_{34}, y_{45}) and are not summed with the njets=3 results.
Key principle: parallelism¶
The profiling data reveals that the six channels span four orders of magnitude in cost. A one-size-fits-all shot count is therefore wrong. The correct approach is:
- For the cheap channels (LO, V, VV, R, RV): use more shots per seed and fewer seeds.
- For RR: use fewer shots per seed so each job fits within the cluster's wall-time ceiling, and compensate with many seeds.
The scaling is:
T_production ≈ T_profile × (10/6) × (shots / 500K)
where the factor 10/6 accounts for moving from warmup=3, production=3
(profiling) to warmup=5, production=5 (production).
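Applying this to the RR row of the profiling table (1-15:03:25 = 140605 seconds at 500K shots) reproduces the ~25 h per-job estimate used below for 200K-shot production jobs:

```shell
# Worked example of T_prod ≈ T_profile × (10/6) × (shots / 500K),
# using integer arithmetic in seconds
T_PROFILE_S=140605   # RR profiling time, 1-15:03:25 converted to seconds
SHOTS_K=200          # production shots per job, in units of 1K shots
T_PROD_S=$(( T_PROFILE_S * 10 / 6 * SHOTS_K / 500 ))
echo "Estimated RR production wall time: $(( T_PROD_S / 3600 )) h"
```

This prints an estimate of 26 h, in line with the ~25 h figure and comfortably inside the 30 h limit requested for RR.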
| Channel | Profile time (500K shots) | Shots/job | Est. wall time/job | Seeds | Est. total CPU |
|---|---|---|---|---|---|
| LO | 00:00:11 | 1M | ~30 sec | 5 | ~5 CPU-min |
| V | 00:00:11 | 1M | ~30 sec | 5 | ~5 CPU-min |
| VV | 00:03:14 | 1M | ~10 min | 10 | ~2 CPU-h |
| R | 00:08:02 | 1M | ~27 min | 10 | ~5 CPU-h |
| RV | 00:55:33 | 1M | ~3.1 h | 20 | ~62 CPU-h |
| RR | 1-15:03:25 | 200K | ~25 h | 80 | ~2000 CPU-h |
RR will dominate your campaign budget by well over an order of magnitude. This
is expected: the five-parton double-real phase space is the most complex
integrand in the calculation.
Production run cards¶
runcards/production/Zqq_3j_LO.card:
process = 1
njets = 3
channel = LO
y0 = 1d-8
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 5
production = 5
shots = 1M
runcards/production/Zqq_3j_V.card:
process = 1
njets = 3
channel = V
y0 = 1d-8
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 5
production = 5
shots = 1M
runcards/production/Zqq_3j_R.card:
process = 1
njets = 3
channel = R
y0 = 1d-8
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 5
production = 5
shots = 1M
runcards/production/Zqq_3j_VV.card:
process = 1
njets = 3
channel = VV
y0 = 1d-8
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 5
production = 5
shots = 1M
runcards/production/Zqq_3j_RV.card:
process = 1
njets = 3
channel = RV
y0 = 1d-8
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 5
production = 5
shots = 1M
runcards/production/Zqq_3j_RR.card:
process = 1
njets = 3
channel = RR
y0 = 1d-8
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 5
production = 5
shots = 200K ! Deliberately small: ~25h per job.
! Use 80+ seeds to accumulate statistics.
Step 8 — Submit the Full Campaign as Job Arrays¶
The job script is the same for all channels; the per-channel differences (shots, wall time, number of seeds) are encoded entirely in the submission script.
jobs/run_channel.sh is unchanged from before. The critical changes are
in submit_campaign.sh:
#!/bin/bash
# =============================================================
# Submit the full NNLO campaign — calibrated to actual timings.
# Run this on the LOGIN NODE: bash submit_campaign.sh
#
# Profiling results (500K shots, 3+3 iterations):
# LO 00:00:11 V 00:00:11 VV 00:03:14
# R 00:08:02 RV 00:55:33 RR 1-15:03:25
# =============================================================
EERAD3_DIR=/home/your_username/eerad3
cd ${EERAD3_DIR}
mkdir -p logs
# ── Shots per job ────────────────────────────────────────────
# RR uses 200K (not 1M) to stay under the 48h wall-time limit.
# All other channels use 1M comfortably.
declare -A SHOTS
SHOTS[LO]="1M"
SHOTS[V]="1M"
SHOTS[R]="1M"
SHOTS[VV]="1M"
SHOTS[RV]="1M"
SHOTS[RR]="200K" # ~25h/job at 200K; do NOT increase without re-profiling
# ── Number of seeds ──────────────────────────────────────────
# RR needs many seeds to compensate for low per-job shot count.
# Total RR stats: 80 seeds × 200K shots × 10 iterations = 160M points.
declare -A NSEEDS
NSEEDS[LO]=5
NSEEDS[V]=5
NSEEDS[VV]=10
NSEEDS[R]=10
NSEEDS[RV]=20
NSEEDS[RR]=80
# ── Wall-time limits ─────────────────────────────────────────
# Based on: T ≈ T_profile × (10/6) × (shots / 500K) + 20% margin.
#
# LO/V: 30 sec → round up to 00:10:00 (minimum sensible request)
# VV: ~10 min → 00:30:00
# R: ~27 min → 01:00:00
# RV: ~3.1 h → 04:00:00
# RR: ~25 h (200K shots) → 30:00:00
declare -A TIMELIM
TIMELIM[LO]="00:10:00"
TIMELIM[V]="00:10:00"
TIMELIM[VV]="00:30:00"
TIMELIM[R]="01:00:00"
TIMELIM[RV]="04:00:00"
TIMELIM[RR]="30:00:00" # 25h estimated + 20% safety margin
# ── Memory ───────────────────────────────────────────────────
# MaxRSS was not reported by sacct for this profile run.
# Conservative safe defaults based on typical EERAD3 usage.
declare -A MEMORY
MEMORY[LO]="1G"
MEMORY[V]="1G"
MEMORY[VV]="2G"
MEMORY[R]="2G"
MEMORY[RV]="2G"
MEMORY[RR]="2G"
echo "═══════════════════════════════════════════════════════"
echo " EERAD3 NNLO campaign: Z->qqbar 3-jet"
echo " $(date)"
echo "═══════════════════════════════════════════════════════"
printf " %-6s %6s %5s %10s %s\n" "Chan" "Seeds" "Shots" "Walltime" "Est. total CPU"
# Rough CPU estimate: seeds × wall_time (in hours)
declare -A HOURS_EST
HOURS_EST[LO]="0.01"
HOURS_EST[V]="0.01"
HOURS_EST[VV]="0.16"
HOURS_EST[R]="0.45"
HOURS_EST[RV]="3.1"
HOURS_EST[RR]="25"
for CHAN in LO V VV R RV RR; do
N=${NSEEDS[$CHAN]}
H=${HOURS_EST[$CHAN]}
TOTAL=$(echo "$N * $H" | bc)
printf " %-6s %6s %5s %10s ~%.0f CPU-h\n" \
"${CHAN}" "${N}" "${SHOTS[$CHAN]}" "${TIMELIM[$CHAN]}" "${TOTAL}"
done
echo "───────────────────────────────────────────────────────"
TOTAL_RR=$(echo "${NSEEDS[RR]} * ${HOURS_EST[RR]}" | bc)
echo " RR alone: ~${TOTAL_RR} CPU-h ($(echo "$TOTAL_RR / 24" | bc) node-days)"
echo "═══════════════════════════════════════════════════════"
read -p "Proceed with submission? [y/N] " CONFIRM
[[ "${CONFIRM}" =~ ^[Yy]$ ]] || { echo "Aborted."; exit 0; }
echo ""
for CHAN in LO V VV R RV RR; do
N=${NSEEDS[$CHAN]}
LAST=$(( N - 1 ))
JID=$(sbatch --parsable \
--job-name="eerad3_${CHAN}" \
--array="0-${LAST}%10" \
--time="${TIMELIM[$CHAN]}" \
--mem="${MEMORY[$CHAN]}" \
--export=CHANNEL=${CHAN} \
jobs/run_channel.sh)
echo " [${CHAN}] submitted: job array ${JID} (seeds 0–${LAST})"
done
echo ""
echo "Campaign submitted. Monitor with:"
echo " squeue --user=$(whoami) --format='%.10i %.20j %.8T %.10M %R'"
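The submission script above drives jobs/run_channel.sh, which this guide assumes already exists. For completeness, here is a minimal sketch of what it might contain — the run-card naming and the -i/-s flags follow Steps 4 and 7, while the module and path choices simply mirror the other job scripts and should be adapted to your setup:

```shell
#!/bin/bash
# jobs/run_channel.sh — one array task = one seed of one channel (sketch).
# CHANNEL arrives via --export=CHANNEL=...; the seed is the array index.
# Wall time, memory, and array range are set at submission time.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1          # EERAD3 is single-threaded per run
#SBATCH --output=logs/%x_%A_%a.out
#SBATCH --error=logs/%x_%A_%a.err

module purge
module load gcc/13

EERAD3_DIR=~/eerad3
cd ${EERAD3_DIR}

SEED=${SLURM_ARRAY_TASK_ID}
CARD=runcards/production/Zqq_3j_${CHANNEL}.card

echo "Channel ${CHANNEL}, seed ${SEED}, card ${CARD}, host $(hostname)"
./bin/eerad3 -i ${CARD} -s ${SEED}
```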
The RR channel took 1 day 15 hours for just 500K shots at profiling
settings (6 iterations). At production settings (10 iterations vs 6),
500K shots would take ~65 hours, well beyond a typical 48h wall-time
ceiling, with no margin for slowdowns, node variability, or queue
delays. The 200K/job setting was chosen to give a comfortable ~25h
estimate with a 30h limit. If you wish to run 500K shots per RR job,
re-profile with warmup=5, production=5, shots=500K first and verify
the actual elapsed time on your partition.
Practical note on total campaign cost¶
With this configuration, the RR channel alone accounts for
approximately 2000 CPU-hours (80 seeds × 25 hours each). The remaining
five channels together total roughly 70 CPU-hours. Plan accordingly:
if your group has a CPU-hour quota, check your remaining allocation with
the support team before launching the full RR array.
# Check your current allocation usage (syntax varies by site)
sacct \
--user=your_username \
--starttime=$(date -d '30 days ago' +%Y-%m-%d) \
--format=JobID,CPUTimeRAW \
| awk 'NR>2 && $1~/^[0-9]/ {sum+=$2} END {printf "CPU-hours used (30d): %.0f\n", sum/3600}'
Monitoring the RR array specifically¶
Because RR jobs are long-running, it is worth checking on them
periodically:
# How many RR tasks are done / running / pending?
squeue --user=your_username \
--name=eerad3_RR \
--format="%.8T" \
| sort | uniq -c
# Estimated completion: running tasks × remaining fraction of wall time
# (use squeue -o "%i %L" to see remaining time per job)
squeue --user=your_username \
--name=eerad3_RR \
--format="%.10i %.8T %.10L"
Step 9 — Post-Processing with eerad3hist¶
Once all jobs have completed, you have a results/ directory containing
many histogram files — one per observable per seed per channel. The
eerad3hist tool processes these in three stages.
Check that all expected files are present¶
Before post-processing, verify nothing is missing:
cd /home/your_username/eerad3
# Count files per channel
for CHAN in LO V R VV RV RR; do
COUNT=$(ls results/Zqq.3j.*.${CHAN}.0.T.dat 2>/dev/null | wc -l)
echo "${CHAN}: ${COUNT} thrust histogram files"
done
If any channel has fewer files than expected, some jobs failed. Check their
.err logs and resubmit the missing seeds individually:
# Resubmit a single missing seed (e.g. seed 7 of channel RV)
sbatch \
--job-name="eerad3_RV_resub" \
--array="7-7" \
--time="12:00:00" \
--mem="4G" \
--export=CHANNEL=RV \
jobs/run_channel.sh
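To find exactly which seed indices are missing rather than eyeballing counts, loop over the expected range. This assumes the seed field in the filename is the 4-digit zero-padded array index, as in the examples above:

```shell
# Print missing seed indices for one channel
CHAN=RV
NSEEDS=20
for ((s = 0; s < NSEEDS; s++)); do
    SEED=$(printf "%04d" ${s})
    if [[ ! -f results/Zqq.3j.${SEED}.${CHAN}.0.T.dat ]]; then
        echo "missing seed: ${s}"
    fi
done
```

The printed indices can be joined with commas and passed directly to `sbatch --array=`.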
Stage 1 — Merge histograms from independent seeds¶
The merge command combines all seeds for a given channel into a single
set of histograms, accumulating uncertainties as bin-wise weighted means.
# Merge all seeds for every channel in one pass.
# eerad3hist groups input files by channel and observable, so a single
# invocation produces one merged file per channel.
# The -t flag sets the tag that replaces the seed field in the output filename.
./bin/eerad3hist merge \
    -o results/merged/ \
    -t merged \
    results/
After merging, inspect the merged directory:
ls results/merged/ | grep "\.T\.dat"
# Zqq.3j.merged.LO.0.T.dat
# Zqq.3j.merged.V.0.T.dat
# Zqq.3j.merged.R.0.T.dat
# Zqq.3j.merged.VV.0.T.dat
# Zqq.3j.merged.RV.0.T.dat
# Zqq.3j.merged.RR.0.T.dat
Each seed's histogram is merged using bin-wise inverse-variance weighting. This means seeds with better (lower-error) estimates contribute more to the merged result. Seeds that ran fewer points or had higher integrand variance automatically contribute less. You do not need to worry about unequal seed statistics.
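As a concrete illustration of the weighting (with made-up bin values; EERAD3's files carry real weights and errors), the bin-wise merge of two seed estimates x ± e works like this:

```shell
# Inverse-variance weighted mean of two bin estimates:
#   w_i = 1/e_i²,  x_merged = Σ w_i·x_i / Σ w_i,  e_merged = 1/sqrt(Σ w_i)
awk 'BEGIN {
    x1 = 2.0; e1 = 0.1        # precise seed
    x2 = 2.4; e2 = 0.3        # noisy seed — gets 1/9 the weight
    w1 = 1/(e1*e1); w2 = 1/(e2*e2)
    xm = (w1*x1 + w2*x2) / (w1 + w2)
    em = 1/sqrt(w1 + w2)
    printf "merged: %.3f +- %.3f\n", xm, em
}'
# → merged: 2.040 +- 0.095
```

Note how the merged value sits close to the precise seed, and the merged error is smaller than either input error.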
Stage 2 — Combine channels into perturbative coefficients¶
The combine command assembles the individual channels into the full LO,
NLO, and NNLO perturbative coefficients A, B, C (as in Eq. 6 of the paper):
- LO → coefficient A
- V + R → coefficient B (the NLO correction)
- VV + RV + RR → coefficient C (the NNLO correction)
The combine step also auto-generates a template makedist.input file
in results/combined/. This is your starting point for Stage 3.
Stage 3 — Make physical distributions with makedist¶
This is where the physics happens: makedist reads the perturbative
coefficients, applies the K-factor normalisation, runs the strong coupling
at scale μ, and assembles the full distribution with scale-variation bands.
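For orientation, the leading (one-loop) behaviour of the coupling that makedist runs can be sketched in a few lines. This is an illustration only — eerad3hist performs the running internally at the appropriate order — using nf = 5 flavours and the αs(MZ) = 0.118 input that appears in makedist.input below:

```shell
# One-loop running of the strong coupling:
#   alpha_s(mu) = alpha_s(MZ) / (1 + alpha_s(MZ) · b0 · ln(mu²/MZ²)),
#   with b0 = (33 − 2·nf) / (12·pi)
awk 'BEGIN {
    pi = 3.141592653589793
    nf = 5; asMZ = 0.118; MZ = 91.1876
    mu = 2*MZ                    # the upper scale-variation point
    b0 = (33 - 2*nf) / (12*pi)
    as = asMZ / (1 + asMZ * b0 * log((mu*mu)/(MZ*MZ)))
    printf "alpha_s(2*MZ) = %.4f\n", as
}'
# → alpha_s(2*MZ) = 0.1073
```

The decrease from 0.118 to ~0.107 at μ = 2MZ is what drives the scale-variation bands in the final distributions.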
First, inspect the auto-generated makedist.input and edit it to match the physical setup, with the centre-of-mass energy at the Z pole and Standard Model parameters in the Gμ scheme:
! makedist.input — Z->qqbar at sqrt(s) = MZ
process = 1
sqrts = 91.2
njets = 3
! SM parameters (Gmu scheme)
GF = 1.16638e-5
aS[MZ] = 0.118
MASS[W] = 80.379
MASS[Z] = 91.1876
! ── Histograms ──────────────────────────────────────────────
! Format: <output_name> <LO_file> [<NLO_file>] [<NNLO_file>]
HISTOGRAMS
Thrust Zqq.3j.merged.LO.0.T.dat Zqq.3j.merged.NLO.0.T.dat Zqq.3j.merged.NNLO.0.T.dat
LogThrust Zqq.3j.merged.LO.0.LogT.dat Zqq.3j.merged.NLO.0.LogT.dat Zqq.3j.merged.NNLO.0.LogT.dat
C_param Zqq.3j.merged.LO.0.C.dat Zqq.3j.merged.NLO.0.C.dat Zqq.3j.merged.NNLO.0.C.dat
LogC_param Zqq.3j.merged.LO.0.LogC.dat Zqq.3j.merged.NLO.0.LogC.dat Zqq.3j.merged.NNLO.0.LogC.dat
END HISTOGRAMS
Run makedist:
./bin/eerad3hist makedist \
-o results/hist/ \
results/combined/makedist.input
echo "Physical distributions:"
ls results/hist/
# Thrust.LO.dat
# Thrust.NLO.dat
# Thrust.NNLO.dat
# LogThrust.LO.dat
# LogThrust.NLO.dat
# LogThrust.NNLO.dat
# C_param.LO.dat
# C_param.NLO.dat
# C_param.NNLO.dat
# ...
Reading the final output files¶
Each .dat file in results/hist/ contains a physical distribution in the
format:
- sig — the cross section (divided by bin width): this is your central value
- mcerror — statistical Monte Carlo uncertainty
- vardown, varup — scale-variation envelope (μ varied by a factor of 2)
A quick sanity-check plot with Python (run in an salloc session):
# quick_plot.py — run in an salloc session, not on the login node
import numpy as np
import matplotlib.pyplot as plt

def load_dist(fname):
    data = np.loadtxt(fname)
    xlow, xhigh = data[:, 0], data[:, 1]
    xmid = 0.5 * (xlow + xhigh)
    sig, mcerr, vardn, varup = data[:, 2], data[:, 3], data[:, 4], data[:, 5]
    return xmid, sig, mcerr, vardn, varup

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, obs, label in zip(
        axes,
        ['Thrust', 'C_param'],
        [r'$\tau = 1-T$', r'$C$-parameter']):
    for order, color, ls in [('LO', 'grey', '--'),
                             ('NLO', 'steelblue', '-.'),
                             ('NNLO', 'firebrick', '-')]:
        x, sig, mcerr, vardn, varup = load_dist(f'results/hist/{obs}.{order}.dat')
        mask = sig > 0
        ax.fill_between(x[mask], vardn[mask], varup[mask],
                        alpha=0.2, color=color, label=f'{order} scale var.')
        ax.plot(x[mask], sig[mask], color=color, ls=ls, label=order)
    ax.set_xlabel(label, fontsize=13)
    ax.set_ylabel(r'$\frac{1}{\sigma_0}\frac{d\sigma}{dO}$', fontsize=13)  # O = observable
    ax.set_yscale('log')
    ax.legend(fontsize=10)
    ax.set_title(f'{label} — Z→qqbar at NNLO, √s = 91.2 GeV')
plt.tight_layout()
plt.savefig('thrust_Cparam_NNLO.pdf', bbox_inches='tight')
print("Saved: thrust_Cparam_NNLO.pdf")
To obtain YODA-format output (for use with Rivet plotting scripts):
./bin/eerad3hist makedist \
-f yoda \
-o results/hist_yoda/ \
results/combined/makedist.input
ls results/hist_yoda/
# LO.yoda
# NLO.yoda
# NNLO.yoda
Step 10 — Scale Up¶
Once the workflow is validated end-to-end, scaling up is straightforward:
increase shots and NSEEDS in the submission script. The table below
gives practical guidance based on typical outcomes:
| Goal | Shots | Seeds per channel | Expected relative error on σ |
|---|---|---|---|
| Qualitative shape check | 100K | 1–3 | ~5–10% |
| First quantitative result | 1M | 5–10 | ~1–2% |
| Publication quality | 2M–5M | 20–50 | <0.5% |
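These seed counts follow directly from Monte Carlo error scaling: the statistical error falls like 1/√N, so halving the error requires quadrupling the statistics. A quick calculator (the numbers are illustrative, not from a real run):

```shell
# Seeds needed to reach a target relative error, given a measured one:
#   N_target = N_now × (err_now / err_target)²
awk 'BEGIN {
    err_now = 1.0       # % relative error achieved with...
    seeds_now = 10      # ...this many seeds at fixed shots/seed
    err_target = 0.5    # desired % relative error
    printf "Seeds needed: %d\n", seeds_now * (err_now/err_target)^2
}'
# → Seeds needed: 40
```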
Useful monitoring commands during a campaign¶
# How many tasks are done vs still running?
squeue --user=your_username \
--format="%.20j %.8T" \
| awk 'NR>1 {counts[$2]++} END {for(s in counts) print s, counts[s]}'
# Which jobs completed successfully vs failed?
sacct \
--user=your_username \
--starttime=$(date -d '2 days ago' +%Y-%m-%d) \
--format=JobID,JobName%25,State,ExitCode,Elapsed,MaxRSS \
| grep eerad3
# Total CPU-hours consumed by the campaign so far
sacct \
--user=your_username \
--starttime=$(date -d '2 days ago' +%Y-%m-%d) \
--format=CPUTimeRAW \
| awk 'NR>2 && $1~/^[0-9]/ {sum += $1} END {printf "Total CPU-hours: %.1f\n", sum/3600}'
Handling partial failures¶
If some array tasks fail (non-zero exit code), identify and resubmit them:
# Find failed tasks for a specific array job
ARRAY_JOB_ID=491055
sacct --jobs=${ARRAY_JOB_ID} \
--format=JobID,State,ExitCode \
| awk '$2 != "COMPLETED" && NR > 2 {print $1, $2, $3}'
# The seed is the array task index — resubmit the specific failing seeds
# e.g. tasks 3, 11, 17 failed for channel RV:
sbatch \
--job-name="eerad3_RV_resub" \
--array="3,11,17" \
--time="12:00:00" \
--mem="4G" \
--export=CHANNEL=RV \
jobs/run_channel.sh
After resubmission completes, re-run the merge/combine/makedist pipeline.
eerad3hist merge will automatically incorporate the new seeds.
Directory Layout Reference¶
After following this guide, your eerad3/ directory should look like:
eerad3/
├── bin/
│ ├── eerad3 ← main executable
│ └── eerad3hist ← post-processing tool
├── examples/ ← upstream example run cards (read-only reference)
├── runcards/
│ ├── Zqq_3j_LO_test.card
│ └── production/
│ ├── Zqq_3j_LO.card
│ ├── Zqq_3j_V.card
│ ├── Zqq_3j_R.card
│ ├── Zqq_3j_VV.card
│ ├── Zqq_3j_RV.card
│ └── Zqq_3j_RR.card
├── jobs/
│ ├── build.sh
│ ├── test_LO.sh
│ ├── profile_all_channels.sh
│ └── run_channel.sh
├── logs/ ← all Slurm stdout/stderr files
├── results/
│ ├── Zqq.3j.0000.LO.0.T.dat ← raw per-seed histograms
│ ├── Zqq.3j.0001.LO.0.T.dat
│ ├── ...
│ ├── merged/ ← after eerad3hist merge
│ ├── combined/ ← after eerad3hist combine
│ │ └── makedist.input
│ └── hist/ ← after eerad3hist makedist — final distributions
│ ├── Thrust.LO.dat
│ ├── Thrust.NLO.dat
│ ├── Thrust.NNLO.dat
│ ├── C_param.LO.dat
│ └── ...
└── submit_campaign.sh
Troubleshooting¶
eerad3 exits immediately with no output¶
Check that y0 is not set tighter than 1d-8. Values smaller than that
can cause the program to reject all phase-space points immediately.
Also verify that cut is not so large that it excludes the entire event-shape
range; for thrust, cut = 1d-5 is a safe starting value.
The Vegas integral does not converge — large relative errors¶
This is most common for RV and RR. Increase shots significantly
(try 5M to 10M), and if the error is still large, increase warmup
iterations to give Vegas more time to adapt the grid. Check the log for
warnings about integrand-grid mismatches.
eerad3hist merge produces files with zero bins¶
This typically means the seed files were written to a non-default prefix
directory. Check your run card's prefix setting (default: results/) and
ensure merge is pointed at the correct directory.
The combined NNLO histogram has very large errors on some bins¶
This indicates that either the RV or RR channel is under-sampled in
those bins. Look at which observable bins have large errors and compare
the merged/ files for each channel separately. The channel with the
largest bin errors needs more seeds or more shots.
Scale-variation bands cross zero or become negative¶
This is a physical pathology near the Sudakov peak (τ → 0 and C → 0) and in the far-tail region (τ → 0.5). It reflects the breakdown of fixed-order perturbation theory in the infrared-sensitive region — it is not a bug in EERAD3 or in your setup. The resolution is resummation matched to the fixed-order result, which is beyond the scope of EERAD3 alone.