Case Study: Running EERAD3 on the Cluster¶
This guide walks through a complete, realistic workflow for running the EERAD3 parton-level Monte Carlo generator on the cluster, from first checkout to publication-quality distributions. The physics target is the thrust (τ = 1 − T) and C-parameter distributions in e⁺e⁻ → hadrons via γ*/Z → qq̄ at NNLO QCD — a textbook application for αs determinations and event-shape phenomenology.
Every step follows the cluster's golden rules: no compiling on the login node, no running on the login node, everything through Slurm.
Table of Contents¶
- Overview and Strategy
- Step 1 — Get the Code
- Step 2 — Compile EERAD3 on a Compute Node
- Step 3 — Understand the Run Card
- Step 4 — Start Small: a Single LO Test Job
- Step 5 — Inspect the Output
- Step 6 — Profile Resource Usage
- Step 7 — Design the Full NNLO Campaign
- Step 8 — Submit the Full Campaign as Job Arrays
- Step 9 — Post-Processing with eerad3hist
- Step 10 — Scale Up
- Directory Layout Reference
- Troubleshooting
Overview and Strategy¶
Why does NNLO require six separate channels?¶
EERAD3 uses antenna subtraction to handle infrared singularities. At NNLO, the full cross section is assembled from six independent perturbative contributions — referred to as channels — which are infrared-singular individually but finite when combined:
| Channel | Content | Order |
|---|---|---|
| LO | Born-level, three partons | αs⁰ (relative) |
| V | One-loop virtual | NLO |
| R | Real emission, four partons | NLO |
| VV | Two-loop double-virtual | NNLO |
| RV | One-loop real-virtual | NNLO |
| RR | Double-real, five partons | NNLO |
Each channel is a separate EERAD3 run. The eerad3hist tool combines them
into physical distributions afterward.
The recommended workflow¶
Clone → Compile → Test (LO, few shots) → Profile → Full campaign (6 channels × N seeds)
→ merge (per channel) → combine → makedist → plots
Start small. Profile. Then scale. The Vegas integrator inside EERAD3 requires a warmup phase to learn the phase space. Running too few shots in warmup gives a poorly adapted grid; running too many wastes time. This guide teaches you how to calibrate both.
Step 1 — Get the Code¶
Do this on the login node — cloning a repository is exactly the kind of lightweight task that belongs there.
# Navigate to your home directory (or /project if your group has one)
cd
# Clone the EERAD3 release repository
git clone https://gitlab.com/eerad-team/releases.git eerad3
cd eerad3
Take a moment to look at the top-level structure before doing anything else. You will see:
bin/ ← executables are placed here after compilation
examples/ ← example run cards for all six available processes
src/
core/ ← main program, phase space, antenna functions
analyses/ ← default and custom analyses (Fortran 90 modules)
Zqq/ ← matrix elements for γ*/Z → qq̄
Hbb/ ← matrix elements for H → bb̄
Hgg/ ← matrix elements for H → gg (HTL)
Makefile
README
The run cards in examples/ are the authoritative reference for each
process. Read them before writing your own.
Step 2 — Compile EERAD3 on a Compute Node¶
Compilation requires gfortran ≥ 9.0. Do not run make on the login
node. Submit a dedicated build job.
Create the file jobs/build.sh:
#!/bin/bash
# =============================================================
# EERAD3 — Build job
# Compiles the main executable and eerad3hist.
# =============================================================
#SBATCH --job-name=eerad3_build
#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=4G
#SBATCH --time=00:20:00
#SBATCH --output=logs/build_%j.out
#SBATCH --error=logs/build_%j.err
module purge
module load gcc/13
EERAD3_DIR=~/eerad3
cd ${EERAD3_DIR}
echo "Compiling EERAD3 on $(hostname) at $(date)"
echo "gfortran version: $(gfortran --version | head -1)"
# -j uses all allocated CPUs for parallel compilation
make -j ${SLURM_CPUS_PER_TASK}
echo "Build finished at $(date)"
echo "Executables:"
ls -lh bin/
Submit it:
# Create the logs and jobs directories if they do not exist
mkdir -p logs jobs
# Submit and save the job ID
BUILD_ID=$(sbatch --parsable jobs/build.sh)
echo "Build job submitted: ${BUILD_ID}"
# Watch it in the queue
watch -n 5 squeue --job=${BUILD_ID}
Once it finishes, verify:
ls -lh bin/
# You should see:
#   eerad3      ← the main Monte Carlo executable
#   eerad3hist  ← the histogram post-processing tool
If the build fails, the first place to look is logs/build_JOBID.err.
The most common cause is a missing or wrong version of gfortran — check
the module is loaded in the script.
Step 3 — Understand the Run Card¶
A run card is a plain-text file that tells EERAD3 everything about a single run: which process, which perturbative channel, how many phase-space points, and what technical cuts to apply.
Here is the anatomy of a run card for our target: the LO contribution to thrust and C-parameter in Z → qq̄ → 3 jets.
! ─────────────────────────────────────────────
! run_Zqq_3j_LO.card
! Process: gamma*/Z -> qq-bar, 3-jet production
! Channel: LO (Born-level, three partons)
! ─────────────────────────────────────────────
! Process settings
process = 1 ! 1 = Z->qqbar | 21 = H->bbbar | 22 = H->gg
njets = 3 ! Number of hard jets in the final state
channel = LO ! LO | V | R | VV | RV | RR
! Technical settings
y0 = 1d-6 ! IR cut-off on kinematic invariants.
! Default is 1e-6; use 1e-8 for precision runs.
! Observable cuts
cut = 1d-5 ! Minimum value of the event-shape observable.
! Events below this are discarded.
sigma_obs = 0 ! Observable used to weight the integration.
! 0 = use the cross section itself (standard choice).
moment = 1 ! Power of the observable in the integrand weight.
! Vegas (phase-space integration) settings
warmup = 5 ! Number of warmup iterations (grid adaptation)
production = 5 ! Number of production iterations
shots = 100K ! Phase-space points per iteration.
! Use 100K for tests, 1M–10M for production.
The key Vegas settings are shots, warmup, and production.
EERAD3 uses the Vegas adaptive Monte Carlo integrator. The warmup phase adapts the importance-sampling grid to the integrand. The production phase uses the frozen grid to accumulate statistics. A good starting point is:
- Test runs: warmup=5, production=5, shots=100K
- Production runs: warmup=5, production=5, shots=1M to 5M
For the numerically harder channels (RV, RR),
more shots are needed for comparable statistical precision. Profile
first (see Step 6).
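The total cost of a run follows directly from these three settings, a point the profiling section below relies on. As a quick sanity check:

```shell
# Total phase-space points evaluated in one run:
#   (warmup iterations + production iterations) × shots per iteration
WARMUP=5
PRODUCTION=5
SHOTS=1000000   # "1M" in run-card notation
echo "Total points: $(( (WARMUP + PRODUCTION) * SHOTS ))"
```

This is why moving from the 3+3 profiling settings to 5+5 production settings multiplies the runtime by 10/6 at fixed shots.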
Create a dedicated directory for run cards:
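A minimal sketch (the production/ subdirectory matches the production cards introduced in Step 7):

```shell
# Keep test cards and production cards separate
mkdir -p runcards/production
```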
Step 4 — Start Small: a Single LO Test Job¶
Before running anything at scale, verify that the executable runs correctly and produces sensible output. Use the smallest possible settings.
Save the following as runcards/Zqq_3j_LO_test.card:
! Test run card — small statistics, LO only
process = 1
njets = 3
channel = LO
y0 = 1d-6
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 3
production = 3
shots = 100K
Save the following as jobs/test_LO.sh:
#!/bin/bash
# =============================================================
# EERAD3 — LO test job
# Small statistics; use this to verify the build and setup.
# =============================================================
#SBATCH --job-name=eerad3_LO_test
#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1 # EERAD3 is single-threaded per run
#SBATCH --mem=2G
#SBATCH --time=00:30:00
#SBATCH --output=logs/LO_test_%j.out
#SBATCH --error=logs/LO_test_%j.err
module purge
module load gcc/13
EERAD3_DIR=~/eerad3
cd ${EERAD3_DIR}
echo "─────────────────────────────────────────────"
echo " EERAD3 LO test"
echo " Host: $(hostname)"
echo " Started: $(date)"
echo " Job ID: ${SLURM_JOB_ID}"
echo "─────────────────────────────────────────────"
./bin/eerad3 \
-i runcards/Zqq_3j_LO_test.card \
-s 0
echo "─────────────────────────────────────────────"
echo " Finished: $(date)"
echo "─────────────────────────────────────────────"
Submit and monitor:
TEST_ID=$(sbatch --parsable jobs/test_LO.sh)
echo "Test job submitted: ${TEST_ID}"
# Monitor in real time
tail -f logs/LO_test_${TEST_ID}.out
A successful run prints Vegas iteration statistics — convergent estimates
with decreasing uncertainties — and terminates cleanly. A failed run
prints a Fortran runtime error or exits silently; in that case check the
.err log.
Step 5 — Inspect the Output¶
By default EERAD3 writes histogram files to a results/ subdirectory, one file per observable, named by process, jet multiplicity, seed, channel, and observable. For our test run, look for files like:
Zqq.3j.0000.LO.0.T.dat      ← thrust τ = 1-T
Zqq.3j.0000.LO.0.C.dat      ← C-parameter
Zqq.3j.0000.LO.0.LogT.dat   ← dσ/d log(τ)
Zqq.3j.0000.LO.0.LogC.dat   ← dσ/d log(C)
Each file contains one histogram in plain-text format.
Have a quick look at the thrust histogram:
# Print the first few bins of the thrust distribution
head -20 results/Zqq.3j.0000.LO.0.T.dat
# Count how many bins are non-empty
awk '$5 > 0' results/Zqq.3j.0000.LO.0.T.dat | wc -l
# Rough sanity check: <weight> is a per-bin sum of event weights (not
# divided by the bin width), so the plain sum over bins should be close
# to the LO cross section (up to the cut)
awk '{sum += $3} END {print "Integral:", sum}' \
    results/Zqq.3j.0000.LO.0.T.dat
<weight> is the sum of event weights in the
bin — not normalised to the bin width. The eerad3hist makedist
command handles the normalisation, scale-variation assembly, and K-factor
division for you. Do not attempt to plot these raw files directly as
physical distributions.
Step 6 — Profile Resource Usage¶
Before launching a large campaign you need to know how long each channel
actually takes and how much memory it uses. Submit one short job per channel
with modest statistics and inspect sacct afterward.
Save as jobs/profile_all_channels.sh:
#!/bin/bash
# =============================================================
# EERAD3 — Profile all six NNLO channels.
# Runs once per channel with small statistics.
# After completion, use sacct to read Elapsed and MaxRSS.
# =============================================================
#SBATCH --job-name=eerad3_profile
#SBATCH --partition=general
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=4G
#SBATCH --time=02:00:00
#SBATCH --output=logs/profile_%x_%j.out
#SBATCH --error=logs/profile_%x_%j.err
# Channel is passed at submission time via --export=CHANNEL=...
# e.g. sbatch --export=CHANNEL=RR jobs/profile_all_channels.sh
module purge
module load gcc/13
EERAD3_DIR=~/eerad3
cd ${EERAD3_DIR}
# Write a temporary run card for this channel
TMPCARD=$(mktemp runcards/profile_XXXXXX.card)
cat > ${TMPCARD} << EOF
process = 1
njets = 3
channel = ${CHANNEL}
y0 = 1d-6
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 3
production = 3
shots = 500K
EOF
echo "Profiling channel: ${CHANNEL}"
echo "Run card: ${TMPCARD}"
echo "Started: $(date)"
./bin/eerad3 -i ${TMPCARD} -s 42
echo "Finished: $(date)"
rm ${TMPCARD}
Submit one job per channel:
for CHAN in LO V R VV RV RR; do
JID=$(sbatch --parsable \
--job-name="profile_${CHAN}" \
--export=CHANNEL=${CHAN} \
jobs/profile_all_channels.sh)
echo "Submitted ${CHAN}: job ${JID}"
done
Once all six jobs complete, read the resource usage:
sacct \
--user=your_username \
--starttime=$(date -d '3 hours ago' +%Y-%m-%dT%H:%M) \
--format=JobID,JobName%20,Elapsed,CPUTime,MaxRSS,State \
| grep profile
Timing analysis:¶
The profiling used warmup=3, production=3, shots=500K — so 6 iterations × 500K = 3M total phase-space points per job.
| Channel | Wall time (500K shots) | Relative runtime | Notes |
|---|---|---|---|
| LO | 00:00:11 | ~1× | Fast — Born-level only |
| V | 00:00:11 | ~1× | One-loop special functions (HPLs) |
| R | 00:08:02 | ~44× | Four-parton phase space |
| VV | 00:03:14 | ~18× | Two-loop amplitudes |
| RV | 00:55:33 | ~303× | Mixed real-virtual, largest integrand variance |
| RR | 1-15:03:25 | ~12800× | Five-parton phase space, dominant at high multiplicity |
Use these ratios to set shots and request appropriate wall times for
production runs. RV and RR generally need 3–5× more shots than LO
to reach comparable relative statistical precision.
Step 7 — Design the Full NNLO Campaign¶
Physics reminder: njets=3 is the complete NNLO prediction for thrust and C-parameter¶
Thrust and C-parameter are event-shape observables defined over the full hadronic
final state regardless of jet multiplicity. In EERAD3's framework,
njets=3 does not mean "events with exactly 3 jets": it specifies the Born multiplicity
from which the perturbative expansion is built. The six channels LO, V, R, VV, RV, RR with
njets=3 together give the complete NNLO prediction for these observables, already including
3-, 4- and 5-parton final states through the antenna subtraction framework.
The njets=4,5 settings in EERAD3 compute entirely different observables (jet rates and
jet-resolution scales R_4, R_5, y_{34}, y_{45}) and are not summed with the njets=3 results.
Key principle: parallelism¶
The profiling data reveals that the six channels span four orders of magnitude in cost. A one-size-fits-all shot count is therefore wrong. The correct approach is:
- For the cheap channels (LO, V, VV, R, RV): use more shots per seed and fewer seeds.
- For RR: use fewer shots per seed so each job fits within the cluster's wall-time ceiling, and compensate with many seeds.
The scaling is:
T_production ≈ T_profile × (10/6) × (shots / 500K)
where the factor 10/6 accounts for moving from warmup=3, production=3
(profiling) to warmup=5, production=5 (production).
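Applying this to the RR row of the profiling table (1-15:03:25 = 140605 seconds at 500K shots) reproduces the ~25 h per-job estimate used below for 200K-shot production jobs:

```shell
# Worked example of T_prod ≈ T_profile × (10/6) × (shots / 500K),
# using integer arithmetic in seconds
T_PROFILE_S=140605   # RR profiling time, 1-15:03:25 converted to seconds
SHOTS_K=200          # production shots per job, in units of 1K shots
T_PROD_S=$(( T_PROFILE_S * 10 / 6 * SHOTS_K / 500 ))
echo "Estimated RR production wall time: $(( T_PROD_S / 3600 )) h"
```

This prints an estimate of 26 h, in line with the ~25 h figure and comfortably inside the 30 h limit requested for RR.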
| Channel | Profile time (500K shots) | Shots/job | Est. wall time/job | Seeds | Est. total CPU |
|---|---|---|---|---|---|
| LO | 00:00:11 | 1M | ~30 sec | 5 | ~5 CPU-min |
| V | 00:00:11 | 1M | ~30 sec | 5 | ~5 CPU-min |
| VV | 00:03:14 | 1M | ~10 min | 10 | ~2 CPU-h |
| R | 00:08:02 | 1M | ~27 min | 10 | ~5 CPU-h |
| RV | 00:55:33 | 1M | ~3.1 h | 20 | ~62 CPU-h |
| RR | 1-15:03:25 | 200K | ~25 h | 80 | ~2000 CPU-h |
RR will dominate your campaign budget by well over an order of magnitude. This
is expected: the five-parton double-real phase space is the most complex
integrand in the calculation.
Production run cards¶
runcards/production/Zqq_3j_LO.card:
process = 1
njets = 3
channel = LO
y0 = 1d-8
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 5
production = 5
shots = 1M
runcards/production/Zqq_3j_V.card:
process = 1
njets = 3
channel = V
y0 = 1d-8
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 5
production = 5
shots = 1M
runcards/production/Zqq_3j_R.card:
process = 1
njets = 3
channel = R
y0 = 1d-8
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 5
production = 5
shots = 1M
runcards/production/Zqq_3j_VV.card:
process = 1
njets = 3
channel = VV
y0 = 1d-8
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 5
production = 5
shots = 1M
runcards/production/Zqq_3j_RV.card:
process = 1
njets = 3
channel = RV
y0 = 1d-8
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 5
production = 5
shots = 1M
runcards/production/Zqq_3j_RR.card:
process = 1
njets = 3
channel = RR
y0 = 1d-8
cut = 1d-5
sigma_obs = 0
moment = 1
warmup = 5
production = 5
shots = 200K ! Deliberately small: ~25h per job.
! Use 80+ seeds to accumulate statistics.
Step 8 — Submit the Full Campaign as Job Arrays¶
The job script is the same for all channels; the per-channel differences (shots, wall time, number of seeds) are encoded entirely in the submission script.
jobs/run_channel.sh is unchanged from before. The critical changes are
in submit_campaign.sh:
#!/bin/bash
# =============================================================
# Submit the full NNLO campaign — calibrated to actual timings.
# Run this on the LOGIN NODE: bash submit_campaign.sh
#
# Profiling results (500K shots, 3+3 iterations):
# LO 00:00:11 V 00:00:11 VV 00:03:14
# R 00:08:02 RV 00:55:33 RR 1-15:03:25
# =============================================================
EERAD3_DIR=/home/your_username/eerad3
cd ${EERAD3_DIR}
mkdir -p logs
# ── Shots per job ────────────────────────────────────────────
# RR uses 200K (not 1M) to stay under the 48h wall-time limit.
# All other channels use 1M comfortably.
declare -A SHOTS
SHOTS[LO]="1M"
SHOTS[V]="1M"
SHOTS[R]="1M"
SHOTS[VV]="1M"
SHOTS[RV]="1M"
SHOTS[RR]="200K" # ~25h/job at 200K; do NOT increase without re-profiling
# ── Number of seeds ──────────────────────────────────────────
# RR needs many seeds to compensate for low per-job shot count.
# Total RR stats: 80 seeds × 200K shots × 10 iterations = 160M points.
declare -A NSEEDS
NSEEDS[LO]=5
NSEEDS[V]=5
NSEEDS[VV]=10
NSEEDS[R]=10
NSEEDS[RV]=20
NSEEDS[RR]=80
# ── Wall-time limits ─────────────────────────────────────────
# Based on: T ≈ T_profile × (10/6) × (shots / 500K) + 20% margin.
#
# LO/V: 30 sec → round up to 00:10:00 (minimum sensible request)
# VV: ~10 min → 00:30:00
# R: ~27 min → 01:00:00
# RV: ~3.1 h → 04:00:00
# RR: ~25 h (200K shots) → 30:00:00
declare -A TIMELIM
TIMELIM[LO]="00:10:00"
TIMELIM[V]="00:10:00"
TIMELIM[VV]="00:30:00"
TIMELIM[R]="01:00:00"
TIMELIM[RV]="04:00:00"
TIMELIM[RR]="30:00:00" # 25h estimated + 20% safety margin
# ── Memory ───────────────────────────────────────────────────
# MaxRSS was not reported by sacct for this profile run.
# Conservative safe defaults based on typical EERAD3 usage.
declare -A MEMORY
MEMORY[LO]="1G"
MEMORY[V]="1G"
MEMORY[VV]="2G"
MEMORY[R]="2G"
MEMORY[RV]="2G"
MEMORY[RR]="2G"
echo "═══════════════════════════════════════════════════════"
echo " EERAD3 NNLO campaign: Z->qqbar 3-jet"
echo " $(date)"
echo "═══════════════════════════════════════════════════════"
printf " %-6s %6s %5s %10s %s\n" "Chan" "Seeds" "Shots" "Walltime" "Est. total CPU"
# Rough CPU estimate: seeds × wall_time (in hours)
declare -A HOURS_EST
HOURS_EST[LO]="0.01"
HOURS_EST[V]="0.01"
HOURS_EST[VV]="0.16"
HOURS_EST[R]="0.45"
HOURS_EST[RV]="3.1"
HOURS_EST[RR]="25"
for CHAN in LO V VV R RV RR; do
N=${NSEEDS[$CHAN]}
H=${HOURS_EST[$CHAN]}
TOTAL=$(echo "$N * $H" | bc)
printf " %-6s %6s %5s %10s ~%.0f CPU-h\n" \
"${CHAN}" "${N}" "${SHOTS[$CHAN]}" "${TIMELIM[$CHAN]}" "${TOTAL}"
done
echo "───────────────────────────────────────────────────────"
TOTAL_RR=$(echo "${NSEEDS[RR]} * ${HOURS_EST[RR]}" | bc)
echo " RR alone: ~${TOTAL_RR} CPU-h ($(echo "$TOTAL_RR / 24" | bc) node-days)"
echo "═══════════════════════════════════════════════════════"
read -p "Proceed with submission? [y/N] " CONFIRM
[[ "${CONFIRM}" =~ ^[Yy]$ ]] || { echo "Aborted."; exit 0; }
echo ""
for CHAN in LO V VV R RV RR; do
N=${NSEEDS[$CHAN]}
LAST=$(( N - 1 ))
JID=$(sbatch --parsable \
--job-name="eerad3_${CHAN}" \
--array="0-${LAST}%10" \
--time="${TIMELIM[$CHAN]}" \
--mem="${MEMORY[$CHAN]}" \
--export=CHANNEL=${CHAN} \
jobs/run_channel.sh)
echo " [${CHAN}] submitted: job array ${JID} (seeds 0–${LAST})"
done
echo ""
echo "Campaign submitted. Monitor with:"
echo " squeue --user=$(whoami) --format='%.10i %.20j %.8T %.10M %R'"
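The submission script above drives jobs/run_channel.sh, which this guide assumes already exists. For completeness, here is a minimal sketch of what it might contain — the run-card naming and the -i/-s flags follow Steps 4 and 7, while the module and path choices simply mirror the other job scripts and should be adapted to your setup:

```shell
#!/bin/bash
# jobs/run_channel.sh — one array task = one seed of one channel (sketch).
# CHANNEL arrives via --export=CHANNEL=...; the seed is the array index.
# Wall time, memory, and array range are set at submission time.
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1          # EERAD3 is single-threaded per run
#SBATCH --output=logs/%x_%A_%a.out
#SBATCH --error=logs/%x_%A_%a.err

module purge
module load gcc/13

EERAD3_DIR=~/eerad3
cd ${EERAD3_DIR}

SEED=${SLURM_ARRAY_TASK_ID}
CARD=runcards/production/Zqq_3j_${CHANNEL}.card

echo "Channel ${CHANNEL}, seed ${SEED}, card ${CARD}, host $(hostname)"
./bin/eerad3 -i ${CARD} -s ${SEED}
```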
The RR channel took 1 day 15 hours for just 500K shots at profiling
settings (6 iterations). At production settings (10 iterations vs 6),
500K shots would take ~65 hours, well beyond a typical 48h wall-time
ceiling, with no margin for slowdowns, node variability, or queue
delays. The 200K/job setting was chosen to give a comfortable ~25h
estimate with a 30h limit. If you wish to run 500K shots per RR job,
re-profile with warmup=5, production=5, shots=500K first and verify
the actual elapsed time on your partition.
Practical note on total campaign cost¶
With this configuration, the RR channel alone accounts for
approximately 2000 CPU-hours (80 seeds × 25 hours each). The remaining
five channels together total roughly 70 CPU-hours. Plan accordingly:
if your group has a CPU-hour quota, check your remaining allocation with
the support team before launching the full RR array.
# Check your current allocation usage (syntax varies by site)
sacct \
--user=your_username \
--starttime=$(date -d '30 days ago' +%Y-%m-%d) \
--format=JobID,CPUTimeRAW \
| awk 'NR>2 && $1~/^[0-9]/ {sum+=$2} END {printf "CPU-hours used (30d): %.0f\n", sum/3600}'
Monitoring the RR array specifically¶
Because RR jobs are long-running, it is worth checking on them
periodically:
# How many RR tasks are done / running / pending?
squeue --user=your_username \
--name=eerad3_RR \
--format="%.8T" \
| sort | uniq -c
# Estimated completion: running tasks × remaining fraction of wall time
# (use squeue -o "%i %L" to see remaining time per job)
squeue --user=your_username \
--name=eerad3_RR \
--format="%.10i %.8T %.10L"
Step 9 — Post-Processing with eerad3hist¶
Once all jobs have completed, you have a results/ directory containing
many histogram files — one per observable per seed per channel. The
eerad3hist tool processes these in three stages.
Check that all expected files are present¶
Before post-processing, verify nothing is missing:
cd /home/your_username/eerad3
# Count files per channel
for CHAN in LO V R VV RV RR; do
COUNT=$(ls results/Zqq.3j.*.${CHAN}.0.T.dat 2>/dev/null | wc -l)
echo "${CHAN}: ${COUNT} thrust histogram files"
done
If any channel has fewer files than expected, some jobs failed. Check their
.err logs and resubmit the missing seeds individually:
# Resubmit a single missing seed (e.g. seed 7 of channel RV)
sbatch \
--job-name="eerad3_RV_resub" \
--array="7-7" \
--time="12:00:00" \
--mem="4G" \
--export=CHANNEL=RV \
jobs/run_channel.sh
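To find exactly which seed indices are missing rather than eyeballing counts, loop over the expected range. This assumes the seed field in the filename is the 4-digit zero-padded array index, as in the examples above:

```shell
# Print missing seed indices for one channel
CHAN=RV
NSEEDS=20
for ((s = 0; s < NSEEDS; s++)); do
    SEED=$(printf "%04d" ${s})
    if [[ ! -f results/Zqq.3j.${SEED}.${CHAN}.0.T.dat ]]; then
        echo "missing seed: ${s}"
    fi
done
```

The printed indices can be joined with commas and passed directly to `sbatch --array=`.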
Stage 1 — Merge histograms from independent seeds¶
The merge command combines all seeds for a given channel into a single
set of histograms, accumulating uncertainties as bin-wise weighted means.
# Merge all seeds for every channel in one pass.
# eerad3hist groups input files by channel and observable, so a single
# invocation produces one merged file per channel.
# The -t flag sets the tag that replaces the seed field in the output filename.
./bin/eerad3hist merge \
    -o results/merged/ \
    -t merged \
    results/
After merging, inspect the merged directory:
ls results/merged/ | grep "\.T\.dat"
# Zqq.3j.merged.LO.0.T.dat
# Zqq.3j.merged.V.0.T.dat
# Zqq.3j.merged.R.0.T.dat
# Zqq.3j.merged.VV.0.T.dat
# Zqq.3j.merged.RV.0.T.dat
# Zqq.3j.merged.RR.0.T.dat
Each seed's histogram is merged using bin-wise inverse-variance weighting. This means seeds with better (lower-error) estimates contribute more to the merged result. Seeds that ran fewer points or had higher integrand variance automatically contribute less. You do not need to worry about unequal seed statistics.
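As a concrete illustration of the weighting (with made-up bin values; EERAD3's files carry real weights and errors), the bin-wise merge of two seed estimates x ± e works like this:

```shell
# Inverse-variance weighted mean of two bin estimates:
#   w_i = 1/e_i²,  x_merged = Σ w_i·x_i / Σ w_i,  e_merged = 1/sqrt(Σ w_i)
awk 'BEGIN {
    x1 = 2.0; e1 = 0.1        # precise seed
    x2 = 2.4; e2 = 0.3        # noisy seed — gets 1/9 the weight
    w1 = 1/(e1*e1); w2 = 1/(e2*e2)
    xm = (w1*x1 + w2*x2) / (w1 + w2)
    em = 1/sqrt(w1 + w2)
    printf "merged: %.3f +- %.3f\n", xm, em
}'
# → merged: 2.040 +- 0.095
```

Note how the merged value sits close to the precise seed, and the merged error is smaller than either input error.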
Stage 2 — Combine channels into perturbative coefficients¶
The combine command assembles the individual channels into the full LO,
NLO, and NNLO perturbative coefficients A, B, C (as in Eq. 6 of the paper):
- LO → coefficient A
- V + R → coefficient B (the NLO correction)
- VV + RV + RR → coefficient C (the NNLO correction)
The combine step also auto-generates a template makedist.input file
in results/combined/. This is your starting point for Stage 3.
Stage 3 — Make physical distributions with makedist¶
This is where the physics happens: makedist reads the perturbative
coefficients, applies the K-factor normalisation, runs the strong coupling
at scale μ, and assembles the full distribution with scale-variation bands.
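For orientation, the leading (one-loop) behaviour of the coupling that makedist runs can be sketched in a few lines. This is an illustration only — eerad3hist performs the running internally at the appropriate order — using nf = 5 flavours and the αs(MZ) = 0.118 input that appears in makedist.input below:

```shell
# One-loop running of the strong coupling:
#   alpha_s(mu) = alpha_s(MZ) / (1 + alpha_s(MZ) · b0 · ln(mu²/MZ²)),
#   with b0 = (33 − 2·nf) / (12·pi)
awk 'BEGIN {
    pi = 3.141592653589793
    nf = 5; asMZ = 0.118; MZ = 91.1876
    mu = 2*MZ                    # the upper scale-variation point
    b0 = (33 - 2*nf) / (12*pi)
    as = asMZ / (1 + asMZ * b0 * log((mu*mu)/(MZ*MZ)))
    printf "alpha_s(2*MZ) = %.4f\n", as
}'
# → alpha_s(2*MZ) = 0.1073
```

The decrease from 0.118 to ~0.107 at μ = 2MZ is what drives the scale-variation bands in the final distributions.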
First, inspect the auto-generated makedist.input and edit it to match the physical setup, with the centre-of-mass energy at the Z pole and Standard Model parameters in the Gμ scheme:
! makedist.input — Z->qqbar at sqrt(s) = MZ
process = 1
sqrts = 91.2
njets = 3
! SM parameters (Gmu scheme)
GF = 1.16638e-5
aS[MZ] = 0.118
MASS[W] = 80.379
MASS[Z] = 91.1876
! ── Histograms ──────────────────────────────────────────────
! Format: <output_name> <LO_file> [<NLO_file>] [<NNLO_file>]
HISTOGRAMS
Thrust Zqq.3j.merged.LO.0.T.dat Zqq.3j.merged.NLO.0.T.dat Zqq.3j.merged.NNLO.0.T.dat
LogThrust Zqq.3j.merged.LO.0.LogT.dat Zqq.3j.merged.NLO.0.LogT.dat Zqq.3j.merged.NNLO.0.LogT.dat
C_param Zqq.3j.merged.LO.0.C.dat Zqq.3j.merged.NLO.0.C.dat Zqq.3j.merged.NNLO.0.C.dat
LogC_param Zqq.3j.merged.LO.0.LogC.dat Zqq.3j.merged.NLO.0.LogC.dat Zqq.3j.merged.NNLO.0.LogC.dat
END HISTOGRAMS
Run makedist:
./bin/eerad3hist makedist \
-o results/hist/ \
results/combined/makedist.input
echo "Physical distributions:"
ls results/hist/
# Thrust.LO.dat
# Thrust.NLO.dat
# Thrust.NNLO.dat
# LogThrust.LO.dat
# LogThrust.NLO.dat
# LogThrust.NNLO.dat
# C_param.LO.dat
# C_param.NLO.dat
# C_param.NNLO.dat
# ...
Reading the final output files¶
Each .dat file in results/hist/ contains a physical distribution in the
format:
- sig — the cross section (divided by bin width): this is your central value
- mcerror — statistical Monte Carlo uncertainty
- vardown, varup — scale-variation envelope (μ varied by a factor of 2)
A quick sanity-check plot with Python (run in an salloc session):
# quick_plot.py — run in an salloc session, not on the login node
import numpy as np
import matplotlib.pyplot as plt

def load_dist(fname):
    data = np.loadtxt(fname)
    xlow, xhigh = data[:, 0], data[:, 1]
    xmid = 0.5 * (xlow + xhigh)
    sig, mcerr, vardn, varup = data[:, 2], data[:, 3], data[:, 4], data[:, 5]
    return xmid, sig, mcerr, vardn, varup

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
for ax, obs, label in zip(
        axes,
        ['Thrust', 'C_param'],
        [r'$\tau = 1-T$', r'$C$-parameter']):
    for order, color, ls in [('LO', 'grey', '--'),
                             ('NLO', 'steelblue', '-.'),
                             ('NNLO', 'firebrick', '-')]:
        x, sig, mcerr, vardn, varup = load_dist(f'results/hist/{obs}.{order}.dat')
        mask = sig > 0
        ax.fill_between(x[mask], vardn[mask], varup[mask],
                        alpha=0.2, color=color, label=f'{order} scale var.')
        ax.plot(x[mask], sig[mask], color=color, ls=ls, label=order)
    ax.set_xlabel(label, fontsize=13)
    ax.set_ylabel(r'$\frac{1}{\sigma_0}\frac{d\sigma}{dO}$', fontsize=13)  # O = observable
    ax.set_yscale('log')
    ax.legend(fontsize=10)
    ax.set_title(f'{label} — Z→qqbar at NNLO, √s = 91.2 GeV')
plt.tight_layout()
plt.savefig('thrust_Cparam_NNLO.pdf', bbox_inches='tight')
print("Saved: thrust_Cparam_NNLO.pdf")
To obtain YODA-format output (for use with Rivet plotting scripts):
./bin/eerad3hist makedist \
-f yoda \
-o results/hist_yoda/ \
results/combined/makedist.input
ls results/hist_yoda/
# LO.yoda
# NLO.yoda
# NNLO.yoda
Step 10 — Scale Up¶
Once the workflow is validated end-to-end, scaling up is straightforward:
increase shots and NSEEDS in the submission script. The table below
gives practical guidance based on typical outcomes:
| Goal | Shots | Seeds per channel | Expected relative error on σ |
|---|---|---|---|
| Qualitative shape check | 100K | 1–3 | ~5–10% |
| First quantitative result | 1M | 5–10 | ~1–2% |
| Publication quality | 2M–5M | 20–50 | <0.5% |
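These seed counts follow directly from Monte Carlo error scaling: the statistical error falls like 1/√N, so halving the error requires quadrupling the statistics. A quick calculator (the numbers are illustrative, not from a real run):

```shell
# Seeds needed to reach a target relative error, given a measured one:
#   N_target = N_now × (err_now / err_target)²
awk 'BEGIN {
    err_now = 1.0       # % relative error achieved with...
    seeds_now = 10      # ...this many seeds at fixed shots/seed
    err_target = 0.5    # desired % relative error
    printf "Seeds needed: %d\n", seeds_now * (err_now/err_target)^2
}'
# → Seeds needed: 40
```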
Useful monitoring commands during a campaign¶
# How many tasks are done vs still running?
squeue --user=your_username \
--format="%.20j %.8T" \
| awk 'NR>1 {counts[$2]++} END {for(s in counts) print s, counts[s]}'
# Which jobs completed successfully vs failed?
sacct \
--user=your_username \
--starttime=$(date -d '2 days ago' +%Y-%m-%d) \
--format=JobID,JobName%25,State,ExitCode,Elapsed,MaxRSS \
| grep eerad3
# Total CPU-hours consumed by the campaign so far
sacct \
--user=your_username \
--starttime=$(date -d '2 days ago' +%Y-%m-%d) \
--format=CPUTimeRAW \
| awk 'NR>2 && $1~/^[0-9]/ {sum += $1} END {printf "Total CPU-hours: %.1f\n", sum/3600}'
Handling partial failures¶
If some array tasks fail (non-zero exit code), identify and resubmit them:
# Find failed tasks for a specific array job
ARRAY_JOB_ID=491055
sacct --jobs=${ARRAY_JOB_ID} \
--format=JobID,State,ExitCode \
| awk '$2 != "COMPLETED" && NR > 2 {print $1, $2, $3}'
# The seed is the array task index — resubmit the specific failing seeds
# e.g. tasks 3, 11, 17 failed for channel RV:
sbatch \
--job-name="eerad3_RV_resub" \
--array="3,11,17" \
--time="12:00:00" \
--mem="4G" \
--export=CHANNEL=RV \
jobs/run_channel.sh
After resubmission completes, re-run the merge/combine/makedist pipeline.
eerad3hist merge will automatically incorporate the new seeds.
Directory Layout Reference¶
After following this guide, your eerad3/ directory should look like:
eerad3/
├── bin/
│ ├── eerad3 ← main executable
│ └── eerad3hist ← post-processing tool
├── examples/ ← upstream example run cards (read-only reference)
├── runcards/
│ ├── Zqq_3j_LO_test.card
│ └── production/
│ ├── Zqq_3j_LO.card
│ ├── Zqq_3j_V.card
│ ├── Zqq_3j_R.card
│ ├── Zqq_3j_VV.card
│ ├── Zqq_3j_RV.card
│ └── Zqq_3j_RR.card
├── jobs/
│ ├── build.sh
│ ├── test_LO.sh
│ ├── profile_all_channels.sh
│ └── run_channel.sh
├── logs/ ← all Slurm stdout/stderr files
├── results/
│ ├── Zqq.3j.0000.LO.0.T.dat ← raw per-seed histograms
│ ├── Zqq.3j.0001.LO.0.T.dat
│ ├── ...
│ ├── merged/ ← after eerad3hist merge
│ ├── combined/ ← after eerad3hist combine
│ │ └── makedist.input
│ └── hist/ ← after eerad3hist makedist — final distributions
│ ├── Thrust.LO.dat
│ ├── Thrust.NLO.dat
│ ├── Thrust.NNLO.dat
│ ├── C_param.LO.dat
│ └── ...
└── submit_campaign.sh
Troubleshooting¶
eerad3 exits immediately with no output¶
Check that y0 is not set tighter than 1d-8. Values smaller than that
can cause the program to reject all phase-space points immediately.
Also verify that cut is not so large that it excludes the entire event-shape
range; for thrust, cut = 1d-5 is a safe starting value.
The Vegas integral does not converge — large relative errors¶
This is most common for RV and RR. Increase shots significantly
(try 5M to 10M), and if the error is still large, increase warmup
iterations to give Vegas more time to adapt the grid. Check the log for
warnings about integrand-grid mismatches.
eerad3hist merge produces files with zero bins¶
This typically means the seed files were written to a non-default prefix
directory. Check your run card's prefix setting (default: results/) and
ensure merge is pointed at the correct directory.
The combined NNLO histogram has very large errors on some bins¶
This indicates that either the RV or RR channel is under-sampled in
those bins. Look at which observable bins have large errors and compare
the merged/ files for each channel separately. The channel with the
largest bin errors needs more seeds or more shots.
Scale-variation bands cross zero or become negative¶
This is a physical pathology near the Sudakov peak (τ → 0 and C → 0) and in the far-tail region (τ → 0.5). It reflects the breakdown of fixed-order perturbation theory in the infrared-sensitive region — it is not a bug in EERAD3 or in your setup. The resolution is resummation matched to the fixed-order result, which is beyond the scope of EERAD3 alone.