BjetTLA.jl

Internal

The names of input variables used in BDT training, these are a subset of what make_bdt_input_table can produce.

julia

SQRT_S = Ref(13600)

The center of mass energy of the LHC in GeV. For Run 3 it's 13.6 TeV.

source

BjetTLA.CombinedPDF Type

julia

struct CombinedPDF{F, D}

using Makie: Combined func::F dG:😄 end

A callable type that takes a background function and parameter for a gaussian signal.

When it's called, it has the signature of (x, p), where x is the x-axis value and p is a vector of parameters. Note that x should be normalized by SQRT_S (default 13600 GeV) already. As for the p, the first npara elements are for the background function, and the last one is for the signal normalization.

Example

julia

julia> cc = CombinedPDF(ATLAS_f_4p, 192, 192*0.12)
CombinedPDF{typeof(ATLAS_f_4p)}(BjetTLA.ATLAS_f_4p, 192.0, 23.04)

# evaluated at 200 GeV
julia> cc(200 / 13600, rand(5))
0.23111286768842174

source

BjetTLA.HistBundle Type

julia

@kwdef struct HistBundle{N,HT}
    histograms::NTuple{N,HT}
    cutflow_names::NTuple{N,String}
    cutflow_hist::Hist1D{Float64} = Hist1D(; binedges=0:length(cutflow_names))
end
HistBundle(f::Function, cutflow_names)

A mergeable structure that keeps track of histograms and a single cutflow histogram. If you have N histogram variables and M cutflow bins, the structure will carry around N*M + 1 histograms.

If f::Function is provided, it will be called N times to initialize the histograms. Where N is length of cutflow_names.

source

Base.merge Method

julia

merge(hb1::HistBundle, hb2::HistBundle)

source

BjetTLA.ATLAS_f_4p Method

julia

@. ATLAS_f_4p(x, p) = p[1]*(1-x)^p[2] * x^(p[3] + p[4]*log(x))

p_{1} (1 - x)^{p_{2}} x^{p_{3} + p_{4} \log (x)}

source

BjetTLA.ATLAS_f_5p Method

julia

@. ATLAS_f_5p(x, p) = p[1]*(1-x)^p[2] * x^(p[3] + p[4]*log(x) + p[5]*log(x)^2)

p_{1} (1 - x)^{p_{2}} x^{p_{3} + p_{4} \log (x) + p_{5} \log^{2} (x)}

source

BjetTLA.ATLAS_f_6p Method

julia

@. ATLAS_f_6p(x, p) = p[1]*(1-x)^p[2] * x^(p[3] + p[4]*log(x) + p[5]*log(x)^2 + p[6]*log(x)^3)

p_{1} (1 - x)^{p_{2}} x^{p_{3} + p_{4} \log (x) + p_{5} \log^{2} (x) + p_{6} \log^{3} (x)}

source

BjetTLA.EMFrac_study Method

julia

EMFrac_study(md, df; photon_ID=true, electron_only=true, check_charge_sum=true)

A task to be used with loop_one_dir and friends for calculating the EMFraction cut scale factor and uncertainty.

Returns:

julia

(;
    mlly_mll = Hist2D(; binedges = (50:2:150, 15:2:100)),
    EMFrac_grea35 = Hist1D(; binedges = 0:0.02:1.1),
    EMFrac_2535 = Hist1D(; binedges = 0:0.02:1.1),
    EMFrac_2025 = Hist1D(; binedges = 0:0.02:1.1),
    pT_pass = Hist1D(; binedges = emfrac_pt_binedges, overflow=true),
    pT_fail = Hist1D(; binedges = emfrac_pt_binedges, overflow=true),
    EMFrac_pT = Hist2D(; binedges = (0:0.02:1.1, 15:2:150), overflow=false),
    pT = Hist1D(; binedges = (0:2:100), overflow=true),
)

Examples

julia

julia> df_ee_bkgs_path = [
    "/data/jiling/TLA/eos_ntuples/RadiativeZ/v4/700786.root",
    "/data/jiling/TLA/eos_ntuples/RadiativeZ/v4/700787.root",
    "/data/jiling/TLA/eos_ntuples/RadiativeZ/v4/700788.root",
    ];

julia> hist_mc_eejets = mapreduce(+, df_ee_bkgs_path) do path
    BjetTLA.loop_one_file(path, EMFrac_study; tree_name="TreeAlgo_noCalib/nominal").EMFrac_grea35
end;

julia> hists_data = BjetTLA.loop_one_file("/data/jiling/TLA/eos_ntuples/RadiativeZ/data_egam3_noVGammaOR/alldata.99999999.run3_merged.root", EMFrac_study);

julia> sf35 = BjetTLA.hist_cut_eff(hists_data.EMFrac_grea35) / BjetTLA.hist_cut_eff(hists_mc.EMFrac_grea35);
1.043 ± 0.015

source

BjetTLA.ExampleTask Method

julia

ExampleTask(md,evt)

This is an example task, that can be run in loop_one_proc to process the data. It is a simple task that will fill a histogram with the pT of the leading jet in the event, if there are at least 2 good jets and 1 good photon.

source

BjetTLA.F_test Method

julia

F_test(chi2_nom, chi2_alt, n_nom, n_alt, nbins) -> Float64

Calculate the F-test score for two models.

chi2_nom: chi² of the nominal model
chi2_alt: chi² of the alternative model
n_nom: number of parameters of the nominal model
n_alt: number of parameters of the alternative model
nbins: number of bins in the histogram

Examples

julia

julia> BjetTLA.F_test(42, 40, 5, 6, 200)
0.0026311523165153876


::: tip Note

```python
>>> import ROOT
>>> ROOT.Math.fdistribution_cdf_c(0.2, 1, 20)
0.6595265132032726

is equivalents to using Distributions; ccdf(FDist(1, 20), 0.2) in Julia

:::

source

BjetTLA.F_test_on_sr Method

julia

F_test_on_sr(sr_model, nbins) -> Vector{Float64}

Assuming report(sr_model).losses are χ² values of the models, calculate the F-test score for each formula with respect to the next complex one.

The order of the output corresponds to the order of the models in the sr_model, you can use findfirst(>(0.05), output) to find the first formula you should retain.

Note

Due to the nature of Symbolic Regression, instead of counting # of parameters, we use the "complexity" of the formula as proxy. This avoids the problem of "same # of parameters" which causes F-test to fail, for example, consider: x^(b) and x^(b*x), technically they both have just 1 parameter, but clearly the latter is more complex.

See also: F_test.

Examples

julia

julia> sr_model = BjetTLA.fit_sr_model(hist_data_bdt);

julia> BjetTLA.F_test_on_sr(sr_model, nbins(hist_data_bdt))
9-element Vector{Float64}:
1.5007930525231936e-30
 1.994665149876051e-7
 0.006603081588013456
 0.9849216334868175
 0.9968279815401603
 0.9998969027366611
 0.9999999919476441
 0.9999991299279695
 0.9999666474343953

source

BjetTLA.GN1_Eff_Task Method

julia

GN1_Eff_Task(md,evt)

source

BjetTLA.NLL_sr_loss Method

julia

NLL_sr_loss(ex, dataset::Dataset{T,L}, options)

Negative loglikelihood loss function for SRRegressor.

source

BjetTLA.add_ATLAS_internal! Method

julia

add_ATLAS_internal!(ax; pos=(0, 1), revision="Internal", lumi=run3_partial_lumi, offset=(30, -20))

Add the ATLAS label and additional information to a given axis.

Arguments

ax: The Makie axis to which the label will be added.
pos: A tuple specifying the position of the label in relative coordinates. Defaults to (0, 1) (top left).
revision: A string indicating the revision type (e.g., "Internal", "Preliminary"). Defaults to "Internal".
lumi: The integrated luminosity to display in the label. Defaults to run3_partial_lumi.
offset: A tuple specifying the offset of the label in pixels. Defaults to (30, -20).

Example

julia

fig, ax = Figure(), Axis()
add_ATLAS_internal!(ax; pos=(0.1, 0.9), revision="Preliminary", lumi=139e3)

source

BjetTLA.add_mlj_score_col! Method

julia

add_mlj_score_col!(df, mlj_machines; input_names=BDT_input_names, force=true, branch_name=:mlj_scores)

Take a DataFrame df and a vector of MLJ machines, then add a column branch_name to df with the predictions of the MLJ machines.

source

BjetTLA.best_Db_fc Method

julia

best_Db_fc(hist_sig, hist_data)::(;Db, fc)

Find the best Db and fc values for a given signal histogram and data histogram. The histograms should be output of fc_Db_cumulative_hist.

source

BjetTLA.bkg_prediction Method

julia

bkg_prediction(hist_data; fit_func = ATLAS_f_4p, sqrt_s) -> Vector{Float64}

Predict the background using the 4-parameter fit function (see ATLAS_f_4p. The function returns the background prediction normalized to the integral of the input histogram hist_data. The sqrt_s parameter is used to normalize the x-axis of the input histogram. (mjj/sqrt_s)

source

BjetTLA.calc_db Method

julia

calc_db(b::Number, c::Number, u::Number; fc=0.018) -> Float64
calc_db(b::Vector, c::Vector, u::Vector; fc=0.018) -> Vector{Float64}

Compute the b-tagging discriminants, see https://cds.cern.ch/record/2862499 for more detail

The output is: $D_{b} = \log \frac{b}{c f_{c} + (1 - f_{c}) u}$

Note

The log operation usese safeLogRatio to prevent producing Inf and NaN.

source

BjetTLA.chi2 Method

julia

chi2(os, cs, σs) -> Float64

Calculate chi2 given three arrays of data points, model points and errors.

χ^{2} = \sum_{i} \frac{(o_{i} - c_{i})^{2}}{σ_{i}^{2}}

os: observed data points
cs: calculated model points
σs: errors of the observed data points

source

BjetTLA.chi2_sr_functor Method

julia

chi2_sr_functor(errs)::Callable(ex, dataset, options)

Returns a callable to be used as a loss function for SRRegressor.

source

BjetTLA.closest_lly_pair Function

julia

closest_lly_pair(leps, photons, target=91.2) -> (i1, i2, ip)

Find the indicies that give lepton pair + photon three-body invariant mass closest to the target mass.

Return the indices of the lepton pair (i1, i2) and the photon (ip).

source

BjetTLA.df_from_label Method

julia

df_from_label(label::AbstractString; base_path="/data/jiling/TLA/julia_arrows/")
df_from_label(labels::AbstractVector{<:AbstractString}; base_path="/data/jiling/TLA/julia_arrows/")

Load feather files from a label, which is a substring of the file name. See also dump_bdt_arrow, which is used to produce these feather files.

source

BjetTLA.dump_bdt_arrow Method

julia

dump_bdt_arrow(proc_name; ntuple_base_path = "/data/jiling/TLA/eos_ntuples/", output_base_path = "/data/jiling/TLA/julia_arrows/", min_mjj=100)

Read ntuples from ntuple_base_path and write out feather files to output_base_path for each file that matches proc_name. It runs the files through main_task with fill_data=true and then uses make_bdt_input_table to clean up and prepare the table before writing out the arrow file.

source

BjetTLA.dump_stacked_plots Method

julia

dump_stacked_plots(dict_proc; output_dir, mc_procs = MC_proc_labels, lumi=run3_partial_lumi, fig_kw=(; size=(1000, 600))

)

Dump cutflow plots (plot_cutflow) and stacked histogram plots (plot_stack_hist) to target directory.

Examples

julia

hists_serial_path = "/data/jiling/TLA/julia_jlds/2025_05_03.jlserialize"
dict_proc_histbundles = deserialize(hists_serial_path);
BjetTLA.dump_stacked_plots(dict_proc_histbundles; output_dir="/home/jiling/BjetTLA/plots/2025_05_06")

source

BjetTLA.extract_dsid Method

julia

extract_dsid(path::String)::String

Extract the DSID from the path of a file. It works by using a regular expression to match the DSID (\d{6}) in the path.

If no DSID can be found from path, it returns 99999999, which resembles a "data" run.

source

BjetTLA.fc_Db_cumulative_hist Method

julia

fc_Db_cumulative_hist(df; Db_range=-4:0.2:7, fc_range=0:0.01:0.1, mjj_limits=(80, Inf))::Hist2D

Make a 2D histogram where:

x-axis is the min(Db1, Db2) reverse cumulative across cells
y-axis is the fc (charme fraction) setting

Expects df to have the following columns:

GN1_pb1s, GN1_pc1s, GN1_pu1s: leading jet's b-tagging variables
GN1_pb2s, GN1_pc2s, GN1_pu2s: sub-leading jet's b-tagging variables
mjj12s: invariant mass of the two leading jets
wgts: event weights

Note

For example, if a bin centered at (min_Db = 1.5, fc=0.04) has 100 events, then it means there are 100 events with min(Db1, Db2) >= 1.5 when using fc = 0.04.

source

BjetTLA.feather_to_hist Method

julia

feather_to_hist(df, mlj_machines; var=:mjj12s, binedges = 0:5:600, mlj_cut=0.22)::Hist1D

Take a DataFrame df (likely from make_bdt_input_table), and a vector of MLJ machines, then make a histogram of variable var after selecting for events passing the mlj_cut after evaluating models on each event.

Each machine (ith) will be evaluated on events with eventNumber_modN .== i.See also xg_cross_train.

Note

Each machine will be evaluated on features defined in BDT_input_names as: pdf.(predict(MACH, df[!, BjetTLA.BDT_input_names]), true)

source

BjetTLA.feather_to_hist Method

julia

feather_to_hist(df; var=:mjj12s, binedges = 0:5:600, Db_cut=BjetTLA.Db_77)::Hist1D

Take a DataFrame df that has the same schema returned by make_bdt_input_table, then make a histogram of variable var after selecting for df.GN1_Db1s > Db_cut && df.GN1_Db2s > Db_cut.

source

BjetTLA.fit_sr_model Method

julia

fit_sr_model(xs, ys, errs; kw...) -> MLJ.Machine{SRRegressor ...}

source

BjetTLA.fit_sr_model Method

julia

fit_sr_model(hist::Hist1D; kw...) -> MLJ.Machine{SRRegressor ...}

Fit a symbolic regression model to a histogram. Return the fitted machine.

source

BjetTLA.fitted_smooth_hist Method

julia

fitted_smooth_hist(hist, func::T=ATLAS_f_4p, p0=0.1*ones(4); algo=:nm) -> NamedTuple(; sol = ..., hist= ...)

Fit a histogram with a function and return a new histogram with the fitted bin counts.

source

BjetTLA.get_file_list Method

julia

get_file_list(proc_name; ntuple_base_path = "/data/jiling/TLA/eos_ntuples/")

source

BjetTLA.get_fill_wgt Method

julia

get_fill_wgt(evt, pmgWeight, sumWeight;)::Float64

Get the fill weight for the event, if the evt has the mcEventWeight property, the weight is calculated as evt.mcEventWeight * (pmgWeight / sumWeight), otherwise the weight is 1.0 for real data.

source

BjetTLA.get_pmg_weight Method

julia

get_pmg_weight(dsid::Union{Integer, AbstractString};
md_path = "/cvmfs/atlas.cern.ch/repo/sw/database/GroupData/dev/PMGTools/PMGxsecDB_mc23.txt")

Get the PMG weight from cvmfs database for a given dsid: returns crossSection_pb * genFiltEff * kFactor

If the md_path is not a file, it will fall back to joinpath(@__DIR__, "../data/PMGxsecDB_mc23.txt")

Example

julia

julia> signal_dsid = 510392
510392

julia> BjetTLA.get_pmg_weight(signal_dsid)
0.7831930217100002

julia> BjetTLA.get_pmg_weight("510392")
0.7831930217100002

source

BjetTLA.get_shortname Method

julia

get_shortname(dsid, map)

Convert dsid to a short name, following the naming map.

source

BjetTLA.has_variable_in_formula Method

julia

has_variable_in_formula(equation) -> Bool

Check if a formula has a variable in the base and exponent functions.

Examples

julia

julia> has_variable_in_formula(MLJ.report(sr_model).equations[1])

source

BjetTLA.hist1d_ftag Method

julia

hist1d_ftag(df; nth_jet=1, tagger=:GN1, fc=0.08, mjj_limits=(100, Inf), binedges=-5:0.1:15)::Hist1D

Make a 1D histogram of the leading(nth_jet==1) or sub-leading(nth_jet==2) jet's b-tagging variables. See hist2d_ftag for more details.

source

BjetTLA.hist2d_ftag Method

julia

hist2d_ftag(df; tagger=:GN1, fc=0.08, mjj_limits=(100, Inf), binedges=-5:0.1:15)::Hist2D

Make a 2D histogram of the leading and sub-leading jet's b-tagging variables.

If tagger is set to :GN1, then the b-tagging variables are calculated using the BjetTLA.calc_db function. If tagger is set to :fastDIPS, then the b-tagging variables are taken directly from the DataFrame columns: Db1s and Db2s.

Assume a df DataFrame with the following columns:

GN1_pb1s, GN1_pc1s, GN1_pu1s: leading jet's b-tagging variables
GN1_pb2s, GN1_pc2s, GN1_pu2s: sub-leading jet's b-tagging variables
Db1s, Db2s: alternative b-tagging variables (if using :fastDIPS tagger)
mjj12s: invariant mass of the two leading jets
wgts: event weights

source

BjetTLA.hist_cut_eff Function

julia

hist_cut_eff(hist, cut=0.93) -> Measurement{Float64}

Calculate the efficiency of a cut on the histogram hist at the cut value. The numerator is the integral of the histogram to the right of the cut, and the denominator is the integral of the whole histogram.

Examples

julia

julia> h1 = Hist1D(; bincounts=[100.0, 100.0], binedges=0:0.5:1, sumw2=[2.0, 3.0])
edges: [0.0, 0.5, 1.0]
bin counts: [100.0, 100.0]
total count: 200.0

julia> BjetTLA.hist_cut_eff(h1, 0.5)
0.5 ± 0.0056

source

BjetTLA.hist_eff_divide Method

julia

hist_eff_divide(hist1, hist2)::Hist1D

Examples

julia

julia> h1 = Hist1D(; bincounts=[100.0], binedges=0:1, sumw2=[3.0])

julia> h2 = Hist1D(; bincounts=[200.0], binedges=0:1, sumw2=[5.0])

julia> h3 = BjetTLA.hist_eff_divide(h1, h2)
edges: [0.0, 1.0]
bin counts: [0.5]
total count: 0.5

julia> binerrors(h3)
1-element Vector{Float64}:
 0.005590169943749474

source

BjetTLA.hist_to_components Method

julia

hist_to_components(hist0) -> Tuple of Vector{Float64}

returns:

Where xs is bin centers. And ys is the bin counts. Finally, σs is the bin errors.

source

BjetTLA.idx_of_first_good_formula Method

julia

idx_of_first_good_formula(sr_model; by=:chi2_and_complexity, chi2_cutoff=1.05 impro_threshold=0.95) -> Int

Return the index of the first (least complex) good formula in the model. A good formula is defined as a formula that has a variable in the base and exponent functions.

by can be one of [:chi2_and_complexity, :chi2_improvement, :min_complexity_has_variables].

:chi2_and_complexity: Find the simplest formula that is within chi2_cutoff * minimal_chi2 of all formulas.
:chi2_improvement: Start with the simplest formula, if the next formula has chi2 smaller than current_chi2 * impro_threshold, pick it and repeat.
:min_complexity_has_variables: Find the least complex formula that uses variables in the base (uses x) and exponent (uses log(x)) functions.

Examples

julia

julia> typeof(sr_model)
Machine{SRRegressor{DynamicQuantities.SymbolicDimensions{DynamicQuantities.FixedRational{Int32, 25200}}, Nothing}, SRRegressor{DynamicQuantities.SymbolicDimensions{DynamicQuantities.FixedRational{Int32, 25200}}, Nothing}, true}

julia> BjetTLA.idx_of_first_good_formula(sr_model)
3

source

BjetTLA.loop_one_dir Method

julia

loop_one_dir(dir_path, loop_task; kwargs...)

Given a directory and a loop task, loop over all files in this directory and combine the output from loop task.

For an example task, see main_task.

source

BjetTLA.loop_one_proc Method

julia

loop_one_proc(proc, loop_task; ntuple_base_path="/data/jiling/TLA/eos_ntuples/", kwargs...)

Given a process name and a loop task, loop over all files under this process and combine the output.

For an example task, see main_task.

source

BjetTLA.main_task Method

julia

main_task(md, tree; fill_data=false, fc=0.085)::HistBundle

Main task to be run in loop_one_proc and friends. The fc parameter is used for calculating GN1 b-jet discriminant.

If fill_data is set to true, it will fill the data vectors in the HistBundle for further analysis such as deriving fc and WPs as well as BDT training. See also dump_bdt_arrow.

source

BjetTLA.make_bdt_input_table Method

julia

make_bdt_input_table(output::NamedTuple{Vectors...}, proc::Bool)::DataFrame

Take a NamedTuple of Vectors (potentially output of main_task with fill_data=true, and a proc (by convention, true for signal and false for bkg) and return a dataframe with the following columns:

eventNumber
ystars
mjj12s
ph_EMFracs
ph_j2_ΔRs
wgts
Db{1,2,3}s
Db{1,2,3}_{70,77}s
proc

source

BjetTLA.nonoverlap_mask Method

julia

nonoverlap_mask(photons, jets; cutoff=0.4)::Vector{Bool}

Create a mask that can be used to get non-overlapping jets.

Example

julia

mask = nonoverlap_mask(photons, jets)
clean_jets = @view jets[mask]

source

BjetTLA.normalize_to Method

julia

normalize_to(in_hist, ref_lumi)
normalize_to(in_hist, ref_hist::FHist.Hist1D)

Normalize the input histogram to the reference luminosity. If the reference histogram is provided, normalize the input histogram to the integral of the reference histogram.

source

BjetTLA.optim_fit Method

julia

optim_fit(func::Callable, hist0::FHist.Hist1D, p0; algo::Symbol=:nm, sqrt_s=SQRT_S[])

Fit a χ² fit using a func to a 1D histogram.

func: a function that takes two arguments, x and p, and returns a model value.
hist0: a 1D histogram to fit.
p0: initial parameters.

Note

The func should be defined with @., and the p0 should be a vector. See ATLAS_f_4p for an example.

Also, the x-axis (bincenters()) of the hist0 will be divided by sqrt_s to normalize the x-axis.

It returns the result of the optimization.

Example

julia

julia> hist_mjj12_mask
edges: [130.0, 132.0, 134.0, 136.0, 138.0, 140.0, 142.0, 144.0, 146.0, 148.0  …  482.0, 484.0, 486.0, 488.0, 490.0, 492.0, 494.0, 496.0, 498.0, 500.0]
bin counts: [650.0, 603.0, 649.0, 566.0, 595.0, 593.0, 535.0, 487.0, 492.0, 466.0  …  2.0, 10.0, 13.0, 9.0, 13.0, 10.0, 10.0, 8.0, 15.0, 8.0]
total count: 22293.0

julia> sol4 = optim_fit(ATLAS_f_4p, hist_mjj12_mask, ones(4))
retcode: Success
u: 4-element Vector{Float64}:
  0.0001086906699655396
 91.12405771249554l
  1.2345307604384157
 -0.03547447344008578

julia> sol4.u
4-element Vector{Float64}:
  0.0001086906699655396
 91.12405771249554
  1.2345307604384157
 -0.03547447344008578

source

BjetTLA.pass_filter_jets Method

julia

pass_filter_jets(evt, hlt_pfjets, photons; min_pt=25.0, max_eta = 2.1, dR_cutoff=0.4, min_Jvt=-Inf, fc=0.085)

Filter the HLT jets collection and related variables by doing the following:

require pt > min_pt
require abs(eta) < max_eta
remove jets that overlap with the photons collection by dR < dR_cutoff
require jvt > min_Jvt (jvt = jet vertex tagger)

Finally, return the following as a NamedTuple so you can select one or more at call site:

clean_jets: the cleaned jets collection
clean_jets_btags: the cleaned jets b-tagging values
clean_jets_emfracs: the cleaned jets EM fractions
clean_jets_jvts: the cleaned jets JVT values
fc: $f_{c}$ in the GN1 Db calculation, see calc_db.

Note

all the returnred arrays are sorted by the jet pt in descending order.

source

BjetTLA.pass_filter_photons Method

julia

clean_photons(hlt_photons); eta_cuts = true, min_pt = 20.0) -> subvector of photon lorentz vectors

Clean photons from the HLT photon collection. The cleaning is done by removing photons that satisfy the any of the following conditions:

pt < min_pt
abs(eta) > 2.37
1.37 < abs(eta) < 1.52

source

BjetTLA.plot_Db_fc_scan Method

julia

plot_Db_fc_scan(hist_sig, hist_data; mjj_limits::Tuple, sig_title = "Z' (mass = 150GeV)")

Given two input histograms (produced by fc_Db_cumulative_hist), this function plots the significance of the signal hist_sig over the background hist_data in a 2x2 grid of heatmaps.

It will also mark the optimal Db and f_c values on the significance 2D plane, the optimal points are calculated by best_Db_fc.

source

BjetTLA.plot_compare_Dbs_1D Method

julia

plot_compare_Dbs_1D(df_data, df_150, df_200; nth_jet=1, GN1_fc=0.0085, binedges=-5:0.5:15, wps=nothing)

source

BjetTLA.plot_cutflow Method

julia

plot_cutflow(cutflow::Hist1D; cutnames=_cut_flow_names(), label="data 2023 nv8")

Plot the cutflow histogram with the given label as the title.

source

BjetTLA.plot_fit Method

julia

plot_fit(hist_data; fit_func = ATLAS_f_4p, sqrt_s)

Plot the input histogram hist_data and the background prediction using the 4-parameter fit function (see bkg_prediction).

source

BjetTLA.plot_smooth_bkg_fit Method

Examples

julia

hist_data_bdt_smooth_3p = BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_3p_xnorm, p0_4para; lb=nothing, ub=nothing, multistart, likelihood, algo)
hist_data_bdt_smooth_4p = BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_4p_xnorm, p0_4para; lb=nothing, ub=nothing, multistart, likelihood, algo)
hist_data_bdt_smooth_5p = BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_5p_xnorm, p0_5para; lb=nothing, ub=nothing, multistart, likelihood, algo)
hist_data_bdt_smooth_6p = @time BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_6p_xnorm, p0_6para; lb=nothing, ub=nothing, multistart, likelihood, algo);

plot_smooth_bkg_fit(hist_data_bdt, hist_data_bdt_smooth_3p, hist_data_bdt_smooth_4p, hist_data_bdt_smooth_5p, hist_data_bdt_smooth_6p;
    scale_factor=1, cut_name="BDT > $BDT_CUT")

source

BjetTLA.plot_stack_hist Function

julia

plot_stack_hist(dict_histbundle::Dict{<:Any,<:HistBundle}, varname::Symbol, cutflow_idx=lastindex(_cut_flow_names()); mc_procs=MC_proc_labels, lumi, rebin, fig_kw=(;), axis_kw=(;), atlas_label_pos=(0, 1), offset=(30, -20),
frag_direct=(0.7316265625500193, 3.1254759924281035)

)

Generate a stacked histogram plot comparing Monte Carlo (MC) histograms, data histograms, and optional signal histograms.

dict_histbundle: A dictionary where keys are MC process names and values are their corresponding HistBundle objects.
varname: A symbol representing the variable to be plotted.
cutflow_idx: An integer specifying the index of the cutflow step to use. Defaults to the last index of _cut_flow_names().
mc_procs: A list of Monte Carlo process labels to include in the plot. Defaults to MC_proc_labels.
lumi: The integrated luminosity to scale the MC histograms.
rebin: An integer specifying the rebinning factor for the histograms. Defaults to 1.
fig_kw: A named tuple of keyword arguments for configuring the figure. Defaults to an empty tuple.
axis_kw: A named tuple of keyword arguments for configuring the axis. Defaults to an empty tuple.
atlas_label_pos: A tuple specifying the position of the ATLAS label. See add_ATLAS_internal!.
offset: A tuple specifying the offset for the ATLAS label. Defaults to (30, -20).
frag_direct: A tuple of scale factors for the fragmentation and direct components of the gammajet process. Defaults to (0.7316265625500193, 3.1254759924281035).

Example

julia

hists_serial_path = "/data/jiling/TLA/julia_jlds/2025_05_03.jlserialize"

dict_proc_histbundles = deserialize(hists_serial_path);

plot_stack_hist(dict_proc_histbundles, :jet1_eta, 3; rebin=1; lumi=run3_partial_lumi, fig_kw=(; size=(1000, 600)),
    axis_kw=(; xlabel="ph1 eta", yscale=Makie.pseudolog10, yticks=10 .^ (0:7), limits=(-3, 3, 1, 10^7), ),
    frag_direct = (0.7316265625500193, 3.1254759924281035),
)

source

BjetTLA.plot_staircase_scan Method

julia

plot_staircase_scan(hist_sig, hist_data; colorscale=identity, rebin=1, wps=[1.0, 2.8, 5.8])

This function generates a staircase plot comparing signal and data histograms. It overlays heatmaps of the histograms and highlights working points (WPs) with vertical and horizontal lines.

Arguments

hist_sig: The signal histogram to be plotted.
hist_data: The data histogram to be plotted.
colorscale: The color scale to be applied to the heatmaps.
rebin: The rebinning factor for the histograms. Defaults to 1.
wps: A list of working points to be highlighted on the plot. Defaults to [1.0, 2.8, 5.8].

source

BjetTLA.poisson_fluctuate Method

julia

poisson_fluctuate(hist::Hist1D) -> Hist1D

Fluctuate the bin counts of a histogram with Poisson distribution. Returns a new histogram with fluctuated bin counts, and the sum of weights squared is preserved.

Example

julia

julia> poisson_fluctuate(Hist1D(; binedges=0:3, bincounts=[10, 20, 30]))
edges: [0.0, 1.0, 2.0, 3.0]
bin counts: [9.0, 18.0, 28.0]
total count: 55.0

source

BjetTLA.proc_dirs Method

julia

proc_dirs(proc; ntuple_base_path="/data/jiling/TLA/eos_ntuples/")::Vector{String}

Given a process name, return a list of directories containing the ntuple files.

source

BjetTLA.pyhf_simple_significance Method

julia

pyhf_simple_significance(sig_hist, data_hist; sqrt_s, fit_func=nothing)

Calculate the upper limit on the signal strength using the pyhf package. The function returns the upper limit on the signal strength at 95% confidence level.

Specifically, the function takes the signal histogram sig_hist and the data histogram data_hist. The sqrt_s parameter is used to normalize the x-axis of the input histograms. (mjj/sqrt_s), and the fit_funcparameter is used to predict the background. If thefit_func is not provided, the function uses the data histogram as the background prediction – a reverse Asimov fit.

This function first builds a uncorrelated_background() model using pyhf and then uses the upper_limit() function to calculate the upper limit on the signal strength.

Example

julia

julia> using BjetTLA, FHist

julia> h1 = Hist1D(;binedges=0:2, bincounts=[12.0, 11.0]);

julia> h2 = Hist1D(;binedges=0:2, bincounts=[50.0, 52.0], sumw2=[9, 49]);

julia> pyhf_simple_significance(h1, h2; sqrt_s=136000)
Python:
(array(1.22313872), [array(0.65879787), array(0.8805031), array(1.22313872), array(1.7063601),…
       1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
       2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8,
       3.9, 4. ]), [(array(1.), [array(1.), array(1.), array(1.), array(1.), array(1.)]), (arr…

Note

The normalization of the input histograms need to be correct (i.e. the bins should predict actual number of events expected).

source

BjetTLA.safeLogRatio Method

julia

safeLogRatio(num,den)

Safe log computation, to emulate athena implementation:

If iszero(den), return 15.0
If iszero(num), return -10.0
otherwise, return log(num/den) as expected

source

BjetTLA.scan_fluctuate Method

julia

scan_fluctuate(hist_smooth, SplusB::CombinedPDF; N=1, npara=4) -> Vector{Tuple{Float64, Float64}}

Throw N Poisson-fluctuated histograms (poisson_fluctuate) from hist_smooth and fit them with SplusB. Return a vector of tuples of the fitted signal strength and the χ²/ndf.

As noted in CombinedPDF, the signal strength is expected to be the last parameter.

source

BjetTLA.skip_JZ_event Function

julia

pass_JZ_cleaning(j_pt, jz, eps=0.2)::Bool

Apply cleaning filter for JZ slices, checks if the average pT of the two leading jets is in the expected range for slice jz, within a eps tolerance.

source

BjetTLA.split_by_dsid Method

julia

split_by_dsid(strings::Vector{String})

Splits a vector of strings into a dictionary based on the dsid, a sequence of six digits preceeded and followed by ".". If that criteria fails, looks for a run number instead (sequence of eight digits).

source

BjetTLA.staircase_inbound Method

julia

staircase_inbound(x,y,i,j,k)::Bool

Assume x,y are bin centers, and i,j,k are on binedges.

source

BjetTLA.staircase_scan Method

julia

staircase_scan(hist2d, hist2d_bkg; max=8.0)

Find the best staircase cut in a 2D histogram.

hist2d: Signal 2D histogram
hist2d_bkg: background histogram
max: maximum value for the D_b cut

source

BjetTLA.staircase_sig Method

julia

staircase_sig(hist2d, hist2d_bkg, centers, i, j, k)::Float64

Calculate the significance of a 2D histogram using the staircase method. i < j < k

source

BjetTLA.xg_cross_train Method

julia

xg_cross_train(df_all; num_round=90, max_depth=7, η=0.8, scale_pos_weight=200, input_names=BDT_input_names, gamma=0, alpha=0, lambda=1, Nfolds=5, lumi=30e3)

Take a DataFrame (potentially loaded from the output of dump_bdt_arrow) and return Nfolds The dataset is partitioned into Nfolds using mod1.(df_all.eventNumber, Nfolds).

Examples

julia

julia> Random.seed!(654321)
julia> df_all = shuffle(DataFrame(Arrow.Table("/data/jiling/TLA/julia_arrows/2024_11_04_data2023.feather")));
julia> df_all.proc = categorical(df_all.proc);

julia> bdt_machines = xg_cross_train(df_all);

source

BjetTLA.@cutflow Macro

julia

@cutflow counter hist name_list name wgt

Convinient macro to generate cutflow histogram filling code. It will increment the counter by 1, check if the name_list[counter] is equal to expected name, and then push the counter to the hist with the given wgt.

BjetTLA.@cutflow counter hist name_list name wgt

is equivalent to:

counter += 1
@assert name_list[counter] == name
atomic_push!(hist, counter, wgt)

source

BjetTLA.@fourmomenta Macro

julia

@fourmomenta evt obj_prefix

Creates the fourmomenta for a given object type in the event. Returns a StructArray of LorentzVectorCyl.

It assumes that the event is a structure containig all the projections of the 4-vectors: evt.obj_pt, evt.obj_eta, evt.obj_phi, evt.obj_m (or evt.obj_E for jets).

Concretely,

julia

@fourmomenta evt ph

would expand to:

julia

StructArray(LorentzVectorCyl.(
    evt.ph_pt,
    evt.ph_eta,
    evt.ph_phi,
    evt.ph_m
))

For jets, it will call fromPtEtaPhiE and the return type stays the same.

Example

julia

julia> for evt in tree
            hlt_jets = @fourmomenta evt jet
            count(>(25), hlt_jets.pt) # count how many jets have pt > 25 GeV
            ...
        end

source

Internal ​

Internal