Skip to content

Internal

BjetTLA.BDT_input_names Constant

The names of input variables used in BDT training, these are a subset of what make_bdt_input_table can produce.

source
BjetTLA.SQRT_S Constant
julia
SQRT_S = Ref(13600)

The center of mass energy of the LHC in GeV. For Run 3 it's 13.6 TeV.

source
BjetTLA.CombinedPDF Type
julia
struct CombinedPDF{F, D}

using Makie: Combined func::F dG:😄 end

A callable type that takes a background function and parameter for a gaussian signal.

When it's called, it has the signature of (x, p), where x is the x-axis value and p is a vector of parameters. Note that x should be normalized by SQRT_S (default 13600 GeV) already. As for the p, the first npara elements are for the background function, and the last one is for the signal normalization.

Example

julia
julia> cc = CombinedPDF(ATLAS_f_4p, 192, 192*0.12)
CombinedPDF{typeof(ATLAS_f_4p)}(BjetTLA.ATLAS_f_4p, 192.0, 23.04)

# evaluated at 200 GeV
julia> cc(200 / 13600, rand(5))
0.23111286768842174
source
BjetTLA.HistBundle Type
julia
@kwdef struct HistBundle{N,HT}
    histograms::NTuple{N,HT}
    cutflow_names::NTuple{N,String}
    cutflow_hist::Hist1D{Float64} = Hist1D(; binedges=0:length(cutflow_names))
end
HistBundle(f::Function, cutflow_names)

A mergeable structure that keeps track of histograms and a single cutflow histogram. If you have N histogram variables and M cutflow bins, the structure will carry around N*M + 1 histograms.

If f::Function is provided, it will be called N times to initialize the histograms. Where N is length of cutflow_names.

source
Base.merge Method
julia
merge(hb1::HistBundle, hb2::HistBundle)
source
BjetTLA.ATLAS_f_4p Method
julia
@. ATLAS_f_4p(x, p) = p[1]*(1-x)^p[2] * x^(p[3] + p[4]*log(x))
p1(1x)p2xp3+p4log(x)source
BjetTLA.ATLAS_f_5p Method
julia
@. ATLAS_f_5p(x, p) = p[1]*(1-x)^p[2] * x^(p[3] + p[4]*log(x) + p[5]*log(x)^2)
p1(1x)p2xp3+p4log(x)+p5log2(x)source
BjetTLA.ATLAS_f_6p Method
julia
@. ATLAS_f_6p(x, p) = p[1]*(1-x)^p[2] * x^(p[3] + p[4]*log(x) + p[5]*log(x)^2 + p[6]*log(x)^3)
p1(1x)p2xp3+p4log(x)+p5log2(x)+p6log3(x)source
BjetTLA.EMFrac_study Method
julia
EMFrac_study(md, df; photon_ID=true, electron_only=true, check_charge_sum=true)

A task to be used with loop_one_dir and friends for calculating the EMFraction cut scale factor and uncertainty.

Returns:

julia
(;
    mlly_mll = Hist2D(; binedges = (50:2:150, 15:2:100)),
    EMFrac_grea35 = Hist1D(; binedges = 0:0.02:1.1),
    EMFrac_2535 = Hist1D(; binedges = 0:0.02:1.1),
    EMFrac_2025 = Hist1D(; binedges = 0:0.02:1.1),
    pT_pass = Hist1D(; binedges = emfrac_pt_binedges, overflow=true),
    pT_fail = Hist1D(; binedges = emfrac_pt_binedges, overflow=true),
    EMFrac_pT = Hist2D(; binedges = (0:0.02:1.1, 15:2:150), overflow=false),
    pT = Hist1D(; binedges = (0:2:100), overflow=true),
)

Examples

julia
julia> df_ee_bkgs_path = [
    "/data/jiling/TLA/eos_ntuples/RadiativeZ/v4/700786.root",
    "/data/jiling/TLA/eos_ntuples/RadiativeZ/v4/700787.root",
    "/data/jiling/TLA/eos_ntuples/RadiativeZ/v4/700788.root",
    ];

julia> hist_mc_eejets = mapreduce(+, df_ee_bkgs_path) do path
    BjetTLA.loop_one_file(path, EMFrac_study; tree_name="TreeAlgo_noCalib/nominal").EMFrac_grea35
end;

julia> hists_data = BjetTLA.loop_one_file("/data/jiling/TLA/eos_ntuples/RadiativeZ/data_egam3_noVGammaOR/alldata.99999999.run3_merged.root", EMFrac_study);

julia> sf35 = BjetTLA.hist_cut_eff(hists_data.EMFrac_grea35) / BjetTLA.hist_cut_eff(hists_mc.EMFrac_grea35);
1.043 ± 0.015
source
BjetTLA.ExampleTask Method
julia
ExampleTask(md,evt)

This is an example task, that can be run in loop_one_proc to process the data. It is a simple task that will fill a histogram with the pT of the leading jet in the event, if there are at least 2 good jets and 1 good photon.

source
BjetTLA.F_test Method
julia
F_test(chi2_nom, chi2_alt, n_nom, n_alt, nbins) -> Float64

Calculate the F-test score for two models.

  • chi2_nom: chi² of the nominal model

  • chi2_alt: chi² of the alternative model

  • n_nom: number of parameters of the nominal model

  • n_alt: number of parameters of the alternative model

  • nbins: number of bins in the histogram

See also: https://atlas-stats-doc-dev.web.cern.ch/atlas-stats-doc-dev/recommendations/rec_smooth_background_modeling/#other-criteria

Examples

julia
julia> BjetTLA.F_test(42, 40, 5, 6, 200)
0.0026311523165153876

::: tip Note

```python
>>> import ROOT
>>> ROOT.Math.fdistribution_cdf_c(0.2, 1, 20)
0.6595265132032726

is equivalents to using Distributions; ccdf(FDist(1, 20), 0.2) in Julia

:::

source
BjetTLA.F_test_on_sr Method
julia
F_test_on_sr(sr_model, nbins) -> Vector{Float64}

Assuming report(sr_model).losses are χ² values of the models, calculate the F-test score for each formula with respect to the next complex one.

The order of the output corresponds to the order of the models in the sr_model, you can use findfirst(>(0.05), output) to find the first formula you should retain.

Note

Due to the nature of Symbolic Regression, instead of counting # of parameters, we use the "complexity" of the formula as proxy. This avoids the problem of "same # of parameters" which causes F-test to fail, for example, consider: x^(b) and x^(b*x), technically they both have just 1 parameter, but clearly the latter is more complex.

See also: F_test.

Examples

julia
julia> sr_model = BjetTLA.fit_sr_model(hist_data_bdt);

julia> BjetTLA.F_test_on_sr(sr_model, nbins(hist_data_bdt))
9-element Vector{Float64}:
1.5007930525231936e-30
 1.994665149876051e-7
 0.006603081588013456
 0.9849216334868175
 0.9968279815401603
 0.9998969027366611
 0.9999999919476441
 0.9999991299279695
 0.9999666474343953
source
BjetTLA.GN1_Eff_Task Method
julia
GN1_Eff_Task(md,evt)
source
BjetTLA.NLL_sr_loss Method
julia
NLL_sr_loss(ex, dataset::Dataset{T,L}, options)

Negative loglikelihood loss function for SRRegressor.

source
BjetTLA.add_ATLAS_internal! Method
julia
add_ATLAS_internal!(ax; pos=(0, 1), revision="Internal", lumi=run3_partial_lumi, offset=(30, -20))

Add the ATLAS label and additional information to a given axis.

Arguments

  • ax: The Makie axis to which the label will be added.

  • pos: A tuple specifying the position of the label in relative coordinates. Defaults to (0, 1) (top left).

  • revision: A string indicating the revision type (e.g., "Internal", "Preliminary"). Defaults to "Internal".

  • lumi: The integrated luminosity to display in the label. Defaults to run3_partial_lumi.

  • offset: A tuple specifying the offset of the label in pixels. Defaults to (30, -20).

Example

julia
fig, ax = Figure(), Axis()
add_ATLAS_internal!(ax; pos=(0.1, 0.9), revision="Preliminary", lumi=139e3)
source
BjetTLA.add_mlj_score_col! Method
julia
add_mlj_score_col!(df, mlj_machines; input_names=BDT_input_names, force=true, branch_name=:mlj_scores)

Take a DataFrame df and a vector of MLJ machines, then add a column branch_name to df with the predictions of the MLJ machines.

source
BjetTLA.best_Db_fc Method
julia
best_Db_fc(hist_sig, hist_data)::(;Db, fc)

Find the best Db and fc values for a given signal histogram and data histogram. The histograms should be output of fc_Db_cumulative_hist.

source
BjetTLA.bkg_prediction Method
julia
bkg_prediction(hist_data; fit_func = ATLAS_f_4p, sqrt_s) -> Vector{Float64}

Predict the background using the 4-parameter fit function (see ATLAS_f_4p. The function returns the background prediction normalized to the integral of the input histogram hist_data. The sqrt_s parameter is used to normalize the x-axis of the input histogram. (mjj/sqrt_s)

source
BjetTLA.calc_db Method
julia
calc_db(b::Number, c::Number, u::Number; fc=0.018) -> Float64
calc_db(b::Vector, c::Vector, u::Vector; fc=0.018) -> Vector{Float64}

Compute the b-tagging discriminants, see https://cds.cern.ch/record/2862499 for more detail

The output is: Db=logbcfc+(1fc)u

Note

The log operation usese safeLogRatio to prevent producing Inf and NaN.

source
BjetTLA.chi2 Method
julia
chi2(os, cs, σs) -> Float64

Calculate chi2 given three arrays of data points, model points and errors.

χ2=i(oici)2σi2
  • os: observed data points

  • cs: calculated model points

  • σs: errors of the observed data points

source
BjetTLA.chi2_sr_functor Method
julia
chi2_sr_functor(errs)::Callable(ex, dataset, options)

Returns a callable to be used as a loss function for SRRegressor.

source
BjetTLA.closest_lly_pair Function
julia
closest_lly_pair(leps, photons, target=91.2) -> (i1, i2, ip)

Find the indicies that give lepton pair + photon three-body invariant mass closest to the target mass.

Return the indices of the lepton pair (i1, i2) and the photon (ip).

source
BjetTLA.df_from_label Method
julia
df_from_label(label::AbstractString; base_path="/data/jiling/TLA/julia_arrows/")
df_from_label(labels::AbstractVector{<:AbstractString}; base_path="/data/jiling/TLA/julia_arrows/")

Load feather files from a label, which is a substring of the file name. See also dump_bdt_arrow, which is used to produce these feather files.

source
BjetTLA.dump_bdt_arrow Method
julia
dump_bdt_arrow(proc_name; ntuple_base_path = "/data/jiling/TLA/eos_ntuples/", output_base_path = "/data/jiling/TLA/julia_arrows/", min_mjj=100)

Read ntuples from ntuple_base_path and write out feather files to output_base_path for each file that matches proc_name. It runs the files through main_task with fill_data=true and then uses make_bdt_input_table to clean up and prepare the table before writing out the arrow file.

source
BjetTLA.dump_stacked_plots Method
julia
dump_stacked_plots(dict_proc; output_dir, mc_procs = MC_proc_labels, lumi=run3_partial_lumi, fig_kw=(; size=(1000, 600))

)

Dump cutflow plots (plot_cutflow) and stacked histogram plots (plot_stack_hist) to target directory.

Examples

julia
hists_serial_path = "/data/jiling/TLA/julia_jlds/2025_05_03.jlserialize"
dict_proc_histbundles = deserialize(hists_serial_path);
BjetTLA.dump_stacked_plots(dict_proc_histbundles; output_dir="/home/jiling/BjetTLA/plots/2025_05_06")
source
BjetTLA.extract_dsid Method
julia
extract_dsid(path::String)::String

Extract the DSID from the path of a file. It works by using a regular expression to match the DSID (\d{6}) in the path.

If no DSID can be found from path, it returns 99999999, which resembles a "data" run.

source
BjetTLA.fc_Db_cumulative_hist Method
julia
fc_Db_cumulative_hist(df; Db_range=-4:0.2:7, fc_range=0:0.01:0.1, mjj_limits=(80, Inf))::Hist2D

Make a 2D histogram where:

  • x-axis is the min(Db1, Db2) reverse cumulative across cells

  • y-axis is the fc (charme fraction) setting

Expects df to have the following columns:

  • GN1_pb1s, GN1_pc1s, GN1_pu1s: leading jet's b-tagging variables

  • GN1_pb2s, GN1_pc2s, GN1_pu2s: sub-leading jet's b-tagging variables

  • mjj12s: invariant mass of the two leading jets

  • wgts: event weights

Note

For example, if a bin centered at (min_Db = 1.5, fc=0.04) has 100 events, then it means there are 100 events with min(Db1, Db2) >= 1.5 when using fc = 0.04.

source
BjetTLA.feather_to_hist Method
julia
feather_to_hist(df, mlj_machines; var=:mjj12s, binedges = 0:5:600, mlj_cut=0.22)::Hist1D

Take a DataFrame df (likely from make_bdt_input_table), and a vector of MLJ machines, then make a histogram of variable var after selecting for events passing the mlj_cut after evaluating models on each event.

Each machine (ith) will be evaluated on events with eventNumber_modN .== i.See also xg_cross_train.

Note

Each machine will be evaluated on features defined in BDT_input_names as: pdf.(predict(MACH, df[!, BjetTLA.BDT_input_names]), true)

source
BjetTLA.feather_to_hist Method
julia
feather_to_hist(df; var=:mjj12s, binedges = 0:5:600, Db_cut=BjetTLA.Db_77)::Hist1D

Take a DataFrame df that has the same schema returned by make_bdt_input_table, then make a histogram of variable var after selecting for df.GN1_Db1s > Db_cut && df.GN1_Db2s > Db_cut.

source
BjetTLA.fit_sr_model Method
julia
fit_sr_model(xs, ys, errs; kw...) -> MLJ.Machine{SRRegressor ...}
source
BjetTLA.fit_sr_model Method
julia
fit_sr_model(hist::Hist1D; kw...) -> MLJ.Machine{SRRegressor ...}

Fit a symbolic regression model to a histogram. Return the fitted machine.

source
BjetTLA.fitted_smooth_hist Method
julia
fitted_smooth_hist(hist, func::T=ATLAS_f_4p, p0=0.1*ones(4); algo=:nm) -> NamedTuple(; sol = ..., hist= ...)

Fit a histogram with a function and return a new histogram with the fitted bin counts.

source
BjetTLA.get_file_list Method
julia
get_file_list(proc_name; ntuple_base_path = "/data/jiling/TLA/eos_ntuples/")
source
BjetTLA.get_fill_wgt Method
julia
get_fill_wgt(evt, pmgWeight, sumWeight;)::Float64

Get the fill weight for the event, if the evt has the mcEventWeight property, the weight is calculated as evt.mcEventWeight * (pmgWeight / sumWeight), otherwise the weight is 1.0 for real data.

source
BjetTLA.get_pmg_weight Method
julia
get_pmg_weight(dsid::Union{Integer, AbstractString};
md_path = "/cvmfs/atlas.cern.ch/repo/sw/database/GroupData/dev/PMGTools/PMGxsecDB_mc23.txt")

Get the PMG weight from cvmfs database for a given dsid: returns crossSection_pb * genFiltEff * kFactor

If the md_path is not a file, it will fall back to joinpath(@__DIR__, "../data/PMGxsecDB_mc23.txt")

Example

julia
julia> signal_dsid = 510392
510392

julia> BjetTLA.get_pmg_weight(signal_dsid)
0.7831930217100002

julia> BjetTLA.get_pmg_weight("510392")
0.7831930217100002
source
BjetTLA.get_shortname Method
julia
get_shortname(dsid, map)

Convert dsid to a short name, following the naming map.

source
BjetTLA.has_variable_in_formula Method
julia
has_variable_in_formula(equation) -> Bool

Check if a formula has a variable in the base and exponent functions.

Examples

julia
julia> has_variable_in_formula(MLJ.report(sr_model).equations[1])
source
BjetTLA.hist1d_ftag Method
julia
hist1d_ftag(df; nth_jet=1, tagger=:GN1, fc=0.08, mjj_limits=(100, Inf), binedges=-5:0.1:15)::Hist1D

Make a 1D histogram of the leading(nth_jet==1) or sub-leading(nth_jet==2) jet's b-tagging variables. See hist2d_ftag for more details.

source
BjetTLA.hist2d_ftag Method
julia
hist2d_ftag(df; tagger=:GN1, fc=0.08, mjj_limits=(100, Inf), binedges=-5:0.1:15)::Hist2D

Make a 2D histogram of the leading and sub-leading jet's b-tagging variables.

If tagger is set to :GN1, then the b-tagging variables are calculated using the BjetTLA.calc_db function. If tagger is set to :fastDIPS, then the b-tagging variables are taken directly from the DataFrame columns: Db1s and Db2s.

Assume a df DataFrame with the following columns:

  • GN1_pb1s, GN1_pc1s, GN1_pu1s: leading jet's b-tagging variables

  • GN1_pb2s, GN1_pc2s, GN1_pu2s: sub-leading jet's b-tagging variables

  • Db1s, Db2s: alternative b-tagging variables (if using :fastDIPS tagger)

  • mjj12s: invariant mass of the two leading jets

  • wgts: event weights

source
BjetTLA.hist_cut_eff Function
julia
hist_cut_eff(hist, cut=0.93) -> Measurement{Float64}

Calculate the efficiency of a cut on the histogram hist at the cut value. The numerator is the integral of the histogram to the right of the cut, and the denominator is the integral of the whole histogram.

Examples

julia
julia> h1 = Hist1D(; bincounts=[100.0, 100.0], binedges=0:0.5:1, sumw2=[2.0, 3.0])
edges: [0.0, 0.5, 1.0]
bin counts: [100.0, 100.0]
total count: 200.0

julia> BjetTLA.hist_cut_eff(h1, 0.5)
0.5 ± 0.0056
source
BjetTLA.hist_eff_divide Method
julia
hist_eff_divide(hist1, hist2)::Hist1D

Examples

julia
julia> h1 = Hist1D(; bincounts=[100.0], binedges=0:1, sumw2=[3.0])

julia> h2 = Hist1D(; bincounts=[200.0], binedges=0:1, sumw2=[5.0])

julia> h3 = BjetTLA.hist_eff_divide(h1, h2)
edges: [0.0, 1.0]
bin counts: [0.5]
total count: 0.5

julia> binerrors(h3)
1-element Vector{Float64}:
 0.005590169943749474
source
BjetTLA.hist_to_components Method
julia
hist_to_components(hist0) -> Tuple of Vector{Float64}

returns:

  • xs

  • ys

  • σs

Where xs is bin centers. And ys is the bin counts. Finally, σs is the bin errors.

source
BjetTLA.idx_of_first_good_formula Method
julia
idx_of_first_good_formula(sr_model; by=:chi2_and_complexity, chi2_cutoff=1.05 impro_threshold=0.95) -> Int

Return the index of the first (least complex) good formula in the model. A good formula is defined as a formula that has a variable in the base and exponent functions.

by can be one of [:chi2_and_complexity, :chi2_improvement, :min_complexity_has_variables].

  • :chi2_and_complexity: Find the simplest formula that is within chi2_cutoff * minimal_chi2 of all formulas.

  • :chi2_improvement: Start with the simplest formula, if the next formula has chi2 smaller than current_chi2 * impro_threshold, pick it and repeat.

  • :min_complexity_has_variables: Find the least complex formula that uses variables in the base (uses x) and exponent (uses log(x)) functions.

Examples

julia
julia> typeof(sr_model)
Machine{SRRegressor{DynamicQuantities.SymbolicDimensions{DynamicQuantities.FixedRational{Int32, 25200}}, Nothing}, SRRegressor{DynamicQuantities.SymbolicDimensions{DynamicQuantities.FixedRational{Int32, 25200}}, Nothing}, true}

julia> BjetTLA.idx_of_first_good_formula(sr_model)
3
source
BjetTLA.loop_one_dir Method
julia
loop_one_dir(dir_path, loop_task; kwargs...)

Given a directory and a loop task, loop over all files in this directory and combine the output from loop task.

For an example task, see main_task.

source
BjetTLA.loop_one_proc Method
julia
loop_one_proc(proc, loop_task; ntuple_base_path="/data/jiling/TLA/eos_ntuples/", kwargs...)

Given a process name and a loop task, loop over all files under this process and combine the output.

For an example task, see main_task.

source
BjetTLA.main_task Method
julia
main_task(md, tree; fill_data=false, fc=0.085)::HistBundle

Main task to be run in loop_one_proc and friends. The fc parameter is used for calculating GN1 b-jet discriminant.

If fill_data is set to true, it will fill the data vectors in the HistBundle for further analysis such as deriving fc and WPs as well as BDT training. See also dump_bdt_arrow.

source
BjetTLA.make_bdt_input_table Method
julia
make_bdt_input_table(output::NamedTuple{Vectors...}, proc::Bool)::DataFrame

Take a NamedTuple of Vectors (potentially output of main_task with fill_data=true, and a proc (by convention, true for signal and false for bkg) and return a dataframe with the following columns:

  • eventNumber

  • ystars

  • mjj12s

  • ph_EMFracs

  • ph_j2_ΔRs

  • wgts

  • Db{1,2,3}s

  • Db{1,2,3}_{70,77}s

  • proc

source
BjetTLA.nonoverlap_mask Method
julia
nonoverlap_mask(photons, jets; cutoff=0.4)::Vector{Bool}

Create a mask that can be used to get non-overlapping jets.

Example

julia
mask = nonoverlap_mask(photons, jets)
clean_jets = @view jets[mask]
source
BjetTLA.normalize_to Method
julia
normalize_to(in_hist, ref_lumi)
normalize_to(in_hist, ref_hist::FHist.Hist1D)

Normalize the input histogram to the reference luminosity. If the reference histogram is provided, normalize the input histogram to the integral of the reference histogram.

source
BjetTLA.optim_fit Method
julia
optim_fit(func::Callable, hist0::FHist.Hist1D, p0; algo::Symbol=:nm, sqrt_s=SQRT_S[])

Fit a χ² fit using a func to a 1D histogram.

  • func: a function that takes two arguments, x and p, and returns a model value.

  • hist0: a 1D histogram to fit.

  • p0: initial parameters.

Note

The func should be defined with @., and the p0 should be a vector. See ATLAS_f_4p for an example.

Also, the x-axis (bincenters()) of the hist0 will be divided by sqrt_s to normalize the x-axis.

It returns the result of the optimization.

Example

julia
julia> hist_mjj12_mask
edges: [130.0, 132.0, 134.0, 136.0, 138.0, 140.0, 142.0, 144.0, 146.0, 148.0  482.0, 484.0, 486.0, 488.0, 490.0, 492.0, 494.0, 496.0, 498.0, 500.0]
bin counts: [650.0, 603.0, 649.0, 566.0, 595.0, 593.0, 535.0, 487.0, 492.0, 466.0  2.0, 10.0, 13.0, 9.0, 13.0, 10.0, 10.0, 8.0, 15.0, 8.0]
total count: 22293.0

julia> sol4 = optim_fit(ATLAS_f_4p, hist_mjj12_mask, ones(4))
retcode: Success
u: 4-element Vector{Float64}:
  0.0001086906699655396
 91.12405771249554l
  1.2345307604384157
 -0.03547447344008578

julia> sol4.u
4-element Vector{Float64}:
  0.0001086906699655396
 91.12405771249554
  1.2345307604384157
 -0.03547447344008578
source
BjetTLA.pass_filter_jets Method
julia
pass_filter_jets(evt, hlt_pfjets, photons; min_pt=25.0, max_eta = 2.1, dR_cutoff=0.4, min_Jvt=-Inf, fc=0.085)

Filter the HLT jets collection and related variables by doing the following:

  • require pt > min_pt

  • require abs(eta) < max_eta

  • remove jets that overlap with the photons collection by dR < dR_cutoff

  • require jvt > min_Jvt (jvt = jet vertex tagger)

Finally, return the following as a NamedTuple so you can select one or more at call site:

  • clean_jets: the cleaned jets collection

  • clean_jets_btags: the cleaned jets b-tagging values

  • clean_jets_emfracs: the cleaned jets EM fractions

  • clean_jets_jvts: the cleaned jets JVT values

  • fc: fc in the GN1 Db calculation, see calc_db.

Note

all the returnred arrays are sorted by the jet pt in descending order.

source
BjetTLA.pass_filter_photons Method
julia
clean_photons(hlt_photons); eta_cuts = true, min_pt = 20.0) -> subvector of photon lorentz vectors

Clean photons from the HLT photon collection. The cleaning is done by removing photons that satisfy the any of the following conditions:

  • pt < min_pt

  • abs(eta) > 2.37

  • 1.37 < abs(eta) < 1.52

source
BjetTLA.plot_Db_fc_scan Method
julia
plot_Db_fc_scan(hist_sig, hist_data; mjj_limits::Tuple, sig_title = "Z' (mass = 150GeV)")

Given two input histograms (produced by fc_Db_cumulative_hist), this function plots the significance of the signal hist_sig over the background hist_data in a 2x2 grid of heatmaps.

It will also mark the optimal Db and f_c values on the significance 2D plane, the optimal points are calculated by best_Db_fc.

source
BjetTLA.plot_compare_Dbs_1D Method
julia
plot_compare_Dbs_1D(df_data, df_150, df_200; nth_jet=1, GN1_fc=0.0085, binedges=-5:0.5:15, wps=nothing)
source
BjetTLA.plot_cutflow Method
julia
plot_cutflow(cutflow::Hist1D; cutnames=_cut_flow_names(), label="data 2023 nv8")

Plot the cutflow histogram with the given label as the title.

source
BjetTLA.plot_fit Method
julia
plot_fit(hist_data; fit_func = ATLAS_f_4p, sqrt_s)

Plot the input histogram hist_data and the background prediction using the 4-parameter fit function (see bkg_prediction).

source
BjetTLA.plot_smooth_bkg_fit Method

Examples

julia
hist_data_bdt_smooth_3p = BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_3p_xnorm, p0_4para; lb=nothing, ub=nothing, multistart, likelihood, algo)
hist_data_bdt_smooth_4p = BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_4p_xnorm, p0_4para; lb=nothing, ub=nothing, multistart, likelihood, algo)
hist_data_bdt_smooth_5p = BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_5p_xnorm, p0_5para; lb=nothing, ub=nothing, multistart, likelihood, algo)
hist_data_bdt_smooth_6p = @time BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_6p_xnorm, p0_6para; lb=nothing, ub=nothing, multistart, likelihood, algo);

plot_smooth_bkg_fit(hist_data_bdt, hist_data_bdt_smooth_3p, hist_data_bdt_smooth_4p, hist_data_bdt_smooth_5p, hist_data_bdt_smooth_6p;
    scale_factor=1, cut_name="BDT > $BDT_CUT")
source
BjetTLA.plot_stack_hist Function
julia
plot_stack_hist(dict_histbundle::Dict{<:Any,<:HistBundle}, varname::Symbol, cutflow_idx=lastindex(_cut_flow_names()); mc_procs=MC_proc_labels, lumi, rebin, fig_kw=(;), axis_kw=(;), atlas_label_pos=(0, 1), offset=(30, -20),
frag_direct=(0.7316265625500193, 3.1254759924281035)

)

Generate a stacked histogram plot comparing Monte Carlo (MC) histograms, data histograms, and optional signal histograms.

  • dict_histbundle: A dictionary where keys are MC process names and values are their corresponding HistBundle objects.

  • varname: A symbol representing the variable to be plotted.

  • cutflow_idx: An integer specifying the index of the cutflow step to use. Defaults to the last index of _cut_flow_names().

  • mc_procs: A list of Monte Carlo process labels to include in the plot. Defaults to MC_proc_labels.

  • lumi: The integrated luminosity to scale the MC histograms.

  • rebin: An integer specifying the rebinning factor for the histograms. Defaults to 1.

  • fig_kw: A named tuple of keyword arguments for configuring the figure. Defaults to an empty tuple.

  • axis_kw: A named tuple of keyword arguments for configuring the axis. Defaults to an empty tuple.

  • atlas_label_pos: A tuple specifying the position of the ATLAS label. See add_ATLAS_internal!.

  • offset: A tuple specifying the offset for the ATLAS label. Defaults to (30, -20).

  • frag_direct: A tuple of scale factors for the fragmentation and direct components of the gammajet process. Defaults to (0.7316265625500193, 3.1254759924281035).

Example

julia
hists_serial_path = "/data/jiling/TLA/julia_jlds/2025_05_03.jlserialize"

dict_proc_histbundles = deserialize(hists_serial_path);

plot_stack_hist(dict_proc_histbundles, :jet1_eta, 3; rebin=1; lumi=run3_partial_lumi, fig_kw=(; size=(1000, 600)),
    axis_kw=(; xlabel="ph1 eta", yscale=Makie.pseudolog10, yticks=10 .^ (0:7), limits=(-3, 3, 1, 10^7), ),
    frag_direct = (0.7316265625500193, 3.1254759924281035),
)
source
BjetTLA.plot_staircase_scan Method
julia
plot_staircase_scan(hist_sig, hist_data; colorscale=identity, rebin=1, wps=[1.0, 2.8, 5.8])

This function generates a staircase plot comparing signal and data histograms. It overlays heatmaps of the histograms and highlights working points (WPs) with vertical and horizontal lines.

Arguments

  • hist_sig: The signal histogram to be plotted.

  • hist_data: The data histogram to be plotted.

  • colorscale: The color scale to be applied to the heatmaps.

  • rebin: The rebinning factor for the histograms. Defaults to 1.

  • wps: A list of working points to be highlighted on the plot. Defaults to [1.0, 2.8, 5.8].

source
BjetTLA.poisson_fluctuate Method
julia
poisson_fluctuate(hist::Hist1D) -> Hist1D

Fluctuate the bin counts of a histogram with Poisson distribution. Returns a new histogram with fluctuated bin counts, and the sum of weights squared is preserved.

Example

julia
julia> poisson_fluctuate(Hist1D(; binedges=0:3, bincounts=[10, 20, 30]))
edges: [0.0, 1.0, 2.0, 3.0]
bin counts: [9.0, 18.0, 28.0]
total count: 55.0
source
BjetTLA.proc_dirs Method
julia
proc_dirs(proc; ntuple_base_path="/data/jiling/TLA/eos_ntuples/")::Vector{String}

Given a process name, return a list of directories containing the ntuple files.

source
BjetTLA.pyhf_simple_significance Method
julia
pyhf_simple_significance(sig_hist, data_hist; sqrt_s, fit_func=nothing)

Calculate the upper limit on the signal strength using the pyhf package. The function returns the upper limit on the signal strength at 95% confidence level.

Specifically, the function takes the signal histogram sig_hist and the data histogram data_hist. The sqrt_s parameter is used to normalize the x-axis of the input histograms. (mjj/sqrt_s), and the fit_funcparameter is used to predict the background. If thefit_func is not provided, the function uses the data histogram as the background prediction – a reverse Asimov fit.

This function first builds a uncorrelated_background() model using pyhf and then uses the upper_limit() function to calculate the upper limit on the signal strength.

Example

julia
julia> using BjetTLA, FHist

julia> h1 = Hist1D(;binedges=0:2, bincounts=[12.0, 11.0]);

julia> h2 = Hist1D(;binedges=0:2, bincounts=[50.0, 52.0], sumw2=[9, 49]);

julia> pyhf_simple_significance(h1, h2; sqrt_s=136000)
Python:
(array(1.22313872), [array(0.65879787), array(0.8805031), array(1.22313872), array(1.7063601),
       1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
       2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8,
       3.9, 4. ]), [(array(1.), [array(1.), array(1.), array(1.), array(1.), array(1.)]), (arr

Note

The normalization of the input histograms need to be correct (i.e. the bins should predict actual number of events expected).

source
BjetTLA.safeLogRatio Method
julia
safeLogRatio(num,den)

Safe log computation, to emulate athena implementation:

  • If iszero(den), return 15.0

  • If iszero(num), return -10.0

  • otherwise, return log(num/den) as expected

source
BjetTLA.scan_fluctuate Method
julia
scan_fluctuate(hist_smooth, SplusB::CombinedPDF; N=1, npara=4) -> Vector{Tuple{Float64, Float64}}

Throw N Poisson-fluctuated histograms (poisson_fluctuate) from hist_smooth and fit them with SplusB. Return a vector of tuples of the fitted signal strength and the χ²/ndf.

As noted in CombinedPDF, the signal strength is expected to be the last parameter.

source
BjetTLA.skip_JZ_event Function
julia
pass_JZ_cleaning(j_pt, jz, eps=0.2)::Bool

Apply cleaning filter for JZ slices, checks if the average pT of the two leading jets is in the expected range for slice jz, within a eps tolerance.

source
BjetTLA.split_by_dsid Method
julia
split_by_dsid(strings::Vector{String})

Splits a vector of strings into a dictionary based on the dsid, a sequence of six digits preceeded and followed by ".". If that criteria fails, looks for a run number instead (sequence of eight digits).

source
BjetTLA.staircase_inbound Method
julia
staircase_inbound(x,y,i,j,k)::Bool

Assume x,y are bin centers, and i,j,k are on binedges.

source
BjetTLA.staircase_scan Method
julia
staircase_scan(hist2d, hist2d_bkg; max=8.0)

Find the best staircase cut in a 2D histogram.

  • hist2d: Signal 2D histogram

  • hist2d_bkg: background histogram

  • max: maximum value for the D_b cut

source
BjetTLA.staircase_sig Method
julia
staircase_sig(hist2d, hist2d_bkg, centers, i, j, k)::Float64

Calculate the significance of a 2D histogram using the staircase method. i < j < k

source
BjetTLA.xg_cross_train Method
julia
xg_cross_train(df_all; num_round=90, max_depth=7, η=0.8, scale_pos_weight=200, input_names=BDT_input_names, gamma=0, alpha=0, lambda=1, Nfolds=5, lumi=30e3)

Take a DataFrame (potentially loaded from the output of dump_bdt_arrow) and return Nfolds The dataset is partitioned into Nfolds using mod1.(df_all.eventNumber, Nfolds).

Examples

julia
julia> Random.seed!(654321)
julia> df_all = shuffle(DataFrame(Arrow.Table("/data/jiling/TLA/julia_arrows/2024_11_04_data2023.feather")));
julia> df_all.proc = categorical(df_all.proc);

julia> bdt_machines = xg_cross_train(df_all);
source
BjetTLA.@cutflow Macro
julia
@cutflow counter hist name_list name wgt

Convinient macro to generate cutflow histogram filling code. It will increment the counter by 1, check if the name_list[counter] is equal to expected name, and then push the counter to the hist with the given wgt.

BjetTLA.@cutflow counter hist name_list name wgt

is equivalent to:

counter += 1
@assert name_list[counter] == name
atomic_push!(hist, counter, wgt)
source
BjetTLA.@fourmomenta Macro
julia
@fourmomenta evt obj_prefix

Creates the fourmomenta for a given object type in the event. Returns a StructArray of LorentzVectorCyl.

It assumes that the event is a structure containig all the projections of the 4-vectors: evt.obj_pt, evt.obj_eta, evt.obj_phi, evt.obj_m (or evt.obj_E for jets).

Concretely,

julia
@fourmomenta evt ph

would expand to:

julia
StructArray(LorentzVectorCyl.(
    evt.ph_pt,
    evt.ph_eta,
    evt.ph_phi,
    evt.ph_m
))

For jets, it will call fromPtEtaPhiE and the return type stays the same.

Example

julia
julia> for evt in tree
            hlt_jets = @fourmomenta evt jet
            count(>(25), hlt_jets.pt) # count how many jets have pt > 25 GeV
            ...
        end
source