Internal
BjetTLA.BDT_input_names Constant
The names of input variables used in BDT training, these are a subset of what make_bdt_input_table can produce.
BjetTLA.SQRT_S Constant
SQRT_S = Ref(13600)The center of mass energy of the LHC in GeV. For Run 3 it's 13.6 TeV.
sourceBjetTLA.CombinedPDF Type
struct CombinedPDF{F, D}using Makie: Combined func::F dG:😄 end
A callable type that takes a background function and parameter for a gaussian signal.
When it's called, it has the signature of (x, p), where x is the x-axis value and p is a vector of parameters. Note that x should be normalized by SQRT_S (default 13600 GeV) already. As for the p, the first npara elements are for the background function, and the last one is for the signal normalization.
Example
julia> cc = CombinedPDF(ATLAS_f_4p, 192, 192*0.12)
CombinedPDF{typeof(ATLAS_f_4p)}(BjetTLA.ATLAS_f_4p, 192.0, 23.04)
# evaluated at 200 GeV
julia> cc(200 / 13600, rand(5))
0.23111286768842174BjetTLA.HistBundle Type
@kwdef struct HistBundle{N,HT}
histograms::NTuple{N,HT}
cutflow_names::NTuple{N,String}
cutflow_hist::Hist1D{Float64} = Hist1D(; binedges=0:length(cutflow_names))
end
HistBundle(f::Function, cutflow_names)A mergeable structure that keeps track of histograms and a single cutflow histogram. If you have N histogram variables and M cutflow bins, the structure will carry around N*M + 1 histograms.
If f::Function is provided, it will be called N times to initialize the histograms. Where N is length of cutflow_names.
BjetTLA.ATLAS_f_5p Method
@. ATLAS_f_5p(x, p) = p[1]*(1-x)^p[2] * x^(p[3] + p[4]*log(x) + p[5]*log(x)^2)BjetTLA.ATLAS_f_6p Method
@. ATLAS_f_6p(x, p) = p[1]*(1-x)^p[2] * x^(p[3] + p[4]*log(x) + p[5]*log(x)^2 + p[6]*log(x)^3)BjetTLA.EMFrac_study Method
EMFrac_study(md, df; photon_ID=true, electron_only=true, check_charge_sum=true)A task to be used with loop_one_dir and friends for calculating the EMFraction cut scale factor and uncertainty.
Returns:
(;
mlly_mll = Hist2D(; binedges = (50:2:150, 15:2:100)),
EMFrac_grea35 = Hist1D(; binedges = 0:0.02:1.1),
EMFrac_2535 = Hist1D(; binedges = 0:0.02:1.1),
EMFrac_2025 = Hist1D(; binedges = 0:0.02:1.1),
pT_pass = Hist1D(; binedges = emfrac_pt_binedges, overflow=true),
pT_fail = Hist1D(; binedges = emfrac_pt_binedges, overflow=true),
EMFrac_pT = Hist2D(; binedges = (0:0.02:1.1, 15:2:150), overflow=false),
pT = Hist1D(; binedges = (0:2:100), overflow=true),
)Examples
julia> df_ee_bkgs_path = [
"/data/jiling/TLA/eos_ntuples/RadiativeZ/v4/700786.root",
"/data/jiling/TLA/eos_ntuples/RadiativeZ/v4/700787.root",
"/data/jiling/TLA/eos_ntuples/RadiativeZ/v4/700788.root",
];
julia> hist_mc_eejets = mapreduce(+, df_ee_bkgs_path) do path
BjetTLA.loop_one_file(path, EMFrac_study; tree_name="TreeAlgo_noCalib/nominal").EMFrac_grea35
end;
julia> hists_data = BjetTLA.loop_one_file("/data/jiling/TLA/eos_ntuples/RadiativeZ/data_egam3_noVGammaOR/alldata.99999999.run3_merged.root", EMFrac_study);
julia> sf35 = BjetTLA.hist_cut_eff(hists_data.EMFrac_grea35) / BjetTLA.hist_cut_eff(hists_mc.EMFrac_grea35);
1.043 ± 0.015BjetTLA.ExampleTask Method
ExampleTask(md,evt)This is an example task, that can be run in loop_one_proc to process the data. It is a simple task that will fill a histogram with the pT of the leading jet in the event, if there are at least 2 good jets and 1 good photon.
BjetTLA.F_test Method
F_test(chi2_nom, chi2_alt, n_nom, n_alt, nbins) -> Float64Calculate the F-test score for two models.
chi2_nom: chi² of the nominal modelchi2_alt: chi² of the alternative modeln_nom: number of parameters of the nominal modeln_alt: number of parameters of the alternative modelnbins: number of bins in the histogram
Examples
julia> BjetTLA.F_test(42, 40, 5, 6, 200)
0.0026311523165153876
::: tip Note
```python
>>> import ROOT
>>> ROOT.Math.fdistribution_cdf_c(0.2, 1, 20)
0.6595265132032726is equivalents to using Distributions; ccdf(FDist(1, 20), 0.2) in Julia
:::
sourceBjetTLA.F_test_on_sr Method
F_test_on_sr(sr_model, nbins) -> Vector{Float64}Assuming report(sr_model).losses are χ² values of the models, calculate the F-test score for each formula with respect to the next complex one.
The order of the output corresponds to the order of the models in the sr_model, you can use findfirst(>(0.05), output) to find the first formula you should retain.
Note
Due to the nature of Symbolic Regression, instead of counting # of parameters, we use the "complexity" of the formula as proxy. This avoids the problem of "same # of parameters" which causes F-test to fail, for example, consider: x^(b) and x^(b*x), technically they both have just 1 parameter, but clearly the latter is more complex.
See also: F_test.
Examples
julia> sr_model = BjetTLA.fit_sr_model(hist_data_bdt);
julia> BjetTLA.F_test_on_sr(sr_model, hist_data_bdt)
9-element Vector{Float64}:
1.5007930525231936e-30
1.994665149876051e-7
0.006603081588013456
0.9849216334868175
0.9968279815401603
0.9998969027366611
0.9999999919476441
0.9999991299279695
0.9999666474343953BjetTLA.NLL_sr_loss Method
NLL_sr_loss(ex, dataset::Dataset{T,L}, options)Negative loglikelihood loss function for SRRegressor.
BjetTLA.add_ATLAS_internal! Method
add_ATLAS_internal!(ax; pos=(0, 1), fontsize=16, color=:black, revision="Internal", lumi=run3_partial_lumi, offset=(30, -20))Add the ATLAS label and additional information to a given axis.
Arguments
ax: The Makie axis to which the label will be added.pos: A tuple specifying the position of the label in relative coordinates. Defaults to(0, 1)(top left).revision: A string indicating the revision type (e.g., "Internal", "Preliminary"). Defaults to"Internal".lumi: The integrated luminosity to display in the label. Defaults torun3_partial_lumi.offset: A tuple specifying the offset of the label in pixels. Defaults to(30, -20).
Example
fig, ax = Figure(), Axis()
add_ATLAS_internal!(ax; pos=(0.1, 0.9), revision="Preliminary", color=:white, lumi=139e3)BjetTLA.add_mlj_score_col! Method
add_mlj_score_col!(df, mlj_machines; input_names=BDT_input_names, force=true, branch_name=:mlj_scores)Take a DataFrame df and a vector of MLJ machines, then add a column branch_name to df with the predictions of the MLJ machines.
BjetTLA.best_Db_fc Method
best_Db_fc(hist_sig, hist_data)::(;Db, fc)Find the best Db and fc values for a given signal histogram and data histogram. The histograms should be output of fc_Db_cumulative_hist.
BjetTLA.bkg_prediction Method
bkg_prediction(hist_data; fit_func = ATLAS_f_4p, sqrt_s) -> Vector{Float64}Predict the background using the 4-parameter fit function (see ATLAS_f_4p. The function returns the background prediction normalized to the integral of the input histogram hist_data. The sqrt_s parameter is used to normalize the x-axis of the input histogram. (mjj/sqrt_s)
BjetTLA.calc_db Method
calc_db(b::Number, c::Number, u::Number; fc=0.018) -> Float64
calc_db(b::Vector, c::Vector, u::Vector; fc=0.018) -> Vector{Float64}Compute the b-tagging discriminants, see https://cds.cern.ch/record/2862499 for more detail
The output is:
Note
The log operation usese safeLogRatio to prevent producing Inf and NaN.
BjetTLA.chi2 Method
chi2(os, cs, σs) -> Float64Calculate chi2 given three arrays of data points, model points and errors.
os: observed data pointscs: calculated model pointsσs: errors of the observed data points
BjetTLA.chi2_sr_functor Method
chi2_sr_functor(errs)::Callable(ex, dataset, options)Returns a callable to be used as a loss function for SRRegressor.
BjetTLA.closest_lly_pair Function
closest_lly_pair(leps, photons, target=91.2) -> (i1, i2, ip)Find the indicies that give lepton pair + photon three-body invariant mass closest to the target mass.
Return the indices of the lepton pair (i1, i2) and the photon (ip).
BjetTLA.df_from_label Method
df_from_label(label::AbstractString; base_path="/data/jiling/TLA/julia_arrows/")
df_from_label(labels::AbstractVector{<:AbstractString}; base_path="/data/jiling/TLA/julia_arrows/")Load feather files from a label, which is a substring of the file name. See also dump_bdt_arrow, which is used to produce these feather files.
BjetTLA.dump_bdt_arrow Method
dump_bdt_arrow(proc_name; ntuple_base_path = "/data/jiling/TLA/eos_ntuples/", output_base_path = "/data/jiling/TLA/julia_arrows/", min_mjj=100)Read ntuples from ntuple_base_path and write out feather files to output_base_path for each file that matches proc_name. It runs the files through main_task with fill_data=true and then uses make_bdt_input_table to clean up and prepare the table before writing out the arrow file.
BjetTLA.dump_stacked_plots Method
dump_stacked_plots(dict_proc; output_dir, mc_procs = MC_proc_labels, lumi=run3_partial_lumi, fig_kw=(; size=(1000, 600)))
Dump cutflow plots (plot_cutflow) and stacked histogram plots (plot_stack_hist) to target directory.
Examples
hists_serial_path = "/data/jiling/TLA/julia_jlds/2025_05_03.jlserialize"
dict_proc_histbundles = deserialize(hists_serial_path);
BjetTLA.dump_stacked_plots(dict_proc_histbundles; output_dir="/home/jiling/BjetTLA/plots/2025_05_06")BjetTLA.extract_dsid Method
extract_dsid(path::String)::StringExtract the DSID from the path of a file. It works by using a regular expression to match the DSID (\d{6}) in the path.
If no DSID can be found from path, it returns 99999999, which resembles a "data" run.
BjetTLA.fc_Db_cumulative_hist Method
fc_Db_cumulative_hist(df; Db_range=-4:0.2:7, fc_range=0:0.01:0.1, mjj_limits=(80, Inf))::Hist2DMake a 2D histogram where:
x-axis is the
min(Db1, Db2)reverse cumulative across cellsy-axis is the
fc(charme fraction) setting
Expects df to have the following columns:
GN1_pb1s,GN1_pc1s,GN1_pu1s: leading jet's b-tagging variablesGN1_pb2s,GN1_pc2s,GN1_pu2s: sub-leading jet's b-tagging variablesmjj12s: invariant mass of the two leading jetswgts: event weights
Note
For example, if a bin centered at (min_Db = 1.5, fc=0.04) has 100 events, then it means there are 100 events with min(Db1, Db2) >= 1.5 when using fc = 0.04.
BjetTLA.feather_to_hist Method
feather_to_hist(df, mlj_machines; var=:mjj12s, binedges = 0:5:600, mlj_cut=0.22)::Hist1DTake a DataFrame df (likely from make_bdt_input_table), and a vector of MLJ machines, then make a histogram of variable var after selecting for events passing the mlj_cut after evaluating models on each event.
Each machine (ith) will be evaluated on events with eventNumber_modN .== i.See also xg_cross_train.
Note
Each machine will be evaluated on features defined in BDT_input_names as: pdf.(predict(MACH, df[!, BjetTLA.BDT_input_names]), true)
BjetTLA.feather_to_hist Method
feather_to_hist(df; var=:mjj12s, binedges = 0:5:600, Db_cut=BjetTLA.Db_77)::Hist1DTake a DataFrame df that has the same schema returned by make_bdt_input_table, then make a histogram of variable var after selecting for df.GN1_Db1s > Db_cut && df.GN1_Db2s > Db_cut.
BjetTLA.fit_sr_model Method
fit_sr_model(xs, ys, errs; kw...) -> MLJ.Machine{SRRegressor ...}BjetTLA.fit_sr_model Method
fit_sr_model(hist::Hist1D; kw...) -> MLJ.Machine{SRRegressor ...}Fit a symbolic regression model to a histogram. Return the fitted machine.
sourceBjetTLA.fitted_smooth_hist Method
fitted_smooth_hist(hist, func::T=ATLAS_f_4p, p0=0.1*ones(4); algo=:nm) -> NamedTuple(; sol = ..., hist= ...)Fit a histogram with a function and return a new histogram with the fitted bin counts.
sourceBjetTLA.get_file_list Method
get_file_list(proc_name; ntuple_base_path = "/data/jiling/TLA/eos_ntuples/")BjetTLA.get_fill_wgt Method
get_fill_wgt(evt, pmgWeight, sumWeight;)::Float64Get the fill weight for the event, if the evt has the mcEventWeight property, the weight is calculated as evt.mcEventWeight * (pmgWeight / sumWeight), otherwise the weight is 1.0 for real data.
BjetTLA.get_pmg_weight Method
get_pmg_weight(dsid::Union{Integer, AbstractString};
md_path = "/cvmfs/atlas.cern.ch/repo/sw/database/GroupData/dev/PMGTools/PMGxsecDB_mc23.txt")Get the PMG weight from cvmfs database for a given dsid: returns crossSection_pb * genFiltEff * kFactor
If the md_path is not a file, it will fall back to joinpath(@__DIR__, "../data/PMGxsecDB_mc23.txt")
Example
julia> signal_dsid = 510392
510392
julia> BjetTLA.get_pmg_weight(signal_dsid)
0.7831930217100002
julia> BjetTLA.get_pmg_weight("510392")
0.7831930217100002BjetTLA.get_shortname Method
get_shortname(dsid, map)Convert dsid to a short name, following the naming map.
sourceBjetTLA.has_variable_in_formula Method
has_variable_in_formula(equation) -> BoolCheck if a formula has a variable in the base and exponent functions.
Examples
julia> has_variable_in_formula(MLJ.report(sr_model).equations[1])BjetTLA.hist1d_ftag Method
hist1d_ftag(df; nth_jet=1, tagger=:GN1, fc=0.08, mjj_limits=(100, Inf), binedges=-5:0.1:15)::Hist1DMake a 1D histogram of the leading(nth_jet==1) or sub-leading(nth_jet==2) jet's b-tagging variables. See hist2d_ftag for more details.
BjetTLA.hist2d_ftag Method
hist2d_ftag(df; tagger=:GN1, fc=0.08, mjj_limits=(100, Inf), binedges=-5:0.1:15)::Hist2DMake a 2D histogram of the leading and sub-leading jet's b-tagging variables.
If tagger is set to :GN1, then the b-tagging variables are calculated using the BjetTLA.calc_db function. If tagger is set to :fastDIPS, then the b-tagging variables are taken directly from the DataFrame columns: Db1s and Db2s.
Assume a df DataFrame with the following columns:
GN1_pb1s,GN1_pc1s,GN1_pu1s: leading jet's b-tagging variablesGN1_pb2s,GN1_pc2s,GN1_pu2s: sub-leading jet's b-tagging variablesDb1s,Db2s: alternative b-tagging variables (if using :fastDIPS tagger)mjj12s: invariant mass of the two leading jetswgts: event weights
BjetTLA.hist_cut_eff Function
hist_cut_eff(hist, cut=0.93) -> Measurement{Float64}Calculate the efficiency of a cut on the histogram hist at the cut value. The numerator is the integral of the histogram to the right of the cut, and the denominator is the integral of the whole histogram.
Examples
julia> h1 = Hist1D(; bincounts=[100.0, 100.0], binedges=0:0.5:1, sumw2=[2.0, 3.0])
edges: [0.0, 0.5, 1.0]
bin counts: [100.0, 100.0]
total count: 200.0
julia> BjetTLA.hist_cut_eff(h1, 0.5)
0.5 ± 0.0056BjetTLA.hist_eff_divide Method
hist_eff_divide(hist1, hist2)::Hist1DExamples
julia> h1 = Hist1D(; bincounts=[100.0], binedges=0:1, sumw2=[3.0])
julia> h2 = Hist1D(; bincounts=[200.0], binedges=0:1, sumw2=[5.0])
julia> h3 = BjetTLA.hist_eff_divide(h1, h2)
edges: [0.0, 1.0]
bin counts: [0.5]
total count: 0.5
julia> binerrors(h3)
1-element Vector{Float64}:
0.005590169943749474BjetTLA.hist_to_components Method
hist_to_components(hist0) -> Tuple of Vector{Float64}returns:
xs
ys
σs
Where xs is bin centers. And ys is the bin counts. Finally, σs is the bin errors.
BjetTLA.idx_of_first_good_formula Method
idx_of_first_good_formula(sr_model; by=:chi2_and_complexity, chi2_cutoff=1.05 impro_threshold=0.95) -> IntReturn the index of the first (least complex) good formula in the model. A good formula is defined as a formula that has a variable in the base and exponent functions.
by can be one of [:chi2_and_complexity, :chi2_improvement, :min_complexity_has_variables].
:chi2_and_complexity: Find the simplest formula that is withinchi2_cutoff * minimal_chi2of all formulas.:chi2_improvement: Start with the simplest formula, if the next formula haschi2smaller thancurrent_chi2 * impro_threshold, pick it and repeat.:min_complexity_has_variables: Find the least complex formula that uses variables in the base (usesx) and exponent (useslog(x)) functions.
Examples
julia> typeof(sr_model)
Machine{SRRegressor{DynamicQuantities.SymbolicDimensions{DynamicQuantities.FixedRational{Int32, 25200}}, Nothing}, SRRegressor{DynamicQuantities.SymbolicDimensions{DynamicQuantities.FixedRational{Int32, 25200}}, Nothing}, true}
julia> BjetTLA.idx_of_first_good_formula(sr_model)
3BjetTLA.loop_one_dir Method
loop_one_dir(dir_path, loop_task; kwargs...)Given a directory and a loop task, loop over all files in this directory and combine the output from loop task.
For an example task, see main_task.
BjetTLA.loop_one_proc Method
loop_one_proc(proc, loop_task; ntuple_base_path="/data/jiling/TLA/eos_ntuples/", kwargs...)Given a process name and a loop task, loop over all files under this process and combine the output.
For an example task, see main_task.
BjetTLA.main_task Method
main_task(md, tree; fill_data=false, fc=0.085)::HistBundleMain task to be run in loop_one_proc and friends. The fc parameter is used for calculating GN1 b-jet discriminant.
If fill_data is set to true, it will fill the data vectors in the HistBundle for further analysis such as deriving fc and WPs as well as BDT training. See also dump_bdt_arrow.
BjetTLA.make_bdt_input_table Method
make_bdt_input_table(output::NamedTuple{Vectors...}, proc::Bool)::DataFrameTake a NamedTuple of Vectors (potentially output of main_task with fill_data=true, and a proc (by convention, true for signal and false for bkg) and return a dataframe with the following columns:
eventNumber
mjj12s
ph_EMFracs
ph_j2_ΔRs
wgts
Db{1,2,3}s
Db{1,2,3}_{70,77}s
proc
BjetTLA.nonoverlap_mask Method
nonoverlap_mask(photons, jets; cutoff=0.4)::Vector{Bool}Create a mask that can be used to get non-overlapping jets.
Example
mask = nonoverlap_mask(photons, jets)
clean_jets = @view jets[mask]BjetTLA.normalize_to Method
normalize_to(in_hist, ref_lumi)
normalize_to(in_hist, ref_hist::FHist.Hist1D)Normalize the input histogram to the reference luminosity. If the reference histogram is provided, normalize the input histogram to the integral of the reference histogram.
sourceBjetTLA.optim_fit Method
optim_fit(func::Callable, hist0::FHist.Hist1D, p0; algo::Symbol=:nm, sqrt_s=SQRT_S[])Fit a χ² fit using a func to a 1D histogram.
func: a function that takes two arguments,xandp, and returns a model value.hist0: a 1D histogram to fit.p0: initial parameters.
Note
The func should be defined with @., and the p0 should be a vector. See ATLAS_f_4p for an example.
Also, the x-axis (bincenters()) of the hist0 will be divided by sqrt_s to normalize the x-axis.
It returns the result of the optimization.
Example
julia> hist_mjj12_mask
edges: [130.0, 132.0, 134.0, 136.0, 138.0, 140.0, 142.0, 144.0, 146.0, 148.0 … 482.0, 484.0, 486.0, 488.0, 490.0, 492.0, 494.0, 496.0, 498.0, 500.0]
bin counts: [650.0, 603.0, 649.0, 566.0, 595.0, 593.0, 535.0, 487.0, 492.0, 466.0 … 2.0, 10.0, 13.0, 9.0, 13.0, 10.0, 10.0, 8.0, 15.0, 8.0]
total count: 22293.0
julia> sol4 = optim_fit(ATLAS_f_4p, hist_mjj12_mask, ones(4))
retcode: Success
u: 4-element Vector{Float64}:
0.0001086906699655396
91.12405771249554l
1.2345307604384157
-0.03547447344008578
julia> sol4.u
4-element Vector{Float64}:
0.0001086906699655396
91.12405771249554
1.2345307604384157
-0.03547447344008578BjetTLA.pass_filter_jets Method
pass_filter_jets(evt, hlt_pfjets, photons; min_pt=25.0, max_eta = 2.1, dR_cutoff=0.4, min_Jvt=-Inf, fc=0.085)Filter the HLT jets collection and related variables by doing the following:
require
pt > min_ptrequire
abs(eta) < max_etaremove jets that overlap with the photons collection by
dR < dR_cutoffrequire
jvt > min_Jvt(jvt = jet vertex tagger)
Finally, return the following as a NamedTuple so you can select one or more at call site:
clean_jets: the cleaned jets collectionclean_jets_btags: the cleaned jets b-tagging valuesclean_jets_emfracs: the cleaned jets EM fractionsclean_jets_jvts: the cleaned jets JVT valuesfc:in the GN1 Db calculation, see calc_db.
Note
all the returnred arrays are sorted by the jet pt in descending order.
BjetTLA.pass_filter_photons Method
clean_photons(hlt_photons); eta_cuts = true, min_pt = 20.0) -> subvector of photon lorentz vectorsClean photons from the HLT photon collection. The cleaning is done by removing photons that satisfy the any of the following conditions:
pt < min_ptabs(eta) > 2.371.37 < abs(eta) < 1.52
BjetTLA.plot_Db_fc_scan Method
plot_Db_fc_scan(hist_sig, hist_data; mjj_limits::Tuple, sig_title = "Z' (mass = 150GeV)")Given two input histograms (produced by fc_Db_cumulative_hist), this function plots the significance of the signal hist_sig over the background hist_data in a 2x2 grid of heatmaps.
It will also mark the optimal Db and f_c values on the significance 2D plane, the optimal points are calculated by best_Db_fc.
BjetTLA.plot_compare_Dbs_1D Method
plot_compare_Dbs_1D(df_data, df_150, df_200; nth_jet=1, GN1_fc=0.0085, binedges=-5:0.5:15, wps=nothing)BjetTLA.plot_cutflow Method
plot_cutflow(cutflow::Hist1D; cutnames=_cut_flow_names(), label="data 2023 nv9")Plot the cutflow histogram with the given label as the title.
BjetTLA.plot_fit Method
plot_fit(hist_data; fit_func = ATLAS_f_4p, sqrt_s)Plot the input histogram hist_data and the background prediction using the 4-parameter fit function (see bkg_prediction).
BjetTLA.plot_smooth_bkg_fit Method
Examples
hist_data_bdt_smooth_3p = BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_3p_xnorm, p0_4para; lb=nothing, ub=nothing, multistart, likelihood, algo)
hist_data_bdt_smooth_4p = BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_4p_xnorm, p0_4para; lb=nothing, ub=nothing, multistart, likelihood, algo)
hist_data_bdt_smooth_5p = BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_5p_xnorm, p0_5para; lb=nothing, ub=nothing, multistart, likelihood, algo)
hist_data_bdt_smooth_6p = @time BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_6p_xnorm, p0_6para; lb=nothing, ub=nothing, multistart, likelihood, algo);
plot_smooth_bkg_fit(hist_data_bdt, hist_data_bdt_smooth_3p, hist_data_bdt_smooth_4p, hist_data_bdt_smooth_5p, hist_data_bdt_smooth_6p;
scale_factor=1, cut_name="BDT > $BDT_CUT")BjetTLA.plot_stack_hist Function
plot_stack_hist(dict_histbundle::Dict{<:Any,<:HistBundle}, varname::Symbol, cutflow_idx=lastindex(_cut_flow_names()); mc_procs=MC_proc_labels, lumi, rebin, fig_kw=(;), axis_kw=(;), atlas_label_pos=(0, 1), offset=(30, -20),
frag_direct=(0.7316265625500193, 3.1254759924281035))
Generate a stacked histogram plot comparing Monte Carlo (MC) histograms, data histograms, and optional signal histograms.
dict_histbundle: A dictionary where keys are MC process names and values are their correspondingHistBundleobjects.varname: A symbol representing the variable to be plotted.cutflow_idx: An integer specifying the index of the cutflow step to use. Defaults to the last index of_cut_flow_names().mc_procs: A list of Monte Carlo process labels to include in the plot. Defaults toMC_proc_labels.lumi: The integrated luminosity to scale the MC histograms.rebin: An integer specifying the rebinning factor for the histograms. Defaults to1.fig_kw: A named tuple of keyword arguments for configuring the figure. Defaults to an empty tuple.axis_kw: A named tuple of keyword arguments for configuring the axis. Defaults to an empty tuple.atlas_label_pos: A tuple specifying the position of the ATLAS label. Seeadd_ATLAS_internal!.offset: A tuple specifying the offset for the ATLAS label. Defaults to(30, -20).frag_direct: A tuple of scale factors for the fragmentation and direct components of thegammajetprocess. Defaults to(0.7316265625500193, 3.1254759924281035).
Example
hists_serial_path = "/data/jiling/TLA/julia_jlds/2025_05_03.jlserialize"
dict_proc_histbundles = deserialize(hists_serial_path);
plot_stack_hist(dict_proc_histbundles, :jet1_eta, 3; rebin=1; lumi=run3_partial_lumi, fig_kw=(; size=(1000, 600)),
axis_kw=(; xlabel="ph1 eta", yscale=Makie.pseudolog10, yticks=10 .^ (0:7), limits=(-3, 3, 1, 10^7), ),
frag_direct = (0.7316265625500193, 3.1254759924281035),
)BjetTLA.plot_staircase_scan Method
plot_staircase_scan(hist_sig, hist_data; colorscale=identity, rebin=1, wps=[1.0, 2.8, 5.8])This function generates a staircase plot comparing signal and data histograms. It overlays heatmaps of the histograms and highlights working points (WPs) with vertical and horizontal lines.
Arguments
hist_sig: The signal histogram to be plotted.hist_data: The data histogram to be plotted.colorscale: The color scale to be applied to the heatmaps.rebin: The rebinning factor for the histograms. Defaults to1.wps: A list of working points to be highlighted on the plot. Defaults to[1.0, 2.8, 5.8].
BjetTLA.poisson_fluctuate Method
poisson_fluctuate(hist::Hist1D) -> Hist1DFluctuate the bin counts of a histogram with Poisson distribution. Returns a new histogram with fluctuated bin counts, and the sum of weights squared is preserved.
Example
julia> poisson_fluctuate(Hist1D(; binedges=0:3, bincounts=[10, 20, 30]))
edges: [0.0, 1.0, 2.0, 3.0]
bin counts: [9.0, 18.0, 28.0]
total count: 55.0BjetTLA.proc_dirs Method
proc_dirs(proc; ntuple_base_path="/data/jiling/TLA/eos_ntuples/")::Vector{String}Given a process name, return a list of directories containing the ntuple files.
sourceBjetTLA.pyhf_simple_significance Method
pyhf_simple_significance(sig_hist, data_hist; sqrt_s, fit_func=nothing)Calculate the upper limit on the signal strength using the pyhf package. The function returns the upper limit on the signal strength at 95% confidence level.
Specifically, the function takes the signal histogram sig_hist and the data histogram data_hist. The sqrt_s parameter is used to normalize the x-axis of the input histograms. (mjj/sqrt_s), and the fit_funcparameter is used to predict the background. If thefit_func is not provided, the function uses the data histogram as the background prediction – a reverse Asimov fit.
This function first builds a uncorrelated_background() model using pyhf and then uses the upper_limit() function to calculate the upper limit on the signal strength.
Example
julia> using BjetTLA, FHist
julia> h1 = Hist1D(;binedges=0:2, bincounts=[12.0, 11.0]);
julia> h2 = Hist1D(;binedges=0:2, bincounts=[50.0, 52.0], sumw2=[9, 49]);
julia> pyhf_simple_significance(h1, h2; sqrt_s=136000)
Python:
(array(1.22313872), [array(0.65879787), array(0.8805031), array(1.22313872), array(1.7063601),…
1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8,
3.9, 4. ]), [(array(1.), [array(1.), array(1.), array(1.), array(1.), array(1.)]), (arr…Note
The normalization of the input histograms need to be correct (i.e. the bins should predict actual number of events expected).
BjetTLA.safeLogRatio Method
safeLogRatio(num,den)Safe log computation, to emulate athena implementation:
If
iszero(den), return15.0If
iszero(num), return-10.0otherwise, return
log(num/den)as expected
BjetTLA.scan_fluctuate Method
scan_fluctuate(hist_smooth, SplusB::CombinedPDF; N=1, npara=4) -> Vector{Tuple{Float64, Float64}}Throw N Poisson-fluctuated histograms (poisson_fluctuate) from hist_smooth and fit them with SplusB. Return a vector of tuples of the fitted signal strength and the χ²/ndf.
As noted in CombinedPDF, the signal strength is expected to be the last parameter.
BjetTLA.skip_JZ_event Function
pass_JZ_cleaning(j_pt, jz, eps=0.2)::BoolApply cleaning filter for JZ slices, checks if the average pT of the two leading jets is in the expected range for slice jz, within a eps tolerance.
BjetTLA.split_by_dsid Method
split_by_dsid(strings::Vector{String})Splits a vector of strings into a dictionary based on the dsid, a sequence of six digits preceeded and followed by ".". If that criteria fails, looks for a run number instead (sequence of eight digits).
sourceBjetTLA.staircase_inbound Method
staircase_inbound(x,y,i,j,k)::BoolAssume x,y are bin centers, and i,j,k are on binedges.
BjetTLA.staircase_scan Method
staircase_scan(hist2d, hist2d_bkg; max=8.0)Find the best staircase cut in a 2D histogram.
hist2d: Signal 2D histogramhist2d_bkg: background histogrammax: maximum value for the D_b cut
BjetTLA.staircase_sig Method
staircase_sig(hist2d, hist2d_bkg, centers, i, j, k)::Float64Calculate the significance of a 2D histogram using the staircase method. i < j < k
sourceBjetTLA.xg_cross_train Method
xg_cross_train(df_all; num_round=90, max_depth=7, η=0.8, scale_pos_weight=200, input_names=BDT_input_names, gamma=0, alpha=0, lambda=1, Nfolds=5, lumi=30e3)Take a DataFrame (potentially loaded from the output of dump_bdt_arrow) and return Nfolds The dataset is partitioned into Nfolds using mod1.(df_all.eventNumber, Nfolds).
Examples
julia> Random.seed!(654321)
julia> df_all = shuffle(DataFrame(Arrow.Table("/data/jiling/TLA/julia_arrows/2024_11_04_data2023.feather")));
julia> df_all.proc = categorical(df_all.proc);
julia> bdt_machines = xg_cross_train(df_all);BjetTLA.@cutflow Macro
@cutflow counter hist name_list name wgtConvinient macro to generate cutflow histogram filling code. It will increment the counter by 1, check if the name_list[counter] is equal to expected name, and then push the counter to the hist with the given wgt.
BjetTLA.@cutflow counter hist name_list name wgtis equivalent to:
counter += 1
@assert name_list[counter] == name
atomic_push!(hist, counter, wgt)BjetTLA.@fourmomenta Macro
@fourmomenta evt obj_prefixCreates the fourmomenta for a given object type in the event. Returns a StructArray of LorentzVectorCyl.
It assumes that the event is a structure containig all the projections of the 4-vectors: evt.obj_pt, evt.obj_eta, evt.obj_phi, evt.obj_m (or evt.obj_E for jets).
Concretely,
@fourmomenta evt phwould expand to:
StructArray(LorentzVectorCyl.(
evt.ph_pt,
evt.ph_eta,
evt.ph_phi,
evt.ph_m
))For jets, it will call fromPtEtaPhiE and the return type stays the same.
Example
julia> for evt in tree
hlt_jets = @fourmomenta evt jet
count(>(25), hlt_jets.pt) # count how many jets have pt > 25 GeV
...
end