Internal
BjetTLA.BDT_input_names Constant
The names of input variables used in BDT training, these are a subset of what make_bdt_input_table
can produce.
BjetTLA.SQRT_S Constant
SQRT_S = Ref(13600)
The center of mass energy of the LHC in GeV. For Run 3 it's 13.6 TeV.
sourceBjetTLA.CombinedPDF Type
struct CombinedPDF{F, D}
using Makie: Combined func::F dG:😄 end
A callable type that takes a background function and parameter for a gaussian signal.
When it's called, it has the signature of (x, p)
, where x
is the x-axis value and p
is a vector of parameters. Note that x
should be normalized by SQRT_S
(default 13600 GeV) already. As for the p
, the first npara
elements are for the background function, and the last one is for the signal normalization.
Example
julia> cc = CombinedPDF(ATLAS_f_4p, 192, 192*0.12)
CombinedPDF{typeof(ATLAS_f_4p)}(BjetTLA.ATLAS_f_4p, 192.0, 23.04)
# evaluated at 200 GeV
julia> cc(200 / 13600, rand(5))
0.23111286768842174
BjetTLA.HistBundle Type
@kwdef struct HistBundle{N,HT}
histograms::NTuple{N,HT}
cutflow_names::NTuple{N,String}
cutflow_hist::Hist1D{Float64} = Hist1D(; binedges=0:length(cutflow_names))
end
HistBundle(f::Function, cutflow_names)
A merge
able structure that keeps track of histograms and a single cutflow histogram. If you have N
histogram variables and M
cutflow bins, the structure will carry around N*M + 1
histograms.
If f::Function
is provided, it will be called N
times to initialize the histograms. Where N
is length of cutflow_names
.
BjetTLA.ATLAS_f_5p Method
@. ATLAS_f_5p(x, p) = p[1]*(1-x)^p[2] * x^(p[3] + p[4]*log(x) + p[5]*log(x)^2)
BjetTLA.ATLAS_f_6p Method
@. ATLAS_f_6p(x, p) = p[1]*(1-x)^p[2] * x^(p[3] + p[4]*log(x) + p[5]*log(x)^2 + p[6]*log(x)^3)
BjetTLA.EMFrac_study Method
EMFrac_study(md, df; photon_ID=true, electron_only=true, check_charge_sum=true)
A task to be used with loop_one_dir
and friends for calculating the EMFraction cut scale factor and uncertainty.
Returns:
(;
mlly_mll = Hist2D(; binedges = (50:2:150, 15:2:100)),
EMFrac_grea35 = Hist1D(; binedges = 0:0.02:1.1),
EMFrac_2535 = Hist1D(; binedges = 0:0.02:1.1),
EMFrac_2025 = Hist1D(; binedges = 0:0.02:1.1),
pT_pass = Hist1D(; binedges = emfrac_pt_binedges, overflow=true),
pT_fail = Hist1D(; binedges = emfrac_pt_binedges, overflow=true),
EMFrac_pT = Hist2D(; binedges = (0:0.02:1.1, 15:2:150), overflow=false),
pT = Hist1D(; binedges = (0:2:100), overflow=true),
)
Examples
julia> df_ee_bkgs_path = [
"/data/jiling/TLA/eos_ntuples/RadiativeZ/v4/700786.root",
"/data/jiling/TLA/eos_ntuples/RadiativeZ/v4/700787.root",
"/data/jiling/TLA/eos_ntuples/RadiativeZ/v4/700788.root",
];
julia> hist_mc_eejets = mapreduce(+, df_ee_bkgs_path) do path
BjetTLA.loop_one_file(path, EMFrac_study; tree_name="TreeAlgo_noCalib/nominal").EMFrac_grea35
end;
julia> hists_data = BjetTLA.loop_one_file("/data/jiling/TLA/eos_ntuples/RadiativeZ/data_egam3_noVGammaOR/alldata.99999999.run3_merged.root", EMFrac_study);
julia> sf35 = BjetTLA.hist_cut_eff(hists_data.EMFrac_grea35) / BjetTLA.hist_cut_eff(hists_mc.EMFrac_grea35);
1.043 ± 0.015
BjetTLA.ExampleTask Method
ExampleTask(md,evt)
This is an example task, that can be run in loop_one_proc
to process the data. It is a simple task that will fill a histogram with the pT of the leading jet in the event, if there are at least 2 good jets and 1 good photon.
BjetTLA.F_test Method
F_test(chi2_nom, chi2_alt, n_nom, n_alt, nbins) -> Float64
Calculate the F-test score for two models.
chi2_nom
: chi² of the nominal modelchi2_alt
: chi² of the alternative modeln_nom
: number of parameters of the nominal modeln_alt
: number of parameters of the alternative modelnbins
: number of bins in the histogram
Examples
julia> BjetTLA.F_test(42, 40, 5, 6, 200)
0.0026311523165153876
::: tip Note
```python
>>> import ROOT
>>> ROOT.Math.fdistribution_cdf_c(0.2, 1, 20)
0.6595265132032726
is equivalents to using Distributions; ccdf(FDist(1, 20), 0.2)
in Julia
:::
sourceBjetTLA.F_test_on_sr Method
F_test_on_sr(sr_model, nbins) -> Vector{Float64}
Assuming report(sr_model).losses
are χ² values of the models, calculate the F-test score for each formula with respect to the next complex one.
The order of the output corresponds to the order of the models in the sr_model
, you can use findfirst(>(0.05), output)
to find the first formula you should retain.
Note
Due to the nature of Symbolic Regression, instead of counting # of parameters, we use the "complexity" of the formula as proxy. This avoids the problem of "same # of parameters" which causes F-test to fail, for example, consider: x^(b)
and x^(b*x)
, technically they both have just 1 parameter, but clearly the latter is more complex.
See also: F_test
.
Examples
julia> sr_model = BjetTLA.fit_sr_model(hist_data_bdt);
julia> BjetTLA.F_test_on_sr(sr_model, nbins(hist_data_bdt))
9-element Vector{Float64}:
1.5007930525231936e-30
1.994665149876051e-7
0.006603081588013456
0.9849216334868175
0.9968279815401603
0.9998969027366611
0.9999999919476441
0.9999991299279695
0.9999666474343953
BjetTLA.NLL_sr_loss Method
NLL_sr_loss(ex, dataset::Dataset{T,L}, options)
Negative loglikelihood loss function for SRRegressor
.
BjetTLA.add_ATLAS_internal! Method
add_ATLAS_internal!(ax; pos=(0, 1), revision="Internal", lumi=run3_partial_lumi, offset=(30, -20))
Add the ATLAS label and additional information to a given axis.
Arguments
ax
: The Makie axis to which the label will be added.pos
: A tuple specifying the position of the label in relative coordinates. Defaults to(0, 1)
(top left).revision
: A string indicating the revision type (e.g., "Internal", "Preliminary"). Defaults to"Internal"
.lumi
: The integrated luminosity to display in the label. Defaults torun3_partial_lumi
.offset
: A tuple specifying the offset of the label in pixels. Defaults to(30, -20)
.
Example
fig, ax = Figure(), Axis()
add_ATLAS_internal!(ax; pos=(0.1, 0.9), revision="Preliminary", lumi=139e3)
BjetTLA.add_mlj_score_col! Method
add_mlj_score_col!(df, mlj_machines; input_names=BDT_input_names, force=true, branch_name=:mlj_scores)
Take a DataFrame df
and a vector of MLJ machines, then add a column branch_name
to df
with the predictions of the MLJ machines.
BjetTLA.best_Db_fc Method
best_Db_fc(hist_sig, hist_data)::(;Db, fc)
Find the best Db
and fc
values for a given signal histogram and data histogram. The histograms should be output of fc_Db_cumulative_hist
.
BjetTLA.bkg_prediction Method
bkg_prediction(hist_data; fit_func = ATLAS_f_4p, sqrt_s) -> Vector{Float64}
Predict the background using the 4-parameter fit function (see ATLAS_f_4p
. The function returns the background prediction normalized to the integral of the input histogram hist_data
. The sqrt_s
parameter is used to normalize the x-axis of the input histogram. (mjj/sqrt_s)
BjetTLA.calc_db Method
calc_db(b::Number, c::Number, u::Number; fc=0.018) -> Float64
calc_db(b::Vector, c::Vector, u::Vector; fc=0.018) -> Vector{Float64}
Compute the b-tagging discriminants, see https://cds.cern.ch/record/2862499 for more detail
The output is:
Note
The log operation usese safeLogRatio
to prevent producing Inf
and NaN
.
BjetTLA.chi2 Method
chi2(os, cs, σs) -> Float64
Calculate chi2 given three arrays of data points, model points and errors.
os
: observed data pointscs
: calculated model pointsσs
: errors of the observed data points
BjetTLA.chi2_sr_functor Method
chi2_sr_functor(errs)::Callable(ex, dataset, options)
Returns a callable to be used as a loss function for SRRegressor
.
BjetTLA.closest_lly_pair Function
closest_lly_pair(leps, photons, target=91.2) -> (i1, i2, ip)
Find the indicies that give lepton pair + photon three-body invariant mass closest to the target
mass.
Return the indices of the lepton pair (i1
, i2
) and the photon (ip
).
BjetTLA.df_from_label Method
df_from_label(label::AbstractString; base_path="/data/jiling/TLA/julia_arrows/")
df_from_label(labels::AbstractVector{<:AbstractString}; base_path="/data/jiling/TLA/julia_arrows/")
Load feather files from a label, which is a substring of the file name. See also dump_bdt_arrow
, which is used to produce these feather files.
BjetTLA.dump_bdt_arrow Method
dump_bdt_arrow(proc_name; ntuple_base_path = "/data/jiling/TLA/eos_ntuples/", output_base_path = "/data/jiling/TLA/julia_arrows/", min_mjj=100)
Read ntuples from ntuple_base_path
and write out feather files to output_base_path
for each file that matches proc_name
. It runs the files through main_task
with fill_data=true
and then uses make_bdt_input_table
to clean up and prepare the table before writing out the arrow file.
BjetTLA.dump_stacked_plots Method
dump_stacked_plots(dict_proc; output_dir, mc_procs = MC_proc_labels, lumi=run3_partial_lumi, fig_kw=(; size=(1000, 600))
)
Dump cutflow plots (plot_cutflow
) and stacked histogram plots (plot_stack_hist
) to target directory.
Examples
hists_serial_path = "/data/jiling/TLA/julia_jlds/2025_05_03.jlserialize"
dict_proc_histbundles = deserialize(hists_serial_path);
BjetTLA.dump_stacked_plots(dict_proc_histbundles; output_dir="/home/jiling/BjetTLA/plots/2025_05_06")
BjetTLA.extract_dsid Method
extract_dsid(path::String)::String
Extract the DSID from the path of a file. It works by using a regular expression to match the DSID (\d{6}
) in the path.
If no DSID can be found from path, it returns 99999999
, which resembles a "data" run.
BjetTLA.fc_Db_cumulative_hist Method
fc_Db_cumulative_hist(df; Db_range=-4:0.2:7, fc_range=0:0.01:0.1, mjj_limits=(80, Inf))::Hist2D
Make a 2D histogram where:
x-axis is the
min(Db1, Db2)
reverse cumulative across cellsy-axis is the
fc
(charme fraction) setting
Expects df
to have the following columns:
GN1_pb1s
,GN1_pc1s
,GN1_pu1s
: leading jet's b-tagging variablesGN1_pb2s
,GN1_pc2s
,GN1_pu2s
: sub-leading jet's b-tagging variablesmjj12s
: invariant mass of the two leading jetswgts
: event weights
Note
For example, if a bin centered at (min_Db = 1.5, fc=0.04)
has 100 events, then it means there are 100 events with min(Db1, Db2) >= 1.5
when using fc = 0.04
.
BjetTLA.feather_to_hist Method
feather_to_hist(df, mlj_machines; var=:mjj12s, binedges = 0:5:600, mlj_cut=0.22)::Hist1D
Take a DataFrame df
(likely from make_bdt_input_table
), and a vector of MLJ machines, then make a histogram of variable var
after selecting for events passing the mlj_cut
after evaluating models on each event.
Each machine (ith) will be evaluated on events with eventNumber_modN .== i
.See also xg_cross_train
.
Note
Each machine will be evaluated on features defined in BDT_input_names
as: pdf.(predict(MACH, df[!, BjetTLA.BDT_input_names]), true)
BjetTLA.feather_to_hist Method
feather_to_hist(df; var=:mjj12s, binedges = 0:5:600, Db_cut=BjetTLA.Db_77)::Hist1D
Take a DataFrame df
that has the same schema returned by make_bdt_input_table
, then make a histogram of variable var
after selecting for df.GN1_Db1s > Db_cut && df.GN1_Db2s > Db_cut
.
BjetTLA.fit_sr_model Method
fit_sr_model(xs, ys, errs; kw...) -> MLJ.Machine{SRRegressor ...}
BjetTLA.fit_sr_model Method
fit_sr_model(hist::Hist1D; kw...) -> MLJ.Machine{SRRegressor ...}
Fit a symbolic regression model to a histogram. Return the fitted machine.
sourceBjetTLA.fitted_smooth_hist Method
fitted_smooth_hist(hist, func::T=ATLAS_f_4p, p0=0.1*ones(4); algo=:nm) -> NamedTuple(; sol = ..., hist= ...)
Fit a histogram with a function and return a new histogram with the fitted bin counts.
sourceBjetTLA.get_file_list Method
get_file_list(proc_name; ntuple_base_path = "/data/jiling/TLA/eos_ntuples/")
BjetTLA.get_fill_wgt Method
get_fill_wgt(evt, pmgWeight, sumWeight;)::Float64
Get the fill weight for the event, if the evt
has the mcEventWeight
property, the weight is calculated as evt.mcEventWeight * (pmgWeight / sumWeight)
, otherwise the weight is 1.0
for real data.
BjetTLA.get_pmg_weight Method
get_pmg_weight(dsid::Union{Integer, AbstractString};
md_path = "/cvmfs/atlas.cern.ch/repo/sw/database/GroupData/dev/PMGTools/PMGxsecDB_mc23.txt")
Get the PMG weight from cvmfs database for a given dsid: returns crossSection_pb * genFiltEff * kFactor
If the md_path
is not a file, it will fall back to joinpath(@__DIR__, "../data/PMGxsecDB_mc23.txt")
Example
julia> signal_dsid = 510392
510392
julia> BjetTLA.get_pmg_weight(signal_dsid)
0.7831930217100002
julia> BjetTLA.get_pmg_weight("510392")
0.7831930217100002
BjetTLA.get_shortname Method
get_shortname(dsid, map)
Convert dsid to a short name, following the naming map.
sourceBjetTLA.has_variable_in_formula Method
has_variable_in_formula(equation) -> Bool
Check if a formula has a variable in the base and exponent functions.
Examples
julia> has_variable_in_formula(MLJ.report(sr_model).equations[1])
BjetTLA.hist1d_ftag Method
hist1d_ftag(df; nth_jet=1, tagger=:GN1, fc=0.08, mjj_limits=(100, Inf), binedges=-5:0.1:15)::Hist1D
Make a 1D histogram of the leading(nth_jet==1
) or sub-leading(nth_jet==2
) jet's b-tagging variables. See hist2d_ftag
for more details.
BjetTLA.hist2d_ftag Method
hist2d_ftag(df; tagger=:GN1, fc=0.08, mjj_limits=(100, Inf), binedges=-5:0.1:15)::Hist2D
Make a 2D histogram of the leading and sub-leading jet's b-tagging variables.
If tagger
is set to :GN1
, then the b-tagging variables are calculated using the BjetTLA.calc_db
function. If tagger
is set to :fastDIPS
, then the b-tagging variables are taken directly from the DataFrame columns: Db1s
and Db2s
.
Assume a df
DataFrame with the following columns:
GN1_pb1s
,GN1_pc1s
,GN1_pu1s
: leading jet's b-tagging variablesGN1_pb2s
,GN1_pc2s
,GN1_pu2s
: sub-leading jet's b-tagging variablesDb1s
,Db2s
: alternative b-tagging variables (if using :fastDIPS tagger)mjj12s
: invariant mass of the two leading jetswgts
: event weights
BjetTLA.hist_cut_eff Function
hist_cut_eff(hist, cut=0.93) -> Measurement{Float64}
Calculate the efficiency of a cut on the histogram hist
at the cut
value. The numerator is the integral of the histogram to the right of the cut, and the denominator is the integral of the whole histogram.
Examples
julia> h1 = Hist1D(; bincounts=[100.0, 100.0], binedges=0:0.5:1, sumw2=[2.0, 3.0])
edges: [0.0, 0.5, 1.0]
bin counts: [100.0, 100.0]
total count: 200.0
julia> BjetTLA.hist_cut_eff(h1, 0.5)
0.5 ± 0.0056
BjetTLA.hist_eff_divide Method
hist_eff_divide(hist1, hist2)::Hist1D
Examples
julia> h1 = Hist1D(; bincounts=[100.0], binedges=0:1, sumw2=[3.0])
julia> h2 = Hist1D(; bincounts=[200.0], binedges=0:1, sumw2=[5.0])
julia> h3 = BjetTLA.hist_eff_divide(h1, h2)
edges: [0.0, 1.0]
bin counts: [0.5]
total count: 0.5
julia> binerrors(h3)
1-element Vector{Float64}:
0.005590169943749474
BjetTLA.hist_to_components Method
hist_to_components(hist0) -> Tuple of Vector{Float64}
returns:
xs
ys
σs
Where xs
is bin centers. And ys
is the bin counts. Finally, σs
is the bin errors.
BjetTLA.idx_of_first_good_formula Method
idx_of_first_good_formula(sr_model; by=:chi2_and_complexity, chi2_cutoff=1.05 impro_threshold=0.95) -> Int
Return the index of the first (least complex) good formula in the model. A good formula is defined as a formula that has a variable in the base and exponent functions.
by
can be one of [:chi2_and_complexity, :chi2_improvement, :min_complexity_has_variables]
.
:chi2_and_complexity
: Find the simplest formula that is withinchi2_cutoff * minimal_chi2
of all formulas.:chi2_improvement
: Start with the simplest formula, if the next formula haschi2
smaller thancurrent_chi2 * impro_threshold
, pick it and repeat.:min_complexity_has_variables
: Find the least complex formula that uses variables in the base (usesx
) and exponent (useslog(x)
) functions.
Examples
julia> typeof(sr_model)
Machine{SRRegressor{DynamicQuantities.SymbolicDimensions{DynamicQuantities.FixedRational{Int32, 25200}}, Nothing}, SRRegressor{DynamicQuantities.SymbolicDimensions{DynamicQuantities.FixedRational{Int32, 25200}}, Nothing}, true}
julia> BjetTLA.idx_of_first_good_formula(sr_model)
3
BjetTLA.loop_one_dir Method
loop_one_dir(dir_path, loop_task; kwargs...)
Given a directory and a loop task, loop over all files in this directory and combine the output from loop task.
For an example task, see main_task
.
BjetTLA.loop_one_proc Method
loop_one_proc(proc, loop_task; ntuple_base_path="/data/jiling/TLA/eos_ntuples/", kwargs...)
Given a process name and a loop task, loop over all files under this process and combine the output.
For an example task, see main_task
.
BjetTLA.main_task Method
main_task(md, tree; fill_data=false, fc=0.085)::HistBundle
Main task to be run in loop_one_proc
and friends. The fc
parameter is used for calculating GN1 b-jet discriminant.
If fill_data
is set to true
, it will fill the data vectors in the HistBundle
for further analysis such as deriving fc and WPs as well as BDT training. See also dump_bdt_arrow
.
BjetTLA.make_bdt_input_table Method
make_bdt_input_table(output::NamedTuple{Vectors...}, proc::Bool)::DataFrame
Take a NamedTuple of Vectors (potentially output of main_task
with fill_data=true
, and a proc
(by convention, true
for signal and false
for bkg) and return a dataframe with the following columns:
eventNumber
ystars
mjj12s
ph_EMFracs
ph_j2_ΔRs
wgts
Db{1,2,3}s
Db{1,2,3}_{70,77}s
proc
BjetTLA.nonoverlap_mask Method
nonoverlap_mask(photons, jets; cutoff=0.4)::Vector{Bool}
Create a mask that can be used to get non-overlapping jets.
Example
mask = nonoverlap_mask(photons, jets)
clean_jets = @view jets[mask]
BjetTLA.normalize_to Method
normalize_to(in_hist, ref_lumi)
normalize_to(in_hist, ref_hist::FHist.Hist1D)
Normalize the input histogram to the reference luminosity. If the reference histogram is provided, normalize the input histogram to the integral of the reference histogram.
sourceBjetTLA.optim_fit Method
optim_fit(func::Callable, hist0::FHist.Hist1D, p0; algo::Symbol=:nm, sqrt_s=SQRT_S[])
Fit a χ² fit using a func
to a 1D histogram.
func
: a function that takes two arguments,x
andp
, and returns a model value.hist0
: a 1D histogram to fit.p0
: initial parameters.
Note
The func
should be defined with @.
, and the p0
should be a vector. See ATLAS_f_4p
for an example.
Also, the x-axis (bincenters()
) of the hist0
will be divided by sqrt_s
to normalize the x-axis.
It returns the result of the optimization.
Example
julia> hist_mjj12_mask
edges: [130.0, 132.0, 134.0, 136.0, 138.0, 140.0, 142.0, 144.0, 146.0, 148.0 … 482.0, 484.0, 486.0, 488.0, 490.0, 492.0, 494.0, 496.0, 498.0, 500.0]
bin counts: [650.0, 603.0, 649.0, 566.0, 595.0, 593.0, 535.0, 487.0, 492.0, 466.0 … 2.0, 10.0, 13.0, 9.0, 13.0, 10.0, 10.0, 8.0, 15.0, 8.0]
total count: 22293.0
julia> sol4 = optim_fit(ATLAS_f_4p, hist_mjj12_mask, ones(4))
retcode: Success
u: 4-element Vector{Float64}:
0.0001086906699655396
91.12405771249554l
1.2345307604384157
-0.03547447344008578
julia> sol4.u
4-element Vector{Float64}:
0.0001086906699655396
91.12405771249554
1.2345307604384157
-0.03547447344008578
BjetTLA.pass_filter_jets Method
pass_filter_jets(evt, hlt_pfjets, photons; min_pt=25.0, max_eta = 2.1, dR_cutoff=0.4, min_Jvt=-Inf, fc=0.085)
Filter the HLT jets collection and related variables by doing the following:
require
pt > min_pt
require
abs(eta) < max_eta
remove jets that overlap with the photons collection by
dR < dR_cutoff
require
jvt > min_Jvt
(jvt = jet vertex tagger)
Finally, return the following as a NamedTuple
so you can select one or more at call site:
clean_jets
: the cleaned jets collectionclean_jets_btags
: the cleaned jets b-tagging valuesclean_jets_emfracs
: the cleaned jets EM fractionsclean_jets_jvts
: the cleaned jets JVT valuesfc
:in the GN1 Db calculation, see calc_db
.
Note
all the returnred arrays are sorted by the jet pt
in descending order.
BjetTLA.pass_filter_photons Method
clean_photons(hlt_photons); eta_cuts = true, min_pt = 20.0) -> subvector of photon lorentz vectors
Clean photons from the HLT photon collection. The cleaning is done by removing photons that satisfy the any of the following conditions:
pt < min_pt
abs(eta) > 2.37
1.37 < abs(eta) < 1.52
BjetTLA.plot_Db_fc_scan Method
plot_Db_fc_scan(hist_sig, hist_data; mjj_limits::Tuple, sig_title = "Z' (mass = 150GeV)")
Given two input histograms (produced by fc_Db_cumulative_hist
), this function plots the significance of the signal hist_sig
over the background hist_data
in a 2x2 grid of heatmaps.
It will also mark the optimal Db
and f_c
values on the significance 2D plane, the optimal points are calculated by best_Db_fc
.
BjetTLA.plot_compare_Dbs_1D Method
plot_compare_Dbs_1D(df_data, df_150, df_200; nth_jet=1, GN1_fc=0.0085, binedges=-5:0.5:15, wps=nothing)
BjetTLA.plot_cutflow Method
plot_cutflow(cutflow::Hist1D; cutnames=_cut_flow_names(), label="data 2023 nv8")
Plot the cutflow histogram with the given label
as the title.
BjetTLA.plot_fit Method
plot_fit(hist_data; fit_func = ATLAS_f_4p, sqrt_s)
Plot the input histogram hist_data
and the background prediction using the 4-parameter fit function (see bkg_prediction
).
BjetTLA.plot_smooth_bkg_fit Method
Examples
hist_data_bdt_smooth_3p = BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_3p_xnorm, p0_4para; lb=nothing, ub=nothing, multistart, likelihood, algo)
hist_data_bdt_smooth_4p = BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_4p_xnorm, p0_4para; lb=nothing, ub=nothing, multistart, likelihood, algo)
hist_data_bdt_smooth_5p = BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_5p_xnorm, p0_5para; lb=nothing, ub=nothing, multistart, likelihood, algo)
hist_data_bdt_smooth_6p = @time BjetTLA.fitted_smooth_hist(hist_data_bdt, BjetTLA.ATLAS_f_6p_xnorm, p0_6para; lb=nothing, ub=nothing, multistart, likelihood, algo);
plot_smooth_bkg_fit(hist_data_bdt, hist_data_bdt_smooth_3p, hist_data_bdt_smooth_4p, hist_data_bdt_smooth_5p, hist_data_bdt_smooth_6p;
scale_factor=1, cut_name="BDT > $BDT_CUT")
BjetTLA.plot_stack_hist Function
plot_stack_hist(dict_histbundle::Dict{<:Any,<:HistBundle}, varname::Symbol, cutflow_idx=lastindex(_cut_flow_names()); mc_procs=MC_proc_labels, lumi, rebin, fig_kw=(;), axis_kw=(;), atlas_label_pos=(0, 1), offset=(30, -20),
frag_direct=(0.7316265625500193, 3.1254759924281035)
)
Generate a stacked histogram plot comparing Monte Carlo (MC) histograms, data histograms, and optional signal histograms.
dict_histbundle
: A dictionary where keys are MC process names and values are their correspondingHistBundle
objects.varname
: A symbol representing the variable to be plotted.cutflow_idx
: An integer specifying the index of the cutflow step to use. Defaults to the last index of_cut_flow_names()
.mc_procs
: A list of Monte Carlo process labels to include in the plot. Defaults toMC_proc_labels
.lumi
: The integrated luminosity to scale the MC histograms.rebin
: An integer specifying the rebinning factor for the histograms. Defaults to1
.fig_kw
: A named tuple of keyword arguments for configuring the figure. Defaults to an empty tuple.axis_kw
: A named tuple of keyword arguments for configuring the axis. Defaults to an empty tuple.atlas_label_pos
: A tuple specifying the position of the ATLAS label. Seeadd_ATLAS_internal!
.offset
: A tuple specifying the offset for the ATLAS label. Defaults to(30, -20)
.frag_direct
: A tuple of scale factors for the fragmentation and direct components of thegammajet
process. Defaults to(0.7316265625500193, 3.1254759924281035)
.
Example
hists_serial_path = "/data/jiling/TLA/julia_jlds/2025_05_03.jlserialize"
dict_proc_histbundles = deserialize(hists_serial_path);
plot_stack_hist(dict_proc_histbundles, :jet1_eta, 3; rebin=1; lumi=run3_partial_lumi, fig_kw=(; size=(1000, 600)),
axis_kw=(; xlabel="ph1 eta", yscale=Makie.pseudolog10, yticks=10 .^ (0:7), limits=(-3, 3, 1, 10^7), ),
frag_direct = (0.7316265625500193, 3.1254759924281035),
)
BjetTLA.plot_staircase_scan Method
plot_staircase_scan(hist_sig, hist_data; colorscale=identity, rebin=1, wps=[1.0, 2.8, 5.8])
This function generates a staircase plot comparing signal and data histograms. It overlays heatmaps of the histograms and highlights working points (WPs) with vertical and horizontal lines.
Arguments
hist_sig
: The signal histogram to be plotted.hist_data
: The data histogram to be plotted.colorscale
: The color scale to be applied to the heatmaps.rebin
: The rebinning factor for the histograms. Defaults to1
.wps
: A list of working points to be highlighted on the plot. Defaults to[1.0, 2.8, 5.8]
.
BjetTLA.poisson_fluctuate Method
poisson_fluctuate(hist::Hist1D) -> Hist1D
Fluctuate the bin counts of a histogram with Poisson distribution. Returns a new histogram with fluctuated bin counts, and the sum of weights squared is preserved.
Example
julia> poisson_fluctuate(Hist1D(; binedges=0:3, bincounts=[10, 20, 30]))
edges: [0.0, 1.0, 2.0, 3.0]
bin counts: [9.0, 18.0, 28.0]
total count: 55.0
BjetTLA.proc_dirs Method
proc_dirs(proc; ntuple_base_path="/data/jiling/TLA/eos_ntuples/")::Vector{String}
Given a process name, return a list of directories containing the ntuple files.
sourceBjetTLA.pyhf_simple_significance Method
pyhf_simple_significance(sig_hist, data_hist; sqrt_s, fit_func=nothing)
Calculate the upper limit on the signal strength using the pyhf
package. The function returns the upper limit on the signal strength at 95% confidence level.
Specifically, the function takes the signal histogram sig_hist
and the data histogram data_hist
. The sqrt_s
parameter is used to normalize the x-axis of the input histograms. (mjj/sqrt_s), and the fit_func
parameter is used to predict the background. If thefit_func
is not provided, the function uses the data histogram as the background prediction – a reverse Asimov fit.
This function first builds a uncorrelated_background()
model using pyhf
and then uses the upper_limit()
function to calculate the upper limit on the signal strength.
Example
julia> using BjetTLA, FHist
julia> h1 = Hist1D(;binedges=0:2, bincounts=[12.0, 11.0]);
julia> h2 = Hist1D(;binedges=0:2, bincounts=[50.0, 52.0], sumw2=[9, 49]);
julia> pyhf_simple_significance(h1, h2; sqrt_s=136000)
Python:
(array(1.22313872), [array(0.65879787), array(0.8805031), array(1.22313872), array(1.7063601),…
1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8,
3.9, 4. ]), [(array(1.), [array(1.), array(1.), array(1.), array(1.), array(1.)]), (arr…
Note
The normalization of the input histograms need to be correct (i.e. the bins should predict actual number of events expected).
BjetTLA.safeLogRatio Method
safeLogRatio(num,den)
Safe log computation, to emulate athena implementation:
If
iszero(den)
, return15.0
If
iszero(num)
, return-10.0
otherwise, return
log(num/den)
as expected
BjetTLA.scan_fluctuate Method
scan_fluctuate(hist_smooth, SplusB::CombinedPDF; N=1, npara=4) -> Vector{Tuple{Float64, Float64}}
Throw N
Poisson-fluctuated histograms (poisson_fluctuate
) from hist_smooth
and fit them with SplusB
. Return a vector of tuples of the fitted signal strength and the χ²/ndf.
As noted in CombinedPDF
, the signal strength is expected to be the last parameter.
BjetTLA.skip_JZ_event Function
pass_JZ_cleaning(j_pt, jz, eps=0.2)::Bool
Apply cleaning filter for JZ slices, checks if the average pT of the two leading jets is in the expected range for slice jz
, within a eps
tolerance.
BjetTLA.split_by_dsid Method
split_by_dsid(strings::Vector{String})
Splits a vector of strings into a dictionary based on the dsid, a sequence of six digits preceeded and followed by ".". If that criteria fails, looks for a run number instead (sequence of eight digits).
sourceBjetTLA.staircase_inbound Method
staircase_inbound(x,y,i,j,k)::Bool
Assume x,y
are bin centers, and i,j,k
are on binedges.
BjetTLA.staircase_scan Method
staircase_scan(hist2d, hist2d_bkg; max=8.0)
Find the best staircase cut in a 2D histogram.
hist2d
: Signal 2D histogramhist2d_bkg
: background histogrammax
: maximum value for the D_b cut
BjetTLA.staircase_sig Method
staircase_sig(hist2d, hist2d_bkg, centers, i, j, k)::Float64
Calculate the significance of a 2D histogram using the staircase method. i < j < k
sourceBjetTLA.xg_cross_train Method
xg_cross_train(df_all; num_round=90, max_depth=7, η=0.8, scale_pos_weight=200, input_names=BDT_input_names, gamma=0, alpha=0, lambda=1, Nfolds=5, lumi=30e3)
Take a DataFrame (potentially loaded from the output of dump_bdt_arrow
) and return Nfolds
The dataset is partitioned into Nfolds
using mod1.(df_all.eventNumber, Nfolds)
.
Examples
julia> Random.seed!(654321)
julia> df_all = shuffle(DataFrame(Arrow.Table("/data/jiling/TLA/julia_arrows/2024_11_04_data2023.feather")));
julia> df_all.proc = categorical(df_all.proc);
julia> bdt_machines = xg_cross_train(df_all);
BjetTLA.@cutflow Macro
@cutflow counter hist name_list name wgt
Convinient macro to generate cutflow histogram filling code. It will increment the counter
by 1
, check if the name_list[counter]
is equal to expected name
, and then push the counter
to the hist
with the given wgt
.
BjetTLA.@cutflow counter hist name_list name wgt
is equivalent to:
counter += 1
@assert name_list[counter] == name
atomic_push!(hist, counter, wgt)
BjetTLA.@fourmomenta Macro
@fourmomenta evt obj_prefix
Creates the fourmomenta for a given object type in the event. Returns a StructArray of LorentzVectorCyl.
It assumes that the event is a structure containig all the projections of the 4-vectors: evt.obj_pt, evt.obj_eta, evt.obj_phi, evt.obj_m (or evt.obj_E for jets).
Concretely,
@fourmomenta evt ph
would expand to:
StructArray(LorentzVectorCyl.(
evt.ph_pt,
evt.ph_eta,
evt.ph_phi,
evt.ph_m
))
For jets, it will call fromPtEtaPhiE
and the return type stays the same.
Example
julia> for evt in tree
hlt_jets = @fourmomenta evt jet
count(>(25), hlt_jets.pt) # count how many jets have pt > 25 GeV
...
end