I am seeing unexpected behaviour when filtering a ROOT RDataFrame from Python. Below is a simplified version of my code:
... df.Define("shift", "getShift()") ...
where:
getShift = """
#include <chrono>
#include "TRandom3.h"

float getShift() {
    // smear vertices with a gaussian (interaction region of 2*sigma_beam = 126 mm)
    auto now = std::chrono::system_clock::now();
    auto timeSeed = now.time_since_epoch().count();
    auto rnd = TRandom3(timeSeed);
    auto shift = rnd.Gaus(0, 63.);
    return shift;
}
"""
R.gInterpreter.Declare(getShift)
...
df_Fil = df.Filter("nHits > 0 && PZ_pip0 > 0 && PZ_pip1 > 0 && PZ_pim0 > 0 && \
std::isnan(theta_pvtv) == 0 && std::isnan(theta_fh) == 0 && std::isnan(theta_TRUE) == 0")
nBhits = (df_Fil.Filter('nHits_mother > 0')).Count().GetValue()
print(f"stage1: {df_Fil.Count().GetValue()}")
noBhits = (df_Fil.Filter('nHits_mother == 0')).Count().GetValue()
print(f"stage2: {df_Fil.Count().GetValue()}")
ntauhits = (df_Fil.Filter('nHits_daughter > 0')).Count().GetValue()
print(f"stage3: {df_Fil.Count().GetValue()}")
notauhits = (df_Fil.Filter('nHits_daughter == 0')).Count().GetValue()
print(f"stage4: {df_Fil.Count().GetValue()}")
notauhits = (df_Fil.Filter('nHits_daughter == 0')).Count().GetValue()
print(f"stage5: {df_Fil.Count().GetValue()}")
nr_Fil = df_Fil.Count().GetValue()
eff = nr_Fil / 10000000
print(f"nrFil = {nr_Fil}, nBhits = {nBhits}, ntauhits = {ntauhits}, noBhits = {noBhits}, notauhits = {notauhits}")
However, the output is not as expected:
stage1: 484
stage2: 510
stage3: 499
stage4: 483
stage5: 500
nrFil = 498, nBhits = 36, ntauhits = 493, noBhits = 465, notauhits = 18
(Check: it should be nBhits + ntauhits = nrFil, notauhits = nBhits, noBhits = ntauhits.)
The issue is that the count of the DataFrame df_Fil changes between calls. I expected it to stay the same, since df_Fil itself is never modified, but it fluctuates after each additional filter is applied.
I am fairly sure the problem comes from the random number "shift", which is re-generated every time I use the Filter method and therefore changes the count in nrFil (which depends on the exact shifts).
Can anyone help me understand how to avoid this issue?
I've tried initializing with TRandom3(42), and then the counts at every stage are consistent, but only because the shift takes the same constant value every time.
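For reference, here is a toy model of what I think is going on, written without ROOT so it can be run anywhere. As far as I understand, every Count().GetValue() in my code above starts its own event loop, and the Define for "shift" is evaluated again in each loop, so a time-seeded generator gives different shifts each time. One possible way around it, sketched below, would be to make the shift depend only on the entry number, so that it is random across events but identical from one pass to the next. The names BASE_SEED, N_ENTRIES, shift_for_entry and count_passing are made up for this illustration, and the cut on the shift is only a stand-in for my real selection. In RDataFrame, the entry number is available as the special column rdfentry_, which could presumably be passed to getShift and used as the seed; I have not checked how that behaves with multithreading.

```python
import random

BASE_SEED = 12345    # hypothetical fixed base seed (assumption, not from my real code)
N_ENTRIES = 10_000   # toy number of events


def shift_for_entry(entry: int) -> float:
    """Gaussian shift (mean 0, sigma 63) that depends only on the entry number."""
    rng = random.Random(BASE_SEED + entry)
    return rng.gauss(0.0, 63.0)


def count_passing(cut: float = 63.0) -> int:
    """One full pass over all entries, standing in for one event loop."""
    return sum(1 for entry in range(N_ENTRIES) if abs(shift_for_entry(entry)) < cut)


# Two independent passes: the shifts are recomputed, but come out identical.
first_pass = count_passing()
second_pass = count_passing()
print(first_pass == second_pass)
```

The other option I can think of is to book all the Count() results first and only call GetValue() afterwards, so that everything is computed in a single event loop with a single set of shifts.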