Sunday, August 23, 2020

Something odd going on with my triplet optimization algorithm?

I've tried Computer Science SE, Data Science SE, and Computational Science SE. A few people suggested SO is the best place for this, so hopefully the geniuses of SO can help.

I've written an algorithm for data analysis using the CERN ROOT framework. It takes in three files of sorted UNIX timestamps from a single year and matches them up into the closest triplets, where each file contributes one member of every triplet and no timestamp is reused across triplets. I know there are some more "well known" algorithms for accomplishing this, but this one completes the task much faster, clocking in at about 20 minutes on my machine, compared to many, many hours for some of the other algorithms I've tried. When complete, the algorithm plots the triplets (of the form {a,b,c}) on a 2-dimensional histogram, where the horizontal axis is a-b and the vertical axis is a-c.
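The heart of it is just a binary search for the nearest timestamp in each of the other two (sorted) files. Stripped of the ROOT parts, the idea on toy data looks roughly like this (illustrative sketch only, not the actual macro):

#include <algorithm>
#include <cmath>
#include <cstdio>
#include <vector>

// Nearest-element lookup in a sorted, non-empty vector (sketch only).
double nearest(const std::vector<double>& v, double t){
    auto geq = std::lower_bound(v.begin(), v.end(), t);  // first element >= t
    if(geq == v.end())   return *(geq - 1);              // nothing >= t exists
    if(geq == v.begin()) return *geq;                    // nothing smaller exists
    // otherwise pick whichever neighbour is closer to t
    return (std::abs(t - *geq) < std::abs(t - *(geq - 1))) ? *geq : *(geq - 1);
}

int main(){
    std::vector<double> a = {10, 20, 30};     // stands in for file 1
    std::vector<double> b = {9, 19.5, 31};    // file 2
    std::vector<double> c = {12, 21, 28};     // file 3
    for(double t : a){
        double db = t - nearest(b, t);        // a-b, horizontal axis of the histogram
        double dc = t - nearest(c, t);        // a-c, vertical axis
        std::printf("a=%g  a-b=%g  a-c=%g\n", t, db, dc);
    }
}

The full macro below then sorts the candidate triplets by the smallest difference and accepts them greedily, so no timestamp from the second or third file is reused.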

Problem is, it seems to be acting very weird. Namely, when I feed the algorithm one file of real data (these are timestamps generated by an experiment) and two files of completely random data, I get these weird diagonal lines: https://filebin.net/spbswnhkfy8xdkp8/random_plot.pdf?t=48f2yhjq. When I feed the algorithm three files of real data, a single diagonal line through the middle (and two more lines, running horizontally and vertically) appears if I use enough bins. Any idea what's going on with my algorithm?

EDIT: When I use three files of completely random data, I get this weird grid: https://filebin.net/k00htgl9v2wa5e7x/compRand.pdf?t=lrt6dpkr
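If it helps anyone reproduce this: by "completely random" I mean something along the lines of the throwaway generator below (uniform UNIX seconds over one year, then sorted). This is an illustration, not my exact script, and the details of the generator shouldn't matter much:

#include <algorithm>
#include <cstdio>
#include <random>
#include <vector>

// Illustrative generator for one "random" input file: N uniform UNIX
// timestamps spanning one year, sorted ascending. The year and count are
// arbitrary choices for the example, not taken from the real data.
int main(){
    const double t0 = 1546300800.0;          // 2019-01-01 00:00:00 UTC
    const double t1 = t0 + 365.0 * 86400.0;  // one year later
    const int N = 100000;

    std::mt19937_64 rng(12345);
    std::uniform_real_distribution<double> uni(t0, t1);

    std::vector<double> stamps(N);
    for(double& t : stamps) t = uni(rng);
    std::sort(stamps.begin(), stamps.end());

    for(double t : stamps) std::printf("%.6f\n", t);
}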

#include <algorithm>
#include <cmath>
#include <set>
#include <vector>

#include "TNtupleD.h"
#include "ROOT/RDataFrame.hxx"

using std::vector;

void unbiasedAnalysis(){

    // Output ntuple: x = a-b, y = a-c for each accepted triplet.
    TNtupleD *D = new TNtupleD("D","D","x:y");

    ROOT::RDataFrame statn1("D", "./pathtodata");
    ROOT::RDataFrame statn2("D", "./pathtodata");
    ROOT::RDataFrame statn3("D", "./pathtodata");

    // Pull the UNIX timestamps into plain vectors; the input files are
    // already sorted, which the lower_bound calls below rely on.
    vector<double> vec_1, vec_2, vec_3;
    statn1.Foreach([&](double tstamp){ vec_1.push_back(tstamp); },{"UNIX"});
    statn2.Foreach([&](double tstamp){ vec_2.push_back(tstamp); },{"UNIX"});
    statn3.Foreach([&](double tstamp){ vec_3.push_back(tstamp); },{"UNIX"});

    // Each entry: {a, b, c, a-b, a-c, min(a-b, a-c)}.
    vector<vector<double>> pairs;
    for(auto tstamp : vec_1){

        double first, second;

        // Iterator to the closest element greater than or equal to tstamp...
        auto geq = std::lower_bound(vec_2.begin(), vec_2.end(), tstamp);
        if(geq == vec_2.end()) --geq;                        // tstamp is past the last element
        // ...and to the nearest element less than tstamp.
        auto leq = (geq == vec_2.begin()) ? geq : geq - 1;   // guard: no smaller element exists

        double foo = tstamp - *geq;
        double bar = tstamp - *leq;

        // Keep whichever neighbour is closer.
        if(std::abs(foo) < std::abs(bar)){ first = *geq; }
        else { first = *leq; }

        // Repeat for the third file.
        geq = std::lower_bound(vec_3.begin(), vec_3.end(), tstamp);
        if(geq == vec_3.end()) --geq;
        leq = (geq == vec_3.begin()) ? geq : geq - 1;

        foo = tstamp - *geq;
        bar = tstamp - *leq;

        if(std::abs(foo) < std::abs(bar)){ second = *geq; }
        else { second = *leq; }

        // Add the candidate triplet.
        pairs.push_back({tstamp, first, second, (tstamp - first), (tstamp - second),
                         std::min((tstamp - first), (tstamp - second))});
    }

    // Sort the candidate triplets by element 5, the smaller of the two (signed) differences.
    std::sort(pairs.begin(), pairs.end(),
        [](const vector<double>& A, const vector<double>& B){
            return A[5] < B[5];
        });

    std::set<double> cache;

    ROOT::EnableImplicitMT();

    // Greedily accept triplets, skipping any whose b or c has already been used.
    for(auto pair : pairs){
        if(cache.find(pair[1]) == cache.end() && cache.find(pair[2]) == cache.end()){

            D->Fill(pair[3], pair[4]);   // (a-b, a-c)

            cache.insert(pair[1]);
            cache.insert(pair[2]);
        }
    }

    // Note: Draw("x:y") puts x on the vertical axis and y on the horizontal axis.
    D->Draw("x:y>>htemp(100,-0.02,0.02,100,-0.02,0.02)","","colz");
}
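
In case the invocation matters: I run this as an ordinary ROOT macro from an interactive session, roughly as follows (assuming the file is saved as unbiasedAnalysis.C):

root -l
root [0] .L unbiasedAnalysis.C+
root [1] unbiasedAnalysis()

The "+" just asks ACLiC to compile the macro before running it.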


