Wednesday, September 25, 2019

Should the 99th percentile of two combined lists be the average of each list's 99th percentile?

For example, the 99th percentile of list A is p99_a, the 99th percentile of list B is p99_b, and list C contains all the values of A and B. Should the 99th percentile of list C be the 99th percentile of p99_a and p99_b, or the average of p99_a and p99_b?

I always thought it should be the former. However, I tried it in code:

import numpy as np
import random
data = []
p99list = []
for i in range(10000):
    one_data = [random.randrange(10000) for x in range(1000)]
    data += one_data
    p99list.append(np.percentile(one_data, 99))

print('p99 of all data: \t' + str(np.percentile(data, 99)))
print('average of p99: \t' + str(np.average(p99list)))
print('p99 of p99 : \t' + str(np.percentile(p99list, 99)))

The results were:

p99 of all data:    9899.0
average of p99:     9889.646635999998
p99 of p99 :    9952.01

This showed that the average of p99 was closer to the p99 of all data than the p99 of p99 was. In contrast, I then changed the sixth line of the code as follows (to simulate the response times of HTTP requests):

one_data = [random.uniform(0.2, 0.4) for x in range(1000)] + [random.uniform(1.0, 1.2) for y in range(5)]

I ran the code again, and the results were:

p99 of all data:    0.39801099789433964
average of p99:     0.37998116766051837
p99 of p99 :    0.39904330107367425

This time, the p99 of p99 was closer to the p99 of all data than the average of p99 was.
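
For anyone who wants to rerun the second experiment, here is a minimal self-contained sketch. It assumes the modified line is meant to concatenate 1000 "fast" samples with 5 "slow" ones per batch. The fixed seed, the `simulate` helper, and the smaller batch count are assumptions added for repeatability and speed, so the printed numbers will not necessarily match the ones quoted above.

```python
import random

import numpy as np

random.seed(0)  # assumption: fixed seed, not part of the original code


def simulate(n_batches=1000):
    """Hypothetical helper: returns (p99 of all data, average of p99, p99 of p99)."""
    data = []
    p99list = []
    for _ in range(n_batches):
        # 1000 fast responses (0.2-0.4 s) plus 5 slow ones (1.0-1.2 s) per batch
        one_data = ([random.uniform(0.2, 0.4) for _ in range(1000)]
                    + [random.uniform(1.0, 1.2) for _ in range(5)])
        data += one_data
        p99list.append(np.percentile(one_data, 99))
    return (np.percentile(data, 99),
            np.average(p99list),
            np.percentile(p99list, 99))


p99_all, avg_p99, p99_of_p99 = simulate()
print('p99 of all data: \t' + str(p99_all))
print('average of p99: \t' + str(avg_p99))
print('p99 of p99 : \t' + str(p99_of_p99))
```

Because only 5 of the 1005 samples in each batch are slow (about 0.5%), the 99th percentile of every batch still falls among the fast samples in this sketch.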

So which one is more accurate?
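
To make the question concrete for just two lists, as in the A, B, C example at the top, here is a small sketch. The lists `a` and `b` are hypothetical and deliberately cover very different ranges. The code only computes the three quantities side by side; it does not settle which summary is better in general.

```python
import numpy as np

# Hypothetical lists, chosen only for illustration (not from the experiments above).
a = np.arange(0, 100)        # list A: values 0..99
b = np.arange(1000, 1100)    # list B: values 1000..1099
c = np.concatenate([a, b])   # list C: all values of A and B together

p99_a = np.percentile(a, 99)
p99_b = np.percentile(b, 99)

p99_c = np.percentile(c, 99)                     # p99 of the combined data
avg_p99 = np.average([p99_a, p99_b])             # average of the two p99 values
p99_of_p99 = np.percentile([p99_a, p99_b], 99)   # p99 of the two p99 values

print('p99 of C: \t' + str(p99_c))
print('average of p99: \t' + str(avg_p99))
print('p99 of p99 : \t' + str(p99_of_p99))
```

With these two lists, neither summary reproduces the p99 of the combined list exactly, and the average lands far from it because the lists have such different ranges.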



