I have a question about the Maximum Entropy Bootstrap algorithm for time series. The steps of the algorithm are shown below:
1. Sort the original data in increasing order to create the order statistics x(t), and store the ordering index vector.
2. Compute intermediate points z_t = (x(t) + x(t+1))/2 for t = 1, ..., T − 1 from the order statistics.
3. Compute the trimmed mean m_trm of the deviations x_t − x_{t−1} among all consecutive observations. Set the lower limit for the left tail to z_0 = x(1) − m_trm and the upper limit for the right tail to z_T = x(T) + m_trm. These limits become the limiting intermediate points.
4. Compute the mean of the maximum entropy density within each interval such that the 'mean-preserving constraint' (designed to eventually satisfy the ergodic theorem) is satisfied. The interval means are denoted m_t; the means for the first and last intervals have simpler formulas.
5. Generate random numbers from the [0, 1] uniform interval, compute sample quantiles of the ME density at those points, and sort them.
6. Reorder the sorted sample quantiles using the ordering index of step 1. This recovers the time-dependence relationships of the originally observed data.
7. Repeat steps 2 to 6 several times (e.g., 999).
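To make steps 1 to 4 concrete, here is a small sketch on a toy five-point series (my own made-up numbers, not from the vignette; I use absolute successive differences and skip the trimming for brevity):

```python
import numpy as np

x = np.array([4.0, 12.0, 36.0, 20.0, 8.0])     # toy series
order = np.argsort(x)                          # step 1: ordering index vector
xs = x[order]                                  # order statistics: 4, 8, 12, 20, 36

z_mid = (xs[:-1] + xs[1:]) / 2                 # step 2: T-1 intermediate points
mtrm = np.mean(np.abs(np.diff(x)))             # step 3: untrimmed mean for brevity
z = np.concatenate([[xs[0] - mtrm], z_mid, [xs[-1] + mtrm]])  # z_0 .. z_T

# step 4: desired interval means, with simpler formulas at the two ends
m = np.empty(len(xs))
m[0] = 0.75 * xs[0] + 0.25 * xs[1]
m[1:-1] = 0.25 * xs[:-2] + 0.5 * xs[1:-1] + 0.25 * xs[2:]
m[-1] = 0.25 * xs[-2] + 0.75 * xs[-1]
print(z)   # interval boundaries z_0 .. z_T
print(m)   # one desired mean per interval
```

Note there are T + 1 boundaries z_0, ..., z_T but only T interval means.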
I'm confused about how to implement step 5. Essentially, how do you go from the red stars to the blue triangles in the left panel of this diagram:
The diagram comes from: https://cran.r-project.org/web/packages/meboot/vignettes/meboot.pdf
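My reading of step 5 (this is my understanding, not an authoritative description of the package internals): the ME density is uniform within each interval [z_{t−1}, z_t], shifted by a constant so that the interval's mean equals m_t, so the quantile function is piecewise linear, and a uniform draw U in [(t−1)/T, t/T] is mapped linearly into the t-th shifted interval. A minimal sketch of that mapping, reusing toy numbers:

```python
import numpy as np

def me_quantile(U, z, m):
    """Map uniforms in [0, 1] through a piecewise-linear ME quantile function.

    z : interval boundaries z_0 .. z_T (length T + 1)
    m : desired interval means m_1 .. m_T (length T)
    """
    T = len(m)
    k = np.minimum((np.asarray(U) * T).astype(int), T - 1)  # interval index
    c = m[k] - (z[k] + z[k + 1]) / 2         # shift so interval mean equals m_k
    y0, y1 = z[k] + c, z[k + 1] + c          # shifted interval endpoints
    return y0 + (U - k / T) * T * (y1 - y0)  # linear interpolation inside

z = np.array([-11.0, 6.0, 10.0, 16.0, 28.0, 51.0])  # toy boundaries (T = 5)
m = np.array([5.0, 13.0 - 5.0, 13.0, 22.0, 32.0])   # toy desired means
print(me_quantile(np.array([0.1, 0.5, 0.9]), z, m))
```

For example, U = 0.5 lands in the third interval [10, 16]; its midpoint already equals m_3 = 13, so no shift is applied and the quantile is 13.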
I've written some Python code to do these steps, but I'm not sure it's correct:

    import numpy as np
    import pandas as pd
    from scipy.stats import trim_mean

    def meboot(x, col):
        # Step 1: sort; the DataFrame index keeps the ordering information.
        xt = x.sort_values(by=[col])
        vals = xt[col].to_numpy()
        N = len(vals)

        # Step 2: T - 1 intermediate points between consecutive order statistics.
        # There are N + 1 boundaries z_0 .. z_T in total.
        z = np.empty(N + 1)
        z[1:N] = (vals[:-1] + vals[1:]) / 2

        # Step 3: limiting intermediate points from the trimmed mean of
        # absolute successive differences (10% trim, as in the R package).
        mtrm = trim_mean(np.abs(np.diff(x[col].to_numpy())), 0.10)
        z[0] = vals[0] - mtrm
        z[N] = vals[-1] + mtrm

        # Step 4: desired interval means; simpler formulas at the two ends.
        m = np.empty(N)
        m[0] = 0.75 * vals[0] + 0.25 * vals[1]
        m[1:N - 1] = 0.25 * vals[:-2] + 0.5 * vals[1:-1] + 0.25 * vals[2:]
        m[N - 1] = 0.25 * vals[-2] + 0.75 * vals[-1]

        # Step 5: sorted uniforms through the piecewise-linear ME quantile.
        # A draw U in [k/N, (k+1)/N] falls in interval k (0-based).
        U = np.sort(np.random.rand(N))
        k = np.minimum((U * N).astype(int), N - 1)
        c = m[k] - (z[k] + z[k + 1]) / 2     # shift so interval mean equals m_k
        y0, y1 = z[k] + c, z[k + 1] + c
        quantiles = y0 + (U - k / N) * N * (y1 - y0)

        # Step 6: attach the sorted index, then re-sort to recover time order.
        out = pd.DataFrame(quantiles, columns=[col], index=xt.index)
        return out.sort_index()