Expected Performance of a Mean-Reversion Trading Strategy — Part 4
2026-03-03
In Part 1 we derived the asymptotic Sharpe ratio \mathrm{SR}_\infty = \sqrt{\theta/2} of a mean-reversion strategy under the assumption that the mispricing follows an Ornstein–Uhlenbeck (OU) process and the trader knows its parameters exactly. Part 2 showed that a constant bias M in the fair-value estimate incurs a multiplicative penalty (1 + 2\theta M^2/\sigma^2)^{-1/2}, and Part 3 extended this to correlated bias, establishing that any trailing estimator of the fair value using past log prices degrades both expected PnL and risk.
All three parts treated the OU parameters \theta and \sigma as known. In practice, these must be estimated from a finite calibration sample, and estimation error feeds directly into the trader’s fair-value estimate. In this post we close the loop by quantifying how parameter estimation error affects the realised Sharpe ratio.
We treat the two estimation errors separately. The first is mean estimation error (\hat\mu \neq \mu): the estimated long-run mean acts as a constant bias in the Part 2 sense, with variance \mathrm{Var}(\hat\mu) \approx \sigma^2/(\theta^2 T_{\mathrm{est}}), yielding a penalty (1 + 2/(\theta T_{\mathrm{est}}))^{-1/2} governed entirely by the number of mean-reversion timescales in the calibration window. The second is speed estimation error (\hat\theta \neq \theta): the well-known finite-sample bias in \hat\theta does not degrade the realised SR — which is invariant to a uniform scaling of position size — yet it creates systematic overconfidence: the trader predicts a higher SR than is actually achieved, leading to misallocated capital and underestimated drawdowns.
A further contribution of this post is to show that rolling re-estimation is equivalent to a Part 3 trailing estimator, thereby unifying the estimation-error penalty derived here with the correlated-bias penalty from the previous instalment.
1 Maximum Likelihood Estimation for the OU Process
We work with the same OU model as Parts 1–3:

dX_t = \theta(\mu - X_t)\,dt + \sigma\,dW_t.
For the trading strategy, X_t represents the mispricing — the deviation from fair value v, with \mu = v denoting the true long-run mean. The trader must estimate the parameter triple (\theta, \mu, \sigma) from a calibration sample of length T_{\mathrm{est}}.
Sampled at interval \Delta t, the process satisfies the exact discretisation

X_{i+1} = \mu + (X_i - \mu)e^{-\theta\Delta t} + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}\!\left(0,\ \tfrac{\sigma^2}{2\theta}\bigl(1 - e^{-2\theta\Delta t}\bigr)\right),

which takes the form of a Gaussian AR(1) model X_{i+1} = a X_i + b + \varepsilon_i with a = e^{-\theta\Delta t} and b = \mu(1-a). Maximum likelihood estimation therefore reduces to OLS regression of X_{i+1} on X_i, and the OU parameters are recovered by inverting:

\hat\theta = -\frac{\ln \hat a}{\Delta t}, \qquad \hat\mu = \frac{\hat b}{1-\hat a}, \qquad \hat\sigma^2 = \hat\sigma_\varepsilon^2\,\frac{2\hat\theta}{1-\hat a^2}.
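As a concrete illustration, here is a minimal self-contained sketch of this AR(1)/OLS route. The function name `fit_ou_mle` and the simulation loop are illustrative choices, not helpers from the original notebook:

```python
import numpy as np

def fit_ou_mle(x, dt):
    """OU fit via OLS on the AR(1) form X_{i+1} = a X_i + b + eps_i (sketch)."""
    x0, x1 = x[:-1], x[1:]
    a, b = np.polyfit(x0, x1, 1)           # OLS slope and intercept
    resid = x1 - (a * x0 + b)
    theta = -np.log(a) / dt                # invert a = exp(-theta*dt)
    mu = b / (1 - a)                       # invert b = mu*(1-a)
    sigma2 = resid.var() * 2 * theta / (1 - a**2)  # invert residual variance
    return theta, mu, np.sqrt(sigma2)

# Simulate a stationary OU path with the exact discretisation, then refit.
rng = np.random.default_rng(0)
theta, mu, sigma, dt = 1.0, 0.0, 1.0, 1 / 252
n = 252 * 20                               # 20 years of daily observations
a = np.exp(-theta * dt)
eps_sd = sigma * np.sqrt((1 - a**2) / (2 * theta))
x = np.empty(n)
x[0] = rng.normal(mu, sigma / np.sqrt(2 * theta))  # stationary initial draw
for i in range(n - 1):
    x[i + 1] = mu + a * (x[i] - mu) + eps_sd * rng.normal()

print(fit_ou_mle(x, dt))                   # roughly (1.0, 0.0, 1.0), with noise
```

With twenty years of data the estimates land near the truth; the next section shows how badly they behave on short windows.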
Three finite-sample difficulties plague this estimator. First, when \theta is small, a \approx 1 and the process resembles a random walk; since the intercept takes the form \mu(1-a), the estimates \hat\mu and \hat\theta become strongly negatively correlated — a slight change in one can be compensated by the other, creating an identification problem. Second, the MLE \hat\theta is upward-biased in finite samples: small-sample variability causes \hat{a} to be pulled below e^{-\theta\Delta t}, which inflates \hat\theta. Third, all three estimators exhibit high variance when \theta T_{\mathrm{est}} is small, reflecting the fact that the process has not completed enough mean-reversion cycles to reveal its parameters reliably.
Figure 1 illustrates these effects across calibration windows of 2, 5, 10, and 20 years. The top row shows the distribution of \hat\theta: the median exceeds the true value at short windows and converges from above as more data is observed. The bottom row shows \hat\mu: the distribution is centred at the true mean in all cases, but its standard deviation shrinks only as \sigma/(\theta\sqrt{T_{\mathrm{est}}}), confirming the slow convergence rate inherent to autocorrelated processes.
Figure 1: Finite-sample distributions of MLE estimates \hat\theta (top) and \hat\mu (bottom) for calibration windows of 2, 5, 10, and 20 years (\theta=1, \sigma=1, \mu=0). The upward bias in \hat\theta and the high variance of \hat\mu diminish as the calibration window lengthens.
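The upward bias in the top row is easy to reproduce. The following self-contained sketch (it does not reuse the notebook's helpers) fits \hat\theta on 2,000 independent two-year paths and reports the median, which lands well above the true \theta = 1:

```python
import numpy as np

# Finite-sample bias of theta_hat: per-path AR(1) OLS on many short windows.
rng = np.random.default_rng(42)
theta, sigma, dt = 1.0, 1.0, 1 / 252
n_paths, n = 2000, 252 * 2                 # 2-year calibration windows
a = np.exp(-theta * dt)
eps_sd = sigma * np.sqrt((1 - a**2) / (2 * theta))
x = np.empty((n_paths, n))
x[:, 0] = rng.normal(0.0, sigma / np.sqrt(2 * theta), n_paths)
for i in range(n - 1):
    x[:, i + 1] = a * x[:, i] + eps_sd * rng.normal(size=n_paths)

# Vectorised OLS slope of X_{i+1} on X_i (with intercept) for every path
x0, x1 = x[:, :-1], x[:, 1:]
x0c = x0 - x0.mean(axis=1, keepdims=True)
x1c = x1 - x1.mean(axis=1, keepdims=True)
a_hat = (x0c * x1c).sum(axis=1) / (x0c**2).sum(axis=1)
theta_hat = -np.log(a_hat) / dt
print(np.median(theta_hat))                # noticeably above the true value 1.0
```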
2 One-Shot Estimation
2.1 \hat\mu as a Constant Bias
The trader observes the OU process on a calibration window [0, T_{\mathrm{est}}], estimates (\hat\theta, \hat\mu, \hat\sigma) via MLE, then trades on [T_{\mathrm{est}}, T_{\mathrm{est}} + T_{\mathrm{trade}}] using \hat\mu as the fair-value estimate.
From Part 2, the asymptotic Sharpe ratio when the trader operates with a constant bias M is

\mathrm{SR}_\infty = \sqrt{\frac{\theta}{2}}\left(1 + \frac{2\theta M^2}{\sigma^2}\right)^{-1/2}.
Part 2 also demonstrated that expected PnL is unaffected by a constant bias — the degradation acts purely through increased variance. During the trading window, the estimation error in \hat\mu behaves exactly as a Part 2 constant bias: it is fixed at the moment calibration ends and is independent of the future Brownian increments on [T_{\mathrm{est}}, T_{\mathrm{est}} + T_{\mathrm{trade}}].
2.2 Variance of \hat\mu
For a stationary OU process with mean \mu, the sample mean \bar{X} = \frac{1}{T}\int_0^T X_t\,dt satisfies

\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{\theta^2 T}\left(1 - \frac{1 - e^{-\theta T}}{\theta T}\right).
When \theta T is large (many half-lives), \mathrm{Var}(\bar{X}) \approx \sigma^2/(\theta^2 T). The MLE \hat\mu coincides with \bar{X} asymptotically, giving

\mathrm{Var}(\hat\mu) \approx \frac{\sigma^2}{\theta^2 T_{\mathrm{est}}}.
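The exact variance of the time-averaged OU path, \mathrm{Var}(\bar X) = \frac{\sigma^2}{\theta^2 T}\bigl(1 - \frac{1 - e^{-\theta T}}{\theta T}\bigr), is easy to verify by Monte Carlo. A self-contained sketch with \theta = \sigma = 1:

```python
import numpy as np

# MC check of Var(Xbar) for a stationary OU process (theta = sigma = 1).
rng = np.random.default_rng(1)
theta, sigma, dt, T, n_paths = 1.0, 1.0, 0.02, 5.0, 20000
n = int(T / dt)
a = np.exp(-theta * dt)
eps_sd = sigma * np.sqrt((1 - a**2) / (2 * theta))
x = np.empty((n_paths, n))
x[:, 0] = rng.normal(0.0, sigma / np.sqrt(2 * theta), n_paths)
for i in range(n - 1):
    x[:, i + 1] = a * x[:, i] + eps_sd * rng.normal(size=n_paths)

var_mc = x.mean(axis=1).var()              # variance of the per-path sample mean
var_exact = sigma**2 / (theta**2 * T) * (1 - (1 - np.exp(-theta * T)) / (theta * T))
print(var_mc, var_exact)                   # should agree to within a few percent
```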
Substituting M^2 \to \mathbb{E}[\hat\mu^2] into the Part 2 formula — just as we substituted M^2 \to \sigma_M^2 for a random independent bias in Part 3 — yields

\mathrm{SR}_\infty = \sqrt{\frac{\theta}{2}}\left(1 + \frac{2}{\theta T_{\mathrm{est}}}\right)^{-1/2}.
The penalty factor \left(1 + 2/(\theta T_{\mathrm{est}})\right)^{-1/2} depends only on the dimensionless product \theta T_{\mathrm{est}} — the number of mean-reversion timescales observed during calibration. To keep the penalty factor above 1/\sqrt{2} \approx 0.71, i.e. to retain at least 71% of the ideal Sharpe ratio, we need \theta T_{\mathrm{est}} > 2: the trader must observe at least two full mean-reversion timescales. Table 1 shows the penalty across a range of calibration lengths.
Table 1: Estimation penalty as a function of calibration length \theta T_{\mathrm{est}}.

| \theta T_{\mathrm{est}} | 1 | 2 | 5 | 10 | 20 | 50 |
|---|---|---|---|---|---|---|
| Penalty factor | 0.577 | 0.707 | 0.845 | 0.913 | 0.953 | 0.981 |
Figure 2: Estimation penalty on Sharpe ratio vs calibration length \theta T_{\mathrm{est}}. Left: penalty factor. Right: absolute SR. Monte Carlo dots closely track the theoretical curve (1 + 2/\theta T_{\mathrm{est}})^{-1/2}.
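The penalty curve and its inverse (the calibration length required to hit a target penalty factor) are one-liners. A minimal sketch; the function names are mine, not the notebook's:

```python
import numpy as np

def sr_penalty(theta, T_est):
    """Multiplicative SR penalty (1 + 2/(theta*T_est))**(-1/2) from estimating mu."""
    return 1.0 / np.sqrt(1.0 + 2.0 / (theta * T_est))

def required_window(penalty, theta=1.0):
    """Invert the penalty: theta*T_est = 2*p^2/(1-p^2) timescales for target p."""
    return 2.0 * penalty**2 / ((1.0 - penalty**2) * theta)

print(sr_penalty(1.0, 2.0))        # 0.7071...: two timescales retain 71% of SR
print(required_window(0.9))        # ~8.5 timescales needed to retain 90%
```

For slow mean reversion, multiply by 1/\theta: at \theta = 0.25/yr, retaining 90% of the ideal SR requires roughly 34 years of calibration data.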
3 Does the One-Shot Estimator Introduce a Covariance Penalty?
In Part 3, a correlated bias was shown to degrade expected PnL via -\theta\int_0^t \mathrm{Cov}(M_u, X_u)\,du, not just risk. A natural concern is whether the one-shot estimator \hat\mu — computed from the calibration window — introduces this additional penalty.
Asymptotically, it does not. Note that X_t for t > T_{\mathrm{est}} depends on the calibration window only through X_{T_{\mathrm{est}}}: explicitly, X_{T_{\mathrm{est}}+s} = \mu + (X_{T_{\mathrm{est}}} - \mu)e^{-\theta s} plus noise independent of the calibration window. Since \hat\mu is a function of \{X_s : s \leq T_{\mathrm{est}}\}, the covariance decays exponentially as the trading window progresses:

\mathrm{Cov}(\hat\mu, X_{T_{\mathrm{est}}+s}) = e^{-\theta s}\,\mathrm{Cov}(\hat\mu, X_{T_{\mathrm{est}}}), \qquad \frac{1}{T_{\mathrm{trade}}}\int_{T_{\mathrm{est}}}^{T_{\mathrm{est}}+T_{\mathrm{trade}}} \mathrm{Cov}(\hat\mu, X_u)\,du \;\to\; 0
as T_{\mathrm{trade}} \to \infty. The correlation effect is washed out by the long trading horizon.
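This exponential wash-out can be checked directly: simulate calibration-plus-trading paths, form \hat\mu \approx \bar X over the calibration window, and track its covariance with X at increasing lags into the trading window. A self-contained sketch (\theta = \sigma = 1, T_{\mathrm{est}} = 2; parameter choices are mine):

```python
import numpy as np

# Covariance between the one-shot estimate mu_hat and the future mispricing.
rng = np.random.default_rng(3)
theta, sigma, dt = 1.0, 1.0, 0.02
T_est, horizon, n_paths = 2.0, 3.0, 20000
n_est = int(T_est / dt)
n_tot = n_est + int(horizon / dt) + 1
a = np.exp(-theta * dt)
eps_sd = sigma * np.sqrt((1 - a**2) / (2 * theta))
x = np.empty((n_paths, n_tot))
x[:, 0] = rng.normal(0.0, sigma / np.sqrt(2 * theta), n_paths)
for i in range(n_tot - 1):
    x[:, i + 1] = a * x[:, i] + eps_sd * rng.normal(size=n_paths)

mu_hat = x[:, :n_est + 1].mean(axis=1)     # sample mean over [0, T_est]
for s in [0.0, 1.0, 2.0, 3.0]:
    k = n_est + int(s / dt)                # index of time T_est + s
    cov = np.cov(mu_hat, x[:, k])[0, 1]
    print(f"s = {s:.0f}: Cov(mu_hat, X) = {cov:+.4f}")   # decays like exp(-theta*s)
```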
For one-shot estimation, the entire SR penalty therefore comes from the Part 2 variance channel: \hat\mu acts as a random constant bias with \mathbb{E}[\hat\mu^2] \approx \sigma^2/(\theta^2 T_{\mathrm{est}}), yielding the clean formula (1 + 2/(\theta T_{\mathrm{est}}))^{-1/2}. Figure 2 confirms this result with Monte Carlo simulation.
4 Rolling Estimation
In practice, traders do not estimate \hat\mu once and trade indefinitely — in the simplest setting, they re-estimate continuously using a rolling window of length W. At each time t, the fair-value estimate is the trailing sample mean
M_t = \frac{1}{W}\int_{t-W}^{t} X_s\,ds,
which is a simple moving average (SMA), a trailing estimator of the kind analyzed in Part 3. The weighting kernel w(s) = \mathbf{1}_{[0,W]}(s)/W is non-negative and integrates to one, so Part 3’s general theorem guarantees \mathrm{Cov}(M_t, X_t) > 0 at all times. Rolling re-estimation therefore creates the persistent correlated-bias penalty that the one-shot estimator avoids.
The SMA with window W has a characteristic timescale of W/2, analogous to an EMA with decay rate \lambda \approx 2/W. Substituting into Part 3's EMA Sharpe formula gives

\mathrm{SR}_\infty \approx \sqrt{\frac{\theta}{2}}\left(1 + \frac{2}{\theta W}\right)^{-1/2}.
This is the same functional form as the one-shot penalty, with the rolling window W replacing T_{\mathrm{est}}. The unification is satisfying, yet it conceals a crucial asymmetry summarised in Table 2.
Table 2: Penalty source comparison for one-shot vs rolling estimation.

| Method | Penalty | Source |
|---|---|---|
| One-shot (calibrate then trade) | (1 + 2/(\theta T_{\mathrm{est}}))^{-1/2} | Part 2 variance channel only |
| Rolling window W | (1 + 2/(\theta W))^{-1/2} | Part 3 covariance channel (dominant) |
The one-shot penalty vanishes as T_{\mathrm{trade}} \to \infty (for fixed T_{\mathrm{est}}), whereas the rolling penalty persists at all horizons — it is a structural cost of continuous re-estimation. Figure 3 confirms this unification via Monte Carlo.
```python
import numpy as np
import matplotlib.pyplot as plt

# --- Rolling estimation: SMA bias simulation ---
# (simulate_ou, pnl_paths, T_est_grid and sr_mc are defined in earlier cells.)
theta_true, sigma_true = 1.0, 1.0
dt = 1 / 252
T = 50            # total simulation length
n_paths = 2000
W_values = [0.5, 1, 2, 5, 10, 20]

sr_rolling_mc = []
for W in W_values:
    W_steps = int(W / dt)
    X = simulate_ou(theta_true, sigma_true, T, dt, n_paths,
                    rng=np.random.default_rng(123))
    # Compute rolling SMA bias: M_t = (1/W) * integral_{t-W}^{t} X_s ds
    cumX = np.cumsum(X, axis=1) * dt
    M = np.zeros_like(X)
    for k in range(W_steps, X.shape[1]):
        M[:, k] = (cumX[:, k] - cumX[:, k - W_steps]) / W
    # Only measure SR after the window is fully populated
    burn = W_steps + int(5 / dt)
    if burn >= X.shape[1] - int(5 / dt):
        burn = W_steps
    X_eval = X[:, burn:]
    M_eval = M[:, burn:]
    T_eval = (X_eval.shape[1] - 1) * dt
    # SR = mean(annualised PnL) / sqrt(mean(annualised QV_Y))
    Y = pnl_paths(X_eval, M_eval, dt)
    mean_pnl = np.mean(Y[:, -1] / T_eval)
    dX_eval = np.diff(X_eval, axis=1)
    dY = -(X_eval[:, :-1] - M_eval[:, :-1]) * dX_eval
    qvY = np.sum(dY**2, axis=1)
    mean_qvY = np.mean(qvY / T_eval)
    sr_rolling_mc.append(mean_pnl / np.sqrt(mean_qvY))
sr_rolling_mc = np.array(sr_rolling_mc)

# --- Plot: rolling SR vs window ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: penalty factor vs theta*W
W_fine = np.linspace(0.3, 25, 200)
penalty_fine = 1 / np.sqrt(1 + 2 / (theta_true * W_fine))
axes[0].plot(theta_true * W_fine, penalty_fine, 'k--', lw=2,
             label=r'Theory: $(1 + 2/\theta W)^{-1/2}$')
penalty_rolling = sr_rolling_mc / np.sqrt(theta_true / 2)
axes[0].scatter(theta_true * np.array(W_values), penalty_rolling, s=80,
                c='forestgreen', zorder=5, edgecolors='black',
                label='MC (rolling SMA)')
axes[0].set_xlabel(r'$\theta \, W$')
axes[0].set_ylabel('Penalty factor')
axes[0].set_title('Rolling estimation penalty')
axes[0].legend()
axes[0].set_ylim(0.3, 1.05)

# Right: compare one-shot vs rolling
axes[1].plot(theta_true * W_fine, penalty_fine, 'k--', lw=2,
             label='Rolling SMA (theory)')
# Overlay one-shot results from previous cell
axes[1].scatter(theta_true * T_est_grid, sr_mc / np.sqrt(theta_true / 2),
                s=80, c='steelblue', zorder=5, edgecolors='black',
                marker='s', label='One-shot (MC)')
axes[1].scatter(theta_true * np.array(W_values), penalty_rolling, s=80,
                c='forestgreen', zorder=5, edgecolors='black',
                marker='o', label='Rolling SMA (MC)')
axes[1].set_xlabel(r'$\theta \times$(window length)')
axes[1].set_ylabel('Penalty factor')
axes[1].set_title('One-shot vs rolling: same penalty curve')
axes[1].legend()
axes[1].set_ylim(0.3, 1.05)
fig.tight_layout()
plt.show()

for W, sr in zip(W_values, sr_rolling_mc):
    th_penalty = 1 / np.sqrt(1 + 2 / (theta_true * W))
    print(f"  W={W:>5.1f}yr  θW={theta_true*W:>5.1f}  SR_mc={sr:.4f}  "
          f"penalty_mc={sr/np.sqrt(theta_true/2):.3f}  penalty_theory={th_penalty:.3f}")
```
Figure 3: Rolling SMA penalty vs window length. Left: MC penalty factor for rolling SMA against the theoretical curve (1+2/\theta W)^{-1/2}. Right: one-shot and rolling penalties overlaid on the same curve, confirming they share identical functional form.
5 Speed Estimation Error
It is well known that the MLE overestimates \theta in finite samples: the median \hat\theta exceeds the true value, as shown in Figure 1. A natural question is whether this bias degrades the realised Sharpe ratio in practice.
It turns out that it does not, at least not directly. The realised SR, defined as Y_T / \sqrt{[Y]_T}, depends only on the actual mispricing process and the trader’s fair-value estimate. Using \hat\theta instead of the true \theta for position sizing simply scales all positions uniformly, and this common factor cancels in the ratio.
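The cancellation is mechanical and worth seeing once. In the sketch below (illustrative; not the notebook's `pnl_paths`), scaling the position series by any constant c, as mis-sized positions from a biased \hat\theta would, leaves the realised SR unchanged:

```python
import numpy as np

# Realised SR Y_T / sqrt([Y]_T) is invariant to a uniform position scaling.
rng = np.random.default_rng(2)
dX = rng.normal(0.0, 0.01, 1000)          # stand-in mispricing increments
X = np.concatenate([[0.0], np.cumsum(dX)])
pos = -X[:-1]                              # short the mispricing, unit sizing
dY = pos * dX                              # strategy PnL increments

def realised_sr(dY):
    return dY.sum() / np.sqrt((dY**2).sum())

c = 1.7                                    # overconfident sizing, theta_hat > theta
print(np.isclose(realised_sr(dY), realised_sr(c * dY)))   # → True
```

Both numerator and denominator pick up the same factor c, so the ratio is unchanged for any c > 0; only the predicted SR, which depends on \hat\theta directly, is distorted.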
However, a trader who computes the predicted Sharpe ratio \mathrm{SR}_\infty = \sqrt{\hat\theta/2} will arrive at a systematic overestimate, since \hat\theta is upward-biased. The resulting overconfidence distorts capital allocation: more capital or higher leverage is deployed than the strategy’s actual risk–reward profile warrants. Risk budgets calibrated to the inflated SR become too tight, raising the probability of premature stop-outs during normal drawdowns. In a multi-strategy portfolio, the allocation problem is compounded — the strategy with the shortest calibration window appears most attractive precisely because its \hat\theta is the most inflated. Figure 4 quantifies this gap between predicted and realised SR for calibration windows of 2–20 years.
Figure 4: Distribution of predicted SR \sqrt{\hat\theta/2} vs true and realised SR for calibration lengths of 2, 5, 10, and 20 years. The median prediction systematically exceeds realised SR, with the gap narrowing as the calibration window lengthens.
6 Summary
The central result of this post is a closed-form expression for the estimation penalty on Sharpe ratio. When a trader calibrates the OU process on a window of length T_{\mathrm{est}} and then trades with the estimated fair value, the asymptotic Sharpe ratio is \mathrm{SR}_\infty = \sqrt{\theta/2}\,(1 + 2/(\theta T_{\mathrm{est}}))^{-1/2}, as confirmed by Monte Carlo in Figure 2 and tabulated in Table 1. The penalty depends entirely on \theta T_{\mathrm{est}}, the number of mean-reversion timescales observed during calibration. Slow mean reversion (small \theta) demands proportionally longer calibration to achieve the same penalty level.
A key structural distinction separates one-shot estimation from rolling re-estimation. The one-shot estimator \hat\mu is fixed at the end of calibration and is independent of subsequent Brownian increments, so it enters as a Part 2 constant bias whose correlation with the trading-window process decays exponentially. For long trading horizons, the Part 3 covariance channel washes out entirely and only the variance penalty remains. The rolling estimator, by contrast, continuously updates \hat\mu using a trailing window of length W, creating a permanent positive correlation between the bias and the mispricing. Although the resulting penalty (1 + 2/(\theta W))^{-1/2} has the same functional form as the one-shot case, it persists at all trading horizons — a structural cost of continuous re-estimation (Figure 3, Table 2).