Expected Performance of a Mean-Reversion Trading Strategy — Part 4

2026-03-03

In Part 1 we derived the asymptotic Sharpe ratio \mathrm{SR}_\infty = \sqrt{\theta/2} of a mean-reversion strategy under the assumption that the mispricing follows an Ornstein-Uhlenbeck process and the trader knows its parameters exactly. Part 2 showed that a constant bias M in the fair-value estimate incurs a multiplicative penalty (1 + \xi^2)^{-1/2}, where \xi = M/s_\infty and s_\infty = \sigma/\sqrt{2\theta} is the stationary mispricing scale. Part 3 extended this to correlated bias, establishing that trailing estimators of fair value degrade both expected PnL and risk.

All three parts treated the OU parameters \theta and \sigma as known. In practice, these must be estimated from a finite calibration sample, and estimation error feeds directly into the trader’s fair-value estimate. In this post we close the loop by quantifying how parameter estimation error affects the realised Sharpe ratio.

We look at the effect of the two estimates separately. The first is the mean estimation error (\hat\mu \neq \mu): the estimated long-run mean acts as a constant bias in the Part 2 sense, with variance \mathrm{Var}(\hat\mu) \approx \sigma^2/(\theta^2 T_{\mathrm{est}}), yielding a penalty (1 + 2/(\theta T_{\mathrm{est}}))^{-1/2} governed entirely by the number of mean-reversion timescales in the calibration window. The second is speed estimation error (\hat\theta \neq \theta): the well-known finite-sample bias in \hat\theta does not alter the pathwise Sharpe ratio of the baseline strategy studied here, but it does create systematic overconfidence when the trader converts \hat\theta into a forecasted Sharpe ratio or a leverage decision, leading to misallocated capital and underestimated drawdowns.

1 Maximum Likelihood Estimation for the OU Process

We work with the same centred OU model as Parts 1–3:

dX_t = -\theta\,(X_t - \mu)\,dt + \sigma\,dW_t, \qquad X_0 = \mu.

For the trading strategy, X_t represents the mispricing — the deviation from fair value v, with \mu = v denoting the true long-run mean. The trader must estimate the parameter triple (\theta, \mu, \sigma) from a calibration sample of length T_{\mathrm{est}}.

As derived in a previous post in our short-rate model series, the exact discretisation of this process is

X_{i+1} \mid X_i \;\sim\; \mathcal{N}\!\big(\underbrace{X_i\,e^{-\theta\Delta t} + \mu(1 - e^{-\theta\Delta t})}_{\text{conditional mean}},\;\; \underbrace{\tfrac{\sigma^2}{2\theta}(1 - e^{-2\theta\Delta t})}_{\text{conditional variance}}\big),

which takes the form of a Gaussian AR(1) model X_{i+1} = a X_i + b + \varepsilon_i with a = e^{-\theta\Delta t} and b = \mu(1-a). Maximum likelihood estimation therefore reduces to OLS regression, and the OU parameters are recovered by inverting:

\hat\theta = -\frac{\log\hat{a}}{\Delta t}, \qquad \hat\mu = \frac{\hat{b}}{1-\hat{a}}, \qquad \hat\sigma^2 = \frac{2\hat\theta\,\hat{s}^2_\varepsilon}{1-\hat{a}^2}.
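This inversion is straightforward to implement. As a reference point, here is a minimal sketch of an `mle_ou`-style helper (the actual helper defined in the earlier post may differ in details): it regresses X_{i+1} on X_i by OLS and inverts the AR(1) coefficients as above.

```python
import numpy as np

def mle_ou(x, dt):
    """OU MLE via the AR(1)/OLS reduction: regress x[1:] on x[:-1] and invert."""
    x0, x1 = x[:-1], x[1:]
    a_hat, b_hat = np.polyfit(x0, x1, 1)      # slope and intercept of x1 = a*x0 + b
    resid = x1 - (a_hat * x0 + b_hat)
    theta_hat = -np.log(a_hat) / dt           # requires 0 < a_hat < 1 (see text)
    mu_hat = b_hat / (1 - a_hat)
    sigma2_hat = 2 * theta_hat * resid.var() / (1 - a_hat**2)
    return theta_hat, mu_hat, np.sqrt(sigma2_hat)
```

Note the caveat in the `theta_hat` line: in very short or noisy samples \hat{a} can fall outside (0, 1), in which case the inversion fails; a production implementation would need to guard against that.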

Three finite-sample difficulties plague this estimator. First, when \theta is small, a \approx 1 and the process resembles a random walk; since the intercept takes the form \mu(1-a), the estimates \hat\mu and \hat\theta become strongly negatively correlated — a slight change in one can be compensated by the other, creating an identification problem. Second, the MLE \hat\theta is upward-biased in finite samples: small-sample variability causes \hat{a} to be pulled below e^{-\theta\Delta t}, which inflates \hat\theta. Third, all three estimators exhibit high variance when \theta T_{\mathrm{est}} is small, reflecting the fact that the process has not completed enough mean-reversion cycles to reveal its parameters reliably.

Figure 1 illustrates these effects across calibration windows of 2, 5, 10, and 20 years. The top row shows the distribution of \hat\theta: the median exceeds the true value at short windows and converges from above as more data is observed. The bottom row shows \hat\mu: the distribution is centred at the true mean in all cases, but its standard deviation shrinks only as \sigma/(\theta\sqrt{T_{\mathrm{est}}}), confirming the slow convergence rate inherent to autocorrelated processes.
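The simulation code below relies on helpers (`simulate_ou`, `mle_ou`) defined in earlier parts of the series. For reference, a minimal sketch of `simulate_ou` consistent with how it is called here, built on the exact Gaussian transition above (the original helper's internals may differ):

```python
import numpy as np

def simulate_ou(theta, sigma, T, dt, n_paths, mu=0.0, rng=None):
    """Simulate OU paths with the exact Gaussian transition (no Euler bias)."""
    rng = np.random.default_rng() if rng is None else rng
    n_steps = int(round(T / dt))
    a = np.exp(-theta * dt)                             # AR(1) coefficient e^{-theta dt}
    sd = np.sqrt(sigma**2 / (2 * theta) * (1 - a**2))   # conditional std dev
    X = np.empty((n_paths, n_steps + 1))
    X[:, 0] = mu                                        # start at the long-run mean
    for i in range(n_steps):
        X[:, i + 1] = mu + a * (X[:, i] - mu) + sd * rng.standard_normal(n_paths)
    return X
```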

Show simulation code
# --- Finite-sample bias in MLE estimates ---
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import Markdown, display

# simulate_ou and mle_ou are the OU helpers defined in earlier parts of this series
theta_true, sigma_true, mu_true = 1.0, 1.0, 0.0
dt = 1 / 252
n_mc = 3000
T_est_values = [2, 5, 10, 20]

fig, axes = plt.subplots(2, len(T_est_values), figsize=(16, 8), sharex='row', sharey='row')

for j, T_est in enumerate(T_est_values):
    theta_hats, mu_hats = [], []
    for i in range(n_mc):
        X = simulate_ou(theta_true, sigma_true, T_est, dt, 1,
                        mu=mu_true, rng=np.random.default_rng(1000 * j + i))
        th, mh, sh = mle_ou(X[0], dt)
        theta_hats.append(th)
        mu_hats.append(mh)
    theta_hats = np.array(theta_hats)
    mu_hats = np.array(mu_hats)

    # theta histogram
    ax = axes[0, j]
    ax.hist(theta_hats, bins=60, density=True, alpha=0.7, color='steelblue')
    ax.axvline(theta_true, color='black', ls='--', lw=2, label=f'true θ = {theta_true}')
    ax.axvline(np.median(theta_hats), color='crimson', ls='--', lw=1.5,
               label=f'median = {np.median(theta_hats):.2f}')
    ax.set_title(f'$T_{{\\mathrm{{est}}}}$ = {T_est} yr')
    ax.legend(fontsize=9)
    if j == 0:
        ax.set_ylabel('density')

    # mu histogram
    ax = axes[1, j]
    ax.hist(mu_hats, bins=60, density=True, alpha=0.7, color='darkorange')
    ax.axvline(mu_true, color='black', ls='--', lw=2, label=f'true μ = {mu_true}')
    ax.axvline(np.median(mu_hats), color='crimson', ls='--', lw=1.5,
               label=f'median = {np.median(mu_hats):.2f}')
    std_mu = np.std(mu_hats)
    ax.set_title(f'std($\\hat\\mu$) = {std_mu:.3f}')
    ax.legend(fontsize=9)
    if j == 0:
        ax.set_ylabel('density')

axes[0, 0].set_xlim(0, 4)
axes[1, 0].set_xlim(-2, 2)
fig.suptitle(r'MLE estimates from finite OU samples ($\theta=1, \sigma=1, \mu=0$)', fontsize=14)
fig.tight_layout()
plt.show()

# Summary table as Markdown for proper LaTeX rendering
rows = []
for j, T_est in enumerate(T_est_values):
    theta_hats, mu_hats = [], []
    for i in range(n_mc):
        X = simulate_ou(theta_true, sigma_true, T_est, dt, 1,
                        mu=mu_true, rng=np.random.default_rng(1000 * j + i))
        th, mh, sh = mle_ou(X[0], dt)
        theta_hats.append(th)
        mu_hats.append(mh)
    theta_hats = np.array(theta_hats)
    mu_hats = np.array(mu_hats)
    theory_std = sigma_true / (theta_true * np.sqrt(T_est))
    rows.append(
        f"| {T_est} | {np.median(theta_hats):.3f} | {np.mean(theta_hats):.3f} | "
        f"{np.std(theta_hats):.3f} | {np.std(mu_hats):.3f} | {theory_std:.3f} |"
    )

table = "\n".join([
    r"| $T_{\mathrm{est}}$ | median $\hat\theta$ | mean $\hat\theta$ | std $\hat\theta$ | std $\hat\mu$ | theory std $\hat\mu$ |",
    "|:---:|:---:|:---:|:---:|:---:|:---:|",
    *rows,
])
display(Markdown(table))
Figure 1: Finite-sample distributions of MLE estimates \hat\theta (top) and \hat\mu (bottom) for calibration windows of 2, 5, 10, and 20 years (\theta=1, \sigma=1, \mu=0). The upward bias in \hat\theta and the high variance of \hat\mu diminish as the calibration window lengthens.
| $T_{\mathrm{est}}$ | median $\hat\theta$ | mean $\hat\theta$ | std $\hat\theta$ | std $\hat\mu$ | theory std $\hat\mu$ |
|:---:|:---:|:---:|:---:|:---:|:---:|
| 2 | 3.058 | 3.594 | 2.399 | 1.105 | 0.707 |
| 5 | 1.748 | 1.956 | 1.087 | 0.473 | 0.447 |
| 10 | 1.337 | 1.442 | 0.621 | 0.317 | 0.316 |
| 20 | 1.153 | 1.207 | 0.380 | 0.226 | 0.224 |

2 The Estimation Penalty on Sharpe Ratio

2.1 Setup: one-shot estimation then trading

The trader observes the OU process on a calibration window [0, T_{\mathrm{est}}], estimates (\hat\theta, \hat\mu, \hat\sigma) via MLE, then trades on [T_{\mathrm{est}}, T_{\mathrm{est}} + T_{\mathrm{trade}}] using \hat\mu as the fair-value estimate.

From Part 2, the asymptotic Sharpe ratio when the trader operates with a constant bias M is

\mathrm{SR}_\infty = \sqrt{\frac{\theta}{2}} \cdot \frac{1}{\sqrt{1 + \xi^2}}, \qquad \xi = \frac{M}{s_\infty}, \qquad s_\infty = \frac{\sigma}{\sqrt{2\theta}}.

Part 2 also demonstrated that expected PnL is unaffected by a constant bias: the degradation acts purely through increased path variance. During the trading window, the estimation error in \hat\mu behaves exactly as a Part 2 constant bias. It is fixed at the moment calibration ends and is independent of the future Brownian increments on [T_{\mathrm{est}}, T_{\mathrm{est}} + T_{\mathrm{trade}}].

2.2 Variance of \hat\mu

For a stationary OU process with mean \mu, the sample mean \bar{X} = \frac{1}{T}\int_0^T X_t\,dt satisfies

\mathrm{Var}(\bar{X}) = \frac{1}{T^2}\int_0^T\!\int_0^T \mathrm{Cov}(X_s, X_t)\,ds\,dt = \frac{\sigma^2}{\theta^2 T}\left(1 - \frac{1 - e^{-\theta T}}{\theta T}\right).

This exact expression uses the stationary OU autocovariance. Our simulations below instead start from X_0 = \mu, so the finite-sample covariance kernel differs by a boundary term; the two setups share the same leading asymptotic behaviour, which is the quantity that matters for the long-horizon Sharpe penalty. When \theta T is large, therefore,

\mathrm{Var}(\hat\mu) \approx \mathrm{Var}(\bar{X}) \approx \frac{\sigma^2}{\theta^2 T_{\mathrm{est}}}.

Equivalently, in stationary-scale units \xi_{\hat{\mu}} = \hat\mu / s_\infty,

\mathbb{E}\!\left[\xi_{\hat{\mu}}^2\right] = \frac{\mathbb{E}[\hat\mu^2]}{s_\infty^2} \approx \frac{2}{\theta T_{\mathrm{est}}}.
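A quick numeric comparison (a sketch, not from the original post) of the exact stationary expression for \mathrm{Var}(\bar{X}) in stationary-scale units against the asymptote 2/(\theta T_{\mathrm{est}}) shows how fast the two converge:

```python
import numpy as np

theta, sigma = 1.0, 1.0
s_inf2 = sigma**2 / (2 * theta)                  # stationary variance s_inf^2

for T in [2.0, 5.0, 10.0, 20.0]:
    a = theta * T
    var_exact = sigma**2 / (theta**2 * T) * (1 - (1 - np.exp(-a)) / a)
    xi2_exact = var_exact / s_inf2               # exact E[xi^2] for a stationary sample
    xi2_asym = 2 / a                             # leading-order 2/(theta T_est)
    print(f"theta*T = {a:4.0f}   exact = {xi2_exact:.4f}   asymptotic = {xi2_asym:.4f}")
```

The exact expression is always below the asymptote, so the 2/(\theta T_{\mathrm{est}}) approximation is mildly conservative; by \theta T_{\mathrm{est}} = 20 the two agree to within about 5%.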

2.3 The penalty formula

Part 3 showed that for a random bias drawn independently of the future price path, the constant-bias formula carries over after averaging over the bias distribution: one replaces \xi^2 by its second moment. Applying that result here with M = \hat\mu therefore yields

\boxed{ \mathrm{SR}_\infty = \sqrt{\frac{\theta}{2}} \cdot \frac{1}{\sqrt{1 + \mathbb{E}\left[\xi_{\hat{\mu}}^2\right]}} \approx \sqrt{\frac{\theta}{2}} \cdot \frac{1}{\sqrt{1 + \dfrac{2}{\theta T_{\mathrm{est}}}}} }

The penalty factor \left(1 + 2/(\theta T_{\mathrm{est}})\right)^{-1/2} depends only on the dimensionless product \theta T_{\mathrm{est}}, the number of mean-reversion timescales observed during calibration. To keep the penalty factor above 1/\sqrt{2} \approx 0.71 (i.e. to retain at least 71% of the benchmark Sharpe ratio), we need \theta T_{\mathrm{est}} > 2, meaning the trader must observe at least two full mean-reversion timescales. Table 1 shows the penalty across a range of calibration lengths.

Table 1: Estimation penalty as a function of calibration length \theta T_{\mathrm{est}}.
| \theta T_{\mathrm{est}} | Penalty | OU half-lives in window |
|:---:|:---:|:---:|
| 1 | 0.577 | 1.4 |
| 2 | 0.707 | 2.9 |
| 5 | 0.845 | 7.2 |
| 10 | 0.913 | 14.4 |
| 20 | 0.953 | 28.9 |
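Inverting the penalty formula answers the practical question of how much history a given target penalty requires. A small utility sketch (the function name is ours, not from the series): solving p = (1 + 2/a)^{-1/2} for a = \theta T_{\mathrm{est}} gives a = 2p^2/(1 - p^2).

```python
def required_theta_T(p):
    """Calibration length theta*T_est needed so that (1 + 2/(theta T))^{-1/2} >= p."""
    return 2 * p * p / (1 - p * p)      # valid for 0 < p < 1

for p in [0.71, 0.85, 0.95, 0.99]:
    print(f"penalty {p:.2f} -> theta*T_est >= {required_theta_T(p):.1f}")
```

The quadratic growth in the denominator makes the last few percent expensive: a 0.95 penalty needs roughly 18.5 timescales, and 0.99 needs nearly a hundred.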
Show simulation code
# --- SR vs estimation window: theory + MC ---
theta_true, sigma_true, mu_true = 1.0, 1.0, 0.0
dt = 1 / 252
T_trade = 100         # long trading window for asymptotic SR
n_mc = 2000           # MC paths for estimation

T_est_grid = np.array([1, 2, 3, 5, 7, 10, 15, 20])
sr_theory = np.sqrt(theta_true / 2) / np.sqrt(1 + 2 / (theta_true * T_est_grid))
sr_mc = []

for T_est in T_est_grid:
    pnl_per_year = []
    qvY_per_year = []
    for i in range(n_mc):
        seed_est = int(10000 * T_est + i)
        X_full = simulate_ou(theta_true, sigma_true, T_est + T_trade, dt, 1,
                             mu=mu_true, rng=np.random.default_rng(seed_est))
        n_est_steps = int(T_est / dt)
        X_cal = X_full[0, :n_est_steps + 1]
        X_trade = X_full[:, n_est_steps:]  # keep 2D

        _, mu_hat, _ = mle_ou(X_cal, dt)

        # PnL and QV of Y (not X)
        Y = pnl_paths(X_trade, mu_hat, dt)
        pnl_per_year.append(Y[0, -1] / T_trade)
        dX_tr = np.diff(X_trade, axis=1)
        dY = -(X_trade[:, :-1] - mu_hat) * dX_tr
        qvY = np.sum(dY**2, axis=1)[0]
        qvY_per_year.append(qvY / T_trade)

    mean_pnl = np.mean(pnl_per_year)
    mean_qvY = np.mean(qvY_per_year)
    sr_mc.append(mean_pnl / np.sqrt(mean_qvY))

sr_mc = np.array(sr_mc)

# --- Plot: SR penalty vs theta * T_est ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

theta_T_fine = np.linspace(0.5, 25, 200)
penalty_theory = 1 / np.sqrt(1 + 2 / theta_T_fine)
axes[0].plot(theta_T_fine, penalty_theory, 'k--', lw=2,
             label='Theory: $(1 + 2/\\theta T_{\\mathrm{est}})^{-1/2}$')
penalty_mc = sr_mc / np.sqrt(theta_true / 2)
axes[0].scatter(theta_true * T_est_grid, penalty_mc, s=80, c='steelblue',
                zorder=5, edgecolors='black', label='Monte Carlo')
axes[0].set_xlabel(r'$\theta \, T_{\mathrm{est}}$')
axes[0].set_ylabel('Penalty factor')
axes[0].set_title('Estimation penalty on Sharpe ratio')
axes[0].legend()
axes[0].set_ylim(0.3, 1.05)

sr_fine = np.sqrt(theta_true / 2) * penalty_theory
axes[1].plot(theta_T_fine, sr_fine, 'k--', lw=2, label=r'Theory')
axes[1].scatter(theta_true * T_est_grid, sr_mc, s=80, c='darkorange',
                zorder=5, edgecolors='black', label='Monte Carlo')
axes[1].axhline(np.sqrt(theta_true / 2), color='gray', ls=':', lw=1,
                label=r'$\mathrm{SR}_\infty = \sqrt{\theta/2}$ (no error)')
axes[1].set_xlabel(r'$\theta \, T_{\mathrm{est}}$')
axes[1].set_ylabel(r'$\mathrm{SR}_\infty$')
axes[1].set_title('Asymptotic Sharpe ratio with estimation error')
axes[1].legend()

fig.tight_layout()
plt.show()

# Summary table as Markdown
rows = []
for T_est, sr in zip(T_est_grid, sr_mc):
    sr_th = np.sqrt(theta_true / 2) / np.sqrt(1 + 2 / (theta_true * T_est))
    rows.append(f"| {T_est} | {theta_true * T_est:.1f} | {sr:.4f} | {sr_th:.4f} |")

table = "\n".join([
    r"| $T_{\mathrm{est}}$ (yr) | $\theta T_{\mathrm{est}}$ | SR (MC) | SR (theory) |",
    "|:---:|:---:|:---:|:---:|",
    *rows,
])
display(Markdown(f"$\\mathrm{{SR}}_\\infty$ (no error): {np.sqrt(theta_true/2):.4f}\n\n" + table))
Figure 2: Estimation penalty on Sharpe ratio vs calibration length \theta T_{\mathrm{est}}. Left: penalty factor. Right: absolute SR. Monte Carlo dots closely track the theoretical curve (1 + 2/\theta T_{\mathrm{est}})^{-1/2}.
$\mathrm{SR}_\infty$ (no error): 0.7071

| $T_{\mathrm{est}}$ (yr) | $\theta T_{\mathrm{est}}$ | SR (MC) | SR (theory) |
|:---:|:---:|:---:|:---:|
| 1 | 1.0 | 0.3790 | 0.4082 |
| 2 | 2.0 | 0.4774 | 0.5000 |
| 3 | 3.0 | 0.5471 | 0.5477 |
| 5 | 5.0 | 0.5990 | 0.5976 |
| 7 | 7.0 | 0.6151 | 0.6236 |
| 10 | 10.0 | 0.6393 | 0.6455 |
| 15 | 15.0 | 0.6649 | 0.6642 |
| 20 | 20.0 | 0.6748 | 0.6742 |

3 The Correlation Channel is Transient

In Part 3, a correlated bias was shown to degrade expected PnL via -\theta\int_0^t \mathrm{Cov}(M_u, X_u)\,du, not just risk. A natural concern is whether the one-shot estimator \hat\mu — computed from the calibration window — introduces this additional penalty.

Asymptotically, it does not. Note that X_t for t > T_{\mathrm{est}} depends on the calibration window only through X_{T_{\mathrm{est}}}. Since \hat\mu is a function of \{X_s : s \leq T_{\mathrm{est}}\}, the covariance decays exponentially as the trading window progresses:

\mathrm{Cov}(\hat\mu, X_t) = \mathrm{Cov}\big(\hat\mu, X_{T_{\mathrm{est}}}\big) \cdot e^{-\theta(t - T_{\mathrm{est}})}, \qquad t > T_{\mathrm{est}}

The cumulative PnL degradation from the Part 3 covariance channel is therefore

\frac{\theta}{T_{\mathrm{trade}}}\int_{T_{\mathrm{est}}}^{T_{\mathrm{est}}+T_{\mathrm{trade}}} \mathrm{Cov}(\hat\mu, X_t)\,dt \;=\; \mathrm{Cov}(\hat\mu, X_{T_{\mathrm{est}}})\cdot\frac{1 - e^{-\theta T_{\mathrm{trade}}}}{T_{\mathrm{trade}}} \;\to\; 0

as T_{\mathrm{trade}} \to \infty. The same decaying covariance also enters the finite-horizon path risk through

\mathbb{E}\big[(X_t - \hat\mu)^2\big] = \mathbb{E}[X_t^2] + \mathbb{E}[\hat\mu^2] - 2\,\mathrm{Cov}(\hat\mu, X_t),

so the quadratic-variation correction is transient for the same reason. For one-shot estimation, the asymptotic Sharpe penalty therefore comes entirely from the Part 2 variance channel: \hat\mu acts as a random constant bias with \mathbb{E}[\hat\mu^2] \approx \sigma^2/(\theta^2 T_{\mathrm{est}}), yielding the clean formula (1 + 2/(\theta T_{\mathrm{est}}))^{-1/2}. Figure 2 confirms this result with Monte Carlo simulation.
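The exponential decay of \mathrm{Cov}(\hat\mu, X_t) can be checked directly by Monte Carlo. The sketch below (ours, not from the original post) uses the calibration-window sample mean as a stand-in for \hat\mu; the Markov argument above applies to any functional of the calibration window, so the covariance ratio one timescale into the trading window should be close to e^{-\theta}.

```python
import numpy as np

theta, sigma, dt = 1.0, 1.0, 1.0 / 50
T_est, T_extra, n_paths = 5.0, 2.0, 20000
rng = np.random.default_rng(7)

# Exact-discretisation OU paths started at the mean (mu = 0)
a = np.exp(-theta * dt)
sd = np.sqrt(sigma**2 / (2 * theta) * (1 - a**2))
n_steps = int(round((T_est + T_extra) / dt))
X = np.empty((n_paths, n_steps + 1))
X[:, 0] = 0.0
for i in range(n_steps):
    X[:, i + 1] = a * X[:, i] + sd * rng.standard_normal(n_paths)

n_est = int(round(T_est / dt))
mu_hat = X[:, : n_est + 1].mean(axis=1)       # sample-mean stand-in for mu-hat

def cov_at_lag(u):
    """Sample Cov(mu_hat, X_{T_est + u}) across paths."""
    k = n_est + int(round(u / dt))
    return np.cov(mu_hat, X[:, k])[0, 1]

ratio = cov_at_lag(1.0) / cov_at_lag(0.0)     # should be close to e^{-theta * 1}
print(f"covariance ratio at lag 1/theta: {ratio:.3f} vs e^-1 = {np.exp(-1):.3f}")
```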

4 Rolling Estimation

In practice, traders do not estimate \hat\mu once and trade indefinitely. In the simplest adaptive setting, they re-estimate continuously using a rolling window of length W. At each time t, the fair-value estimate is the trailing sample mean

M_t = \frac{1}{W}\int_{t-W}^{t} X_s\,ds,

which is a simple moving average (SMA), a trailing estimator of the kind analysed in Part 3. The weighting kernel w(s) = \mathbf{1}_{[0,W]}(s)/W is non-negative and integrates to one, so Part 3’s general theorem guarantees \mathrm{Cov}(M_t, X_t) > 0 at all times. Rolling re-estimation therefore creates the persistent correlated-bias penalty that the one-shot estimator avoids.

For a stationary OU process, the covariance between the rolling estimate and the current mispricing is available in closed form:

\Sigma_{XM}^{\mathrm{SMA}} = \mathrm{Cov}(M_t, X_t) = \frac{1}{W}\int_0^W \frac{\sigma^2}{2\theta} e^{-\theta u}\,du = \frac{\sigma^2}{2\theta^2 W}\bigl(1 - e^{-\theta W}\bigr).

Substituting this into the Part 3 covariance correction gives the asymptotic expected PnL rate

\mathbb{E}\!\left[\frac{Y_t}{t}\right] \xrightarrow[t\to\infty]{} \frac{\sigma^2}{2}\left[1 - \frac{1 - e^{-\theta W}}{\theta W}\right].

The stationary variance of the rolling estimate is likewise

\Sigma_{MM}^{\mathrm{SMA}} = \mathrm{Var}(M_t) = \frac{\sigma^2}{\theta^3 W^2}\Bigl(\theta W - (1 - e^{-\theta W})\Bigr).
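As a sanity check (not in the original post), both stationary moments can be verified by time-averaging one long simulated path, assuming ergodicity. The sketch below uses pure NumPy and the exact OU discretisation, with a stationary initial draw:

```python
import numpy as np

theta, sigma, W, dt = 1.0, 1.0, 1.0, 1.0 / 50
T = 2000.0                                    # long path for time-averaging
rng = np.random.default_rng(3)

# One long stationary OU path (exact discretisation, stationary start)
a = np.exp(-theta * dt)
sd = np.sqrt(sigma**2 / (2 * theta) * (1 - a**2))
n = int(round(T / dt))
z = rng.standard_normal(n)
x = np.empty(n + 1)
x[0] = rng.normal(0.0, sigma / np.sqrt(2 * theta))   # X_0 from the stationary law
for i in range(n):
    x[i + 1] = a * x[i] + sd * z[i]

# Rolling SMA M_t over a window of length W, via cumulative sums
w = int(round(W / dt))
cum = np.concatenate([[0.0], np.cumsum(x)]) * dt
M = (cum[w:] - cum[:-w]) / W                  # M[j] averages x[j .. j+w-1]
x_al = x[w - 1:]                              # align X_t with the window ending at t

cov_mc, var_mc = np.cov(M, x_al)[0, 1], M.var()
cov_th = sigma**2 / (2 * theta**2 * W) * (1 - np.exp(-theta * W))
var_th = sigma**2 / (theta**3 * W**2) * (theta * W - (1 - np.exp(-theta * W)))
print(f"Cov: MC={cov_mc:.4f} theory={cov_th:.4f}  Var: MC={var_mc:.4f} theory={var_th:.4f}")
```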

Writing a = \theta W, the asymptotic Sharpe ratio of the rolling-SMA strategy can therefore be written in the scale-free form

\mathrm{SR}^{\mathrm{SMA}}_\infty = \sqrt{\theta}\,\frac{A(a)}{2\sqrt{B(a)}},

where

A(a) = 1 - \frac{1 - e^{-a}}{a}, \qquad B(a) = \frac{1}{2} - \frac{1 - e^{-a}}{a} + \frac{a - (1 - e^{-a})}{a^2}

Equivalently, the rolling penalty relative to the Part 1 benchmark is

P_{\mathrm{SMA}}(a) = \frac{\mathrm{SR}^{\mathrm{SMA}}_\infty}{\sqrt{\theta/2}} = \frac{A(a)}{\sqrt{2B(a)}}, \qquad a = \theta W

This exact SMA penalty depends only on the dimensionless window length \theta W, just as the one-shot penalty depends only on \theta T_{\mathrm{est}}. The two are not algebraically identical, however: the EMA-based shortcut (1 + 2/(\theta W))^{-1/2} captures the right large-window scaling, but the exact SMA penalty is slightly lower for short windows. That distinction is summarised in Table 2 and confirmed numerically in Figure 3.
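The two penalty curves are easy to compare numerically. A short sketch evaluating both closed forms on the common dimensionless scale:

```python
import numpy as np

def penalty_one_shot(a):
    """(1 + 2/a)^{-1/2}, with a = theta * T_est."""
    return 1.0 / np.sqrt(1.0 + 2.0 / a)

def penalty_sma(a):
    """Exact rolling-SMA penalty A(a) / sqrt(2 B(a)), with a = theta * W."""
    e = 1.0 - np.exp(-a)
    A = 1.0 - e / a
    B = 0.5 - e / a + (a - e) / a**2
    return A / np.sqrt(2.0 * B)

for a in [0.5, 1.0, 2.0, 5.0, 20.0]:
    print(f"a = {a:5.1f}   one-shot = {penalty_one_shot(a):.3f}   SMA = {penalty_sma(a):.3f}")
```

At a = 1 the SMA penalty is about 0.536 against 0.577 for the one-shot formula, while by a = 20 the two agree to three decimal places.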

Table 2: Penalty source comparison for one-shot vs rolling estimation.
| Method | Penalty | Source |
|:---|:---:|:---|
| One-shot (calibrate then trade) | (1 + 2/(\theta T_{\mathrm{est}}))^{-1/2} | Part 2 variance channel only |
| Rolling SMA, window W | P_{\mathrm{SMA}}(\theta W) | Part 3 covariance channel plus extra path risk |

For one-shot estimation, only the Part 2 variance-channel penalty survives at long horizons (the covariance channel washes out), whereas the rolling estimator adds a permanent Part 3 covariance penalty — a structural cost of continuous re-estimation.

Show simulation code
# --- Rolling estimation: SMA bias simulation ---
theta_true, sigma_true = 1.0, 1.0
dt = 1 / 252
T = 50                 # total simulation length
n_paths = 2000
W_values = [0.5, 1, 2, 5, 10, 20]


def sma_penalty_exact(theta, W):
    a = theta * np.asarray(W, dtype=float)
    one_minus_exp = 1 - np.exp(-a)
    A = 1 - one_minus_exp / a
    B = 0.5 - one_minus_exp / a + (a - one_minus_exp) / a**2
    return A / np.sqrt(2 * B)


sr_rolling_mc = []

for W in W_values:
    W_steps = int(W / dt)
    X = simulate_ou(theta_true, sigma_true, T, dt, n_paths,
                    rng=np.random.default_rng(123))
    # Compute rolling SMA bias: M_t = (1/W) * integral_{t-W}^{t} X_s ds
    cumX = np.cumsum(X, axis=1) * dt
    M = np.zeros_like(X)
    for k in range(W_steps, X.shape[1]):
        M[:, k] = (cumX[:, k] - cumX[:, k - W_steps]) / W

    # Only measure SR after the window is fully populated
    burn = W_steps + int(5 / dt)
    if burn >= X.shape[1] - int(5 / dt):
        burn = W_steps
    X_eval = X[:, burn:]
    M_eval = M[:, burn:]
    T_eval = (X_eval.shape[1] - 1) * dt

    # SR = mean(annualised PnL) / sqrt(mean(annualised QV_Y))
    Y = pnl_paths(X_eval, M_eval, dt)
    mean_pnl = np.mean(Y[:, -1] / T_eval)
    dX_eval = np.diff(X_eval, axis=1)
    dY = -(X_eval[:, :-1] - M_eval[:, :-1]) * dX_eval
    qvY = np.sum(dY**2, axis=1)
    mean_qvY = np.mean(qvY / T_eval)
    sr_rolling_mc.append(mean_pnl / np.sqrt(mean_qvY))

sr_rolling_mc = np.array(sr_rolling_mc)

# --- Plot: rolling SR vs window ---
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left: exact rolling-SMA penalty vs theta*W
W_fine = np.linspace(0.3, 25, 200)
penalty_sma_theory = sma_penalty_exact(theta_true, W_fine)
axes[0].plot(theta_true * W_fine, penalty_sma_theory, 'k-', lw=2,
             label=r'Exact SMA theory')
penalty_rolling = sr_rolling_mc / np.sqrt(theta_true / 2)
axes[0].scatter(theta_true * np.array(W_values), penalty_rolling, s=80,
                c='forestgreen', zorder=5, edgecolors='black', label='MC (rolling SMA)')
axes[0].set_xlabel(r'$\theta \, W$')
axes[0].set_ylabel('Penalty factor')
axes[0].set_title('Rolling estimation penalty')
axes[0].legend()
axes[0].set_ylim(0.2, 1.05)

# Right: compare one-shot and rolling on the same dimensionless scale
one_shot_theory = 1 / np.sqrt(1 + 2 / W_fine)
axes[1].plot(W_fine, one_shot_theory, 'k--', lw=2, label='One-shot theory')
axes[1].plot(W_fine, penalty_sma_theory, color='forestgreen', lw=2, label='Rolling SMA theory')
axes[1].scatter(theta_true * T_est_grid, sr_mc / np.sqrt(theta_true / 2),
                s=80, c='steelblue', zorder=5, edgecolors='black',
                marker='s', label='One-shot (MC)')
axes[1].scatter(theta_true * np.array(W_values), penalty_rolling, s=80,
                c='forestgreen', zorder=5, edgecolors='black',
                marker='o', label='Rolling SMA (MC)')
axes[1].set_xlabel(r'Dimensionless history length ($\theta T_{\mathrm{est}}$ or $\theta W$)')
axes[1].set_ylabel('Penalty factor')
axes[1].set_title('One-shot vs rolling: same scale, different curves')
axes[1].legend()
axes[1].set_ylim(0.2, 1.05)

fig.tight_layout()
plt.show()

# Summary table as Markdown
rows = []
for W, sr in zip(W_values, sr_rolling_mc):
    th_penalty = sma_penalty_exact(theta_true, W)
    rows.append(
        f"| {W:.1f} | {theta_true * W:.1f} | {sr:.4f} | "
        f"{sr / np.sqrt(theta_true / 2):.3f} | {th_penalty:.3f} |"
    )

table = "\n".join([
    r"| $W$ (yr) | $\theta W$ | SR (MC) | Penalty (MC) | Penalty (exact SMA theory) |",
    "|:---:|:---:|:---:|:---:|:---:|",
    *rows,
])
display(Markdown(table))
Figure 3: Rolling SMA penalty vs window length. Left: Monte Carlo penalty factor for the rolling SMA against the exact stationary SMA theory. Right: one-shot and rolling penalties compared on the common scale variable, showing similar scaling but different exact curves.
| $W$ (yr) | $\theta W$ | SR (MC) | Penalty (MC) | Penalty (exact SMA theory) |
|:---:|:---:|:---:|:---:|:---:|
| 0.5 | 0.5 | 0.2871 | 0.406 | 0.404 |
| 1.0 | 1.0 | 0.3777 | 0.534 | 0.536 |
| 2.0 | 2.0 | 0.4769 | 0.674 | 0.677 |
| 5.0 | 5.0 | 0.5890 | 0.833 | 0.834 |
| 10.0 | 10.0 | 0.6405 | 0.906 | 0.909 |
| 20.0 | 20.0 | 0.6721 | 0.950 | 0.952 |

5 Bias in \hat\theta

It is well known that the MLE overestimates \theta in finite samples: the median \hat\theta exceeds the true value, as shown in Figure 1. A natural question is whether this bias can degrade the realised Sharpe ratio in practice.

For the baseline strategy studied throughout this series, it does not enter the pathwise Sharpe ratio directly. The trading rule uses only the estimated fair value, with PnL

dY_t = -(X_t - \hat\mu)\,dX_t,

so replacing \theta by \hat\theta does not change the signal itself. In that sense the realised Sharpe ratio of the base strategy is invariant to speed-estimation error. A \theta-dependent sizing rule would be a different strategy and would generally reintroduce direct dependence on \hat\theta.

However, a trader who computes the predicted Sharpe ratio \mathrm{SR}_\infty = \sqrt{\hat\theta/2} will arrive at a systematic overestimate, since \hat\theta is upward-biased. The resulting overconfidence distorts capital allocation: more capital or higher leverage is deployed than the strategy’s actual risk-reward profile warrants. Risk budgets calibrated to the inflated SR become too tight, raising the probability of premature stop-outs during normal drawdowns. In a multi-strategy portfolio, the allocation problem is compounded: the strategy with the shortest calibration window appears most attractive precisely because its \hat\theta is the most inflated. Figure 4 quantifies this gap between predicted and realised SR for calibration windows of 2-20 years.

Show simulation code
# --- Overconfidence: predicted vs realised SR ---
theta_true, sigma_true = 1.0, 1.0
dt = 1 / 252
T_trade = 100
n_mc = 2000
T_est_values = [2, 5, 10, 20]

fig, axes = plt.subplots(1, len(T_est_values), figsize=(16, 4.5), sharey=True)

for j, T_est in enumerate(T_est_values):
    theta_hats = []
    pnl_ann_list = []
    qvY_ann_list = []
    for i in range(n_mc):
        seed = int(50000 * j + i)
        X_full = simulate_ou(theta_true, sigma_true, T_est + T_trade, dt, 1,
                             rng=np.random.default_rng(seed))
        n_est = int(T_est / dt)
        X_cal = X_full[0, :n_est + 1]
        X_trade = X_full[:, n_est:]

        th_hat, mu_hat, _ = mle_ou(X_cal, dt)
        theta_hats.append(th_hat)

        # Realised PnL and QV_Y for cross-path SR
        Y = pnl_paths(X_trade, mu_hat, dt)
        pnl_ann_list.append(Y[0, -1] / T_trade)
        dX_tr = np.diff(X_trade, axis=1)
        dY = -(X_trade[:, :-1] - mu_hat) * dX_tr
        qvY_ann_list.append(np.sum(dY**2, axis=1)[0] / T_trade)

    theta_hats = np.array(theta_hats)
    predicted_srs = np.sqrt(theta_hats / 2)
    realized_sr = np.mean(pnl_ann_list) / np.sqrt(np.mean(qvY_ann_list))
    theory_sr = np.sqrt(theta_true / 2) / np.sqrt(1 + 2 / (theta_true * T_est))

    ax = axes[j]
    ax.hist(predicted_srs, bins=50, density=True, alpha=0.7, color='steelblue',
            label=f'predicted $\\sqrt{{\\hat\\theta/2}}$')
    ax.axvline(np.sqrt(theta_true / 2), color='black', ls='--', lw=2,
               label=f'true SR = {np.sqrt(theta_true/2):.3f}')
    ax.axvline(realized_sr, color='crimson', ls=':', lw=2,
               label=f'realised = {realized_sr:.3f}')
    ax.axvline(np.median(predicted_srs), color='steelblue', ls='--', lw=1.5,
               label=f'pred median = {np.median(predicted_srs):.3f}')
    ax.set_title(f'$T_{{\\mathrm{{est}}}}$ = {T_est} yr')
    ax.set_xlabel(r'Sharpe ratio')
    if j == 0:
        ax.set_ylabel('density')
    ax.legend(fontsize=7.5, loc='upper right')
    ax.set_xlim(0, 2.5)

fig.suptitle(r'Overconfidence: predicted SR vs true and realised SR ($\theta=1, \sigma=1$)',
             fontsize=13)
fig.tight_layout()
plt.show()

# Summary table as Markdown
rows = []
for j, T_est in enumerate(T_est_values):
    theta_hats = []
    pnl_ann, qvY_ann = [], []
    for i in range(n_mc):
        seed = int(50000 * j + i)
        X_full = simulate_ou(theta_true, sigma_true, T_est + T_trade, dt, 1,
                             rng=np.random.default_rng(seed))
        n_est = int(T_est / dt)
        X_cal = X_full[0, :n_est + 1]
        X_trade = X_full[:, n_est:]
        th_hat, mu_hat, _ = mle_ou(X_cal, dt)
        theta_hats.append(th_hat)
        Y = pnl_paths(X_trade, mu_hat, dt)
        pnl_ann.append(Y[0, -1] / T_trade)
        dX_tr = np.diff(X_trade, axis=1)
        dY = -(X_trade[:, :-1] - mu_hat) * dX_tr
        qvY_ann.append(np.sum(dY**2, axis=1)[0] / T_trade)

    pred_median = np.median(np.sqrt(np.array(theta_hats) / 2))
    real_sr = np.mean(pnl_ann) / np.sqrt(np.mean(qvY_ann))
    true_sr = np.sqrt(theta_true / 2)
    rows.append(
        f"| {T_est} | {pred_median:.4f} | {true_sr:.4f} | {real_sr:.4f} | {pred_median - real_sr:.4f} |"
    )

table = "\n".join([
    r"| $T_{\mathrm{est}}$ (yr) | Median predicted | True SR | Realised SR | Overconfidence |",
    "|:---:|:---:|:---:|:---:|:---:|",
    *rows,
])
display(Markdown(table))
Figure 4: Distribution of predicted SR \sqrt{\hat\theta/2} vs true and realised SR for calibration lengths of 2, 5, 10, and 20 years. The median prediction systematically exceeds realised SR, with the gap narrowing as the calibration window lengthens.
| $T_{\mathrm{est}}$ (yr) | Median predicted | True SR | Realised SR | Overconfidence |
|:---:|:---:|:---:|:---:|:---:|
| 2 | 1.2413 | 0.7071 | 0.3486 | 0.8927 |
| 5 | 0.9465 | 0.7071 | 0.5990 | 0.3475 |
| 10 | 0.8098 | 0.7071 | 0.6393 | 0.1705 |
| 20 | 0.7642 | 0.7071 | 0.6758 | 0.0884 |

6 Discussion

The central result of this post is a closed-form expression for the one-shot estimation penalty on Sharpe ratio. When a trader calibrates the OU process on a window of length T_{\mathrm{est}} and then trades with the estimated fair value, the asymptotic Sharpe ratio is \mathrm{SR}_\infty = \sqrt{\theta/2}\,(1 + 2/(\theta T_{\mathrm{est}}))^{-1/2}, as confirmed by Monte Carlo in Figure 2 and tabulated in Table 1. The penalty depends entirely on \theta T_{\mathrm{est}}, the number of mean-reversion timescales observed during calibration. Slow mean reversion (small \theta) therefore demands proportionally longer calibration to achieve the same penalty level.

A key structural distinction separates one-shot estimation from rolling re-estimation. The one-shot estimator \hat\mu is fixed at the end of calibration and is independent of subsequent Brownian increments, so it enters as a Part 2 constant bias whose correlation with the trading-window process decays exponentially. For long trading horizons, the Part 3 covariance channel washes out entirely and only the variance penalty remains. The rolling estimator, by contrast, continuously updates \hat\mu using a trailing window of length W, creating a permanent positive correlation between the bias and the mispricing. Its exact SMA penalty is a different function of the dimensionless window length \theta W, even though both one-shot and rolling estimation are governed by the same basic principle: too little effective history relative to the mean-reversion timescale degrades risk-adjusted performance.

Taken together, Parts 1-4 give a clean decomposition of where Sharpe-ratio losses come from in this simple mean-reversion setting. Perfect information delivers the benchmark \sqrt{\theta/2}. A constant bias leaves expected PnL intact but raises path risk. A trailing estimator adds a covariance penalty on top of that. Finite-sample parameter estimation then links these abstract bias channels to a concrete statistical object: the amount of usable calibration history measured in OU timescales.