Volatility Forecasts (Part 7 - Multi-Dimensional VolGRU)

2026-03-06

1 Introduction

Part 6 showed that STES can be written as a constrained scalar GRU and then studied what happens when those scalar restrictions are removed one at a time. That analysis still forced the latent state to be one-dimensional: all memory about the past had to live in a single number.

This post asks what changes when that restriction is removed. Instead of a scalar hidden state, we allow VolGRU to keep a vector-valued hidden state. That gives the model more room to store different kinds of information at once, such as short-run shocks, slower-moving persistence, or other latent components that a scalar state must compress into one summary.

The central questions are these: what does the multi-dimensional generalization look like mathematically, how does it relate back to the scalar model from Part 6, and does the extra state capacity produce a meaningful forecasting gain?

2 Why Scalar State Is Restrictive

A scalar hidden state is a strong bottleneck. It forces the model to summarize the entire relevant past in one number. In Part 6, that made the reset mechanism nearly inert and limited how useful richer candidate dynamics could become.

A multi-dimensional hidden state relaxes that bottleneck. Instead of one running summary, the model can keep several latent components at once. The natural empirical question is whether those extra dimensions capture genuinely useful structure or merely add flexibility without payoff.

3 Multi-Dimensional VolGRU

Let the hidden state now be a vector h_t \in \mathbb{R}^d. The update equation becomes

h_t = (1-z_t) \odot h_{t-1} + z_t \odot \tilde h_t,

where \odot denotes elementwise multiplication and the gate z_t and candidate \tilde h_t are now vector-valued (or otherwise dimension-aware). The key conceptual change is that the model is no longer forced to store all latent volatility information in one scalar state.

The exact gate parameterization used in this post will determine how much of that flexibility is shared across dimensions and how much is specific to each dimension.
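As a concrete sketch, here is one hypothetical parameterization of the vector update in NumPy. The affine gate and tanh candidate are assumptions for illustration; the post has not yet fixed the exact gate structure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def volgru_step(h_prev, x_t, params):
    """One vector-valued VolGRU update: h_t = (1 - z_t) * h_prev + z_t * h_tilde.

    Hypothetical parameterization: affine gate and tanh candidate, both
    acting on the feature vector x_t and the previous hidden state h_prev.
    """
    Wz, Uz, bz = params["Wz"], params["Uz"], params["bz"]  # update-gate weights
    Wh, Uh, bh = params["Wh"], params["Uh"], params["bh"]  # candidate-map weights
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev + bz)             # elementwise gate in (0, 1)
    h_tilde = np.tanh(Wh @ x_t + Uh @ h_prev + bh)         # candidate state
    return (1.0 - z_t) * h_prev + z_t * h_tilde            # per-dimension convex combination

# usage: d = 3 hidden dimensions, k = 2 input features
d, k = 3, 2
rng = np.random.default_rng(0)
params = {name: rng.standard_normal(shape) * 0.1
          for name, shape in [("Wz", (d, k)), ("Uz", (d, d)), ("bz", (d,)),
                              ("Wh", (d, k)), ("Uh", (d, d)), ("bh", (d,))]}
h = np.zeros(d)
h = volgru_step(h, np.array([0.5, -1.0]), params)
```

Each hidden dimension interpolates independently between its previous value and its candidate, which is exactly the extra freedom a scalar state lacks.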

4 Relation to Part 6 and STES

The scalar model from Part 6 is recovered when d=1. STES remains a special case, but now only as a highly degenerate corner of the larger architecture: one state dimension, fixed candidate, no active reset behavior, and a tightly restricted gate.

That means Part 7 is not a completely new modeling direction. It is the next architectural step in the same sequence: first identify STES as a constrained scalar GRU, then ask what becomes possible once the hidden state is allowed to be genuinely multi-dimensional.
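To make the degenerate corner concrete, here is a minimal sketch of STES written in GRU form, under the assumptions that the gate is a logistic function of the transition features and the candidate is fixed to the current squared return (taken here to be the first feature, with no learned candidate map and no reset).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def stes_step(h_prev, x_t, beta):
    """STES as a degenerate scalar GRU (illustrative assumptions):
    one state dimension, fixed candidate, no reset, logistic gate."""
    alpha_t = sigmoid(beta @ x_t)  # smoothing weight in (0, 1)
    candidate = x_t[0]             # fixed candidate: current squared return, no learned map
    return (1.0 - alpha_t) * h_prev + alpha_t * candidate

# usage: with beta = 0 the gate is 0.5, so the update is a plain average
h_next = stes_step(2.0, np.array([1.0, 0.0]), np.array([0.0, 0.0]))
```

Every restriction named in the text corresponds to one line here: freeing the candidate, the gate, the reset, or the state dimension recovers the successive rungs of the Part 6 ladder and then the Part 7 architecture.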

5 Planned Empirical Design

The initial design goal is to keep the same SPY forecasting protocol used in Parts 5 and 6 while changing only the architecture. That isolates the effect of state dimension before introducing a new target or a multi-asset covariance problem.

This suggests a first multi-dimensional experiment in which the target remains the next-day squared return, the augmented feature set remains unchanged, and the main comparison is between scalar VolGRU and multi-dimensional VolGRU variants with increasing state capacity.
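The evaluation step of that protocol can be sketched as follows. The post does not specify which losses are used, so MSE and QLIKE here are assumptions (both are standard for squared-return targets); QLIKE assumes strictly positive forecasts and realized values.

```python
import numpy as np

def evaluate_one_step(forecasts, realized):
    """Hypothetical evaluation of next-day squared-return forecasts.

    MSE and QLIKE are assumed losses, not confirmed by the post.
    QLIKE requires forecasts > 0 and realized > 0.
    """
    forecasts = np.asarray(forecasts, dtype=float)
    realized = np.asarray(realized, dtype=float)
    mse = np.mean((forecasts - realized) ** 2)
    ratio = realized / forecasts
    qlike = np.mean(ratio - np.log(ratio) - 1.0)  # zero iff forecasts == realized
    return {"mse": mse, "qlike": qlike}

# usage: a perfect forecast scores zero under both losses
metrics = evaluate_one_step([1.0, 2.0], [1.0, 2.0])
```

Holding the loss, target, and features fixed across variants is what lets any performance difference be attributed to state dimension alone.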

6 Planned Model Sequence

A natural ordered sequence for this post is:

  1. Scalar baseline carried over from Part 6.
  2. Multi-dimensional hidden state with the simplest shared gate structure.
  3. Multi-dimensional state with state-dependent gating.
  4. Multi-dimensional state with active reset behavior.
  5. Multi-dimensional state with a richer nonlinear candidate map.

This sequence preserves the spirit of the Part 6 ladder while making the architectural leap in a controlled way.
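The ladder above can be encoded as a list of model specifications. The flag names and the d = 4 choice for the multi-dimensional rungs are assumptions; only the ordering and the feature each rung adds come from the text.

```python
# Hypothetical encoding of the planned five-rung ladder.
# state_dim = 4 for rungs 2-5 is an assumed placeholder, not a choice from the post.
ladder = [
    {"rung": 1, "state_dim": 1, "gating": "shared",          "reset": False, "candidate": "linear"},
    {"rung": 2, "state_dim": 4, "gating": "shared",          "reset": False, "candidate": "linear"},
    {"rung": 3, "state_dim": 4, "gating": "state_dependent", "reset": False, "candidate": "linear"},
    {"rung": 4, "state_dim": 4, "gating": "state_dependent", "reset": True,  "candidate": "linear"},
    {"rung": 5, "state_dim": 4, "gating": "state_dependent", "reset": True,  "candidate": "nonlinear"},
]
```

Writing the ladder this way makes the controlled nature of the sequence explicit: consecutive rungs differ in exactly one field.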

7 Implementation Notes

Before the empirical analysis can be run, the VolGRU implementation needs to support state_dim > 1 while preserving the scalar reduction when state_dim=1. That means updating the backend model code, verifying shape compatibility, and checking that the scalar case still reproduces the Part 6 behavior.
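The scalar-reduction check described here can be sketched as a direct sanity test: run a generic dimension-aware update with all shapes set to 1x1 and confirm it matches the scalar recursion computed with plain floats. The parameterization is the same illustrative assumption as above, not the final backend code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def volgru_step(h_prev, x_t, Wz, Uz, bz, Wh, Uh, bh):
    """Generic dimension-aware update; should reduce to the scalar
    recursion when every weight is 1x1 (hypothetical parameterization)."""
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz)
    h_tilde = np.tanh(Wh @ x_t + Uh @ h_prev + bh)
    return (1.0 - z) * h_prev + z * h_tilde

# sanity check: state_dim = 1 reproduces the scalar recursion
x = np.array([0.3, -0.7])
h0 = np.array([0.2])
Wz = np.array([[0.5, -0.2]]); Uz = np.array([[0.1]]);  bz = np.array([0.0])
Wh = np.array([[0.4, 0.3]]);  Uh = np.array([[-0.2]]); bh = np.array([0.1])
h1 = volgru_step(h0, x, Wz, Uz, bz, Wh, Uh, bh)

# scalar reference computed with plain floats
z_ref = 1.0 / (1.0 + np.exp(-(0.5*0.3 - 0.2*(-0.7) + 0.1*0.2)))
h_ref = (1.0 - z_ref) * 0.2 + z_ref * np.tanh(0.4*0.3 + 0.3*(-0.7) - 0.2*0.2 + 0.1)
assert abs(h1[0] - h_ref) < 1e-12
```

A test of this form, pointed at the real backend, is what verifies that the state_dim = 1 case still reproduces the Part 6 behavior.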

Once that is in place, the notebook can reuse the feature pipeline and evaluation framework from Part 6, then add diagnostics that are meaningful only in the multi-dimensional setting, such as hidden-state trajectories, dimension usage, and reset-gate activity.
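One of the new diagnostics, dimension usage, could be computed as follows; the normalized per-dimension variance used here is an assumed definition, since the post does not pin one down.

```python
import numpy as np

def dimension_usage(h_path):
    """Hypothetical 'dimension usage' diagnostic: per-dimension variance
    of the hidden-state trajectory, normalized to sum to 1. A dimension
    whose share is near zero is effectively unused by the model."""
    h_path = np.asarray(h_path, dtype=float)  # shape (T, d)
    var = h_path.var(axis=0)
    return var / var.sum()

# usage: the second dimension is constant, so it gets zero share
path = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
shares = dimension_usage(path)
```

A diagnostic like this answers the question raised in Section 2 directly: do the extra dimensions capture genuinely useful structure, or do some of them sit idle?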