2025-12-14
I am starting a series on the signature method now that I have some free time on my hands over the 2025 holiday season. In this post I talk about what motivates me to study the signature method, and as the series progresses I will build up examples and applications to see how well it works in practice. The signature method has deep connections with rough path theory and Lie algebras, neither of which I have studied enough to write about intelligently. Hopefully, towards the end of the series I will have accumulated enough knowledge to write about their connections to the signature method.
In quantitative finance we often deal with trajectories: prices evolving, spreads widening then snapping back, volume or volatility arriving in bursts, and signals activating in sequences. However, many modeling pipelines force trajectories into a small set of hand-crafted summaries such as moving averages, volatility estimates, crossovers, rolling betas, event counts, or regime flags. These features are often effective, but they work by compressing a path into a few numbers, and most information about ordering and interactions across variables (channels) is lost. That loss matters because many market phenomena are genuinely path-dependent.
For example, volatility clustering depends on the recent sequence of shocks, not just the net move. We are often interested in “who moved first” (and whether there is a predictive lead–lag relationship) by looking at cross-impact. Intraday microstructure effects such as bid–ask bounce and spread dynamics depend on ordering as well. These patterns are often difficult to capture with a small set of scalar summaries.
The signature method starts from a different premise: instead of choosing a small list of summaries, we build a systematic feature map for paths. The goal is to represent a multivariate time series window as a feature vector that retains time-ordered interactions, so that downstream models can learn path-dependence without requiring us to manually enumerate and craft interaction patterns.
Another useful way to motivate the method before defining it is to view it as a path analogue of polynomial feature maps.
For scalar inputs x, we can use the polynomial features
\phi(x) = (1, x, x^2, x^3, \dots)
to approximate many nonlinear targets f(x) using a linear model in the feature space truncated at polynomial order m:
f(x) \approx \langle w,\ \phi^{\le m}(x)\rangle
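To make the polynomial analogy concrete, here is a minimal numpy sketch (the target f(x) = sin(x) and the degree m = 5 are illustrative choices, not from the text): we build the truncated feature map \phi^{\le m}(x) = (1, x, \dots, x^m) and fit the weights w by least squares.

```python
import numpy as np

# Illustrative example: approximate f(x) = sin(x) on [-1, 1] with a
# linear model in truncated polynomial features (1, x, x^2, ..., x^m).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = np.sin(x)

m = 5
Phi = np.vander(x, m + 1, increasing=True)   # columns: 1, x, x^2, ..., x^m
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # fit w so that f(x) ~ <w, phi(x)>

# Evaluate the approximation error on a fresh grid.
x_test = np.linspace(-1.0, 1.0, 50)
Phi_test = np.vander(x_test, m + 1, increasing=True)
max_err = np.max(np.abs(Phi_test @ w - np.sin(x_test)))
```

Even at this low truncation order the approximation error is small, which is the behavior the signature method aims to reproduce for path-valued inputs.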
For sequential inputs, i.e. paths X_{[a, b]} = (X_t)_{t \in [a, b]} with X_t \in \mathbb{R}^d, we seek an analogous map from a window of a multivariate time series (a path) to a feature vector, such that many continuous, path-dependent targets can be approximated by linear functionals of those features:
F(X_{[a,b]}) \approx \langle \ell,\ S^{\le m}(X_{[a,b]})\rangle
Signatures provide precisely such features S(X_{[a,b]}). As with polynomial features, the full signature is infinite-dimensional; in practice we approximate with the truncated signature S^{\le m}(X), a finite-dimensional vector collecting all signature coordinates up to level m.
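Although signatures are defined precisely in the next post, a small numpy sketch already shows what S^{\le 2}(X) looks like for a piecewise-linear path: level 1 collects the total increments, and level 2 collects pairwise iterated integrals, built segment by segment via Chen's identity. The function name `sig_level_2` is my own illustrative helper, not a library API.

```python
import numpy as np

def sig_level_2(path):
    """Levels 1 and 2 of the signature of a piecewise-linear path.

    path: array of shape (n_points, d), the sampled points of the path.
    Returns (s1, s2): s1 has shape (d,) with s1[i] = S^{(i)}(X), and
    s2 has shape (d, d) with s2[i, j] = S^{(i,j)}(X).
    """
    increments = np.diff(path, axis=0)  # one increment per linear segment
    d = path.shape[1]
    s1, s2 = np.zeros(d), np.zeros((d, d))
    for delta in increments:
        # Chen's identity: appending a linear segment updates level 2 by
        # (level 1 so far) outer (segment increment), plus the segment's
        # own level-2 contribution outer(delta, delta) / 2.
        s2 += np.outer(s1, delta) + np.outer(delta, delta) / 2.0
        s1 += delta
    return s1, s2

# Example: the L-shaped path (0,0) -> (1,0) -> (1,1).
path = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
s1, s2 = sig_level_2(path)
# s1 = [1, 1];  s2 = [[0.5, 1.0], [0.0, 0.5]]
```

One sanity check worth knowing: the symmetric part of level 2 satisfies S^{(i,j)} + S^{(j,i)} = S^{(i)} S^{(j)}, so the genuinely new information at level 2 sits in the antisymmetric part (the Lévy area), which captures "who moved first."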
More precisely, the signature has the following universal approximation property:
Let \mathcal{P} be a set of d-dimensional paths on [a,b] (for example, continuous paths of bounded variation), and let S(X_{[a,b]}) denote the (full) signature of a path X on [a,b]. Fix a compact subset K\subset\mathcal{P} (compact in a topology where the signature map is continuous).
For any continuous functional F:K\to\mathbb{R} and any \varepsilon>0, there exist

- a truncation depth m, and
- a coefficient vector \ell supported on signature coordinates up to level m,
such that
\sup_{X\in K}\left|F(X)-\langle \ell,\ S^{\le m}(X)\rangle\right|<\varepsilon
Here \ell is simply a vector in the same space as the truncated signature up to level m, and \langle \ell, S^{\le m}(X)\rangle means an ordinary dot product: it is a finite linear combination of signature coordinates,
\langle \ell, S^{\le m}(X)\rangle= \ell_{\varnothing}\cdot 1+ \sum_{i} \ell_{(i)}\, S^{(i)}(X)+ \sum_{i,j} \ell_{(i,j)}\, S^{(i,j)}(X) + \cdots + \sum_{i_1,\dots,i_m} \ell_{(i_1,\dots,i_m)}\, S^{(i_1,\dots,i_m)}(X)
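This linear structure can be exercised end to end in a toy experiment. As an assumption for illustration only, take the path-dependent target F to be the Lévy area \tfrac{1}{2}\bigl(S^{(1,2)} - S^{(2,1)}\bigr), which is exactly a linear functional of the level-2 signature coordinates, and recover \ell by ordinary least squares on flattened truncated-signature features. The `sig_level_2` helper is repeated here so the sketch is self-contained.

```python
import numpy as np

def sig_level_2(path):
    # Levels 1 and 2 of the signature of a piecewise-linear path,
    # accumulated segment by segment via Chen's identity.
    d = path.shape[1]
    s1, s2 = np.zeros(d), np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        s2 += np.outer(s1, delta) + np.outer(delta, delta) / 2.0
        s1 += delta
    return s1, s2

rng = np.random.default_rng(1)
feats, targets = [], []
for _ in range(100):
    # A random 2-d piecewise-linear path (a toy stand-in for a data window).
    path = np.cumsum(rng.normal(size=(20, 2)), axis=0)
    s1, s2 = sig_level_2(path)
    # Feature vector: the constant 1, level-1, and flattened level-2 terms.
    feats.append(np.concatenate(([1.0], s1, s2.ravel())))
    # Illustrative path-dependent target: the Lévy area, which is by
    # construction linear in the level-2 signature coordinates.
    targets.append(0.5 * (s2[0, 1] - s2[1, 0]))

Phi = np.array(feats)        # rows are S^{<=2}(X) for each sample path
y = np.array(targets)
ell, *_ = np.linalg.lstsq(Phi, y, rcond=None)
max_err = np.max(np.abs(Phi @ ell - y))
```

Because the target is exactly linear in the truncated signature, the regression recovers it to numerical precision; for a generic continuous F the theorem only promises the error shrinks as the truncation depth m grows.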
At first glance, signatures can sound like a silver bullet that eliminates the need to hand-craft time-series-based trading signals: if many path-dependent targets can be approximated by linear functionals of signatures, why not dump our panel data into a signature transformer, take the outputs, and run a linear regression? In practice, there are still important design choices to make in order to use signature features effectively, and we will explore those choices as the series progresses.
In the next post we will define signatures explicitly and provide a few examples.