Short Rate Models (Part 5: Affine Term Structure Models I)

2026-03-17

1 Introduction

In the previous post, we focused on the practical side of the Vasicek model and saw that estimation can be much harder than the mathematics suggests at first sight. In this post we take a step back and ask a structural question: what common machinery makes the Merton and Vasicek models feel so similar despite their different drift terms? The answer is that both belong to the affine term structure family.

The affine framework is the bridge between the classical short-rate models in the first four posts and the richer models we will study next. It gives us a unified way to describe the state dynamics, the short rate, zero-coupon bond prices, and the yield curve. It also forces us to distinguish clearly between the physical measure, which is used for dynamics and estimation, and the risk-neutral measure, which is used for bond pricing. That distinction was deferred in the first four posts. We can no longer postpone it.

My goal here is not to present the affine term structure model at the level of a monograph. Instead, I want to rebuild the core formulas from the ground up in a way that an undergraduate STEM student can follow. Once the structure is clear, the later posts on long-run expectations, policy rules, local momentum, and macro-finance will all look like variations on the same theme rather than disconnected pieces of notation.

2 Notation

Let x_t \in \mathbb{R}^n denote the latent state vector. In the first four posts the state was one-dimensional and equal to the short rate itself. In the affine framework the short rate is only one function of the state. We write

r_t = \delta_0 + \delta_1^\top x_t

where \delta_0 is a scalar and \delta_1 \in \mathbb{R}^n is a loading vector. This is the first affine restriction: the short rate is an affine function of the state.

We also write the time to maturity as

\tau = T - t

and we use the same notation throughout the remaining posts. This is a small choice, but it saves a good deal of algebra later, because in the time-homogeneous setting that we study first the bond-pricing functions depend on t and T only through \tau.

3 Measures

The second structural idea is that we need two probability measures. Under the physical measure P, the dynamics of the state vector describe how rates and other factors evolve in real time. This is the measure used when we fit the model to historical data. Under the risk-neutral measure Q, the discounted price of a traded asset is a martingale. This is the measure used when we price zero-coupon bonds and interest-rate derivatives.

For the affine Gaussian case, the state dynamics under the two measures take the form

dx_t = \left(K_0^P + K_1^P x_t\right) dt + \Sigma dW_t^P

and

dx_t = \left(K_0^Q + K_1^Q x_t\right) dt + \Sigma dW_t^Q

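The diffusion matrix \Sigma is the same in both equations, and this is more than a notational convenience: an equivalent change of measure (Girsanov's theorem) shifts the drift of a diffusion but leaves its diffusion coefficient unchanged. All of the difference between P and Q therefore sits in the drift parameters, K_0^P and K_1^P versus K_0^Q and K_1^Q. In later models, the mapping between the two sets of drift parameters can be more complicated. What matters here is the conceptual split. The historical law of motion and the pricing law of motion need not be the same.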

This is exactly the gap we left open in the early posts. In Merton’s model we used one drift when we simulated short rates and another drift when we priced bonds, but we did not isolate the mechanism. In the affine framework that mechanism becomes explicit.
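To make the mechanism concrete, here is one common way to link the two drifts. The market price of risk \lambda_t and its coefficients \lambda_0 \in \mathbb{R}^n and \lambda_1 \in \mathbb{R}^{n \times n} are notation introduced only for this sketch, and the affine form of \lambda_t is an assumption (often called the essentially affine specification), not something derived in this post.

```latex
\begin{align*}
dW_t^Q &= dW_t^P + \lambda_t \, dt, \qquad \lambda_t = \lambda_0 + \lambda_1 x_t \\
dx_t &= \left(K_0^P + K_1^P x_t\right) dt + \Sigma \left(dW_t^Q - \lambda_t \, dt\right) \\
     &= \left[\left(K_0^P - \Sigma \lambda_0\right) + \left(K_1^P - \Sigma \lambda_1\right) x_t\right] dt + \Sigma \, dW_t^Q
\end{align*}
```

Comparing the last line with the Q dynamics gives K_0^Q = K_0^P - \Sigma \lambda_0 and K_1^Q = K_1^P - \Sigma \lambda_1. Under this assumption a constant market price of risk (\lambda_1 = 0) changes only the intercept of the drift, while a state-dependent one also changes the mean-reversion matrix.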

4 Pricing

Under the risk-neutral measure, the price of a zero-coupon bond satisfies

P(t, T) = \mathbb{E}_t^Q \left[\exp\left(-\int_t^T r_s ds\right)\right]

Rather than solving this expectation directly for every model, the affine approach starts from a conjecture. We assume the bond price has the form

P(t, T) = \exp\left(A(\tau) - B(\tau)^\top x_t\right)

where A(\tau) is a scalar and B(\tau) \in \mathbb{R}^n. This is the second affine restriction: the logarithm of the bond price is affine in the state.

Why is this a sensible guess? Because in both the Merton and Vasicek posts the bond price already looked like an exponential of a constant term minus a loading on the current short rate. The affine framework says that this is not an accident. It is the generic form generated by linear state dynamics together with an affine short rate.
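As a sanity check on the pricing expectation, the following sketch estimates it by Monte Carlo for the simplest one-factor case, dr_t = \mu^Q dt + \sigma dW_t^Q with the short rate as the state, and compares the result with the exponential-affine form using A(\tau) = -\mu^Q \tau^2 / 2 + \sigma^2 \tau^3 / 6 and B(\tau) = \tau. The function names, parameter values, path count, and seed are illustrative assumptions, not part of the code used in this series.

```python
import math
import random


def merton_bond_price_mc(r0, mu_q, sigma, tau, n_paths=10000, n_steps=50, seed=0):
    """Monte Carlo estimate of E^Q[exp(-int_0^tau r_s ds)] for dr = mu_q dt + sigma dW^Q."""
    rng = random.Random(seed)
    dt = tau / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _path in range(n_paths):
        r = r0
        integral = 0.0
        for _step in range(n_steps):
            # Constant-coefficient dynamics: this step is exact in distribution.
            r_next = r + mu_q * dt + sigma * sqrt_dt * rng.gauss(0.0, 1.0)
            # Trapezoid rule for the integral of the short rate over the step.
            integral += 0.5 * (r + r_next) * dt
            r = r_next
        total += math.exp(-integral)
    return total / n_paths


def merton_bond_price_affine(r0, mu_q, sigma, tau):
    """Exponential-affine price exp(A(tau) - B(tau) * r0) for the same model."""
    A = -mu_q * tau**2 / 2 + sigma**2 * tau**3 / 6
    B = tau
    return math.exp(A - B * r0)


# Hypothetical parameter values chosen only for illustration.
mc_price = merton_bond_price_mc(r0=0.03, mu_q=0.002, sigma=0.01, tau=5.0)
affine_price = merton_bond_price_affine(r0=0.03, mu_q=0.002, sigma=0.01, tau=5.0)
print(f"Monte Carlo: {mc_price:.4f}, affine formula: {affine_price:.4f}")
```

The two numbers should agree up to Monte Carlo error, which shrinks like one over the square root of the number of paths. The affine formula needs no simulation at all, which is the practical payoff of the conjecture.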

5 Derivation

To derive the equations for A(\tau) and B(\tau), let us treat the bond price as a function of time and state and apply the pricing partial differential equation under Q. If

P(t, T, x) = \exp\left(A(\tau) - B(\tau)^\top x\right)

then the gradient with respect to the state is

\nabla_x P = -P B(\tau)

and the Hessian is

\nabla_x^2 P = P B(\tau) B(\tau)^\top

The risk-neutral pricing equation is

\frac{\partial P}{\partial t} + \left(K_0^Q + K_1^Q x\right)^\top \nabla_x P + \frac{1}{2}\operatorname{tr}\left(\Sigma \Sigma^\top \nabla_x^2 P\right) - r(x) P = 0

Since \tau = T - t, differentiating with respect to t is the same as differentiating with respect to \tau and then changing the sign. After substituting the affine ansatz and dividing by P, we obtain

-A'(\tau) + B'(\tau)^\top x - \left(K_0^Q + K_1^Q x\right)^\top B(\tau) + \frac{1}{2} B(\tau)^\top \Sigma \Sigma^\top B(\tau) - \delta_0 - \delta_1^\top x = 0

This equation must hold for every possible state vector x. Using \left(K_1^Q x\right)^\top B(\tau) = x^\top K_1^{Q \top} B(\tau) to collect the terms in x, the constant term and the coefficient of x must vanish separately. We get the coupled ordinary differential equations

A'(\tau) = -K_0^{Q \top} B(\tau) + \frac{1}{2} B(\tau)^\top \Sigma \Sigma^\top B(\tau) - \delta_0

and

B'(\tau) = K_1^{Q \top} B(\tau) + \delta_1

with boundary conditions

A(0) = 0

and

B(0) = 0

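It is worth pausing here. Once these two equations are solved, the rest of bond pricing is finished. The entire term structure is encoded in A(\tau) and B(\tau). The hard part of an affine model is therefore shifted from repeated pricing integrals to solving a system of ordinary differential equations once and for all. In the Gaussian case studied here the equation for B(\tau) is linear and A(\tau) then follows by direct integration. In affine models with state-dependent volatility the equation for B(\tau) picks up a quadratic term and becomes a genuine Riccati equation.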
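When no closed form is at hand, the two equations can be integrated numerically. The sketch below uses a classical fourth-order Runge-Kutta scheme in plain Python. The function name solve_affine_odes, the nested-list representation of matrices, the assumption that \Sigma is square, and the parameter values are all illustrative choices, not the interface built later in this series.

```python
import math


def solve_affine_odes(delta0, delta1, K0, K1, Sigma, tau_max, steps=1000):
    """Integrate A'(tau) and B'(tau) from 0 to tau_max with classical RK4.

    delta1 and K0 are length-n lists; K1 and Sigma are n-by-n nested lists
    holding the risk-neutral parameters. Returns (A, B) at tau_max.
    """
    n = len(delta1)
    # Omega = Sigma Sigma^T, the instantaneous covariance matrix.
    omega = [[sum(Sigma[i][k] * Sigma[j][k] for k in range(n)) for j in range(n)]
             for i in range(n)]

    def dB(B):
        # B' = K1^T B + delta1
        return [sum(K1[k][i] * B[k] for k in range(n)) + delta1[i] for i in range(n)]

    def dA(B):
        # A' = -K0^T B + 0.5 * B^T Omega B - delta0
        drift = sum(K0[i] * B[i] for i in range(n))
        quad = sum(B[i] * omega[i][j] * B[j] for i in range(n) for j in range(n))
        return -drift + 0.5 * quad - delta0

    h = tau_max / steps
    A, B = 0.0, [0.0] * n  # boundary conditions A(0) = 0, B(0) = 0
    for _ in range(steps):
        k1B, k1A = dB(B), dA(B)
        B2 = [b + 0.5 * h * k for b, k in zip(B, k1B)]
        k2B, k2A = dB(B2), dA(B2)
        B3 = [b + 0.5 * h * k for b, k in zip(B, k2B)]
        k3B, k3A = dB(B3), dA(B3)
        B4 = [b + h * k for b, k in zip(B, k3B)]
        k4B, k4A = dB(B4), dA(B4)
        A += h / 6 * (k1A + 2 * k2A + 2 * k3A + k4A)
        B = [b + h / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
             for b, a1, a2, a3, a4 in zip(B, k1B, k2B, k3B, k4B)]
    return A, B


# One-factor mean-reverting example with hypothetical parameter values.
kappa, theta, sigma, tau = 0.5, 0.04, 0.01, 5.0
A_num, B_num = solve_affine_odes(0.0, [1.0], [kappa * theta], [[-kappa]], [[sigma]], tau)
B_exact = (1.0 - math.exp(-kappa * tau)) / kappa
print(f"numerical B: {B_num[0]:.8f}, closed-form B: {B_exact:.8f}")
```

Because A(\tau) does not feed back into B(\tau), the loading equation could be solved first and A(\tau) obtained by quadrature afterwards. Integrating both together simply keeps the sketch short.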

6 Yields

The continuously compounded zero-coupon yield is

y_t(\tau) = -\frac{1}{\tau}\log P(t, t + \tau)

Substituting the affine bond-price formula gives

y_t(\tau) = -\frac{A(\tau)}{\tau} + \frac{B(\tau)^\top}{\tau} x_t

Define

a(\tau) = -\frac{A(\tau)}{\tau}

and

b(\tau) = \frac{B(\tau)}{\tau}

Then the yield curve becomes

y_t(\tau) = a(\tau) + b(\tau)^\top x_t

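This is one of the most useful formulas in the entire term structure literature. It tells us that each yield is an affine function of the latent state, so a set of observed yields forms a linear observation equation with intercept a(\tau) and loading b(\tau). The two measures play different roles here. Under the physical measure the state follows a transition equation. The intercept and loadings, by contrast, depend only on \delta_0, \delta_1, \Sigma, and the risk-neutral drift parameters, because A(\tau) and B(\tau) solve equations written under Q. Once we write the model this way, state-space estimation methods such as the Kalman filter become natural rather than mysterious.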
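The observation equation can be assembled mechanically once A(\tau) and B(\tau) are available as functions. The sketch below stacks the intercepts and loadings for several maturities into a vector and a matrix, which is the shape a Kalman filter expects. The helper names, the list-based representation, and the one-factor example with A(\tau) = -\mu^Q \tau^2 / 2 + \sigma^2 \tau^3 / 6 and B(\tau) = \tau are assumptions made for illustration.

```python
def yield_loadings(A, B, tau):
    """Return (a, b) such that y = a + b . x, given callables A(tau) and B(tau).

    A returns a float and B returns a list with one entry per state variable.
    """
    a = -A(tau) / tau
    b = [B_i / tau for B_i in B(tau)]
    return a, b


def observation_equation(A, B, maturities):
    """Stack yields at several maturities into y_t = a_vec + H x_t."""
    rows = [yield_loadings(A, B, tau) for tau in maturities]
    a_vec = [a for a, _ in rows]
    H = [b for _, b in rows]
    return a_vec, H


def model_yields(a_vec, H, x):
    """Evaluate the stacked observation equation at state x."""
    return [a + sum(h * x_i for h, x_i in zip(row, x)) for a, row in zip(a_vec, H)]


# One-factor example with hypothetical parameter values.
mu_q, sigma = 0.002, 0.01


def A_fn(tau):
    return -mu_q * tau**2 / 2 + sigma**2 * tau**3 / 6


def B_fn(tau):
    return [tau]


maturities = [1.0, 5.0, 10.0]
a_vec, H = observation_equation(A_fn, B_fn, maturities)
curve = model_yields(a_vec, H, x=[0.03])
for tau, y in zip(maturities, curve):
    print(f"tau = {tau:4.1f}  yield = {y:.5f}")
```

In this one-factor example every loading equals one, so a change in the state moves all yields in parallel and the shape of the curve comes entirely from the intercepts.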

7 Examples

The Merton and Vasicek models now fall out as one-factor cases.

For Merton, let the state be the short rate itself. Then n = 1, \delta_0 = 0, \delta_1 = 1, K_0^Q = \mu^Q, K_1^Q = 0, and \Sigma = \sigma. The loading equation becomes

B'(\tau) = 1

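so B(\tau) = \tau. Substituting this into the equation for A(\tau) gives A'(\tau) = -\mu^Q \tau + \frac{1}{2} \sigma^2 \tau^2, and integrating from 0 with A(0) = 0 yields

A(\tau) = -\frac{\mu^Q \tau^2}{2} + \frac{\sigma^2 \tau^3}{6}

Together with B(\tau) = \tau this gives P(t, T) = \exp\left(-r_t \tau - \frac{\mu^Q \tau^2}{2} + \frac{\sigma^2 \tau^3}{6}\right), which is exactly the bond-pricing formula we derived directly in Part 1.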

For Vasicek, we again use a one-dimensional state with \delta_0 = 0, \delta_1 = 1, and \Sigma = \sigma, but now K_0^Q = \kappa^Q \theta^Q and K_1^Q = -\kappa^Q, so that the risk-neutral drift is \kappa^Q \left(\theta^Q - r_t\right). The loading equation becomes

B'(\tau) = 1 - \kappa^Q B(\tau)

whose solution is

B(\tau) = \frac{1 - e^{-\kappa^Q \tau}}{\kappa^Q}

Substituting this into the equation for A(\tau) reproduces the Vasicek bond-price expression from Part 3. The point is not just that we can recover the old formulas. The point is that we can now see why the formulas had the same general shape.
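For readers who want to check the Vasicek case without turning back to Part 3, the standard closed form in this post's sign convention is

A(\tau) = \left(\theta^Q - \frac{\sigma^2}{2 (\kappa^Q)^2}\right)\left(B(\tau) - \tau\right) - \frac{\sigma^2 B(\tau)^2}{4 \kappa^Q}

This expression is stated here as the usual textbook result rather than quoted from Part 3. The sketch below verifies numerically that it satisfies the equation for A'(\tau) and the boundary condition. The function names, parameter values, and finite-difference step are illustrative assumptions.

```python
import math


def vasicek_B(tau, kappa):
    """Loading B(tau) = (1 - exp(-kappa * tau)) / kappa."""
    return (1.0 - math.exp(-kappa * tau)) / kappa


def vasicek_A(tau, kappa, theta, sigma):
    """Standard closed form for A(tau) with P = exp(A - B * r)."""
    B = vasicek_B(tau, kappa)
    return (theta - sigma**2 / (2 * kappa**2)) * (B - tau) - sigma**2 * B**2 / (4 * kappa)


def ode_residual(tau, kappa, theta, sigma, h=1e-5):
    """Central-difference A'(tau) minus the right-hand side of the A equation."""
    dA = (vasicek_A(tau + h, kappa, theta, sigma)
          - vasicek_A(tau - h, kappa, theta, sigma)) / (2 * h)
    B = vasicek_B(tau, kappa)
    # A' = -K0 * B + 0.5 * sigma^2 * B^2 - delta0, with K0 = kappa * theta, delta0 = 0
    rhs = -kappa * theta * B + 0.5 * sigma**2 * B**2
    return dA - rhs


def vasicek_yield(r, tau, kappa, theta, sigma):
    """Continuously compounded yield y = -A/tau + (B/tau) * r."""
    return -vasicek_A(tau, kappa, theta, sigma) / tau + vasicek_B(tau, kappa) / tau * r


# Hypothetical risk-neutral parameter values.
kappa, theta, sigma = 0.5, 0.04, 0.01
residual = ode_residual(5.0, kappa, theta, sigma)
print(f"ODE residual at tau = 5: {residual:.2e}")
print(f"5-year yield at r = 3%: {vasicek_yield(0.03, 5.0, kappa, theta, sigma):.5f}")
```

A residual close to zero confirms that the closed form and the differential equation agree, which is the same consistency the Merton case exhibits with much simpler algebra.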

8 Interpretation

The affine term structure model separates three ideas that were intertwined in the first four posts. The first is state dynamics under P, which is what we estimate from time-series data. The second is pricing dynamics under Q, which is what determines bond prices. The third is the observation equation that maps latent factors into yields at different maturities.

This separation is exactly what we need for the rest of the series. Long-run expectations models change the economic interpretation of the factors and the market prices of risk. Policy-rule models add structure to the short end of the curve. Local momentum models change the transition dynamics. Macro-finance models enlarge the state vector and let macro variables enter the transition and observation equations together. The affine framework does not solve every problem, but it gives every later problem a common language.

9 Wrapping Up

In this post we derived the affine bond-pricing formula from the risk-neutral pricing equation and showed that Merton and Vasicek are special cases of the same state-space structure. The main takeaway is that the yield curve is an affine observation on a latent state whose dynamics under P need not coincide with its pricing dynamics under Q.

In the next post, we will turn this mathematics into code. We will build the package scaffold that separates structural model definitions from simulation and estimation, and we will show how the same interface recovers the classical models while preparing us for the longer-run expectation and macro-finance papers that follow.