Graph-Regularized Low-Rank Models (Part 1: JGB Curve)

2026-02-14

Regression and PCA in finance typically ignore two sources of structure that practitioners know exist: the relationships between variables and the dependence along the time index. A cross-sectional regression of returns on characteristics treats each predictor as exchangeable; PCA of a yield panel treats each tenor as an unordered column. In reality, variables cluster into hierarchies—sector, asset class, maturity bucket—and observations are temporally ordered. I have used graph-based priors in both regression and PCA settings to encode this exogenous information, for example using hierarchical graphs to group related variables together so that coefficient or loading estimates borrow strength within economically meaningful clusters.

Econometric models take the opposite approach: VARs, state-space models, and panel regressions handle time-series and cross-sectional dependence natively, but they impose parametric structure on the dynamics—linearity, fixed lag order, homogeneous coefficients—that can be unrealistic when the data-generating process is nonstationary or high-dimensional.

Graph-regularized dimensionality reduction sits between these two traditions. The approach preserves the nonparametric flexibility of PCA while injecting structural priors through Laplacian penalties, and it can encode both feature relationships and temporal continuity simultaneously. I came across a paper on Graph-Regularized Generalized Low-Rank Models (Paradkar & Udell, 2017) that formalises this idea within a unified GLRM framework—supporting arbitrary loss functions, missing data, and two-sided graph penalties. The generality of the framework is what attracted me: the same objective can express PCA with temporal smoothing, PCA with hierarchical feature grouping, or both at once, and it extends naturally to non-Gaussian losses.

I built the grpca package to implement this framework with two normalisation techniques—spectral Laplacian scaling and spectral data scaling—that Paradkar and Udell did not use. These normalizations make the regularization hyperparameters dimensionless and portable across rolling windows with different sample sizes, variance levels, and graph topologies, which is essential for the kind of rolling out-of-sample evaluation used in this series.

The present post is the first in a planned series. I chose the JGB yield curve as the starting point because it is one of the first applications of PCA that people in finance learn about—a low-dimensional, well-understood setting where readers can build intuition before the framework is applied to harder problems. The yield curve is deliberately too easy for feature-side regularization: with only 11 tenors, PCA loadings are already smooth, and the interesting question is whether temporal regularization of the score matrix provides out-of-sample gains. Later posts will move to higher-dimensional settings where feature-side graph priors should matter more, with the eventual goal of incorporating graph-regularized PCA into nowcasting frameworks such as FAVAR, where the factor extraction step is standard PCA and replacing it with GR-PCA could improve both the factors and the downstream forecasts.

Principal component analysis is the minimum reconstruction-error linear projection of a given rank and coincides with the maximum-likelihood estimator under isotropic Gaussian noise, yet it makes no use of the ordered tenor geometry or temporal continuity of yield-curve data. Graph-regularized PCA adds Laplacian roughness penalties—on the maturity axis, the time axis, or both—to the PCA objective and introduces a structure-aware bias-variance trade-off. This note evaluates PCA and GR-PCA on Japanese Ministry of Finance constant-maturity JGB yields with short rolling train-test windows and frozen out-of-sample reconstruction. The central question is which graph channel—tenor smoothness, temporal smoothness, or both—provides the most benefit, and whether any effect is concentrated in higher components.

1 Theoretical underpinning

Let X\in\mathbb{R}^{n\times p} denote the data matrix, with rows indexed by observations and columns indexed by features. Missing entries are allowed and are encoded as np.nan. We write \Omega\subset\{1,\dots,n\}\times\{1,\dots,p\} for the observed-entry set and use P_\Omega(\cdot) for the projection that keeps observed entries and sets missing entries to zero. After standardization, the working matrix is Z.

Given this notation, the estimator minimizes a masked two-sided graph-regularized low-rank objective:

\min_{U\in\mathbb{R}^{n\times k},\,V\in\mathbb{R}^{p\times k}}\; \tfrac{1}{2\,s_X}\left\|P_\Omega\!\left(Z-UV^\top\right)\right\|_F^2 \; +\; \tfrac12\lambda_s\,\operatorname{tr}(U^\top \widetilde{L}_s\, U) \; +\; \tfrac12\lambda_f\,\operatorname{tr}(V^\top \widetilde{L}_f\, V) \; +\; \tfrac12\mu_u\|U\|_F^2 \; +\; \tfrac12\mu_v\|V\|_F^2.

The factor U contains observation embeddings, V contains feature loadings, \widetilde{L}_s is the observation-side graph Laplacian, and \widetilde{L}_f is the feature-side graph Laplacian. The ridge terms with \mu_u,\mu_v>0 improve coercivity and numerical conditioning.

Each graph entering the objective is assumed to be undirected and possibly weighted, with a symmetric positive-semidefinite Laplacian L = D - W where W is the weighted adjacency matrix and D is the degree matrix. No specific topology is assumed: the framework accommodates chains, trees, complete graphs, k-nearest-neighbour graphs, or any other undirected weighted graph. The only mathematical requirements are symmetry (W = W^\top), positive semidefiniteness (L \succeq 0, guaranteed by construction for any combinatorial Laplacian), and separability (the observation graph L_s \in \mathbb{R}^{n \times n} and the feature graph L_f \in \mathbb{R}^{p \times p} are specified independently). The Laplacian quadratic form \operatorname{tr}(U^\top L_s\, U) penalises disagreement between connected observations in the latent space, and \operatorname{tr}(V^\top L_f\, V) penalises disagreement between connected features. The strength of the penalty between any pair (i,j) is proportional to the edge weight w_{ij}; pairs with no edge contribute nothing.

Paradkar & Udell (2017) use raw combinatorial Laplacians L = D - W and unscaled data terms. When the same model is applied across multiple data windows of different sizes, this creates a practical problem: the magnitude of the graph penalty grows with \lambda_{\max}(L), which depends on graph size and edge weights, while the magnitude of the data-fidelity term scales with the variance and dimension of the data matrix. A fixed \lambda that provides moderate regularization in one setting can be negligible or overwhelming in another. Our implementation introduces two normalizations that make the hyperparameters dimensionless and portable. First, every graph Laplacian entering the objective is divided by its largest eigenvalue: \widetilde{L} = L / \lambda_{\max}(L), so that \lambda_{\max}(\widetilde{L}) = 1. (The spectral normalization is distinct from the symmetric normalized Laplacian D^{-1/2}LD^{-1/2}; our normalization applies to whatever Laplacian the user supplies, including already degree-normalized ones.) With this convention, the quadratic form \operatorname{tr}(V^\top \widetilde{L} V) lies in [0, \|V\|_F^2] for any graph, so the regularization strength has a graph-independent upper bound on its effect. Second, the reconstruction term is divided by s_X = \sigma_1^2(Z_{\text{fill}}), the squared leading singular value of the zero-filled standardized data matrix, scaling the data-fidelity term to order one regardless of sample size or variance level. Together, these normalizations mean that a given \tau value represents roughly the same bias-variance trade-off regardless of sample size, data variance, or graph topology. The combined normalization allows a single hyperparameter grid to be used across multiple estimation windows without per-window rescaling.
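In code, the two normalizations amount to a couple of lines. This is an illustrative sketch (function names are mine, not the grpca API):

```python
import numpy as np

def spectral_normalize(L):
    """Divide a PSD graph Laplacian by its largest eigenvalue,
    so that lambda_max(L_tilde) = 1 for any graph."""
    lam_max = np.linalg.eigvalsh(L)[-1]
    return L / lam_max

def data_scale(Z_filled):
    """s_X: squared leading singular value of the zero-filled
    standardized data matrix, used to scale the fidelity term."""
    return np.linalg.svd(Z_filled, compute_uv=False)[0] ** 2
```

With these conventions, \operatorname{tr}(V^\top \widetilde{L} V) is bounded by \|V\|_F^2 and the data-fidelity term is order one, so a single (\tau, \rho) grid transfers across windows.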

The factorization UV^\top is not unique: for any orthogonal R\in\mathbb{R}^{k\times k}, the rotated pair (UR,VR) produces the same product. Individual columns of V are therefore not uniquely identified as fixed population eigenvectors, and raw column-wise comparisons across estimation windows can be misleading. Meaningful comparison operates at the subspace level or after Procrustes alignment. Ridge penalties on both U and V improve numerical conditioning but do not resolve this rotational ambiguity. To simplify the hyperparameter search space, we parameterize regularization by total strength \tau\ge 0 and allocation \rho\in[0,1]:

\lambda_s=\tau\rho,\qquad \lambda_f=\tau(1-\rho).

The reparametrization separates overall shrinkage intensity from how shrinkage is split between observation-side smoothness (U) and feature-side smoothness (V). Since both Laplacians and data are normalized, \tau is dimensionless, comparable across windows, and more interpretable than tuning raw penalty magnitudes independently.

Paradkar & Udell (2017) solve Graph-Regularized GLRMs with PALM (Proximal Alternating Linearized Minimization), a framework designed for general (possibly non-smooth) loss functions such as Huber, hinge, or ordinal losses. Each PALM iteration linearizes the smooth data-fidelity term around the current iterate and applies a loss-specific proximal operator, with convergence guaranteed under a Lipschitz step-size rule. Restricting to squared-error loss makes the objective everywhere differentiable in (U,V), and the proximal operator reduces to the identity. The full PALM machinery is therefore unnecessary. Our implementation (grpca) replaces it with alternating gradient descent with monotone backtracking line search: each outer iteration computes the closed-form gradient

\nabla_U = \frac{1}{s_X}\,P_\Omega\!\bigl(UV^\top\!-Z\bigr)\,V + \lambda_s\,\widetilde{L}_s\,U + \mu_u\,U, \qquad \nabla_V = \frac{1}{s_X}\,P_\Omega\!\bigl(UV^\top\!-Z\bigr)^\top U + \lambda_f\,\widetilde{L}_f\,V + \mu_v\,V,

takes a gradient step U \leftarrow U - \eta\,\nabla_U (then V \leftarrow V - \eta\,\nabla_V), and halves the step size \eta until the objective decreases monotonically. Convergence to a stationary point follows from the standard descent-lemma argument for smooth bi-convex problems. Unlike Paradkar & Udell’s general framework, which optionally includes \ell_1 sparsity on U or V, we use dense (unpenalized) factors throughout—appropriate for settings where all features contribute to each latent factor. We initialize factors with a truncated SVD of the zero-filled matrix, providing a good starting point that typically converges in tens of iterations.
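Restricted to the squared-error case, the whole solver fits in a few dozen lines of NumPy. The sketch below is a minimal illustration, not the grpca implementation itself; for brevity it takes simultaneous rather than strictly sequential gradient steps in (U, V):

```python
import numpy as np

def grpca_fit(Z, mask, k, Ls, Lf, lam_s=0.0, lam_f=0.0,
              mu_u=1e-6, mu_v=1e-6, n_iter=200, tol=1e-10):
    """Gradient descent with monotone backtracking for the masked
    graph-regularized low-rank objective (squared-error loss only)."""
    Z0 = np.where(mask, Z, 0.0)                       # P_Omega(Z)
    s_X = np.linalg.svd(Z0, compute_uv=False)[0] ** 2
    # truncated-SVD warm start on the zero-filled matrix
    P, s, Qt = np.linalg.svd(Z0, full_matrices=False)
    U = P[:, :k] * np.sqrt(s[:k])
    V = Qt[:k].T * np.sqrt(s[:k])

    def objective(U, V):
        R = mask * (Z0 - U @ V.T)
        return (0.5 / s_X * np.sum(R * R)
                + 0.5 * lam_s * np.trace(U.T @ Ls @ U)
                + 0.5 * lam_f * np.trace(V.T @ Lf @ V)
                + 0.5 * mu_u * np.sum(U * U) + 0.5 * mu_v * np.sum(V * V))

    obj, eta = objective(U, V), 1.0
    for _ in range(n_iter):
        R = mask * (U @ V.T - Z0)                     # masked residual
        gU = R @ V / s_X + lam_s * (Ls @ U) + mu_u * U
        gV = R.T @ U / s_X + lam_f * (Lf @ V) + mu_v * V
        # halve the step until the objective decreases monotonically
        while eta > 1e-14:
            Un, Vn = U - eta * gU, V - eta * gV
            new_obj = objective(Un, Vn)
            if new_obj <= obj:
                break
            eta *= 0.5
        if new_obj > obj:                             # step size exhausted
            break
        converged = obj - new_obj < tol * max(1.0, obj)
        U, V, obj = Un, Vn, new_obj
        if converged:
            break
        eta *= 2.0                                    # tentatively re-expand
    return U, V
```

With \lambda_s = \lambda_f = 0 and tiny ridge terms this reduces to (masked) PCA up to rotation, which is a useful sanity check on the implementation.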

The two penalty terms have natural economic readings in the yield-curve context. The observation-side penalty on U encodes the prior that factor exposures—the market’s aggregate assessment of the policy rate (level), the term premium (slope), and convexity demand (curvature)—evolve smoothly from one trading day to the next. This is the same “slow-moving factor” assumption that underlies Kalman-filtered dynamic factor models (Stock and Watson, Doz et al.), where the state-transition equation u_t = u_{t-1} + \eta_t imposes temporal continuity; the day-chain penalty strength \tau\rho plays an analogous role to the signal-to-noise ratio in that framework. The feature-side penalty on V encodes the prior that adjacent maturities load similarly on each factor—a discrete, nonparametric analogue of the smooth loading functions B(\tau) in affine term-structure models, where the Riccati ODEs guarantee that loadings vary continuously with maturity. Nelson-Siegel achieves this smoothness parametrically through exponential-decay basis functions; the tenor-chain Laplacian achieves it nonparametrically. Finally, the masked reconstruction loss has a direct financial application: it asks “given 10 of 11 tenors, how well can the model predict the held-out tenor?”—precisely the question a relative-value trader poses when identifying rich or cheap points on the curve, or a risk manager faces when interpolating illiquid tenor points from liquid ones.
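That last question has a direct computational form: with the loadings held fixed, scores for a new curve can be fit by least squares on the observed tenors and the held-out tenor read off the reconstruction. The helper below is illustrative (name and shapes are my assumptions, not the post's evaluation code):

```python
import numpy as np

def predict_heldout(z_row, V, observed):
    """Fit scores u on the observed tenors by least squares,
    then reconstruct the full standardized curve (including the
    held-out tenors) as V @ u."""
    u, *_ = np.linalg.lstsq(V[observed], z_row[observed], rcond=None)
    return V @ u
```

For a noiseless curve lying in the span of V, the held-out tenor is recovered exactly; the masked reconstruction loss measures how far real curves deviate from this idealisation.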

2 JGB experiment setup

The data are Japanese Ministry of Finance constant-maturity JGB yields. We work with a stable subset of tenors (in years) and compute daily yield changes. We standardize each rolling window using train-window statistics only; the train mean and standard deviation are applied to the test window with no leakage.
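The no-leakage convention can be sketched in two helpers (illustrative, not the package API): statistics are estimated on the training window only and then frozen when transforming the test window.

```python
import numpy as np

def fit_scaler(train):
    """Column means/stds from the training window only (NaNs ignored)."""
    return np.nanmean(train, axis=0), np.nanstd(train, axis=0)

def apply_scaler(X, mu, sd):
    """Apply frozen train statistics to any window; no test-set leakage."""
    return (X - mu) / sd
```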

We instantiate the framework with two chain graphs that encode the known structure of yield-curve data. The tenor-chain (feature graph) connects adjacent maturities with edge weights inversely proportional to the squared maturity gap:

w_{i,i+1}^{(f)} = \frac{1}{(\Delta m_{i})^2}, \qquad \Delta m_i = m_{i+1} - m_i,

where m_i is the maturity in years of the i-th tenor. The weighting gives tighter coupling between closely spaced tenors (e.g., 1Y–2Y) and weaker coupling between widely spaced ones (e.g., 20Y–30Y). The Laplacian quadratic form admits the pairwise-difference identity

\operatorname{tr}(V^\top \widetilde{L}_f\, V) = \tfrac{1}{2}\sum_{i} \tilde{w}_{i,i+1}^{(f)}\,\|v_i - v_{i+1}\|_2^2,

which penalizes high-frequency oscillation in the loading vectors: neighboring tenors are encouraged to have similar loading coordinates. The day-chain (observation graph) connects consecutive trading dates with unit weight. The day-chain is rebuilt on each training window so that its dimension matches the number of training observations. The analogous quadratic form

\operatorname{tr}(U^\top \widetilde{L}_s\, U) = \tfrac{1}{2}\sum_{t} \tilde{w}_{t,t+1}^{(s)}\,\|u_t - u_{t+1}\|_2^2

penalizes abrupt jumps in the latent score trajectories between adjacent dates, encoding temporal continuity of the factor structure.
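Both chains are instances of one weighted path-graph constructor. The sketch below is illustrative; the maturity grid shown is an example, not necessarily the exact tenor subset used in the experiments:

```python
import numpy as np

def path_laplacian(positions):
    """Laplacian L = D - W of a chain graph whose edge weights are the
    inverse squared gaps between consecutive positions."""
    x = np.asarray(positions, dtype=float)
    p = len(x)
    W = np.zeros((p, p))
    for i in range(p - 1):
        w = 1.0 / (x[i + 1] - x[i]) ** 2
        W[i, i + 1] = W[i + 1, i] = w
    return np.diag(W.sum(axis=1)) - W

# feature graph: tenor chain on maturities in years (illustrative grid)
L_f = path_laplacian([1, 2, 3, 4, 5, 7, 10, 20, 30])
# observation graph: unit-weight day chain on consecutive training dates
L_s = path_laplacian(np.arange(252))
```

Consecutive dates are spaced one unit apart, so the same constructor yields the unit-weight day chain.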

We compare four specifications, all with k = 3 latent components:

Model      Observation graph  Feature graph  CV search
PCA        (none)             (none)         \tau = 0 (no CV)
Day        day-chain          (none)         \tau with \rho = 1
Tenor      (none)             tenor-chain    \tau with \rho = 0
Day+Tenor  day-chain          tenor-chain    \tau and \rho \in \{0.25, 0.5, 0.75\}

Nested time-series cross-validation selects (\tau, \rho) window by window. Before specifying hypotheses, we examine the data and graph structure that these penalties will act on.

Figure 1: Cross-tenor average daily yield change over the sample period

Figure 1 reveals several distinct volatility regimes driven by Bank of Japan policy shifts. The 2013 Quantitative and Qualitative Easing (QQE) announcement—Kuroda’s “shock-and-awe” bond-buying programme—triggered a 10Y limit-down and an abrupt widening of daily yield moves across the curve. Three years later the introduction of Yield Curve Control (YCC) in September 2016 explicitly capped the 10Y yield at approximately 0%, compressing daily volatility to near-zero for several years. The unwinding of that regime from December 2022 (YCC band widened to ±50 bp) through March 2024 (exit from YCC) produced a volatility resurgence, particularly at intermediate tenors.

Two features are relevant for the graph-regularization setup. First, the regime heterogeneity motivates per-window standardisation (train-window mean and variance applied without leakage) and short one-year rolling windows that adapt to local market conditions. Second, the visible low-frequency structure in daily yield changes—slow regime evolution punctuated by policy-driven jumps—suggests the day-chain temporal-smoothness prior should be well-calibrated: true factor scores evolve on weekly-to-monthly time scales, so penalising day-to-day score jumps is unlikely to remove genuine signal. The day-chain itself is simply a unit-weighted path graph on n consecutive training dates and needs no separate visualisation; its only role is to encode the ordering of observations.

Figure 2 displays the tenor-chain adjacency matrix W that enters the feature-side Laplacian L_f = D - W. The inverse-squared-gap weighting produces a tight cluster at 7Y–8Y–9Y (gap = 1 year, weight = 1.0) and much weaker coupling between distant tenors such as 20Y–30Y (gap = 10 years, weight = 0.01). The penalty therefore concentrates its effect on tenors that already co-move almost perfectly—precisely the region where PCA loadings are smoothest and least in need of regularisation.

There is also a structural mismatch with the JGB policy environment. Under YCC the BOJ intervened specifically at the 10Y tenor, decoupling it from its nearest neighbours and creating a kink in the yield curve that traditional chain-graph smoothing would treat as noise. The wider maturity gaps where such policy-induced discontinuities appear receive the least penalty weight (0.04 for 5Y–7Y, 0.01 for 20Y–30Y). Whether this graph topology provides any variance reduction, or merely adds bias by smoothing through genuine structural breaks, is the subject of the hypotheses below.

Figure 2: Adjacency matrix for the tenor-chain feature graph

3 Hypotheses

Variance-reduction channel. In one-year rolling windows with p\approx 11 tenors, estimation noise can destabilise higher-order loading vectors. Tenor-side Laplacian shrinkage should primarily reduce loading roughness \operatorname{tr}(V^\top L_f V), while day-side shrinkage should primarily reduce score roughness \operatorname{tr}(U^\top L_s U). The low-frequency regime structure visible in Figure 1 suggests the day-chain smoothness prior is well-matched to the data-generating process: if the true score trajectory varies on weekly-to-monthly scales, penalising daily jumps removes noise without distorting signal. Economically, this encodes the belief that JGB factor exposures—policy-rate expectations, term-premium dynamics, supply-demand imbalances—do not jump overnight except during explicit policy surprises, an assumption broadly consistent with the low-frequency evolution of BOJ accommodation since 1999. If this translates to out-of-sample gains, the effect should be concentrated in the third component where the eigengap is smallest.

Bias-dominance channel. If the underlying signal is already smooth and well identified at the subspace level, additional shrinkage increases bias without sufficient variance reduction. The tenor-chain structure in Figure 2 concentrates penalty weight on the 7Y–9Y cluster where PCA loadings are already near-smooth, suggesting the bias-dominance channel may prevail for the feature-side graph in this low-dimensional setting.

These predictions are testable through roughness diagnostics, two window-to-window stability metrics—subspace distance \sin\theta_{\max}, the sine of the largest principal angle between consecutive loading subspaces (Davis–Kahan), and Procrustes drift \min_R \|V_t - V_{t-1}R\|_F, the Frobenius residual after optimal orthogonal alignment—and masked reconstruction error. The exercise is explicitly falsifiable and does not presuppose improvement. We test these predictions with full-sample fits followed by rolling out-of-sample evaluation at two window lengths.
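The two stability metrics can be sketched directly from their definitions (illustrative helpers):

```python
import numpy as np

def sin_theta_max(V1, V2):
    """Sine of the largest principal angle between the column spaces of
    V1 and V2 (Davis-Kahan subspace distance)."""
    Q1 = np.linalg.qr(V1)[0]
    Q2 = np.linalg.qr(V2)[0]
    cos = np.linalg.svd(Q1.T @ Q2, compute_uv=False)  # cosines of angles
    return np.sqrt(max(0.0, 1.0 - cos.min() ** 2))

def procrustes_drift(V_t, V_prev):
    """min_R ||V_t - V_prev R||_F over orthogonal R, via the SVD of
    V_prev^T V_t."""
    P, _, Qt = np.linalg.svd(V_prev.T @ V_t)
    return np.linalg.norm(V_t - V_prev @ P @ Qt)
```

The subspace distance is rotation-invariant by construction, while the Procrustes drift is computed only after removing the best orthogonal alignment, so both are immune to the UV^\top rotational ambiguity.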

4 Full-sample study

Before rolling evaluation, we fit four GLRM specifications on the full sample with k=3 components: the same PCA, Day, Tenor, and Day+Tenor models defined in Section 2. Three of the four use nested time-series cross-validation to select their hyperparameters; PCA is the unregularized baseline.

We align loadings from each model to the PCA baseline via orthogonal Procrustes before comparison.

model      day_graph  tenor_graph  tau     rho     lambda_s  lambda_f  recon_error  loading_roughness  score_roughness
PCA        False      False        0.0000  0.0000  0.0000    0.0000    0.9874       0.0011             5908694.7152
Day        True       False        0.0100  1.0000  0.0100    0.0000    0.9728       0.0024             125820.9565
Tenor      False      True         0.0000  0.0000  0.0000    0.0000    0.9874       0.0011             5908694.7152
Day+Tenor  True       True         0.0300  0.5000  0.0150    0.0150    0.9734       0.0023             66555.2307
Figure 3: Full-sample Procrustes-aligned loading comparison (k=3) across PCA, Day, Tenor, and Day+Tenor GLRM models with cross-validated hyperparameters

In GLRM the factorisation UV^\top is invariant under orthogonal rotation, so individual columns of V are not uniquely identified. The loading comparison above first aligns each model to the PCA baseline via orthogonal Procrustes, which finds the rotation R^\star=\arg\min_R\|V_{\text{PCA}}-V_{\text{model}}R\|_F subject to R^\top R=I. Differences that remain after alignment reflect genuine changes in the factor geometry, not relabelling or sign ambiguity.
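The alignment itself is a short SVD computation. This is a sketch of the standard orthogonal-Procrustes solution, not the exact plotting code:

```python
import numpy as np

def procrustes_align(V_model, V_ref):
    """Return V_model rotated by R* = argmin_R ||V_ref - V_model R||_F
    subject to R^T R = I, computed from the SVD of V_model^T V_ref."""
    P, _, Qt = np.linalg.svd(V_model.T @ V_ref)
    return V_model @ (P @ Qt)
```

Any residual difference after this rotation is a genuine change in factor geometry rather than a relabelling or sign flip.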

The Procrustes-aligned loading profiles are consistent with the standard yield-curve factor taxonomy. Component 1 is a level factor: positive loadings across all tenors with a broad plateau at intermediate maturities, corresponding to parallel shifts driven by the common component of rate expectations. Component 2 is a slope factor: monotonically decreasing from positive at the short end to negative at the long end, capturing the steepening and flattening moves associated with BOJ accommodation cycles and term-premium variation. Component 3 is a curvature (butterfly) factor: a hump-shaped profile peaking around 5Y–7Y, reflecting the belly-vs.-wings dynamics that arise from supply concentration in benchmark tenors and relative-value positioning. Together the three components explain the vast majority of daily yield-change variation, consistent with decades of term-structure research on both U.S. Treasuries and JGBs.

The results table reveals an immediate finding: cross-validation selects \tau=0 for the Tenor model, collapsing it to PCA. The tenor-chain smoothness prior brings no improvement to the full-sample validation loss, so the conservative selection rule defaults to zero regularisation. By contrast, the Day model selects \tau=0.01 and the Day+Tenor model selects \tau=0.03 (\rho=0.5), and both achieve lower in-sample reconstruction error than PCA. Day-graph regularisation sharply reduces score roughness \operatorname{tr}(U^\top L_s U), confirming that the temporal-smoothness penalty reaches its intended target. The rolling study below examines whether these in-sample improvements persist out of sample.

Table 1: Explained variance by PCA component (full sample)

Component  Variance explained (%)  Cumulative (%)
1          83.39                   83.39
2          10.53                   93.92
3          3.08                    97.00
4          0.98                    97.98
5          0.62                    98.60
Figure 4: Full-sample factor scores (month-end sampling; level/slope/curvature) for PCA vs Day-regularised model with BOJ policy events

The score trajectories (plotted at month-end frequency for visual clarity; the smoothing effect is present in the underlying daily data) make the effect of day-chain regularisation visually concrete. The PCA level score (Component 1) displays rapid oscillations that are largely absent in the Day-regularised counterpart, which tracks a smoother path through the same macro regimes—the 2003 VaR shock sell-off, the 2008 GFC flight-to-quality, the 2013 QQE-induced volatility, and the 2022–2024 YCC exit. The smoothing is most dramatic in the curvature score (Component 3), where PCA produces chaotic high-frequency noise while the Day model isolates a slowly evolving butterfly signal. Economically, this smoothing aligns with the prior that macro and policy factors shift on weekly-to-monthly frequencies; the high-frequency residual captured by PCA is predominantly market microstructure noise (bid-ask bounce, inventory effects) rather than tradeable signal.

5 Rolling study: one-year training windows

Each rolling window uses one year of training data with three months of out-of-sample testing, advancing by one month. The sample begins on the first date where all eleven tenors have real observations. Within each training window, nested time-series cross-validation splits the first nine months as inner-train and the last three months as validation, then selects (\tau,\rho) via a conservative rule: find the minimum validation loss, keep candidates within a small tolerance neighborhood (1+\delta), and choose the smallest admissible \tau with optional Procrustes stability tie-breaking against the previous window’s loadings. PCA is the unregularized baseline with no hyperparameters. The feature graph (tenor-chain) remains constant across windows; we rebuild the observation graph (day-chain) on each window’s training dates. Final outer test evaluation refits on the full outer training block with the selected hyperparameters and applies the frozen train-window scaler to the test data with no leakage.
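The conservative selection rule can be sketched as follows. The tolerance value delta is an assumption for illustration; the actual grid and tie-breaking logic live in the package:

```python
def select_hyperparams(candidates, delta=0.01):
    """Conservative CV rule: among (tau, rho, val_loss) candidates whose
    validation loss is within a factor (1 + delta) of the minimum, return
    the smallest tau (ties broken by smallest rho)."""
    best = min(loss for _, _, loss in candidates)
    admissible = [(tau, rho) for tau, rho, loss in candidates
                  if loss <= (1.0 + delta) * best]
    return min(admissible)
```

The rule explains why the Tenor model can collapse to PCA: whenever \tau = 0 sits inside the tolerance neighbourhood of the best validation loss, zero regularisation is selected.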

The summary tables below report, for each method: the mean, median, and 90th-percentile OOS masked reconstruction loss; the fraction of windows where the method strictly outperforms PCA (win_rate_vs_PCA); and a Diebold–Mariano test statistic (DM_stat) with Newey–West HAC-corrected p-value (DM_p) that accounts for the serial dependence induced by overlapping test windows.
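A minimal version of the test statistic, assuming a Bartlett-kernel HAC variance and a rule-of-thumb lag choice (a sketch, not the exact implementation behind the tables):

```python
import numpy as np
from math import erf, sqrt

def dm_test(losses_a, losses_b, lag=None):
    """Diebold-Mariano statistic with Newey-West (Bartlett) HAC variance.
    Negative statistic: method a has lower loss. One-sided p = Phi(stat)."""
    d = np.asarray(losses_a, float) - np.asarray(losses_b, float)
    T = len(d)
    if lag is None:
        lag = int(np.floor(4 * (T / 100.0) ** (2.0 / 9.0)))  # rule of thumb
    u = d - d.mean()
    var = np.mean(u * u)
    for h in range(1, lag + 1):
        w = 1.0 - h / (lag + 1.0)                 # Bartlett weights
        var += 2.0 * w * np.mean(u[h:] * u[:-h])  # lag-h autocovariance
    stat = d.mean() / np.sqrt(var / T)
    return stat, 0.5 * (1.0 + erf(stat / sqrt(2.0)))
```

The HAC correction matters here because overlapping test windows induce serial dependence in the loss differentials; an i.i.d. variance estimate would overstate significance.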

Figure 5: Rolling out-of-sample reconstruction error (k=3) for PCA, Day, Tenor, and Day+Tenor

method     mean    median  p90     win_rate_vs_PCA  DM_stat  DM_p
Day        1.1476  0.6179  2.2803  0.9967           -2.2507  0.0122
Day+Tenor  1.1009  0.6192  2.2511  0.9967           -8.0592  0.0000
PCA        1.2930  0.7404  2.6745  n/a              n/a      n/a
Tenor      1.2930  0.7404  2.6745  0.0000           n/a      n/a

The rolling OOS reconstruction paths separate into two groups: Day and Day+Tenor consistently outperform PCA, while Tenor tracks PCA exactly in OOS reconstruction because cross-validation always selects \tau=0 for the tenor-only model. Both Day and Day+Tenor achieve near-100% window-level win rates against PCA, and the Diebold–Mariano test (with Newey–West HAC standard errors for overlapping-window dependence) rejects equal predictive accuracy at conventional significance levels. The day-chain temporal-smoothing channel is the primary driver: it is both necessary and sufficient for the OOS improvement, while the tenor-chain maturity-smoothing channel adds no incremental benefit on its own.

Figure 6: OOS improvement of Day model over PCA (negative = Day better) with BOJ policy events

The improvement is persistent and most pronounced during policy-transition windows. The largest gains cluster around the late-2012 to mid-2013 period (QQE anticipation and initial implementation) and again during the 2022–2024 YCC unwinding, both episodes where daily yield volatility spiked abruptly. In such regimes the PCA score matrix overfits high-frequency noise, while the day-chain penalty absorbs the shock by smoothing scores across the transition. Conversely, during the quiet YCC-era windows (2017–2019), the improvement is small but still consistently negative, indicating that temporal regularisation helps even when volatility is suppressed.

Figure 7: Rolling subspace and Procrustes loading stability (k=3)

method     subspace_sinmax_mean  subspace_sinmax_median  procrustes_drift_mean  procrustes_drift_median  win_sub_vs_PCA  win_proc_vs_PCA
Day        0.1418                0.1120                  0.2147                 0.1560                   0.1405          0.0000
Day+Tenor  0.1331                0.1020                  0.1804                 0.1366                   0.1830          0.0033
PCA        0.0818                0.0496                  0.0303                 0.0205                   n/a             n/a
Tenor      0.0818                0.0496                  0.0303                 0.0205                   0.0000          0.0000

Subspace distance \sin\theta_{\max} ranges between 0.08 and 0.14 across methods, indicating that the three-dimensional loading subspace is well identified in one-year windows regardless of regularisation. The spike in subspace instability around the 2013 QQE launch and the 2016 YCC introduction reflects genuine structural breaks in the factor geometry: shifts in the yield-curve regime alter which linear combination of tenors constitutes the third (curvature) component, and all models must re-estimate. Procrustes drift, by contrast, reveals a cost of temporal regularisation: Day and Day+Tenor exhibit roughly 6–7\times the drift of PCA/Tenor on average, because the smooth-score constraint redistributes variance across the loading columns, producing window-to-window loading rotations that inflate the Frobenius residual. This loading instability is a mechanical trade-off, not a sign of model failure—the same constraint that causes it also delivers the OOS reconstruction gains documented above.

Figure 8: Cross-validated hyperparameters over time for the three GLRM models

The hyperparameter paths reveal a stark asymmetry. The Tenor model selects \tau=0 in every window, meaning cross-validation never finds the tenor-chain penalty beneficial—the regularised model collapses to PCA across the entire sample. Day selects moderate to strong \tau values (mean 0.056, never zero), indicating persistent benefit from temporal smoothing. Day+Tenor also selects non-zero \tau in every window, with \rho values reflecting the data-driven balance between temporal and maturity smoothness.

The event lines reveal an economically coherent pattern in the Day \tau path. Regularisation strength rises during transitions between calm and volatile regimes—notably around the 2013 QQE launch and the 2022 YCC band widening—when the cross-validation inner-loop penalises PCA’s tendency to overfit abrupt score jumps. During the sustained low-volatility YCC era (2017–2019), \tau settles at moderate levels: temporal smoothing remains beneficial but the penalty budget shrinks because there is less high-frequency noise to absorb.

Figure 9: Loading and score roughness over time for models with the corresponding graph

method     loading_roughness_mean  loading_roughness_median  score_roughness_mean  score_roughness_median
Day        n/a                     n/a                       441.4170              123.9683
Day+Tenor  0.0259                  0.0157                    385.7558              81.3492
PCA        n/a                     n/a                       n/a                   n/a
Tenor      0.0154                  0.0108                    n/a                   n/a

Day-graph regularisation cuts score roughness \operatorname{tr}(U^\top L_s U), confirming the temporal-smoothness penalty reaches its intended target. The Tenor model, which always selects \tau=0, produces roughness identical to PCA. The day-side score-roughness improvement translates directly to OOS accuracy gains, as confirmed by the near-100% window-level win rates and significant DM statistics above. Score roughness spikes visibly around the 2013 QQE launch and the 2022 YCC band widening for all models, but the Day-regularised variants dampen these spikes substantially—consistent with the economic interpretation that policy-driven volatility increases daily score noise, and temporal smoothing absorbs precisely this component.

Figure 10: Procrustes-aligned loadings at representative windows from different market regimes

The regime snapshots show that PCA and Tenor (which selects \tau=0 and is therefore PCA-equivalent) produce identical loading shapes, while Day and Day+Tenor share a distinct pattern with smoother profiles after Procrustes alignment. The differences between the two groups are most visible in Component 3, where the signal-to-noise ratio is lowest and the day-chain prior has the most room to act. Differences across regimes remain larger than differences across models within a regime, reflecting the nonstationarity of the JGB curve.

Taken together, the one-year rolling results support both hypotheses posed above. The variance-reduction channel operates as predicted: the day-chain penalty targets the high-variance score matrix, reduces its roughness, and delivers statistically significant OOS improvement—with the effect most visible in Component 3 where the eigengap is smallest. The bias-dominance channel also holds: the tenor-chain penalty finds nothing to smooth in the already-regular loading matrix and adds only bias, so cross-validation zeroes it out in every window. A natural follow-up is whether these patterns intensify when training data is scarce enough that PCA overfits.

6 Rolling study: three-month training windows

We repeat the rolling evaluation with three-month training windows (~63 business days), one-month test windows, and one-month steps. With n/p \approx 6, the sample covariance matrix is poorly conditioned and higher-order eigenvectors are noisy. Within each training window, nested time-series cross-validation uses the first two months for inner training and the last month for validation.

Figure 11: Rolling out-of-sample reconstruction error (k=3, 3M training windows)

method     mean    median  p90     win_rate_vs_PCA  DM_stat  DM_p
Day        0.9080  0.6460  1.5906  0.9968           -8.5778  0.0000
Day+Tenor  0.9165  0.6023  1.6121  0.9842           -7.5292  0.0000
PCA        1.1960  0.8102  1.9875  n/a              n/a      n/a
Tenor      1.1960  0.8102  1.9875  0.0000           1.0016   0.8417

Shortening the training window to three months (≈63 observations) amplifies the day-chain regularisation advantage. Day wins 99.7% of windows (DM = -8.58, p < 10^{-17}), and Day+Tenor wins 98.4% (DM = -7.53, p < 10^{-13}). Mean reconstruction error drops from 1.20 (PCA) to 0.91 (Day)—a 24% reduction, more pronounced than the 11% improvement seen with one-year windows. Tenor again selects \tau\approx 0 in virtually every window and is indistinguishable from PCA.
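The DM statistics reported here follow the standard construction on loss differentials; a sketch (assuming squared-error losses and a simple HAC variance, so the study's exact kernel and lag choices may differ):

```python
import numpy as np

def dm_stat(loss_a: np.ndarray, loss_b: np.ndarray, h: int = 1) -> float:
    """Diebold–Mariano statistic on the loss differential d_t = loss_a - loss_b.
    Negative values favour model A; under the null of equal accuracy the
    statistic is asymptotically standard normal."""
    d = np.asarray(loss_a, float) - np.asarray(loss_b, float)
    n = d.size
    lrv = np.var(d, ddof=0)                  # long-run variance of d
    for lag in range(1, h):                  # add autocovariances when h > 1
        lrv += 2.0 * np.cov(d[lag:], d[:-lag], bias=True)[0, 1]
    return float(d.mean() / np.sqrt(lrv / n))

rng = np.random.default_rng(3)
errs = rng.standard_normal(300) ** 2
stat = dm_stat(0.8 * errs, errs)   # model A uniformly 20% lower loss
assert stat < 0                    # negative DM favours model A, as in the tables
```

The sign convention matches the tables above: negative DM statistics indicate the regularised model beats PCA.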

The wider error separation during stress episodes aligns with identifiable policy events: the 2013 QQE launch (when the three-month training window spans both the pre- and post-announcement regimes), the 2016–2017 NIRP/YCC transition (when short samples straddle a discrete jump in the yield-curve anchoring point), and the 2022 YCC band widening. In each case, three months of data is too short for PCA to disentangle signal from noise, and day-chain temporal smoothing provides an effective shrinkage remedy by enforcing the prior that factor exposures evolve slowly through these transitions.

(a) Rolling subspace and Procrustes loading stability (k=3, 3M training windows)

method      subspace_sinmax_mean  subspace_sinmax_median  procrustes_drift_mean  procrustes_drift_median  win_sub_vs_PCA  win_proc_vs_PCA
Day         0.2921                0.2307                  0.3743                 0.3467                   0.4146          0.0032
Day+Tenor   0.2945                0.2359                  0.4288                 0.3543                   0.4589          0.0285
PCA         0.2860                0.2297                  0.1749                 0.1459                   n/a             n/a
Tenor       0.2860                0.2297                  0.1749                 0.1459                   0.0032          0.0032

Figure 12
Figure 13: Cross-validated hyperparameters over time (3M training windows)
Fraction of windows selecting τ=0 (3M):
  PCA: 100.0%
  Day: 0.0%
  Tenor: 99.7%
  Day+Tenor: 0.0%

Mean τ by method (3M):

method      mean      median
Day         0.003000  0.003
Day+Tenor   0.010606  0.010
PCA         0.000000  0.000
Tenor       0.000095  0.000

All four models exhibit markedly higher subspace instability with 3M windows compared to 1Y, with mean \sin(\theta_{\max}) rising to ≈0.29 across all methods (vs. ≈0.08–0.14 with 1Y windows). The Procrustes-drift ranking mirrors 1Y but with wider gaps: PCA/Tenor remain the most stable (mean drift 0.17) while Day (0.37) and Day+Tenor (0.43) drift 2–2.5× more. Procrustes win rates for Day and Day+Tenor against PCA are ≈0–3%, confirming that the day-chain penalty’s indirect effect on loadings worsens with shorter windows.
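The subspace metric can be computed from principal angles; a minimal sketch using scipy (assumed to match the reported diagnostic up to implementation detail):

```python
import numpy as np
from scipy.linalg import subspace_angles

def sin_theta_max(V1: np.ndarray, V2: np.ndarray) -> float:
    """sin of the largest principal angle between the column spaces of V1 and V2:
    0 for identical subspaces, 1 when some direction of one subspace is
    orthogonal to the other."""
    return float(np.sin(subspace_angles(V1, V2).max()))

rng = np.random.default_rng(4)
V = np.linalg.qr(rng.standard_normal((11, 3)))[0]
R = np.linalg.qr(rng.standard_normal((3, 3)))[0]

# Rotations within the subspace do not move the metric...
assert sin_theta_max(V, V @ R) < 1e-8
# ...while orthogonal subspaces are maximally far apart.
assert np.isclose(sin_theta_max(np.eye(5)[:, :2], np.eye(5)[:, 2:4]), 1.0)
```

Because the metric is rotation-invariant, it isolates genuine subspace drift from the benign rotation ambiguity that the Procrustes metric separately controls for.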

Tenor cross-validation again selects \tau\approx 0 in virtually every window. Day selects much smaller \tau values (mean 0.003 vs. 0.056 in 1Y), consistent with the reduced sample providing less room for temporal smoothing before over-regularising. Day+Tenor selects mean \tau = 0.011 with \rho varying across the full 0.25–0.75 grid, reflecting greater uncertainty in the optimal regularization balance when data is scarce.

(a) Loading and score roughness over time (3M training windows)

method      loading_roughness_mean  loading_roughness_median  score_roughness_mean  score_roughness_median
Day         n/a                     n/a                       992.5572              811.1620
Day+Tenor   0.1118                  0.0835                    823.6332              936.4316
PCA         n/a                     n/a                       n/a                   n/a
Tenor       0.0538                  0.0448                    n/a                   n/a

Figure 14

Loading roughness remains low for Tenor (mean \operatorname{tr}(V^\top L_f V) = 0.054) because the model collapses to PCA, which naturally produces smooth loadings with only 11 tenors. Day+Tenor loading roughness is approximately double (0.112), reflecting the interaction between the two penalty channels: the optimisation trades some loading smoothness for score smoothness. Score roughness \operatorname{tr}(U^\top L_s U) is roughly 2\times higher than in 1Y windows for both Day (mean 993 vs. 441) and Day+Tenor (mean 824 vs. 386), with Day+Tenor showing a lower mean (though higher median) roughness—weak evidence that the dual-penalty formulation marginally improves score regularization.

Figure 15: Procrustes-aligned loadings at representative windows (3M training windows, panels a–c).

The Procrustes-aligned regime snapshots with 3M windows show broadly similar patterns to 1Y: PCA and Tenor remain indistinguishable, while Day and Day+Tenor form a visually distinct group, particularly in Component 3 where the signal-to-noise ratio is lowest. The loading-shape differences between models are more pronounced with 3M windows, consistent with the greater subspace instability identified above. Cross-regime variation (e.g., the pre-YCC vs. YCC-era loading inversion in Component 2) remains the dominant source of variation, reinforcing that JGB curve nonstationarity dominates model-choice effects.

7 Discussion

The rolling OOS evaluation reveals a clear and robust separation across both window lengths. With one-year windows, Day and Day+Tenor achieve near-100% window-level win rates (99.7%) against PCA, with Diebold–Mariano statistics rejecting equal predictive accuracy at conventional levels (Day: DM = -2.25, p = 0.012; Day+Tenor: DM = -8.06, p < 10^{-15}). Shortening to three-month windows amplifies the absolute MSE reduction from 11% to 24%, with Day maintaining a 99.7% win rate (DM = -8.58, p < 10^{-17}) and Day+Tenor at 98.4% (DM = -7.53, p < 10^{-13}). The Tenor model produces OOS results identical to PCA in both settings because cross-validation selects \tau=0 in every 1Y window and \tau\approx 0 in virtually every 3M window. The effective regularization channel is day-chain (temporal) smoothing of the score matrix U, not tenor-chain (maturity) smoothing of the loading matrix V. These results map directly onto the two channels posed in the Hypotheses section.

The central result of this study is the stark asymmetry between the two regularization channels. Understanding it requires examining the bias–variance trade-off separately for the score matrix U and the loading matrix V.

Estimation noise is concentrated in U, not V. In unregularised PCA, the loading matrix V \in \mathbb{R}^{p \times k} is estimated from the full n \times p training matrix and benefits from averaging across all n dates. The score at date t, \hat{u}_t = V^\top z_t, is a point estimate from a single p-dimensional observation—it carries the full noise of that one cross-section with no temporal averaging. With p = 11, each score estimate has very few effective degrees of freedom. The roughness diagnostics reflect this asymmetry. Empirical score roughness \operatorname{tr}(U^\top \widetilde{L}_s U) averages roughly 440 in one-year windows and 990 in three-month windows—large day-to-day jumps—while loading roughness \operatorname{tr}(V^\top \widetilde{L}_f V) is approximately 0.02–0.05 across settings, placing PCA loadings near the null space of the tenor-chain Laplacian. The two penalties therefore face fundamentally different variance-reduction opportunities.

Why the day-chain penalty reduces MSE—confirming the variance-reduction channel. The day-chain penalty \operatorname{tr}(U^\top \widetilde{L}_s U) = \tfrac{1}{2}\sum_{t,s} w_{ts}\|u_t - u_s\|_2^2 = \sum_{t=1}^{n-1}\|u_t - u_{t+1}\|_2^2 (up to the spectral scaling of \widetilde{L}_s) is a first-difference roughness penalty on each score trajectory. Writing the regularised score estimate as an approximate local average,

\tilde{u}_t \approx (1 - \alpha)\,\hat{u}_t + \tfrac{\alpha}{2}\bigl(\hat{u}_{t-1} + \hat{u}_{t+1}\bigr), \qquad \alpha \propto \tau,

the variance of \tilde{u}_t decreases by a factor proportional to the effective smoothing bandwidth, at a bias cost proportional to the local curvature \|u_{t+1} - 2u_t + u_{t-1}\|. This trade-off favours smoothing whenever the true score trajectory is smoother than white noise. JGB factor scores—driven by BOJ policy rates, term premia, and inflation expectations—evolve on weekly-to-monthly time scales, as the low-frequency regime structure in Figure 1 suggests. Day-to-day changes are small relative to estimation noise, so the smoothness prior is well-calibrated to the true data-generating process. Spectrally, the day-chain Laplacian \widetilde{L}_s = \Phi\Lambda\Phi^\top has eigenvalues \lambda_j \propto 1 - \cos(\pi j/n); the penalty shrinks high-frequency Laplacian eigenmodes (large \lambda_j) more than the low-frequency modes where the true factor dynamics are concentrated. The result is a low-pass filter that removes estimation artifacts while preserving signal. The effect intensifies with shorter windows: with 3M training (n \approx 63), PCA estimates each score from a single p=11 cross-section drawn from a noisier subsample. The day-chain penalty constrains the trajectory to be temporally coherent, acting as an implicit prior that factor exposures do not jump overnight. The MSE reduction grows from 11% (1Y) to 24% (3M)—exactly the regime where shrinkage should help most.
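The spectral claim is easy to verify numerically. An illustrative sketch, using the unnormalised chain Laplacian and a ridge-type per-mode gain 1/(1 + \tau\lambda_j) as a stand-in for the mechanism (not grpca's exact solver):

```python
import numpy as np

n = 64
# Unnormalised Laplacian of the day chain (path graph on n dates).
L = np.diag(np.r_[1.0, 2.0 * np.ones(n - 2), 1.0]) \
    - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)

# Known path-graph spectrum: lambda_j = 2(1 - cos(pi j / n)), j = 0..n-1.
lam = 2.0 * (1.0 - np.cos(np.pi * np.arange(n) / n))
assert np.allclose(np.sort(np.linalg.eigvalsh(L)), np.sort(lam), atol=1e-9)

# A quadratic score penalty shrinks Laplacian mode j by 1/(1 + tau*lambda_j):
# low frequencies (small lambda) pass, high frequencies are damped.
tau = 0.5
gain = 1.0 / (1.0 + tau * lam)
assert gain[1] > gain[-1]
assert np.isclose(gain[0], 1.0)   # the constant (zero-frequency) mode is untouched
```

The gain profile is exactly a low-pass filter: the slowly varying factor dynamics survive, while the high-frequency estimation noise in the scores is attenuated.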

Why the tenor-chain penalty adds only bias—confirming the bias-dominance channel. The tenor-chain penalty \operatorname{tr}(V^\top \widetilde{L}_f V) = \tfrac{1}{2}\sum_{i,j} w_{ij}\|v_i - v_j\|^2 shrinks adjacent-tenor loading rows toward each other. Yet the PCA eigenvectors of the 11 \times 11 sample covariance are already near the null space of this quadratic form: PC1 (level) is nearly flat, PC2 (slope) is monotone, and PC3 (curvature) is a single-hump shape—all smooth functions with low Laplacian energy. As a result, the variance reduction is negligible:

\text{Var reduction} \propto \tau \cdot \underbrace{\operatorname{tr}(V_{\text{true}}^\top \widetilde{L}_f\, V_{\text{true}})}_{\approx\, 0} \approx 0, \qquad \text{Bias}^2 \propto \tau^2 > 0.

The OOS MSE therefore increases monotonically in \tau, and the conservative selection rule picks \tau = 0. Several features of this JGB application compound the effect: (1) MOF pre-smoothing: the Ministry of Finance interpolates constant-maturity yields onto a fixed maturity grid, removing cross-tenor noise at the data level before PCA ever sees it; (2) inverse-squared-gap weighting concentrates the penalty on closely spaced tenors (7Y–8Y–9Y) that already co-move almost perfectly (visible in Figure 2), while under-penalizing the wider gaps (10Y–15Y) where structural breaks could appear; (3) under Yield Curve Control (2016–2024) the BOJ pinned the 10Y rate, creating genuine loading discontinuities around 10Y that the tenor-chain penalty incorrectly treats as roughness to be smoothed away; (4) with p = 11, the sample covariance \hat\Sigma \in \mathbb{R}^{11\times 11} is estimated from at least six observations per parameter (n/p \approx 6) even with 3M windows, so the high-dimensionality regime that motivates loading regularisation never bites.
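The near-null-space claim can be checked directly. A sketch with unit edge weights (the study's inverse-squared-gap weights only sharpen the contrast for closely spaced tenors):

```python
import numpy as np

p = 11
# Chain Laplacian over the tenor axis, unit weights.
Lf = np.diag(np.r_[1.0, 2.0 * np.ones(p - 2), 1.0]) \
     - np.diag(np.ones(p - 1), 1) - np.diag(np.ones(p - 1), -1)

def energy(v: np.ndarray) -> float:
    """Laplacian energy v^T L_f v of a unit-norm loading shape."""
    return float(v @ Lf @ v)

x = np.linspace(-1.0, 1.0, p)
level = np.ones(p) / np.sqrt(p)              # PC1-like: flat
slope = x / np.linalg.norm(x)                # PC2-like: monotone
curve = x**2 - (x**2).mean()                 # PC3-like: single hump
curve /= np.linalg.norm(curve)
alt = np.array([(-1.0) ** i for i in range(p)])
alt /= np.linalg.norm(alt)                   # maximally rough alternating shape

# Level/slope/curvature shapes carry almost no Laplacian energy, so the
# penalty has nothing to shrink; only rough (alternating) shapes would.
assert energy(level) < 1e-12
assert energy(slope) < 0.1 < energy(alt)
assert energy(curve) < 0.5 < energy(alt)
```

With the first three PCs sitting at roughly 0, 0.1, and 0.4 units of energy against nearly 4 for the alternating shape, the tenor-chain penalty simply has no variance to remove from the estimated loadings.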

Dimensionality of the regularisation targets. The asymmetry extends to the leverage available to each penalty. The day-chain Laplacian is n \times n with n-1 edges, providing n-1 independent smoothness constraints on the n \times k score matrix—a substantial number of soft constraints relative to the free parameters. The tenor-chain Laplacian is 11 \times 11 with only 10 edges acting on 33 loading parameters, and applied to an object that is already smooth. The day-side penalty has far more room to reshape the solution.

Despite their OOS reconstruction advantage, Day and Day+Tenor models produce less stable loadings than PCA across all stability metrics and both window lengths. With 1Y windows, subspace win rates against PCA are 14.1% (Day) and 18.3% (Day+Tenor), with Procrustes win rates near 0%. With 3M windows, all models exhibit far higher instability (\sin(\theta_{\max}) \approx 0.29 vs. 0.08–0.14 for 1Y), and PCA’s Procrustes-drift advantage grows (mean drift 0.17 vs. 0.37–0.43 for the regularised models). The result is not paradoxical once the bilinear structure of the factorisation Z \approx UV^\top is considered. The day-chain penalty directly constrains U, but the first-order optimality conditions couple U and V: any change in the regularised score trajectory \tilde{U} propagates through the loading update V \leftarrow (U^\top U + \lambda I)^{-1} U^\top Z. When CV selects different \tau values across adjacent windows, the resulting shifts in \tilde{U} produce corresponding rotations in V, amplifying Procrustes drift relative to the unregularised PCA baseline. Shorter windows (n \approx 63) exacerbate this effect because the \tau-selection surface becomes noisier, leading to larger window-to-window jumps in the chosen penalty strength. The practical implication is that day-chain regularisation improves OOS reconstruction at the cost of loading interpretability. Applications that require stable factor loadings (e.g., structural interpretation of principal components) may prefer PCA despite its higher reconstruction error, while applications that prioritise predictive accuracy (e.g., imputation, forecasting, portfolio construction) should prefer the Day model.
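The bilinear coupling can be illustrated with the ridge loading update named above. A toy sketch (the `loading_update` helper is hypothetical, not grpca's solver):

```python
import numpy as np

def loading_update(U: np.ndarray, Z: np.ndarray, lam: float = 1e-8) -> np.ndarray:
    """Ridge least-squares update V = Z^T U (U^T U + lam I)^{-1}: the loadings
    are a deterministic function of the (possibly regularised) scores."""
    k = U.shape[1]
    return Z.T @ U @ np.linalg.inv(U.T @ U + lam * np.eye(k))

rng = np.random.default_rng(7)
n, p, k = 120, 11, 3
U0 = rng.standard_normal((n, k))
V0 = rng.standard_normal((p, k))
Z = U0 @ V0.T                                # exact rank-3 panel

V_raw = loading_update(U0, Z)
assert np.allclose(V_raw, V0, atol=1e-4)     # unsmoothed scores recover V0 exactly

# Smoothing the scores (a crude 3-day moving average standing in for the
# day-chain penalty) propagates through the solve and moves the loadings.
U_s = np.vstack([U0[:1], (U0[:-2] + U0[1:-1] + U0[2:]) / 3.0, U0[-1:]])
V_s = loading_update(U_s, Z)
assert not np.allclose(V_s, V_raw, atol=1e-6)   # coupled: scores move loadings
```

This is the mechanism behind the drift numbers: window-to-window changes in the selected \tau change \tilde{U}, which rotate V, even though the penalty never touches V directly.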

Cross-validation reveals a consistent hyperparameter pattern across both window lengths: Day and Day+Tenor always select non-zero \tau, while Tenor selects \tau=0 in every 1Y window and \tau\approx 0 in virtually every 3M window. The optimal \tau scales with window length—Day mean 0.056 (1Y) vs. 0.003 (3M)—consistent with the bias–variance interpretation above. With fewer observations, the smoothing bandwidth must shrink to avoid over-regularising the score trajectory; CV correctly adapts the penalty strength to the available sample size. For Day+Tenor with 1Y windows, \rho is typically above 0.5, allocating most regularisation to the day-chain channel; with 3M windows, \rho varies more widely across the 0.25–0.75 grid, reflecting greater uncertainty in the validation surface when data is scarce. The Procrustes-aligned loading snapshots confirm that PCA and Tenor are indistinguishable in every regime and at both window lengths, while Day and Day+Tenor form a separate group with smoother Component 3 profiles. The pattern is consistent with the mathematical analysis: PC3 has the smallest eigengap and hence the highest estimation variance in its loading vector (Davis–Kahan), giving the day-chain penalty the most room to reshape the solution through the bilinear coupling. Cross-regime differences (e.g., pre-YCC vs. post-YCC) remain an order of magnitude larger than cross-model differences within a regime, reflecting the nonstationarity of the JGB curve.

Several patterns merit caution. The day-chain OOS improvement, while statistically significant and perfectly consistent, is modest in absolute terms—mean reconstruction error drops from 1.196 to 0.908 (Day) with 3M windows; whether this magnitude matters depends on the downstream application. With 3M windows, the day-chain model’s Procrustes drift reaches 2–2.5\times PCA’s (and 6–7\times with 1Y windows), so interpretations based on individual loading vectors require care, particularly with short training windows. Day+Tenor introduces a \rho degree of freedom and additional tuning cost relative to Day-only; while it achieves statistically significant OOS gains against PCA at both horizons, it offers negligible incremental improvement over Day-only (slightly better at 1Y, slightly worse at 3M), so parsimony favours the Day-only specification. The tenor-chain penalty assumes smooth local maturity dependence, but JGB curves exhibit structural kinks (e.g., around the 10Y delivery basket boundary under YCC) that the penalty incorrectly treats as noise. An alternative graph topology—such as inverse-gap weighting or a graph learned from cross-tenor correlations—might partially address this, though the fundamental low-dimensionality constraint (p = 11) would likely still limit the variance-reduction channel. The Hypotheses also predicted that OOS gains should concentrate in the third component where the eigengap is smallest; the regime snapshots and Component 3 loading differences are consistent with this prediction, though we did not perform a formal per-component OOS decomposition to test it directly.

8 Conclusion

This notebook evaluated four GLRM specifications—PCA, Day-only, Tenor-only, and Day+Tenor graph regularization—on rolling one-year and three-month windows of daily JGB yield-curve changes with k=3 components and nested cross-validated hyperparameter selection.

PCA serves as the unregularized baseline. With p = 11 tenors and well-conditioned covariance estimates, its loadings are smooth, stable, and interpretable.

Tenor-graph regularization targeting the maturity axis is entirely ineffective in this setting. Cross-validation selects \tau=0 in every 1Y rolling window and \tau\approx 0 in virtually every 3M window, collapsing the model to PCA at both horizons and in the full-sample study. The low dimensionality of the loading matrix makes the tenor-chain penalty purely biasing, confirming the bias-dominance hypothesis for the feature-side channel.

Day-graph regularization targeting the temporal axis is the clear winner. Day-graph regularization achieves near-perfect OOS win rates against PCA at both window lengths (99.7% at 1Y, 99.7% at 3M) with significant Diebold–Mariano statistics. By smoothing the score trajectories, the day-chain penalty reduces overfitting and improves generalization—an effect that grows in proportional magnitude as the training window shrinks (11% MSE reduction at 1Y, 24% at 3M). The trade-off is reduced loading stability: the bilinear coupling between scores and loadings means that day-side regularization indirectly destabilizes the loading geometry, particularly in short-window regimes.

Day+Tenor regularization achieves statistically significant OOS gains against PCA at both horizons, but offers negligible incremental improvement over Day-only. CV allocates most of the regularization budget to the day-chain channel. The additional tenor-chain component adds tuning complexity without meaningful incremental benefit, making Day-only the preferred specification on parsimony grounds.

The key finding is that temporal smoothing of scores provides the OOS improvement, not maturity smoothness of loadings. Temporal smoothing was not the expected channel—the original motivation targeted tenor-side regularization—but it is consistent with the view that short rolling windows benefit from temporal coherence constraints on the latent factors. The 3M results strengthen this conclusion: when PCA has fewer observations to estimate scores, the day-chain shrinkage benefit is proportionally larger.

Several directions could extend this analysis. In the next post we apply the same GLRM framework to a higher-dimensional macroeconomic panel, where the bias–variance balance may favor feature-side regularization more strongly.

9 Appendix: Missing-data imputation with the 40Y tenor

The GLRM objective optimises over observed entries only, via the mask projection P_\Omega. The main analysis uses a complete panel (p = 11, no missing data), so this capability was not exercised. To demonstrate it, we expand the tenor set to include the 40Y JGB—which was not issued until 2007—creating a naturally jagged missingness pattern in the early part of the sample. The masked factorisation Z \approx UV^\top is fit on observed entries, and the product UV^\top provides imputed values for the missing 40Y entries in the pre-2007 period.
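A minimal masked alternating least squares sketch shows the mechanics of the P_\Omega projection (illustrative only; grpca's solver additionally carries the graph penalties, omitted here):

```python
import numpy as np

def masked_als(Z, mask, k=1, n_iter=100, lam=1e-3, seed=0):
    """Alternating ridge least squares fitting Z ~ U V^T on observed entries
    only; missing entries never enter the loss (the P_Omega projection)."""
    n, p = Z.shape
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((n, k))
    V = 0.1 * rng.standard_normal((p, k))
    for _ in range(n_iter):
        for i in range(n):                   # score update, one date at a time
            o = mask[i]
            U[i] = np.linalg.solve(V[o].T @ V[o] + lam * np.eye(k),
                                   V[o].T @ Z[i, o])
        for j in range(p):                   # loading update, one tenor at a time
            o = mask[:, j]
            V[j] = np.linalg.solve(U[o].T @ U[o] + lam * np.eye(k),
                                   U[o].T @ Z[o, j])
    return U, V

# Rank-1 toy panel with a contiguous missing block in the last column,
# mimicking the pre-issuance 40Y gap.
rng = np.random.default_rng(8)
Z = np.outer(rng.standard_normal(200), rng.standard_normal(12))
mask = np.ones_like(Z, dtype=bool)
mask[:80, -1] = False                        # first 80 dates of the last tenor unobserved

U, V = masked_als(Z, mask, k=1)
Z_hat = U @ V.T
assert np.abs((Z_hat - Z)[mask]).max() < 1e-2     # observed entries fit
assert np.abs((Z_hat - Z)[:80, -1]).max() < 0.1   # missing block imputed
```

The imputation works here because the low-rank structure identifies the last column's loading from its observed rows, exactly the mechanism exploited for the 40Y tenor below.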

This section is a single full-sample proof of concept, not a rolling evaluation. Its purpose is to verify that the masked GLRM solver handles partial observations gracefully and to examine whether graph regularisation affects imputation quality.

Extended panel: 6,470 dates × 12 tenors
Total entries: 77,640  |  Missing: 2,012 (2.6%)

Missing days per tenor:
  40Y: 2,012

Panel starts: 1999-09-03; 40Y first available: 2007-11-06
Figure 16: Missing-data pattern in the extended 12-tenor JGB panel
  PCA: τ=0.0000, ρ=0.00, recon=0.9913
  Day: τ=0.0100, ρ=1.00, recon=0.9809
  Tenor: τ=0.0000, ρ=0.00, recon=0.9913
  Day+Tenor: τ=0.0300, ρ=0.50, recon=0.9812
model       tau     rho     recon_error  loading_roughness  score_roughness
PCA         0.0000  0.0000  0.9913       0.0007             6169316.1365
Day         0.0100  1.0000  0.9809       0.0015             131843.4958
Tenor       0.0000  0.0000  0.9913       0.0007             6169316.1365
Day+Tenor   0.0300  0.5000  0.9812       0.0015             69586.3455
Figure 17: Full-sample fit on the extended 12-tenor panel with structural missingness (panels a–d).
Figure 18: Imputed vs. observed 40Y yield changes: missing period (top) and overlap period (bottom)

Reconstruction RMSE on observed 40Y entries (standardised scale):
  PCA: 0.1927
  Day: 0.7587
  Tenor: 0.1927
  Day+Tenor: 0.8193

The extended panel contains 6,470 dates and 12 tenors. The 40Y JGB was not issued until November 2007, creating a contiguous block of 2,012 missing entries (2.6% of the panel) concentrated in the first ~8 years; the remaining 11 tenors are fully observed throughout. All four models fit successfully on this jagged panel, confirming that the masked GLRM solver handles structural missingness without special-case preprocessing.

Cross-validation again selects \tau = 0 for the Tenor model, making it numerically identical to PCA (same reconstruction of 0.9913, same roughness values). This replicates the main-body finding in a setting where one might expect tenor smoothness to help: the 40Y loading must be inferred partly from the graph prior because 31% of the 40Y column is unobserved. Even so, the data from the observed period is sufficient for the GLRM to determine the 40Y loading without graph assistance. The Day model (\tau = 0.01, \rho = 1) and Day+Tenor model (\tau = 0.03, \rho = 0.5), by contrast, accept a modest reconstruction penalty (0.98 vs. 0.99) in exchange for large reductions in score roughness—47\times for Day (131,843 vs. 6,169,316) and 89\times for Day+Tenor (69,586 vs. 6,169,316). This smoothing effect is consistent with the rolling-window results: temporal regularisation dampens daily score noise at a small cost to in-sample fit.

Procrustes-aligned loadings show all models extending the 1Y–30Y profile naturally to 40Y. For Component 1 (level), the 40Y loading declines from its 30Y value. This is consistent with the convexity-driven attenuation of parallel-shift sensitivity at the ultra-long end: a 40Y zero-coupon bond has substantially higher convexity than 30Y, so for a given parallel yield move its price return is increasingly dominated by the second-order (convexity) term, reducing the linear DV01 loading. For Component 2 (slope), the monotonic short-vs.-long pattern continues through 40Y. Component 3 (curvature) rises sharply beyond 30Y, reflecting the distinct supply-demand dynamics of the 40Y sector—dominated by Japanese life insurers and pension funds hedging long-duration liabilities, with limited liquidity relative to the 10Y and 20Y benchmarks. The Tenor model, despite selecting \tau = 0, shows visibly compressed loading magnitudes relative to PCA; the compression reflects the bilinear scaling ambiguity of the UV^\top factorisation, where the solver distributes variance between U and V differently when traversing from the SVD warm-start.
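The convexity-attenuation argument can be made concrete with the usual second-order price expansion for a T-maturity zero-coupon bond under a parallel shift \Delta y (illustrative; continuous compounding, so duration D = T and convexity C = T^2):

```latex
\frac{\Delta P}{P} \;\approx\; -\,T\,\Delta y \;+\; \tfrac{1}{2}\,T^{2}\,(\Delta y)^{2}
```

The linear (DV01) term grows like T while the convexity term grows like T^2, so at 40Y a larger share of a given parallel move is realised as a second-order effect, attenuating the effective linear level loading relative to a naive extrapolation from 30Y.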

Imputation quality, however, diverges across models. On the observed 40Y entries (post-2007), PCA and Tenor achieve a standardised RMSE of 0.19, while Day and Day+Tenor are markedly worse (0.76–0.82). The temporal smoothing that benefits score denoising in the rolling study actively harms reconstruction of the 40Y column here, because 40Y yield changes are the most volatile tenor in the panel and score smoothing attenuates precisely the high-frequency variation needed to fit them. In the missing period (1999–2007), all four models produce imputed 40Y yield changes with plausible time-varying volatility; the imputed series displays spike clusters during the 2001 recession and the June 2003 VaR shock—the episode in which a pro-cyclical cascade of risk-limit breaches among Japanese banks forced simultaneous liquidation of long-duration positions, sending 10Y–30Y yields up by 100 bp in three weeks. Since no ground truth exists for this period, the imputed values are best regarded as model-consistent interpolations rather than validated estimates.

Taken together, the appendix demonstrates that grpca’s masked GLRM formulation handles structural missingness correctly. Graph regularisation does not improve imputation quality in this setting—the unregularised factorisation already extrapolates the 40Y loading from the observed-entry structure—but the day-chain model still achieves its primary purpose of temporal score smoothing, at a measurable cost to per-tenor reconstruction fidelity.