### Added
- **`bhy_hierarchical` two-stage FDR verb + `simes_p` primitive** (264). `factrix.multi_factor.bhy_hierarchical(profiles, *, group: str, estimator=None, q=0.05) -> Survivors` implements the Yekutieli (2008) procedure for factor sets with natural group structure (momentum / value / quality families, cross-region universes). Outer Benjamini-Yekutieli step-up on Simes group representatives controls *group-level* FDR ≤ `q`; inner BHY within each passing group controls *within-group* FDR ≤ `q`. The cell-level `adj_p` is the max-of-layers fold `max(outer_adj_p[g], inner_adj_p[i])` so the universal `Survivors` duality `survivor[i] iff adj_p[i] <= q` still holds — both layer signals are folded into one number rather than splitting into `q_outer` / `q_inner` kwargs that would break the contract. `factrix.stats.multiple_testing.simes_p(p_values)` lands as a standalone primitive (Simes 1986 global-null combiner, dominates Bonferroni `m * min(p)`, valid under PRDS). The verb closes the third leg of the v0.13 multi-factor-verb surface (alongside `bhy(expand_over=)` and `partial_conjunction`); the three are distinguished by *survivor unit* — pair / identity-joint / identity-group-then-within — and the new `docs/api/bhy-hierarchical.md` opens with the routing table so a factor-zoo researcher picks the right verb without leaving the docs. Three failure modes that would otherwise produce surface-valid output are now blocked at the call site: single-group input raises and points at `bhy()`; every-profile-is-its-own-group at `n >= 3` raises (group axis is near-unique, probably a continuous variable mistakenly passed); majority-singleton-group inputs emit `RuntimeWarning` (inner BHY on n=1 is a raw cutoff, outer Simes on n=1 equals that p — no FDR correction at either layer for those groups). 
Reuses `_resolve_family` for group-key validation (identity-shadowing rejected, missing-context-key surfaced fail-loud) and the existing `estimator=` selection path so a Newey-West / Hansen-Hodrick p-value can drive both layers without per-call reconfiguration.
```python
import factrix as fx

# "Which factor families show signal, and within those, which factors?"
survivors = fx.multi_factor.bhy_hierarchical(
    profiles, group="family", q=0.05,
)
survivors.adj_p      # max-of-layers fold per cell
survivors.n_tests    # {(family,): m_in_family} for every input group
# per-survivor group label is profile.context["family"]
```
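The two layers can be sketched in plain NumPy. This sketch assumes the standard Simes combiner and the standard Benjamini-Yekutieli (BY) step-up adjustment. It follows this entry's description (outer BY on Simes representatives, inner BY per group, max-of-layers fold) rather than reproducing factrix's code. The helpers `simes_p_sketch`, `by_adjusted_p` and `hierarchical_adj_p` are hypothetical names. The sketch shows why the fold preserves the `survivor[i] iff adj_p[i] <= q` duality.

```python
import numpy as np


def simes_p_sketch(p_values):
    """Simes (1986) global-null p-value: min over i of m * p_(i) / i, capped at 1."""
    p = np.sort(np.asarray(p_values, dtype=float))
    ranks = np.arange(1, p.size + 1)
    return float(min(1.0, np.min(p.size * p / ranks)))


def by_adjusted_p(p_values):
    """Benjamini-Yekutieli step-up adjusted p-values (arbitrary dependence)."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))          # harmonic correction c(m)
    order = np.argsort(p)
    scaled = m * c_m * p[order] / np.arange(1, m + 1)
    # step-up: running minimum from the largest p-value down, capped at 1
    adj_sorted = np.minimum(1.0, np.minimum.accumulate(scaled[::-1])[::-1])
    adj = np.empty(m)
    adj[order] = adj_sorted
    return adj


def hierarchical_adj_p(p_values, groups):
    """Outer BY on Simes group representatives, inner BY within each group,
    folded per cell as max(outer_adj_p[g], inner_adj_p[i])."""
    p = np.asarray(p_values, dtype=float)
    groups = np.asarray(groups)
    labels = list(dict.fromkeys(groups.tolist()))     # first-seen group order
    reps = [simes_p_sketch(p[groups == g]) for g in labels]
    outer = dict(zip(labels, by_adjusted_p(reps)))
    adj = np.empty_like(p)
    for g in labels:
        mask = groups == g
        # a group failing the outer layer has outer[g] > q, so none of its
        # cells can pass the fold, matching "inner BY within passing groups"
        adj[mask] = np.maximum(outer[g], by_adjusted_p(p[mask]))
    return adj


p = [0.001, 0.004, 0.03, 0.40, 0.55, 0.70]
g = ["momentum"] * 3 + ["value"] * 3
keep = hierarchical_adj_p(p, g) <= 0.05   # survivor[i] iff adj_p[i] <= q
```

On this toy input only the first two momentum factors survive. The value family fails the outer layer. The third momentum p-value passes the outer layer but fails the inner BY cutoff.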
- **`MomentEstimator(Estimator)` sub-protocol + `GMM` Hansen J-test** (191). A symmetric third Estimator layer alongside `HACEstimator`, for over-identifying-restriction tests on a multivariate moment-condition system. `MomentEstimator` adds `min_periods: int` and `compute(moments: np.ndarray, *, forward_periods: int) -> GMMResult`; `GMMResult` is a frozen dataclass parallel to `InferenceResult` carrying `j_stat` / `df` / `overid_p` / `n_moments` / `n_params` / `metadata` / `warnings` (no `stat_name` / `p_name` — the type itself implies the `(StatCode.J_GMM, StatCode.P_GMM)` pair). `factrix.stats.GMM` is the concrete instance: hand-rolled Hansen (1982) two-step efficient GMM in `factrix._stats.gmm` (no statsmodels dependency, matching the lean-dep pattern of `factrix._stats.hac`), pure over-identification (`n_params = 0`) only — parametric GMM is a forward hook. `StatCode.J_GMM` and `WarningCode.SINGULAR_WEIGHT_MATRIX` ship alongside (the latter distinguishes a rank-deficient long-run covariance from a generic short-sample warning, mirroring how `RECT_KERNEL_NEGATIVE_VARIANCE` separates Hansen-Hodrick's kernel-specific failure). `AnalysisConfig.moment_estimator: MomentEstimator | None = None` is wired through the four factory methods + `to_dict` / `from_dict` (backward-compatible with pre-191 serialized configs that omit the key); `_moment_inference(cfg, moments)` mirrors `_hac_inference(cfg, series)` and stitches `GMMResult` into the procedure-layer `(stats, metadata)` contract keyed by `(J_GMM, P_GMM)`. Applicability gate runs at `__post_init__` time.
```python
import numpy as np
from factrix.stats import GMM

# Construct a (T, K) moment matrix yourself, e.g. per-date IC at K
# forward horizons, factor-sorted decile spreads, cross-asset shared-β
# residuals, etc. Choice of moment system is a research-design call.
moments = build_my_moment_system(panel, ...)  # shape (T, K)
result = GMM().compute(moments, forward_periods=max_horizon)
result.j_stat      # Hansen J statistic, χ²(K) under H₀
result.overid_p    # right-tail χ² p-value
result.df          # K - n_params (n_params = 0 in this release)
```
**Why ship as a standalone primitive rather than a fully integrated cell**: HansenHodrick (184) plugged into the existing IC PANEL cell because both NW and HH consume the same 1-D mean-IC series — same data shape, different kernel. GMM is fundamentally different: it consumes a multi-dimensional moment matrix, and the *choice of moment-condition system* (multi-horizon panel? multi-bucket spread? cross-sectional shared-β? — see Hansen 1982 §3) is a research-design question without one canonical factrix default. Forcing one would either pre-commit factrix to a single moment system or scaffold a configurable horizon grid that bloats the dispatch path well past 184's simplicity. So 191 ships the primitive + dispatch infrastructure; the integrated multi-horizon panel cell (auto-construction of `(T, K)` IC matrix from a raw forward-return panel + cell-registration / EMITS_STATS extension / horizon-grid spec) lands as a focused follow-up so its design — interacting with 255 (`list_estimators` cell-filter), 257 (provenance schema), and the moment-system choice — gets a clean review pass. Users who already have a moment system in mind can run J-tests today via the standalone API; users who want the integrated multi-horizon convention wait for the follow-up.
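In the pure over-identification case (`n_params = 0`), no parameters are estimated. The J statistic is then $T\,\bar g' S^{-1} \bar g$, where $\bar g$ is the sample mean of the moments and $S$ is their long-run covariance. It is compared against χ²(K). The sketch below makes three assumptions that factrix may not share: a Bartlett (Newey-West) long-run covariance with `forward_periods - 1` lags, demeaned moments inside $S$, and SciPy's `chi2.sf` for the tail. `j_test_sketch` is a hypothetical name, not `factrix.stats.GMM`.

```python
import numpy as np
from scipy.stats import chi2


def long_run_cov(moments, lags):
    """Bartlett-kernel (Newey-West) long-run covariance of demeaned moments."""
    g = moments - moments.mean(axis=0)
    T = g.shape[0]
    S = g.T @ g / T
    for lag in range(1, lags + 1):
        w = 1.0 - lag / (lags + 1)                 # Bartlett weight
        gamma = g[lag:].T @ g[:-lag] / T           # lag-l autocovariance
        S += w * (gamma + gamma.T)
    return S


def j_test_sketch(moments, *, forward_periods):
    """Hansen J test of E[g_t] = 0 with no free parameters (df = K)."""
    moments = np.asarray(moments, dtype=float)
    T, K = moments.shape
    S = long_run_cov(moments, lags=max(forward_periods - 1, 0))
    if np.linalg.matrix_rank(S) < K:
        # the situation a SINGULAR_WEIGHT_MATRIX warning is meant to flag
        raise np.linalg.LinAlgError("long-run covariance is rank-deficient")
    g_bar = moments.mean(axis=0)
    j_stat = float(T * g_bar @ np.linalg.solve(S, g_bar))
    df = K                                         # K moments - 0 parameters
    return j_stat, df, float(chi2.sf(j_stat, df))


rng = np.random.default_rng(0)
moments = rng.standard_normal((500, 3))           # mean-zero moments: H0 holds
j_stat, df, overid_p = j_test_sketch(moments, forward_periods=5)
```

Shifting one moment's mean away from zero leaves $S$ unchanged here, because $S$ is built from demeaned moments. It does inflate $\bar g$, so the J test rejects.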
- **`HACEstimator(Estimator)` sub-protocol + cell-internal estimator swap** (163). `Estimator` (the 170 selection protocol) gains a runtime-checkable sub-protocol `HACEstimator` adding `min_periods: int` and `compute(series: np.ndarray, *, forward_periods: int) -> InferenceResult`. `NeweyWest` and `HansenHodrick` implement it; the slice-test instances (`WaldNWCluster` / `WaldTwoWayCluster` / `BlockBootstrap`) stay on the selection-only base (their compute paths are multivariate). `AnalysisConfig` gains a fifth field `estimator: HACEstimator = NeweyWest()`, surfaced through the four factory methods as the new `estimator=` kwarg; `__post_init__` validates `estimator.applicable_to(scope, signal)` at construction time so `AnalysisConfig.common_continuous(estimator=HansenHodrick())` raises immediately rather than deep inside `evaluate`. `to_dict` / `from_dict` round-trip the estimator by name string via the new `factrix.stats.get_estimator(name)` registry helper; missing-`estimator` keys in legacy v0.11/v0.12 dicts fall back to `NeweyWest()`. `UnknownEstimatorError(ConfigError, ValueError)` lists every registered estimator on a name miss. IC PANEL / FM PANEL / CAAR PANEL procedures now route through `cfg.estimator.compute(...)` instead of hardcoded `_newey_west_t_test`; `profile.context["estimator"]` records the cfg-driven choice on every procedure for audit-time provenance (HLZ2016 spec-search defence — see "Why" below). Retires the v0.12.0 `ComputableEstimator` placeholder name from the 170 entry below: the design landed as the `HACEstimator` split (selection vs HAC-on-mean), not as a single computable extension; moment-condition estimators (GMM J-test, 191) and slope-axis HAC (TS β / TS Dummy) live on parallel sub-protocols when they land rather than overloading `HACEstimator.compute`.
```python
# default: bit-equal v0.12 NW path
cfg = AnalysisConfig.individual_continuous(metric=Metric.IC)
profile = fx.evaluate(panel, cfg)
profile.primary_stat_name       # StatCode.T_NW
profile.context["estimator"]    # "NeweyWest"

# swap to Hansen-Hodrick at study scope
cfg = AnalysisConfig.individual_continuous(
    metric=Metric.IC, estimator=HansenHodrick(),
)
profile = fx.evaluate(panel, cfg)
profile.primary_stat_name                          # StatCode.T_HH
profile.primary_p == profile.stats[StatCode.P_HH]  # True
profile.context["estimator"]                       # "HansenHodrick"
```
**Why**: HLZ2016's spec-search defence is "don't pick an estimator after seeing results," not "always use a single estimator forever." v0.12 hardcoded NW + auto-side-emitted HH, which let downstream code cherry-pick whichever p was smaller. v0.13's design forces estimator choice to be cfg-scoped (study-level, not per-call) and stamps the choice in `profile.context` so audit-time review can see whether multiple estimators ran on the same study. Per-call `evaluate(panel, cfg, estimator=...)` is deliberately not opened — the cfg object is the spec-search lock. (Related follow-ups: `list_estimators(scope, signal)` cell-filter 255, third-party `register_estimator` 256, provenance asymmetry on slope-axis cells 257.)
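The protocol split and the cfg-scoped lock can be sketched with `typing.Protocol`. Everything below is hypothetical: `HACEstimatorSketch`, `ToyNeweyWest`, `ToyHansenHodrick`, `ConfigSketch`, `ESTIMATORS` and the applicability rule are illustrative stand-ins, not factrix's classes. `compute` returns a bare t-statistic instead of an `InferenceResult`. The sketch shows three pieces: a runtime-checkable sub-protocol, a construction-time gate in `__post_init__`, and a name-keyed registry whose `from_dict` falls back to Newey-West for legacy dicts.

```python
from dataclasses import dataclass, field
from typing import Protocol, runtime_checkable

import numpy as np


@runtime_checkable
class HACEstimatorSketch(Protocol):
    name: str
    min_periods: int

    def applicable_to(self, scope: str, signal: str) -> bool: ...

    def compute(self, series: np.ndarray, *, forward_periods: int) -> float: ...


def _hac_t(series, forward_periods, weight):
    """t-statistic on the mean using a kernel-weighted long-run variance."""
    x = np.asarray(series, dtype=float)
    e, T = x - x.mean(), x.size
    lags = max(forward_periods - 1, 0)
    lrv = e @ e / T
    for lag in range(1, lags + 1):
        lrv += 2 * weight(lag, lags) * (e[lag:] @ e[:-lag]) / T
    if lrv <= 0:  # possible with the untapered rectangular kernel
        raise ValueError("kernel produced a non-positive long-run variance")
    return float(x.mean() / np.sqrt(lrv / T))


class ToyNeweyWest:
    name, min_periods = "NeweyWest", 2             # illustrative values

    def applicable_to(self, scope, signal):
        return True

    def compute(self, series, *, forward_periods):
        return _hac_t(series, forward_periods, lambda l, L: 1 - l / (L + 1))  # Bartlett


class ToyHansenHodrick(ToyNeweyWest):
    name = "HansenHodrick"

    def applicable_to(self, scope, signal):
        return scope == "individual"               # illustrative rule only

    def compute(self, series, *, forward_periods):
        return _hac_t(series, forward_periods, lambda l, L: 1.0)  # rectangular


ESTIMATORS = {cls.name: cls for cls in (ToyNeweyWest, ToyHansenHodrick)}


def get_estimator_sketch(name):
    if name not in ESTIMATORS:
        raise ValueError(f"unknown estimator {name!r}; registered: {sorted(ESTIMATORS)}")
    return ESTIMATORS[name]()


@dataclass(frozen=True)
class ConfigSketch:
    scope: str
    signal: str
    estimator: HACEstimatorSketch = field(default_factory=ToyNeweyWest)

    def __post_init__(self):
        # fail at construction time, not deep inside evaluate()
        if not self.estimator.applicable_to(self.scope, self.signal):
            raise ValueError(f"{self.estimator.name} not applicable to {self.scope}/{self.signal}")

    def to_dict(self):
        return {"scope": self.scope, "signal": self.signal, "estimator": self.estimator.name}

    @classmethod
    def from_dict(cls, d):
        # legacy dicts without an "estimator" key fall back to Newey-West
        return cls(d["scope"], d["signal"], get_estimator_sketch(d.get("estimator", "NeweyWest")))


legacy = ConfigSketch.from_dict({"scope": "individual", "signal": "continuous"})
```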
- **`Examples:` blocks on every public metric callable** (322). All 44 callables exported via `factrix.metrics.__all__` across 19 metric modules (`caar`, `clustering`, `concentration`, `corrado`, `event_horizon`, `event_quality`, `fama_macbeth`, `hit_rate`, `ic`, `mfe_mae`, `monotonicity`, `oos`, `quantile`, `spanning`, `tradability`, `trend`, `ts_asymmetry`, `ts_beta`, `ts_quantile`) carry an `Examples:` block placed last in the docstring per the NumPy trailing-section order sealed by 319. Examples follow the call-shape-over-fragile-output convention from 307 — output lines assert structural facts (column-set superset, `MetricOutput.name`, `t_stats_inference_invalid` flag), no concrete floats / DataFrame reprs. Where a metric consumes another callable's output from the same module (`caar` ← `compute_caar`, `fama_macbeth` / `beta_sign_consistency` ← `compute_fm_betas`, `ts_beta` / `mean_r_squared` / `ts_beta_sign_consistency` ← `compute_ts_betas`, `mfe_mae_summary` ← `compute_mfe_mae`), the Example chains from the upstream output rather than re-running setup. `pytest --doctest-modules factrix/metrics` (314 CI runner) exercises 44 metric-page doctests on every push.
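The call-shape-over-fragile-output convention can be illustrated with a toy function. Its `Examples:` block asserts structural facts (a name and a column superset) instead of printing floats. `toy_metric` and its dict return shape are hypothetical. The examples run under the standard-library `doctest` module, which is the mechanism `pytest --doctest-modules` builds on.

```python
import doctest


def toy_metric(rows):
    """Mean of the ``value`` field, returned with its name and columns.

    Args:
        rows: List of dicts with at least ``date`` and ``value`` keys.

    Returns:
        Dict with ``name``, ``value`` and ``columns`` keys.

    Examples:
        >>> out = toy_metric([{"date": 1, "value": 0.2}, {"date": 2, "value": 0.4}])
        >>> out["name"]
        'toy_metric'
        >>> {"date", "value"} <= set(out["columns"])
        True
    """
    values = [r["value"] for r in rows]
    return {
        "name": "toy_metric",
        "value": sum(values) / len(values),
        "columns": sorted({k for r in rows for k in r}),
    }


# Collect and run only this function's examples with explicit globals.
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(toy_metric, "toy_metric", module=False,
                        globs={"toy_metric": toy_metric}):
    runner.run(test)
results = runner.summarize(verbose=False)   # TestResults(failed, attempted)
```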
- **Per-module `__all__` as SSOT for the metrics API render surface** (322). Every metric module under `factrix.metrics/` now declares an `__all__` in teaching order — the same order users encounter on the rendered API page (`compute_caar` → `caar` → `bmp_test`, `compute_fm_betas` → `fama_macbeth` → `pooled_ols` → `beta_sign_consistency`, etc.). Public return-type dataclasses (`spanning.SpanningResult` / `ForwardSelectionResult`, `oos.SplitDetail`) join the `__all__` of their owning module under the same convention used in `factrix.multi_factor.__all__` for `Survivors`. The `members:` list on each `docs/api/metrics/<mod>.md` mirrors `__all__` exactly (mkdocstrings-python does not auto-follow `__all__` when `members:` is omitted — its default filter is surface-by-prefix); `tests/test_metric_api_members_match_all.py` enforces the invariant `docs members: == module.__all__` across 19 modules so a future contributor cannot drift the two. Two callables that previously rendered on the API pages despite being internal coordination helpers — `hit_rate.per_date_series` (slice-test capability protocol implementation, dispatched via `factrix.metrics._metric_capabilities.resolve_per_date_series`) and `ts_beta.ts_beta_single_asset_fallback` (dispatch-registry N=1 fallback flagged in `_metric_index._STAGE1_HELPERS`) — are now correctly excluded from the rendered surface (kept out of `__all__`, kept out of `members:`).
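The `docs members: == module.__all__` invariant can be sketched as a comparison of two lists. One comes from parsing the `members:` list of a mkdocstrings `:::` block; the other is a module's `__all__`. The page text, the `members_from_page` helper and the in-memory module below are hypothetical. The real `tests/test_metric_api_members_match_all.py` may parse the pages differently.

```python
import re
import types

PAGE = """\
::: factrix.metrics.caar
    options:
      members:
        - compute_caar
        - caar
        - bmp_test
"""


def members_from_page(markdown):
    """Return the ordered `members:` list from a mkdocstrings options block."""
    block = re.search(r"members:\s*\n((?:\s*- \S+\s*\n?)+)", markdown)
    if block is None:
        return []
    return re.findall(r"- (\S+)", block.group(1))


caar = types.ModuleType("caar")
caar.__all__ = ["compute_caar", "caar", "bmp_test"]

# Order matters: the docs page must mirror the teaching order in __all__.
drift = members_from_page(PAGE) != list(caar.__all__)
```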
### Changed
- **Internal `verb=` kwarg sweep on error raise sites** (317). Every source-side `verb="..."` kwarg into `UserInputError` (and the helpers `_resolve_family` / `_expand_over_values` / `_resolve_p_value` / `_build_per_date_panel` / `_resolve_estimator` that thread it through) is renamed to `func_name=`. `UserInputError.__init__` drops the legacy `verb=` kwarg bridge added in 316 — the constructor signature is now `func_name=` only (keyword-only). No user-visible behaviour change: the public attribute is already `e.func_name` (since 316), the rendered error message form is byte-identical, and the rename is purely an internal source-side cleanup so the design-register `verb` token no longer survives at error-raise call sites. The slice-test internal error messages that interpolated the helper's `verb` arg (`f"{verb}: <2 aligned dates ..."`) now interpolate `func_name`; one adjacent prose mention inside a slice-test error string ("this verb currently supports WaldNWCluster ...") is also retargeted to "this function" for register consistency.
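A keyword-only constructor is what makes the `verb=` → `func_name=` cut clean: a leftover `verb=` call site fails with `TypeError` instead of binding silently. `UserInputErrorSketch` below is a hypothetical stand-in, not factrix's class hierarchy. It assumes the message template `<func_name>(): unknown <field>=<value>`.

```python
class UserInputErrorSketch(ValueError):
    """Hypothetical stand-in: the failing function name is keyword-only."""

    def __init__(self, *, func_name: str, field: str, value: object) -> None:
        self.func_name = func_name
        self.field = field
        self.value = value
        super().__init__(f"{func_name}(): unknown {field}={value}")


err = UserInputErrorSketch(func_name="bhy", field="estimator", value="NW")
message = str(err)   # "bhy(): unknown estimator=NW"

try:
    UserInputErrorSketch(verb="bhy", field="estimator", value="NW")  # legacy kwarg
except TypeError:
    legacy_rejected = True
else:
    legacy_rejected = False
```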
- **Contributing guide records the docstring-style boundary** (313). `docs/development/contributing.md` gains two new policy sections under §8. (1) "Docstring style boundary" makes explicit that code formatting, line length and naming follow PEP 8 and the ruff configuration in `pyproject.toml`. Only the docstring section convention (`Args:` / `Returns:` / `Raises:` / `Warns:` / `Notes:` / `Examples:` / `References:`, plural, in that order) is taken from Google; the Google Python Style Guide as a whole (its 80-character limit, single-quote preference, yapf formatter) is **not** adopted. (2) "Markdown code-block intent layers" records that `pycon` blocks under `docs/api/**` are runnable autodoc-injected examples (the copy button strips `>>>`), while hand-authored `python` blocks with unbound names are illustrative schemas. Both layers are intentional, and editors must verify the intent before "fixing" an illustrative block into runnable form. This establishes the contract that the rest of 313 (NumPy-underline → Google sweep, plural unification, ruff `D` enablement) executes against. The pre-existing "Metric docstring style" section is retained as the metric-specific extension on top of this baseline.
- **`Survivors.n_total` / `bhy_adjust(..., n_total=)` / `bhy_adjusted_p(..., n_total=)` → `n_tests`** (breaking, 264). The BHY denominator field / kwarg sat alongside `FactorProfile.n_obs` / `n_periods` / `n_pairs` / `n_assets` under the same `n_*` prefix but answered a structurally different question — those four are sample-size axes (observation counts inside one cell), while the BHY denominator is the multiple-testing family size (count of hypotheses in the step-up). A reader scanning `survivors.n_total` would first read "total observations" before noticing the field is a `Mapping[bucket_key, int]`; the name actively misled. `n_tests` names the domain directly. Migration is a single-token find-and-replace at every reading site:
```python
# before
fx.multi_factor.bhy(profiles, q=0.05).n_total
fx.stats.bhy_adjusted_p(p, n_total=1000)

# after
fx.multi_factor.bhy(profiles, q=0.05).n_tests
fx.stats.bhy_adjusted_p(p, n_tests=1000)
```
Unrelated `n_total` occurrences in event-study / hit-rate / Corrado / orthogonalize modules are genuine sample counts (events processed, rows retained) and stay untouched.
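Assume `bhy_adjusted_p` follows the standard Benjamini-Yekutieli step-up. Then `n_tests` enters as the family size $m$ in both the harmonic correction and the rank scaling, which is why it counts hypotheses rather than observations. For $k$ supplied p-values sorted as $p_{(1)} \le \dots \le p_{(k)}$, with $k \le m$:

$$
c(m) = \sum_{j=1}^{m} \frac{1}{j},
\qquad
\tilde{p}_{(i)} = \min_{i \le l \le k} \min\left(1,\ \frac{m\, c(m)\, p_{(l)}}{l}\right),
\qquad
m = \texttt{n\_tests}.
$$

Hypothesis $(i)$ survives when $\tilde{p}_{(i)} \le q$. Every adjusted value grows with $m$, so passing a sample count such as `n_obs` here would silently change the correction.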
- **`primary_p` / `primary_stat_name` semantic — cfg-driven, not hardcoded NW** (breaking, 163). On IC PANEL / FM PANEL / CAAR PANEL the canonical pair now reflects whichever `HACEstimator` was wired on `cfg.estimator`, not the hardcoded `(T_NW, P_NW)` v0.12 wrote unconditionally. With default `NeweyWest()` cfg the values are bit-equal to v0.12; with `HansenHodrick()` they shift to `(T_HH, P_HH)`. Downstream callers reading `profile.stats[StatCode.T_NW]` or `profile.stats[StatCode.P_NW]` directly will `KeyError` when a non-NW estimator was used — read `profile.primary_stat` / `profile.primary_stat_name` / `profile.primary_p` instead (these stay populated regardless of estimator) and consult `profile.context["estimator"]` for provenance. TS β / TS Dummy / common-panel procedures are unchanged because they run NW HAC on an OLS slope or an iid cross-asset t (neither fits the HAC-on-mean `compute(series, *, forward_periods)` contract); a slope-axis sub-protocol is tracked separately. `UNRELIABLE_SE_SHORT_PERIODS` is now emitted uniformly on `0 < n_periods < 30` across IC / FM / CAAR — previously only the FM procedure's procedure-side guard fired (4 ≤ n < 30), so a 25-period IC analysis silently lacked the short-sample tag; the estimator-side emission consolidates this and the FM-side guard is removed as redundant.
```python
# before (v0.12.0): stats[T_NW] always populated
profile = fx.evaluate(panel, cfg)
t = profile.stats[StatCode.T_NW]      # safe under any cfg

# after (v0.13.0): read primary_stat to stay estimator-agnostic
profile = fx.evaluate(panel, cfg_with_hh)
t = profile.primary_stat              # ← canonical pair, regardless of estimator
code = profile.primary_stat_name      # StatCode.T_HH under cfg_with_hh
who = profile.context["estimator"]    # "HansenHodrick" (provenance)
```
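The reading pattern and the uniform short-sample rule come down to two small checks. The dict layout below is a hypothetical stand-in for `FactorProfile`: string keys such as `"t_hh"` replace `StatCode` members, and the numbers are placeholders. Only two facts come from this changelog: the `0 < n_periods < 30` threshold and the `stats[primary_stat_name] == primary_stat` invariant.

```python
SHORT_SAMPLE_MAX = 30  # the tag fires on 0 < n_periods < 30


def short_sample_warnings(n_periods):
    """Uniform short-sample tag, independent of metric (IC / FM / CAAR) or estimator."""
    return ["UNRELIABLE_SE_SHORT_PERIODS"] if 0 < n_periods < SHORT_SAMPLE_MAX else []


def read_primary(profile):
    """Estimator-agnostic read: look the key up instead of hardcoding t_nw."""
    name = profile["primary_stat_name"]
    stat = profile["stats"][name]
    if stat != profile["primary_stat"]:          # pinned invariant
        raise ValueError("stats[primary_stat_name] != primary_stat")
    return name, stat, profile["context"]["estimator"]


# Hypothetical profile from a Hansen-Hodrick cfg (placeholder numbers):
# only the HH pair exists, so stats["t_nw"] would raise KeyError.
hh_profile = {
    "primary_stat_name": "t_hh",
    "primary_stat": 2.1,
    "stats": {"t_hh": 2.1, "p_hh": 0.04},
    "context": {"estimator": "HansenHodrick"},
}
name, stat, who = read_primary(hh_profile)
```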
- **`MetricOutput.n_obs` first-class field** (breaking, 248). Promoted from a metadata dict key to a first-class dataclass field — `n_obs: int | None = None` — so user / AI-agent consumers reach the metric primitive's sample size with `output.n_obs` instead of `output.metadata.get("n_obs")`. Builds on the `n_observed → n_obs` metadata rename from 246. `_short_circuit_output(name, reason, n_obs=n, ...)` callers (~38 sites) auto-route via kwarg-only signature change; direct `MetricOutput(...)` constructions in `factrix/metrics/spanning.py` and `factrix/metrics/fama_macbeth.py` are updated to pass `n_obs=` at the top level instead of nesting under `metadata={"n_obs": ...}`. `__repr__` now surfaces `n_obs=` between `value=` and `stat=` when populated. Same family name as `FactorProfile.n_obs` but scoped per metric primitive (single-stage estimator count) rather than per dispatched cell (final-stage test denominator).
```python
# before (v0.12.0, pre-rename)
n = out.metadata["n_observed"]

# intermediate (post-246 metadata key rename, same v0.13.0 release)
n = out.metadata["n_obs"]

# after (248, v0.13.0 final shape)
n = out.n_obs
```
- **`FactorProfile.diagnose()` schema overhaul** (breaking, 246). Six UX gaps in the structured triage interface land in one schema change. (1) **Four sample axes** replace the previous polymorphic `n_obs` + `n_assets` pair: `n_obs` (cell-canonical final-stage test denominator — semantics unchanged), `n_pairs` (non-null `(period, asset)` pair count, first-stage), `n_periods` (unique periods in raw panel), `n_assets` (unique assets, semantics unchanged). Each axis answers one question and never overlaps with another; a small `n_obs` is now disambiguated by the three companion axes without reverse-engineering the cell. (2) **`primary_stat` / `primary_stat_name` family** added top-level — `primary_stat: float | None` carries the test statistic value paired with `primary_p` (e.g. `t_nw` value for an NW HAC t-test), `primary_stat_name: str` slugs the `stats` key (e.g. `"t_nw"`) so the user can connect `primary_p` to its `stats` entry without consulting the procedure registry. Generic across statistic families: the `None` arm handles future empirical-p primaries (block bootstrap) where there is no test stat. Invariant `stats[primary_stat_name] == primary_stat` (when not `None`) is pinned in docstring. (3) **`cell` group** wraps the dispatch coordinate — `{"scope", "signal", "metric", "mode"}` — so consumers identify which procedure ran without grepping `config`. (4) **Reader-flow key order**: `identity` → `context` → `cell` → sample axes → `primary_p` / `primary_stat` / `primary_stat_name` → `warnings` / `info_notes` → `stats` / `metadata`. (5) **`n_observed` → `n_obs` metadata key rename** across ~38 `factrix/metrics/*.py` short-circuit sites — the two first-class surfaces (`FactorProfile` and `MetricOutput`-metadata) now share the same family name; the `MetricOutput.n_obs` first-class field promotion is tracked separately at 248. 
(6) **`SuggestConfigResult.diagnose()` "symmetric" docstring** is downgraded to scope-limited — the two surfaces share `warnings` serialisation but answer structurally different questions; no schema change there. Docs (`api/factor-profile.md`, `development/architecture.md`, `llms-full.txt`) are rewritten in lockstep; bare `N` / `T` letter codes in `factor-profile.md` are replaced with full axis names to reduce reader cognitive load.
```python
# before (v0.12.0)
d = profile.diagnose()
# → {"identity": {...}, "context": {...},
#    "mode": "panel",
#    "n_obs": 30, "n_assets": 500,
#    "primary_p": 0.0046,
#    "warnings": [...], "info_notes": [...],
#    "stats": {...}, "metadata": {...}}

# after (v0.13.0)
d = profile.diagnose()
# → {"identity": {...}, "context": {...},
#    "cell": {"scope": "individual", "signal": "continuous",
#             "metric": "ic", "mode": "panel"},
#    "n_obs": 30, "n_pairs": 12450, "n_periods": 30, "n_assets": 500,
#    "primary_p": 0.0046, "primary_stat": 2.84,
#    "primary_stat_name": "t_nw",
#    "warnings": [...], "info_notes": [...],
#    "stats": {...}, "metadata": {...}}

# Migration:
#   - d["mode"] → d["cell"]["mode"]
#   - new keys n_pairs / n_periods / primary_stat / primary_stat_name carry
#     additional info; old keys (n_obs / n_assets / primary_p) keep semantics.
#   - MetricOutput.metadata key "n_observed" → "n_obs" across metric primitives.
```
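Code that must read `diagnose()` output from both v0.12 and v0.13 can branch on whether the `cell` group is present. `read_mode` and `sample_axes` are hypothetical helpers. The key names and sample values are copied from the two schema examples in this entry.

```python
def read_mode(d):
    """Return the dispatch mode from either diagnose() schema."""
    if "cell" in d:                      # v0.13.0: nested under "cell"
        return d["cell"]["mode"]
    return d["mode"]                     # v0.12.0: top-level key


def sample_axes(d):
    """Collect the four sample axes; keys absent in v0.12 come back as None."""
    return {k: d.get(k) for k in ("n_obs", "n_pairs", "n_periods", "n_assets")}


v012 = {"mode": "panel", "n_obs": 30, "n_assets": 500}
v013 = {
    "cell": {"scope": "individual", "signal": "continuous", "metric": "ic", "mode": "panel"},
    "n_obs": 30, "n_pairs": 12450, "n_periods": 30, "n_assets": 500,
}
```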
- **`Survivors.adj_q` → `Survivors.adj_p`** (breaking, 245). Aligned the adjusted-p-value column name with statistical-software conventions (R `p.adjust`, statsmodels `multipletests`) where adjusted p-values are uniformly named `adj_p` / `p_adj` regardless of whether the underlying procedure controls FWER (Bonferroni / Holm) or FDR (BH / BHY). The previous `adj_q` reflected an internal-consistency goal with the `bhy(q=0.05)` kwarg, but read awkwardly in FWER contexts (where the threshold is α, not q) and required first-time users to ask "what is `adj_q`?". The `q=` kwarg name is **kept** (it remains the API-uniform threshold name across procedure families); only the output column renames. `bhy_adjusted_p()` function name was already `_p` — this change extends the same convention to the survivor container field.
```python
# before (v0.12.0)
survivors = fx.multi_factor.bhy(profiles, q=0.05)
for prof, adj in zip(survivors.profiles, survivors.adj_q, strict=True):
    ...

# after (v0.13.0)
survivors = fx.multi_factor.bhy(profiles, q=0.05)
for prof, adj in zip(survivors.profiles, survivors.adj_p, strict=True):
    ...
```
### Removed
- **`UserInputError.verb` attribute** (breaking, 316). The user-facing attribute carrying the failing function name is renamed from `.verb` to `.func_name`, aligning the error contract with the published user-facing register (see `contributing.md` §Two-register convention). The rendered error message form is byte-identical (`<func_name>(): unknown <field>=<value>`); only the source-side attribute name changes. User code catching `UserInputError` and reading `e.verb` will now hit `AttributeError`; replace it with `e.func_name`. No deprecation cycle: factrix is pre-1.0 with no published users, and SemVer allows v0.x minor bumps to break compatibility. Until 317 sweeps the 59 source-side raise sites, the constructor accepts the legacy `verb=` kwarg without warning as an internal bridge; this is *not* a user-facing back-compat alias.
- **Auto side-emission of `(T_HH, P_HH)` on default cfg** (breaking, 163). v0.12 IC PANEL / FM PANEL procedures populated `StatCode.P_HH` / `StatCode.T_HH` alongside `P_NW` / `T_NW` whenever `forward_periods > 1`, on every cfg, so callers could read either pair without re-running. Under the new dispatch (163 Added entry above) only the `HACEstimator` wired on `cfg.estimator` populates its `(stat_name, p_name)` pair — default `NeweyWest()` cfg writes `(T_NW, P_NW)` only; `HansenHodrick()` cfg writes `(T_HH, P_HH)` only. Switch downstream code that hardcoded `profile.stats[StatCode.P_HH]` lookups to either (a) construct a separate `cfg_hh = AnalysisConfig.individual_continuous(metric=..., estimator=HansenHodrick())` and call `evaluate(panel, cfg_hh)` explicitly, or (b) read `profile.primary_p` + `profile.context["estimator"]` for the cfg-driven canonical pair. Family-verb selection — `bhy(profiles, estimator=HansenHodrick())` reading an already-populated `StatCode.P_HH` — still works against profiles produced under an HH cfg.
- **`FactorProfile.verdict()` and `Verdict` enum** (breaking, 243). `verdict(*, threshold=0.05, gate=None)` was a `primary_p < threshold` wrapper. Removed because (a) for N candidate factors, iterating `profile.verdict()` and counting passes is the spec-search anti-pattern factrix explicitly avoids — multi-factor decisions belong to `multi_factor.bhy` survivors, not per-factor threshold gates; (b) the `Verdict` `PASS / FAIL` outcome ignored emitted `WarningCode` (e.g. `UNRELIABLE_SE_SHORT_PERIODS`), letting unreliable inference report `PASS`. The `Verdict` enum is removed alongside the method.
```python
# before (v0.12.0)
if profile.verdict() is fx.Verdict.PASS:
    ...

# after: single-factor pre-registered analysis only
if profile.primary_p < 0.05:
    ...

# after: N candidate factors, route through BHY and read survivors
survivors = fx.multi_factor.bhy(profiles, q=0.05)
for prof, adj_p in zip(survivors.profiles, survivors.adj_p, strict=True):
    ...
```
The `gate=StatCode.X` kwarg (read alternative stat instead of `primary_p`) has no direct replacement; reach `profile.stats[StatCode.X]` and compare to your chosen threshold inline. The `examples/multi_factor_screening.ipynb` per-factor verdict loop is rewritten to demonstrate the BHY path instead.
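The spec-search argument behind removing `verdict()` can be checked numerically. Under the global null, a per-factor `p < 0.05` loop passes about 5% of candidates by construction. A step-up procedure over the same family is never more permissive. The sketch draws simulated uniform p-values and applies a standard Benjamini-Yekutieli adjustment written inline. It does not call factrix.

```python
import numpy as np

rng = np.random.default_rng(42)
n_factors = 200
p = rng.uniform(size=n_factors)          # every factor is null: p ~ U(0, 1)

# anti-pattern: per-factor threshold gate, one "PASS" per p < 0.05
naive_passes = int(np.sum(p < 0.05))

# family-level alternative: Benjamini-Yekutieli step-up over all n_factors
m = p.size
c_m = np.sum(1.0 / np.arange(1, m + 1))
order = np.argsort(p)
scaled = m * c_m * p[order] / np.arange(1, m + 1)
adj = np.empty(m)
adj[order] = np.minimum(1.0, np.minimum.accumulate(scaled[::-1])[::-1])
by_passes = int(np.sum(adj <= 0.05))
```

Each adjusted value is at least its raw p-value, so the BY survivors are always a subset of the naive passes. With 200 null factors the naive loop is expected to report about 10 false "PASS" results.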
### Docs
- **Citation accuracy + bibliography role-note layering** (321, 330, 359). Project-wide sweep across docstrings and `docs/reference/bibliography.md`: every inline paper reference unified to autorefs-linked form (`[Newey-West 1987][newey-west-1987]`), bibliography organised by methodological role rather than chronology, and role-notes audited for attribution accuracy (Andrews vs HH `T^(1/3)`, Shanken EIV framing, factor-zoo / factor-spanning / unit-root / robust-stats / multiple-testing / event-study / cross-section pricing chains). Horizon-shopping multiplicity and per-period forward-return normalisation references added.
- **Per-metric `Examples:` blocks across the public API** (312, 322). All 44 metric callables and the remaining public-API surface ship runnable `Examples:` blocks under the call-shape-over-fragile-output convention, exercised by `pytest --doctest-modules` in CI (314).
- **MkDocs navigation information-architecture (IA) cleanup** (329, 336 series, 341, 343, 347, 350, 351). Top-level nav restructured around reading flows (User guide / Concepts / Reference / API / Development); new `Reading results` and `Preparing data` pages; `where-factrix-fits` exit pointer + `common_sparse` dispatch arm; glossary + axis-table dedup; routing surfaces demoted to the API landing page.
- **Docstring style + register conventions** (313, 331, 332, 357). Google-style section sweep (`Args:` / `Returns:` / `Raises:` / `Warns:` / `Notes:` / `Examples:` / `References:` plural, fixed order) + ruff `D` rule enablement; design-register "verb" prose dropped from user-facing docs (kept as RFC vocabulary); abbreviations expanded on first use per page.
- **MkDocs UX polish** (328, 345, 360, 363, 376). mkdocstrings inventory cross-references wired for third-party data types; sidebar active-item contrast fix scoped to the primary nav so the right-side TOC active heading remains legible; internal dense-reindex / event-HHI bin-grid descriptions rephrased from "calendar" to period / time-axis vocabulary (academic methodology names preserved).
---