Research Papers | Bulkhead τ

43 Primary Papers · 12 Live Sites · 12 Research Clusters

2026 Active

Operating front: these papers are the evidence base behind Bulkhead τ — the public release line for a deterministic governance substrate. Bulkhead Tau is the engineering core under that name; both surfaces will run in parallel for now, with /bulkhead-tau/ as the external front.

The finding: For grounded domain tasks (well-defined task classes with deterministic substrates), harness configuration is the binding constraint; model identity is not. These papers support that claim under stress-test conditions: local models, which cannot compensate for a weak harness, converge with frontier models at the semantic-usefulness level when the harness is sufficient. This scope is deliberate. Outside it, model capability matters in ways the framework does not cover.

Local Model Addendum: the Bulkhead Tau Local Model Details page covers the three supporting papers that feed the orchestration synthesis: TourAgent (1.13), ShowcaseAgent (1.12), and Local Model Role Suitability (1.11).

Boundary Results: the Bulkhead Tau Boundary Results page covers three papers that map where the organized stack hits its limits: Grounded Agent Failure Is Structurally Determined (1.10), True Ski Chalet Boundary Result (1.14), and When The Organized Stack Loses (1.15).

RVH / ML Evaluation: Rough Volatility as ML Benchmark covers Papers 1.8 and 1.9 — why domain expertise, not ML capability, is the binding constraint in rough volatility forecasting and the cross-domain benchmark principle it reveals.

Measurement Integrity, Operator Layer & Applied Evidence: Papers 1.16–1.19 extend the framework outward. Paper 1.16 shows that evaluation infrastructure can fail at the capture boundary: a VT100 terminal artifact was corrupting protocol scores for thinking-mode models. Paper 1.17 documents the operator shell pattern: how OpenClaw wraps Bulkhead Tau as an access layer without becoming the authority. Paper 1.18 is the framework's first numbered production case: PPR Agent, 92M regulated cardiac device implants across 18 years, behind a deterministic SQLite substrate. Paper 1.19 is a short companion to 1.16 on the other side of the apparatus: when stronger models override literal substrate inspection, capability itself becomes a source of non-neutrality.

Papers 1.20–1.26, 1.30, 1.31, and 1.34–1.37 add the Local LLM Operator Judgment cluster: strict handoff discipline, privacy cost accounting, local model sizing, token-cost attention, substitution discipline, multi-agent HIL workflow boundaries, cross-audit failure cataloging, narration-surface failure analysis, deterministic validator discipline, and image-generation prompt discipline.

Papers 1.39–1.43 extend the cluster: a negative result on heterogeneous local clustering, repository-fact encoding that converts cold-start architecture recovery into lookup, a governance case study admitting a new frontier harness as a peer through evidence rather than brand rank, a residency-verified hardware measurement showing dedicated VRAM beats unified memory for local inference, and a code-graph tool's reduction ratio traced to graph density rather than code quality after two prior hypotheses were each falsified by a real contrasting corpus.

Sensor-to-Simulation Engineering: Paper 1.27 establishes the data landscape and fidelity boundaries for wearable sport sensors, showing why sensor-driven HIL is necessarily event-driven rather than waveform-driven. Paper 1.28 publishes the LabWired platform boundary for register-level firmware simulation. Paper 1.29 closes the sensor-driven HIL loop with a documented physical proximity replay. Paper 1.32 applies the sensor-corpus framing to a single dual-sensor-instrumented match at depth, and Paper 1.33 defines the proof boundary separating harness testing from component verification.

Where to Start

Each paper stands alone. Use the cluster that matches your interest:

New to Bulkhead Tau

Start with Paper 1.1 for the framework framing, then try the TourAgent live demo — ten tennis questions with repeatable answers — to see the deterministic approach in action.

Local Inference & Offline Systems

Papers 1.2, 1.3, and 1.5 form a cluster: offline grounded agent → ski chalet hardware boundary → TSP solver-backed orchestration. The common argument: harness level, not model size, drives usefulness.

Orchestration & Role Assignment

Papers 1.5, 1.6, 1.11, 1.12, and 1.13 address where correctness should live and how grounding, routing, and repair beat raw power in identifiable regimes.

Failure Modes & Boundary Conditions

Papers 1.7, 1.10, 1.14, and 1.15 cover the failure taxonomy, empirical failure prediction, the true local ceiling, and the five conditions under which the organized stack's advantage collapses.

ML Evaluation & Cross-Domain Benchmarks

Papers 1.8 and 1.9 establish why realized volatility forecasting is high-signal benchmark territory — and what the same structural argument implies across semiconductor defectivity and other rough-process domains.

Measurement Integrity & Operator Layer

Papers 1.16, 1.17, and 1.19 address the infrastructure surrounding the Bulkhead Tau system. 1.16: capture pipeline failures produce false evaluation verdicts. 1.17: an operator shell can expose the deterministic stack without replacing it as the authority. 1.19: when stronger models override literal substrate inspection, the model itself becomes part of the non-neutrality.

Applied / Production Evidence

Paper 1.18 is the first numbered production case: PPR Agent running against 18 years of government-mandated cardiac device data. This is field validation, not lane validation: the framework operating against regulated disclosures from three manufacturers.

Local LLM Operator Judgment

Papers 1.20–1.26, 1.30, 1.31, and 1.34–1.37 turn the Bulkhead Tau evidence base into operating guidance for local LLM decisions: handoff discipline, privacy tradeoffs, model sizing, token-cost attention, when a model call should not be a model call, multi-agent HIL workflows, cross-audit cataloging, narration-surface failure analysis, validator discipline, and image-generation prompt discipline. Papers 1.39–1.43 add: a local-cluster negative result, encoding-as-lookup for cold-start architecture recovery, evidence-based harness governance, dedicated-vs-unified hardware measurement, and a graph-density explanation for a code-compression tool's reduction ratio.

Agent Path Evaluation

Paper 1.38 compares implementation paths rather than model identities. In the Amkor/xAmkor case study, the useful result is compositional: xAmkor owns the verifier surface, while Amkor owns the broader cockpit/application surface.

Sensor-to-Simulation Engineering

Papers 1.27, 1.28, 1.29, 1.32, and 1.33 characterize wearable sport sensors through a fidelity-boundary lens (1.27), define the LabWired simulation platform boundary (1.28), close the physical-replay HIL loop (1.29), apply the corpus framing to a single instrumented match at depth (1.32), and define the proof boundary separating harness testing from component verification (1.33).

I · Grounding, Local Systems & Hardware

What makes a local or offline system actually useful — and what the evidence honestly supports.

Paper 1.2

Offline Grounded Domain Agent

The real unit of local usefulness is the harnessed domain system, not the raw model. A local model becomes operationally useful when paired with a deterministic substrate, grounding layer, explicit provenance, and a controlled escalation path. Raw local model, grounded local harness, and full local implementation-agent are three distinct things — not interchangeable.

Open site →

Paper 1.3

Ski Chalet Harness Boundary

A prepared local 3090 system — Ollama, portable domain harness, and data bundle — can support grounded offline domain answering. The claim is narrow and honest: it is the harness that enables usefulness, not the raw model alone. The variable that matters most is harness level, not model size.

Open site →

Paper 1.4

Fab Simulation & RVH

Semiconductor fab defectivity should be modeled as a dynamic rough process (RVH — Rough Volatility Hypothesis), not a static mean. Moving from a stable to an unstable fab produces a 7.1% loss in shippable output — a result that emerges from the path, not the average. Product complexity and process instability are separable causes of yield loss.

Open site →

II · Orchestration & Role Assignment

Where correctness should live in an AI system — and what happens when it lives in the wrong place.

Paper 1.5

LocalLLMTSP — Solver-Backed Orchestration

In a route-optimization workflow, correctness should live in the solver, not the model. Stronger models delay failure but do not eliminate the need for solver-backed architecture. Local models range from exact to structurally invalid at small scales and collapse at the world rung; the orchestrated path remains stable across the full ladder.
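
The division of labour described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the verdict labels, and the brute-force solver are all assumptions chosen for a toy instance. The point it shows is that the returned tour always comes from the solver, and the model's proposal is only graded against it.

```python
from itertools import permutations
from math import dist, isclose


def tour_length(points, order):
    """Length of the closed tour that visits points in the given order."""
    n = len(order)
    return sum(dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n))


def is_valid_tour(order, n):
    """A tour is structurally valid only if it visits each city exactly once."""
    return sorted(order) == list(range(n))


def solve_exact(points):
    """Brute-force exact solver (hypothetical stand-in; only for tiny inputs)."""
    n = len(points)
    candidates = ((0,) + rest for rest in permutations(range(1, n)))
    return list(min(candidates, key=lambda order: tour_length(points, order)))


def orchestrate(points, proposed):
    """Return (tour, verdict). The answer is always the solver's tour;
    the model's proposal is graded, never trusted."""
    exact = solve_exact(points)
    if not is_valid_tour(proposed, len(points)):
        return exact, "structurally_invalid"
    if isclose(tour_length(points, proposed), tour_length(points, exact)):
        return exact, "exact"
    return exact, "suboptimal"


# Toy instance: the four corners of a unit square.
square = [(0, 0), (0, 1), (1, 1), (1, 0)]
for proposal in ([0, 1, 2, 3], [0, 2, 1, 3], [0, 1, 1, 3]):
    tour, verdict = orchestrate(square, proposal)
    print(proposal, "->", verdict, tour)
```

Because the solver owns the answer, a wrong or malformed proposal changes only the verdict, never the route that is returned.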

Open site →

Paper 1.6

Where Orchestration Beats Raw Model Power

Once hardware is good enough, the organized operating stack settles the outcome before raw model size alone does. TourAgent, ShowcaseAgent, and Local Model Role Suitability together support a boundary claim: grounding, routing, and repair beat raw power in identifiable regimes.

Open site →

III · Framework & Operating Discipline

The standards, supervision structures, and failure taxonomy that make agentic work trustworthy.

Paper 1.1

Bulkhead Tau — Open-Core Standards

Bulkhead Tau is best understood as an open-core framework for grounded domain systems — not a single agent or benchmark story. Useful agentic systems require domain grounding, explicit validation, clear trust boundaries, and operating discipline. Standards, not prompt optimism.

Open site →

Paper 1.7

Agentic Coding Failure Patterns

Agentic coding successes vary widely; failures recur in recognizable families. Drift, summit fever, bad context selection, false success, doom loops, and premature closure are documented across Bulkhead Tau operations. The practical response is standards, supervision, and lessons learned — not blind faith in scaling alone.

Open site →

IV · Local Model Addendum

Three empirical papers feeding the orchestration synthesis — grounded reliability, routing, and role suitability at portfolio scale.

Paper 1.13

TourAgent Local Model Screen

Grounding removes wrong-or-missing answers before it creates artifact-level precision. The local model screen result holds across model families once a deterministic substrate is in the path.

Open site →

Paper 1.12

ShowcaseAgent Routing And Compression

Routing and compression are the first reliable local-LLM win at portfolio scale. Miss families are design signals, not capability failures — they identify where the harness, not the model, needs attention.

Open site →

Paper 1.11

Local Model Role Suitability

Grounded response quality is largely model-family-independent once a deterministic substrate is in the path. The binding variable is harness configuration, not model identity.

Open site →

V · Boundary Conditions & Failure Prediction

Where the organized stack's advantage collapses — and why failure family is predictable from configuration, not query content.

Paper 1.10

Grounded Agent Failure Is Structurally Determined

Failure family is predictable from harness configuration features — not query content — confirming that domain expertise is the binding constraint. Empirically confirmed on 780 labeled rows from two Bulkhead Tau domains.

Open site →

Paper 1.14

True Ski Chalet Boundary Result

Capability is not the local-only ceiling; operational speed on derived queries is. The true boundary separates what the harness can answer from what it cannot — not strong model from weak model.

Open site →

Paper 1.15

When The Organized Stack Loses

Maps the five failure modes under which the organized stack's advantage collapses or inverts: latency ceiling (coordination overhead consumes the time budget), coverage gap (harness design failures invisible to stronger models), optimization maturity gap (PyTorch beats fused Numba CUDA 5.5×), runtime mismatch (ROCm wheel lacks gfx1151 target), and policy/role mismatch (larger model loses to better-fit smaller model in the specific regime).

Open site →

VI · RVH / ML Evaluation

Realized volatility forecasting as high-signal ML benchmark territory — and the cross-domain principle it reveals.

Paper 1.8

Rough Volatility — Cross-Domain Benchmark Principle

Both financial volatility and semiconductor defectivity satisfy the same four conditions for high-signal ML benchmark territory. The cross-domain parallel is structural, not analogical — the same rough-path argument applies to both.

Open site →

Paper 1.9

Rough Volatility — ML Evaluation Domain

Realized volatility forecasting is a high-signal benchmark because naive pipeline failures are structural, not tunable. Empirically confirmed: a standard LSTM fails on realized volatility in a way that reveals domain ignorance, not hyperparameter sensitivity.

Open site →

VII · Measurement Integrity

When the evaluation infrastructure itself fails — or when the model's own disposition toward the substrate becomes part of the apparatus.

Paper 1.16

The Model Did Not Fail the Protocol. The Terminal Did.

Subprocess capture of ollama run output includes VT100 cursor-rewrite sequences that corrupt multi-line JSON for thinking-mode models, producing systematic false negatives. Under clean REST API capture, gemma4:31b passes all six protocol probes — the strongest result on this lane. The selective recovery pattern (only thinking-mode models affected) proves the failure was at the capture boundary, not the model boundary.
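
The failure mechanism is easy to reproduce in miniature. The sketch below is illustrative only: the captured string is a hypothetical example, not output recorded from ollama, and the regular expression covers the general CSI escape-sequence shape rather than any specific sequence named in the paper. It shows that a JSON payload which is otherwise valid stops parsing once terminal control sequences are interleaved with it.

```python
import json
import re

# General CSI shape: ESC [ parameter bytes, intermediate bytes, final byte.
CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_csi(text):
    """Remove CSI escape sequences from captured terminal output."""
    return CSI_RE.sub("", text)


def try_parse(text):
    """Return the parsed JSON value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


# Hypothetical capture: valid JSON with erase-line and cursor-up sequences
# interleaved, as a terminal-oriented writer might emit them.
captured = '{\n  "probe": "p1",\x1b[K\n\x1b[1A\x1b[2K  "status": "pass"\n}'

print(try_parse(captured))             # None: the raw capture reads as a failure
print(try_parse(strip_csi(captured)))  # parses once the sequences are removed
```

Stripping sequences after the fact is shown only to isolate the cause. A terminal that rewrites lines may also duplicate or reorder text, so post-hoc cleanup is not guaranteed to recover the payload; capturing through the REST API, as the paper does, avoids the terminal layer altogether.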

Open site →

Paper 1.19

Literal Substrate Inspection — When Stronger Models Override the Evidence

Stronger models do not remove the need for harnesses; sometimes they increase it. When semantic correction overrides literal substrate inspection, a more capable model can produce a worse answer than a smaller or less opinionated one. A ten-prompt local matrix and a single-prompt strawperry probe show at least three distinct wrong-count mechanisms. The fix is not a smarter model — it is a harness that preserves the exact substrate and routes literal operations to deterministic tools.
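
Routing a literal operation to a deterministic tool can be as small as the sketch below. The function names are hypothetical, and the exact wording of the paper's probes is not reproduced here; the example simply counts characters in the literal string named above, exactly as given, with no spelling correction applied.

```python
def count_char(text, ch):
    """Count occurrences of a single character in the literal text."""
    if len(ch) != 1:
        raise ValueError("ch must be exactly one character")
    return text.count(ch)


def literal_report(text, ch):
    """Package the count with the exact substrate it was computed on."""
    return {"text": text, "char": ch, "count": count_char(text, ch)}


# The substrate is passed through untouched: no normalization, no correction.
for ch in "rpb":
    print(literal_report("strawperry", ch))
```

Returning the inspected text alongside the count keeps the substrate visible, so a reader can see which string the number refers to rather than which string a model assumed.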

Open site →

VIII · Operator Layer

Building an operator-facing outer layer over the deterministic stack — and keeping it outside the authority boundary.

Paper 1.17

The Operator Shell Pattern

An operator shell reduces friction, surfaces information, and enforces discipline without the model touching correctness. OpenClaw wraps Bulkhead Tau as the access layer — five HTTP operator surfaces, a hardening gate, and an incident workflow — while Bulkhead Tau remains the deterministic authority. Seven use cases across measurement, discipline, and decision surfaces confirm the pattern: the shell stays outside, the authority stays inside.

Open site →

IX · Applied / Production Evidence

Field validation, not lane validation — the framework operating against regulated data in a real domain.

Paper 1.18

PPR Agent — A Deterministic Substrate for Auditable Medical-Device Intelligence

Government-mandated Product Performance Reports from Abbott, Boston Scientific, and Medtronic — 3,576 device rows, 92 million US implants, 18 years (2008–2025), complete three-company coverage since 2014 — behind a SQLite substrate and a deterministic tool surface. The model selects a tool and formats the answer; the database supplies the facts. A frozen canonical query suite enforces that unknown values stay unknown, source gaps stay visible, and no answer is invented. Field evidence for the Bulkhead Tau thesis: substrate is authority, model is interface.
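
A minimal sketch of that split, assuming a hypothetical schema and tool name (the real PPR Agent tables and tool surface are not shown on this page). The rows are invented placeholders, not data from the reports. What the example illustrates is the rule that a missing value is reported as unknown and a missing row as a source gap, rather than being filled in.

```python
import sqlite3


def build_demo_db():
    """In-memory database with a hypothetical schema and placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE device_rows ("
        "manufacturer TEXT, model TEXT, report_year INTEGER, us_implants INTEGER)"
    )
    conn.executemany(
        "INSERT INTO device_rows VALUES (?, ?, ?, ?)",
        [
            ("ExampleCo", "X100", 2020, 1200),  # value present
            ("ExampleCo", "X200", 2020, None),  # row present, value not reported
        ],
    )
    return conn


def implants_for(conn, manufacturer, model, year):
    """Deterministic tool: the database supplies the fact or an explicit gap."""
    row = conn.execute(
        "SELECT us_implants FROM device_rows "
        "WHERE manufacturer = ? AND model = ? AND report_year = ?",
        (manufacturer, model, year),
    ).fetchone()
    if row is None:
        return {"status": "no_source_row", "value": None}
    if row[0] is None:
        return {"status": "unknown", "value": None}
    return {"status": "ok", "value": row[0]}


conn = build_demo_db()
print(implants_for(conn, "ExampleCo", "X100", 2020))
print(implants_for(conn, "ExampleCo", "X200", 2020))
print(implants_for(conn, "ExampleCo", "X300", 2020))
```

In this arrangement a model would only choose which tool to call and phrase the returned status; it never produces the number itself.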

Open site →

X · Local LLM Operator Judgment

Operational judgment for local LLM lanes: handoff discipline, privacy cost, sizing discipline, token-cost attention, cross-audit failure cataloging, narration-surface risk, validator discipline, prompt-generation discipline, cluster-vs-solo hardware tradeoffs, encoding-as-lookup, harness governance, and dedicated-vs-unified memory economics.

Paper 1.20

Smarter, Faster, and Bounded by Handoff Discipline

Handoff-discipline doctrine for strict machine-facing local-model lanes.

Open paper →

Paper 1.21

Privacy Is Worth Paying For

Separates the privacy argument for local inference from the false claim that the local lane is free. The operational question is when privacy, control, and auditability justify the real cost.

Open paper →

Paper 1.22

Slow Is Not Smart

Larger, slower local models do not automatically improve validated Bulkhead Tau lanes. Strict-handoff systems are often bounded by prompt, schema, validator, and orchestration design rather than raw model size.

Open paper →

Paper 1.23

Please Is Sand Off A Beach

Argues that courtesy tokens are negligible compared with structural token waste such as giant context dumps, repeated scaffolding, retries, and missing decomposition.

Open paper →

Paper 1.24

The Model Is Not The Function

For bounded predicates with deterministic oracles, an LLM must earn its runtime against a written spec rather than against the visual length of the code it replaces. CAP-001 and LIB-001 evidence packets, with a num_predict verification ruling out the obvious counter-explanation.

Open paper →

Paper 1.25

Orchestration Is Cheaper Than Reasoning

For models with extensive reasoning capacity, the computational cost of finding the answer often exceeds the cost of explaining it. The TSP and scheduling benchmarks reported in the paper show orchestration providing a 2x to 14x speedup over direct reasoning while significantly improving reliability.

Open paper →

Paper 1.26

Multi-Agent AI Workflows in Hardware-in-the-Loop Simulation

Heterogeneous HIL stacks decompose into agent roles along language and permission boundaries. The handoff artifact is the critical interface for multi-agent continuity. Field data from GRAFANA-OBS-001/PROX-HIL-001 on the Z13 laptop: Rust simulation, Python harness, Claude Code orchestration, Grafana Tempo observability.

Open paper →

Paper 1.30

Local Models Cost Frontier Tokens: The Hidden Supervisor-Side Bill in Local-Inference Workflows

Local-inference workflows do not eliminate the frontier-token bill; they shift it from inference billing to the supervising operator's session budget at audit, repair, and convergence time. Retrospective evidence across 10 historical Bulkhead Tau workflows shows the cost concentrates in REPAIRED cases, and while inference-side optimizations narrow the wall-clock penalty, they do not touch audit/repair cost.

Open paper →

Paper 1.31

Models Don't Get Better, Catalogs Do: Cross-Audit Failure Cataloging as Operator-Side Reliability Infrastructure

Reliability gains from waiting for the next model release are slow, diffuse, and outside operator control; gains from operator-side cross-audit failure catalogs are fast, specific, and controllable. The append-only, severity-rated catalog survives agent identity changes across model and CLI vendor releases. The claim is relative and bounded: it binds for supervised multi-agent workflows with patterned recurrence, not single-agent or unsupervised stacks.
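
One way to picture an append-only, severity-rated catalog is the sketch below. Every field name, the three-level severity scale, and the agent labels are assumptions made for illustration; the paper's actual catalog format is not shown on this page. The design point it demonstrates is that the agent is stored as data on each entry rather than as the key, so entries from different agents or vendor releases accumulate under the same failure pattern.

```python
import json
from dataclasses import asdict, dataclass

SEVERITIES = ("low", "medium", "high")  # hypothetical rating scale


@dataclass(frozen=True)
class FailureEntry:
    entry_id: str
    severity: str
    pattern: str  # failure family, e.g. "false_success"
    agent: str    # recorded as data, not used as the lookup key


class FailureCatalog:
    """Append-only store: entries can be added and read, never edited."""

    def __init__(self):
        self._lines = []   # one serialized JSON record per entry
        self._ids = set()

    def append(self, entry):
        if entry.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {entry.severity}")
        if entry.entry_id in self._ids:
            raise ValueError(f"duplicate entry id: {entry.entry_id}")
        self._ids.add(entry.entry_id)
        self._lines.append(json.dumps(asdict(entry), sort_keys=True))

    def by_pattern(self, pattern):
        """All entries for one failure family, regardless of which agent hit it."""
        return [e for e in map(json.loads, self._lines) if e["pattern"] == pattern]


catalog = FailureCatalog()
catalog.append(FailureEntry("F-001", "high", "false_success", "agent-a"))
catalog.append(FailureEntry("F-002", "medium", "false_success", "agent-b"))
print(catalog.by_pattern("false_success"))
```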

Open paper →

Paper 1.34

The Narration Surface — Where Agentic LLM Fabrication Lives

Fabrication clusters in narration surfaces (summaries, framing, citations, and sign-off text) and is rare on clean execution surfaces in this Bulkhead Tau catalog. The robust cross-rater finding is that 17 of 18 formal failure entries are narration-tainted.

Open paper →

Paper 1.35

Trust the Validator, Not the Model

The DBB-002 matrix shows why robust agentic systems must treat the model as an untrusted generation substrate and offload safety, pathing, and semantic enforcement to deterministic validation loops.

Open paper →

Paper 1.36

Design Rule: Image-to-Image Sketch Poisoning

Abstract text labels inside image-to-image sketches act as visual noise. Semantics belong in the prompt; reference sketches should carry geometry, not literal labels.

Open paper →

Paper 1.37

Local LLM Prompt Style Divergence in Text-to-Image Pipelines

Across three image-prompt briefs, gemma4:12b favored conversational prose while gemma4:26b produced denser tag-heavy prompts better suited to automated text-to-image pipelines.

Open paper →

Paper 1.39

When Local LLM Clusters Do Not Help

RPC sharding across a desktop RTX 3090 and a z13 Radeon 8050S worked but underperformed desktop-local execution alone — z13's GPU-only ceiling stopped below the 26B/27B class. The useful result is orchestration boundaries, not distributed inference.

Open paper →

Paper 1.40

Encoding Converts Architecture Recovery into Lookup

Encoding a multi-repo system's architecture as loadable repository facts converts cold-start map recovery from synthesis into lookup — a 31B tool-using agent and a 14B funnel-fed model both recovered the real boundary map, with every failure tracing to a documentation gap, not a model limit.

Open paper →

Paper 1.41

Grok Built Into Operator By Leaving Residue

xAI's Grok CLI was registered as a peer harness in Operator and tested empirically: one session produced two independently verified commits plus a real product finding, alongside a documented miss — evidence over brand rank, scoped honestly to a single session.

Open paper →

Paper 1.42

Dedicated 24 GB Beats Unified 27 GB: The Capacity Trap in Local Inference

A 24 GB RTX 3090 desktop beats a 27 GB Strix Halo unified-memory laptop on local inference: the larger shared-memory label hides a smaller GPU-resident ceiling, and dense-model spill collapses throughput. A 2026-07-20 residency-verified re-measurement (num_ctx 8192) widened the desktop's advantage to 3.7–4.9x.

Open paper →

Paper 1.43

Graphify's Reduction Ratio Tracks Graph Density, Not Code Quality

Two intuitive hypotheses about a code-graph tool's token-reduction ratio — modularity, then raw corpus size — were each falsified by a real contrasting corpus. Graph density (edges/node, inverse) held direction across two advance predictions, one retracted from flagship status after its corpus turned out to be of uncertain, likely eval-origin provenance; the magnitude of the second prediction missed by 5x, left as an open question rather than smoothed over.

Open paper →

XI · Agent Path Evaluation

Comparing implementation paths under evidence discipline rather than treating model identity as a leaderboard.

Paper 1.38

Codex vs Claude Code Is a False Choice

In the Amkor/xAmkor case study, the artifact-backed result is compositional: xAmkor was stronger on the benchmark/verifier surface, while Amkor carried the broader cockpit/application surface.

Open paper →

XII · Sensor-to-Simulation Engineering

Characterizing wearable sport sensors and building cycle-accurate hardware simulation for firmware validation.

Paper 1.27

A Field Guide to Wearable Sport Sensors: Data Landscape, Fidelity Boundaries, and Engineering Constraints

Characterizes the five-sensor corpus deployed in Bulkhead Tau through a fidelity-boundary lens: the point at which each sensor's output stops being measurement and starts being vendor interpretation. Concludes that no sensor in the corpus exposes a sample-accurate waveform, so sensor-driven HIL on this corpus is necessarily event-driven, not waveform-driven.

Open paper →

Paper 1.28

LabWired: Cycle-Accurate Hardware Simulation for Embedded Sensor Systems

Explains the LabWired hardware simulation platform: architecture, expanded component library, Path A declarative register-bank modeling vs Path B behavioral/shared-memory device models, and the corrected STM32F401 fidelity boundary.

Open paper →

Paper 1.29

Closing the Loop: From Real Sensor Data to Cycle-Accurate Firmware Validation

Uses documented physical proximity data to drive the ProximityAgent HIL firmware path through LabWired and the shm_i2c bridge, closing the real-data gate for this bounded physical-replay case.

Open paper →

Paper 1.32

Deep Dive Into a Tennis Match Data Pool: How Much Data One Competitive Bout Yields — A 2023 USTA Round-of-16 Loss Under Dual-Sensor Wearable Instrumentation

A single 2023 USTA Round-of-16 loss instrumented with two wearable sensors simultaneously. Zepp2 captured 352 shots with per-shot impact location, stroke type, ball speed, and spin; Babolat POP captured 284 in the same window — a 19% shot-count disagreement that empirically supports the cross-sensor-divergence claim from Paper 1.27. Single-match data supports pattern description but refuses causal attribution of the loss; the n=1 limitation is explicit.
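
The 19% figure can be checked from the two shot counts. The calculation below assumes the disagreement is expressed relative to the larger count, which is the reading consistent with the numbers above; measured against the smaller count, the same gap is closer to 24%.

```python
zepp2_shots = 352
babolat_pop_shots = 284

gap = zepp2_shots - babolat_pop_shots      # 68 shots seen by one sensor only
rel_to_larger = gap / zepp2_shots          # assumed baseline for the 19% figure
rel_to_smaller = gap / babolat_pop_shots   # alternative baseline, for comparison

print(f"gap: {gap} shots")
print(f"relative to Zepp2 count:       {rel_to_larger:.1%}")
print(f"relative to Babolat POP count: {rel_to_smaller:.1%}")
```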

Open paper →

Paper 1.33

The Proof Boundary: Defining the Edge of Verification in Hardware-in-the-Loop Simulation

Defines the "Proof Boundary" separating testing of a simulation harness from verification of a target component's behavior. Analyzes four boundary-crossing failure modes and four operational tests — Provenance, Path, Triviality, and Output Dependence — to determine boundary status, and examines susceptibility to layered offload in multi-agent supervisor-supervised workflows.

Open paper →

Full Inventory

#	Title	Track	Site
Primary Papers — 1.1 through 1.7
1.1	Bulkhead Tau — Open-Core Standards	Framework	bulkhead-tau/
1.2	Offline Grounded Domain Agent	Grounding	offline-agent/
1.3	Ski Chalet Harness Boundary	Grounding	ski-chalet/
1.4	Fab Simulation & RVH	Grounding	fab-rvh/
1.5	LocalLLMTSP — Solver-Backed Orchestration	Orchestration	local-llm-tsp/
1.6	Where Orchestration Beats Raw Model Power	Orchestration	orchestration/
1.7	Agentic Coding Failure Patterns	Operations	agentic-coding/
RVH — 1.8 and 1.9
1.8	Rough Volatility — Cross-Domain Benchmark Principle	RVH / ML Eval	rough-volatility/
1.9	Rough Volatility — ML Evaluation Domain	RVH / ML Eval	rough-volatility/
Boundary & Details — 1.10 through 1.15
1.10	Grounded Agent Failure Is Structurally Determined	Boundary	failure-details/
1.11	Local Model Role Suitability	Local Model	local-model-role-suitability/
1.12	ShowcaseAgent Routing And Compression	Local Model	details/
1.13	TourAgent Local Model Screen	Local Model	details/
1.14	True Ski Chalet Boundary Result	Boundary	failure-details/
1.15	When The Organized Stack Loses	Boundary	failure-details/
Measurement Integrity — 1.16 and 1.19
1.16	The Model Did Not Fail the Protocol. The Terminal Did.	Measurement	capture-integrity/
1.19	Literal Substrate Inspection — When Stronger Models Override the Evidence	Measurement	#paper-1-19
Operator Layer — 1.17
1.17	The Operator Shell Pattern	Operator Layer	operator-shell/
Applied / Production Evidence — 1.18
1.18	PPR Agent — A Deterministic Substrate for Auditable Medical-Device Intelligence	Applied	ppr-agent/
Local LLM Operator Judgment — 1.20 through 1.26, 1.30, 1.31, 1.34 through 1.37, and 1.39 through 1.43
1.20	Smarter, Faster, and Bounded by Handoff Discipline	Local LLM	gemma-handoff-discipline
1.21	Privacy Is Worth Paying For	Local LLM	privacy-is-worth-paying-for
1.22	Slow Is Not Smart	Local LLM	slow-is-not-smart
1.23	Please Is Sand Off A Beach	Local LLM	please-is-sand-off-a-beach
1.24	The Model Is Not The Function	Local LLM	the-model-is-not-the-function
1.25	Orchestration Is Cheaper Than Reasoning	Local LLM	orchestration-is-cheaper-than-reasoning
1.26	Multi-Agent AI Workflows in Hardware-in-the-Loop Simulation	Local LLM	multi-agent-hil-workflows
1.30	Local Models Cost Frontier Tokens: The Hidden Supervisor-Side Bill in Local-Inference Workflows	Local LLM	local-models-cost-frontier-tokens
1.31	Models Don't Get Better, Catalogs Do: Cross-Audit Failure Cataloging as Operator-Side Reliability Infrastructure	Local LLM	models-dont-get-better-catalogs-do
1.34	The Narration Surface — Where Agentic LLM Fabrication Lives	Local LLM	the-narration-surface
1.35	Trust the Validator, Not the Model: Deterministic Quality Gates in Bounded Domain Building under Bulkhead Tau	Local LLM	trust-the-validator
1.36	Design Rule: Image-to-Image Sketch Poisoning	Local LLM	design-rule-sketch-poisoning
1.37	Local LLM Prompt Style Divergence in Text-to-Image Pipelines	Local LLM	local-llm-image-prompt-style
1.39	When Local LLM Clusters Do Not Help	Local LLM	when-local-llm-clusters-do-not-help
1.40	Encoding Converts Architecture Recovery into Lookup	Local LLM	encoding-converts-synthesis-into-lookup
1.41	Grok Built Into Operator By Leaving Residue	Local LLM	grok-built-into-operator
1.42	Dedicated 24 GB Beats Unified 27 GB: The Capacity Trap in Local Inference	Local LLM	dedicated-vs-unified-memory
1.43	Graphify's Reduction Ratio Tracks Graph Density, Not Code Quality	Local LLM	graphify-reduction-tracks-density
Agent Path Evaluation — 1.38
1.38	Codex vs Claude Code Is a False Choice	Agent Path	codex-vs-claude-false-choice
Sensor-to-Simulation Engineering — 1.27, 1.28, 1.29, 1.32, and 1.33
1.27	A Field Guide to Wearable Sport Sensors: Data Landscape, Fidelity Boundaries, and Engineering Constraints	Simulation	wearable-sensor-corpus
1.28	LabWired: Cycle-Accurate Hardware Simulation for Embedded Sensor Systems	Simulation	labwired-simulation-platform
1.29	Closing the Loop: From Real Sensor Data to Cycle-Accurate Firmware Validation	Simulation	sensor-driven-hil
1.32	Deep Dive Into a Tennis Match Data Pool: How Much Data One Competitive Bout Yields — A 2023 USTA Round-of-16 Loss Under Dual-Sensor Wearable Instrumentation	Simulation	deep-dive-tennis-match-pool
1.33	The Proof Boundary: Defining the Edge of Verification in Hardware-in-the-Loop Simulation	Simulation	the-proof-boundary

All sites live at bulkheadtau.com. Papers 1.8–1.9 share the rough-volatility site; 1.12–1.13 share the details site; 1.10/1.14/1.15 share the failure-details site; 1.16 and 1.19 share the capture-integrity site. Paper 1.17 has a dedicated site at operator-shell/. Paper 1.18 has a dedicated site at ppr-agent/. Papers 1.20–1.28 and 1.30–1.43 live under bulkhead-tau/generated-papers/ and are also exposed through phoenix-groups.html.