Proposal · Updated June 23, 2026

A vision for publishing

Text has no concept of a release: a single boolean puts the tip of master live. Port the audio pipeline's "blessed checkpoint" idea to text with an additive release tag on rendered_content.

public-data-api · #publishing #releases #data-quality #public-data-api

This isn’t really an audio problem or a text problem. The publishing model below is the same for both; in principle, the only thing that differs is how the bytes are stored. So the goal is one friendly button: OPS presses publish, and on IT’s side that is just an interface addition over a single release model, whatever the content happens to be.

The short version: text has no concept of a release. A single boolean puts a repo’s master tip live on BIEL, so a stray commit from an authoring tool can put bad data in front of the public. The duplicate-books problem documented in the data-inconsistencies note is one visible symptom of exactly this looseness; run the live check there to see the current offenders. The fix is not a new app. It is one missing idea: a blessed checkpoint.

The four questions any pipeline has to answer

Every option below is judged against these four, and the split between who decides and who builds is half the point:

  • Storage: where the bytes physically live. (IT)
  • Versioning / publication: how we mark a blessed checkpoint and expose it. (OPS decides whether, IT decides how)
  • Working area: where work-in-progress lives, and what the unit of a “product” is. (IT)
  • Updates: how the world learns something changed. (OPS triggers, IT propagates)

How text works today — and why it bites

This is the real flow as it stands, not a strawman:

  • OPS blesses content by setting show_on_biel and status: Primary. That call is correctly theirs.
  • But the boolean means “master is always latest.” No checkpoint, no release, no schedule. The live site tracks the raw tip of master.
  • A push triggers a webhook into the USFM rendering pipeline, which writes the rendered output to Azure Blob Storage, then dispatches a message onto an Azure Event Bus — “this repo was rendered, here is where.” An API listener consumes that event and inserts the render metadata, producing the rendered_content rows BIEL reads.
  • So a junk commit flows straight through to publication. There is no gate between committed and published.
  • And some products are stitched together from many per-book repos at read time while others are not, an inconsistency the reader app then pays for.
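To make the gap concrete, here is a minimal model of today's read path as described above. The Repo and RenderedContent shapes, the example_repo name and the rendered_at counter are hypothetical simplifications, not the real schema; the only point is that nothing sits between a render of master and what BIEL serves.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical, simplified shapes; the real tables and columns may differ.
@dataclass
class Repo:
    name: str
    show_on_biel: bool


@dataclass
class RenderedContent:
    repo: str
    commit: str
    url: str
    rendered_at: int  # stand-in for the row's timestamp; larger means newer


def served_today(repo: Repo, renders: list[RenderedContent]) -> RenderedContent | None:
    """Today's behaviour: if the flag is on, serve the newest render of master's tip."""
    if not repo.show_on_biel:
        return None
    mine = [r for r in renders if r.repo == repo.name]
    return max(mine, key=lambda r: r.rendered_at, default=None)


repo = Repo("example_repo", show_on_biel=True)
renders = [
    RenderedContent("example_repo", "a1b2c3", "https://example.invalid/a1b2c3", 1),
    # A stray commit from an authoring tool is rendered like any other push...
    RenderedContent("example_repo", "junk99", "https://example.invalid/junk99", 2),
]
# ...so it is what BIEL serves: there is no gate between committed and published.
print(served_today(repo, renders).commit)  # junk99
```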

OPS shouldn’t have to know how releases work under the hood. We just need them to say “yep, this batch is approved” — and the system takes it from there.

The proposed flow, one repo at a time

OPS cuts a release on a repo. That release is a date-based tag, e.g. 2026.06.23. Semantic versions mean nothing here: this is not software, and the date is the only signal that “something we’re happy with changed.” Cutting the release fires the same webhook and USFM pipeline we already run; the only new thing is that the release tag is carried through and written onto the rendered_content row. BIEL then queries only for content that is both show_on_biel and released.
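A minimal sketch of the tag format and the BIEL-side filter, assuming rows carry a release field that is None for unreleased renders. The function names and row shape are hypothetical. One property makes date tags convenient: zero-padded YYYY.MM.DD strings sort in date order, so picking the newest release needs no parsing. The text does not say how two releases cut on the same day are told apart, and this sketch does not try to.

```python
from __future__ import annotations

import datetime as dt
import re

# Zero-padded YYYY.MM.DD, matching the 2026.06.23 example in the text.
TAG_FORMAT = "%Y.%m.%d"
TAG_RE = re.compile(r"[0-9]{4}\.[0-9]{2}\.[0-9]{2}")


def make_release_tag(day: dt.date) -> str:
    return day.strftime(TAG_FORMAT)


def parse_release_tag(tag: str) -> dt.date:
    """Reject anything that is not a real calendar date in tag form."""
    if not TAG_RE.fullmatch(tag):
        raise ValueError(f"not a date tag: {tag!r}")
    return dt.datetime.strptime(tag, TAG_FORMAT).date()  # also rejects e.g. 2026.02.30


def biel_pick(rows: list[dict], show_on_biel: bool) -> dict | None:
    """Serve the newest *released* render; rows with release=None never go live.

    Because the tags are zero-padded, plain string order is date order, so max()
    on the tag string picks the most recent release.
    """
    if not show_on_biel:
        return None
    released = [r for r in rows if r["release"] is not None]
    return max(released, key=lambda r: r["release"], default=None)


rows = [
    {"commit": "a1b2c3", "release": make_release_tag(dt.date(2026, 6, 23))},
    {"commit": "junk99", "release": None},  # pushed to master, never released
]
print(biel_pick(rows, show_on_biel=True)["commit"])  # a1b2c3
```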

Cut a release → the existing render pipeline carries the tag → BIEL serves only blessed, un-stitched checkpoints. Junk on master can't reach the public.

How big a change is this?

How far do we go?

Stop the bleeding at the source

No model change. Fix the visible duplicates by hand, the way the data-inconsistencies note already prescribes.

  • For each conflicting book, pick the authoritative repo and stop publishing it from the other (retire the stale repo or escalate to its owner).
  • Cheap, reversible, and it clears the duplicates a reader sees today.
  • Doesn't touch the real defect: the master tip is still live, so the next stray commit republishes instantly.
  • A human has to re-run the check forever; nothing structural prevents recurrence.
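The live check itself lives in the data-inconsistencies note; the following is only a hypothetical equivalent, assuming the input is one (language, book_code, repo) tuple per book currently shown on BIEL. It finds the conflicts but, as the last bullet says, a human still has to pick the authoritative repo each time.

```python
from __future__ import annotations

from collections import defaultdict


def find_duplicate_books(published: list[tuple[str, str, str]]) -> dict[tuple[str, str], set[str]]:
    """Return every (language, book) that more than one repo currently publishes.

    `published` is assumed to hold one (language, book_code, repo) tuple per book
    shown on BIEL; how that list is produced is outside this sketch.
    """
    repos_by_book = defaultdict(set)
    for language, book, repo in published:
        repos_by_book[(language, book)].add(repo)
    # Only conflicts need a human decision about which repo is authoritative.
    return {key: repos for key, repos in repos_by_book.items() if len(repos) > 1}


published = [("en", "GEN", "repo_a"), ("en", "GEN", "repo_b"), ("en", "EXO", "repo_a")]
print({key: sorted(repos) for key, repos in find_duplicate_books(published).items()})
# {('en', 'GEN'): ['repo_a', 'repo_b']}
```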

The additive change to the public data API

Today rendered_content exposes only file_type, file_size_bytes, url, hash, and timestamps — there is no release concept at all. The proposal is deliberately small and backward-compatible:

  • Add a nullable release column (the date tag, e.g. 2026.06.23) and, optionally, release_commit (the pinned commit the tag points at).
  • The API listener writes it as it inserts render metadata — the USFM pipeline just passes the release tag along for that render. No new service.
  • It is additive: existing rows stay valid (their release is simply null), and any consumer that doesn’t care keeps reading url exactly as before. A consumer that wants the bleeding edge can still read the latest render of the master tip.
  • The stored commit also lets both sides checksum a release to detect tampering. That is more than we likely need, but it comes for free.
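A minimal sketch of the additive change and the listener side, assuming a relational store. SQLite is used only so the example runs standalone; the real database is not named here. The id, repo and created_at columns, the on_render_event function and the message keys are hypothetical; file_type, file_size_bytes, url, hash, release and release_commit come from the text.

```python
import sqlite3

# SQLite stands in for the real database so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")

# Today's shape (columns from the text; id, repo and created_at are assumptions).
conn.execute(
    """CREATE TABLE rendered_content (
           id INTEGER PRIMARY KEY,
           repo TEXT NOT NULL,
           file_type TEXT,
           file_size_bytes INTEGER,
           url TEXT,
           hash TEXT,
           created_at TEXT DEFAULT CURRENT_TIMESTAMP
       )"""
)
conn.execute(
    "INSERT INTO rendered_content (repo, file_type, file_size_bytes, url, hash) VALUES (?, ?, ?, ?, ?)",
    ("example_repo", "usfm", 1024, "https://example.invalid/old", "h0"),
)

# The additive change: two nullable columns. Existing rows simply read NULL.
# "release" is quoted defensively because RELEASE is also an SQL keyword.
conn.execute('ALTER TABLE rendered_content ADD COLUMN "release" TEXT')
conn.execute("ALTER TABLE rendered_content ADD COLUMN release_commit TEXT")


def on_render_event(msg: dict) -> None:
    """Hypothetical listener: the same insert as before, plus the release fields.

    A plain push to master carries no release, so .get() stores NULL for it.
    """
    conn.execute(
        'INSERT INTO rendered_content'
        ' (repo, file_type, file_size_bytes, url, hash, "release", release_commit)'
        ' VALUES (?, ?, ?, ?, ?, ?, ?)',
        (msg["repo"], msg["file_type"], msg["file_size_bytes"], msg["url"], msg["hash"],
         msg.get("release"), msg.get("release_commit")),
    )


base = {"repo": "example_repo", "file_type": "usfm", "file_size_bytes": 2048}
on_render_event({**base, "url": "https://example.invalid/tip", "hash": "h1"})
on_render_event({**base, "url": "https://example.invalid/2026.06.23", "hash": "h2",
                 "release": "2026.06.23", "release_commit": "a1b2c3"})

# An old consumer's query is untouched and still sees every row...
legacy = [url for (url,) in conn.execute("SELECT url FROM rendered_content ORDER BY id")]
# ...while BIEL can ask only for released rows.
released = conn.execute(
    'SELECT url, "release" FROM rendered_content WHERE "release" IS NOT NULL'
).fetchall()
print(released)  # [('https://example.invalid/2026.06.23', '2026.06.23')]
```

Because both new columns are nullable with no default, the migration leaves every existing row valid, which is what keeps the change backward-compatible.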

This is also why text needs a more refined ecosystem than a flag: pre-rendering (USFM today, and plausibly HTML for web consumers) only makes sense if there is a stable thing — a release — to pre-render against and point at.
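The argument about needing a stable thing to pre-render against can be sketched as a cache keyed by (repo, release, format). This is a hypothetical illustration: where pre-rendered HTML would actually live is still an open question, so an in-memory dict and a fake render function stand in for storage and the real renderer.

```python
from __future__ import annotations

from typing import Callable

RenderFn = Callable[[str, str, str], bytes]  # (repo, release, fmt) -> rendered bytes


class PrerenderCache:
    """Hypothetical cache of pre-rendered output keyed by (repo, release, fmt).

    A release never changes once cut, so each key is rendered at most once and
    can be pointed at indefinitely. Keying on "master" instead would go stale on
    every push, which is why a bare flag gives pre-rendering nothing to anchor to.
    """

    def __init__(self, render: RenderFn) -> None:
        self._render = render
        self._store: dict[tuple[str, str, str], bytes] = {}
        self.render_calls = 0

    def get(self, repo: str, release: str, fmt: str) -> bytes:
        key = (repo, release, fmt)
        if key not in self._store:
            self._store[key] = self._render(repo, release, fmt)
            self.render_calls += 1
        return self._store[key]


def fake_render(repo: str, release: str, fmt: str) -> bytes:
    return f"{fmt}:{repo}@{release}".encode()


cache = PrerenderCache(fake_render)
cache.get("example_repo", "2026.06.23", "html")
cache.get("example_repo", "2026.06.23", "html")  # served from cache
cache.get("example_repo", "2026.07.01", "html")  # new release, new render
print(cache.render_calls)  # 2
```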

The hard lines

  • One release = one repo. The publish action applies to a single repo. It cannot be expected to consolidate or stitch many things under the hood.
  • No read-time stitching. A published product lives together before publish. The reader app must never assemble N books for some products and not others. Drafting may span repos (ideally it shouldn’t), but “published” means un-stitched.
  • One centralized source of truth. The repo stays the source of truth. No moving repos between orgs to mark “released.”
  • OPS never sees the mechanism. OPS decides whether; IT decides how. Tags, releases, and pre-rendering are entirely under the hood.
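The first two hard lines can be enforced mechanically at publish time. Here is a hypothetical guard, assuming IT can produce a map from each book in a product to the repo it lives in (the text does not say where that map comes from). The function and exception names are illustrative only.

```python
from __future__ import annotations


class StitchingError(ValueError):
    """Raised when a product's books would have to be assembled from several repos."""


def repo_for_release(product: str, book_sources: dict[str, str]) -> str:
    """Return the single repo a product can be released from, or refuse.

    `book_sources` maps each book code in the product to the repo it lives in.
    One release = one repo, so more than one distinct repo means the product
    would need read-time stitching and is not publishable as-is.
    """
    repos = sorted(set(book_sources.values()))
    if len(repos) != 1:
        raise StitchingError(f"{product} spans {len(repos)} repos: {', '.join(repos)}")
    return repos[0]


print(repo_for_release("example_product", {"GEN": "repo_a", "EXO": "repo_a"}))  # repo_a
```

Refusing at publish time keeps the decision with OPS ("not yet") while the consolidation work stays with IT, rather than pushing the stitching onto the reader app.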

Recommendation summary

Concern | Owner | Recommendation | Rejected alternative
Versioning / publication | OPS · IT | Keep the bool gate; add a per-repo date-tagged release → additive release field on rendered_content | “Master tip is always live” (today’s bug); forced branch workflows; a separate “released” org
Working area | IT | One repo per product; master tip = WIP; release pins a checkpoint within it | N per-book repos stitched at read time
Updates | OPS · IT | Re-cut a release → new dated tag, same pipeline and API field; partial/in-progress releases allowed | Treating updates as a separate system
Storage | IT | Out of scope here: text already renders to Azure Blob; revisit only if a consumer needs pre-rendered HTML | (none listed)

Open questions

A few gaps I’d want to close before this is a real proposal rather than a vision:

  • Who cuts the release, and where? Is the “Release” button surfaced in PORT next to the existing show_on_biel toggle, or in the Git host directly?
  • First release on first approval? Treating the initial show_on_biel flip as an automatic first release would keep the OPS mental model to a single action.
  • Does BIEL hard-require a release, or fall back? For the long tail of content with no release yet, does BIEL show the latest render (today’s behaviour) until a first release exists, or hide it? This decides whether the rollout can be gradual.
  • Pre-rendered HTML. If we pre-render HTML (not just USFM) for web consumers, where does it live — another Azure Blob path keyed by release, or a rendered_content row of its own?
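The fallback question reduces to a single switch. A hypothetical sketch, assuming BIEL has already filtered to repos with show_on_biel set: require_release=False keeps today's behaviour for the long tail and allows a gradual rollout, while require_release=True hides anything never released. The Render shape and function name are illustrative, not the real API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Render:
    url: str
    rendered_at: int  # stand-in for the row timestamp; larger means newer
    release: Optional[str] = None  # date tag such as "2026.06.23", or None


def pick_for_biel(renders: list[Render], require_release: bool) -> Optional[Render]:
    """Choose what BIEL shows for one repo that already has show_on_biel set.

    Once any release exists, the newest release wins and unreleased renders are
    ignored. Before that, `require_release` decides the open question: hide the
    content (strict) or keep today's behaviour and show the latest render.
    """
    released = [r for r in renders if r.release is not None]
    if released:
        return max(released, key=lambda r: r.release)
    if require_release:
        return None
    return max(renders, key=lambda r: r.rendered_at, default=None)


long_tail = [Render("https://example.invalid/tip", 1)]
print(pick_for_biel(long_tail, require_release=True))  # None
print(pick_for_biel(long_tail, require_release=False).url)  # https://example.invalid/tip
```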