workTracker
arrow_backAll shared work
design Updated June 22, 2026

Biel duplicates chapters for a single Bible

For some languages biel shows the same book and its chapters twice. Run the check to see which are affected.

folderpublic-data-api #data-quality #graphql #biel

For some languages, biel shows the same book — and all its chapters — twice in one Bible. It happens when a book is supplied by more than one published source, and the display stitches both in. Run the check below against prod or dev, then click see in biel on any result to view the duplicate.

Live check

Default view is the saved snapshot the ops worker stores; "Run live check" refetches now (display only — it never sends an alert).

Why it’s a problem

A reader sees Matthew (or whatever book) listed and rendered twice for what should be one Bible. It looks broken and makes the language picker untrustworthy.

What needs to fix it

A person has to choose which source wins, since the overlapping repos are genuinely separate. Pick the authoritative repo for the book and stop publishing it from the other, retire a stale repo, or escalate to whoever owns the publishing. There’s no safe automatic rule.

A proposed larger fix

Choosing a winning source clears the duplicates you see today, but it’s a band-aid — master tip is still live, so the next stray commit can republish the conflict. The structural fix is to give text a real publishing checkpoint. See A vision for publishing.


Technical detail

Scripture is published two ways: one repo holding the whole Bible, or many small repos (a book, sometimes a chapter, each). Biel reassembles the small repos into one content per domain / resource_type / type for display. A conflict is when, inside one of those reassembled groups, the same book is supplied by more than one content — so the stitch keeps both and the book renders twice.

The check only looks at languages biel reassembles (every non-gateway language, plus gateway exceptions — currently ceb) and the three stitched domains (scripture, gloss, parascriptural), since those are the ones that produce a visible duplicate. The collation rules live in biel-frontend src/data/pubDataApi.ts (doCollateContent, isKnownGatewayContentFormatException, collateGatewayContent) — keep the queries here in sync if those change.

The related duplicate class — the same URL stored twice under different casing — is already fixed at the source: ingest now enforces a case-insensitive unique URL and lets the latest write win.

The query (browser, GraphQL)

The tool sends this and does the grouping in the browser (Hasura can’t express the count(distinct) > 1 directly):

query DataInconsistencies {
  language(
    where: {contents: {wa_content_metadata: {status: {_eq: "Primary"}, show_on_biel: {_eq: true}}}}
  ) {
    ietf_code
    national_name
    wa_language_metadata { is_gateway }
    contents(
      where: {wa_content_metadata: {status: {_eq: "Primary"}, show_on_biel: {_eq: true}}, rendered_contents_aggregate: {count: {predicate: {_gt: 0}}}}
    ) {
      name
      type
      domain
      resource_type
      git_repo { repo_url }
      rendered_contents(
        where: {scriptural_rendering_metadata: {book_slug: {_is_null: false}}}
      ) {
        scriptural_rendering_metadata { book_slug }
      }
    }
  }
}

The query (DB, SQL)

The same idea run directly against Postgres (read-only). Returns one row per source content per conflicting book. The SQL counts every domain; the browser tool narrows to the stitched ones.

-- Stitched-book conflicts: same book_slug contributed by >1 content within a
-- biel "collated" language group. Read-only. Mirrors collateGatewayContent.
WITH collated AS (
  SELECT c.id AS content_id, c.language_id, c.domain, c.resource_type,
         c."type" AS content_type, c.name, g.repo_url
  FROM content c
  JOIN wa_content_metadata m ON m.content_id = c.id
  LEFT JOIN wa_language_metadata lm ON lm.ietf_code = c.language_id
  LEFT JOIN git_repo g ON g.content_id = c.id
  WHERE m.status = 'Primary' AND m.show_on_biel = true
    AND (lm.is_gateway = false OR lm.is_gateway IS NULL OR c.language_id = 'ceb')
),
book_sources AS (
  SELECT col.language_id, col.domain, col.resource_type, col.content_type,
         srm.book_slug, col.content_id, col.name, col.repo_url
  FROM collated col
  JOIN rendered_content r ON r.content_id = col.content_id
  JOIN scriptural_rendering_metadata srm ON srm.rendering_id = r.id
  WHERE srm.book_slug IS NOT NULL
  GROUP BY col.language_id, col.domain, col.resource_type, col.content_type,
           srm.book_slug, col.content_id, col.name, col.repo_url
),
conflicts AS (
  SELECT language_id, domain, resource_type, content_type, book_slug
  FROM book_sources
  GROUP BY language_id, domain, resource_type, content_type, book_slug
  HAVING count(DISTINCT content_id) > 1
)
SELECT bs.language_id, bs.resource_type, bs.book_slug,
       bs.content_id, bs.name AS content_name, bs.repo_url
FROM book_sources bs
JOIN conflicts cf
  ON cf.language_id  = bs.language_id
 AND cf.domain        IS NOT DISTINCT FROM bs.domain
 AND cf.resource_type IS NOT DISTINCT FROM bs.resource_type
 AND cf.content_type  IS NOT DISTINCT FROM bs.content_type
 AND cf.book_slug     = bs.book_slug
ORDER BY bs.language_id, bs.resource_type, bs.book_slug, bs.content_id;