Module: Ucode::CodeChart

Defined in:
lib/ucode/code_chart.rb,
lib/ucode/code_chart/writer.rb,
lib/ucode/code_chart/sidecar.rb,
lib/ucode/code_chart/extractor.rb,
lib/ucode/code_chart/provenance.rb

Overview

CodeChart — per-codepoint SVG glyph extraction from Unicode Code Charts PDFs.

The "Code Chart donor" use case (essenfont consumer): for blocks where no real OFL-licensed font covers the glyphs (e.g. Sidetic in Unicode 17, Egyptian Hieroglyphs Extended-B), the only canonical source is the Unicode Consortium's Code Chart PDF. This namespace turns one such PDF into a tree of standalone SVG files, each with a provenance sidecar JSON file.

Architecture (MECE)

Every concern has exactly one home:

* **Block metadata** (range + assigned codepoints) — Parsers::Blocks
* **PDF download + cache** — Fetch::CodeCharts + Glyphs::PdfFetcher
* **PDF object-graph walk + font extraction** — Glyphs::EmbeddedFonts::*
* **Tier selection (Pillar 1 / 2 / 3)** — Glyphs::Resolver
* **SVG conversion + y-flip + viewBox** — Glyphs::EmbeddedFonts::Svg
* **Provenance schema** — CodeChart::Provenance (this namespace)
* **Sidecar JSON write** — CodeChart::Sidecar (this namespace)
* **Per-block orchestration + idempotent disk write** — CodeChart::Writer
* **CLI dispatch** — Cli::CodeChartCmd

CodeChart::* is the feature-facing namespace. It does not implement extraction, font parsing, or PDF I/O — it composes the existing infrastructure. Replacing the implementation (e.g. a future pure-Ruby PDF parser per ADR-0001) does not change the public API.

Defined Under Namespace

Classes: Extractor, Provenance, Sidecar, Writer

Class Method Summary

Class Method Details

.build(block:, codepoint:, ucd_version:, pdf_path:, now: nil) ⇒ Provenance

Builds a Provenance from the inputs the Writer has on hand (block, codepoint, ucd_version, pdf_path). Computes the PDF hash + URL once. The extracted_at timestamp is fixed at call time so re-running the same block produces identical provenance JSON for unchanged codepoints.

Parameters:

  • block (Ucode::Models::Block)
  • codepoint (Integer)
  • ucd_version (String)
  • pdf_path (Pathname, String)
  • now (Time, nil) (defaults to: nil)

    override for tests

Returns:

  • (Provenance)

# File 'lib/ucode/code_chart/provenance.rb', line 59

def self.build(block:, codepoint:, ucd_version:, pdf_path:, now: nil)
  path = Pathname.new(pdf_path)
  Provenance.new(
    codepoint: format("U+%04X", codepoint),
    block: block.id,
    source_pdf_url: code_chart_url(block.range_first),
    source_pdf_sha256: sha256_of(path),
    ucd_version: ucd_version,
    extracted_at: (now || Time.now.utc).iso8601,
    extractor_version: Ucode::VERSION,
  )
end
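
The two formatting steps in .build that most affect reproducibility are the codepoint label and the timestamp. The sketch below isolates them so the effect of the now: override is easy to see. The helper name extracted_at is hypothetical and exists only for this example; it copies the expression used in .build. Without now:, each call reads the clock, so a test that compares provenance JSON byte for byte would presumably pin the value as shown.

```ruby
require "time"

# Hypothetical stand-in for the extracted_at expression in .build.
# When no override is given, the current UTC time is used.
def extracted_at(now: nil)
  (now || Time.now.utc).iso8601
end

# Pinning the clock yields the same string on every run.
pinned = Time.utc(2025, 1, 2, 3, 4, 5)
puts extracted_at(now: pinned) # => 2025-01-02T03:04:05Z

# Codepoint labels are zero-padded to at least four hex digits.
puts format("U+%04X", 0x41)    # => U+0041
puts format("U+%04X", 0x1F600) # => U+1F600
```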

.code_chart_url(block_first_cp) ⇒ String

Computes the source PDF's URL from the block's first codepoint. Mirrors the per-block URL convention in Fetch::CodeCharts: 4-digit hex for BMP, 6-digit for supplementary planes.

Parameters:

  • block_first_cp (Integer)

Returns:

  • (String)


# File 'lib/ucode/code_chart/provenance.rb', line 41

def self.code_chart_url(block_first_cp)
  width = block_first_cp > 0xFFFF ? 6 : 4
  slug = block_first_cp.to_s(16).upcase.rjust(width, "0")
  "#{Ucode.configuration.charts_base_url}/U#{slug}.pdf"
end
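
To make the slug rule concrete, here is a standalone sketch that reproduces the width selection without depending on Ucode.configuration. The method name chart_url_for and the base URL are assumptions made for the example; the real base comes from Ucode.configuration.charts_base_url.

```ruby
# Placeholder base URL for the example only (an assumption, not the
# library's configured value).
EXAMPLE_CHARTS_BASE_URL = "https://example.org/charts"

# Hypothetical standalone copy of the slug logic in code_chart_url.
def chart_url_for(block_first_cp, base_url: EXAMPLE_CHARTS_BASE_URL)
  # BMP codepoints (<= 0xFFFF) pad to 4 hex digits, everything else to 6.
  width = block_first_cp > 0xFFFF ? 6 : 4
  slug = block_first_cp.to_s(16).upcase.rjust(width, "0")
  "#{base_url}/U#{slug}.pdf"
end

puts chart_url_for(0x0400)  # => https://example.org/charts/U0400.pdf
puts chart_url_for(0x1F600) # => https://example.org/charts/U01F600.pdf
```

Note that a five-digit supplementary codepoint gains a leading zero under this rule, so the slug is always exactly 4 or 6 characters long.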

.sha256_of(path) ⇒ String

Returns hex digest, "" when the path doesn't exist (callers can decide how to handle a missing hash).

Parameters:

  • path (Pathname)

Returns:

  • (String)

    hex digest, "" when the path doesn't exist (callers can decide how to handle a missing hash)



# File 'lib/ucode/code_chart/provenance.rb', line 75

def self.sha256_of(path)
  return "" unless path.exist?

  Digest::SHA256.file(path).hexdigest
end
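
Because a missing file yields "" rather than an exception, each caller has to choose its own policy. The sketch below shows one possible choice, failing fast on the empty sentinel. The wrapper require_pdf_hash is hypothetical and not part of the library; sha256_of is copied from above so the example runs on its own.

```ruby
require "digest"
require "pathname"
require "tempfile"

# Same logic as sha256_of above, repeated so the sketch is self-contained.
def sha256_of(path)
  return "" unless path.exist?

  Digest::SHA256.file(path).hexdigest
end

# Hypothetical caller policy: treat the "" sentinel as a hard error.
def require_pdf_hash(path)
  digest = sha256_of(path)
  raise ArgumentError, "missing PDF: #{path}" if digest.empty?

  digest
end

# An existing file produces a 64-character hex digest.
Tempfile.create("chart") do |file|
  file.write("stub pdf bytes")
  file.flush
  puts require_pdf_hash(Pathname.new(file.path)).length # => 64
end
```

A caller that prefers to record an incomplete provenance entry could instead keep the empty string and validate it at a later stage.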