Module: Ucode::CodeChart
Defined in:
* lib/ucode/code_chart.rb
* lib/ucode/code_chart/writer.rb
* lib/ucode/code_chart/sidecar.rb
* lib/ucode/code_chart/extractor.rb
* lib/ucode/code_chart/provenance.rb
Overview
CodeChart — per-codepoint SVG glyph extraction from Unicode Code Charts PDFs.
The "Code Chart donor" use case (essenfont consumer): for blocks where no OFL real-font covers the glyphs (Sidetic in Unicode 17, Egyptian Hieroglyphs Extended-B), the only canonical source is the Unicode Consortium's Code Chart PDF. This namespace turns one such PDF into a tree of standalone SVG files plus provenance sidecar JSON.
Architecture (MECE)
Every concern has exactly one home:
* **Block metadata** (range + assigned codepoints) — Parsers::Blocks
* **PDF download + cache** — Fetch::CodeCharts + Glyphs::PdfFetcher
* **PDF object-graph walk + font extraction** — Glyphs::EmbeddedFonts::*
* **Tier selection (Pillar 1 / 2 / 3)** — Glyphs::Resolver
* **SVG conversion + y-flip + viewBox** — Glyphs::EmbeddedFonts::Svg
* **Provenance schema** — CodeChart::Provenance (this namespace)
* **Sidecar JSON write** — CodeChart::Sidecar (this namespace)
* **Per-block orchestration + idempotent disk write** — CodeChart::Writer
* **CLI dispatch** — Cli::CodeChartCmd
CodeChart::* is the feature-facing namespace. It does not implement extraction, font parsing, or PDF I/O — it composes the existing infrastructure. Replacing the implementation (e.g. a future pure-Ruby PDF parser per ADR-0001) does not change the public API.
Defined Under Namespace
Classes: Extractor, Provenance, Sidecar, Writer
Class Method Summary
* .build(block:, codepoint:, ucd_version:, pdf_path:, now: nil) ⇒ Provenance
  Builds a Provenance from the inputs the Writer has on hand (block, codepoint, ucd_version, pdf_path).
* .code_chart_url(block_first_cp) ⇒ String
  Computes the source PDF's URL from the block's first codepoint.
* .sha256_of(path) ⇒ String
  Returns the hex digest, or "" when the path doesn't exist (callers can decide how to handle a missing hash).
Class Method Details
.build(block:, codepoint:, ucd_version:, pdf_path:, now: nil) ⇒ Provenance
Builds a Provenance from the inputs the Writer has on hand (block, codepoint, ucd_version, pdf_path). Computes the PDF hash and URL once per call. The extracted_at timestamp is fixed at call time: it is taken from now when given and from Time.now.utc otherwise, so re-running the same block with the same now produces identical provenance JSON for unchanged codepoints.
```ruby
# File 'lib/ucode/code_chart/provenance.rb', line 59
def self.build(block:, codepoint:, ucd_version:, pdf_path:, now: nil)
  path = Pathname.new(pdf_path)
  Provenance.new(
    codepoint: format("U+%04X", codepoint),
    block: block.id,
    source_pdf_url: code_chart_url(block.range_first),
    source_pdf_sha256: sha256_of(path),
    ucd_version: ucd_version,
    extracted_at: (now || Time.now.utc).iso8601,
    extractor_version: Ucode::VERSION,
  )
end
```
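The effect of the now: parameter on reproducibility can be sketched in isolation. The snippet below is a minimal illustration, not the Ucode implementation: `record` is a hypothetical stand-in for CodeChart::Provenance with only two fields, and `build` is a local lambda that copies the codepoint formatting and the `now || Time.now.utc` fallback from the method above. Serializing with JSON.generate is also an assumption, since the real sidecar format is not shown here.

```ruby
require "json"
require "time"

# Hypothetical two-field stand-in for CodeChart::Provenance.
record = Struct.new(:codepoint, :extracted_at, keyword_init: true)

# Copies the U+%04X formatting and the `now` fallback from .build.
build = lambda do |codepoint:, now: nil|
  record.new(
    codepoint: format("U+%04X", codepoint),
    extracted_at: (now || Time.now.utc).iso8601,
  )
end

# Pinning `now` makes two builds of the same codepoint identical.
pinned = Time.utc(2025, 1, 2, 3, 4, 5)
first  = build.call(codepoint: 0x10940, now: pinned)
second = build.call(codepoint: 0x10940, now: pinned)

first_json  = JSON.generate(first.to_h)
second_json = JSON.generate(second.to_h)
puts first_json
```

With `now` omitted, each call reads the clock, so two runs that cross a second boundary would differ in extracted_at.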
.code_chart_url(block_first_cp) ⇒ String
Computes the source PDF's URL from the block's first codepoint. Mirrors the per-block URL convention in Fetch::CodeCharts: 4-digit hex for BMP, 6-digit for supplementary planes.
```ruby
# File 'lib/ucode/code_chart/provenance.rb', line 41
def self.code_chart_url(block_first_cp)
  width = block_first_cp > 0xFFFF ? 6 : 4
  slug = block_first_cp.to_s(16).upcase.rjust(width, "0")
  "#{Ucode.configuration.charts_base_url}/U#{slug}.pdf"
end
```
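The padding rule is easiest to see on concrete values. The sketch below repeats the same slug logic as a standalone lambda that takes the base URL as an argument instead of reading Ucode.configuration. The name `chart_url_for` and the base URL `https://example.org/charts` are placeholders chosen for illustration, not part of Ucode.

```ruby
# Standalone copy of the slug rule; the name and base URL are placeholders.
chart_url_for = lambda do |first_cp, base_url|
  width = first_cp > 0xFFFF ? 6 : 4                 # BMP vs. supplementary
  slug = first_cp.to_s(16).upcase.rjust(width, "0") # zero-padded upper hex
  "#{base_url}/U#{slug}.pdf"
end

base = "https://example.org/charts"
bmp_url = chart_url_for.call(0x0370, base)   # BMP: padded to 4 digits
smp_url = chart_url_for.call(0x10940, base)  # above U+FFFF: padded to 6 digits

puts bmp_url  # https://example.org/charts/U0370.pdf
puts smp_url  # https://example.org/charts/U010940.pdf
```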
.sha256_of(path) ⇒ String
Returns the hex digest, or "" when the path doesn't exist (callers can decide how to handle a missing hash).
```ruby
# File 'lib/ucode/code_chart/provenance.rb', line 75
def self.sha256_of(path)
  return "" unless path.exist?

  Digest::SHA256.file(path).hexdigest
end
```
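Because a missing file yields "" rather than an exception, the caller has to check for the empty string. The following sketch shows both outcomes using a local lambda with the same body as the method above. The name `sha256_or_empty`, the nonexistent path, and the warning message are illustrative assumptions; how the Writer actually reacts to an empty hash is not documented here.

```ruby
require "digest"
require "pathname"
require "tempfile"

# Same body as the documented method, as a local lambda.
sha256_or_empty = lambda do |path|
  next "" unless path.exist?

  Digest::SHA256.file(path).hexdigest
end

# Missing file: empty string, no exception.
missing = sha256_or_empty.call(Pathname.new("/nonexistent/chart.pdf"))

# Existing file: hex digest of its bytes.
present = Tempfile.create("chart") do |f|
  f.write("abc")
  f.flush
  sha256_or_empty.call(Pathname.new(f.path))
end

# One possible caller policy: report a missing hash instead of recording "".
warn "source PDF missing, no hash recorded" if missing.empty?
puts present
```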