Class: Ucode::Glyphs::EmbeddedFonts::ContentStreamCorrelator
- Inherits: Object
  - Object
  - Ucode::Glyphs::EmbeddedFonts::ContentStreamCorrelator
- Defined in: lib/ucode/glyphs/embedded_fonts/content_stream_correlator.rb
Overview
Pillar 2 fallback: build a `codepoint => gid` map for a Type0 font whose PDF object graph has no `/ToUnicode` CMap stream.
The Code Charts draw every chart cell as a `<use>` element that references the font's GID via an `href` of the form `#font_<font_obj_id>_<gid>`. The chart also prints the row and column codepoint labels using one or more "label" fonts (small Latin glyphs) that show the hex codepoint as text. By clustering the labels positionally (Y-bucket for the row, X-bucket for the column) we recover the codepoint each cluster represents, then match each cluster to the specimen glyph at the same Y/X position.
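The clustering and matching described above can be sketched roughly as follows. This is a minimal illustration, not the class's actual internals: the item hashes (`:x`, `:y`, `:char`, `:gid`) and the helper names `bucket_key`, `cluster_label_codepoints` and `match_specimens` are all hypothetical, and the sketch assumes each label glyph has already been resolved to the character it prints. It also takes "same Y/X position" literally as "same bucket key".

```ruby
# Illustrative sketch only: item shapes and helper names are assumptions.

# Map a position to a [row, column] bucket key.
def bucket_key(item, y_bucket, x_bucket)
  [(item[:y] / y_bucket).floor, (item[:x] / x_bucket).floor]
end

# Group label characters by bucket, then read each group left to right
# as a hexadecimal codepoint.
def cluster_label_codepoints(labels, y_bucket: 1.5, x_bucket: 50.0)
  labels
    .group_by { |label| bucket_key(label, y_bucket, x_bucket) }
    .transform_values do |group|
      group.sort_by { |label| label[:x] }.map { |label| label[:char] }.join.to_i(16)
    end
end

# Pair each specimen glyph with the label cluster that shares its bucket key.
# Specimens without a matching cluster are skipped.
def match_specimens(label_clusters, specimens, y_bucket: 1.5, x_bucket: 50.0)
  specimens.each_with_object({}) do |specimen, map|
    codepoint = label_clusters[bucket_key(specimen, y_bucket, x_bucket)]
    map[codepoint] = specimen[:gid] if codepoint
  end
end
```

Positions that fall close to a bucket boundary can land in neighbouring buckets, so a fixed-grid quantization like this one is sensitive to the chosen bucket sizes.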
The algorithm generalizes the Tai Yo correlator that was tested against `data/pdfs/U1E6C0.pdf` (50 of 52 specimen codepoints matched; the two misses were layout edge cases). The bucket sizes are configurable because some blocks use a tighter grid than others.
Inputs are deliberately pure: a string of SVG markup plus a Config. The catalog is responsible for sourcing the SVG (by rendering the relevant PDF page(s) via `mutool draw -F svg`) and for knowing which font_obj_ids are labels and which are specimens on that page. That keeps this class trivially testable with synthetic SVG fixtures.
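To illustrate what a synthetic fixture might look like, here is a hedged sketch that builds a tiny SVG string, extracts its `<use>` elements, and splits them into labels and specimens. The attribute layout (`href`, `x`, `y` in that order) is an assumption made for the example and is not a claim about what `mutool draw -F svg` actually emits; `scan_uses` and `split_by_role` are hypothetical helpers, and the regular expression stands in for a real XML parser.

```ruby
# Hypothetical fixture: the attribute layout is an assumption for illustration.
SYNTHETIC_SVG = <<~'SVG'
  <svg xmlns="http://www.w3.org/2000/svg">
    <use href="#font_12_345" x="120.0" y="100.4"/>
    <use href="#font_9_17" x="102.0" y="100.2"/>
  </svg>
SVG

# Matches only the simplified <use> shape used in the fixture above.
USE_PATTERN = /<use\s+href="#font_(\d+)_(\d+)"\s+x="([\d.]+)"\s+y="([\d.]+)"\s*\/>/

# Extract font object id, GID and position from every matching <use> element.
def scan_uses(svg)
  svg.scan(USE_PATTERN).map do |font_obj_id, gid, x, y|
    { font_obj_id: font_obj_id.to_i, gid: gid.to_i, x: x.to_f, y: y.to_f }
  end
end

# Split uses into [labels, specimens] given the ids of the label fonts,
# which the caller is expected to know for the page.
def split_by_role(uses, label_font_ids)
  uses.partition { |use| label_font_ids.include?(use[:font_obj_id]) }
end
```

Because the input is just a string, a test can hand-write a few `<use>` elements like these instead of rendering a PDF.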
Defined Under Namespace
Constant Summary
- DEFAULT_Y_BUCKET = 1.5
- DEFAULT_X_BUCKET = 50.0
Instance Method Summary
- #correlate(svg) ⇒ Hash{Integer=>Integer}
  Codepoint => gid.
- #initialize(config) ⇒ ContentStreamCorrelator (constructor)
  A new instance of ContentStreamCorrelator.
Constructor Details
#initialize(config) ⇒ ContentStreamCorrelator
Returns a new instance of ContentStreamCorrelator.
# File 'lib/ucode/glyphs/embedded_fonts/content_stream_correlator.rb', line 65
def initialize(config)
  @config = config
  @y_bucket = config.y_bucket || DEFAULT_Y_BUCKET
  @x_bucket = config.x_bucket || DEFAULT_X_BUCKET
end
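The `||` fallback in the constructor means a config that leaves a bucket size unset picks up the corresponding default. A small sketch of that behaviour, using a hypothetical `SketchConfig` struct in place of the real Config class (whose definition is not shown here) and a hypothetical `effective_buckets` helper:

```ruby
# Hypothetical stand-in for the real Config; only the two readers matter here.
SketchConfig = Struct.new(:y_bucket, :x_bucket, keyword_init: true)

# Mirror the constructor's fallback: nil (or false) selects the default.
def effective_buckets(config)
  { y: config.y_bucket || 1.5, x: config.x_bucket || 50.0 }
end

defaults = effective_buckets(SketchConfig.new)
tighter  = effective_buckets(SketchConfig.new(y_bucket: 0.75))
```

Note that `||` only replaces `nil` and `false`. A bucket size of `0` is truthy in Ruby and would be kept as given.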
Instance Method Details
#correlate(svg) ⇒ Hash{Integer=>Integer}
Returns codepoint => gid. Empty if no clusters could be matched.
# File 'lib/ucode/glyphs/embedded_fonts/content_stream_correlator.rb', line 76
def correlate(svg)
  uses = parse_uses(svg)
  return {} if uses.empty?

  partition_and_map(uses)
end