Class: Ucode::Glyphs::EmbeddedFonts::ContentStreamCorrelator
- Inherits: Object
  - Object
    - Ucode::Glyphs::EmbeddedFonts::ContentStreamCorrelator
- Defined in: lib/ucode/glyphs/embedded_fonts/content_stream_correlator.rb
Overview
Pillar 2 fallback: build a {codepoint => gid} map for a Type0
font whose PDF object graph has no /ToUnicode CMap stream.
The Code Charts draw every chart cell as a <use> element that
references the font's GID via an href of the form
#font_<font_obj_id>_<gid>. The chart also prints the row and
column codepoint labels using one or more "label" fonts (small
Latin glyphs) that show the hex codepoint as text. By clustering
the labels positionally (Y-bucket for the row, X-bucket for the
column) we recover the codepoint each cluster represents, then
match each cluster positionally to the specimen glyph at the
same Y/X position.
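The clustering step described above can be sketched roughly as follows. This is an illustrative reading of the description, not the class's actual implementation: the Placement struct, the bucket, label_clusters, and sketch_correlate helpers, the floor-based bucketing, and the "nearest specimen in the same X-bucket" matching rule are all assumptions made for the example.

```ruby
# Hypothetical value type: one glyph placement recovered from a <use> element.
# For label glyphs `value` is a hex digit; for specimen glyphs it is the GID.
Placement = Struct.new(:x, :y, :value)

# Quantize a coordinate into an integer bucket index (assumed floor bucketing).
def bucket(coord, size)
  (coord / size).floor
end

# Group label digits that share a Y-bucket and an X-bucket, then read each
# group's digits left to right to recover the codepoint the group spells.
def label_clusters(labels, y_bucket:, x_bucket:)
  groups = labels.group_by { |l| [bucket(l.y, y_bucket), bucket(l.x, x_bucket)] }
  groups.values.map do |digits|
    ordered = digits.sort_by(&:x)
    {
      codepoint: ordered.map(&:value).join.to_i(16),
      x: ordered.sum(&:x) / ordered.size,
      y: ordered.first.y
    }
  end
end

# Pair every label cluster with the nearest specimen in the same X-bucket
# (an assumed matching rule) and return { codepoint => gid }.
def sketch_correlate(labels, specimens, y_bucket: 1.5, x_bucket: 50.0)
  clusters = label_clusters(labels, y_bucket: y_bucket, x_bucket: x_bucket)
  clusters.each_with_object({}) do |cluster, map|
    column = specimens.select { |s| bucket(s.x, x_bucket) == bucket(cluster[:x], x_bucket) }
    nearest = column.min_by { |s| (s.y - cluster[:y]).abs }
    map[cluster[:codepoint]] = nearest.value if nearest
  end
end
```

The small Y-bucket keeps digits on one text baseline together, while the wide X-bucket keeps all digits of one cell in the same column. Plain floor bucketing can split a cluster that straddles a bucket boundary, which may be the kind of layout edge case that leaves a codepoint unmatched.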
The algorithm generalizes the Tai Yo correlator that was tested
against data/pdfs/U1E6C0.pdf (50/52 specimen codepoints
matched, with the two missing being layout edge cases). The
bucket sizes are configurable because some blocks use a tighter
grid than others.
Inputs are deliberately pure: a string of SVG markup plus a
Config. The catalog is responsible for sourcing the SVG (by
rendering the relevant PDF page(s) via mutool draw -F svg) and
for knowing which font_obj_ids are label fonts versus specimen
fonts on that page. That keeps this class trivially testable with synthetic
SVG fixtures.
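As a rough illustration of what a synthetic fixture might look like, the sketch below extracts <use> placements from a hand-written SVG string and splits them into label and specimen glyphs. The fixture's attribute layout (plain x and y attributes, decimal ids in the href) is an assumption for the example and is not captured mutool output; sketch_parse_uses and LABEL_FONT_IDS are hypothetical names, and a regular expression stands in for a real XML parser.

```ruby
# Synthetic fixture. The attribute layout is an assumption for illustration;
# real renderer output may position glyphs differently (e.g. via transforms).
SVG_FIXTURE = <<~SVG
  <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
    <use xlink:href="#font_12_7" x="115.0" y="90.0"/>
    <use xlink:href="#font_9_20" x="110.0" y="100.0"/>
    <use xlink:href="#unrelated" x="1.0" y="1.0"/>
  </svg>
SVG

# Hypothetical: the caller says which font object ids are label fonts.
LABEL_FONT_IDS = [9].freeze

# Collect every <use> whose href has the #font_<font_obj_id>_<gid> form.
def sketch_parse_uses(svg)
  svg.scan(/<use\b[^>]*>/).filter_map do |tag|
    href = tag.match(/href="#font_(\d+)_(\d+)"/)
    x = tag[/\bx="([^"]+)"/, 1]
    y = tag[/\by="([^"]+)"/, 1]
    next unless href && x && y

    { font_obj_id: href[1].to_i, gid: href[2].to_i, x: x.to_f, y: y.to_f }
  end
end

uses = sketch_parse_uses(SVG_FIXTURE)
labels, specimens = uses.partition { |u| LABEL_FONT_IDS.include?(u[:font_obj_id]) }
```

Because the input is just a string, a test can build exactly the cell layout it wants to exercise without rendering a PDF.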
Defined Under Namespace
Constant Summary
- DEFAULT_Y_BUCKET = 1.5
- DEFAULT_X_BUCKET = 50.0
Instance Method Summary
- #correlate(svg) ⇒ Hash{Integer=>Integer}
  Codepoint => gid.
- #initialize(config) ⇒ ContentStreamCorrelator (constructor)
  A new instance of ContentStreamCorrelator.
Constructor Details
#initialize(config) ⇒ ContentStreamCorrelator
Returns a new instance of ContentStreamCorrelator.
    # File 'lib/ucode/glyphs/embedded_fonts/content_stream_correlator.rb', line 65
    def initialize(config)
      @config = config
      @y_bucket = config.y_bucket || DEFAULT_Y_BUCKET
      @x_bucket = config.x_bucket || DEFAULT_X_BUCKET
    end
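The constructor's fallback to the default bucket sizes can be illustrated with a small stand-in. SketchConfig and resolve_buckets are hypothetical names introduced for this example; the real Config may carry more fields, and only the y_bucket and x_bucket readers seen in the source are modeled.

```ruby
# Hypothetical stand-in for the real Config; only the two readers the
# constructor calls are modeled here.
SketchConfig = Struct.new(:y_bucket, :x_bucket, keyword_init: true)

# Values copied from the Constant Summary.
DEFAULT_Y_BUCKET = 1.5
DEFAULT_X_BUCKET = 50.0

# Mirrors the constructor's `||` fallback: a nil reader yields the default.
def resolve_buckets(config)
  [config.y_bucket || DEFAULT_Y_BUCKET, config.x_bucket || DEFAULT_X_BUCKET]
end

defaults = resolve_buckets(SketchConfig.new)
tighter_rows = resolve_buckets(SketchConfig.new(y_bucket: 0.8))
```

Since the fallback uses `||`, only nil (or false) triggers the default; an explicit numeric value, including 0, is kept as given.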
Instance Method Details
#correlate(svg) ⇒ Hash{Integer=>Integer}
Returns codepoint => gid. Empty if no clusters could be matched.
    # File 'lib/ucode/glyphs/embedded_fonts/content_stream_correlator.rb', line 76
    def correlate(svg)
      uses = parse_uses(svg)
      return {} if uses.empty?
      partition_and_map(uses)
    end