Class: Ucode::Glyphs::GridDetector
- Inherits:
- Object
- Defined in:
- lib/ucode/glyphs/grid_detector.rb
Overview
Detects the chart grid in a Code Charts PDF page rendered to SVG.
The SVG that pdftocairo / pdf2svg / dvisvgm produces from a PDF page contains every visible element (title, block name, row labels, codepoint digits, and the actual character glyphs) as positioned `<use>` references into a `<defs>` block of named glyph outlines. The character cells we want to extract correspond to glyphs whose bounding box is larger than that of every label or digit glyph on the page, because the chart's character samples are drawn at a larger size than any of the surrounding text.
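The size-based distinction described above can be illustrated with a minimal sketch. It is not the library's implementation: the `char_sized_ids` helper, the `[width, height]` data shape, and the sample values are all hypothetical, and the 8.0 default only mirrors the 8 pt threshold mentioned in this overview.

```ruby
# Hypothetical helper: return the ids of glyphs whose width AND height both
# exceed the threshold. `bboxes` maps a glyph id to a [width, height] pair in
# points (a data shape assumed for this sketch only).
def char_sized_ids(bboxes, threshold: 8.0)
  bboxes.select { |_id, size| size.all? { |dim| dim > threshold } }.keys
end

# Made-up sample: two small label/digit glyphs and two larger character glyphs.
sample = {
  "glyph0-1" => [4.2, 6.0],
  "glyph0-2" => [3.9, 5.8],
  "glyph1-1" => [9.5, 12.1],
  "glyph2-1" => [10.2, 11.7]
}

char_ids = char_sized_ids(sample)
# char_ids => ["glyph1-1", "glyph2-1"]
```

Requiring both dimensions to exceed the threshold means a wide but short glyph (or a tall but narrow one) is still treated as label text in this sketch.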
Algorithm:
1. Walk `<defs>`, estimate each glyph's bbox via `PathBbox`.
2. Classify a glyph as "character-sized" when its width and
   height both exceed `CharSizeThreshold` (default 8 pt).
   This excludes title, row-label, and digit glyphs while
   keeping every actual character sample, including pages
   where the chart mixes multiple character fonts (e.g. the
   Basic Latin page uses one font for punctuation/digits and
   another for letters).
3. Collect every `<use>` that references a character-sized
   glyph; these are the cell origins.
4. Cluster the Y values of those uses into rows, and within
   each row cluster the X values into columns.
5. Drop rows whose column count diverges from the modal value
   (these are footer/header artifacts, not chart rows).
6. Return a `Grid` value object anchored at the top-left cell
   with uniform column/row pitches derived from the median
   spacing between adjacent clusters.
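Steps 4 to 6 can be sketched in plain Ruby as follows. This is an illustration under stated assumptions, not the detector's actual code: the helper names (`cluster_centres`, `median`, `pitch`, `sketch_grid`), the gap-based clustering rule, the `tolerance` value, and the Hash returned in place of a real `Grid` are all hypothetical. Input positions are taken to be `[x, y]` pairs for the character-sized uses.

```ruby
# Assumed clustering rule: sort the values and start a new cluster whenever
# the gap to the previous value exceeds `tolerance`. Returns cluster means.
def cluster_centres(values, tolerance: 2.0)
  values.sort
        .slice_when { |a, b| b - a > tolerance }
        .map { |c| c.sum / c.size.to_f }
end

def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Median gap between adjacent cluster centres (nil with fewer than two).
def pitch(centres)
  gaps = centres.each_cons(2).map { |a, b| b - a }
  gaps.empty? ? nil : median(gaps)
end

# Hypothetical stand-in for the grid-building step; returns a plain Hash.
def sketch_grid(positions, tolerance: 2.0)
  return nil if positions.empty?

  # Step 4: cluster Y values into rows, then X values into columns per row.
  rows = positions.sort_by { |_x, y| y }
                  .slice_when { |a, b| b[1] - a[1] > tolerance }
                  .map do |row|
    { y: row.sum { |_x, y| y } / row.size.to_f,
      xs: cluster_centres(row.map(&:first), tolerance: tolerance) }
  end

  # Step 5: keep only rows whose column count equals the modal count.
  modal = rows.map { |r| r[:xs].size }.tally.max_by { |_count, n| n }.first
  kept = rows.select { |r| r[:xs].size == modal }

  # Step 6: anchor at the top-left cell; pitches are median adjacent gaps.
  col_centres = cluster_centres(kept.flat_map { |r| r[:xs] }, tolerance: tolerance)
  { origin: [col_centres.first, kept.first[:y]],
    columns: modal,
    rows: kept.size,
    col_pitch: pitch(col_centres),
    row_pitch: pitch(kept.map { |r| r[:y] }) }
end
```

Using the median rather than the mean for the pitches keeps a single irregular gap (for example, one left behind by a dropped row) from skewing the result. The sketch ignores `block_first_cp`, which the real detector also passes into its grid-building step.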
The detector is pure (no I/O): it takes a parsed Nokogiri document and returns a `Grid`, or nil if no character grid could be detected.
Defined Under Namespace
Classes: UsePosition
Class Method Summary
-
.detect(doc, block_first_cp:) ⇒ Ucode::Glyphs::Grid?
Nil if no character grid could be detected.
Class Method Details
.detect(doc, block_first_cp:) ⇒ Ucode::Glyphs::Grid?
Returns nil if no character grid could be detected.
# File 'lib/ucode/glyphs/grid_detector.rb', line 53
def detect(doc, block_first_cp:)
  uses = collect_uses(doc)
  return nil if uses.empty?

  char_glyph_ids = char_sized_glyph_ids(doc)
  return nil if char_glyph_ids.empty?

  cell_uses = uses.select { |u| char_glyph_ids.include?(u.glyph_id) }
  return nil if cell_uses.empty?

  build_grid(cell_uses, block_first_cp)
end