Class: Ucode::Glyphs::MonolithPageMap

Inherits:
Object
Defined in:
lib/ucode/glyphs/monolith_page_map.rb

Overview

Maps a Unicode block’s first codepoint to its page range inside the monolith `CodeCharts.pdf` by parsing the PDF’s bookmark outline and matching each bookmark title to a Block.name from `Blocks.txt`.

Each chart cluster printed by the Unicode Consortium is a single bookmark entry:

BookmarkTitle: Greek and Coptic
BookmarkLevel: 1
BookmarkPageNumber: 415

The cluster title usually equals a Block.name verbatim, but a few clusters carry a heading that prepends “C0 Controls and ” / “C1 Controls and ” to the block name. We resolve both forms.
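
The two title forms could be resolved roughly as follows. This is a minimal sketch, not the class's actual code: the real `resolve_first_cp` is not shown on this page, and the names `CONTROL_PREFIXES` and `resolve_first_cp_sketch` are hypothetical.

```ruby
# Hypothetical sketch of title resolution; the real resolve_first_cp may differ.
CONTROL_PREFIXES = ["C0 Controls and ", "C1 Controls and "].freeze

def resolve_first_cp_sketch(title, name_to_first_cp)
  return nil if title.nil?

  # Form 1: the bookmark title is a Block.name verbatim.
  return name_to_first_cp[title] if name_to_first_cp.key?(title)

  # Form 2: strip a leading "C0/C1 Controls and " and retry the lookup.
  prefix = CONTROL_PREFIXES.find { |p| title.start_with?(p) }
  prefix ? name_to_first_cp[title.delete_prefix(prefix)] : nil
end

names = {
  "Basic Latin" => 0x0000,
  "Latin-1 Supplement" => 0x0080,
  "Greek and Coptic" => 0x0370,
}
resolve_first_cp_sketch("Greek and Coptic", names)                   # => 880 (0x0370)
resolve_first_cp_sketch("C0 Controls and Basic Latin", names)        # => 0
resolve_first_cp_sketch("C1 Controls and Latin-1 Supplement", names) # => 128 (0x0080)
resolve_first_cp_sketch("Unknown Heading", names)                    # => nil
```

Titles that match neither form return nil, which lines up with `parse_bookmarks` skipping bookmarks for which no codepoint is resolved.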

The end page of a cluster is one page before the next cluster’s start page; the last cluster’s end page is the PDF’s last page.

The map is cached as JSON at `data/codecharts_page_map.json` so we don’t re-scan the 3,156-page monolith on every run.

Defined Under Namespace

Classes: MapEntry

Class Method Details

.attach_end_pages(entries, total_pages = nil) ⇒ Array<MapEntry>

Pure (no I/O, although the entries themselves are mutated): sorts the entries by start_page and sets each entry’s end_page to one page before the next entry’s start_page.

Parameters:

  • entries (Array<MapEntry>)
  • total_pages (Integer, nil) (defaults to: nil)

    page count of the source PDF; the last entry’s end_page falls back to this when present.

Returns:

  • (Array<MapEntry>)

    the same entry objects, mutated in place with end_page set, returned as a new array sorted by start_page.



# File 'lib/ucode/glyphs/monolith_page_map.rb', line 96

def attach_end_pages(entries, total_pages = nil)
  sorted = entries.sort_by(&:start_page)
  sorted.each_with_index do |entry, i|
    next_entry = sorted[i + 1]
    entry.end_page = next_entry ? next_entry.start_page - 1 : total_pages
  end
  sorted
end
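
A small worked example may make the end-page rule concrete. It copies the method above verbatim and uses a `Struct` named `SketchEntry` as a stand-in for `MapEntry`, whose definition is not shown on this page. The Cyrillic start page (421) is made up for illustration; 415 and 3156 come from the overview.

```ruby
# Stand-in for MapEntry (assumption: it exposes start_page and a writable end_page).
SketchEntry = Struct.new(:first_cp, :start_page, :end_page, keyword_init: true)

def attach_end_pages(entries, total_pages = nil)
  sorted = entries.sort_by(&:start_page)
  sorted.each_with_index do |entry, i|
    next_entry = sorted[i + 1]
    entry.end_page = next_entry ? next_entry.start_page - 1 : total_pages
  end
  sorted
end

entries = [
  SketchEntry.new(first_cp: 0x0400, start_page: 421), # made-up page number
  SketchEntry.new(first_cp: 0x0370, start_page: 415),
]
sorted = attach_end_pages(entries, 3156)
sorted.map { |e| [e.start_page, e.end_page] } # => [[415, 420], [421, 3156]]
entries.first.end_page                        # => 3156 (input objects are mutated)
```

When `total_pages` is nil, the last entry's `end_page` stays nil, so callers that need a closed range should be prepared for that case.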

.build(monolith_path:, blocks:) ⇒ Hash{Integer => MapEntry}

Build the map by parsing the monolith’s outline and matching each bookmark title to a Block.

Parameters:

  • monolith_path (String, Pathname)
  • blocks (Array<Ucode::Models::Block>)

Returns:

  • (Hash{Integer => MapEntry})

    keyed by block.range_first



# File 'lib/ucode/glyphs/monolith_page_map.rb', line 53

def build(monolith_path:, blocks:)
  name_to_first_cp = blocks.each_with_object({}) do |b, h|
    h[b.name] = b.range_first
  end
  total_pages = page_count(monolith_path)
  entries = parse_bookmarks(dump_bookmarks(monolith_path), name_to_first_cp)
  attach_end_pages(entries, total_pages)
  entries.each_with_object({}) do |e, h|
    h[e.first_cp] = e
  end
end

.dump_bookmarks(monolith_path) ⇒ Object

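I/O helper (impure): runs `pdftk <monolith_path> dump_data` and returns the combined stdout and stderr output, or an empty string if the command exits with a failure status.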



# File 'lib/ucode/glyphs/monolith_page_map.rb', line 131

def dump_bookmarks(monolith_path)
  out, status = Open3.capture2e("pdftk", monolith_path.to_s, "dump_data")
  return "" unless status.success?

  out
end
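
One case the status check above does not cover: `Open3.capture2e` raises `Errno::ENOENT` when the executable is not installed, so no status is ever returned. A caller that wants a uniform "no output" result could wrap the call as in the sketch below. The helper name `capture_or_nil` is hypothetical and not part of this class.

```ruby
require "open3"

# Hypothetical wrapper: returns the combined stdout/stderr output on success,
# nil when the command exits non-zero or the executable cannot be found.
def capture_or_nil(*cmd)
  out, status = Open3.capture2e(*cmd)
  status.success? ? out : nil
rescue Errno::ENOENT
  nil
end

capture_or_nil("no-such-binary-for-this-sketch", "dump_data") # => nil
```

Returning nil rather than an empty string keeps "tool missing or failed" distinguishable from "tool ran and printed nothing"; whether that distinction matters here is a design choice for the caller.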

.load(monolith_path:, blocks:, cache_path: nil) ⇒ Hash{Integer => MapEntry}

Load from cache, or build and cache.

Parameters:

  • monolith_path (String, Pathname)
  • blocks (Array<Ucode::Models::Block>)
  • cache_path (String, Pathname, nil) (defaults to: nil)

Returns:

  • (Hash{Integer => MapEntry})



# File 'lib/ucode/glyphs/monolith_page_map.rb', line 110

def load(monolith_path:, blocks:, cache_path: nil)
  cache = cache_path && Pathname.new(cache_path)
  if cache&.exist?
    return load_from_json(cache.read)
  end

  map = build(monolith_path: monolith_path, blocks: blocks)
  write_cache(map, cache) if cache
  map
end
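
The helpers `load_from_json` and `write_cache` are not shown on this page. As a rough illustration of what a JSON round trip for this map can look like, here is a sketch under an assumed schema (an array of rows with `first_cp`, `start_page`, and `end_page`); the real cache format may differ, and `CacheEntry`, `dump_map_sketch`, and `load_map_sketch` are hypothetical names.

```ruby
require "json"

# Stand-in for MapEntry; the field names are an assumption.
CacheEntry = Struct.new(:first_cp, :start_page, :end_page, keyword_init: true)

# Serialize the map as an array of rows. JSON object keys are always strings,
# so storing rows avoids converting Integer keys back and forth.
def dump_map_sketch(map)
  JSON.pretty_generate(map.values.map(&:to_h))
end

# Rebuild the Hash{Integer => entry} keyed by first_cp.
def load_map_sketch(json)
  JSON.parse(json, symbolize_names: true).each_with_object({}) do |row, h|
    h[row[:first_cp]] = CacheEntry.new(**row)
  end
end

map = { 0x0370 => CacheEntry.new(first_cp: 0x0370, start_page: 415, end_page: 420) }
load_map_sketch(dump_map_sketch(map)) == map # => true
```

Note that `load` above returns the cached map whenever the cache file exists, so a stale cache has to be deleted by hand after the monolith PDF changes.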

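.page_count(monolith_path) ⇒ Object

I/O helper (impure): runs `pdfinfo` on the monolith and returns the page count as an Integer, or nil if the command fails or its output has no `Pages:` line.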



# File 'lib/ucode/glyphs/monolith_page_map.rb', line 138

def page_count(monolith_path)
  out, status = Open3.capture2e("pdfinfo", monolith_path.to_s)
  return nil unless status.success?

  match = out.match(/^Pages:\s+(\d+)/)
  match ? match[1].to_i : nil
end
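
The parsing step can be exercised without running `pdfinfo` by pulling it into a pure function. The sketch below does that with the same regex as above; `pages_from_pdfinfo` is a hypothetical name and the sample text is an abbreviated, made-up stand-in for real `pdfinfo` output.

```ruby
# Hypothetical pure helper: extract the page count from pdfinfo-style output.
def pages_from_pdfinfo(out)
  match = out.match(/^Pages:\s+(\d+)/)
  match ? match[1].to_i : nil
end

sample = <<~OUT
  Producer:       example
  Pages:          3156
  Encrypted:      no
OUT

pages_from_pdfinfo(sample)          # => 3156
pages_from_pdfinfo("no such line")  # => nil
```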

.parse_bookmarks(dump, name_to_first_cp) ⇒ Array<MapEntry>

Pure: parse a `pdftk dump_data` string into a list of MapEntry rows (without end_pages). Exposed for unit tests and any caller that already has the dump cached.

Parameters:

  • dump (String)

    the raw `pdftk dump_data` output

  • name_to_first_cp (Hash{String => Integer})

Returns:

  • (Array<MapEntry>)



# File 'lib/ucode/glyphs/monolith_page_map.rb', line 72

def parse_bookmarks(dump, name_to_first_cp)
  entries = []
  current_title = nil
  dump.each_line do |line|
    case line
    when BookmarkTitleRegex
      current_title = Regexp.last_match(1).strip
    when BookmarkPageRegex
      page = Regexp.last_match(1).to_i
      cp = resolve_first_cp(current_title, name_to_first_cp)
      entries << MapEntry.new(first_cp: cp, start_page: page) if cp
      current_title = nil
    end
  end
  entries.sort_by(&:start_page)
end
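
The constants `BookmarkTitleRegex` and `BookmarkPageRegex` are referenced above but not defined on this page. The sketch below shows one plausible shape for them and runs the same title-then-page state machine over a sample dump. `TITLE_RE`, `PAGE_RE`, and `title_page_pairs` are hypothetical names, and the second bookmark is made up for illustration.

```ruby
# Assumed patterns; the class's real regex constants may differ.
TITLE_RE = /^BookmarkTitle:\s*(.+)$/
PAGE_RE  = /^BookmarkPageNumber:\s*(\d+)/

# Pair each BookmarkTitle with the BookmarkPageNumber that follows it.
def title_page_pairs(dump)
  pairs = []
  title = nil
  dump.each_line do |line|
    case line
    when TITLE_RE
      title = Regexp.last_match(1).strip
    when PAGE_RE
      pairs << [title, Regexp.last_match(1).to_i]
      title = nil # a page number closes the current bookmark
    end
  end
  pairs
end

dump = <<~DUMP
  BookmarkTitle: Greek and Coptic
  BookmarkLevel: 1
  BookmarkPageNumber: 415
  BookmarkTitle: Cyrillic
  BookmarkLevel: 1
  BookmarkPageNumber: 421
DUMP

title_page_pairs(dump) # => [["Greek and Coptic", 415], ["Cyrillic", 421]]
```

Lines that match neither pattern, such as `BookmarkLevel`, are ignored, so the loop tolerates any extra fields in the dump.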

.range_for(map, block_first_cp) ⇒ MapEntry?

Look up a block’s page range by its first codepoint. Returns nil when the map has no entry for that codepoint.

Parameters:

  • map (Hash{Integer => MapEntry})
  • block_first_cp (Integer)

Returns:

  • (MapEntry, nil)



# File 'lib/ucode/glyphs/monolith_page_map.rb', line 125

def range_for(map, block_first_cp)
  map[block_first_cp]
end
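
Because the lookup can return nil, and the last entry's `end_page` can be nil when the page count was unavailable, callers may want a guarded conversion to a Ruby `Range`. The sketch below is illustrative only: `PageEntry` stands in for `MapEntry`, `page_range_sketch` is a hypothetical helper, and the end page 420 is made up.

```ruby
# Stand-in for MapEntry; the field names are an assumption.
PageEntry = Struct.new(:first_cp, :start_page, :end_page, keyword_init: true)

# Return start_page..end_page for a block, or nil if the block is unmapped
# or its end page is unknown.
def page_range_sketch(map, block_first_cp)
  entry = map[block_first_cp]
  return nil unless entry && entry.end_page

  (entry.start_page..entry.end_page)
end

map = { 0x0370 => PageEntry.new(first_cp: 0x0370, start_page: 415, end_page: 420) }
page_range_sketch(map, 0x0370) # => 415..420
page_range_sketch(map, 0x0400) # => nil (no entry for this codepoint)
```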