Class: Ucode::Repo::BlockFeedEmitter

Inherits:
Object
Includes:
AtomicWrites
Defined in:
lib/ucode/repo/block_feed_emitter.rb

Overview

Emits a flat, per-block Unicode data feed from ucode's canonical output tree. The feed is denormalized: each block file inlines all of its codepoints, so no joins are needed at read time.

Three kinds of file are emitted under output_root (the per-block file is written once per block):

unicode-blocks.json
[{ start, end, name, unicode_version }, ...]

unicode/blocks/<slug>.json
{ chars: [{ cp, n, c, s, cc?, bc?, mir? }, ...] }

unicode-version.json
{ version, blockCount, charCount, generatedAt }
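As a hedged illustration of how the unicode-version.json counts relate to the per-block output, the sketch below derives that payload from a list of per-block results. The helper name build_version_payload is hypothetical, as is the ISO 8601 UTC format assumed for generatedAt; the :char_count key mirrors the one #emit sums.

```ruby
require "json"
require "time"

# Hypothetical helper: derives the unicode-version.json payload from the
# per-block results. Assumes each entry carries :char_count (the key #emit
# sums) and that generatedAt is an ISO 8601 UTC timestamp.
def build_version_payload(ucd_version, per_block, now: Time.now)
  {
    "version" => ucd_version,
    "blockCount" => per_block.length,
    "charCount" => per_block.sum { |b| b[:char_count] },
    "generatedAt" => now.getutc.iso8601, # getutc leaves `now` unmodified
  }
end

per_block = [{ char_count: 128 }, { char_count: 96 }]
payload = build_version_payload("17.0.0", per_block, now: Time.utc(2025, 1, 2, 3, 4, 5))
puts JSON.pretty_generate(payload)
```

Passing `now:` explicitly keeps the payload reproducible in tests; in normal use it defaults to the current time.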

This emitter reads ucode's canonical output (blocks/index.json, blocks//index.json, index/labels.json) and translates it into the feed's shapes. ucode stays canonical; the feed is derived from it one-way.
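#emit calls two private helpers, ucode_path and load_json, whose bodies are not shown on this page. A minimal sketch of what they might look like, assuming paths are joined under the ucode output root and files are parsed with Ruby's standard JSON library; CanonicalReader is a hypothetical class name, and the sample index entry is made up.

```ruby
require "fileutils"
require "json"
require "pathname"
require "tmpdir"

# Hypothetical sketch of the readers #emit relies on (ucode_path, load_json).
class CanonicalReader
  def initialize(ucode_output_root)
    # Accept String or Pathname, as the emitter's constructor does.
    @ucode_root = Pathname.new(ucode_output_root)
  end

  # ucode_path("index", "labels.json") => <root>/index/labels.json
  def ucode_path(*parts)
    @ucode_root.join(*parts)
  end

  # Parse one canonical JSON file; a missing file raises Errno::ENOENT.
  def load_json(path)
    JSON.parse(File.read(path))
  end
end

Dir.mktmpdir do |root|
  reader = CanonicalReader.new(root)
  path = reader.ucode_path("blocks", "index.json")
  FileUtils.mkdir_p(path.dirname)
  # Made-up sample entry; the real index entry shape is not shown here.
  File.write(path, JSON.generate([{ "name" => "Basic Latin" }]))
  puts reader.load_json(path).first["name"]
end
```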

Block slug algorithm (matches common practice; no consumer assumptions baked in):

name.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/^-|-$/, "")
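The slug rule can be exercised directly to see which per-block file each name maps to. The block_slug wrapper below is a hypothetical name; its body is the expression quoted above.

```ruby
# Hypothetical wrapper around the slug expression quoted above.
def block_slug(name)
  # Collapse every run of non [a-z0-9] characters into a single hyphen,
  # then drop one leading and one trailing hyphen.
  name.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/^-|-$/, "")
end

["Basic Latin", "Greek and Coptic", "Latin-1 Supplement"].each do |name|
  puts "#{name} -> unicode/blocks/#{block_slug(name)}.json"
end
# Basic Latin -> unicode/blocks/basic-latin.json
# Greek and Coptic -> unicode/blocks/greek-and-coptic.json
# Latin-1 Supplement -> unicode/blocks/latin-1-supplement.json
```

In Ruby, `^` and `$` match at line boundaries rather than string boundaries; for single-line block names this behaves the same as `\A` and `\z`.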

Block display name uses Unicode's verbatim spacing (e.g. "Basic Latin", "Greek and Coptic") from ucode's canonical name.

The shape of this feed is documented in schema/block-feed.output.schema.yml — that YAML is the canonical contract for any consumer of the feed.

Instance Method Summary

Methods included from AtomicWrites

#same_content?, #to_pretty_json, #write_atomic
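The AtomicWrites mixin's method bodies are not documented here. The sketch below shows one common way to implement the write-to-temp-then-rename pattern its method names suggest; AtomicWritesSketch, FeedWriter, and the argument lists are assumptions, not the mixin's actual API.

```ruby
require "fileutils"
require "json"
require "pathname"
require "tmpdir"

# Hypothetical stand-in for the AtomicWrites mixin; method names match the
# documented ones, but the argument lists here are assumptions.
module AtomicWritesSketch
  # Pretty-printed JSON; the trailing newline is an assumption.
  def to_pretty_json(obj)
    JSON.pretty_generate(obj) + "\n"
  end

  # True when the file already exists with byte-identical content.
  def same_content?(path, content)
    File.exist?(path) && File.binread(path) == content.b
  end

  # Writes content to a sibling temp file, fsyncs it, then renames it over
  # the target so readers never observe a half-written file.
  # Returns false (and writes nothing) when the content is unchanged.
  def write_atomic(path, content)
    path = Pathname.new(path)
    return false if same_content?(path, content)

    path.dirname.mkpath
    tmp = path.dirname.join(".#{path.basename}.#{Process.pid}.tmp")
    begin
      File.open(tmp, "wb") do |f|
        f.write(content)
        f.fsync
      end
      File.rename(tmp, path) # atomic on POSIX within one filesystem
    ensure
      File.delete(tmp) if File.exist?(tmp) # only present if something failed
    end
    true
  end
end

class FeedWriter
  include AtomicWritesSketch
end

Dir.mktmpdir do |dir|
  writer = FeedWriter.new
  target = File.join(dir, "unicode-version.json")
  json = writer.to_pretty_json({ "version" => "17.0.0" })
  puts writer.write_atomic(target, json) # first call writes the file
  puts writer.write_atomic(target, json) # second call is skipped
end
```

Creating the temp file in the target's own directory keeps it on the same filesystem, which is what makes the rename atomic; the same_content? check avoids rewriting files (and bumping their mtimes) when regeneration produces identical output.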

Constructor Details

#initialize(ucode_output_root, output_root) ⇒ BlockFeedEmitter

Returns a new instance of BlockFeedEmitter.

Parameters:

  • ucode_output_root (String, Pathname)

    ucode's canonical output directory (output/)

  • output_root (String, Pathname)

    target directory; unicode-blocks.json, unicode-version.json, and unicode/ are written here.



# File 'lib/ucode/repo/block_feed_emitter.rb', line 48

def initialize(ucode_output_root, output_root)
  @ucode_root = Pathname.new(ucode_output_root)
  @output_root = Pathname.new(output_root)
end

Instance Method Details

#emit(ucd_version:) ⇒ Hash

Returns a summary Hash: { blocks_written:, codepoints_written:, unicode_blocks_path:, unicode_version_path:, version: }.

Parameters:

  • ucd_version (String)

    e.g. "17.0.0"

Returns:

  • (Hash)

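    { blocks_written:, codepoints_written:, unicode_blocks_path:, unicode_version_path:, version: }, where the two paths are Pathname objects under output_root and version is the payload returned by write_unicode_version.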



# File 'lib/ucode/repo/block_feed_emitter.rb', line 56

def emit(ucd_version:)
  labels = load_json(ucode_path("index", "labels.json"))
  blocks_index = load_json(ucode_path("blocks", "index.json"))

  per_block = blocks_index.map do |entry|
    emit_block(entry, labels)
  end

  write_unicode_blocks(per_block)
  version_payload = write_unicode_version(ucd_version, per_block)

  {
    blocks_written: per_block.length,
    codepoints_written: per_block.sum { |b| b[:char_count] },
    unicode_blocks_path: @output_root.join("unicode-blocks.json"),
    unicode_version_path: @output_root.join("unicode-version.json"),
    version: version_payload,
  }
end