Class: Ucode::Repo::CodepointWriter

Inherits:
Object
Includes:
AtomicWrites
Defined in:
lib/ucode/repo/codepoint_writer.rb

Overview

Writes one `index.json` per codepoint under `output/blocks/<id>/<cp>/`.

Streaming, threaded, idempotent, and atomic:

- **Streaming**: callers pass an Enumerator; the writer pulls one
  codepoint at a time and never holds the full 160k set in memory.
- **Threaded**: a fixed-size worker pool drains a shared queue.
  Each codepoint maps to a unique path → no per-file contention.
- **Idempotent**: existing files are byte-compared to the new
  payload before writing; identical content is a no-op. Safe to
  re-run on the full dataset.
- **Atomic**: writes go to `<path>.tmp`, then rename. A crash
  mid-write leaves either the old file or no file, never a
  truncated one.
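The streaming contract only asks callers for an Enumerator. Below is a minimal sketch of a caller-side producer. It assumes a hypothetical `id;block_id;name` line format and a stand-in `CodepointRecord` struct, and neither is part of this class. The point is that `Enumerator.new` parses a line only when the writer asks for the next record.

```ruby
# Hypothetical record type; the real codepoint class is not shown on this page.
CodepointRecord = Struct.new(:id, :block_id, :name)

# Wrap any line-oriented IO in a lazy Enumerator. Each line is parsed only
# when the consumer pulls the next element, so the dataset is never
# materialized as a whole.
def codepoint_stream(io)
  Enumerator.new do |yielder|
    io.each_line do |line|
      id, block_id, name = line.chomp.split(";", 3)
      next if id.nil? || id.empty? # skip blank lines

      # An empty block field becomes nil, like a codepoint with no block_id.
      block_id = nil if block_id.to_s.empty?
      yielder << CodepointRecord.new(id, block_id, name)
    end
  end
end
```

A caller would then hand something like `codepoint_stream(File.open(path))` to `#write_each`.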

Instance Method Summary

Methods included from AtomicWrites

#same_content?, #to_pretty_json, #write_atomic
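This page does not document the bodies of these mixin methods. The sketch below shows one way to build the idempotent, atomic write from the overview out of them. The module name `AtomicWritesSketch` is hypothetical. The return values are also assumptions: `true` when the file was written and `false` when it was skipped. They were chosen to match how `#write` treats the result of `write_atomic`.

```ruby
require "json"
require "fileutils"
require "pathname"

# Hypothetical stand-in for the AtomicWrites mixin.
module AtomicWritesSketch
  # Stable, human-readable JSON with a trailing newline, so serializing
  # the same data twice yields identical bytes.
  def to_pretty_json(data)
    JSON.pretty_generate(data) + "\n"
  end

  # True when the file already exists with exactly these bytes.
  def same_content?(path, payload)
    File.file?(path) && File.binread(path) == payload.b
  end

  # Returns true if the file was (re)written, false if skipped as identical.
  def write_atomic(path, payload)
    path = Pathname.new(path)
    return false if same_content?(path, payload)

    FileUtils.mkdir_p(path.dirname)
    tmp = Pathname.new("#{path}.tmp")
    File.binwrite(tmp, payload)
    # On POSIX filesystems, rename(2) within one filesystem replaces the
    # target in a single step: readers see the old file or the new one.
    File.rename(tmp, path)
    true
  end
end
```

Writing to a sibling `.tmp` file keeps the temporary on the same filesystem as the target. That is what lets the final rename be atomic.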

Constructor Details

#initialize(output_root, parallel_workers: 8) ⇒ CodepointWriter

Returns a new instance of CodepointWriter.

Parameters:

  • output_root (String, Pathname)
  • parallel_workers (Integer) (defaults to: 8)

    size of the worker pool used by #write_each. Set to 1 (or less) to run synchronously, which is useful in tests.



```ruby
# File 'lib/ucode/repo/codepoint_writer.rb', line 31

def initialize(output_root, parallel_workers: 8)
  @output_root = Pathname.new(output_root)
  @parallel_workers = parallel_workers
end
```

Instance Method Details

#write(codepoint) ⇒ Pathname?

Write one codepoint synchronously.

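Parameters:

  • codepoint

    the record to write; it must respond to #block_id and #id, both of which this method reads.
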
Returns:

  • (Pathname, nil)

    the path written, or nil if skipped (the codepoint has no block_id, or its content is identical to the existing file)



```ruby
# File 'lib/ucode/repo/codepoint_writer.rb', line 40

def write(codepoint)
  return nil if codepoint.block_id.nil?

  path = Paths.codepoint_json_path(@output_root, codepoint.block_id, codepoint.id)
  payload = serialize(codepoint)
  return nil unless write_atomic(path, payload)

  path
end
```
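`#write` hands path construction to `Paths.codepoint_json_path`, which is not shown here. The sketch below assumes that helper follows the `blocks/<id>/<cp>/index.json` layout from the overview, with `output_root` playing the role of `output/`. The helper names and the `CodepointRef` struct are hypothetical.

```ruby
require "pathname"

# Hypothetical stand-in for a codepoint record; only #id and #block_id matter here.
CodepointRef = Struct.new(:id, :block_id)

# Hypothetical equivalent of Paths.codepoint_json_path, assuming the
# <root>/blocks/<block_id>/<cp>/index.json layout. The real helper may
# format the codepoint id differently (for example as zero-padded hex).
def codepoint_json_path(output_root, block_id, codepoint_id)
  Pathname.new(output_root).join("blocks", block_id.to_s, codepoint_id.to_s, "index.json")
end

# Mirrors the guard at the top of #write: a codepoint with no block has no target.
def target_path_for(output_root, codepoint)
  return nil if codepoint.block_id.nil?

  codepoint_json_path(output_root, codepoint.block_id, codepoint.id)
end
```

Because each (block_id, id) pair yields a distinct path, concurrent writers never target the same file.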

#write_each(enum) ⇒ Integer

Drain an Enumerator through the worker pool, or inline on the calling thread when parallel_workers is 1 or less. Returns the total number of codepoints seen, whether or not each one was written.

Parameters:

  • enum (Enumerator)

    the codepoint stream to drain; it is consumed one element at a time.

Returns:

  • (Integer)


```ruby
# File 'lib/ucode/repo/codepoint_writer.rb', line 54

def write_each(enum)
  return drain_inline(enum) if @parallel_workers <= 1

  drain_threaded(enum)
end
```
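`drain_inline` and `drain_threaded` are private and not shown. Below is a minimal sketch of the same dispatch. It assumes a bounded `SizedQueue` feeding a fixed pool of threads. The free-standing method names and the `writer` argument are hypothetical, and error handling is deliberately minimal.

```ruby
# Hypothetical free-standing versions of the private drain helpers.
# `writer` is any object with a #write(record) method.

# Synchronous path: write each record on the calling thread.
def drain_inline(writer, enum)
  count = 0
  enum.each do |record|
    writer.write(record)
    count += 1
  end
  count
end

# Threaded path: a fixed pool of workers drains a bounded queue.
# Records must not be nil, because a nil pop signals "queue closed".
def drain_threaded(writer, enum, workers:)
  queue = SizedQueue.new(workers * 4) # at most workers * 4 buffered records
  pool = Array.new(workers) do
    Thread.new do
      # SizedQueue#pop returns nil once the queue is closed and empty.
      while (record = queue.pop)
        writer.write(record)
      end
    end
  end

  count = 0
  begin
    enum.each do |record|
      queue << record # blocks while the queue is full (back-pressure)
      count += 1
    end
  ensure
    queue.close
    pool.each(&:join) # re-raises any exception a worker died with
  end
  count
end

# Mirrors the dispatch in #write_each.
def write_each(writer, enum, parallel_workers: 8)
  return drain_inline(writer, enum) if parallel_workers <= 1

  drain_threaded(writer, enum, workers: parallel_workers)
end
```

Because the queue is bounded, the producer blocks whenever the workers fall behind. That keeps memory flat, as the streaming bullet promises. The sketch also relies on `writer.write` being safe to call concurrently. The overview argues this holds because every codepoint maps to its own path.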