Class: Ucode::Audit::UniversalSetReference

Inherits:
CoverageReference
Defined in:
lib/ucode/audit/universal_set_reference.rb

Overview

CoverageReference backed by a universal-set manifest (TODO 24). Every codepoint in the set carries tier + source provenance, so a missing-codepoint report can answer "what does the missing glyph look like, and where did the universal set source it from?".

The manifest itself records codepoints but not block membership, so a Database is still required to map block name -> assigned codepoints. The reference answers per codepoint "is this in the universal set, and what tier/source did it come from?".

Constructor Details

#initialize(manifest:, database:) ⇒ UniversalSetReference

Returns a new instance of UniversalSetReference.

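Parameters:

  • manifest

    source of the universal-set manifest; stored as @manifest_source and loaded lazily by #manifest

  • database (Ucode::Database, nil)

    the UCD database used for block lookups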



# File 'lib/ucode/audit/universal_set_reference.rb', line 24

def initialize(manifest:, database:)
  super()
  @manifest_source = manifest
  @database = database
end

Instance Attribute Details

#database ⇒ Ucode::Database? (readonly)

The UCD database used for block lookups. Exposed so the BlockAggregator can map codepoints -> block names through the same Database instance the reference was built against.

Returns:

  • (Ucode::Database, nil)



# File 'lib/ucode/audit/universal_set_reference.rb', line 92

def database
  @database
end

Instance Method Details

#baseline_metadata ⇒ Hash{String=>Object}

Returns provenance metadata for the audit report's baseline field.

Returns:

  • (Hash{String=>Object})

    provenance metadata for the audit report's baseline field



# File 'lib/ucode/audit/universal_set_reference.rb', line 66

def baseline_metadata
  {
    "unicode_version" => manifest.unicode_version,
    "ucode_version" => manifest.ucode_version,
    "source_config_sha256" => manifest.source_config_sha256,
    "reference_kind" => "universal-set",
  }
end

#block_name_for(codepoint) ⇒ String?

Block name (verbatim Unicode identifier, e.g. "Basic_Latin") the codepoint falls under, or nil if it isn't in any known block. Used by BlockAggregator to group a font's cmap by block without needing direct access to the underlying Database.

Parameters:

  • codepoint (Integer)

Returns:

  • (String, nil)


# File 'lib/ucode/audit/universal_set_reference.rb', line 41

def block_name_for(codepoint)
  return nil if @database.nil?

  @database.lookup_block(codepoint)
end
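
A minimal sketch of how a caller such as BlockAggregator might group a font's cmap through this method. StubReference, its two-block table, and group_by_block are hypothetical stand-ins for illustration, not part of Ucode; the real lookup goes through the Database.

```ruby
# Hypothetical stand-in for a reference; it knows only two blocks.
class StubReference
  BLOCKS = {
    "Basic_Latin" => (0x0000..0x007F),
    "Greek_and_Coptic" => (0x0370..0x03FF),
  }.freeze

  # Returns the block name covering the codepoint, or nil if none does.
  def block_name_for(codepoint)
    BLOCKS.each { |name, range| return name if range.cover?(codepoint) }
    nil
  end
end

# Hypothetical helper: buckets codepoints by block name.
# Codepoints outside every known block land under the nil key.
def group_by_block(reference, codepoints)
  codepoints.group_by { |cp| reference.block_name_for(cp) }
end

grouped = group_by_block(StubReference.new, [0x41, 0x3B1, 0x42, 0xE000])
grouped.each { |block, cps| puts "#{block.inspect}: #{cps.length}" }
```

The nil bucket gives callers one place to collect codepoints the reference cannot attribute to any block.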

#entries_for_block(block_id) ⇒ Array<Entry>

Every assigned codepoint in the block, with tier + source attached when the reference carries provenance.

Parameters:

  • block_id (String)

    verbatim Unicode block name (e.g. "Basic_Latin", "Greek_and_Coptic")

Returns:

  • (Array<Entry>)

    sorted by codepoint; empty for unknown block names or blocks with no assigned codepoints



# File 'lib/ucode/audit/universal_set_reference.rb', line 48

def entries_for_block(block_id)
  return [] if @database.nil?

  ranges = @database.block_ranges_by_name(block_id)
  return [] if ranges.nil? || ranges.empty?

  ranges.flat_map { |r| expand_range(r) }.compact
end
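
The private expand_range helper is not shown on this page. The sketch below assumes it yields one Entry per assigned codepoint and nil for unassigned ones, which would explain the trailing compact. Entry, PROVENANCE, and entries_for are hypothetical stand-ins with made-up tier and source values.

```ruby
# Hypothetical entry shape: a codepoint plus its tier and source.
Entry = Struct.new(:codepoint, :tier, :source, keyword_init: true)

# Made-up provenance table; U+0043 is deliberately left out.
PROVENANCE = {
  0x41 => ["tier-1", "font-a"],
  0x42 => ["tier-1", "font-a"],
  0x44 => ["tier-2", "font-b"],
}.freeze

# Assumed behaviour: one Entry per known codepoint, nil otherwise.
def expand_range(range)
  range.map do |cp|
    row = PROVENANCE[cp]
    row && Entry.new(codepoint: cp, tier: row[0], source: row[1])
  end
end

# Same shape as entries_for_block, taking the ranges directly.
def entries_for(ranges)
  return [] if ranges.nil? || ranges.empty?

  ranges.flat_map { |r| expand_range(r) }.compact
end

entries_for([0x41..0x44]).each do |e|
  puts format("U+%04X %s %s", e.codepoint, e.tier, e.source)
end
```

Under this assumption the result is sorted by codepoint only if the ranges themselves arrive in ascending order.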

#include?(codepoint) ⇒ Boolean

Returns true if the codepoint is in the reference set.

Parameters:

  • codepoint (Integer)

Returns:

  • (Boolean)

    true if the codepoint is in the reference set



# File 'lib/ucode/audit/universal_set_reference.rb', line 36

def include?(codepoint)
  entries_by_cp.key?(codepoint)
end

#kind ⇒ Symbol

Returns :universal_set.

Returns:

  • (Symbol)

    :universal_set



# File 'lib/ucode/audit/universal_set_reference.rb', line 31

def kind
  :universal_set
end

#manifest ⇒ Ucode::Models::UniversalSetManifest

The underlying manifest model, loaded lazily from disk.



# File 'lib/ucode/audit/universal_set_reference.rb', line 84

def manifest
  @manifest ||= load_manifest
end
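
The ||= memoization means the manifest is loaded on first access and reused afterwards. A small sketch with a hypothetical LazyHolder class whose load_manifest builds a Hash instead of reading from disk:

```ruby
# Hypothetical class showing the same lazy-load pattern.
class LazyHolder
  attr_reader :load_count

  def initialize(source)
    @source = source
    @load_count = 0
  end

  # Loads on first call; later calls return the cached object.
  def manifest
    @manifest ||= load_manifest
  end

  private

  # Stand-in loader: counts invocations rather than touching the disk.
  def load_manifest
    @load_count += 1
    { "source" => @source }
  end
end

holder = LazyHolder.new("manifest.yml")
holder.manifest
holder.manifest
puts holder.load_count
```

Constructing the object stays cheap this way, and a bad or missing source surfaces only when the manifest is first read.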

#provenance_for(codepoints) ⇒ Array<Hash{Symbol=>Object}>?

Provenance rows for a list of codepoints, or nil when the reference carries no provenance (UCD-only). Returning nil (rather than an empty array) is the signal that the audit report should omit the missing_codepoint_provenance field entirely — preserving the legacy wire shape for UCD-only audits.

Parameters:

  • codepoints (Enumerable<Integer>)

Returns:

  • (Array<Hash{Symbol=>Object}>, nil)

    one hash per codepoint with :codepoint, :tier, :source keys; or nil

  • (Array<Hash{Symbol=>Object}>)

    one hash per codepoint, in input order



# File 'lib/ucode/audit/universal_set_reference.rb', line 78

def provenance_for(codepoints)
  codepoints.map { |cp| row_for(cp) }
end
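
The nil-versus-array contract can be sketched from the consumer side. Both reference classes and build_report are hypothetical, as are the tier and source values; the sketch only shows how a nil return keeps the field out of the report.

```ruby
# Hypothetical reference that carries provenance.
class ProvenanceReference
  def provenance_for(codepoints)
    codepoints.map { |cp| { codepoint: cp, tier: "tier-1", source: "example-font" } }
  end
end

# Hypothetical UCD-only reference: no provenance to report.
class UcdOnlyReference
  def provenance_for(_codepoints)
    nil
  end
end

# Hypothetical report builder: adds the field only for non-nil rows.
def build_report(reference, missing)
  report = { "missing_codepoints" => missing }
  rows = reference.provenance_for(missing)
  report["missing_codepoint_provenance"] = rows unless rows.nil?
  report
end

puts build_report(ProvenanceReference.new, [0x41]).keys.inspect
puts build_report(UcdOnlyReference.new, [0x41]).keys.inspect
```

An empty array would still emit the key, so nil is the only value that preserves the legacy shape.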

#reference_id ⇒ String

Stable identifier for the reference, embedded in audit reports so consumers can detect drift. Examples:

"ucd:17.0.0"
"universal-set:17.0.0:abc123456789"

Returns:

  • (String)


# File 'lib/ucode/audit/universal_set_reference.rb', line 58

def reference_id
  sha = manifest.source_config_sha256
  short_sha = sha ? sha.to_s[0, 12] : "no-sha"
  "universal-set:#{manifest.unicode_version}:#{short_sha}"
end
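
A minimal sketch of how the identifier is assembled, reproducing the method body against a stand-in manifest. FakeManifest and reference_id_for are hypothetical names used only for illustration, and the SHA value is made up.

```ruby
# Stand-in for the manifest model; only the two fields the id uses.
FakeManifest = Struct.new(:unicode_version, :source_config_sha256, keyword_init: true)

# Mirrors the reference_id body: the first 12 characters of the config
# SHA-256, or "no-sha" when the manifest carries none.
def reference_id_for(manifest)
  sha = manifest.source_config_sha256
  short_sha = sha ? sha.to_s[0, 12] : "no-sha"
  "universal-set:#{manifest.unicode_version}:#{short_sha}"
end

with_sha = FakeManifest.new(
  unicode_version: "17.0.0",
  source_config_sha256: "abc123456789def0" + ("0" * 48)
)
without_sha = FakeManifest.new(unicode_version: "17.0.0", source_config_sha256: nil)

puts reference_id_for(with_sha)    # universal-set:17.0.0:abc123456789
puts reference_id_for(without_sha) # universal-set:17.0.0:no-sha
```

Because the id embeds both the Unicode version and a prefix of the config hash, a change to either produces a different identifier, which is what lets consumers detect drift.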