Class: Ucode::Database

Inherits:
Object
Defined in:
lib/ucode/database.rb

Overview

SQLite-backed UCD lookup index for one Unicode version.

One Database instance = one `.sqlite3` file at `Cache.sqlite_path(version)`. The DB holds two range tables (`blocks` and `scripts`), each pre-coalesced during build.
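
The "pre-coalesced" step can be pictured as merging runs of adjacent codepoints that carry the same name into single ranges. The following is a minimal in-memory sketch of that idea, not the DbBuilder implementation; the `RangeEntry` field names `last_cp` and `name` and the `coalesce` helper are assumptions made for illustration.

```ruby
# Hypothetical stand-in for the library's RangeEntry. Only first_cp is
# named in this document; last_cp and name are assumed.
RangeEntry = Struct.new(:first_cp, :last_cp, :name)

# Merge contiguous codepoints that share a name into single ranges.
# Input: [codepoint, name] pairs sorted by codepoint.
def coalesce(pairs)
  pairs.each_with_object([]) do |(cp, name), out|
    last = out.last
    if last && last.name == name && last.last_cp + 1 == cp
      last.last_cp = cp                    # extend the current run
    else
      out << RangeEntry.new(cp, cp, name)  # start a new run
    end
  end
end

pairs = [[0x41, "Latn"], [0x42, "Latn"], [0x5B, "Zyyy"], [0x61, "Latn"], [0x62, "Latn"]]
ranges = coalesce(pairs)
# ranges holds three entries: 0x41..0x42 Latn, 0x5B..0x5B Zyyy, 0x61..0x62 Latn
```

Storing ranges rather than one row per codepoint is what would keep such a file small.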

Why SQLite (alongside the YAML Index):

  • Persistent across processes — build once, reuse across runs.

  • Btree-indexed queries load only the requested rows.

  • Small on disk (~hundreds of KB after coalescing).

Lifecycle:

Database.build(version)   # streams Coordinator output → SQLite
Database.open(version)    # opens existing SQLite (read-only)
Database.cached?(version) # checks for .sqlite3 file
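
Taken together, these three calls suggest an open-or-build pattern. The sketch below is one way a caller might combine them; `open_or_build` is a hypothetical helper, and the database class is passed in as an argument so the snippet stays runnable without the library loaded.

```ruby
# Hypothetical helper: reuse the cached SQLite file when present,
# otherwise build it first. `database_class` is expected to respond to
# cached?, open and build as listed in the lifecycle above.
def open_or_build(database_class, version)
  if database_class.cached?(version)
    database_class.open(version)
  else
    database_class.build(version)
  end
end

# Intended usage with the real class (requires the library; the
# version string is only an example value):
#   db = open_or_build(Ucode::Database, "16.0.0")
#   db.lookup_block(0x41)
#   db.close
```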

Constant Summary

SCHEMA_VERSION =
"1"

Constructor Details

#initialize(path) ⇒ Database

Returns a new instance of Database.

Parameters:

  • path (String)

    path to the .sqlite3 file



# File 'lib/ucode/database.rb', line 72

def initialize(path)
  @db = SQLite3::Database.new(path, readonly: true, results_as_hash: true)
  @db.busy_timeout = 5000
end

Class Method Details

.build(version) ⇒ Database

Stream the Coordinator output for `version` into a new SQLite cache, then open it. Replaces any existing file.

Parameters:

  • version (String)

Returns:

  • (Database)


# File 'lib/ucode/database.rb', line 58

def build(version)
  DbBuilder.build(version)
  open(version)
end

.cached?(version) ⇒ Boolean

True if a built SQLite cache exists for this version.

Parameters:

  • version (String)

Returns:

  • (Boolean)


# File 'lib/ucode/database.rb', line 66

def cached?(version)
  Cache.sqlite_path(version).exist?
end

.open(version) ⇒ Database

Open an existing database. Raises DatabaseMissingError if the file is absent, or DatabaseSchemaError if the on-disk schema version does not match `SCHEMA_VERSION`.

Parameters:

  • version (String)

Returns:

  • (Database)


# File 'lib/ucode/database.rb', line 40

def open(version)
  path = Cache.sqlite_path(version)
  unless path.exist?
    raise DatabaseMissingError.new(
      "No UCD SQLite cache for version #{version.inspect} at #{path}",
      context: { version: version, path: path.to_s },
    )
  end

  db = new(path.to_s)
  db.verify_schema_version!
  db
end
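
The `context:` keyword passed to `DatabaseMissingError.new` implies an error class that carries structured details alongside its message. How the library defines that class is not shown here; the sketch below is one conventional way to do it, with `ContextualError` and its `context` reader as hypothetical names.

```ruby
# Hypothetical base class: a StandardError that also stores a hash of
# structured details. The real error hierarchy may differ.
class ContextualError < StandardError
  attr_reader :context

  def initialize(message, context: {})
    super(message)
    @context = context
  end
end

begin
  raise ContextualError.new(
    "No cache at /tmp/example.sqlite3",
    context: { path: "/tmp/example.sqlite3" },
  )
rescue ContextualError => e
  missing_path = e.context[:path]  # structured detail, no message parsing
end
```

Keeping the path and version in a hash lets callers react programmatically (for example by triggering a build) without parsing the message string.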

Instance Method Details

#block_entries ⇒ Array<RangeEntry>

All block ranges, sorted by first_cp. Mostly useful in specs.

Returns:

  • (Array<RangeEntry>)


# File 'lib/ucode/database.rb', line 124

def block_entries
  entries(BLOCKS_TABLE)
end

#block_ranges_by_name(name) ⇒ Array<RangeEntry>

Every block range that shares the given block name. Empty for an unknown name. Used by the audit BlockAggregator to derive a block’s assigned-codepoint set and span without a separate canonical-range lookup.

Parameters:

  • name (String)

    block name as stored (e.g. “Basic_Latin”)

Returns:

  • (Array<RangeEntry>)


# File 'lib/ucode/database.rb', line 140

def block_ranges_by_name(name)
  ranges_by_name(BLOCKS_TABLE, name)
end
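
Because one block name may be stored as several ranges, a consumer can derive both the overall span and the assigned-codepoint count from the returned array. This is a minimal sketch of that derivation, not the BlockAggregator code; it assumes `RangeEntry` exposes `first_cp` and `last_cp` (only `first_cp` is named in this document), and `span_and_count` plus the sample ranges are made up for illustration.

```ruby
# Hypothetical stand-in for the library's RangeEntry.
RangeEntry = Struct.new(:first_cp, :last_cp, :name)

# Returns [span, assigned_count] for a list of same-name ranges, or
# [nil, 0] when the name was unknown and the list is empty.
def span_and_count(ranges)
  return [nil, 0] if ranges.empty?

  span = ranges.map(&:first_cp).min..ranges.map(&:last_cp).max
  count = ranges.sum { |r| r.last_cp - r.first_cp + 1 }  # bounds are inclusive
  [span, count]
end

sample = [
  RangeEntry.new(0x0370, 0x0377, "Example_Block"),
  RangeEntry.new(0x037A, 0x037F, "Example_Block"),
]
span, count = span_and_count(sample)
# span  => 0x0370..0x037F
# count => 14 (8 + 6; the gap 0x0378..0x0379 is not counted)
```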

#close ⇒ void

This method returns an undefined value.

Close the underlying SQLite connection. Idempotent.



# File 'lib/ucode/database.rb', line 154

def close
  @db.close
end

#each_block_overlapping(first, last, &block) ⇒ Enumerator<RangeEntry>

Enumerate every range in the blocks table that overlaps the inclusive query range, sorted by first_cp.

Parameters:

  • first (Integer)
  • last (Integer)

Returns:

  • (Enumerator<RangeEntry>)


# File 'lib/ucode/database.rb', line 109

def each_block_overlapping(first, last, &block)
  each_overlapping(BLOCKS_TABLE, first, last, &block)
end

#each_script_overlapping(first, last, &block) ⇒ Enumerator<RangeEntry>

Enumerate every range in the scripts table that overlaps the inclusive query range, sorted by first_cp.

Parameters:

  • first (Integer)
  • last (Integer)

Returns:

  • (Enumerator<RangeEntry>)


# File 'lib/ucode/database.rb', line 118

def each_script_overlapping(first, last, &block)
  each_overlapping(SCRIPTS_TABLE, first, last, &block)
end
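
Both overlap methods describe the usual inclusive-interval test: a stored range `[a, b]` overlaps the query `[first, last]` when `a <= last` and `b >= first`. The sketch below applies that predicate to an in-memory array and returns an Enumerator when no block is given, mirroring the documented return type. It is an illustration only; the private `each_overlapping` the library calls is not shown in this document, and the `last_cp` and `name` fields and this free-standing function are assumptions.

```ruby
# Hypothetical stand-in for the library's RangeEntry.
RangeEntry = Struct.new(:first_cp, :last_cp, :name)

# Yield every entry overlapping the inclusive range first..last,
# ordered by first_cp. Without a block, return an Enumerator.
def each_overlapping(entries, first, last)
  return enum_for(:each_overlapping, entries, first, last) unless block_given?

  entries
    .select { |e| e.first_cp <= last && e.last_cp >= first }
    .sort_by(&:first_cp)
    .each { |e| yield e }
end

entries = [
  RangeEntry.new(0x0100, 0x017F, "C"),
  RangeEntry.new(0x0000, 0x007F, "A"),
  RangeEntry.new(0x0080, 0x00FF, "B"),
]
names = each_overlapping(entries, 0x007F, 0x0080).map(&:name)
# names => ["A", "B"] (both query endpoints are inclusive)
```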

#lookup_block(codepoint) ⇒ String?

Look up the name of the block covering `codepoint`. Returns nil if the codepoint is not in any known block (typically because it is unassigned or outside the source fixture).

Parameters:

  • codepoint (Integer)

Returns:

  • (String, nil)


# File 'lib/ucode/database.rb', line 92

def lookup_block(codepoint)
  lookup(BLOCKS_TABLE, codepoint)
end
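
Conceptually, a point lookup finds the one range whose bounds enclose the codepoint. The sketch below shows the same idea over a sorted, non-overlapping in-memory array using `Array#bsearch`; it is an analogue of an indexed range query, not the library's SQL. The `last_cp` and `name` fields, this `lookup` signature, and the sample data are assumptions.

```ruby
# Hypothetical stand-in for the library's RangeEntry.
RangeEntry = Struct.new(:first_cp, :last_cp, :name)

# Return the name of the range containing `codepoint`, or nil.
# `sorted_entries` must be sorted by first_cp and non-overlapping.
def lookup(sorted_entries, codepoint)
  # Find-minimum mode: first entry whose upper bound reaches the codepoint.
  candidate = sorted_entries.bsearch { |e| e.last_cp >= codepoint }
  candidate.name if candidate && candidate.first_cp <= codepoint
end

blocks = [
  RangeEntry.new(0x0000, 0x007F, "Basic_Latin"),
  RangeEntry.new(0x0080, 0x00FF, "Latin_1_Supplement"),
  RangeEntry.new(0x0370, 0x03FF, "Greek_And_Coptic"),
]
in_block = lookup(blocks, 0x41)    # => "Basic_Latin"
in_gap   = lookup(blocks, 0x0200)  # => nil (a gap in this sample data)
```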

#lookup_script(codepoint) ⇒ String?

Look up the name of the script covering `codepoint`. Returns nil if the codepoint is not in any known script.

Parameters:

  • codepoint (Integer)

Returns:

  • (String, nil)


# File 'lib/ucode/database.rb', line 100

def lookup_script(codepoint)
  lookup(SCRIPTS_TABLE, codepoint)
end

#schema_version ⇒ String

Returns the schema version recorded at build time.

Returns:

  • (String)

    the schema version recorded at build time.



# File 'lib/ucode/database.rb', line 83

def schema_version
  @schema_version ||= meta("schema_version")
end

#script_entries ⇒ Array<RangeEntry>

All script ranges, sorted by first_cp. Mostly useful in specs.

Returns:

  • (Array<RangeEntry>)


# File 'lib/ucode/database.rb', line 130

def script_entries
  entries(SCRIPTS_TABLE)
end

#script_ranges_by_name(name) ⇒ Array<RangeEntry>

Every script range that shares the given script code. Empty for an unknown code. Used by the audit ScriptAggregator.

Parameters:

  • name (String)

    ISO 15924 script code (e.g. “Latn”)

Returns:

  • (Array<RangeEntry>)


# File 'lib/ucode/database.rb', line 148

def script_ranges_by_name(name)
  ranges_by_name(SCRIPTS_TABLE, name)
end

#ucd_version ⇒ String

Returns the UCD version this DB was built from.

Returns:

  • (String)

    the UCD version this DB was built from.



# File 'lib/ucode/database.rb', line 78

def ucd_version
  @ucd_version ||= meta("ucd_version")
end

#verify_schema_version! ⇒ void

This method returns an undefined value.

Raises DatabaseSchemaError if the on-disk schema version does not match `SCHEMA_VERSION`. Called by `.open`; exposed for consumers that hold a long-lived connection.



# File 'lib/ucode/database.rb', line 162

def verify_schema_version!
  return if schema_version == SCHEMA_VERSION

  raise DatabaseSchemaError.new(
    "SQLite schema mismatch: on-disk #{schema_version.inspect}, " \
    "expected #{SCHEMA_VERSION.inspect}",
    context: { on_disk: schema_version, expected: SCHEMA_VERSION },
  )
end