Class: Ucode::Database
- Inherits: Object
- Defined in: lib/ucode/database.rb
Overview
SQLite-backed UCD lookup index for one Unicode version.
One Database instance = one `.sqlite3` file at `Cache.sqlite_path(version)`. The DB holds two range tables (`blocks` and `scripts`), each pre-coalesced during build.
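Pre-coalescing can be pictured as merging runs of consecutive codepoints that share a name into a single row. The following is a minimal sketch in plain Ruby, not the gem's actual builder; `CoalescedSpan`, its `last_cp` and `name` fields, and the `coalesce` helper are hypothetical names chosen for illustration.

```ruby
# Hypothetical row shape; only first_cp is named in the docs on this page.
CoalescedSpan = Struct.new(:first_cp, :last_cp, :name)

# Merge consecutive codepoints with the same name into inclusive ranges.
# `pairs` is an array of [codepoint, name] tuples.
def coalesce(pairs)
  pairs.sort_by(&:first).each_with_object([]) do |(cp, name), out|
    tail = out.last
    if tail && tail.name == name && tail.last_cp + 1 == cp
      tail.last_cp = cp # extend the current run
    else
      out << CoalescedSpan.new(cp, cp, name) # start a new run
    end
  end
end

spans = coalesce([[0x41, "Latn"], [0x42, "Latn"], [0x43, "Latn"], [0x3B1, "Grek"]])
spans.each { |s| puts format("%04X..%04X %s", s.first_cp, s.last_cp, s.name) }
```

Three consecutive Latin codepoints collapse into one range here, which is why the stored tables can stay small.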
Why SQLite (alongside the YAML Index):
- Persistent across processes: build once, reuse across runs.
- Btree-indexed queries load only the requested rows.
- Small on disk (~hundreds of KB after coalescing).
Lifecycle:
Database.build(version) # streams Coordinator output → SQLite
Database.open(version) # opens existing SQLite (read-only)
Database.cached?(version) # checks for .sqlite3 file
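A caller will often want "open if cached, otherwise build". The sketch below shows that branch against any object that responds to `cached?`, `open`, and `build`, as the three class methods above do. `open_or_build`, `FakeStore`, and the version string format are assumptions made so the example runs without the gem.

```ruby
# Return an open handle, building the cache first when it is missing.
# `store` is anything that responds to cached?, open, and build.
def open_or_build(store, version)
  store.cached?(version) ? store.open(version) : store.build(version)
end

# In-memory stand-in (hypothetical) so the sketch runs without the gem.
class FakeStore
  attr_reader :calls

  def initialize
    @built = {}
    @calls = []
  end

  def cached?(version)
    @built.key?(version)
  end

  # Mirrors the documented behaviour: build, then open.
  def build(version)
    @calls << :build
    @built[version] = true
    open(version)
  end

  def open(version)
    @calls << :open
    "db-#{version}"
  end
end

store = FakeStore.new
open_or_build(store, "16.0.0") # first call builds, then opens
open_or_build(store, "16.0.0") # second call only opens
p store.calls
```

With the real class, the same helper would presumably be called as `open_or_build(Ucode::Database, version)`.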
Constant Summary
- SCHEMA_VERSION = "1"
Class Method Summary
- .build(version) ⇒ Database
  Stream the Coordinator output for `version` into a new SQLite cache, then open it.
- .cached?(version) ⇒ Boolean
  True if a built SQLite cache exists for this version.
- .open(version) ⇒ Database
  Open an existing database.
Instance Method Summary
- #block_entries ⇒ Array<RangeEntry>
  All block ranges, sorted by first_cp.
- #block_ranges_by_name(name) ⇒ Array<RangeEntry>
  Every block range that shares the given block name.
- #close ⇒ void
  Close the underlying SQLite connection.
- #each_block_overlapping(first, last, &block) ⇒ Enumerator<RangeEntry>
  Enumerate every range in the blocks table that overlaps the inclusive query range, sorted by first_cp.
- #each_script_overlapping(first, last, &block) ⇒ Enumerator<RangeEntry>
  Enumerate every range in the scripts table that overlaps the inclusive query range, sorted by first_cp.
- #initialize(path) ⇒ Database (constructor)
  A new instance of Database.
- #lookup_block(codepoint) ⇒ String?
  Look up the block name covering `codepoint`.
- #lookup_script(codepoint) ⇒ String?
  Look up the script name covering `codepoint`.
- #schema_version ⇒ String
  The schema version recorded at build time.
- #script_entries ⇒ Array<RangeEntry>
  All script ranges, sorted by first_cp.
- #script_ranges_by_name(name) ⇒ Array<RangeEntry>
  Every script range that shares the given script code.
- #ucd_version ⇒ String
  The UCD version this DB was built from.
- #verify_schema_version! ⇒ void
  Raises DatabaseSchemaError if the on-disk schema version does not match `SCHEMA_VERSION`.
Constructor Details
#initialize(path) ⇒ Database
Returns a new instance of Database.
# File 'lib/ucode/database.rb', line 72
def initialize(path)
  @db = SQLite3::Database.new(path, readonly: true, results_as_hash: true)
  @db.busy_timeout = 5000
end
Class Method Details
.build(version) ⇒ Database
Stream the Coordinator output for `version` into a new SQLite cache, then open it. Replaces any existing file.
# File 'lib/ucode/database.rb', line 58
def build(version)
  DbBuilder.build(version)
  open(version)
end
.cached?(version) ⇒ Boolean
True if a built SQLite cache exists for this version.
# File 'lib/ucode/database.rb', line 66
def cached?(version)
  Cache.sqlite_path(version).exist?
end
.open(version) ⇒ Database
Open an existing database. Raises DatabaseMissingError if the file is absent, and DatabaseSchemaError if the on-disk schema version does not match `SCHEMA_VERSION`.
# File 'lib/ucode/database.rb', line 40
def open(version)
  path = Cache.sqlite_path(version)
  unless path.exist?
    raise DatabaseMissingError.new(
      "No UCD SQLite cache for version #{version.inspect} at #{path}",
      context: { version: version, path: path.to_s },
    )
  end

  db = new(path.to_s)
  db.verify_schema_version!
  db
end
Instance Method Details
#block_entries ⇒ Array<RangeEntry>
All block ranges, sorted by first_cp. Mostly useful in specs.
# File 'lib/ucode/database.rb', line 124
def block_entries
  entries(BLOCKS_TABLE)
end
#block_ranges_by_name(name) ⇒ Array<RangeEntry>
Every block range that shares the given block name. Empty for an unknown name. Used by the audit BlockAggregator to derive a block’s assigned-codepoint set and span without a separate canonical-range lookup.
# File 'lib/ucode/database.rb', line 140
def block_ranges_by_name(name)
  ranges_by_name(BLOCKS_TABLE, name)
end
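To illustrate how a span and an assigned-codepoint count could be derived from the ranges sharing one name, here is a small sketch. It is not the BlockAggregator's code; `NamedSpan`, its `last_cp` and `name` fields, and `summarize` are hypothetical, and the two sample ranges are only an illustrative subset of one block.

```ruby
# Hypothetical row shape; only first_cp is named in the docs on this page.
NamedSpan = Struct.new(:first_cp, :last_cp, :name)

# Summarize ranges that share one name: overall span and assigned count.
# Returns nil for an empty list (for example, an unknown name).
def summarize(ranges)
  return nil if ranges.empty?

  {
    span: (ranges.map(&:first_cp).min..ranges.map(&:last_cp).max),
    assigned: ranges.sum { |r| r.last_cp - r.first_cp + 1 },
  }
end

ranges = [
  NamedSpan.new(0x0370, 0x0377, "Greek and Coptic"),
  NamedSpan.new(0x037A, 0x037F, "Greek and Coptic"),
]
summary = summarize(ranges)
puts format("span %04X..%04X, %d assigned",
            summary[:span].first, summary[:span].last, summary[:assigned])
```

The gap between the two ranges counts toward the span but not toward the assigned total.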
#close ⇒ void
This method returns an undefined value.
Close the underlying SQLite connection. Idempotent.
# File 'lib/ucode/database.rb', line 154
def close
  @db.close
end
#each_block_overlapping(first, last, &block) ⇒ Enumerator<RangeEntry>
Enumerate every range in the blocks table that overlaps the inclusive query range, sorted by first_cp.
# File 'lib/ucode/database.rb', line 109
def each_block_overlapping(first, last, &block)
  each_overlapping(BLOCKS_TABLE, first, last, &block)
end
#each_script_overlapping(first, last, &block) ⇒ Enumerator<RangeEntry>
Enumerate every range in the scripts table that overlaps the inclusive query range, sorted by first_cp.
# File 'lib/ucode/database.rb', line 118
def each_script_overlapping(first, last, &block)
  each_overlapping(SCRIPTS_TABLE, first, last, &block)
end
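The overlap test for two inclusive ranges is that each one starts no later than the other ends. The private `each_overlapping` helper is not shown on this page, so the following is only an in-memory sketch of that predicate. It follows the block-or-enumerator calling convention that the `Enumerator` return type suggests, which is itself an assumption. `BlockSpan`, its `last_cp` and `name` fields, and `OverlapSketch` are hypothetical names.

```ruby
# Hypothetical row shape; only first_cp is named in the docs on this page.
BlockSpan = Struct.new(:first_cp, :last_cp, :name)

module OverlapSketch
  # Yield every span overlapping the inclusive range first..last,
  # in first_cp order. Returns an Enumerator when no block is given.
  def self.each_overlapping(spans, first, last)
    return enum_for(:each_overlapping, spans, first, last) unless block_given?

    spans.sort_by(&:first_cp).each do |span|
      yield span if span.first_cp <= last && span.last_cp >= first
    end
  end
end

blocks = [
  BlockSpan.new(0x0100, 0x017F, "Latin Extended-A"),
  BlockSpan.new(0x0000, 0x007F, "Basic Latin"),
  BlockSpan.new(0x0080, 0x00FF, "Latin-1 Supplement"),
]

hits = OverlapSketch.each_overlapping(blocks, 0x70, 0x90).map(&:name)
puts hits.inspect
```

A SQL-backed version would presumably express the same two comparisons in a `WHERE` clause with `ORDER BY first_cp`.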
#lookup_block(codepoint) ⇒ String?
Look up the block name covering `codepoint`. Returns nil if the codepoint is not in any known block (typically because it is unassigned or outside the source fixture).
# File 'lib/ucode/database.rb', line 92
def lookup_block(codepoint)
  lookup(BLOCKS_TABLE, codepoint)
end
#lookup_script(codepoint) ⇒ String?
Look up the script name covering `codepoint`. Returns nil if the codepoint is not in any known script.
# File 'lib/ucode/database.rb', line 100
def lookup_script(codepoint)
  lookup(SCRIPTS_TABLE, codepoint)
end
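A point lookup over sorted, non-overlapping ranges can be done with a binary search, which is roughly the work a btree index saves the caller. The private `lookup` helper is not shown on this page, so this is an in-memory sketch under that assumption. `LookupSpan`, its `last_cp` and `name` fields, and `name_covering` are hypothetical names.

```ruby
# Hypothetical row shape; only first_cp is named in the docs on this page.
LookupSpan = Struct.new(:first_cp, :last_cp, :name)

# Return the name of the span covering `codepoint`, or nil if none does.
# `sorted_spans` must be sorted by first_cp and must not overlap.
def name_covering(sorted_spans, codepoint)
  # Find-minimum bsearch: first span whose end is at or after the codepoint.
  candidate = sorted_spans.bsearch { |span| span.last_cp >= codepoint }
  candidate.name if candidate && candidate.first_cp <= codepoint
end

blocks = [
  LookupSpan.new(0x0000, 0x007F, "Basic Latin"),
  LookupSpan.new(0x0080, 0x00FF, "Latin-1 Supplement"),
  LookupSpan.new(0x0370, 0x03FF, "Greek and Coptic"),
]

puts name_covering(blocks, 0x03B1).inspect
puts name_covering(blocks, 0x0100).inspect
```

The second call lands in a gap of this sample table, so it yields nil, matching the documented nil-on-miss behaviour.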
#schema_version ⇒ String
Returns the schema version recorded at build time.
# File 'lib/ucode/database.rb', line 83
def schema_version
  @schema_version ||= ("schema_version")
end
#script_entries ⇒ Array<RangeEntry>
All script ranges, sorted by first_cp. Mostly useful in specs.
# File 'lib/ucode/database.rb', line 130
def script_entries
  entries(SCRIPTS_TABLE)
end
#script_ranges_by_name(name) ⇒ Array<RangeEntry>
Every script range that shares the given script code. Empty for an unknown name. Used by the audit ScriptAggregator.
# File 'lib/ucode/database.rb', line 148
def script_ranges_by_name(name)
  ranges_by_name(SCRIPTS_TABLE, name)
end
#ucd_version ⇒ String
Returns the UCD version this DB was built from.
# File 'lib/ucode/database.rb', line 78
def ucd_version
  @ucd_version ||= ("ucd_version")
end
#verify_schema_version! ⇒ void
This method returns an undefined value.
Raises DatabaseSchemaError if the on-disk schema version does not match `SCHEMA_VERSION`. Called by `.open`; exposed for consumers that hold a long-lived connection.
# File 'lib/ucode/database.rb', line 162
def verify_schema_version!
  return if schema_version == SCHEMA_VERSION

  raise DatabaseSchemaError.new(
    "SQLite schema mismatch: on-disk #{schema_version.inspect}, " \
    "expected #{SCHEMA_VERSION.inspect}",
    context: { on_disk: schema_version, expected: SCHEMA_VERSION },
  )
end
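A consumer holding a long-lived connection might re-run the check and treat a mismatch as a signal to rebuild. The sketch below reproduces the guard-then-raise shape with stand-in names so it runs without the gem. `SketchSchemaError`, `EXPECTED_SCHEMA`, and `check_schema!` are hypothetical; the real error class's interface beyond the `context:` keyword shown above is not documented here.

```ruby
# Stand-in error carrying a context hash, modelled on the call shown above.
class SketchSchemaError < StandardError
  attr_reader :context

  def initialize(message, context: {})
    super(message)
    @context = context
  end
end

EXPECTED_SCHEMA = "1"

# Return nil when versions match; raise with both values otherwise.
def check_schema!(on_disk)
  return if on_disk == EXPECTED_SCHEMA

  raise SketchSchemaError.new(
    "schema mismatch: on-disk #{on_disk.inspect}, expected #{EXPECTED_SCHEMA.inspect}",
    context: { on_disk: on_disk, expected: EXPECTED_SCHEMA },
  )
end

begin
  check_schema!("0")
rescue SketchSchemaError => e
  warn "stale cache, rebuild needed: #{e.message}"
  stale = e.context
end
```

Carrying both versions in the context lets a caller log or branch on the mismatch without parsing the message.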