Class: Mailmate::IndexReader
- Inherits: Object
- Defined in: lib/mailmate/index_reader.rb
Constant Summary
- RECORD_SIZE = 12
- FRESHNESS_INTERVAL = 1.0
Re-stat the underlying files at most this often per reader (seconds). Short-lived CLI processes never hit the recheck; the persistent MCP server picks up MailMate’s continuous index rewrites within this window instead of serving a snapshot from its first request forever.
Instance Attribute Summary
- #name ⇒ Object readonly
Class Method Summary
- .for(name) ⇒ Object
Per-process cache of readers keyed by [name, db_headers].
- .reset!(name = nil) ⇒ Object
Invalidate cached readers.
Instance Method Summary
- #each_eml_id(&block) ⇒ Object
Iterate every recorded eml-id.
- #each_record ⇒ Object
Iterate every (eml_id, raw_value) pair, once per on-disk record.
- #flags_for(eml_id) ⇒ Object
`#flags.flag` semantics: the cache stores a space-separated list of IMAP keywords.
- #ids_matching(needle) ⇒ Object
Inverted substring search: returns a Hash whose keys are every id with at least one record containing `needle` (byte-wise; pass pre-downcased bytes when querying an #lc index).
- #initialize(name) ⇒ IndexReader constructor
A new instance of IndexReader.
- #key?(eml_id) ⇒ Boolean
True when the index has at least one record for this id.
- #record_count ⇒ Object
Total number of on-disk records (sum across all ids).
- #size ⇒ Object
Number of distinct ids in the index.
- #stale? ⇒ Boolean
True when the on-disk files no longer match what this reader was built from.
- #value_for(eml_id) ⇒ Object
Returns the raw cached value for a given .eml body-part ID, or nil if the id isn’t in this index.
- #values_for(eml_id) ⇒ Object
Returns every recorded value for an id, in offsets-file order.
Constructor Details
#initialize(name) ⇒ IndexReader
Returns a new instance of IndexReader.
# File 'lib/mailmate/index_reader.rb', line 90
def initialize(name)
  @name = name
  @base = "#{Mailmate.config.db_headers}/#{name}"
  raise ArgumentError, "Index not found: #{name} (looked at #{@base}.{cache,offsets})" \
    unless File.exist?("#{@base}.cache") && File.exist?("#{@base}.offsets")
  @cache_bytes = File.binread("#{@base}.cache")
  @offsets_bytes = File.binread("#{@base}.offsets")
  @cache_sig = file_sig("#{@base}.cache")
  @offsets_sig = file_sig("#{@base}.offsets")
  @checked_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  # The id→ranges hash builds lazily (see index): ids_matching-only
  # consumers (the inverted body search) never need it, and skipping it
  # saves ~250 ms of construction on the big body indexes.
  @index = nil
end
Instance Attribute Details
#name ⇒ Object (readonly)
# File 'lib/mailmate/index_reader.rb', line 88
def name
  @name
end
Class Method Details
.for(name) ⇒ Object
Per-process cache of readers keyed by [name, db_headers]. Including db_headers means a Mailmate.config swap (e.g. a test pointing at a different tmpdir) doesn’t return stale readers built from the old path. Cached readers are re-validated against the on-disk files’ mtime+size (throttled; see FRESHNESS_INTERVAL) so long-lived processes don’t serve stale data after MailMate rewrites an index.
# File 'lib/mailmate/index_reader.rb', line 61
def for(name)
  @cache ||= {}
  key = cache_key(name)
  @cache.delete(key) if @cache[key]&.stale?
  @cache[key] ||= new(name)
end
.reset!(name = nil) ⇒ Object
Invalidate cached readers. With no argument, drops the entire cache (useful for tests or when MailMate’s database swaps out). With a name, invalidates only entries for that name across all db_headers — the common case (cache-bust after a write) doesn’t need to thread config through.
# File 'lib/mailmate/index_reader.rb', line 73
def reset!(name = nil)
  if name.nil?
    @cache = nil
  elsif @cache
    @cache.delete_if { |(n, _dir), _reader| n == name }
  end
end
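To illustrate the name-only invalidation described above, here is a minimal stand-alone sketch that applies the same delete_if key destructuring to a plain Hash. The method name reset_sketch, the paths, and the symbols standing in for readers are invented for the example; the real method works on the class-level @cache and sets it to nil rather than returning an empty Hash.

```ruby
# Hypothetical stand-in for the class-level cache: keys are [name, db_headers]
# pairs, values would normally be IndexReader instances (symbols here).
def reset_sketch(cache, name = nil)
  return {} if name.nil? # no argument: drop everything
  # Destructure the composite key and match on the name part only, so the
  # caller never has to know which db_headers directory was in use.
  cache.delete_if { |(n, _dir), _reader| n == name }
end

cache = {
  ['#flags', '/tmp/a'] => :reader_a,
  ['#flags', '/tmp/b'] => :reader_b,
  ['subject', '/tmp/a'] => :reader_c
}
reset_sketch(cache, '#flags').keys # => [["subject", "/tmp/a"]]
```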
Instance Method Details
#each_eml_id(&block) ⇒ Object
Iterate every recorded eml-id. Yields just the id; callers that also want the value should pair this with `value_for`. Exists so other gem modules don’t have to reach into `@index` directly.
# File 'lib/mailmate/index_reader.rb', line 195
def each_eml_id(&block)
  return enum_for(:each_eml_id) unless block
  index.each_key(&block)
end
#each_record ⇒ Object
Iterate every (eml_id, raw_value) pair, once per on-disk record. Multi-record ids yield multiple times. The value comes back as the bare cache substring; callers that need parsed form (e.g. flag tokens) should massage it themselves.
# File 'lib/mailmate/index_reader.rb', line 204
def each_record
  return enum_for(:each_record) unless block_given?
  index.each do |eml_id, packs|
    packs.each { |v| yield eml_id, @cache_bytes[(v >> 32)...(v & 0xFFFFFFFF)] }
  end
end
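The slicing in the listing suggests that each record is stored as one integer with the start offset in the high 32 bits and the exclusive end offset in the low 32 bits. The sketch below reproduces that decoding outside the class, under that assumption. pack_range and each_record_sketch are hypothetical names, and the buffer and index contents are made up.

```ruby
# Hypothetical helper: build a packed integer the way the listing decodes it,
# start offset in the high 32 bits, exclusive end offset in the low 32 bits.
def pack_range(start, stop)
  (start << 32) | stop
end

# Stand-alone version of the iteration. Takes the byte buffer and the
# id => [packed, ...] hash as arguments instead of reading instance state.
def each_record_sketch(cache_bytes, index)
  return enum_for(:each_record_sketch, cache_bytes, index) unless block_given?
  index.each do |eml_id, packs|
    packs.each { |v| yield eml_id, cache_bytes[(v >> 32)...(v & 0xFFFFFFFF)] }
  end
end

cache_bytes = 'alphabetagamma'.b
index = { 7 => [pack_range(0, 5), pack_range(5, 9)], 9 => [pack_range(9, 14)] }
each_record_sketch(cache_bytes, index).to_a
# => [[7, "alpha"], [7, "beta"], [9, "gamma"]]
```

Id 7 has two records, so it is yielded twice, which is the multi-record behaviour the description mentions.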
#flags_for(eml_id) ⇒ Object
`#flags.flag` semantics: the cache stores a space-separated list of IMAP keywords. Split into individual flag tokens.
# File 'lib/mailmate/index_reader.rb', line 145
def flags_for(eml_id)
  v = value_for(eml_id)
  return [] if v.nil? || v.empty?
  v.split(/\s+/).reject(&:empty?)
end
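The splitting step can be tried in isolation. This sketch copies it into a hypothetical split_flags helper and feeds it invented raw values; it also shows why the reject call is there, since a regexp split leaves an empty first token when the value starts with whitespace.

```ruby
# Stand-alone copy of the splitting step. split_flags is a made-up name and
# the raw strings below are illustrative, not real cache contents.
def split_flags(raw)
  return [] if raw.nil? || raw.empty?
  raw.split(/\s+/).reject(&:empty?)
end

split_flags('\Seen  $Forwarded \Flagged') # => ["\\Seen", "$Forwarded", "\\Flagged"]
split_flags(' \Seen')                     # => ["\\Seen"]
split_flags(nil)                          # => []
```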
#ids_matching(needle) ⇒ Object
Inverted substring search: returns a Hash whose keys are every id with at least one record containing `needle` (byte-wise; pass pre-downcased bytes when querying an #lc index). This runs one memchr-fast String#index scan of the whole cache instead of one substring test per record; for a 77 MB body cache that’s ~75 ms versus seconds of per-message lookups.
A raw cache hit can span two adjacent records’ ranges; interval stabbing keeps only hits that fall entirely inside a single record (per-segment semantics, matching MailMate’s own body search). Records sharing a byte range (deduped values) all report their ids.
# File 'lib/mailmate/index_reader.rb', line 167
def ids_matching(needle)
  needle = needle.b
  found = {}
  return found if needle.empty? || @cache_bytes.empty?
  ensure_stab_table!
  nlen = needle.bytesize
  pos = 0
  while (pos = @cache_bytes.index(needle, pos))
    stab(pos, pos + nlen) { |id| found[id] = true }
    pos += 1
  end
  found
end
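The private helpers ensure_stab_table! and stab are not shown on this page, so the following is only a simplified sketch of the containment rule described above. It replaces the stab table with a linear scan over [id, start, stop] triples, which is much slower than the real lookup is presumably meant to be but keeps the semantics visible. All names and data here are made up.

```ruby
# records: array of [id, start, stop] byte ranges into cache (stop exclusive).
# A hit counts for a record only if it lies entirely inside that record.
def ids_matching_sketch(cache, records, needle)
  needle = needle.b
  found = {}
  return found if needle.empty? || cache.empty?
  pos = 0
  while (pos = cache.index(needle, pos))
    hit_end = pos + needle.bytesize
    records.each do |id, start, stop|
      found[id] = true if start <= pos && hit_end <= stop
    end
    pos += 1 # step by one byte so overlapping hits are not missed
  end
  found
end

cache = 'foobarbaz'.b
# Ids 2 and 4 share the byte range 3...6, like deduped values.
records = [[1, 0, 3], [2, 3, 6], [3, 6, 9], [4, 3, 6]]
ids_matching_sketch(cache, records, 'bar').keys # => [2, 4]
ids_matching_sketch(cache, records, 'ob').keys  # => []
```

The second query finds "ob" in the raw bytes at offset 2, but the hit straddles records 1 and 2, so no id is reported.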
#key?(eml_id) ⇒ Boolean
True when the index has at least one record for this id. Cheaper than values_for(id).empty? — no substring slicing.
# File 'lib/mailmate/index_reader.rb', line 153
def key?(eml_id)
  index.key?(eml_id.to_i)
end
#record_count ⇒ Object
Total number of on-disk records (sum across all ids). Diagnostics.
# File 'lib/mailmate/index_reader.rb', line 188
def record_count
  index.values.sum(&:size)
end
#size ⇒ Object
Number of distinct ids in the index. For multi-record indexes this is smaller than the on-disk record count (use record_count for that).
# File 'lib/mailmate/index_reader.rb', line 183
def size
  index.size
end
#stale? ⇒ Boolean
True when the on-disk files no longer match what this reader was built from. Throttled to one stat-pair per FRESHNESS_INTERVAL; a vanished file (mid-swap while MailMate rewrites) counts as not-stale so we keep serving the last good snapshot rather than racing the writer.
# File 'lib/mailmate/index_reader.rb', line 111
def stale?
  now = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  return false if now - @checked_at < FRESHNESS_INTERVAL
  @checked_at = now
  cache_sig = file_sig("#{@base}.cache")
  offsets_sig = file_sig("#{@base}.offsets")
  return false if cache_sig.nil? || offsets_sig.nil?
  cache_sig != @cache_sig || offsets_sig != @offsets_sig
end
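The private file_sig helper is not shown on this page. Going by the mention of mtime+size in the .for description, a plausible shape is sketched below, together with the "vanished file is not stale" rule. Both method names are hypothetical, the throttling step is left out, and the real helper may differ.

```ruby
require 'tempfile'

# Hypothetical stand-in for file_sig: assumed to return [mtime, size],
# or nil when the file does not exist (for example mid-swap).
def file_sig_sketch(path)
  st = File.stat(path)
  [st.mtime, st.size]
rescue Errno::ENOENT
  nil
end

# Compare a stored signature against the file's current one.
def stale_sketch?(built_from_sig, path)
  current = file_sig_sketch(path)
  return false if current.nil? # vanished file: keep the last good snapshot
  current != built_from_sig
end

file = Tempfile.new('index-sketch')
file.write('abc')
file.flush
sig = file_sig_sketch(file.path)
stale_sketch?(sig, file.path)           # => false, nothing changed yet
file.write('def')
file.flush
stale_sketch?(sig, file.path)           # => true, size went from 3 to 6
stale_sketch?(sig, file.path + '.gone') # => false, missing file is not stale
```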
#value_for(eml_id) ⇒ Object
Returns the raw cached value for a given .eml body-part ID, or nil if the id isn’t in this index. Returns the LAST record for the id. For accumulator-style header indexes (`#flags`, `#source`, `subject`, etc.) that’s the latest state; the older records are stale versions. For body indexes (`#unquoted#lc`, `#quoted#lc`) the last record alone is meaningless; use values_for to read every segment.
# File 'lib/mailmate/index_reader.rb', line 127
def value_for(eml_id)
  packs = index[eml_id.to_i]
  return nil if packs.nil? || packs.empty?
  v = packs[-1]
  @cache_bytes[(v >> 32)...(v & 0xFFFFFFFF)]
end
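The difference between the last record and the full record list is easiest to see on a toy accumulator index. In this sketch, last_and_all is a hypothetical helper and the buffer holds two invented versions of a flags value for id 42. Because the listing calls eml_id.to_i, a String id is assumed to work as well as an Integer.

```ruby
# Stand-alone sketch: index maps Integer ids to packed (start << 32 | stop)
# ranges. Returns [last_value, all_values] for the given id.
def last_and_all(cache_bytes, index, eml_id)
  packs = index[eml_id.to_i] || []
  all = packs.map { |v| cache_bytes[(v >> 32)...(v & 0xFFFFFFFF)] }
  [all.last, all]
end

# Two versions of the flags for id 42: an older one, then the current one.
cache_bytes = '\Seen\Seen \Flagged'.b
index = { 42 => [(0 << 32) | 5, (5 << 32) | 19] }

last_and_all(cache_bytes, index, '42')
# => ["\\Seen \\Flagged", ["\\Seen", "\\Seen \\Flagged"]]
last_and_all(cache_bytes, index, 99)
# => [nil, []]
```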
#values_for(eml_id) ⇒ Object
Returns every recorded value for an id, in offsets-file order. Returns [] if the id isn’t in the index. Use this for body indexes (#unquoted#lc, #quoted#lc), which store one record per text segment.
# File 'lib/mailmate/index_reader.rb', line 137
def values_for(eml_id)
  packs = index[eml_id.to_i]
  return [] if packs.nil?
  packs.map { |v| @cache_bytes[(v >> 32)...(v & 0xFFFFFFFF)] }
end