Module: Documentrix::Documents::Cache::Common

Includes:
Utils::Digests, Utils::Math, Enumerable
Included in:
SQLiteCache, MemoryCache, RedisCache
Defined in:
lib/documentrix/documents/cache/common.rb

Overview

Common interface for document caches

This module defines the standard interface that all document cache implementations must adhere to. It provides shared functionality for managing cached document embeddings, including methods for setting, retrieving, and deleting cache entries, as well as querying and filtering cached data based on tags and similarity searches.

The module includes methods for prefix management, collection enumeration, tag extraction, and cache clearing operations, ensuring consistent behavior across different cache backends such as memory, Redis, and SQLite.

Methods included from Utils::Math

#convert_to_vector, #cosine_similarity, #norm

Instance Attribute Details

#prefix ⇒ Object

Returns the current prefix defined for the cache.



# File 'lib/documentrix/documents/cache/common.rb', line 26

def prefix
  @prefix
end

Instance Method Details

#clear(tags: nil) ⇒ self

The clear method removes cached records based on the provided tags or clears all records with the current prefix.

When tags are provided, it removes only the records that have matching tags. If no tags are provided, it removes all records that have keys starting with the current prefix.

Parameters:

  • tags (NilClass, Array<String>) (defaults to: nil)

    an array of tag names to filter records by, or nil to clear all records

Returns:

  • (self)

    returns the cache instance for method chaining



# File 'lib/documentrix/documents/cache/common.rb', line 195

def clear(tags: nil)
  tags = Documentrix::Utils::Tags.new(tags).to_a
  if tags.present?
    clear_for_tags(tags)
  else
    clear_all_with_prefix
  end
  self
end
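
The two branches of clear can be illustrated with a minimal in-memory sketch. TinyCache and Entry are hypothetical stand-ins written for this example, not Documentrix classes, and the tag handling is simplified to plain arrays of strings.

```ruby
# Hypothetical stand-ins for illustration; not part of Documentrix.
Entry = Struct.new(:tags)

class TinyCache
  def initialize(prefix)
    @prefix = prefix
    @data   = {}
  end

  # Stores the entry under the prefixed key.
  def []=(key, entry)
    @data[@prefix + key] = entry
  end

  def size
    @data.size
  end

  # Same shape as the documented method: tags select a subset,
  # no tags clears every key that starts with the prefix.
  def clear(tags: nil)
    tags = Array(tags)
    if tags.empty?
      @data.delete_if { |key, _| key.start_with?(@prefix) }
    else
      @data.delete_if { |_, entry| (tags & entry.tags).any? }
    end
    self
  end
end

cache = TinyCache.new('docs-')
cache['a'] = Entry.new(%w[ruby])
cache['b'] = Entry.new(%w[python])
cache['c'] = Entry.new(%w[ruby go])

size_after_tag_clear  = cache.clear(tags: %w[ruby]).size # 'a' and 'c' removed
size_after_full_clear = cache.clear.size                 # everything removed
```

Because clear returns the cache itself, calls can be chained, as the last two lines do with size.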

#clear_by_source(source, digest: nil, operator: ?=) ⇒ self

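The clear_by_source method removes all records from the cache whose source matches the given source. When a digest is given, only those matching records whose digest compares to it under the given operator are removed.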

Parameters:

  • source (String)

    the source to filter records by

  • digest (String, nil) (defaults to: nil)

    the SHA256 hexadecimal digest of the source.

  • operator (Symbol, String) (defaults to: ?=)

    the operator to compare the digest with ('=' or '!=')

Returns:

  • (self)

    self



# File 'lib/documentrix/documents/cache/common.rb', line 146

def clear_by_source(source, digest: nil, operator: ?=)
  operator = operator == '=' ? '==' : '!='

  each do |key, record|
    next unless record.source == source
    if digest
      should_delete = record.digest.send(operator, digest)
      delete(unpre(key)) if should_delete
    else
      delete(unpre(key))
    end
  end
  self
end
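
The operator handling in the listing above can be tried in isolation. The sketch below uses a plain Hash and a hypothetical Doc struct in place of a real cache; digest_matches? is a helper invented for this example, and the URLs and digests are made up.

```ruby
# Hypothetical record type for illustration only.
Doc = Struct.new(:source, :digest)

# Mirrors the mapping in the listing: '=' becomes '==', anything else '!='.
def digest_matches?(record, digest, operator: '=')
  method_name = operator == '=' ? '==' : '!='
  record.digest.send(method_name, digest)
end

docs = {
  'k1' => Doc.new('https://example.com/a', 'aaa'),
  'k2' => Doc.new('https://example.com/a', 'bbb'),
  'k3' => Doc.new('https://example.com/b', 'aaa'),
}

# Drop the records of source /a whose digest differs from the current one.
source  = 'https://example.com/a'
current = 'bbb'
docs.delete_if do |_key, doc|
  doc.source == source && digest_matches?(doc, current, operator: '!=')
end

remaining = docs.keys

# A Symbol never equals a String in plain Ruby, so this falls through to '!='.
symbol_result = digest_matches?(Doc.new('s', 'aaa'), 'aaa', operator: :'=')
```

Under this mapping any value other than the String '=' selects '!=', so in plain Ruby a Symbol operator such as :'=' would behave like '!='. Passing the operator as a String avoids that surprise.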

#clear_for_tags(tags) ⇒ self

The clear_for_tags method removes all records from the cache that have tags matching any of the provided tags.

Parameters:

  • tags (Array<String>)

    an array of tag names to filter records by

Returns:

  • (self)

    self



# File 'lib/documentrix/documents/cache/common.rb', line 113

def clear_for_tags(tags)
  each do |key, record|
    if (tags & record.tags.to_a).size >= 1
      delete(unpre(key))
    end
  end
  self
end

#collections(prefix) ⇒ Array<Symbol>

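Returns the unique collection names found in cache keys that start with the given prefix. The collection name is the part of the key between the prefix and the last hyphen.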

Parameters:

  • prefix (String)

    the string that cache keys must start with

Returns:

  • (Array<Symbol>)

    an array of matching collection names



# File 'lib/documentrix/documents/cache/common.rb', line 32

def collections(prefix)
  unique = Set.new
  full_each do |key, _|
    key =~ /\A#{prefix}(.+)-/ or next
    unique << $1
  end
  unique.map(&:to_sym)
end
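
How the regular expression in collections picks out names can be seen on a plain list of keys. The key layout "<prefix><collection>-<suffix>" and the sample keys below are assumptions made for this sketch; collection_names is a hypothetical standalone helper.

```ruby
require 'set'

# Hypothetical keys; the "<prefix><collection>-<suffix>" layout is assumed.
keys = %w[
  Docs-manuals-3f2a
  Docs-manuals-9c1d
  Docs-release-notes-77aa
  Other-misc-0001
]

# Standalone version of the extraction loop, iterating over an array of keys.
def collection_names(keys, prefix)
  unique = Set.new
  keys.each do |key|
    match = key.match(/\A#{prefix}(.+)-/) or next
    unique << match[1]
  end
  unique.map(&:to_sym)
end

names = collection_names(keys, 'Docs-')
```

Since (.+) is greedy, the capture runs up to the last hyphen in the key, which is why a hyphenated name such as release-notes survives intact. The prefix is interpolated into the pattern without escaping, so regex metacharacters in it are interpreted as such.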

#each_source {|source| ... } ⇒ Enumerator

Yields each unique, full source present in the cache records.

Yields:

  • (source)

    the full source string

Returns:

  • (Enumerator)

    an enumerator if no block is given, nil otherwise.



# File 'lib/documentrix/documents/cache/common.rb', line 126

def each_source(&block)
  block or return enum_for(__method__)
  seen = {}
  each do |_key, record|
    source = record.source.full? or next
    seen.key?(source) and next
    seen[source] = true
    block.(source)
  end
  nil
end
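
The block-or-enumerator pattern used by each_source can be reproduced with core Ruby only. In this sketch SourceList and Chunk are hypothetical, and the full? call from the listing (not a core Ruby method) is assumed to skip nil and empty sources.

```ruby
# Hypothetical record type; only #source matters here.
Chunk = Struct.new(:source)

class SourceList
  def initialize(chunks)
    @chunks = chunks
  end

  # Yields each distinct, non-empty source once.
  # Without a block, returns an Enumerator over the same values.
  def each_source(&block)
    block or return enum_for(__method__)
    seen = {}
    @chunks.each do |chunk|
      source = chunk.source
      next if source.nil? || source.empty? || seen.key?(source)
      seen[source] = true
      block.(source)
    end
    nil
  end
end

list = SourceList.new([
  Chunk.new('a.md'), Chunk.new('b.md'), Chunk.new('a.md'), Chunk.new(nil),
])

sources      = list.each_source.to_a      # enumerator form
block_result = list.each_source { |s| s } # block form returns nil
```

Returning an Enumerator when no block is given lets callers chain methods such as to_a, map, or first on the result.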

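#find_records(needle, tags: nil, max_records: nil, min_similarity: -1) ⇒ Array<Documentrix::Documents::Record>

The find_records method ranks cached records by cosine similarity to the given needle. If tags are given, only records sharing at least one of them are considered. Records below min_similarity are dropped, the rest are sorted by descending similarity, and at most max_records are returned.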

Parameters:

  • needle (Array)

    an array containing the embedding vector

  • tags (String, Array) (defaults to: nil)

    a string or array of strings representing the tags to search for

  • max_records (Integer) (defaults to: nil)

    the maximum number of records to return

  • min_similarity (Float) (defaults to: -1)

    the minimum similarity score required for a record to be returned

Returns:

  • (Array<Documentrix::Documents::Record>)

    the matching records, sorted by descending similarity



# File 'lib/documentrix/documents/cache/common.rb', line 69

def find_records(needle, tags: nil, max_records: nil, min_similarity: -1)
  tags    = Documentrix::Utils::Tags.new(Array(tags)).to_a
  records = self
  if tags.present?
    records = records.select { |_key, record| (tags & record.tags).size >= 1 }
  end

  needle_norm = norm(needle)
  records     = records.map do |key, record|
    record.key        = key
    record.similarity = cosine_similarity(
      a:      needle,
      b:      record.embedding,
      a_norm: needle_norm,
      b_norm: record.norm,
    )
    record
  end.sort_by(&:similarity).reverse.select { _1.similarity >= min_similarity }

  max_records ? records.take(max_records) : records
end
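
The ranking performed by find_records can be sketched without the library. Hit, norm, cosine_similarity, and rank below are hypothetical stand-ins written for this example (the real helpers come from Utils::Math and may differ in signature), and the two-dimensional vectors are toy data.

```ruby
# Hypothetical record type and helpers for illustration only.
Hit = Struct.new(:key, :embedding, :tags, :similarity)

# Euclidean length of a vector.
def norm(v)
  Math.sqrt(v.sum { |x| x * x })
end

# Dot product divided by the product of the two lengths.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  dot / (norm(a) * norm(b))
end

# Filter by tags, score against the needle, drop low scores, sort descending.
def rank(hits, needle, tags: nil, max_records: nil, min_similarity: -1)
  tags = Array(tags)
  hits = hits.select { |h| (tags & h.tags).any? } unless tags.empty?
  ranked = hits.each { |h| h.similarity = cosine_similarity(needle, h.embedding) }
               .select { |h| h.similarity >= min_similarity }
               .sort_by { |h| -h.similarity }
  max_records ? ranked.take(max_records) : ranked
end

hits = [
  Hit.new('x', [1.0, 0.0], %w[ruby]),
  Hit.new('y', [0.0, 1.0], %w[ruby]),
  Hit.new('z', [1.0, 1.0], %w[go]),
]

all_keys = rank(hits, [1.0, 0.0]).map(&:key)
ruby_top = rank(hits, [1.0, 0.0], tags: 'ruby', max_records: 1).map(&:key)
positive = rank(hits, [1.0, 0.0], min_similarity: 0.5).map(&:key)
```

Cosine similarity lies between -1 and 1, so the default min_similarity of -1 filters nothing out. Raising it trades recall for precision.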

#initialize(prefix:) ⇒ Object

The initialize method sets up the Documentrix::Documents::Cache instance by setting its prefix attribute to the given value.

Parameters:

  • prefix (String)

    the string to be used as the prefix for this cache



# File 'lib/documentrix/documents/cache/common.rb', line 22

def initialize(prefix:)
  self.prefix = prefix
end

#pre(key, prefix: @prefix) ⇒ String

Returns a string representing the given key prefixed with the defined prefix.

Parameters:

  • key (String)

    the key to prefix

  • prefix (String) (defaults to: @prefix)

    the prefix to use (defaults to the cache's prefix)

Returns:

  • (String)

    the prefixed key



# File 'lib/documentrix/documents/cache/common.rb', line 47

def pre(key, prefix: @prefix)
  [ prefix, key ].join
end

#source_exist?(source, digest: nil, operator: ?=) ⇒ Boolean

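Checks if any records associated with the given source exist in the cache. When a digest is given, a record counts only if its digest compares to it under the given operator.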

Parameters:

  • source (String)

    the source to check for existence

  • digest (String, nil) (defaults to: nil)

    the SHA256 hexadecimal digest to compare against

  • operator (Symbol, String) (defaults to: ?=)

    the operator to compare the digest with ('=' or '!=')

Returns:

  • (Boolean)

    true if a matching record is found, false otherwise.



# File 'lib/documentrix/documents/cache/common.rb', line 168

def source_exist?(source, digest: nil, operator: ?=)
  operator = operator == '=' ? '==' : '!='

  each do |_, record|
    next unless record.source == source
    if digest
      if record.digest.send(operator, digest)
        return true
      end
    else
      return true
    end
  end
  false
end
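
One way the digest and operator parameters might be used is to decide whether a source needs re-importing. The sketch below is a standalone reimplementation over a plain array; Stored is a hypothetical record type, and hashing the content with SHA256 is an assumption based on the parameter description.

```ruby
require 'digest'

# Hypothetical record type; the digest is assumed to be a SHA256 hex string.
Stored = Struct.new(:source, :digest)

# Standalone version of the check, taking the records as an argument.
def source_exist?(records, source, digest: nil, operator: '=')
  method_name = operator == '=' ? '==' : '!='
  records.any? do |record|
    record.source == source &&
      (digest.nil? || record.digest.send(method_name, digest))
  end
end

old_digest = Digest::SHA256.hexdigest('version 1')
new_digest = Digest::SHA256.hexdigest('version 2')
records    = [Stored.new('guide.md', old_digest)]

known      = source_exist?(records, 'guide.md')
up_to_date = source_exist?(records, 'guide.md', digest: new_digest)
stale      = source_exist?(records, 'guide.md', digest: new_digest, operator: '!=')
```

Here known is true because the source is cached, up_to_date is false because no record carries the new digest, and stale is true because a record with a different digest exists.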

#tags ⇒ Documentrix::Utils::Tags

Returns a set of unique tags found in the cache records.

This method iterates through all records in the cache and collects unique tags from each record's tags collection. It constructs a new Documentrix::Utils::Tags object containing all the unique tags encountered.

Returns:

  • (Documentrix::Utils::Tags)

    the unique tags found across all cache records



# File 'lib/documentrix/documents/cache/common.rb', line 99

def tags
  each_with_object(Documentrix::Utils::Tags.new) do |(_, record), t|
    record.tags.each do |tag|
      t.add(tag, source: record.source)
    end
  end
end

#unpre(key, prefix: @prefix) ⇒ String

Returns a string with the prefix removed from the given key.

Parameters:

  • key (String)

    the input string containing the prefix.

  • prefix (String) (defaults to: @prefix)

    the prefix to use (defaults to the cache's prefix)

Returns:

  • (String)

    the input string without the prefix.



# File 'lib/documentrix/documents/cache/common.rb', line 56

def unpre(key, prefix: @prefix)
  key.sub(/\A#{prefix}/, '')
end
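
The pre and unpre helpers are inverses of each other for ordinary prefixes. The sketch below copies them as standalone methods with the prefix passed explicitly (the originals default to @prefix); the sample prefixes and keys are made up.

```ruby
# Standalone copies of the two helpers, with the prefix passed explicitly.
def pre(key, prefix:)
  [prefix, key].join
end

def unpre(key, prefix:)
  key.sub(/\A#{prefix}/, '')
end

full = pre('abc123', prefix: 'Docs-manuals-')
bare = unpre(full, prefix: 'Docs-manuals-')

# The prefix is interpolated into a regex unescaped, so '.' matches any character.
loose   = unpre('v1x0-key', prefix: 'v1.0-')
escaped = 'v1x0-key'.sub(/\A#{Regexp.escape('v1.0-')}/, '')
```

The last two lines show a consequence of the unescaped interpolation: a prefix containing regex metacharacters can strip text that is not literally the prefix. Wrapping the prefix in Regexp.escape, as in the final line, would make the match literal.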