Module: Canon::Cache

Defined in:
lib/canon/cache.rb

Overview

Cache for expensive operations during document comparison

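Provides caching with a per-category size limit to prevent memory bloat. When a category already holds MAX_CACHE_SIZE entries, the least recently used (LRU) entry is evicted before a new one is stored. The methods documented here do not synchronize access, so code that shares the cache across threads should provide its own locking.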

Examples:

Cache a parsed document

key = Cache.key_for_document(xml_string, :xml, :none)
parsed = Cache.fetch(:document_parse, key) { parse_xml(xml_string) }

Clear all caches (e.g., between test cases)

Cache.clear_all

Constant Summary

MAX_CACHE_SIZE = 100

Maximum number of entries per cache category

Class Method Summary

Class Method Details

.clear_all ⇒ Object

Clear all caches

Useful for tests or when memory needs to be freed



# File 'lib/canon/cache.rb', line 54

def clear_all
  @caches&.each_value(&:clear)
  @caches = nil
end

.clear_category(category) ⇒ Object

Clear a specific cache category

Parameters:

  • category (Symbol)

    Cache category to clear



# File 'lib/canon/cache.rb', line 62

def clear_category(category)
  return unless @caches&.key?(category)

  @caches[category]&.clear
end

.fetch(category, key) { ... } ⇒ Object

Fetch a value from cache, or compute and cache it

Parameters:

  • category (Symbol)

    Cache category (:document_parse, :format_detect, etc.)

  • key (String)

    Cache key

Yields:

  • Block to compute value if not cached

Returns:

  • (Object)

    Cached or computed value



# File 'lib/canon/cache.rb', line 28

def fetch(category, key)
  cache = cache_for(category)

  # Check if key exists
  if cache.key?(key)
    # Update access time for LRU
    cache[key][:accessed] = Time.now
    return cache[key][:value]
  end

  # Compute and cache the value
  value = yield

  # Evict oldest entry if cache is full
  if cache.size >= MAX_CACHE_SIZE
    oldest_key = cache.min_by { |_, v| v[:accessed] }&.first
    cache.delete(oldest_key) if oldest_key
  end

  cache[key] = { value: value, accessed: Time.now }
  value
end
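The eviction behavior of .fetch can be traced with a small stand-in. This is a minimal sketch, not the library's code: LruSketch and its keys helper are hypothetical names, the lookup of the per-category hash is an assumption (cache_for is not shown on this page), the limit is 3 instead of 100, and an integer tick replaces Time.now so the eviction order is deterministic.

```ruby
# Hypothetical stand-in that mirrors the hit, miss, and eviction steps of .fetch.
# Assumptions: limit of 3, integer tick instead of Time.now, guessed cache lookup.
module LruSketch
  MAX_SIZE = 3

  class << self
    def fetch(category, key)
      @tick = (@tick || 0) + 1
      cache = ((@caches ||= {})[category] ||= {})

      if cache.key?(key)
        cache[key][:accessed] = @tick # a hit refreshes the entry's recency
        return cache[key][:value]
      end

      value = yield # a miss runs the block

      if cache.size >= MAX_SIZE
        # Drop the entry with the smallest (oldest) access tick
        oldest_key = cache.min_by { |_, v| v[:accessed] }&.first
        cache.delete(oldest_key) if oldest_key
      end

      cache[key] = { value: value, accessed: @tick }
      value
    end

    # Helper for inspection only; the real module exposes .stats instead
    def keys(category)
      (@caches || {}).fetch(category, {}).keys
    end
  end
end

# Fill the category, touch "a", then add a fourth key
%w[a b c].each { |k| LruSketch.fetch(:demo, k) { k.upcase } }
hit = LruSketch.fetch(:demo, "a") { raise "not reached: a is cached" }
LruSketch.fetch(:demo, "d") { "D" } # category is full, so the stalest key goes
remaining = LruSketch.keys(:demo)
```

Because "a" was read after "b" and "c" were stored, "b" is the least recently used entry when "d" arrives, and it is the one removed. Note that eviction scans every entry in the category, which is cheap at 100 entries but would not scale to a much larger limit.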

.key_for_c14n(content, with_comments) ⇒ String

Generate cache key for XML canonicalization

Parameters:

  • content (String)

    XML content

  • with_comments (Boolean)

    Whether to include comments

Returns:

  • (String)

    Cache key



# File 'lib/canon/cache.rb', line 103

def key_for_c14n(content, with_comments)
  digest = Digest::SHA256.hexdigest(content)
  "c14n:#{with_comments}:#{digest[0..16]}"
end

.key_for_document(content, format, preprocessing) ⇒ String

Generate cache key for document parsing

Parameters:

  • content (String)

    Document content

  • format (Symbol)

    Document format

  • preprocessing (Symbol)

    Preprocessing option

Returns:

  • (String)

    Cache key



# File 'lib/canon/cache.rb', line 81

def key_for_document(content, format, preprocessing)
  digest = Digest::SHA256.hexdigest(content)
  "doc:#{format}:#{preprocessing}:#{digest[0..16]}"
end
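To see what these keys look like, the same expression can be run outside the module. This is a sketch only: doc_key is a hypothetical local lambda that copies the method body above, and :c14n is assumed here to be a valid preprocessing value purely for illustration.

```ruby
require "digest"

# Copy of the key_for_document body as a local lambda (doc_key is hypothetical)
doc_key = lambda do |content, format, preprocessing|
  digest = Digest::SHA256.hexdigest(content)
  "doc:#{format}:#{preprocessing}:#{digest[0..16]}"
end

xml = "<root><a>1</a></root>"
key_none = doc_key.call(xml, :xml, :none)
key_c14n = doc_key.call(xml, :xml, :c14n) # :c14n is an assumed option name

# The key has four colon-separated parts; the last is the truncated digest
_prefix, _format, _preprocessing, digest_part = key_none.split(":")

puts key_none
puts key_c14n
puts digest_part.length
```

The digest part is 17 hex characters long, not 16, because the range 0..16 is inclusive. The same content yields the same digest part in both keys; only the format and preprocessing segments keep the two entries apart.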

.key_for_format_detection(content) ⇒ String

Generate cache key for format detection

Parameters:

  • content (String)

    Document content

Returns:

  • (String)

    Cache key



# File 'lib/canon/cache.rb', line 90

def key_for_format_detection(content)
  # Use the first 101 characters (indices 0..100) plus the total length for a quick key
  # Force to binary to avoid encoding compatibility issues
  preview = content[0..100].b
  digest = Digest::SHA256.hexdigest(preview + content.length.to_s)
  "fmt:#{digest[0..16]}"
end
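Since this key depends only on the opening of the document and its total length, two different documents can map to the same key. The sketch below illustrates that consequence; fmt_key is a hypothetical local lambda copying the method body above, and the sample documents are made up.

```ruby
require "digest"

# Copy of the key_for_format_detection body as a local lambda (fmt_key is hypothetical)
fmt_key = lambda do |content|
  preview = content[0..100].b
  digest = Digest::SHA256.hexdigest(preview + content.length.to_s)
  "fmt:#{digest[0..16]}"
end

# A shared opening longer than 101 characters
head = "<?xml version=\"1.0\"?>" + ("<!-- padding -->" * 6)

doc_a = head + "<a/>"   # same opening, same length as doc_b
doc_b = head + "<b/>"
doc_c = head + "<abc/>" # same opening, different length

same_key = fmt_key.call(doc_a) == fmt_key.call(doc_b)
different_key = fmt_key.call(doc_a) != fmt_key.call(doc_c)

puts same_key
puts different_key
```

Here doc_a and doc_b share a key even though their content differs. That is presumably acceptable for format detection, where the opening of a document usually determines the answer, but the same shortcut would be unsafe for caching parsed content.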

.key_for_preprocessing(content, preprocessing) ⇒ String

Generate cache key for preprocessing

Parameters:

  • content (String)

    Original content

  • preprocessing (Symbol)

    Preprocessing type

Returns:

  • (String)

    Cache key



# File 'lib/canon/cache.rb', line 113

def key_for_preprocessing(content, preprocessing)
  digest = Digest::SHA256.hexdigest(content)
  "pre:#{preprocessing}:#{digest[0..16]}"
end

.stats ⇒ Hash

Get cache statistics

Returns:

  • (Hash)

    Statistics about cache usage



# File 'lib/canon/cache.rb', line 71

def stats
  @caches&.transform_values(&:size) || {}
end
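The shape of the returned hash follows directly from the expression above. A minimal sketch, assuming the internal state is a hash of category to entry hash (the caches variable and its sample contents are hypothetical stand-ins for the module's @caches):

```ruby
# Hypothetical snapshot of the internal state: one entry hash per category
caches = {
  document_parse: {
    "doc:xml:none:0123456789abcdef0" => { value: :parsed_doc, accessed: Time.now }
  },
  format_detect: {} # a category emptied by .clear_category still appears
}

# Same expression as .stats, applied to the local variable
populated = caches&.transform_values(&:size) || {}

# After .clear_all the state is nil, so safe navigation short-circuits
caches = nil
cleared = caches&.transform_values(&:size) || {}

p populated
p cleared
```

The result maps each category to its entry count, so a category cleared with .clear_category reports 0, while the whole hash is empty after .clear_all.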