Class: Perron::Site::Resource::Related

Inherits:
Object
Defined in:
lib/perron/resource/related.rb,
lib/perron/resource/related/stop_words.rb

Overview

Finds related resources using TF-IDF cosine similarity.

Pre-normalizes vectors so cosine similarity reduces to a dot product, then builds a symmetric similarity matrix once per collection.

Results are cached at the class level so the O(n²) comparison is paid once, not once per resource.
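The overview above can be illustrated with a small, self-contained sketch. It is not Perron's implementation: the tokenize, tfidf_unit_vectors, dot, and similarity_matrix helpers are hypothetical names, and the smoothed IDF formula is an assumption chosen for the example. It shows only the general technique: scale each TF-IDF vector to unit length so that cosine similarity becomes a plain dot product, then compute each pair once and mirror the score.

```ruby
# Hypothetical helper, not part of Perron: splits text into lowercase terms.
def tokenize(text)
  text.downcase.scan(/[a-z0-9]+/)
end

# Builds one unit-length TF-IDF vector (Hash of term => weight) per document.
def tfidf_unit_vectors(documents)
  tokenized = documents.transform_values { |text| tokenize(text) }
  total = tokenized.size.to_f

  # Document frequency: the number of documents each term appears in.
  doc_freq = Hash.new(0)
  tokenized.each_value { |tokens| tokens.uniq.each { |term| doc_freq[term] += 1 } }

  tokenized.transform_values do |tokens|
    weights = tokens.tally.to_h do |term, count|
      tf = count.to_f / tokens.size
      idf = Math.log((1 + total) / (1 + doc_freq[term])) + 1 # smoothed IDF (an assumption)
      [term, tf * idf]
    end

    # Pre-normalize so that later comparisons need no division.
    norm = Math.sqrt(weights.values.sum { |weight| weight * weight })
    norm.zero? ? weights : weights.transform_values { |weight| weight / norm }
  end
end

# For unit vectors, cosine similarity is just the dot product.
def dot(a, b)
  small, large = a.size <= b.size ? [a, b] : [b, a]
  small.sum { |term, weight| weight * large.fetch(term, 0.0) }
end

# Computes each pair once and stores it in both directions.
def similarity_matrix(vectors)
  matrix = Hash.new { |hash, key| hash[key] = {} }
  vectors.keys.combination(2) do |a, b|
    score = dot(vectors[a], vectors[b])
    matrix[a][b] = score
    matrix[b][a] = score
  end
  matrix
end

docs = {
  "ruby-intro" => "ruby blocks and ruby procs",
  "ruby-tips" => "ruby procs and lambdas",
  "gardening" => "tomatoes need sun and water"
}
matrix = similarity_matrix(tfidf_unit_vectors(docs))
```

With n documents, combination(2) visits n(n-1)/2 pairs, which is the quadratic cost that the class-level cache is meant to pay only once.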

Defined Under Namespace

Modules: StopWords
Classes: Cache

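Class Method Summary

  • .cache_for(collection_name) ⇒ Object
  • .clear_cache!(collection_name) ⇒ Object
  • .fingerprinted(collection_name) ⇒ Object
  • .stale?(collection_name) ⇒ Boolean

Instance Method Summary

  • #find(limit: 5) ⇒ Object
  • #initialize(resource) ⇒ Related (constructor)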

Constructor Details

#initialize(resource) ⇒ Related

Returns a new instance of Related.



# File 'lib/perron/resource/related.rb', line 41

def initialize(resource)
  @resource = resource
  @collection = resource.collection
  @cache = self.class.cache_for(@collection.name)
end

Class Method Details

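.cache_for(collection_name) ⇒ Object

Returns the cache entry for the named collection. If an existing entry's fingerprint no longer matches the files on disk, that entry is discarded first and a fresh, empty one is created.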



# File 'lib/perron/resource/related.rb', line 20

def self.cache_for(collection_name)
  clear_cache!(collection_name) if stale?(collection_name)

  @collection_caches[collection_name] ||= Cache.new(nil, nil, fingerprinted(collection_name))
end

.clear_cache!(collection_name) ⇒ Object

Removes the cached entry for the named collection, if there is one.



# File 'lib/perron/resource/related.rb', line 26

def self.clear_cache!(collection_name)
  @collection_caches.delete(collection_name)
end

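.fingerprinted(collection_name) ⇒ Object

Returns a two-element fingerprint for the collection's input directory: the number of matched files and the most recent modification time among them.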



# File 'lib/perron/resource/related.rb', line 34

def self.fingerprinted(collection_name)
  path = File.join(Perron.configuration.input, collection_name)
  files = Dir.glob(File.join(path, "**", "*.*"))

  [files.size, files.map { File.mtime(it) }.max]
end
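To see how a fingerprint of this shape reacts to changes on disk, here is a minimal sketch that runs against a temporary directory. The fingerprint helper is a hypothetical standalone copy of the logic above, taking a path directly instead of going through Perron.configuration.

```ruby
require "tmpdir"

# Same shape as the fingerprint above: [file count, newest mtime].
def fingerprint(path)
  files = Dir.glob(File.join(path, "**", "*.*"))
  [files.size, files.map { |file| File.mtime(file) }.max]
end

results = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "first.md"), "# First")
  before = fingerprint(dir)

  # Adding a file changes the count.
  File.write(File.join(dir, "second.md"), "# Second")
  after_add = fingerprint(dir)

  # Set the mtime explicitly so the example does not depend on the
  # filesystem's timestamp resolution.
  later = Time.now + 60
  File.utime(later, later, File.join(dir, "first.md"))
  after_touch = fingerprint(dir)

  { before: before, after_add: after_add, after_touch: after_touch }
end
```

Two properties follow from the code itself. The "*.*" pattern only matches names that contain a dot, so extensionless files do not contribute. An empty directory yields [0, nil], because max on an empty array returns nil.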

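.stale?(collection_name) ⇒ Boolean

Reports whether the cached fingerprint for the named collection differs from the current one. A collection with no cache entry yet also counts as stale.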

Returns:

  • (Boolean)


# File 'lib/perron/resource/related.rb', line 30

def self.stale?(collection_name)
  @collection_caches[collection_name]&.fingerprint != fingerprinted(collection_name)
end
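Taken together, the four class methods form a fingerprint-checked memoization. The sketch below reproduces that flow in isolation. CacheEntry and FingerprintedCache are hypothetical stand-ins; the field names other than fingerprint are assumptions, and the fingerprint is supplied by a block rather than read from disk so that the example can change it on demand.

```ruby
# Hypothetical stand-in for the nested Cache class. Only `fingerprint` is
# confirmed by the listings above; the other field names are assumptions.
CacheEntry = Struct.new(:vectors, :matrix, :fingerprint)

class FingerprintedCache
  def initialize(&fingerprinter)
    @fingerprinter = fingerprinter
    @caches = {}
  end

  # Drops a stale entry, then returns the existing entry or a fresh empty one.
  def cache_for(name)
    clear!(name) if stale?(name)
    @caches[name] ||= CacheEntry.new(nil, nil, @fingerprinter.call(name))
  end

  # A missing entry compares nil against the fingerprint, so it is stale too.
  def stale?(name)
    @caches[name]&.fingerprint != @fingerprinter.call(name)
  end

  def clear!(name)
    @caches.delete(name)
  end
end

current = [2, 100] # stands in for [file count, newest mtime]
store = FingerprintedCache.new { |_name| current }

first = store.cache_for("posts")
first.matrix = :expensive_result     # pretend the O(n²) work happened here
same = store.cache_for("posts")      # fingerprint unchanged: same entry back

current = [3, 200]                   # simulate a file being added
rebuilt = store.cache_for("posts")   # fingerprint changed: fresh, empty entry
```

One design consequence is visible here: the fingerprint is recomputed on every cache_for call, so the staleness check costs a directory scan each time, while the similarity matrix is rebuilt only when that scan reports a change.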

Instance Method Details

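#find(limit: 5) ⇒ Object

Returns up to limit resources other than the current one, ordered from most to least similar. Resources with no recorded score are treated as having a score of 0.0.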



# File 'lib/perron/resource/related.rb', line 47

def find(limit: 5)
  scores = similarity_matrix[@resource.slug] || {}

  resources
    .reject { it.slug == @resource.slug }
    .sort_by { -(scores[it.slug] || 0.0) }
    .first(limit)
end
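The ranking step can be tried in isolation with plain data. In this sketch, Page and related are hypothetical names, and the scores hash stands in for one row of the similarity matrix. The values are made up for illustration.

```ruby
# Hypothetical resource type; only `slug` is needed for the ranking.
Page = Struct.new(:slug)

# Mirrors the reject / sort_by / first chain shown above.
def related(resources, current_slug, scores, limit: 5)
  resources
    .reject { |resource| resource.slug == current_slug }
    .sort_by { |resource| -(scores[resource.slug] || 0.0) }
    .first(limit)
end

pages = %w[intro tips gardening changelog].map { |slug| Page.new(slug) }
scores = { "tips" => 0.82, "gardening" => 0.07 } # "changelog" has no score

top_two = related(pages, "intro", scores, limit: 2).map(&:slug)
# top_two == ["tips", "gardening"]

everything = related(pages, "intro", scores).map(&:slug)
# everything == ["tips", "gardening", "changelog"]
```

Unscored resources are not filtered out. They sort last with a score of 0.0, so the result can include unrelated resources whenever fewer than limit scored candidates exist.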