Class: Ragnar::UmapTransformService
- Inherits: Object
- Defined in: lib/ragnar/umap_transform_service.rb
Overview
Service for applying UMAP transformations to embeddings. Separates transformation logic from training (UmapProcessor).
Instance Attribute Summary
-
#database ⇒ Object
readonly
Returns the value of attribute database.
-
#model_path ⇒ Object
readonly
Returns the value of attribute model_path.
Class Method Summary
-
.instance ⇒ Object
Instance Method Summary
-
#check_model_staleness ⇒ Hash
Check if model needs retraining based on staleness.
-
#initialize(model_path: "umap_model.bin", database:) ⇒ UmapTransformService
constructor
A new instance of UmapTransformService.
-
#model_exists? ⇒ Boolean
Check if a UMAP model exists.
-
#model_metadata ⇒ Hash?
Get metadata about the trained model.
-
#model_version ⇒ Integer
Get the version of the current model.
-
#transform_documents(document_ids) ⇒ Hash
Transform embeddings for specific documents.
-
#transform_query(embedding) ⇒ Array<Float>?
Transform a single query embedding.
Constructor Details
#initialize(model_path: "umap_model.bin", database:) ⇒ UmapTransformService
Returns a new instance of UmapTransformService.
    # File 'lib/ragnar/umap_transform_service.rb', line 10

    def initialize(model_path: "umap_model.bin", database:)
      @model_path = model_path
      @database = database
      @umap_model = nil
      @model_metadata = nil
    end
Instance Attribute Details
#database ⇒ Object (readonly)
Returns the value of attribute database.
    # File 'lib/ragnar/umap_transform_service.rb', line 8

    def database
      @database
    end
#model_path ⇒ Object (readonly)
Returns the value of attribute model_path.
    # File 'lib/ragnar/umap_transform_service.rb', line 8

    def model_path
      @model_path
    end
Class Method Details
.instance ⇒ Object
    # File 'lib/ragnar/umap_transform_service.rb', line 201

    def instance
      UmapTransformServiceSingleton.instance
    end
Instance Method Details
#check_model_staleness ⇒ Hash
Checks whether the model needs retraining, based on staleness. Coverage is the number of documents the model was trained on as a percentage of the documents currently in the database; staleness is 100 minus coverage, and retraining is flagged when staleness exceeds 30.

    # File 'lib/ragnar/umap_transform_service.rb', line 131

    def check_model_staleness
      return { needs_retraining: true, coverage_percentage: 0, reason: "No model exists" } unless model_exists?

      metadata = model_metadata
      return { needs_retraining: true, coverage_percentage: 0, reason: "No metadata found" } unless metadata

      trained_count = metadata[:document_count] || 0
      current_count = @database.document_count

      if current_count == 0
        return { needs_retraining: false, coverage_percentage: 100, reason: "No documents" }
      end

      coverage = (trained_count.to_f / current_count * 100).round(1)
      staleness = 100 - coverage

      {
        needs_retraining: staleness > 30,
        coverage_percentage: coverage,
        trained_documents: trained_count,
        current_documents: current_count,
        staleness_percentage: staleness,
        reason: staleness > 30 ? "Model covers only #{coverage}% of documents" : "Model is up to date"
      }
    end
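The coverage arithmetic can be tried in isolation. The following is a minimal sketch, not part of Ragnar: `staleness_report` and `STALENESS_THRESHOLD` are hypothetical names, and the function takes the two counts directly instead of reading them from model metadata and the database.

```ruby
# Hypothetical standalone helper mirroring the arithmetic in
# check_model_staleness. Not part of the Ragnar API.
STALENESS_THRESHOLD = 30 # percent; the listing hard-codes this value

def staleness_report(trained_count, current_count)
  # An empty database is treated as fully covered.
  return { needs_retraining: false, coverage_percentage: 100 } if current_count.zero?

  coverage = (trained_count.to_f / current_count * 100).round(1)
  staleness = 100 - coverage

  {
    needs_retraining: staleness > STALENESS_THRESHOLD,
    coverage_percentage: coverage,
    staleness_percentage: staleness
  }
end

p staleness_report(700, 1000)  # exactly at the threshold: not flagged
p staleness_report(600, 1000)  # 40% stale: flagged
p staleness_report(1200, 1000) # fewer documents now than at training time
```

Because the comparison is strict, a model trained on 700 of 1,000 documents is not flagged. If documents have been deleted since training, coverage exceeds 100 and staleness goes negative, so the model is also reported as up to date.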
#model_exists? ⇒ Boolean
Checks whether a UMAP model file exists at model_path.

    # File 'lib/ragnar/umap_transform_service.rb', line 104

    def model_exists?
      File.exist?(@model_path)
    end
#model_metadata ⇒ Hash?
Returns metadata about the trained model, read from a JSON file next to the model file and cached after the first successful load. Returns nil if the metadata file is missing or cannot be parsed.

    # File 'lib/ragnar/umap_transform_service.rb', line 110

    def model_metadata
      return @model_metadata if @model_metadata

      metadata_path = @model_path.sub(/\.bin$/, '_metadata.json')
      return nil unless File.exist?(metadata_path)

      @model_metadata = JSON.parse(File.read(metadata_path), symbolize_names: true)
    rescue => e
      puts "Error loading model metadata: #{e.message}"
      nil
    end
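The metadata path is derived from the model path by a suffix substitution, which can be checked on its own. This sketch assumes nothing about Ragnar beyond the regular expression in the listing; `metadata_path_for` is a hypothetical helper, and `document_count` is the only metadata key the listings on this page read.

```ruby
require "json"
require "tmpdir"

# Hypothetical helper reproducing the substitution used by model_metadata.
def metadata_path_for(model_path)
  model_path.sub(/\.bin$/, "_metadata.json")
end

puts metadata_path_for("umap_model.bin") # umap_model_metadata.json
puts metadata_path_for("umap_model.dat") # unchanged: no .bin suffix to replace

Dir.mktmpdir do |dir|
  path = File.join(dir, metadata_path_for("umap_model.bin"))
  File.write(path, JSON.generate(document_count: 1200))

  # symbolize_names: true yields symbol keys, as check_model_staleness expects.
  metadata = JSON.parse(File.read(path), symbolize_names: true)
  puts metadata[:document_count]
end
```

One consequence worth noting: when model_path does not end in `.bin`, the substitution leaves it unchanged, so the method would try to parse the model file itself as JSON. With a binary model file that is likely to fail and fall through to the rescue branch, returning nil.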
#model_version ⇒ Integer
Returns the version of the current model: the model file's modification time as an integer Unix timestamp, or 0 if the file does not exist.

    # File 'lib/ragnar/umap_transform_service.rb', line 124

    def model_version
      return 0 unless File.exist?(@model_path)
      File.mtime(@model_path).to_i
    end
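Since the version is just a file modification time, it changes whenever the model file is rewritten. The sketch below illustrates this with a temporary file; `version_for` is a hypothetical standalone equivalent of the listing, and the timestamps are arbitrary example values.

```ruby
require "tempfile"

# Hypothetical standalone equivalent of model_version.
def version_for(path)
  return 0 unless File.exist?(path)
  File.mtime(path).to_i
end

Tempfile.create("umap_model") do |file|
  trained_at = Time.at(1_700_000_000)           # arbitrary example timestamp
  File.utime(trained_at, trained_at, file.path) # set access and modification times
  puts version_for(file.path)                   # prints 1700000000

  # Simulate the model being rewritten a minute later.
  File.utime(trained_at + 60, trained_at + 60, file.path)
  puts version_for(file.path)                   # prints 1700000060
end

puts version_for("no_such_model.bin")           # prints 0 when the file is absent
```

The resolution is one second, and anything that touches the file's modification time (copying, restoring from backup) changes the version even if the model contents are identical.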
#transform_documents(document_ids) ⇒ Hash
Transforms embeddings for the given documents and writes the reduced embeddings back to the database. Documents with a missing, empty, or non-finite embedding are skipped. Returns a Hash with the counts of processed, skipped, and errored documents.

    # File 'lib/ragnar/umap_transform_service.rb', line 20

    def transform_documents(document_ids)
      return { processed: 0, skipped: 0, errors: 0 } if document_ids.empty?

      load_model!

      # Fetch documents
      documents = @database.get_documents_by_ids(document_ids)

      if documents.empty?
        return { processed: 0, skipped: 0, errors: 0 }
      end

      # Extract and validate embeddings
      valid_docs = []
      embeddings = []
      skipped_count = 0

      documents.each do |doc|
        emb = doc[:embedding]

        if emb.nil? || !emb.is_a?(Array) || emb.empty?
          skipped_count += 1
          next
        end

        if emb.any? { |v| !v.is_a?(Numeric) || v.nan? || !v.finite? }
          skipped_count += 1
          next
        end

        valid_docs << doc
        embeddings << emb
      end

      return { processed: 0, skipped: skipped_count, errors: 0 } if embeddings.empty?

      # Transform using UMAP
      begin
        reduced_embeddings = @umap_model.transform(embeddings)

        # Prepare updates
        updates = valid_docs.zip(reduced_embeddings).map do |doc, reduced_emb|
          {
            id: doc[:id],
            reduced_embedding: reduced_emb,
            umap_version: model_version
          }
        end

        # Update database
        @database.(updates)

        {
          processed: updates.size,
          skipped: skipped_count,
          errors: 0
        }
      rescue => e
        puts "Error transforming documents: #{e.message}"
        {
          processed: 0,
          skipped: skipped_count,
          errors: valid_docs.size
        }
      end
    end
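The validation step can be exercised separately from the model and database. The sketch below uses a hypothetical helper, `valid_embedding?`, that is not part of Ragnar. It differs from the listing in one respect: it accepts Integer values without calling `nan?` on them. In Ruby, `Float` defines `nan?` but `Integer` does not, so the listing's check would, as far as can be told from the code shown, raise NoMethodError on an embedding such as `[1, 0.5]`, and that check runs outside the begin/rescue block.

```ruby
# Hypothetical helper, not part of Ragnar: true when the value is a
# non-empty Array of finite real numbers.
def valid_embedding?(emb)
  return false unless emb.is_a?(Array) && !emb.empty?

  emb.all? do |v|
    # Integers are always finite; Floats must be neither NaN nor Infinity
    # (Float#finite? is false for both).
    v.is_a?(Integer) || (v.is_a?(Float) && v.finite?)
  end
end

# Example documents shaped like the hashes the listing reads (:id, :embedding).
docs = [
  { id: 1, embedding: [0.1, 0.2, 0.3] },
  { id: 2, embedding: [0.1, Float::NAN, 0.3] },
  { id: 3, embedding: nil },
  { id: 4, embedding: [1, 0.5, 2] }
]

valid, skipped = docs.partition { |doc| valid_embedding?(doc[:embedding]) }
puts "valid ids:   #{valid.map { |d| d[:id] }.inspect}"
puts "skipped ids: #{skipped.map { |d| d[:id] }.inspect}"
```

Partitioning up front keeps the skipped count and the list of valid documents in step, which is what the result hash of transform_documents reports.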
#transform_query(embedding) ⇒ Array<Float>?
Transforms a single query embedding. Returns nil if the embedding is missing, is not a non-empty Array, contains NaN or Infinity, or if the transform raises an error.

    # File 'lib/ragnar/umap_transform_service.rb', line 82

    def transform_query(embedding)
      return nil if embedding.nil? || !embedding.is_a?(Array) || embedding.empty?

      # Validate embedding
      if embedding.any? { |v| !v.is_a?(Numeric) || v.nan? || !v.finite? }
        puts "Warning: Invalid query embedding (contains NaN or Infinity)"
        return nil
      end

      load_model!

      begin
        # Transform returns array of arrays, get first (and only) result
        @umap_model.transform([embedding]).first
      rescue => e
        puts "Error transforming query: #{e.message}"
        nil
      end
    end