Module: HTM::LongTermMemory::TagOperations
- Included in: HTM::LongTermMemory
- Defined in: lib/htm/long_term_memory/tag_operations.rb
Overview
Tag management operations for LongTermMemory
Handles hierarchical tag operations including:
- Adding tags to nodes
- Querying nodes by topic/tag
- Tag relationship analysis
- Batch tag loading (N+1 prevention)
- Query-to-tag matching
Security: All queries use parameterized placeholders and LIKE patterns are sanitized to prevent SQL injection.
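The LIKE sanitization mentioned above is not shown on this page. As a minimal sketch, assuming backslash is the LIKE escape character (PostgreSQL's default), wildcard characters in user input could be neutralized before the value is bound as a query parameter. The helper names `escape_like` and `prefix_pattern` are hypothetical and are not part of the documented API.

```ruby
# Hypothetical helper (not from the documented module): escape LIKE
# metacharacters so user input is matched literally.
# Assumes backslash is the LIKE escape character.
def escape_like(input)
  input.to_s.gsub(/[\\%_]/) { |char| "\\#{char}" }
end

# Hypothetical helper: build a prefix pattern for a parameterized LIKE query.
# The trailing % is the only wildcard that survives.
def prefix_pattern(topic)
  "#{escape_like(topic)}%"
end

puts prefix_pattern("database")
puts prefix_pattern("100%_done")
```

The pattern would still be passed as a bound parameter rather than interpolated into the SQL string, so escaping and parameterization work together.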
Constant Summary
- MAX_TAG_QUERY_LIMIT = 1000
Maximum results to prevent DoS via unbounded queries.
- MAX_TAG_SAMPLE_SIZE = 50
- DEFAULT_TAG_SIMILARITY_THRESHOLD = 0.3
Default trigram similarity threshold for fuzzy tag search (0.0-1.0). Lower values allow fuzzier matches; higher values require stricter matching.
- POPULAR_TAGS_CACHE_TTL = 300
Cache TTL for popular tags, in seconds (5 minutes). Caching avoids running an expensive RANDOM() query on every tag extraction.
Class Attribute Summary
- .popular_tags_cache ⇒ Object
Returns the value of attribute popular_tags_cache.
- .popular_tags_cache_expires_at ⇒ Object
Returns the value of attribute popular_tags_cache_expires_at.
- .popular_tags_mutex ⇒ Object
Returns the value of attribute popular_tags_mutex.
Instance Method Summary
- #add_tag(node_id:, tag:) ⇒ void
Add a tag to a node (creates tag and all parent tags).
- #batch_load_node_tags(node_ids) ⇒ Hash<Integer, Array<String>>
Batch load tags for multiple nodes (avoids N+1 queries).
- #find_query_matching_tags(query, include_extracted: false) ⇒ Array<String>, Hash
Find tags that match terms in the query.
- #get_node_tags(node_id) ⇒ Array<String>
Get tags for a specific node.
- #node_topics(node_id) ⇒ Array<String>
Get topics for a specific node.
- #nodes_by_topic(topic_path, exact: false, fuzzy: false, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD, limit: 50) ⇒ Array<Hash>
Retrieve nodes by ontological topic.
- #ontology_structure ⇒ Array<Hash>
Get ontology structure view.
- #popular_tags(limit: 20, timeframe: nil) ⇒ Array<Hash>
Get most popular tags.
- #search_tags(query, limit: 20, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD) ⇒ Array<Hash>
Fuzzy search for tags using trigram similarity.
- #topic_relationships(min_shared_nodes: 2, limit: 50) ⇒ Array<Hash>
Get topic relationships (co-occurrence).
Class Attribute Details
.popular_tags_cache ⇒ Object
Returns the value of attribute popular_tags_cache.
    # File 'lib/htm/long_term_memory/tag_operations.rb', line 36
    def popular_tags_cache
      @popular_tags_cache
    end
.popular_tags_cache_expires_at ⇒ Object
Returns the value of attribute popular_tags_cache_expires_at.
    # File 'lib/htm/long_term_memory/tag_operations.rb', line 36
    def popular_tags_cache_expires_at
      @popular_tags_cache_expires_at
    end
.popular_tags_mutex ⇒ Object
Returns the value of attribute popular_tags_mutex.
    # File 'lib/htm/long_term_memory/tag_operations.rb', line 36
    def popular_tags_mutex
      @popular_tags_mutex
    end
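The three class attributes above suggest a mutex-guarded cache with an expiry time, but the caching logic itself is not shown on this page. The following is a minimal sketch of how such a TTL cache might work. The class name `PopularTagsCache`, the `fetch` method, and the injectable `clock` are all assumptions made for illustration; only the 300-second TTL comes from POPULAR_TAGS_CACHE_TTL.

```ruby
# Hypothetical stand-in for the module-level cache. The attribute names echo
# the documented ones, but the logic here is an assumption.
class PopularTagsCache
  TTL_SECONDS = 300 # mirrors POPULAR_TAGS_CACHE_TTL

  # clock is injectable so expiry can be exercised without sleeping.
  def initialize(clock: -> { Time.now })
    @clock = clock
    @mutex = Mutex.new
    @value = nil
    @expires_at = nil
  end

  # Return the cached value; run the block to refresh it once the TTL lapses.
  # The mutex ensures only one thread recomputes at a time.
  def fetch
    @mutex.synchronize do
      now = @clock.call
      if @expires_at.nil? || now >= @expires_at
        @value = yield
        @expires_at = now + TTL_SECONDS
      end
      @value
    end
  end
end

cache = PopularTagsCache.new
lookups = 0
2.times { cache.fetch { lookups += 1; ["database", "ai"] } }
puts "expensive lookups performed: #{lookups}"
```

Holding the lock while the block runs keeps the sketch simple; a real implementation might prefer to recompute outside the lock if the refresh query is slow.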
Instance Method Details
#add_tag(node_id:, tag:) ⇒ void
This method returns an undefined value.
Add a tag to a node (creates tag and all parent tags)
When adding a hierarchical tag like “database:postgresql:extensions”, this also creates and associates the parent tags “database” and “database:postgresql” with the node.
    # File 'lib/htm/long_term_memory/tag_operations.rb', line 54
    def add_tag(node_id:, tag:)
      # Create tag and all ancestor tags, then associate each with the node
      HTM::Models::Tag.find_or_create_with_ancestors(tag).each do |tag_record|
        HTM::Models::NodeTag.find_or_create(
          node_id: node_id,
          tag_id: tag_record.id
        )
      rescue Sequel::UniqueConstraintViolation
        # Tag association already exists, ignore
      end
    end
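To make the ancestor expansion concrete, here is a small sketch of how a hierarchical tag breaks down into itself plus its parent tags. The helper `tag_with_ancestors` is hypothetical; it only models the set of names that `find_or_create_with_ancestors` is described as creating, not that method's actual implementation.

```ruby
# Hypothetical helper: list a hierarchical tag together with its ancestors,
# ordered from the root down to the full tag.
def tag_with_ancestors(tag)
  levels = tag.split(":")
  (1..levels.size).map { |depth| levels.first(depth).join(":") }
end

p tag_with_ancestors("database:postgresql:extensions")
# => ["database", "database:postgresql", "database:postgresql:extensions"]
```

Each name in the resulting list would be associated with the node, which is why a query for the parent topic also finds nodes tagged only with a deeper path.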
#batch_load_node_tags(node_ids) ⇒ Hash<Integer, Array<String>>
Batch load tags for multiple nodes (avoids N+1 queries)
    # File 'lib/htm/long_term_memory/tag_operations.rb', line 163
    def batch_load_node_tags(node_ids)
      return {} if node_ids.empty?

      # Single query to get all tags for all nodes
      results = HTM::Models::NodeTag
        .join(:tags, id: :tag_id)
        .where(node_id: node_ids)
        .select_map([:node_id, Sequel[:tags][:name]])

      # Group by node_id
      results.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
    rescue Sequel::Error => e
      HTM.logger.error("Failed to batch load tags: #{e.message}")
      {}
    end
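The grouping step at the end of this method can be tried without a database. The sketch below uses hand-written sample rows shaped like the `[node_id, tag_name]` pairs the query returns; the helper name `group_tags_by_node` and the sample data are illustrative assumptions.

```ruby
# Hypothetical helper wrapping the same group_by/transform_values step.
def group_tags_by_node(rows)
  rows.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
end

# Sample [node_id, tag_name] pairs (made-up data).
rows = [[1, "database"], [1, "database:postgresql"], [2, "ai"]]
tags_by_node = group_tags_by_node(rows)

p tags_by_node
# A node with no tags has no key, so callers may want a default:
p tags_by_node.fetch(3, [])
```

Because untagged nodes are absent from the hash rather than mapped to an empty array, `fetch(id, [])` is one way for callers to treat both cases uniformly.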
#find_query_matching_tags(query, include_extracted: false) ⇒ Array<String>, Hash
Find tags that match terms in the query
Searches the tags table for tags where any hierarchy level matches query words. Uses semantic extraction via LLM to find relevant tags.
Performance: Uses a single UNION query instead of multiple sequential queries.
    # File 'lib/htm/long_term_memory/tag_operations.rb', line 239
    # NOTE: several identifiers are missing from this listing. The local names
    # existing_tags, extracted_tags and matched_tags, and the helper names
    # cached_popular_tags and find_matching_tags, are placeholders.
    def find_query_matching_tags(query, include_extracted: false)
      empty_result = include_extracted ? { extracted: [], matched: [] } : []
      return empty_result if query.nil? || query.strip.empty?

      # OPTIMIZATION: Use cached popular tags instead of expensive RANDOM() query
      # This saves 50-300ms per call by avoiding a full table sort
      existing_tags = cached_popular_tags

      # Use the tag extractor to generate semantic tags from the query
      extracted_tags = HTM::TagService.extract(query, existing_ontology: existing_tags)

      if extracted_tags.empty?
        return include_extracted ? { extracted: [], matched: [] } : []
      end

      # Build prefix candidates from extracted tags
      prefix_candidates = extracted_tags.flat_map do |tag|
        levels = tag.split(':')
        (1...levels.size).map { |i| levels[0, i].join(':') }
      end.uniq

      # Get all components for component matching
      all_components = extracted_tags.flat_map { |tag| tag.split(':') }.uniq

      # Build UNION query to find matches in a single database round-trip
      matched_tags = find_matching_tags(
        exact_candidates: extracted_tags,
        prefix_candidates: prefix_candidates,
        component_candidates: all_components
      )

      if include_extracted
        { extracted: extracted_tags, matched: matched_tags }
      else
        matched_tags
      end
    end
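The candidate-building steps in this method are plain array operations and can be run in isolation. The sketch below wraps them in two hypothetical helpers, `prefix_candidates_for` and `component_candidates_for`, and feeds them made-up extracted tags in place of the LLM extractor's output.

```ruby
# Hypothetical helper: every proper ancestor path of each tag, de-duplicated.
def prefix_candidates_for(tags)
  tags.flat_map do |tag|
    levels = tag.split(":")
    (1...levels.size).map { |i| levels[0, i].join(":") }
  end.uniq
end

# Hypothetical helper: every individual hierarchy component, de-duplicated.
def component_candidates_for(tags)
  tags.flat_map { |tag| tag.split(":") }.uniq
end

# Made-up stand-in for the extractor's output.
extracted = ["database:postgresql:extensions", "database:sqlite"]

p prefix_candidates_for(extracted)
# => ["database", "database:postgresql"]
p component_candidates_for(extracted)
# => ["database", "postgresql", "extensions", "sqlite"]
```

Note that the exclusive range `1...levels.size` leaves out the full tag itself, since the full tags are already passed separately as exact candidates.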
#get_node_tags(node_id) ⇒ Array<String>
Get tags for a specific node
    # File 'lib/htm/long_term_memory/tag_operations.rb', line 148
    def get_node_tags(node_id)
      HTM::Models::Tag
        .join(:node_tags, tag_id: :id)
        .where(Sequel[:node_tags][:node_id] => node_id)
        .select_map(:name)
    rescue Sequel::Error => e
      HTM.logger.error("Failed to retrieve tags for node #{node_id}: #{e.message}")
      []
    end
#node_topics(node_id) ⇒ Array<String>
Get topics for a specific node
    # File 'lib/htm/long_term_memory/tag_operations.rb', line 135
    def node_topics(node_id)
      HTM::Models::Tag
        .join(:node_tags, tag_id: :id)
        .where(Sequel[:node_tags][:node_id] => node_id)
        .order(:name)
        .select_map(:name)
    end
#nodes_by_topic(topic_path, exact: false, fuzzy: false, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD, limit: 50) ⇒ Array<Hash>
Retrieve nodes by ontological topic
Matching modes (in order of precedence):
- exact: true - Only exact tag name match
- fuzzy: true - Trigram similarity search (typo-tolerant)
- default - LIKE prefix match (e.g., “database” matches “database:postgresql”)
    # File 'lib/htm/long_term_memory/tag_operations.rb', line 80
    def nodes_by_topic(topic_path, exact: false, fuzzy: false, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD, limit: 50)
      safe_limit = limit.to_i.clamp(1, MAX_TAG_QUERY_LIMIT)

      node_ids = node_ids_for_topic(topic_path, exact: exact, fuzzy: fuzzy, min_similarity: min_similarity)
      return [] if node_ids.empty?

      HTM::Models::Node
        .where(id: node_ids)
        .order(Sequel.desc(:created_at))
        .limit(safe_limit)
        .all
        .map(&:to_hash)
    end
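The mode precedence and the limit clamping can be illustrated without a database. The sketch below is an in-memory approximation: `matching_mode`, `topic_match?`, and `safe_limit` are hypothetical helpers, the prefix mode is modelled with `start_with?` rather than SQL LIKE, and the fuzzy mode is deliberately left out. The internals of `node_ids_for_topic` are not shown on this page, so none of this should be read as its implementation.

```ruby
# Hypothetical helper: exact takes precedence over fuzzy, and prefix
# matching is the fallback.
def matching_mode(exact: false, fuzzy: false)
  return :exact if exact
  return :fuzzy if fuzzy
  :prefix
end

# Hypothetical in-memory stand-in for the exact and prefix modes.
def topic_match?(tag, topic, mode)
  case mode
  when :exact then tag == topic
  when :prefix then tag.start_with?(topic)
  else raise ArgumentError, "mode #{mode} is not modelled in this sketch"
  end
end

# Same clamp as in the method; max defaults to MAX_TAG_QUERY_LIMIT's value.
def safe_limit(limit, max: 1000)
  limit.to_i.clamp(1, max)
end

tags = ["database", "database:postgresql", "ai"]
mode = matching_mode
p tags.select { |tag| topic_match?(tag, "database", mode) }
p safe_limit(5000)
```

Clamping with `to_i` means non-numeric or nil limits fall back to 1 instead of raising, and oversized limits are capped at the maximum.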
#ontology_structure ⇒ Array<Hash>
Get ontology structure view
    # File 'lib/htm/long_term_memory/tag_operations.rb', line 97
    def ontology_structure
      HTM.db.fetch(
        "SELECT * FROM ontology_structure WHERE root_topic IS NOT NULL ORDER BY root_topic, level1_topic, level2_topic"
      ).all.map { |r| r.transform_keys(&:to_s) }
    end
#popular_tags(limit: 20, timeframe: nil) ⇒ Array<Hash>
Get most popular tags
    # File 'lib/htm/long_term_memory/tag_operations.rb', line 185
    def popular_tags(limit: 20, timeframe: nil)
      safe_limit = limit.to_i.clamp(1, MAX_TAG_QUERY_LIMIT)

      # NOTE: the expression that builds the base dataset is missing from this
      # listing; popular_tags_base_query is a placeholder name.
      query = popular_tags_base_query
      query = filter_by_timeframe(query, timeframe) if timeframe

      query.order(Sequel.desc(:usage_count)).limit(safe_limit).all
        .map { |tag| { name: tag[:name], usage_count: tag[:usage_count].to_i } }
    end
#search_tags(query, limit: 20, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD) ⇒ Array<Hash>
Fuzzy search for tags using trigram similarity
Uses PostgreSQL pg_trgm extension to find tags that are similar to the query string, tolerating typos and partial matches.
    # File 'lib/htm/long_term_memory/tag_operations.rb', line 204
    def search_tags(query, limit: 20, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD)
      return [] if query.nil? || query.strip.empty?

      # Enforce limits
      safe_limit = limit.to_i.clamp(1, MAX_TAG_QUERY_LIMIT)
      safe_similarity = min_similarity.to_f.clamp(0.0, 1.0)

      sql = <<~SQL
        SELECT name, similarity(name, ?) as similarity
        FROM tags
        WHERE similarity(name, ?) >= ?
        ORDER BY similarity DESC, name
        LIMIT ?
      SQL

      HTM.db.fetch(sql, query, query, safe_similarity, safe_limit)
        .all
        .map { |r| { name: r[:name], similarity: r[:similarity].to_f } }
    rescue Sequel::Error => e
      HTM.logger.error("Failed to search tags: #{e.message}")
      []
    end
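For intuition about what the similarity score means, the sketch below approximates pg_trgm's measure in plain Ruby: each word is lowercased, padded with two leading spaces and one trailing space, and cut into three-character sequences, and the score is the number of shared trigrams divided by the number of distinct trigrams across both strings. The helpers `trigrams` and `trigram_similarity` are hypothetical, and the word-splitting rule is an approximation of PostgreSQL's behaviour, so scores may differ from the database in edge cases.

```ruby
require "set"

# Approximate pg_trgm trigram extraction: lowercase, split on
# non-alphanumeric characters, pad each word with two leading spaces and one
# trailing space, then take every 3-character window.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Shared trigrams divided by distinct trigrams in either string.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  return 0.0 if union.zero?

  (ta & tb).size.to_f / union
end

# A one-character typo still clears the default 0.3 threshold.
score = trigram_similarity("postgresql", "postgresq1")
puts format("%.3f", score)
puts score >= 0.3
```

This is why a lower `min_similarity` admits more typo-laden matches: fewer of the trigrams need to be shared for a tag to qualify.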
#topic_relationships(min_shared_nodes: 2, limit: 50) ⇒ Array<Hash>
Get topic relationships (co-occurrence)
    # File 'lib/htm/long_term_memory/tag_operations.rb', line 109
    def topic_relationships(min_shared_nodes: 2, limit: 50)
      # Enforce limit to prevent DoS
      safe_limit = limit.to_i.clamp(1, MAX_TAG_QUERY_LIMIT)
      safe_min = [min_shared_nodes.to_i, 1].max

      sql = <<~SQL
        SELECT t1.name AS topic1, t2.name AS topic2, COUNT(DISTINCT nt1.node_id) AS shared_nodes
        FROM tags t1
        JOIN node_tags nt1 ON t1.id = nt1.tag_id
        JOIN node_tags nt2 ON nt1.node_id = nt2.node_id
        JOIN tags t2 ON nt2.tag_id = t2.id
        WHERE t1.name < t2.name
        GROUP BY t1.name, t2.name
        HAVING COUNT(DISTINCT nt1.node_id) >= ?
        ORDER BY shared_nodes DESC
        LIMIT ?
      SQL

      HTM.db.fetch(sql, safe_min, safe_limit).all.map { |r| r.transform_keys(&:to_s) }
    end
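The co-occurrence query can be mirrored in memory to see what it computes. In the sketch below, `topic_pairs` is a hypothetical helper that takes a made-up hash of node IDs to tag names. Sorting each node's tags before pairing plays the role of the `t1.name < t2.name` condition, so each pair is counted once per node. One assumption differs from the SQL: ties in `shared_nodes` are broken by tag name here to keep the output deterministic, whereas the query leaves tie order unspecified.

```ruby
# Hypothetical in-memory counterpart of the co-occurrence query.
def topic_pairs(node_tags, min_shared_nodes: 2, limit: 50)
  counts = Hash.new(0)

  node_tags.each_value do |tags|
    # Sorted, de-duplicated tags yield each unordered pair exactly once.
    tags.uniq.sort.combination(2) { |t1, t2| counts[[t1, t2]] += 1 }
  end

  counts
    .select { |_pair, shared| shared >= min_shared_nodes }
    .sort_by { |pair, shared| [-shared, pair] }
    .first(limit)
    .map { |(t1, t2), shared| { "topic1" => t1, "topic2" => t2, "shared_nodes" => shared } }
end

# Made-up sample data: node id => tag names.
node_tags = {
  1 => ["ai", "database"],
  2 => ["database", "ai", "ruby"],
  3 => ["ruby"]
}

p topic_pairs(node_tags)
```

With this sample, only the pair "ai" and "database" appears on two nodes, so it is the only row returned at the default `min_shared_nodes` of 2.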