Module: HTM::LongTermMemory::TagOperations

Included in:
HTM::LongTermMemory
Defined in:
lib/htm/long_term_memory/tag_operations.rb

Overview

Tag management operations for LongTermMemory

Handles hierarchical tag operations including:

  • Adding tags to nodes

  • Querying nodes by topic/tag

  • Tag relationship analysis

  • Batch tag loading (N+1 prevention)

  • Query-to-tag matching

Security: All queries use parameterized placeholders and LIKE patterns are sanitized to prevent SQL injection.
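As a rough illustration of the LIKE sanitization mentioned above, the sketch below escapes the wildcard characters in user input before a prefix pattern is built. The helper names `escape_like` and `like_prefix_pattern` are hypothetical and are not taken from HTM; the sketch assumes backslash as the escape character, which is PostgreSQL's default for LIKE.

```ruby
# Hypothetical helpers (not part of HTM): make user input safe to embed
# in a LIKE pattern by escaping the wildcard characters % and _ as well
# as the escape character itself.
def escape_like(value)
  value.to_s.gsub(/[\\%_]/) { |ch| "\\#{ch}" }
end

# Build a prefix pattern. The trailing % is the only live wildcard.
def like_prefix_pattern(topic_path)
  "#{escape_like(topic_path)}%"
end

puts like_prefix_pattern("database")   # prints database%
puts like_prefix_pattern("100%_done")  # prints 100\%\_done%
```

The resulting pattern would still be passed to the database as a bound parameter rather than interpolated into the SQL string.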

Constant Summary

MAX_TAG_QUERY_LIMIT = 1000

Maximum number of results, to prevent DoS via unbounded queries.

MAX_TAG_SAMPLE_SIZE = 50

DEFAULT_TAG_SIMILARITY_THRESHOLD = 0.3

Default trigram similarity threshold for fuzzy tag search (0.0-1.0). Lower values allow fuzzier matches; higher values require stricter matching.
300

Class Attribute Details

.popular_tags_cache

Returns the value of attribute popular_tags_cache.



# File 'lib/htm/long_term_memory/tag_operations.rb', line 36

def popular_tags_cache
  @popular_tags_cache
end

.popular_tags_cache_expires_at

Returns the value of attribute popular_tags_cache_expires_at.



# File 'lib/htm/long_term_memory/tag_operations.rb', line 36

def popular_tags_cache_expires_at
  @popular_tags_cache_expires_at
end

.popular_tags_mutex

Returns the value of attribute popular_tags_mutex.



# File 'lib/htm/long_term_memory/tag_operations.rb', line 36

def popular_tags_mutex
  @popular_tags_mutex
end
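The three attributes above (a cached value, an expiry time, and a mutex) suggest a time-limited cache guarded by a lock. The sketch below shows one way such a cache could work. The class name `TtlCache`, the `ttl_seconds` and `clock` parameters, and the 60-second lifetime are all assumptions made for illustration; this is not HTM's implementation.

```ruby
# Hypothetical illustration of a mutex-guarded cache with an expiry time.
# Names and the TTL value are assumptions, not taken from HTM.
class TtlCache
  def initialize(ttl_seconds:, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl_seconds
    @clock = clock          # injectable so that expiry can be tested
    @mutex = Mutex.new
    @value = nil
    @expires_at = nil
  end

  # Returns the cached value, recomputing it with the block once expired.
  def fetch
    @mutex.synchronize do
      now = @clock.call
      if @expires_at.nil? || now >= @expires_at
        @value = yield
        @expires_at = now + @ttl
      end
      @value
    end
  end
end

loads = 0
cache = TtlCache.new(ttl_seconds: 60)
2.times { cache.fetch { loads += 1; %w[database database:postgresql] } }
puts loads # prints 1, because the second call is served from the cache
```

Holding the lock while the block runs keeps the sketch simple and ensures only one thread reloads the value, at the cost of blocking other readers during a reload.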

Instance Method Details

#add_tag(node_id:, tag:) ⇒ void

This method returns an undefined value.

Add a tag to a node (creates tag and all parent tags)

When adding a hierarchical tag like “database:postgresql:extensions”, this also creates and associates the parent tags “database” and “database:postgresql” with the node.

Examples:

add_tag(node_id: 123, tag: "database:postgresql:extensions")
# Creates tags: "database", "database:postgresql", "database:postgresql:extensions"
# Associates all three with node 123
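To make the ancestor expansion concrete, here is a minimal sketch of how a hierarchical tag could be split into itself plus its parent tags. The helper `tag_with_ancestors` is hypothetical; the real work is done by `HTM::Models::Tag.find_or_create_with_ancestors`, whose internals are not shown in this document.

```ruby
# Hypothetical helper: list a hierarchical tag together with its
# ancestors, ordered from the root down to the full path.
def tag_with_ancestors(tag, separator: ":")
  levels = tag.split(separator)
  (1..levels.size).map { |depth| levels.first(depth).join(separator) }
end

p tag_with_ancestors("database:postgresql:extensions")
# prints ["database", "database:postgresql", "database:postgresql:extensions"]
```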

Parameters:

  • node_id (Integer)

    Node database ID

  • tag (String)

    Tag name



# File 'lib/htm/long_term_memory/tag_operations.rb', line 54

def add_tag(node_id:, tag:)
  # Create tag and all ancestor tags, then associate each with the node
  HTM::Models::Tag.find_or_create_with_ancestors(tag).each do |tag_record|
    HTM::Models::NodeTag.find_or_create(
      node_id: node_id,
      tag_id: tag_record.id
    )
  rescue Sequel::UniqueConstraintViolation
    # Tag association already exists, ignore
  end
end

#batch_load_node_tags(node_ids) ⇒ Hash<Integer, Array<String>>

Batch load tags for multiple nodes (avoids N+1 queries)

Parameters:

  • node_ids (Array<Integer>)

    Node database IDs

Returns:

  • (Hash<Integer, Array<String>>)

    Map of node_id to array of tag names



# File 'lib/htm/long_term_memory/tag_operations.rb', line 163

def batch_load_node_tags(node_ids)
  return {} if node_ids.empty?

  # Single query to get all tags for all nodes
  results = HTM::Models::NodeTag
            .join(:tags, id: :tag_id)
            .where(node_id: node_ids)
            .select_map([:node_id, Sequel[:tags][:name]])

  # Group by node_id
  results.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
rescue Sequel::Error => e
  HTM.logger.error("Failed to batch load tags: #{e.message}")
  {}
end
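The grouping step at the end of the method can be tried in isolation. The sketch below applies the same `group_by` and `transform_values` chain to hand-written rows shaped like the `[node_id, tag_name]` pairs the query returns; the wrapper name `group_tags_by_node` and the sample data are made up for illustration.

```ruby
# Rows are [node_id, tag_name] pairs, as a two-column select_map returns.
def group_tags_by_node(rows)
  rows.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
end

rows = [[1, "database"], [1, "database:postgresql"], [2, "ruby"]]
tags_by_node = group_tags_by_node(rows)

p tags_by_node
# Node 3 has no rows, so it has no key; fetch supplies a default.
p tags_by_node.fetch(3, [])
```

Because a node without tags produces no rows, it is absent from the result hash, so callers may prefer `fetch(id, [])` over `[]`.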

#find_query_matching_tags(query, include_extracted: false) ⇒ Array<String>, Hash

Find tags that match terms in the query

Searches the tags table for tags where any hierarchy level matches query words. Uses semantic extraction via LLM to find relevant tags.

Performance: Uses a single UNION query instead of multiple sequential queries.

Parameters:

  • query (String)

    Search query

  • include_extracted (Boolean) (defaults to: false)

    If true, returns hash with :extracted and :matched keys

Returns:

  • (Array<String>)

    Matching tag names (default)

  • (Hash)

    If include_extracted is true: { extracted: […], matched: […] }



# File 'lib/htm/long_term_memory/tag_operations.rb', line 239

def find_query_matching_tags(query, include_extracted: false)
  empty_result = include_extracted ? { extracted: [], matched: [] } : []
  return empty_result if query.nil? || query.strip.empty?

  # OPTIMIZATION: Use cached popular tags instead of expensive RANDOM() query
  # This saves 50-300ms per call by avoiding a full table sort
  existing_tags = cached_popular_tags

  # Use the tag extractor to generate semantic tags from the query
  extracted_tags = HTM::TagService.extract(query, existing_ontology: existing_tags)

  if extracted_tags.empty?
    return include_extracted ? { extracted: [], matched: [] } : []
  end

  # Build prefix candidates from extracted tags
  prefix_candidates = extracted_tags.flat_map do |tag|
    levels = tag.split(':')
    (1...levels.size).map { |i| levels[0, i].join(':') }
  end.uniq

  # Get all components for component matching
  all_components = extracted_tags.flat_map { |tag| tag.split(':') }.uniq

  # Build UNION query to find matches in a single database round-trip
  matched_tags = find_matching_tags_unified(
    exact_candidates: extracted_tags,
    prefix_candidates: prefix_candidates,
    component_candidates: all_components
  )

  if include_extracted
    { extracted: extracted_tags, matched: matched_tags }
  else
    matched_tags
  end
end
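The candidate-building steps can be run on their own to see what is handed to the matching query. The sketch below wraps the two expressions from the method body in helper functions (the function names and the sample tags are illustrative) and leaves out the LLM extraction and the database lookup.

```ruby
# Every proper ancestor of each tag (the full tag itself is excluded,
# since it is already covered by the exact candidates).
def prefix_candidates(tags)
  tags.flat_map do |tag|
    levels = tag.split(":")
    (1...levels.size).map { |i| levels[0, i].join(":") }
  end.uniq
end

# Every individual hierarchy component across all tags.
def component_candidates(tags)
  tags.flat_map { |tag| tag.split(":") }.uniq
end

extracted = ["database:postgresql:extensions", "database:mysql"]
p prefix_candidates(extracted)    # prints ["database", "database:postgresql"]
p component_candidates(extracted) # prints ["database", "postgresql", "extensions", "mysql"]
```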

#get_node_tags(node_id) ⇒ Array<String>

Get tags for a specific node

Parameters:

  • node_id (Integer)

    Node database ID

Returns:

  • (Array<String>)

    Tag names



# File 'lib/htm/long_term_memory/tag_operations.rb', line 148

def get_node_tags(node_id)
  HTM::Models::Tag
    .join(:node_tags, tag_id: :id)
    .where(Sequel[:node_tags][:node_id] => node_id)
    .select_map(:name)
rescue Sequel::Error => e
  HTM.logger.error("Failed to retrieve tags for node #{node_id}: #{e.message}")
  []
end

#node_topics(node_id) ⇒ Array<String>

Get topics for a specific node

Parameters:

  • node_id (Integer)

    Node database ID

Returns:

  • (Array<String>)

    Topic paths, sorted by name



# File 'lib/htm/long_term_memory/tag_operations.rb', line 135

def node_topics(node_id)
  HTM::Models::Tag
    .join(:node_tags, tag_id: :id)
    .where(Sequel[:node_tags][:node_id] => node_id)
    .order(:name)
    .select_map(:name)
end

#nodes_by_topic(topic_path, exact: false, fuzzy: false, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD, limit: 50) ⇒ Array<Hash>

Retrieve nodes by ontological topic

Matching modes (in order of precedence):

  • exact: true - Only exact tag name match

  • fuzzy: true - Trigram similarity search (typo-tolerant)

  • default - LIKE prefix match (e.g., “database” matches “database:postgresql”)

Parameters:

  • topic_path (String)

    Topic hierarchy path

  • exact (Boolean) (defaults to: false)

    Exact match only (highest priority)

  • fuzzy (Boolean) (defaults to: false)

    Use trigram similarity for typo-tolerant search

  • min_similarity (Float) (defaults to: DEFAULT_TAG_SIMILARITY_THRESHOLD)

    Minimum similarity for fuzzy mode (0.0-1.0)

  • limit (Integer) (defaults to: 50)

    Maximum results (capped at MAX_TAG_QUERY_LIMIT)

Returns:

  • (Array<Hash>)

    Matching nodes as hashes, newest first (by created_at)



# File 'lib/htm/long_term_memory/tag_operations.rb', line 80

def nodes_by_topic(topic_path, exact: false, fuzzy: false, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD, limit: 50)
  safe_limit = limit.to_i.clamp(1, MAX_TAG_QUERY_LIMIT)
  node_ids   = node_ids_for_topic(topic_path, exact: exact, fuzzy: fuzzy, min_similarity: min_similarity)
  return [] if node_ids.empty?

  HTM::Models::Node
    .where(id: node_ids)
    .order(Sequel.desc(:created_at))
    .limit(safe_limit)
    .all
    .map(&:to_hash)
end
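The precedence of the matching modes can be sketched without a database. In the example below, `matching_mode` and `match_in_memory` are hypothetical helpers: the real lookup happens in `node_ids_for_topic`, which is not shown here, and the fuzzy branch is omitted because it depends on trigram similarity in PostgreSQL.

```ruby
# Hypothetical helper: pick the matching mode, with exact taking
# precedence over fuzzy and prefix matching as the default.
def matching_mode(exact: false, fuzzy: false)
  return :exact if exact
  return :fuzzy if fuzzy

  :prefix
end

# In-memory stand-in for the exact and prefix modes only.
def match_in_memory(tags, topic, exact: false)
  if matching_mode(exact: exact) == :exact
    tags.select { |tag| tag == topic }
  else
    tags.select { |tag| tag.start_with?(topic) }
  end
end

tags = %w[database database:postgresql ruby]
p matching_mode(exact: true, fuzzy: true)          # prints :exact
p match_in_memory(tags, "database")                # prints ["database", "database:postgresql"]
p match_in_memory(tags, "database", exact: true)   # prints ["database"]
```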

#ontology_structure ⇒ Array<Hash>

Get ontology structure view

Returns:

  • (Array<Hash>)

    Ontology structure



# File 'lib/htm/long_term_memory/tag_operations.rb', line 97

def ontology_structure
  HTM.db.fetch(
    "SELECT * FROM ontology_structure WHERE root_topic IS NOT NULL ORDER BY root_topic, level1_topic, level2_topic"
  ).all.map { |r| r.transform_keys(&:to_s) }
end

#popular_tags(limit: 20, timeframe: nil) ⇒ Array<Hash>

Get most popular tags

Parameters:

  • limit (Integer) (defaults to: 20)

    Number of tags to return (capped at MAX_TAG_QUERY_LIMIT)

  • timeframe (Range, nil) (defaults to: nil)

    Optional time range filter

Returns:

  • (Array<Hash>)

    Tags with usage counts, most used first. Each hash contains: { name: String, usage_count: Integer }



# File 'lib/htm/long_term_memory/tag_operations.rb', line 185

def popular_tags(limit: 20, timeframe: nil)
  safe_limit = limit.to_i.clamp(1, MAX_TAG_QUERY_LIMIT)
  query = base_popular_tags_query
  query = filter_by_timeframe(query, timeframe) if timeframe
  query.order(Sequel.desc(:usage_count)).limit(safe_limit).all
       .map { |tag| { name: tag[:name], usage_count: tag[:usage_count].to_i } }
end
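The `limit.to_i.clamp(1, MAX_TAG_QUERY_LIMIT)` idiom used here and in the other query methods is easy to check in isolation. The sketch below wraps it in a hypothetical `safe_limit` helper, with the maximum of 1000 taken from the constant summary.

```ruby
# Hypothetical wrapper around the clamp idiom. The default maximum
# mirrors MAX_TAG_QUERY_LIMIT (1000).
def safe_limit(limit, max: 1000)
  limit.to_i.clamp(1, max)
end

p safe_limit(20)     # prints 20
p safe_limit(5000)   # prints 1000
p safe_limit(-5)     # prints 1
p safe_limit(nil)    # prints 1, since nil.to_i is 0
p safe_limit("abc")  # prints 1, since "abc".to_i is 0
```

Coercing with `to_i` before clamping means malformed input falls back to the minimum rather than raising.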

#search_tags(query, limit: 20, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD) ⇒ Array<Hash>

Fuzzy search for tags using trigram similarity

Uses PostgreSQL pg_trgm extension to find tags that are similar to the query string, tolerating typos and partial matches.

Parameters:

  • query (String)

    Search query (tag name or partial)

  • limit (Integer) (defaults to: 20)

    Maximum results (capped at MAX_TAG_QUERY_LIMIT)

  • min_similarity (Float) (defaults to: DEFAULT_TAG_SIMILARITY_THRESHOLD)

    Minimum similarity threshold (0.0-1.0)

Returns:

  • (Array<Hash>)

    Matching tags with similarity scores, most similar first. Each hash contains: { name: String, similarity: Float }



# File 'lib/htm/long_term_memory/tag_operations.rb', line 204

def search_tags(query, limit: 20, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD)
  return [] if query.nil? || query.strip.empty?

  # Enforce limits
  safe_limit = limit.to_i.clamp(1, MAX_TAG_QUERY_LIMIT)
  safe_similarity = min_similarity.to_f.clamp(0.0, 1.0)

  sql = <<~SQL
    SELECT name, similarity(name, ?) as similarity
    FROM tags
    WHERE similarity(name, ?) >= ?
    ORDER BY similarity DESC, name
    LIMIT ?
  SQL

  HTM.db.fetch(sql, query, query, safe_similarity, safe_limit)
     .all
     .map { |r| { name: r[:name], similarity: r[:similarity].to_f } }
rescue Sequel::Error => e
  HTM.logger.error("Failed to search tags: #{e.message}")
  []
end
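To give a feel for what the similarity score measures, the sketch below approximates trigram similarity in plain Ruby: each word is padded with two leading spaces and one trailing space, split into three-character sequences, and the score is the number of shared trigrams divided by the number of distinct trigrams across both strings. This is an approximation written for illustration; the function names are made up, and PostgreSQL's own implementation remains the authority on edge cases.

```ruby
require "set"

# Collect the set of trigrams for a string. Non-alphanumeric characters
# separate words; each word is padded before being sliced.
def trigrams(text)
  text.downcase.scan(/[[:alnum:]]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Shared trigrams divided by distinct trigrams across both strings.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  return 0.0 if union.zero?

  (ta & tb).size.to_f / union
end

# "word" and "two words" share 4 of 11 distinct trigrams (about 0.36).
puts trigram_similarity("word", "two words").round(2) # prints 0.36
puts trigram_similarity("postgresql", "postgresql")   # prints 1.0
```

With the default threshold of 0.3, the first pair above would count as a match.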

#topic_relationships(min_shared_nodes: 2, limit: 50) ⇒ Array<Hash>

Get topic relationships (co-occurrence)

Parameters:

  • min_shared_nodes (Integer) (defaults to: 2)

    Minimum number of nodes two topics must share (values below 1 are raised to 1)

  • limit (Integer) (defaults to: 50)

    Maximum relationships (capped at MAX_TAG_QUERY_LIMIT)

Returns:

  • (Array<Hash>)

    Topic pairs, most shared nodes first. Each hash has the string keys "topic1", "topic2", and "shared_nodes"



# File 'lib/htm/long_term_memory/tag_operations.rb', line 109

def topic_relationships(min_shared_nodes: 2, limit: 50)
  # Enforce limit to prevent DoS
  safe_limit = limit.to_i.clamp(1, MAX_TAG_QUERY_LIMIT)
  safe_min = [min_shared_nodes.to_i, 1].max

  sql = <<~SQL
    SELECT t1.name AS topic1, t2.name AS topic2, COUNT(DISTINCT nt1.node_id) AS shared_nodes
    FROM tags t1
    JOIN node_tags nt1 ON t1.id = nt1.tag_id
    JOIN node_tags nt2 ON nt1.node_id = nt2.node_id
    JOIN tags t2 ON nt2.tag_id = t2.id
    WHERE t1.name < t2.name
    GROUP BY t1.name, t2.name
    HAVING COUNT(DISTINCT nt1.node_id) >= ?
    ORDER BY shared_nodes DESC
    LIMIT ?
  SQL

  HTM.db.fetch(sql, safe_min, safe_limit).all.map { |r| r.transform_keys(&:to_s) }
end
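The co-occurrence query can be mirrored in memory to see what it computes. The sketch below counts, for every pair of tags, how many nodes carry both. The function name and sample data are illustrative; unlike the SQL, the sketch breaks ties alphabetically so that its output is deterministic, and it compares strings with Ruby's ordering rather than the database collation.

```ruby
# In-memory stand-in for the co-occurrence query. tags_by_node maps a
# node ID to the tag names attached to it.
def topic_relationships_in_memory(tags_by_node, min_shared_nodes: 2, limit: 50)
  counts = Hash.new(0)
  tags_by_node.each_value do |tags|
    # Sorting first guarantees topic1 < topic2, like t1.name < t2.name.
    tags.uniq.sort.combination(2) { |t1, t2| counts[[t1, t2]] += 1 }
  end

  counts
    .select { |_pair, shared| shared >= min_shared_nodes }
    .sort_by { |pair, shared| [-shared, pair] }
    .first(limit)
    .map { |(t1, t2), shared| { "topic1" => t1, "topic2" => t2, "shared_nodes" => shared } }
end

tags_by_node = {
  1 => %w[database postgresql],
  2 => %w[database postgresql ruby],
  3 => %w[database ruby]
}
topic_relationships_in_memory(tags_by_node).each { |row| p row }
```

In this sample, "postgresql" and "ruby" share only one node and are filtered out by the default `min_shared_nodes` of 2.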