Module: HTM::LongTermMemory::TagOperations

Included in: HTM::LongTermMemory

Defined in: lib/htm/long_term_memory/tag_operations.rb

Overview

Tag management operations for LongTermMemory

Handles hierarchical tag operations including:

Adding tags to nodes
Querying nodes by topic/tag
Tag relationship analysis
Batch tag loading (N+1 prevention)
Query-to-tag matching

Security: All queries use parameterized placeholders and LIKE patterns are sanitized to prevent SQL injection.
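
The module's own sanitizing helper is not shown on this page, so the following is only a minimal sketch of what escaping a LIKE pattern can look like. The names `escape_like` and `prefix_pattern` are hypothetical, and the sketch assumes backslash is the LIKE escape character (PostgreSQL's default). The resulting pattern would still be passed as a bound parameter, never interpolated into SQL.

```ruby
# Hypothetical helper: escape LIKE wildcards so user input is matched literally.
# Assumes backslash is the LIKE escape character (PostgreSQL's default).
def escape_like(value)
  value.to_s.gsub(/[\\%_]/) { |char| "\\#{char}" }
end

# Hypothetical helper: build a prefix pattern for a hierarchical tag lookup.
def prefix_pattern(topic)
  "#{escape_like(topic)}%"
end

puts prefix_pattern("database")   # database%
puts prefix_pattern("100%_done")  # 100\%\_done%
```

Escaping first and appending the trailing `%` afterwards keeps the only active wildcard under the caller's control.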

Constant Summary

MAX_TAG_QUERY_LIMIT
Maximum number of results, enforced to prevent DoS via unbounded queries.

MAX_TAG_SAMPLE_SIZE

DEFAULT_TAG_SIMILARITY_THRESHOLD = 0.3
Default trigram similarity threshold for fuzzy tag search (0.0-1.0). Lower values allow fuzzier matches; higher values enforce stricter matching.

POPULAR_TAGS_CACHE_TTL
Cache TTL for popular tags (5 minutes). Caching eliminates expensive RANDOM() queries on every tag extraction.

Class Attribute Summary

.popular_tags_cache ⇒ Object

Returns the value of attribute popular_tags_cache.
.popular_tags_cache_expires_at ⇒ Object

Returns the value of attribute popular_tags_cache_expires_at.
.popular_tags_mutex ⇒ Object

Returns the value of attribute popular_tags_mutex.

Instance Method Summary

#add_tag(node_id:, tag:) ⇒ void

Add a tag to a node (creates tag and all parent tags).
#batch_load_node_tags(node_ids) ⇒ Hash<Integer, Array<String>>

Batch load tags for multiple nodes (avoids N+1 queries).
#find_query_matching_tags(query, include_extracted: false) ⇒ Array<String>, Hash

Find tags that match terms in the query.
#get_node_tags(node_id) ⇒ Array<String>

Get tags for a specific node.
#node_topics(node_id) ⇒ Array<String>

Get topics for a specific node.
#nodes_by_topic(topic_path, exact: false, fuzzy: false, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD, limit: 50) ⇒ Array<Hash>

Retrieve nodes by ontological topic.
#ontology_structure ⇒ Array<Hash>

Get ontology structure view.
#popular_tags(limit: 20, timeframe: nil) ⇒ Array<Hash>

Get most popular tags.
#search_tags(query, limit: 20, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD) ⇒ Array<Hash>

Fuzzy search for tags using trigram similarity.
#topic_relationships(min_shared_nodes: 2, limit: 50) ⇒ Array<Hash>

Get topic relationships (co-occurrence).

Class Attribute Details

.popular_tags_cache ⇒ `Object`

Returns the value of attribute popular_tags_cache.

# File 'lib/htm/long_term_memory/tag_operations.rb', line 36

def popular_tags_cache
  @popular_tags_cache
end

.popular_tags_cache_expires_at ⇒ `Object`

Returns the value of attribute popular_tags_cache_expires_at.

# File 'lib/htm/long_term_memory/tag_operations.rb', line 36

def popular_tags_cache_expires_at
  @popular_tags_cache_expires_at
end

.popular_tags_mutex ⇒ `Object`

Returns the value of attribute popular_tags_mutex.

# File 'lib/htm/long_term_memory/tag_operations.rb', line 36

def popular_tags_mutex
  @popular_tags_mutex
end
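
The three class attributes above suggest a TTL cache guarded by a mutex, but the code that uses them is not shown on this page. The following is a minimal sketch of that pattern under stated assumptions: the class `PopularTagsCache`, its `fetch` method, and the injectable `clock` are all hypothetical, and the 300-second TTL is taken from the "5 minutes" in the constant description rather than from the source.

```ruby
# Hypothetical stand-in for the module's class-level cache state.
class PopularTagsCache
  TTL_SECONDS = 300 # assumption: 5 minutes, per the POPULAR_TAGS_CACHE_TTL description

  def initialize(ttl: TTL_SECONDS, clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @ttl = ttl
    @clock = clock
    @mutex = Mutex.new
    @value = nil
    @expires_at = nil
  end

  # Returns the cached value, calling the block only when the cache is
  # empty or expired. The mutex keeps concurrent callers from refreshing twice.
  def fetch
    @mutex.synchronize do
      now = @clock.call
      if @expires_at.nil? || now >= @expires_at
        @value = yield
        @expires_at = now + @ttl
      end
      @value
    end
  end
end

calls = 0
fake_time = 0.0
cache = PopularTagsCache.new(ttl: 300, clock: -> { fake_time })

cache.fetch { calls += 1; ["database", "ruby"] }
cache.fetch { calls += 1; ["database", "ruby"] } # served from cache
fake_time = 301.0
cache.fetch { calls += 1; ["database"] }         # expired, block runs again
puts calls # 2
```

Injecting the clock is a choice made for this sketch so that expiry can be demonstrated without sleeping.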

Instance Method Details

#add_tag(node_id:, tag:) ⇒ `void`

This method returns an undefined value.

Add a tag to a node (creates tag and all parent tags)

When adding a hierarchical tag like “database:postgresql:extensions”, this also creates and associates the parent tags “database” and “database:postgresql” with the node.

Examples:

add_tag(node_id: 123, tag: "database:postgresql:extensions")
# Creates tags: "database", "database:postgresql", "database:postgresql:extensions"
# Associates all three with node 123

Parameters:

node_id (Integer) —

Node database ID
tag (String) —

Tag name

# File 'lib/htm/long_term_memory/tag_operations.rb', line 54

def add_tag(node_id:, tag:)
  # Create tag and all ancestor tags, then associate each with the node
  HTM::Models::Tag.find_or_create_with_ancestors(tag).each do |tag_record|
    HTM::Models::NodeTag.find_or_create(
      node_id: node_id,
      tag_id: tag_record.id
    )
  rescue Sequel::UniqueConstraintViolation
    # Tag association already exists, ignore
  end
end
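
`HTM::Models::Tag.find_or_create_with_ancestors` is not shown on this page. As a rough illustration of the expansion it is described as performing, the sketch below lists a tag together with its ancestors in plain Ruby. The helper name `tag_with_ancestors` is hypothetical, and the sketch assumes ":" is the only hierarchy separator.

```ruby
# Hypothetical helper: list a hierarchical tag together with its ancestors,
# shortest path first. Assumes ":" is the only separator.
def tag_with_ancestors(tag)
  levels = tag.split(":")
  (1..levels.size).map { |depth| levels.first(depth).join(":") }
end

p tag_with_ancestors("database:postgresql:extensions")
# => ["database", "database:postgresql", "database:postgresql:extensions"]
```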

#batch_load_node_tags(node_ids) ⇒ `Hash<Integer, Array<String>>`

Batch load tags for multiple nodes (avoids N+1 queries)

Parameters:

node_ids (Array<Integer>) —

Node database IDs

Returns:

(Hash<Integer, Array<String>>) —

Map of node_id to array of tag names

# File 'lib/htm/long_term_memory/tag_operations.rb', line 163

def batch_load_node_tags(node_ids)
  return {} if node_ids.empty?

  # Single query to get all tags for all nodes
  results = HTM::Models::NodeTag
            .join(:tags, id: :tag_id)
            .where(node_id: node_ids)
            .select_map([:node_id, Sequel[:tags][:name]])

  # Group by node_id
  results.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
rescue Sequel::Error => e
  HTM.logger.error("Failed to batch load tags: #{e.message}")
  {}
end
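
The grouping step at the end of this method can be tried without a database. The sketch below uses made-up sample rows shaped like the `[node_id, tag_name]` pairs that `select_map` returns; the data is illustrative only.

```ruby
# Sample rows shaped like the select_map result: [node_id, tag_name] pairs.
rows = [
  [1, "database"],
  [1, "database:postgresql"],
  [2, "ruby"]
]

# Group the pairs by node id, then keep only the tag names.
tags_by_node = rows.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }

p tags_by_node[1] # => ["database", "database:postgresql"]
p tags_by_node[2] # => ["ruby"]

# Nodes without tags have no key, so callers may want a default.
p tags_by_node.fetch(3, []) # => []
```

Note that a node with no tags is absent from the result rather than mapped to an empty array.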

#find_query_matching_tags(query, include_extracted: false) ⇒ `Array<String>`, `Hash`

Find tags that match terms in the query

Searches the tags table for tags where any hierarchy level matches query words. Uses semantic extraction via LLM to find relevant tags.

Performance: Uses a single UNION query instead of multiple sequential queries.

Parameters:

query (String) —

Search query
include_extracted (Boolean) (defaults to: false) —

If true, returns hash with :extracted and :matched keys

Returns:

(Array<String>) —

Matching tag names (default)
(Hash) —

If include_extracted is true: { extracted: […], matched: […] }

# File 'lib/htm/long_term_memory/tag_operations.rb', line 239

def find_query_matching_tags(query, include_extracted: false)
  empty_result = include_extracted ? { extracted: [], matched: [] } : []
  return empty_result if query.nil? || query.strip.empty?

  # OPTIMIZATION: Use cached popular tags instead of expensive RANDOM() query
  # This saves 50-300ms per call by avoiding a full table sort
  existing_tags = cached_popular_tags

  # Use the tag extractor to generate semantic tags from the query
  extracted_tags = HTM::TagService.extract(query, existing_ontology: existing_tags)

  return empty_result if extracted_tags.empty?

  # Build prefix candidates from extracted tags
  prefix_candidates = extracted_tags.flat_map do |tag|
    levels = tag.split(':')
    (1...levels.size).map { |i| levels[0, i].join(':') }
  end.uniq

  # Get all components for component matching
  all_components = extracted_tags.flat_map { |tag| tag.split(':') }.uniq

  # Build UNION query to find matches in a single database round-trip
  matched_tags = find_matching_tags_unified(
    exact_candidates: extracted_tags,
    prefix_candidates: prefix_candidates,
    component_candidates: all_components
  )

  if include_extracted
    { extracted: extracted_tags, matched: matched_tags }
  else
    matched_tags
  end
end
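
To make the candidate-building steps concrete, the sketch below runs the same prefix and component expressions on two made-up extracted tags. The sample tags are illustrative; the LLM extraction and the UNION query are left out.

```ruby
extracted_tags = ["database:postgresql:extensions", "database:indexing"]

# Every proper ancestor path of each tag (the tag itself is excluded).
prefix_candidates = extracted_tags.flat_map do |tag|
  levels = tag.split(":")
  (1...levels.size).map { |i| levels[0, i].join(":") }
end.uniq

# Every individual hierarchy component, deduplicated.
all_components = extracted_tags.flat_map { |tag| tag.split(":") }.uniq

p prefix_candidates
# => ["database", "database:postgresql"]
p all_components
# => ["database", "postgresql", "extensions", "indexing"]
```

The exclusive range `1...levels.size` is what keeps the full tag out of the prefix list; the full tags are passed separately as exact candidates.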

#get_node_tags(node_id) ⇒ `Array<String>`

Get tags for a specific node

Parameters:

node_id (Integer) —

Node database ID

Returns:

(Array<String>) —

Tag names

# File 'lib/htm/long_term_memory/tag_operations.rb', line 148

def get_node_tags(node_id)
  HTM::Models::Tag
    .join(:node_tags, tag_id: :id)
    .where(Sequel[:node_tags][:node_id] => node_id)
    .select_map(:name)
rescue Sequel::Error => e
  HTM.logger.error("Failed to retrieve tags for node #{node_id}: #{e.message}")
  []
end

#node_topics(node_id) ⇒ `Array<String>`

Get topics for a specific node

Parameters:

node_id (Integer) —

Node database ID

Returns:

(Array<String>) —

Topic paths

# File 'lib/htm/long_term_memory/tag_operations.rb', line 135

def node_topics(node_id)
  HTM::Models::Tag
    .join(:node_tags, tag_id: :id)
    .where(Sequel[:node_tags][:node_id] => node_id)
    .order(:name)
    .select_map(:name)
end

#nodes_by_topic(topic_path, exact: false, fuzzy: false, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD, limit: 50) ⇒ `Array<Hash>`

Retrieve nodes by ontological topic

Matching modes (in order of precedence):

exact: true - Only exact tag name match
fuzzy: true - Trigram similarity search (typo-tolerant)
default - LIKE prefix match (e.g., “database” matches “database:postgresql”)

Parameters:

topic_path (String) —

Topic hierarchy path
exact (Boolean) (defaults to: false) —

Exact match only (highest priority)
fuzzy (Boolean) (defaults to: false) —

Use trigram similarity for typo-tolerant search
min_similarity (Float) (defaults to: DEFAULT_TAG_SIMILARITY_THRESHOLD) —

Minimum similarity for fuzzy mode (0.0-1.0)
limit (Integer) (defaults to: 50) —

Maximum results (capped at MAX_TAG_QUERY_LIMIT)

Returns:

(Array<Hash>) —

Matching nodes

# File 'lib/htm/long_term_memory/tag_operations.rb', line 80

def nodes_by_topic(topic_path, exact: false, fuzzy: false, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD, limit: 50)
  safe_limit = limit.to_i.clamp(1, MAX_TAG_QUERY_LIMIT)
  node_ids   = node_ids_for_topic(topic_path, exact: exact, fuzzy: fuzzy, min_similarity: min_similarity)
  return [] if node_ids.empty?

  HTM::Models::Node
    .where(id: node_ids)
    .order(Sequel.desc(:created_at))
    .limit(safe_limit)
    .all
    .map(&:to_hash)
end
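
The private `node_ids_for_topic` helper is not shown on this page. The sketch below only illustrates the documented precedence and the difference between exact and prefix matching, using in-memory tag names. Both `matching_mode` and `match_tag_names` are hypothetical, and `start_with?` stands in for the LIKE prefix query.

```ruby
# Hypothetical helper mirroring the documented precedence: exact, then fuzzy,
# then the default prefix match.
def matching_mode(exact: false, fuzzy: false)
  return :exact if exact
  return :fuzzy if fuzzy

  :prefix
end

# In-memory illustration of exact versus prefix matching over tag names.
def match_tag_names(tag_names, topic, exact: false)
  return tag_names.select { |name| name == topic } if exact

  tag_names.select { |name| name.start_with?(topic) }
end

tags = ["database", "database:postgresql", "ruby"]

p matching_mode(exact: true, fuzzy: true)        # => :exact
p matching_mode(fuzzy: true)                     # => :fuzzy
p matching_mode                                  # => :prefix
p match_tag_names(tags, "database", exact: true) # => ["database"]
p match_tag_names(tags, "database")              # => ["database", "database:postgresql"]
```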

#ontology_structure ⇒ `Array<Hash>`

Get ontology structure view

Returns:

(Array<Hash>) —

Ontology structure

# File 'lib/htm/long_term_memory/tag_operations.rb', line 97

def ontology_structure
  HTM.db.fetch(
    "SELECT * FROM ontology_structure WHERE root_topic IS NOT NULL ORDER BY root_topic, level1_topic, level2_topic"
  ).all.map { |r| r.transform_keys(&:to_s) }
end

#popular_tags(limit: 20, timeframe: nil) ⇒ `Array<Hash>`

Get most popular tags

Parameters:

limit (Integer) (defaults to: 20) —

Number of tags to return (capped at MAX_TAG_QUERY_LIMIT)
timeframe (Range, nil) (defaults to: nil) —

Optional time range filter

Returns:

(Array<Hash>) —

Tags with usage counts

# File 'lib/htm/long_term_memory/tag_operations.rb', line 185

def popular_tags(limit: 20, timeframe: nil)
  safe_limit = limit.to_i.clamp(1, MAX_TAG_QUERY_LIMIT)
  query = base_popular_tags_query
  query = filter_by_timeframe(query, timeframe) if timeframe
  query.order(Sequel.desc(:usage_count)).limit(safe_limit).all
       .map { |tag| { name: tag[:name], usage_count: tag[:usage_count].to_i } }
end
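
The `limit.to_i.clamp(1, MAX_TAG_QUERY_LIMIT)` idiom appears in several methods of this module. The sketch below shows how it treats unusual inputs. `MAX_LIMIT` and its value of 1000 are placeholders chosen for illustration, since the real constant's value is not shown on this page.

```ruby
MAX_LIMIT = 1000 # hypothetical value; the real MAX_TAG_QUERY_LIMIT is not shown here

# Coerce to an integer, then force the result into 1..MAX_LIMIT.
def safe_limit(limit)
  limit.to_i.clamp(1, MAX_LIMIT)
end

p safe_limit(20)        # => 20
p safe_limit(0)         # => 1
p safe_limit(-5)        # => 1
p safe_limit(1_000_000) # => 1000
p safe_limit("50")      # => 50
p safe_limit(nil)       # => 1
```

One consequence worth knowing: an explicit `nil` becomes 1 rather than falling back to the method's default limit.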

#search_tags(query, limit: 20, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD) ⇒ `Array<Hash>`

Fuzzy search for tags using trigram similarity

Uses PostgreSQL pg_trgm extension to find tags that are similar to the query string, tolerating typos and partial matches.

Parameters:

query (String) —

Search query (tag name or partial)
limit (Integer) (defaults to: 20) —

Maximum results (capped at MAX_TAG_QUERY_LIMIT)
min_similarity (Float) (defaults to: DEFAULT_TAG_SIMILARITY_THRESHOLD) —

Minimum similarity threshold (0.0-1.0)

Returns:

(Array<Hash>) —

Matching tags with similarity scores. Each hash contains: { name: String, similarity: Float }

# File 'lib/htm/long_term_memory/tag_operations.rb', line 204

def search_tags(query, limit: 20, min_similarity: DEFAULT_TAG_SIMILARITY_THRESHOLD)
  return [] if query.nil? || query.strip.empty?

  # Enforce limits
  safe_limit = limit.to_i.clamp(1, MAX_TAG_QUERY_LIMIT)
  safe_similarity = min_similarity.to_f.clamp(0.0, 1.0)

  sql = <<~SQL
    SELECT name, similarity(name, ?) as similarity
    FROM tags
    WHERE similarity(name, ?) >= ?
    ORDER BY similarity DESC, name
    LIMIT ?
  SQL

  HTM.db.fetch(sql, query, query, safe_similarity, safe_limit)
     .all
     .map { |r| { name: r[:name], similarity: r[:similarity].to_f } }
rescue Sequel::Error => e
  HTM.logger.error("Failed to search tags: #{e.message}")
  []
end
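
For intuition about what the `similarity` function measures, the sketch below approximates pg_trgm's scoring in plain Ruby. It is an approximation, not the extension's implementation: it handles ASCII letters and digits only, and the helper names `trigrams` and `trigram_similarity` are hypothetical.

```ruby
require "set"

# Approximates pg_trgm: lowercase, split on non-alphanumerics, pad each word
# with two leading spaces and one trailing space, then take 3-character windows.
def trigrams(text)
  text.downcase.scan(/[a-z0-9]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Similarity = shared trigrams / total distinct trigrams across both strings.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = (ta | tb).size
  return 0.0 if union.zero?

  (ta & tb).size.to_f / union
end

puts trigram_similarity("word", "two words").round(4) # 0.3636
puts trigram_similarity("postgresql", "postgresql")   # 1.0
```

With the default threshold of 0.3, the first pair above would count as a match, which gives a feel for how permissive that setting is.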

#topic_relationships(min_shared_nodes: 2, limit: 50) ⇒ `Array<Hash>`

Get topic relationships (co-occurrence)

Parameters:

min_shared_nodes (Integer) (defaults to: 2) —

Minimum shared nodes
limit (Integer) (defaults to: 50) —

Maximum relationships (capped at MAX_TAG_QUERY_LIMIT)

Returns:

(Array<Hash>) —

Topic relationships

# File 'lib/htm/long_term_memory/tag_operations.rb', line 109

def topic_relationships(min_shared_nodes: 2, limit: 50)
  # Enforce limit to prevent DoS
  safe_limit = limit.to_i.clamp(1, MAX_TAG_QUERY_LIMIT)
  safe_min = [min_shared_nodes.to_i, 1].max

  sql = <<~SQL
    SELECT t1.name AS topic1, t2.name AS topic2, COUNT(DISTINCT nt1.node_id) AS shared_nodes
    FROM tags t1
    JOIN node_tags nt1 ON t1.id = nt1.tag_id
    JOIN node_tags nt2 ON nt1.node_id = nt2.node_id
    JOIN tags t2 ON nt2.tag_id = t2.id
    WHERE t1.name < t2.name
    GROUP BY t1.name, t2.name
    HAVING COUNT(DISTINCT nt1.node_id) >= ?
    ORDER BY shared_nodes DESC
    LIMIT ?
  SQL

  HTM.db.fetch(sql, safe_min, safe_limit).all.map { |r| r.transform_keys(&:to_s) }
end
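
The co-occurrence query can be mirrored in memory to see what it returns. The sketch below counts tag pairs over made-up sample data. `topic_pairs` is a hypothetical helper, and the alphabetical tie-break is added only to make the output deterministic, since the SQL orders by `shared_nodes` alone.

```ruby
# Sample data: node id => tag names (stand-in for the node_tags join).
node_tags = {
  1 => ["database", "database:postgresql", "ruby"],
  2 => ["database", "database:postgresql"],
  3 => ["database", "ruby"]
}

def topic_pairs(node_tags, min_shared_nodes: 2)
  counts = Hash.new(0)
  node_tags.each_value do |tags|
    # sort + combination(2) yields each pair once with topic1 < topic2,
    # mirroring the WHERE t1.name < t2.name condition.
    tags.uniq.sort.combination(2) { |pair| counts[pair] += 1 }
  end

  counts.select { |_, shared| shared >= min_shared_nodes }
        .sort_by { |(t1, t2), shared| [-shared, t1, t2] } # tie-break is this sketch's addition
        .map { |(t1, t2), shared| { "topic1" => t1, "topic2" => t2, "shared_nodes" => shared } }
end

topic_pairs(node_tags).each { |row| puts row.values.join(" | ") }
# database | database:postgresql | 2
# database | ruby | 2
```

The pair "database:postgresql" and "ruby" shares only one node, so it falls below the default `min_shared_nodes` of 2 and is dropped.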
