Class: HTM::TagService

Inherits:
Object
  • Object
show all
Defined in:
lib/htm/tag_service.rb

Overview

Tag Service - Processes and validates hierarchical tags

This service wraps the configured tag extractor and provides:

  • Response parsing (string or array)

  • Format validation (lowercase, alphanumeric, hyphens, colons)

  • Depth validation (max 5 levels)

  • Ontology consistency

  • Circuit breaker protection for external LLM failures

The actual LLM call is delegated to HTM.configuration.tag_extractor

Constant Summary collapse

TAG_FORMAT =

Validation regex

/^[a-z0-9-]+(:[a-z0-9-]+)*$/
SINGULARIZE_SKIP_LIST =

Words that should NOT be singularized (proper nouns, technical terms, etc.)

%w[
  rails kubernetes aws gcp azure s3 ios macos redis postgres
  postgresql mysql jenkins travis github gitlab mkdocs devops
  analytics statistics mathematics physics ethics dynamics
  graphics linguistics economics robotics
  pages windows
].freeze

Class Method Summary collapse

Class Method Details

.circuit_breakerHTM::CircuitBreaker

Get or create the circuit breaker for tag service

Returns:



38
39
40
41
42
43
44
45
46
47
48
# File 'lib/htm/tag_service.rb', line 38

def circuit_breaker
  config = HTM.configuration
  @circuit_breaker_mutex.synchronize do
    @circuit_breaker ||= HTM::CircuitBreaker.new(
      name: 'tag_service',
      failure_threshold: config.circuit_breaker_failure_threshold,
      reset_timeout: config.circuit_breaker_reset_timeout,
      half_open_max_calls: config.circuit_breaker_half_open_max_calls
    )
  end
end

.extract(content, existing_ontology: []) ⇒ Array<String>

Extract tags with validation and processing

Parameters:

  • content (String)

    Text to analyze

  • existing_ontology (Array<String>) (defaults to: [])

    Sample of existing tags for context

Returns:

  • (Array<String>)

    Validated tag names

Raises:



68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
# File 'lib/htm/tag_service.rb', line 68

def self.extract(content, existing_ontology: [])
  # Use circuit breaker to protect against cascading failures
  raw_tags = circuit_breaker.call do
    HTM.configuration.tag_extractor.call(content, existing_ontology)
  end

  # Parse response (may be string or array)
  parsed_tags = parse_tags(raw_tags)

  # Validate and filter tags
  validate_and_filter_tags(parsed_tags)
rescue HTM::CircuitBreakerOpenError, HTM::TagError
  raise
rescue StandardError => e
  HTM.logger.error "TagService: Failed to extract tags: #{e.message}"
  raise HTM::TagError, "Tag extraction failed: #{e.message}"
end

.max_depthInteger

Maximum tag hierarchy depth (configurable, default 4)

Returns:

  • (Integer)

    Max depth (3 colons max by default)



30
31
32
# File 'lib/htm/tag_service.rb', line 30

def max_depth
  HTM.configuration.max_tag_depth
end

.parse_hierarchy(tag) ⇒ Hash

Parse hierarchical structure of a tag

Parameters:

  • tag (String)

    Hierarchical tag (e.g., “ai:llm:embedding”)

Returns:

  • (Hash)

    Hierarchy structure

    full: "ai:llm:embedding",
    root: "ai",
    parent: "ai:llm",
    levels: ["ai", "llm", "embedding"],
    depth: 3
    



182
183
184
185
186
187
188
189
190
191
192
# File 'lib/htm/tag_service.rb', line 182

def self.parse_hierarchy(tag)
  levels = tag.split(':')

  {
    full: tag,
    root: levels.first,
    parent: levels.size > 1 ? levels[0..-2].join(':') : nil,
    levels: levels,
    depth: levels.size
  }
end

.parse_tags(raw_tags) ⇒ Array<String>

Parse tag response (handles string or array input)

Parameters:

  • raw_tags (String, Array)

    Raw response from extractor

Returns:

  • (Array<String>)

    Parsed tag strings



91
92
93
94
95
96
97
98
99
100
101
102
# File 'lib/htm/tag_service.rb', line 91

def self.parse_tags(raw_tags)
  case raw_tags
  when Array
    # Already an array, return as-is
    raw_tags.map(&:to_s).map(&:strip).reject(&:empty?)
  when String
    # String response - split by newlines
    raw_tags.split("\n").map(&:strip).reject(&:empty?)
  else
    raise HTM::TagError, "Tag response must be Array or String, got #{raw_tags.class}"
  end
end

.reset_circuit_breaker!void

This method returns an undefined value.

Reset the circuit breaker (useful for testing)



54
55
56
57
58
# File 'lib/htm/tag_service.rb', line 54

def reset_circuit_breaker!
  @circuit_breaker_mutex.synchronize do
    @circuit_breaker&.reset!
  end
end

.singularize_level(level) ⇒ String

Singularize a single tag level with safety checks

Parameters:

  • level (String)

    Single tag level

Returns:

  • (String)

    Singularized level or original if skipped



232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
# File 'lib/htm/tag_service.rb', line 232

def self.singularize_level(level)
  # Skip if in the skip list
  return level if SINGULARIZE_SKIP_LIST.include?(level.downcase)

  # Skip words ending in -ics (usually singular: analytics, robotics, etc.)
  return level if level.end_with?('ics')

  # Skip words ending in -ous (adjectives: victorious, precious, etc.)
  return level if level.end_with?('ous')

  # Skip words ending in -ss (class, access, etc.)
  return level if level.end_with?('ss')

  # Skip single-letter or very short words
  return level if level.length <= 2

  # Only singularize if it looks like a regular plural
  # (ends in s but not ss, ics, ous)
  unless level.end_with?('s')
    return level
  end

  singular = level.singularize

  # Sanity check: if singularize made it weird, keep original
  # (e.g., "pages" -> "page" is fine, but "bus" -> "bu" is not)
  if singular.length < level.length - 2
    return level
  end

  if singular != level
    HTM.logger.debug "TagService: Normalized '#{level}' to '#{singular}'"
  end

  singular
end

.singularize_tag_levels(tag) ⇒ String

Normalize tag levels to singular form

Converts plural levels to singular using ActiveSupport’s singularize. This ensures taxonomy consistency (e.g., “users” -> “user”).

Skips:

  • Proper nouns and technical terms (Rails, MkDocs, etc.)

  • Words ending in -ics (analytics, robotics, etc.)

  • Words that don’t end in common plural patterns

Parameters:

  • tag (String)

    Tag with potentially plural levels

Returns:

  • (String)

    Tag with all levels singularized



216
217
218
219
220
221
222
223
224
225
# File 'lib/htm/tag_service.rb', line 216

def self.singularize_tag_levels(tag)
  levels = tag.split(':')
  singularized = levels.map do |level|
    singularize_level(level)
  end
  singularized.join(':')
rescue NoMethodError
  # singularize not available (ActiveSupport not loaded)
  tag
end

.valid_tag?(tag) ⇒ Boolean

Validate single tag format

Parameters:

  • tag (String)

    Tag to validate

Returns:

  • (Boolean)

    True if valid



156
157
158
159
160
161
162
163
164
165
166
167
168
# File 'lib/htm/tag_service.rb', line 156

def self.valid_tag?(tag)
  return false unless tag.is_a?(String)
  return false if tag.empty?
  return false unless tag.match?(TAG_FORMAT)
  return false if tag.count(':') >= max_depth

  # Ontological validation
  levels = tag.split(':')
  return false if levels.size > 1 && levels.first == levels.last  # Self-containment
  return false if levels.size != levels.uniq.size  # Duplicate segments

  true
end

.validate_and_filter_tags(tags) ⇒ Array<String>

Validate and filter tags

Parameters:

  • tags (Array<String>)

    Parsed tags

Returns:

  • (Array<String>)

    Valid tags only



109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
# File 'lib/htm/tag_service.rb', line 109

def self.validate_and_filter_tags(tags)
  valid_tags = []

  tags.each do |tag|
    # Normalize: convert plural levels to singular
    tag = singularize_tag_levels(tag)
    # Check format
    unless tag.match?(TAG_FORMAT)
      HTM.logger.warn "TagService: Invalid tag format, skipping: #{tag}"
      next
    end

    # Check depth
    depth = tag.count(':')
    max_tag_depth = max_depth
    if depth >= max_tag_depth
      HTM.logger.warn "TagService: Tag depth #{depth + 1} exceeds max #{max_tag_depth}, skipping: #{tag}"
      next
    end

    # Parse hierarchy for ontological validation
    levels = tag.split(':')

    # Check for self-containment (root == leaf creates circular reference)
    if levels.size > 1 && levels.first == levels.last
      HTM.logger.warn "TagService: Self-containment detected (root == leaf), skipping: #{tag}"
      next
    end

    # Check for duplicate segments in path (indicates circular/redundant hierarchy)
    if levels.size != levels.uniq.size
      HTM.logger.warn "TagService: Duplicate segment in hierarchy, skipping: #{tag}"
      next
    end

    # Tag is valid
    valid_tags << tag
  end

  valid_tags.uniq
end