Class: Woods::MCP::IndexReader

Inherits:
Object
Defined in:
lib/woods/mcp/index_reader.rb

Overview

Reads extraction output from disk for the MCP server.

Lazy-loads unit JSON files on demand, keeping at most MAX_UNIT_CACHE of them in an LRU-ish in-memory cache. Builds an identifier index from the _index.json files for fast lookups.

Examples:

reader = IndexReader.new("/path/to/woods")
reader.find_unit("Post")      # => Hash (full unit data)
reader.list_units(type: "model") # => Array<Hash>

Constant Summary

TYPE_DIRS =

Directories that correspond to extractor types in the output. Must stay in sync with Extractor::EXTRACTORS keys.

%w[
  models controllers graphql components view_components
  services jobs mailers serializers managers policies validators
  concerns routes middleware i18n pundit_policies configurations
  engines view_templates migrations action_cable_channels
  scheduled_jobs rake_tasks state_machines events decorators
  database_views caching factories test_mappings rails_source
  poros libs
].freeze
DIR_TO_TYPE =

Singular type name for each directory (used in search filtering). Derived from TYPE_DIRS via ActiveSupport singularize — no manual sync needed.

TYPE_DIRS.to_h { |dir| [dir, dir.singularize] }.freeze
TYPE_TO_DIR =
DIR_TO_TYPE.invert.freeze
MAX_UNIT_CACHE =

Maximum number of loaded unit files to cache in memory.

50
DEFAULT_SEARCH_MAX_SCAN =

Default maximum number of unit files to load during phase-2 search. Override with WOODS_SEARCH_MAX_SCAN env var.

500

Constructor Details

#initialize(index_dir) ⇒ IndexReader

Returns a new instance of IndexReader.

Parameters:

  • index_dir (String)

    Path to extraction output directory

Raises:

  • (ArgumentError)

    if directory doesn’t exist or has no manifest.json



# File 'lib/woods/mcp/index_reader.rb', line 46

def initialize(index_dir)
  @index_dir = Pathname.new(index_dir)
  raise ArgumentError, "Index directory does not exist: #{index_dir}" unless @index_dir.directory?
  raise ArgumentError, "No manifest.json found in: #{index_dir}" unless @index_dir.join('manifest.json').file?

  @unit_cache = {}
  @unit_cache_order = []
  @identifier_map = nil
end
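
The constructor sets up @unit_cache and @unit_cache_order, but the private load_unit method that uses them is not shown on this page. The following is a minimal sketch of how a capped, LRU-ish cache could work with such a hash-plus-order-array pair. The class name CappedCache and its methods are hypothetical, and the eviction policy (drop the least recently used key once the cap is exceeded) is an assumption rather than the confirmed behaviour of IndexReader.

```ruby
# Hypothetical stand-in for the cache behind load_unit (not shown in the source).
class CappedCache
  def initialize(max_size)
    @max_size = max_size
    @store = {}   # key => cached value
    @order = []   # keys, least recently used first
  end

  # Return the cached value for key, computing it with the block on a miss.
  def fetch(key)
    if @store.key?(key)
      # Cache hit: move the key to the "most recently used" end.
      @order.delete(key)
      @order << key
      return @store[key]
    end

    value = yield
    @store[key] = value
    @order << key
    evict_oldest while @order.size > @max_size
    value
  end

  # Keys currently cached, least recently used first.
  def keys
    @order.dup
  end

  private

  def evict_oldest
    oldest = @order.shift
    @store.delete(oldest)
  end
end

cache = CappedCache.new(2)
cache.fetch('models/Post.json') { { 'identifier' => 'Post' } }
cache.fetch('models/User.json') { { 'identifier' => 'User' } }
cache.fetch('models/Post.json') { raise 'not called on a hit' }
cache.fetch('models/Tag.json')  { { 'identifier' => 'Tag' } }
cache.keys # => ["models/Post.json", "models/Tag.json"] (User was least recently used)
```

With MAX_UNIT_CACHE at 50, a structure like this keeps memory bounded while repeated lookups of the same units stay cheap.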

Instance Method Details

#dependency_graph ⇒ Woods::DependencyGraph

Returns Graph loaded from disk.

Returns:

  • (Woods::DependencyGraph)

    Graph loaded from disk


# File 'lib/woods/mcp/index_reader.rb', line 125

def dependency_graph
  @dependency_graph ||= begin
    data = parse_json('dependency_graph.json')
    Woods::DependencyGraph.from_h(data)
  end
end

#find_unit(identifier) ⇒ Hash?

Find a single unit by identifier.

Parameters:

  • identifier (String)

    Unit identifier (e.g. “Post”, “Api::V1::HealthController”)

Returns:

  • (Hash, nil)

    Full unit data or nil if not found



# File 'lib/woods/mcp/index_reader.rb', line 141

def find_unit(identifier)
  location = identifier_map[identifier]
  return nil unless location

  load_unit(location[:type_dir], location[:filename])
end
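
find_unit depends on the private identifier_map, which is not shown on this page. As a rough sketch of how such a map could be built from the _index.json files, the example below assumes that each type directory holds an _index.json array whose entries carry an "identifier" key, and that unit filenames are derived from the identifier. The filename_for scheme and the build_identifier_map helper are hypothetical.

```ruby
require 'json'
require 'pathname'
require 'tmpdir'

# Hypothetical filename scheme: "Api::V1::HealthController" -> "Api__V1__HealthController.json".
def filename_for(identifier)
  "#{identifier.gsub('::', '__')}.json"
end

# Build { identifier => { type_dir:, filename: } } from every <type_dir>/_index.json.
# Directories without an index file are skipped.
def build_identifier_map(index_dir, type_dirs)
  type_dirs.each_with_object({}) do |dir, map|
    index_path = Pathname.new(index_dir).join(dir, '_index.json')
    next unless index_path.file?

    JSON.parse(index_path.read).each do |entry|
      id = entry['identifier']
      map[id] = { type_dir: dir, filename: filename_for(id) }
    end
  end
end

# Demo against a throwaway directory that contains one model index.
IDENTIFIER_MAP_DEMO = Dir.mktmpdir do |root|
  models = Pathname.new(root).join('models')
  models.mkpath
  models.join('_index.json').write(JSON.generate([{ 'identifier' => 'Post' }]))
  build_identifier_map(root, %w[models controllers])
end
```

A lookup then becomes a single hash access, which is why find_unit can return nil immediately for unknown identifiers without touching the disk.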

#framework_sources(keyword, limit: 20) ⇒ Array<Hash>

Search rails_source units by concept keyword.

Matches the keyword (case-insensitive) against identifier, source_code, and metadata fields of rails_source type units.

Parameters:

  • keyword (String)

    Concept keyword to match (e.g. “ActiveRecord”, “routing”, “persistence”)

  • limit (Integer) (defaults to: 20)

    Maximum results to return

Returns:

  • (Array<Hash>)

    Matching rails_source unit summaries



# File 'lib/woods/mcp/index_reader.rb', line 332

def framework_sources(keyword, limit: 20)
  # Multi-word keywords ("ActiveRecord callbacks") are split on
  # whitespace and ANDed. Single-word queries behave as before.
  tokens = keyword.to_s.strip.split(/\s+/)
  return [] if tokens.empty?

  patterns = tokens.map { |t| Regexp.new(Regexp.escape(t), Regexp::IGNORECASE) }
  results = []

  entries = read_index('rails_source')
  entries.each do |entry|
    break if results.size >= limit

    id = entry['identifier']
    unit = find_unit(id)
    next unless unit

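    # Serialize metadata once so each pattern can be tested against it.
    metadata_json = unit['metadata']&.to_json
    matched = patterns.all? do |pat|
      pat.match?(id) ||
        (unit['source_code'] && pat.match?(unit['source_code'])) ||
        (metadata_json && pat.match?(metadata_json))
    end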

    next unless matched

    results << {
      identifier: id,
      type: 'rails_source',
      file_path: unit['file_path'],
      metadata: unit['metadata']
    }
  end

  results
end
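
The token handling in framework_sources can be tried in isolation. The sketch below mirrors the split, escape, and all? steps from the method above; the helper names keyword_patterns and all_tokens_match? are illustrative and not part of IndexReader.

```ruby
# Split a keyword on whitespace and build one case-insensitive literal pattern per token.
def keyword_patterns(keyword)
  keyword.to_s.strip.split(/\s+/).map do |token|
    Regexp.new(Regexp.escape(token), Regexp::IGNORECASE)
  end
end

# True when every token appears in at least one of the given texts.
# nil texts (e.g. a unit without source_code) are ignored.
def all_tokens_match?(keyword, *texts)
  patterns = keyword_patterns(keyword)
  return false if patterns.empty?

  haystacks = texts.compact
  patterns.all? { |pat| haystacks.any? { |text| pat.match?(text) } }
end

all_tokens_match?('activerecord callbacks', 'ActiveRecord::Callbacks', nil) # => true
all_tokens_match?('activerecord routing', 'ActiveRecord::Callbacks', nil)   # => false
all_tokens_match?('save(', 'def save(*args)')                               # => true
```

Because each token is passed through Regexp.escape, characters such as ( are matched literally. Tokens are ANDed, but each token may be satisfied by a different field.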

#graph_analysis ⇒ Hash

Returns Parsed graph_analysis.json.

Returns:

  • (Hash)

    Parsed graph_analysis.json



# File 'lib/woods/mcp/index_reader.rb', line 133

def graph_analysis
  @graph_analysis ||= parse_json('graph_analysis.json')
end

#list_units(type: nil) ⇒ Array<Hash>

List units, optionally filtered by type.

Parameters:

  • type (String, nil) (defaults to: nil)

    Singular type name (e.g. “model”, “controller”)

Returns:

  • (Array<Hash>)

    Index entries for matching units



# File 'lib/woods/mcp/index_reader.rb', line 152

def list_units(type: nil)
  dirs = if type
           dir = TYPE_TO_DIR[type]
           dir ? [dir] : []
         else
           TYPE_DIRS
         end

  dirs.flat_map { |dir| read_index(dir) }
end

#manifest ⇒ Hash

Returns Parsed manifest.json.

Returns:

  • (Hash)

    Parsed manifest.json



# File 'lib/woods/mcp/index_reader.rb', line 101

def manifest
  @manifest ||= parse_json('manifest.json')
end

#raw_graph_data ⇒ Hash

Returns Raw dependency graph data from JSON.

Returns:

  • (Hash)

    Raw dependency graph data from JSON



# File 'lib/woods/mcp/index_reader.rb', line 413

def raw_graph_data
  @raw_graph_data ||= parse_json('dependency_graph.json')
end

#recent_changes(limit: 10, types: nil) ⇒ Array<Hash>

Return units sorted by most recent git modification.

Reads all units that have metadata.git.last_modified and returns them sorted descending by that timestamp.

Parameters:

  • limit (Integer) (defaults to: 10)

    Maximum results to return

  • types (Array<String>, nil) (defaults to: nil)

    Filter to these singular type names

Returns:

  • (Array<Hash>)

    Units sorted by last_modified descending



# File 'lib/woods/mcp/index_reader.rb', line 377

def recent_changes(limit: 10, types: nil)
  dirs = if types
           types.filter_map { |t| TYPE_TO_DIR[t] }
         else
           TYPE_DIRS
         end

  units_with_dates = []

  dirs.each do |dir|
    entries = read_index(dir)
    entries.each do |entry|
      id = entry['identifier']
      unit = find_unit(id)
      next unless unit

      last_modified = unit.dig('metadata', 'git', 'last_modified')
      next unless last_modified

      units_with_dates << {
        identifier: id,
        type: DIR_TO_TYPE[dir],
        file_path: unit['file_path'],
        last_modified: last_modified,
        author: unit.dig('metadata', 'git', 'last_author')
      }
    end
  end

  units_with_dates
    .sort_by { |u| u[:last_modified] }
    .reverse
    .first(limit)
end
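
The final sort compares last_modified values exactly as they are stored. The sketch below assumes they are ISO 8601 strings in one consistent format and offset (the page does not state the format), in which case string order equals chronological order. SAMPLE_UNITS and both helper names are made up for illustration; the second helper shows Enumerable#max_by(n) as an equivalent way to take the newest entries.

```ruby
# Illustrative data only; real entries come from unit metadata.
SAMPLE_UNITS = [
  { identifier: 'Post',    last_modified: '2024-03-01T10:00:00Z' },
  { identifier: 'Comment', last_modified: '2024-05-12T08:30:00Z' },
  { identifier: 'User',    last_modified: '2023-11-20T17:45:00Z' }
].freeze

# Same shape as the tail of recent_changes: sort ascending, reverse, take the first N.
def most_recent(units, limit)
  units.sort_by { |u| u[:last_modified] }.reverse.first(limit)
end

# Alternative: max_by(n) returns the n largest elements in descending order.
def most_recent_via_max_by(units, limit)
  units.max_by(limit) { |u| u[:last_modified] }
end

most_recent(SAMPLE_UNITS, 2).map { |u| u[:identifier] }            # => ["Comment", "Post"]
most_recent_via_max_by(SAMPLE_UNITS, 2).map { |u| u[:identifier] } # => ["Comment", "Post"]
```

If timestamps were stored with mixed offsets or formats, parsing them with Time.iso8601 before comparing would be the safer choice.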

#reload! ⇒ void

This method returns an undefined value.

Clear all cached state so the next access re-reads from disk.



# File 'lib/woods/mcp/index_reader.rb', line 87

def reload!
  @unit_cache = {}
  @unit_cache_order = []
  @identifier_map = nil
  @index_cache = {}
  @manifest = nil
  @summary = nil
  @dependency_graph = nil
  @graph_analysis = nil
  @raw_graph_data = nil
  @normalized_graph_edges = nil
end

#search(query = nil, types: nil, fields: %w[identifier], limit: 20, exact_prefix: nil, exact_suffix: nil) ⇒ Hash

Search units by case-insensitive pattern.

Phase 1: match identifiers from index files (cheap). Phase 2: lazy-load unit files for metadata/source_code matching.

The query is compiled as a raw Ruby regex with IGNORECASE. If the pattern is invalid, it falls back to an escaped literal match.
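
The private compile_search_pattern helper is not shown on this page. A minimal sketch of the fallback described above, assuming it simply rescues RegexpError and retries with the escaped query, could look like this:

```ruby
# Hypothetical version of the private compile_search_pattern helper.
def compile_search_pattern(query)
  Regexp.new(query, Regexp::IGNORECASE)
rescue RegexpError
  # Invalid regex syntax: match the query as literal text instead.
  Regexp.new(Regexp.escape(query), Regexp::IGNORECASE)
end

# Valid regex: anchors and wildcards behave as usual.
compile_search_pattern('^Api::.*Controller$').match?('Api::V1::HealthController') # => true

# "User(" has an unmatched parenthesis, so it falls back to a literal match.
compile_search_pattern('User(').match?('def find_user(id)') # => true
```

The fallback means a malformed pattern never raises to the caller; it quietly becomes a substring search.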

A “broad” pattern is one that matches more than 50% of the entries in a type directory. Broad patterns still run but the result includes a :note.

Phase-2 scan is capped at WOODS_SEARCH_MAX_SCAN unit files (default 500). When the cap is reached the result includes :partial => true.

The optional exact_prefix / exact_suffix filters restrict results to identifiers whose start/end matches the given string literally (case-insensitive). They are ANDed with the query regex and are safer than hand-escaping regex anchors, because metacharacters like :: are treated as literal text.
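
The private identifier_passes_prefix_suffix? helper is not shown on this page. One plausible shape for it, assuming the prefix and suffix arrive already downcased (as search does before calling it), is sketched here:

```ruby
# Hypothetical version of the private identifier_passes_prefix_suffix? helper.
# prefix and suffix are expected to be downcased already, or nil to skip the check.
def identifier_passes_prefix_suffix?(identifier, prefix, suffix)
  id = identifier.downcase
  return false if prefix && !id.start_with?(prefix)
  return false if suffix && !id.end_with?(suffix)

  true
end

identifier_passes_prefix_suffix?('Api::V1::HealthController', 'api::v1::', nil)       # => true
identifier_passes_prefix_suffix?('Api::V1::HealthController', nil, 'controller')      # => true
identifier_passes_prefix_suffix?('Admin::Api::V1::UsersController', 'api::v1::', nil) # => false
```

Plain string comparison sidesteps regex anchoring entirely, so no escaping is needed for the filter text.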

Parameters:

  • query (String, nil) (defaults to: nil)

    Search pattern (case-insensitive regex). Optional when exact_prefix or exact_suffix is provided; otherwise required.

  • types (Array<String>, nil) (defaults to: nil)

    Filter to these singular type names

  • fields (Array<String>) (defaults to: %w[identifier])

    Fields to search: “identifier”, “metadata”, “source_code”

  • limit (Integer) (defaults to: 20)

    Maximum results to return

  • exact_prefix (String, nil) (defaults to: nil)

    Literal identifier prefix filter (case-insensitive)

  • exact_suffix (String, nil) (defaults to: nil)

    Literal identifier suffix filter (case-insensitive)

Returns:

  • (Hash)

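    { results: Array<Hash>, note: String, partial: true }, where :note is present only when a broad pattern was detected and :partial only when the phase-2 scan cap was reached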

Raises:

  • (ArgumentError)

    when all of query, exact_prefix, and exact_suffix are blank



# File 'lib/woods/mcp/index_reader.rb', line 196

def search(query = nil, types: nil, fields: %w[identifier], limit: 20, exact_prefix: nil, exact_suffix: nil)
  prefix = exact_prefix.blank? ? nil : exact_prefix.downcase
  suffix = exact_suffix.blank? ? nil : exact_suffix.downcase
  if query.blank? && !prefix && !suffix
    raise ArgumentError, 'search requires a query or exact_prefix/exact_suffix filter'
  end

  # When only prefix/suffix are provided, the regex acts as a match-all
  # wildcard so the existing phase-1/phase-2 pipeline still works.
  pattern = compile_search_pattern(query.to_s.empty? ? '.*' : query)
  max_scan_env = ENV.fetch('WOODS_SEARCH_MAX_SCAN', '').to_s.strip
  max_scan = max_scan_env.empty? ? DEFAULT_SEARCH_MAX_SCAN : max_scan_env.to_i
  max_scan = DEFAULT_SEARCH_MAX_SCAN if max_scan <= 0

  results = []
  notes = []
  phase2_scanned = 0
  partial = false

  dirs = if types
           types.filter_map { |t| TYPE_TO_DIR[t] }
         else
           TYPE_DIRS
         end

  # Phase 2 candidates are collected per-dir and then scanned in
  # round-robin across dirs. Exhausting the per-run scan cap linearly
  # down TYPE_DIRS order would starve later types (`concerns` at pos
  # 13, `test_mappings` at pos 31) on any codebase where the earlier
  # dirs together exceed max_scan entries. Interleaving guarantees
  # every type contributes to the scanned set.
  phase2_queues = {}

  dirs.each do |dir|
    type_name = DIR_TO_TYPE[dir]
    entries = read_index(dir)

    # Broad-match detection: warn when pattern matches >50% of dir entries
    if entries.size > 1
      matching_count = entries.count do |e|
        identifier_passes_filters?(e['identifier'], pattern, prefix, suffix)
      end
      if matching_count > entries.size / 2.0
        notes << "broad pattern matched #{matching_count}/#{entries.size} entries in #{dir}"
      end
    end

    entries.each do |entry|
      id = entry['identifier']
      next unless identifier_passes_prefix_suffix?(id, prefix, suffix)

      # Phase 1: identifier matching (still in-order per dir)
      if fields.include?('identifier') && pattern.match?(id)
        next if results.size >= limit

        results << { identifier: id, type: type_name, match_field: 'identifier' }
        next
      end

      # Phase 2 is only reached when the caller opted into deeper fields.
      next unless fields.include?('metadata') || fields.include?('source_code')

      (phase2_queues[dir] ||= []) << [type_name, id]
    end
  end

  if results.size < limit && phase2_queues.any?
    queues = phase2_queues.values.map(&:dup)
    catch(:phase2_done) do
      loop do
        progressed = false
        queues.each do |queue|
          next if queue.empty?

          throw :phase2_done if results.size >= limit

          if phase2_scanned >= max_scan
            partial = true
            throw :phase2_done
          end

          type_name, id = queue.shift
          progressed = true

          unit = find_unit(id)
          next unless unit

          phase2_scanned += 1

          if fields.include?('source_code') && unit['source_code'] && pattern.match?(unit['source_code'])
            results << { identifier: id, type: type_name, match_field: 'source_code' }
          elsif fields.include?('metadata') && unit['metadata'] && pattern.match?(unit['metadata'].to_json)
            results << { identifier: id, type: type_name, match_field: 'metadata' }
          end
        end
        break unless progressed
      end
    end
  end

  response = { results: results.first(limit) }
  response[:note] = notes.join('; ') unless notes.empty?
  response[:partial] = true if partial
  response
end
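
The round-robin scheduling used for phase 2 can be isolated into a small, self-contained sketch. The round_robin_scan helper below is hypothetical and simplified: it counts every dequeued item against the cap, whereas search only counts units that were actually loaded.

```ruby
# Take one item from each queue per pass until all queues are empty or max_scan is hit.
# Returns [scanned_items, partial]; partial is true when the cap cut the scan short.
def round_robin_scan(queues_by_dir, max_scan)
  queues = queues_by_dir.values.map(&:dup) # leave the caller's arrays untouched
  scanned = []

  loop do
    progressed = false
    queues.each do |queue|
      next if queue.empty?
      return [scanned, true] if scanned.size >= max_scan

      scanned << queue.shift
      progressed = true
    end
    break unless progressed
  end

  [scanned, false]
end

queues = {
  'models'   => %w[Post User Comment Tag],
  'concerns' => %w[Sluggable],
  'jobs'     => %w[CleanupJob DigestJob]
}

round_robin_scan(queues, 5)
# => [["Post", "Sluggable", "CleanupJob", "User", "DigestJob"], true]
```

Even with a cap of 5 and four models queued first, concerns and jobs both contribute to the scanned set. A linear walk would have spent four of the five slots on models.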

#summary ⇒ String?

Returns SUMMARY.md content, or nil if not present.

Returns:

  • (String, nil)

    SUMMARY.md content, or nil if not present



# File 'lib/woods/mcp/index_reader.rb', line 117

def summary
  @summary ||= begin
    path = @index_dir.join('SUMMARY.md')
    path.file? ? path.read : nil
  end
end

#template_engines ⇒ Array<Symbol>

Template engines the extraction pipeline currently understands. Delegates to ViewTemplateExtractor.supported_template_engines so the list stays honest as engines are added or removed. Surfaced by the MCP `structure` tool (#86).

Returns:

  • (Array<Symbol>)


# File 'lib/woods/mcp/index_reader.rb', line 111

def template_engines
  require_relative '../extractors/view_template_extractor'
  Woods::Extractors::ViewTemplateExtractor.supported_template_engines.dup
end

#traverse_dependencies(identifier, depth: 2, types: nil, via: nil) ⇒ Hash

BFS traversal of forward dependencies.

Parameters:

  • identifier (String)

    Starting unit identifier

  • depth (Integer) (defaults to: 2)

    Maximum traversal depth

  • types (Array<String>, nil) (defaults to: nil)

    Filter to these singular type names

  • via (Array<String>, nil) (defaults to: nil)

    Filter to these relationship types (e.g. [“link_to”, “redirect_to”])

Returns:

  • (Hash)

    { root:, nodes: { id => { type:, depth:, deps: [] } } }



# File 'lib/woods/mcp/index_reader.rb', line 309

def traverse_dependencies(identifier, depth: 2, types: nil, via: nil)
  traverse(identifier, depth: depth, types: types, via: via, direction: :forward)
end

#traverse_dependents(identifier, depth: 2, types: nil, via: nil) ⇒ Hash

BFS traversal of reverse dependencies (dependents).

Parameters:

  • identifier (String)

    Starting unit identifier

  • depth (Integer) (defaults to: 2)

    Maximum traversal depth

  • types (Array<String>, nil) (defaults to: nil)

    Filter to these singular type names

  • via (Array<String>, nil) (defaults to: nil)

    Filter to these relationship types (e.g. [“link_to”, “redirect_to”])

Returns:

  • (Hash)

    { root:, nodes: { id => { type:, depth:, deps: [] } } }



# File 'lib/woods/mcp/index_reader.rb', line 320

def traverse_dependents(identifier, depth: 2, types: nil, via: nil)
  traverse(identifier, depth: depth, types: types, via: via, direction: :reverse)
end
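
Both traversal methods delegate to a private traverse method that is not shown on this page. To illustrate the general idea, here is a depth-limited BFS over a plain adjacency hash. The bfs_traverse and reverse_graph helpers and the DEMO_GRAPH data are hypothetical; the sketch ignores the types and via filters and omits the per-node type field.

```ruby
# Depth-limited BFS over a plain adjacency hash ({ id => [dependency ids] }).
def bfs_traverse(graph, root, depth: 2)
  nodes = {}
  queue = [[root, 0]]
  visited = { root => true }

  until queue.empty?
    id, level = queue.shift
    deps = graph.fetch(id, [])
    nodes[id] = { depth: level, deps: deps }
    next if level >= depth # record the node but do not expand past the limit

    deps.each do |dep|
      next if visited[dep]

      visited[dep] = true
      queue << [dep, level + 1]
    end
  end

  { root: root, nodes: nodes }
end

# Reverse the edges so the same BFS walks dependents instead of dependencies.
def reverse_graph(graph)
  graph.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(id, deps), reversed|
    deps.each { |dep| reversed[dep] << id }
  end
end

DEMO_GRAPH = {
  'PostsController' => %w[Post PostPolicy],
  'Post'            => %w[User],
  'User'            => %w[Account],
  'PostPolicy'      => []
}.freeze

bfs_traverse(DEMO_GRAPH, 'PostsController', depth: 2)[:nodes].keys
# => ["PostsController", "Post", "PostPolicy", "User"]

bfs_traverse(reverse_graph(DEMO_GRAPH), 'User', depth: 2)[:nodes].keys
# => ["User", "Post", "PostsController"]
```

Nodes at the depth limit are reported but not expanded, which is why Account does not appear in the forward result.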

#warmup! ⇒ Hash

Pre-populate cached state so the first MCP tool call doesn’t pay for disk reads + JSON parsing.

Touches every lazy accessor: manifest, summary, dependency_graph, graph_analysis, and the identifier_map (which reads all _index.json files). Each step is individually rescued so a missing optional artefact (e.g. graph_analysis.json) never blocks the rest.

Safe to call multiple times — lazy accessors short-circuit on the memoized value.

Returns:

  • (Hash)

    Per-step outcome: { step => true | Exception }



# File 'lib/woods/mcp/index_reader.rb', line 68

def warmup!
  steps = {
    manifest: -> { manifest },
    summary: -> { summary },
    dependency_graph: -> { dependency_graph },
    graph_analysis: -> { graph_analysis },
    identifier_map: -> { identifier_map }
  }
  steps.each_with_object({}) do |(step, runner), result|
    runner.call
    result[step] = true
  rescue StandardError => e
    result[step] = e
  end
end
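
The per-step rescue pattern in warmup! can be exercised on its own. The sketch below uses a hypothetical run_steps helper with stand-in lambdas to show what the returned hash looks like when one step fails.

```ruby
# Run each named step, recording true on success or the raised exception on failure.
# A failing step never prevents later steps from running.
def run_steps(steps)
  steps.each_with_object({}) do |(name, runner), result|
    runner.call
    result[name] = true
  rescue StandardError => e
    result[name] = e
  end
end

# Stand-in steps; the middle one simulates a missing optional artefact.
demo_steps = {
  manifest: -> { :loaded },
  graph_analysis: -> { raise Errno::ENOENT, 'graph_analysis.json' },
  summary: -> { nil }
}

outcome = run_steps(demo_steps)
outcome[:manifest]             # => true
outcome[:graph_analysis].class # => Errno::ENOENT
outcome[:summary]              # => true

# Collect only the failed steps, e.g. for logging.
failed = outcome.reject { |_step, status| status == true }
failed.keys # => [:graph_analysis]
```

A step that returns nil still counts as successful; only a raised StandardError is recorded as a failure.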