Class: Woods::MCP::IndexReader

Inherits:

Object

Defined in: lib/woods/mcp/index_reader.rb

Overview

Reads extraction output from disk for the MCP server.

Lazy-loads unit JSON files on demand and keeps at most MAX_UNIT_CACHE of them in memory, evicting in roughly least-recently-used order. Builds an identifier index from the per-directory _index.json files for fast lookups.

Examples:

reader = IndexReader.new("/path/to/woods")
reader.find_unit("Post")      # => Hash (full unit data)
reader.list_units(type: "model") # => Array<Hash>
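To make the “LRU-ish” cap concrete, here is a minimal sketch of a size-capped cache with least-recently-used eviction. It is an illustration under assumptions, not the gem's load_unit code: the BoundedUnitCache class, its fetch method, and the capacity of 2 are hypothetical. It relies on Ruby's insertion-ordered Hash: a hit is deleted and re-inserted to mark it as most recent, and Hash#shift drops the oldest entry.

```ruby
# Hypothetical sketch of a size-capped, least-recently-used cache.
# Not the gem's implementation; BoundedUnitCache is an illustrative name.
class BoundedUnitCache
  def initialize(capacity)
    @capacity = capacity
    @store = {} # Ruby hashes preserve insertion order
  end

  # Returns the cached value for key, or computes it with the block, caches it,
  # and evicts the least recently used entries while over capacity.
  def fetch(key)
    if @store.key?(key)
      value = @store.delete(key) # re-insert so the key becomes most recent
      @store[key] = value
      return value
    end

    value = yield(key)
    @store[key] = value
    @store.shift while @store.size > @capacity # drop the oldest entry
    value
  end

  def keys
    @store.keys
  end
end

cache = BoundedUnitCache.new(2)
cache.fetch('Post') { |id| { 'identifier' => id } }
cache.fetch('User') { |id| { 'identifier' => id } }
cache.fetch('Post') { raise 'expected a cache hit' } # hit: block not called
cache.fetch('Comment') { |id| { 'identifier' => id } } # evicts "User"
cache.keys # => ["Post", "Comment"]
```

A real cache might also track hits or expose a clear method for reload!; the sketch keeps only the eviction rule.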

Constant Summary

TYPE_DIRS: Directories that correspond to extractor types in the output. Must stay in sync with the Extractor::EXTRACTORS keys.

%w[
  models controllers graphql components view_components
  services jobs mailers serializers managers policies validators
  concerns routes middleware i18n pundit_policies configurations
  engines view_templates migrations action_cable_channels
  scheduled_jobs rake_tasks state_machines events decorators
  database_views caching factories test_mappings rails_source
  poros libs
].freeze

DIR_TO_TYPE: Singular type name for each directory (used in search filtering). Derived from TYPE_DIRS via ActiveSupport's String#singularize, so no manual sync is needed.

TYPE_DIRS.to_h { |dir| [dir, dir.singularize] }.freeze

TYPE_TO_DIR: Reverse lookup from singular type name to directory.

DIR_TO_TYPE.invert.freeze
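Because TYPE_TO_DIR is built with Hash#invert, the round trip is lossless only if every directory singularizes to a distinct type name; with duplicate values, invert silently keeps one key per value. The sketch below checks that invariant without loading ActiveSupport. The naive_singularize helper is a hypothetical stand-in for String#singularize that happens to cover the names in TYPE_DIRS; it is not a general inflector. The same size comparison could be run against the real constants.

```ruby
# Copy of the TYPE_DIRS list documented above.
TYPE_DIRS = %w[
  models controllers graphql components view_components
  services jobs mailers serializers managers policies validators
  concerns routes middleware i18n pundit_policies configurations
  engines view_templates migrations action_cable_channels
  scheduled_jobs rake_tasks state_machines events decorators
  database_views caching factories test_mappings rails_source
  poros libs
].freeze

# Hypothetical stand-in for ActiveSupport's String#singularize. It only handles
# a trailing "ies" or "s", which happens to cover every name in TYPE_DIRS.
def naive_singularize(word)
  if word.end_with?('ies')
    "#{word.delete_suffix('ies')}y"
  elsif word.end_with?('s')
    word.delete_suffix('s')
  else
    word
  end
end

DIR_TO_TYPE = TYPE_DIRS.to_h { |dir| [dir, naive_singularize(dir)] }.freeze
TYPE_TO_DIR = DIR_TO_TYPE.invert.freeze

# Hash#invert keeps only one key per duplicate value, so a size mismatch
# would mean two directories collapsed onto the same singular type name.
raise 'duplicate singular type names' unless TYPE_TO_DIR.size == DIR_TO_TYPE.size

TYPE_TO_DIR.fetch('pundit_policy') # => "pundit_policies"
DIR_TO_TYPE.fetch('i18n')          # => "i18n"
```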

MAX_UNIT_CACHE: Maximum number of loaded unit files to keep in memory.

DEFAULT_SEARCH_MAX_SCAN: Default maximum number of unit files loaded during a phase-2 search (500, per #search). Override it with the WOODS_SEARCH_MAX_SCAN environment variable.

Instance Method Summary

#dependency_graph ⇒ Woods::DependencyGraph

Graph loaded from disk.
#find_unit(identifier) ⇒ Hash?

Find a single unit by identifier.
#framework_sources(keyword, limit: 20) ⇒ Array<Hash>

Search rails_source units by concept keyword.
#graph_analysis ⇒ Hash

Parsed graph_analysis.json.
#initialize(index_dir) ⇒ IndexReader constructor

A new instance of IndexReader.
#list_units(type: nil) ⇒ Array<Hash>

List units, optionally filtered by type.
#manifest ⇒ Hash

Parsed manifest.json.
#raw_graph_data ⇒ Hash

Raw dependency graph data from JSON.
#recent_changes(limit: 10, types: nil) ⇒ Array<Hash>

Return units sorted by most recent git modification.
#reload! ⇒ void

Clear all cached state so the next access re-reads from disk.
#search(query = nil, types: nil, fields: %w[identifier], limit: 20, exact_prefix: nil, exact_suffix: nil) ⇒ Hash

Search units by case-insensitive pattern.
#summary ⇒ String?

SUMMARY.md content, or nil if not present.
#template_engines ⇒ Array<Symbol>

Template engines the extraction pipeline currently understands.
#traverse_dependencies(identifier, depth: 2, types: nil, via: nil) ⇒ Hash

BFS traversal of forward dependencies.
#traverse_dependents(identifier, depth: 2, types: nil, via: nil) ⇒ Hash

BFS traversal of reverse dependencies (dependents).
#warmup! ⇒ Hash

Pre-populate cached state so the first MCP tool call doesn’t pay for disk reads and JSON parsing.

Constructor Details

#initialize(index_dir) ⇒ `IndexReader`

Returns a new instance of IndexReader.

Parameters:

index_dir (String) —

Path to extraction output directory

Raises:

(ArgumentError) —

if the directory doesn’t exist or has no manifest.json

# File 'lib/woods/mcp/index_reader.rb', line 46

def initialize(index_dir)
  @index_dir = Pathname.new(index_dir)
  raise ArgumentError, "Index directory does not exist: #{index_dir}" unless @index_dir.directory?
  raise ArgumentError, "No manifest.json found in: #{index_dir}" unless @index_dir.join('manifest.json').file?

  @unit_cache = {}
  @unit_cache_order = []
  @identifier_map = nil
end

Instance Method Details

#dependency_graph ⇒ `Woods::DependencyGraph`

Returns the dependency graph, built from dependency_graph.json on first access and memoized.

Returns:

(Woods::DependencyGraph) —

Graph loaded from disk

# File 'lib/woods/mcp/index_reader.rb', line 125

def dependency_graph
  @dependency_graph ||= begin
    data = parse_json('dependency_graph.json')
    Woods::DependencyGraph.from_h(data)
  end
end

#find_unit(identifier) ⇒ `Hash`?

Find a single unit by identifier.

Parameters:

identifier (String) —

Unit identifier (e.g. “Post”, “Api::V1::HealthController”)

Returns:

(Hash, nil) —

Full unit data or nil if not found

# File 'lib/woods/mcp/index_reader.rb', line 141

def find_unit(identifier)
  location = identifier_map[identifier]
  return nil unless location

  load_unit(location[:type_dir], location[:filename])
end
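find_unit is a hash lookup in identifier_map followed by a lazy file load. The private code that builds identifier_map is not shown on this page, so the sketch below is a hypothetical reconstruction under stated assumptions: each type directory holds an _index.json that is a JSON array of entries with "identifier" and "filename" keys, and build_identifier_map is an illustrative name. It produces the same { type_dir:, filename: } location shape that find_unit reads.

```ruby
require 'fileutils'
require 'json'
require 'pathname'
require 'tmpdir'

# Hypothetical helper: builds { identifier => { type_dir:, filename: } } from
# <index_dir>/<type_dir>/_index.json. Assumes each index file is a JSON array of
# entries with "identifier" and "filename" keys; the real schema may differ.
def build_identifier_map(index_dir, type_dirs)
  root = Pathname.new(index_dir)
  type_dirs.each_with_object({}) do |dir, map|
    index_path = root.join(dir, '_index.json')
    next unless index_path.file? # directories with no index are skipped

    JSON.parse(index_path.read).each do |entry|
      # A later directory wins if the same identifier appears twice.
      map[entry['identifier']] = { type_dir: dir, filename: entry['filename'] }
    end
  end
end

# Usage against a throwaway directory; Dir.mktmpdir returns the block's value.
location = Dir.mktmpdir do |tmp|
  FileUtils.mkdir_p(File.join(tmp, 'models'))
  File.write(
    File.join(tmp, 'models', '_index.json'),
    JSON.generate([{ 'identifier' => 'Post', 'filename' => 'Post.json' }])
  )
  build_identifier_map(tmp, %w[models controllers])['Post']
end

location[:type_dir] # => "models"
location[:filename] # => "Post.json"
```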

#framework_sources(keyword, limit: 20) ⇒ `Array<Hash>`

Search rails_source units by concept keyword.

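Matches the keyword case-insensitively against the identifier, source_code, and JSON-serialized metadata of each rails_source unit. A multi-word keyword is split on whitespace and every token must match, although different tokens may match different fields. Tokens are matched literally (regex metacharacters are escaped), and a blank keyword returns an empty array.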

Parameters:

keyword (String) —

Concept keyword to match (e.g. “ActiveRecord”, “routing”, “persistence”)
limit (Integer) (defaults to: 20) —

Maximum results to return

Returns:

(Array<Hash>) —

Matching rails_source unit summaries

# File 'lib/woods/mcp/index_reader.rb', line 332

def framework_sources(keyword, limit: 20)
  # Multi-word keywords ("ActiveRecord callbacks") are split on
  # whitespace and ANDed. Single-word queries behave as before.
  tokens = keyword.to_s.strip.split(/\s+/)
  return [] if tokens.empty?

  patterns = tokens.map { |t| Regexp.new(Regexp.escape(t), Regexp::IGNORECASE) }
  results = []

  entries = read_index('rails_source')
  entries.each do |entry|
    break if results.size >= limit

    id = entry['identifier']
    unit = find_unit(id)
    next unless unit

    metadata_json = unit['metadata']&.to_json
    matched = patterns.all? do |pat|
      pat.match?(id) ||
        (unit['source_code'] && pat.match?(unit['source_code'])) ||
        (metadata_json && pat.match?(metadata_json))
    end

    next unless matched

    results << {
      identifier: id,
      type: 'rails_source',
      file_path: unit['file_path'],
      metadata: unit['metadata']
    }
  end

  results
end
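The AND-of-tokens rule is easiest to see in isolation. The sketch below lifts the per-unit matching step out of the loop into a standalone predicate; keyword_matches? is a hypothetical name and the sample unit hash is made up, but the matching logic mirrors the method body above. In the first call, “activerecord” matches the identifier while “run_callbacks” matches only the source code.

```ruby
require 'json'

# Hypothetical helper mirroring the matching step in #framework_sources:
# each whitespace-separated token is escaped (so it matches literally),
# compared case-insensitively, and every token must match some field.
def keyword_matches?(keyword, identifier, unit)
  tokens = keyword.to_s.strip.split(/\s+/)
  return false if tokens.empty?

  patterns = tokens.map { |t| Regexp.new(Regexp.escape(t), Regexp::IGNORECASE) }
  metadata_json = unit['metadata']&.to_json
  patterns.all? do |pat|
    pat.match?(identifier) ||
      (unit['source_code'] && pat.match?(unit['source_code'])) ||
      (metadata_json && pat.match?(metadata_json))
  end
end

unit = {
  'source_code' => 'def run_callbacks(kind) ... end',
  'metadata' => { 'concern' => 'callbacks' }
}
keyword_matches?('activerecord run_callbacks', 'ActiveRecord::Callbacks', unit) # => true
keyword_matches?('ActiveRecord routing', 'ActiveRecord::Callbacks', unit)       # => false
```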

#graph_analysis ⇒ `Hash`

Returns the parsed graph_analysis.json, memoized after the first read.

Returns:

(Hash) —

Parsed graph_analysis.json

# File 'lib/woods/mcp/index_reader.rb', line 133

def graph_analysis
  @graph_analysis ||= parse_json('graph_analysis.json')
end

#list_units(type: nil) ⇒ `Array<Hash>`

List units, optionally filtered by type.

Parameters:

type (String, nil) (defaults to: nil) —

Singular type name (e.g. “model”, “controller”)

Returns:

(Array<Hash>) —

Index entries for matching units

# File 'lib/woods/mcp/index_reader.rb', line 152

def list_units(type: nil)
  dirs = if type
           dir = TYPE_TO_DIR[type]
           dir ? [dir] : []
         else
           TYPE_DIRS
         end

  dirs.flat_map { |dir| read_index(dir) }
end

#manifest ⇒ `Hash`

Returns the parsed manifest.json, memoized after the first read.

Returns:

(Hash) —

Parsed manifest.json

# File 'lib/woods/mcp/index_reader.rb', line 101

def manifest
  @manifest ||= parse_json('manifest.json')
end

#raw_graph_data ⇒ `Hash`

Returns the raw data parsed from dependency_graph.json. It is parsed and memoized separately from #dependency_graph.

Returns:

(Hash) —

Raw dependency graph data from JSON

# File 'lib/woods/mcp/index_reader.rb', line 413

def raw_graph_data
  @raw_graph_data ||= parse_json('dependency_graph.json')
end

#recent_changes(limit: 10, types: nil) ⇒ `Array<Hash>`

Return units sorted by most recent git modification.

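Loads every unit in the selected type directories, keeps those that have metadata.git.last_modified, and returns them sorted by that value in descending order. Because every unit file is read, this can be slow on large indexes. Unknown type names in types are ignored.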

Parameters:

limit (Integer) (defaults to: 10) —

Maximum results to return
types (Array<String>, nil) (defaults to: nil) —

Filter to these singular type names

Returns:

(Array<Hash>) —

Units sorted by last_modified descending

# File 'lib/woods/mcp/index_reader.rb', line 377

def recent_changes(limit: 10, types: nil)
  dirs = if types
           types.filter_map { |t| TYPE_TO_DIR[t] }
         else
           TYPE_DIRS
         end

  units_with_dates = []

  dirs.each do |dir|
    entries = read_index(dir)
    entries.each do |entry|
      id = entry['identifier']
      unit = find_unit(id)
      next unless unit

      last_modified = unit.dig('metadata', 'git', 'last_modified')
      next unless last_modified

      units_with_dates << {
        identifier: id,
        type: DIR_TO_TYPE[dir],
        file_path: unit['file_path'],
        last_modified: last_modified,
        author: unit.dig('metadata', 'git', 'last_author')
      }
    end
  end

  units_with_dates
    .sort_by { |u| u[:last_modified] }
    .reverse
    .first(limit)
end
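The method compares last_modified values as-is. The timestamp format is not documented here; the sketch assumes ISO-8601 strings in one UTC offset, for which lexicographic order matches chronological order. It contrasts the sort-reverse-take approach above with Enumerable#max_by(n), which returns the n largest elements already in descending order. This is an alternative for comparison, not the gem's code, and tie ordering between the two can differ.

```ruby
# Sample rows shaped like #recent_changes results. The ISO-8601 UTC
# timestamps are an assumption; lexicographic order is chronological
# only when every value uses the same format and offset.
units = [
  { identifier: 'Post', last_modified: '2024-03-01T10:00:00Z' },
  { identifier: 'User', last_modified: '2024-05-12T08:30:00Z' },
  { identifier: 'Comment', last_modified: '2023-12-24T18:45:00Z' }
]

limit = 2

# Approach used by #recent_changes: full sort, reverse, take the first N.
sorted = units.sort_by { |u| u[:last_modified] }.reverse.first(limit)

# Alternative: max_by(n) yields the N largest, already in descending order.
top = units.max_by(limit) { |u| u[:last_modified] }

sorted.map { |u| u[:identifier] } # => ["User", "Post"]
top.map { |u| u[:identifier] }    # => ["User", "Post"]
```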

#reload! ⇒ `void`

This method returns an undefined value.

Clear all cached state so the next access re-reads from disk.

# File 'lib/woods/mcp/index_reader.rb', line 87

def reload!
  @unit_cache = {}
  @unit_cache_order = []
  @identifier_map = nil
  @index_cache = {}
  @manifest = nil
  @summary = nil
  @dependency_graph = nil
  @graph_analysis = nil
  @raw_graph_data = nil
  @normalized_graph_edges = nil
end

#search(query = nil, types: nil, fields: %w[identifier], limit: 20, exact_prefix: nil, exact_suffix: nil) ⇒ `Hash`

Search units by case-insensitive pattern.

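Phase 1 matches identifiers from the per-directory index files (cheap). Phase 2 lazy-loads the unit files for entries whose identifier did not already match and tests their metadata and/or source_code; it runs only when fields includes "metadata" or "source_code".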

The query is compiled as a raw Ruby regex with IGNORECASE. If the pattern is invalid, it falls back to an escaped literal match.

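A “broad” pattern is one whose identifier filters (the query regex plus any exact_prefix/exact_suffix) match more than 50% of the entries in a type directory; directories with fewer than two entries are never flagged. Broad patterns still run, but the result includes a :note naming the directory and the match count.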
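The broad-pattern rule is a strict majority test using float division, so exactly half of a directory is not broad. The sketch below isolates it; broad_note is a hypothetical helper, and for brevity it applies only the regex, leaving out the exact_prefix/exact_suffix filters that #search also applies.

```ruby
# Hypothetical sketch of the broad-pattern rule in #search: a pattern is
# "broad" for a directory when it matches more than half of its identifiers.
# Directories with fewer than two entries are never flagged.
def broad_note(dir, identifiers, pattern)
  return nil if identifiers.size <= 1

  matching = identifiers.count { |id| pattern.match?(id) }
  return nil unless matching > identifiers.size / 2.0

  "broad pattern matched #{matching}/#{identifiers.size} entries in #{dir}"
end

ids = %w[Post PostPolicy User Comment]
broad_note('models', ids, /post/i) # => nil (2/4 is exactly half, not more)
broad_note('models', ids, /o/i)    # => "broad pattern matched 3/4 entries in models"
```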

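The phase-2 scan is capped at WOODS_SEARCH_MAX_SCAN unit files (default 500; unset, blank, or non-positive values, including text that String#to_i parses as 0, fall back to the default). Candidates are scanned round-robin across type directories, so the directories that come first in TYPE_DIRS cannot exhaust the cap on their own. When the cap is reached while candidates remain and the limit is not yet filled, the result includes :partial => true.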

The optional exact_prefix / exact_suffix filters restrict results to identifiers that start or end with the given string, compared literally and case-insensitively. They are ANDed with the query regex and are safer than hand-escaping regex anchors, because characters that are special in a regex (such as . or ?) are treated as plain text. When no query is given, the filters alone select the results.
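The identifier_passes_prefix_suffix? helper used by #search is not shown on this page, so the sketch below is an assumption about its behavior rather than its source: downcase both sides, then compare with start_with?/end_with?, which never interpret regex syntax. literal_affix_match? is a hypothetical name. The last two lines show why a hand-written regex anchor needs escaping.

```ruby
# Hypothetical sketch of a literal prefix/suffix filter. Both sides are
# downcased, so the comparison is case-insensitive, and start_with?/end_with?
# compare plain strings, so no regex escaping is involved.
def literal_affix_match?(identifier, prefix: nil, suffix: nil)
  id = identifier.downcase
  return false if prefix && !id.start_with?(prefix.downcase)
  return false if suffix && !id.end_with?(suffix.downcase)

  true
end

literal_affix_match?('Api::V1::HealthController', prefix: 'api::v1::') # => true
literal_affix_match?('Api::V1::HealthController', suffix: 'Controller') # => true
literal_affix_match?('ApiXV1::Health', prefix: 'Api.V1')                # => false

# The equivalent regex needs escaping: an unescaped "." matches any character.
/\AApi.V1/i.match?('ApiXV1::Health')                     # => true (false positive)
/\A#{Regexp.escape('Api.V1')}/i.match?('ApiXV1::Health') # => false
```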

Parameters:

query (String, nil) (defaults to: nil) —

Search pattern (case-insensitive regex). Optional when exact_prefix or exact_suffix is provided; otherwise required.
types (Array<String>, nil) (defaults to: nil) —

Filter to these singular type names
fields (Array<String>) (defaults to: %w[identifier]) —

Fields to search: “identifier”, “metadata”, “source_code”
limit (Integer) (defaults to: 20) —

Maximum results to return
exact_prefix (String, nil) (defaults to: nil) —

Literal identifier prefix filter (case-insensitive)
exact_suffix (String, nil) (defaults to: nil) —

Literal identifier suffix filter (case-insensitive)

Returns:

(Hash) —

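{ results: Array<Hash>, note: String, partial: true }, where :note is present only when a broad pattern was detected and :partial only when the phase-2 scan cap was hit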

Raises:

(ArgumentError) —

when all of query, exact_prefix, and exact_suffix are blank

# File 'lib/woods/mcp/index_reader.rb', line 196

def search(query = nil, types: nil, fields: %w[identifier], limit: 20, exact_prefix: nil, exact_suffix: nil)
  prefix = exact_prefix.blank? ? nil : exact_prefix.downcase
  suffix = exact_suffix.blank? ? nil : exact_suffix.downcase
  if query.blank? && !prefix && !suffix
    raise ArgumentError, 'search requires a query or exact_prefix/exact_suffix filter'
  end

  # When only prefix/suffix are provided, the regex acts as a match-all
  # wildcard so the existing phase-1/phase-2 pipeline still works.
  pattern = compile_search_pattern(query.blank? ? '.*' : query)
  max_scan_env = ENV.fetch('WOODS_SEARCH_MAX_SCAN', '').to_s.strip
  max_scan = max_scan_env.empty? ? DEFAULT_SEARCH_MAX_SCAN : max_scan_env.to_i
  max_scan = DEFAULT_SEARCH_MAX_SCAN if max_scan <= 0

  results = []
  notes = []
  phase2_scanned = 0
  partial = false

  dirs = if types
           types.filter_map { |t| TYPE_TO_DIR[t] }
         else
           TYPE_DIRS
         end

  # Phase 2 candidates are collected per-dir and then scanned in
  # round-robin across dirs. Exhausting the per-run scan cap linearly
  # down TYPE_DIRS order would starve later types (`concerns` at pos
  # 13, `test_mappings` at pos 31) on any codebase where the earlier
  # dirs together exceed max_scan entries. Interleaving guarantees
  # every type contributes to the scanned set.
  phase2_queues = {}

  dirs.each do |dir|
    type_name = DIR_TO_TYPE[dir]
    entries = read_index(dir)

    # Broad-match detection: warn when pattern matches >50% of dir entries
    if entries.size > 1
      matching_count = entries.count do |e|
        identifier_passes_filters?(e['identifier'], pattern, prefix, suffix)
      end
      if matching_count > entries.size / 2.0
        notes << "broad pattern matched #{matching_count}/#{entries.size} entries in #{dir}"
      end
    end

    entries.each do |entry|
      id = entry['identifier']
      next unless identifier_passes_prefix_suffix?(id, prefix, suffix)

      # Phase 1: identifier matching (still in-order per dir)
      if fields.include?('identifier') && pattern.match?(id)
        next if results.size >= limit

        results << { identifier: id, type: type_name, match_field: 'identifier' }
        next
      end

      # Phase 2 is only reached when the caller opted into deeper fields.
      next unless fields.include?('metadata') || fields.include?('source_code')

      (phase2_queues[dir] ||= []) << [type_name, id]
    end
  end

  if results.size < limit && phase2_queues.any?
    queues = phase2_queues.values.map(&:dup)
    catch(:phase2_done) do
      loop do
        progressed = false
        queues.each do |queue|
          next if queue.empty?

          throw :phase2_done if results.size >= limit

          if phase2_scanned >= max_scan
            partial = true
            throw :phase2_done
          end

          type_name, id = queue.shift
          progressed = true

          unit = find_unit(id)
          next unless unit

          phase2_scanned += 1

          if fields.include?('source_code') && unit['source_code'] && pattern.match?(unit['source_code'])
            results << { identifier: id, type: type_name, match_field: 'source_code' }
          elsif fields.include?('metadata') && unit['metadata'] && pattern.match?(unit['metadata'].to_json)
            results << { identifier: id, type: type_name, match_field: 'metadata' }
          end
        end
        break unless progressed
      end
    end
  end

  response = { results: results.first(limit) }
  response[:note] = notes.join('; ') unless notes.empty?
  response[:partial] = true if partial
  response
end
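The phase-2 scheduling and the WOODS_SEARCH_MAX_SCAN parsing can be exercised on their own. In this sketch, round_robin_scan and max_scan_from_env are hypothetical helpers, the queues hold bare identifiers, and the block stands in for find_unit plus pattern matching (unlike #search, every candidate here counts toward the cap, since none can fail to load). The 500 default is the value stated in the #search docs. With a cap of 3, one candidate from each of the three directories is scanned before the cap triggers, which is exactly the starvation the interleaving is meant to prevent.

```ruby
DEFAULT_SEARCH_MAX_SCAN = 500 # default stated in the #search docs

# Mirrors the env handling in #search: blank or non-positive values fall back.
def max_scan_from_env(env = ENV)
  raw = env.fetch('WOODS_SEARCH_MAX_SCAN', '').to_s.strip
  value = raw.empty? ? DEFAULT_SEARCH_MAX_SCAN : raw.to_i
  value.positive? ? value : DEFAULT_SEARCH_MAX_SCAN
end

# Hypothetical sketch: scan one candidate from each directory queue in turn.
# Returns [matches, partial]; the block decides whether a candidate matches.
def round_robin_scan(queues, limit:, max_scan:)
  queues = queues.transform_values(&:dup) # leave the caller's arrays intact
  matches = []
  scanned = 0
  loop do
    progressed = false
    queues.each_value do |queue|
      next if queue.empty?
      return [matches, false] if matches.size >= limit
      return [matches, true] if scanned >= max_scan # cap hit with work left

      id = queue.shift
      progressed = true
      scanned += 1
      matches << id if yield(id)
    end
    return [matches, false] unless progressed # every queue drained
  end
end

queues = { 'models' => %w[m1 m2 m3], 'concerns' => %w[c1 c2], 'test_mappings' => %w[t1] }
matches, partial = round_robin_scan(queues, limit: 10, max_scan: 3) { true }
matches # => ["m1", "c1", "t1"]
partial # => true
max_scan_from_env({ 'WOODS_SEARCH_MAX_SCAN' => '0' }) # => 500
```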

#summary ⇒ `String`?

Returns SUMMARY.md content, or nil if not present.

Returns:

(String, nil) —

SUMMARY.md content, or nil if not present

# File 'lib/woods/mcp/index_reader.rb', line 117

def summary
  @summary ||= begin
    path = @index_dir.join('SUMMARY.md')
    path.file? ? path.read : nil
  end
end

#template_engines ⇒ `Array<Symbol>`

Template engines the extraction pipeline currently understands. Delegates to ViewTemplateExtractor.supported_template_engines (returning a copy) so the list stays accurate as engines are added or removed. Surfaced by the MCP `structure` tool (#86).

Returns:

(Array<Symbol>)

# File 'lib/woods/mcp/index_reader.rb', line 111

def template_engines
  require_relative '../extractors/view_template_extractor'
  Woods::Extractors::ViewTemplateExtractor.supported_template_engines.dup
end

#traverse_dependencies(identifier, depth: 2, types: nil, via: nil) ⇒ `Hash`

BFS traversal of forward dependencies.

Parameters:

identifier (String) —

Starting unit identifier
depth (Integer) (defaults to: 2) —

Maximum traversal depth
types (Array<String>, nil) (defaults to: nil) —

Filter to these singular type names
via (Array<String>, nil) (defaults to: nil) —

Filter to these relationship types (e.g. [“link_to”, “redirect_to”])

Returns:

(Hash) —

{ root:, nodes: { id => { type:, depth:, deps: [] } } }

# File 'lib/woods/mcp/index_reader.rb', line 309

def traverse_dependencies(identifier, depth: 2, types: nil, via: nil)
  traverse(identifier, depth: depth, types: types, via: via, direction: :forward)
end

#traverse_dependents(identifier, depth: 2, types: nil, via: nil) ⇒ `Hash`

BFS traversal of reverse dependencies (dependents).

Parameters:

identifier (String) —

Starting unit identifier
depth (Integer) (defaults to: 2) —

Maximum traversal depth
types (Array<String>, nil) (defaults to: nil) —

Filter to these singular type names
via (Array<String>, nil) (defaults to: nil) —

Filter to these relationship types (e.g. [“link_to”, “redirect_to”])

Returns:

(Hash) —

{ root:, nodes: { id => { type:, depth:, deps: [] } } }

# File 'lib/woods/mcp/index_reader.rb', line 320

def traverse_dependents(identifier, depth: 2, types: nil, via: nil)
  traverse(identifier, depth: depth, types: types, via: via, direction: :reverse)
end
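Both traversal methods delegate to a private traverse helper that is not shown on this page. The sketch below is a hypothetical depth-limited breadth-first search that returns the documented { root:, nodes: } shape; the edge format (id => array of { target:, via: }), the node_types lookup, the relationship names, and the choice to neither record nor expand nodes rejected by types are all assumptions. The :reverse direction is modeled by flipping every edge before the search.

```ruby
# Hypothetical sketch of a depth-limited BFS like #traverse_dependencies /
# #traverse_dependents. Edge format, filter semantics, and the helper name
# are assumptions; the real private #traverse is not shown in these docs.
def bfs_traverse(edges, node_types, root, depth: 2, types: nil, via: nil, direction: :forward)
  if direction == :reverse
    # Flip every edge so the walk goes from a unit to its dependents.
    flipped = Hash.new { |h, k| h[k] = [] }
    edges.each do |from, outs|
      outs.each { |e| flipped[e[:target]] << { target: from, via: e[:via] } }
    end
    edges = flipped
  end

  nodes = { root => { type: node_types[root], depth: 0, deps: [] } }
  queue = [root]
  until queue.empty?
    current = queue.shift
    next if nodes[current][:depth] >= depth # never expand past max depth

    edges.fetch(current, []).each do |edge|
      next if via && !via.include?(edge[:via])
      target = edge[:target]
      next if types && !types.include?(node_types[target])

      nodes[current][:deps] << target
      next if nodes.key?(target) # already reached at a shallower depth

      nodes[target] = { type: node_types[target], depth: nodes[current][:depth] + 1, deps: [] }
      queue << target
    end
  end
  { root: root, nodes: nodes }
end

edges = {
  'PostsController' => [{ target: 'Post', via: 'reference' }, { target: 'posts/index', via: 'render' }],
  'Post' => [{ target: 'User', via: 'association' }],
  'User' => [{ target: 'Account', via: 'association' }]
}
node_types = { 'PostsController' => 'controller', 'Post' => 'model', 'User' => 'model',
               'Account' => 'model', 'posts/index' => 'view_template' }

fwd = bfs_traverse(edges, node_types, 'PostsController', depth: 2)
fwd[:nodes].keys # => ["PostsController", "Post", "posts/index", "User"]

rev = bfs_traverse(edges, node_types, 'User', depth: 1, direction: :reverse)
rev[:nodes].keys # => ["User", "Post"]
```

Breadth-first order matters here: a node is recorded at the shallowest depth at which it is first reached, so the depth cap is measured along shortest paths.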

#warmup! ⇒ `Hash`

Pre-populate cached state so the first MCP tool call doesn’t pay for disk reads and JSON parsing.

Touches the main lazy accessors (manifest, summary, dependency_graph, and graph_analysis) plus the identifier_map, which reads every _index.json file; raw_graph_data is not warmed. Each step is rescued individually (StandardError only), so a missing optional artefact such as graph_analysis.json never blocks the others.

Safe to call multiple times: memoized accessors short-circuit on their cached value. A step that raised is simply retried on the next call, as is summary when SUMMARY.md is absent (a nil result is not memoized by ||=).

Returns:

(Hash) —

Per-step outcome: { step => true | Exception }, mapping each step name (e.g. :manifest) to true or to the exception it raised

# File 'lib/woods/mcp/index_reader.rb', line 68

def warmup!
  steps = {
    manifest: -> { manifest },
    summary: -> { summary },
    dependency_graph: -> { dependency_graph },
    graph_analysis: -> { graph_analysis },
    identifier_map: -> { identifier_map }
  }
  steps.each_with_object({}) do |(step, runner), result|
    runner.call
    result[step] = true
  rescue StandardError => e
    result[step] = e
  end
end
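The rescue-per-step pattern can be tried outside the class. In the sketch below, run_steps is a hypothetical helper that reproduces the loop in warmup!, and the three lambdas are made-up stand-ins for the real accessors; the second one raises Errno::ENOENT because the path does not exist. A step returning nil still counts as success, since only exceptions are recorded as failures.

```ruby
# Hypothetical helper reproducing the warmup! pattern: run every step,
# record true on success or the rescued exception on failure, and continue.
def run_steps(steps)
  steps.each_with_object({}) do |(step, runner), result|
    runner.call
    result[step] = true
  rescue StandardError => e # rescue inside do...end requires Ruby 2.5+
    result[step] = e
  end
end

steps = {
  manifest: -> { { 'version' => 1 } },                                  # succeeds
  graph_analysis: -> { File.read('/nonexistent/graph_analysis.json') }, # raises
  summary: -> { nil }                                                   # nil still counts as success
}
outcome = run_steps(steps)

outcome[:manifest]             # => true
outcome[:graph_analysis].class # => Errno::ENOENT
outcome[:summary]              # => true
```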
