Class: Woods::MCP::IndexReader
- Inherits:
- Object
- Defined in:
- lib/woods/mcp/index_reader.rb
Overview
Reads extraction output from disk for the MCP server.
Lazy-loads unit JSON files on demand with an LRU-ish cache cap. Builds an identifier index from _index.json files for fast lookups.
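The caching behaviour described above can be illustrated with a small bounded cache. This is only a sketch: the class name BoundedCache, its methods, and the exact eviction policy are assumptions made for illustration, since the reader's private loading code is not shown on this page.

```ruby
# Hypothetical sketch of a bounded, LRU-style cache. BoundedCache and its
# eviction policy are assumptions, not the reader's actual implementation.
class BoundedCache
  def initialize(max_size)
    @max_size = max_size
    @store = {} # key => cached value
    @order = [] # keys, least recently used first
  end

  # Return the cached value for key, computing it with the block on a miss.
  def fetch(key)
    if @store.key?(key)
      # Move the key to the back so it counts as most recently used.
      @order.delete(key)
      @order << key
      return @store[key]
    end

    value = yield
    @store[key] = value
    @order << key
    # Evict from the front once the cap is exceeded.
    @store.delete(@order.shift) while @order.size > @max_size
    value
  end

  def size
    @store.size
  end

  def cached?(key)
    @store.key?(key)
  end
end

cache = BoundedCache.new(2)
cache.fetch('models/User.json') { { 'identifier' => 'User' } }
cache.fetch('models/Post.json') { { 'identifier' => 'Post' } }
cache.fetch('models/User.json') { raise 'not reached: served from cache' }
cache.fetch('models/Tag.json')  { { 'identifier' => 'Tag' } }
```

With a cap of two, the last fetch evicts models/Post.json, because the earlier cache hit on models/User.json made that entry the more recently used one.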
Constant Summary
- TYPE_DIRS =
Directories that correspond to extractor types in the output. Must stay in sync with Extractor::EXTRACTORS keys.
%w[ models controllers graphql components view_components services jobs mailers serializers managers policies validators concerns routes middleware i18n pundit_policies configurations engines view_templates migrations action_cable_channels scheduled_jobs rake_tasks state_machines events decorators database_views caching factories test_mappings rails_source poros libs ].freeze
- DIR_TO_TYPE =
Singular type name for each directory (used in search filtering). Derived from TYPE_DIRS via ActiveSupport singularize — no manual sync needed.
TYPE_DIRS.to_h { |dir| [dir, dir.singularize] }.freeze
- TYPE_TO_DIR =
DIR_TO_TYPE.invert.freeze
- MAX_UNIT_CACHE =
Maximum number of loaded unit files to cache in memory.
50
- DEFAULT_SEARCH_MAX_SCAN =
Default maximum number of unit files to load during phase-2 search. Override with WOODS_SEARCH_MAX_SCAN env var.
500
Instance Method Summary
-
#dependency_graph ⇒ Woods::DependencyGraph
Graph loaded from disk.
-
#find_unit(identifier) ⇒ Hash?
Find a single unit by identifier.
-
#framework_sources(keyword, limit: 20) ⇒ Array<Hash>
Search rails_source units by concept keyword.
-
#graph_analysis ⇒ Hash
Parsed graph_analysis.json.
-
#initialize(index_dir) ⇒ IndexReader
constructor
A new instance of IndexReader.
-
#list_units(type: nil) ⇒ Array<Hash>
List units, optionally filtered by type.
-
#manifest ⇒ Hash
Parsed manifest.json.
-
#raw_graph_data ⇒ Hash
Raw dependency graph data from JSON.
-
#recent_changes(limit: 10, types: nil) ⇒ Array<Hash>
Return units sorted by most recent git modification.
-
#reload! ⇒ void
Clear all cached state so the next access re-reads from disk.
-
#search(query = nil, types: nil, fields: %w[identifier], limit: 20, exact_prefix: nil, exact_suffix: nil) ⇒ Hash
Search units by case-insensitive pattern.
-
#summary ⇒ String?
SUMMARY.md content, or nil if not present.
-
#template_engines ⇒ Array<Symbol>
Template engines the extraction pipeline currently understands.
-
#traverse_dependencies(identifier, depth: 2, types: nil, via: nil) ⇒ Hash
BFS traversal of forward dependencies.
-
#traverse_dependents(identifier, depth: 2, types: nil, via: nil) ⇒ Hash
BFS traversal of reverse dependencies (dependents).
-
#warmup! ⇒ Hash
Pre-populate cached state so the first MCP tool call doesn’t pay for disk reads + JSON parsing.
Constructor Details
#initialize(index_dir) ⇒ IndexReader
Returns a new instance of IndexReader.
# File 'lib/woods/mcp/index_reader.rb', line 46

def initialize(index_dir)
  @index_dir = Pathname.new(index_dir)
  raise ArgumentError, "Index directory does not exist: #{index_dir}" unless @index_dir.directory?
  raise ArgumentError, "No manifest.json found in: #{index_dir}" unless @index_dir.join('manifest.json').file?

  @unit_cache = {}
  @unit_cache_order = []
  @identifier_map = nil
end
Instance Method Details
#dependency_graph ⇒ Woods::DependencyGraph
Returns the graph loaded from disk.
# File 'lib/woods/mcp/index_reader.rb', line 125

def dependency_graph
  @dependency_graph ||= begin
    data = parse_json('dependency_graph.json')
    Woods::DependencyGraph.from_h(data)
  end
end
#find_unit(identifier) ⇒ Hash?
Find a single unit by identifier.
# File 'lib/woods/mcp/index_reader.rb', line 141

def find_unit(identifier)
  location = identifier_map[identifier]
  return nil unless location

  load_unit(location[:type_dir], location[:filename])
end
#framework_sources(keyword, limit: 20) ⇒ Array<Hash>
Search rails_source units by concept keyword.
Matches the keyword (case-insensitive) against identifier, source_code, and metadata fields of rails_source type units.
# File 'lib/woods/mcp/index_reader.rb', line 332

def framework_sources(keyword, limit: 20)
  # Multi-word keywords ("ActiveRecord callbacks") are split on
  # whitespace and ANDed. Single-word queries behave as before.
  tokens = keyword.to_s.strip.split(/\s+/)
  return [] if tokens.empty?

  patterns = tokens.map { |t| Regexp.new(Regexp.escape(t), Regexp::IGNORECASE) }

  results = []
  entries = read_index('rails_source')

  entries.each do |entry|
    break if results.size >= limit

    id = entry['identifier']
    unit = find_unit(id)
    next unless unit

    # `metadata_json` is a stand-in name; the original local variable
    # name did not survive extraction.
    metadata_json = unit['metadata']&.to_json
    matched = patterns.all? do |pat|
      pat.match?(id) ||
        (unit['source_code'] && pat.match?(unit['source_code'])) ||
        (metadata_json && pat.match?(metadata_json))
    end
    next unless matched

    results << {
      identifier: id,
      type: 'rails_source',
      file_path: unit['file_path'],
      metadata: unit['metadata']
    }
  end

  results
end
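The multi-word matching rule can be tried in isolation. The sketch below is not part of the library: all_tokens_match? is a hypothetical helper that applies the same split-and-AND logic to plain strings, so that each token may match in a different field.

```ruby
# Hypothetical helper: split a keyword on whitespace and require every
# token to match at least one of the given texts, case-insensitively.
def all_tokens_match?(keyword, *texts)
  tokens = keyword.to_s.strip.split(/\s+/)
  return false if tokens.empty?

  # Regexp.escape makes each token a literal, so "::" or "?" are not
  # interpreted as regex syntax.
  patterns = tokens.map { |t| Regexp.new(Regexp.escape(t), Regexp::IGNORECASE) }
  candidates = texts.compact
  patterns.all? { |pat| candidates.any? { |text| pat.match?(text) } }
end

identifier = 'ActiveRecord::Callbacks'
source     = 'def before_save; end'

both_in_identifier = all_tokens_match?('activerecord callbacks', identifier, nil)
split_over_fields  = all_tokens_match?('callbacks before_save', identifier, source)
one_token_missing  = all_tokens_match?('ActiveRecord validations', identifier, source)
```

Here the first two calls return true and the third returns false, because "validations" appears in neither text.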
#graph_analysis ⇒ Hash
Returns the parsed graph_analysis.json.
# File 'lib/woods/mcp/index_reader.rb', line 133

def graph_analysis
  @graph_analysis ||= parse_json('graph_analysis.json')
end
#list_units(type: nil) ⇒ Array<Hash>
List units, optionally filtered by type.
# File 'lib/woods/mcp/index_reader.rb', line 152

def list_units(type: nil)
  dirs = if type
           dir = TYPE_TO_DIR[type]
           dir ? [dir] : []
         else
           TYPE_DIRS
         end

  dirs.flat_map { |dir| read_index(dir) }
end
#manifest ⇒ Hash
Returns the parsed manifest.json.
# File 'lib/woods/mcp/index_reader.rb', line 101

def manifest
  @manifest ||= parse_json('manifest.json')
end
#raw_graph_data ⇒ Hash
Returns the raw dependency graph data from JSON.
# File 'lib/woods/mcp/index_reader.rb', line 413

def raw_graph_data
  @raw_graph_data ||= parse_json('dependency_graph.json')
end
#recent_changes(limit: 10, types: nil) ⇒ Array<Hash>
Return units sorted by most recent git modification.
Reads all units that have metadata.git.last_modified and returns them sorted descending by that timestamp.
# File 'lib/woods/mcp/index_reader.rb', line 377

def recent_changes(limit: 10, types: nil)
  dirs = if types
           types.filter_map { |t| TYPE_TO_DIR[t] }
         else
           TYPE_DIRS
         end

  units_with_dates = []

  dirs.each do |dir|
    entries = read_index(dir)
    entries.each do |entry|
      id = entry['identifier']
      unit = find_unit(id)
      next unless unit

      last_modified = unit.dig('metadata', 'git', 'last_modified')
      next unless last_modified

      units_with_dates << {
        identifier: id,
        type: DIR_TO_TYPE[dir],
        file_path: unit['file_path'],
        last_modified: last_modified,
        author: unit.dig('metadata', 'git', 'last_author')
      }
    end
  end

  units_with_dates
    .sort_by { |u| u[:last_modified] }
    .reverse
    .first(limit)
end
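The final sort-and-truncate step can be sketched on sample data. The records and the most_recent helper below are invented for illustration, and the sketch assumes last_modified values are ISO 8601 strings in a single consistent format and timezone, so that plain string comparison agrees with chronological order. The page does not state the timestamp format.

```ruby
# Hypothetical sample records; the timestamp format is an assumption.
units = [
  { identifier: 'User',  last_modified: '2024-03-01T10:00:00Z' },
  { identifier: 'Post',  last_modified: '2024-05-12T08:30:00Z' },
  { identifier: 'Order', last_modified: '2023-11-20T16:45:00Z' }
]

# Sort ascending by timestamp, reverse to get newest first, keep `limit`.
def most_recent(units, limit)
  units.sort_by { |u| u[:last_modified] }.reverse.first(limit)
end

newest_two = most_recent(units, 2).map { |u| u[:identifier] }
```

If the timestamps were stored in mixed formats or timezones, they would need to be parsed (for example with Time.iso8601) before sorting.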
#reload! ⇒ void
This method returns an undefined value.
Clear all cached state so the next access re-reads from disk.
87 88 89 90 91 92 93 94 95 96 97 98 |
# File 'lib/woods/mcp/index_reader.rb', line 87 def reload! @unit_cache = {} @unit_cache_order = [] @identifier_map = nil @index_cache = {} @manifest = nil @summary = nil @dependency_graph = nil @graph_analysis = nil @raw_graph_data = nil @normalized_graph_edges = nil end |
#search(query = nil, types: nil, fields: %w[identifier], limit: 20, exact_prefix: nil, exact_suffix: nil) ⇒ Hash
Search units by case-insensitive pattern.
Phase 1: match identifiers from index files (cheap). Phase 2: lazy-load unit files for metadata/source_code matching.
The query is compiled as a raw Ruby regex with IGNORECASE. If the pattern is invalid, it falls back to an escaped literal match.
A “broad” pattern is one that matches more than 50% of the entries in a type directory. Broad patterns still run but the result includes a :note.
Phase-2 scan is capped at WOODS_SEARCH_MAX_SCAN unit files (default 500). When the cap is reached the result includes :partial => true.
The optional exact_prefix / exact_suffix filters restrict results to identifiers whose start or end matches the given string literally (case-insensitive). They are ANDed with the query regex and are safer than hand-escaping regex anchors, because metacharacters like :: are treated as literal text.
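The regex fallback and the literal prefix/suffix filters described above can be approximated as follows. This is a sketch under assumptions: compile_pattern and passes_prefix_suffix? are hypothetical stand-ins for the private helpers that search calls, whose source is not shown on this page.

```ruby
# Hypothetical stand-in for the private pattern compiler: try the query
# as a raw regex first, and fall back to a literal match if it is invalid.
def compile_pattern(query)
  Regexp.new(query, Regexp::IGNORECASE)
rescue RegexpError
  Regexp.new(Regexp.escape(query), Regexp::IGNORECASE)
end

# Hypothetical stand-in for the prefix/suffix filter: both sides are
# downcased and compared as plain strings, so no regex escaping is needed.
def passes_prefix_suffix?(identifier, prefix, suffix)
  id = identifier.downcase
  return false if prefix && !id.start_with?(prefix.downcase)
  return false if suffix && !id.end_with?(suffix.downcase)

  true
end

valid_regex   = compile_pattern('^user')  # compiled as written
invalid_regex = compile_pattern('user(')  # unbalanced paren, escaped instead

anchored_hit  = valid_regex.match?('UserProfile')
literal_hit   = invalid_regex.match?('find_User(1)')
namespaced    = passes_prefix_suffix?('Admin::UsersController', 'admin::', 'controller')
wrong_prefix  = passes_prefix_suffix?('UsersController', 'Admin::', nil)
```

The first three results are true and the last is false. The prefix check never builds a regex, which is why characters such as :: need no escaping.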
# File 'lib/woods/mcp/index_reader.rb', line 196

def search(query = nil, types: nil, fields: %w[identifier], limit: 20, exact_prefix: nil, exact_suffix: nil)
  prefix = exact_prefix.blank? ? nil : exact_prefix.downcase
  suffix = exact_suffix.blank? ? nil : exact_suffix.downcase

  if query.blank? && !prefix && !suffix
    raise ArgumentError, 'search requires a query or exact_prefix/exact_suffix filter'
  end

  # When only prefix/suffix are provided, the regex acts as a match-all
  # wildcard so the existing phase-1/phase-2 pipeline still works.
  pattern = compile_search_pattern(query.to_s.empty? ? '.*' : query)

  max_scan_env = ENV.fetch('WOODS_SEARCH_MAX_SCAN', '').to_s.strip
  max_scan = max_scan_env.empty? ? DEFAULT_SEARCH_MAX_SCAN : max_scan_env.to_i
  max_scan = DEFAULT_SEARCH_MAX_SCAN if max_scan <= 0

  results = []
  notes = []
  phase2_scanned = 0
  partial = false

  dirs = if types
           types.filter_map { |t| TYPE_TO_DIR[t] }
         else
           TYPE_DIRS
         end

  # Phase 2 candidates are collected per-dir and then scanned in
  # round-robin across dirs. Exhausting the per-run scan cap linearly
  # down TYPE_DIRS order would starve later types (`concerns` at pos
  # 13, `test_mappings` at pos 31) on any codebase where the earlier
  # dirs together exceed max_scan entries. Interleaving guarantees
  # every type contributes to the scanned set.
  phase2_queues = {}

  dirs.each do |dir|
    type_name = DIR_TO_TYPE[dir]
    entries = read_index(dir)

    # Broad-match detection: warn when pattern matches >50% of dir entries
    if entries.size > 1
      matching_count = entries.count do |e|
        identifier_passes_filters?(e['identifier'], pattern, prefix, suffix)
      end
      if matching_count > entries.size / 2.0
        notes << "broad pattern matched #{matching_count}/#{entries.size} entries in #{dir}"
      end
    end

    entries.each do |entry|
      id = entry['identifier']
      next unless identifier_passes_prefix_suffix?(id, prefix, suffix)

      # Phase 1: identifier matching (still in-order per dir)
      if fields.include?('identifier') && pattern.match?(id)
        next if results.size >= limit

        results << { identifier: id, type: type_name, match_field: 'identifier' }
        next
      end

      # Phase 2 is only reached when the caller opted into deeper fields.
      next unless fields.include?('metadata') || fields.include?('source_code')

      (phase2_queues[dir] ||= []) << [type_name, id]
    end
  end

  if results.size < limit && phase2_queues.any?
    queues = phase2_queues.values.map(&:dup)
    catch(:phase2_done) do
      loop do
        progressed = false
        queues.each do |queue|
          next if queue.empty?
          throw :phase2_done if results.size >= limit

          if phase2_scanned >= max_scan
            partial = true
            throw :phase2_done
          end

          type_name, id = queue.shift
          progressed = true
          unit = find_unit(id)
          next unless unit

          phase2_scanned += 1

          if fields.include?('source_code') && unit['source_code'] && pattern.match?(unit['source_code'])
            results << { identifier: id, type: type_name, match_field: 'source_code' }
          elsif fields.include?('metadata') && unit['metadata'] && pattern.match?(unit['metadata'].to_json)
            results << { identifier: id, type: type_name, match_field: 'metadata' }
          end
        end
        break unless progressed
      end
    end
  end

  response = { results: results.first(limit) }
  response[:note] = notes.join('; ') unless notes.empty?
  response[:partial] = true if partial
  response
end
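The round-robin scan explained in the phase-2 comment is easier to see without the matching logic. The sketch below is a simplified illustration, not the method itself: round_robin_scan and the sample queues are hypothetical, and it only records the order in which candidates would be visited under a scan cap.

```ruby
# Hypothetical helper: visit one candidate per queue per pass, stopping
# once max_scan candidates have been taken.
def round_robin_scan(queues, max_scan)
  pending = queues.values.map(&:dup) # leave the caller's arrays untouched
  scanned = []

  loop do
    progressed = false
    pending.each do |queue|
      next if queue.empty?
      return scanned if scanned.size >= max_scan

      scanned << queue.shift
      progressed = true
    end
    break unless progressed # every queue is drained
  end

  scanned
end

queues = {
  'models'   => %w[User Post Comment Tag],
  'concerns' => %w[Auditable],
  'jobs'     => %w[CleanupJob MailJob]
}

capped   = round_robin_scan(queues, 4)
uncapped = round_robin_scan(queues, 100)
```

With a cap of four, the visit order is User, Auditable, CleanupJob, Post. A linear scan in directory order would have spent the whole budget on models and never reached concerns or jobs.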
#summary ⇒ String?
Returns the SUMMARY.md content, or nil if not present.
# File 'lib/woods/mcp/index_reader.rb', line 117

def summary
  @summary ||= begin
    path = @index_dir.join('SUMMARY.md')
    path.file? ? path.read : nil
  end
end
#template_engines ⇒ Array<Symbol>
Template engines the extraction pipeline currently understands. Delegates to ViewTemplateExtractor.supported_template_engines so the list stays honest as engines are added or removed. Surfaced by the MCP `structure` tool (#86).
# File 'lib/woods/mcp/index_reader.rb', line 111

def template_engines
  require_relative '../extractors/view_template_extractor'
  Woods::Extractors::ViewTemplateExtractor.supported_template_engines.dup
end
#traverse_dependencies(identifier, depth: 2, types: nil, via: nil) ⇒ Hash
BFS traversal of forward dependencies.
# File 'lib/woods/mcp/index_reader.rb', line 309

def traverse_dependencies(identifier, depth: 2, types: nil, via: nil)
  traverse(identifier, depth: depth, types: types, via: via, direction: :forward)
end
#traverse_dependents(identifier, depth: 2, types: nil, via: nil) ⇒ Hash
BFS traversal of reverse dependencies (dependents).
# File 'lib/woods/mcp/index_reader.rb', line 320

def traverse_dependents(identifier, depth: 2, types: nil, via: nil)
  traverse(identifier, depth: depth, types: types, via: via, direction: :reverse)
end
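Both traversal methods delegate to a private traverse method whose source is not shown here. As a rough illustration of the idea only, the sketch below runs a depth-limited BFS over a plain adjacency hash. The helper names, the sample graph, and the returned shape (identifier to distance) are all assumptions and may differ from what the library returns.

```ruby
# Hypothetical depth-limited BFS: returns each reachable identifier with
# its distance from the start, never expanding nodes at the depth limit.
def bfs(edges, start, depth:)
  visited = { start => 0 }
  queue = [start]

  until queue.empty?
    current = queue.shift
    next if visited[current] >= depth

    edges.fetch(current, []).each do |neighbor|
      next if visited.key?(neighbor)

      visited[neighbor] = visited[current] + 1
      queue << neighbor
    end
  end

  visited
end

# Hypothetical helper: flip forward edges to get dependents.
def invert(edges)
  edges.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(from, tos), rev|
    tos.each { |to| rev[to] << from }
  end
end

forward = {
  'OrdersController' => %w[Order OrderPolicy],
  'Order'            => %w[LineItem],
  'LineItem'         => %w[Product]
}

dependencies = bfs(forward, 'OrdersController', depth: 2)
dependents   = bfs(invert(forward), 'Product', depth: 2)
```

With depth: 2, the forward walk from OrdersController stops at LineItem and never reaches Product. The reverse walk from Product reaches LineItem and Order but not OrdersController.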
#warmup! ⇒ Hash
Pre-populate cached state so the first MCP tool call doesn’t pay for disk reads + JSON parsing.
Touches every lazy accessor: manifest, summary, dependency_graph, graph_analysis, and the identifier_map (which reads all _index.json files). Each step is individually rescued so a missing optional artefact (e.g. graph_analysis.json) never blocks the rest.
Safe to call multiple times — lazy accessors short-circuit on the memoized value.
# File 'lib/woods/mcp/index_reader.rb', line 68

def warmup!
  steps = {
    manifest: -> { manifest },
    summary: -> { summary },
    dependency_graph: -> { dependency_graph },
    graph_analysis: -> { graph_analysis },
    identifier_map: -> { identifier_map }
  }

  steps.each_with_object({}) do |(step, runner), result|
    runner.call
    result[step] = true
  rescue StandardError => e
    result[step] = e
  end
end
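The per-step rescue pattern means the returned hash maps each step to either true or the exception it raised. The standalone sketch below reproduces that pattern with made-up steps. run_steps and the lambdas are hypothetical, with one step failing on purpose to stand in for a missing optional file.

```ruby
# Hypothetical steps; the middle one simulates a missing optional artefact.
steps = {
  manifest:       -> { { 'version' => 1 } },
  graph_analysis: -> { raise Errno::ENOENT, 'graph_analysis.json' },
  summary:        -> { 'summary text' }
}

# Run every step, recording true on success or the exception on failure.
# The rescue is scoped to one block iteration, so later steps still run.
def run_steps(steps)
  steps.each_with_object({}) do |(step, runner), result|
    runner.call
    result[step] = true
  rescue StandardError => e
    result[step] = e
  end
end

outcome = run_steps(steps)
failed  = outcome.reject { |_step, value| value == true }.keys
```

A caller can inspect the result the same way, for example by logging the message of each exception value while treating true as a successfully warmed step.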