Class: Woods::DependencyGraph

Inherits:
Object
  • Object
show all
Defined in:
lib/woods/dependency_graph.rb

Overview

DependencyGraph tracks relationships between code units for:

  1. Understanding what depends on what

  2. Computing “blast radius” for incremental re-indexing

  3. Enabling graph-based retrieval queries

The graph is bidirectional - we track both what a unit depends on and what depends on that unit (reverse edges).

Examples:

Building and querying the graph

graph = DependencyGraph.new
graph.register(user_model_unit)
graph.register(user_service_unit)

# Find everything affected by a change to user.rb
affected = graph.affected_by(["app/models/user.rb"])

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initializeDependencyGraph

Returns a new instance of DependencyGraph.



24
25
26
27
28
29
30
31
32
# File 'lib/woods/dependency_graph.rb', line 24

def initialize
  @nodes = {}      # identifier => { type:, file_path: }
  @edges = {}      # identifier => [{ target:, via: }]
  @reverse = {}    # identifier => Set of dependent identifiers
  @reverse_via = {} # [target, via] => Set of dependent identifiers
  @file_map = {}   # file_path => identifier
  @type_index = {} # type => Set of identifiers
  @to_h = nil
end

Class Method Details

.from_h(data) ⇒ DependencyGraph

Load graph from persisted data

After JSON round-trip all keys become strings. This method normalizes them back to the expected types: node values use symbol keys (:type, :file_path, :namespace), and type_index uses symbol keys for types.

Parameters:

  • data (Hash)

    Previously serialized graph data

Returns:



214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
# File 'lib/woods/dependency_graph.rb', line 214

def self.from_h(data)
  graph = new

  raw_nodes = data[:nodes] || data['nodes'] || {}
  graph.instance_variable_set(:@nodes, raw_nodes.transform_values { |v| symbolize_node(v) })

  raw_edges = data[:edges] || data['edges'] || {}
  graph.instance_variable_set(:@edges, raw_edges.transform_values { |edges| normalize_edges(edges) })

  raw_reverse = data[:reverse] || data['reverse'] || {}
  graph.instance_variable_set(:@reverse, raw_reverse.transform_values { |v| v.is_a?(Set) ? v : Set.new(v) })

  graph.instance_variable_set(:@file_map, data[:file_map] || data['file_map'] || {})

  raw_type_index = data[:type_index] || data['type_index'] || {}
  graph.instance_variable_set(:@type_index, raw_type_index.transform_keys(&:to_sym).transform_values do |v|
    v.is_a?(Set) ? v : Set.new(v)
  end)

  # Rebuild reverse_via index from edges
  reverse_via = {}
  graph.instance_variable_get(:@edges).each do |source_id, edges|
    edges.each do |edge|
      (reverse_via[[edge[:target], edge[:via]]] ||= Set.new).add(source_id)
    end
  end
  graph.instance_variable_set(:@reverse_via, reverse_via)

  graph
end

.normalize_edges(edges) ⇒ Array<Hash>

Normalize edge data from either old format (bare strings) or new format (hashes).

ROUND-TRIP INVARIANT (do not break when refactoring):

DependencyGraph#to_h -> JSON.generate -> JSON.parse -> DependencyGraph.from_h

must always yield the same in-memory shape. The two normalizers that sit at either end of this round trip are INTENTIONALLY SEPARATE — do not merge them:

  • This method (normalize_edges) runs on Ruby objects. It produces ‘{ target:, via: }` with SYMBOL keys because consumers (#dependencies_of, GraphAnalyzer) key on symbols.

  • MCP::IndexReader.normalize_all_edges runs on parsed JSON, producing ‘{ ’target’ => …, ‘via’ => … }‘ with STRING keys, because the MCP tools serialize straight through to the client and symbol keys would become `:target` on the wire.

This method also accepts OLD-format bare-string edges so graphs serialized before the ‘via` migration still load without explicit data conversion.

Parameters:

  • edges (Array)

    Edge entries — either strings or hashes

Returns:

  • (Array<Hash>)

    Normalized edges with :target and :via keys



281
282
283
284
285
286
287
288
289
290
291
292
293
# File 'lib/woods/dependency_graph.rb', line 281

def self.normalize_edges(edges)
  return [] unless edges.is_a?(Array)

  edges.map do |edge|
    if edge.is_a?(String)
      { target: edge, via: nil }
    elsif edge.is_a?(Hash)
      { target: edge[:target] || edge['target'], via: (edge[:via] || edge['via'])&.to_sym }
    else
      { target: edge.to_s, via: nil }
    end
  end
end

.symbolize_node(node) ⇒ Hash

Normalize a node hash to use symbol keys

Parameters:

  • node (Hash)

    Node data with string or symbol keys

Returns:

  • (Hash)

    Node data with symbol keys



249
250
251
252
253
254
255
256
257
# File 'lib/woods/dependency_graph.rb', line 249

def self.symbolize_node(node)
  return node unless node.is_a?(Hash)

  {
    type: (node[:type] || node['type'])&.to_sym,
    file_path: node[:file_path] || node['file_path'],
    namespace: node[:namespace] || node['namespace']
  }
end

Instance Method Details

#affected_by(changed_files, max_depth: nil) ⇒ Array<String>

Find all units affected by changes to given files Uses BFS to find transitive dependents

Parameters:

  • changed_files (Array<String>)

    List of changed file paths

  • max_depth (Integer) (defaults to: nil)

    Maximum traversal depth (nil for unlimited)

Returns:

  • (Array<String>)

    List of affected unit identifiers



65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
# File 'lib/woods/dependency_graph.rb', line 65

def affected_by(changed_files, max_depth: nil)
  directly_changed = changed_files.filter_map { |f| @file_map[f] }

  affected = Set.new(directly_changed)
  queue = directly_changed.map { |id| [id, 0] } # [identifier, depth]

  while queue.any?
    current, depth = queue.shift
    next if max_depth && depth >= max_depth

    dependents = @reverse[current] || []

    dependents.each do |dep|
      unless affected.include?(dep)
        affected.add(dep)
        queue.push([dep, depth + 1])
      end
    end
  end

  affected.to_a
end

#dependencies_of(identifier, via: nil) ⇒ Array<String>

Get direct dependencies of a unit

Parameters:

  • identifier (String)

    Unit identifier

  • via (Symbol, Array<Symbol>, nil) (defaults to: nil)

    Filter by relationship type(s)

Returns:

  • (Array<String>)

    List of dependency identifiers



114
115
116
117
118
119
120
121
# File 'lib/woods/dependency_graph.rb', line 114

def dependencies_of(identifier, via: nil)
  edges = @edges[identifier] || []
  if via
    via_set = Array(via)
    edges = edges.select { |e| via_set.include?(e[:via]) }
  end
  edges.map { |e| e[:target] }
end

#dependents_of(identifier, via: nil) ⇒ Array<String>

Get direct dependents of a unit (what depends on it)

Parameters:

  • identifier (String)

    Unit identifier

  • via (Symbol, Array<Symbol>, nil) (defaults to: nil)

    Filter by relationship type(s)

Returns:

  • (Array<String>)

    List of dependent identifiers



128
129
130
131
132
133
134
# File 'lib/woods/dependency_graph.rb', line 128

def dependents_of(identifier, via: nil)
  return @reverse.fetch(identifier, Set.new).to_a unless via

  Array(via).each_with_object(Set.new) do |v, result|
    @reverse_via.fetch([identifier, v], Set.new).each { |dep| result.add(dep) }
  end.to_a
end

#find_node_by_suffix(suffix) ⇒ String?

Find a node by suffix matching (e.g., “Update” matches “Order::Update”).

When multiple nodes share the same suffix, the first match wins. Suffix matching requires a “::” separator — bare identifiers (no namespace) are not matched by this method; use #node_exists? for exact lookups.

Parameters:

  • suffix (String)

    The suffix to match against

Returns:

  • (String, nil)

    The first matching identifier, or nil



104
105
106
107
# File 'lib/woods/dependency_graph.rb', line 104

def find_node_by_suffix(suffix)
  target_suffix = "::#{suffix}"
  @nodes.keys.find { |id| id.end_with?(target_suffix) }
end

#node_exists?(identifier) ⇒ Boolean

Check if a node exists in the graph by exact identifier.

Parameters:

  • identifier (String)

    Unit identifier to check

Returns:

  • (Boolean)

    true if the node exists



92
93
94
# File 'lib/woods/dependency_graph.rb', line 92

def node_exists?(identifier)
  @nodes.key?(identifier)
end

#pagerank(damping: 0.85, iterations: 20) ⇒ Hash<String, Float>

Compute PageRank scores for all nodes

Uses the reverse edges (dependents) as the link structure: a node with many dependents gets a higher score. This matches Aider’s insight that structural importance correlates with retrieval relevance.

Parameters:

  • damping (Float) (defaults to: 0.85)

    Damping factor (default: 0.85)

  • iterations (Integer) (defaults to: 20)

    Number of iterations (default: 20)

Returns:

  • (Hash<String, Float>)

    Identifier => PageRank score



153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
# File 'lib/woods/dependency_graph.rb', line 153

def pagerank(damping: 0.85, iterations: 20)
  n = @nodes.size
  return {} if n.zero?

  node_ids = @nodes.keys
  base_score = 1.0 / n
  scores = node_ids.to_h { |id| [id, base_score] }

  iterations.times do
    # Collect rank from dangling nodes (no outgoing edges) and redistribute
    dangling_sum = node_ids.sum do |id|
      @edges[id].nil? || @edges[id].empty? ? scores[id] : 0.0
    end

    new_scores = {}

    node_ids.each do |id|
      # Sum contributions from nodes that depend on this one
      incoming = @reverse[id] || []
      rank_sum = incoming.sum do |src|
        out_degree = (@edges[src] || []).size
        out_degree.positive? ? scores[src] / out_degree : 0.0
      end

      new_scores[id] = ((1.0 - damping) / n) + (damping * (rank_sum + (dangling_sum / n)))
    end

    scores = new_scores
  end

  scores
end

#register(unit) ⇒ Object

Register a unit in the graph

Parameters:



37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
# File 'lib/woods/dependency_graph.rb', line 37

def register(unit)
  @to_h = nil

  @nodes[unit.identifier] = {
    type: unit.type,
    file_path: unit.file_path,
    namespace: unit.namespace
  }

  @edges[unit.identifier] = unit.dependencies.map { |d| { target: d[:target], via: d[:via] } }
  @file_map[unit.file_path] = unit.identifier if unit.file_path

  # Type index for filtering (Set-based for O(1) insert)
  (@type_index[unit.type] ||= Set.new).add(unit.identifier)

  # Build reverse edges (Set-based for O(1) insert)
  unit.dependencies.each do |dep|
    (@reverse[dep[:target]] ||= Set.new).add(unit.identifier)
    (@reverse_via[[dep[:target], dep[:via]]] ||= Set.new).add(unit.identifier)
  end
end

#to_hHash

Serialize graph for persistence. Memoized — cache is invalidated on register. Returns a dup so callers can’t pollute the cached hash.

Returns:

  • (Hash)

    Complete graph data



190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
# File 'lib/woods/dependency_graph.rb', line 190

def to_h
  @to_h ||= {
    nodes: @nodes,
    edges: @edges,
    reverse: @reverse.transform_values(&:to_a),
    file_map: @file_map,
    type_index: @type_index.transform_values(&:to_a),
    stats: {
      node_count: @nodes.size,
      edge_count: @edges.values.sum(&:size),
      types: @type_index.transform_values(&:size)
    }
  }
  @to_h.dup
end

#units_of_type(type) ⇒ Array<String>

Get all units of a specific type

Parameters:

  • type (Symbol)

    Unit type (:model, :controller, etc.)

Returns:

  • (Array<String>)

    List of unit identifiers



140
141
142
# File 'lib/woods/dependency_graph.rb', line 140

def units_of_type(type)
  @type_index.fetch(type, Set.new).to_a
end