Class: Woods::DependencyGraph

Inherits:

Object

Object
Woods::DependencyGraph

show all

Defined in:: lib/woods/dependency_graph.rb

Overview

DependencyGraph tracks relationships between code units for:

Understanding what depends on what
Computing “blast radius” for incremental re-indexing
Enabling graph-based retrieval queries

The graph is bidirectional - we track both what a unit depends on and what depends on that unit (reverse edges).

Examples:

Building and querying the graph

graph = DependencyGraph.new
graph.register(user_model_unit)
graph.register(user_service_unit)

# Find everything affected by a change to user.rb
affected = graph.affected_by(["app/models/user.rb"])

Class Method Summary collapse

.from_h(data) ⇒ DependencyGraph

Load graph from persisted data.
.normalize_edges(edges) ⇒ Array<Hash>

Normalize edge data from either old format (bare strings) or new format (hashes).
.symbolize_node(node) ⇒ Hash

Normalize a node hash to use symbol keys.

Instance Method Summary collapse

#affected_by(changed_files, max_depth: nil) ⇒ Array<String>

Find all units affected by changes to given files Uses BFS to find transitive dependents.
#dependencies_of(identifier, via: nil) ⇒ Array<String>

Get direct dependencies of a unit.
#dependents_of(identifier, via: nil) ⇒ Array<String>

Get direct dependents of a unit (what depends on it).
#find_node_by_suffix(suffix) ⇒ String^?

Find a node by suffix matching (e.g., “Update” matches “Order::Update”).
#initialize ⇒ DependencyGraph constructor

A new instance of DependencyGraph.
#node_exists?(identifier) ⇒ Boolean

Check if a node exists in the graph by exact identifier.
#pagerank(damping: 0.85, iterations: 20) ⇒ Hash<String, Float>

Compute PageRank scores for all nodes.
#register(unit) ⇒ Object

Register a unit in the graph.
#to_h ⇒ Hash

Serialize graph for persistence.
#units_of_type(type) ⇒ Array<String>

Get all units of a specific type.

Constructor Details

#initialize ⇒ `DependencyGraph`

Returns a new instance of DependencyGraph.

# File 'lib/woods/dependency_graph.rb', line 24

def initialize
  @nodes = {}      # identifier => { type:, file_path: }
  @edges = {}      # identifier => [{ target:, via: }]
  @reverse = {}    # identifier => Set of dependent identifiers
  @reverse_via = {} # [target, via] => Set of dependent identifiers
  @file_map = {}   # file_path => identifier
  @type_index = {} # type => Set of identifiers
  @to_h = nil
end

Class Method Details

.from_h(data) ⇒ `DependencyGraph`

Load graph from persisted data

After JSON round-trip all keys become strings. This method normalizes them back to the expected types: node values use symbol keys (:type, :file_path, :namespace), and type_index uses symbol keys for types.

Parameters:

data (Hash) —

Previously serialized graph data

Returns:

(DependencyGraph) —

Restored graph

# File 'lib/woods/dependency_graph.rb', line 214

def self.from_h(data)
  graph = new

  raw_nodes = data[:nodes] || data['nodes'] || {}
  graph.instance_variable_set(:@nodes, raw_nodes.transform_values { |v| symbolize_node(v) })

  raw_edges = data[:edges] || data['edges'] || {}
  graph.instance_variable_set(:@edges, raw_edges.transform_values { |edges| normalize_edges(edges) })

  raw_reverse = data[:reverse] || data['reverse'] || {}
  graph.instance_variable_set(:@reverse, raw_reverse.transform_values { |v| v.is_a?(Set) ? v : Set.new(v) })

  graph.instance_variable_set(:@file_map, data[:file_map] || data['file_map'] || {})

  raw_type_index = data[:type_index] || data['type_index'] || {}
  graph.instance_variable_set(:@type_index, raw_type_index.transform_keys(&:to_sym).transform_values do |v|
    v.is_a?(Set) ? v : Set.new(v)
  end)

  # Rebuild reverse_via index from edges
  reverse_via = {}
  graph.instance_variable_get(:@edges).each do |source_id, edges|
    edges.each do |edge|
      (reverse_via[[edge[:target], edge[:via]]] ||= Set.new).add(source_id)
    end
  end
  graph.instance_variable_set(:@reverse_via, reverse_via)

  graph
end

.normalize_edges(edges) ⇒ `Array<Hash>`

Normalize edge data from either old format (bare strings) or new format (hashes).

ROUND-TRIP INVARIANT (do not break when refactoring):

DependencyGraph#to_h -> JSON.generate -> JSON.parse -> DependencyGraph.from_h

must always yield the same in-memory shape. The two normalizers that sit at either end of this round trip are INTENTIONALLY SEPARATE — do not merge them:

This method (normalize_edges) runs on Ruby objects. It produces ‘{ target:, via: }` with SYMBOL keys because consumers (#dependencies_of, GraphAnalyzer) key on symbols.
MCP::IndexReader.normalize_all_edges runs on parsed JSON, producing ‘{ ’target’ => …, ‘via’ => … }‘ with STRING keys, because the MCP tools serialize straight through to the client and symbol keys would become `:target` on the wire.

This method also accepts OLD-format bare-string edges so graphs serialized before the ‘via` migration still load without explicit data conversion.

Parameters:

edges (Array) —

Edge entries — either strings or hashes

Returns:

(Array<Hash>) —

Normalized edges with :target and :via keys

# File 'lib/woods/dependency_graph.rb', line 281

def self.normalize_edges(edges)
  return [] unless edges.is_a?(Array)

  edges.map do |edge|
    if edge.is_a?(String)
      { target: edge, via: nil }
    elsif edge.is_a?(Hash)
      { target: edge[:target] || edge['target'], via: (edge[:via] || edge['via'])&.to_sym }
    else
      { target: edge.to_s, via: nil }
    end
  end
end

.symbolize_node(node) ⇒ `Hash`

Normalize a node hash to use symbol keys

Parameters:

node (Hash) —

Node data with string or symbol keys

Returns:

(Hash) —

Node data with symbol keys

# File 'lib/woods/dependency_graph.rb', line 249

def self.symbolize_node(node)
  return node unless node.is_a?(Hash)

  {
    type: (node[:type] || node['type'])&.to_sym,
    file_path: node[:file_path] || node['file_path'],
    namespace: node[:namespace] || node['namespace']
  }
end

Instance Method Details

#affected_by(changed_files, max_depth: nil) ⇒ `Array<String>`

Find all units affected by changes to given files Uses BFS to find transitive dependents

Parameters:

changed_files (Array<String>) —

List of changed file paths
max_depth (Integer) (defaults to: nil) —

Maximum traversal depth (nil for unlimited)

Returns:

(Array<String>) —

List of affected unit identifiers

# File 'lib/woods/dependency_graph.rb', line 65

def affected_by(changed_files, max_depth: nil)
  directly_changed = changed_files.filter_map { |f| @file_map[f] }

  affected = Set.new(directly_changed)
  queue = directly_changed.map { |id| [id, 0] } # [identifier, depth]

  while queue.any?
    current, depth = queue.shift
    next if max_depth && depth >= max_depth

    dependents = @reverse[current] || []

    dependents.each do |dep|
      unless affected.include?(dep)
        affected.add(dep)
        queue.push([dep, depth + 1])
      end
    end
  end

  affected.to_a
end

#dependencies_of(identifier, via: nil) ⇒ `Array<String>`

Get direct dependencies of a unit

Parameters:

identifier (String) —

Unit identifier
via (Symbol, Array<Symbol>, nil) (defaults to: nil) —

Filter by relationship type(s)

Returns:

(Array<String>) —

List of dependency identifiers

# File 'lib/woods/dependency_graph.rb', line 114

def dependencies_of(identifier, via: nil)
  edges = @edges[identifier] || []
  if via
    via_set = Array(via)
    edges = edges.select { |e| via_set.include?(e[:via]) }
  end
  edges.map { |e| e[:target] }
end

#dependents_of(identifier, via: nil) ⇒ `Array<String>`

Get direct dependents of a unit (what depends on it)

Parameters:

identifier (String) —

Unit identifier
via (Symbol, Array<Symbol>, nil) (defaults to: nil) —

Filter by relationship type(s)

Returns:

(Array<String>) —

List of dependent identifiers

# File 'lib/woods/dependency_graph.rb', line 128

def dependents_of(identifier, via: nil)
  return @reverse.fetch(identifier, Set.new).to_a unless via

  Array(via).each_with_object(Set.new) do |v, result|
    @reverse_via.fetch([identifier, v], Set.new).each { |dep| result.add(dep) }
  end.to_a
end

#find_node_by_suffix(suffix) ⇒ `String`^?

Find a node by suffix matching (e.g., “Update” matches “Order::Update”).

When multiple nodes share the same suffix, the first match wins. Suffix matching requires a “::” separator — bare identifiers (no namespace) are not matched by this method; use #node_exists? for exact lookups.

Parameters:

suffix (String) —

The suffix to match against

Returns:

(String, nil) —

The first matching identifier, or nil

# File 'lib/woods/dependency_graph.rb', line 104

def find_node_by_suffix(suffix)
  target_suffix = "::#{suffix}"
  @nodes.keys.find { |id| id.end_with?(target_suffix) }
end

#node_exists?(identifier) ⇒ `Boolean`

Check if a node exists in the graph by exact identifier.

Parameters:

identifier (String) —

Unit identifier to check

Returns:

(Boolean) —

true if the node exists



92
93
94

# File 'lib/woods/dependency_graph.rb', line 92

def node_exists?(identifier)
  @nodes.key?(identifier)
end

#pagerank(damping: 0.85, iterations: 20) ⇒ `Hash<String, Float>`

Compute PageRank scores for all nodes

Uses the reverse edges (dependents) as the link structure: a node with many dependents gets a higher score. This matches Aider’s insight that structural importance correlates with retrieval relevance.

Parameters:

damping (Float) (defaults to: 0.85) —

Damping factor (default: 0.85)
iterations (Integer) (defaults to: 20) —

Number of iterations (default: 20)

Returns:

(Hash<String, Float>) —

Identifier => PageRank score

# File 'lib/woods/dependency_graph.rb', line 153

def pagerank(damping: 0.85, iterations: 20)
  n = @nodes.size
  return {} if n.zero?

  node_ids = @nodes.keys
  base_score = 1.0 / n
  scores = node_ids.to_h { |id| [id, base_score] }

  iterations.times do
    # Collect rank from dangling nodes (no outgoing edges) and redistribute
    dangling_sum = node_ids.sum do |id|
      @edges[id].nil? || @edges[id].empty? ? scores[id] : 0.0
    end

    new_scores = {}

    node_ids.each do |id|
      # Sum contributions from nodes that depend on this one
      incoming = @reverse[id] || []
      rank_sum = incoming.sum do |src|
        out_degree = (@edges[src] || []).size
        out_degree.positive? ? scores[src] / out_degree : 0.0
      end

      new_scores[id] = ((1.0 - damping) / n) + (damping * (rank_sum + (dangling_sum / n)))
    end

    scores = new_scores
  end

  scores
end

#register(unit) ⇒ `Object`

Parameters:

unit (ExtractedUnit) —

The unit to register

# File 'lib/woods/dependency_graph.rb', line 37

def register(unit)
  @to_h = nil

  @nodes[unit.identifier] = {
    type: unit.type,
    file_path: unit.file_path,
    namespace: unit.namespace
  }

  @edges[unit.identifier] = unit.dependencies.map { |d| { target: d[:target], via: d[:via] } }
  @file_map[unit.file_path] = unit.identifier if unit.file_path

  # Type index for filtering (Set-based for O(1) insert)
  (@type_index[unit.type] ||= Set.new).add(unit.identifier)

  # Build reverse edges (Set-based for O(1) insert)
  unit.dependencies.each do |dep|
    (@reverse[dep[:target]] ||= Set.new).add(unit.identifier)
    (@reverse_via[[dep[:target], dep[:via]]] ||= Set.new).add(unit.identifier)
  end
end

#to_h ⇒ `Hash`

Serialize graph for persistence. Memoized — cache is invalidated on register. Returns a dup so callers can’t pollute the cached hash.

Returns:

(Hash) —

Complete graph data

# File 'lib/woods/dependency_graph.rb', line 190

def to_h
  @to_h ||= {
    nodes: @nodes,
    edges: @edges,
    reverse: @reverse.transform_values(&:to_a),
    file_map: @file_map,
    type_index: @type_index.transform_values(&:to_a),
    stats: {
      node_count: @nodes.size,
      edge_count: @edges.values.sum(&:size),
      types: @type_index.transform_values(&:size)
    }
  }
  @to_h.dup
end

#units_of_type(type) ⇒ `Array<String>`

Get all units of a specific type

Parameters:

type (Symbol) —

Unit type (:model, :controller, etc.)

Returns:

(Array<String>) —

List of unit identifiers



140
141
142

# File 'lib/woods/dependency_graph.rb', line 140

def units_of_type(type)
  @type_index.fetch(type, Set.new).to_a
end

Class: Woods::DependencyGraph

Overview

Examples:

Building and querying the graph

Class Method Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize ⇒ DependencyGraph

Class Method Details

.from_h(data) ⇒ DependencyGraph

.normalize_edges(edges) ⇒ Array<Hash>

.symbolize_node(node) ⇒ Hash

Instance Method Details

#affected_by(changed_files, max_depth: nil) ⇒ Array<String>

#dependencies_of(identifier, via: nil) ⇒ Array<String>

#dependents_of(identifier, via: nil) ⇒ Array<String>

#find_node_by_suffix(suffix) ⇒ String?

#node_exists?(identifier) ⇒ Boolean

#pagerank(damping: 0.85, iterations: 20) ⇒ Hash<String, Float>

#register(unit) ⇒ Object

#to_h ⇒ Hash

#units_of_type(type) ⇒ Array<String>

#initialize ⇒ `DependencyGraph`

.from_h(data) ⇒ `DependencyGraph`

.normalize_edges(edges) ⇒ `Array<Hash>`

.symbolize_node(node) ⇒ `Hash`

#affected_by(changed_files, max_depth: nil) ⇒ `Array<String>`

#dependencies_of(identifier, via: nil) ⇒ `Array<String>`

#dependents_of(identifier, via: nil) ⇒ `Array<String>`

#find_node_by_suffix(suffix) ⇒ `String`^?

#node_exists?(identifier) ⇒ `Boolean`

#pagerank(damping: 0.85, iterations: 20) ⇒ `Hash<String, Float>`

#register(unit) ⇒ `Object`

#to_h ⇒ `Hash`

#units_of_type(type) ⇒ `Array<String>`