Class: Woods::GraphAnalyzer

Inherits:
Object
  • Object
show all
Defined in:
lib/woods/graph_analyzer.rb

Overview

GraphAnalyzer computes structural properties of the dependency graph.

Given a DependencyGraph, it identifies architectural patterns like orphaned units, circular dependencies, hub nodes, and bridge nodes. These metrics help surface dead code, architectural bottlenecks, and high-risk change targets.

Inspired by FlowMapper’s Comparator pattern — takes a graph, produces a structural report without mutating anything.

Examples:

Basic usage

graph = Woods::DependencyGraph.new
# ... register units ...
analyzer = Woods::GraphAnalyzer.new(graph)
report = analyzer.analyze
report[:cycles]  # => [["A", "B", "A"], ...]
report[:hubs]    # => [{ identifier: "User", type: :model, ... }, ...]

Constant Summary collapse

EXCLUDED_ORPHAN_TYPES =

Types that are naturally root nodes and should not be flagged as orphans. Framework and gem sources are consumed but never referenced by application code in the dependency graph’s reverse index.

%i[rails_source gem_source].freeze

Instance Method Summary collapse

Constructor Details

#initialize(dependency_graph) ⇒ GraphAnalyzer

Returns a new instance of GraphAnalyzer.

Parameters:



30
31
32
# File 'lib/woods/graph_analyzer.rb', line 30

def initialize(dependency_graph)
  @graph = dependency_graph
end

Instance Method Details

#analyzeHash

Full analysis report combining all structural metrics.

Returns:

  • (Hash)

    Complete analysis with :orphans, :dead_ends, :hubs, :cycles, :bridges, and :stats



207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
# File 'lib/woods/graph_analyzer.rb', line 207

def analyze
  computed_orphans = orphans
  computed_dead_ends = dead_ends
  computed_hubs = hubs
  computed_cycles = cycles
  computed_bridges = bridges(limit: 10)

  {
    orphans: computed_orphans,
    dead_ends: computed_dead_ends,
    hubs: computed_hubs,
    cycles: computed_cycles,
    bridges: computed_bridges,
    stats: {
      orphan_count: computed_orphans.size,
      dead_end_count: computed_dead_ends.size,
      hub_count: computed_hubs.size,
      cycle_count: computed_cycles.size
    }
  }
end

#bridges(limit: 20, sample_size: 200) ⇒ Array<Hash>

Units that bridge different types in the graph.

Computes a simplified betweenness centrality metric — for each unit, we estimate how many shortest paths between sampled node pairs pass through it. High-scoring nodes are architectural bottlenecks whose failure or change would disrupt many cross-type communication paths.

For performance, samples a subset of node pairs rather than computing all-pairs shortest paths.

Parameters:

  • limit (Integer) (defaults to: 20)

    Maximum number of bridges to return

  • sample_size (Integer) (defaults to: 200)

    Number of node pairs to sample for estimation

Returns:

  • (Array<Hash>)

    Sorted by score descending. Each hash contains :identifier, :type, :score



122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
# File 'lib/woods/graph_analyzer.rb', line 122

def bridges(limit: 20, sample_size: 200)
  nodes = graph_nodes
  return [] if nodes.size < 3

  node_ids = nodes.keys
  scores = Hash.new(0)

  # Sample random pairs of nodes for shortest-path computation.
  # Use a deterministic seed so results are reproducible for the same graph.
  rng = Random.new(node_ids.size)
  pairs = generate_sample_pairs(node_ids, sample_size, rng)

  pairs.each do |source, target|
    path = bfs_shortest_path(source, target)
    next unless path && path.size > 2

    # Credit intermediate nodes (exclude source and target)
    path[1..-2].each do |intermediate|
      scores[intermediate] += 1
    end
  end

  scores
    .sort_by { |_id, score| -score }
    .first(limit)
    .map do |identifier, score|
      meta = nodes[identifier] || {}
      {
        identifier: identifier,
        type: meta[:type],
        score: score
      }
    end
end

#cyclesArray<Array<String>>

Detect circular dependency chains in the graph.

Uses iterative DFS with a three-color marking scheme (white/gray/black). When a gray (in-progress) node is revisited, a cycle has been found. The cycle path is extracted from the recursion stack.

Returns:

  • (Array<Array<String>>)

    Each element is a cycle represented as an ordered array of identifiers, ending with the repeated node. For example: [“A”, “B”, “C”, “A”]



104
105
106
# File 'lib/woods/graph_analyzer.rb', line 104

def cycles
  @cycles ||= detect_cycles
end

#dead_endsArray<String>

Units with no dependencies (leaf nodes).

These are self-contained units that don’t reference anything else —typically utility classes, value objects, or standalone services.

Returns:

  • (Array<String>)

    Identifiers of dead-end units



62
63
64
65
66
67
68
69
70
# File 'lib/woods/graph_analyzer.rb', line 62

def dead_ends
  @dead_ends ||= begin
    nodes = graph_nodes
    nodes.each_with_object([]) do |(identifier, _meta), result|
      dependencies = @graph.dependencies_of(identifier)
      result << identifier if dependencies.empty?
    end
  end
end

#domain_clusters(min_size: 3, types: nil) ⇒ Array<Hash>

Group units into semantic domains using namespace prefixes and graph connectivity.

Strategy:

  1. Seed clusters from top-level namespace prefixes (e.g., ShippingProfile::*, Order::*)

  2. Assign unnamespaced units to their most-connected cluster

  3. Merge small clusters (< min_size) into their most-connected neighbor

  4. For each cluster, identify the hub (highest PageRank) and entry points

  5. Compute boundary edges between clusters

Parameters:

  • min_size (Integer) (defaults to: 3)

    Minimum units per cluster before merging (default: 3)

  • types (Array<String>, nil) (defaults to: nil)

    Filter to these unit types (default: all)

Returns:

  • (Array<Hash>)

    Clusters sorted by member count descending. Each hash: { name:, hub:, members:, member_count:, entry_points:, boundary_edges:, types: }



170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
# File 'lib/woods/graph_analyzer.rb', line 170

def domain_clusters(min_size: 3, types: nil)
  nodes = graph_nodes
  return [] if nodes.empty?

  # Filter by types if specified
  filtered_ids = if types
                   type_set = types.map(&:to_s)
                   nodes.select { |_, meta| type_set.include?(meta[:type].to_s) }.keys
                 else
                   nodes.keys
                 end

  return [] if filtered_ids.empty?

  # Step 1: Seed clusters from namespace prefixes
  clusters = seed_namespace_clusters(filtered_ids, nodes)

  # Step 2: Assign unnamespaced/root units to most-connected cluster
  assign_orphaned_units(clusters, filtered_ids, nodes)

  # Step 3: Merge small clusters
  merge_small_clusters(clusters, min_size)

  # Step 4: Enrich each cluster with hub, entry points, boundary edges
  pagerank_scores = @graph.pagerank
  enrich_clusters(clusters, nodes, pagerank_scores)

  # Sort by member count descending
  clusters.values
          .select { |c| c[:members].any? }
          .sort_by { |c| -c[:member_count] }
end

#hubs(limit: 20) ⇒ Array<Hash>

Units with the highest number of dependents (architectural hotspots).

A high dependent count means many other units reference this one. Changes to hub nodes have the widest blast radius.

Parameters:

  • limit (Integer) (defaults to: 20)

    Maximum number of hubs to return

Returns:

  • (Array<Hash>)

    Sorted by dependent_count descending. Each hash contains :identifier, :type, :dependent_count, :dependents



80
81
82
83
84
85
86
87
88
89
90
91
92
93
# File 'lib/woods/graph_analyzer.rb', line 80

def hubs(limit: 20)
  nodes = graph_nodes

  identifiers_with_dependents = nodes.map do |identifier, meta|
    dependents = @graph.dependents_of(identifier)
    {
      identifier: identifier,
      type: meta[:type],
      dependent_count: dependents.size,
      dependents: dependents
    }
  end
  identifiers_with_dependents.max_by(limit) { |h| h[:dependent_count] }
end

#orphansArray<String>

Units with no dependents (nothing references them).

These are potential dead code or entry points. Framework and gem sources are excluded since they are naturally unreferenced in the reverse index.

Returns:

  • (Array<String>)

    Identifiers of orphaned units



44
45
46
47
48
49
50
51
52
53
54
# File 'lib/woods/graph_analyzer.rb', line 44

def orphans
  @orphans ||= begin
    nodes = graph_nodes
    nodes.each_with_object([]) do |(identifier, meta), result|
      next if EXCLUDED_ORPHAN_TYPES.include?(meta[:type])

      dependents = @graph.dependents_of(identifier)
      result << identifier if dependents.empty?
    end
  end
end