Class: Woods::GraphAnalyzer
- Inherits:
-
Object
- Object
- Woods::GraphAnalyzer
- Defined in:
- lib/woods/graph_analyzer.rb
Overview
GraphAnalyzer computes structural properties of the dependency graph.
Given a DependencyGraph, it identifies architectural patterns like orphaned units, circular dependencies, hub nodes, and bridge nodes. These metrics help surface dead code, architectural bottlenecks, and high-risk change targets.
Inspired by FlowMapper’s Comparator pattern — takes a graph, produces a structural report without mutating anything.
Constant Summary collapse
- EXCLUDED_ORPHAN_TYPES =
Types that are naturally root nodes and should not be flagged as orphans. Framework and gem sources are consumed but never referenced by application code in the dependency graph’s reverse index.
%i[rails_source gem_source].freeze
Instance Method Summary collapse
-
#analyze ⇒ Hash
Full analysis report combining all structural metrics.
-
#bridges(limit: 20, sample_size: 200) ⇒ Array<Hash>
Units that bridge different types in the graph.
-
#cycles ⇒ Array<Array<String>>
Detect circular dependency chains in the graph.
-
#dead_ends ⇒ Array<String>
Units with no dependencies (leaf nodes).
-
#domain_clusters(min_size: 3, types: nil) ⇒ Array<Hash>
Group units into semantic domains using namespace prefixes and graph connectivity.
-
#hubs(limit: 20) ⇒ Array<Hash>
Units with the highest number of dependents (architectural hotspots).
-
#initialize(dependency_graph) ⇒ GraphAnalyzer
constructor
A new instance of GraphAnalyzer.
-
#orphans ⇒ Array<String>
Units with no dependents (nothing references them).
Constructor Details
#initialize(dependency_graph) ⇒ GraphAnalyzer
Returns a new instance of GraphAnalyzer.
30 31 32 |
# File 'lib/woods/graph_analyzer.rb', line 30 def initialize(dependency_graph) @graph = dependency_graph end |
Instance Method Details
#analyze ⇒ Hash
Full analysis report combining all structural metrics.
207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 |
# File 'lib/woods/graph_analyzer.rb', line 207 def analyze computed_orphans = orphans computed_dead_ends = dead_ends computed_hubs = hubs computed_cycles = cycles computed_bridges = bridges(limit: 10) { orphans: computed_orphans, dead_ends: computed_dead_ends, hubs: computed_hubs, cycles: computed_cycles, bridges: computed_bridges, stats: { orphan_count: computed_orphans.size, dead_end_count: computed_dead_ends.size, hub_count: computed_hubs.size, cycle_count: computed_cycles.size } } end |
#bridges(limit: 20, sample_size: 200) ⇒ Array<Hash>
Units that bridge different types in the graph.
Computes a simplified betweenness centrality metric — for each unit, we estimate how many shortest paths between sampled node pairs pass through it. High-scoring nodes are architectural bottlenecks whose failure or change would disrupt many cross-type communication paths.
For performance, samples a subset of node pairs rather than computing all-pairs shortest paths.
122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 |
# File 'lib/woods/graph_analyzer.rb', line 122 def bridges(limit: 20, sample_size: 200) nodes = graph_nodes return [] if nodes.size < 3 node_ids = nodes.keys scores = Hash.new(0) # Sample random pairs of nodes for shortest-path computation. # Use a deterministic seed so results are reproducible for the same graph. rng = Random.new(node_ids.size) pairs = generate_sample_pairs(node_ids, sample_size, rng) pairs.each do |source, target| path = bfs_shortest_path(source, target) next unless path && path.size > 2 # Credit intermediate nodes (exclude source and target) path[1..-2].each do |intermediate| scores[intermediate] += 1 end end scores .sort_by { |_id, score| -score } .first(limit) .map do |identifier, score| = nodes[identifier] || {} { identifier: identifier, type: [:type], score: score } end end |
#cycles ⇒ Array<Array<String>>
Detect circular dependency chains in the graph.
Uses iterative DFS with a three-color marking scheme (white/gray/black). When a gray (in-progress) node is revisited, a cycle has been found. The cycle path is extracted from the recursion stack.
104 105 106 |
# File 'lib/woods/graph_analyzer.rb', line 104 def cycles @cycles ||= detect_cycles end |
#dead_ends ⇒ Array<String>
Units with no dependencies (leaf nodes).
These are self-contained units that don’t reference anything else —typically utility classes, value objects, or standalone services.
62 63 64 65 66 67 68 69 70 |
# File 'lib/woods/graph_analyzer.rb', line 62 def dead_ends @dead_ends ||= begin nodes = graph_nodes nodes.each_with_object([]) do |(identifier, ), result| dependencies = @graph.dependencies_of(identifier) result << identifier if dependencies.empty? end end end |
#domain_clusters(min_size: 3, types: nil) ⇒ Array<Hash>
Group units into semantic domains using namespace prefixes and graph connectivity.
Strategy:
-
Seed clusters from top-level namespace prefixes (e.g., ShippingProfile::*, Order::*)
-
Assign unnamespaced units to their most-connected cluster
-
Merge small clusters (< min_size) into their most-connected neighbor
-
For each cluster, identify the hub (highest PageRank) and entry points
-
Compute boundary edges between clusters
170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 |
# File 'lib/woods/graph_analyzer.rb', line 170 def domain_clusters(min_size: 3, types: nil) nodes = graph_nodes return [] if nodes.empty? # Filter by types if specified filtered_ids = if types type_set = types.map(&:to_s) nodes.select { |_, | type_set.include?([:type].to_s) }.keys else nodes.keys end return [] if filtered_ids.empty? # Step 1: Seed clusters from namespace prefixes clusters = seed_namespace_clusters(filtered_ids, nodes) # Step 2: Assign unnamespaced/root units to most-connected cluster assign_orphaned_units(clusters, filtered_ids, nodes) # Step 3: Merge small clusters merge_small_clusters(clusters, min_size) # Step 4: Enrich each cluster with hub, entry points, boundary edges pagerank_scores = @graph.pagerank enrich_clusters(clusters, nodes, pagerank_scores) # Sort by member count descending clusters.values .select { |c| c[:members].any? } .sort_by { |c| -c[:member_count] } end |
#hubs(limit: 20) ⇒ Array<Hash>
Units with the highest number of dependents (architectural hotspots).
A high dependent count means many other units reference this one. Changes to hub nodes have the widest blast radius.
80 81 82 83 84 85 86 87 88 89 90 91 92 93 |
# File 'lib/woods/graph_analyzer.rb', line 80 def hubs(limit: 20) nodes = graph_nodes identifiers_with_dependents = nodes.map do |identifier, | dependents = @graph.dependents_of(identifier) { identifier: identifier, type: [:type], dependent_count: dependents.size, dependents: dependents } end identifiers_with_dependents.max_by(limit) { |h| h[:dependent_count] } end |
#orphans ⇒ Array<String>
Units with no dependents (nothing references them).
These are potential dead code or entry points. Framework and gem sources are excluded since they are naturally unreferenced in the reverse index.
44 45 46 47 48 49 50 51 52 53 54 |
# File 'lib/woods/graph_analyzer.rb', line 44 def orphans @orphans ||= begin nodes = graph_nodes nodes.each_with_object([]) do |(identifier, ), result| next if EXCLUDED_ORPHAN_TYPES.include?([:type]) dependents = @graph.dependents_of(identifier) result << identifier if dependents.empty? end end end |