Class: Woods::GraphAnalyzer

Inherits:

Object

Object
Woods::GraphAnalyzer

show all

Defined in:: lib/woods/graph_analyzer.rb

Overview

GraphAnalyzer computes structural properties of the dependency graph.

Given a DependencyGraph, it identifies architectural patterns like orphaned units, circular dependencies, hub nodes, and bridge nodes. These metrics help surface dead code, architectural bottlenecks, and high-risk change targets.

Inspired by FlowMapper’s Comparator pattern — takes a graph, produces a structural report without mutating anything.

Examples:

Basic usage

graph = Woods::DependencyGraph.new
# ... register units ...
analyzer = Woods::GraphAnalyzer.new(graph)
report = analyzer.analyze
report[:cycles]  # => [["A", "B", "A"], ...]
report[:hubs]    # => [{ identifier: "User", type: :model, ... }, ...]

Constant Summary collapse

EXCLUDED_ORPHAN_TYPES = Types that are naturally root nodes and should not be flagged as orphans. Framework and gem sources are consumed but never referenced by application code in the dependency graph’s reverse index.

%i[rails_source gem_source].freeze

Instance Method Summary collapse

#analyze ⇒ Hash

Full analysis report combining all structural metrics.
#bridges(limit: 20, sample_size: 200) ⇒ Array<Hash>

Units that bridge different types in the graph.
#cycles ⇒ Array<Array<String>>

Detect circular dependency chains in the graph.
#dead_ends ⇒ Array<String>

Units with no dependencies (leaf nodes).
#domain_clusters(min_size: 3, types: nil) ⇒ Array<Hash>

Group units into semantic domains using namespace prefixes and graph connectivity.
#hubs(limit: 20) ⇒ Array<Hash>

Units with the highest number of dependents (architectural hotspots).
#initialize(dependency_graph) ⇒ GraphAnalyzer constructor

A new instance of GraphAnalyzer.
#orphans ⇒ Array<String>

Units with no dependents (nothing references them).

Constructor Details

#initialize(dependency_graph) ⇒ `GraphAnalyzer`

Returns a new instance of GraphAnalyzer.

Parameters:

dependency_graph (DependencyGraph) —

The graph to analyze



30
31
32

# File 'lib/woods/graph_analyzer.rb', line 30

def initialize(dependency_graph)
  @graph = dependency_graph
end

Instance Method Details

#analyze ⇒ `Hash`

Full analysis report combining all structural metrics.

Returns:

(Hash) —

Complete analysis with :orphans, :dead_ends, :hubs, :cycles, :bridges, and :stats

# File 'lib/woods/graph_analyzer.rb', line 207

def analyze
  computed_orphans = orphans
  computed_dead_ends = dead_ends
  computed_hubs = hubs
  computed_cycles = cycles
  computed_bridges = bridges(limit: 10)

  {
    orphans: computed_orphans,
    dead_ends: computed_dead_ends,
    hubs: computed_hubs,
    cycles: computed_cycles,
    bridges: computed_bridges,
    stats: {
      orphan_count: computed_orphans.size,
      dead_end_count: computed_dead_ends.size,
      hub_count: computed_hubs.size,
      cycle_count: computed_cycles.size
    }
  }
end

#bridges(limit: 20, sample_size: 200) ⇒ `Array<Hash>`

Units that bridge different types in the graph.

Computes a simplified betweenness centrality metric — for each unit, we estimate how many shortest paths between sampled node pairs pass through it. High-scoring nodes are architectural bottlenecks whose failure or change would disrupt many cross-type communication paths.

For performance, samples a subset of node pairs rather than computing all-pairs shortest paths.

Parameters:

limit (Integer) (defaults to: 20) —

Maximum number of bridges to return
sample_size (Integer) (defaults to: 200) —

Number of node pairs to sample for estimation

Returns:

(Array<Hash>) —

Sorted by score descending. Each hash contains :identifier, :type, :score

# File 'lib/woods/graph_analyzer.rb', line 122

def bridges(limit: 20, sample_size: 200)
  nodes = graph_nodes
  return [] if nodes.size < 3

  node_ids = nodes.keys
  scores = Hash.new(0)

  # Sample random pairs of nodes for shortest-path computation.
  # Use a deterministic seed so results are reproducible for the same graph.
  rng = Random.new(node_ids.size)
  pairs = generate_sample_pairs(node_ids, sample_size, rng)

  pairs.each do |source, target|
    path = bfs_shortest_path(source, target)
    next unless path && path.size > 2

    # Credit intermediate nodes (exclude source and target)
    path[1..-2].each do |intermediate|
      scores[intermediate] += 1
    end
  end

  scores
    .sort_by { |_id, score| -score }
    .first(limit)
    .map do |identifier, score|
      meta = nodes[identifier] || {}
      {
        identifier: identifier,
        type: meta[:type],
        score: score
      }
    end
end

#cycles ⇒ `Array<Array<String>>`

Detect circular dependency chains in the graph.

Uses iterative DFS with a three-color marking scheme (white/gray/black). When a gray (in-progress) node is revisited, a cycle has been found. The cycle path is extracted from the recursion stack.

Returns:

(Array<Array<String>>) —

Each element is a cycle represented as an ordered array of identifiers, ending with the repeated node. For example: [“A”, “B”, “C”, “A”]



104
105
106

# File 'lib/woods/graph_analyzer.rb', line 104

def cycles
  @cycles ||= detect_cycles
end

#dead_ends ⇒ `Array<String>`

Units with no dependencies (leaf nodes).

These are self-contained units that don’t reference anything else —typically utility classes, value objects, or standalone services.

Returns:

(Array<String>) —

Identifiers of dead-end units

# File 'lib/woods/graph_analyzer.rb', line 62

def dead_ends
  @dead_ends ||= begin
    nodes = graph_nodes
    nodes.each_with_object([]) do |(identifier, _meta), result|
      dependencies = @graph.dependencies_of(identifier)
      result << identifier if dependencies.empty?
    end
  end
end

#domain_clusters(min_size: 3, types: nil) ⇒ `Array<Hash>`

Group units into semantic domains using namespace prefixes and graph connectivity.

Strategy:

Seed clusters from top-level namespace prefixes (e.g., ShippingProfile::*, Order::*)
Assign unnamespaced units to their most-connected cluster
Merge small clusters (< min_size) into their most-connected neighbor
For each cluster, identify the hub (highest PageRank) and entry points
Compute boundary edges between clusters

Parameters:

min_size (Integer) (defaults to: 3) —

Minimum units per cluster before merging (default: 3)
types (Array<String>, nil) (defaults to: nil) —

Filter to these unit types (default: all)

Returns:

(Array<Hash>) —

Clusters sorted by member count descending. Each hash: { name:, hub:, members:, member_count:, entry_points:, boundary_edges:, types: }

# File 'lib/woods/graph_analyzer.rb', line 170

def domain_clusters(min_size: 3, types: nil)
  nodes = graph_nodes
  return [] if nodes.empty?

  # Filter by types if specified
  filtered_ids = if types
                   type_set = types.map(&:to_s)
                   nodes.select { |_, meta| type_set.include?(meta[:type].to_s) }.keys
                 else
                   nodes.keys
                 end

  return [] if filtered_ids.empty?

  # Step 1: Seed clusters from namespace prefixes
  clusters = seed_namespace_clusters(filtered_ids, nodes)

  # Step 2: Assign unnamespaced/root units to most-connected cluster
  assign_orphaned_units(clusters, filtered_ids, nodes)

  # Step 3: Merge small clusters
  merge_small_clusters(clusters, min_size)

  # Step 4: Enrich each cluster with hub, entry points, boundary edges
  pagerank_scores = @graph.pagerank
  enrich_clusters(clusters, nodes, pagerank_scores)

  # Sort by member count descending
  clusters.values
          .select { |c| c[:members].any? }
          .sort_by { |c| -c[:member_count] }
end

#hubs(limit: 20) ⇒ `Array<Hash>`

Units with the highest number of dependents (architectural hotspots).

A high dependent count means many other units reference this one. Changes to hub nodes have the widest blast radius.

Parameters:

limit (Integer) (defaults to: 20) —

Maximum number of hubs to return

Returns:

(Array<Hash>) —

Sorted by dependent_count descending. Each hash contains :identifier, :type, :dependent_count, :dependents

# File 'lib/woods/graph_analyzer.rb', line 80

def hubs(limit: 20)
  nodes = graph_nodes

  identifiers_with_dependents = nodes.map do |identifier, meta|
    dependents = @graph.dependents_of(identifier)
    {
      identifier: identifier,
      type: meta[:type],
      dependent_count: dependents.size,
      dependents: dependents
    }
  end
  identifiers_with_dependents.max_by(limit) { |h| h[:dependent_count] }
end

#orphans ⇒ `Array<String>`

Units with no dependents (nothing references them).

These are potential dead code or entry points. Framework and gem sources are excluded since they are naturally unreferenced in the reverse index.

Returns:

(Array<String>) —

Identifiers of orphaned units

# File 'lib/woods/graph_analyzer.rb', line 44

def orphans
  @orphans ||= begin
    nodes = graph_nodes
    nodes.each_with_object([]) do |(identifier, meta), result|
      next if EXCLUDED_ORPHAN_TYPES.include?(meta[:type])

      dependents = @graph.dependents_of(identifier)
      result << identifier if dependents.empty?
    end
  end
end

Class: Woods::GraphAnalyzer

Overview

Examples:

Basic usage

Constant Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(dependency_graph) ⇒ GraphAnalyzer

Instance Method Details

#analyze ⇒ Hash

#bridges(limit: 20, sample_size: 200) ⇒ Array<Hash>

#cycles ⇒ Array<Array<String>>

#dead_ends ⇒ Array<String>

#domain_clusters(min_size: 3, types: nil) ⇒ Array<Hash>

#hubs(limit: 20) ⇒ Array<Hash>

#orphans ⇒ Array<String>

#initialize(dependency_graph) ⇒ `GraphAnalyzer`

#analyze ⇒ `Hash`

#bridges(limit: 20, sample_size: 200) ⇒ `Array<Hash>`

#cycles ⇒ `Array<Array<String>>`

#dead_ends ⇒ `Array<String>`

#domain_clusters(min_size: 3, types: nil) ⇒ `Array<Hash>`

#hubs(limit: 20) ⇒ `Array<Hash>`

#orphans ⇒ `Array<String>`