Class: Woods::Extractor

Inherits:
Object
Includes:
FilenameUtils
Defined in:
lib/woods/extractor.rb

Overview

Extractor is the main orchestrator for codebase extraction.

It coordinates the individual extractors, builds the dependency graph, enriches the extracted units with git data, and writes structured JSON for the indexing pipeline.

Examples:

Full extraction

extractor = Extractor.new(output_dir: "tmp/woods")
results = extractor.extract_all

Incremental extraction (for CI)

extractor = Extractor.new
extractor.extract_changed(["app/models/user.rb", "app/services/checkout.rb"])

Constant Summary

EXTRACTION_DIRECTORIES =

Directories under app/ that contain classes we need to extract. Used by eager_load_extraction_directories as a fallback when Rails.application.eager_load! fails (e.g., NameError from graphql/).

%w[
  models
  controllers
  services
  jobs
  mailers
  components
  interactors
  operations
  commands
  use_cases
  serializers
  decorators
  blueprinters
  managers
  policies
  validators
  channels
  presenters
  form_objects
].freeze
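
The fallback described for EXTRACTION_DIRECTORIES can be pictured as a walk over each listed directory. The following is a minimal sketch under stated assumptions, not the library's implementation: `extraction_files` is a hypothetical helper, `SKETCH_DIRECTORIES` is a two-entry stand-in for the real constant, and the sketch only collects the Ruby files such a fallback would visit, skipping directories that do not exist.

```ruby
require 'pathname'
require 'tmpdir'
require 'fileutils'

# Stand-in for a subset of EXTRACTION_DIRECTORIES (assumption for illustration).
SKETCH_DIRECTORIES = %w[models services].freeze

# Hypothetical helper: list the Ruby files under app/<dir> for each
# directory, ignoring directories that are absent from the app.
def extraction_files(app_root, directories)
  directories.flat_map do |dir|
    path = Pathname.new(app_root).join('app', dir)
    next [] unless path.directory?

    Dir.glob(path.join('**', '*.rb').to_s).sort
  end
end

# Build a throwaway app tree to exercise the helper.
DEMO_FILES = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'app/models'))
  FileUtils.mkdir_p(File.join(root, 'app/graphql'))
  FileUtils.touch(File.join(root, 'app/models/user.rb'))
  FileUtils.touch(File.join(root, 'app/graphql/schema.rb'))

  extraction_files(root, SKETCH_DIRECTORIES).map { |f| File.basename(f) }
end
```

Because `graphql` is not in the list, files under `app/graphql` are never touched, which is the point of restricting the fallback to known directories.
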
EXTRACTORS =
{
  models: Extractors::ModelExtractor,
  controllers: Extractors::ControllerExtractor,
  graphql: Extractors::GraphQLExtractor,
  components: Extractors::PhlexExtractor,
  view_components: Extractors::ViewComponentExtractor,
  services: Extractors::ServiceExtractor,
  jobs: Extractors::JobExtractor,
  mailers: Extractors::MailerExtractor,
  serializers: Extractors::SerializerExtractor,
  managers: Extractors::ManagerExtractor,
  policies: Extractors::PolicyExtractor,
  validators: Extractors::ValidatorExtractor,
  concerns: Extractors::ConcernExtractor,
  routes: Extractors::RouteExtractor,
  middleware: Extractors::MiddlewareExtractor,
  i18n: Extractors::I18nExtractor,
  pundit_policies: Extractors::PunditExtractor,
  configurations: Extractors::ConfigurationExtractor,
  engines: Extractors::EngineExtractor,
  view_templates: Extractors::ViewTemplateExtractor,
  migrations: Extractors::MigrationExtractor,
  action_cable_channels: Extractors::ActionCableExtractor,
  scheduled_jobs: Extractors::ScheduledJobExtractor,
  rake_tasks: Extractors::RakeTaskExtractor,
  state_machines: Extractors::StateMachineExtractor,
  events: Extractors::EventExtractor,
  decorators: Extractors::DecoratorExtractor,
  database_views: Extractors::DatabaseViewExtractor,
  caching: Extractors::CachingExtractor,
  factories: Extractors::FactoryExtractor,
  test_mappings: Extractors::TestMappingExtractor,
  rails_source: Extractors::RailsSourceExtractor,
  poros: Extractors::PoroExtractor,
  libs: Extractors::LibExtractor
}.freeze
TYPE_TO_EXTRACTOR_KEY =

Maps singular unit types (as stored in ExtractedUnit/graph nodes) to the plural keys used in the EXTRACTORS constant.

Returns:

  • (Hash{Symbol => Symbol})
{
  model: :models,
  controller: :controllers,
  service: :services,
  component: :components,
  view_component: :view_components,
  job: :jobs,
  mailer: :mailers,
  graphql_type: :graphql,
  graphql_mutation: :graphql,
  graphql_resolver: :graphql,
  graphql_query: :graphql,
  serializer: :serializers,
  manager: :managers,
  policy: :policies,
  validator: :validators,
  concern: :concerns,
  route: :routes,
  middleware: :middleware,
  i18n: :i18n,
  pundit_policy: :pundit_policies,
  configuration: :configurations,
  engine: :engines,
  view_template: :view_templates,
  migration: :migrations,
  action_cable_channel: :action_cable_channels,
  scheduled_job: :scheduled_jobs,
  rake_task: :rake_tasks,
  state_machine: :state_machines,
  event: :events,
  decorator: :decorators,
  database_view: :database_views,
  caching: :caching,
  factory: :factories,
  test_mapping: :test_mappings,
  rails_source: :rails_source,
  poro: :poros,
  lib: :libs
}.freeze
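
To illustrate how the singular-to-plural mapping might be used, here is a small sketch that resolves a unit type to an extractor class in two hash lookups. The `SketchExtractors` module, the trimmed-down hashes, and `extractor_class_for` are hypothetical stand-ins; the real constants are the ones listed above.

```ruby
# Stand-in extractor classes (hypothetical; the real ones live under Extractors::).
module SketchExtractors
  class ModelExtractor; end
  class GraphQLExtractor; end
end

# Trimmed-down copies of EXTRACTORS and TYPE_TO_EXTRACTOR_KEY.
SKETCH_EXTRACTORS = {
  models: SketchExtractors::ModelExtractor,
  graphql: SketchExtractors::GraphQLExtractor
}.freeze

SKETCH_TYPE_TO_KEY = {
  model: :models,
  graphql_type: :graphql,
  graphql_mutation: :graphql
}.freeze

# Resolve a singular unit type to its extractor class, or nil if unmapped.
def extractor_class_for(unit_type)
  key = SKETCH_TYPE_TO_KEY[unit_type.to_sym]
  key && SKETCH_EXTRACTORS[key]
end
```

Note that several singular types (the four `graphql_*` types) collapse onto one plural key, so the mapping is many-to-one and cannot be inverted.
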
CLASS_BASED =

Maps unit types to class-based extractor methods (constantize + call).

{
  model: :extract_model, controller: :extract_controller,
  component: :extract_component, view_component: :extract_component,
  mailer: :extract_mailer, action_cable_channel: :extract_channel
}.freeze
FILE_BASED =

Maps unit types to file-based extractor methods (pass file_path).

{
  service: :extract_service_file, job: :extract_job_file,
  serializer: :extract_serializer_file, manager: :extract_manager_file,
  policy: :extract_policy_file, validator: :extract_validator_file,
  concern: :extract_concern_file,
  i18n: :extract_i18n_file,
  pundit_policy: :extract_pundit_file,
  configuration: :extract_configuration_file,
  view_template: :extract_view_template_file,
  migration: :extract_migration_file,
  rake_task: :extract_rake_file,
  decorator: :extract_decorator_file,
  database_view: :extract_view_file,
  caching: :extract_caching_file,
  test_mapping: :extract_test_file,
  poro: :extract_poro_file,
  lib: :extract_lib_file
}.freeze
GRAPHQL_TYPES =

GraphQL types all use the same extractor method.

%i[graphql_type graphql_mutation graphql_resolver graphql_query].freeze
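
The three constants above suggest a dispatch on unit type: class-based types are constantized, file-based types receive a file path, and GraphQL types share one method. The sketch below shows one way such a dispatch could look. `dispatch_for` and the trimmed-down hashes are hypothetical, and the name of the shared GraphQL method is not shown on this page, so the sketch returns `nil` in its place.

```ruby
SKETCH_CLASS_BASED = { model: :extract_model, mailer: :extract_mailer }.freeze
SKETCH_FILE_BASED = { service: :extract_service_file, job: :extract_job_file }.freeze
SKETCH_GRAPHQL_TYPES = %i[graphql_type graphql_mutation graphql_resolver graphql_query].freeze

# Returns [strategy, method_name] for a unit type, or nil when the type
# appears in none of the three constants.
def dispatch_for(unit_type)
  if SKETCH_CLASS_BASED.key?(unit_type)
    [:class_based, SKETCH_CLASS_BASED[unit_type]]
  elsif SKETCH_FILE_BASED.key?(unit_type)
    [:file_based, SKETCH_FILE_BASED[unit_type]]
  elsif SKETCH_GRAPHQL_TYPES.include?(unit_type)
    # The shared GraphQL method name is not documented here, so leave it nil.
    [:graphql, nil]
  end
end
```

Types such as `:route` or `:engine` appear in neither CLASS_BASED nor FILE_BASED; in this sketch they fall through to `nil`, and how the real class handles them is not shown on this page.
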

Instance Attribute Summary

Instance Method Summary

Methods included from FilenameUtils

#collision_safe_filename, #safe_filename

Constructor Details

#initialize(output_dir: nil) ⇒ Extractor

Returns a new instance of Extractor.



# File 'lib/woods/extractor.rb', line 206

def initialize(output_dir: nil)
  @output_dir = Pathname.new(output_dir || Rails.root.join('tmp/woods'))
  @dependency_graph = DependencyGraph.new
  @results = {}
  @extractors = {}
end

Instance Attribute Details

#dependency_graph ⇒ Object (readonly)

Returns the value of attribute dependency_graph.



# File 'lib/woods/extractor.rb', line 204

def dependency_graph
  @dependency_graph
end

#output_dir ⇒ Object (readonly)

Returns the value of attribute output_dir.



# File 'lib/woods/extractor.rb', line 204

def output_dir
  @output_dir
end

Instance Method Details

#extract_all ⇒ Hash

Performs a full extraction of the codebase.

Returns:

  • (Hash)

    Results keyed by extractor type



# File 'lib/woods/extractor.rb', line 220

def extract_all
  setup_output_directory
  ModelNameCache.reset!

  # Eager load once — all extractors need loaded classes for introspection.
  safe_eager_load!

  # Phase 1: Extract all units
  if Woods.configuration.concurrent_extraction
    extract_all_concurrent
  else
    extract_all_sequential
  end

  # Phase 1.5: Deduplicate results
  Rails.logger.info '[Woods] Deduplicating results...'
  deduplicate_results

  # Rebuild graph from deduped results — Phase 1 registered all units including
  # duplicates, and DependencyGraph has no remove/unregister API.
  @dependency_graph = DependencyGraph.new
  @results.each_value { |units| units.each { |u| @dependency_graph.register(u) } }

  # Phase 2: Resolve dependents (reverse dependencies)
  Rails.logger.info '[Woods] Resolving dependents...'
  resolve_dependents

  # Phase 3: Graph analysis (PageRank, structural metrics)
  Rails.logger.info '[Woods] Analyzing dependency graph...'
  @graph_analysis = GraphAnalyzer.new(@dependency_graph).analyze

  # Phase 3.5: Precompute request flows (opt-in)
  if Woods.configuration.precompute_flows
    Rails.logger.info '[Woods] Precomputing request flows...'
    precompute_flows
  end

  # Phase 4: Enrich with git data
  Rails.logger.info '[Woods] Enriching with git data...'
  enrich_with_git_data

  # Phase 4.5: Normalize file_path to relative paths
  Rails.logger.info '[Woods] Normalizing file paths...'
  normalize_file_paths

  # Phase 5: Write output
  Rails.logger.info '[Woods] Writing output...'
  write_results
  write_dependency_graph
  write_graph_analysis
  write_manifest
  write_structural_summary
  capture_snapshot

  log_summary

  @results
end
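
The rebuild step after deduplication follows from the comment in the source: the graph offers no way to remove a node, so a fresh graph is populated from the deduplicated results. The sketch below reproduces that idea with stand-ins. `SketchUnit`, `SketchGraph`, and `rebuild_demo` are hypothetical, and the dedup rule used here (keep the first unit per identifier) is an assumption, since `deduplicate_results` is not shown on this page.

```ruby
SketchUnit = Struct.new(:identifier, :type)

# Minimal stand-in for DependencyGraph: register only, no remove API.
class SketchGraph
  attr_reader :nodes

  def initialize
    @nodes = []
  end

  def register(unit)
    @nodes << unit.identifier
  end
end

# Returns [node count before dedup, node count after rebuild].
def rebuild_demo
  results = {
    models: [SketchUnit.new('User', :model), SketchUnit.new('User', :model)],
    services: [SketchUnit.new('Checkout', :service)]
  }

  # Phase 1 equivalent: everything is registered, duplicates included.
  graph = SketchGraph.new
  results.each_value { |units| units.each { |u| graph.register(u) } }
  before = graph.nodes.size

  # Assumed dedup rule: keep the first unit per identifier within each type.
  results.each_value { |units| units.uniq!(&:identifier) }

  # No remove API, so discard the old graph and register the survivors.
  graph = SketchGraph.new
  results.each_value { |units| units.each { |u| graph.register(u) } }

  [before, graph.nodes.size]
end
```
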

#extract_changed(changed_files) ⇒ Array<String>

Extracts only the units affected by the changed files. Used for incremental indexing in CI.

Parameters:

  • changed_files (Array<String>)

    List of changed file paths

Returns:

  • (Array<String>)

    List of re-extracted unit identifiers



# File 'lib/woods/extractor.rb', line 288

def extract_changed(changed_files)
  # Load existing graph
  graph_path = @output_dir.join('dependency_graph.json')
  @dependency_graph = DependencyGraph.from_h(JSON.parse(File.read(graph_path))) if graph_path.exist?

  ModelNameCache.reset!

  # Eager load to ensure newly-added classes are discoverable.
  safe_eager_load!

  # Normalize relative paths (from git diff) to absolute (as stored in file_map)
  absolute_files = changed_files.map do |f|
    Pathname.new(f).absolute? ? f : Rails.root.join(f).to_s
  end

  # Compute affected units
  affected_ids = @dependency_graph.affected_by(absolute_files)
  Rails.logger.info "[Woods] #{changed_files.size} changed files affect #{affected_ids.size} units"

  # Re-extract affected units
  affected_types = Set.new
  affected_ids.each do |unit_id|
    re_extract_unit(unit_id, affected_types: affected_types)
  end

  # Regenerate type indexes for affected types
  affected_types.each do |type_key|
    regenerate_type_index(type_key)
  end

  # Update graph, manifest, and summary
  write_dependency_graph
  write_manifest
  write_structural_summary
  capture_snapshot

  affected_ids
end
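
The path normalization step matters because a `git diff` typically reports relative paths, while the source comment says the graph's file map stores absolute ones. The sketch below isolates that step so it can run outside Rails. `absolutize` is a hypothetical helper and the `root` argument stands in for `Rails.root`.

```ruby
require 'pathname'

# Hypothetical helper mirroring the normalization in extract_changed:
# absolute paths pass through, relative ones are joined onto the root.
def absolutize(paths, root)
  root = Pathname.new(root)
  paths.map do |f|
    Pathname.new(f).absolute? ? f : root.join(f).to_s
  end
end
```

With a root of `/srv/app`, a relative `app/models/user.rb` becomes `/srv/app/app/models/user.rb`, while an already absolute path is returned unchanged.
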