Class: Woods::Extractor
- Inherits:
- Object
- Includes:
- FilenameUtils
- Defined in:
- lib/woods/extractor.rb
Overview
Extractor is the main orchestrator for codebase extraction.
It coordinates all individual extractors, builds the dependency graph, enriches with git data, and outputs structured JSON for the indexing pipeline.
Constant Summary
- EXTRACTION_DIRECTORIES =
Directories under app/ that contain classes we need to extract. Used by eager_load_extraction_directories as a fallback when Rails.application.eager_load! fails (e.g., NameError from graphql/).
    %w[
      models controllers services jobs mailers components interactors
      operations commands use_cases serializers decorators blueprinters
      managers policies validators channels presenters form_objects
    ].freeze
- EXTRACTORS =
    {
      models: Extractors::ModelExtractor,
      controllers: Extractors::ControllerExtractor,
      graphql: Extractors::GraphQLExtractor,
      components: Extractors::PhlexExtractor,
      view_components: Extractors::ViewComponentExtractor,
      services: Extractors::ServiceExtractor,
      jobs: Extractors::JobExtractor,
      mailers: Extractors::MailerExtractor,
      serializers: Extractors::SerializerExtractor,
      managers: Extractors::ManagerExtractor,
      policies: Extractors::PolicyExtractor,
      validators: Extractors::ValidatorExtractor,
      concerns: Extractors::ConcernExtractor,
      routes: Extractors::RouteExtractor,
      middleware: Extractors::MiddlewareExtractor,
      i18n: Extractors::I18nExtractor,
      pundit_policies: Extractors::PunditExtractor,
      configurations: Extractors::ConfigurationExtractor,
      engines: Extractors::EngineExtractor,
      view_templates: Extractors::ViewTemplateExtractor,
      migrations: Extractors::MigrationExtractor,
      action_cable_channels: Extractors::ActionCableExtractor,
      scheduled_jobs: Extractors::ScheduledJobExtractor,
      rake_tasks: Extractors::RakeTaskExtractor,
      state_machines: Extractors::StateMachineExtractor,
      events: Extractors::EventExtractor,
      decorators: Extractors::DecoratorExtractor,
      database_views: Extractors::DatabaseViewExtractor,
      caching: Extractors::CachingExtractor,
      factories: Extractors::FactoryExtractor,
      test_mappings: Extractors::TestMappingExtractor,
      rails_source: Extractors::RailsSourceExtractor,
      poros: Extractors::PoroExtractor,
      libs: Extractors::LibExtractor
    }.freeze
- TYPE_TO_EXTRACTOR_KEY =
Maps singular unit types (as stored in ExtractedUnit/graph nodes) to the plural keys used in the EXTRACTORS constant.
    {
      model: :models,
      controller: :controllers,
      service: :services,
      component: :components,
      view_component: :view_components,
      job: :jobs,
      mailer: :mailers,
      graphql_type: :graphql,
      graphql_mutation: :graphql,
      graphql_resolver: :graphql,
      graphql_query: :graphql,
      serializer: :serializers,
      manager: :managers,
      policy: :policies,
      validator: :validators,
      concern: :concerns,
      route: :routes,
      middleware: :middleware,
      i18n: :i18n,
      pundit_policy: :pundit_policies,
      configuration: :configurations,
      engine: :engines,
      view_template: :view_templates,
      migration: :migrations,
      action_cable_channel: :action_cable_channels,
      scheduled_job: :scheduled_jobs,
      rake_task: :rake_tasks,
      state_machine: :state_machines,
      event: :events,
      decorator: :decorators,
      database_view: :database_views,
      caching: :caching,
      factory: :factories,
      test_mapping: :test_mappings,
      rails_source: :rails_source,
      poro: :poros,
      lib: :libs
    }.freeze
- CLASS_BASED =
Maps unit types to class-based extractor methods (constantize + call).
    {
      model: :extract_model,
      controller: :extract_controller,
      component: :extract_component,
      view_component: :extract_component,
      mailer: :extract_mailer,
      action_cable_channel: :extract_channel
    }.freeze
- FILE_BASED =
Maps unit types to file-based extractor methods (pass file_path).
    {
      service: :extract_service_file,
      job: :extract_job_file,
      serializer: :extract_serializer_file,
      manager: :extract_manager_file,
      policy: :extract_policy_file,
      validator: :extract_validator_file,
      concern: :extract_concern_file,
      i18n: :extract_i18n_file,
      pundit_policy: :extract_pundit_file,
      configuration: :extract_configuration_file,
      view_template: :extract_view_template_file,
      migration: :extract_migration_file,
      rake_task: :extract_rake_file,
      decorator: :extract_decorator_file,
      database_view: :extract_view_file,
      caching: :extract_caching_file,
      test_mapping: :extract_test_file,
      poro: :extract_poro_file,
      lib: :extract_lib_file
    }.freeze
- GRAPHQL_TYPES =
GraphQL types all use the same extractor method.
%i[graphql_type graphql_mutation graphql_resolver graphql_query].freeze
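The lookup tables above suggest how a single unit type could be routed to an extractor and a method. The following is only a sketch of that idea, not the library's code: the `DispatchSketch` module and `dispatch_for` helper are hypothetical, the tables are trimmed copies of the documented constants, and the shared GraphQL method name is left as `nil` because this page does not give it.

```ruby
# Hypothetical illustration; DispatchSketch and dispatch_for are not part of Woods.
module DispatchSketch
  # Trimmed copies of the documented tables (the real constants have more entries).
  TYPE_TO_EXTRACTOR_KEY = {
    model: :models, service: :services, graphql_query: :graphql, route: :routes
  }.freeze
  CLASS_BASED = { model: :extract_model }.freeze
  FILE_BASED = { service: :extract_service_file }.freeze
  GRAPHQL_TYPES = %i[graphql_type graphql_mutation graphql_resolver graphql_query].freeze

  # Describe how a singular unit type would be routed, or nil for an unknown type.
  def self.dispatch_for(type)
    extractor_key = TYPE_TO_EXTRACTOR_KEY[type]
    return nil unless extractor_key

    if CLASS_BASED.key?(type)
      { extractor: extractor_key, strategy: :class_based, method: CLASS_BASED[type] }
    elsif FILE_BASED.key?(type)
      { extractor: extractor_key, strategy: :file_based, method: FILE_BASED[type] }
    elsif GRAPHQL_TYPES.include?(type)
      # The shared GraphQL method is not named on this page, so it stays nil here.
      { extractor: extractor_key, strategy: :graphql, method: nil }
    else
      # Types such as :route appear in TYPE_TO_EXTRACTOR_KEY but in neither method table.
      { extractor: extractor_key, strategy: :none, method: nil }
    end
  end
end
```

For example, `DispatchSketch.dispatch_for(:service)` yields the `:services` extractor key with the file-based method `:extract_service_file`, while `:route` resolves to an extractor key but has no single-unit method in either table.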
Instance Attribute Summary
-
#dependency_graph ⇒ Object
readonly
Returns the value of attribute dependency_graph.
-
#output_dir ⇒ Object
readonly
Returns the value of attribute output_dir.
Instance Method Summary
-
#extract_all ⇒ Hash
Perform full extraction of the codebase.
-
#extract_changed(changed_files) ⇒ Array<String>
Extract only the units affected by changed files. Used for incremental indexing in CI.
-
#initialize(output_dir: nil) ⇒ Extractor
constructor
A new instance of Extractor.
Methods included from FilenameUtils
#collision_safe_filename, #safe_filename
Constructor Details
#initialize(output_dir: nil) ⇒ Extractor
Returns a new instance of Extractor.
    # File 'lib/woods/extractor.rb', line 206

    def initialize(output_dir: nil)
      @output_dir = Pathname.new(output_dir || Rails.root.join('tmp/woods'))
      @dependency_graph = DependencyGraph.new
      @results = {}
      @extractors = {}
    end
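The constructor always wraps the output location in a `Pathname`, whether the caller passes a string or relies on the default under the application root. A minimal sketch of that behaviour outside Rails, assuming a hypothetical `APP_ROOT` constant standing in for `Rails.root` and a hypothetical helper `resolve_output_dir`:

```ruby
require 'pathname'

# Hypothetical stand-in for Rails.root so the sketch runs without a Rails app.
APP_ROOT = Pathname.new('/srv/app')

# Mirrors the constructor's expression: explicit directory if given,
# otherwise tmp/woods under the application root. Always returns a Pathname.
def resolve_output_dir(output_dir = nil)
  Pathname.new(output_dir || APP_ROOT.join('tmp/woods'))
end
```

Because the result is a `Pathname` in both cases, later calls such as `join('dependency_graph.json')` and `exist?` work the same way regardless of how the directory was supplied.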
Instance Attribute Details
#dependency_graph ⇒ Object (readonly)
Returns the value of attribute dependency_graph.
    # File 'lib/woods/extractor.rb', line 204

    def dependency_graph
      @dependency_graph
    end
#output_dir ⇒ Object (readonly)
Returns the value of attribute output_dir.
    # File 'lib/woods/extractor.rb', line 204

    def output_dir
      @output_dir
    end
Instance Method Details
#extract_all ⇒ Hash
Perform full extraction of the codebase.
    # File 'lib/woods/extractor.rb', line 220

    def extract_all
      setup_output_directory
      ModelNameCache.reset!

      # Eager load once: all extractors need loaded classes for introspection.
      safe_eager_load!

      # Phase 1: Extract all units
      if Woods.configuration.concurrent_extraction
        extract_all_concurrent
      else
        extract_all_sequential
      end

      # Phase 1.5: Deduplicate results
      Rails.logger.info '[Woods] Deduplicating results...'
      deduplicate_results

      # Rebuild graph from deduped results. Phase 1 registered all units including
      # duplicates, and DependencyGraph has no remove/unregister API.
      @dependency_graph = DependencyGraph.new
      @results.each_value { |units| units.each { |u| @dependency_graph.register(u) } }

      # Phase 2: Resolve dependents (reverse dependencies)
      Rails.logger.info '[Woods] Resolving dependents...'
      resolve_dependents

      # Phase 3: Graph analysis (PageRank, structural metrics)
      Rails.logger.info '[Woods] Analyzing dependency graph...'
      @graph_analysis = GraphAnalyzer.new(@dependency_graph).analyze

      # Phase 3.5: Precompute request flows (opt-in)
      if Woods.configuration.precompute_flows
        Rails.logger.info '[Woods] Precomputing request flows...'
        precompute_flows
      end

      # Phase 4: Enrich with git data
      Rails.logger.info '[Woods] Enriching with git data...'
      enrich_with_git_data

      # Phase 4.5: Normalize file_path to relative paths
      Rails.logger.info '[Woods] Normalizing file paths...'
      normalize_file_paths

      # Phase 5: Write output
      Rails.logger.info '[Woods] Writing output...'
      write_results
      write_dependency_graph
      write_graph_analysis
      write_manifest
      write_structural_summary
      capture_snapshot

      log_summary

      @results
    end
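The step after Phase 1.5 discards the graph and registers every surviving unit again, because the graph offers no way to remove a node. The sketch below illustrates that pattern with stand-ins: `SketchUnit`, `RegisterOnlyGraph`, `dedupe` and `rebuild_graph` are hypothetical, and deduplicating by identifier while keeping the first occurrence is an assumption, since this page does not show what `deduplicate_results` does.

```ruby
# Hypothetical stand-ins; none of these names come from Woods.
SketchUnit = Struct.new(:type, :identifier)

# A graph that, like the documented DependencyGraph, can only add nodes.
class RegisterOnlyGraph
  attr_reader :nodes

  def initialize
    @nodes = []
  end

  def register(unit)
    @nodes << unit.identifier
  end
end

# Assumed dedup rule: keep the first unit seen for each identifier.
def dedupe(results)
  results.transform_values { |units| units.uniq(&:identifier) }
end

# With no unregister API, the simplest fix is a fresh graph built
# from the deduplicated results.
def rebuild_graph(results)
  graph = RegisterOnlyGraph.new
  results.each_value { |units| units.each { |u| graph.register(u) } }
  graph
end
```

Rebuilding costs one extra pass over the results, but it guarantees the graph never contains a node for a unit that deduplication dropped.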
#extract_changed(changed_files) ⇒ Array<String>
Extract only the units affected by changed files. Used for incremental indexing in CI.
    # File 'lib/woods/extractor.rb', line 288

    def extract_changed(changed_files)
      # Load existing graph
      graph_path = @output_dir.join('dependency_graph.json')
      @dependency_graph = DependencyGraph.from_h(JSON.parse(File.read(graph_path))) if graph_path.exist?

      ModelNameCache.reset!

      # Eager load to ensure newly-added classes are discoverable.
      safe_eager_load!

      # Normalize relative paths (from git diff) to absolute (as stored in file_map)
      absolute_files = changed_files.map do |f|
        Pathname.new(f).absolute? ? f : Rails.root.join(f).to_s
      end

      # Compute affected units
      affected_ids = @dependency_graph.affected_by(absolute_files)

      Rails.logger.info "[Woods] #{changed_files.size} changed files affect #{affected_ids.size} units"

      # Re-extract affected units
      affected_types = Set.new
      affected_ids.each do |unit_id|
        re_extract_unit(unit_id, affected_types: affected_types)
      end

      # Regenerate type indexes for affected types
      affected_types.each do |type_key|
        regenerate_type_index(type_key)
      end

      # Update graph, manifest, and summary
      write_dependency_graph
      write_manifest
      write_structural_summary
      capture_snapshot

      affected_ids
    end
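The path normalization step matters because `git diff` reports paths relative to the repository root, while the graph's file map stores absolute paths. A minimal sketch of that step in isolation, assuming a hypothetical helper `absolutize` that takes the root as a parameter in place of `Rails.root`:

```ruby
require 'pathname'

# Hypothetical helper mirroring the normalization in extract_changed.
# Relative paths are joined onto root; absolute paths pass through unchanged.
def absolutize(changed_files, root)
  changed_files.map do |f|
    Pathname.new(f).absolute? ? f : root.join(f).to_s
  end
end
```

A CI job could feed this the output of a command such as `git diff --name-only`, split into lines, before handing the list to `extract_changed`; the method performs the same normalization itself, so passing relative paths directly also works.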