Class: Uniword::Batch::DocumentProcessor
- Inherits: Object
  - Object
    - Uniword::Batch::DocumentProcessor
- Defined in: lib/uniword/batch/document_processor.rb
Overview
Orchestrates batch document processing through configurable pipeline stages.
Responsibility: load the pipeline configuration and coordinate stage execution. Following the Single Responsibility Principle, this class only orchestrates processing and delegates the actual work to stages.
It also follows the Open/Closed Principle: new stages can be added via configuration without modifying this class.
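The orchestration described above can be illustrated with a small stand-alone sketch. It does not use Uniword: `MiniStage` and `MiniOrchestrator` are hypothetical stand-ins, and the only assumption carried over from this page is that a stage responds to `name`, `enabled?`, and `process(document, context)`.

```ruby
# Hypothetical stand-ins; none of these classes come from Uniword.
class MiniStage
  attr_reader :name

  def initialize(name, enabled: true, &work)
    @name = name
    @enabled = enabled
    @work = work
  end

  def enabled?
    @enabled
  end

  # Applies this stage's work to the document; the context is unused here.
  def process(document, _context)
    @work.call(document)
  end
end

class MiniOrchestrator
  def initialize(stages)
    @stages = stages
  end

  # Runs every enabled stage in order and returns the names that ran.
  def run(document, context = {})
    @stages.select(&:enabled?).map do |stage|
      stage.process(document, context)
      stage.name
    end
  end
end

document = { text: "  hello  " }
stages = [
  MiniStage.new("strip") { |doc| doc[:text] = doc[:text].strip },
  MiniStage.new("shout", enabled: false) { |doc| doc[:text] = doc[:text].upcase },
  MiniStage.new("exclaim") { |doc| doc[:text] += "!" },
]

executed = MiniOrchestrator.new(stages).run(document)
```

The orchestrator never inspects what a stage does. Adding, removing, or disabling a stage changes only the list it is given, which is the Open/Closed property the overview refers to.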
Constant Summary
- STAGE_CLASSES =
Stage class registry. Maps stage names to their class implementations.
{
  normalize_styles: "NormalizeStylesStage",
  update_metadata: "UpdateMetadataStage",
  validate_links: "ValidateLinksStage",
  quality_check: "QualityCheckStage",
  convert_format: "ConvertFormatStage",
  compress_images: "CompressImagesStage",
}.freeze
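The registry stores class names as strings, so something has to turn a name into a class before a stage can be instantiated. How Uniword does this is not shown on this page. The following is only a sketch of one common approach, using `Module#const_get` against a hypothetical `SketchStages` namespace with a hypothetical `resolve_stage` helper.

```ruby
# Hypothetical namespace and classes standing in for the real stage classes.
module SketchStages
  class NormalizeStylesStage; end
  class ValidateLinksStage; end
end

# A reduced copy of the registry shape: stage name => class name string.
SKETCH_REGISTRY = {
  normalize_styles: "NormalizeStylesStage",
  validate_links: "ValidateLinksStage",
}.freeze

# Looks up the class name for a stage and resolves it inside the namespace.
# Raises ArgumentError for names that are not registered.
def resolve_stage(name, namespace: SketchStages, registry: SKETCH_REGISTRY)
  class_name = registry.fetch(name.to_sym) do
    raise ArgumentError, "Unknown stage: #{name}"
  end
  namespace.const_get(class_name)
end

stage_class = resolve_stage("normalize_styles")
```

Keeping names as strings means the registry can be defined before the stage classes are loaded; the lookup only happens when a stage is actually requested.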
Instance Attribute Summary
- #config ⇒ Object (readonly)
  Returns the value of attribute config.
- #stages ⇒ Object (readonly)
  Returns the value of attribute stages.
Instance Method Summary
- #add_stage(stage) ⇒ self
  Add a custom processing stage.
- #disabled_stages ⇒ Array<String>
  Get list of disabled stage names.
- #enabled_stages ⇒ Array<String>
  Get list of enabled stage names.
- #initialize(pipeline_config: nil, config: nil, parallel: false, max_workers: 4) ⇒ DocumentProcessor (constructor)
  Initialize document processor.
- #process_batch(input_dir:, output_dir:, pattern: "*.{docx,doc}") ⇒ BatchResult
  Process a batch of documents from input directory.
- #process_file(input_path, output_path) ⇒ BatchResult
  Process a single document file.
Constructor Details
#initialize(pipeline_config: nil, config: nil, parallel: false, max_workers: 4) ⇒ DocumentProcessor
Initialize document processor
# File 'lib/uniword/batch/document_processor.rb', line 48
def initialize(pipeline_config: nil, config: nil, parallel: false, max_workers: 4)
  @config = load_configuration(pipeline_config, config)
  @parallel = parallel || @config.dig(:pipeline, :parallel, :enabled) || false
  @max_workers = max_workers || @config.dig(:pipeline, :parallel, :max_workers) || 4
  @stages = load_stages
  @custom_stages = []
end
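The two `||` chains in the constructor decide whether a keyword argument or the loaded configuration wins. The sketch below reproduces just those two expressions in a hypothetical `resolve_options` helper so the precedence can be tried in isolation. It assumes the configuration is a plain Hash with symbol keys, which this page does not confirm.

```ruby
# Hypothetical helper mirroring the constructor's two `||` chains.
# Assumes `config` is a Hash with symbol keys.
def resolve_options(config, parallel: false, max_workers: 4)
  {
    parallel: parallel || config.dig(:pipeline, :parallel, :enabled) || false,
    max_workers: max_workers || config.dig(:pipeline, :parallel, :max_workers) || 4,
  }
end

config = { pipeline: { parallel: { enabled: true, max_workers: 8 } } }

# Keyword defaults: `false` falls through to the config, `4` does not.
defaults = resolve_options(config)
# => { parallel: true, max_workers: 4 }

# Passing nil explicitly lets the configured worker count apply.
explicit_nil = resolve_options(config, max_workers: nil)
# => { parallel: true, max_workers: 8 }

# With nothing configured, the hard-coded fallbacks are used.
empty = resolve_options({})
# => { parallel: false, max_workers: 4 }
```

Two consequences follow from Ruby's `||` semantics with these defaults: the configured `max_workers` is only consulted when the caller passes `max_workers: nil`, and `parallel: false` cannot switch off parallelism that the configuration enables.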
Instance Attribute Details
#config ⇒ Object (readonly)
Returns the value of attribute config.
# File 'lib/uniword/batch/document_processor.rb', line 29
def config
  @config
end
#stages ⇒ Object (readonly)
Returns the value of attribute stages.
# File 'lib/uniword/batch/document_processor.rb', line 29
def stages
  @stages
end
Instance Method Details
#add_stage(stage) ⇒ self
Add a custom processing stage
# File 'lib/uniword/batch/document_processor.rb', line 138
def add_stage(stage)
  unless stage.is_a?(ProcessingStage)
    raise ArgumentError, "Stage must inherit from ProcessingStage"
  end

  @custom_stages << stage
  self
end
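Because `add_stage` returns `self`, calls can be chained, and the `is_a?` guard rejects anything that is not a stage subclass. The sketch below shows both behaviours with stand-ins: `SketchProcessingStage`, `SketchProcessor`, `WatermarkStage`, and `FooterStage` are hypothetical, and the real `ProcessingStage` interface may require more than the three methods assumed here.

```ruby
# Hypothetical base class standing in for ProcessingStage.
class SketchProcessingStage
  def name
    "unnamed"
  end

  def enabled?
    true
  end

  def process(_document, _context)
    raise NotImplementedError, "subclasses must implement process"
  end
end

# Hypothetical custom stages.
class WatermarkStage < SketchProcessingStage
  def name
    "watermark"
  end

  def process(document, _context)
    document[:watermark] = "DRAFT"
  end
end

class FooterStage < SketchProcessingStage
  def name
    "footer"
  end

  def process(document, _context)
    document[:footer] = "Confidential"
  end
end

# Hypothetical processor containing only the add_stage logic shown above.
class SketchProcessor
  attr_reader :custom_stages

  def initialize
    @custom_stages = []
  end

  def add_stage(stage)
    unless stage.is_a?(SketchProcessingStage)
      raise ArgumentError, "Stage must inherit from ProcessingStage"
    end

    @custom_stages << stage
    self
  end
end

processor = SketchProcessor.new
processor.add_stage(WatermarkStage.new).add_stage(FooterStage.new)
stage_names = processor.custom_stages.map(&:name)
```

Note that the guard checks the instance, so a stage must be passed as an object (`WatermarkStage.new`), not as a class.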
#disabled_stages ⇒ Array<String>
Get list of disabled stage names
# File 'lib/uniword/batch/document_processor.rb', line 159
def disabled_stages
  all_stages = @stages + @custom_stages
  all_stages.reject(&:enabled?).map(&:name)
end
#enabled_stages ⇒ Array<String>
Get list of enabled stage names
# File 'lib/uniword/batch/document_processor.rb', line 151
def enabled_stages
  all_stages = @stages + @custom_stages
  all_stages.select(&:enabled?).map(&:name)
end
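`enabled_stages` and `disabled_stages` are complementary views over the same combined list: configured stages first, then custom stages. A minimal sketch with a hypothetical `ToggleStage` struct shows the `select`/`reject` pair and the resulting order.

```ruby
# Hypothetical stage object exposing only name and enabled?.
ToggleStage = Struct.new(:name, :enabled) do
  def enabled?
    enabled
  end
end

configured = [
  ToggleStage.new("normalize_styles", true),
  ToggleStage.new("validate_links", false),
  ToggleStage.new("quality_check", true),
]
custom = [ToggleStage.new("watermark", false)]

# Same combination order as in the methods above.
all_stages = configured + custom

enabled = all_stages.select(&:enabled?).map(&:name)
# => ["normalize_styles", "quality_check"]
disabled = all_stages.reject(&:enabled?).map(&:name)
# => ["validate_links", "watermark"]
```

Every stage appears in exactly one of the two lists, so together they always account for the whole pipeline.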
#process_batch(input_dir:, output_dir:, pattern: "*.{docx,doc}") ⇒ BatchResult
Process a batch of documents from input directory
# File 'lib/uniword/batch/document_processor.rb', line 65
def process_batch(input_dir:, output_dir:, pattern: "*.{docx,doc}")
  validate_directories!(input_dir, output_dir)

  # Create output directory if it doesn't exist
  FileUtils.mkdir_p(output_dir)

  # Find all matching files
  files = Dir.glob(File.join(input_dir, pattern))

  result = BatchResult.new

  if @parallel && files.size > 1
    process_parallel(files, input_dir, output_dir, result)
  else
    process_sequential(files, input_dir, output_dir, result)
  end

  result.complete!
end
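File discovery relies on `Dir.glob` with the pattern joined onto the input directory. The sketch below runs the default pattern against a temporary directory to show what it does and does not match. The file names are made up for illustration, and the result is sorted explicitly because glob ordering differs between Ruby versions.

```ruby
require "tmpdir"
require "fileutils"

matched = Dir.mktmpdir do |input_dir|
  # Two matching files, one non-matching file, and one nested match.
  FileUtils.touch(File.join(input_dir, "report.docx"))
  FileUtils.touch(File.join(input_dir, "legacy.doc"))
  FileUtils.touch(File.join(input_dir, "notes.txt"))
  FileUtils.mkdir_p(File.join(input_dir, "nested"))
  FileUtils.touch(File.join(input_dir, "nested", "deep.docx"))

  # Same expression as in process_batch, with the default pattern.
  files = Dir.glob(File.join(input_dir, "*.{docx,doc}"))
  files.map { |path| File.basename(path) }.sort
end
# => ["legacy.doc", "report.docx"]
```

The default pattern only looks at the top level of the input directory, so `nested/deep.docx` is not picked up, and `notes.txt` is skipped because its extension is outside the brace list.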
#process_file(input_path, output_path) ⇒ BatchResult
Process a single document file
# File 'lib/uniword/batch/document_processor.rb', line 90
def process_file(input_path, output_path)
  result = BatchResult.new
  start_time = Time.now

  begin
    # Load document
    document = DocumentFactory.from_file(input_path)

    # Create context
    context = {
      input_path: input_path,
      output_path: output_path,
      filename: File.basename(input_path),
    }

    # Execute pipeline
    executed_stages = []
    all_stages = @stages + @custom_stages

    all_stages.each do |stage|
      next unless stage.enabled?

      stage.process(document, context)
      executed_stages << stage.name
    end

    # Save output
    output_dir = File.dirname(output_path)
    FileUtils.mkdir_p(output_dir)
    document.save(output_path)

    duration = Time.now - start_time
    result.add_success(
      file: input_path,
      duration: duration,
      stages: executed_stages,
    )
  rescue StandardError => e
    handle_error(e, input_path, result)
  end

  result.complete!
end
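The `rescue StandardError` in `process_file` means a failing document is recorded in the result instead of raising to the caller. The sketch below illustrates that pattern in isolation. `SketchResult`, its `add_failure` method, and `process_one` are hypothetical; the real `BatchResult` and `handle_error` are not shown on this page and may behave differently.

```ruby
# Hypothetical result collector; only add_success mirrors a call seen above.
class SketchResult
  attr_reader :successes, :failures

  def initialize
    @successes = []
    @failures = []
  end

  def add_success(file:, duration:, stages:)
    @successes << { file: file, duration: duration, stages: stages }
  end

  # Assumed counterpart to add_success, invented for this sketch.
  def add_failure(file:, error:)
    @failures << { file: file, error: error.message }
  end
end

# Processes one path; any StandardError is captured rather than re-raised.
def process_one(path, result)
  start_time = Time.now
  # Simulated load failure for files with a made-up ".broken" extension.
  raise IOError, "cannot read #{path}" if path.end_with?(".broken")

  result.add_success(file: path, duration: Time.now - start_time, stages: ["noop"])
rescue StandardError => e
  result.add_failure(file: path, error: e)
end

result = SketchResult.new
["a.docx", "b.broken", "c.docx"].each { |path| process_one(path, result) }
```

With this shape, one unreadable file does not stop the remaining files from being processed, and the caller can inspect successes and failures afterwards.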