Class: Uniword::Batch::DocumentProcessor

Inherits: Object

Defined in: lib/uniword/batch/document_processor.rb

Overview

Orchestrates batch document processing through configurable pipeline stages.

Responsibility: load the pipeline configuration and coordinate stage execution. Following the Single Responsibility Principle, this class only orchestrates processing and delegates the actual work to the stages.

Follows the Open/Closed Principle: new stages can be added via configuration without modifying this class.

Examples:

Process batch of documents

processor = DocumentProcessor.new(pipeline_config: 'config/pipeline.yml')
results = processor.process_batch(
  input_dir: 'documents/input/',
  output_dir: 'documents/output/'
)
puts results.summary_text

Process with custom stages

processor = DocumentProcessor.new
processor.add_stage(CustomStage.new(enabled: true))
results = processor.process_file('document.docx', 'output.docx')

Constant Summary

STAGE_CLASSES =

Stage class registry. Maps stage names to their class implementations.

{
  normalize_styles: "NormalizeStylesStage",
  update_metadata: "UpdateMetadataStage",
  validate_links: "ValidateLinksStage",
  quality_check: "QualityCheckStage",
  convert_format: "ConvertFormatStage",
  compress_images: "CompressImagesStage",
}.freeze
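
The registry stores class names as strings rather than class objects, so something has to turn a name into a class before a stage can be instantiated. How Uniword does this is not shown on this page; the following is a minimal sketch of one common approach, `Module#const_get`, under stated assumptions. `DemoStages`, `DEMO_STAGE_CLASSES`, and `resolve_stage_class` are hypothetical names used only for illustration.

```ruby
# Hypothetical namespace standing in for wherever the real stage classes live.
module DemoStages
  class NormalizeStylesStage; end
  class UpdateMetadataStage; end
end

# Same shape as STAGE_CLASSES: symbol name => class name as a String.
DEMO_STAGE_CLASSES = {
  normalize_styles: "NormalizeStylesStage",
  update_metadata: "UpdateMetadataStage",
}.freeze

# Resolve a stage name to its class, failing loudly on unknown names.
def resolve_stage_class(name, registry: DEMO_STAGE_CLASSES, namespace: DemoStages)
  class_name = registry.fetch(name.to_sym) do
    raise ArgumentError, "Unknown stage: #{name}"
  end
  namespace.const_get(class_name)
end

resolved = resolve_stage_class("normalize_styles")
puts resolved.name
```

Keeping strings in the registry defers constant lookup until a stage is actually requested, which can help when stage files are loaded lazily.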

Instance Attribute Summary

#config ⇒ Object readonly

Returns the value of attribute config.
#stages ⇒ Object readonly

Returns the value of attribute stages.

Instance Method Summary

#add_stage(stage) ⇒ self

Add a custom processing stage.
#disabled_stages ⇒ Array<String>

Get list of disabled stage names.
#enabled_stages ⇒ Array<String>

Get list of enabled stage names.
#initialize(pipeline_config: nil, config: nil, parallel: false, max_workers: 4) ⇒ DocumentProcessor constructor

Initialize document processor.
#process_batch(input_dir:, output_dir:, pattern: "*.{docx,doc}") ⇒ BatchResult

Process a batch of documents from input directory.
#process_file(input_path, output_path) ⇒ BatchResult

Process a single document file.

Constructor Details

#initialize(pipeline_config: nil, config: nil, parallel: false, max_workers: 4) ⇒ `DocumentProcessor`

Initialize document processor

Parameters:

pipeline_config (String, nil) (defaults to: nil) —

Path to pipeline configuration file
config (Hash, nil) (defaults to: nil) —

Direct configuration hash (overrides pipeline_config)
parallel (Boolean) (defaults to: false) —

Enable parallel processing
max_workers (Integer) (defaults to: 4) —

Maximum number of parallel workers

# File 'lib/uniword/batch/document_processor.rb', line 48

def initialize(pipeline_config: nil, config: nil, parallel: false,
               max_workers: 4)
  @config = load_configuration(pipeline_config, config)
  @parallel = parallel || @config.dig(:pipeline, :parallel,
                                      :enabled) || false
  @max_workers = max_workers || @config.dig(:pipeline, :parallel,
                                            :max_workers) || 4
  @stages = load_stages
  @custom_stages = []
end
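
One detail of the listing above is worth checking by hand. Because `max_workers` defaults to 4 in the signature, the expression `max_workers || @config.dig(...)` is already satisfied by the keyword default, so the configured value appears to be consulted only when a caller passes `max_workers: nil` explicitly. The sketch below reproduces that expression in isolation. The helper names `resolve_workers` and `resolve_workers_nil_default` and the sample config hash are hypothetical and not part of Uniword.

```ruby
# Sample config using the key path the constructor digs into.
config = { pipeline: { parallel: { enabled: true, max_workers: 8 } } }

# Mirrors the listing: the keyword default is 4, so `||` never reaches the config.
def resolve_workers(config, max_workers: 4)
  max_workers || config.dig(:pipeline, :parallel, :max_workers) || 4
end

# Variant with a nil default: the config value is used when the caller is silent.
def resolve_workers_nil_default(config, max_workers: nil)
  max_workers || config.dig(:pipeline, :parallel, :max_workers) || 4
end

default_first = resolve_workers(config)                          # => 4
config_first  = resolve_workers_nil_default(config)              # => 8
explicit      = resolve_workers_nil_default(config, max_workers: 2) # => 2

puts [default_first, config_first, explicit].inspect
```

The `parallel` flag behaves differently: its default is `false`, which is falsy, so the configuration is consulted. A side effect of that expression is that an explicit `parallel: false` cannot switch off parallelism that the configuration enables.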

Instance Attribute Details

#config ⇒ `Object` (readonly)

Returns the value of attribute config.

# File 'lib/uniword/batch/document_processor.rb', line 29

def config
  @config
end

#stages ⇒ `Object` (readonly)

Returns the value of attribute stages.

# File 'lib/uniword/batch/document_processor.rb', line 29

def stages
  @stages
end

Instance Method Details

#add_stage(stage) ⇒ `self`

Add a custom processing stage

Parameters:

stage (ProcessingStage) —

Stage to add

Returns:

(self)

# File 'lib/uniword/batch/document_processor.rb', line 138

def add_stage(stage)
  unless stage.is_a?(ProcessingStage)
    raise ArgumentError,
          "Stage must inherit from ProcessingStage"
  end

  @custom_stages << stage
  self
end
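
Since `add_stage` returns `self`, calls can be chained. The sketch below shows that together with the type check. The real `ProcessingStage` base class is not shown on this page, so the stand-in here assumes an interface (`enabled?`, `name`, `process`) inferred from how the processor calls its stages. `WatermarkStage` and `MiniProcessor` are hypothetical.

```ruby
# Stand-in for Uniword's ProcessingStage; the interface is an assumption.
class ProcessingStage
  def initialize(enabled: true)
    @enabled = enabled
  end

  def enabled?
    @enabled
  end

  def name
    self.class.name
  end

  def process(_document, _context)
    raise NotImplementedError, "#{self.class} must implement #process"
  end
end

# Hypothetical custom stage that records a flag in the shared context.
class WatermarkStage < ProcessingStage
  def process(document, context)
    context[:watermarked] = true
    document
  end
end

# Reduced processor containing only the add_stage logic from the listing.
class MiniProcessor
  attr_reader :custom_stages

  def initialize
    @custom_stages = []
  end

  def add_stage(stage)
    unless stage.is_a?(ProcessingStage)
      raise ArgumentError, "Stage must inherit from ProcessingStage"
    end

    @custom_stages << stage
    self
  end
end

processor = MiniProcessor.new
processor.add_stage(WatermarkStage.new)
         .add_stage(WatermarkStage.new(enabled: false))
puts processor.custom_stages.size
```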

#disabled_stages ⇒ `Array<String>`

Get list of disabled stage names

Returns:

(Array<String>) —

Names of disabled stages

# File 'lib/uniword/batch/document_processor.rb', line 159

def disabled_stages
  all_stages = @stages + @custom_stages
  all_stages.reject(&:enabled?).map(&:name)
end

#enabled_stages ⇒ `Array<String>`

Get list of enabled stage names

Returns:

(Array<String>) —

Names of enabled stages

# File 'lib/uniword/batch/document_processor.rb', line 151

def enabled_stages
  all_stages = @stages + @custom_stages
  all_stages.select(&:enabled?).map(&:name)
end
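
Both `#enabled_stages` and `#disabled_stages` concatenate the configured and custom stages, then filter on `enabled?`. When a caller needs both lists at once, `Enumerable#partition` yields them in a single pass. This is a sketch with a hypothetical `DemoStage` struct, not Uniword's stage type.

```ruby
# Hypothetical stage with just the two members the filters rely on.
DemoStage = Struct.new(:name, :enabled) do
  def enabled?
    enabled
  end
end

stages = [
  DemoStage.new("normalize_styles", true),
  DemoStage.new("validate_links", false),
]
custom_stages = [DemoStage.new("watermark", true)]

# Configured stages come first, custom stages after, as in the listings.
all_stages = stages + custom_stages
enabled, disabled = all_stages.partition(&:enabled?)

enabled_names = enabled.map(&:name)
disabled_names = disabled.map(&:name)

puts enabled_names.inspect
puts disabled_names.inspect
```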

#process_batch(input_dir:, output_dir:, pattern: "*.{docx,doc}") ⇒ `BatchResult`

Process a batch of documents from input directory

Parameters:

input_dir (String) —

Input directory path
output_dir (String) —

Output directory path
pattern (String) (defaults to: "*.{docx,doc}") —

File pattern to match (default: "*.{docx,doc}")

Returns:

(BatchResult) —

Processing results

# File 'lib/uniword/batch/document_processor.rb', line 65

def process_batch(input_dir:, output_dir:, pattern: "*.{docx,doc}")
  validate_directories!(input_dir, output_dir)

  # Create output directory if it doesn't exist
  FileUtils.mkdir_p(output_dir)

  # Find all matching files
  files = Dir.glob(File.join(input_dir, pattern))

  result = BatchResult.new

  if @parallel && files.size > 1
    process_parallel(files, input_dir, output_dir, result)
  else
    process_sequential(files, input_dir, output_dir, result)
  end

  result.complete!
end
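
The default pattern relies on brace expansion in `Dir.glob`, and because it contains no `**` it matches only files directly inside `input_dir`. The sketch below checks both points against a temporary directory. The file names are made up for illustration, and the recursive pattern is a variation a caller could pass as `pattern:`, not something the listing does itself.

```ruby
require "tmpdir"
require "fileutils"

top_level = nil
recursive = nil

Dir.mktmpdir do |input_dir|
  # Two matching files, one non-matching file, and one nested match.
  %w[a.docx b.doc c.txt].each do |file|
    FileUtils.touch(File.join(input_dir, file))
  end
  FileUtils.mkdir_p(File.join(input_dir, "nested"))
  FileUtils.touch(File.join(input_dir, "nested", "d.docx"))

  # Same call shape as process_batch with the default pattern.
  top_level = Dir.glob(File.join(input_dir, "*.{docx,doc}"))
                 .map { |path| File.basename(path) }
                 .sort

  # A recursive pattern also descends into subdirectories.
  recursive = Dir.glob(File.join(input_dir, "**", "*.{docx,doc}"))
                 .map { |path| File.basename(path) }
                 .sort
end

puts top_level.inspect
puts recursive.inspect
```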

#process_file(input_path, output_path) ⇒ `BatchResult`

Process a single document file

Parameters:

input_path (String) —

Input file path
output_path (String) —

Output file path

Returns:

(BatchResult) —

Processing result

# File 'lib/uniword/batch/document_processor.rb', line 90

def process_file(input_path, output_path)
  result = BatchResult.new
  start_time = Time.now

  begin
    # Load document
    document = DocumentFactory.from_file(input_path)

    # Create context
    context = {
      input_path: input_path,
      output_path: output_path,
      filename: File.basename(input_path),
    }

    # Execute pipeline
    executed_stages = []
    all_stages = @stages + @custom_stages

    all_stages.each do |stage|
      next unless stage.enabled?

      stage.process(document, context)
      executed_stages << stage.name
    end

    # Save output
    output_dir = File.dirname(output_path)
    FileUtils.mkdir_p(output_dir)
    document.save(output_path)

    duration = Time.now - start_time
    result.add_success(
      file: input_path,
      duration: duration,
      stages: executed_stages,
    )
  rescue StandardError => e
    handle_error(e, input_path, result)
  end

  result.complete!
end
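
The core of `#process_file` is a loop that skips disabled stages, records the names of the stages that ran, and converts any `StandardError` into a recorded failure instead of letting it propagate. The sketch below isolates that control flow. `SketchStage` and `run_pipeline` are hypothetical, a plain Hash stands in for `BatchResult`, and the duration is measured with a monotonic clock, which is a substitution for the `Time.now` arithmetic in the listing.

```ruby
# Hypothetical stage: a name, an enabled flag, and a callable doing the work.
SketchStage = Struct.new(:name, :enabled, :work) do
  def enabled?
    enabled
  end

  def process(document, context)
    work.call(document, context)
  end
end

# Runs enabled stages in order and reports the outcome as a Hash.
def run_pipeline(stages, document, context)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  executed = []

  stages.each do |stage|
    next unless stage.enabled?

    stage.process(document, context)
    executed << stage.name
  end

  duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  { ok: true, stages: executed, duration: duration }
rescue StandardError => e
  # Stages that completed before the failure are still reported.
  { ok: false, error: e.message, stages: executed }
end

stages = [
  SketchStage.new("upcase", true, ->(doc, _ctx) { doc[:text] = doc[:text].upcase }),
  SketchStage.new("skipped", false, ->(_doc, _ctx) { raise "never runs" }),
  SketchStage.new("boom", true, ->(_doc, _ctx) { raise ArgumentError, "bad link" }),
]

ok_result = run_pipeline(stages.first(2), { text: "hello" }, {})
failed_result = run_pipeline(stages, { text: "hello" }, {})

puts ok_result[:stages].inspect
puts failed_result[:error]
```

A monotonic clock is unaffected by system clock adjustments, which makes it a safer basis for elapsed-time measurements than wall-clock subtraction.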
