Class: Kreuzberg::Config::ContentFilter

Inherits:
Object
Defined in:
lib/kreuzberg/config.rb

Overview

Content filter configuration for controlling extraction of headers, footers, watermarks, and repeating text across document formats.

Examples:

Include headers and footers

filter = ContentFilter.new(include_headers: true, include_footers: true)

Disable repeating text removal

filter = ContentFilter.new(strip_repeating_text: false)


Constructor Details

#initialize(include_headers: false, include_footers: false, strip_repeating_text: true, include_watermarks: false) ⇒ ContentFilter

Returns a new instance of ContentFilter.



# File 'lib/kreuzberg/config.rb', line 871

def initialize(
  include_headers: false,
  include_footers: false,
  strip_repeating_text: true,
  include_watermarks: false
)
  @include_headers = include_headers ? true : false
  @include_footers = include_footers ? true : false
  @strip_repeating_text = strip_repeating_text ? true : false
  @include_watermarks = include_watermarks ? true : false
end
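
The constructor coerces each keyword argument to a strict boolean using Ruby truthiness. The sketch below illustrates that behavior. `ContentFilterSketch` is a hypothetical stand-in that copies the constructor shown above so the example runs without the gem installed; it is not part of Kreuzberg.

```ruby
# Hypothetical stand-in mirroring the constructor above (not the real class).
class ContentFilterSketch
  attr_reader :include_headers, :include_footers,
              :strip_repeating_text, :include_watermarks

  def initialize(include_headers: false, include_footers: false,
                 strip_repeating_text: true, include_watermarks: false)
    # Any truthy value becomes true; nil and false become false.
    @include_headers = include_headers ? true : false
    @include_footers = include_footers ? true : false
    @strip_repeating_text = strip_repeating_text ? true : false
    @include_watermarks = include_watermarks ? true : false
  end
end

filter = ContentFilterSketch.new(include_headers: "yes", strip_repeating_text: nil)

filter.include_headers      # => true  ("yes" is truthy)
filter.strip_repeating_text # => false (nil is falsy, overriding the default of true)
filter.include_footers      # => false (default)
```

Because only `nil` and `false` are falsy in Ruby, values such as `0` or an empty string would also be stored as `true`.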

Instance Attribute Details

#include_footers ⇒ Object (readonly)

Returns whether footers are included in the extracted content. The value is always true or false and defaults to false.



# File 'lib/kreuzberg/config.rb', line 869

def include_footers
  @include_footers
end

#include_headers ⇒ Object (readonly)

Returns whether headers are included in the extracted content. The value is always true or false and defaults to false.



# File 'lib/kreuzberg/config.rb', line 869

def include_headers
  @include_headers
end

#include_watermarks ⇒ Object (readonly)

Returns whether watermarks are included in the extracted content. The value is always true or false and defaults to false.



# File 'lib/kreuzberg/config.rb', line 869

def include_watermarks
  @include_watermarks
end

#strip_repeating_text ⇒ Object (readonly)

Returns whether repeating text is stripped from the extracted content. The value is always true or false and defaults to true.



# File 'lib/kreuzberg/config.rb', line 869

def strip_repeating_text
  @strip_repeating_text
end

Instance Method Details

#to_h ⇒ Object

Returns the four filter settings as a Hash with symbol keys.



# File 'lib/kreuzberg/config.rb', line 883

def to_h
  {
    include_headers: @include_headers,
    include_footers: @include_footers,
    strip_repeating_text: @strip_repeating_text,
    include_watermarks: @include_watermarks
  }
end
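
Since the keys returned by `to_h` match the constructor's keyword arguments, the hash can be splatted back into `new` to copy or adjust a filter. The following is a minimal sketch under that observation. `ContentFilterSketch` is a hypothetical stand-in that copies the methods shown on this page so the example runs without the gem; it is not part of Kreuzberg.

```ruby
# Hypothetical stand-in mirroring #initialize and #to_h above (not the real class).
class ContentFilterSketch
  def initialize(include_headers: false, include_footers: false,
                 strip_repeating_text: true, include_watermarks: false)
    @include_headers = include_headers ? true : false
    @include_footers = include_footers ? true : false
    @strip_repeating_text = strip_repeating_text ? true : false
    @include_watermarks = include_watermarks ? true : false
  end

  def to_h
    {
      include_headers: @include_headers,
      include_footers: @include_footers,
      strip_repeating_text: @strip_repeating_text,
      include_watermarks: @include_watermarks
    }
  end
end

original = ContentFilterSketch.new(include_headers: true)

# All four settings, including the ones left at their defaults.
settings = original.to_h

# Build a modified copy by overriding one key before splatting.
copy = ContentFilterSketch.new(**settings.merge(include_watermarks: true))
copy.to_h[:include_watermarks] # => true
copy.to_h[:include_headers]    # => true (carried over from the original)
```

The instances expose no setters, so building a new object from a merged hash is one way to derive a variant without mutating the original.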