Module: Kreuzberg::OcrBackendProtocol

Defined in:
lib/kreuzberg/ocr_backend_protocol.rb

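Overview

Interface that custom OCR backends implement. A backend must define #name and #process_image(image_bytes, config); the default implementations raise NotImplementedError.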

Examples:

Implementing a custom OCR backend
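
The following is a minimal sketch of a custom backend, assuming the protocol is mixed in with `include`. The `Kreuzberg::OcrBackendProtocol` definition at the top is a stand-in copied from the method sources on this page so the snippet runs without the gem; in real use, load the gem instead. `ByteCountBackend` is hypothetical and performs no actual OCR.

```ruby
# Stand-in for the real module, copied from the method sources on this page,
# so the sketch runs without the gem installed.
module Kreuzberg
  module OcrBackendProtocol
    def name
      raise NotImplementedError, "#{self.class} must implement #name"
    end

    def process_image(image_bytes, config)
      raise NotImplementedError, "#{self.class} must implement #process_image(image_bytes, config)"
    end
  end
end

# Hypothetical backend: it does no real OCR and only reports what it received.
class ByteCountBackend
  include Kreuzberg::OcrBackendProtocol

  # Unique identifier for this backend.
  def name
    "byte-count"
  end

  # Returns a String, as the protocol requires.
  def process_image(image_bytes, config)
    language = config["language"] || "eng"
    "#{image_bytes.bytesize} bytes (#{language})"
  end
end

backend = ByteCountBackend.new
puts backend.name
puts backend.process_image("\x89PNG".b, { "language" => "deu" })
```

Both methods are overridden, so neither default implementation is reached.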

Implementing an OCR backend with initialization
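
A sketch of a backend that takes its settings at construction time, under the same assumptions: the module definition is a stand-in copied from the method sources on this page, and `ConfigurableBackend`, its keyword arguments, and the lambda used as an engine are all hypothetical.

```ruby
# Stand-in for the real module, copied from the method sources on this page.
module Kreuzberg
  module OcrBackendProtocol
    def name
      raise NotImplementedError, "#{self.class} must implement #name"
    end

    def process_image(image_bytes, config)
      raise NotImplementedError, "#{self.class} must implement #process_image(image_bytes, config)"
    end
  end
end

# Hypothetical backend configured when it is created.
class ConfigurableBackend
  include Kreuzberg::OcrBackendProtocol

  # engine is any object responding to #call(image_bytes, language);
  # the default is a no-op that recognizes nothing.
  def initialize(default_language: "eng", engine: nil)
    @default_language = default_language
    @engine = engine || ->(_bytes, _language) { "" }
  end

  def name
    "configurable"
  end

  def process_image(image_bytes, config)
    # A per-call language overrides the one chosen at construction time.
    language = config["language"] || @default_language
    @engine.call(image_bytes, language).to_s
  end
end

# A fake engine stands in for a real OCR library.
fake_engine = ->(bytes, language) { "recognized #{bytes.bytesize} bytes as #{language}" }
backend = ConfigurableBackend.new(default_language: "fra", engine: fake_engine)
puts backend.process_image("raw".b, {})
puts backend.process_image("raw".b, { "language" => "deu" })
```

Injecting the engine through the constructor keeps expensive setup out of #process_image and makes the backend easy to exercise with a fake in tests.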

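Instance Method Summary

  • #name ⇒ String

    Returns the unique backend identifier.

  • #process_image(image_bytes, config) ⇒ String

    Process image bytes and extract text via OCR.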

Instance Method Details

#name ⇒ String

Returns the unique backend identifier.

Returns:

  • (String)

    Unique backend identifier

Raises:

  • (NotImplementedError)


# File 'lib/kreuzberg/ocr_backend_protocol.rb', line 9

def name
  raise NotImplementedError, "#{self.class} must implement #name"
end

#process_image(image_bytes, config) ⇒ String

Process image bytes and extract text via OCR.

This method receives raw image data (PNG, JPEG, TIFF, etc.) and an OCR configuration hash. It must return the extracted text as a string.

The config hash contains OCR settings such as:

  • "language" [String] - Language code (e.g., "eng", "deu", "fra")

  • "backend" [String] - Backend name (same as #name)

  • Additional backend-specific settings

Examples:

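def process_image(image_bytes, config)
  # Fall back to English when the config does not specify a language.
  language = config["language"] || "eng"
  # my_ocr_engine is a placeholder for the backend's own OCR engine.
  my_ocr_engine.recognize(image_bytes, language: language)
end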

Parameters:

  • image_bytes (String)

    Binary image data (PNG, JPEG, TIFF, etc.)

  • config (Hash)

    OCR configuration with the following keys:

    • "language" [String] - Language code for OCR (e.g., "eng", "deu")

    • "backend" [String] - Backend name

Returns:

  • (String)

    Extracted text content

Raises:

  • (NotImplementedError)


# File 'lib/kreuzberg/ocr_backend_protocol.rb', line 36

def process_image(image_bytes, config)
  raise NotImplementedError, "#{self.class} must implement #process_image(image_bytes, config)"
end
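
A sketch of what happens when a backend includes the module but leaves a method undefined: the call falls through to the default shown above and raises. The module definition is a stand-in copied from the method sources on this page, and `IncompleteBackend` is hypothetical.

```ruby
# Stand-in for the real module, copied from the method sources on this page.
module Kreuzberg
  module OcrBackendProtocol
    def name
      raise NotImplementedError, "#{self.class} must implement #name"
    end

    def process_image(image_bytes, config)
      raise NotImplementedError, "#{self.class} must implement #process_image(image_bytes, config)"
    end
  end
end

# Hypothetical backend that forgets to define #process_image.
class IncompleteBackend
  include Kreuzberg::OcrBackendProtocol

  def name
    "incomplete"
  end
end

backend = IncompleteBackend.new
begin
  backend.process_image("".b, {})
rescue NotImplementedError => e
  # The message names the concrete class and the missing method.
  puts e.message
end
```

In Ruby, NotImplementedError descends from ScriptError rather than StandardError, so a bare `rescue` does not catch it; it has to be rescued by name, as above.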