Module: Kreuzberg

Defined in:
lib/kreuzberg.rb,
lib/kreuzberg/cli.rb,
lib/kreuzberg/types.rb,
lib/kreuzberg/config.rb,
lib/kreuzberg/errors.rb,
lib/kreuzberg/result.rb,
lib/kreuzberg/version.rb,
lib/kreuzberg/api_proxy.rb,
lib/kreuzberg/cache_api.rb,
lib/kreuzberg/cli_proxy.rb,
lib/kreuzberg/mcp_proxy.rb,
lib/kreuzberg/djot_content.rb,
lib/kreuzberg/error_context.rb,
lib/kreuzberg/extraction_api.rb,
lib/kreuzberg/setup_lib_path.rb,
lib/kreuzberg/document_structure.rb,
lib/kreuzberg/validator_protocol.rb,
lib/kreuzberg/ocr_backend_protocol.rb,
lib/kreuzberg/post_processor_protocol.rb

Overview

Kreuzberg is a Ruby binding for the Rust core library, providing document extraction, text extraction, and OCR capabilities.

Defined Under Namespace

Modules: APIProxy, CLI, CLIProxy, CacheAPI, Config, ErrorContext, Errors, ExtractionAPI, KeywordAlgorithm, MCPProxy, OcrBackendProtocol, PostProcessorProtocol, SetupLibPath, ValidatorProtocol

Classes: BoundingBox, DocumentAnnotation, DocumentBoundingBox, DocumentNode, DocumentStructure, Element, ElementMetadata, ExtractedKeyword, HeaderMetadata, HtmlMetadata, ImageMetadata, LinkMetadata, PdfAnnotation, PdfAnnotationBoundingBox, ProcessingWarning, Result, StructuredData

Constant Summary

ExtractionConfig = Config::Extraction
PageConfig = Config::PageConfig
ElementType =

Semantic element type classification.

Categorizes text content into semantic units for downstream processing. Supports the element types commonly found in Unstructured documents.

Examples:

type = Kreuzberg::ElementType::TITLE

T.type_alias do
  T.any(
    'title',
    'narrative_text',
    'heading',
    'list_item',
    'table',
    'image',
    'page_break',
    'code_block',
    'block_quote',
    'footer',
    'header'
  )
end
ERROR_CODE_SUCCESS = 0
ERROR_CODE_GENERIC = 1
ERROR_CODE_PANIC = 2
ERROR_CODE_INVALID_ARGUMENT = 3
ERROR_CODE_IO = 4
ERROR_CODE_PARSING = 5
ERROR_CODE_OCR = 6
ERROR_CODE_MISSING_DEPENDENCY = 7
VERSION = '4.3.7'
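The `ElementType` alias above enumerates eleven element type strings. A minimal sketch of how downstream code might validate and filter on those strings follows. It is hypothetical and not part of the Kreuzberg API: `ELEMENT_TYPES`, `valid_element_type?`, and `select_by_type` are illustrative names, the strings are copied from the alias so the snippet runs without the gem, and each element is assumed to be a simple `[type, text]` pair rather than the real `Element` class.

```ruby
# Hypothetical helper, not part of the Kreuzberg API.
# The strings below are copied from the ElementType alias documented above.
ELEMENT_TYPES = %w[
  title narrative_text heading list_item table image
  page_break code_block block_quote footer header
].freeze

# Returns true when +type+ is one of the documented element type strings.
def valid_element_type?(type)
  ELEMENT_TYPES.include?(type)
end

# Keeps only the pairs whose type is in +wanted+ (e.g. titles and headings).
# Each pair is assumed to be [type, text]; this shape is an illustration only.
def select_by_type(pairs, wanted)
  wanted.each do |t|
    raise ArgumentError, "unknown element type: #{t}" unless valid_element_type?(t)
  end
  pairs.select { |type, _text| wanted.include?(type) }
end

elements = [
  ['title', 'Annual Report'],
  ['narrative_text', 'Revenue grew this year.'],
  ['heading', 'Outlook'],
  ['footer', 'Page 1']
]

p select_by_type(elements, %w[title heading])
# prints [["title", "Annual Report"], ["heading", "Outlook"]]
```

Rejecting unknown type names up front catches typos such as `'paragraph'` instead of silently returning an empty selection.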
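The `ERROR_CODE_*` constants are plain integers from 0 to 7. Wherever one of these codes is surfaced, a small lookup table can turn it back into a readable name. This is a sketch under assumptions: `ERROR_CODE_NAMES` and `error_code_name` are hypothetical, and the values are copied from the constants above so the snippet runs without the gem installed; code that loads the gem would reference `Kreuzberg::ERROR_CODE_*` directly.

```ruby
# Hypothetical helper, not part of the Kreuzberg API.
# Values mirror the ERROR_CODE_* constants documented above.
ERROR_CODE_NAMES = {
  0 => :success,
  1 => :generic,
  2 => :panic,
  3 => :invalid_argument,
  4 => :io,
  5 => :parsing,
  6 => :ocr,
  7 => :missing_dependency
}.freeze

# Translates a numeric code into a readable symbol; unknown codes map to :unknown.
def error_code_name(code)
  ERROR_CODE_NAMES.fetch(code, :unknown)
end

puts error_code_name(6)   # prints "ocr"
puts error_code_name(42)  # prints "unknown"
```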

Class Method Summary

Class Method Details

.clear_post_processors ⇒ Object

.clear_validators ⇒ Object

.detect_mime_type ⇒ Object

.detect_mime_type_from_path ⇒ Object

.get_extensions_for_mime ⇒ Object

.list_ocr_backends ⇒ Object

.list_post_processors ⇒ Object

.list_validators ⇒ Object

.register_ocr_backend ⇒ Object

.register_post_processor ⇒ Object

.register_validator ⇒ Object

.unregister_ocr_backend ⇒ Object

.unregister_post_processor ⇒ Object

.unregister_validator ⇒ Object

.validate_mime_type ⇒ Object
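The `register_*`, `unregister_*`, `list_*`, and `clear_*` class methods above suggest a named-registry lifecycle for OCR backends, post-processors, and validators. Their parameters are not documented on this page, so the sketch below does not call them. Instead it is a hypothetical in-memory `PluginRegistry` that models the assumed semantics (register by name, unregister by name, list names, clear all). The class, its method names, and the replace-on-reregister behaviour are all assumptions for illustration; consult the gem's own documentation for the real signatures.

```ruby
# Hypothetical model of a register/unregister/list/clear lifecycle.
# This is NOT Kreuzberg's implementation; names and semantics are assumptions.
class PluginRegistry
  def initialize
    @plugins = {}
  end

  # Stores +plugin+ under +name+; re-registering a name replaces the old entry.
  def register(name, plugin)
    raise ArgumentError, 'name must not be empty' if name.to_s.empty?

    @plugins[name.to_s] = plugin
  end

  # Removes the plugin registered under +name+; returns it, or nil if absent.
  def unregister(name)
    @plugins.delete(name.to_s)
  end

  # Returns the registered names in insertion order.
  def list
    @plugins.keys
  end

  # Removes every registered plugin.
  def clear
    @plugins.clear
  end
end

post_processors = PluginRegistry.new
post_processors.register('strip_whitespace', ->(text) { text.strip })
post_processors.register('upcase', ->(text) { text.upcase })
p post_processors.list          # prints ["strip_whitespace", "upcase"]
post_processors.unregister('upcase')
p post_processors.list          # prints ["strip_whitespace"]
post_processors.clear
p post_processors.list          # prints []
```

Keying plugins by name keeps `unregister_*` and `list_*` cheap and makes re-registration idempotent in this model.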