Module: Kreuzberg
- Defined in:
- lib/kreuzberg.rb,
lib/kreuzberg/cli.rb,
lib/kreuzberg/types.rb,
lib/kreuzberg/config.rb,
lib/kreuzberg/errors.rb,
lib/kreuzberg/result.rb,
lib/kreuzberg/version.rb,
lib/kreuzberg/api_proxy.rb,
lib/kreuzberg/cache_api.rb,
lib/kreuzberg/cli_proxy.rb,
lib/kreuzberg/mcp_proxy.rb,
lib/kreuzberg/djot_content.rb,
lib/kreuzberg/error_context.rb,
lib/kreuzberg/extraction_api.rb,
lib/kreuzberg/setup_lib_path.rb,
lib/kreuzberg/document_structure.rb,
lib/kreuzberg/validator_protocol.rb,
lib/kreuzberg/ocr_backend_protocol.rb,
lib/kreuzberg/post_processor_protocol.rb
Overview
Kreuzberg is a Ruby binding for the Rust core library providing document extraction, text extraction, and OCR capabilities.
Defined Under Namespace
Modules: APIProxy, CLI, CLIProxy, CacheAPI, Config, ErrorContext, Errors, ExtractionAPI, KeywordAlgorithm, MCPProxy, OcrBackendProtocol, PostProcessorProtocol, SetupLibPath, ValidatorProtocol
Classes: BoundingBox, DocumentAnnotation, DocumentBoundingBox, DocumentNode, DocumentStructure, Element, ElementMetadata, ExtractedKeyword, HeaderMetadata, HtmlMetadata, ImageMetadata, LinkMetadata, ProcessingWarning, Result, StructuredData
Constant Summary
- ExtractionConfig =
Config::Extraction
- PageConfig =
Config::PageConfig
- ElementType =
Semantic element type classification.
Categorizes text content into semantic units for downstream processing. Supports the element types commonly found in Unstructured documents.
T.type_alias do T.any( 'title', 'narrative_text', 'heading', 'list_item', 'table', 'image', 'page_break', 'code_block', 'block_quote', 'footer', 'header' ) end
- ERROR_CODE_SUCCESS =
0
- ERROR_CODE_GENERIC =
1
- ERROR_CODE_PANIC =
2
- ERROR_CODE_INVALID_ARGUMENT =
3
- ERROR_CODE_IO =
4
- ERROR_CODE_PARSING =
5
- ERROR_CODE_OCR =
6
- ERROR_CODE_MISSING_DEPENDENCY =
7
- VERSION =
'4.3.5'
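The error codes and element type names listed above lend themselves to simple lookups. The following is a minimal sketch, not part of Kreuzberg itself: it mirrors the documented constant values in a stand-alone module so it runs without the gem installed. The module name `KreuzbergConstantsSketch`, the `CODE_NAMES` and `ELEMENT_TYPES` tables, and the helpers `error_name` and `element_type?` are all hypothetical names introduced for illustration.

```ruby
# Hypothetical stand-alone mirror of the constants documented above.
# In real code these values would be read from the Kreuzberg module.
module KreuzbergConstantsSketch
  ERROR_CODE_SUCCESS = 0
  ERROR_CODE_GENERIC = 1
  ERROR_CODE_PANIC = 2
  ERROR_CODE_INVALID_ARGUMENT = 3
  ERROR_CODE_IO = 4
  ERROR_CODE_PARSING = 5
  ERROR_CODE_OCR = 6
  ERROR_CODE_MISSING_DEPENDENCY = 7

  # The string literals accepted by the ElementType alias, as a plain array.
  ELEMENT_TYPES = %w[
    title narrative_text heading list_item table image
    page_break code_block block_quote footer header
  ].freeze

  # Build an integer => short-name table from the ERROR_CODE_* constants
  # defined so far in this module (assumed helper, not a Kreuzberg API).
  CODE_NAMES = constants
    .select { |name| name.to_s.start_with?('ERROR_CODE_') }
    .to_h { |name| [const_get(name), name.to_s.delete_prefix('ERROR_CODE_').downcase] }
    .freeze

  # Translate a numeric error code into a readable name.
  def self.error_name(code)
    CODE_NAMES.fetch(code, 'unknown')
  end

  # Check whether a string is one of the documented element types.
  def self.element_type?(value)
    ELEMENT_TYPES.include?(value)
  end
end

puts KreuzbergConstantsSketch.error_name(6)
puts KreuzbergConstantsSketch.element_type?('code_block')
```

A lookup table of this kind keeps log messages readable when only the numeric code is available; unknown codes fall back to `'unknown'` rather than raising.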
Class Method Summary
- .clear_post_processors ⇒ Object
- .clear_validators ⇒ Object
- .detect_mime_type ⇒ Object
- .detect_mime_type_from_path ⇒ Object
- .get_extensions_for_mime ⇒ Object
- .list_ocr_backends ⇒ Object
- .list_post_processors ⇒ Object
- .list_validators ⇒ Object
- .register_ocr_backend ⇒ Object
- .register_post_processor ⇒ Object
- .register_validator ⇒ Object
- .unregister_ocr_backend ⇒ Object
- .unregister_post_processor ⇒ Object
- .unregister_validator ⇒ Object
- .validate_mime_type ⇒ Object