Module: LiteParse

Defined in:
lib/liteparse/types.rb,
lib/liteparse.rb,
lib/liteparse/cli.rb,
lib/liteparse/parser.rb,
lib/liteparse/version.rb

Overview

All types (TextItem, ParsedPage, ParseResult, etc.) are defined natively in the Rust extension. This file re-exports them for convenience.

Defined Under Namespace

Modules: CLI

Classes: Config, Error, ExtractedImage, LiteParse, ParseResult, ParsedPage, ScreenshotResult, TextItem

Constant Summary

VERSION = "0.1.14"

Current version of liteparse-rb.
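VERSION is a plain String, so comparing it lexically gives the wrong answer once a segment reaches two digits: as strings, "0.1.14" sorts before "0.1.9". The following is a minimal sketch of a runtime minimum-version check using RubyGems' Gem::Version. The helper name liteparse_at_least? is hypothetical and not part of liteparse-rb.

```ruby
require "rubygems" # provides Gem::Version (already loaded in most Ruby setups)

# Hypothetical helper: true if `current` (e.g. LiteParse::VERSION) is at least `minimum`.
# Gem::Version compares each numeric segment as a number, so "0.1.14" > "0.1.9",
# whereas plain String comparison would report the opposite.
def liteparse_at_least?(current, minimum)
  Gem::Version.new(current) >= Gem::Version.new(minimum)
end
```

Typical use would be `liteparse_at_least?(LiteParse::VERSION, "0.1.14")` before relying on behaviour introduced in a given release.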

Instance Method Summary

Instance Method Details

#config ⇒ LiteParse::Config

Get the current parser configuration.

Returns:

  • (LiteParse::Config)

    The current parser configuration.

# File 'lib/liteparse/parser.rb', line 32

#parse(input) ⇒ LiteParse::ParseResult

Parse a document from a file path.

Examples:

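require "liteparse"
# `parser` is an existing parser object (its construction is not shown on this page)
result = parser.parse("report.pdf")
result.pages.each { |page| puts page.text }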

Parameters:

  • input (String)

    Path to the document file (.pdf, .docx, .pptx, .xlsx, .html, image files, etc.)

Returns:

  • (LiteParse::ParseResult)

Raises:

  • (RuntimeError)

    If parsing fails



# File 'lib/liteparse/parser.rb', line 14
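Because #parse raises RuntimeError when parsing fails, a batch job that should keep going past a bad file needs to rescue per document. The following is a minimal sketch under that assumption. parse_all is a hypothetical helper, and `parser` can be any object that responds to #parse(path), such as a LiteParse parser instance.

```ruby
# Hypothetical helper: parse every path with `parser`, collecting per-file
# failures instead of stopping at the first RuntimeError.
# `parser` is any object responding to #parse(path), e.g. a LiteParse parser;
# how that object is constructed is not covered here.
def parse_all(parser, paths)
  results = {}
  failures = {}
  paths.each do |path|
    begin
      results[path] = parser.parse(path)
    rescue RuntimeError => e
      failures[path] = e.message # keep the error text for later reporting
    end
  end
  [results, failures]
end
```

Rescuing only the documented RuntimeError, rather than StandardError, lets unrelated bugs (for example a NoMethodError from a typo) surface instead of being silently recorded as parse failures. With a real parser, each value in `results` is a LiteParse::ParseResult whose `pages` can be walked as in the #parse example.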

#parse_bytes(data) ⇒ LiteParse::ParseResult

Parse a document from raw bytes.

Examples:

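# `parser` is an existing parser object (its construction is not shown on this page)
data = File.binread("report.pdf") # File.binread returns an ASCII-8BIT (binary) String
result = parser.parse_bytes(data)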

Parameters:

  • data (String)

    Raw document bytes (binary string)

Returns:

  • (LiteParse::ParseResult)

Raises:

  • (RuntimeError)

    If parsing fails



# File 'lib/liteparse/parser.rb', line 23
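#parse_bytes expects raw document bytes as a binary String. When the document does not come from a file on disk (an upload, a StringIO, a socket), it should be read in binary mode so that no text decoding or newline conversion touches the bytes. The following is a minimal sketch. read_document_bytes is a hypothetical helper and not part of liteparse-rb.

```ruby
require "stringio"

# Hypothetical helper: read an entire document from an IO-like object
# (File, StringIO, socket, ...) as a binary String suitable for #parse_bytes.
def read_document_bytes(io)
  io.binmode if io.respond_to?(:binmode) # disable text-mode conversions
  data = io.read                         # read everything remaining ("" at EOF)
  data.b                                 # copy tagged ASCII-8BIT, bytes unchanged
end
```

Usage would then be `result = parser.parse_bytes(read_document_bytes(io))`. The final `.b` guarantees a binary encoding even for IO-like objects that do not implement #binmode.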