Top Level Namespace

Defined Under Namespace

Modules: LiteParse

Class Method Details

.new(**kwargs) ⇒ LiteParse::LiteParse

Create a new LiteParse parser instance.

Examples:

parser = LiteParse::LiteParse.new(ocr_enabled: true, dpi: 200)

Parameters:

  • kwargs (Hash)

    Keyword arguments for parser configuration

Options Hash (**kwargs):

  • :ocr_language (String) — default: "eng"

    Language for OCR

  • :ocr_enabled (Boolean) — default: true

    Enable OCR

  • :ocr_server_url (String, nil)

    URL of an external OCR server

  • :ocr_server_headers (Hash<String, String>, nil)

    Headers for OCR server requests

  • :tessdata_path (String, nil)

    Path to Tesseract tessdata directory

  • :max_pages (Integer) — default: 1000

    Maximum pages to parse

  • :target_pages (String, nil)

    Page range expression (e.g. "1-5,7")

  • :dpi (Float) — default: 150.0

    Rendering DPI

  • :output_format (String) — default: "json"

    Output format: "json", "text", or "markdown"

  • :preserve_very_small_text (Boolean) — default: false

    Preserve tiny text

  • :password (String, nil)

    Password for encrypted documents

  • :quiet (Boolean) — default: false

    Suppress non-error output

  • :num_workers (Integer)

    Number of worker threads; auto-detected when not set

  • :image_mode (String) — default: "placeholder"

    Image mode: "placeholder", "embed", or "off"

  • :extract_links (Boolean) — default: false

    Extract hyperlinks

Returns:

  • (LiteParse::LiteParse)

    The new parser instance

# File 'lib/liteparse.rb', line 12
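
The :target_pages option takes a page range expression such as "1-5,7". The sketch below shows one way to build that string from a list of page numbers. The helper name page_range_expression is hypothetical and not part of LiteParse, and the sketch assumes the expression syntax is limited to comma-separated pages and inclusive hyphenated ranges, as in the documented example.

```ruby
# Hypothetical helper, not part of LiteParse: collapse page numbers into a
# range expression such as "1-5,7" for the :target_pages option.
# Assumes the syntax is comma-separated pages and inclusive "a-b" ranges.
def page_range_expression(pages)
  pages.uniq.sort
       .slice_when { |prev, curr| curr != prev + 1 } # split at gaps
       .map { |run| run.size > 1 ? "#{run.first}-#{run.last}" : run.first.to_s }
       .join(",")
end

options = {
  target_pages: page_range_expression([1, 2, 3, 4, 5, 7]),
  dpi: 200.0
}

# Only construct the parser when the gem is loaded.
parser = LiteParse::LiteParse.new(**options) if defined?(LiteParse::LiteParse)
```

Building the hash first and splatting it into .new keeps the configuration easy to inspect or reuse across parser instances.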

Instance Method Details

#screenshot(input, page_numbers: nil) ⇒ Array<LiteParse::ScreenshotResult>

Also known as: native_screenshot

Take screenshots of document pages.

Examples:

parser = LiteParse::LiteParse.new
screenshots = parser.screenshot("document.pdf", page_numbers: [1, 3])
screenshots.each { |s| File.write("page_#{s.page_num}.png", s.image_bytes) }

Parameters:

  • input (String)

    Path to the document file

  • page_numbers (Array<Integer>, nil) (defaults to: nil)

    Specific page numbers (1-indexed) to screenshot. Pass nil to screenshot all pages.

Returns:

  • (Array<LiteParse::ScreenshotResult>)

    Screenshot results for the requested pages

# File 'lib/liteparse.rb', line 34
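
When saving screenshots to disk, a binary-safe write avoids newline translation on platforms that distinguish text and binary modes. The sketch below is one way to do that. The helper save_screenshots and the FakeShot stand-in are hypothetical and not part of LiteParse. The sketch relies only on the page_num and image_bytes readers used in the example above, and assumes the bytes are PNG data, as that example's file name suggests.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper, not part of LiteParse: write each screenshot to dir
# as page_<n>.png and return the written paths.
def save_screenshots(screenshots, dir)
  FileUtils.mkdir_p(dir)
  screenshots.map do |shot|
    # Zero-pad the page number so the files sort in page order.
    path = File.join(dir, format("page_%03d.png", shot.page_num))
    File.binwrite(path, shot.image_bytes) # binary-safe write
    path
  end
end

# Stand-in for LiteParse::ScreenshotResult so the sketch runs without the gem.
FakeShot = Struct.new(:page_num, :image_bytes)

Dir.mktmpdir do |dir|
  shots = [FakeShot.new(1, "\x89PNG".b), FakeShot.new(3, "\x89PNG".b)]
  puts save_screenshots(shots, dir).map { |p| File.basename(p) }
end
```

With a real parser, the same helper could be called as save_screenshots(parser.screenshot("document.pdf", page_numbers: [1, 3]), "out").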