Module: Pdfsink

Defined in:
lib/pdfsink.rb,
lib/pdfsink/cli.rb,
lib/pdfsink/page.rb,
lib/pdfsink/error.rb,
lib/pdfsink/railtie.rb,
lib/pdfsink/version.rb,
lib/pdfsink/document.rb,
lib/pdfsink/table_strategy.rb

Overview

Pdfsink wraps the pdfsink-rs CLI, a fast pure-Rust PDF extraction tool, exposing text, word, object, table, link, and search extraction to Ruby.

Examples:

Open a document and read a page

doc = Pdfsink.open("report.pdf")
doc.page_count            # => 12
doc.page(1).extract_text  # => "Quarterly Report\n..."

One-shot text extraction

Pdfsink.extract_text("report.pdf", page: 1)

Tables

Pdfsink.open("invoice.pdf").page(1).tables(strategy: :text)

Defined Under Namespace

Modules: Cli, TableStrategy
Classes: BinaryNotFoundError, CommandError, Configuration, Document, Error, Page, ParseError, Railtie

Constant Summary

VERSION = "0.1.0"

PDFSINK_RS_VERSION = "0.2.8"

    Version of the pdfsink-rs crate this gem builds and wraps.

Class Method Summary

Class Method Details

.configuration ⇒ Configuration

Returns the global configuration object, creating it on first access.

Returns:

  • (Configuration)

# File 'lib/pdfsink.rb', line 41

def configuration
  @configuration ||= Configuration.new
end

.configure {|configuration| ... } ⇒ Object

Yields the configuration object for modification.

Examples:

Pdfsink.configure do |config|
  config.default_table_strategy = :text
end

Yields:

  • configuration (Configuration)

    the object returned by Pdfsink.configuration

# File 'lib/pdfsink.rb', line 51

def configure
  yield(configuration)
end
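
The two methods above follow a common Ruby pattern. `configuration` uses `||=` to build one memoized object lazily, and `configure` yields that same object so a block can set options on it. The sketch below reproduces the pattern in a standalone module for illustration only. `ConfigDemo` and its `Configuration` struct are hypothetical stand-ins, not pdfsink's own classes, and nothing here reflects the gem's actual defaults.

```ruby
# Standalone sketch of the memoized-configuration pattern used by
# Pdfsink.configuration / Pdfsink.configure. ConfigDemo is hypothetical.
module ConfigDemo
  # Hypothetical settings holder; the real gem defines its own Configuration.
  Configuration = Struct.new(:default_table_strategy)

  class << self
    # Built on first access; every later call returns the same instance.
    def configuration
      @configuration ||= Configuration.new
    end

    # Yields the shared instance, so assignments made in the block persist.
    def configure
      yield(configuration)
    end
  end
end

ConfigDemo.configure do |config|
  config.default_table_strategy = :text
end

ConfigDemo.configuration.default_table_strategy            # => :text
ConfigDemo.configuration.equal?(ConfigDemo.configuration)  # => true
```

Because the block mutates the shared instance instead of returning a new one, any later call to `configuration` sees the settings. `||=` is not an atomic operation, so configuring once at startup is the safer habit if several threads might touch the configuration first.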

.extract_text(path, page: 1) ⇒ String

Extract the text of a single page in one call.

Parameters:

  • path (String)

    path to a PDF file; a relative path is expanded with File.expand_path

  • page (Integer) (defaults to: 1)

    1-based page number

Returns:

  • (String)

# File 'lib/pdfsink.rb', line 70

def extract_text(path, page: 1)
  Cli.text(File.expand_path(path), page)
end
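
`extract_text` takes a 1-based page number, so a document's pages run from 1 to `page_count`. The sketch below collects the text of every page. It only assumes an object that responds to `page_count` and `page(n)` in the way `Document` does in the Overview examples. `all_page_text`, the form-feed separator, and the `FakePage`/`FakeDocument` stubs are all hypothetical. The stubs exist only so the snippet runs without the gem or the pdfsink-rs binary.

```ruby
# Hypothetical helper: join the text of every page of a Document-like object.
# Relies only on #page_count and #page(n) with 1-based n, as in the Overview.
def all_page_text(doc, separator: "\f")
  (1..doc.page_count).map { |n| doc.page(n).extract_text }.join(separator)
end

# Hypothetical stand-ins so the sketch runs without pdfsink installed.
FakePage = Struct.new(:text) do
  def extract_text
    text
  end
end

FakeDocument = Struct.new(:pages) do
  def page_count
    pages.size
  end

  # 1-based, matching Pdfsink's page numbering; page 0 is out of range.
  def page(number)
    raise IndexError, "page #{number} out of range" unless number.between?(1, page_count)

    pages[number - 1]
  end
end

fake = FakeDocument.new([FakePage.new("Page one"), FakePage.new("Page two")])
all_page_text(fake) # => "Page one\fPage two"
```

With the gem installed, the same helper should accept `Pdfsink.open("report.pdf")` directly, since it calls only the methods shown in the Overview.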

.open(path) ⇒ Document

Open a PDF document.

Parameters:

  • path (String)

    path to a PDF file

Returns:

  • (Document)

# File 'lib/pdfsink.rb', line 61

def open(path)
  Document.open(path)
end
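
Extraction is delegated to the external pdfsink-rs binary. A caller who wants to reject obviously wrong inputs before calling `open` could check the file first. The sketch below shows one such hypothetical pre-flight check, `looks_like_pdf?`, which is not part of the gem. It looks for the `%PDF-` header that PDF files begin with. It scans the first 1024 bytes rather than only the start of the file, because some readers tolerate a few leading bytes before the header.

```ruby
require "tempfile"

# Hypothetical pre-flight check; not part of the pdfsink API.
# True when `path` is a regular file whose first 1024 bytes contain "%PDF-".
def looks_like_pdf?(path)
  return false unless File.file?(path)

  head = File.binread(path, 1024) || "" # binread returns nil for an empty file
  head.include?("%PDF-")
end

# Demo on a throwaway file containing only a PDF header line.
Tempfile.create(["demo", ".pdf"]) do |file|
  file.binmode
  file.write("%PDF-1.7\n")
  file.flush
  puts looks_like_pdf?(file.path) # prints "true"
end
```

With the gem loaded, a caller might guard the call as `Pdfsink.open(path) if looks_like_pdf?(path)`. A header check does not prove the file is well-formed, so errors from the binary still need handling.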

.version ⇒ String

The version of the underlying pdfsink-rs binary the gem was built with.

Returns:

  • (String)

    e.g. “0.2.8”

# File 'lib/pdfsink.rb', line 77

def version
  Cli.version
end
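
The gem records the crate version it was built against in `PDFSINK_RS_VERSION`, while `Pdfsink.version` delegates to `Cli.version`. The sketch below compares the two as a startup sanity check. It uses `Gem::Version`, which compares segment by segment rather than as plain strings. `rs_version_mismatch?` is a hypothetical helper. The sketch also assumes that `Pdfsink.version` returns a bare version string such as "0.2.8", as documented above.

```ruby
require "rubygems" # provides Gem::Version (loaded by default in standard Ruby)

# Hypothetical helper: true when the version reported at runtime differs from
# the version the gem was built against. Gem::Version compares numerically
# per segment, so "0.2.10" counts as newer than "0.2.8".
def rs_version_mismatch?(reported, expected)
  Gem::Version.new(reported) != Gem::Version.new(expected)
end

rs_version_mismatch?("0.2.8", "0.2.8")                 # => false
rs_version_mismatch?("0.2.10", "0.2.8")                # => true
Gem::Version.new("0.2.10") > Gem::Version.new("0.2.8") # => true

# With the gem loaded (not executed here):
#   warn "pdfsink-rs version mismatch" if rs_version_mismatch?(Pdfsink.version, Pdfsink::PDFSINK_RS_VERSION)
```

`Gem::Version.new` raises `ArgumentError` on strings that are not version numbers. If the binary's output ever included extra text, it would need trimming before the comparison.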