Class: Pdfsink::Document

Inherits:
Object
Defined in:
lib/pdfsink/document.rb

Overview

A PDF document opened from a file on disk.

Opening is cheap: only the path is validated up front. The document’s info payload (page count and per-page metadata) is fetched lazily on first access, and Page objects are created on demand and memoized.

Examples:

doc = Pdfsink::Document.open("report.pdf")
doc.page_count          # => 12
doc.page(1).extract_text
doc.pages.flat_map(&:extract_words)
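
The lazy fetch and memoization described in the overview can be sketched without the real library. In this minimal sketch, `FakeCli` and `LazyDocument` are hypothetical stand-ins (not part of Pdfsink), and the payload shape is assumed from the `"page_count"` and `"pages"` keys used in the listings on this page.

```ruby
# Hypothetical stand-in for the Cli module; it counts how often info is fetched.
module FakeCli
  @calls = 0

  class << self
    attr_reader :calls

    # Pretends to query the external tool and returns an info payload.
    def info(_path)
      @calls += 1
      { "page_count" => 2, "pages" => [{ "number" => 1 }, { "number" => 2 }] }
    end
  end
end

# Hypothetical stand-in for Pdfsink::Document, reduced to the lazy parts.
class LazyDocument
  def initialize(path)
    @path = path # nothing is fetched here, so construction stays cheap
  end

  # Fetched on first access, then memoized with ||=.
  def info
    @info ||= FakeCli.info(@path)
  end

  def page_count
    info["page_count"]
  end
end

doc = LazyDocument.new("report.pdf")
calls_after_open = FakeCli.calls  # no fetch has happened yet
doc.page_count
doc.page_count
calls_after_reads = FakeCli.calls # only the first read triggered a fetch
```

One consequence of this pattern is that the first call to any method that touches `info` pays the full fetch cost, while later calls reuse the cached hash.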

Constructor Details

#initialize(path) ⇒ Document

Returns a new instance of Document.

Parameters:

  • path (String)

    path to a PDF file

Raises:

  • (Errno::ENOENT)

    if the file does not exist


# File 'lib/pdfsink/document.rb', line 29

def initialize(path)
  @path = File.expand_path(path)
  raise Errno::ENOENT, @path unless File.exist?(@path)

  @pages = {}
end
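
Because the constructor raises `Errno::ENOENT` eagerly, callers can handle a missing file at open time instead of at first page access. The sketch below copies the constructor body into a hypothetical stand-in class (`PathCheckedDocument`) so it runs without the library; `open_or_nil` is likewise an illustrative helper, not part of Pdfsink.

```ruby
require "tempfile"

# Hypothetical stand-in with the same path handling as the constructor above.
class PathCheckedDocument
  attr_reader :path

  def initialize(path)
    @path = File.expand_path(path)
    raise Errno::ENOENT, @path unless File.exist?(@path)
  end
end

# Illustrative helper: returns nil instead of raising when the file is missing.
def open_or_nil(path)
  PathCheckedDocument.new(path)
rescue Errno::ENOENT => e
  warn "skipping: #{e.message}"
  nil
end

missing = open_or_nil("definitely-missing-file.pdf")

# A temporary file stands in for a real PDF; only existence is checked here.
Tempfile.create(["sample", ".pdf"]) do |file|
  doc = open_or_nil(file.path)
  puts doc.path # the expanded, absolute path
end
```

Note that `File.exist?` is also true for directories, so this check alone does not prove the path points at a regular file, let alone a valid PDF.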

Instance Attribute Details

#path ⇒ String (readonly)

Returns absolute path to the PDF file.

Returns:

  • (String)

    absolute path to the PDF file



# File 'lib/pdfsink/document.rb', line 17

def path
  @path
end

Class Method Details

.open(path) ⇒ Document

Open a PDF document.

Parameters:

  • path (String)

    path to a PDF file

Returns:

  • (Document)

Raises:

  • (Errno::ENOENT)

    if the file does not exist



# File 'lib/pdfsink/document.rb', line 24

def self.open(path)
  new(path)
end

Instance Method Details

#each_page {|page| ... } ⇒ Enumerator

Iterate over each page.

Yield Parameters:

  • page (Page)

Returns:

  • (Enumerator)

    if no block is given



# File 'lib/pdfsink/document.rb', line 74

def each_page(&block)
  return enum_for(:each_page) unless block

  pages.each(&block)
end
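
The `enum_for` guard is what lets `each_page` be chained when called without a block. A minimal sketch of that behavior, using a hypothetical stand-in class (`PagedThing`) whose pages are plain strings rather than real Page objects:

```ruby
# Hypothetical stand-in with the same each_page body as the listing above.
class PagedThing
  attr_reader :pages

  def initialize(pages)
    @pages = pages
  end

  def each_page(&block)
    return enum_for(:each_page) unless block

    pages.each(&block)
  end
end

doc = PagedThing.new(%w[cover intro body])

# Block form: yields each page in order.
doc.each_page { |page| puts page }

# Blockless form: the Enumerator composes with other Enumerable helpers.
numbered  = doc.each_page.with_index(1).map { |page, n| "#{n}: #{page}" }
first_two = doc.each_page.first(2)
```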

#info ⇒ Hash

Document and per-page metadata as returned by the binary.

Returns:

  • (Hash)


# File 'lib/pdfsink/document.rb', line 39

def info
  @info ||= Cli.info(path)
end

#inspect ⇒ Object



# File 'lib/pdfsink/document.rb', line 80

def inspect
  "#<Pdfsink::Document path=#{path.inspect} pages=#{page_count}>"
end

#page(number) ⇒ Page

Fetch a single page.

Parameters:

  • number (Integer)

    1-based page number

Returns:

  • (Page)

Raises:

  • (RangeError)

    if the page number is out of range



# File 'lib/pdfsink/document.rb', line 55

def page(number)
  unless number.is_a?(Integer) && number >= 1 && number <= page_count
    raise RangeError, "page #{number} out of range (1..#{page_count})"
  end

  @pages[number] ||= Page.new(self, number, info["pages"][number - 1])
end
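
The guard in `#page` rejects non-Integer arguments as well as out-of-range ones, and the `||=` on the hash means repeated lookups return the same object. The sketch below mirrors that logic in a hypothetical stand-in (`StubDocument`, with `StubPage` replacing the real Page class) so it can run on its own.

```ruby
# Hypothetical replacement for the real Page class.
StubPage = Struct.new(:number)

# Hypothetical stand-in mirroring the guard and memoization in #page.
class StubDocument
  attr_reader :page_count

  def initialize(page_count)
    @page_count = page_count
    @pages = {}
  end

  def page(number)
    unless number.is_a?(Integer) && number >= 1 && number <= page_count
      raise RangeError, "page #{number} out of range (1..#{page_count})"
    end

    @pages[number] ||= StubPage.new(number)
  end
end

doc = StubDocument.new(3)
same_object = doc.page(1).equal?(doc.page(1)) # memoized: one object per page number

# Zero, past-the-end, String, and Float arguments all fail the guard.
[0, 4, "2", 1.0].each do |bad|
  begin
    doc.page(bad)
  rescue RangeError => e
    puts e.message
  end
end
```

Callers holding page numbers parsed from user input would need to convert them with `Integer(...)` first, since `"2"` and `1.0` are rejected rather than coerced.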

#page_count ⇒ Integer

Also known as: length, size

Returns number of pages.

Returns:

  • (Integer)

    number of pages



# File 'lib/pdfsink/document.rb', line 44

def page_count
  info["page_count"]
end

#pages ⇒ Array<Page>

All pages, in order.

Returns:

  • (Array<Page>)



# File 'lib/pdfsink/document.rb', line 66

def pages
  (1..page_count).map { |n| page(n) }
end