Class: Pdfsink::Page
- Inherits: Object
  - Object
  - Pdfsink::Page
- Defined in: lib/pdfsink/page.rb
Overview
A single page of a Document.
Each accessor shells out to the pdfsink-rs binary for that page. #extract_text, #extract_words, #links and #objects memoize their result, so repeated reads don’t re-spawn the process; #search and #tables run the binary on every call. Page-level metadata (dimensions, rotation, bbox, object counts) comes from the document’s info payload and needs no extra spawn.
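The caching mentioned above is plain `||=` memoization, as the method sources further down show. The sketch below reproduces the pattern with hypothetical stand-ins (`FakeCli` and `SketchPage` are illustration names, not part of Pdfsink) so that it runs without the pdfsink-rs binary. One caveat that applies to any `||=` cache: a `nil` or `false` result is not remembered, so such a call would run again.

```ruby
# Hypothetical stand-in for the process-spawning CLI wrapper; it only
# counts how many times it was asked for text.
module FakeCli
  @calls = 0

  class << self
    attr_reader :calls

    def text(path, number)
      @calls += 1
      "text of #{path} page #{number}"
    end
  end
end

# Illustration class, not Pdfsink::Page itself.
class SketchPage
  def initialize(path, number)
    @path = path
    @number = number
  end

  # The first call does the expensive work; later calls reuse the ivar.
  def extract_text
    @extract_text ||= FakeCli.text(@path, @number)
  end
end

page = SketchPage.new("report.pdf", 1)
first = page.extract_text
second = page.extract_text
puts FakeCli.calls        # number of times the stand-in was invoked
puts first.equal?(second) # whether both reads return the same String object
```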
Instance Attribute Summary
-
#number ⇒ Integer
readonly
1-based page number.
Instance Method Summary
-
#bbox ⇒ Hash
The page bounding box (“top”, “x1”, “bottom”).
-
#extract_text ⇒ String
The page’s text in reading order.
-
#extract_words ⇒ Array<Hash>
Words with positions and font metadata.
-
#height ⇒ Float
Page height in PDF points.
-
#initialize(document, number, meta) ⇒ Page
constructor
A new instance of Page.
- #inspect ⇒ Object
-
#links ⇒ Array<Hash>
Hyperlinks on the page.
-
#object_counts ⇒ Hash
Counts of each object kind on the page.
-
#objects ⇒ Hash
Every page object (chars, lines, rects, curves, images, annots, …).
-
#rotation ⇒ Integer
Clockwise rotation in degrees (0, 90, 180, 270).
-
#search(pattern) ⇒ Array<Hash>
Regex search matches within the page text.
-
#tables(strategy: nil) ⇒ Array<Array>?
The page’s largest detected table, or nil if none is found.
-
#width ⇒ Float
Page width in PDF points.
Constructor Details
#initialize(document, number, meta) ⇒ Page
Returns a new instance of Page.
# File 'lib/pdfsink/page.rb', line 24
def initialize(document, number, meta)
  @document = document
  @number = number
  @meta = meta
end
Instance Attribute Details
#number ⇒ Integer (readonly)
Returns 1-based page number.
# File 'lib/pdfsink/page.rb', line 19
def number
  @number
end
Instance Method Details
#bbox ⇒ Hash
Returns the page bounding box (“top”, “x1”, “bottom”).
# File 'lib/pdfsink/page.rb', line 40
def bbox = @meta["bbox"]
#extract_text ⇒ String
The page’s text in reading order.
# File 'lib/pdfsink/page.rb', line 48
def extract_text
  @extract_text ||= Cli.text(path, number)
end
#extract_words ⇒ Array<Hash>
Words with positions and font metadata.
# File 'lib/pdfsink/page.rb', line 55
def extract_words
  @extract_words ||= Cli.words(path, number)
end
#height ⇒ Float
Returns page height in PDF points.
# File 'lib/pdfsink/page.rb', line 34
def height = @meta["height"]
#inspect ⇒ Object
# File 'lib/pdfsink/page.rb', line 89
def inspect
  "#<Pdfsink::Page number=#{number} #{width}x#{height}>"
end
#links ⇒ Array<Hash>
Hyperlinks on the page.
# File 'lib/pdfsink/page.rb', line 69
def links
  @links ||= Cli.links(path, number)
end
#object_counts ⇒ Hash
Returns counts of each object kind on the page.
# File 'lib/pdfsink/page.rb', line 43
def object_counts = @meta["object_counts"]
#objects ⇒ Hash
Every page object (chars, lines, rects, curves, images, annots, …).
# File 'lib/pdfsink/page.rb', line 62
def objects
  @objects ||= Cli.objects(path, number)
end
#rotation ⇒ Integer
Returns clockwise rotation in degrees (0, 90, 180, 270).
# File 'lib/pdfsink/page.rb', line 37
def rotation = @meta["rotation"]
#search(pattern) ⇒ Array<Hash>
Regex search matches within the page text.
# File 'lib/pdfsink/page.rb', line 77
def search(pattern)
  Cli.search(path, number, pattern.is_a?(Regexp) ? pattern.source : pattern.to_s)
end
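The source above converts the pattern to a plain string before handing it to the CLI. A minimal sketch of that conversion in isolation follows; `pattern_argument` is a hypothetical helper name used only here. It shows that `Regexp#source` carries the pattern text but not Ruby-side flags such as `/i`.

```ruby
# Mirrors the conversion in #search: a Regexp is reduced to its source
# string, anything else goes through to_s. `pattern_argument` is a
# hypothetical helper name used only for this sketch.
def pattern_argument(pattern)
  pattern.is_a?(Regexp) ? pattern.source : pattern.to_s
end

puts pattern_argument(/inv-\d+/) # inv-\d+
puts pattern_argument(/total/i)  # total (the /i flag is not part of source)
puts pattern_argument("Total")   # Total
puts pattern_argument(:total)    # total
```

Because flags are dropped, a case-insensitive Ruby regexp reaches the binary as a case-sensitive pattern string. Whether an inline modifier inside the pattern is honored depends on the regex syntax pdfsink-rs accepts, which this page does not specify.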
#tables(strategy: nil) ⇒ Array<Array>?
The page’s largest detected table, or nil if none is found.
# File 'lib/pdfsink/page.rb', line 85
def tables(strategy: nil)
  Cli.table(path, number, TableStrategy.resolve(strategy))
end
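Since the documented return value is either an array of rows or nil, callers need a nil guard before iterating. The sketch below shows one way to do that and to key each row by a header; treating the first row as the header is an assumption of this example, `rows_as_hashes` is a hypothetical helper, and the sample data is made up rather than real Pdfsink output.

```ruby
# Turns a table (array of rows) into an array of header => cell hashes.
# Treating the first row as the header is an assumption of this sketch.
def rows_as_hashes(table)
  return [] if table.nil? || table.empty?

  header, *body = table
  body.map { |row| header.zip(row).to_h }
end

# Made-up sample standing in for the value returned by page.tables.
sample = [
  ["item", "qty"],
  ["bolt", "4"],
  ["nut", "12"]
]

p rows_as_hashes(sample)
p rows_as_hashes(nil) # the "no table found" case yields an empty list
```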
#width ⇒ Float
Returns page width in PDF points.
# File 'lib/pdfsink/page.rb', line 31
def width = @meta["width"]
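Width and height are reported in PDF points, and the default PDF unit is 1/72 inch. A small sketch of converting those values to physical units follows; the helper names are hypothetical, and the 612 x 792 input is simply the US Letter size used as sample data.

```ruby
POINTS_PER_INCH = 72.0
MM_PER_INCH = 25.4

# Converts a length in PDF points to inches.
def points_to_inches(points)
  points / POINTS_PER_INCH
end

# Converts a length in PDF points to millimetres.
def points_to_mm(points)
  points_to_inches(points) * MM_PER_INCH
end

# 612 x 792 pt is US Letter, used here as sample input.
puts points_to_inches(612) # 8.5
puts points_to_inches(792) # 11.0
puts points_to_mm(612).round(1)
```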