Class: Pikuri::Extractor::Page
- Inherits: Data
- Inheritance chain: Object, Data, Pikuri::Extractor::Page
- Defined in: lib/pikuri/extractor.rb
Overview
One windowed slice of a document, returned by extract_paged. The caller turns this into an observation; this struct carries everything a trailer needs without the caller re-reading the document.
Fields
- lines: Array<String>, the collected window. Each line is already truncated (with PAGE_LINE_TRUNCATION_MARKER) but not line-numbered; numbering is presentation the caller adds. For a PDF the array includes the “— Page N —” marker lines that pikuri-pdf’s extractor emits, which count toward limit / the byte cap like any other line.
- start_line: the 1-indexed line number of lines.first (i.e. the offset the caller asked for). lines.last is at start_line + lines.length - 1.
- total_lines: total line count of the document when known, else nil. Known when the read reached EOF, when the format was extracted in full (no extract_lines, e.g. HTML), or when the lazy stream is cheap enough to count to the end (plain text). nil when a lazy stream stopped early: the byte cap fired, or a PDF filled the window before its last page (counting the rest would mean parsing every page, defeating the laziness).
- more: true if content remains past this window (the caller should offer offset = start_line + lines.length).
- byte_capped: true if the byte cap (not the line limit) was the stopping criterion.
- kind: the matched extractor’s kind tag (:text / :pdf / :html); lets the caller word format-specific trailers and the empty-document message.
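The field descriptions above can be sketched as a small trailer builder. This is a minimal illustration, not pikuri's own code: Page here is a stand-in built with Data.define that only mirrors the documented field names, and trailer_for along with its wording is a hypothetical helper.

```ruby
# Stand-in for Pikuri::Extractor::Page (assumption: same field names as documented).
Page = Data.define(:lines, :start_line, :total_lines, :more, :byte_capped, :kind)

# Hypothetical caller-side helper: builds a trailer from the page alone,
# without re-reading the document.
def trailer_for(page)
  return nil unless page.more

  last_line   = page.start_line + page.lines.length - 1 # position of lines.last
  next_offset = page.start_line + page.lines.length     # offset to offer next
  total       = page.total_lines ? " of #{page.total_lines}" : "" # nil when unknown
  reason      = page.byte_capped ? "byte cap reached" : "line limit reached"

  "[lines #{page.start_line}-#{last_line}#{total}; #{reason}; continue with offset=#{next_offset}]"
end

page = Page.new(
  lines: ["alpha", "beta", "gamma"],
  start_line: 11,
  total_lines: nil, # lazy stream stopped early, so the total is unknown
  more: true,
  byte_capped: true,
  kind: :pdf
)

puts trailer_for(page)
```

Note that the trailer omits the total when total_lines is nil rather than guessing it, which matches the documented behaviour for a lazy stream that stopped early.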
An empty document yields lines: [], total_lines: 0; an offset past EOF yields lines: [] with total_lines set to the real (non-zero) count — the caller distinguishes the two.
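A caller might tell the two empty-window cases apart as in the sketch below. It assumes a stand-in Page defined with Data.define that mirrors the documented fields; classify_window and the symbols it returns are hypothetical names chosen for illustration.

```ruby
# Stand-in for Pikuri::Extractor::Page (assumption: same field names as documented).
Page = Data.define(:lines, :start_line, :total_lines, :more, :byte_capped, :kind)

# Hypothetical helper: classifies a page by the lines / total_lines combination.
def classify_window(page)
  return :has_content unless page.lines.empty?
  # The documentation does not describe an empty window with an unknown total,
  # so this branch is a defensive assumption.
  return :unknown if page.total_lines.nil?

  page.total_lines.zero? ? :empty_document : :offset_past_eof
end

empty_doc = Page.new(lines: [], start_line: 1, total_lines: 0,
                     more: false, byte_capped: false, kind: :text)
past_eof  = Page.new(lines: [], start_line: 500, total_lines: 120,
                     more: false, byte_capped: false, kind: :text)

puts classify_window(empty_doc) # empty_document
puts classify_window(past_eof)  # offset_past_eof
```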
Instance Attribute Summary
- #byte_capped ⇒ Object (readonly): Returns the value of attribute byte_capped.
- #kind ⇒ Object (readonly): Returns the value of attribute kind.
- #lines ⇒ Object (readonly): Returns the value of attribute lines.
- #more ⇒ Object (readonly): Returns the value of attribute more.
- #start_line ⇒ Object (readonly): Returns the value of attribute start_line.
- #total_lines ⇒ Object (readonly): Returns the value of attribute total_lines.
Instance Attribute Details
#byte_capped ⇒ Object (readonly)
Returns the value of attribute byte_capped
    # File 'lib/pikuri/extractor.rb', line 146
    def byte_capped
      @byte_capped
    end
#kind ⇒ Object (readonly)
Returns the value of attribute kind
    # File 'lib/pikuri/extractor.rb', line 146
    def kind
      @kind
    end
#lines ⇒ Object (readonly)
Returns the value of attribute lines
    # File 'lib/pikuri/extractor.rb', line 146
    def lines
      @lines
    end
#more ⇒ Object (readonly)
Returns the value of attribute more
    # File 'lib/pikuri/extractor.rb', line 146
    def more
      @more
    end
#start_line ⇒ Object (readonly)
Returns the value of attribute start_line
    # File 'lib/pikuri/extractor.rb', line 146
    def start_line
      @start_line
    end
#total_lines ⇒ Object (readonly)
Returns the value of attribute total_lines
    # File 'lib/pikuri/extractor.rb', line 146
    def total_lines
      @total_lines
    end