Class: ExtrasDeCont::Parser

Inherits:
Object
Defined in:
lib/extras_de_cont/parser.rb

Overview

Utility class for reading a PDF bank statement and extracting its transaction data with a bank-specific parsing rule.

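Instance Method Summary

  • #initialize(file) ⇒ Parser
    Returns a new instance of Parser.
  • #parse_with(rule) ⇒ Array<ExtrasDeCont::Transaction>
    Parses the PDF text with the given bank-specific rule.
  • #text ⇒ String
    Extracts all text content from the PDF file.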

Constructor Details

#initialize(file) ⇒ Parser

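Returns a new instance of Parser.

Parameters:

  • file (String, IO)
    • The PDF to read. It is stored as-is and later passed to PDF::Reader.new by #text, which accepts either a file path or an IO object.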

# File 'lib/extras_de_cont/parser.rb', line 9

def initialize(file)
  @file = file
end

Instance Method Details

#parse_with(rule) ⇒ Array<ExtrasDeCont::Transaction>

Parses the text extracted from the PDF (via #text) with the given bank-specific rule by calling rule.parse(text).

Parameters:

  • rule (ExtrasDeCont::Rule)
    • The parsing rule specific to the bank.

Returns:

  • (Array<ExtrasDeCont::Transaction>)

    the transactions returned by rule.parse

# File 'lib/extras_de_cont/parser.rb', line 33

def parse_with(rule)
  rule.parse(text)
end
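parse_with does no parsing of its own: it only requires that rule respond to parse(text) and return the transactions. The sketch below illustrates that contract without the gem or a real PDF. Every name in it is hypothetical: DemoRule, the Transaction Struct (the gem's real ExtrasDeCont::Transaction and ExtrasDeCont::Rule may look quite different), the line format it matches, and TextParser, a stand-in that mirrors Parser#parse_with but takes plain text instead of a PDF file.

```ruby
# Hypothetical stand-in; the gem's real ExtrasDeCont::Transaction may differ.
Transaction = Struct.new(:date, :description, :amount)

# Hypothetical rule: matches lines like "02.01.2024 Coffee shop -12.50"
# and ignores every other line of the statement.
class DemoRule
  LINE = /^(\d{2}\.\d{2}\.\d{4})\s+(.+?)\s+(-?\d+\.\d{2})$/

  def parse(text)
    text.each_line.filter_map do |line|
      match = LINE.match(line.chomp)
      next unless match # non-transaction lines yield nil and are dropped

      Transaction.new(match[1], match[2], match[3].to_f)
    end
  end
end

# Hypothetical parser mirroring Parser#parse_with, but built from plain
# text so the example runs without PDF::Reader.
class TextParser
  def initialize(text)
    @text = text
  end

  attr_reader :text

  def parse_with(rule)
    rule.parse(text)
  end
end

sample = <<~TEXT
  Statement of account
  02.01.2024 Coffee shop -12.50
  03.01.2024 Salary 3000.00
TEXT

transactions = TextParser.new(sample).parse_with(DemoRule.new)
transactions.each { |t| puts "#{t.date} | #{t.description} | #{t.amount}" }
```

Because the rule is passed in rather than hard-coded, supporting another bank only requires another object that responds to parse; the parser itself stays unchanged.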

#text ⇒ String

Extracts all text content from the PDF file.

This method opens the PDF with PDF::Reader, appends the text of every page to a buffer in page order without inserting any separator, and returns the result as a single string.

Returns:

  • (String)

    the full text extracted from all pages of the PDF

# File 'lib/extras_de_cont/parser.rb', line 19

def text
  reader = PDF::Reader.new(@file)
  all_pdf_text = StringIO.new

  reader.pages.each do |page|
    all_pdf_text << page.text
  end

  all_pdf_text.string
end
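Because #text appends each page's text with no separator, the last line of one page runs straight into the first line of the next unless a page's text already ends in a newline. That can break line-based rules. The sketch below reproduces the joining logic with hypothetical stub pages. FakePage stands in for a PDF::Reader page and only provides #text. It then contrasts that logic with a hypothetical alternative, join_pages, that puts a newline between pages. Neither helper is part of the gem.

```ruby
require "stringio"

# Hypothetical stand-in for a PDF::Reader page: only #text is needed here.
FakePage = Struct.new(:text)

pages = [FakePage.new("Balance 100.00"), FakePage.new("02.01.2024 Fee -1.00")]

# Same approach as Parser#text: append every page's text to a StringIO.
def concatenate(pages)
  buffer = StringIO.new
  pages.each { |page| buffer << page.text }
  buffer.string
end

# Hypothetical variant: keep page boundaries by joining with a newline.
def join_pages(pages)
  pages.map(&:text).join("\n")
end

puts concatenate(pages).inspect # prints "Balance 100.0002.01.2024 Fee -1.00"
puts join_pages(pages).inspect  # prints "Balance 100.00\n02.01.2024 Fee -1.00"
```

Whether a separator actually matters depends on what PDF::Reader returns for a given statement's pages and on how the rule's patterns are written, so treat this as something to check against real statements rather than a known defect.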