Class: Archaeo::WarcReader
- Inherits: Object
  - Object
  - Archaeo::WarcReader
- Defined in: lib/archaeo/warc_support.rb
Overview
Reads WARC (Web ARChive) format files (.warc, .warc.gz).
Parses WARC 1.0 records and yields WarcRecord value objects containing headers and body content.
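To make the record layout concrete, here is a minimal sketch of how a WARC 1.0 record can be split into a version line, headers, and a body. It is not the library's implementation: `SketchRecord` and `each_warc_record` are hypothetical names standing in for `WarcRecord` and the reader's internal parsing, and the sketch assumes every record carries a `Content-Length` header and ends with two CRLFs.

```ruby
require "stringio"

# Hypothetical stand-in for the library's WarcRecord value object.
SketchRecord = Struct.new(:version, :headers, :body)

SKETCH_CRLF = "\r\n"
SKETCH_HEADER_END = "\r\n\r\n"

# Yields one SketchRecord per record found in the IO (assumed layout:
# version line, "Name: value" headers, blank line, body, CRLF CRLF).
def each_warc_record(io)
  until io.eof?
    # Read up to and including the blank line that ends the header block.
    header_block = io.gets(SKETCH_HEADER_END)
    break if header_block.nil? || header_block.strip.empty?

    lines = header_block.chomp(SKETCH_HEADER_END).split(SKETCH_CRLF)
    version = lines.shift
    headers = lines.to_h do |line|
      name, value = line.split(":", 2)
      [name.strip, value.to_s.strip]
    end

    # Content-Length gives the body size in bytes.
    body = io.read(Integer(headers.fetch("Content-Length")))
    io.read(SKETCH_HEADER_END.bytesize) # skip the record's trailing CRLF CRLF

    yield SketchRecord.new(version, headers, body)
  end
end

sample_record = "WARC/1.0\r\nWARC-Type: resource\r\nContent-Length: 5\r\n\r\nhello\r\n\r\n"
parsed = []
each_warc_record(StringIO.new(sample_record * 2)) { |record| parsed << record }
```

Reading the body by byte count rather than scanning for a delimiter matters because a record body may itself contain blank lines.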
Constant Summary
- WARC_VERSION = "WARC/1.0"
- CRLF = "\r\n"
- HEADER_END = "\r\n\r\n"
Instance Method Summary
- #initialize ⇒ WarcReader (constructor): A new instance of WarcReader.
- #read(path, &block) ⇒ Object
- #read_records(path) ⇒ Object
Constructor Details
#initialize ⇒ WarcReader
Returns a new instance of WarcReader.
    # File 'lib/archaeo/warc_support.rb', line 17
    def initialize
      @record_count = 0
    end
Instance Method Details
#read(path, &block) ⇒ Object
    # File 'lib/archaeo/warc_support.rb', line 21
    def read(path, &block)
      io = open_warc(path)
      read_records_from_io(io, &block)
    ensure
      io&.close
    end
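The `open_warc` helper called by `#read` is not shown on this page. As a sketch of what such a helper might do, the example below picks a gzip or plain reader from the file extension and wraps it in the same open/ensure-close pattern. `open_warc_sketch` and `with_warc_io` are hypothetical names, and selecting the reader by a `.gz` suffix is an assumption rather than confirmed library behavior.

```ruby
require "zlib"

# Hypothetical helper: returns an IO-like object for a .warc or .warc.gz path.
def open_warc_sketch(path)
  if path.to_s.end_with?(".gz")
    Zlib::GzipReader.open(path) # decompresses while reading
  else
    File.open(path, "rb")       # binary mode leaves CRLF bytes untouched
  end
end

# Yields the opened IO and closes it even if the block raises.
# If opening fails, io is nil and the safe-navigation call is a no-op.
def with_warc_io(path)
  io = open_warc_sketch(path)
  yield io
ensure
  io&.close
end
```

One caveat with this approach: `Zlib::GzipReader` reads a single gzip member at a time, so an archive compressed as one member per record would need additional handling beyond this sketch.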
#read_records(path) ⇒ Object
    # File 'lib/archaeo/warc_support.rb', line 28
    def read_records(path)
      records = []
      read(path) { |record| records << record }
      records
    end