Class: Kotoshu::Readers::ZipReader

Inherits:
Object
Defined in:
lib/kotoshu/readers/file_reader.rb

Overview

Line-oriented reader for a single entry inside a zip archive.

Typical archives include OpenDocument files (.odt) and OpenOffice/LibreOffice extensions (.oxt).

Examples:

Reading from a zip archive

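require 'zip' # rubyzip provides Zip::File

# The block form closes the archive when the block returns.
Zip::File.open('dictionary.oxt') do |zip|
  reader = Kotoshu::Readers::ZipReader.new(zip, 'en_US.aff', 'UTF-8')
  reader.each do |line_no, line|
    puts "#{line_no}: #{line}"
  end
  reader.close
end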

Constant Summary

UTF8_BOM =

BOM (byte-order mark) for UTF-8

"\xEF\xBB\xBF".freeze
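
The page does not show where the class uses this constant. A common use for such a constant is dropping the byte-order mark from the first line of a file, and the sketch below illustrates that under stated assumptions: `strip_utf8_bom` is a hypothetical helper, not part of Kotoshu, and the constant is redefined locally so the snippet runs on its own.

```ruby
# Local copy of the constant so the sketch is self-contained.
UTF8_BOM = "\xEF\xBB\xBF".freeze

# Hypothetical helper (not part of Kotoshu): returns the line without a
# leading UTF-8 BOM, or the line unchanged when no BOM is present.
def strip_utf8_bom(line)
  # Compare raw bytes so the check does not depend on the string's encoding tag.
  bytes = line.b
  return line unless bytes.start_with?(UTF8_BOM.b)

  bytes.byteslice(UTF8_BOM.bytesize..-1).force_encoding(line.encoding)
end

first_line = "\xEF\xBB\xBFSET UTF-8\n"
puts strip_utf8_bom(first_line)
```

Only the first line of a file can carry a BOM, so a reader would need this check once per entry rather than once per line.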

Instance Attribute Summary

Instance Method Summary

Constructor Details

#initialize(zipfile, entry_path, encoding = 'UTF-8') ⇒ ZipReader

Create a new zip reader.

Parameters:

  • zipfile (Zip::File)

    The zip file object

  • entry_path (String)

    Path to the entry within the zip

  • encoding (String) (defaults to: 'UTF-8')

    File encoding



# File 'lib/kotoshu/readers/file_reader.rb', line 231

def initialize(zipfile, entry_path, encoding = 'UTF-8')
  @zipfile = zipfile
  @entry_path = entry_path
  @encoding = encoding
  @line_no = 0
  @entry = nil
  @iterator = nil
  reset_io
end

Instance Attribute Details

#encoding ⇒ String (readonly)

Returns the encoding.

Returns:

  • (String)

    The encoding



# File 'lib/kotoshu/readers/file_reader.rb', line 218

def encoding
  @encoding
end

#entry_path ⇒ String (readonly)

Returns the entry path within the zip.

Returns:

  • (String)

    The entry path within the zip



# File 'lib/kotoshu/readers/file_reader.rb', line 215

def entry_path
  @entry_path
end

#line_no ⇒ Integer (readonly)

Returns the current line number.

Returns:

  • (Integer)

    Current line number



# File 'lib/kotoshu/readers/file_reader.rb', line 221

def line_no
  @line_no
end

#zipfile ⇒ Zip::File (readonly)

Returns the zip file object.

Returns:

  • (Zip::File)

    The zip file object



# File 'lib/kotoshu/readers/file_reader.rb', line 212

def zipfile
  @zipfile
end

Instance Method Details

#close ⇒ Object

Close the zip entry.



# File 'lib/kotoshu/readers/file_reader.rb', line 299

def close
  @entry&.close
  @entry = nil
end

#each {|Integer, String| ... } ⇒ Enumerator

Iterate over lines.

Yields:

  • (Integer, String)

    Line number and line content

Returns:

  • (Enumerator)

    If no block given



# File 'lib/kotoshu/readers/file_reader.rb', line 255

def each
  return enum_for(:each) unless block_given?

  @iterator.each { |line_no, line| yield(line_no, line) }
end
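
The `enum_for(:each)` guard above is the usual Ruby idiom for methods that should work both with and without a block. The sketch below shows it in isolation. `LineSource` is a hypothetical stand-in that reads from a string, not Kotoshu's reader, and how the real class builds its internal iterator is not shown on this page.

```ruby
# Hypothetical stand-in class for illustration; it is not Kotoshu's ZipReader.
class LineSource
  def initialize(text)
    @text = text
  end

  # Yields [line_no, line] pairs, or returns an Enumerator when no block is given.
  def each
    return enum_for(:each) unless block_given?

    @text.each_line.with_index(1) do |line, line_no|
      yield(line_no, line.chomp)
    end
  end
end

source = LineSource.new("SET UTF-8\nTRY esia\n")

# Block form: the pairs are yielded directly.
source.each { |line_no, line| puts "#{line_no}: #{line}" }

# Blockless form: the Enumerator can be chained with any Enumerable method.
affix_lines = source.each.select { |_line_no, line| line.start_with?('TRY') }
puts affix_lines.inspect
```

Returning an Enumerator lets callers chain `select`, `map`, or `first` without the class having to include `Enumerable` itself.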

#has_next? ⇒ Boolean

Check if there are more lines.

Returns:

  • (Boolean)

    True if there are more lines



# File 'lib/kotoshu/readers/file_reader.rb', line 271

def has_next?
  peek
  true
rescue StopIteration
  false
end
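
The method relies on `Enumerator#peek` raising `StopIteration` once the enumerator is exhausted. The sketch below reproduces that pattern with a plain Enumerator over an array, which is an assumption about the shape of the internal iterator. `more_lines?` is a hypothetical name used only here.

```ruby
# Hypothetical helper mirroring has_next?: true while peek succeeds.
def more_lines?(enum)
  enum.peek
  true
rescue StopIteration
  false
end

# A plain Enumerator of [line_no, line] pairs stands in for the reader.
lines = [[1, 'SET UTF-8'], [2, 'TRY esia']].each

seen = []
seen << lines.next while more_lines?(lines)

puts seen.inspect
puts more_lines?(lines)
```

Checking before each `next` call keeps the `StopIteration` handling in one place instead of at every call site.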

#next ⇒ Array<Integer, String>

Get the next line.

Returns:

  • (Array<Integer, String>)

    Line number and content



# File 'lib/kotoshu/readers/file_reader.rb', line 288

def next
  @iterator.next
end

#peek ⇒ Array<Integer, String>

Peek at the next line without consuming it.

Returns:

  • (Array<Integer, String>)

    Next line number and content



# File 'lib/kotoshu/readers/file_reader.rb', line 281

def peek
  @iterator.peek
end
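
Both `peek` and `next` delegate to the internal iterator. The sketch below shows how the two calls differ on a standard Ruby Enumerator. It assumes the internal iterator behaves like one and uses an array of pairs in place of a real zip entry.

```ruby
# A plain Enumerator of [line_no, line] pairs stands in for the reader (an assumption).
pairs = [[1, 'PFX A Y 1'], [2, 'PFX A 0 re .']].each

first_peek  = pairs.peek # looks at the first pair without advancing
second_peek = pairs.peek # still the first pair
consumed    = pairs.next # returns the first pair and advances
upcoming    = pairs.peek # now the second pair

# Kernel#loop rescues StopIteration, so it ends cleanly when next runs out.
remaining = []
loop { remaining << pairs.next }

puts [first_peek, second_peek, consumed, upcoming].inspect
puts remaining.inspect
```

Lookahead of this kind is useful when a header line announces how many lines follow and the caller wants to inspect the header before deciding how to read the rest.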

#reset ⇒ Object

Reset the reader to the beginning.



# File 'lib/kotoshu/readers/file_reader.rb', line 293

def reset
  @line_no = 0
  reset_io
end

#reset_encoding(new_encoding) ⇒ Object

Reset the encoding and reopen the zip entry, restarting from the first line.

Parameters:

  • new_encoding (String)

    New encoding



# File 'lib/kotoshu/readers/file_reader.rb', line 244

def reset_encoding(new_encoding)
  @encoding = new_encoding
  @line_no = 0
  @entry&.close
  reset_io
end
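
The page does not say when a caller would switch encodings. One plausible case, offered as an assumption, is a file that declares its own encoding in its content, so the reader is opened with a default and reopened once the declaration has been read. The sketch below uses only core `String` methods to show why the same bytes need a second pass under the right encoding. It does not call Kotoshu.

```ruby
# Raw bytes as they might sit in an archive entry: "café" encoded as ISO-8859-1.
raw = "caf\xE9".b

# Tagging the bytes as UTF-8 produces an invalid string.
as_utf8 = raw.dup.force_encoding('UTF-8')
puts as_utf8.valid_encoding?

# Tagging them as ISO-8859-1 and transcoding yields valid UTF-8 text.
as_latin1 = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')
puts as_latin1
puts as_latin1.valid_encoding?
```

Because decoding starts over from the raw bytes, resetting the line counter to 0 alongside the encoding keeps line numbers consistent with the re-read content.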

#to_a ⇒ Array<Array<Integer, String>>

Get all lines as an array.

Returns:

  • (Array<Array<Integer, String>>)

    Array of [line_no, line] pairs



# File 'lib/kotoshu/readers/file_reader.rb', line 264

def to_a
  @iterator.to_a
end