Module: Philiprehberger::GzipKit

Defined in:
lib/philiprehberger/gzip_kit.rb,
lib/philiprehberger/gzip_kit/version.rb

Defined Under Namespace

Classes: Error

Constant Summary

CHUNK_SIZE = 64 * 1024
GZIP_MAGIC = [0x1f, 0x8b].freeze
VERSION = '0.3.0'

Class Method Summary

Class Method Details

.compress(string, level: Zlib::DEFAULT_COMPRESSION, stats: false) ⇒ String, Hash

Compress a string to gzip bytes.

Parameters:

  • string (String)

    the data to compress

  • level (Integer) (defaults to: Zlib::DEFAULT_COMPRESSION)

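    compression level, from 0 (Zlib::NO_COMPRESSION) to 9 (Zlib::BEST_COMPRESSION); Zlib::DEFAULT_COMPRESSION (-1) selects zlib's default, currently equivalent to level 6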

  • stats (Boolean) (defaults to: false)

    when true, return a Hash with :data, :ratio, :original_size, and :compressed_size keys instead of the bare compressed String

Returns:

  • (String, Hash)

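    gzip-compressed bytes, or, when stats: true, a Hash in which :ratio is 1 - compressed_size / original_size (0.0 for empty input, and negative when the gzip header and footer make the output larger than the input)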



# File 'lib/philiprehberger/gzip_kit.rb', line 20

def self.compress(string, level: Zlib::DEFAULT_COMPRESSION, stats: false)
  io_out = StringIO.new
  io_out.binmode
  gz = Zlib::GzipWriter.new(io_out, level)
  gz.write(string)
  gz.close
  compressed = io_out.string

  if stats
    original_size = string.bytesize
    compressed_size = compressed.bytesize
    ratio = original_size.zero? ? 0.0 : 1.0 - (compressed_size.to_f / original_size)
    {
      data: compressed,
      ratio: ratio,
      original_size: original_size,
      compressed_size: compressed_size
    }
  else
    compressed
  end
end
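A stdlib-only sketch of the same technique, useful for seeing the stats arithmetic in isolation. It does not load the gem; `gzip_with_stats` is a hypothetical name, and its Hash keys simply mirror the ones built in `.compress` above.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper mirroring .compress(string, stats: true) with stdlib Zlib only.
def gzip_with_stats(string, level: Zlib::DEFAULT_COMPRESSION)
  io = StringIO.new
  io.binmode
  gz = Zlib::GzipWriter.new(io, level)
  gz.write(string)
  gz.close               # flushes the deflate stream and writes the 8-byte footer
  compressed = io.string # still readable after the StringIO is closed

  original_size = string.bytesize
  ratio = original_size.zero? ? 0.0 : 1.0 - (compressed.bytesize.to_f / original_size)
  { data: compressed, ratio: ratio, original_size: original_size, compressed_size: compressed.bytesize }
end

big = gzip_with_stats('a' * 10_000, level: Zlib::BEST_COMPRESSION)
tiny = gzip_with_stats('abc')

puts big[:ratio] > 0.9                       # true: repetitive input shrinks dramatically
puts tiny[:ratio] < 0                        # true: header and footer outweigh a 3-byte input
puts Zlib.gunzip(big[:data]) == 'a' * 10_000 # true
```

The negative ratio for tiny inputs is expected: every gzip stream carries at least 18 bytes of header and footer.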

.compress_file(src, dest, level: Zlib::DEFAULT_COMPRESSION) {|bytes_processed, total_bytes| ... } ⇒ void

This method returns an undefined value.

Compress a file to a gzip file.

Parameters:

  • src (String)

    path to the source file

  • dest (String)

    path to the destination gzip file

  • level (Integer) (defaults to: Zlib::DEFAULT_COMPRESSION)

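    compression level, from 0 (Zlib::NO_COMPRESSION) to 9 (Zlib::BEST_COMPRESSION); Zlib::DEFAULT_COMPRESSION (-1) selects zlib's default, currently equivalent to level 6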

Yields:

  • (bytes_processed, total_bytes)

    optional progress callback, called after each CHUNK_SIZE (64 KB) chunk is written

Yield Parameters:

  • bytes_processed (Integer)

    bytes processed so far

  • total_bytes (Integer)

    size of the uncompressed source file in bytes



# File 'lib/philiprehberger/gzip_kit.rb', line 88

def self.compress_file(src, dest, level: Zlib::DEFAULT_COMPRESSION, &block)
  File.open(src, 'rb') do |io_in|
    File.open(dest, 'wb') do |io_out|
      if block
        total_bytes = File.size(src)
        bytes_processed = 0
        gz = Zlib::GzipWriter.new(io_out, level)
        while (chunk = io_in.read(CHUNK_SIZE))
          gz.write(chunk)
          bytes_processed += chunk.bytesize
          block.call(bytes_processed, total_bytes)
        end
        gz.finish
      else
        compress_stream(io_in, io_out, level: level)
      end
    end
  end
end
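The block form turns each 64 KB write into a progress event. The stdlib-only sketch below reproduces that loop so the `(bytes_processed, total_bytes)` sequence can be observed; `gzip_file_with_progress` is a hypothetical helper, and opening the destination with `Zlib::GzipWriter.open` is a simplification of the nested `File.open` calls above.

```ruby
require 'zlib'
require 'tempfile'

READ_CHUNK = 64 * 1024 # same value as GzipKit::CHUNK_SIZE

# Hypothetical helper: compress src to dest, yielding progress after each chunk.
def gzip_file_with_progress(src, dest, level: Zlib::DEFAULT_COMPRESSION)
  total = File.size(src) # uncompressed size, known before compression starts
  done = 0
  File.open(src, 'rb') do |input|
    Zlib::GzipWriter.open(dest, level) do |gz|
      while (chunk = input.read(READ_CHUNK))
        gz.write(chunk)
        done += chunk.bytesize
        yield done, total if block_given?
      end
    end
  end
end

src = Tempfile.new(['plain', '.txt'])
src.binmode
src.write('x' * 200_000)
src.close
dest = "#{src.path}.gz"

updates = []
gzip_file_with_progress(src.path, dest) { |done, total| updates << [done, total] }
updates.each { |done, total| puts format('%6d / %d bytes (%.0f%%)', done, total, 100.0 * done / total) }
```

For a 200,000-byte file the block runs four times: three full 65,536-byte chunks and a final 3,392-byte one.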

.compress_stream(io_in, io_out, level: Zlib::DEFAULT_COMPRESSION) ⇒ void

This method returns an undefined value.

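Stream-compress from one IO to another, reading in CHUNK_SIZE (64 KB) chunks. The gzip stream is completed with GzipWriter#finish, so io_out is left open for the caller.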

Parameters:

  • io_in (IO)

    readable input stream

  • io_out (IO)

    writable output stream

  • level (Integer) (defaults to: Zlib::DEFAULT_COMPRESSION)

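    compression level, from 0 (Zlib::NO_COMPRESSION) to 9 (Zlib::BEST_COMPRESSION); Zlib::DEFAULT_COMPRESSION (-1) selects zlib's default, currently equivalent to level 6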



# File 'lib/philiprehberger/gzip_kit.rb', line 200

def self.compress_stream(io_in, io_out, level: Zlib::DEFAULT_COMPRESSION)
  gz = Zlib::GzipWriter.new(io_out, level)
  while (chunk = io_in.read(CHUNK_SIZE))
    gz.write(chunk)
  end
  gz.finish
end
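Because `.compress_stream` ends with `GzipWriter#finish` rather than `#close`, the caller keeps ownership of `io_out`. A stdlib-only sketch of the same loop over in-memory IOs shows this; `gzip_stream` is a hypothetical name.

```ruby
require 'zlib'
require 'stringio'

STREAM_CHUNK = 64 * 1024 # same value as GzipKit::CHUNK_SIZE

# Hypothetical helper with the same body as .compress_stream.
def gzip_stream(io_in, io_out, level: Zlib::DEFAULT_COMPRESSION)
  gz = Zlib::GzipWriter.new(io_out, level)
  while (chunk = io_in.read(STREAM_CHUNK))
    gz.write(chunk)
  end
  gz.finish # writes the footer but does not close io_out
end

input = StringIO.new('hello ' * 50_000)
output = StringIO.new(''.b)
gzip_stream(input, output)

puts output.closed?                      # false
puts Zlib.gunzip(output.string).bytesize # 300000
```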

.compressed?(data) ⇒ Boolean

Check if data is gzip-compressed by inspecting magic bytes.

Parameters:

  • data (String)

    data to check

Returns:

  • (Boolean)

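    true if data starts with the gzip magic bytes 0x1f 0x8b; only these two bytes are checked, so true does not guarantee that the rest of data is valid gzip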



# File 'lib/philiprehberger/gzip_kit.rb', line 72

def self.compressed?(data)
  return false if data.nil? || data.bytesize < 2

  # getbyte reads single bytes without allocating an Array of every byte in data
  data.getbyte(0) == GZIP_MAGIC[0] && data.getbyte(1) == GZIP_MAGIC[1]
end
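Since only two bytes are inspected, `.compressed?` works as a cheap pre-check rather than a validity test. The stdlib-only sketch below pairs the same check with a hypothetical `maybe_gunzip` helper and shows a false positive surfacing as a `Zlib::Error` only at decompression time.

```ruby
require 'zlib'

MAGIC = [0x1f, 0x8b].freeze # same bytes as GzipKit::GZIP_MAGIC

# Same test as .compressed?: look only at the first two bytes.
def gzip_magic?(data)
  return false if data.nil? || data.bytesize < 2

  data.getbyte(0) == MAGIC[0] && data.getbyte(1) == MAGIC[1]
end

# Hypothetical helper: decompress only when the magic bytes are present.
def maybe_gunzip(data)
  gzip_magic?(data) ? Zlib.gunzip(data) : data
end

puts maybe_gunzip(Zlib.gzip('payload')) # payload
puts maybe_gunzip('plain text')         # plain text

forged = "\x1f\x8b".b + 'not really gzip'
puts gzip_magic?(forged) # true
begin
  maybe_gunzip(forged)
rescue Zlib::Error => e
  puts "rejected at decompression: #{e.class}"
end
```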

.concat(data_a, data_b) ⇒ String

Concatenate two gzip-compressed strings.

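Per the gzip specification (RFC 1952), a gzip file may contain several members back to back, so the byte-level concatenation of two gzip streams is itself valid gzip.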

Parameters:

  • data_a (String)

    first gzip-compressed string

  • data_b (String)

    second gzip-compressed string

Returns:

  • (String)

    concatenated gzip data

Raises:

  • (Error)

    if either input does not start with the gzip magic bytes (see .compressed?); the remainder of each input is not validated



# File 'lib/philiprehberger/gzip_kit.rb', line 143

def self.concat(data_a, data_b)
  raise Error, 'first argument is not valid gzip data' unless compressed?(data_a)
  raise Error, 'second argument is not valid gzip data' unless compressed?(data_b)

  result = String.new(data_a, encoding: Encoding::BINARY)
  result << data_b.b
  result
end
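A stdlib-only illustration of why the byte-level join is safe: `Zlib::GzipReader.zcat` (added in Ruby 2.6) reads every member until end of input, so two independently compressed strings joined the way `.concat` joins them decompress to both payloads in order. The payload strings are arbitrary examples.

```ruby
require 'zlib'
require 'stringio'

first = Zlib.gzip("alpha\n")
second = Zlib.gzip("beta\n")

# Byte-level join, as .concat does: binary copy of the first, then append the second.
joined = String.new(first, encoding: Encoding::BINARY) << second.b

# zcat decompresses each gzip member in turn and returns the combined output.
puts Zlib::GzipReader.zcat(StringIO.new(joined))
# alpha
# beta
```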

.decompress(data) ⇒ String

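Decompress gzip bytes to a string. If data holds several concatenated gzip members, each is decompressed in order and the results are joined.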

Parameters:

  • data (String)

    gzip-compressed bytes

Returns:

  • (String)

    decompressed data, tagged as UTF-8 regardless of its contents (call #b on the result for a binary String)

Raises:

  • (Zlib::GzipFile::Error)

    if the data is not valid gzip (an empty string is not an error and yields an empty result)



# File 'lib/philiprehberger/gzip_kit.rb', line 48

def self.decompress(data)
  io_in = StringIO.new(data)
  io_in.binmode
  result = String.new(encoding: Encoding::BINARY)

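  # Decompress each member of a multi-member gzip stream (RFC 1952) in turn
  until io_in.eof?
    gz = Zlib::GzipReader.new(io_in)
    result << gz.read
    # GzipReader reads ahead, so io_in may already be past the end of this
    # member; gz.unused returns those buffered bytes (or nil), and rewinding
    # by their length leaves io_in at the next member's header
    unused = gz.unused
    gz.finish # ends this reader without closing io_in
    io_in.pos -= unused.bytesize if unused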
  end

  result.force_encoding(Encoding::UTF_8)
end
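A single `Zlib::GzipReader` stops at the end of the first member, which is why `.decompress` loops and rewinds. This stdlib-only sketch reproduces the loop under a hypothetical name, `gunzip_all`, and runs it over three concatenated members. Unlike `.decompress`, it leaves the result as a BINARY string.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper with the same loop as .decompress (but returning BINARY).
def gunzip_all(data)
  io = StringIO.new(data)
  io.binmode
  out = String.new(encoding: Encoding::BINARY)
  until io.eof?
    gz = Zlib::GzipReader.new(io)
    out << gz.read
    unused = gz.unused # bytes read past the end of this member, or nil
    gz.finish          # release the reader without closing io
    io.pos -= unused.bytesize if unused
  end
  out
end

joined = Zlib.gzip('one ') + Zlib.gzip('two ') + Zlib.gzip('three')
puts gunzip_all(joined) # one two three
```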

.decompress_file(src, dest) {|bytes_processed, total_bytes| ... } ⇒ void

This method returns an undefined value.

Decompress a gzip file to a regular file. Unlike .decompress, only the first gzip member of src is read.

Parameters:

  • src (String)

    path to the gzip source file

  • dest (String)

    path to the destination file

Yields:

  • (bytes_processed, total_bytes)

    optional progress callback, called after each decompressed chunk of up to CHUNK_SIZE (64 KB) is written

Yield Parameters:

  • bytes_processed (Integer)

    bytes decompressed so far

  • total_bytes (nil)

    always nil (total unknown until decompression completes)



# File 'lib/philiprehberger/gzip_kit.rb', line 116

def self.decompress_file(src, dest, &block)
  File.open(src, 'rb') do |io_in|
    File.open(dest, 'wb') do |io_out|
      if block
        gz = Zlib::GzipReader.new(io_in)
        bytes_processed = 0
        while (chunk = gz.read(CHUNK_SIZE))
          io_out.write(chunk)
          bytes_processed += chunk.bytesize
          block.call(bytes_processed, nil)
        end
        gz.close
      else
        decompress_stream(io_in, io_out)
      end
    end
  end
end
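`total_bytes` is nil because the uncompressed size is not known up front. For single-member files smaller than 4 GiB, the ISIZE field that RFC 1952 places in the last four bytes of the file (uncompressed length modulo 2**32, little-endian) can serve as an estimate. The sketch below is stdlib-only; `gzip_isize` and `gunzip_file_with_progress` are hypothetical helpers, not part of GzipKit.

```ruby
require 'zlib'
require 'tempfile'

# Hypothetical helper: read ISIZE from the gzip footer. Only a reliable total
# for single-member files under 4 GiB, since the field wraps at 2**32.
def gzip_isize(path)
  File.binread(path, 4, File.size(path) - 4).unpack1('V')
end

# Hypothetical sketch of the block form of .decompress_file, passing the ISIZE
# estimate where GzipKit passes nil.
def gunzip_file_with_progress(src, dest)
  estimate = gzip_isize(src)
  done = 0
  Zlib::GzipReader.open(src) do |gz|
    File.open(dest, 'wb') do |out|
      while (chunk = gz.read(64 * 1024))
        out.write(chunk)
        done += chunk.bytesize
        yield done, estimate if block_given?
      end
    end
  end
end

src = Tempfile.new(['data', '.gz'])
src.binmode
src.write(Zlib.gzip('y' * 150_000))
src.close
dest = "#{src.path}.out"

updates = []
gunzip_file_with_progress(src.path, dest) { |done, total| updates << [done, total] }
updates.each { |done, total| puts format('%d of ~%d bytes', done, total) }
```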

.decompress_stream(io_in, io_out) ⇒ void

This method returns an undefined value.

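Stream-decompress from one IO to another, reading in CHUNK_SIZE (64 KB) chunks. Only the first gzip member is decompressed, and io_in is closed on return because GzipReader#close also closes the underlying IO.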

Parameters:

  • io_in (IO)

    readable input stream containing gzip data

  • io_out (IO)

    writable output stream



# File 'lib/philiprehberger/gzip_kit.rb', line 213

def self.decompress_stream(io_in, io_out)
  gz = Zlib::GzipReader.new(io_in)
  while (chunk = gz.read(CHUNK_SIZE))
    io_out.write(chunk)
  end
ensure
  gz&.close
end
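`.decompress_stream` ends with `gz&.close`, and `GzipReader#close` closes the wrapped IO as well. When the caller still owns `io_in` and wants to close it itself, a variant ending in `GzipReader#finish` leaves it open. This is a stdlib-only sketch with a hypothetical name, not a change to GzipKit.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical variant of .decompress_stream that does not close io_in.
def gunzip_stream_keep_open(io_in, io_out, chunk_size: 64 * 1024)
  gz = Zlib::GzipReader.new(io_in)
  while (chunk = gz.read(chunk_size))
    io_out.write(chunk)
  end
ensure
  gz&.finish # unlike #close, never closes io_in
end

input = StringIO.new(Zlib.gzip('stream me ' * 1000))
output = StringIO.new(''.b)
gunzip_stream_keep_open(input, output)
puts input.closed?          # false
puts output.string.bytesize # 10000

# For contrast: #close also closes the wrapped IO.
other = StringIO.new(Zlib.gzip('x'))
reader = Zlib::GzipReader.new(other)
reader.read
reader.close
puts other.closed? # true
```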

.equivalent?(blob_a, blob_b) ⇒ Boolean

Check whether two gzip-compressed blobs decompress to equal byte strings.

Useful for comparing gzip outputs produced at different compression levels or with different metadata: only the decompressed payloads are compared.

Parameters:

  • blob_a (String)

    first gzip-compressed string

  • blob_b (String)

    second gzip-compressed string

Returns:

  • (Boolean)

    true iff both blobs decompress to equal byte strings

Raises:

  • (Error)

    if either input does not start with the gzip magic bytes, or if decompression fails



# File 'lib/philiprehberger/gzip_kit.rb', line 161

def self.equivalent?(blob_a, blob_b)
  raise Error, 'first argument is not valid gzip data' unless compressed?(blob_a)
  raise Error, 'second argument is not valid gzip data' unless compressed?(blob_b)

  decompress(blob_a).b == decompress(blob_b).b
rescue Zlib::GzipFile::Error => e
  raise Error, "failed to decompress gzip data: #{e.message}"
end
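A stdlib-only sketch of the comparison `.equivalent?` performs: blobs written at `Zlib::BEST_SPEED` and `Zlib::BEST_COMPRESSION` are not byte-identical, yet their decompressed payloads compare equal. `same_payload?` is a hypothetical name, and the payload text is an arbitrary example.

```ruby
require 'zlib'

# Hypothetical helper: compare decompressed payloads as raw bytes, like .equivalent?.
def same_payload?(blob_a, blob_b)
  Zlib.gunzip(blob_a).b == Zlib.gunzip(blob_b).b
end

payload = 'the quick brown fox ' * 500
fast = Zlib.gzip(payload, level: Zlib::BEST_SPEED)
small = Zlib.gzip(payload, level: Zlib::BEST_COMPRESSION)

puts fast == small                                    # compressed bytes typically differ
puts same_payload?(fast, small)                       # true
puts same_payload?(fast, Zlib.gzip('something else')) # false
```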

.inspect_header(data) ⇒ Hash?

Inspect the gzip header without decompressing.

Parameters:

  • data (String)

    gzip-compressed data

Returns:

  • (Hash, nil)

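    header fields (:method, always :deflate; :mtime as a Time; :os as the numeric OS code; :original_name and :comment as Strings, or nil when absent), or nil if data is not valid gzip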



# File 'lib/philiprehberger/gzip_kit.rb', line 174

def self.inspect_header(data)
  return nil unless compressed?(data)

  io = StringIO.new(data)
  io.binmode
  gz = Zlib::GzipReader.new(io)

  {
    method: :deflate,
    mtime: gz.mtime,
    os: gz.os_code,
    # Report absent or empty name/comment fields as nil
    original_name: gz.orig_name.to_s.empty? ? nil : gz.orig_name,
    comment: gz.comment.to_s.empty? ? nil : gz.comment
  }
rescue Zlib::GzipFile::Error
  nil
ensure
  gz&.close
end
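To see which fields `.inspect_header` surfaces, the stdlib-only sketch below writes a blob with an explicit mtime, original name, and comment through `Zlib::GzipWriter`, then reads them back with `Zlib::GzipReader` without calling `#read`. The field values are arbitrary examples.

```ruby
require 'zlib'
require 'stringio'

io = StringIO.new(''.b)
gz = Zlib::GzipWriter.new(io)
gz.mtime = 1_700_000_000         # header fields must be set before the first write
gz.orig_name = 'report.csv'
gz.comment = 'nightly export'
gz.write("id,total\n1,42\n")
gz.close
blob = io.string

reader = Zlib::GzipReader.new(StringIO.new(blob))
header = {
  method: :deflate,
  mtime: reader.mtime,            # a Time
  os: reader.os_code,             # platform-dependent Integer
  original_name: reader.orig_name.to_s.empty? ? nil : reader.orig_name,
  comment: reader.comment.to_s.empty? ? nil : reader.comment
}
reader.close
p header
```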