Module: Philiprehberger::GzipKit
Defined in:
- lib/philiprehberger/gzip_kit.rb
- lib/philiprehberger/gzip_kit/version.rb
Overview
GzipKit provides gzip compression and decompression with streaming support.
The module exposes both string-oriented and IO-oriented entry points:
- GzipKit.compress / GzipKit.decompress for in-memory string data
- GzipKit.compress_stream / GzipKit.decompress_stream for IO-to-IO streaming
- GzipKit.compress_file / GzipKit.decompress_file for file-to-file transforms
- GzipKit.compressed? / GzipKit.inspect_header for gzip detection and header inspection
- GzipKit.concat / GzipKit.equivalent? for combining and comparing gzip blobs
Streaming and file methods read in 64 KB chunks by default. The chunk size can be tuned via the chunk_size: keyword when dealing with very small or very large payloads.
Defined Under Namespace
Classes: Error
Constant Summary
- CHUNK_SIZE = 64 * 1024
- GZIP_MAGIC = [0x1f, 0x8b].freeze
- VERSION = '0.4.0'
Class Method Summary
- .compress(string, level: Zlib::DEFAULT_COMPRESSION, stats: false) ⇒ String, Hash
  Compress a string to gzip bytes.
- .compress_file(src, dest, level: Zlib::DEFAULT_COMPRESSION, chunk_size: CHUNK_SIZE) {|bytes_processed, total_bytes| ... } ⇒ void
  Compress a file to a gzip file.
- .compress_stream(io_in, io_out, level: Zlib::DEFAULT_COMPRESSION, chunk_size: CHUNK_SIZE) ⇒ void
  Streaming compression from one IO to another.
- .compressed?(data) ⇒ Boolean
  Check if data is gzip-compressed by inspecting magic bytes.
- .concat(data_a, data_b) ⇒ String
  Concatenate two gzip-compressed strings.
- .decompress(data, stats: false) ⇒ String, Hash
  Decompress gzip bytes to a string.
- .decompress_file(src, dest, chunk_size: CHUNK_SIZE) {|bytes_processed, total_bytes| ... } ⇒ void
  Decompress a gzip file to a regular file.
- .decompress_stream(io_in, io_out, chunk_size: CHUNK_SIZE) ⇒ void
  Streaming decompression from one IO to another.
- .equivalent?(blob_a, blob_b) ⇒ Boolean
  Check whether two gzip-compressed blobs decompress to equal byte strings.
- .inspect_header(data) ⇒ Hash?
  Inspect the gzip header without decompressing.
Class Method Details
.compress(string, level: Zlib::DEFAULT_COMPRESSION, stats: false) ⇒ String, Hash
Compress a string to gzip bytes.
    # File 'lib/philiprehberger/gzip_kit.rb', line 44
    def self.compress(string, level: Zlib::DEFAULT_COMPRESSION, stats: false)
      io_out = StringIO.new
      io_out.binmode
      gz = Zlib::GzipWriter.new(io_out, level)
      gz.write(string)
      gz.close

      compressed = io_out.string

      if stats
        original_size = string.bytesize
        compressed_size = compressed.bytesize
        ratio = original_size.zero? ? 0.0 : 1.0 - (compressed_size.to_f / original_size)
        { data: compressed, ratio: ratio, original_size: original_size, compressed_size: compressed_size }
      else
        compressed
      end
    end
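With stats: true, the ratio field is the fraction of the original size that compression removed. The sketch below reproduces that formula with the standard library only; space_saved_ratio and gzip_string are hypothetical helper names used for illustration, not part of GzipKit.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper mirroring the ratio formula in .compress:
# the fraction of the original size removed by compression.
def space_saved_ratio(original_size, compressed_size)
  return 0.0 if original_size.zero?

  1.0 - (compressed_size.to_f / original_size)
end

# Hypothetical helper: gzip a string in memory using only the stdlib.
def gzip_string(string, level = Zlib::DEFAULT_COMPRESSION)
  io = StringIO.new
  io.binmode
  gz = Zlib::GzipWriter.new(io, level)
  gz.write(string)
  gz.close
  io.string
end

payload = 'a' * 10_000
compressed = gzip_string(payload)
ratio = space_saved_ratio(payload.bytesize, compressed.bytesize)
puts format('%.1f%% smaller', ratio * 100)
```

For very short inputs the gzip header and trailer outweigh the payload, so this ratio can be negative. Note also that .decompress with stats: true computes compressed_size / decompressed_size instead, so the two ratio values are not directly comparable.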
.compress_file(src, dest, level: Zlib::DEFAULT_COMPRESSION, chunk_size: CHUNK_SIZE) {|bytes_processed, total_bytes| ... } ⇒ void
This method returns an undefined value.
Compress a file to a gzip file.
    # File 'lib/philiprehberger/gzip_kit.rb', line 133
    def self.compress_file(src, dest, level: Zlib::DEFAULT_COMPRESSION, chunk_size: CHUNK_SIZE, &block)
      validate_chunk_size!(chunk_size)
      File.open(src, 'rb') do |io_in|
        File.open(dest, 'wb') do |io_out|
          if block
            total_bytes = File.size(src)
            bytes_processed = 0
            gz = Zlib::GzipWriter.new(io_out, level)
            while (chunk = io_in.read(chunk_size))
              gz.write(chunk)
              bytes_processed += chunk.bytesize
              block.call(bytes_processed, total_bytes)
            end
            gz.finish
          else
            compress_stream(io_in, io_out, level: level, chunk_size: chunk_size)
          end
        end
      end
    end
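The block form yields once per chunk, so a file of N bytes triggers roughly N / chunk_size callbacks. The following is a minimal stand-in for that behaviour built on Zlib::GzipWriter.open from the standard library; gzip_file_with_progress is an assumed name for illustration and is not the gem's implementation.

```ruby
require 'zlib'
require 'tmpdir'

# Illustrative stand-in, not the gem's method: gzip src into dest and
# yield (bytes_processed, total_bytes) after every chunk written.
def gzip_file_with_progress(src, dest, chunk_size: 64 * 1024)
  total_bytes = File.size(src)
  bytes_processed = 0
  File.open(src, 'rb') do |io_in|
    # GzipWriter.open closes the gzip stream and the file when the block ends.
    Zlib::GzipWriter.open(dest) do |gz|
      while (chunk = io_in.read(chunk_size))
        gz.write(chunk)
        bytes_processed += chunk.bytesize
        yield bytes_processed, total_bytes if block_given?
      end
    end
  end
end

Dir.mktmpdir do |dir|
  src = File.join(dir, 'input.bin')
  dest = File.join(dir, 'input.bin.gz')
  File.binwrite(src, 'x' * 10_000)

  # With a 4096-byte chunk size, a 10,000-byte file yields three times.
  gzip_file_with_progress(src, dest, chunk_size: 4096) do |done, total|
    puts "#{done}/#{total} bytes"
  end
end
```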
.compress_stream(io_in, io_out, level: Zlib::DEFAULT_COMPRESSION, chunk_size: CHUNK_SIZE) ⇒ void
This method returns an undefined value.
Streaming compression from one IO to another.
    # File 'lib/philiprehberger/gzip_kit.rb', line 263
    def self.compress_stream(io_in, io_out, level: Zlib::DEFAULT_COMPRESSION, chunk_size: CHUNK_SIZE)
      validate_chunk_size!(chunk_size)
      gz = Zlib::GzipWriter.new(io_out, level)
      while (chunk = io_in.read(chunk_size))
        gz.write(chunk)
      end
      gz.finish
    end
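Because the loop ends with finish rather than close, the gzip trailer is written but the output IO is left open for the caller. A small sketch of the same loop over StringIO objects, using only the standard library (gzip_stream is an assumed helper name, not the gem's method):

```ruby
require 'zlib'
require 'stringio'

# Stand-in for the chunked loop above (assumed name, not part of the gem).
def gzip_stream(io_in, io_out, chunk_size: 64 * 1024)
  gz = Zlib::GzipWriter.new(io_out)
  while (chunk = io_in.read(chunk_size))
    gz.write(chunk)
  end
  gz.finish # writes the gzip trailer; io_out itself stays open
end

source = StringIO.new("log line\n" * 50_000)
sink = StringIO.new
sink.binmode
gzip_stream(source, sink, chunk_size: 4096)

puts "sink closed? #{sink.closed?}"
puts "compressed #{source.string.bytesize} bytes down to #{sink.string.bytesize}"
```

Leaving the sink open lets the caller append further data or rewind and read it back, at the cost of making the caller responsible for closing it.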
.compressed?(data) ⇒ Boolean
Check if data is gzip-compressed by inspecting magic bytes.
    # File 'lib/philiprehberger/gzip_kit.rb', line 115
    def self.compressed?(data)
      return false if data.nil? || data.bytesize < 2

      bytes = data.bytes
      bytes[0] == GZIP_MAGIC[0] && bytes[1] == GZIP_MAGIC[1]
    end
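The magic-byte test only looks at the first two bytes, so it is a cheap filter rather than proof that the data will decompress. The sketch below shows an equivalent check written with String#getbyte (a variation, not the gem's code) and a guarded decompression; gzip_magic? and gunzip_or_nil are hypothetical names.

```ruby
require 'zlib'

# Same two-byte test as above, written with getbyte so the whole
# string is not expanded into an array of bytes first.
def gzip_magic?(data)
  return false if data.nil? || data.bytesize < 2

  data.getbyte(0) == 0x1f && data.getbyte(1) == 0x8b
end

# Hypothetical wrapper: decompress only when the magic bytes match,
# and return nil when the rest of the data is not valid gzip.
def gunzip_or_nil(data)
  return nil unless gzip_magic?(data)

  Zlib.gunzip(data)
rescue Zlib::Error # parent class of Zlib::GzipFile::Error
  nil
end

real = Zlib.gzip('hello')
fake = "\x1F\x8Bnot really gzip".b

puts gzip_magic?(real)
puts gzip_magic?(fake)
puts gunzip_or_nil(fake).inspect
```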
.concat(data_a, data_b) ⇒ String
Concatenate two gzip-compressed strings.
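Per the gzip specification (RFC 1952), a gzip file may consist of several members placed back to back, so the concatenation of two gzip streams is itself valid gzip data.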
    # File 'lib/philiprehberger/gzip_kit.rb', line 194
    def self.concat(data_a, data_b)
      raise Error, 'first argument is not valid gzip data' unless compressed?(data_a)
      raise Error, 'second argument is not valid gzip data' unless compressed?(data_b)

      result = String.new(data_a, encoding: Encoding::BINARY)
      result << data_b.b
      result
    end
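To see why plain byte concatenation is enough, the sketch below joins two independently compressed strings and reads every member back with Zlib::GzipReader. It uses only the standard library; gunzip_all is an assumed helper name that follows the same unused/finish pattern as .decompress.

```ruby
require 'zlib'
require 'stringio'

# Assumed helper: decompress every gzip member found in data.
def gunzip_all(data)
  io = StringIO.new(data)
  io.binmode
  out = String.new(encoding: Encoding::BINARY)
  until io.eof?
    gz = Zlib::GzipReader.new(io)
    out << gz.read
    unused = gz.unused # bytes read past the end of this member, or nil
    gz.finish          # release the reader without closing io
    io.pos -= unused.bytesize if unused
  end
  out
end

first = Zlib.gzip('hello ')
second = Zlib.gzip('world')
joined = first + second # two complete members, back to back

puts gunzip_all(joined)
```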
.decompress(data, stats: false) ⇒ String, Hash
Decompress gzip bytes to a string.
    # File 'lib/philiprehberger/gzip_kit.rb', line 82
    def self.decompress(data, stats: false)
      io_in = StringIO.new(data)
      io_in.binmode
      result = String.new(encoding: Encoding::BINARY)

      # Handle concatenated gzip streams per gzip spec
      until io_in.eof?
        gz = Zlib::GzipReader.new(io_in)
        result << gz.read
        # GzipReader leaves io_in positioned after the stream
        unused = gz.unused
        gz.finish
        if unused
          io_in.pos -= unused.bytesize
        end
      end

      decompressed = result.force_encoding(Encoding::UTF_8)

      if stats
        decompressed_size = decompressed.bytesize
        compressed_size = data.bytesize
        ratio = decompressed_size.zero? ? 0.0 : compressed_size.to_f / decompressed_size
        { data: decompressed, ratio: ratio }
      else
        decompressed
      end
    end
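The result is always tagged as UTF-8 via force_encoding, which relabels the bytes without validating them. For binary payloads the returned string may therefore report an invalid encoding. A minimal illustration with the standard library (gunzip_as_utf8 is a hypothetical stand-in for the tagging step, not the gem's method):

```ruby
require 'zlib'

# Hypothetical stand-in for the final step of .decompress:
# relabel the decompressed bytes as UTF-8 without checking them.
def gunzip_as_utf8(data)
  Zlib.gunzip(data).force_encoding(Encoding::UTF_8)
end

text = gunzip_as_utf8(Zlib.gzip('héllo'))
blob = gunzip_as_utf8(Zlib.gzip("\xFF\xFE\x00\x01".b))

puts text.valid_encoding? # text payloads are fine as UTF-8
puts blob.valid_encoding? # binary payloads may not be
puts blob.b.encoding      # String#b gives a binary-tagged copy
```

Callers handling non-text data can call .b on the result before comparing or writing it out, which is also what .equivalent? does.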
.decompress_file(src, dest, chunk_size: CHUNK_SIZE) {|bytes_processed, total_bytes| ... } ⇒ void
This method returns an undefined value.
Decompress a gzip file to a regular file.
    # File 'lib/philiprehberger/gzip_kit.rb', line 165
    def self.decompress_file(src, dest, chunk_size: CHUNK_SIZE, &block)
      validate_chunk_size!(chunk_size)
      File.open(src, 'rb') do |io_in|
        File.open(dest, 'wb') do |io_out|
          if block
            gz = Zlib::GzipReader.new(io_in)
            bytes_processed = 0
            while (chunk = gz.read(chunk_size))
              io_out.write(chunk)
              bytes_processed += chunk.bytesize
              block.call(bytes_processed, nil)
            end
            gz.close
          else
            decompress_stream(io_in, io_out, chunk_size: chunk_size)
          end
        end
      end
    end
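Unlike .compress_file, this method passes nil as total_bytes, since the decompressed size is not known up front. A progress block shared between the two methods should therefore tolerate a missing total. One possible shape, where progress_line is a hypothetical helper:

```ruby
# Hypothetical formatter for the (bytes_processed, total_bytes) pair.
# Falls back to a plain byte count when the total is unknown.
def progress_line(bytes_processed, total_bytes)
  if total_bytes.nil? || total_bytes.zero?
    "#{bytes_processed} bytes"
  else
    percent = 100.0 * bytes_processed / total_bytes
    format('%d/%d bytes (%.1f%%)', bytes_processed, total_bytes, percent)
  end
end

puts progress_line(32_768, 65_536) # total known, as with compress_file
puts progress_line(32_768, nil)    # total unknown, as with decompress_file
```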
.decompress_stream(io_in, io_out, chunk_size: CHUNK_SIZE) ⇒ void
This method returns an undefined value.
Streaming decompression from one IO to another.
    # File 'lib/philiprehberger/gzip_kit.rb', line 287
    def self.decompress_stream(io_in, io_out, chunk_size: CHUNK_SIZE)
      validate_chunk_size!(chunk_size)
      gz = Zlib::GzipReader.new(io_in)
      while (chunk = gz.read(chunk_size))
        io_out.write(chunk)
      end
    ensure
      gz&.close
    end
.equivalent?(blob_a, blob_b) ⇒ Boolean
Check whether two gzip-compressed blobs decompress to equal byte strings.
Useful for comparing gzip outputs produced at different compression levels or with different metadata, since only the decompressed payloads are compared.
    # File 'lib/philiprehberger/gzip_kit.rb', line 212
    def self.equivalent?(blob_a, blob_b)
      raise Error, 'first argument is not valid gzip data' unless compressed?(blob_a)
      raise Error, 'second argument is not valid gzip data' unless compressed?(blob_b)

      decompress(blob_a).b == decompress(blob_b).b
    rescue Zlib::GzipFile::Error => e
      raise Error, "failed to decompress gzip data: #{e.message}"
    end
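Two gzip blobs can differ byte for byte while carrying the same payload, because the header stores metadata such as the modification time and the deflate output depends on the level. The sketch below builds two such blobs with the standard library; gzip_with and same_payload? are assumed names for illustration only.

```ruby
require 'zlib'
require 'stringio'

# Assumed helper: gzip text with an explicit level and header mtime
# (mtime is given as integer seconds since the epoch).
def gzip_with(text, level:, mtime:)
  io = StringIO.new
  io.binmode
  gz = Zlib::GzipWriter.new(io, level)
  gz.mtime = mtime
  gz.write(text)
  gz.close
  io.string
end

# Assumed helper: compare decompressed payloads only.
def same_payload?(blob_a, blob_b)
  Zlib.gunzip(blob_a) == Zlib.gunzip(blob_b)
end

text = 'the quick brown fox ' * 500
fast = gzip_with(text, level: Zlib::BEST_SPEED, mtime: 1_700_000_000)
small = gzip_with(text, level: Zlib::BEST_COMPRESSION, mtime: 1_700_000_001)

puts "identical bytes?   #{fast == small}"
puts "identical payload? #{same_payload?(fast, small)}"
```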
.inspect_header(data) ⇒ Hash?
Inspect the gzip header without decompressing.
    # File 'lib/philiprehberger/gzip_kit.rb', line 225
    def self.inspect_header(data)
      return nil unless compressed?(data)

      io = StringIO.new(data)
      io.binmode
      gz = Zlib::GzipReader.new(io)

      {
        method: :deflate,
        mtime: gz.mtime,
        os: gz.os_code,
        original_name: gz.orig_name && gz.orig_name.empty? ? nil : gz.orig_name,
        comment: gz.comment && gz.comment.empty? ? nil : gz.comment
      }
    rescue Zlib::GzipFile::Error
      nil
    ensure
      gz&.close
    end
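The header fields come straight from Zlib::GzipReader, which parses the header when it is constructed. As a rough illustration of where those values originate, the sketch below writes a gzip blob with an explicit name, comment and mtime, then reads them back. Both helper names are hypothetical and only the standard library is used.

```ruby
require 'zlib'
require 'stringio'

# Hypothetical helper: gzip text with explicit header metadata.
# Header fields must be set before the first write.
def gzip_with_header(text, name:, comment:, mtime:)
  io = StringIO.new
  io.binmode
  gz = Zlib::GzipWriter.new(io)
  gz.orig_name = name
  gz.comment = comment
  gz.mtime = mtime # integer seconds since the epoch
  gz.write(text)
  gz.close
  io.string
end

# Hypothetical helper: read the header fields without reading the payload.
def gzip_header_fields(data)
  gz = Zlib::GzipReader.new(StringIO.new(data))
  { mtime: gz.mtime, os: gz.os_code, original_name: gz.orig_name, comment: gz.comment }
ensure
  gz&.close
end

blob = gzip_with_header('id,total', name: 'report.csv',
                                    comment: 'nightly export',
                                    mtime: 1_700_000_000)
p gzip_header_fields(blob)
```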