Class: Uniword::Infrastructure::ZipExtractor
- Inherits: Object
- Defined in: lib/uniword/infrastructure/zip_extractor.rb
Overview
Extracts content from ZIP archives (e.g., DOCX files).
Responsibility: Handle ZIP file extraction operations. Does NOT handle: Document parsing or deserialization.
DOCX files are ZIP archives containing XML files and media. This class provides low-level ZIP extraction functionality.
Instance Method Summary
- #extract(path) ⇒ Hash<String, String>
  Extract all files from a ZIP archive or stream.
- #extract_file(path, entry_path) ⇒ String?
  Extract a specific file from a ZIP archive.
- #extract_from_stream(stream) ⇒ Hash<String, String>
  Extract from IO or StringIO stream.
- #list_files(path) ⇒ Array<String>
  List all files in a ZIP archive.
Instance Method Details
#extract(path) ⇒ Hash<String, String>
Extract all files from a ZIP archive or stream.
# File 'lib/uniword/infrastructure/zip_extractor.rb', line 26

def extract(path)
  # Handle streams directly
  return extract_from_stream(path) if path.is_a?(IO) || path.is_a?(StringIO)

  validate_path(path)

  content = {}
  Zip::File.open(path) do |zip_file|
    zip_file.each do |entry|
      next if entry.directory?

      content[entry.name] = entry.get_input_stream.read.force_encoding("UTF-8")
    end

    # Explicitly extract theme if present
    theme_entry = zip_file.find_entry("word/theme/theme1.xml")
    if theme_entry && !content.key?("word/theme/theme1.xml")
      content["word/theme/theme1.xml"] =
        theme_entry.get_input_stream.read.force_encoding("UTF-8")
    end
  end

  content
end
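The guard at the top of #extract checks for both IO and StringIO because StringIO does not inherit from IO. A minimal sketch of that dispatch using only the standard library; `stream_input?` is a hypothetical helper written for illustration, not part of ZipExtractor:

```ruby
require "stringio"

# Hypothetical helper mirroring the guard at the top of #extract.
def stream_input?(source)
  source.is_a?(IO) || source.is_a?(StringIO)
end

in_memory = StringIO.new("PK")             # StringIO is not a subclass of IO
on_disk   = File.open(File::NULL, "rb")    # File is a subclass of IO

memory_is_stream = stream_input?(in_memory)     # would go to #extract_from_stream
file_is_stream   = stream_input?(on_disk)       # would go to #extract_from_stream
string_is_stream = stream_input?("report.docx") # would be treated as a path

on_disk.close
```

Under this check, an open File object is handled as a stream, while a String is validated and opened as a filesystem path.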
#extract_file(path, entry_path) ⇒ String?
Extract a specific file from a ZIP archive.
# File 'lib/uniword/infrastructure/zip_extractor.rb', line 85

def extract_file(path, entry_path)
  validate_path(path)

  Zip::File.open(path) do |zip_file|
    entry = zip_file.find_entry(entry_path)
    return nil unless entry

    entry.get_input_stream.read.force_encoding("UTF-8")
  end
end
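#extract_file returns from inside the block it passes to Zip::File.open. In Ruby, a `return` inside a block exits the enclosing method, and an `ensure` clause in the yielding method still runs, so cleanup is not skipped. The sketch below shows that control flow with a hypothetical stand-in (`open_archive`, `lookup`, and `$close_count` are invented for illustration); it does not use rubyzip and assumes nothing about its internals:

```ruby
$close_count = 0

# Hypothetical stand-in for a block-form open: yields, then always cleans up.
def open_archive(entries)
  yield entries
ensure
  $close_count += 1 # runs on normal completion and on early return
end

# Same shape as #extract_file: nil for a missing entry, content otherwise.
def lookup(entries, name)
  open_archive(entries) do |archive|
    entry = archive[name]
    return nil unless entry # exits lookup itself, not just the block

    entry.upcase
  end
end

found   = lookup({ "word/document.xml" => "<doc/>" }, "word/document.xml")
missing = lookup({}, "word/missing.xml")
```

Callers of a method shaped like this should expect `nil` for an absent entry and handle it explicitly rather than assuming a String.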
#extract_from_stream(stream) ⇒ Hash<String, String>
Extract from IO or StringIO stream.
# File 'lib/uniword/infrastructure/zip_extractor.rb', line 57

def extract_from_stream(stream)
  content = {}
  Zip::File.open_buffer(stream) do |zip_file|
    zip_file.each do |entry|
      next if entry.directory?

      content[entry.name] = entry.get_input_stream.read.force_encoding("UTF-8")
    end

    # Explicitly extract theme if present
    theme_entry = zip_file.find_entry("word/theme/theme1.xml")
    if theme_entry && !content.key?("word/theme/theme1.xml")
      content["word/theme/theme1.xml"] =
        theme_entry.get_input_stream.read.force_encoding("UTF-8")
    end
  end

  content
end
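Every entry is tagged with `force_encoding("UTF-8")`, which relabels the bytes without transcoding them. That suits XML parts, but a DOCX archive also carries media, and binary bytes labeled as UTF-8 can form an invalid string. A small sketch of the effect using only core String methods; the sample byte strings are made up for illustration:

```ruby
# Bytes as they might come from an entry stream (binary / ASCII-8BIT).
xml_bytes = "<w:t>caf\xC3\xA9</w:t>".b
png_bytes = "\x89PNG\r\n\x1A\n".b

# Same relabeling step the extractor applies to every entry.
xml = xml_bytes.dup.force_encoding("UTF-8")
png = png_bytes.dup.force_encoding("UTF-8")

xml_ok = xml.valid_encoding? # XML text is well-formed UTF-8
png_ok = png.valid_encoding? # binary data usually is not

# The bytes themselves are untouched, so a binary view is recoverable.
png_binary = png.b
```

A caller that needs image data could check `valid_encoding?` or call `String#b` on the value before writing it out, since the underlying bytes are unchanged.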
#list_files(path) ⇒ Array<String>
List all files in a ZIP archive.
# File 'lib/uniword/infrastructure/zip_extractor.rb', line 101

def list_files(path)
  validate_path(path)

  files = []
  Zip::File.open(path) do |zip_file|
    zip_file.each do |entry|
      files << entry.name unless entry.directory?
    end
  end

  files
end
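Because #list_files returns a flat Array<String> of entry names with directories skipped, callers can organize it with ordinary Enumerable methods. A sketch under the assumption that the array below is what a small DOCX might yield; the entry names are hypothetical sample data, and real archives vary:

```ruby
# Hypothetical return value of #list_files for a small DOCX.
files = [
  "[Content_Types].xml",
  "_rels/.rels",
  "word/document.xml",
  "word/theme/theme1.xml",
  "word/media/image1.png"
]

# Group entry names by top-level folder ("" for root-level entries).
by_folder = files.group_by do |name|
  name.include?("/") ? name.split("/").first : ""
end

# Pick out embedded media by path prefix.
media = files.select { |name| name.start_with?("word/media/") }
```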