Module: Metaclean::Exiftool
- Defined in: lib/metaclean/exiftool.rb
Overview
`module Exiftool` (rather than `class`) because we want module-level methods like `Exiftool.read(path)`; there's no state to carry per instance.
Constant Summary
- WRITE_UNSUPPORTED_RE =
ExifTool can READ many formats it cannot WRITE: the ZIP-based documents (docx/xlsx/pptx/odt/ods/odp/odg/odf/epub) are read-only, and mat2 owns the strip for them. ExifTool reports this as “…writing of X files is not yet supported”. `strip!` returns `:unsupported` in that case so the runner can treat it as a soft skip, not a pipeline failure, when mat2 has already cleaned the file.
/not yet supported|writing of .* files/i
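As a rough illustration of how this pattern separates "cannot write this format" from other failures, the sketch below runs it against two stderr lines. Both sample messages and the `classify` helper are hypothetical, made up for this example and not captured from a real ExifTool run.

```ruby
# Same pattern as the constant documented above.
WRITE_UNSUPPORTED_RE = /not yet supported|writing of .* files/i

# Hypothetical stderr lines, invented for illustration only.
SAMPLE_UNSUPPORTED = 'Error: Writing of DOCX files is not yet supported - report.docx'
SAMPLE_OTHER       = 'Error: File not found - missing.jpg'

# Hypothetical helper: maps a stderr string to a soft skip or a hard failure.
def classify(err)
  err.match?(WRITE_UNSUPPORTED_RE) ? :unsupported : :failure
end

p classify(SAMPLE_UNSUPPORTED)
p classify(SAMPLE_OTHER)
```

The `i` flag matters here because the capitalisation of the message is not guaranteed by anything in this module.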
Class Method Summary
- .available? ⇒ Boolean
  Returns true if `exiftool` is on PATH.
- .read(path) ⇒ Object
  Reads metadata from a file and returns a flat Hash of “Group:Tag” => value.
- .scrub_encoding(obj) ⇒ Object
  ExifTool labels its -j output UTF-8, but binary/odd tag values (UserComment, MakerNotes fragments, corrupt or hostile files) can carry invalid bytes.
- .strip!(path) ⇒ Object
  Removes every removable tag, in place.
- .version ⇒ Object
  Returns the version string, or nil if exiftool is missing/broken.
Class Method Details
.available? ⇒ Boolean
Returns true if `exiftool` is on PATH. The result is memoized in `@available` so repeated checks don't re-spawn the process.
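`defined?(@available)` is safer than `@available ||= ...` because the cached value can legitimately be `false`, and `||=` would re-run the check every time in that case. It is also safer than an `@available.nil?` guard, since `Process::Status#success?` returns `nil` when the process did not exit normally, and we want to skip the re-check in that case too.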
    # File 'lib/metaclean/exiftool.rb', line 34

    def available?
      return @available if defined?(@available)

      out, _err, status = Open3.capture3('exiftool', '-ver')
      @available = status.success?
      # Stash the version off the same call so `version` need not re-spawn.
      @version = @available ? out.strip : nil
      @available
    rescue Errno::ENOENT
      # `Errno::ENOENT` ("no such file or directory") is what Open3 raises
      # when the executable can't be found. We treat that as "not available".
      @version = nil
      @available = false
    end
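To make the memoization point concrete, here is a minimal sketch that counts how often a check runs when its result is `false`. It contrasts a `defined?` guard with the common `||=` idiom, which the source does not use. The `Probe` class and its method names are hypothetical and not part of Metaclean.

```ruby
# Hypothetical class for illustration; not part of Metaclean.
class Probe
  attr_reader :defined_calls, :or_equals_calls

  def initialize
    @defined_calls = 0
    @or_equals_calls = 0
  end

  # Guarded with defined?: a cached false still counts as cached.
  def with_defined
    return @with_defined if defined?(@with_defined)

    @defined_calls += 1
    @with_defined = false
  end

  # Guarded with ||=: false is falsy, so the block runs on every call.
  def with_or_equals
    @with_or_equals ||= begin
      @or_equals_calls += 1
      false
    end
  end
end

probe = Probe.new
3.times { probe.with_defined }
3.times { probe.with_or_equals }
puts "defined? guard ran the check #{probe.defined_calls} time(s)"
puts "||= guard ran the check #{probe.or_equals_calls} time(s)"
```

In this sketch the `defined?` version runs its check once and the `||=` version runs it three times, which is the re-spawn cost `available?` avoids when `exiftool` is missing.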
.read(path) ⇒ Object
Reads metadata from a file and returns a flat Hash of “Group:Tag” => value.
ExifTool flag glossary:
-j JSON output (machine-parseable)
-G1 Include the family-1 group name. NB: with -G1 mainstream EXIF
tags appear under "IFD0"/"ExifIFD"/"IFD1", not "EXIF" (that's
the family-0 name); GPS/IPTC/XMP-dc keep those group names.
-a Allow duplicate tags (some formats have several with same name)
-u Include unknown/unidentified tags
-s Short tag names (no descriptions)
-n Numeric values (no human formatting like "1/100 sec")
-api largefilesupport=1 Allow files >4 GB
    # File 'lib/metaclean/exiftool.rb', line 67

    def read(path)
      out, err, status = Open3.capture3(
        'exiftool', '-j', '-G1', '-a', '-u', '-s', '-n',
        '-api', 'largefilesupport=1',
        Metaclean.safe_path(path)
      )
      raise Error, "ExifTool read failed: #{err.strip}" unless status.success?

      # ExifTool's JSON output is an array (one entry per file). We always
      # pass one file, so we take the first element. `|| {}` handles the
      # edge case where exiftool returns an empty array. A non-array shape is
      # unexpected; bail with a clear error instead of crashing later on
      # `.first` returning a Hash/scalar.
      data = JSON.parse(out)
      raise Error, 'Unexpected ExifTool output (expected a JSON array)' unless data.is_a?(Array)

      scrub_encoding(data.first || {})
    rescue JSON::ParserError => e
      raise Error, "Could not parse ExifTool output: #{e.message}"
    end
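The shape handling in `read` can be tried without ExifTool installed. The sketch below parses a hand-written sample in the "Group:Tag" layout that `-j -G1` produces. The sample values and the `first_record` helper are assumptions for illustration, and it raises `ArgumentError` where the real method raises the module's own `Error`.

```ruby
require 'json'

# Hypothetical sample of `-j -G1 -s -n` style output; all values are made up.
SAMPLE_JSON = <<~JSON
  [{
    "SourceFile": "photo.jpg",
    "IFD0:Make": "ExampleCam",
    "ExifIFD:ExposureTime": 0.01,
    "GPS:GPSLatitude": 48.8584
  }]
JSON

# Hypothetical helper mirroring the array check and the `|| {}` fallback.
def first_record(json_text)
  data = JSON.parse(json_text)
  raise ArgumentError, 'expected a JSON array' unless data.is_a?(Array)

  data.first || {}
end

record = first_record(SAMPLE_JSON)
record.each { |tag, value| puts "#{tag} => #{value.inspect}" }

# An empty array falls back to an empty Hash instead of nil.
p first_record('[]')
```

Because of `-n`, values such as the exposure time arrive as plain numbers, so callers can compare them without parsing display strings.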
.scrub_encoding(obj) ⇒ Object
ExifTool labels its -j output UTF-8, but binary/odd tag values (UserComment, MakerNotes fragments, corrupt or hostile files) can carry invalid bytes. A later gsub (Display.format_value) raises on an invalid-encoding String and would crash the whole run, so replace bad bytes up front. This hash is only used for display/diff/residual checks — the actual strip operates on the file via the tools — so scrubbing is safe.
    # File 'lib/metaclean/exiftool.rb', line 93

    def scrub_encoding(obj)
      case obj
      when String then obj.valid_encoding? ? obj : obj.scrub
      when Array then obj.map { |e| scrub_encoding(e) }
      when Hash then obj.transform_values { |v| scrub_encoding(v) }
      else obj
      end
    end
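A small, self-contained sketch of the scrub on a nested structure follows. The method body is copied from the listing above as a top-level function so the example runs on its own. The tag names and values are hypothetical.

```ruby
# Copy of the method above, defined at top level so the sketch runs alone.
def scrub_encoding(obj)
  case obj
  when String then obj.valid_encoding? ? obj : obj.scrub
  when Array then obj.map { |e| scrub_encoding(e) }
  when Hash then obj.transform_values { |v| scrub_encoding(v) }
  else obj
  end
end

# Hypothetical tag values; the byte 0xFF is never valid in UTF-8.
bad = "caf\xFF".dup.force_encoding(Encoding::UTF_8)
tags = {
  'ExifIFD:UserComment' => bad,
  'IFD0:Keywords'       => ['ok', bad],
  'IFD0:Orientation'    => 1
}

clean = scrub_encoding(tags)

puts bad.valid_encoding?                          # the input is left untouched
puts clean['ExifIFD:UserComment'].valid_encoding? # the copy is safe to gsub
p clean['IFD0:Orientation']                       # non-strings pass through
```

`String#scrub` returns a new string with each invalid byte replaced (by U+FFFD for UTF-8 strings), so the original hash is not mutated.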
.strip!(path) ⇒ Object
Removes every removable tag, in place. Returns true on success, :unsupported when ExifTool cannot write the format, and raises on failure.
`-all=` is the magic incantation: it sets every tag to nothing (= empty), which deletes them. `-overwrite_original` makes ExifTool replace the file directly instead of writing `file_original` next to it. `-api largefilesupport=1` lets files larger than 4 GB through.
    # File 'lib/metaclean/exiftool.rb', line 116

    def strip!(path)
      _out, err, status = Open3.capture3(
        'exiftool', '-all=', '-overwrite_original', '-q', '-q',
        '-api', 'largefilesupport=1', Metaclean.safe_path(path)
      )
      return true if status.success?
      return :unsupported if err.match?(WRITE_UNSUPPORTED_RE)

      raise Error, "ExifTool strip failed: #{err.strip}"
    end
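Since `strip!` has three outcomes (`true`, `:unsupported`, or an exception), a caller has to branch on the return value. The sketch below shows one way a runner might do that. The `summarize_strip` helper, the `mat2_cleaned:` keyword, and the `:cleaned`/`:skipped`/`:failed` symbols are all assumptions for illustration; the actual runner is not shown in this page.

```ruby
# Hypothetical runner-side helper; names and result symbols are assumptions,
# not part of Metaclean.
def summarize_strip(result, mat2_cleaned:)
  case result
  when true
    :cleaned
  when :unsupported
    # Soft skip only if another tool (mat2) already handled the file.
    mat2_cleaned ? :skipped : :failed
  else
    raise ArgumentError, "unexpected strip! result: #{result.inspect}"
  end
end

p summarize_strip(true, mat2_cleaned: false)
p summarize_strip(:unsupported, mat2_cleaned: true)
p summarize_strip(:unsupported, mat2_cleaned: false)
```

Treating `:unsupported` as a failure when mat2 did not clean the file is a guess at the intended policy, based on the description of `WRITE_UNSUPPORTED_RE`.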
.version ⇒ Object
Returns the version string, or nil if exiftool is missing/broken. Captured by `available?`, so this never re-runs the binary.
    # File 'lib/metaclean/exiftool.rb', line 51

    def version
      available? ? @version : nil
    end