Module: Metaclean::Exiftool
- Defined in:
- lib/metaclean/exiftool.rb
Constant Summary
- WRITE_UNSUPPORTED_RE =
ExifTool can READ many formats it cannot WRITE, and mat2 owns the strip for them: the ZIP-based documents (docx/xlsx/pptx/odt/ods/odp/odg/odf/epub) and the RIFF containers (avi/wav). ExifTool announces the inability with one of a few phrasings — “writing of X files is not yet supported”, “does not yet support writing of …”, or “Can’t currently write RIFF … files” — so we match all of them. strip! returns :unsupported in these cases so the runner treats it as a soft skip (mat2 does the actual strip), NOT a pipeline failure that would wrongly pin an already-clean file at :unverified. This is safe because the post-strip residual re-read still gates the :cleaned status.
/not yet support|can't currently write|writing of .* files/i
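The matching described above can be sketched as follows. The pattern is copied from this page; the `UnsupportedDemo` module, its `write_unsupported?` helper, and the sample stderr lines are hypothetical illustrations built from the phrasings quoted in the description, not captured ExifTool output.

```ruby
# Hypothetical demo module; only the regex itself comes from the docs above.
module UnsupportedDemo
  WRITE_UNSUPPORTED_RE = /not yet support|can't currently write|writing of .* files/i

  module_function

  # True when the stderr text looks like a "cannot write this format" notice.
  def write_unsupported?(stderr_text)
    stderr_text.match?(WRITE_UNSUPPORTED_RE)
  end
end

# Assumed message shapes, written by hand for illustration.
samples = [
  'Error: Writing of DOCX files is not yet supported',
  'Error: ExifTool does not yet support writing of EPUB files',
  "Error: Can't currently write RIFF AVI files",
  'Error: File not found'
]

samples.each do |line|
  puts "#{UnsupportedDemo.write_unsupported?(line)}\t#{line}"
end
```

Only the last sample fails to match, so only that kind of message would reach the `raise` in `strip!`. Note that the pattern uses a straight apostrophe, so a message spelled with a typographic apostrophe would not match the second alternative.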
Class Method Summary
- .available? ⇒ Boolean
  True if `exiftool` is on PATH.
- .read(path) ⇒ Object
  Reads metadata from a file and returns a flat Hash of “Group:Tag” => value.
- .scrub_encoding(obj) ⇒ Object
  ExifTool labels its -j output UTF-8, but binary/odd tag values (UserComment, MakerNotes fragments, corrupt or hostile files) can carry invalid bytes.
- .strip!(path, also_delete: []) ⇒ Object
  Removes every removable tag, in place.
- .version ⇒ Object
  Returns the version string, or nil if exiftool is missing/broken.
Class Method Details
.available? ⇒ Boolean
True if `exiftool` is on PATH. Memoized so repeated checks don’t re-spawn it. The guard uses defined?, not nil?, because the cached value can legitimately be false.
# File 'lib/metaclean/exiftool.rb', line 21

def available?
  return @available if defined?(@available)

  out, _err, status = Open3.capture3('exiftool', '-ver')
  @available = status.success?
  # Stash the version off the same call so `version` need not re-spawn.
  @version = @available ? out.strip : nil
  @available
rescue Errno::ENOENT
  @version = nil
  @available = false # exiftool not on PATH
end
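Why a `defined?` guard matters when the cached value may be false can be shown with a small sketch. `ProbeDemo` and its methods are hypothetical stand-ins; `probe` only simulates a failed `exiftool -ver` call and counts how often it runs.

```ruby
# Hypothetical class for illustration; it never spawns a real process.
class ProbeDemo
  attr_reader :spawns

  def initialize
    @spawns = 0
  end

  # Stand-in for running `exiftool -ver`; pretends the tool is missing.
  def probe
    @spawns += 1
    false
  end

  # `||=` treats a cached false as "not cached yet" and probes again.
  def available_with_or_equals?
    @naive ||= probe
  end

  # `defined?` distinguishes "never set" from "set to false".
  def available_with_defined?
    return @available if defined?(@available)

    @available = probe
  end
end

naive = ProbeDemo.new
3.times { naive.available_with_or_equals? }

memo = ProbeDemo.new
3.times { memo.available_with_defined? }

puts "||= probes: #{naive.spawns}, defined? probes: #{memo.spawns}"
```

With a missing tool, the `||=` variant probes on every call, while the `defined?` variant probes once and then returns the cached false.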
.read(path) ⇒ Object
Reads metadata from a file and returns a flat Hash of “Group:Tag” => value.
ExifTool flag glossary:
-j JSON output (machine-parseable)
-G1 Include the family-1 group name. NB: with -G1 mainstream EXIF
tags appear under "IFD0"/"ExifIFD"/"IFD1", not "EXIF" (that's
the family-0 name); GPS/IPTC/XMP-dc keep those group names.
-a Allow duplicate tags (some formats have several with same name)
-u Include unknown/unidentified tags
-s Short tag names (no descriptions)
-n Numeric values (no human formatting like "1/100 sec")
-api largefilesupport=1 Allow files >4 GB
# File 'lib/metaclean/exiftool.rb', line 52

def read(path)
  out, err, status = Metaclean.capture3(
    'exiftool', '-j', '-G1', '-a', '-u', '-s', '-n',
    '-api', 'largefilesupport=1',
    Metaclean.safe_path(path)
  )
  raise Error, "ExifTool read failed: #{err.strip}" unless status.success?

  # ExifTool's JSON output is an array (one entry per file). We always
  # pass one file, so we take the first element. `|| {}` handles the
  # edge case where exiftool returns an empty array. A non-array shape is
  # unexpected; bail with a clear error instead of crashing later on
  # `.first` returning a Hash/scalar.
  data = JSON.parse(out)
  raise Error, 'Unexpected ExifTool output (expected a JSON array)' unless data.is_a?(Array)

  scrub_encoding(data.first || {})
rescue JSON::ParserError => e
  raise Error, "Could not parse ExifTool output: #{e.message}"
end
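The shape checks in `read` can be tried without ExifTool installed. In this sketch, `first_entry` is a hypothetical helper that mirrors only the parse-and-validate step, and the JSON sample is hand-written to resemble `-j -G1 -n` output; the tag values are invented.

```ruby
require 'json'

# Hypothetical helper: parse ExifTool-style JSON and return the single
# file entry, or an empty Hash when the array is empty.
def first_entry(json_text)
  data = JSON.parse(json_text)
  raise ArgumentError, 'expected a JSON array' unless data.is_a?(Array)

  data.first || {}
end

# Hand-written sample (assumed shape). With -G1, keys carry family-1 group
# names such as "IFD0" and "ExifIFD"; with -n, values stay numeric.
sample = <<~JSON
  [{
    "SourceFile": "photo.jpg",
    "IFD0:Make": "ExampleCam",
    "ExifIFD:ExposureTime": 0.01,
    "GPS:GPSLatitude": 48.8566
  }]
JSON

tags = first_entry(sample)
tags.each { |key, value| puts "#{key} => #{value.inspect}" }

puts first_entry('[]').inspect # empty array falls back to {}
```

A top-level JSON object instead of an array raises immediately here, which corresponds to the "Unexpected ExifTool output" branch in `read`.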
.scrub_encoding(obj) ⇒ Object
ExifTool labels its -j output UTF-8, but binary/odd tag values (UserComment, MakerNotes fragments, corrupt or hostile files) can carry invalid bytes. A later gsub (Display.format_value) raises on an invalid-encoding String and would crash the whole run, so replace bad bytes up front. This hash is only used for display/diff/residual checks — the actual strip operates on the file via the tools — so scrubbing is safe.
# File 'lib/metaclean/exiftool.rb', line 78

def scrub_encoding(obj)
  case obj
  when String then obj.valid_encoding? ? obj : obj.scrub
  when Array then obj.map { |e| scrub_encoding(e) }
  when Hash then obj.transform_values { |v| scrub_encoding(v) }
  else obj
  end
end
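The failure mode described above can be reproduced in isolation. This sketch copies `scrub_encoding` as a standalone method; the `raw` Hash is an invented example whose tag names and byte values are assumptions chosen only to trigger the problem.

```ruby
# Standalone copy of the method shown above, for experimentation.
def scrub_encoding(obj)
  case obj
  when String then obj.valid_encoding? ? obj : obj.scrub
  when Array then obj.map { |e| scrub_encoding(e) }
  when Hash then obj.transform_values { |v| scrub_encoding(v) }
  else obj
  end
end

# Invented metadata: two strings carry bytes that are not valid UTF-8.
raw = {
  'ExifIFD:UserComment' => "note\xFF",
  'XMP-dc:Subject' => ['ok', "bad\xFE"],
  'IFD0:Orientation' => 1
}

begin
  # A regex-based gsub on an invalid-encoding String raises ArgumentError.
  raw['ExifIFD:UserComment'].gsub(/\s+/, ' ')
rescue ArgumentError => e
  puts "unscrubbed gsub failed: #{e.message}"
end

clean = scrub_encoding(raw)
puts clean['ExifIFD:UserComment'].valid_encoding?
puts clean['ExifIFD:UserComment'].gsub(/\s+/, ' ').inspect
```

`String#scrub` replaces each invalid byte with U+FFFD for UTF-8 strings, so the scrubbed values differ from the raw bytes. That is acceptable here because, as noted above, the Hash feeds display and residual checks rather than the strip itself.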
.strip!(path, also_delete: []) ⇒ Object
Removes every removable tag, in place. Returns true on success, :unsupported when ExifTool cannot write the format, and raises on failure.
`-all=` sets every tag to empty, which deletes them. `-overwrite_original` makes ExifTool replace the file directly instead of writing `file_original` next to it. `-api largefilesupport=1` lets files larger than 4 GB through.
# File 'lib/metaclean/exiftool.rb', line 104

def strip!(path, also_delete: [])
  # `-all=` clears metadata, but for TIFF/DNG ExifTool refuses to delete the
  # IFD0 directory and leaves its tags (Artist, Software, …) behind. So we
  # ALSO delete the known privacy tags by name and clear the GPS group: both
  # are no-ops where `-all=` already removed them (e.g. JPEG), but they make
  # the strip complete AND lossless (no re-encode) for IFD0-preserving formats.
  args = ['exiftool', '-all=', '-gps:all=']
  also_delete.each { |tag| args << "-#{tag}=" }
  args.concat(['-overwrite_original', '-q', '-q', '-api', 'largefilesupport=1',
               Metaclean.safe_path(path)])
  _out, err, status = Metaclean.capture3(*args)
  return true if status.success?
  return :unsupported if err.match?(WRITE_UNSUPPORTED_RE)

  raise Error, "ExifTool strip failed: #{err.strip}"
end
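To see what command line `strip!` assembles and how a caller might branch on its result, here is a minimal sketch. `build_strip_args` and `describe_strip_result` are hypothetical helpers, the tag names passed in are illustrative, and the `Metaclean.safe_path` step is omitted because its behavior is not documented on this page.

```ruby
# Hypothetical helper: mirrors only the argument assembly in `strip!`.
def build_strip_args(path, also_delete: [])
  args = ['exiftool', '-all=', '-gps:all=']
  also_delete.each { |tag| args << "-#{tag}=" }
  args.concat(['-overwrite_original', '-q', '-q', '-api', 'largefilesupport=1', path])
end

# Hypothetical caller-side handling of the two non-raising outcomes.
def describe_strip_result(result)
  case result
  when true then 'stripped by ExifTool'
  when :unsupported then 'soft skip; left to mat2'
  else "unexpected result: #{result.inspect}"
  end
end

# Illustrative tag names; the real list is supplied by the caller.
args = build_strip_args('scan.tif', also_delete: %w[Artist Software])
puts args.join(' ')

puts describe_strip_result(true)
puts describe_strip_result(:unsupported)
```

Each name in `also_delete` becomes a `-Tag=` assignment placed between the blanket deletions and the write options, so the by-name deletions run in the same ExifTool invocation as `-all=`.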
.version ⇒ Object
Returns the version string, or nil if exiftool is missing/broken. Captured by `available?`, so this never re-runs the binary.
# File 'lib/metaclean/exiftool.rb', line 36

def version
  available? ? @version : nil
end