Module: Metaclean::Exiftool

Defined in:
lib/metaclean/exiftool.rb

Overview

`module Exiftool` (rather than `class`) because we want module-level methods like `Exiftool.read(path)`; there is no state to carry per instance.
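
As a rough illustration of that choice, the sketch below defines module-level methods with `class << self`. The name `ToolSketch` and its method body are hypothetical, and the real module may define its methods differently (for example with `module_function`).

```ruby
# Hypothetical stand-in for a stateless tool wrapper.
module ToolSketch
  class << self
    # Callable as ToolSketch.read(path); no instance is ever created.
    def read(path)
      "would read #{path}"
    end
  end
end

puts ToolSketch.read('photo.jpg')
puts ToolSketch.respond_to?(:new) # a module cannot be instantiated
```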

Constant Summary

WRITE_UNSUPPORTED_RE =

ExifTool can read many formats that it cannot write. The ZIP-based documents (docx/xlsx/pptx/odt/ods/odp/odg/odf/epub) are read-only for ExifTool, and mat2 owns the strip for them. ExifTool reports this as “…writing of X files is not yet supported”. strip! returns :unsupported in that case so the runner treats it as a soft skip, not a pipeline failure, when mat2 has already cleaned the file.

/not yet supported|writing of .* files/i
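
A small sketch of how the pattern separates the two kinds of failure. The stderr lines here are hypothetical, worded after the message quoted above; real ExifTool output may differ in detail.

```ruby
WRITE_UNSUPPORTED_RE = /not yet supported|writing of .* files/i

# Hypothetical stderr lines, invented for illustration.
unsupported   = 'Error: Writing of DOCX files is not yet supported - report.docx'
other_failure = 'Error: File not found - report.docx'

puts unsupported.match?(WRITE_UNSUPPORTED_RE)   # true
puts other_failure.match?(WRITE_UNSUPPORTED_RE) # false
```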

Class Method Summary

Class Method Details

.available? ⇒ Boolean

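Returns true if `exiftool` is on PATH and `exiftool -ver` exits successfully. The result is memoized in `@available` so repeated checks don't re-spawn the process.

`defined?(@available)` is safer than a truthiness test such as `@available ||= ...` because the cached value could legitimately be `false`, and we want to skip the re-check in that case too. (An `@available.nil?` test would also tell `false` apart from unset; `defined?` additionally never reads an unassigned instance variable.)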
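
A minimal sketch of why a cached `false` matters, comparing `||=` with the `defined?` guard. `ProbeSketch` and its methods are hypothetical and only count how often the stand-in check runs.

```ruby
# Hypothetical probe that counts how often the expensive check runs.
module ProbeSketch
  @calls = 0

  class << self
    attr_reader :calls

    # Stand-in for spawning a process; always reports "not available".
    def expensive_check
      @calls += 1
      false
    end

    # `||=` re-runs the check whenever the cached value is false or nil.
    def with_or_equals
      @a ||= expensive_check
    end

    # `defined?` only asks whether the ivar was ever assigned.
    def with_defined
      return @b if defined?(@b)

      @b = expensive_check
    end
  end
end

2.times { ProbeSketch.with_or_equals }
after_or_equals = ProbeSketch.calls
2.times { ProbeSketch.with_defined }
after_defined = ProbeSketch.calls - after_or_equals

puts after_or_equals # 2: the false result was never treated as cached
puts after_defined   # 1: the false result was cached after the first call
```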

Returns:

  • (Boolean)


# File 'lib/metaclean/exiftool.rb', line 34

def available?
  return @available if defined?(@available)

  out, _err, status = Open3.capture3('exiftool', '-ver')
  @available = status.success?
  # Stash the version off the same call so `version` need not re-spawn.
  @version = @available ? out.strip : nil
  @available
rescue Errno::ENOENT
  # `Errno::ENOENT` ("no such file or directory") is what Open3 raises
  # when the executable can't be found. We treat that as "not available".
  @version = nil
  @available = false
end

.read(path) ⇒ Object

Reads metadata from a file and returns a flat Hash of “Group:Tag” => value.

ExifTool flag glossary:

-j         JSON output (machine-parseable)
-G1        Include the family-1 group name. NB: with -G1 mainstream EXIF
           tags appear under "IFD0"/"ExifIFD"/"IFD1", not "EXIF" (that's
           the family-0 name); GPS/IPTC/XMP-dc keep those group names.
-a         Allow duplicate tags (some formats have several with same name)
-u         Include unknown/unidentified tags
-s         Short tag names (no descriptions)
-n         Numeric values (no human formatting like "1/100 sec")
-api largefilesupport=1   Allow files >4 GB
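
To make the resulting shape concrete, the sketch below parses a hypothetical `-j -G1 -n` payload. The file name and tag values are invented; only the "Group:Tag" key layout and the one-element array follow the description above.

```ruby
require 'json'

# Hypothetical ExifTool JSON for a single JPEG; all values are made up.
sample = <<~JSON
  [{
    "SourceFile": "photo.jpg",
    "IFD0:Make": "ExampleCam",
    "ExifIFD:ExposureTime": 0.01,
    "GPS:GPSLatitude": 48.8584
  }]
JSON

# One array entry per file; we passed one file, so take the first.
tags = JSON.parse(sample).first || {}

# With -G1, mainstream EXIF tags sit under IFD0/ExifIFD/IFD1, not "EXIF".
exif_like = tags.keys.select { |k| k.start_with?('IFD0:', 'ExifIFD:', 'IFD1:') }
puts exif_like.inspect
puts tags['ExifIFD:ExposureTime'] # numeric because of -n
```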


# File 'lib/metaclean/exiftool.rb', line 67

def read(path)
  out, err, status = Open3.capture3(
    'exiftool', '-j', '-G1', '-a', '-u', '-s', '-n', '-api', 'largefilesupport=1',
    Metaclean.safe_path(path)
  )
  raise Error, "ExifTool read failed: #{err.strip}" unless status.success?

  # ExifTool's JSON output is an array (one entry per file). We always
  # pass one file, so we take the first element. `|| {}` handles the
  # edge case where exiftool returns an empty array. A non-array shape is
  # unexpected — bail with a clear error instead of crashing later on
  # `.first` returning a Hash/scalar.
  data = JSON.parse(out)
  raise Error, 'Unexpected ExifTool output (expected a JSON array)' unless data.is_a?(Array)

  scrub_encoding(data.first || {})
rescue JSON::ParserError => e
  raise Error, "Could not parse ExifTool output: #{e.message}"
end

.scrub_encoding(obj) ⇒ Object

ExifTool labels its -j output UTF-8, but binary/odd tag values (UserComment, MakerNotes fragments, corrupt or hostile files) can carry invalid bytes. A later gsub (Display.format_value) raises on an invalid-encoding String and would crash the whole run, so replace bad bytes up front. This hash is only used for display/diff/residual checks — the actual strip operates on the file via the tools — so scrubbing is safe.



# File 'lib/metaclean/exiftool.rb', line 93

def scrub_encoding(obj)
  case obj
  when String then obj.valid_encoding? ? obj : obj.scrub
  when Array  then obj.map { |e| scrub_encoding(e) }
  when Hash   then obj.transform_values { |v| scrub_encoding(v) }
  else obj
  end
end
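
A short usage sketch, repeating the method so the snippet runs on its own. The tag names and values are hypothetical; the invalid byte stands in for what a corrupt tag value might carry.

```ruby
def scrub_encoding(obj)
  case obj
  when String then obj.valid_encoding? ? obj : obj.scrub
  when Array  then obj.map { |e| scrub_encoding(e) }
  when Hash   then obj.transform_values { |v| scrub_encoding(v) }
  else obj
  end
end

# "\xFF" is never valid in UTF-8, so this String has a broken encoding.
bad = "Canon\xFF".dup.force_encoding(Encoding::UTF_8)

# Hypothetical metadata hash with the bad value at two nesting levels.
tags = { 'IFD0:Make' => bad, 'XMP-dc:Subject' => ['ok', bad], 'ExifIFD:ISO' => 100 }
clean = scrub_encoding(tags)

puts bad.valid_encoding?                      # false
puts clean['IFD0:Make'].valid_encoding?       # true
puts clean['IFD0:Make'] == "Canon\uFFFD"      # true: bad byte became U+FFFD
puts clean['ExifIFD:ISO']                     # 100, non-strings pass through
```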

.strip!(path) ⇒ Object

Removes every removable tag, in place. Returns true on success, :unsupported when ExifTool cannot write the format, and raises on failure.

`-all=` is the key flag: it assigns an empty value to every tag, which deletes them. `-overwrite_original` makes ExifTool replace the file directly instead of writing `file_original` next to it. `-q -q` suppresses informational messages and warnings. `-api largefilesupport=1` lets files larger than 4 GB through.

Raises:

  • (Error)



# File 'lib/metaclean/exiftool.rb', line 116

def strip!(path)
  _out, err, status = Open3.capture3(
    'exiftool', '-all=', '-overwrite_original', '-q', '-q', '-api', 'largefilesupport=1', Metaclean.safe_path(path)
  )
  return true if status.success?
  return :unsupported if err.match?(WRITE_UNSUPPORTED_RE)

  raise Error, "ExifTool strip failed: #{err.strip}"
end
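
One way a caller might branch on the three outcomes (true, :unsupported, or a raised error). `handle_strip`, the status symbols it returns, and the lambdas standing in for `strip!` are all hypothetical; the actual runner is not shown in this page.

```ruby
# Hypothetical caller-side handling of the three strip! outcomes.
def handle_strip(path, stripper)
  case stripper.call(path)
  when true         then :stripped
  when :unsupported then :skipped # soft skip, not a pipeline failure
  end
rescue StandardError
  :failed
end

# Stand-ins for Exiftool.strip! so the sketch runs without exiftool.
ok          = ->(_path) { true }
unsupported = ->(_path) { :unsupported }
failing     = ->(_path) { raise 'ExifTool strip failed: boom' }

puts handle_strip('photo.jpg', ok)            # stripped
puts handle_strip('report.docx', unsupported) # skipped
puts handle_strip('broken.jpg', failing)      # failed
```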

.version ⇒ Object

Returns the version string, or nil if exiftool is missing/broken. The value is captured by `available?`, so this never spawns the binary a second time.



# File 'lib/metaclean/exiftool.rb', line 51

def version
  available? ? @version : nil
end