Module: Metaclean::Strategy
- Defined in: lib/metaclean/strategy.rb
Constant Summary
- PRIVACY_GROUPS =
Tag GROUPS that almost always carry personally identifying info. Survival of any tag in these groups raises a flag to the user.
%w[GPS MakerNotes XMP-dc XMP-photoshop IPTC ICC-header].freeze
- PRIVACY_TAGS =
Specific tag NAMES (regardless of group) we never want to leak. If exiftool reports e.g. “EXIF:Artist” we still flag it because of the tag-name match, not the group.
%w[ Artist Author Creator Copyright Rights By-line By-lineTitle Credit Source Contact OwnerName CameraOwnerName SerialNumber InternalSerialNumber LensSerialNumber Software HostComputer ProcessingSoftware ImageDescription UserComment LastModifiedBy LastSavedBy LastAuthor ].freeze
- MAT2_PREFERRED =
File extensions where mat2 is meaningfully stricter than ExifTool and should run first. For other formats, ExifTool is the broader expert.
%w[ pdf docx xlsx pptx odt ods odp odg epub png svg mp4 avi mkv mov webm ].freeze
Class Method Summary
- .privacy_residual(meta) ⇒ Object
Looks at metadata read AFTER cleaning and returns the entries that still look privacy-relevant.
- .tools_for(path, prefer: {}) ⇒ Object
Returns an ordered list of tool symbols (e.g. `[:mat2, :exiftool, :qpdf]`) to run on `path`.
Class Method Details
.privacy_residual(meta) ⇒ Object
Looks at metadata read AFTER cleaning and returns the entries that still look privacy-relevant. The runner uses this for the “still present” warning at the end of each file.
Why both group-match and tag-match? Tag names can appear under different groups depending on the format (e.g. “Author” in PDF vs “Artist” in EXIF). Combining the two keeps coverage broad without having to enumerate every tag pair.
# File 'lib/metaclean/strategy.rb', line 80

def privacy_residual(meta)
  meta.reject { |k, _| k == 'SourceFile' }.select do |k, _|
    # ExifTool keys look like "GPS:GPSLatitude". Split on the first ":".
    group, tag = k.to_s.split(':', 2)
    # Skip System/File/etc. — those aren't user metadata.
    next false if Display::NON_METADATA_GROUPS.include?(group)

    if tag.nil?
      # No "Group:" prefix — the whole key is the tag name.
      PRIVACY_TAGS.include?(group.to_s)
    else
      PRIVACY_GROUPS.include?(group) || PRIVACY_TAGS.include?(tag)
    end
  end
end
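The following is a minimal sketch of how the group-match and tag-match rules combine on a sample hash. It is not the library itself: `StrategySketch` is a hypothetical stand-in, the contents of `NON_METADATA_GROUPS` are assumed (the real list lives in `Display` and is not shown here), `PRIVACY_TAGS` is trimmed to a subset, and the sample metadata is invented.

```ruby
# Hypothetical stand-in for Metaclean::Strategy, so the example runs on its own.
module StrategySketch
  # ASSUMPTION: the real Display::NON_METADATA_GROUPS is not shown in this page.
  NON_METADATA_GROUPS = %w[System File ExifTool Composite].freeze
  PRIVACY_GROUPS = %w[GPS MakerNotes XMP-dc XMP-photoshop IPTC ICC-header].freeze
  # Subset of the documented PRIVACY_TAGS list, for brevity.
  PRIVACY_TAGS = %w[Artist Author Creator Copyright Software SerialNumber].freeze

  def self.privacy_residual(meta)
    meta.reject { |k, _| k == 'SourceFile' }.select do |k, _|
      group, tag = k.to_s.split(':', 2)
      next false if NON_METADATA_GROUPS.include?(group)

      if tag.nil?
        # Unprefixed key: "group" actually holds the tag name.
        PRIVACY_TAGS.include?(group.to_s)
      else
        PRIVACY_GROUPS.include?(group) || PRIVACY_TAGS.include?(tag)
      end
    end
  end
end

# Invented post-clean metadata in the "Group:Tag" key form.
SAMPLE_META = {
  'SourceFile'        => 'photo.jpg',          # always dropped
  'File:FileSize'     => '2.1 MB',             # non-metadata group, skipped
  'GPS:GPSLatitude'   => '48.8584 N',          # flagged by group (GPS)
  'EXIF:Artist'       => 'Jane Doe',           # flagged by tag name (Artist)
  'EXIF:ExposureTime' => '1/250',              # neither group nor tag matches
  'Software'          => 'Example Editor 1.0'  # unprefixed key, flagged by tag name
}.freeze

RESIDUAL = StrategySketch.privacy_residual(SAMPLE_META)
puts RESIDUAL.keys.inspect
# prints ["GPS:GPSLatitude", "EXIF:Artist", "Software"]
```

`EXIF:Artist` is caught even though `EXIF` is not a privacy group, which is the case the tag-name list exists for.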
.tools_for(path, prefer: {}) ⇒ Object
Returns an ordered list of tool symbols (e.g. `[:mat2, :exiftool, :qpdf]`) to run on `path`. The runner executes them in order; if one fails or is skipped, the next still runs.
`prefer:` is a hash of user opt-outs from the CLI flags (--no-mat2, --exiftool-only, etc.). The pattern `prefer[:mat2] != false` treats both `nil` (not set) and `true` as “use it”; only an explicit `false` disables the tool.
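To make the opt-out semantics concrete, here is a small sketch of the `!= false` check in isolation. The helper name `enabled?` is hypothetical and not part of the library; it only wraps the comparison described above.

```ruby
# Hypothetical helper wrapping the "!= false" check; not part of Metaclean.
def enabled?(prefer, tool)
  prefer[tool] != false
end

puts enabled?({}, :mat2)               # key absent (nil): tool stays enabled
puts enabled?({ mat2: true }, :mat2)   # explicit true: enabled
puts enabled?({ mat2: false }, :mat2)  # explicit false: the only way to disable

# A plain truthiness test would behave differently for the unset case,
# because nil is falsy in Ruby:
puts({}[:mat2] ? 'on' : 'off')
```

The design choice is that an empty `prefer` hash means "run everything available", so callers only need to pass the tools the user explicitly turned off.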
# File 'lib/metaclean/strategy.rb', line 47

def tools_for(path, prefer: {})
  ext = File.extname(path).downcase.delete('.')
  tools = []

  if ext == 'pdf'
    # PDFs benefit from all three, in this order:
    #   mat2     → cleans the high-level metadata + content streams it knows
    #   exiftool → strips the Info dictionary (Author, Title, Producer)
    #   qpdf     → rebuilds the file, dropping any unreferenced bits
    tools << :mat2 if prefer[:mat2] != false && Mat2.available?
    tools << :exiftool if prefer[:exiftool] != false
    tools << :qpdf if prefer[:qpdf] != false && Qpdf.available?
  elsif MAT2_PREFERRED.include?(ext) && prefer[:mat2] != false && Mat2.available?
    # Office docs, modern image/video containers — mat2 leads.
    tools << :mat2
    tools << :exiftool if prefer[:exiftool] != false
  else
    # Everything else (JPEG, MP3, RAW, …) — ExifTool is the gold standard.
    tools << :exiftool if prefer[:exiftool] != false
    tools << :mat2 if prefer[:mat2] != false && Mat2.supports?(path)
  end

  tools
end
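The sketch below shows which tool order each branch produces. It is illustrative only: `ToolsSketch` and its nested `Mat2` and `Qpdf` modules are hypothetical stand-ins that pretend both tools are installed and that mat2 supports none of the non-preferred formats. The real wrappers presumably probe the system instead.

```ruby
# Hypothetical stand-in for Metaclean::Strategy with stubbed tool wrappers.
module ToolsSketch
  MAT2_PREFERRED = %w[
    pdf docx xlsx pptx odt ods odp odg epub png svg mp4 avi mkv mov webm
  ].freeze

  # ASSUMPTION: both tools are "installed"; supports? always answers false.
  module Mat2
    def self.available?
      true
    end

    def self.supports?(_path)
      false
    end
  end

  module Qpdf
    def self.available?
      true
    end
  end

  # Same branching as the documented method, minus the comments.
  def self.tools_for(path, prefer: {})
    ext = File.extname(path).downcase.delete('.')
    tools = []

    if ext == 'pdf'
      tools << :mat2 if prefer[:mat2] != false && Mat2.available?
      tools << :exiftool if prefer[:exiftool] != false
      tools << :qpdf if prefer[:qpdf] != false && Qpdf.available?
    elsif MAT2_PREFERRED.include?(ext) && prefer[:mat2] != false && Mat2.available?
      tools << :mat2
      tools << :exiftool if prefer[:exiftool] != false
    else
      tools << :exiftool if prefer[:exiftool] != false
      tools << :mat2 if prefer[:mat2] != false && Mat2.supports?(path)
    end

    tools
  end
end

p ToolsSketch.tools_for('report.PDF')                            # [:mat2, :exiftool, :qpdf]
p ToolsSketch.tools_for('slides.pptx')                           # [:mat2, :exiftool]
p ToolsSketch.tools_for('photo.jpg')                             # [:exiftool]
p ToolsSketch.tools_for('report.pdf', prefer: { mat2: false })   # [:exiftool, :qpdf]
p ToolsSketch.tools_for('slides.pptx', prefer: { mat2: false })  # [:exiftool]
```

Note the last call: when mat2 is opted out for a mat2-preferred extension, the `elsif` condition fails and the file falls through to the ExifTool-first branch, so it still gets cleaned.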