Class: MimeMagic
- Inherits: Object
  - Object
  - MimeMagic
- Defined in: lib/mimemagic.rb, lib/mimemagic/tables.rb, lib/mimemagic/version.rb
Overview
Mime type detection
Constant Summary
- EXTENSIONS = {}
- TYPES = {}
- MAGIC = []
- VERSION = '0.5.4'
  MimeMagic version string
Instance Attribute Summary
- #mediatype ⇒ Object (readonly)
  Returns the value of attribute mediatype.
- #params ⇒ Object (readonly)
  Returns the value of attribute params.
- #subtype ⇒ Object (readonly)
  Returns the value of attribute subtype.
- #type ⇒ Object (readonly)
  Returns the value of attribute type.
Class Method Summary
- .[](type) ⇒ MimeMagic?
  Syntactic sugar alias for constructor.
- .add(type, extensions: [], parents: [], magic: [], comment: nil, aliases: []) ⇒ Object
  Add a custom MIME type to the internal dictionary.
- .aliases(type) ⇒ Array<MimeMagic>
  Return the type's aliases.
- .all_by_magic(io, default: false) ⇒ Array<MimeMagic>
  Return all matching MIME types by magic content analysis.
- .binary?(thing) ⇒ true, false, nil
  Determine if an input is binary.
- .by_extension(ext, default: false) ⇒ nil, MimeMagic
  Look up MIME type by file extension.
- .by_magic(io, default: false) ⇒ nil, MimeMagic
  Look up MIME type by magic content analysis.
- .by_path(path, default: false) ⇒ nil, MimeMagic
  Look up MIME type by file path.
- .canonical(type) ⇒ MimeMagic?
  Return the canonical type.
- .child?(child, parent, recurse: true) ⇒ true, false
  Returns true if type is child of parent type.
- .coerce_default(thing, default) ⇒ Object
- .default_type(thing = nil) ⇒ MimeMagic
  Return either application/octet-stream or text/plain depending on whether the thing is binary.
- .get_matches(parent) ⇒ Object
- .magic_match(io, method) ⇒ Object
- .magic_match_io(io, matches, buffer) ⇒ Object
- .open_mime_database ⇒ Object
- .parse_database ⇒ Object
- .remove(type) ⇒ Object
  Removes a MIME type from the dictionary.
- .str2int(s) ⇒ Object
Instance Method Summary
- #alias? ⇒ false, true
  Determine if the type is an alias.
- #aliases ⇒ Array<MimeMagic>
  Return the type's aliases.
- #audio? ⇒ Boolean
  Determine if the type is audio.
- #binary? ⇒ true, ...
  Determine if the type is binary, that is, not a descendant of text/plain.
- #canonical ⇒ MimeMagic?
  Return the canonical type.
- #child_of?(parent, recurse: true) ⇒ true, false
  Returns true if type is child of parent type.
- #comment ⇒ nil, String
  Get MIME comment.
- #descendant_of?(ancestor) ⇒ true, false
  Returns true if the ancestor type is anywhere in the subject type's lineage.
- #eql?(other) ⇒ false, true (also: #==)
  Compare the equality of the type with another (or plain string).
- #extensions ⇒ Array<String>
  Get string list of file extensions.
- #hash ⇒ Integer
  Return the object's (the underlying type string) hash.
- #image? ⇒ Boolean
  Determine if the type is an image.
- #initialize(type) ⇒ MimeMagic (constructor)
  Initialize a new MIME type by its string representation.
- #inspect ⇒ String
  Return a diagnostic representation of the object.
- #lineage ⇒ Array<MimeMagic> (also: #ancestor_types)
  Fetches the entire inheritance hierarchy for the given MIME type.
- #parents ⇒ Array<MimeMagic>
  Fetches the immediate parent types.
- #text? ⇒ Boolean
  Returns true if type is a text format.
- #to_s ⇒ String
  Return the type as a string.
- #video? ⇒ Boolean
  Determine if the type is video.
Constructor Details
#initialize(type) ⇒ MimeMagic
Initialize a new MIME type by its string representation.
    # File 'lib/mimemagic.rb', line 18
    def initialize(type)
      @type, *params = type.to_s.strip.split(/(?:\s*;\s*)+/) # chop off params
      @type.downcase! # normalize the case
      # split parameter-value pairs if present
      @params = params.map { |x| x.split(/\s*=\s*/, 2) } unless params.empty?
      @mediatype, @subtype = @type.split ?/, 2 # split major and minor
    end
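The parsing steps in the constructor can be tried in isolation. The following is a minimal sketch, not the library's code: `parse_mime` is a hypothetical helper that applies the same split and downcase rules to a plain string and returns a hash instead of setting instance variables.

```ruby
# Hypothetical helper mirroring the constructor's parsing steps;
# parse_mime is not part of MimeMagic.
def parse_mime(string)
  # split the type from its parameters on one or more semicolons
  type, *params = string.to_s.strip.split(/(?:\s*;\s*)+/)
  type = type.to_s.downcase
  # split each parameter into a [name, value] pair
  pairs = params.map { |x| x.split(/\s*=\s*/, 2) }
  # split major and minor parts
  mediatype, subtype = type.split('/', 2)
  { type: type, mediatype: mediatype, subtype: subtype, params: pairs }
end

parsed = parse_mime(' Text/HTML; charset=UTF-8 ')
puts parsed[:type]            # normalized to lower case
puts parsed[:params].inspect  # parameter values keep their original case
```

Only the type itself is lower-cased; parameter names and values are left as written.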
Instance Attribute Details
#mediatype ⇒ Object (readonly)
Returns the value of attribute mediatype.
    # File 'lib/mimemagic.rb', line 12
    def mediatype
      @mediatype
    end
#params ⇒ Object (readonly)
Returns the value of attribute params.
    # File 'lib/mimemagic.rb', line 12
    def params
      @params
    end
#subtype ⇒ Object (readonly)
Returns the value of attribute subtype.
    # File 'lib/mimemagic.rb', line 12
    def subtype
      @subtype
    end
#type ⇒ Object (readonly)
Returns the value of attribute type.
    # File 'lib/mimemagic.rb', line 12
    def type
      @type
    end
Class Method Details
.[](type) ⇒ MimeMagic?
Syntactic sugar alias for constructor. No-op if type is already
a MimeMagic object. The argument is treated as a file extension
if it doesn't contain a /, and may return nil if it doesn't
resolve.
    # File 'lib/mimemagic.rb', line 36
    def self.[] type
      # try noop first
      return type if type.is_a? self

      # now we handle the string
      type = type.to_s.strip

      # empty string should be default
      return default_type if type.empty?

      # this may return null
      return by_extension type unless type.include? ?/

      # otherwise pass to constructor
      new type
    end
.add(type, extensions: [], parents: [], magic: [], comment: nil, aliases: []) ⇒ Object
Add a custom MIME type to the internal dictionary.
    # File 'lib/mimemagic.rb', line 61
    def self.add type, extensions: [], parents: [], magic: [],
        comment: nil, aliases: []
      type = type.to_s.strip.downcase
      extensions = [extensions].flatten.compact
      aliases = [[aliases] || []].flatten.compact
      t = TYPES[type] = [extensions, [parents].flatten.compact,
        comment, type, aliases]
      aliases.each { |a| TYPES[a] = t }
      extensions.each {|ext| EXTENSIONS[ext] ||= type }

      MAGIC.unshift [type, magic] if magic

      true # output is ignored
    end
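The table layout that .add writes to can be pictured with plain hashes. The following is a minimal sketch, not the library's code: `register`, `types`, and `extensions` are hypothetical stand-ins for .add, TYPES, and EXTENSIONS, and the magic table is left out.

```ruby
# Hypothetical stand-ins for the TYPES and EXTENSIONS tables.
types = {}
extensions = {}

# Sketch of the bookkeeping .add performs (register is not a MimeMagic method).
def register(types, extensions, type, exts: [], parents: [], comment: nil, aliases: [])
  # one shared entry: [extensions, parents, comment, canonical type, aliases]
  entry = [exts, parents, comment, type, aliases]
  types[type] = entry
  # every alias points at the same entry, so index 3 always yields the canonical name
  aliases.each { |a| types[a] = entry }
  # ||= keeps an extension that an earlier type has already claimed
  exts.each { |e| extensions[e] ||= type }
  entry
end

register(types, extensions, 'text/markdown', exts: %w[md markdown],
         parents: ['text/plain'], aliases: ['text/x-markdown'])
register(types, extensions, 'application/x-other', exts: ['md'])

puts types['text/x-markdown'][3] # canonical name reached through the alias
puts extensions['md']            # the first registration keeps the extension
```

Because of the `||=`, registering a second type for an extension that is already taken does not change what an extension lookup returns.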
.aliases(type) ⇒ Array<MimeMagic>
Return the type's aliases.
    # File 'lib/mimemagic.rb', line 349
    def self.aliases type
      self[type].aliases
    end
.all_by_magic(io, default: false) ⇒ Array<MimeMagic>
Return all matching MIME types by magic content analysis. When
default is true or a value, the result will never be empty.
Note: This is a relatively slow operation.
    # File 'lib/mimemagic.rb', line 315
    def self.all_by_magic io, default: false
      default = coerce_default io, default
      out = magic_match(io, :select).map { |mime| new mime.first }
      out << default if out.empty? and default
      out
    end
.binary?(thing) ⇒ true, false, nil
Determine if an input is binary. Not to be confused with the instance method #binary?, which concerns the type.
    # File 'lib/mimemagic.rb', line 362
    def self.binary? thing
      sample = ''

      # get some stuff out of the IO or get a substring
      if thing.is_a? MimeMagic
        return thing.binary?
      elsif %i[seek tell read].all? { |m| thing.respond_to? m }
        pos = thing.tell
        thing.seek 0, 0
        sample = thing.read(256).to_s # handle empty
        thing.seek pos
      elsif thing.respond_to? :to_s
        str = thing.to_s
        # if it contains a slash it could be either a path or mimetype
        test = if str.include? ?/
                 canonical(str) || by_extension(str.split(?.).last)
               else
                 by_extension str.split(?.).last
               end

        return test.binary? if test

        sample = str[0, 256]
      else
        # nil if we don't know what this thing is
        return
      end

      # consider this to be 'binary' if empty
      return true if sample.empty?

      # control codes minus ordinary whitespace
      /[\x0-\x8\xe-\x1f\x7f]/n.match? sample.b
    end
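The final step of .binary? is a control-character scan over the first 256 bytes. The following is a minimal sketch of just that step, not the library's code: `looks_binary?` and `CONTROL_CODES` are hypothetical names, and the IO and type-lookup branches are omitted.

```ruby
# Control codes minus tab, newline, vertical tab, form feed and carriage return.
CONTROL_CODES = /[\x00-\x08\x0e-\x1f\x7f]/n

# Hypothetical helper: true if the leading bytes look binary.
def looks_binary?(string)
  sample = string.to_s[0, 256]
  # an empty sample is treated as binary, as in the listing above
  return true if sample.empty?
  # match against the raw bytes so multi-byte text is not misread
  CONTROL_CODES.match?(sample.b)
end

puts looks_binary?("hello\nworld\t!")       # ordinary whitespace only
puts looks_binary?("\x89PNG\r\n\x1a\n".b)   # contains the control byte 0x1a
```

Bytes at 0x80 and above are not in the pattern, so UTF-8 text with accented characters is not flagged by this scan.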
.by_extension(ext, default: false) ⇒ nil, MimeMagic
Look up MIME type by file extension. When default is true or a
value, this method will always return a value.
    # File 'lib/mimemagic.rb', line 270
    def self.by_extension ext, default: false
      ext = ext.to_s.downcase.delete_prefix ?.
      default = coerce_default '', default
      mime = EXTENSIONS[ext]
      mime ? new(mime) : default
    end
.by_magic(io, default: false) ⇒ nil, MimeMagic
Look up MIME type by magic content analysis. When default is true or a
value, this method will always return a value.
Note: This is a relatively slow operation.
    # File 'lib/mimemagic.rb', line 299
    def self.by_magic io, default: false
      default = coerce_default io, default
      mime = magic_match(io, :find) or return default
      new mime.first
    end
.by_path(path, default: false) ⇒ nil, MimeMagic
Look up MIME type by file path. When default is true or a value,
this method will always return a value.
    # File 'lib/mimemagic.rb', line 285
    def self.by_path path, default: false
      by_extension(File.extname(path), default: default)
    end
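The path lookup is a chain: File.extname, then lower-casing and dropping the leading dot, then a table lookup. The following is a minimal sketch of that chain, not the library's code: `EXT_TABLE` and `lookup_by_path` are hypothetical, and the table holds two illustrative entries rather than the real database.

```ruby
# Hypothetical extension table standing in for EXTENSIONS.
EXT_TABLE = { 'png' => 'image/png', 'gz' => 'application/gzip' }.freeze

# Sketch of the path -> extension -> type chain (lookup_by_path is illustrative).
def lookup_by_path(path)
  # File.extname returns only the last extension, including the dot
  ext = File.extname(path).downcase.delete_prefix('.')
  EXT_TABLE[ext]
end

puts lookup_by_path('/tmp/Photo.PNG').inspect   # case is normalized
puts lookup_by_path('archive.tar.gz').inspect   # only ".gz" is considered
puts lookup_by_path('README').inspect           # no extension, no match
```

Since File.extname yields an empty string for names such as `README` or `.bashrc`, those paths only produce a type when a default is requested.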
.canonical(type) ⇒ MimeMagic?
Return the canonical type.
    # File 'lib/mimemagic.rb', line 339
    def self.canonical type
      self[type].canonical
    end
.child?(child, parent, recurse: true) ⇒ true, false
Returns true if type is child of parent type.
    # File 'lib/mimemagic.rb', line 329
    def self.child?(child, parent, recurse: true)
      self[child].child_of? parent, recurse: recurse
    end
.coerce_default(thing, default) ⇒ Object
    # File 'lib/mimemagic.rb', line 410
    def self.coerce_default thing, default
      case default
      when nil, false then nil
      when true then default_type thing
      when MimeMagic then default
      when String, -> x { x.respond_to? :to_s } then new default
      else default_type thing
      end
    end
.default_type(thing = nil) ⇒ MimeMagic
Return either application/octet-stream or text/plain depending
on whether the thing is binary.
    # File 'lib/mimemagic.rb', line 403
    def self.default_type thing = nil
      return new 'application/octet-stream' unless thing
      new(binary?(thing) ? 'application/octet-stream' : 'text/plain')
    end
.get_matches(parent) ⇒ Object
    # File 'lib/mimemagic/tables.rb', line 18
    def self.get_matches(parent)
      parent.elements.map {|match|
        if match['mask']
          nil
        else
          type = match['type']
          value = match['value']
          offset = match['offset'].split(':').map {|x| x.to_i }
          offset = offset.size == 2 ? offset[0]..offset[1] : offset[0]
          case type
          when 'string'
            # This *one* pattern match, in the entirety of fd.o's mime types blows up the parser
            # because of the escape character \c, so right here we have a hideous hack to
            # accommodate that.
            if value == '\chapter'
              '\chapter'
            else
              value.gsub!(/\\(x[\dA-Fa-f]{1,2}|0\d{1,3}|\d{1,3}|.)/) { eval("\"\\#{$1}\"") }
            end
          when 'big16'
            value = str2int(value)
            value = ((value >> 8).chr + (value & 0xFF).chr)
          when 'big32'
            value = str2int(value)
            value = (((value >> 24) & 0xFF).chr + ((value >> 16) & 0xFF).chr + ((value >> 8) & 0xFF).chr + (value & 0xFF).chr)
          when 'little16'
            value = str2int(value)
            value = ((value & 0xFF).chr + (value >> 8).chr)
          when 'little32'
            value = str2int(value)
            value = ((value & 0xFF).chr + ((value >> 8) & 0xFF).chr + ((value >> 16) & 0xFF).chr + ((value >> 24) & 0xFF).chr)
          when 'host16' # use little endian
            value = str2int(value)
            value = ((value & 0xFF).chr + (value >> 8).chr)
          when 'host32' # use little endian
            value = str2int(value)
            value = ((value & 0xFF).chr + ((value >> 8) & 0xFF).chr + ((value >> 16) & 0xFF).chr + ((value >> 24) & 0xFF).chr)
          when 'byte'
            value = str2int(value)
            value = value.chr
          end
          children = get_matches(match)
          children.empty? ? [offset, value] : [offset, value, children]
        end
      }.compact
    end
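The shift-and-mask branches for big16, little32 and friends build byte strings in a fixed byte order. The following is a minimal sketch that checks two of them against Ruby's Array#pack directives; `big16` and `little32` are hypothetical stand-alone helpers copied from the expressions above, and the sample values are chosen only for illustration.

```ruby
# Stand-alone copies of two of the conversions, for illustration only.
def big16(value)
  (value >> 8).chr + (value & 0xFF).chr
end

def little32(value)
  (value & 0xFF).chr + ((value >> 8) & 0xFF).chr +
    ((value >> 16) & 0xFF).chr + ((value >> 24) & 0xFF).chr
end

# 'n' packs a 16-bit unsigned big-endian integer,
# 'V' packs a 32-bit unsigned little-endian integer.
puts big16(0x8950).bytes == [0x8950].pack('n').bytes
puts little32(0x04034b50).bytes == [0x04034b50].pack('V').bytes
puts little32(0x04034b50).bytes.inspect
```

The host16 and host32 branches reuse the little-endian layout regardless of the machine the code runs on, as their comments state.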
.magic_match(io, method) ⇒ Object
    # File 'lib/mimemagic.rb', line 420
    def self.magic_match(io, method)
      return magic_match(StringIO.new(io.to_s), method) unless io.respond_to?(:read)

      io.binmode if io.respond_to?(:binmode)
      io.set_encoding(Encoding::BINARY) if io.respond_to?(:set_encoding)
      buffer = "".encode(Encoding::BINARY)

      MAGIC.send(method) { |type, matches| magic_match_io(io, matches, buffer) }
    end
.magic_match_io(io, matches, buffer) ⇒ Object
    # File 'lib/mimemagic.rb', line 430
    def self.magic_match_io(io, matches, buffer)
      matches.any? do |offset, value, children|
        match =
          if Range === offset
            io.read(offset.begin, buffer)
            x = io.read(offset.end - offset.begin + value.bytesize, buffer)
            x && x.include?(value)
          else
            io.read(offset, buffer)
            io.read(value.bytesize, buffer) == value
          end
        io.rewind
        match && (!children || magic_match_io(io, children, buffer))
      end
    end
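Each magic rule is an `[offset, value, children]` triple, where the offset is either a fixed position or a range of start positions, and child rules must also match. The following is a minimal sketch of that evaluation, not the library's code: `rule_match?` is a hypothetical helper, and it is simplified to use `seek` on a StringIO where the listing above reads and discards bytes.

```ruby
require 'stringio'

# Hypothetical helper: does any rule in the list match the IO?
def rule_match?(io, rules)
  rules.any? do |offset, value, children|
    io.rewind
    hit =
      if offset.is_a?(Range)
        # the value may start anywhere between offset.begin and offset.end
        io.seek(offset.begin)
        window = io.read(offset.end - offset.begin + value.bytesize)
        !window.nil? && window.include?(value)
      else
        # the value must sit exactly at the fixed offset
        io.seek(offset)
        io.read(value.bytesize) == value
      end
    # child rules, when present, narrow the match further
    hit && (children.nil? || rule_match?(io, children))
  end
end

png = StringIO.new("\x89PNG\r\n\x1a\n".b)
puts rule_match?(png, [[0, "\x89PNG".b]])
puts rule_match?(StringIO.new('xxxneedlexxx'), [[0..8, 'needle']])
puts rule_match?(StringIO.new('ABxxZZ'), [[0, 'AB', [[4, 'CD']]]])
```

The third call fails even though the parent rule matches, because the child rule at offset 4 does not.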
.open_mime_database ⇒ Object
    # File 'lib/mimemagic/tables.rb', line 67
    def self.open_mime_database
      path = MimeMagic::DATABASE_PATH
      File.open(path)
    end
.parse_database ⇒ Object
    # File 'lib/mimemagic/tables.rb', line 72
    def self.parse_database
      file = open_mime_database

      doc = Nokogiri::XML(file)
      extensions = {}
      types = {}
      magics = []
      (doc/'mime-info/mime-type').each do |mime|
        comments = Hash[*(mime/'comment').map {|comment| [comment['xml:lang'], comment.inner_text] }.flatten]
        type = mime['type']
        subclass = (mime/'sub-class-of').map{|x| x['type']}
        exts = (mime/'glob').map do |x|
          x['pattern'] =~ /^\*\.([^\[\]]+)$/ ? $1.downcase : nil
        end.compact
        (mime/'magic').each do |magic|
          priority = magic['priority'].to_i
          matches = get_matches(magic)
          magics << [priority, type, matches]
        end

        aliases = (mime/'alias/@type').map { |a| a.value.downcase.strip.freeze }

        # XXX uhh do we only use the type if it has a file extension??
        unless exts.empty?
          exts.each { |x| extensions[x] ||= type }
          types[type] = [exts, subclass, comments[nil], type, aliases]
          # don't add the aliases yet; we do that below
        end
      end

      magics = magics.sort {|a,b| [-a[0],a[1]] <=> [-b[0],b[1]] }

      common_types = [
        "image/jpeg", # .jpg
        "image/png", # .png
        "image/gif", # .gif
        "image/tiff", # .tiff
        "image/bmp", # .bmp
        "image/vnd.adobe.photoshop", # .psd
        "image/webp", # .webp
        "image/svg+xml", # .svg
        "video/x-msvideo", # .avi
        "video/x-ms-wmv", # .wmv
        "video/mp4", # .mp4, .m4v
        "video/quicktime", # .mov
        "video/mpeg", # .mpeg
        "video/ogg", # .ogv
        "video/webm", # .webm
        "video/x-matroska", # .mkv
        "video/x-flv", # .flv
        "audio/mpeg", # .mp3
        "audio/x-wav", # .wav
        "audio/aac", # .aac
        "audio/flac", # .flac
        "audio/mp4", # .m4a
        "audio/ogg", # .ogg
        "application/pdf", # .pdf
        "application/msword", # .doc
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document", # .docx
        "application/vnd.ms-powerpoint", # .pps
        "application/vnd.openxmlformats-officedocument.presentationml.slideshow", # .ppsx
        "application/vnd.ms-excel", # .pps
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet", # .ppsx
      ]

      common_magics = common_types.map do |common_type|
        magics.find { |_, type, _| type == common_type }
      end

      magics = (common_magics.compact + magics).uniq

      extensions.keys.sort.each do |key|
        EXTENSIONS[key] = extensions[key]
      end

      types.keys.sort.each do |key|
        exts, parents, comment, canon, aliases = *types[key]
        parents.sort!
        aliases.sort!
        # we are copying it i guess
        t = TYPES[key] = [exts, parents, comment, canon, aliases].freeze
        # now do the aliases oops they'll be out of order oh well
        aliases.each { |a| TYPES[a] = t }
      end

      magics.each do |priority, type, matches|
        MAGIC << [type, matches]
      end
    end
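The order of the MAGIC table decides which type a first-match lookup finds, and it is built in two steps: sort by descending priority (ties broken by type name), then move a list of common types to the front. The following is a minimal sketch of that ordering with made-up priorities; `magics` and `common_types` here are small illustrative samples, not the real database contents.

```ruby
# Illustrative [priority, type, matches] rows; the priorities are invented.
magics = [
  [50, 'text/x-foo', []],
  [80, 'image/png', []],
  [80, 'application/pdf', []],
  [50, 'image/gif', []]
]

# Step 1: highest priority first, then alphabetical by type name.
sorted = magics.sort_by { |priority, type, _| [-priority, type] }

# Step 2: pull the common types to the front; uniq drops their second occurrence.
common_types = ['image/gif']
front = common_types.map { |c| sorted.find { |_, type, _| type == c } }.compact
ordered = (front + sorted).uniq

puts ordered.map { |_, type, _| type }.inspect
```

A common type therefore ends up ahead of entries with a higher declared priority.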
.remove(type) ⇒ Object
Removes a MIME type from the dictionary. You might want to do this if you're seeing impossible conflicts (for instance, application/x-gmc-link).
Note: All associated extensions and magic are removed too.
    # File 'lib/mimemagic.rb', line 83
    def self.remove(type)
      EXTENSIONS.delete_if {|ext, t| t == type }
      MAGIC.delete_if {|t, m| t == type }
      TYPES.delete(type)

      true # output is also ignored
    end
.str2int(s) ⇒ Object
    # File 'lib/mimemagic/tables.rb', line 12
    def self.str2int(s)
      return s.to_i(16) if s[0..1].downcase == '0x'
      return s.to_i(8) if s[0..0].downcase == '0'
      s.to_i(10)
    end
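The radix is chosen from the prefix: `0x` means hexadecimal, a leading `0` means octal, anything else is decimal. The following is a minimal sketch that copies the rule into a free-standing method so it can be tried without loading the library; the sample inputs are illustrative.

```ruby
# Stand-alone copy of the prefix rule, for illustration only.
def str2int(s)
  return s.to_i(16) if s[0..1].downcase == '0x'
  return s.to_i(8) if s[0..0] == '0'
  s.to_i(10)
end

puts str2int('0x1F') # hexadecimal
puts str2int('017')  # octal
puts str2int('42')   # decimal
puts str2int('08')   # leading zero selects octal, and 8 is not an octal digit
```

The last call shows a consequence of the rule: a zero-padded decimal string is read as octal, and parsing stops at the first digit that is invalid in that base.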
Instance Method Details
#alias? ⇒ false, true
Determine if the type is an alias.
    # File 'lib/mimemagic.rb', line 144
    def alias?
      type != canonical.type
    end
#aliases ⇒ Array<MimeMagic>
Return the type's aliases.
    # File 'lib/mimemagic.rb', line 134
    def aliases
      TYPES.fetch(type.downcase, [nil, nil, nil, nil, []])[4].map do |t|
        self.class.new t
      end
    end
#audio? ⇒ Boolean
Determine if the type is audio.
    # File 'lib/mimemagic.rb', line 98
    def audio?; mediatype == 'audio'; end
#binary? ⇒ true, ...
Determine if the type is binary, that is, not a descendant of
text/plain. Not to be confused with the class method binary?,
which concerns arbitrary input.
    # File 'lib/mimemagic.rb', line 209
    def binary?
      not lineage.include? 'text/plain'
    end
#canonical ⇒ MimeMagic?
Return the canonical type. Returns nil if the type is unknown to
the registry.
    # File 'lib/mimemagic.rb', line 124
    def canonical
      t = TYPES[type.downcase] or return
      return self if type == t[3]
      self.class.new t[3]
    end
#child_of?(parent, recurse: true) ⇒ true, false
Returns true if type is child of parent type. Behaves the same as
descendant_of? if recurse is true, which is the default.
    # File 'lib/mimemagic.rb', line 172
    def child_of?(parent, recurse: true)
      return descendant_of? parent if recurse
      return unless c = canonical
      c.parents.include? self.class[parent].canonical
    end
#comment ⇒ nil, String
Get MIME comment.
    # File 'lib/mimemagic.rb', line 115
    def comment
      TYPES.fetch(type, [nil, nil, nil])[2].to_s.dup
    end
#descendant_of?(ancestor) ⇒ true, false
Returns true if the ancestor type is anywhere in the subject
type's lineage. Always returns false if either self or
ancestor is unknown to the type registry.
    # File 'lib/mimemagic.rb', line 156
    def descendant_of? ancestor
      # always false if we don't know what this is
      return unless c = canonical

      # ancestor canonical could be nil which will be false
      c.lineage.include? self.class[ancestor].canonical
    end
#eql?(other) ⇒ false, true Also known as: ==
Compare the equality of the type with another (or plain string).
    # File 'lib/mimemagic.rb', line 219
    def eql?(other)
      # coerce the rhs
      other = self.class[other] || self.class.default_type

      # check for an exact match
      return true if type == other.type

      # now canonicalize both sides and check
      lhs = canonical
      rhs = other.canonical

      lhs && rhs && lhs.type == rhs.type
    end
#extensions ⇒ Array<String>
Get string list of file extensions.
    # File 'lib/mimemagic.rb', line 107
    def extensions
      TYPES.fetch(type, [[]]).first.map { |e| e.to_s.dup }
    end
#hash ⇒ Integer
Return the object's (the underlying type string) hash.
    # File 'lib/mimemagic.rb', line 239
    def hash
      type.hash
    end
#image? ⇒ Boolean
Determine if the type is an image.
    # File 'lib/mimemagic.rb', line 95
    def image?; mediatype == 'image'; end
#inspect ⇒ String
Return a diagnostic representation of the object.
    # File 'lib/mimemagic.rb', line 255
    def inspect
      out = @type
      out = [out, @params.map { |x| x.join ?= }].join ?; if
        @params and !@params.empty?
      %q[<%s "%s">] % [self.class, out]
    end
#lineage ⇒ Array<MimeMagic> Also known as: ancestor_types
Fetches the entire inheritance hierarchy for the given MIME type.
    # File 'lib/mimemagic.rb', line 197
    def lineage
      ([canonical || self] + parents.map { |t| t.lineage }.flatten).uniq
    end
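Lineage is a recursive walk over parent types, with application/octet-stream as the implicit root for any type that declares no parents. The following is a minimal sketch over plain strings, not the library's code: `PARENT_TABLE` is a hypothetical three-entry hierarchy and `lineage_of` is an illustrative helper.

```ruby
# Hypothetical parent table; the real hierarchy comes from the MIME database.
PARENT_TABLE = {
  'application/xhtml+xml' => ['application/xml'],
  'application/xml'       => ['text/plain'],
  'text/plain'            => []
}.freeze

# Sketch of the recursive walk: the type itself, then each parent's lineage.
def lineage_of(type)
  parents = PARENT_TABLE.fetch(type, [])
  # types without declared parents fall back to application/octet-stream
  if parents.empty? && type != 'application/octet-stream'
    parents = ['application/octet-stream']
  end
  ([type] + parents.flat_map { |parent| lineage_of(parent) }).uniq
end

chain = lineage_of('application/xhtml+xml')
puts chain.inspect
puts !chain.include?('text/plain') # the test #binary? applies to a lineage
```

Under these assumed parent links, the XHTML type counts as text because text/plain appears in its chain, while a type missing from the table has only itself and application/octet-stream in its lineage.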
#parents ⇒ Array<MimeMagic>
Fetches the immediate parent types.
    # File 'lib/mimemagic.rb', line 182
    def parents
      out = TYPES.fetch(type.to_s.downcase, [nil, []])[1].map do |x|
        self.class.new x
      end
      # add this unless we're it
      out << self.class.new('application/octet-stream') if
        out.empty? and type.downcase != 'application/octet-stream'

      out.uniq
    end
#text? ⇒ Boolean
Returns true if type is a text format.
    # File 'lib/mimemagic.rb', line 92
    def text?; mediatype == 'text' || descendant_of?('text/plain'); end
#to_s ⇒ String
Return the type as a string.
    # File 'lib/mimemagic.rb', line 247
    def to_s
      type
    end
#video? ⇒ Boolean
Determine if the type is video.
    # File 'lib/mimemagic.rb', line 101
    def video?; mediatype == 'video'; end