Class: MimeMagic

Inherits:
Object
Defined in:
lib/mimemagic.rb,
lib/mimemagic/tables.rb,
lib/mimemagic/version.rb

Overview

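MIME type detection by file extension, file path, or magic content analysis, backed by the freedesktop.org shared-mime-info database.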

Constant Summary

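EXTENSIONS = {}

    maps file extensions to type strings; starts empty and is filled by .parse_database and .add

TYPES = {}

    maps canonical types and aliases to [extensions, parents, comment, canonical type, aliases] entries; filled by .parse_database and .add

MAGIC = []

    ordered [type, matches] magic rules, tried first to last; filled by .parse_database and .add

VERSION = '0.5.4'

    MimeMagic version string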


Constructor Details

#initialize(type) ⇒ MimeMagic

Initialize a new MIME type by its string representation.

Parameters:

  • type (#to_s)

    the type to parse.



# File 'lib/mimemagic.rb', line 18

def initialize(type)
  @type, *params = type.to_s.strip.split(/(?:\s*;\s*)+/) # chop off params
  @type.downcase! # normalize the case
  # split parameter-value pairs if present
  @params = params.map { |x| x.split(/\s*=\s*/, 2) } unless params.empty?
  @mediatype, @subtype = @type.split ?/, 2 # split major and minor
end
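To see what the constructor stores, its parsing steps can be run without the gem. The sketch below copies them into a hypothetical helper, parse_mime (not part of MimeMagic), that returns a Hash instead of setting instance variables; it also adds a to_s guard so an empty string does not raise.

```ruby
# Hypothetical helper mirroring MimeMagic#initialize; not part of the gem.
def parse_mime(str)
  # Split off parameters at each ';' (tolerating surrounding whitespace).
  type, *params = str.to_s.strip.split(/(?:\s*;\s*)+/)
  type = type.to_s.downcase # normalize case; to_s guards against empty input
  # Parameters become [name, value] pairs, or nil when there are none.
  pairs = params.empty? ? nil : params.map { |x| x.split(/\s*=\s*/, 2) }
  mediatype, subtype = type.split('/', 2) # major and minor type
  { type: type, mediatype: mediatype, subtype: subtype, params: pairs }
end

result = parse_mime('Text/HTML; charset = UTF-8')
puts result[:type]           # prints text/html
puts result[:params].inspect # prints [["charset", "UTF-8"]]
```

Parameter values keep their original case; only the type itself is lowercased.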

Instance Attribute Details

#mediatype ⇒ Object (readonly)

Returns the major type, i.e. the part before the slash (image in image/png).



# File 'lib/mimemagic.rb', line 12

def mediatype
  @mediatype
end

#params ⇒ Object (readonly)

Returns the parameters as [name, value] pairs (for text/html; charset=utf-8 this is [["charset", "utf-8"]]), or nil if there are none.



# File 'lib/mimemagic.rb', line 12

def params
  @params
end

#subtype ⇒ Object (readonly)

Returns the minor type, i.e. the part after the slash (png in image/png).



# File 'lib/mimemagic.rb', line 12

def subtype
  @subtype
end

#type ⇒ Object (readonly)

Returns the lowercased type string, without parameters.



# File 'lib/mimemagic.rb', line 12

def type
  @type
end

Class Method Details

.[](type) ⇒ MimeMagic?

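Syntactic sugar alias for the constructor. Returns the argument unchanged if it is already a MimeMagic object, and the default type (application/octet-stream) for an empty string. Otherwise the argument is treated as a file extension if it doesn't contain a /, in which case the result is nil if the extension doesn't resolve.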

Parameters:

  • type (#to_s)

    a string-like object representing a MIME type or file extension.

Returns:

  • (MimeMagic, nil)

    the instantiated object, or nil for an unknown extension.



# File 'lib/mimemagic.rb', line 36

def self.[] type
  # try noop first
  return type if type.is_a? self

  # now we handle the string
  type = type.to_s.strip
  # empty string should be default
  return default_type if type.empty?

  # this may return null
  return by_extension type unless type.include? ?/

  # otherwise pass to constructor
  new type
end
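The lookup order of .[] can be illustrated with plain strings. This is a sketch under assumptions: EXT_TABLE is a made-up two-entry stand-in for EXTENSIONS, resolve is a hypothetical name, and the real method returns MimeMagic objects (and passes existing MimeMagic objects through unchanged).

```ruby
# Hypothetical two-entry table standing in for MimeMagic::EXTENSIONS.
EXT_TABLE = { 'png' => 'image/png', 'txt' => 'text/plain' }.freeze

# Mirrors the decision order of MimeMagic.[] using plain strings:
# empty -> default type, no slash -> extension lookup (may be nil),
# otherwise the input is treated as a MIME type string.
def resolve(thing)
  str = thing.to_s.strip
  return 'application/octet-stream' if str.empty?
  return EXT_TABLE[str.downcase.delete_prefix('.')] unless str.include?('/')
  str.downcase # the real method builds a MimeMagic object here
end

puts resolve('PNG')        # prints image/png
puts resolve('Text/Plain') # prints text/plain
p resolve('xyz')           # prints nil
```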

.add(type, extensions: [], parents: [], magic: [], comment: nil, aliases: []) ⇒ Object

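Add a custom MIME type to the internal dictionary. Its magic rules are prepended to MAGIC, so they are tried before existing ones, and each extension is mapped to the new type only if no other type already claims it in EXTENSIONS.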

Parameters:

  • type (#to_s)

    the type

  • extensions (Array<#to_s>) (defaults to: [])

    file extensions

  • parents (Array<#to_s>) (defaults to: [])

    parent types

  • magic (Array) (defaults to: [])

    MIME "magic" rules, as [offset, value] or [offset, value, children] entries (the format produced by .get_matches)

  • aliases (Array<#to_s>) (defaults to: [])

    alternative names for the type

  • comment (#to_s) (defaults to: nil)

    a comment



# File 'lib/mimemagic.rb', line 61

def self.add type,
    extensions: [], parents: [], magic: [], comment: nil, aliases: []
  type = type.to_s.strip.downcase
  extensions = [extensions].flatten.compact
  aliases = [[aliases] || []].flatten.compact
  t = TYPES[type] = [extensions, [parents].flatten.compact,
                 comment, type, aliases]
  aliases.each { |a| TYPES[a] = t }
  extensions.each {|ext| EXTENSIONS[ext] ||= type }

  MAGIC.unshift [type, magic] if magic

  true # output is ignored
end
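The bookkeeping performed by .add can be reproduced against stand-in tables to show the shape of each entry. In this sketch the Registry module and its constants are hypothetical substitutes for MimeMagic's own EXTENSIONS, TYPES, and MAGIC, and aliases are normalized with [aliases].flatten.compact, which behaves the same as the original [[aliases] || []] because [aliases] is always truthy.

```ruby
# Hypothetical stand-in registry with the same layout as MimeMagic's constants.
module Registry
  EXTENSIONS = {}
  TYPES = {}
  MAGIC = []

  # Same bookkeeping as MimeMagic.add (alias normalization simplified).
  def self.add(type, extensions: [], parents: [], magic: [], comment: nil, aliases: [])
    type = type.to_s.strip.downcase
    extensions = [extensions].flatten.compact
    aliases = [aliases].flatten.compact
    # TYPES entry layout: [extensions, parents, comment, canonical type, aliases]
    t = TYPES[type] = [extensions, [parents].flatten.compact, comment, type, aliases]
    aliases.each { |a| TYPES[a] = t }                  # aliases share the same entry
    extensions.each { |ext| EXTENSIONS[ext] ||= type } # first claimant wins
    MAGIC.unshift [type, magic] if magic               # custom magic is tried first
    true
  end
end

Registry.add 'Application/X-Demo', extensions: 'demo', parents: 'text/plain',
             magic: [[0, 'DEMO']], comment: 'Demo file', aliases: 'application/demo'
```

With the gem loaded, the corresponding call would be MimeMagic.add('application/x-demo', extensions: %w[demo], parents: 'text/plain', magic: [[0, 'DEMO']], comment: 'Demo file', aliases: 'application/demo').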

.aliases(type) ⇒ Array<MimeMagic>

Return the type's aliases.

Parameters:

  • type (#to_s)

    the type to check

Returns:

  • (Array<MimeMagic>)

    the aliases, if any.



# File 'lib/mimemagic.rb', line 349

def self.aliases type
  self[type].aliases
end

.all_by_magic(io, default: false) ⇒ Array<MimeMagic>

Note:

This is a relatively slow operation.

Return all matching MIME types by magic content analysis. When default is true or a value, the result will never be empty.

Parameters:

  • io (#read, #to_s)

    the IO/String-like object to check for magic

  • default (false, true, #to_s, MimeMagic) (defaults to: false)

    a default fallback type

Returns:

  • (Array<MimeMagic>)

    all matching types; empty only if no default was given.

# File 'lib/mimemagic.rb', line 315

def self.all_by_magic io, default: false
  default = coerce_default io, default
  out = magic_match(io, :select).map { |mime| new mime.first }
  out << default if out.empty? and default
  out
end

.binary?(thing) ⇒ true, false, nil

Determine if an input is binary. Not to be confused with the instance method #binary?, which concerns the type.

Parameters:

  • thing (#read, #to_s)

    the IO-like or String-like thing to test; can also be a file name/path/extension or MIME type.

Returns:

  • (true, false, nil)

    whether the input is binary (nil if indeterminate).



# File 'lib/mimemagic.rb', line 362

def self.binary? thing
  sample = ''

  # get some stuff out of the IO or get a substring
  if thing.is_a? MimeMagic
    return thing.binary?
  elsif %i[seek tell read].all? { |m| thing.respond_to? m }
    pos = thing.tell
    thing.seek 0, 0
    sample = thing.read(256).to_s # handle empty
    thing.seek pos
  elsif thing.respond_to? :to_s
    str = thing.to_s
    # if it contains a slash it could be either a path or mimetype
    test = if str.include? ?/
             canonical(str) || by_extension(str.split(?.).last)
           else
             by_extension str.split(?.).last
           end

    return test.binary? if test

    sample = str[0, 256]
  else
    # nil if we don't know what this thing is
    return
  end

  # consider this to be 'binary' if empty
  return true if sample.empty?
  # control codes minus ordinary whitespace
  /[\x0-\x8\xe-\x1f\x7f]/n.match? sample.b
end
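The final byte test in .binary? can be exercised on its own. The sketch below assumes the input is already a String (the IO, MIME-type, and file-name branches are omitted), and binary_sample? is a hypothetical name.

```ruby
# Sketch of the byte-level check at the end of MimeMagic.binary?.
# An empty sample counts as binary; otherwise the sample is binary if it
# contains a control byte other than ordinary whitespace (0x09-0x0D).
def binary_sample?(str)
  sample = str.to_s[0, 256] # the first 256 characters, as in the original
  return true if sample.empty?
  /[\x00-\x08\x0e-\x1f\x7f]/n.match?(sample.b)
end

puts binary_sample?("plain text\n") # prints false
puts binary_sample?("\x00\x01\x02") # prints true
```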

.by_extension(ext, default: false) ⇒ nil, MimeMagic

Look up MIME type by file extension. When default is true or a value, this method will always return a value.

Parameters:

  • ext (#to_s)

    the extension to look up, with or without a leading dot; matching is case-insensitive

  • default (false, true, #to_s, MimeMagic) (defaults to: false)

    a default fallback type

Returns:

  • (nil, MimeMagic)

    a matching type, if found.

# File 'lib/mimemagic.rb', line 270

def self.by_extension ext, default: false
  ext = ext.to_s.downcase.delete_prefix ?.
  default = coerce_default '', default
  mime = EXTENSIONS[ext]
  mime ? new(mime) : default
end

.by_magic(io, default: false) ⇒ nil, MimeMagic

Note:

This is a relatively slow operation.

Look up MIME type by magic content analysis. When default is true or a value, this method will always return a value.

Parameters:

  • io (#read, #to_s)

    the IO/String-like object to check for magic

  • default (false, true, #to_s, MimeMagic) (defaults to: false)

    a default fallback type

Returns:

  • (nil, MimeMagic)

    a matching type, if found.



# File 'lib/mimemagic.rb', line 299

def self.by_magic io, default: false
  default = coerce_default io, default
  mime = magic_match(io, :find) or return default
  new mime.first
end
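.by_magic and .all_by_magic differ only in the Enumerable method they hand to .magic_match: :find stops at the first matching rule set in MAGIC, while :select collects all of them. The sketch below shows that difference with an invented signature table and a simple prefix test in place of .magic_match_io; the second table entry is a made-up type that shares the ZIP prefix.

```ruby
# Hypothetical signature table: [type, byte prefix]. The second entry is an
# invented type that shares the ZIP prefix, to show overlapping matches.
SIGNATURES = [
  ['application/zip', "PK\x03\x04".b],
  ['application/x-demo-archive', "PK\x03\x04".b],
  ['image/png', "\x89PNG".b],
].freeze

# Applies :find or :select to the table, like MimeMagic.magic_match does.
def matching(data, method)
  SIGNATURES.public_send(method) { |_type, prefix| data.b.start_with?(prefix) }
end

zip = "PK\x03\x04rest-of-archive"
p matching(zip, :find)   # first hit only, like by_magic
p matching(zip, :select) # every hit, like all_by_magic
```

Because MAGIC is ordered, the single result of :find depends on rule order, which is why .parse_database sorts the rules.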

.by_path(path, default: false) ⇒ nil, MimeMagic

Look up MIME type by file path. When default is true or a value, this method will always return a value.

Parameters:

  • path (#to_s)

    the file/path to check

  • default (false, true, #to_s, MimeMagic) (defaults to: false)

    a default fallback type

Returns:

  • (nil, MimeMagic)

    a matching type, if found.

# File 'lib/mimemagic.rb', line 285

def self.by_path path, default: false
  by_extension(File.extname(path), default: default)
end
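Both .by_path and .by_extension reduce their input to a lowercase extension without a leading dot before consulting EXTENSIONS. A sketch of that normalization, with extension_key as a hypothetical helper name:

```ruby
# Same normalization that by_path and by_extension apply before the lookup:
# File.extname keeps only the last extension, then case and dot are dropped.
def extension_key(path)
  File.extname(path.to_s).downcase.delete_prefix('.')
end

puts extension_key('Photo.JPG')     # prints jpg
puts extension_key('backup.tar.gz') # prints gz
```

Since File.extname keeps only the last component, .by_path cannot reach a compound key such as tar.gz even when EXTENSIONS contains one (the database parser would keep a glob like *.tar.gz whole), whereas .by_extension('tar.gz') can. Dotfiles such as .bashrc yield an empty extension.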

.canonical(type) ⇒ MimeMagic?

Return the canonical type.

Parameters:

  • type (#to_s)

    the type to test

Returns:

  • (MimeMagic, nil)

    the canonical type, if present.



# File 'lib/mimemagic.rb', line 339

def self.canonical type
  self[type].canonical
end

.child?(child, parent, recurse: true) ⇒ true, false

Returns true if child is a child of the parent type. Delegates to #child_of? on MimeMagic[child].

Parameters:

  • child (#to_s)

    a candidate child type

  • parent (#to_s)

    a candidate parent type

  • recurse (true, false) (defaults to: true)

    whether to recurse through the whole lineage

Returns:

  • (true, false)

    whether child is a child of parent



# File 'lib/mimemagic.rb', line 329

def self.child?(child, parent, recurse: true)
  self[child].child_of? parent, recurse: recurse
end

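.coerce_default(thing, default) ⇒ Object

Internal helper that normalizes a default: argument: nil or false gives nil, true gives .default_type for thing, a MimeMagic object is returned as-is, and other values (such as strings) are passed to .new.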



# File 'lib/mimemagic.rb', line 410

def self.coerce_default thing, default
  case default
  when nil, false then nil
  when true then default_type thing
  when MimeMagic then default
  when String, -> x { x.respond_to? :to_s } then new default
  else default_type thing
  end
end

.default_type(thing = nil) ⇒ MimeMagic

Return application/octet-stream when no thing is given; otherwise return application/octet-stream or text/plain depending on whether .binary? considers the thing binary.

Parameters:

  • thing (#read, #to_s) (defaults to: nil)

    the thing to test (IO-like, String-like, MIME type, or file name/path/extension)

Returns:

  • (MimeMagic)

    application/octet-stream or text/plain

# File 'lib/mimemagic.rb', line 403

def self.default_type thing = nil
  return new 'application/octet-stream' unless thing
  new(binary?(thing) ? 'application/octet-stream' : 'text/plain')
end

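.get_matches(parent) ⇒ Object

Internal helper used by .parse_database. Converts the <match> children of a shared-mime-info <magic> or <match> element into [offset, value] or [offset, value, children] arrays, where offset is an Integer or a Range. Matches with a mask attribute are skipped, string values are unescaped, and numeric values are packed into byte strings (host-order types are treated as little-endian).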



# File 'lib/mimemagic/tables.rb', line 18

def self.get_matches(parent)
  parent.elements.map {|match|
    if match['mask']
      nil
    else
      type = match['type']
      value = match['value']
      offset = match['offset'].split(':').map {|x| x.to_i }
      offset = offset.size == 2 ? offset[0]..offset[1] : offset[0]
      case type
      when 'string'
        # This *one* pattern match, in the entirety of fd.o's mime types blows up the parser
        # because of the escape character \c, so right here we have a hideous hack to
        # accommodate that.
        if value == '\chapter'
          '\chapter'
        else
          value.gsub!(/\\(x[\dA-Fa-f]{1,2}|0\d{1,3}|\d{1,3}|.)/) {
            eval("\"\\#{$1}\"")
          }
        end
      when 'big16'
        value = str2int(value)
        value = ((value >> 8).chr + (value & 0xFF).chr)
      when 'big32'
        value = str2int(value)
        value = (((value >> 24) & 0xFF).chr + ((value >> 16) & 0xFF).chr + ((value >> 8) & 0xFF).chr + (value & 0xFF).chr)
      when 'little16'
        value = str2int(value)
        value = ((value & 0xFF).chr + (value >> 8).chr)
      when 'little32'
        value = str2int(value)
        value = ((value & 0xFF).chr + ((value >> 8) & 0xFF).chr + ((value >> 16) & 0xFF).chr + ((value >> 24) & 0xFF).chr)
      when 'host16' # use little endian
        value = str2int(value)
        value = ((value & 0xFF).chr + (value >> 8).chr)
      when 'host32' # use little endian
        value = str2int(value)
        value = ((value & 0xFF).chr + ((value >> 8) & 0xFF).chr + ((value >> 16) & 0xFF).chr + ((value >> 24) & 0xFF).chr)
      when 'byte'
        value = str2int(value)
        value = value.chr
      end
      children = get_matches(match)
      children.empty? ? [offset, value] : [offset, value, children]
    end
  }.compact
end
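The hand-built byte strings in .get_matches correspond to standard Array#pack directives. The sketch below, with pack_magic as a hypothetical helper, shows the mapping and checks one case against the original shift-and-mask expression. It assumes values fit the field width: for out-of-range 16-bit values the original raises RangeError from Integer#chr, whereas pack silently truncates.

```ruby
# Hypothetical helper: the byte strings get_matches builds by hand, written
# with Array#pack directives (n/N = big-endian, v/V = little-endian).
def pack_magic(kind, int)
  case kind
  when 'big16'              then [int].pack('n')
  when 'big32'              then [int].pack('N')
  when 'little16', 'host16' then [int].pack('v') # host treated as little-endian
  when 'little32', 'host32' then [int].pack('V')
  when 'byte'               then [int].pack('C')
  end
end

value = 0x89504E47
manual = ((value >> 24) & 0xFF).chr + ((value >> 16) & 0xFF).chr +
         ((value >> 8) & 0xFF).chr + (value & 0xFF).chr
puts manual == pack_magic('big32', value) # prints true
```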

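.magic_match(io, method) ⇒ Object

Internal helper behind .by_magic and .all_by_magic. Wraps input that does not respond to read in a StringIO, switches the IO to binary mode, and applies method (:find or :select) to MAGIC, testing each rule set with .magic_match_io.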



# File 'lib/mimemagic.rb', line 420

def self.magic_match(io, method)
  return magic_match(StringIO.new(io.to_s), method) unless io.respond_to?(:read)

  io.binmode if io.respond_to?(:binmode)
  io.set_encoding(Encoding::BINARY) if io.respond_to?(:set_encoding)
  buffer = "".encode(Encoding::BINARY)

  MAGIC.send(method) { |type, matches| magic_match_io(io, matches, buffer) }
end

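.magic_match_io(io, matches, buffer) ⇒ Object

Internal helper that returns true if any rule in matches is found in io. A Range offset searches for the value anywhere in that window; an Integer offset requires the value to start exactly there. Nested children must match as well, and the IO is rewound after every check.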



# File 'lib/mimemagic.rb', line 430

def self.magic_match_io(io, matches, buffer)
  matches.any? do |offset, value, children|
    match =
      if Range === offset
        io.read(offset.begin, buffer)
        x = io.read(offset.end - offset.begin + value.bytesize, buffer)
        x && x.include?(value)
      else
        io.read(offset, buffer)
        io.read(value.bytesize, buffer) == value
      end
    io.rewind
    match && (!children || magic_match_io(io, children, buffer))
  end
end
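The rule walk in .magic_match_io can be tried on an in-memory StringIO. This is a sketch, not the library code: match_rules? is a hypothetical name, and it positions the IO with seek rather than reading and discarding bytes into a buffer as the original does.

```ruby
require 'stringio'

# Sketch of magic_match_io's rule walk. Each rule is [offset, value] or
# [offset, value, children]; offset is an Integer or a Range of start positions.
def match_rules?(io, rules)
  rules.any? do |offset, value, children|
    hit =
      if offset.is_a?(Range)
        io.seek(offset.begin)
        window = io.read(offset.end - offset.begin + value.bytesize)
        window && window.include?(value) # value may start anywhere in the range
      else
        io.seek(offset)
        io.read(value.bytesize) == value # value must start exactly at offset
      end
    io.rewind
    hit && (!children || match_rules?(io, children)) # children must also match
  end
end

data = StringIO.new("\x89PNG\r\n\x1A\n".b)
puts match_rules?(data, [[0, "\x89PNG".b]]) # prints true
```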

.open_mime_database ⇒ Object

Opens the shared-mime-info XML file at MimeMagic::DATABASE_PATH and returns the File.



# File 'lib/mimemagic/tables.rb', line 67

def self.open_mime_database
  path = MimeMagic::DATABASE_PATH
  File.open(path)
end

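.parse_database ⇒ Object

Parses the shared-mime-info XML database with Nokogiri and fills EXTENSIONS, TYPES, and MAGIC. Only types with at least one *.ext glob (without character classes) are registered in EXTENSIONS and TYPES, while magic rules are collected for every type; the rules are sorted by descending priority (ties broken by type name), and a fixed list of common types is moved to the front.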



# File 'lib/mimemagic/tables.rb', line 72

def self.parse_database
  file = open_mime_database

  doc = Nokogiri::XML(file)
  extensions = {}
  types = {}
  magics = []
  (doc/'mime-info/mime-type').each do |mime|
    comments = Hash[*(mime/'comment').map {|comment| [comment['xml:lang'], comment.inner_text] }.flatten]
    type = mime['type']
    subclass = (mime/'sub-class-of').map{|x| x['type']}
    exts = (mime/'glob').map do |x|
      x['pattern'] =~ /^\*\.([^\[\]]+)$/ ? $1.downcase : nil
    end.compact

    (mime/'magic').each do |magic|
      priority = magic['priority'].to_i
      matches = get_matches(magic)
      magics << [priority, type, matches]
    end

    aliases = (mime/'alias/@type').map { |a| a.value.downcase.strip.freeze }

    # XXX uhh do we only use the type if it has a file extension??
    unless exts.empty?
      exts.each { |x| extensions[x] ||= type }
      types[type] = [exts, subclass, comments[nil], type, aliases]
      # don't add the aliases yet; we do that below
    end
  end

  magics = magics.sort {|a,b| [-a[0],a[1]] <=> [-b[0],b[1]] }

  common_types = [
    "image/jpeg",                                                              # .jpg
    "image/png",                                                               # .png
    "image/gif",                                                               # .gif
    "image/tiff",                                                              # .tiff
    "image/bmp",                                                               # .bmp
    "image/vnd.adobe.photoshop",                                               # .psd
    "image/webp",                                                              # .webp
    "image/svg+xml",                                                           # .svg

    "video/x-msvideo",                                                         # .avi
    "video/x-ms-wmv",                                                          # .wmv
    "video/mp4",                                                               # .mp4, .m4v
    "video/quicktime",                                                         # .mov
    "video/mpeg",                                                              # .mpeg
    "video/ogg",                                                               # .ogv
    "video/webm",                                                              # .webm
    "video/x-matroska",                                                        # .mkv
    "video/x-flv",                                                             # .flv

    "audio/mpeg",                                                              # .mp3
    "audio/x-wav",                                                             # .wav
    "audio/aac",                                                               # .aac
    "audio/flac",                                                              # .flac
    "audio/mp4",                                                               # .m4a
    "audio/ogg",                                                               # .ogg

    "application/pdf",                                                         # .pdf
    "application/msword",                                                      # .doc
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document", # .docx
    "application/vnd.ms-powerpoint",                                           # .pps
    "application/vnd.openxmlformats-officedocument.presentationml.slideshow",  # .ppsx
    "application/vnd.ms-excel",                                                # .xls
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",       # .xlsx
  ]

  common_magics = common_types.map do |common_type|
    magics.find { |_, type, _| type == common_type }
  end

  magics = (common_magics.compact + magics).uniq

  extensions.keys.sort.each do |key|
    EXTENSIONS[key] = extensions[key]
  end

  types.keys.sort.each do |key|
    exts, parents, comment, canon, aliases = *types[key]

    parents.sort!
    aliases.sort!

    # we are copying it i guess
    t = TYPES[key] = [exts, parents, comment, canon, aliases].freeze

    # now do the aliases oops they'll be out of order oh well
    aliases.each { |a| TYPES[a] = t }
  end

  magics.each do |priority, type, matches|
    MAGIC << [type, matches]
  end
end
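The ordering step in .parse_database can be checked on a few invented entries. The sketch sorts by descending priority with ties broken by type name, confirms that sort_by gives the same order, and then moves a common type to the front as the common_types list does; the priorities and the text/x-demo type are made up.

```ruby
# Invented sample entries: [priority, type, matches].
magics = [
  [50, 'text/x-demo', []],
  [80, 'image/png', []],
  [50, 'image/jpeg', []],
  [80, 'application/pdf', []],
]

# Same ordering as parse_database: higher priority first, ties by type name.
sorted = magics.sort { |a, b| [-a[0], a[1]] <=> [-b[0], b[1]] }
# An equivalent, more idiomatic form.
raise 'orderings differ' unless sorted == magics.sort_by { |pri, type, _| [-pri, type] }

# Hoist the "common" types to the front; uniq drops their later duplicates.
common_types = ['image/jpeg']
common = common_types.map { |c| sorted.find { |_, type, _| type == c } }
ordered = (common.compact + sorted).uniq

p ordered.map { |_, type, _| type }
# prints ["image/jpeg", "application/pdf", "image/png", "text/x-demo"]
```

Hoisting means a common type is tested before higher-priority rules of other types.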

.remove(type) ⇒ Object

Note:

All associated extensions and magic are removed too.

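Removes a MIME type from the dictionary. You might want to do this if you're seeing impossible conflicts (for instance, application/x-gmc-link). The type string is compared exactly as given (it is not downcased or stripped), and TYPES entries for the type's aliases are left in place.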

Parameters:

  • type (#to_s)

    the type to remove.



# File 'lib/mimemagic.rb', line 83

def self.remove(type)
  EXTENSIONS.delete_if {|ext, t| t == type }
  MAGIC.delete_if {|t, m| t == type }
  TYPES.delete(type)

  true # output is also ignored
end

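.str2int(s) ⇒ Object

Internal helper that parses a numeric string from the database: a 0x prefix means hexadecimal, a leading 0 means octal, and anything else is decimal.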



# File 'lib/mimemagic/tables.rb', line 12

def self.str2int(s)
  return s.to_i(16) if s[0..1].downcase == '0x'
  return s.to_i(8) if s[0..0].downcase == '0'
  s.to_i(10)
end
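The prefix rules in .str2int match those of Ruby's Kernel#Integer for well-formed input, but malformed input behaves differently. The snippet below copies .str2int verbatim so it runs without the gem.

```ruby
# Verbatim copy of MimeMagic.str2int so the comparison runs standalone.
def str2int(s)
  return s.to_i(16) if s[0..1].downcase == '0x'
  return s.to_i(8) if s[0..0].downcase == '0'
  s.to_i(10)
end

%w[0x1F 0755 42].each do |s|
  puts "#{s}: #{str2int(s)} (Integer() gives #{Integer(s)})"
end

# Malformed input differs: str2int stops at the first invalid digit,
# while Kernel#Integer raises ArgumentError.
puts str2int('08') # prints 0
```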

Instance Method Details

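#alias? ⇒ false, true

Determine if the type is an alias of another, canonical type. Because #canonical returns nil for unknown types, calling this on a type missing from the registry raises NoMethodError.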

Returns:

  • (false, true)

    whether the type is an alias.



# File 'lib/mimemagic.rb', line 144

def alias?
  type != canonical.type
end

#aliases ⇒ Array<MimeMagic>

Return the type's aliases.

Returns:

  • (Array<MimeMagic>)

    the aliases, if any.



# File 'lib/mimemagic.rb', line 134

def aliases
  TYPES.fetch(type.downcase, [nil, nil, nil, nil, []])[4].map do |t|
    self.class.new t
  end
end

#audio? ⇒ Boolean

Determine if the type is audio.

Returns:

  • (Boolean)


# File 'lib/mimemagic.rb', line 98

def audio?; mediatype == 'audio'; end

#binary? ⇒ true, false

Determine if the type is binary, i.e. not a descendant of text/plain. Not to be confused with the class method .binary?, which concerns arbitrary input.

Returns:

  • (true, false)

    whether the type is binary.



# File 'lib/mimemagic.rb', line 209

def binary?
  not lineage.include? 'text/plain'
end

#canonical ⇒ MimeMagic?

Return the canonical type. Returns nil if the type is unknown to the registry.

Returns:

  • (MimeMagic, nil)

    the canonical type, if present.



# File 'lib/mimemagic.rb', line 124

def canonical
  t = TYPES[type.downcase] or return
  return self if type == t[3]
  self.class.new t[3]
end

#child_of?(parent, recurse: true) ⇒ true, false

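Returns true if the type is a child of the parent type. Behaves the same as #descendant_of? if recurse is true, which is the default; otherwise only the immediate parents are checked. Returns nil if the type is unknown to the registry.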

Parameters:

  • parent (#to_s)

    a candidate parent type

  • recurse (true, false) (defaults to: true)

    whether to recurse

Returns:

  • (true, false)

    whether self is a child of parent



# File 'lib/mimemagic.rb', line 172

def child_of?(parent, recurse: true)
  return descendant_of? parent if recurse
  return unless c = canonical
  c.parents.include? self.class[parent].canonical
end

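#comment ⇒ String

Get the MIME comment, i.e. the database's default-language description of the type (the <comment> element without xml:lang).

Returns:

  • (String)

    the comment, or an empty string if there is none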



# File 'lib/mimemagic.rb', line 115

def comment
  TYPES.fetch(type, [nil, nil, nil])[2].to_s.dup
end

#descendant_of?(ancestor) ⇒ true, false

Returns true if the ancestor type is anywhere in the subject type's lineage. Returns nil (falsy) if self is unknown to the type registry; per the inline comment, an unknown ancestor is also meant to give a false result.

Parameters:

  • ancestor (MimeMagic, #to_s)

    the candidate ancestor type

Returns:

  • (true, false)

    whether self is a descendant of ancestor



# File 'lib/mimemagic.rb', line 156

def descendant_of? ancestor
  # always false if we don't know what this is
  return unless c = canonical

  # ancestor canonical could be nil which will be false
  c.lineage.include? self.class[ancestor].canonical
end

#eql?(other) ⇒ false, true Also known as: ==

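Compare the type for equality with another MimeMagic object or a plain string. The other value is first coerced with .[], so a string without a / is looked up as a file extension and unresolved input falls back to .default_type; an alias compares equal to its canonical type.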

Parameters:

  • other (#to_s)

    the other to test

Returns:

  • (false, true)

    whether the two are equal.



# File 'lib/mimemagic.rb', line 219

def eql?(other)
  # coerce the rhs
  other = self.class[other] || self.class.default_type

  # check for an exact match
  return true if type == other.type

  # now canonicalize both sides and check
  lhs = canonical
  rhs = other.canonical

  lhs && rhs && lhs.type == rhs.type
end
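The two-step comparison in #eql? (exact match first, then canonical forms) can be sketched with strings. CANONICAL is a hypothetical stand-in for the canonical-type field of TYPES entries, and same_type? skips the coercion through .[] that the real method performs on other.

```ruby
# Hypothetical alias table standing in for the canonical-type field of TYPES.
CANONICAL = {
  'application/pdf'   => 'application/pdf',
  'application/x-pdf' => 'application/pdf', # alias
  'text/plain'        => 'text/plain',
}.freeze

# Mirrors the two steps of MimeMagic#eql?: exact match first, then compare
# canonical forms (false if either side is unknown).
def same_type?(a, b)
  a = a.downcase
  b = b.downcase
  return true if a == b
  lhs = CANONICAL[a]
  rhs = CANONICAL[b]
  !!(lhs && rhs && lhs == rhs)
end

puts same_type?('application/x-pdf', 'Application/PDF') # prints true
puts same_type?('text/plain', 'application/pdf')        # prints false
```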

#extensions ⇒ Array<String>

Get string list of file extensions.

Returns:

  • (Array<String>)

    associated file extensions.



# File 'lib/mimemagic.rb', line 107

def extensions
  TYPES.fetch(type, [[]]).first.map { |e| e.to_s.dup }
end

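#hash ⇒ Integer

Return the hash of the underlying type string. Because #eql? also treats an alias as equal to its canonical type while the two hash differently, such a pair behaves as two distinct keys in a Hash.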

Returns:

  • (Integer)

    the hash value.



# File 'lib/mimemagic.rb', line 239

def hash
  type.hash
end

#image? ⇒ Boolean

Determine if the type is an image.

Returns:

  • (Boolean)


# File 'lib/mimemagic.rb', line 95

def image?; mediatype == 'image'; end

#inspect ⇒ String

Return a diagnostic representation of the object.

Returns:

  • (String)

    a string representing the object.



# File 'lib/mimemagic.rb', line 255

def inspect
  out = @type
  out = [out, @params.map { |x| x.join ?= }].join ?; if
    @params and !@params.empty?
  %q[<%s "%s">] % [self.class, out]
end
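The string built by #inspect can be reproduced with a hypothetical standalone helper, which makes its normalization visible: parameters are re-joined with bare ; and = separators, so any whitespace from the original input is dropped.

```ruby
# Reproduces the string built by MimeMagic#inspect from a type and its
# parsed [name, value] parameter pairs (hypothetical standalone helper).
def inspect_string(class_name, type, params)
  out = type
  out = [out, params.map { |pair| pair.join('=') }].join(';') if params && !params.empty?
  %q[<%s "%s">] % [class_name, out]
end

puts inspect_string('MimeMagic', 'text/html', [['charset', 'UTF-8']])
# prints <MimeMagic "text/html;charset=UTF-8">
puts inspect_string('MimeMagic', 'image/png', nil)
# prints <MimeMagic "image/png">
```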

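#lineage ⇒ Array<MimeMagic> Also known as: ancestor_types

Fetches the entire inheritance hierarchy for the given MIME type.

Returns:

  • (Array<MimeMagic>)

    the canonical type (or self, if unknown) followed by all of its ancestors, without duplicates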

# File 'lib/mimemagic.rb', line 197

def lineage
  ([canonical || self] + parents.map { |t| t.lineage }.flatten).uniq
end

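#parents ⇒ Array<MimeMagic>

Fetches the immediate parent types. A type with no declared parents gets application/octet-stream as its parent, unless it is application/octet-stream itself.

Returns:

  • (Array<MimeMagic>)

    the immediate parent types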

# File 'lib/mimemagic.rb', line 182

def parents
  out = TYPES.fetch(type.to_s.downcase, [nil, []])[1].map do |x|
    self.class.new x
  end
  # add this unless we're it
  out << self.class.new('application/octet-stream') if
    out.empty? and type.downcase != 'application/octet-stream'

  out.uniq
end
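How #parents, #lineage, and #binary? fit together can be sketched with a small parent table. PARENTS is a simplified, hypothetical stand-in for the parent lists in TYPES, the helper names are invented, and unlike the real methods the sketch works on strings and skips canonicalization.

```ruby
# Hypothetical parent table standing in for the parent lists stored in TYPES.
PARENTS = {
  'image/svg+xml'   => ['application/xml'],
  'application/xml' => ['text/plain'],
  'text/plain'      => [],
}.freeze

# Like #parents: declared parents, or application/octet-stream as a fallback.
def parents_of(type)
  out = PARENTS.fetch(type, []).dup
  out << 'application/octet-stream' if out.empty? && type != 'application/octet-stream'
  out.uniq
end

# Like #lineage: the type itself followed by every ancestor, deduplicated.
def lineage_of(type)
  ([type] + parents_of(type).flat_map { |t| lineage_of(t) }).uniq
end

# Like #binary?: true unless text/plain appears in the lineage.
def binary_type?(type)
  !lineage_of(type).include?('text/plain')
end

p lineage_of('image/svg+xml')
# prints ["image/svg+xml", "application/xml", "text/plain", "application/octet-stream"]
```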

#text? ⇒ Boolean

Returns true if the type is a text format: its media type is text, or it descends from text/plain.

Returns:

  • (Boolean)


# File 'lib/mimemagic.rb', line 92

def text?; mediatype == 'text' || descendant_of?('text/plain'); end

#to_s ⇒ String

Return the type as a string.

Returns:

  • (String)

    the type, as a string.



# File 'lib/mimemagic.rb', line 247

def to_s
  type
end

#video? ⇒ Boolean

Determine if the type is video.

Returns:

  • (Boolean)


# File 'lib/mimemagic.rb', line 101

def video?; mediatype == 'video'; end