Module: SafeImage::SvgMetadata

Defined in:
lib/safe_image/svg_metadata.rb

Constant Summary

MAX_SVG_BYTES =
1 * 1024 * 1024
MAX_SVG_DEPTH =
64
MAX_SVG_ELEMENTS =
10_000
MAX_SVG_ATTRIBUTES =
50_000
MAX_SVG_DIMENSION =
100_000
MAX_SVG_PIXELS =
100_000_000
LENGTH_PATTERN =
/\A\s*([+]?(?:\d+(?:\.\d+)?|\.\d+))(?:px)?\s*\z/i.freeze
VIEWBOX_SPLIT =
/[\s,]+/.freeze
NON_UTF8_BOMS =

Byte-order marks for the multi-byte encodings whose ASCII characters our byte-level scans below cannot see through. XML mandates a BOM for UTF-16 and UTF-32, so a document in one of these encodings either carries a BOM here or contains NUL bytes for its ASCII characters (caught separately). Order matters: the UTF-32 LE mark begins with the UTF-16 LE mark.

[
  "\xFF\xFE\x00\x00".b, # UTF-32 LE
  "\x00\x00\xFE\xFF".b, # UTF-32 BE
  "\xFF\xFE".b, # UTF-16 LE
  "\xFE\xFF".b # UTF-16 BE
].freeze
UTF8_BOM =
"\xEF\xBB\xBF".b.freeze
SAFE_DECLARED_ENCODING =

Declared encodings we accept: UTF-8/ASCII plus the single-byte, ASCII-transparent legacy charsets (ISO-8859-*, Windows-125x). Their bytes below 0x80 decode to identical ASCII, so the byte scans below see the same markup any XML decoder or browser does; and being single-byte, no lead byte can swallow a following quote the way Shift-JIS, GBK, or Big5 can. Multi-byte (Shift-JIS, GBK, EUC-*, ISO-2022-*), transforming (UTF-7: “+ADw-” decodes to “<”), and NUL-interleaved (UTF-16/32) encodings are deliberately excluded — they let bytes our ASCII scans cannot see become markup the parser acts on. The shape match alone is not airtight: “utf8” or “windows-1259” fit the pattern yet name no real encoding, so a name must also resolve via Encoding.find to pass — lookalikes fail closed here instead of leaking a parser encoding error to the caller.

/\A(?:utf-?8|us-ascii|ascii|iso-?8859-?\d{1,2}|(?:windows|cp)-?125\d)\z/i.freeze
XML_DECL_ENCODING =

ASCII-only so it matches the binary buffer; the optional BOM is stripped before matching rather than embedded here (which would make this UTF-8).

/\A\s*<\?xml\b[^>]*?\bencoding\s*=\s*["']([^"']+)["']/i.freeze
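A minimal sketch of how the declaration pattern might be applied to a binary buffer. The two constants are copied from the summary above; the `declared_encoding` helper name is hypothetical and not part of the module.

```ruby
# Constants copied from the summary above.
UTF8_BOM = "\xEF\xBB\xBF".b.freeze
XML_DECL_ENCODING = /\A\s*<\?xml\b[^>]*?\bencoding\s*=\s*["']([^"']+)["']/i.freeze

# Hypothetical helper: returns the declared encoding name, or nil when the
# buffer has no XML declaration carrying an encoding attribute.
def declared_encoding(xml)
  bytes = xml.b
  # Strip the optional UTF-8 BOM first so the ASCII-only regex can anchor at \A.
  bytes = bytes.byteslice(UTF8_BOM.bytesize..) if bytes.start_with?(UTF8_BOM)
  match = bytes.match(XML_DECL_ENCODING)
  match && match[1]
end

p declared_encoding("\xEF\xBB\xBF<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><svg/>".b)
p declared_encoding("<svg xmlns=\"http://www.w3.org/2000/svg\"/>")
```

The first call yields the declared name even though a BOM precedes the declaration; the second yields nil because there is no declaration to inspect.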

Class Method Details

.cap_scanner_class ⇒ Object

The SAX cap-enforcement handler, built lazily and memoised the first time an SVG is scanned. It subclasses Nokogiri::XML::SAX::Document, so declaring it at file-load time would force nokogiri to load eagerly and defeat the lazy require in require_nokogiri. A breached cap raises LimitError straight out of a callback; libxml2 propagates it at the next event boundary, so the parse aborts promptly rather than scanning to the end (verified: rejection time grows far more slowly than input size).



# File 'lib/safe_image/svg_metadata.rb', line 235

def cap_scanner_class
  @cap_scanner_class ||=
    Class.new(Nokogiri::XML::SAX::Document) do
      attr_reader :root_name, :root_attributes, :parse_error

      def initialize
        super
        @depth = -1
        @elements = 0
        @attributes = 0
        @root_name = nil
        @root_attributes = nil
        @parse_error = nil
      end

      # attrs: array of Nokogiri::XML::SAX::Parser::Attribute (localname/value),
      # NOT including namespace declarations; `ns` carries the xmlns decls. Both
      # count toward the attribute cap so the bound cannot be sidestepped by
      # spraying namespace declarations.
      def start_element_namespace(name, attrs = [], _prefix = nil, _uri = nil, ns = [])
        @depth += 1
        raise LimitError, "SVG nesting exceeds #{MAX_SVG_DEPTH}" if @depth > MAX_SVG_DEPTH

        @elements += 1
        raise LimitError, "SVG has too many elements" if @elements > MAX_SVG_ELEMENTS

        @attributes += attrs.length + ns.length
        raise LimitError, "SVG has too many attributes" if @attributes > MAX_SVG_ATTRIBUTES

        return unless @root_name.nil?

        @root_name = name
        @root_attributes =
          attrs.each_with_object({}) do |attr, hash|
            # Dimensions are security-relevant: only the actual no-namespace
            # root attributes a browser will use may feed the pixel cap. A
            # namespaced e:width/e:height must not shadow width/height here.
            next unless attr.prefix.to_s.empty? && attr.uri.to_s.empty?

            hash[attr.localname] = attr.value
          end
      end

      def end_element_namespace(_name, _prefix = nil, _uri = nil)
        @depth -= 1
      end

      # libxml2 reports well-formedness violations here rather than raising;
      # record the first so scan_svg! can reject on it.
      def error(message)
        @parse_error ||= message.to_s.strip
      end

      def warning(_message)
      end
    end
end
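The depth and element bookkeeping can be illustrated without a parser. The sketch below is an assumption-laden stand-in: `CapCounter`, `feed_nested`, the local `LimitError`, and the tiny limits are all hypothetical, and the events are fed by hand where the real handler receives them from the SAX parser.

```ruby
# Hypothetical stand-ins for the library's error class and limits.
class LimitError < StandardError; end
MAX_DEPTH = 3
MAX_ELEMENTS = 5

# Mirrors the depth/element counters of the handler above, nothing more.
class CapCounter
  def initialize
    @depth = -1 # so that the root element sits at depth 0
    @elements = 0
  end

  def start_element
    @depth += 1
    raise LimitError, "nesting exceeds #{MAX_DEPTH}" if @depth > MAX_DEPTH

    @elements += 1
    raise LimitError, "too many elements" if @elements > MAX_ELEMENTS
  end

  def end_element
    @depth -= 1
  end
end

# Simulates `levels` elements nested inside one another.
def feed_nested(counter, levels)
  levels.times { counter.start_element }
  levels.times { counter.end_element }
  :ok
rescue LimitError => e
  e.message
end

p feed_nested(CapCounter.new, 4) # depths 0..3 stay within the cap
p feed_nested(CapCounter.new, 5) # the fifth element would sit at depth 4
```

Because the check runs inside the start callback, the breach is reported on the first offending element rather than after the whole input has been consumed.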

.dimensions(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES) ⇒ Object



# File 'lib/safe_image/svg_metadata.rb', line 62

def dimensions(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES)
  xml = read_svg(path, max_bytes: max_bytes)
  _name, attributes = scan_svg!(xml)
  dimensions_from_attributes(attributes, max_pixels: max_pixels)
end

.dimensions_from_attributes(attributes, max_pixels: nil) ⇒ Object

Computes and validates the document dimensions from the already-scanned root attributes, so a caller that has run scan_svg! does not re-read or re-scan the file. It applies the same width/height-then-viewBox fallback and the same limits as dimensions above.



# File 'lib/safe_image/svg_metadata.rb', line 72

def dimensions_from_attributes(attributes, max_pixels: nil)
  width = parse_length(attributes["width"])
  height = parse_length(attributes["height"])

  unless width && height
    view_box = parse_view_box(attributes["viewBox"])
    width ||= view_box&.fetch(2)
    height ||= view_box&.fetch(3)
  end

  validate_dimensions!(width, height, max_pixels: max_pixels)
end
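The fallback order can be shown in isolation. This sketch assumes the values are already parsed (Float or nil); `resolve_size` is a hypothetical name used only for illustration.

```ruby
# Hypothetical helper: explicit width/height win, and the viewBox only fills
# in whichever side is still missing.
def resolve_size(width, height, view_box)
  unless width && height
    width ||= view_box&.fetch(2)
    height ||= view_box&.fetch(3)
  end
  [width, height]
end

p resolve_size(300.0, 150.0, [0.0, 0.0, 24.0, 24.0]) # => [300.0, 150.0]
p resolve_size(300.0, nil, [0.0, 0.0, 24.0, 12.0])   # => [300.0, 12.0]
p resolve_size(nil, nil, nil)                        # => [nil, nil]
```

Note that each side is filled independently, so a document with only `width` takes its height straight from the viewBox without any aspect-ratio scaling. A result that still contains nil is what validate_dimensions! then rejects.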

.known_encoding?(name) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/safe_image/svg_metadata.rb', line 131

def known_encoding?(name)
  Encoding.find(name)
  true
rescue ArgumentError
  false
end
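A short sketch of why the name check has two halves. The pattern and `known_encoding?` are copied from this page; `accepted_encoding?` is a hypothetical wrapper that combines them the way reject_unsafe_encoding! does.

```ruby
SAFE_DECLARED_ENCODING = /\A(?:utf-?8|us-ascii|ascii|iso-?8859-?\d{1,2}|(?:windows|cp)-?125\d)\z/i.freeze

def known_encoding?(name)
  Encoding.find(name)
  true
rescue ArgumentError
  false
end

# Hypothetical wrapper: the name must fit the allowed shape AND be an
# encoding that Ruby can actually resolve.
def accepted_encoding?(name)
  name.match?(SAFE_DECLARED_ENCODING) && known_encoding?(name)
end

p accepted_encoding?("UTF-8")        # allowed shape, real encoding
p accepted_encoding?("ISO-8859-1")   # allowed shape, real encoding
p accepted_encoding?("windows-1259") # allowed shape, but no such encoding
p accepted_encoding?("Shift_JIS")    # real encoding, but multi-byte and so excluded
```

The last two cases fail for opposite reasons, which is the point of requiring both checks.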

.parse_length(value) ⇒ Object



# File 'lib/safe_image/svg_metadata.rb', line 138

def parse_length(value)
  value = value.to_s
  match = LENGTH_PATTERN.match(value)
  return nil unless match

  number = Float(match[1])
  return nil unless number.finite? && number.positive?

  number
rescue ArgumentError
  nil
end
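A few sample inputs make the accepted grammar concrete. The pattern and method are copied from this page so the snippet runs on its own; the sample values are illustrative only.

```ruby
LENGTH_PATTERN = /\A\s*([+]?(?:\d+(?:\.\d+)?|\.\d+))(?:px)?\s*\z/i.freeze

def parse_length(value)
  value = value.to_s
  match = LENGTH_PATTERN.match(value)
  return nil unless match

  number = Float(match[1])
  return nil unless number.finite? && number.positive?

  number
rescue ArgumentError
  nil
end

p parse_length("100")      # => 100.0
p parse_length(" 12.5PX ") # => 12.5
p parse_length("50%")      # => nil (relative units are not accepted)
p parse_length("1e3")      # => nil (no exponent notation)
p parse_length("0")        # => nil (must be strictly positive)
p parse_length(nil)        # => nil
```

Only unitless or `px` values survive, so lengths that depend on context (percentages, `em`) fall through to the viewBox fallback or are rejected.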

.parse_view_box(value) ⇒ Object



# File 'lib/safe_image/svg_metadata.rb', line 151

def parse_view_box(value)
  parts = value.to_s.strip.split(VIEWBOX_SPLIT)
  return nil unless parts.length == 4

  numbers = parts.map { |part| Float(part) }
  return nil unless numbers.all?(&:finite?) && numbers[2].positive? && numbers[3].positive?

  numbers
rescue ArgumentError
  nil
end
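The same kind of spot check for viewBox strings, again with the constant and method copied from this page and purely illustrative inputs.

```ruby
VIEWBOX_SPLIT = /[\s,]+/.freeze

def parse_view_box(value)
  parts = value.to_s.strip.split(VIEWBOX_SPLIT)
  return nil unless parts.length == 4

  numbers = parts.map { |part| Float(part) }
  return nil unless numbers.all?(&:finite?) && numbers[2].positive? && numbers[3].positive?

  numbers
rescue ArgumentError
  nil
end

p parse_view_box("0 0 200 50")    # => [0.0, 0.0, 200.0, 50.0]
p parse_view_box("0, 0, 200, 50") # => [0.0, 0.0, 200.0, 50.0]
p parse_view_box("-10 -10 20 20") # => [-10.0, -10.0, 20.0, 20.0]
p parse_view_box("0 0 0 50")      # => nil (width must be positive)
p parse_view_box("0 0 200")       # => nil (needs exactly four numbers)
```

The origin may be negative; only the third and fourth numbers (width and height) have to be positive.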

.probe(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES) ⇒ Object



# File 'lib/safe_image/svg_metadata.rb', line 49

def probe(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  path = safe_svg_path(path)
  width, height = dimensions(path, max_pixels: max_pixels, max_bytes: max_bytes)
  {
    input_format: "svg",
    width: width,
    height: height,
    frames: 1,
    duration_ms: (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
  }
end

.read_svg(path, max_bytes: MAX_SVG_BYTES) ⇒ Object

Raises:

  • (LimitError)


# File 'lib/safe_image/svg_metadata.rb', line 85

def read_svg(path, max_bytes: MAX_SVG_BYTES)
  path = safe_svg_path(path)
  size = File.size(path)
  raise LimitError, "SVG exceeds #{max_bytes} bytes" if size > max_bytes

  xml = File.binread(path, max_bytes + 1) || "".b
  raise LimitError, "SVG exceeds #{max_bytes} bytes" if xml.bytesize > max_bytes
  reject_unsafe_xml!(xml)
  xml
end
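The two-step size guard can be tried in isolation. In this sketch `bounded_read` and the local `LimitError` are hypothetical stand-ins. The reason given for the second check (the file may change between the size query and the read) is an assumption about intent, not something the source states.

```ruby
require "tempfile"

# Hypothetical stand-in for the library's error class.
class LimitError < StandardError; end

# Check the reported size, then read at most max_bytes + 1 bytes so that an
# over-long read is still detectable without slurping an unbounded file.
def bounded_read(path, max_bytes)
  raise LimitError, "exceeds #{max_bytes} bytes" if File.size(path) > max_bytes

  data = File.binread(path, max_bytes + 1) || "".b
  raise LimitError, "exceeds #{max_bytes} bytes" if data.bytesize > max_bytes

  data
end

Tempfile.create(["sample", ".svg"]) do |file|
  file.write("<svg/>") # 6 bytes
  file.flush
  p bounded_read(file.path, 6).bytesize
  begin
    bounded_read(file.path, 5)
  rescue LimitError => e
    p e.message
  end
end
```

Reading one byte past the limit is what lets the second check distinguish "exactly at the limit" from "over the limit".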

.reject_unsafe_encoding!(xml) ⇒ Object

Raises:

  • (InvalidImageError)


# File 'lib/safe_image/svg_metadata.rb', line 113

def reject_unsafe_encoding!(xml)
  bytes = xml.b
  # UTF-16/UTF-32 interleave NUL bytes between ASCII characters, hiding
  # "<!DOCTYPE" from the ASCII scans while the XML parser still decodes and
  # honours it. (NUL is invalid in XML 1.0 regardless, so this also rejects
  # garbage.)
  if NON_UTF8_BOMS.any? { |bom| bytes.start_with?(bom) } || bytes.include?("\x00".b)
    raise InvalidImageError, "SVG must use a single-byte or UTF-8 encoding"
  end

  bytes = bytes.byteslice(UTF8_BOM.bytesize..) if bytes.start_with?(UTF8_BOM)
  match = bytes.match(XML_DECL_ENCODING)
  return unless match
  return if match[1].match?(SAFE_DECLARED_ENCODING) && known_encoding?(match[1])

  raise InvalidImageError, "unsupported SVG encoding: #{match[1]}"
end
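The NUL-interleaving problem described in the comment above is easy to reproduce. The sketch below uses only core String methods; `ascii_scan_sees_doctype?` and `contains_nul?` are hypothetical helper names.

```ruby
DOCTYPE_SCAN = /<!DOCTYPE/i

# Hypothetical helpers wrapping the two byte-level checks.
def ascii_scan_sees_doctype?(bytes)
  bytes.b.match?(DOCTYPE_SCAN)
end

def contains_nul?(bytes)
  bytes.b.include?("\x00".b)
end

plain = "<!DOCTYPE svg>"
utf16 = plain.encode("UTF-16LE") # every ASCII character is followed by a NUL byte

p ascii_scan_sees_doctype?(plain) # => true
p ascii_scan_sees_doctype?(utf16) # => false, the scan is blind to it
p contains_nul?(utf16)            # => true, which is what triggers the rejection
```

An XML parser that honours the encoding would still decode the UTF-16 form into a real DOCTYPE, so the NUL check has to run before the ASCII scans can be trusted.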

.reject_unsafe_xml!(xml) ⇒ Object

Raises:

  • (InvalidImageError)


# File 'lib/safe_image/svg_metadata.rb', line 102

def reject_unsafe_xml!(xml)
  # The DOCTYPE/PI scans below are ASCII byte regexes; they only see what
  # they expect when the bytes we scan decode to the same markup the XML
  # parser sees. That holds for UTF-8 and single-byte ASCII-transparent
  # charsets but not for UTF-16/32 or multi-byte/transforming encodings, so
  # reject those first.
  reject_unsafe_encoding!(xml)
  raise InvalidImageError, "doctype is not allowed in SVG" if xml.match?(/<!DOCTYPE/i)
  raise InvalidImageError, "XML processing instructions are not allowed in SVG" if xml.match?(/<\?(?!xml\s)/i)
end
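The processing-instruction pattern relies on a negative lookahead to let the XML declaration through. A small sketch, with the regex copied from the method above and a hypothetical `flagged_pi?` helper:

```ruby
PI_SCAN = /<\?(?!xml\s)/i

# Hypothetical helper: true when the buffer contains a processing instruction
# other than the XML declaration.
def flagged_pi?(xml)
  xml.b.match?(PI_SCAN)
end

p flagged_pi?('<?xml version="1.0"?><svg/>')             # => false
p flagged_pi?('<?XML version="1.0"?><svg/>')             # => false (case-insensitive)
p flagged_pi?('<?xml-stylesheet href="a.css"?><svg/>')   # => true
p flagged_pi?('<svg><?php echo 1 ?></svg>')              # => true
```

The `\s` after `xml` is what separates the declaration from targets such as `xml-stylesheet`, which merely start with the same three letters.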

.require_nokogiri ⇒ Object

Loaded on first SVG use rather than at file load. This keeps the XML library off the hot path of every non-SVG operation, where its load cost would otherwise be paid for nothing.



# File 'lib/safe_image/svg_metadata.rb', line 224

def require_nokogiri
  require "nokogiri"
end

.safe_svg_path(path) ⇒ Object



# File 'lib/safe_image/svg_metadata.rb', line 96

def safe_svg_path(path)
  path = PathSafety.ensure_regular_file!(path)
  raise UnsupportedFormatError, "not an SVG file: #{path}" unless File.extname(path.to_s).downcase == ".svg"
  path.to_s
end

.scan_svg!(xml) ⇒ Object

Streams the document with a SAX parser, enforcing the structural caps as events arrive (see cap_scanner_class), so a hostile “millions of tiny elements” document is rejected at the cap without ever retaining the multi-million-object DOM a parse-then-validate approach would build. Returns the root element’s local name and a localname=>value hash of its attributes, matching the contract dimensions_from_attributes consumes.

SAX does NOT raise on malformed XML even with recovery disabled: it reports through the error callback and keeps going. Well-formedness is therefore enforced by recording the first reported error and rejecting after the parse. This preserves the old pull-parser’s reject set (unclosed/mismatched tags, trailing junk) and is stricter on multiple root elements, which is the safe direction for a gate.

Raises:

  • (InvalidImageError)
  • (LimitError)


# File 'lib/safe_image/svg_metadata.rb', line 197

def scan_svg!(xml)
  require_nokogiri
  handler = cap_scanner_class.new
  parser = Nokogiri::XML::SAX::Parser.new(handler)
  begin
    # recovery: false — do not silently repair malformed markup. Errors still
    # arrive via the error callback rather than as exceptions, so they are
    # checked explicitly below.
    parser.parse(xml) { |ctx| ctx.recovery = false }
  rescue LimitError, InvalidImageError
    raise # our own cap/validation rejections, surfaced from a callback
  rescue StandardError => e
    # Nokogiri rejects some inputs by raising rather than via the error
    # callback (e.g. empty input -> "input string cannot be empty"). Keep
    # untrusted-input failures inside our error hierarchy.
    raise InvalidImageError, "invalid SVG: #{e.message}"
  end

  raise InvalidImageError, "invalid SVG: #{handler.parse_error}" if handler.parse_error
  raise InvalidImageError, "SVG root required" unless handler.root_name == "svg"

  [handler.root_name, handler.root_attributes]
end

.validate_dimensions!(width, height, max_pixels: nil) ⇒ Object

Raises:

  • (InvalidImageError)
  • (LimitError)
  • (ArgumentError)


# File 'lib/safe_image/svg_metadata.rb', line 163

def validate_dimensions!(width, height, max_pixels: nil)
  raise InvalidImageError, "SVG dimensions are missing or invalid" unless width&.positive? && height&.positive?
  if width > MAX_SVG_DIMENSION || height > MAX_SVG_DIMENSION
    raise LimitError, "SVG dimensions exceed #{MAX_SVG_DIMENSION}px"
  end

  pixels = width * height
  limit =
    if max_pixels.nil?
      MAX_SVG_PIXELS
    else
      value = Integer(max_pixels)
      raise ArgumentError, "max_pixels must be positive" if value <= 0

      value
    end
  raise LimitError, "SVG has #{pixels.to_i} pixels, exceeds #{limit}" if pixels > limit

  [width.ceil, height.ceil]
end
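A condensed sketch of how the per-side cap and the pixel cap interact. The error classes and `check_size` are hypothetical stand-ins, and the max_pixels override is reduced to a plain default argument for brevity.

```ruby
# Hypothetical stand-ins for the library's error classes.
class InvalidImageError < StandardError; end
class LimitError < StandardError; end

MAX_SVG_DIMENSION = 100_000
MAX_SVG_PIXELS = 100_000_000

# Same order of checks as above: presence, per-side cap, then total pixels.
def check_size(width, height, limit = MAX_SVG_PIXELS)
  raise InvalidImageError, "missing or invalid" unless width&.positive? && height&.positive?
  raise LimitError, "dimension too large" if width > MAX_SVG_DIMENSION || height > MAX_SVG_DIMENSION
  raise LimitError, "too many pixels" if width * height > limit

  [width.ceil, height.ceil]
end

p check_size(12.2, 7.0)          # => [13, 7], fractional sizes round up
p check_size(10_000.0, 10_000.0) # => [10000, 10000], exactly at the pixel cap
begin
  check_size(10_001.0, 10_000.0)
rescue LimitError => e
  p e.message                    # => "too many pixels"
end
```

Both caps are needed: a 100,000 × 1,000 image passes the pixel cap only because each side is also within the per-side cap, while a 100,000 × 100,000 image satisfies the per-side cap yet far exceeds the pixel budget.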