Module: SafeImage::SvgMetadata
- Defined in:
- lib/safe_image/svg_metadata.rb
Constant Summary
- MAX_SVG_BYTES =
1 * 1024 * 1024
- MAX_SVG_DEPTH =
64
- MAX_SVG_ELEMENTS =
10_000
- MAX_SVG_ATTRIBUTES =
50_000
- MAX_SVG_DIMENSION =
100_000
- MAX_SVG_PIXELS =
100_000_000
- LENGTH_PATTERN =
/\A\s*([+]?(?:\d+(?:\.\d+)?|\.\d+))(?:px)?\s*\z/i.freeze
- VIEWBOX_SPLIT =
/[\s,]+/.freeze
- NON_UTF8_BOMS =
Byte-order marks for the multi-byte encodings whose ASCII characters our byte-level scans below cannot see through. XML mandates a BOM for UTF-16 and UTF-32, so a document in one of these encodings either carries a BOM here or contains NUL bytes for its ASCII characters (caught separately). Order matters: the UTF-32 LE mark begins with the UTF-16 LE mark.
[
  "\xFF\xFE\x00\x00".b, # UTF-32 LE
  "\x00\x00\xFE\xFF".b, # UTF-32 BE
  "\xFF\xFE".b,         # UTF-16 LE
  "\xFE\xFF".b          # UTF-16 BE
].freeze
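The ordering note can be illustrated with a small standalone sketch. `BOMS` mirrors the constant above with labels added, and `detect_bom` is a hypothetical helper that is not part of the module; it only shows why a longest-mark-first order is the safe one whenever the matching mark is identified.

```ruby
# Standalone sketch: BOMS mirrors NON_UTF8_BOMS with labels added; detect_bom
# is a hypothetical helper, not part of SafeImage::SvgMetadata.
BOMS = {
  "UTF-32 LE" => "\xFF\xFE\x00\x00".b,
  "UTF-32 BE" => "\x00\x00\xFE\xFF".b,
  "UTF-16 LE" => "\xFF\xFE".b,
  "UTF-16 BE" => "\xFE\xFF".b
}.freeze

# Returns the label of the first mark the buffer starts with, or nil.
def detect_bom(bytes, table = BOMS)
  table.each { |label, bom| return label if bytes.b.start_with?(bom) }
  nil
end

utf32_le = "\xFF\xFE\x00\x00<\x00\x00\x00".b
puts detect_bom(utf32_le) # UTF-32 marks are tried before UTF-16 marks

# With the order reversed, the shorter UTF-16 LE mark matches first and the
# same buffer is mislabelled.
puts detect_bom(utf32_le, BOMS.to_a.reverse.to_h)
```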
- UTF8_BOM =
"\xEF\xBB\xBF".b.freeze
- SAFE_DECLARED_ENCODING =
Declared encodings we accept: UTF-8/ASCII plus the single-byte, ASCII-transparent legacy charsets (ISO-8859-*, Windows-125x). Their bytes below 0x80 decode to identical ASCII, so the byte scans below see the same markup any XML decoder or browser does; and being single-byte, no lead byte can swallow a following quote the way Shift-JIS, GBK, or Big5 can. Multi-byte (Shift-JIS, GBK, EUC-*, ISO-2022-*), transforming (UTF-7: “+ADw-” decodes to “<”), and NUL-interleaved (UTF-16/32) encodings are deliberately excluded — they let bytes our ASCII scans cannot see become markup the parser acts on. The shape match alone is not airtight: “utf8” or “windows-1259” fit the pattern yet name no real encoding, so a name must also resolve via Encoding.find to pass — lookalikes fail closed here instead of leaking a parser encoding error to the caller.
/\A(?:utf-?8|us-ascii|ascii|iso-?8859-?\d{1,2}|(?:windows|cp)-?125\d)\z/i.freeze
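A minimal sketch of the two-step check described above, assuming nothing beyond the Ruby core library. `SAFE` is a copy of the pattern and `accepted?` is a hypothetical helper that combines the shape match with `Encoding.find`; the module itself splits this between `reject_unsafe_encoding!` and `known_encoding?`.

```ruby
# Standalone sketch: SAFE copies SAFE_DECLARED_ENCODING; accepted? is a
# hypothetical helper, not part of the module.
SAFE = /\A(?:utf-?8|us-ascii|ascii|iso-?8859-?\d{1,2}|(?:windows|cp)-?125\d)\z/i.freeze

def accepted?(name)
  return false unless name.match?(SAFE)

  Encoding.find(name) # raises ArgumentError for names Ruby does not know
  true
rescue ArgumentError
  false
end

# Prints, per name, whether the shape matches and whether it is accepted.
%w[UTF-8 windows-1252 utf8 windows-1259 Shift_JIS UTF-7].each do |name|
  puts format("%-13s shape=%-5s accepted=%s", name, name.match?(SAFE), accepted?(name))
end
```

The lookalikes `utf8` and `windows-1259` pass the shape match but are rejected by the lookup, while `Shift_JIS` and `UTF-7` never reach the lookup at all.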
- XML_DECL_ENCODING =
ASCII-only so it matches the binary buffer; the optional BOM is stripped before matching rather than embedded here (which would make this UTF-8).
/\A\s*<\?xml\b[^>]*?\bencoding\s*=\s*["']([^"']+)["']/i.freeze
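The strip-then-match order can be sketched as follows. The two constants are copied from this page; `declared_encoding` is a hypothetical helper introduced only for illustration, and it returns the captured name instead of raising.

```ruby
# Standalone sketch: constants copied from above; declared_encoding is a
# hypothetical helper, not part of the module.
UTF8_BOM = "\xEF\xBB\xBF".b.freeze
XML_DECL_ENCODING = /\A\s*<\?xml\b[^>]*?\bencoding\s*=\s*["']([^"']+)["']/i.freeze

# Returns the encoding named in the XML declaration, or nil if none is declared.
def declared_encoding(xml)
  bytes = xml.b # work on raw bytes so the ASCII-only regex always applies
  bytes = bytes.byteslice(UTF8_BOM.bytesize..) if bytes.start_with?(UTF8_BOM)
  bytes[XML_DECL_ENCODING, 1]
end

p declared_encoding(%(<?xml version="1.0" encoding="ISO-8859-1"?><svg/>))
p declared_encoding(UTF8_BOM + %(<?xml version="1.0" encoding='utf-8'?><svg/>))
p declared_encoding("<svg/>") # no declaration, so nothing to vet
```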
Class Method Summary
- .cap_scanner_class ⇒ Object
The SAX cap-enforcement handler, built lazily and memoised the first time an SVG is scanned.
- .dimensions(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES) ⇒ Object
- .dimensions_from_attributes(attributes, max_pixels: nil) ⇒ Object
Computes and validates the document dimensions from the already-scanned root attributes, so a caller that has run scan_svg! does not re-read or re-scan the file.
- .known_encoding?(name) ⇒ Boolean
- .parse_length(value) ⇒ Object
- .parse_view_box(value) ⇒ Object
- .probe(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES) ⇒ Object
- .read_svg(path, max_bytes: MAX_SVG_BYTES) ⇒ Object
- .reject_unsafe_encoding!(xml) ⇒ Object
- .reject_unsafe_xml!(xml) ⇒ Object
- .require_nokogiri ⇒ Object
Loaded on first SVG use rather than at file load. This keeps the XML library off the hot path of every non-SVG operation, where its load cost would otherwise be paid for nothing.
- .safe_svg_path(path) ⇒ Object
- .scan_svg!(xml) ⇒ Object
Streams the document with a SAX parser, enforcing the structural caps as events arrive (see cap_scanner_class), so a hostile “millions of tiny elements” document is rejected at the cap without ever retaining the multi-million-object DOM a parse-then-validate approach would build.
- .validate_dimensions!(width, height, max_pixels: nil) ⇒ Object
Class Method Details
.cap_scanner_class ⇒ Object
The SAX cap-enforcement handler, built lazily and memoised the first time an SVG is scanned. It subclasses Nokogiri::XML::SAX::Document, so it cannot be declared at file-load time without forcing nokogiri to load eagerly and defeating the lazy require above. A breached cap raises LimitError straight out of a callback; libxml2 propagates it at the next event boundary, so the parse aborts promptly rather than scanning to the end (verified: rejection time grows far slower than input size).
# File 'lib/safe_image/svg_metadata.rb', line 235

def cap_scanner_class
  @cap_scanner_class ||= Class.new(Nokogiri::XML::SAX::Document) do
    attr_reader :root_name, :root_attributes, :parse_error

    def initialize
      super
      @depth = -1
      @elements = 0
      @attributes = 0
      @root_name = nil
      @root_attributes = nil
      @parse_error = nil
    end

    # attrs: array of Nokogiri::XML::SAX::Parser::Attribute (localname/value),
    # NOT including namespace declarations; `ns` carries the xmlns decls. Both
    # count toward the attribute cap so the bound cannot be sidestepped by
    # spraying namespace declarations.
    def start_element_namespace(name, attrs = [], _prefix = nil, _uri = nil, ns = [])
      @depth += 1
      raise LimitError, "SVG nesting exceeds #{MAX_SVG_DEPTH}" if @depth > MAX_SVG_DEPTH
      @elements += 1
      raise LimitError, "SVG has too many elements" if @elements > MAX_SVG_ELEMENTS
      @attributes += attrs.length + ns.length
      raise LimitError, "SVG has too many attributes" if @attributes > MAX_SVG_ATTRIBUTES
      return unless @root_name.nil?

      @root_name = name
      @root_attributes = attrs.each_with_object({}) do |attr, hash|
        # Dimensions are security-relevant: only the actual no-namespace
        # root attributes a browser will use may feed the pixel cap. A
        # namespaced e:width/e:height must not shadow width/height here.
        next unless attr.prefix.to_s.empty? && attr.uri.to_s.empty?
        hash[attr.localname] = attr.value
      end
    end

    def end_element_namespace(_name, _prefix = nil, _uri = nil)
      @depth -= 1
    end

    # libxml2 reports well-formedness violations here rather than raising;
    # record the first so scan_svg! can reject on it.
    def error(message)
      @parse_error ||= message.to_s.strip
    end

    def warning(message)
    end
  end
end
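The counting logic can be tried without Nokogiri. The sketch below is a simplified stand-in under stated assumptions: `CapError`, `CapCounter`, the deliberately tiny caps, and the hand-fed event list are all hypothetical, whereas the real handler receives its events from libxml2 and also counts attributes.

```ruby
# Standalone sketch of cap enforcement inside streaming callbacks. Every name
# here is hypothetical; the real handler subclasses Nokogiri::XML::SAX::Document.
class CapError < StandardError; end

class CapCounter
  MAX_DEPTH = 2
  MAX_ELEMENTS = 5

  attr_reader :elements

  def initialize
    @depth = -1 # the root element sits at depth 0
    @elements = 0
  end

  def start_element(_name)
    @depth += 1
    raise CapError, "nesting exceeds #{MAX_DEPTH}" if @depth > MAX_DEPTH
    @elements += 1
    raise CapError, "too many elements" if @elements > MAX_ELEMENTS
  end

  def end_element(_name)
    @depth -= 1
  end
end

# Feeds [:start | :end, name] events to a fresh handler until one raises.
def scan(events)
  handler = CapCounter.new
  events.each do |kind, name|
    kind == :start ? handler.start_element(name) : handler.end_element(name)
  end
  handler.elements
end

# One root with ten flat children: the sixth element trips the cap, so the
# remaining events are never processed.
flat = [[:start, "svg"]] + Array.new(10) { [[:start, "g"], [:end, "g"]] }.flatten(1) + [[:end, "svg"]]
begin
  scan(flat)
rescue CapError => e
  puts "rejected: #{e.message}"
end
```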
.dimensions(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 62

def dimensions(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES)
  xml = read_svg(path, max_bytes: max_bytes)
  _name, attributes = scan_svg!(xml)
  dimensions_from_attributes(attributes, max_pixels: max_pixels)
end
.dimensions_from_attributes(attributes, max_pixels: nil) ⇒ Object
Computes and validates the document dimensions from the already-scanned root attributes, so a caller that has run scan_svg! does not re-read or re-scan the file. Same width/height-then-viewBox fallback and limits as dimensions above.
# File 'lib/safe_image/svg_metadata.rb', line 72

def dimensions_from_attributes(attributes, max_pixels: nil)
  width = parse_length(attributes["width"])
  height = parse_length(attributes["height"])

  unless width && height
    view_box = parse_view_box(attributes["viewBox"])
    width ||= view_box&.fetch(2)
    height ||= view_box&.fetch(3)
  end

  validate_dimensions!(width, height, max_pixels: max_pixels)
end
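The width/height-then-viewBox fallback can be exercised in isolation. This is a simplified restatement, not the module's code: `length_of`, `view_box_of`, and `resolve_size` are hypothetical names, and the limit checks done by `validate_dimensions!` are left out.

```ruby
# Standalone sketch: simplified restatements of parse_length and
# parse_view_box; all three method names are hypothetical.
LENGTH = /\A\s*([+]?(?:\d+(?:\.\d+)?|\.\d+))(?:px)?\s*\z/i.freeze

# Returns a positive Float for a bare or "px" length, otherwise nil.
def length_of(value)
  match = LENGTH.match(value.to_s)
  return nil unless match

  number = Float(match[1])
  number.positive? ? number : nil
end

# Returns the four viewBox numbers as Floats, or nil if the value is malformed.
def view_box_of(value)
  parts = value.to_s.strip.split(/[\s,]+/)
  return nil unless parts.length == 4

  parts.map { |part| Float(part) }
rescue ArgumentError
  nil
end

# Explicit width/height win; the viewBox only fills whichever one is missing.
def resolve_size(attributes)
  width = length_of(attributes["width"])
  height = length_of(attributes["height"])
  unless width && height
    box = view_box_of(attributes["viewBox"])
    width ||= box&.fetch(2)
    height ||= box&.fetch(3)
  end
  [width, height]
end

p resolve_size("width" => "120px", "height" => "80")          # explicit lengths
p resolve_size("viewBox" => "0 0 300 150")                     # viewBox only
p resolve_size("width" => "50%", "viewBox" => "0,0 64 32")     # "50%" does not parse
p resolve_size("width" => "100", "viewBox" => "0 0 300 150")   # mixed sources
```

Note that the last call mixes sources: the explicit width is kept and only the height comes from the viewBox.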
.known_encoding?(name) ⇒ Boolean
# File 'lib/safe_image/svg_metadata.rb', line 131

def known_encoding?(name)
  Encoding.find(name)
  true
rescue ArgumentError
  false
end
.parse_length(value) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 138

def parse_length(value)
  value = value.to_s
  match = LENGTH_PATTERN.match(value)
  return nil unless match

  number = Float(match[1])
  return nil unless number.finite? && number.positive?

  number
rescue ArgumentError
  nil
end
.parse_view_box(value) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 151

def parse_view_box(value)
  parts = value.to_s.strip.split(VIEWBOX_SPLIT)
  return nil unless parts.length == 4

  numbers = parts.map { |part| Float(part) }
  return nil unless numbers.all?(&:finite?) && numbers[2].positive? && numbers[3].positive?

  numbers
rescue ArgumentError
  nil
end
.probe(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 49

def probe(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  path = safe_svg_path(path)
  width, height = dimensions(path, max_pixels: max_pixels, max_bytes: max_bytes)
  {
    input_format: "svg",
    width: width,
    height: height,
    frames: 1,
    duration_ms: (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
  }
end
.read_svg(path, max_bytes: MAX_SVG_BYTES) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 85

def read_svg(path, max_bytes: MAX_SVG_BYTES)
  path = safe_svg_path(path)
  size = File.size(path)
  raise LimitError, "SVG exceeds #{max_bytes} bytes" if size > max_bytes

  xml = File.binread(path, max_bytes + 1) || "".b
  raise LimitError, "SVG exceeds #{max_bytes} bytes" if xml.bytesize > max_bytes
  reject_unsafe_xml!(xml)
  xml
end
.reject_unsafe_encoding!(xml) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 113

def reject_unsafe_encoding!(xml)
  bytes = xml.b
  # UTF-16/UTF-32 interleave NUL bytes between ASCII characters, hiding
  # "<!DOCTYPE" from the ASCII scans while the XML parser still decodes and
  # honours it. (NUL is invalid in XML 1.0 regardless, so this also rejects
  # garbage.)
  if NON_UTF8_BOMS.any? { |bom| bytes.start_with?(bom) } || bytes.include?("\x00".b)
    raise InvalidImageError, "SVG must use a single-byte or UTF-8 encoding"
  end

  bytes = bytes.byteslice(UTF8_BOM.bytesize..) if bytes.start_with?(UTF8_BOM)
  match = bytes.match(XML_DECL_ENCODING)
  return unless match
  return if match[1].match?(SAFE_DECLARED_ENCODING) && known_encoding?(match[1])

  raise InvalidImageError, "unsupported SVG encoding: #{match[1]}"
end
.reject_unsafe_xml!(xml) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 102

def reject_unsafe_xml!(xml)
  # The DOCTYPE/PI scans below are ASCII byte regexes; they only see what
  # they expect when the bytes we scan decode to the same markup the XML
  # parser sees. That holds for UTF-8 and single-byte ASCII-transparent
  # charsets but not for UTF-16/32 or multi-byte/transforming encodings, so
  # reject those first.
  reject_unsafe_encoding!(xml)
  raise InvalidImageError, "doctype is not allowed in SVG" if xml.match?(/<!DOCTYPE/i)
  raise InvalidImageError, "XML processing instructions are not allowed in SVG" if xml.match?(/<\?(?!xml\s)/i)
end
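Why the encoding check has to run before the ASCII scans can be seen with a few lines of core Ruby. This is a standalone illustration and none of the variable names come from the module.

```ruby
# Standalone illustration of NUL interleaving hiding markup from a byte scan.
doctype = "<!DOCTYPE svg>"
utf16 = doctype.encode("UTF-16LE").b # "<\x00!\x00D\x00..." as raw bytes

puts doctype.b.match?(/<!DOCTYPE/i) # true: the scan sees plain ASCII/UTF-8
puts utf16.match?(/<!DOCTYPE/i)     # false: the same text, invisible to the scan
puts utf16.include?("\x00".b)       # true: the NUL check rejects it instead
```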
.require_nokogiri ⇒ Object
Loaded on first SVG use rather than at file load. This keeps the XML library off the hot path of every non-SVG operation, where its load cost would otherwise be paid for nothing.
# File 'lib/safe_image/svg_metadata.rb', line 224

def require_nokogiri
  require "nokogiri"
end
.safe_svg_path(path) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 96

def safe_svg_path(path)
  path = PathSafety.ensure_regular_file!(path)
  raise UnsupportedFormatError, "not an SVG file: #{path}" unless File.extname(path.to_s).downcase == ".svg"
  path.to_s
end
.scan_svg!(xml) ⇒ Object
Streams the document with a SAX parser, enforcing the structural caps as events arrive (see cap_scanner_class), so a hostile “millions of tiny elements” document is rejected at the cap without ever retaining the multi-million-object DOM a parse-then-validate approach would build. Returns the root element’s local name and a localname=>value hash of its attributes, matching the contract dimensions_from_attributes consumes.
SAX does NOT raise on malformed XML, even with recovery disabled: it reports through the error callback and keeps going. Well-formedness is therefore enforced by recording the first reported error and rejecting after the parse. This preserves the old pull-parser’s reject set (unclosed or mismatched tags, trailing junk) and is stricter than it on multiple root elements, which is the safe direction for a gate.
# File 'lib/safe_image/svg_metadata.rb', line 197

def scan_svg!(xml)
  require_nokogiri
  handler = cap_scanner_class.new
  parser = Nokogiri::XML::SAX::Parser.new(handler)

  begin
    # recovery: false - do not silently repair malformed markup. Errors still
    # arrive via the error callback rather than as exceptions, so they are
    # checked explicitly below.
    parser.parse(xml) { |ctx| ctx.recovery = false }
  rescue LimitError, InvalidImageError
    raise # our own cap/validation rejections, surfaced from a callback
  rescue StandardError => e
    # Nokogiri rejects some inputs by raising rather than via the error
    # callback (e.g. empty input -> "input string cannot be empty"). Keep
    # untrusted-input failures inside our error hierarchy.
    raise InvalidImageError, "invalid SVG: #{e.message}"
  end

  raise InvalidImageError, "invalid SVG: #{handler.parse_error}" if handler.parse_error
  raise InvalidImageError, "SVG root required" unless handler.root_name == "svg"
  [handler.root_name, handler.root_attributes]
end
.validate_dimensions!(width, height, max_pixels: nil) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 163

def validate_dimensions!(width, height, max_pixels: nil)
  raise InvalidImageError, "SVG dimensions are missing or invalid" unless width&.positive? && height&.positive?
  if width > MAX_SVG_DIMENSION || height > MAX_SVG_DIMENSION
    raise LimitError, "SVG dimensions exceed #{MAX_SVG_DIMENSION}px"
  end

  pixels = width * height
  limit = if max_pixels.nil?
    MAX_SVG_PIXELS
  else
    value = Integer(max_pixels)
    raise ArgumentError, "max_pixels must be positive" if value <= 0
    value
  end
  raise LimitError, "SVG has #{pixels.to_i} pixels, exceeds #{limit}" if pixels > limit

  [width.ceil, height.ceil]
end
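The two limits interact: a size can satisfy the per-side cap and still exceed the pixel cap. A small worked sketch, assuming the default constants from this page; `verdict` is a hypothetical helper that returns a symbol where the real method raises.

```ruby
# Standalone sketch: constants copied from above; verdict is a hypothetical
# helper that names the tripped check instead of raising.
MAX_SVG_DIMENSION = 100_000
MAX_SVG_PIXELS = 100_000_000

def verdict(width, height, limit = MAX_SVG_PIXELS)
  return :dimension_cap if width > MAX_SVG_DIMENSION || height > MAX_SVG_DIMENSION
  return :pixel_cap if width * height > limit

  [width.ceil, height.ceil]
end

p verdict(10_000.0, 10_000.0) # exactly 100_000_000 pixels, which is allowed
p verdict(20_000.0, 20_000.0) # 400_000_000 pixels, each side within the cap
p verdict(150_000.0, 1.0)     # one side over the per-side cap
p verdict(99.2, 49.5)         # fractional sizes are rounded up
```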