Module: SafeImage::SvgMetadata
- Defined in:
- lib/safe_image/svg_metadata.rb
Constant Summary
- MAX_SVG_BYTES =
1 * 1024 * 1024
- MAX_SVG_DEPTH =
64
- MAX_SVG_ELEMENTS =
10_000
- MAX_SVG_ATTRIBUTES =
50_000
- MAX_SVG_DIMENSION =
100_000
- MAX_SVG_PIXELS =
100_000_000
- MAX_SVG_RENDER_UNITS =
Upper bound on the render tree the document instantiates. The caps above bound the source document, but several allowlisted features replicate referenced content at render time, so a small source can cost a consumer (browser/rasterizer) orders of magnitude more work:
* <use href="#id"> deep-copies its target subtree: a chain of doubling groups fans a few dozen nodes into billions ("use bomb"), and a cyclic reference expands forever.
* a <marker> is drawn once per vertex of every path/line/polyline/polygon that references it, so (vertex count) x (marker subtree size) draws. A dense `d` (~200k vertices fit in 1 MB) times a non-trivial marker is a linear-but-huge "draw bomb" no node/byte/element cap can see.
SvgSanitizer charges both against this single budget over the sanitized tree (renderer-free static accounting) and rejects when it is exceeded.
1_000_000
- LENGTH_PATTERN =
/\A\s*([+]?(?:\d+(?:\.\d+)?|\.\d+))(?:px)?\s*\z/i.freeze
- VIEWBOX_SPLIT =
/[\s,]+/.freeze
- NON_UTF8_BOMS =
Byte-order marks for the multi-byte encodings whose ASCII characters our byte-level scans below cannot see through. XML mandates a BOM for UTF-16 and UTF-32, so a document in one of these encodings either carries a BOM here or contains NUL bytes for its ASCII characters (caught separately). Order matters: the UTF-32 LE mark begins with the UTF-16 LE mark.
[
  "\xFF\xFE\x00\x00".b, # UTF-32 LE
  "\x00\x00\xFE\xFF".b, # UTF-32 BE
  "\xFF\xFE".b,         # UTF-16 LE
  "\xFE\xFF".b          # UTF-16 BE
].freeze
- UTF8_BOM =
"\xEF\xBB\xBF".b.freeze
- SAFE_DECLARED_ENCODING =
Declared encodings we accept: UTF-8/ASCII plus the single-byte, ASCII-transparent legacy charsets (ISO-8859-*, Windows-125x). Their bytes below 0x80 decode to identical ASCII, so the byte scans below see the same markup any decoder (REXML or a browser) does; and being single-byte, no lead byte can swallow a following quote the way Shift-JIS, GBK, or Big5 can. Multi-byte (Shift-JIS, GBK, EUC-*, ISO-2022-*), transforming (UTF-7: “+ADw-” decodes to “<”), and NUL-interleaved (UTF-16/32) encodings are deliberately excluded — they let bytes our ASCII scans cannot see become markup the parser acts on. The shape match alone is not airtight: “utf8” or “windows-1259” fit the pattern yet name no real encoding, so a name must also resolve via Encoding.find to pass — lookalikes fail closed here instead of leaking REXML’s bare ArgumentError to the caller.
/\A(?:utf-?8|us-ascii|ascii|iso-?8859-?\d{1,2}|(?:windows|cp)-?125\d)\z/i.freeze
- XML_DECL_ENCODING =
ASCII-only so it matches the binary buffer; the optional BOM is stripped before matching rather than embedded here (which would make this UTF-8).
/\A\s*<\?xml\b[^>]*?\bencoding\s*=\s*["']([^"']+)["']/i.freeze
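The MAX_SVG_RENDER_UNITS accounting described above can be illustrated with a minimal sketch. Node, render_units and doubling_chain are hypothetical names, not SvgSanitizer's actual implementation, and the sketch prices only <use> expansion (a marker would add vertex count x marker subtree size in the same way). It shows how a static walk with memoisation can price the expanded tree without rendering it.

```ruby
# Hypothetical sketch: static cost of <use> expansion over a parsed tree.
# Node, render_units and doubling_chain are illustrative assumptions.
Node = Struct.new(:name, :id, :href, :children)

BUDGET = 1_000_000 # mirrors MAX_SVG_RENDER_UNITS

class BudgetExceeded < StandardError; end

# Number of nodes a renderer would instantiate for `node`.
# index:  id => Node lookup for <use> targets (href stored without "#")
# memo:   per-node cost cache, so a shared target is priced once
# active: nodes currently being expanded, used to detect cycles
def render_units(node, index, memo = {}, active = {})
  key = node.object_id
  return memo[key] if memo.key?(key)
  raise BudgetExceeded, "cyclic <use> reference" if active[key]

  active[key] = true
  cost = 1
  if node.name == "use" && (target = index[node.href])
    cost += render_units(target, index, memo, active)
  end
  node.children.each { |child| cost += render_units(child, index, memo, active) }
  active.delete(key)
  raise BudgetExceeded, "render tree exceeds #{BUDGET} units" if cost > BUDGET

  memo[key] = cost
end

# g0 holds one circle; every later group references the previous one twice.
def doubling_chain(levels)
  index = { "g0" => Node.new("g", "g0", nil, [Node.new("circle", nil, nil, [])]) }
  (1..levels).each do |k|
    uses = Array.new(2) { Node.new("use", nil, "g#{k - 1}", []) }
    index["g#{k}"] = Node.new("g", "g#{k}", nil, uses)
  end
  [index["g#{levels}"], index]
end

root, index = doubling_chain(3)
puts render_units(root, index) # 37 units from 11 source nodes

begin
  root, index = doubling_chain(18) # 56 source nodes
  render_units(root, index)
rescue BudgetExceeded => e
  puts "rejected: #{e.message}"
end
```

In this model the cost of level k is 5 * 2^k - 3, so 18 levels (56 source nodes) already exceed the 1_000_000 budget, while the memo keeps the accounting itself linear in the source size.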
Class Method Summary
-
.cap_scanner_class ⇒ Object
The SAX cap-enforcement handler, built lazily and memoised the first time an SVG is scanned.
- .dimensions(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES) ⇒ Object
-
.dimensions_from_attributes(attributes, max_pixels: nil) ⇒ Object
Computes and validates the document dimensions from the already-scanned root attributes, so a caller that has run scan_svg! does not re-read or re-scan the file.
- .known_encoding?(name) ⇒ Boolean
- .parse_length(value) ⇒ Object
- .parse_view_box(value) ⇒ Object
- .probe(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES) ⇒ Object
- .read_svg(path, max_bytes: MAX_SVG_BYTES) ⇒ Object
- .reject_unsafe_encoding!(xml) ⇒ Object
- .reject_unsafe_xml!(xml) ⇒ Object
-
.require_nokogiri ⇒ Object
Loaded on first SVG use, not at file load: keeping the XML library off the hot path of every non-SVG operation (and every sandbox worker boot) where it would otherwise be paid for nothing.
- .safe_svg_path(path) ⇒ Object
-
.scan_svg!(xml) ⇒ Object
Streams the document with a SAX parser, enforcing the structural caps as events arrive (see cap_scanner_class), so a hostile “millions of tiny elements” document is rejected at the cap without ever retaining the multi-million-object DOM a parse-then-validate approach would build.
- .validate_dimensions!(width, height, max_pixels: nil) ⇒ Object
Class Method Details
.cap_scanner_class ⇒ Object
The SAX cap-enforcement handler, built lazily and memoised the first time an SVG is scanned. It subclasses Nokogiri::XML::SAX::Document, so it cannot be declared at file-load time without forcing nokogiri to load eagerly and defeating the lazy require above. A breached cap raises LimitError straight out of a callback; libxml2 propagates it at the next event boundary, so the parse aborts promptly rather than scanning to the end (verified: rejection time grows far slower than input size).
# File 'lib/safe_image/svg_metadata.rb', line 240

def cap_scanner_class
  @cap_scanner_class ||= Class.new(Nokogiri::XML::SAX::Document) do
    attr_reader :root_name, :root_attributes, :parse_error

    def initialize
      super
      @depth = -1
      @elements = 0
      @attributes = 0
      @root_name = nil
      @root_attributes = nil
      @parse_error = nil
    end

    # attrs: array of Nokogiri::XML::SAX::Parser::Attribute (localname/value),
    # NOT including namespace declarations; `ns` carries the xmlns decls. Both
    # count toward the attribute cap so the bound cannot be sidestepped by
    # spraying namespace declarations.
    def start_element_namespace(name, attrs = [], _prefix = nil, _uri = nil, ns = [])
      @depth += 1
      raise LimitError, "SVG nesting exceeds #{MAX_SVG_DEPTH}" if @depth > MAX_SVG_DEPTH
      @elements += 1
      raise LimitError, "SVG has too many elements" if @elements > MAX_SVG_ELEMENTS
      @attributes += attrs.length + ns.length
      raise LimitError, "SVG has too many attributes" if @attributes > MAX_SVG_ATTRIBUTES
      return unless @root_name.nil?

      @root_name = name
      @root_attributes = attrs.each_with_object({}) { |attr, hash| hash[attr.localname] = attr.value }
    end

    def end_element_namespace(_name, _prefix = nil, _uri = nil)
      @depth -= 1
    end

    # libxml2 reports well-formedness violations here rather than raising;
    # record the first so scan_svg! can reject on it.
    def error(message)
      @parse_error ||= message.to_s.strip
    end

    def warning(message); end
  end
end
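The early-abort behaviour can be seen without a parser by driving the counting logic by hand. This is a hypothetical, parser-free sketch: CapCounter, its method names, and the local LimitError are stand-ins, and only the depth and element caps are modelled.

```ruby
# Hypothetical stand-ins for the module's error class and handler.
class LimitError < StandardError; end

class CapCounter
  MAX_DEPTH = 64        # mirrors MAX_SVG_DEPTH
  MAX_ELEMENTS = 10_000 # mirrors MAX_SVG_ELEMENTS

  attr_reader :elements

  def initialize
    @depth = -1 # so the root element sits at depth 0
    @elements = 0
  end

  def start_element(_name)
    @depth += 1
    raise LimitError, "nesting exceeds #{MAX_DEPTH}" if @depth > MAX_DEPTH
    @elements += 1
    raise LimitError, "too many elements" if @elements > MAX_ELEMENTS
  end

  def end_element(_name)
    @depth -= 1
  end
end

# Simulate <svg> followed by a million sibling <g/> elements.
counter = CapCounter.new
counter.start_element("svg")
begin
  1_000_000.times do
    counter.start_element("g")
    counter.end_element("g")
  end
rescue LimitError => e
  puts "rejected after #{counter.elements} elements: #{e.message}"
  # prints "rejected after 10001 elements: too many elements"
end
```

The loop stops at the first element past the cap, so the work done is bounded by the cap rather than by the size of the input.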
.dimensions(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 77

def dimensions(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES)
  xml = read_svg(path, max_bytes: max_bytes)
  _name, attributes = scan_svg!(xml)
  dimensions_from_attributes(attributes, max_pixels: max_pixels)
end
.dimensions_from_attributes(attributes, max_pixels: nil) ⇒ Object
Computes and validates the document dimensions from the already-scanned root attributes, so a caller that has run scan_svg! does not re-read or re-scan the file. Same width/height-then-viewBox fallback and limits as dimensions above.
# File 'lib/safe_image/svg_metadata.rb', line 87

def dimensions_from_attributes(attributes, max_pixels: nil)
  width = parse_length(attributes["width"])
  height = parse_length(attributes["height"])

  unless width && height
    view_box = parse_view_box(attributes["viewBox"])
    width ||= view_box&.fetch(2)
    height ||= view_box&.fetch(3)
  end

  validate_dimensions!(width, height, max_pixels: max_pixels)
end
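The fallback is applied per axis, which matters when only one of width/height is usable. A minimal sketch of just that step follows; resolve_size is a hypothetical helper, and its inputs stand for already-parsed values, with nil meaning absent or invalid.

```ruby
# Hypothetical helper isolating the width/height-then-viewBox fallback.
def resolve_size(width, height, view_box)
  unless width && height
    width ||= view_box&.fetch(2)  # viewBox width fills in only if missing
    height ||= view_box&.fetch(3) # viewBox height fills in only if missing
  end
  [width, height]
end

view_box = [0.0, 0.0, 50.0, 25.0]

p resolve_size(200.0, 100.0, view_box) # [200.0, 100.0]  attributes win
p resolve_size(nil, nil, view_box)     # [50.0, 25.0]    viewBox supplies both
p resolve_size(200.0, nil, view_box)   # [200.0, 25.0]   mixed sources
p resolve_size(nil, nil, nil)          # [nil, nil]      rejected later by validation
```

In the mixed case the result combines the attribute width with the viewBox height, so the returned pair need not share the viewBox aspect ratio.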
.known_encoding?(name) ⇒ Boolean
# File 'lib/safe_image/svg_metadata.rb', line 146

def known_encoding?(name)
  Encoding.find(name)
  true
rescue ArgumentError
  false
end
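The two-stage check (shape match, then Encoding.find) can be tried in isolation. The pattern and known_encoding? are copied from this page; accepted_encoding? is a hypothetical helper that combines them the way reject_unsafe_encoding! does.

```ruby
SAFE_DECLARED_ENCODING =
  /\A(?:utf-?8|us-ascii|ascii|iso-?8859-?\d{1,2}|(?:windows|cp)-?125\d)\z/i.freeze

def known_encoding?(name)
  Encoding.find(name)
  true
rescue ArgumentError
  false
end

# Hypothetical helper: a name must fit the allowlist shape AND resolve.
def accepted_encoding?(name)
  name.match?(SAFE_DECLARED_ENCODING) && known_encoding?(name)
end

%w[UTF-8 ISO-8859-1 utf8 windows-1259 Shift_JIS UTF-16].each do |name|
  puts format("%-13s %s", name, accepted_encoding?(name) ? "accepted" : "rejected")
end
```

Here "utf8" and "windows-1259" pass the shape match but fail Encoding.find, while "Shift_JIS" and "UTF-16" are real encodings that the shape match excludes.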
.parse_length(value) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 153

def parse_length(value)
  value = value.to_s
  match = LENGTH_PATTERN.match(value)
  return nil unless match

  number = Float(match[1])
  return nil unless number.finite? && number.positive?

  number
rescue ArgumentError
  nil
end
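What LENGTH_PATTERN admits is easiest to see on sample inputs. The sketch below uses the pattern from the constant summary; the sample list and the captured hash are illustrative only.

```ruby
LENGTH_PATTERN = /\A\s*([+]?(?:\d+(?:\.\d+)?|\.\d+))(?:px)?\s*\z/i.freeze

samples = ["100", " 12.5PX ", "+3", "50%", "10em", "-5", "1e3"]

# Map each sample to the captured numeric text, or nil when it is rejected.
captured = samples.to_h { |text| [text, LENGTH_PATTERN.match(text)&.[](1)] }

captured.each do |text, number|
  puts "#{text.inspect} -> #{number ? number : 'rejected'}"
end
```

Unitless and px values are captured; percentages, other units, negative numbers and exponent notation are rejected, so parse_length returns nil for them. A captured value must still be finite and positive, which is why "0" is accepted by the pattern but not by parse_length.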
.parse_view_box(value) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 166

def parse_view_box(value)
  parts = value.to_s.strip.split(VIEWBOX_SPLIT)
  return nil unless parts.length == 4

  numbers = parts.map { |part| Float(part) }
  return nil unless numbers.all?(&:finite?) && numbers[2].positive? && numbers[3].positive?

  numbers
rescue ArgumentError
  nil
end
.probe(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 64

def probe(path, max_pixels: nil, max_bytes: MAX_SVG_BYTES)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  path = safe_svg_path(path)
  width, height = dimensions(path, max_pixels: max_pixels, max_bytes: max_bytes)
  {
    input_format: "svg",
    width: width,
    height: height,
    frames: 1,
    duration_ms: (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
  }
end
.read_svg(path, max_bytes: MAX_SVG_BYTES) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 100

def read_svg(path, max_bytes: MAX_SVG_BYTES)
  path = safe_svg_path(path)
  size = File.size(path)
  raise LimitError, "SVG exceeds #{max_bytes} bytes" if size > max_bytes

  xml = File.binread(path, max_bytes + 1) || "".b
  raise LimitError, "SVG exceeds #{max_bytes} bytes" if xml.bytesize > max_bytes
  reject_unsafe_xml!(xml)
  xml
end
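The size check followed by a read of at most max_bytes + 1 can be exercised on a temporary file. bounded_read and the local LimitError are hypothetical stand-ins, and the path and encoding checks of the real method are omitted. The second check presumably covers a file that grows between the stat and the read; that motivation is an assumption, not stated in the source.

```ruby
require "tempfile"

class LimitError < StandardError; end # local stand-in for the gem's error

# Hypothetical reduction of read_svg to its two size checks.
def bounded_read(path, max_bytes)
  raise LimitError, "exceeds #{max_bytes} bytes" if File.size(path) > max_bytes

  # Never read more than one byte past the cap, whatever the file contains.
  data = File.binread(path, max_bytes + 1) || "".b
  raise LimitError, "exceeds #{max_bytes} bytes" if data.bytesize > max_bytes

  data
end

content, rejected = Tempfile.create(["sample", ".svg"]) do |file|
  file.write("<svg/>")
  file.flush

  ok = bounded_read(file.path, 16)
  over = begin
    bounded_read(file.path, 5) # 6-byte file against a 5-byte cap
    false
  rescue LimitError
    true
  end
  [ok, over]
end

puts content.bytesize # 6
puts content.encoding # ASCII-8BIT
puts rejected         # true
```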
.reject_unsafe_encoding!(xml) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 128

def reject_unsafe_encoding!(xml)
  bytes = xml.b
  # UTF-16/UTF-32 interleave NUL bytes between ASCII characters, hiding
  # "<!DOCTYPE" from the ASCII scans while the XML parser still decodes and
  # honours it. (NUL is invalid in XML 1.0 regardless, so this also rejects
  # garbage.)
  if NON_UTF8_BOMS.any? { |bom| bytes.start_with?(bom) } || bytes.include?("\x00".b)
    raise InvalidImageError, "SVG must use a single-byte or UTF-8 encoding"
  end

  bytes = bytes.byteslice(UTF8_BOM.bytesize..) if bytes.start_with?(UTF8_BOM)
  match = bytes.match(XML_DECL_ENCODING)
  return unless match
  return if match[1].match?(SAFE_DECLARED_ENCODING) && known_encoding?(match[1])

  raise InvalidImageError, "unsupported SVG encoding: #{match[1]}"
end
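The reason for the NUL check can be reproduced with core String methods only. The sample doctype string is illustrative; the regex is the one reject_unsafe_xml! uses.

```ruby
doctype = '<!DOCTYPE svg [<!ENTITY x "boom">]>'

ascii_bytes = doctype.b                    # what a UTF-8/ASCII file contains
utf16_bytes = doctype.encode("UTF-16LE").b # same markup, NUL after each byte

puts ascii_bytes.match?(/<!DOCTYPE/i)  # true:  the byte scan sees the doctype
puts utf16_bytes.match?(/<!DOCTYPE/i)  # false: the NULs split the token
puts utf16_bytes.include?("\x00".b)    # true:  but the NUL check catches it
```

The UTF-16 form slips past the ASCII regex, which is why documents containing NUL bytes are rejected before the DOCTYPE and processing-instruction scans run.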
.reject_unsafe_xml!(xml) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 117

def reject_unsafe_xml!(xml)
  # The DOCTYPE/PI scans below are ASCII byte regexes; they only see what
  # they expect when the bytes we scan decode to the same markup the XML
  # parser sees. That holds for UTF-8 and single-byte ASCII-transparent
  # charsets but not for UTF-16/32 or multi-byte/transforming encodings, so
  # reject those first.
  reject_unsafe_encoding!(xml)
  raise InvalidImageError, "doctype is not allowed in SVG" if xml.match?(/<!DOCTYPE/i)
  raise InvalidImageError, "XML processing instructions are not allowed in SVG" if xml.match?(/<\?(?!xml\s)/i)
end
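The processing-instruction scan relies on a negative lookahead that spares only the XML declaration. A small sketch of what the expression matches follows; PI_PATTERN and the sample documents are illustrative names and inputs.

```ruby
PI_PATTERN = /<\?(?!xml\s)/i # same expression as in reject_unsafe_xml!

samples = [
  '<?xml version="1.0"?><svg/>',           # XML declaration: allowed
  '<?xml-stylesheet href="a.css"?><svg/>', # "xml" not followed by whitespace
  "<?php echo 1 ?><svg/>",                 # any other processing instruction
  "<svg><!-- <?x ?> --></svg>"             # "<?" inside a comment
]

flagged = samples.map { |doc| doc.match?(PI_PATTERN) }

samples.zip(flagged) { |doc, hit| puts "#{hit ? 'reject' : 'allow '}  #{doc}" }
```

Because the scan runs over raw bytes rather than parsed nodes, a "<?" inside a comment is flagged as well; the check errs on the side of rejection.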
.require_nokogiri ⇒ Object
Loaded on first SVG use, not at file load: keeping the XML library off the hot path of every non-SVG operation (and every sandbox worker boot) where it would otherwise be paid for nothing.
# File 'lib/safe_image/svg_metadata.rb', line 229

def require_nokogiri
  require "nokogiri"
end
.safe_svg_path(path) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 111

def safe_svg_path(path)
  path = PathSafety.ensure_regular_file!(path)
  raise UnsupportedFormatError, "not an SVG file: #{path}" unless File.extname(path.to_s).downcase == ".svg"
  path.to_s
end
.scan_svg!(xml) ⇒ Object
Streams the document with a SAX parser, enforcing the structural caps as events arrive (see cap_scanner_class), so a hostile “millions of tiny elements” document is rejected at the cap without ever retaining the multi-million-object DOM a parse-then-validate approach would build. Returns the root element’s local name and a localname=>value hash of its attributes, matching the contract dimensions_from_attributes consumes.
SAX does NOT raise on malformed XML even with recovery disabled — it reports through the error callback and keeps going — so well-formedness is enforced by recording any reported error and rejecting after the parse. This reproduces the old REXML pull-parser’s reject set (unclosed/mismatched tags, trailing junk) and is strictly stricter on multiple root elements, which is a safe direction for a gate.
# File 'lib/safe_image/svg_metadata.rb', line 202

def scan_svg!(xml)
  require_nokogiri
  handler = cap_scanner_class.new
  parser = Nokogiri::XML::SAX::Parser.new(handler)
  begin
    # recovery: false — do not silently repair malformed markup. Errors still
    # arrive via the error callback rather than as exceptions, so they are
    # checked explicitly below.
    parser.parse(xml) { |ctx| ctx.recovery = false }
  rescue LimitError, InvalidImageError
    raise # our own cap/validation rejections, surfaced from a callback
  rescue StandardError => e
    # Nokogiri rejects some inputs by raising rather than via the error
    # callback (e.g. empty input -> "input string cannot be empty"). Keep
    # untrusted-input failures inside our error hierarchy.
    raise InvalidImageError, "invalid SVG: #{e.message}"
  end
  raise InvalidImageError, "invalid SVG: #{handler.parse_error}" if handler.parse_error
  raise InvalidImageError, "SVG root required" unless handler.root_name == "svg"

  [handler.root_name, handler.root_attributes]
end
.validate_dimensions!(width, height, max_pixels: nil) ⇒ Object
# File 'lib/safe_image/svg_metadata.rb', line 178

def validate_dimensions!(width, height, max_pixels: nil)
  raise InvalidImageError, "SVG dimensions are missing or invalid" unless width&.positive? && height&.positive?
  raise LimitError, "SVG dimensions exceed #{MAX_SVG_DIMENSION}px" if width > MAX_SVG_DIMENSION || height > MAX_SVG_DIMENSION

  pixels = width * height
  limit = max_pixels || MAX_SVG_PIXELS
  raise LimitError, "SVG has #{pixels.to_i} pixels, exceeds #{limit}" if pixels > limit

  [width.ceil, height.ceil]
end