Module: SafeImage::SvgSanitizer
- Defined in:
- lib/safe_image/svg_sanitizer.rb
Overview
Allowlist SVG sanitizer. Parses untrusted SVG with Nokogiri (libxml2) and builds a fresh output tree containing only allowlisted elements, attributes, and namespaces (the svg-hush model). Nothing from the input is carried over by default. There is no “remove the bad parts” step, because only explicitly allowed content is ever added, so the output’s element/attribute/namespace sets are a closed allowlist by construction. A bug therefore tends to drop legitimate content (fails closed, visible) rather than leak attacker content (fails open, silent).
The structural caps and the byte-level encoding/DOCTYPE/PI rejection run first, in SvgMetadata, on the raw bytes — libxml2 only ever sees input that already passed those gates, so its default internal-entity expansion is unreachable (a DOCTYPE is rejected before parsing).
Constant Summary
- ALLOWED_ELEMENTS =
%w[ svg g defs title desc path rect circle ellipse line polyline polygon text tspan textPath linearGradient radialGradient stop clipPath mask pattern use symbol style marker ].freeze
- ALLOWED_ATTRIBUTES =
Presentation attributes. The CSS-property names here are mirrored by SvgCss::ALLOWED_PROPERTIES (a test asserts the subset relationship), so a style="" / <style> declaration and its attribute twin are treated alike. Attribute values that may carry url() (fill, stroke, clip-path, mask, marker*) are constrained to #fragment references by dangerous_value?.
%w[ id class x y x1 y1 x2 y2 cx cy r rx ry d points width height viewBox fill stroke stroke-width stroke-linecap stroke-linejoin stroke-miterlimit fill-rule clip-rule opacity fill-opacity stroke-opacity transform gradientUnits gradientTransform offset stop-color stop-opacity clip-path mask href xlink:href xmlns xmlns:xlink version preserveAspectRatio font-family font-size font-weight text-anchor style color stroke-dasharray stroke-dashoffset vector-effect marker marker-start marker-mid marker-end markerWidth markerHeight refX refY orient markerUnits display visibility overflow paint-order mix-blend-mode isolation shape-rendering image-rendering color-interpolation font-style font-variant font-stretch text-decoration letter-spacing word-spacing dominant-baseline baseline-shift writing-mode direction ].freeze
- SVG_NAMESPACE =
"http://www.w3.org/2000/svg"
- XLINK_NAMESPACE =
"http://www.w3.org/1999/xlink"
- NAMESPACE_PATTERN =
Caller namespace tokens must already be valid id/class identifiers, so the prefixed ids and the scope class are well-formed. Invalid tokens are rejected, not coerced, so two distinct tokens can never collapse to one.
/\A[A-Za-z][A-Za-z0-9_-]*\z/.freeze
- URL_FRAGMENT_REF =
A url() referencing a same-document fragment, with optional matching quotes, any case, surrounding whitespace allowed. This is the ONLY url() form dangerous_value? keeps in a presentation attribute, and exactly the form the namespace rewrite targets (capturing the fragment name) — so the validation and rewrite paths cannot disagree and leave a reference bare.
/url\(\s*(['"]?)#([A-Za-z][\w.-]*)\1\s*\)/i.freeze
- ARIA_IDREF_ATTRIBUTES =
ARIA attributes whose values are an id or a space-separated list of ids. They are references like href/url(#…) and must move into the namespace too, or they bind to a host element (or dangle) when the SVG is inlined.
%w[ aria-activedescendant aria-controls aria-describedby aria-details aria-errormessage aria-flowto aria-labelledby aria-owns ].freeze
- REPLICATING_ELEMENTS =
Elements that instantiate a referenced <marker> once per vertex, and the attributes that carry the marker reference. Used by the render-expansion bound.
%w[path line polyline polygon].freeze
- MARKER_ATTRIBUTES =
%w[marker marker-start marker-mid marker-end].freeze
- NAMESPACE_REQUIRED =
Sentinel marking id_namespace as unsupplied, so omitting it raises an instructive error rather than silently picking a safety posture.
Object.new.freeze
Class Method Summary
- .allowed_attribute?(attr) ⇒ Boolean
An attribute is allowed when it is a recognised href (plain or xlink) or a no-namespace attribute on the allowlist (or an aria-* attribute).
- .allowed_element?(element) ⇒ Boolean
An element is allowed when its name is on ALLOWED_ELEMENTS and it is in the SVG namespace or in no namespace.
- .apply_scope_class!(root, namespace) ⇒ Object
Anchors a namespaced document’s scoped <style> selectors: they target `.<ns>-scope <selector>`, so the root must carry that class for them to match its own content (and nothing else).
- .atomic_write(path, content) ⇒ Object
- .attr_expanded_name(attr) ⇒ Object
- .build_element(in_element, out_parent, out_doc, namespace) ⇒ Object
Builds the sanitized counterpart of an allowed input element as a child of out_parent. The node is created, bound to the SVG namespace, and attached before it is populated, so attribute namespaces (xlink) resolve against the root’s declarations during the build rather than on a detached node.
- .build_style_element(in_element, out, namespace) ⇒ Object
A <style> element collapses to a single text node holding the sanitized stylesheet.
- .check_render_expansion!(cost) ⇒ Object
- .collect_ids(element, id_map) ⇒ Object
- .contains_style?(element) ⇒ Boolean
- .copy_attributes(in_element, out, out_doc, namespace) ⇒ Object
Copies only the attributes the policy allows, applying the same value checks regardless of how the attribute is named.
- .dangerous_value?(value) ⇒ Boolean
- .ensure_xlink(out_doc) ⇒ Object
- .event_attribute?(attr) ⇒ Boolean
- .href_attribute?(attr) ⇒ Boolean
- .invalid_href?(attr) ⇒ Boolean
- .marker_render_cost(element, id_map, memo, active) ⇒ Object
A marked path instantiates each referenced marker once per vertex.
- .namespace_declaration?(attr) ⇒ Boolean
- .namespace_references!(element, namespace) ⇒ Object
Prefixes this element’s own id and every same-document reference it makes (href/xlink:href fragments, ARIA IDREFs, and url(#…) in any attribute) with the namespace, keeping definitions and references consistent.
- .namespace_tree!(element, namespace) ⇒ Object
Applies reference namespacing to every element in the assembled output tree.
- .neutralize_root_overflow!(root) ⇒ Object
In inline (namespaced) mode the root <svg> must clip to its own box, or a tiny declared viewport with oversized content becomes a full-page overlay.
- .parse(xml) ⇒ Object
Hardened parse: no network, no external DTD load.
- .path_vertex_count(element) ⇒ Object
A deliberate upper bound on the vertices a geometry element renders, never an exact parse: every run of digits in `d`/`points` is counted as a coordinate, so the result is >= the real vertex count.
- .populate_element(in_element, out, out_doc, namespace) ⇒ Object
Fills an already-created, already-attached output node from its input counterpart: sanitized attributes, then sanitized children.
- .referenced_markers(element, id_map) ⇒ Object
Collects the distinct marker subtrees a geometry element references, via the marker-* presentation attributes or their style="" twins.
- .reject_render_expansion!(root) ⇒ Object
Bounds the render tree the document instantiates.
- .resolve_namespace(id_namespace) ⇒ Object
Maps the required id_namespace argument to a namespace token, or nil for an explicit standalone document.
- .sanitize!(path, max_pixels: nil, id_namespace: NAMESPACE_REQUIRED) ⇒ Object
Sanitizes an SVG in place to the element/attribute/CSS allowlists above.
- .serialize(root) ⇒ Object
- .set_attribute(out, out_doc, attr, value) ⇒ Object
Sets an attribute on the output node, preserving the xlink namespace for xlink:href and writing everything else as a plain (no-namespace) attribute.
- .subtree_render_cost(element, id_map, memo, active) ⇒ Object
- .svg_namespace(out_doc, out) ⇒ Object
- .use_element?(element) ⇒ Boolean
- .use_target(element, id_map) ⇒ Object
Class Method Details
.allowed_attribute?(attr) ⇒ Boolean
An attribute is allowed when it is a recognised href (plain or xlink) or a no-namespace attribute on the allowlist (or an aria-* attribute). A prefixed attribute in any other namespace is never copied.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 282
def allowed_attribute?(attr)
  return true if href_attribute?(attr)
  return false unless attr.namespace.nil?

  name = attr.name.to_s
  ALLOWED_ATTRIBUTES.include?(name) || name.start_with?("aria-")
end
```
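The name-based predicates are easiest to see side by side. The sketch below is a Nokogiri-free restatement of href_attribute?, allowed_attribute?, event_attribute? and invalid_href?, combined the way copy_attributes chains them (the value checks are left out). `Namespace`, `Attr`, the abbreviated allowlist and the `keep_attribute?` helper are hypothetical stand-ins. They expose only `name`, `namespace` (`prefix`/`href`) and `value`, which is all these predicates read from Nokogiri's attribute objects.

```ruby
# Hypothetical stand-ins for Nokogiri's attribute and namespace objects.
Namespace = Struct.new(:prefix, :href)
Attr = Struct.new(:name, :namespace, :value)

XLINK_NAMESPACE = "http://www.w3.org/1999/xlink"
# Abbreviated allowlist, for illustration only.
ALLOWED_ATTRIBUTES = %w[id class d fill stroke transform].freeze

def href_attribute?(attr)
  return true if attr.name == "href" && attr.namespace.nil?

  attr.name == "href" && attr.namespace&.href == XLINK_NAMESPACE
end

def allowed_attribute?(attr)
  return true if href_attribute?(attr)
  return false unless attr.namespace.nil? # any other prefixed attribute is dropped

  ALLOWED_ATTRIBUTES.include?(attr.name) || attr.name.start_with?("aria-")
end

def event_attribute?(attr)
  attr.name.downcase.start_with?("on")
end

def invalid_href?(attr)
  href_attribute?(attr) && !attr.value.to_s.start_with?("#")
end

# Hypothetical helper: the name-based part of the copy_attributes filter chain.
def keep_attribute?(attr)
  allowed_attribute?(attr) && !event_attribute?(attr) && !invalid_href?(attr)
end

xlink = Namespace.new("xlink", XLINK_NAMESPACE)
foreign = Namespace.new("x", "http://example.com/ns")

p keep_attribute?(Attr.new("href", xlink, "#icon"))             # => true
p keep_attribute?(Attr.new("href", nil, "https://example.com"))  # => false (not a fragment)
p keep_attribute?(Attr.new("fill", foreign, "red"))              # => false (foreign namespace)
p keep_attribute?(Attr.new("aria-label", nil, "Close"))          # => true
p keep_attribute?(Attr.new("onload", nil, "alert(1)"))           # => false
```

Because the allowlist is checked before the event-handler prefix, event_attribute? acts as a second line of defence. None of the allowlisted names begins with "on".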
.allowed_element?(element) ⇒ Boolean
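An element is allowed when its name is on ALLOWED_ELEMENTS and it is in the SVG namespace or in no namespace. This is the first of the policy predicates, all of which work against Nokogiri’s attribute/namespace model.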
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 274
def allowed_element?(element)
  href = element.namespace&.href.to_s
  ALLOWED_ELEMENTS.include?(element.name.to_s) && (href.empty? || href == SVG_NAMESPACE)
end
```
.apply_scope_class!(root, namespace) ⇒ Object
Anchors a namespaced document’s scoped <style> selectors: they target `.<ns>-scope <selector>`, so the root must carry that class for them to match its own content (and nothing else). Idempotent.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 385
def apply_scope_class!(root, namespace)
  scope = "#{namespace}-scope"
  classes = root["class"].to_s.split(/\s+/)
  return if classes.include?(scope)
  root["class"] = (classes << scope).join(" ").strip
end
```
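The idempotency claim can be checked without Nokogiri. The method body above only uses `[]` and `[]=`, which Nokogiri elements support, so a plain Hash can stand in for the root element. The namespace token "u1a2b" is an arbitrary example.

```ruby
# apply_scope_class! as listed above; a Hash stands in for the root element.
def apply_scope_class!(root, namespace)
  scope = "#{namespace}-scope"
  classes = root["class"].to_s.split(/\s+/)
  return if classes.include?(scope)

  root["class"] = (classes << scope).join(" ").strip
end

root = { "class" => "  u1a2b-icon" }
apply_scope_class!(root, "u1a2b")
apply_scope_class!(root, "u1a2b") # second call finds the scope class and returns
p root["class"]                   # => "u1a2b-icon u1a2b-scope"

bare = {}
apply_scope_class!(bare, "u1a2b")
p bare["class"]                   # => "u1a2b-scope"
```

A regex `split` keeps a leading empty token for leading whitespace, and the final `strip` removes the stray space it would otherwise leave behind.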
.atomic_write(path, content) ⇒ Object
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 574
def atomic_write(path, content)
  Tempfile.create([path.basename.to_s, ".tmp"], path.dirname.to_s, binmode: false) do |tmp|
    tmp.write(content)
    tmp.flush
    tmp.fsync
    File.rename(tmp.path, path.to_s)
  end
end
```
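The snippet below exercises atomic_write, copied as listed, against a scratch directory. The temporary file is created in the destination's own directory, so the final `File.rename` stays on one filesystem and replaces the old file in a single step. Because the file has already been renamed when the block ends, `Tempfile.create` has no temporary file left to remove. The file name `icon.svg` is only an example.

```ruby
require "pathname"
require "tempfile"
require "tmpdir"

def atomic_write(path, content)
  Tempfile.create([path.basename.to_s, ".tmp"], path.dirname.to_s, binmode: false) do |tmp|
    tmp.write(content)
    tmp.flush
    tmp.fsync                          # data reaches disk before the rename publishes it
    File.rename(tmp.path, path.to_s)   # same directory, so same filesystem
  end
end

Dir.mktmpdir do |dir|
  path = Pathname.new(File.join(dir, "icon.svg"))
  File.write(path, "<svg>original</svg>")
  atomic_write(path, "<svg/>")
  p File.read(path)    # => "<svg/>"
  p Dir.children(dir)  # => ["icon.svg"] (no leftover .tmp file)
end
```

One side effect is worth checking in deployment. `Tempfile` creates its file with mode 0600 (reduced further by the umask), and the rename keeps that inode. The sanitized file therefore does not inherit the original file's permissions.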
.attr_expanded_name(attr) ⇒ Object
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 312
def attr_expanded_name(attr)
  prefix = attr.namespace&.prefix
  prefix ? "#{prefix}:#{attr.name}" : attr.name.to_s
end
```
.build_element(in_element, out_parent, out_doc, namespace) ⇒ Object
Builds the sanitized counterpart of an allowed input element as a child of out_parent: the node is created, bound to the SVG namespace, and attached before it is populated, so attribute namespaces (xlink) resolve against the root’s declarations during the build rather than on a detached node.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 162
def build_element(in_element, out_parent, out_doc, namespace)
  out = out_doc.create_element(in_element.name)
  out.namespace = svg_namespace(out_doc, out)
  out_parent.add_child(out)
  populate_element(in_element, out, out_doc, namespace)
  out
end
```
.build_style_element(in_element, out, namespace) ⇒ Object
A <style> element collapses to a single text node holding the sanitized stylesheet. When nothing survives, the element itself is removed from the output entirely (not left as an empty <style/>), matching the policy that a stylesheet which fails closed leaves no trace. Element attributes (type, media) are never copied: the output is plain CSS.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 198
def build_style_element(in_element, out, namespace)
  css = in_element.children.select { |c| c.text? || c.cdata? }.map(&:content).join
  sanitized = SvgCss.sanitize_stylesheet(css, namespace: namespace)
  if sanitized
    out.add_child(out.document.create_text_node(sanitized))
  else
    out.unlink
  end
end
```
.check_render_expansion!(cost) ⇒ Object
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 545
def check_render_expansion!(cost)
  return if cost <= SvgMetadata::MAX_SVG_RENDER_UNITS

  raise LimitError, "SVG render expansion exceeds #{SvgMetadata::MAX_SVG_RENDER_UNITS} rendered nodes"
end
```
.collect_ids(element, id_map) ⇒ Object
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 466
def collect_ids(element, id_map)
  id = element["id"]
  id_map[id.to_s] = element if id && !id_map.key?(id.to_s)
  element.children.each do |child|
    collect_ids(child, id_map) if child.is_a?(Nokogiri::XML::Element)
  end
end
```
.contains_style?(element) ⇒ Boolean
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 392
def contains_style?(element)
  return true if element.name == "style"
  element.children.any? { |child| child.is_a?(Nokogiri::XML::Element) && contains_style?(child) }
end
```
.copy_attributes(in_element, out, out_doc, namespace) ⇒ Object
Copies only the attributes the policy allows, applying the same value checks regardless of how the attribute is named. The style="" attribute is the only one whose value is CSS. It is rewritten to the sanitized subset, or dropped. Reference namespacing happens later, over the assembled tree.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 212
def copy_attributes(in_element, out, out_doc, namespace)
  style_value = nil

  in_element.attribute_nodes.each do |attr|
    next if namespace_declaration?(attr)

    value = attr.value.to_s

    if attr_expanded_name(attr) == "style"
      sanitized = SvgCss.sanitize_declarations(value, namespace: namespace)
      style_value = sanitized if sanitized
      next
    end

    next unless allowed_attribute?(attr)
    next if event_attribute?(attr)
    next if dangerous_value?(value)
    next if invalid_href?(attr)

    set_attribute(out, out_doc, attr, value)
  end

  out["style"] = style_value if style_value
end
```
.dangerous_value?(value) ⇒ Boolean
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 416
def dangerous_value?(value)
  # Presentation attributes are fed to browsers' CSS value parsers, where
  # escapes re-form tokens after the pattern checks below (\6c is "l", so
  # ur\6c( becomes url(). No allowlisted attribute legitimately contains
  # a backslash; reject outright.
  return true if value.to_s.include?("\\")

  normalized = value.to_s.gsub(/[\u0000-\u0020\u007f]+/, "")
  return true if normalized.match?(/(?:javascript|data):/i)

  # var()/env()/attr() resolve against the host page or element context, so an
  # inlined SVG could pull in host-controlled values the sanitizer never saw,
  # including a url() the namespace rewrite missed. They are inert in
  # standalone output anyway (no custom properties survive sanitisation), so
  # reject them in every mode.
  return true if normalized.match?(/(?:var|env|attr)\s*\(/i)

  # Every url(...) must be a same-document fragment in the canonical form the
  # namespace rewrite handles. Strip those, then fail closed if any url(
  # introducer remains: this catches external URLs, mismatched quotes, AND
  # unterminated/malformed url( that a complete-match scan would miss and
  # browsers may still parse leniently. Keeps validation and the rewrite in
  # lockstep, so no bare reference can survive in namespaced output.
  value.to_s.gsub(URL_FRAGMENT_REF, "").match?(/url\s*\(/i)
end
```
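The "strip the canonical form, then fail closed on any leftover url(" approach is easier to follow with concrete values. The sketch below restates dangerous_value? as a standalone method, with URL_FRAGMENT_REF copied verbatim, and runs it over a handful of representative attribute values. It is an illustration of the checks, not a substitute for the library method.

```ruby
# Sketch: the dangerous_value? checks restated so candidate attribute values
# can be tried without Nokogiri. URL_FRAGMENT_REF is copied from the constant.
URL_FRAGMENT_REF = /url\(\s*(['"]?)#([A-Za-z][\w.-]*)\1\s*\)/i

def dangerous_value?(value)
  text = value.to_s
  # A backslash could start a CSS escape that re-forms a token (ur\6c( -> url().
  return true if text.include?("\\")

  # Remove whitespace and control characters so "java\tscript:" is still caught.
  normalized = text.gsub(/[\u0000-\u0020\u007f]+/, "")
  return true if normalized.match?(/(?:javascript|data):/i)
  # Host-context CSS functions are rejected in every mode.
  return true if normalized.match?(/(?:var|env|attr)\s*\(/i)

  # Strip every canonical url(#fragment); any url( introducer left over fails closed.
  text.gsub(URL_FRAGMENT_REF, "").match?(/url\s*\(/i)
end

[
  "url(#grad1)",                # kept: canonical fragment reference
  "URL( '#a' )",                # kept: matching quotes, any case, padding
  'url(#a")',                   # rejected: mismatched quote
  "url(#a",                     # rejected: unterminated
  "url(https://example.com/x)", # rejected: external reference
  'ur\6c(#a)',                  # rejected: CSS escape
  "java\tscript:alert(1)",      # rejected: scheme split by a tab
  "var(--accent)"               # rejected: host-context function
].each { |value| puts "#{value.inspect.ljust(32)} dangerous=#{dangerous_value?(value)}" }
```

The mismatched-quote and unterminated cases show why a single "does every url() match?" scan is not enough. Neither value contains a complete url() match, yet a lenient CSS parser may still treat them as references. The leftover-introducer check rejects both.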
.ensure_xlink(out_doc) ⇒ Object
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 259
def ensure_xlink(out_doc)
  root = out_doc.root
  return if root.namespace_definitions.any? { |n| n.prefix == "xlink" }

  root.add_namespace_definition("xlink", XLINK_NAMESPACE)
end
```
.event_attribute?(attr) ⇒ Boolean
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 297
def event_attribute?(attr)
  attr.name.to_s.downcase.start_with?("on")
end
```
.href_attribute?(attr) ⇒ Boolean
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 301
def href_attribute?(attr)
  name = attr.name.to_s
  return true if name == "href" && attr.namespace.nil?

  name == "href" && attr.namespace&.href == XLINK_NAMESPACE
end
```
.invalid_href?(attr) ⇒ Boolean
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 308
def invalid_href?(attr)
  href_attribute?(attr) && !attr.value.to_s.start_with?("#")
end
```
.marker_render_cost(element, id_map, memo, active) ⇒ Object
A marked path instantiates each referenced marker once per vertex. Charge (vertex count) x (sum of distinct referenced marker subtree costs). The marker subtree cost goes through subtree_render_cost too, so the active-path set still catches a marker that references itself, and a marker containing a <use> bomb is counted. Vertices are over-counted (see path_vertex_count), which only makes the bound more conservative.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 507
def marker_render_cost(element, id_map, memo, active)
  return 0 unless REPLICATING_ELEMENTS.include?(element.name.to_s)

  markers = referenced_markers(element, id_map)
  return 0 if markers.empty?

  vertices = path_vertex_count(element)
  return 0 if vertices.zero?

  per_vertex = markers.sum { |marker| subtree_render_cost(marker, id_map, memo, active) }
  vertices * per_vertex
end
```
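The charge is plain arithmetic, so a worked example helps show where the budget bites. In the sketch below, `BUDGET`, `RenderBudgetExceeded`, `marker_charge` and `charge!` are all hypothetical. The real limit is SvgMetadata::MAX_SVG_RENDER_UNITS, whose value is not shown in this document, and the real error is LimitError. Like marker_render_cost, it charges every vertex for every distinct referenced marker, even a marker that only applies at the start or end vertex. This over-counts on purpose.

```ruby
# Hypothetical stand-ins for SvgMetadata::MAX_SVG_RENDER_UNITS and LimitError.
class RenderBudgetExceeded < StandardError; end
BUDGET = 1_000_000

# Charge for one marked geometry element: every vertex draws every distinct
# referenced marker subtree once.
def marker_charge(vertices, marker_subtree_costs)
  return 0 if vertices.zero? || marker_subtree_costs.empty?

  vertices * marker_subtree_costs.sum
end

# Adds a charge to the running total and fails closed past the budget.
def charge!(spent, extra, budget: BUDGET)
  total = spent + extra
  raise RenderBudgetExceeded, "#{total} > #{budget}" if total > budget

  total
end

# A small icon: 6 vertices, a 3-node start marker and a 2-node end marker.
p charge!(1, marker_charge(6, [3, 2])) # => 31

# The "draw bomb": roughly 200k vertices times a 50-node marker.
begin
  charge!(1, marker_charge(200_000, [50]))
rescue RenderBudgetExceeded => e
  puts "rejected: #{e.message}"
end
```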
.namespace_declaration?(attr) ⇒ Boolean
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 290
def namespace_declaration?(attr)
  # Nokogiri does not surface xmlns declarations through attribute_nodes, but
  # guard defensively in case a libxml2 build does.
  name = attr.name.to_s
  name == "xmlns" || attr.namespace&.prefix == "xmlns" || name.start_with?("xmlns")
end
```
.namespace_references!(element, namespace) ⇒ Object
Prefixes this element’s own id and every same-document reference it makes (href/xlink:href fragments, ARIA IDREFs, and url(#…) in any attribute) with the namespace, keeping definitions and references consistent. Each class token is namespaced as well, so host-page selectors cannot match inlined content. The style attribute’s url()s are already namespaced by SvgCss.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 321
def namespace_references!(element, namespace)
  if (id = element["id"])
    element["id"] = SvgCss.apply_namespace(namespace, id)
  end

  # Class names are attacker-chosen references into the host stylesheet:
  # inlined, a bare class="modal fixed" would pick up the page's framework
  # CSS (an overlay/UI-redress vector). Namespace each token (paired with the
  # matching rewrite of `.class` selectors) so internal class styling still
  # matches while host selectors never do.
  if (klass = element["class"])
    tokens = klass.split(/\s+/).reject(&:empty?)
    element["class"] = tokens.map { |t| SvgCss.apply_namespace(namespace, t) }.join(" ") unless tokens.empty?
  end

  element.attribute_nodes.each do |attr|
    next unless href_attribute?(attr)

    value = attr.value.to_s
    next unless value.start_with?("#")

    attr.value = "##{SvgCss.apply_namespace(namespace, value[1..])}"
  end

  ARIA_IDREF_ATTRIBUTES.each do |aria|
    value = element[aria]
    next unless value

    ids = value.split(/\s+/).reject(&:empty?)
    next if ids.empty?

    element[aria] = ids.map { |ref| SvgCss.apply_namespace(namespace, ref) }.join(" ")
  end

  element.attribute_nodes.each do |attr|
    name = attr.name.to_s
    next if name == "style"

    value = attr.value.to_s
    next unless value.match?(/url\(/i)

    rewritten = value.gsub(URL_FRAGMENT_REF) { "url(##{SvgCss.apply_namespace(namespace, Regexp.last_match(2))})" }
    attr.value = rewritten if rewritten != value
  end
end
```
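To see definitions and references move together, the sketch below applies the same rewrite rules to a Hash of attribute name ⇒ value. Several pieces are hypothetical. The prefixing scheme `"#{ns}-#{name}"` stands in for SvgCss.apply_namespace, whose exact output format is not shown in this document. The ARIA list is abbreviated, and `namespace_attributes` is a name invented for the sketch. As in the listed method, style="" is skipped because its url()s are handled by the CSS sanitizer.

```ruby
# Copied from URL_FRAGMENT_REF; the ARIA list is abbreviated for the sketch.
URL_FRAGMENT_REF = /url\(\s*(['"]?)#([A-Za-z][\w.-]*)\1\s*\)/i
ARIA_IDREFS = %w[aria-labelledby aria-describedby aria-controls].freeze

# Hypothetical prefixing scheme standing in for SvgCss.apply_namespace.
def apply_namespace(ns, name)
  "#{ns}-#{name}"
end

# Returns a rewritten copy of the attribute Hash; the input is not mutated.
def namespace_attributes(attrs, ns)
  out = attrs.dup
  out["id"] = apply_namespace(ns, out["id"]) if out["id"]

  if out["class"]
    tokens = out["class"].split(/\s+/).reject(&:empty?)
    out["class"] = tokens.map { |t| apply_namespace(ns, t) }.join(" ") unless tokens.empty?
  end

  %w[href xlink:href].each do |name|
    value = out[name]
    out[name] = "##{apply_namespace(ns, value[1..])}" if value&.start_with?("#")
  end

  ARIA_IDREFS.each do |name|
    ids = out[name].to_s.split(/\s+/).reject(&:empty?)
    out[name] = ids.map { |ref| apply_namespace(ns, ref) }.join(" ") unless ids.empty?
  end

  out.keys.each do |name|
    next if name == "style" # style="" url()s are namespaced by the CSS sanitizer
    out[name] = out[name].gsub(URL_FRAGMENT_REF) { "url(##{apply_namespace(ns, Regexp.last_match(2))})" }
  end
  out
end

input = {
  "id" => "grad",
  "class" => "icon big",
  "fill" => "url('#grad')",
  "xlink:href" => "#shape",
  "aria-labelledby" => "title1 desc1",
  "style" => "stroke:url(#grad)"
}
namespace_attributes(input, "u1a2b").each { |name, value| puts "#{name}=#{value}" }
```

The rewrite emits the unquoted canonical form `url(#u1a2b-grad)`, which URL_FRAGMENT_REF matches. A value that was accepted once is therefore still in a form the validator and the rewrite agree on.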
.namespace_tree!(element, namespace) ⇒ Object
Applies reference namespacing to every element in the assembled output tree. Done after the build so each attribute’s namespace has resolved.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 239
def namespace_tree!(element, namespace)
  namespace_references!(element, namespace)
  element.children.each do |child|
    namespace_tree!(child, namespace) if child.is_a?(Nokogiri::XML::Element)
  end
end
```
.neutralize_root_overflow!(root) ⇒ Object
In inline (namespaced) mode the root <svg> must clip to its own box, or a tiny declared viewport with oversized content becomes a full-page overlay. Drop any overflow the SVG set on the root so it falls back to the outermost-svg default (hidden); inner elements keep overflow (markers need it) and the root clip bounds them all. Standalone output is untouched — an <img>/CSS-url resource is already clipped by its own element box.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 403
def neutralize_root_overflow!(root)
  root.remove_attribute("overflow")

  style = root["style"]
  return unless style

  kept = style.split(";").reject { |declaration| declaration.start_with?("overflow:") }

  if kept.empty?
    root.remove_attribute("style")
  else
    root["style"] = kept.join(";")
  end
end
```
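The declaration filter above matches the literal prefix `overflow:`. That is sufficient only if SvgCss.sanitize_declarations always emits lower-case `property:value` pairs with no surrounding whitespace, an assumption this document neither confirms nor rules out. The sketch below shows a more tolerant match. It compares the trimmed, case-folded property name instead of a string prefix, so a declaration such as ` Overflow : visible` is also dropped. `strip_overflow` is a hypothetical helper, not part of the module.

```ruby
# Hypothetical tolerant variant of the root-style filter: drop any declaration
# whose property name is "overflow", ignoring surrounding whitespace and case.
# Returns nil when nothing is left, i.e. the style attribute should be removed.
def strip_overflow(style)
  kept = style.split(";").reject do |declaration|
    declaration.split(":", 2).first.to_s.strip.casecmp?("overflow")
  end
  kept.empty? ? nil : kept.join(";")
end

p strip_overflow("overflow:visible")             # => nil
p strip_overflow("fill:red; Overflow : visible") # => "fill:red"
p strip_overflow("fill:red;stroke:blue")         # => "fill:red;stroke:blue"
```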
.parse(xml) ⇒ Object
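Hardened parse: no network, no external DTD load. DOCTYPE is already rejected upstream, so entity expansion is unreachable; NONET is set defensively regardless. The options value replaces Nokogiri's default (which includes RECOVER) rather than adding to it, so the parse is strict. Malformed XML raises Nokogiri::XML::SyntaxError, which is re-raised as InvalidImageError, instead of being silently repaired.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 150
def parse(xml)
  Nokogiri::XML(xml) do |config|
    config.options = Nokogiri::XML::ParseOptions::NONET
  end
rescue Nokogiri::XML::SyntaxError => e
  raise InvalidImageError, "invalid SVG: #{e.message}"
end
```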
.path_vertex_count(element) ⇒ Object
A deliberate upper bound on the vertices a geometry element renders, never an exact parse: every run of digits in `d`/`points` is counted as a coordinate, so the result is >= the real vertex count. Over-counting only tightens the bound; under-counting would be the bug, so we never try to be precise about path command grammar.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 539
def path_vertex_count(element)
  geometry = "#{element['d']} #{element['points']}"
  # Count each digit run separately ("1.5" counts twice): a pattern that joined
  # runs across a "." would read ".1.1" as one number, fewer than its two values.
  count = geometry.scan(/\d+/).length
  count.zero? ? 0 : count + 1
end
```
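The sketch below checks the "every run of digits is a coordinate" rule against a few path strings where the true vertex count is easy to see by hand. A Hash stands in for the Nokogiri element, and `vertex_upper_bound` is a hypothetical name for the counting rule. The third case uses compact number syntax: `h.1.1.1.1` is four separate horizontal segments. The bound stays above the real count only because each digit run is counted separately, so joining `.1.1` into a single match would halve the tally for such input.

```ruby
# Sketch of the vertex upper bound: every run of digits in d/points counts as
# one coordinate. A Hash stands in for the geometry element (hypothetical).
def vertex_upper_bound(element)
  geometry = "#{element['d']} #{element['points']}"
  count = geometry.scan(/\d+/).length
  count.zero? ? 0 : count + 1
end

p vertex_upper_bound({ "d" => "M0 0 L10 10 L20 0" }) # 3 real vertices => 7
p vertex_upper_bound({ "points" => "1.5,2 3,4.25" })  # 2 real vertices => 7
p vertex_upper_bound({ "d" => "M0 0h.1.1.1.1" })      # 5 real vertices => 7
p vertex_upper_bound({})                              # no geometry     => 0
```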
.populate_element(in_element, out, out_doc, namespace) ⇒ Object
Fills an already-created, already-attached output node from its input counterpart: sanitized attributes, then sanitized children. <style> collapses to its sanitized stylesheet text; CDATA becomes escaped text; disallowed children are simply never created. Reference namespacing is NOT done here — it is a separate post-build pass over the assembled tree.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 175
def populate_element(in_element, out, out_doc, namespace)
  if in_element.name == "style"
    build_style_element(in_element, out, namespace)
    return
  end

  copy_attributes(in_element, out, out_doc, namespace)

  in_element.children.each do |child|
    case child
    when Nokogiri::XML::CDATA, Nokogiri::XML::Text
      out.add_child(out_doc.create_text_node(child.content.to_s))
    when Nokogiri::XML::Element
      build_element(child, out, out_doc, namespace) if allowed_element?(child)
    end
  end
end
```
.referenced_markers(element, id_map) ⇒ Object
Collects the distinct marker subtrees a geometry element references, via the marker-* presentation attributes or their style="" twins. Only the canonical url(#fragment) form survives sanitisation, so one regex over the marker attributes and the style attribute finds every reference.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 524
def referenced_markers(element, id_map)
  sources = MARKER_ATTRIBUTES.map { |name| element[name].to_s }
  sources << element["style"].to_s
  targets = []
  sources.each do |value|
    value.scan(URL_FRAGMENT_REF) { targets << id_map[Regexp.last_match(2)] }
  end
  targets.compact.uniq
end
```
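The deduplication and dangling-reference handling are easiest to see with a concrete element. The sketch below runs referenced_markers, as listed, with a Hash standing in for the path element. Symbols stand in for the marker nodes in the id map, and the ids are arbitrary examples. A marker referenced twice is charged once per vertex, not twice. A reference to an id that does not exist resolves to nil and is discarded.

```ruby
URL_FRAGMENT_REF = /url\(\s*(['"]?)#([A-Za-z][\w.-]*)\1\s*\)/i
MARKER_ATTRIBUTES = %w[marker marker-start marker-mid marker-end].freeze

def referenced_markers(element, id_map)
  sources = MARKER_ATTRIBUTES.map { |name| element[name].to_s }
  sources << element["style"].to_s
  targets = []
  sources.each do |value|
    # String#scan sets Regexp.last_match for each match it yields.
    value.scan(URL_FRAGMENT_REF) { targets << id_map[Regexp.last_match(2)] }
  end
  targets.compact.uniq
end

id_map = { "dot" => :dot_marker, "arrow" => :arrow_marker }
path = {
  "marker" => "url(#missing)",            # dangling: dropped by compact
  "marker-start" => "url(#dot)",
  "marker-end" => "url(#dot)",            # duplicate: dropped by uniq
  "style" => "marker-mid:url('#arrow')"   # style="" twin is scanned too
}
p referenced_markers(path, id_map) # => [:dot_marker, :arrow_marker]
```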
.reject_render_expansion!(root) ⇒ Object
Bounds the render tree the document instantiates. The structural caps in SvgMetadata bound the source document, but several features replicate referenced content at render time, so the sanitized output is walked once and the instantiated render cost is accumulated against a single budget:
* a <use href="#id"> charges a deep copy of its target subtree — a chain
of doubling groups fans a few dozen source nodes into billions (the
"use bomb"), and a cyclic reference expands forever.
* a path/line/polyline/polygon that references a <marker> charges
(vertex count) x (referenced marker subtree cost): a marker is drawn
once per vertex, so a dense `d` (~200k vertices in 1 MB) times a
non-trivial marker is a linear-but-huge "draw bomb" the node/byte/
element caps cannot see.
The walk is memoised on subtree cost so it cannot itself blow up, with an active-path set so a reference cycle is caught rather than recursed into. Marker references are resolved against the same id map as <use>, so a marker that contains <use> (or another marked path) composes naturally.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 460
def reject_render_expansion!(root)
  id_map = {}
  collect_ids(root, id_map)
  subtree_render_cost(root, id_map, {}, {})
end
```
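The memoised walk with an active-path set can be shown on a toy tree. In the sketch below, `Node`, `render_cost`, `use_bomb`, the two error classes and `BUDGET` are all hypothetical. The technique mirrors subtree_render_cost: a memo keyed on object identity, an active set for cycle detection, and a <use> that charges its target's full cost. There are two simplifications. The budget is checked once per node rather than after every addition, and markers are ignored. In the "use bomb", each level holds two <use> copies of the previous level. A few dozen source nodes therefore render exponentially many, while the memo keeps the walk itself linear.

```ruby
# Hypothetical toy model: a node has an id, children, and an optional <use> target id.
Node = Struct.new(:id, :children, :use_ref)
class RenderLimitError < StandardError; end
class ReferenceCycleError < StandardError; end
BUDGET = 1_000_000 # stand-in for SvgMetadata::MAX_SVG_RENDER_UNITS

def render_cost(node, id_map, memo = {}, active = {})
  key = node.object_id
  return memo[key] if memo.key?(key) # shared subtree: reuse its cost
  raise ReferenceCycleError, "cycle at #{node.id.inspect}" if active[key]

  active[key] = true
  cost = 1
  node.children.each { |child| cost += render_cost(child, id_map, memo, active) }
  if node.use_ref && (target = id_map[node.use_ref])
    cost += render_cost(target, id_map, memo, active) # <use> deep-copies its target
  end
  raise RenderLimitError, "render cost #{cost} exceeds #{BUDGET}" if cost > BUDGET

  active.delete(key)
  memo[key] = cost
end

# Level i is a group holding two <use> elements that instantiate level i - 1.
def use_bomb(levels)
  id_map = {}
  prev = id_map["g0"] = Node.new("g0", [], nil)
  (1..levels).each do |i|
    uses = Array.new(2) { Node.new(nil, [], prev.id) }
    prev = id_map["g#{i}"] = Node.new("g#{i}", uses, nil)
  end
  [prev, id_map]
end

top, ids = use_bomb(10)
p render_cost(top, ids) # => 4093 (2**12 - 3) rendered nodes from 31 source nodes
```

Each level costs 3 + 2 × (previous level), so level n costs 2**(n + 2) - 3. At 20 levels the toy budget is exceeded, and a cycle such as a <use> pointing back at its own ancestor raises instead of recursing forever.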
.resolve_namespace(id_namespace) ⇒ Object
Maps the required id_namespace argument to a namespace token, or nil for an explicit standalone document. Forces the caller to decide, and rejects (does not coerce) malformed tokens so two distinct callers’ values can never collapse to the same namespace.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 365
def resolve_namespace(id_namespace)
  case id_namespace
  when :standalone
    nil
  when String
    return id_namespace if id_namespace.match?(NAMESPACE_PATTERN)

    raise ArgumentError,
          "id_namespace: #{id_namespace.inspect} is not a valid namespace. It must be a letter " \
          "followed by letters/digits/_/- (e.g. prefix a sha like \"u<sha>\")."
  else
    raise ArgumentError,
          "id_namespace: is required. Pass a stable, per-document String (e.g. the upload sha) " \
          "to make the output safe to inline into HTML, or :standalone if it is only ever served " \
          "as an <img>/CSS-url/file and never spliced into a page's DOM."
  end
end
```
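The sketch below is a condensed restatement of resolve_namespace, with shortened error messages, run over a few candidate tokens. Two details matter. NAMESPACE_PATTERN anchors with `\z` rather than `$`, so a trailing newline is rejected. Rejection is preferred over coercion because a coercing rule, for example one that replaced "." with "_", would map "u.1" and "u_1" to the same namespace. The candidate tokens are arbitrary examples.

```ruby
NAMESPACE_PATTERN = /\A[A-Za-z][A-Za-z0-9_-]*\z/

# Condensed restatement of resolve_namespace (error messages shortened).
def resolve_namespace(id_namespace)
  case id_namespace
  when :standalone then nil
  when String
    return id_namespace if id_namespace.match?(NAMESPACE_PATTERN)

    raise ArgumentError, "invalid id_namespace: #{id_namespace.inspect}"
  else
    raise ArgumentError, "id_namespace: is required"
  end
end

["u3f9a", "3f9a", "u3f9a\n", "u.3f9a", :standalone, nil].each do |candidate|
  result =
    begin
      resolve_namespace(candidate).inspect
    rescue ArgumentError => e
      "ArgumentError (#{e.message})"
    end
  puts "#{candidate.inspect} -> #{result}"
end
```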
.sanitize!(path, max_pixels: nil, id_namespace: NAMESPACE_REQUIRED) ⇒ Object
Sanitizes an SVG in place to the element/attribute/CSS allowlists above.
id_namespace is required and forces a deliberate choice of where the output may be used — there is no silently-wrong default:
- a stable, per-document String (e.g. the upload sha) makes the output safe to inline into an HTML DOM. Every id, every class token, and every reference to an id (href, url(#…), CSS) is prefixed with the namespace, and every <style> selector is scoped under the root. A preserved <style> therefore cannot reach the host page’s cascade, and ids cannot clobber host ids. Re-sanitising with the same namespace is a fixed point.
- :standalone produces document-safe output (no namespacing) for SVGs that are only ever served as an external `<img src>`, CSS url(…), or their own file, and never spliced into an HTML DOM.
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 99
def sanitize!(path, max_pixels: nil, id_namespace: NAMESPACE_REQUIRED)
  require "nokogiri"

  namespace = resolve_namespace(id_namespace)
  path = Pathname.new(SvgMetadata.safe_svg_path(path))

  # Byte-level encoding/DOCTYPE/PI rejection and the streaming structural caps
  # run on the raw bytes before any DOM parse, so libxml2 only ever sees input
  # those gates already accepted.
  xml = SvgMetadata.read_svg(path.to_s)
  _root_name, root_attributes = SvgMetadata.scan_svg!(xml)
  begin
    SvgMetadata.dimensions_from_attributes(root_attributes, max_pixels: max_pixels)
  rescue InvalidImageError => e
    raise unless e.message.include?("dimensions are missing")
  end

  in_doc = parse(xml)
  in_root = in_doc.root
  raise InvalidImageError, "SVG root required" unless in_root && allowed_element?(in_root)

  out_doc = Nokogiri::XML::Document.new
  # Establish the output root before building anything under it: the root
  # carries the only namespace declarations we ever emit (svg always, xlink
  # lazily), and the recursive build references out_doc.root when an
  # xlink:href survives, so it must exist first.
  out_root = out_doc.create_element(in_root.name)
  out_doc.root = out_root
  out_root.namespace = svg_namespace(out_doc, out_root)

  populate_element(in_root, out_root, out_doc, namespace)

  # Reference namespacing runs as one pass over the fully-assembled tree, not
  # during the build: an attribute's namespace only resolves once its element
  # is attached under the root that declares the prefix, so href/url rewrites
  # must happen after the whole tree exists.
  namespace_tree!(out_root, namespace) if namespace
  reject_render_expansion!(out_root)
  if namespace
    neutralize_root_overflow!(out_root)
    apply_scope_class!(out_root, namespace) if contains_style?(out_root)
  end

  atomic_write(path, serialize(out_root))
  { format: "svg", sanitized: true, filesize: File.size(path.to_s) }
end
```
.serialize(root) ⇒ Object
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 568
def serialize(root)
  options = Nokogiri::XML::Node::SaveOptions::AS_XML |
            Nokogiri::XML::Node::SaveOptions::NO_DECLARATION
  root.to_xml(save_with: options)
end
```
.set_attribute(out, out_doc, attr, value) ⇒ Object
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 250
def set_attribute(out, out_doc, attr, value)
  if href_attribute?(attr) && attr.namespace&.href == XLINK_NAMESPACE
    ensure_xlink(out_doc)
    out["xlink:href"] = value
  else
    out[attr.name.to_s] = value
  end
end
```
.subtree_render_cost(element, id_map, memo, active) ⇒ Object
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 474
def subtree_render_cost(element, id_map, memo, active)
  key = element.object_id
  cached = memo[key]
  return cached if cached
  raise InvalidImageError, "SVG reference cycle" if active[key]

  active[key] = true
  cost = 1

  element.children.each do |child|
    next unless child.is_a?(Nokogiri::XML::Element)

    cost += subtree_render_cost(child, id_map, memo, active)
    check_render_expansion!(cost)
  end

  if use_element?(element) && (target = use_target(element, id_map))
    cost += subtree_render_cost(target, id_map, memo, active)
    check_render_expansion!(cost)
  end

  cost += marker_render_cost(element, id_map, memo, active)
  check_render_expansion!(cost)

  active.delete(key)
  memo[key] = cost
end
```
.svg_namespace(out_doc, out) ⇒ Object
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 266
def svg_namespace(out_doc, out)
  root = out_doc.root
  existing = root&.namespace_definitions&.find { |n| n.prefix.nil? && n.href == SVG_NAMESPACE }
  existing || out.add_namespace_definition(nil, SVG_NAMESPACE)
end
```
.use_element?(element) ⇒ Boolean
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 551
def use_element?(element)
  element.name.to_s == "use" && (element.namespace&.href.to_s.empty? || element.namespace&.href == SVG_NAMESPACE)
end
```
.use_target(element, id_map) ⇒ Object
```ruby
# File 'lib/safe_image/svg_sanitizer.rb', line 555
def use_target(element, id_map)
  ref = nil
  element.attribute_nodes.each do |attr|
    next unless href_attribute?(attr)

    ref = attr.value.to_s
    break
  end
  return unless ref&.start_with?("#")

  id_map[ref[1..]]
end
```