Module: Rubino::Documents::Limits
- Defined in: lib/rubino/documents/limits.rb
Overview
Shared decompression-bomb / runaway-conversion guard for the in-process converters (#S4-1). Zip compression trivially defeats the 25 MB on-disk `max_file_bytes` limit. A 100 KB .docx expands to 34 MB of XML and ~1M paragraphs, which drives rubino to ~1.4 GB RSS and ~100 s of uninterruptible CPU. Only then does the output cap apply, because it runs AFTER full conversion, and it throws the result away. The fix caps BEFORE/DURING conversion.
A Budget is created once per conversion and threaded into the converter’s per-element loop. Each iteration calls #tick(elements:, bytes:), which:
- honors the cancel_token (raises Rubino::Interrupted so the turn is
interruptible mid-conversion, not just at chunk boundaries);
- enforces an element/page/row count ceiling (paragraphs, rows, pages,
slides) so a structural bomb stops after N units;
- enforces a decompressed-bytes ceiling (accumulated extracted/parsed
text) so an expand bomb stops once it has produced a few times the output
cap in text;
- enforces a wall-clock budget so any pathological slow path (a single
huge element, a quadratic gem call) still bails in bounded time.
When any ceiling is crossed, it raises CapExceeded and the conversion falls back to the shell hint. All caps are generous relative to a real document but tiny relative to a bomb.
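The per-iteration checks in the list above can be sketched as a small Ruby class. This is a hypothetical illustration, not Rubino's actual Budget. The BudgetSketch module and its error classes are stand-ins. The cancel token is assumed to respond to `cancelled?`. The `tick` keywords are treated as increments for the current element. The clock is consulted only once per TICK_INTERVAL (256) elements.

```ruby
# Hypothetical sketch only: BudgetSketch, its error classes and the
# `cancelled?` token protocol are stand-ins, not Rubino's definitions.
module BudgetSketch
  class CapExceeded < StandardError; end
  class Interrupted < StandardError; end

  TICK_INTERVAL = 256 # read the clock once per 256 elements

  class Budget
    attr_reader :max_elements, :max_decompressed_bytes, :wall_clock_seconds

    def initialize(max_elements:, max_decompressed_bytes:, wall_clock_seconds:, cancel_token: nil)
      @max_elements = max_elements
      @max_decompressed_bytes = max_decompressed_bytes
      @wall_clock_seconds = wall_clock_seconds
      @cancel_token = cancel_token
      @elements = 0
      @bytes = 0
      @last_clock_check = 0
      @started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    end

    # Assumed semantics: elements:/bytes: are the increments for this
    # iteration (e.g. one paragraph and the size of its extracted text).
    def tick(elements: 1, bytes: 0)
      raise Interrupted, "conversion cancelled" if @cancel_token&.cancelled?

      @elements += elements
      @bytes += bytes
      raise CapExceeded, "element cap (#{@max_elements}) exceeded" if @elements > @max_elements
      raise CapExceeded, "decompressed cap (#{@max_decompressed_bytes} bytes) exceeded" if @bytes > @max_decompressed_bytes

      # Consult the clock only every TICK_INTERVAL elements to keep the hot loop cheap.
      return if @elements - @last_clock_check < TICK_INTERVAL

      @last_clock_check = @elements
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - @started_at
      raise CapExceeded, "wall-clock budget (#{@wall_clock_seconds}s) exceeded" if elapsed > @wall_clock_seconds
    end
  end
end

budget = BudgetSketch::Budget.new(max_elements: 3, max_decompressed_bytes: 1_000, wall_clock_seconds: 15.0)
begin
  4.times { budget.tick(elements: 1, bytes: 10) }
rescue BudgetSketch::CapExceeded => e
  puts e.message # the fourth tick trips the element cap
end
```

Passing Float::INFINITY for a limit effectively disables that check, because no finite count compares greater than it.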
Defined Under Namespace
Classes: Budget
Constant Summary
- DEFAULT_MAX_ELEMENTS = 50_000
  Defaults. Overridable via config (attachments.policy.convert_*), so an operator can loosen them, but the secure defaults bound a bomb hard.
  - MAX_ELEMENTS: paragraphs/rows/pages/slides processed before bailing.
  - MAX_DECOMPRESSED_BYTES: accumulated extracted text bytes; ~5 MB is ~50 times the 100 KB inline budget and far below the 34 MB an expand bomb produces.
  - WALL_CLOCK_SECONDS: total conversion budget.
  - TICK_INTERVAL: how often (in elements) to read the clock, so the time check itself stays cheap in the hot loop.
- DEFAULT_MAX_DECOMPRESSED = 5_000_000
  ~5 MB of extracted text.
- DEFAULT_WALL_CLOCK_SECONDS = 15.0
- TICK_INTERVAL = 256
- ARCHIVE_CAP_MULTIPLIER = 20
  Whole-archive backstop cap (#350). It is looser than the per-glob body cap, so a legitimate document whose large media/thumbnails the converter never reads does not trigger a false positive. It is still finite, so an out-of-glob bomb cannot be unbounded. The backstop defaults to ARCHIVE_CAP_MULTIPLIER times the body cap (∞ stays ∞).
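A few lines of arithmetic can recheck the ratios quoted in these docstrings. This back-of-envelope sketch treats KB and MB as decimal units, which is an assumption. The 100 KB inline budget and the 34 MB bomb size come from the prose. The clock-read count assumes one read per TICK_INTERVAL elements.

```ruby
# Back-of-envelope check of the documented defaults. Decimal KB/MB are an
# assumption; the 100 KB and 34 MB figures come from the docstrings.
DEFAULT_MAX_ELEMENTS     = 50_000
DEFAULT_MAX_DECOMPRESSED = 5_000_000
TICK_INTERVAL            = 256
ARCHIVE_CAP_MULTIPLIER   = 20

INLINE_BUDGET_BYTES = 100_000    # "100 KB inline budget"
EXPAND_BOMB_BYTES   = 34_000_000 # "34 MB an expand bomb produces"

# How many inline budgets fit in the extracted-text cap.
text_vs_inline = DEFAULT_MAX_DECOMPRESSED / INLINE_BUDGET_BYTES   # => 50
# How far past the cap the example bomb would run.
bomb_vs_cap = EXPAND_BOMB_BYTES.fdiv(DEFAULT_MAX_DECOMPRESSED)     # => 6.8
# Whole-archive backstop with default settings (~100 MB).
archive_cap = DEFAULT_MAX_DECOMPRESSED * ARCHIVE_CAP_MULTIPLIER    # => 100000000
# Roughly how many clock reads a run to the element cap performs.
clock_reads = DEFAULT_MAX_ELEMENTS / TICK_INTERVAL                 # => 195

puts "text cap: #{text_vs_inline} x inline budget"
puts "example bomb: #{bomb_vs_cap} x text cap"
puts "archive backstop: #{archive_cap} bytes"
puts "clock reads: about #{clock_reads}"
```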
Class Method Summary
- .budget(cancel_token: nil) ⇒ Object
  Builds a Budget from config, falling back to the secure defaults.
- .flt(value, default) ⇒ Object
- .guard_zip!(path, budget, globs) ⇒ Object
  PRE-OPEN zip-bomb guard for the OOXML converters (docx/xlsx/pptx).
- .int(value, default) ⇒ Object
- .null_budget ⇒ Object
  A no-op budget for direct converter calls / tests that don’t thread a real budget.
- .policy_config ⇒ Object
- .total_archive_cap(budget) ⇒ Object
Class Method Details
.budget(cancel_token: nil) ⇒ Object
Builds a Budget from config, falling back to the secure defaults.
    # File 'lib/rubino/documents/limits.rb', line 126
    def budget(cancel_token: nil)
      cfg = policy_config
      Budget.new(
        max_elements: int(cfg["convert_max_elements"], DEFAULT_MAX_ELEMENTS),
        max_decompressed_bytes: int(cfg["convert_max_decompressed_bytes"], DEFAULT_MAX_DECOMPRESSED),
        wall_clock_seconds: flt(cfg["convert_wall_clock_seconds"], DEFAULT_WALL_CLOCK_SECONDS),
        cancel_token: cancel_token
      )
    end
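The following sketch shows how an operator override combines with the defaults. It is self-contained, so it substitutes hypothetical stand-ins. A BudgetParams struct replaces the real Budget class. A plain Hash replaces the `attachments.policy` section that `policy_config` would return. The int/flt helpers mirror the documented ones.

```ruby
# Hypothetical stand-ins: BudgetParams replaces Rubino's Budget class and a
# plain Hash replaces the attachments.policy section of the configuration.
BudgetParams = Struct.new(:max_elements, :max_decompressed_bytes,
                          :wall_clock_seconds, :cancel_token, keyword_init: true)

module LimitsSketch
  DEFAULT_MAX_ELEMENTS = 50_000
  DEFAULT_MAX_DECOMPRESSED = 5_000_000
  DEFAULT_WALL_CLOCK_SECONDS = 15.0

  module_function

  # Mirrors the documented int/flt helpers: unparsable input -> default.
  def int(value, default)
    value.nil? ? default : Integer(value)
  rescue ArgumentError, TypeError
    default
  end

  def flt(value, default)
    value.nil? ? default : Float(value)
  rescue ArgumentError, TypeError
    default
  end

  # Same shape as Limits.budget, but the policy hash is passed in explicitly.
  def budget(cfg, cancel_token: nil)
    BudgetParams.new(
      max_elements: int(cfg["convert_max_elements"], DEFAULT_MAX_ELEMENTS),
      max_decompressed_bytes: int(cfg["convert_max_decompressed_bytes"], DEFAULT_MAX_DECOMPRESSED),
      wall_clock_seconds: flt(cfg["convert_wall_clock_seconds"], DEFAULT_WALL_CLOCK_SECONDS),
      cancel_token: cancel_token
    )
  end
end

policy = {
  "convert_max_elements" => "200000",       # operator loosens the element cap
  "convert_wall_clock_seconds" => "fifteen" # unparsable, so the default applies
}
b = LimitsSketch.budget(policy)
puts b.max_elements           # 200000
puts b.max_decompressed_bytes # 5000000 (key absent, default kept)
puts b.wall_clock_seconds     # 15.0
```

Each key falls back independently, so one malformed override does not discard the others.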
.flt(value, default) ⇒ Object
    # File 'lib/rubino/documents/limits.rb', line 148
    def flt(value, default)
      value.nil? ? default : Float(value)
    rescue ArgumentError, TypeError
      default
    end
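The following copy of the helper runs against the kinds of values a config file might plausibly hold. `Kernel#Float` raises ArgumentError for unparsable strings and TypeError for objects it cannot convert, and both fall back to the default.

```ruby
# Copy of the documented helper, exercised on plausible config values.
def flt(value, default)
  value.nil? ? default : Float(value)
rescue ArgumentError, TypeError
  default
end

p flt(nil, 15.0)    # 15.0: nil short-circuits to the default
p flt("7.5", 15.0)  # 7.5: numeric string parses
p flt(30, 15.0)     # 30.0: Integer converts
p flt("", 15.0)     # 15.0: Float("") raises ArgumentError
p flt(:fast, 15.0)  # 15.0: Float(:fast) raises TypeError
p flt("-1", 15.0)   # -1.0: parses fine; no range check
```

The helper validates parseability, not range, so a negative or zero value is accepted as written.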
.guard_zip!(path, budget, globs) ⇒ Object
PRE-OPEN zip-bomb guard for the OOXML converters (docx/xlsx/pptx). The decisive cost of a zip-expand bomb is paid the instant the gem opens the file: it reads the (e.g. 34 MB) decompressed XML entry into a String and builds the full Nokogiri DOM (~1.4 GB RSS) BEFORE yielding a single paragraph – so per-element ticking alone is too late. The central directory carries each entry’s UNCOMPRESSED size, readable without decompressing, so we sum the relevant XML entries first and bail to the shell-hint before the gem inflates anything.
The sum runs WITHOUT File::FNM_PATHNAME, so `*` crosses `/`. A bomb planted at a nested, non-standard path (e.g. xl/worksheets/deep/sheet.xml, reachable via the workbook .rels Target, or ppt/slides/extra/s.xml) is therefore caught just like one at the canonical depth. The pre-fix glob used FNM_PATHNAME, so `*` stopped at `/`. A deep bomb summed to zero and slipped through to roo’s inflate (#337). Globs still scope the sum to the body parts (word/document*.xml, xl/**, ppt/**), so a large thumbnail/media blob doesn’t false-positive. Raises CapExceeded over cap.
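The `File.fnmatch?` calls below show the FNM_PATHNAME difference directly. The entry names come from the examples in the text. "ppt/slides/*.xml" is an illustrative pattern only and not necessarily one the converter passes.

```ruby
# Entry names taken from the examples in the text.
DEEP_SHEET = "xl/worksheets/deep/sheet.xml"
DEEP_SLIDE = "ppt/slides/extra/s.xml"
THUMBNAIL  = "docProps/thumbnail.jpeg"

# Current behaviour: no flags, so `*` (and `**`) also match across `/`.
p File.fnmatch?("xl/**", DEEP_SHEET)             # true
p File.fnmatch?("ppt/slides/*.xml", DEEP_SLIDE)  # true

# Pre-fix behaviour: FNM_PATHNAME stops `*` at each `/`, so deep entries miss.
p File.fnmatch?("xl/**", DEEP_SHEET, File::FNM_PATHNAME)            # false
p File.fnmatch?("ppt/slides/*.xml", DEEP_SLIDE, File::FNM_PATHNAME) # false

# The body globs still exclude non-body parts such as a thumbnail.
p %w[word/document*.xml xl/** ppt/**].any? { |g| File.fnmatch?(g, THUMBNAIL) } # false
```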
#350: scoping to the OOXML body globs alone missed formats whose read paths live OUTSIDE that prefix. The notable case is an ODS, whose `content.xml` sits at the archive ROOT (not under xl/) yet is routed through the same roo/xlsx converter. Such a bomb summed to ZERO under `xl/**` and slipped to roo’s inflate. The converter now passes the ACTUAL read-path globs per format (ODS adds `content.xml` / root `*.xml`). As a backstop we ALSO sum the WHOLE archive’s uncompressed bytes against a (looser) total cap, so a bomb at any unforeseen path is still bounded even if no body glob matches it. The two caps are independent. The per-glob sum keeps the body tight, and the whole-archive backstop guarantees no out-of-glob path is unbounded.
    # File 'lib/rubino/documents/limits.rb', line 70
    def guard_zip!(path, budget, globs)
      require "zip"
      scoped = 0
      archive = 0
      archive_cap = total_archive_cap(budget)
      Zip::File.open(path) do |zip|
        zip.each do |entry|
          size = entry.size.to_i
          archive += size
          if archive > archive_cap
            raise CapExceeded, "decompressed zip size cap (whole-archive #{archive_cap} bytes) exceeded"
          end
          # No FNM_PATHNAME: `*` matches across `/` so nested-path bombs sum.
          next unless globs.any? { |g| File.fnmatch?(g, entry.name) }

          scoped += size
          if scoped > budget.max_decompressed_bytes
            raise CapExceeded, "decompressed zip size cap (#{budget.max_decompressed_bytes} bytes) exceeded"
          end
        end
      end
    rescue CapExceeded
      raise
    rescue StandardError
      # A malformed/unreadable zip is not our concern here -- let the gem-level
      # converter handle it (it degrades to nil/shell-hint). Don't block a
      # valid file because the pre-check tripped on an exotic zip layout.
      nil
    end
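The two independent caps can be exercised without a real archive. The sketch below is hypothetical. It re-implements only the summing logic over plain `[name, uncompressed_size]` pairs, so rubyzip is not needed. It uses the default 5,000,000-byte body cap and the ×20 multiplier. The entry sizes are made up for illustration.

```ruby
# Hypothetical re-implementation of guard_zip!'s two caps over plain
# [name, uncompressed_size] pairs, so it runs without rubyzip.
class CapExceeded < StandardError; end

BODY_CAP = 5_000_000

def check_sizes(entries, globs, body_cap:, multiplier: 20)
  archive_cap = body_cap == Float::INFINITY ? body_cap : body_cap * multiplier
  scoped = 0
  archive = 0
  entries.each do |name, size|
    archive += size # whole-archive backstop counts every entry
    raise CapExceeded, "whole-archive #{archive_cap} bytes" if archive > archive_cap
    next unless globs.any? { |g| File.fnmatch?(g, name) }

    scoped += size # body cap counts only glob-matched entries
    raise CapExceeded, "body #{body_cap} bytes" if scoped > body_cap
  end
  nil
end

def verdict(entries, globs)
  check_sizes(entries, globs, body_cap: BODY_CAP)
  "ok"
rescue CapExceeded => e
  "blocked: #{e.message}"
end

ods_bomb   = [["mimetype", 39], ["content.xml", 40_000_000]] # made-up sizes
huge_media = [["media/blob.bin", 150_000_000]]

puts verdict(ods_bomb, ["xl/**"])                # ok: the pre-#350 gap
puts verdict(ods_bomb, ["xl/**", "content.xml"]) # blocked by the body cap
puts verdict(huge_media, ["xl/**"])              # blocked by the backstop
```

With these numbers, the 40 MB root-level bomb stays under the 100 MB backstop. It is caught only once the format's real read path is among the globs. That is why both caps are needed.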
.int(value, default) ⇒ Object
    # File 'lib/rubino/documents/limits.rb', line 142
    def int(value, default)
      value.nil? ? default : Integer(value)
    rescue ArgumentError, TypeError
      default
    end
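The behaviour of `Kernel#Integer` shapes what this helper accepts, and a copy of it shows a few cases worth knowing. A Float is truncated rather than rejected. A fractional string falls back to the default. A zero-padded string is read as octal.

```ruby
# Copy of the documented helper, exercised on plausible config values.
def int(value, default)
  value.nil? ? default : Integer(value)
rescue ArgumentError, TypeError
  default
end

p int("200000", 50_000) # 200000: decimal string parses
p int(12.9, 50_000)     # 12: Integer(Float) truncates, no error
p int("12.5", 50_000)   # 50000: fractional string raises ArgumentError
p int("0100", 50_000)   # 64: leading zero selects octal
p int([], 50_000)       # 50000: Integer([]) raises TypeError
```

One edge falls outside the rescue list: `Integer(Float::INFINITY)` raises FloatDomainError. A config value of YAML `.inf` would therefore propagate out of the helper rather than fall back.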
.null_budget ⇒ Object
A no-op budget for direct converter calls / tests that don’t thread a real budget. Caps are effectively unbounded but cancellation still works if a token is supplied.
    # File 'lib/rubino/documents/limits.rb', line 117
    def null_budget
      Budget.new(
        max_elements: Float::INFINITY,
        max_decompressed_bytes: Float::INFINITY,
        wall_clock_seconds: Float::INFINITY
      )
    end
.policy_config ⇒ Object
    # File 'lib/rubino/documents/limits.rb', line 136
    def policy_config
      Rubino.configuration.dig("attachments", "policy") || {}
    rescue StandardError
      {}
    end
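The helper always yields a Hash, and the fallbacks can be exercised with a variant of it. In this variant, a hypothetical `config` parameter stands in for `Rubino.configuration`. `Hash#dig` returns nil for missing keys or a nil intermediate, and `|| {}` catches that. A non-diggable intermediate raises TypeError, and a nil config raises NoMethodError. The rescue catches both.

```ruby
# Stand-in: the real helper reads Rubino.configuration; here the config is
# a parameter (hypothetical) so each fallback path can be exercised.
def policy_config(config)
  config.dig("attachments", "policy") || {}
rescue StandardError
  {}
end

nested = { "attachments" => { "policy" => { "convert_max_elements" => 1 } } }
p policy_config(nested)                      # the policy hash itself
p policy_config({})                          # {}: keys missing, dig returns nil
p policy_config({ "attachments" => nil })    # {}: dig stops at nil
p policy_config({ "attachments" => "oops" }) # {}: String has no #dig, TypeError rescued
p policy_config(nil)                         # {}: nil has no #dig, NoMethodError rescued
```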
.total_archive_cap(budget) ⇒ Object
    # File 'lib/rubino/documents/limits.rb', line 107
    def total_archive_cap(budget)
      body = budget.max_decompressed_bytes
      return body if body == Float::INFINITY

      body * ARCHIVE_CAP_MULTIPLIER
    end
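The following copy of the helper runs against a hypothetical BudgetCaps struct that exposes only the reader it uses. It covers the default budget and the unbounded null budget. In Ruby, multiplying Float::INFINITY by 20 also yields Infinity. The early return therefore appears to make the "∞ stays ∞" rule explicit rather than change the result.

```ruby
# Minimal stand-in exposing only the reader total_archive_cap uses (hypothetical).
BudgetCaps = Struct.new(:max_decompressed_bytes)
ARCHIVE_CAP_MULTIPLIER = 20

def total_archive_cap(budget)
  body = budget.max_decompressed_bytes
  return body if body == Float::INFINITY # unbounded body cap => unbounded backstop

  body * ARCHIVE_CAP_MULTIPLIER
end

p total_archive_cap(BudgetCaps.new(5_000_000))       # 100000000 (default body cap)
p total_archive_cap(BudgetCaps.new(Float::INFINITY)) # Infinity (null_budget case)
p Float::INFINITY * ARCHIVE_CAP_MULTIPLIER           # Infinity as well
```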