Class: Rubino::Documents::Converters::Xlsx

Inherits:
Object
Defined in:
lib/rubino/documents/converters/xlsx.rb

Overview

XLSX (and ODS/legacy XLS where roo supports them) -> Markdown. Each sheet becomes a `## SheetName` heading followed by a GFM table emitted by the shared Table emitter. The `roo` gem (MIT) is OPTIONAL: #available? reports false when it can’t be required, so the registry never offers this converter on an install without roo – the caller then falls back to the shell-extraction hint.
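
As a rough illustration of the output shape described above, the sketch below renders one sheet as a heading plus a GFM table. The helpers `gfm_table` and `sheet_section` are hypothetical stand-ins: the shared Table emitter is not shown on this page, so its escaping rules and its handling of ragged rows are assumptions.

```ruby
# Hypothetical stand-in for the shared Table emitter (the real one is not shown here).
# Assumes the first row is the header and all rows have the same width.
def gfm_table(rows)
  return nil if rows.empty?

  header, *body = rows
  # Escape pipes and flatten newlines so a cell cannot break its table row.
  cell = ->(value) { value.to_s.gsub("|") { "\\|" }.gsub(/\r?\n/, " ") }
  line = ->(row) { "| #{row.map(&cell).join(' | ')} |" }

  separator = "| #{header.map { '---' }.join(' | ')} |"
  [line.call(header), separator, *body.map(&line)].join("\n")
end

# One sheet becomes "## SheetName", a blank line, then the table.
def sheet_section(name, rows)
  table = gfm_table(rows)
  return nil if table.nil? # assumption: an empty sheet contributes nothing

  "## #{name}\n\n#{table}"
end

puts sheet_section("Totals", [["Region", "Units"], ["North", 12], ["South|East", 7]])
```

Returning `nil` for an empty sheet is a guess that fits the `.compact` call in #convert, not a documented behaviour.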

Constant Summary

MIMES =
%w[
  application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  application/vnd.oasis.opendocument.spreadsheet
  application/vnd.ms-excel
].freeze
EXTS =
%w[.xlsx .ods .xls].freeze
ODS_GLOBS =

OpenDocument (ODS) body globs: roo reads `content.xml` at the archive ROOT (and may touch other root *.xml like styles.xml/meta.xml) – NOT under xl/. Scoping the pre-open guard to xl/** alone let an ODS bomb sum to zero and slip to inflate (#350); we add the root XML read paths.

["content.xml", "*.xml"].freeze
XLSX_GLOBS =

OOXML (xlsx) body parts live under xl/ (the glob matches across `/`; no FNM_PATHNAME).

["xl/**"].freeze
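
The effect of omitting FNM_PATHNAME can be checked with `File.fnmatch` from Ruby's core library. This is a minimal sketch that assumes the guard matches entry names fnmatch-style; `guard_zip!` itself is not shown on this page, and `body_entry?` is a hypothetical helper.

```ruby
# Assumption: entry names are matched with File.fnmatch and no flags.
# body_entry? is a hypothetical helper, not part of Rubino.
def body_entry?(name, globs)
  globs.any? { |glob| File.fnmatch(glob, name) }
end

xlsx_globs = ["xl/**"]
ods_globs  = ["content.xml", "*.xml"]

body_entry?("xl/worksheets/sheet1.xml", xlsx_globs) # => true, the wildcard crosses `/`
body_entry?("content.xml", xlsx_globs)              # => false, an ODS body is invisible to xl/**
body_entry?("content.xml", ods_globs)               # => true
body_entry?("styles.xml", ods_globs)                # => true

# With FNM_PATHNAME a single `*` stops at `/`, so a nested part would be missed.
File.fnmatch("xl/*", "xl/worksheets/sheet1.xml", File::FNM_PATHNAME) # => false
```

Under the same assumption, `*.xml` without FNM_PATHNAME also matches XML entries below the archive root, since `*` is free to span `/`.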

Instance Method Details

#accepts?(mime, path) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/rubino/documents/converters/xlsx.rb', line 27

def accepts?(mime, path)
  return true if MIMES.include?(mime.to_s)

  EXTS.include?(File.extname(path.to_s).downcase)
end
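
To make the matching order concrete, here is a standalone mirror of the method above that runs without Rubino loaded. The name `accepts_spreadsheet?` is hypothetical; the lists are copied from MIMES and EXTS.

```ruby
# Standalone copy of the accepts? logic, for illustration only.
def accepts_spreadsheet?(mime, path)
  mimes = %w[
    application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
    application/vnd.oasis.opendocument.spreadsheet
    application/vnd.ms-excel
  ]
  exts = %w[.xlsx .ods .xls]

  # A known MIME type wins regardless of the file name.
  return true if mimes.include?(mime.to_s)

  # Otherwise fall back to a case-insensitive extension check.
  exts.include?(File.extname(path.to_s).downcase)
end

accepts_spreadsheet?("application/vnd.ms-excel", "upload.bin") # => true
accepts_spreadsheet?(nil, "Report.XLSX")                       # => true
accepts_spreadsheet?("text/csv", "data.csv")                   # => false
accepts_spreadsheet?(nil, nil)                                 # => false
```

The `to_s` calls mean a `nil` MIME type or path is rejected rather than raising.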

#available? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/rubino/documents/converters/xlsx.rb', line 20

def available?
  require "roo"
  true
rescue LoadError
  false
end
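
The same optional-dependency probe can be written generically. In this sketch `optional_library?` is a hypothetical helper and the second library name is deliberately made up so the `LoadError` branch is exercised.

```ruby
# Hypothetical generalisation of available?: probe a library by requiring it.
def optional_library?(name)
  require name
  true
rescue LoadError
  # The library is not installed; report that instead of crashing.
  false
end

optional_library?("set")                             # => true, ships with Ruby
optional_library?("no_such_library_for_this_sketch") # => false
```

`require` is idempotent, so probing repeatedly costs little once the library has loaded.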

#convert(path, budget = Limits.null_budget) ⇒ Object



# File 'lib/rubino/documents/converters/xlsx.rb', line 41

def convert(path, budget = Limits.null_budget)
  require "roo"
  # PRE-OPEN guard: a 400k-row spreadsheet expands its sheet/content XML
  # far past the on-disk cap. Sum the uncompressed sizes of the body
  # entries (and any nested/non-standard part a bomb could hide behind a
  # .rels Target) from the central directory and bail before roo inflates
  # them. Globs match across `/` (guard_zip! omits FNM_PATHNAME) so a deep
  # bomb is summed too (#337); the glob set is chosen per format so an ODS
  # bomb rooted at content.xml is also caught (#350).
  Limits.guard_zip!(path, budget, zip_globs(path))
  book = Roo::Spreadsheet.open(path)
  parts = book.sheets.map { |name| sheet_markdown(book, name, budget) }.compact
  parts.join("\n\n")
ensure
  book&.close if defined?(book) && book.respond_to?(:close)
end
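
The pre-open guard idea in the comment above can be sketched without a zip library by working on a list of central-directory entries. Everything here is hypothetical: `inflated_bytes`, `guard_entries!`, the hash layout and the sizes are illustrations, and the real `Limits.guard_zip!` and budget object may work differently.

```ruby
# Sum the declared uncompressed sizes of entries matching any glob.
# Assumes fnmatch-style matching without FNM_PATHNAME.
def inflated_bytes(entries, globs)
  entries
    .select { |entry| globs.any? { |glob| File.fnmatch(glob, entry[:name]) } }
    .sum { |entry| entry[:uncompressed_size] }
end

# Raise before anything is inflated when the total exceeds the limit.
def guard_entries!(entries, globs, max_bytes)
  total = inflated_bytes(entries, globs)
  raise "archive would inflate to #{total} bytes (limit #{max_bytes})" if total > max_bytes

  total
end

# Made-up central directory of an ODS file with an oversized content.xml.
ods_entries = [
  { name: "content.xml", uncompressed_size: 900_000_000 },
  { name: "styles.xml",  uncompressed_size: 12_000 },
  { name: "mimetype",    uncompressed_size: 46 }
]

inflated_bytes(ods_entries, ["xl/**"])                # => 0, the bomb goes unnoticed
inflated_bytes(ods_entries, ["content.xml", "*.xml"]) # => 900012000
```

With the xl/-only glob set the sum is zero and the check passes, which is the gap described for #350; the root-level globs make the oversized entry count.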