Module: Rbxl::SharedStringsLoader Private

Defined in:
lib/rbxl/shared_strings_loader.rb

Overview

This module is part of a private API. You should avoid using this module if possible, as it may be removed or be changed in the future.

Streams xl/sharedStrings.xml out of an opened .xlsx ZIP and decodes the table to an immutable Array<String>.

Both the read-only and edit modes need the same view of the shared strings table (SST). The logic is identical in each mode: phonetic guides are skipped, <r>/<t> runs inside an <si> are concatenated, and the count and byte caps configured on Rbxl are enforced. It therefore lives here as a single source of truth rather than being inlined twice.
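
The decoding rules above can be illustrated without any XML library. The following is a minimal sketch, not the module's implementation: it assumes a hypothetical stream of `[kind, payload]` events in place of a real streaming reader, and the method name `decode_shared_strings` is invented for illustration.

```ruby
# Hypothetical event stream standing in for what a streaming XML reader
# yields. Neither the event shape nor the method name comes from Rbxl.
def decode_shared_strings(events)
  strings = []
  fragments = []
  in_si = false
  in_phonetic = false
  collecting = false
  buffer = +""

  events.each do |kind, payload|
    case kind
    when :start
      case payload
      when "si"
        in_si = true
        fragments = []
      when "rPh"
        in_phonetic = true if in_si
      when "t"
        # Text inside a phonetic guide is never collected.
        collecting = in_si && !in_phonetic
        buffer = +"" if collecting
      end
    when :text
      buffer << payload if collecting
    when :end
      case payload
      when "t"
        fragments << buffer if collecting
        collecting = false
      when "rPh"
        in_phonetic = false
      when "si"
        # One <si> becomes one table entry, however many runs it had.
        strings << fragments.join.freeze
        in_si = false
        in_phonetic = false
      end
    end
  end

  strings.freeze
end

events = [
  [:start, "si"],
  [:start, "r"], [:start, "t"], [:text, "Tokyo"], [:end, "t"], [:end, "r"],
  [:start, "r"], [:start, "t"], [:text, " Tower"], [:end, "t"], [:end, "r"],
  [:start, "rPh"], [:start, "t"], [:text, "toukyou"], [:end, "t"], [:end, "rPh"],
  [:end, "si"],
  [:start, "si"], [:start, "t"], [:text, "Plain"], [:end, "t"], [:end, "si"]
]

table = decode_shared_strings(events)
```

Here the two runs of the first `<si>` collapse into a single entry and the phonetic text is dropped, so the table keeps one string per `<si>` in document order.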

Class Method Summary

Class Method Details

.load(zip) ⇒ Array<String>

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Returns frozen, index-aligned shared strings table.

Parameters:

  • zip (Zip::File)

    the open package

Returns:

  • (Array<String>)

    frozen, index-aligned shared strings table
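
As a rough illustration of what "frozen, index-aligned" means for callers: a cell of type `s` stores an index rather than text, and that index is looked up directly in the table. The values below are hypothetical and only mimic the shape of the documented return value.

```ruby
# Hypothetical table, shaped like the value .load is documented to return.
table = ["Name".freeze, "Qty".freeze].freeze

# A shared-string cell carries an index into the table, not the text itself.
cell_index = 1
text = table[cell_index]

# Because the array is frozen, attempts to grow or change it raise.
mutation_error =
  begin
    table << "Extra"
    nil
  rescue FrozenError => e
    e.class
  end
```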

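Raises:

  • (SharedStringsTooLargeError)

    if the entry's declared uncompressed size or the running total of decoded bytes exceeds Rbxl.max_shared_string_bytes, or if the number of strings exceeds Rbxl.max_shared_strings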

# File 'lib/rbxl/shared_strings_loader.rb', line 19

def load(zip)
  entry = zip.find_entry("xl/sharedStrings.xml")
  return [].freeze unless entry

  max_count = Rbxl.max_shared_strings
  max_bytes = Rbxl.max_shared_string_bytes

  # Reject zip-bomb style entries up front using the ZIP directory's
  # declared uncompressed size, before allocating any decompression buffer.
  if max_bytes && entry.size && entry.size > max_bytes
    raise SharedStringsTooLargeError,
          "shared strings uncompressed size #{entry.size} exceeds limit #{max_bytes}"
  end

  strings = []
  total_bytes = 0
  io = entry.get_input_stream
  reader = Nokogiri::XML::Reader(io)

  in_si = false
  in_run = false
  in_phonetic = false
  collecting_text = false
  buffer = +""
  current_fragments = []

  reader.each do |node|
    case node.node_type
    when Nokogiri::XML::Reader::TYPE_ELEMENT
      case node.local_name
      when "si"
        in_si = true
        current_fragments = []
      when "r"
        in_run = true if in_si
      when "rPh"
        in_phonetic = true if in_si
      when "t"
        next unless in_si && !in_phonetic

        collecting_text = !in_run || node.depth.positive?
        buffer.clear if collecting_text
      end
    when Nokogiri::XML::Reader::TYPE_TEXT, Nokogiri::XML::Reader::TYPE_CDATA
      buffer << node.value if collecting_text
    when Nokogiri::XML::Reader::TYPE_END_ELEMENT
      case node.local_name
      when "t"
        if collecting_text
          current_fragments << buffer.dup
          collecting_text = false
        end
      when "r"
        in_run = false
      when "rPh"
        in_phonetic = false
      when "si"
        value = current_fragments.join.freeze
        total_bytes += value.bytesize
        if max_bytes && total_bytes > max_bytes
          raise SharedStringsTooLargeError,
                "shared strings total size exceeds limit #{max_bytes}"
        end
        strings << value
        if max_count && strings.size > max_count
          raise SharedStringsTooLargeError,
                "shared strings count exceeds limit #{max_count}"
        end
        in_si = false
        in_run = false
        in_phonetic = false
        collecting_text = false
      end
    end
  end

  strings.freeze
ensure
  io&.close
end
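
The cap enforcement in the listing has two stages: an up-front check against the size the ZIP directory declares, then running checks while entries are decoded. The sketch below isolates that pattern under stated assumptions. `TableTooLargeError`, `collect_with_caps`, and `failure_message` are hypothetical names, and the input is a plain array of strings rather than a ZIP entry.

```ruby
# Hypothetical stand-in for the error raised in the listing; not Rbxl's class.
class TableTooLargeError < StandardError; end

# declared_size mimics the ZIP directory's uncompressed size (may be nil).
# A nil cap disables the corresponding check, mirroring the guards above.
def collect_with_caps(values, declared_size:, max_count:, max_bytes:)
  # Stage 1: reject on the declared size before doing any work.
  if max_bytes && declared_size && declared_size > max_bytes
    raise TableTooLargeError, "declared size #{declared_size} exceeds limit #{max_bytes}"
  end

  total_bytes = 0
  out = []
  values.each do |value|
    # Stage 2: track what was actually decoded, entry by entry.
    total_bytes += value.bytesize
    if max_bytes && total_bytes > max_bytes
      raise TableTooLargeError, "total size exceeds limit #{max_bytes}"
    end

    out << value
    if max_count && out.size > max_count
      raise TableTooLargeError, "count exceeds limit #{max_count}"
    end
  end
  out.freeze
end

# Small helper for the demonstration: returns the error message, or nil.
def failure_message
  yield
  nil
rescue TableTooLargeError => e
  e.message
end

ok = collect_with_caps(%w[a bb], declared_size: 3, max_count: 2, max_bytes: 10)
declared = failure_message { collect_with_caps([], declared_size: 99, max_count: nil, max_bytes: 10) }
running = failure_message { collect_with_caps(%w[aaaa bbbb cccc], declared_size: nil, max_count: nil, max_bytes: 10) }
counted = failure_message { collect_with_caps(%w[a b c], declared_size: nil, max_count: 2, max_bytes: nil) }
```

A plausible reason for keeping both stages is that a declared size is only a claim made by the archive, so the running total acts as a backstop when the declared value is missing or understated.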