Class: Rbxl::ReadOnlyWorksheet

Inherits:
Object
Defined in:
lib/rbxl/read_only_worksheet.rb

Overview

Row-by-row worksheet reader for a single sheet of a read-only workbook.

Instances are produced by Rbxl::ReadOnlyWorkbook#sheet and must not be constructed directly; their lifecycle is bound to the workbook’s ZIP handle. Rows can be consumed as Row objects or as plain value arrays depending on the iteration options.

Iteration modes

# Default: yield Rbxl::Row with cell wrappers.
sheet.each_row { |row| row.values }

# Fast path: yield plain Array<Object> of decoded values.
sheet.each_row(values_only: true) { |values| ... }

# Pad missing cells in sparse rows up to max_column.
sheet.each_row(pad_cells: true) { |row| ... }

# Replicate anchor values across merged ranges.
sheet.each_row(expand_merged: true) { |row| ... }

Iteration without a block returns an Enumerator.

Dimensions

The worksheet dimension (the A1:C10-style range) is read from the sheet’s <dimension> element when present. When the element is absent, #calculate_dimension with force: true scans the sheet for actual cell coordinates. A stored dimension is returned as-is even with force: true, so call #reset_dimensions first to recompute it.

Constant Summary

ELEMENT_NODE =
Nokogiri::XML::Reader::TYPE_ELEMENT
TEXT_NODE =
Nokogiri::XML::Reader::TYPE_TEXT
CDATA_NODE =
Nokogiri::XML::Reader::TYPE_CDATA
END_ELEMENT_NODE =
Nokogiri::XML::Reader::TYPE_END_ELEMENT

Constructor Details

#initialize(zip:, entry_path:, workbook_path:, shared_strings:, name:, streaming: false, date_styles: nil, date_1904: false) ⇒ ReadOnlyWorksheet

Returns a new instance of ReadOnlyWorksheet.

Parameters:

  • zip (Zip::File)

    open archive shared with the workbook

  • entry_path (String)

    ZIP entry path for this sheet’s XML

  • workbook_path (String)

    filesystem path the workbook was opened from

  • shared_strings (Array<String>)

    pre-decoded shared strings table

  • name (String)

    visible sheet name

  • streaming (Boolean) (defaults to: false)

    when the native extension is loaded, feed worksheet XML to the parser in chunks instead of reading the entry into memory first

  • date_styles (Array<Boolean>, nil) (defaults to: nil)

    true at a style id when the id’s numFmt is a date/time format. When provided, numeric cells with a matching style are returned as Date or Time instead of Float, and the native fast path is bypassed.

  • date_1904 (Boolean) (defaults to: false)

    whether the workbook uses Excel’s 1904 date system instead of the default 1900 date system
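
The interaction between date_styles and date_1904 can be sketched as below. This is a minimal illustration, not rbxl's implementation: serial_to_date and decode_numeric are hypothetical helpers, and the sketch returns only Date values and ignores the time-of-day fraction.

```ruby
require "date"

# Hypothetical helpers, not part of rbxl.
# The 1899-12-30 epoch absorbs Excel's fictitious 1900-02-29, so it is
# only exact for serials from 61 (1900-03-01) onward.
EPOCH_1900 = Date.new(1899, 12, 30)
EPOCH_1904 = Date.new(1904, 1, 1)

# Converts an Excel serial number to a Date under either date system.
def serial_to_date(serial, date_1904: false)
  epoch = date_1904 ? EPOCH_1904 : EPOCH_1900
  epoch + serial.floor # drop the time-of-day fraction
end

# date_styles[style_id] is true when that style uses a date/time format.
# With date_styles nil, or a non-date style, the numeric value is left alone.
def decode_numeric(value, style_id, date_styles, date_1904: false)
  return value unless date_styles && date_styles[style_id]

  serial_to_date(value, date_1904: date_1904)
end

date_styles = [false, true] # assumed table: only style 1 is a date format
decode_numeric(25_569.0, 1, date_styles)                  # 1970-01-01 (1900 system)
decode_numeric(24_107.0, 1, date_styles, date_1904: true) # 1970-01-01 (1904 system)
decode_numeric(25_569.0, 0, date_styles)                  # unchanged Float
```

The same calendar day differs by 1462 in serial value between the two systems, which is why the workbook-level flag has to reach every worksheet.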



# File 'lib/rbxl/read_only_worksheet.rb', line 65

def initialize(zip:, entry_path:, workbook_path:, shared_strings:, name:, streaming: false, date_styles: nil, date_1904: false)
  @zip = zip
  @entry_path = entry_path
  @workbook_path = workbook_path
  @shared_strings = shared_strings
  @name = name
  @streaming = streaming
  @date_styles = date_styles
  @date_1904 = date_1904
  @disable_native = !date_styles.nil?
  @dimensions = extract_dimensions
  @merge_ranges_by_row = nil
  @merge_anchor_values = {}
end

Instance Attribute Details

#dimensions ⇒ Hash{Symbol => Object}? (readonly)

Parsed dimension metadata, nil when the sheet has no <dimension> element and no scan has been forced. When present the hash has keys :ref, :max_col, and :max_row.

Returns:

  • (Hash{Symbol => Object}, nil)


# File 'lib/rbxl/read_only_worksheet.rb', line 49

def dimensions
  @dimensions
end
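
To make the shape of the hash concrete, here is a rough sketch of deriving :ref, :max_col, and :max_row from a dimension reference. The helpers column_index and parse_dimension are hypothetical and are not rbxl's private extract_dimensions; they only assume uppercase A1-style references.

```ruby
# Hypothetical helpers, not rbxl's own implementation.

# Converts column letters to a 1-based index: "A" => 1, "Z" => 26, "AA" => 27.
def column_index(letters)
  letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - "A".ord + 1) }
end

# Builds a hash shaped like the documented one from a dimension reference.
# Returns nil when the bottom-right corner cannot be parsed.
def parse_dimension(ref)
  last = ref.split(":").last # a single-cell ref such as "B2" has no colon
  match = /\A([A-Z]+)(\d+)\z/.match(last)
  return nil unless match

  { ref: ref, max_col: column_index(match[1]), max_row: match[2].to_i }
end

parse_dimension("A1:C10") # => { ref: "A1:C10", max_col: 3, max_row: 10 }
parse_dimension("B2")     # => { ref: "B2", max_col: 2, max_row: 2 }
```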

#name ⇒ String (readonly)

Returns visible sheet name.

Returns:

  • (String)

    visible sheet name



# File 'lib/rbxl/read_only_worksheet.rb', line 42

def name
  @name
end

Instance Method Details

#calculate_dimension(force: false) ⇒ String

Returns the worksheet dimension reference (e.g. "A1:C10").

When the sheet lacks a <dimension> element the default is to raise UnsizedWorksheetError. Passing force: true scans the sheet for the actual cell bounds instead; a sheet with no cells at all falls back to "A1:A1".

Parameters:

  • force (Boolean) (defaults to: false)

    scan the sheet when no stored dimension exists

Returns:

  • (String)

    Excel-style range reference

Raises:

  • (UnsizedWorksheetError)

    when the sheet has no stored dimension and force is false


# File 'lib/rbxl/read_only_worksheet.rb', line 153

def calculate_dimension(force: false)
  if dimensions
    return dimensions[:ref]
  end

  raise UnsizedWorksheetError, "worksheet is unsized, use force: true" unless force

  @dimensions = scan_dimensions
  dimensions ? dimensions[:ref] : "A1:A1"
end
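
The forced scan boils down to finding the bounding box of every cell coordinate seen. The sketch below shows one way to compute it. The private scan_dimensions is not shown in this page and may work differently; bounds_ref and its helpers are hypothetical, and anchoring the range at the smallest occupied cell is an assumption.

```ruby
# Hypothetical helpers; not rbxl's scan_dimensions.

# "A" => 1, "Z" => 26, "AA" => 27
def column_index(letters)
  letters.each_char.reduce(0) { |acc, ch| acc * 26 + (ch.ord - "A".ord + 1) }
end

# Inverse of column_index: 1 => "A", 26 => "Z", 27 => "AA"
def column_letters(index)
  letters = +""
  while index > 0
    index, rem = (index - 1).divmod(26)
    letters.prepend(("A".ord + rem).chr)
  end
  letters
end

# Computes the bounding range of a list of coordinates such as "B2".
# An empty list falls back to "A1:A1", matching the documented behavior.
def bounds_ref(coordinates)
  return "A1:A1" if coordinates.empty?

  cols = []
  rows = []
  coordinates.each do |coord|
    letters, digits = coord.match(/\A([A-Z]+)(\d+)\z/).captures
    cols << column_index(letters)
    rows << digits.to_i
  end
  "#{column_letters(cols.min)}#{rows.min}:#{column_letters(cols.max)}#{rows.max}"
end

bounds_ref(%w[B2 D5 C3]) # => "B2:D5"
bounds_ref([])           # => "A1:A1"
```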

#each_row(pad_cells: false, values_only: false, expand_merged: false) {|row| ... } ⇒ Enumerator, void

Iterates rows in worksheet order.

With values_only and neither pad_cells nor expand_merged set, iteration takes a tighter path that yields frozen Array<Object> rows and skips allocating cell wrappers.

Parameters:

  • pad_cells (Boolean) (defaults to: false)

    pad sparse rows with EmptyCell (or [coordinate, nil] pairs in values_only mode) up to the worksheet’s max_column

  • values_only (Boolean) (defaults to: false)

    yield plain value arrays instead of Rbxl::Row instances

  • expand_merged (Boolean) (defaults to: false)

    propagate the anchor value of every merged range across the range’s cells

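Yield Parameters:

  • row (Rbxl::Row, Array<Object>)

    an Rbxl::Row by default, or a plain value array in values_only mode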

Returns:

  • (Enumerator, void)

    enumerator when called without a block



# File 'lib/rbxl/read_only_worksheet.rb', line 95

def each_row(pad_cells: false, values_only: false, expand_merged: false, &block)
  return enum_for(:each_row, pad_cells: pad_cells, values_only: values_only, expand_merged: expand_merged) unless block

  if values_only && !pad_cells && !expand_merged
    each_row_values_only(&block)
  else
    each_row_full(pad_cells: pad_cells, values_only: values_only, expand_merged: expand_merged, &block)
  end
end
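
What expand_merged does to the values can be pictured with a small stand-alone sketch. It is illustrative only: the expand_merged method below is a hypothetical helper working on plain arrays, and the [first_row, first_col, last_row, last_col] merge format with 0-based indexes is an assumption, not rbxl's internal representation.

```ruby
# Hypothetical sketch of anchor replication over plain value arrays.
# Each merge is [first_row, first_col, last_row, last_col], 0-based, inclusive.
def expand_merged(rows, merges)
  grid = rows.map(&:dup) # leave the caller's rows untouched
  merges.each do |r1, c1, r2, c2|
    anchor = grid[r1][c1] # the top-left cell holds the merged range's value
    (r1..r2).each do |r|
      (c1..c2).each { |c| grid[r][c] = anchor }
    end
  end
  grid
end

rows = [["Region", nil, "Total"], ["North", "South", 10]]
expand_merged(rows, [[0, 0, 0, 1]])
# => [["Region", "Region", "Total"], ["North", "South", 10]]
```

Because the anchor has to be looked up per range, this mode cannot use the tighter values_only path described above.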

#max_column ⇒ Integer?

Returns rightmost column index (1-based) from the worksheet dimension, or nil when dimensions are unknown.

Returns:

  • (Integer, nil)

    rightmost column index (1-based) from the worksheet dimension, or nil when dimensions are unknown



# File 'lib/rbxl/read_only_worksheet.rb', line 120

def max_column
  return nil unless dimensions

  dimensions[:max_col]
end

#max_row ⇒ Integer?

Returns bottom row index (1-based) from the worksheet dimension, or nil when dimensions are unknown.

Returns:

  • (Integer, nil)

    bottom row index (1-based) from the worksheet dimension, or nil when dimensions are unknown



# File 'lib/rbxl/read_only_worksheet.rb', line 128

def max_row
  return nil unless dimensions

  dimensions[:max_row]
end

#reset_dimensions ⇒ nil

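Clears cached dimension metadata so that the next call to #calculate_dimension with force: true rescans the sheet. Until then, #dimensions, #max_column, and #max_row return nil, and #calculate_dimension without force raises UnsizedWorksheetError.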

Returns:

  • (nil)


# File 'lib/rbxl/read_only_worksheet.rb', line 138

def reset_dimensions
  @dimensions = nil
end

#rows(values_only: false, pad_cells: false, expand_merged: false) ⇒ Enumerator

Enumerator-returning alias for #each_row that reads more naturally when the call site chains further enumerable operations.

sheet.rows(values_only: true).take(10)

Parameters:

  • values_only (Boolean) (defaults to: false)

    see #each_row

  • pad_cells (Boolean) (defaults to: false)

    see #each_row

  • expand_merged (Boolean) (defaults to: false)

    see #each_row

Returns:

  • (Enumerator)


# File 'lib/rbxl/read_only_worksheet.rb', line 114

def rows(values_only: false, pad_cells: false, expand_merged: false)
  each_row(values_only: values_only, pad_cells: pad_cells, expand_merged: expand_merged)
end
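
The block-or-Enumerator pattern behind #rows and #each_row can be tried without a workbook. FakeSheet below is a hypothetical stand-in that mirrors only that pattern; it stores rows as hashes and is not how rbxl represents them.

```ruby
# Hypothetical stand-in class, mirroring only the enum_for pattern.
class FakeSheet
  def initialize(rows)
    @rows = rows
  end

  # Yields each row, or returns an Enumerator when no block is given.
  def each_row(values_only: false)
    return enum_for(:each_row, values_only: values_only) unless block_given?

    @rows.each { |row| yield(values_only ? row.values : row) }
  end

  # Blockless wrapper that always returns the Enumerator.
  def rows(values_only: false)
    each_row(values_only: values_only)
  end
end

sheet = FakeSheet.new([{ "A" => 1, "B" => 2 }, { "A" => 3, "B" => 4 }, { "A" => 5, "B" => 6 }])

first_two = sheet.rows(values_only: true).take(2)             # [[1, 2], [3, 4]]
sums = sheet.rows(values_only: true).lazy.map(&:sum).first(2) # [3, 7]
```

Because take and lazy stop pulling from the Enumerator once they have enough rows, the rest of the sheet is never visited.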