Class: Optimize::Codec::ObjectTable
- Inherits: Object
  - Object
  - Optimize::Codec::ObjectTable
- Defined in: lib/optimize/codec/object_table.rb
Overview
The global object table holds every literal Ruby object referenced by iseq instructions. Instructions carry integer indices into this table; index 0 is always nil.
Binary layout (from research/cruby/ibf-format.md §3):
Object data region — per-object payloads at various offsets within the binary
Object offset array — global_object_list_size × uint32_t, at global_object_list_offset
Each object in the data region begins with a 1-byte header:
bits [4:0] — T_ type constant
bit [5] — special_const (1 = encoded as raw VALUE small_value)
bit [6] — frozen
bit [7] — internal
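The header layout above can be sketched as a small bit-unpacking routine. This is a minimal illustration, not code from the library: `parse_object_header` is a hypothetical helper name, and only the bit positions are taken from the list above.

```ruby
# Hypothetical helper (not part of ObjectTable): split the 1-byte object
# header into the four fields described above.
def parse_object_header(byte)
  {
    type:          byte & 0x1f,             # bits [4:0], T_ type constant
    special_const: ((byte >> 5) & 1) == 1,  # bit [5]
    frozen:        ((byte >> 6) & 1) == 1,  # bit [6]
    internal:      ((byte >> 7) & 1) == 1   # bit [7]
  }
end

# 0x45 = 0b0100_0101: type 5 with the frozen bit set.
hdr = parse_object_header(0x45)
```

With the constants listed on this page, type 5 corresponds to T_STRING, so 0x45 would describe a frozen, non-special string entry.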
Ruby 4.0.2 special VALUE constants (empirically verified):
RUBY_Qfalse = 0
RUBY_Qtrue = 20
RUBY_Qnil = 4
RUBY_Qundef = 36 (= QNIL | 0x20; used as sentinel for unset keyword defaults)
Fixnum n: VALUE = (n << 1) | 1
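As a worked example of the Fixnum rule, the sketch below converts between an Integer and its raw VALUE. The helper names `fixnum_to_value` and `value_to_fixnum` are hypothetical, and the 64-bit VALUE width is an assumption carried over from the rest of this page.

```ruby
# Special VALUE constants as quoted above.
QFALSE = 0
QNIL   = 4
QTRUE  = 20
QUNDEF = 36

# Hypothetical helper: Integer -> raw VALUE, masked to an unsigned 64-bit word.
def fixnum_to_value(n)
  ((n << 1) | 1) & 0xFFFF_FFFF_FFFF_FFFF
end

# Hypothetical helper: raw VALUE -> Integer. The VALUE is reinterpreted as
# signed before shifting so that negative fixnums survive the round trip.
def value_to_fixnum(value)
  signed = value >= (1 << 63) ? value - (1 << 64) : value
  signed >> 1
end

positive = fixnum_to_value(3)    # 7
negative = fixnum_to_value(-6)   # 0xFFFF_FFFF_FFFF_FFF5
```

The low bit is what distinguishes a Fixnum from the other special constants: 0, 4, 20 and 36 are all even, so they can never be mistaken for `(n << 1) | 1`.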
Constant Summary
T_ type constants from ruby.h:
- T_CLASS = 2
- T_FLOAT = 4
- T_STRING = 5
- T_REGEXP = 6
- T_ARRAY = 7
- T_HASH = 8
- T_STRUCT = 9
- T_BIGNUM = 10
- T_DATA = 12
- T_COMPLEX = 14
- T_RATIONAL = 15
- T_SYMBOL = 20
Special VALUE constants for Ruby 4.0.2 (empirically verified via object_id):
- QFALSE = 0
- QTRUE = 20
- QNIL = 4
- QUNDEF = 36
  RUBY_Qundef = QNIL | 0x20; stored in the object table as a sentinel for keyword parameters whose default must be computed at runtime.
Instance Attribute Summary
-
#objects ⇒ Array<Object>
readonly
Decoded Ruby objects in on-disk index order.
Class Method Summary
-
.decode(binary_or_reader, header) ⇒ ObjectTable
Decode the object table.
-
.decode_array(reader, obj_offsets) ⇒ Object
T_ARRAY: len (small_value), then len object-table indices (small_values).
-
.decode_bignum(reader) ⇒ Object
T_BIGNUM: ssize_t slen (aligned), then |slen| BDIGIT words.
- .decode_class(reader) ⇒ Object
-
.decode_complex_rational(reader, obj_offsets, kind) ⇒ Object
T_COMPLEX / T_RATIONAL: struct { long a; long b } (aligned).
-
.decode_data(reader) ⇒ Object
T_DATA: only encoding objects are supported in IBF.
-
.decode_float(reader) ⇒ Object
T_FLOAT: 8-byte IEEE 754 double, aligned to 8 within the binary buffer.
-
.decode_from_binary(binary, header) ⇒ ObjectTable
Decode from a full YARB binary string.
-
.decode_hash(reader, obj_offsets) ⇒ Object
T_HASH: len (small_value, the number of key-value pairs), then 2*len object-table indices.
-
.decode_one_object(reader, obj_offsets) ⇒ Object
Decode one object beginning at the current reader position.
-
.decode_regexp(reader, obj_offsets) ⇒ Object
T_REGEXP: option byte + small_value (object-table index of source string).
-
.decode_special_const(reader) ⇒ Object
Decode a special_const object: the body is a single small_value holding the raw Ruby VALUE.
-
.decode_string(reader) ⇒ Object
T_STRING: encindex (small_value), len (small_value), raw bytes.
-
.decode_struct(reader, obj_offsets) ⇒ Object
T_STRUCT (Range only): struct { long class_index; long len; long beg; long end; int excl } (aligned, written raw).
-
.decode_symbol(reader) ⇒ Object
T_SYMBOL: same wire format as T_STRING (encindex, len, raw bytes); the bytes are converted with to_sym and the encoding index is read but not applied.
-
.encoding_for_index(encindex) ⇒ Object
Map a CRuby encoding index to a Ruby Encoding.
Instance Method Summary
-
#appended_count ⇒ Integer
Number of newly-interned objects pending append on the next encode.
-
#encode(writer, iseq_list_delta: 0) ⇒ Integer?
Write the object table bytes to writer.
-
#index_for(value) ⇒ Integer?
Find an existing index in the object table whose stored value equals value. Entries are compared by both == and class, so values that are == across classes (for example Integer 1 and Float 1.0) do not collide.
-
#initialize(objects, raw_object_region, obj_list_size: 0, obj_list_offset_in_region: 0) ⇒ ObjectTable
constructor
A new instance of ObjectTable.
-
#intern(value) ⇒ Integer
Return the index of value in the table, appending it if absent.
-
#resolve(index) ⇒ Object
Resolve an object-table index to the stored value (Symbol for ID refs, frozen String for STRING entries, etc.).
Constructor Details
#initialize(objects, raw_object_region, obj_list_size: 0, obj_list_offset_in_region: 0) ⇒ ObjectTable
Returns a new instance of ObjectTable.
    # File 'lib/optimize/codec/object_table.rb', line 56
    def initialize(objects, raw_object_region, obj_list_size: 0, obj_list_offset_in_region: 0)
      @objects = objects
      # Raw bytes covering the object data region + object offset array only.
      # Starts at (iseq_list_offset + iseq_list_size * 4) and runs to end of binary.
      @raw_object_region = raw_object_region
      # Number of objects (= size of the offset array in u32 entries).
      @obj_list_size = obj_list_size
      # Byte offset of the object offset array WITHIN @raw_object_region.
      # Used to patch absolute offsets when the iseq region has grown/shrunk.
      @obj_list_offset_in_region = obj_list_offset_in_region
    end
Instance Attribute Details
#objects ⇒ Array<Object> (readonly)
Returns decoded Ruby objects in on-disk index order.
    # File 'lib/optimize/codec/object_table.rb', line 54
    def objects
      @objects
    end
Class Method Details
.decode(binary_or_reader, header) ⇒ ObjectTable
Decode the object table.
Accepts either:
decode(binary_string, header) — preferred; full YARB binary as String
decode(reader, header) — legacy; BinaryReader positioned after header
    # File 'lib/optimize/codec/object_table.rb', line 77
    def self.decode(binary_or_reader, header)
      if binary_or_reader.is_a?(BinaryReader)
        # Legacy path: reconstruct binary from reader, then delegate.
        reader = binary_or_reader
        binary = reader.peek_bytes(0, reader.bytesize)
        result = decode_from_binary(binary, header)
        # Advance reader to end of object offset array (legacy contract).
        reader.seek(header.global_object_list_offset + header.global_object_list_size * 4)
        result
      else
        decode_from_binary(binary_or_reader, header)
      end
    end
.decode_array(reader, obj_offsets) ⇒ Object
T_ARRAY: len (small_value), then len object-table indices (small_values)
    # File 'lib/optimize/codec/object_table.rb', line 340
    def self.decode_array(reader, obj_offsets)
      len = reader.read_small_value
      len.times.map { reader.read_small_value }
      # Returns the array of indices for now; full resolution needs two-pass decode
    end
.decode_bignum(reader) ⇒ Object
T_BIGNUM: ssize_t slen (aligned), then |slen| BDIGIT words.
    # File 'lib/optimize/codec/object_table.rb', line 385
    def self.decode_bignum(reader)
      wordsize = 8
      aligned = (reader.pos + wordsize - 1) & ~(wordsize - 1)
      reader.seek(aligned)
      slen = reader.read_bytes(wordsize).unpack1("q<")
      ndigits = slen.abs
      # BDIGIT size is platform-dependent; assume 64-bit here.
      digits = ndigits.times.map { reader.read_bytes(8).unpack1("Q<") }
      value = digits.each_with_index.sum { |d, i| d << (64 * i) }
      slen < 0 ? -value : value
    end
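The digit assembly in decode_bignum can be tried in isolation. The sketch below is a standalone illustration that assumes 64-bit little-endian BDIGIT words, as the source does; `assemble_bignum` is a hypothetical name, and the payload is built by hand rather than taken from a real YARB binary.

```ruby
# Hypothetical standalone helper: rebuild an Integer from a signed digit
# count and little-endian 64-bit digits (least significant digit first).
def assemble_bignum(slen, digits)
  magnitude = digits.each_with_index.sum { |d, i| d << (64 * i) }
  slen < 0 ? -magnitude : magnitude
end

# Hand-built payload: slen = -2 (two digits, negative), digits = [1, 1].
payload = [-2, 1, 1].pack("q<Q<Q<")
slen, *digits = payload.unpack("q<Q<2")

bignum = assemble_bignum(slen, digits)   # -(2**64 + 1)
```

The sign lives entirely in slen; the digits themselves always encode the magnitude.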
.decode_class(reader) ⇒ Object
    # File 'lib/optimize/codec/object_table.rb', line 366
    def self.decode_class(reader)
      cindex = reader.read_small_value
      CLASS_NAMES[cindex] or raise Codec::UnsupportedObjectKind, "unknown class index #{cindex}"
    end
.decode_complex_rational(reader, obj_offsets, kind) ⇒ Object
T_COMPLEX / T_RATIONAL: struct { long a; long b } (aligned)
    # File 'lib/optimize/codec/object_table.rb', line 413
    def self.decode_complex_rational(reader, obj_offsets, kind)
      wordsize = 8
      aligned = (reader.pos + wordsize - 1) & ~(wordsize - 1)
      reader.seek(aligned)
      a = reader.read_bytes(wordsize).unpack1("q<")
      b = reader.read_bytes(wordsize).unpack1("q<")
      # Return index pairs for now
      kind == :complex ? Complex(a, b) : Rational(a, b)
    end
.decode_data(reader) ⇒ Object
    # File 'lib/optimize/codec/object_table.rb', line 373
    def self.decode_data(reader)
      wordsize = 8 # we only support 64-bit hosts
      aligned = (reader.pos + wordsize - 1) & ~(wordsize - 1)
      reader.seek(aligned)
      kind = reader.read_bytes(wordsize).unpack1("q<")
      len = reader.read_bytes(wordsize).unpack1("q<")
      raise Codec::UnsupportedObjectKind, "T_DATA kind #{kind} (expected 0)" unless kind == 0
      name = reader.read_bytes(len).delete("\x00")
      Encoding.find(name)
    end
.decode_float(reader) ⇒ Object
T_FLOAT: 8-byte IEEE 754 double, aligned to 8 within the binary buffer. The reader is positioned at the byte immediately after the 1-byte header.
    # File 'lib/optimize/codec/object_table.rb', line 318
    def self.decode_float(reader)
      # Align to 8-byte boundary (absolute position in the binary)
      aligned = (reader.pos + 7) & ~7
      reader.seek(aligned)
      reader.read_bytes(8).unpack1("d")
    end
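The alignment arithmetic used here, and in the other aligned decoders, rounds a position up to the next multiple of a power of two. A minimal sketch, with `align_up` as a hypothetical helper name:

```ruby
# Hypothetical helper: round pos up to the next multiple of alignment.
# The bit trick is only valid when alignment is a power of two.
def align_up(pos, alignment)
  (pos + alignment - 1) & ~(alignment - 1)
end

after_header = align_up(13, 8)   # 16
already_ok   = align_up(16, 8)   # 16, an aligned position is unchanged

# Round-trip one double through its 8-byte little-endian form.
bytes = [1.5].pack("E")
value = bytes.unpack1("E")
```

Note that the listing above unpacks with "d", which uses the host's native byte order, whereas "E" always reads little-endian. The two agree on little-endian hosts.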
.decode_from_binary(binary, header) ⇒ ObjectTable
Decode from a full YARB binary string.
    # File 'lib/optimize/codec/object_table.rb', line 96
    def self.decode_from_binary(binary, header)
      obj_list_size = header.global_object_list_size
      obj_list_offset = header.global_object_list_offset

      # The object data region starts immediately after the iseq offset array.
      obj_region_start = header.iseq_list_offset + header.iseq_list_size * 4
      obj_region_len = binary.bytesize - obj_region_start
      raw_object_region = binary.byteslice(obj_region_start, obj_region_len)

      # Byte offset of the object offset array within the raw_object_region.
      obj_list_offset_in_region = obj_list_offset - obj_region_start

      # Build a temporary reader to decode object bodies.
      reader = BinaryReader.new(binary)

      # Read the object offset array
      reader.seek(obj_list_offset)
      obj_offsets = obj_list_size.times.map { reader.read_u32 }

      # Decode each object by seeking to its offset
      objects = obj_offsets.map do |off|
        reader.seek(off)
        decode_one_object(reader, obj_offsets)
      end

      new(objects, raw_object_region,
          obj_list_size: obj_list_size,
          obj_list_offset_in_region: obj_list_offset_in_region)
    end
.decode_hash(reader, obj_offsets) ⇒ Object
T_HASH: len (small_value, the number of key-value pairs), then 2*len object-table indices
    # File 'lib/optimize/codec/object_table.rb', line 347
    def self.decode_hash(reader, obj_offsets)
      len = reader.read_small_value
      result = {}
      len.times do
        k = reader.read_small_value
        v = reader.read_small_value
        result[k] = v
      end
      result
    end
.decode_one_object(reader, obj_offsets) ⇒ Object
Decode one object beginning at the current reader position. obj_offsets is the full array of absolute offsets (used by reference types to resolve object-table indices to Ruby objects).
    # File 'lib/optimize/codec/object_table.rb', line 241
    def self.decode_one_object(reader, obj_offsets)
      hdr = reader.read_u8
      type = hdr & 0x1f
      special_const = (hdr >> 5) & 1

      if special_const == 1
        decode_special_const(reader)
      else
        case type
        when T_STRING then decode_string(reader)
        when T_SYMBOL then decode_symbol(reader)
        when T_FLOAT then decode_float(reader)
        when T_REGEXP then decode_regexp(reader, obj_offsets)
        when T_ARRAY then decode_array(reader, obj_offsets)
        when T_HASH then decode_hash(reader, obj_offsets)
        when T_CLASS then decode_class(reader)
        when T_DATA then decode_data(reader)
        when T_BIGNUM then decode_bignum(reader)
        when T_STRUCT then decode_struct(reader, obj_offsets)
        when T_COMPLEX then decode_complex_rational(reader, obj_offsets, :complex)
        when T_RATIONAL then decode_complex_rational(reader, obj_offsets, :rational)
        else
          raise Codec::UnsupportedObjectKind,
            "unsupported IBF object type #{type} (header byte 0x#{hdr.to_s(16)})"
        end
      end
    end
.decode_regexp(reader, obj_offsets) ⇒ Object
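T_REGEXP: option byte + small_value (object-table index of source string). The current implementation does not load the source string; it returns a placeholder Regexp whose source embeds that index.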
    # File 'lib/optimize/codec/object_table.rb', line 326
    def self.decode_regexp(reader, obj_offsets)
      option = reader.read_u8
      srcstr_idx = reader.read_small_value
      # We return a Regexp if we can; the source string is at obj_offsets[srcstr_idx].
      # To avoid recursive seeks here we return a placeholder and let the caller sort it.
      # For the round-trip test we just need the object to be present (any Regexp).
      # Full decoding would require re-loading the source object.
      # For now: return a best-effort Regexp using stored info.
      Regexp.new("__ibf_srcidx_#{srcstr_idx}__", option)
    rescue
      raise Codec::UnsupportedObjectKind, "failed to decode T_REGEXP option=#{option}"
    end
.decode_special_const(reader) ⇒ Object
Decode a special_const object: the body is a single small_value holding the raw Ruby VALUE.
    # File 'lib/optimize/codec/object_table.rb', line 270
    def self.decode_special_const(reader)
      value = reader.read_small_value
      if value & 1 == 1
        # Fixnum: VALUE = (n << 1) | 1. The VALUE was stored as an unsigned
        # 64-bit integer; interpret it as signed before shifting so that
        # negative fixnums (e.g. -6 → VALUE 0xFFFF_FFFF_FFFF_FFF5) decode
        # correctly. Values with bit 63 set are negative CRuby fixnums.
        signed = value >= (1 << 63) ? value - (1 << 64) : value
        signed >> 1
      elsif value == QNIL
        nil
      elsif value == QTRUE
        true
      elsif value == QFALSE
        false
      elsif value == QUNDEF
        # RUBY_Qundef: internal sentinel stored for keyword parameters whose default
        # is computed at runtime (e.g. `def f(k: expr)`). Return a sentinel
        # Symbol; the value is never used for re-encoding (raw bytes are preserved).
        :__qundef__
      else
        # Could be a flonum or other special const; we surface the raw VALUE
        # (future tasks can interpret flonum bits if needed)
        raise Codec::UnsupportedObjectKind,
          "unknown special_const VALUE #{value} (0x#{value.to_s(16)})"
      end
    end
.decode_string(reader) ⇒ Object
T_STRING: encindex (small_value), len (small_value), raw bytes
    # File 'lib/optimize/codec/object_table.rb', line 299
    def self.decode_string(reader)
      encindex = reader.read_small_value
      len = reader.read_small_value
      bytes = reader.read_bytes(len)
      enc = encoding_for_index(encindex)
      bytes.force_encoding(enc)
      bytes.encode(Encoding::UTF_8) rescue bytes.dup
    end
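The two-step handling in decode_string (tag the raw bytes with their encoding, then attempt a UTF-8 conversion and fall back on failure) can be sketched on its own. `relabel` is a hypothetical helper name, and the inputs are hand-written byte strings rather than data read from a binary.

```ruby
# Hypothetical helper mirroring the tag-then-convert pattern above.
def relabel(bytes, enc)
  tagged = bytes.dup.force_encoding(enc)   # reinterpret the bytes, no transcoding
  begin
    tagged.encode(Encoding::UTF_8)         # transcode when a conversion exists
  rescue EncodingError
    tagged                                 # otherwise keep the tagged bytes
  end
end

utf8   = relabel("caf\xC3\xA9".b, Encoding::UTF_8)    # 4 characters, 5 bytes
binary = relabel("\xFF".b, Encoding::ASCII_8BIT)      # stays binary
```

The fallback matters for binary strings: a high byte tagged as ASCII-8BIT has no UTF-8 mapping, so encode raises and the tagged bytes are kept unchanged.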
.decode_struct(reader, obj_offsets) ⇒ Object
T_STRUCT (Range only): struct { long class_index; long len; long beg; long end; int excl } (aligned, written raw)
    # File 'lib/optimize/codec/object_table.rb', line 399
    def self.decode_struct(reader, obj_offsets)
      wordsize = 8
      aligned = (reader.pos + wordsize - 1) & ~(wordsize - 1)
      reader.seek(aligned)
      _class_index = reader.read_bytes(wordsize).unpack1("q<")
      _len = reader.read_bytes(wordsize).unpack1("q<")
      beg_idx = reader.read_bytes(wordsize).unpack1("q<")
      end_idx = reader.read_bytes(wordsize).unpack1("q<")
      excl = reader.read_bytes(4).unpack1("l<")
      # Return indices for now; full resolution needs two-pass decode
      (beg_idx..end_idx) # approximate
    end
.decode_symbol(reader) ⇒ Object
T_SYMBOL: same wire format as T_STRING (encindex, len, raw bytes). The bytes are converted with to_sym; the encoding index is read but not applied.
    # File 'lib/optimize/codec/object_table.rb', line 309
    def self.decode_symbol(reader)
      encindex = reader.read_small_value
      len = reader.read_small_value
      bytes = reader.read_bytes(len)
      bytes.to_sym
    end
.encoding_for_index(encindex) ⇒ Object
Map a CRuby encoding index to a Ruby Encoding. RUBY_ENCINDEX_ASCII_8BIT = 0, RUBY_ENCINDEX_UTF_8 = 1, RUBY_ENCINDEX_US_ASCII = 2
    # File 'lib/optimize/codec/object_table.rb', line 425
    def self.encoding_for_index(encindex)
      case encindex
      when 0 then Encoding::ASCII_8BIT
      when 1 then Encoding::UTF_8
      when 2 then Encoding::US_ASCII
      else Encoding::UTF_8 # best-effort fallback for unknown encoding indices
      end
    end
Instance Method Details
#appended_count ⇒ Integer
Number of newly-interned objects pending append on the next encode.
    # File 'lib/optimize/codec/object_table.rb', line 232
    def appended_count
      (@appended || []).size
    end
#encode(writer, iseq_list_delta: 0) ⇒ Integer?
Write the object table bytes to writer. Emits the object data region followed by the offset array. When iseq_list_delta is non-zero (the iseq region has grown or shrunk), the object offset array, which stores absolute positions in the binary, is patched so that each original entry shifts by that delta. The object payload bytes before the offset array are always written verbatim. Returns nil on the byte-identical fast path; otherwise returns the absolute offset at which the rewritten offset array begins.
    # File 'lib/optimize/codec/object_table.rb', line 139
    def encode(writer, iseq_list_delta: 0)
      no_appends = @appended.nil? || @appended.empty?
      if (iseq_list_delta == 0 || @obj_list_size == 0) && no_appends
        # Fast path: byte-identical.
        writer.write_bytes(@raw_object_region)
        return nil
      else
        # Write object payload bytes verbatim (everything before the offset array).
        writer.write_bytes(@raw_object_region.byteslice(0, @obj_list_offset_in_region))

        # Append payloads for any newly interned objects, recording their absolute
        # positions so we can write them into the offset array below.
        appended_offsets = []
        (@appended || []).each do |value|
          appended_offsets << writer.pos
          if value.is_a?(String)
            write_string(writer, value)
          else
            write_special_const(writer, value)
          end
        end

        # The object offset array must be 4-byte aligned (ibf_dump_align uses
        # sizeof(ibf_offset_t) = 4). Pad after appended payloads if needed.
        writer.align_to(4)

        # Capture the absolute position where the offset array begins in the new buffer.
        fresh_obj_list_offset = writer.pos

        # Patch each u32 in the original offset array by adding iseq_list_delta.
        @obj_list_size.times do |i|
          orig = @raw_object_region.byteslice(@obj_list_offset_in_region + i * 4, 4).unpack1("V")
          writer.write_bytes([orig + iseq_list_delta].pack("V"))
        end

        # Write the new offset array entries. These are already absolute
        # positions in the NEW buffer (writer.pos accounts for the fresh
        # iseq region size), so no iseq_list_delta patch is needed.
        appended_offsets.each do |abs_pos|
          writer.write_bytes([abs_pos].pack("V"))
        end

        # Any trailing bytes after the offset array (if any).
        trail_start = @obj_list_offset_in_region + @obj_list_size * 4
        trail = @raw_object_region.byteslice(trail_start, @raw_object_region.bytesize - trail_start)
        writer.write_bytes(trail) if trail && !trail.empty?
        return fresh_obj_list_offset
      end
    end
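The offset-patching step can be illustrated without a writer object. The sketch below shifts every little-endian u32 in an offset array by a delta; `shift_offsets` is a hypothetical name and the offsets are made-up values, not taken from a real binary.

```ruby
# Hypothetical helper: add delta to each little-endian u32 in the array.
# "V" is the pack directive for a 32-bit unsigned little-endian integer.
def shift_offsets(offset_array_bytes, delta)
  offset_array_bytes.unpack("V*").map { |off| off + delta }.pack("V*")
end

original = [100, 132, 160].pack("V*")   # three made-up absolute offsets
patched  = shift_offsets(original, 16)  # iseq region grew by 16 bytes

shifted = patched.unpack("V*")          # [116, 148, 176]
```

Entries for newly appended objects are not shifted in the listing above, because their positions are recorded from the writer after the iseq region has already changed size.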
#index_for(value) ⇒ Integer?
Find an existing index in the object table whose stored value equals value. Entries are compared by both == and class, so values that are == across classes (for example Integer 1 and Float 1.0) do not collide.
    # File 'lib/optimize/codec/object_table.rb', line 198
    def index_for(value)
      @objects.index { |o| o == value && o.class == value.class }
    end
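The effect of the class check is easiest to see with numeric values, since `1 == 1.0` is true in Ruby. The sketch below uses a plain Array standing in for the table; the contents are illustrative only.

```ruby
# A plain Array standing in for the decoded object list (illustrative values).
objects = [nil, 1, 1.0, true, "a"]

# Comparing with == alone finds the Integer entry when looking up a Float.
loose = objects.index { |o| o == 1.0 }

# Adding the class check, as index_for does, finds the Float entry instead.
strict = objects.index { |o| o == 1.0 && o.class == 1.0.class }
```

Here loose is 1 and strict is 2, so without the class check a Float lookup would resolve to the wrong table entry.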
#intern(value) ⇒ Integer
Return the index of value in the table, appending it if absent. Special-const values (Integer fixnum, true, false, nil) and Strings are supported; a String is duplicated and frozen before it is stored, and any other value raises ArgumentError. The new payload is emitted at encode time; the offset array is regrown there.
    # File 'lib/optimize/codec/object_table.rb', line 206
    def intern(value)
      existing = index_for(value)
      return existing if existing

      if value.is_a?(String)
        stored = value.dup.freeze
        new_idx = @objects.size
        @objects << stored
        @appended ||= []
        @appended << stored
        return new_idx
      end

      unless special_const?(value)
        raise ArgumentError,
          "ObjectTable#intern only supports special-const values (Integer/true/false/nil) or String, got #{value.inspect}"
      end

      new_idx = @objects.size
      @objects << value
      @appended ||= []
      @appended << value
      new_idx
    end
#resolve(index) ⇒ Object
Resolve an object-table index to the stored value (Symbol for ID refs, frozen String for STRING entries, etc.). Used by IR::CallData#mid_symbol.
    # File 'lib/optimize/codec/object_table.rb', line 191
    def resolve(index)
      @objects[index]
    end