Class: DaruLite::DataFrame
- Extended by:
- Gem::Deprecate
- Includes:
- Aggregatable, Calculatable, Convertible, Duplicatable, Fetchable, Filterable, IOAble, Indexable, Iterable, Joinable, Missable, Pivotable, Queryable, Setable, Sortable, Maths::Arithmetic::DataFrame, Maths::Statistics::DataFrame
- Defined in:
- lib/daru_lite/dataframe.rb,
lib/daru_lite/data_frame/setable.rb,
lib/daru_lite/data_frame/i_o_able.rb,
lib/daru_lite/data_frame/iterable.rb,
lib/daru_lite/data_frame/joinable.rb,
lib/daru_lite/data_frame/missable.rb,
lib/daru_lite/data_frame/sortable.rb,
lib/daru_lite/data_frame/fetchable.rb,
lib/daru_lite/data_frame/indexable.rb,
lib/daru_lite/data_frame/pivotable.rb,
lib/daru_lite/data_frame/queryable.rb,
lib/daru_lite/extensions/which_dsl.rb,
lib/daru_lite/data_frame/filterable.rb,
lib/daru_lite/data_frame/convertible.rb,
lib/daru_lite/data_frame/aggregatable.rb,
lib/daru_lite/data_frame/calculatable.rb,
lib/daru_lite/data_frame/duplicatable.rb
Defined Under Namespace
Modules: Aggregatable, Calculatable, Convertible, Duplicatable, Fetchable, Filterable, IOAble, Indexable, Iterable, Joinable, Missable, Pivotable, Queryable, Setable, Sortable
Constant Summary
- AXES = %i[row vector].freeze
Instance Attribute Summary
- #data ⇒ Object (readonly)
  TOREMOVE.
- #index ⇒ Object (readonly)
  The index of the rows of the DataFrame.
- #name ⇒ Object (readonly)
  The name of the DataFrame.
- #size ⇒ Object (readonly)
  The number of rows present in the DataFrame.
- #vectors ⇒ Object (readonly)
  The vectors (columns) index of the DataFrame.
Class Method Summary
- .crosstab_by_assignation(rows, columns, values) ⇒ Object
  Generates a new dataset using three vectors: rows, columns, and values.
- .rows(source, opts = {}) ⇒ Object
  Create a DataFrame by specifying rows as an Array of Arrays or an Array of DaruLite::Vector objects.
Instance Method Summary
- #==(other) ⇒ Object
- #add_level_to_vectors(top_level_label) ⇒ Object
  Converts the vectors to a DaruLite::MultiIndex.
- #add_vectors_by_split(name, join = '-', sep = DaruLite::SPLIT_TOKEN) ⇒ Object
- #add_vectors_by_split_recode(nm, join = '-', sep = DaruLite::SPLIT_TOKEN) ⇒ Object
- #bootstrap(n = nil) ⇒ DaruLite::DataFrame
  Creates a DataFrame of n rows picked at random from this DataFrame.
- #delete_at_position(position) ⇒ Object
  Delete a row based on its position. More robust than #delete_row when working with a CategoricalIndex or when the Index includes integers.
- #delete_row(index) ⇒ Object
  Delete a row.
- #delete_vector(vector) ⇒ Object
  Delete a vector.
- #delete_vectors(*vectors) ⇒ Object
  Deletes a list of vectors.
- #initialize(source = {}, opts = {}) ⇒ DataFrame (constructor)
  A DataFrame is essentially an Array of Vector objects.
- #inspect(spacing = DaruLite.spacing, threshold = DaruLite.max_rows) ⇒ Object
  Pretty print in a table format for the command line (irb/pry/iruby).
- #interact_code(vector_names, full) ⇒ Object
- #method_missing(name, *args) ⇒ Object
- #ncols ⇒ Object
  The number of vectors.
- #nest(*tree_keys, &block) ⇒ Object
  Return a nested Hash keyed by the values of the given vectors, with each leaf holding an Array of Hashes of the remaining values.
- #nrows ⇒ Object
  The number of rows.
- #rename(new_name) ⇒ Object (also: #name=)
  Rename the DataFrame.
- #rename_vectors(name_map) ⇒ Object
  Renames the vectors.
- #rename_vectors!(name_map) ⇒ Object
  Renames the vectors and returns itself.
- #respond_to_missing?(name, include_private = false) ⇒ Boolean
- #row ⇒ Object
  Access a row or set/create a row.
- #shape ⇒ Object
  Return the number of rows and columns of the DataFrame in an Array.
- #to_category(*names) ⇒ DaruLite::DataFrame
  Converts the specified non-category vectors to category vectors.
- #transpose ⇒ Object
  Transpose a DataFrame, transposing the elements and swapping the row and column indexes.
- #update ⇒ Object
  Updates the metadata (i.e. missing value positions) of the vectors after assignment, deletion, etc. are complete.
- #which ⇒ Object
Methods included from Maths::Statistics::DataFrame
#acf, #correlation, #count, #covariance, #cumsum, #describe, #ema, #max, #mean, #median, #min, #mode, #percent_change, #product, #range, #rolling_count, #rolling_max, #rolling_mean, #rolling_median, #rolling_min, #rolling_std, #rolling_variance, #standardize, #std, #sum, #variance_sample
Methods included from Maths::Arithmetic::DataFrame
#%, #*, #**, #+, #-, #/, #exp, #round, #sqrt
Methods included from Queryable
#all?, #any?, #has_vector?, #include_values?
Methods included from Sortable
#order=, #rotate_vectors, #sort, #sort!
Methods included from Setable
#[]=, #add_row, #add_vector, #insert_vector, #set_at, #set_row_at
Methods included from Pivotable
Methods included from Missable
#has_missing_data?, #missing_values_rows, #rolling_fillna, #rolling_fillna!
Methods included from Joinable
#concat, #join, #merge, #one_to_many, #union
Methods included from IOAble
#_dump, included, #save, #write_csv, #write_excel, #write_sql
Methods included from Iterable
#apply_method, #collect, #collect_matrix, #collect_row_with_index, #collect_rows, #collect_vector_with_index, #collect_vectors, #each, #each_index, #each_row, #each_row_with_index, #each_vector, #each_vector_with_index, #map, #map!, #map_rows, #map_rows!, #map_rows_with_index, #map_vectors, #map_vectors!, #map_vectors_with_index, #recode, #recode_rows, #recode_vectors, #replace_values, #verify
Methods included from Indexable
#index=, #reindex, #reindex_vectors, #reset_index, #set_index, #vectors=
Methods included from Filterable
#filter, #filter_rows, #filter_vector, #filter_vectors, #keep_row_if, #keep_vector_if, #reject_values, #uniq, #where
Methods included from Fetchable
#[], #access_row_tuples_by_indexs, #at, #get_sub_dataframe, #get_vector_anyways, #head, #numeric_vector_names, #numeric_vectors, #only_numerics, #row_at, #split_by_category, #tail
Methods included from Duplicatable
#clone, #clone_only_valid, #clone_structure, #dup, #dup_only_valid
Methods included from Convertible
#create_sql, #to_a, #to_df, #to_h, #to_html, #to_html_tbody, #to_html_thead, #to_json, #to_matrix, #to_s
Methods included from Calculatable
#compute, #summary, #vector_by_calculation, #vector_count_characters, #vector_mean, #vector_sum
Methods included from Aggregatable
#aggregate, #group_by, #group_by_and_aggregate
Constructor Details
#initialize(source = {}, opts = {}) ⇒ DataFrame
A DataFrame is essentially an Array of Vector objects. Rows are addressed through the index Index object and columns through the vectors Index object.
Arguments
- source - Source from which the DataFrame is to be initialized. Can be a Hash of names and vectors (Array or DaruLite::Vector), an Array of Arrays, or an Array of DaruLite::Vectors.
Options
:order - An Array/DaruLite::Index/DaruLite::MultiIndex containing the order in which Vectors should appear in the DataFrame.
:index - An Array/DaruLite::Index/DaruLite::MultiIndex containing the order in which rows of the DataFrame will be named.
:name - A name for the DataFrame.
:clone - Specify as true or false. When set to false and Vector objects are passed as the source, the Vector objects will not be duplicated when creating the DataFrame. Has no effect if an Array is passed as the source, or if the passed DaruLite::Vectors have different indexes. Defaults to true.
Usage
df = DaruLite::DataFrame.new
# =>
# #<DaruLite::DataFrame(0x0)>
# Creates an empty DataFrame with no rows or columns.

df = DaruLite::DataFrame.new({}, order: [:a, :b])
# =>
# #<DaruLite::DataFrame(0x2)>
#    a   b
# Creates a DataFrame with no rows and columns :a and :b

df = DaruLite::DataFrame.new({a: [1,2,3,4], b: [6,7,8,9]}, order: [:b, :a],
  index: [:a, :b, :c, :d], name: :spider_man)
# =>
# #<DaruLite::DataFrame: spider_man (4x2)>
#        b   a
#    a   6   1
#    b   7   2
#    c   8   3
#    d   9   4

df = DaruLite::DataFrame.new([[1,2,3,4],[6,7,8,9]], name: :bat_man)
# =>
# #<DaruLite::DataFrame: bat_man (4x2)>
#        0   1
#    0   1   6
#    1   2   7
#    2   3   8
#    3   4   9

# DataFrame with a named Index
df = DaruLite::DataFrame.new({a: [1,2,3,4], b: [6,7,8,9]}, order: [:b, :a],
  index: DaruLite::Index.new([:a, :b, :c, :d], name: 'idx_name'),
  name: :spider_man)
# =>
# #<DaruLite::DataFrame: spider_man (4x2)>
#  idx_name   b   a
#         a   6   1
#         b   7   2
#         c   8   3
#         d   9   4

idx = DaruLite::Index.new [100, 99, 101, 1, 2], name: "s1"
# => #<DaruLite::Index(5): s1 {100, 99, 101, 1, 2}>
df = DaruLite::DataFrame.new({b: [11,12,13,14,15], a: [1,2,3,4,5],
  c: [11,22,33,44,55]},
  order: [:a, :b, :c],
  index: idx)
# =>
# #<DaruLite::DataFrame(5x3)>
#   s1   a   b   c
#  100   1  11  11
#   99   2  12  22
#  101   3  13  33
#    1   4  14  44
#    2   5  15  55
# File 'lib/daru_lite/dataframe.rb', line 237

def initialize(source = {}, opts = {})
  vectors = opts[:order]
  index = opts[:index] # FIXME: just keyword arges after Ruby 2.1
  @data = []
  @name = opts[:name]

  case source
  when [], {}
    create_empty_vectors(vectors, index)
  when Array
    initialize_from_array source, vectors, index, opts
  when Hash
    initialize_from_hash source, vectors, index, opts
  end

  set_size
  validate
  update
end
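The branching in the constructor can be illustrated without the library. The following is a minimal plain-Ruby sketch, not DaruLite code: `source_kind` is a hypothetical helper, and its `:unsupported` branch is an addition for illustration (the constructor above has no `else` branch). It shows that `when [], {}` compares against the empty literals, so an empty source is handled before the general Array and Hash cases.

```ruby
# Hypothetical helper that mirrors only the `case source` dispatch.
def source_kind(source)
  case source
  when [], {} then :empty        # matches an empty Array or an empty Hash
  when Array  then :array        # e.g. an Array of Arrays
  when Hash   then :hash         # e.g. names mapped to Arrays
  else :unsupported              # illustrative only, not in the original
  end
end

source_kind({})                 # => :empty
source_kind([])                 # => :empty
source_kind([[1, 2], [3, 4]])   # => :array
source_kind({ a: [1, 2] })      # => :hash
```

The order of the branches matters: an empty Array would also match `when Array`, so the empty case has to come first.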
Dynamic Method Handling
This class handles dynamic methods through the method_missing method.
#method_missing(name, *args) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 492

def method_missing(name, *args, &)
  stringified_name = name.to_s
  if /^([^=]+)=/.match?(stringified_name)
    name = stringified_name[/^([^=]+)=/].delete('=')
    name = name.to_sym unless has_vector?(name)
    insert_or_modify_vector [name], args[0]
  elsif has_vector?(name)
    self[name]
  elsif has_vector?(stringified_name)
    self[stringified_name]
  else
    super
  end
end
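The dispatch pattern above (a name ending in `=` writes a vector, a known vector name reads it, anything else falls through to `super`) can be sketched in plain Ruby. `TinyFrame` below is a hypothetical stand-in backed by a Hash; it is not part of DaruLite and simplifies the name handling to symbols only.

```ruby
# TinyFrame is a hypothetical stand-in, not a DaruLite class.
class TinyFrame
  def initialize(columns = {})
    @columns = columns
  end

  def has_vector?(name)
    @columns.key?(name)
  end

  def method_missing(name, *args, &block)
    stringified = name.to_s
    if stringified.end_with?('=')
      # `frame.b = [...]` creates or replaces the :b column.
      @columns[stringified.delete_suffix('=').to_sym] = args[0]
    elsif has_vector?(name)
      # `frame.b` reads the column back.
      @columns[name]
    else
      super
    end
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.end_with?('=') || has_vector?(name) || super
  end
end

frame = TinyFrame.new(a: [1, 2, 3])
frame.b = [4, 5, 6]
frame.a   # => [1, 2, 3]
frame.b   # => [4, 5, 6]
```

Defining `respond_to_missing?` alongside `method_missing` keeps `respond_to?` consistent with the dynamic accessors.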
Instance Attribute Details
#data ⇒ Object (readonly)
TOREMOVE
# File 'lib/daru_lite/dataframe.rb', line 137

def data
  @data
end

#index ⇒ Object (readonly)
The index of the rows of the DataFrame.
# File 'lib/daru_lite/dataframe.rb', line 140

def index
  @index
end

#name ⇒ Object (readonly)
The name of the DataFrame.
# File 'lib/daru_lite/dataframe.rb', line 143

def name
  @name
end

#size ⇒ Object (readonly)
The number of rows present in the DataFrame.
# File 'lib/daru_lite/dataframe.rb', line 146

def size
  @size
end

#vectors ⇒ Object (readonly)
The vectors (columns) index of the DataFrame.
# File 'lib/daru_lite/dataframe.rb', line 135

def vectors
  @vectors
end
Class Method Details
.crosstab_by_assignation(rows, columns, values) ⇒ Object
Generates a new dataset, using three vectors:
- Rows
- Columns
- Values
For example, you have these values
x y v
a a 0
a b 1
b a 1
b b 0
You obtain
id a b
a 0 1
b 1 0
Useful to process outputs from databases
# File 'lib/daru_lite/dataframe.rb', line 84

def crosstab_by_assignation(rows, columns, values)
  raise 'Three vectors should be equal size' if
    rows.size != columns.size || rows.size != values.size

  row_index = rows.uniq.to_a
  data = Hash.new do |h, col|
    h[col] = row_index.map { |r| [r, nil] }.to_h
  end
  validate_no_duplicate_pairs(rows, columns)
  columns.zip(rows, values).each { |c, r, v| data[c][r] = v }

  # FIXME: in fact, WITHOUT this line you'll obtain more "right"
  # data: with vectors having "rows" as an index...
  data = data.transform_values(&:values)
  data[:_id] = row_index
  DataFrame.new(data)
end
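The core of the method is a pivot from (row, column, value) triples into one list per column. A minimal plain-Ruby sketch of that step, using Arrays in place of DaruLite::Vector objects and the x/y/v data from the example above (the variable names are illustrative):

```ruby
rows    = %w[a a b b]
columns = %w[a b a b]
values  = [0, 1, 1, 0]

row_labels = rows.uniq

# One Hash per column, pre-filled with nil for every row label.
table = Hash.new { |h, col| h[col] = row_labels.to_h { |r| [r, nil] } }
columns.zip(rows, values).each { |c, r, v| table[c][r] = v }

# Flatten each column to an Array ordered like row_labels.
crosstab = table.transform_values(&:values)
crosstab   # => { 'a' => [0, 1], 'b' => [1, 0] }
```

Pre-filling with nil means a (row, column) pair that never appears in the input stays nil instead of shifting later values.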
.rows(source, opts = {}) ⇒ Object
Create DataFrame by specifying rows as an Array of Arrays or Array of DaruLite::Vector objects.
# File 'lib/daru_lite/dataframe.rb', line 50

def rows(source, opts = {})
  raise SizeError, 'All vectors must have same length' \
    unless source.all? { |v| v.size == source.first.size }

  opts[:order] ||= guess_order(source)

  if ArrayHelper.array_of?(source, Array) || source.empty?
    DataFrame.new(source.transpose, opts)
  elsif ArrayHelper.array_of?(source, Vector)
    from_vector_rows(source, opts)
  else
    raise ArgumentError, "Can't create DataFrame from #{source}"
  end
end
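For the Array-of-Arrays case, the method checks that all rows have equal length and then transposes them into columns. A plain-Ruby sketch of that idea, where `rows_to_columns` is a hypothetical helper and a Hash stands in for the resulting DataFrame:

```ruby
# Hypothetical helper: turn row-wise data into named columns.
def rows_to_columns(rows, order)
  unless rows.all? { |r| r.size == rows.first.size }
    raise ArgumentError, 'All rows must have the same length'
  end

  order.zip(rows.transpose).to_h
end

rows_to_columns([[1, 6], [2, 7], [3, 8]], %i[a b])
# => { a: [1, 2, 3], b: [6, 7, 8] }
```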
Instance Method Details
#==(other) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 467

def ==(other)
  self.class == other.class &&
    @size == other.size &&
    @index == other.index &&
    @vectors == other.vectors &&
    @vectors.to_a.all? { |v| self[v] == other[v] }
end
#add_level_to_vectors(top_level_label) ⇒ Object
Converts the vectors to a DaruLite::MultiIndex. The argument passed is used as the MultiIndex's top level.
# File 'lib/daru_lite/dataframe.rb', line 408

def add_level_to_vectors(top_level_label)
  tuples = vectors.map { |label| [top_level_label, *label] }
  self.vectors = DaruLite::MultiIndex.from_tuples(tuples)
end
#add_vectors_by_split(name, join = '-', sep = DaruLite::SPLIT_TOKEN) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 345

def add_vectors_by_split(name, join = '-', sep = DaruLite::SPLIT_TOKEN)
  self[name]
    .split_by_separator(sep)
    .each { |k, v| self[:"#{name}#{join}#{k}"] = v }
end
#add_vectors_by_split_recode(nm, join = '-', sep = DaruLite::SPLIT_TOKEN) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 413

def add_vectors_by_split_recode(nm, join = '-', sep = DaruLite::SPLIT_TOKEN)
  self[nm]
    .split_by_separator(sep)
    .each_with_index do |(k, v), i|
      v.rename "#{nm}:#{k}"
      self[:"#{nm}#{join}#{i + 1}"] = v
    end
end
#bootstrap(n = nil) ⇒ DaruLite::DataFrame
Creates a DataFrame of n rows picked at random from this DataFrame. If n is not given, the original number of rows is used.
# File 'lib/daru_lite/dataframe.rb', line 313

def bootstrap(n = nil)
  n ||= nrows
  DaruLite::DataFrame.new({}, order: @vectors).tap do |df_boot|
    n.times do
      df_boot.add_row(row[rand(n)])
    end
    df_boot.update
  end
end
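Bootstrapping here means drawing rows independently, so the same row may appear more than once. A plain-Ruby sketch of the default case (n equal to the number of rows), using an Array of rows instead of a DataFrame; the seeded `Random` is an assumption added only to make the sketch repeatable:

```ruby
rows = [[1, 6], [2, 7], [3, 8], [4, 9]]
rng  = Random.new(42) # fixed seed, for repeatability only

# Draw rows.size rows, each picked independently (with replacement).
sample = Array.new(rows.size) { rows[rng.rand(rows.size)] }

sample.size                            # => 4
sample.all? { |r| rows.include?(r) }   # => true
```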
#delete_at_position(position) ⇒ Object
Delete a row based on its position. More robust than #delete_row when working with a CategoricalIndex or when the Index includes integers.
# File 'lib/daru_lite/dataframe.rb', line 300

def delete_at_position(position)
  raise IndexError, "Position #{position} does not exist." unless position < size

  @index = @index.delete_at(position)
  each_vector { |vector| vector.delete_at_position(position) }

  set_size
end
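The reason integer labels are delicate is that a number can be read either as a label or as a position. The plain-Ruby sketch below illustrates the two readings on a bare list of labels; it does not reproduce how #delete_row resolves its argument.

```ruby
labels = [100, 99, 101, 1, 2]

# Reading 1 as a label removes the row named 1 (the fourth row).
by_label = labels - [1]

# Reading 1 as a position removes the second row, named 99.
by_position = labels.dup
by_position.delete_at(1)

by_label      # => [100, 99, 101, 2]
by_position   # => [100, 101, 1, 2]
```

Deleting by position removes that ambiguity because the argument is never looked up as a label.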
#delete_row(index) ⇒ Object
Delete a row.
# File 'lib/daru_lite/dataframe.rb', line 284

def delete_row(index)
  idx = named_index_for index

  raise IndexError, "Index #{index} does not exist." unless @index.include? idx

  @index = DaruLite::Index.new(@index.to_a - [idx])
  each_vector do |vector|
    vector.delete_at idx
  end

  set_size
end
#delete_vector(vector) ⇒ Object
Delete a vector.
# File 'lib/daru_lite/dataframe.rb', line 267

def delete_vector(vector)
  raise IndexError, "Vector #{vector} does not exist." unless @vectors.include?(vector)

  @data.delete_at @vectors[vector]
  @vectors = DaruLite::Index.new @vectors.to_a - [vector]

  self
end
#delete_vectors(*vectors) ⇒ Object
Deletes a list of vectors.
# File 'lib/daru_lite/dataframe.rb', line 277

def delete_vectors(*vectors)
  Array(vectors).each { |vec| delete_vector vec }

  self
end
#inspect(spacing = DaruLite.spacing, threshold = DaruLite.max_rows) ⇒ Object
Pretty print in a table format for the command line (irb/pry/iruby).
# File 'lib/daru_lite/dataframe.rb', line 450

def inspect(spacing = DaruLite.spacing, threshold = DaruLite.max_rows)
  name_part = @name ? ": #{@name} " : ''
  spacing = [
    headers.to_a.map { |header| header.try(:length) || header.to_s.length }.max,
    spacing
  ].max

  "#<#{self.class}#{name_part}(#{nrows}x#{ncols})>#{$INPUT_RECORD_SEPARATOR}" +
    Formatters::Table.format(
      each_row.lazy,
      row_headers: row_headers,
      headers: headers,
      threshold: threshold,
      spacing: spacing
    )
end
#interact_code(vector_names, full) ⇒ Object
# File 'lib/daru_lite/dataframe.rb', line 512

def interact_code(vector_names, full)
  dfs = vector_names.zip(full).map do |vec_name, f|
    self[vec_name].contrast_code(full: f).each.to_a
  end

  all_vectors = recursive_product(dfs)
  DaruLite::DataFrame.new all_vectors,
                          order: all_vectors.map(&:name)
end
#ncols ⇒ Object
The number of vectors.
# File 'lib/daru_lite/dataframe.rb', line 362

def ncols
  @vectors.size
end
#nest(*tree_keys, &block) ⇒ Object
Return a nested Hash keyed by the values of the given vectors; each leaf is an Array of Hashes holding the remaining values of the matching rows. If a block is given, it supplies the leaf value instead and receives the row, the current innermost Hash in the hierarchy, and the name of the key to include.
# File 'lib/daru_lite/dataframe.rb', line 327

def nest(*tree_keys, &block)
  tree_keys = tree_keys[0] if tree_keys[0].is_a? Array

  each_row.with_object({}) do |row, current|
    # Create tree
    *keys, last = tree_keys
    current = keys.inject(current) { |c, f| c[row[f]] ||= {} }
    name = row[last]

    if block
      current[name] = yield(row, current, name)
    else
      current[name] ||= []
      current[name].push(row.to_h.delete_if { |key, _value| tree_keys.include? key })
    end
  end
end
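The nesting logic (without a block) can be followed on plain data. The sketch below applies the same steps to an Array of Hashes standing in for the rows of a DataFrame; the keys `:g`, `:k` and `:v` and the sample values are made up for illustration.

```ruby
records = [
  { g: 'x', k: 'p', v: 1 },
  { g: 'x', k: 'q', v: 2 },
  { g: 'y', k: 'p', v: 3 },
  { g: 'x', k: 'p', v: 4 }
]
tree_keys = %i[g k]

nested = records.each_with_object({}) do |row, tree|
  *keys, last = tree_keys
  # Walk (and create on demand) one Hash level per leading key.
  node = keys.inject(tree) { |c, k| c[row[k]] ||= {} }
  # The last key holds an Array of the remaining fields of each row.
  (node[row[last]] ||= []) << row.reject { |k, _| tree_keys.include?(k) }
end

nested
# => { 'x' => { 'p' => [{ v: 1 }, { v: 4 }], 'q' => [{ v: 2 }] },
#      'y' => { 'p' => [{ v: 3 }] } }
```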
#nrows ⇒ Object
The number of rows.
# File 'lib/daru_lite/dataframe.rb', line 357

def nrows
  @index.size
end
#rename(new_name) ⇒ Object Also known as: name=
Rename the DataFrame.
# File 'lib/daru_lite/dataframe.rb', line 432

def rename(new_name)
  @name = new_name
  self
end
#rename_vectors(name_map) ⇒ Object
Renames the vectors
Arguments
- name_map - A hash where the keys are the existing vector names and the values are the new names. If a vector is renamed to a name that is already in use, the existing vector is overwritten.
Usage
df = DaruLite::DataFrame.new({ a: [1,2,3,4], b: [:a,:b,:c,:d], c: [11,22,33,44] })
df.rename_vectors :a => :alpha, :c => :gamma
df.vectors.to_a #=> [:alpha, :b, :gamma]
# File 'lib/daru_lite/dataframe.rb', line 380

def rename_vectors(name_map)
  existing_targets = name_map.reject { |k, v| k == v }.values & vectors.to_a
  delete_vectors(*existing_targets)

  new_names = vectors.to_a.map { |v| name_map[v] || v }
  self.vectors = DaruLite::Index.new new_names
end
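The overwrite rule follows from the two steps in the method: vectors whose names are rename targets are dropped first, then the remaining names are mapped. A plain-Ruby sketch on a bare list of names, where `renamed_names` is a hypothetical helper:

```ruby
# Hypothetical helper working on names only, not on a DataFrame.
def renamed_names(vectors, name_map)
  # Names that are rename targets and already exist get dropped first.
  clashing = name_map.reject { |from, to| from == to }.values & vectors
  (vectors - clashing).map { |v| name_map[v] || v }
end

renamed_names(%i[a b c], { a: :alpha, c: :gamma })   # => [:alpha, :b, :gamma]
renamed_names(%i[a b c], { a: :b })                  # => [:b, :c]
```

In the second call the original `:b` is removed, and the vector formerly named `:a` takes over the name.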
#rename_vectors!(name_map) ⇒ Object
Renames the vectors and returns itself
Arguments
- name_map - A hash where the keys are the existing vector names and the values are the new names. If a vector is renamed to a name that is already in use, the existing vector is overwritten.
Usage
df = DaruLite::DataFrame.new({ a: [1,2,3,4], b: [:a,:b,:c,:d], c: [11,22,33,44] })
df.rename_vectors! :a => :alpha, :c => :gamma # df
# File 'lib/daru_lite/dataframe.rb', line 401

def rename_vectors!(name_map)
  rename_vectors(name_map)
  self
end
#respond_to_missing?(name, include_private = false) ⇒ Boolean
# File 'lib/daru_lite/dataframe.rb', line 508

def respond_to_missing?(name, include_private = false)
  name.to_s.end_with?('=') || has_vector?(name) || super
end
#row ⇒ Object
Access a row or set/create a row. Refer to the #[] and #[]= docs for details.
Usage
df.row[:a] # access row named ':a'
df.row[:b] = [1,2,3] # set row ':b' to [1,2,3]
# File 'lib/daru_lite/dataframe.rb', line 262

def row
  DaruLite::Accessors::DataFrameByRow.new(self)
end
#shape ⇒ Object
Return the number of rows and columns of the DataFrame in an Array.
# File 'lib/daru_lite/dataframe.rb', line 352

def shape
  [nrows, ncols]
end
#to_category(*names) ⇒ DaruLite::DataFrame
Converts the specified non-category vectors to category vectors.
# File 'lib/daru_lite/dataframe.rb', line 487

def to_category(*names)
  names.each { |n| self[n] = self[n].to_category }
  self
end
#transpose ⇒ Object
Transpose a DataFrame, transposing the elements and swapping the row and column indexes.
# File 'lib/daru_lite/dataframe.rb', line 439

def transpose
  DaruLite::DataFrame.new(
    each_vector.map(&:to_a).transpose,
    index: @vectors,
    order: @index,
    dtype: @dtype,
    name: @name
  )
end
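The method passes the old vectors as the new index and the old index as the new vector order. A plain-Ruby sketch of the same swap, with a Hash of columns and an Array of row labels standing in for the DataFrame (the labels are illustrative):

```ruby
columns = { a: [1, 2, 3], b: [4, 5, 6] }
index   = %i[r1 r2 r3]

# Each former row becomes a column named after its row label.
transposed = index.zip(columns.values.transpose).to_h
new_index  = columns.keys

transposed   # => { r1: [1, 4], r2: [2, 5], r3: [3, 6] }
new_index    # => [:a, :b]
```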
#update ⇒ Object
Updates the metadata (i.e. missing value positions) of the vectors after assignment, deletion, etc. are complete. This is provided so that time is not wasted recreating the metadata of a vector each time an element is assigned or deleted. Updating data this way is called lazy loading. To set or unset lazy loading, see the .lazy_update= method.
# File 'lib/daru_lite/dataframe.rb', line 427

def update
  @data.each(&:update) if DaruLite.lazy_update
end
#which ⇒ Object
# File 'lib/daru_lite/extensions/which_dsl.rb', line 15

def which(&)
  WhichQuery.new(self, &).exec
end