Class: SparkConnect::DataFrameNaFunctions

Inherits:

Object

Defined in: lib/spark_connect/na_functions.rb

Overview

Missing-data helpers, returned by SparkConnect::DataFrame#na. Mirrors PySpark’s `DataFrame.na` (`DataFrameNaFunctions`).

Examples:

df.na.drop(how: :any)
df.na.fill(0)
df.na.fill({ "name" => "unknown", "age" => 0 })
df.na.replace("UNKNOWN", nil, subset: ["name"])

Constant Summary

Proto = SparkConnect::Proto

Instance Method Summary

#drop(how: :any, thresh: nil, subset: nil) ⇒ DataFrame

Drop rows containing null values.

#fill(value, subset: nil) ⇒ DataFrame

Replace null values.

#initialize(df) ⇒ DataFrameNaFunctions (constructor)

A new instance of DataFrameNaFunctions.

#replace(to_replace, value = nil, subset: nil) ⇒ DataFrame

Replace specific values with others.

Constructor Details

#initialize(df) ⇒ `DataFrameNaFunctions`

Returns a new instance of DataFrameNaFunctions.

Parameters:

df (DataFrame)

# File 'lib/spark_connect/na_functions.rb', line 16

def initialize(df)
  @df = df
end

Instance Method Details

#drop(how: :any, thresh: nil, subset: nil) ⇒ `DataFrame`

Drop rows containing null values.

Parameters:

how (Symbol) (defaults to: :any) —

`:any` (drop a row if any considered field is null) or `:all` (drop a row only if every considered field is null).
thresh (Integer, nil) (defaults to: nil) —

keep rows with at least this many non-null values (overrides `how` when given).
subset (Array<String>, nil) (defaults to: nil) —

only consider these columns.

Returns:

(DataFrame)

# File 'lib/spark_connect/na_functions.rb', line 27

def drop(how: :any, thresh: nil, subset: nil)
  cols = Array(subset).map(&:to_s)
  min_non_nulls = thresh || (if how.to_sym == :all
                               1
                             else
                               (cols.empty? ? nil : cols.size)
                             end)
  nd = Proto::NADrop.new(input: @df.relation, cols: cols)
  nd.min_non_nulls = min_non_nulls if min_non_nulls
  @df.build(drop_na: nd)
end
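
The way `how`, `thresh`, and `subset` combine into a single `min_non_nulls` value can be easier to follow in isolation. The sketch below copies only that branch of the method above into a standalone helper; `resolve_min_non_nulls` is a hypothetical name used for illustration and is not part of the library.

```ruby
# Hypothetical helper (not part of SparkConnect): reproduces the
# min_non_nulls computation from #drop without building a proto message.
def resolve_min_non_nulls(how: :any, thresh: nil, subset: nil)
  cols = Array(subset).map(&:to_s)
  # An explicit threshold wins over `how`.
  return thresh if thresh
  # :all keeps any row that has at least one non-null value.
  return 1 if how.to_sym == :all
  # :any with a subset requires every listed column to be non-null;
  # with no subset the value stays nil and the field is left unset.
  cols.empty? ? nil : cols.size
end

resolve_min_non_nulls(how: :any)                    # => nil
resolve_min_non_nulls(how: :all)                    # => 1
resolve_min_non_nulls(how: :any, subset: %w[a b])   # => 2
resolve_min_non_nulls(how: :all, thresh: 3)         # => 3
```

When the result is `nil`, the method above never assigns `min_non_nulls`, so the decision for that case is presumably left to the server-side default.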

#fill(value, subset: nil) ⇒ `DataFrame`
#fill(value_map) ⇒ `DataFrame`

Replace null values.

Overloads:

#fill(value, subset: nil) ⇒ DataFrame
Parameters:
- value (Object) —

  a scalar used to fill nulls in every column, or only in the `subset` columns when given.
#fill(value_map) ⇒ DataFrame
Parameters:
- value_map (Hash{String=>Object}) —

  per-column fill values.

Returns:

(DataFrame)

# File 'lib/spark_connect/na_functions.rb', line 46

def fill(value, subset: nil)
  cols, values =
    if value.is_a?(Hash)
      [value.keys.map(&:to_s), value.values]
    else
      [Array(subset).map(&:to_s), Array(subset).empty? ? [value] : Array(subset).map { value }]
    end
  nf = Proto::NAFill.new(
    input: @df.relation, cols: cols, values: values.map { |v| na_literal(v) }
  )
  @df.build(fill_na: nf)
end
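
To make the two overloads concrete, the following sketch isolates how the method above derives its parallel `cols` and `values` lists before they are wrapped in literals. `fill_args` is a hypothetical helper written for this illustration; it omits the `na_literal` conversion and the proto message.

```ruby
# Hypothetical helper (not part of SparkConnect): returns the [cols, values]
# pair that #fill would pass on, without converting values to literals.
def fill_args(value, subset: nil)
  if value.is_a?(Hash)
    # Hash form: keys become column names, values are used as given.
    # `subset` is not consulted in this branch.
    [value.keys.map(&:to_s), value.values]
  else
    cols = Array(subset).map(&:to_s)
    # Scalar form: one value for "all columns", or one copy per subset column.
    [cols, cols.empty? ? [value] : cols.map { value }]
  end
end

fill_args(0)                                       # => [[], [0]]
fill_args(0, subset: %w[a b])                      # => [["a", "b"], [0, 0]]
fill_args({ "name" => "unknown", "age" => 0 })     # => [["name", "age"], ["unknown", 0]]
```

Note that symbol keys in the Hash form are accepted as well, since each key is passed through `to_s`.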

#replace(to_replace, value = nil, subset: nil) ⇒ `DataFrame`

Replace specific values with others.

Parameters:

to_replace (Object, Array, Hash) —

value(s) to replace, or an `old => new` mapping.
value (Object, Array, nil) (defaults to: nil) —

replacement value(s) when `to_replace` is not a Hash.
subset (Array<String>, nil) (defaults to: nil)

Returns:

(DataFrame)

# File 'lib/spark_connect/na_functions.rb', line 67

def replace(to_replace, value = nil, subset: nil)
  mapping =
    if to_replace.is_a?(Hash)
      to_replace
    else
      Array(to_replace).zip(Array(value)).to_h
    end
  replacements = mapping.map do |old, new_value|
    Proto::NAReplace::Replacement.new(
      old_value: na_literal(old), new_value: na_literal(new_value)
    )
  end
  nr = Proto::NAReplace.new(
    input: @df.relation, cols: Array(subset).map(&:to_s), replacements: replacements
  )
  @df.build(replace: nr)
end
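
The pairing of old and new values is the part of this method most likely to surprise, so the sketch below extracts just that step. `replacement_pairs` is a hypothetical helper for illustration only; it returns the plain Ruby mapping that the method above would then turn into `Replacement` messages.

```ruby
# Hypothetical helper (not part of SparkConnect): reproduces the mapping
# step of #replace using plain Ruby values.
def replacement_pairs(to_replace, value = nil)
  if to_replace.is_a?(Hash)
    to_replace
  else
    # Pair values positionally; missing replacements become nil.
    Array(to_replace).zip(Array(value)).to_h
  end
end

replacement_pairs("UNKNOWN", nil)       # => {"UNKNOWN"=>nil}
replacement_pairs([1, 2], [10, 20])     # => {1=>10, 2=>20}
replacement_pairs({ "a" => "b" })       # => {"a"=>"b"}
replacement_pairs([1, 2], 0)            # => {1=>0, 2=>nil}
```

As the last call shows, a scalar `value` is paired only with the first element of an Array `to_replace`; with the code as listed, the remaining elements map to `nil` rather than receiving the same replacement. Passing an Array of equal length, or a Hash, avoids that.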
