Module: SparkConnect::Functions
Overview
The standard Spark SQL function library, mirroring PySpark’s `pyspark.sql.functions`. Nearly every function returns a Column (`broadcast` returns a DataFrame).
Available both as `SparkConnect::Functions` and the shorthand `SparkConnect::F`. All methods are module functions.
Following PySpark’s convention, a String argument denotes a **column name** for most functions (e.g. `F.sum("salary")` aggregates the `salary` column), while functions whose parameters are genuinely literal (regex patterns, date formats, JSON paths, …) treat their String arguments as literal values.
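The convention above can be sketched in plain Ruby. The names below (`Ref`, `Lit`, `sum_sketch`, `regexp_like_sketch`) are hypothetical stand-ins invented for illustration, not part of the library; the real functions build Column objects instead of arrays.

```ruby
# Hypothetical stand-ins for illustration; the real library builds Column objects.
Ref = Struct.new(:name)   # reference to a column by name
Lit = Struct.new(:value)  # literal value

# Column-name style (in the spirit of F.sum): a String argument names a column.
def sum_sketch(arg)
  ["sum", arg.is_a?(String) ? Ref.new(arg) : arg]
end

# Literal style (in the spirit of F.regexp_like): the column may be given by
# name, but the pattern String is kept as a literal value.
def regexp_like_sketch(col, pattern)
  ["regexp_like", col.is_a?(String) ? Ref.new(col) : col, Lit.new(pattern)]
end

p sum_sketch("salary")
p regexp_like_sketch("email", "@example\\.com$")
```

In this sketch the same String type ends up as a column reference or as a literal depending only on which parameter it is passed to, which is the distinction the overview describes.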
Constant Summary collapse
- Proto =
SparkConnect::Proto
- UNIFORM =
The following functions are generated programmatically below (`UNIFORM` and `NO_ARG`). The `@!method` directives document them so they appear in the API reference; each returns a Column.
Generated uniform functions: functions whose arguments are all ColumnOrName (a String denotes a column name). Defined programmatically to keep the surface complete and compact.
%w[ sum avg mean max min first last stddev stddev_samp stddev_pop variance var_samp var_pop skewness kurtosis collect_list collect_set first_value last_value max_by min_by corr covar_pop covar_samp median mode any_value every some bit_and bit_or bit_xor bool_and bool_or product count_if grouping abs acos acosh asin asinh atan atanh atan2 bin cbrt ceil ceiling cos cosh cot csc degrees exp expm1 factorial floor hypot ln log log2 log10 log1p negative negate positive pow power radians rint sec signum sin sinh sqrt tan tanh hex unhex pmod isnan isnull positive upper lower ltrim rtrim trim length char_length character_length octet_length bit_length reverse ascii base64 unbase64 initcap soundex crc32 md5 sha1 sha ucase lcase size cardinality array_distinct array_max array_min array_compact flatten explode explode_outer posexplode posexplode_outer inline inline_outer map_keys map_values map_entries map_from_entries array_sort shuffle arrays_zip map_concat concat greatest least hash xxhash64 array_union array_intersect array_except arrays_overlap year quarter month dayofmonth day dayofweek dayofyear hour minute second weekofyear last_day weekday unix_date unix_micros unix_millis unix_seconds timestamp_seconds timestamp_millis timestamp_micros date_from_unix_date bitwise_not bit_count typeof ].uniq.freeze
- NO_ARG =
No-argument functions.
%w[ current_date current_timestamp now current_timezone current_user current_catalog current_database current_schema monotonically_increasing_id spark_partition_id input_file_name input_file_block_start input_file_block_length version uuid row_number rank dense_rank percent_rank cume_dist ].freeze
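One way such lists could be turned into methods is sketched below. This is an assumption about the general technique (`define_singleton_method` over a name list), not the library's actual generator; `SketchFunctions`, `UNIFORM_SKETCH` and `NO_ARG_SKETCH` are hypothetical names, and the generated methods return plain arrays rather than Columns.

```ruby
# Hypothetical stand-in module; not the library's implementation.
module SketchFunctions
  # A duplicate entry is harmless once .uniq is applied, as with `positive`
  # appearing twice in the UNIFORM list above.
  UNIFORM_SKETCH = %w[upper lower upper].uniq.freeze
  NO_ARG_SKETCH  = %w[current_date uuid].freeze

  # Uniform functions accept any number of column-or-name arguments.
  UNIFORM_SKETCH.each do |name|
    define_singleton_method(name) { |*cols| [name, *cols] }
  end

  # No-argument functions take nothing and only record their name.
  NO_ARG_SKETCH.each do |name|
    define_singleton_method(name) { [name] }
  end
end

p SketchFunctions.upper("name")  # => ["upper", "name"]
p SketchFunctions.uuid           # => ["uuid"]
```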
Class Attribute Summary collapse
- .lambda_counter ⇒ Object private
Instance Method Summary collapse
-
#_col(value) ⇒ Object
private
ColumnOrName coercion: String/Symbol -> column reference, Column -> itself, everything else -> literal.
-
#_lambda(block) ⇒ Object
private
Build a Column wrapping a LambdaFunction from a Ruby block.
- #_lit_or_col(value) ⇒ Object private
-
#abs(*cols) ⇒ Column
The Spark SQL ‘abs` function.
-
#acos(*cols) ⇒ Column
The Spark SQL ‘acos` function.
-
#acosh(*cols) ⇒ Column
The Spark SQL ‘acosh` function.
- #add_months(col, months) ⇒ Object
-
#aggregate(col, initial, merge, finish = nil) ⇒ Column
Aggregate (fold) an array.
-
#any_value(*cols) ⇒ Column
The Spark SQL ‘any_value` function.
-
#approx_count_distinct(col, rsd = nil) ⇒ Column
Approximate distinct count (optionally with relative SD).
-
#array(*cols) ⇒ Column
An array from the given columns.
- #array_append(col, value) ⇒ Object
-
#array_compact(*cols) ⇒ Column
The Spark SQL ‘array_compact` function.
-
#array_contains(col, value) ⇒ Object
Array / map functions with value arguments.
-
#array_distinct(*cols) ⇒ Column
The Spark SQL ‘array_distinct` function.
-
#array_except(*cols) ⇒ Column
The Spark SQL ‘array_except` function.
- #array_insert(col, pos, value) ⇒ Object
-
#array_intersect(*cols) ⇒ Column
The Spark SQL ‘array_intersect` function.
- #array_join(col, delimiter, null_replacement = nil) ⇒ Object
-
#array_max(*cols) ⇒ Column
The Spark SQL ‘array_max` function.
-
#array_min(*cols) ⇒ Column
The Spark SQL ‘array_min` function.
- #array_position(col, value) ⇒ Object
- #array_prepend(col, value) ⇒ Object
- #array_remove(col, element) ⇒ Object
- #array_repeat(col, count) ⇒ Object
-
#array_sort(*cols) ⇒ Column
The Spark SQL ‘array_sort` function.
-
#array_union(*cols) ⇒ Column
The Spark SQL ‘array_union` function.
-
#arrays_overlap(*cols) ⇒ Column
The Spark SQL ‘arrays_overlap` function.
-
#arrays_zip(*cols) ⇒ Column
The Spark SQL ‘arrays_zip` function.
-
#asc(col) ⇒ Column
An ascending sort order for the named/given column.
- #asc_nulls_first(col) ⇒ Object
- #asc_nulls_last(col) ⇒ Object
-
#ascii(*cols) ⇒ Column
The Spark SQL ‘ascii` function.
-
#asin(*cols) ⇒ Column
The Spark SQL ‘asin` function.
-
#asinh(*cols) ⇒ Column
The Spark SQL ‘asinh` function.
-
#atan(*cols) ⇒ Column
The Spark SQL ‘atan` function.
-
#atan2(*cols) ⇒ Column
The Spark SQL ‘atan2` function.
-
#atanh(*cols) ⇒ Column
The Spark SQL ‘atanh` function.
-
#avg(*cols) ⇒ Column
The Spark SQL ‘avg` function.
-
#base64(*cols) ⇒ Column
The Spark SQL ‘base64` function.
-
#bin(*cols) ⇒ Column
The Spark SQL ‘bin` function.
-
#bit_and(*cols) ⇒ Column
The Spark SQL ‘bit_and` function.
-
#bit_count(*cols) ⇒ Column
The Spark SQL ‘bit_count` function.
-
#bit_length(*cols) ⇒ Column
The Spark SQL ‘bit_length` function.
-
#bit_or(*cols) ⇒ Column
The Spark SQL ‘bit_or` function.
-
#bit_xor(*cols) ⇒ Column
The Spark SQL ‘bit_xor` function.
-
#bitwise_not(*cols) ⇒ Column
The Spark SQL ‘bitwise_not` function.
-
#bool_and(*cols) ⇒ Column
The Spark SQL ‘bool_and` function.
-
#bool_or(*cols) ⇒ Column
The Spark SQL ‘bool_or` function.
-
#broadcast(df) ⇒ DataFrame
Mark a DataFrame for broadcast (map-side) join.
-
#bround(col, scale = 0) ⇒ Column
HALF_EVEN (“banker’s”) rounding to `scale` places.
-
#cardinality(*cols) ⇒ Column
The Spark SQL ‘cardinality` function.
-
#cbrt(*cols) ⇒ Column
The Spark SQL ‘cbrt` function.
-
#ceil(*cols) ⇒ Column
The Spark SQL ‘ceil` function.
-
#ceiling(*cols) ⇒ Column
The Spark SQL ‘ceiling` function.
-
#char_length(*cols) ⇒ Column
The Spark SQL ‘char_length` function.
-
#character_length(*cols) ⇒ Column
The Spark SQL ‘character_length` function.
-
#coalesce(*cols) ⇒ Column
First non-null among the given columns.
-
#col(name) ⇒ Column
(also: #column)
A column reference by name.
-
#collect_list(*cols) ⇒ Column
The Spark SQL ‘collect_list` function.
-
#collect_set(*cols) ⇒ Column
The Spark SQL ‘collect_set` function.
-
#concat(*cols) ⇒ Column
The Spark SQL ‘concat` function.
-
#concat_ws(sep, *cols) ⇒ Column
Concatenation of columns separated by literal `sep`.
-
#conv(col, from_base, to_base) ⇒ Column
Convert a number string from `from_base` to `to_base`.
-
#corr(*cols) ⇒ Column
The Spark SQL ‘corr` function.
-
#cos(*cols) ⇒ Column
The Spark SQL ‘cos` function.
-
#cosh(*cols) ⇒ Column
The Spark SQL ‘cosh` function.
-
#cot(*cols) ⇒ Column
The Spark SQL ‘cot` function.
-
#count(col) ⇒ Column
Count of rows (or non-null values of a column).
-
#count_distinct(*cols) ⇒ Column
(also: #countDistinct)
Count of distinct combinations of the given columns.
-
#count_if(*cols) ⇒ Column
The Spark SQL ‘count_if` function.
-
#covar_pop(*cols) ⇒ Column
The Spark SQL ‘covar_pop` function.
-
#covar_samp(*cols) ⇒ Column
The Spark SQL ‘covar_samp` function.
-
#crc32(*cols) ⇒ Column
The Spark SQL ‘crc32` function.
-
#create_map(*cols) ⇒ Column
A map from alternating key/value columns.
-
#csc(*cols) ⇒ Column
The Spark SQL ‘csc` function.
-
#cume_dist ⇒ Column
The Spark SQL ‘cume_dist` function (takes no arguments).
-
#current_catalog ⇒ Column
The Spark SQL ‘current_catalog` function (takes no arguments).
-
#current_database ⇒ Column
The Spark SQL ‘current_database` function (takes no arguments).
-
#current_date ⇒ Column
The Spark SQL ‘current_date` function (takes no arguments).
-
#current_schema ⇒ Column
The Spark SQL ‘current_schema` function (takes no arguments).
-
#current_timestamp ⇒ Column
The Spark SQL ‘current_timestamp` function (takes no arguments).
-
#current_timezone ⇒ Column
The Spark SQL ‘current_timezone` function (takes no arguments).
-
#current_user ⇒ Column
The Spark SQL ‘current_user` function (takes no arguments).
- #date_add(col, days) ⇒ Object
-
#date_format(col, fmt) ⇒ Object
Date / time functions with literal arguments.
-
#date_from_unix_date(*cols) ⇒ Column
The Spark SQL ‘date_from_unix_date` function.
- #date_sub(col, days) ⇒ Object
- #date_trunc(fmt, col) ⇒ Object
- #datediff(end_col, start_col) ⇒ Object
-
#day(*cols) ⇒ Column
The Spark SQL ‘day` function.
-
#dayofmonth(*cols) ⇒ Column
The Spark SQL ‘dayofmonth` function.
-
#dayofweek(*cols) ⇒ Column
The Spark SQL ‘dayofweek` function.
-
#dayofyear(*cols) ⇒ Column
The Spark SQL ‘dayofyear` function.
-
#degrees(*cols) ⇒ Column
The Spark SQL ‘degrees` function.
-
#dense_rank ⇒ Column
The Spark SQL ‘dense_rank` function (takes no arguments).
- #desc(col) ⇒ Object
- #desc_nulls_first(col) ⇒ Object
- #desc_nulls_last(col) ⇒ Object
- #element_at(col, extraction) ⇒ Object
-
#every(*cols) ⇒ Column
The Spark SQL ‘every` function.
- #exists(col, &block) ⇒ Object
-
#exp(*cols) ⇒ Column
The Spark SQL ‘exp` function.
-
#explode(*cols) ⇒ Column
The Spark SQL ‘explode` function.
-
#explode_outer(*cols) ⇒ Column
The Spark SQL ‘explode_outer` function.
-
#expm1(*cols) ⇒ Column
The Spark SQL ‘expm1` function.
-
#expr(sql) ⇒ Column
Parse a SQL expression string into a Column.
-
#factorial(*cols) ⇒ Column
The Spark SQL ‘factorial` function.
- #filter(col, &block) ⇒ Object
-
#first(*cols) ⇒ Column
The Spark SQL ‘first` function.
-
#first_value(*cols) ⇒ Column
The Spark SQL ‘first_value` function.
-
#flatten(*cols) ⇒ Column
The Spark SQL ‘flatten` function.
-
#floor(*cols) ⇒ Column
The Spark SQL ‘floor` function.
- #forall(col, &block) ⇒ Object
-
#format_number(col, d) ⇒ Column
Number formatted to `d` decimal places.
-
#format_string(fmt, *cols) ⇒ Column
Printf-style formatting using literal `fmt`.
- #from_json(col, schema, options = {}) ⇒ Object
- #from_unixtime(col, fmt = "yyyy-MM-dd HH:mm:ss") ⇒ Object
- #from_utc_timestamp(col, tz) ⇒ Object
-
#get_json_object(col, path) ⇒ Object
JSON / CSV.
-
#greatest(*cols) ⇒ Column
The Spark SQL ‘greatest` function.
-
#grouping(*cols) ⇒ Column
The Spark SQL ‘grouping` function.
-
#hash(*cols) ⇒ Column
The Spark SQL ‘hash` function.
-
#hex(*cols) ⇒ Column
The Spark SQL ‘hex` function.
-
#hour(*cols) ⇒ Column
The Spark SQL ‘hour` function.
-
#hypot(*cols) ⇒ Column
The Spark SQL ‘hypot` function.
-
#initcap(*cols) ⇒ Column
The Spark SQL ‘initcap` function.
-
#inline(*cols) ⇒ Column
The Spark SQL ‘inline` function.
-
#inline_outer(*cols) ⇒ Column
The Spark SQL ‘inline_outer` function.
-
#input_file_block_length ⇒ Column
The Spark SQL ‘input_file_block_length` function (takes no arguments).
-
#input_file_block_start ⇒ Column
The Spark SQL ‘input_file_block_start` function (takes no arguments).
-
#input_file_name ⇒ Column
The Spark SQL ‘input_file_name` function (takes no arguments).
-
#instr(col, substr) ⇒ Column
1-based position of literal `substr` within `col` (0 if absent).
-
#isnan(*cols) ⇒ Column
The Spark SQL ‘isnan` function.
-
#isnull(*cols) ⇒ Column
The Spark SQL ‘isnull` function.
- #json_tuple(col, *fields) ⇒ Object
-
#kurtosis(*cols) ⇒ Column
The Spark SQL ‘kurtosis` function.
-
#lag(col, offset = 1, default = nil) ⇒ Object
Window / analytic functions.
-
#last(*cols) ⇒ Column
The Spark SQL ‘last` function.
-
#last_day(*cols) ⇒ Column
The Spark SQL ‘last_day` function.
-
#last_value(*cols) ⇒ Column
The Spark SQL ‘last_value` function.
-
#lcase(*cols) ⇒ Column
The Spark SQL ‘lcase` function.
- #lead(col, offset = 1, default = nil) ⇒ Object
-
#least(*cols) ⇒ Column
The Spark SQL ‘least` function.
-
#length(*cols) ⇒ Column
The Spark SQL ‘length` function.
-
#lit(value) ⇒ Column
A literal value column.
-
#ln(*cols) ⇒ Column
The Spark SQL ‘ln` function.
-
#locate(substr, col, pos = 1) ⇒ Column
1-based position of `substr` in `col` at/after `pos`.
-
#log(*cols) ⇒ Column
The Spark SQL ‘log` function.
-
#log10(*cols) ⇒ Column
The Spark SQL ‘log10` function.
-
#log1p(*cols) ⇒ Column
The Spark SQL ‘log1p` function.
-
#log2(*cols) ⇒ Column
The Spark SQL ‘log2` function.
-
#lower(*cols) ⇒ Column
The Spark SQL ‘lower` function.
-
#lpad(col, len, pad) ⇒ Column
Left-padded string.
-
#ltrim(*cols) ⇒ Column
The Spark SQL ‘ltrim` function.
- #make_date(year, month, day) ⇒ Object
-
#map_concat(*cols) ⇒ Column
The Spark SQL ‘map_concat` function.
- #map_contains_key(col, key) ⇒ Object
-
#map_entries(*cols) ⇒ Column
The Spark SQL ‘map_entries` function.
- #map_filter(col, &block) ⇒ Object
-
#map_from_arrays(keys, values) ⇒ Column
A map from two array columns (keys, values).
-
#map_from_entries(*cols) ⇒ Column
The Spark SQL ‘map_from_entries` function.
-
#map_keys(*cols) ⇒ Column
The Spark SQL ‘map_keys` function.
-
#map_values(*cols) ⇒ Column
The Spark SQL ‘map_values` function.
- #map_zip_with(c1, c2, &block) ⇒ Object
-
#max(*cols) ⇒ Column
The Spark SQL ‘max` function.
-
#max_by(*cols) ⇒ Column
The Spark SQL ‘max_by` function.
-
#md5(*cols) ⇒ Column
The Spark SQL ‘md5` function.
-
#mean(*cols) ⇒ Column
The Spark SQL ‘mean` function.
-
#median(*cols) ⇒ Column
The Spark SQL ‘median` function.
-
#min(*cols) ⇒ Column
The Spark SQL ‘min` function.
-
#min_by(*cols) ⇒ Column
The Spark SQL ‘min_by` function.
-
#minute(*cols) ⇒ Column
The Spark SQL ‘minute` function.
-
#mode(*cols) ⇒ Column
The Spark SQL ‘mode` function.
-
#monotonically_increasing_id ⇒ Column
The Spark SQL ‘monotonically_increasing_id` function (takes no arguments).
-
#month(*cols) ⇒ Column
The Spark SQL ‘month` function.
- #months_between(d1, d2, round_off = true) ⇒ Object
-
#named_struct(*cols) ⇒ Column
A named struct from alternating name/value arguments.
-
#nanvl(col1, col2) ⇒ Column
`col2` if `col1` is NaN, else `col1`.
-
#negate(*cols) ⇒ Column
The Spark SQL ‘negate` function.
-
#negative(*cols) ⇒ Column
The Spark SQL ‘negative` function.
- #next_day(col, day_of_week) ⇒ Object
-
#now ⇒ Column
The Spark SQL ‘now` function (takes no arguments).
- #nth_value(col, offset, ignore_nulls = false) ⇒ Object
- #ntile(n) ⇒ Object
-
#octet_length(*cols) ⇒ Column
The Spark SQL ‘octet_length` function.
-
#overlay(col, replace, pos, len = -1) ⇒ Column
Overlay `replace` into `col` at `pos` for `len` chars.
-
#percent_rank ⇒ Column
The Spark SQL ‘percent_rank` function (takes no arguments).
-
#pmod(*cols) ⇒ Column
The Spark SQL ‘pmod` function.
-
#posexplode(*cols) ⇒ Column
The Spark SQL ‘posexplode` function.
-
#posexplode_outer(*cols) ⇒ Column
The Spark SQL ‘posexplode_outer` function.
-
#positive(*cols) ⇒ Column
The Spark SQL ‘positive` function.
-
#pow(*cols) ⇒ Column
The Spark SQL ‘pow` function.
-
#power(*cols) ⇒ Column
The Spark SQL ‘power` function.
-
#product(*cols) ⇒ Column
The Spark SQL ‘product` function.
-
#quarter(*cols) ⇒ Column
The Spark SQL ‘quarter` function.
-
#radians(*cols) ⇒ Column
The Spark SQL ‘radians` function.
-
#rand(seed = nil) ⇒ Object
Randomness.
- #randn(seed = nil) ⇒ Object
-
#rank ⇒ Column
The Spark SQL ‘rank` function (takes no arguments).
- #regexp_count(col, pattern) ⇒ Object
-
#regexp_extract(col, pattern, idx = 0) ⇒ Column
The `idx`-th group of `pattern` matched in `col`.
-
#regexp_extract_all(col, pattern, idx = 1) ⇒ Column
All matches of group `idx` of `pattern`.
-
#regexp_like(col, pattern) ⇒ Column
Whether `col` matches `pattern`.
-
#regexp_replace(col, pattern, replacement) ⇒ Column
`col` with `pattern` replaced by `replacement`.
- #regexp_substr(col, pattern) ⇒ Object
-
#repeat(col, n) ⇒ Column
The string repeated `n` times.
-
#reverse(*cols) ⇒ Column
The Spark SQL ‘reverse` function.
-
#rint(*cols) ⇒ Column
The Spark SQL ‘rint` function.
-
#round(col, scale = 0) ⇒ Column
HALF_UP rounding to `scale` decimal places.
-
#row_number ⇒ Column
The Spark SQL ‘row_number` function (takes no arguments).
-
#rpad(col, len, pad) ⇒ Column
Right-padded string.
-
#rtrim(*cols) ⇒ Column
The Spark SQL ‘rtrim` function.
- #schema_of_json(json, options = {}) ⇒ Object
-
#sec(*cols) ⇒ Column
The Spark SQL ‘sec` function.
-
#second(*cols) ⇒ Column
The Spark SQL ‘second` function.
- #sequence(start, stop, step = nil) ⇒ Object
-
#sha(*cols) ⇒ Column
The Spark SQL ‘sha` function.
-
#sha1(*cols) ⇒ Column
The Spark SQL ‘sha1` function.
-
#sha2(col, num_bits) ⇒ Column
SHA-2 hash with the given bit length (224/256/384/512).
-
#shiftleft(col, num_bits) ⇒ Column
Left shift / right shift by literal bit counts.
- #shiftright(col, num_bits) ⇒ Object
- #shiftrightunsigned(col, num_bits) ⇒ Object
-
#shuffle(*cols) ⇒ Column
The Spark SQL ‘shuffle` function.
-
#signum(*cols) ⇒ Column
The Spark SQL ‘signum` function.
-
#sin(*cols) ⇒ Column
The Spark SQL ‘sin` function.
-
#sinh(*cols) ⇒ Column
The Spark SQL ‘sinh` function.
-
#size(*cols) ⇒ Column
The Spark SQL ‘size` function.
-
#skewness(*cols) ⇒ Column
The Spark SQL ‘skewness` function.
- #slice(col, start, length) ⇒ Object
-
#some(*cols) ⇒ Column
The Spark SQL ‘some` function.
-
#sort_array(col, asc = true) ⇒ Object
Sorting helpers.
-
#soundex(*cols) ⇒ Column
The Spark SQL ‘soundex` function.
-
#spark_partition_id ⇒ Column
The Spark SQL ‘spark_partition_id` function (takes no arguments).
-
#split(col, pattern, limit = -1) ⇒ Column
Split `col` by the literal regex `pattern`.
-
#sqrt(*cols) ⇒ Column
The Spark SQL ‘sqrt` function.
-
#stddev(*cols) ⇒ Column
The Spark SQL ‘stddev` function.
-
#stddev_pop(*cols) ⇒ Column
The Spark SQL ‘stddev_pop` function.
-
#stddev_samp(*cols) ⇒ Column
The Spark SQL ‘stddev_samp` function.
-
#struct(*cols) ⇒ Column
A struct from the given columns.
-
#substring(col, pos, len) ⇒ Column
Substring of length `len` from 1-based `pos`.
-
#substring_index(col, delim, count) ⇒ Column
Substring before the `count`-th occurrence of `delim`.
-
#sum(*cols) ⇒ Column
The Spark SQL ‘sum` function.
-
#sum_distinct(col) ⇒ Column
Sum of distinct values.
-
#tan(*cols) ⇒ Column
The Spark SQL ‘tan` function.
-
#tanh(*cols) ⇒ Column
The Spark SQL ‘tanh` function.
-
#timestamp_micros(*cols) ⇒ Column
The Spark SQL ‘timestamp_micros` function.
-
#timestamp_millis(*cols) ⇒ Column
The Spark SQL ‘timestamp_millis` function.
-
#timestamp_seconds(*cols) ⇒ Column
The Spark SQL ‘timestamp_seconds` function.
- #to_date(col, fmt = nil) ⇒ Object
- #to_json(col, options = {}) ⇒ Object
- #to_timestamp(col, fmt = nil) ⇒ Object
- #to_utc_timestamp(col, tz) ⇒ Object
-
#transform(col) {|element| ... } ⇒ Column
Transform each element of an array.
- #transform_keys(col, &block) ⇒ Object
- #transform_values(col, &block) ⇒ Object
-
#translate(col, matching, replace) ⇒ Column
Characters of `col` matching `matching` replaced per `replace`.
-
#trim(*cols) ⇒ Column
The Spark SQL ‘trim` function.
- #trunc(col, fmt) ⇒ Object
-
#typeof(*cols) ⇒ Column
The Spark SQL ‘typeof` function.
-
#ucase(*cols) ⇒ Column
The Spark SQL ‘ucase` function.
-
#udf ⇒ Object
UDFs require a server-side execution environment (Python/Scala) and are not supported by the pure-Ruby client.
-
#unbase64(*cols) ⇒ Column
The Spark SQL ‘unbase64` function.
-
#unhex(*cols) ⇒ Column
The Spark SQL ‘unhex` function.
-
#unix_date(*cols) ⇒ Column
The Spark SQL ‘unix_date` function.
-
#unix_micros(*cols) ⇒ Column
The Spark SQL ‘unix_micros` function.
-
#unix_millis(*cols) ⇒ Column
The Spark SQL ‘unix_millis` function.
-
#unix_seconds(*cols) ⇒ Column
The Spark SQL ‘unix_seconds` function.
- #unix_timestamp(col = nil, fmt = "yyyy-MM-dd HH:mm:ss") ⇒ Object
-
#upper(*cols) ⇒ Column
The Spark SQL ‘upper` function.
-
#uuid ⇒ Column
The Spark SQL ‘uuid` function (takes no arguments).
-
#var_pop(*cols) ⇒ Column
The Spark SQL ‘var_pop` function.
-
#var_samp(*cols) ⇒ Column
The Spark SQL ‘var_samp` function.
-
#variance(*cols) ⇒ Column
The Spark SQL ‘variance` function.
-
#version ⇒ Column
The Spark SQL ‘version` function (takes no arguments).
-
#weekday(*cols) ⇒ Column
The Spark SQL ‘weekday` function.
-
#weekofyear(*cols) ⇒ Column
The Spark SQL ‘weekofyear` function.
-
#when(condition, value) ⇒ Column
Start a CASE WHEN expression.
-
#xxhash64(*cols) ⇒ Column
The Spark SQL ‘xxhash64` function.
-
#year(*cols) ⇒ Column
The Spark SQL ‘year` function.
- #zip_with(left, right, &block) ⇒ Object
Class Attribute Details
.lambda_counter ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
# File 'lib/spark_connect/functions.rb', line 880
def lambda_counter
  @lambda_counter
end
Instance Method Details
#_col(value) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
ColumnOrName coercion: String/Symbol -> column reference, Column -> itself, everything else -> literal.
# File 'lib/spark_connect/functions.rb', line 863
def _col(value)
  case value
  when Column then value
  when String, Symbol then col(value.to_s)
  else lit(value)
  end
end
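The coercion can be exercised in isolation with a small sketch. The `Column` struct and the `col` / `lit` helpers below are simplified stand-ins assumed for illustration (the real ones build Spark Connect expressions); only the `case` logic follows the source shown above.

```ruby
# Simplified stand-ins, assumed for illustration only.
Column = Struct.new(:kind, :payload)

def col(name)
  Column.new(:ref, name)
end

def lit(value)
  Column.new(:lit, value)
end

# Same three-way dispatch as the documented _col.
def _col(value)
  case value
  when Column then value
  when String, Symbol then col(value.to_s)
  else lit(value)
  end
end

existing = col("age")
p _col(existing).equal?(existing)  # => true  (Column passes through untouched)
p _col(:salary)                    # Symbol becomes a column reference named "salary"
p _col(42)                         # anything else becomes a literal
p _col(lit("n/a"))                 # wrap a String in lit to keep it a literal
```

Under this dispatch, a String literal has to be wrapped in `lit` before being passed wherever ColumnOrName is expected, since a bare String is always read as a column name.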
#_lambda(block) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 886
def _lambda(block)
  arity = block.arity.negative? ? 1 : [block.arity, 1].max
  Functions.lambda_counter += 1
  names = (0...arity).map { |i| "x_#{Functions.lambda_counter}_#{i}" }
  vars = names.map do |n|
    Proto::Expression::UnresolvedNamedLambdaVariable.new(name_parts: [n])
  end
  cols = vars.map { |v| Column.new(Proto::Expression.new(unresolved_named_lambda_variable: v)) }
  body = block.call(*cols)
  Column.new(Proto::Expression.new(
    lambda_function: Proto::Expression::LambdaFunction.new(function: body.to_expr, arguments: vars)
  ))
end
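The arity handling and variable naming in `_lambda` can be tried without any Spark dependencies. The `LambdaNamer` class below is a hypothetical helper that reproduces only those two steps from the source above (it uses an instance counter in place of `Functions.lambda_counter` and builds no proto objects).

```ruby
# Hypothetical helper mirroring only the arity and naming logic of _lambda.
class LambdaNamer
  def initialize
    @counter = 0
  end

  # Returns the lambda variable names that would be generated for a block.
  def names_for(block)
    # Negative arity (splat/optional args) and zero arity both fall back to 1.
    arity = block.arity.negative? ? 1 : [block.arity, 1].max
    @counter += 1
    (0...arity).map { |i| "x_#{@counter}_#{i}" }
  end
end

namer = LambdaNamer.new
p namer.names_for(proc { |x| x })          # => ["x_1_0"]
p namer.names_for(proc { |acc, x| acc })   # => ["x_2_0", "x_2_1"]
p namer.names_for(proc { |*xs| xs })       # => ["x_3_0"]
p namer.names_for(proc { 1 })              # => ["x_4_0"]
```

Because the counter is bumped once per lambda, variable names from nested or sibling lambdas do not collide, which appears to be the purpose of `lambda_counter`.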
#_lit_or_col(value) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
# File 'lib/spark_connect/functions.rb', line 872
def _lit_or_col(value)
  value.is_a?(Column) ? value : lit(value)
end
#abs(*cols) ⇒ Column
The Spark SQL `abs` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822: generated from the `UNIFORM` list (see Constant Summary).
#acos(*cols) ⇒ Column
The Spark SQL `acos` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822: generated from the `UNIFORM` list (see Constant Summary).
#acosh(*cols) ⇒ Column
The Spark SQL `acosh` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822: generated from the `UNIFORM` list (see Constant Summary).
#add_months(col, months) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 159
def add_months(col, months) = Column.invoke("add_months", _col(col), lit(months))
#aggregate(col, initial, merge, finish = nil) ⇒ Column
Aggregate (fold) an array. `merge` combines accumulator and element; optional `finish` post-processes the result.
# File 'lib/spark_connect/functions.rb', line 258
def aggregate(col, initial, merge, finish = nil)
  args = [_col(col), _col(initial), _lambda(merge)]
  args << _lambda(finish) if finish
  Column.invoke("aggregate", *args)
end
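The fold semantics can be illustrated with a plain-Ruby analogue. This sketch runs locally on an ordinary Array and is not how the library evaluates anything (the real `aggregate` only builds an expression for Spark to execute); `fold_sketch` is a hypothetical name.

```ruby
# Plain-Ruby analogue of aggregate's semantics; runs locally, not on Spark.
def fold_sketch(array, initial, merge, finish = nil)
  # merge combines the running accumulator with each element in turn.
  acc = array.reduce(initial) { |a, x| merge.call(a, x) }
  # finish, when given, post-processes the final accumulator.
  finish ? finish.call(acc) : acc
end

add = ->(acc, x) { acc + x }

total = fold_sketch([1, 2, 3], 0, add)
mean  = fold_sketch([1, 2, 3], 0, add, ->(acc) { acc / 3.0 })

p total  # => 6
p mean   # => 2.0
```

Note that in the documented method `merge` and `finish` are positional callables rather than a trailing block, and `initial` goes through `_col`, so a numeric start value becomes a literal while a String would be read as a column name.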
#any_value(*cols) ⇒ Column
The Spark SQL `any_value` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822: generated from the `UNIFORM` list (see Constant Summary).
#approx_count_distinct(col, rsd = nil) ⇒ Column
Returns the approximate distinct count, optionally with a maximum relative standard deviation `rsd`.
# File 'lib/spark_connect/functions.rb', line 70
def approx_count_distinct(col, rsd = nil)
  rsd.nil? ? Column.invoke("approx_count_distinct", _col(col)) : Column.invoke("approx_count_distinct", _col(col), lit(rsd))
end
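The optional-argument handling amounts to sending one or two arguments depending on whether `rsd` is nil. A minimal sketch of that pattern follows, with `invoke_sketch` as a hypothetical stand-in for `Column.invoke` that merely records what it was given.

```ruby
# Hypothetical stand-in for Column.invoke: records the function name and arguments.
def invoke_sketch(name, *args)
  { function: name, args: args }
end

# Same shape as approx_count_distinct: rsd is forwarded only when supplied.
def approx_count_distinct_sketch(col, rsd = nil)
  args = [col]
  args << rsd unless rsd.nil?
  invoke_sketch("approx_count_distinct", *args)
end

p approx_count_distinct_sketch("user_id")
p approx_count_distinct_sketch("user_id", 0.02)
```

Omitting the argument entirely, rather than sending a null, leaves the choice of default precision to the server.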
#array(*cols) ⇒ Column
Returns an array from the given columns.
# File 'lib/spark_connect/functions.rb', line 96
def array(*cols) = Column.invoke("array", *cols.map { |c| _col(c) })
#array_append(col, value) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 201
def array_append(col, value) = Column.invoke("array_append", _col(col), lit(value))
#array_compact(*cols) ⇒ Column
The Spark SQL `array_compact` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822: generated from the `UNIFORM` list (see Constant Summary).
#array_contains(col, value) ⇒ Object
Array / map functions with value arguments.
# File 'lib/spark_connect/functions.rb', line 197
def array_contains(col, value) = Column.invoke("array_contains", _col(col), lit(value))
#array_distinct(*cols) ⇒ Column
The Spark SQL `array_distinct` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822: generated from the `UNIFORM` list (see Constant Summary).
#array_except(*cols) ⇒ Column
The Spark SQL `array_except` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#array_insert(col, pos, value) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 203
def array_insert(col, pos, value) = Column.invoke("array_insert", _col(col), lit(pos), lit(value))
#array_intersect(*cols) ⇒ Column
The Spark SQL `array_intersect` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#array_join(col, delimiter, null_replacement = nil) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 205
def array_join(col, delimiter, null_replacement = nil)
  if null_replacement.nil?
    Column.invoke("array_join", _col(col), lit(delimiter))
  else
    Column.invoke("array_join", _col(col), lit(delimiter), lit(null_replacement))
  end
end
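The branch in `array_join` means the third argument is sent to Spark only when the caller supplies one. The sketch below reproduces that argument-building logic with a stand-in builder. `JoinSketch` and its `Call` struct are hypothetical names used for illustration, and the code records arguments instead of contacting a Spark server.

```ruby
# Stand-in builder for illustration only; it records the arguments instead of calling Spark.
module JoinSketch
  Call = Struct.new(:name, :args)

  module_function

  # Mirrors the branch in the documented method: the third argument is sent only when given.
  def array_join(col, delimiter, null_replacement = nil)
    args = [[:column, col], [:literal, delimiter]]
    args << [:literal, null_replacement] unless null_replacement.nil?
    Call.new("array_join", args)
  end
end

two_args = JoinSketch.array_join("words", ", ")
three_args = JoinSketch.array_join("words", ", ", "?")
puts two_args.args.length    # prints 2
puts three_args.args.length  # prints 3
```

One consequence of using `nil` as the default is that passing `nil` explicitly is indistinguishable from omitting the argument.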
#array_max(*cols) ⇒ Column
The Spark SQL `array_max` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#array_min(*cols) ⇒ Column
The Spark SQL `array_min` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#array_position(col, value) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 198
def array_position(col, value) = Column.invoke("array_position", _col(col), lit(value))
#array_prepend(col, value) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 202
def array_prepend(col, value) = Column.invoke("array_prepend", _col(col), lit(value))
#array_remove(col, element) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 199
def array_remove(col, element) = Column.invoke("array_remove", _col(col), lit(element))
#array_repeat(col, count) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 200
def array_repeat(col, count) = Column.invoke("array_repeat", _col(col), lit(count))
#array_sort(*cols) ⇒ Column
The Spark SQL `array_sort` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#array_union(*cols) ⇒ Column
The Spark SQL `array_union` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#arrays_overlap(*cols) ⇒ Column
The Spark SQL `arrays_overlap` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#arrays_zip(*cols) ⇒ Column
The Spark SQL `arrays_zip` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#asc(col) ⇒ Column
Returns an ascending sort order for the named/given column.
# File 'lib/spark_connect/functions.rb', line 42
def asc(col) = _col(col).asc
#asc_nulls_first(col) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 44
def asc_nulls_first(col) = _col(col).asc_nulls_first
#asc_nulls_last(col) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 45
def asc_nulls_last(col) = _col(col).asc_nulls_last
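The three ascending helpers differ only in where null values land. The plain-Ruby sketch below illustrates the two placements on a small array, assuming Spark's documented behavior that an ascending sort places nulls first unless told otherwise. It involves no Spark session, and the variable names are illustrative.

```ruby
# Plain-Ruby illustration of null placement; no Spark session is involved.
values = [3, nil, 1, 2]

# Ascending with nulls first: nils sort ahead of every non-nil value.
nulls_first = values.sort_by { |v| v.nil? ? [0, 0] : [1, v] }

# Ascending with nulls last: nils sort after every non-nil value.
nulls_last = values.sort_by { |v| v.nil? ? [1, 0] : [0, v] }

p nulls_first  # prints [nil, 1, 2, 3]
p nulls_last   # prints [1, 2, 3, nil]
```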
#ascii(*cols) ⇒ Column
The Spark SQL `ascii` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#asin(*cols) ⇒ Column
The Spark SQL `asin` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#asinh(*cols) ⇒ Column
The Spark SQL `asinh` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#atan(*cols) ⇒ Column
The Spark SQL `atan` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#atan2(*cols) ⇒ Column
The Spark SQL `atan2` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#atanh(*cols) ⇒ Column
The Spark SQL `atanh` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#avg(*cols) ⇒ Column
The Spark SQL `avg` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#base64(*cols) ⇒ Column
The Spark SQL `base64` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#bin(*cols) ⇒ Column
The Spark SQL `bin` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#bit_and(*cols) ⇒ Column
The Spark SQL `bit_and` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#bit_count(*cols) ⇒ Column
The Spark SQL `bit_count` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#bit_length(*cols) ⇒ Column
The Spark SQL `bit_length` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM; see Constant Summary)
#bit_or(*cols) ⇒ Column
The Spark SQL `bit_or` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#bit_xor(*cols) ⇒ Column
The Spark SQL `bit_xor` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
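For intuition, the bitwise aggregates fold all values in a group into one integer. The sketch below reproduces that fold on a plain Ruby array; it does not call the gem, `nil` stands in for SQL NULL, and skipping NULLs is assumed here because Spark aggregates generally ignore them.

```ruby
# Local illustration only; nil stands in for SQL NULL.
values = [0b0011, nil, 0b0101]
present = values.compact

bit_and = present.reduce(:&)  # bits set in every value
bit_or  = present.reduce(:|)  # bits set in at least one value
bit_xor = present.reduce(:^)  # bits set in an odd number of values

puts format("%04b %04b %04b", bit_and, bit_or, bit_xor)  # 0001 0111 0110
```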
#bitwise_not(*cols) ⇒ Column
The Spark SQL `bitwise_not` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#bool_and(*cols) ⇒ Column
The Spark SQL `bool_and` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#bool_or(*cols) ⇒ Column
The Spark SQL `bool_or` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#broadcast(df) ⇒ DataFrame
Marks a DataFrame for a broadcast (map-side) join.
# File 'lib/spark_connect/functions.rb', line 269
def broadcast(df) = df.hint("broadcast")
#bround(col, scale = 0) ⇒ Column
Returns `col` rounded to `scale` decimal places using HALF_EVEN (“banker’s”) rounding.
# File 'lib/spark_connect/functions.rb', line 82
def bround(col, scale = 0) = Column.invoke("bround", _col(col), lit(scale))
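HALF_EVEN rounding sends a tie to the nearest even digit rather than always away from zero. The sketch below shows the rule with Ruby's own `Float#round`; `half_even` is a hypothetical local helper and does not call the gem or a Spark server.

```ruby
# Local illustration of HALF_EVEN rounding; not a call into the gem.
def half_even(x, scale = 0)
  x.round(scale, half: :even)
end

puts half_even(2.5)       # 2  (tie goes to the even neighbour)
puts half_even(3.5)       # 4
puts half_even(0.125, 2)  # 0.12
puts 2.5.round            # 3  (Ruby's default rounds ties away from zero)
```

The inputs are chosen to be exactly representable as binary floats, so each one is a true tie.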
#cardinality(*cols) ⇒ Column
The Spark SQL `cardinality` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#cbrt(*cols) ⇒ Column
The Spark SQL `cbrt` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#ceil(*cols) ⇒ Column
The Spark SQL `ceil` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#ceiling(*cols) ⇒ Column
The Spark SQL `ceiling` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#char_length(*cols) ⇒ Column
The Spark SQL `char_length` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#character_length(*cols) ⇒ Column
The Spark SQL `character_length` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
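The length functions differ only in their unit: characters, bytes, or bits. A local sketch with a plain Ruby string makes the difference visible for non-ASCII text. It assumes UTF-8 encoding, which is how Spark stores strings, and does not call the gem.

```ruby
# Local illustration only: h, e-acute, l, l, o is 5 characters but 6 UTF-8 bytes.
s = "h\u00e9llo"

char_length  = s.length        # characters
octet_length = s.bytesize      # bytes
bit_length   = s.bytesize * 8  # bits

puts char_length, octet_length, bit_length  # 5, 6, 48
```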
#coalesce(*cols) ⇒ Column
Returns the first non-null value among the given columns.
# File 'lib/spark_connect/functions.rb', line 87
def coalesce(*cols) = Column.invoke("coalesce", *cols.map { |c| _col(c) })
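Per row, the result is the leftmost argument that is not NULL. A local sketch of that rule follows; `first_non_null` is a hypothetical helper working on plain Ruby values, with `nil` standing in for SQL NULL.

```ruby
# nil stands in for SQL NULL; false is a real value and must not be skipped.
def first_non_null(*values)
  values.find { |v| !v.nil? }
end

p first_non_null(nil, nil, "c")  # "c"
p first_non_null(nil, false, 1)  # false
p first_non_null(nil, nil)       # nil
```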
#col(name) ⇒ Column Also known as: column
A column reference by name. `"*"` selects all columns.
# File 'lib/spark_connect/functions.rb', line 28
def col(name) = Column.from_name(name.to_s)
#collect_list(*cols) ⇒ Column
The Spark SQL `collect_list` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#collect_set(*cols) ⇒ Column
The Spark SQL `collect_set` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#concat(*cols) ⇒ Column
The Spark SQL `concat` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#concat_ws(sep, *cols) ⇒ Column
Returns the concatenation of the given columns, separated by the literal `sep`.
# File 'lib/spark_connect/functions.rb', line 107
def concat_ws(sep, *cols) = Column.invoke("concat_ws", lit(sep), *cols.map { |c| _col(c) })
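Note the asymmetry in the signature: `sep` is a literal, while the remaining arguments are columns. The sketch below mimics the per-row result on plain Ruby values. `concat_ws_local` is a hypothetical helper, and dropping `nil` inputs is an assumption based on Spark's documented behaviour of skipping NULL values in `concat_ws`.

```ruby
# Local illustration only; nil stands in for SQL NULL and is skipped.
def concat_ws_local(sep, *values)
  values.compact.join(sep)
end

puts concat_ws_local("-", "a", nil, "c")       # a-c
puts concat_ws_local(", ", "Oslo", "Norway")  # Oslo, Norway
```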
#conv(col, from_base, to_base) ⇒ Column
Returns the number string in `col` converted from base `from_base` to base `to_base`.
# File 'lib/spark_connect/functions.rb', line 145
def conv(col, from_base, to_base) = Column.invoke("conv", _col(col), lit(from_base), lit(to_base))
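The conversion itself is ordinary base arithmetic on the digits of a string. A local sketch with Ruby's `String#to_i` and `Integer#to_s` follows. `conv_local` is a hypothetical helper that handles only positive bases from 2 to 36, and upper-casing the result is an assumption about how Spark renders digits above 9.

```ruby
# Local illustration only: parse in one base, print in another.
def conv_local(num, from_base, to_base)
  num.to_s.to_i(from_base).to_s(to_base).upcase
end

puts conv_local("ff", 16, 10)   # 255
puts conv_local("100", 2, 10)   # 4
puts conv_local("255", 10, 16)  # FF
```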
#corr(*cols) ⇒ Column
The Spark SQL `corr` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#cos(*cols) ⇒ Column
The Spark SQL `cos` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#cosh(*cols) ⇒ Column
The Spark SQL `cosh` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#cot(*cols) ⇒ Column
The Spark SQL `cot` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#count(col) ⇒ Column
Returns the count of rows, or of the non-null values of a column. `"*"` counts all rows.
# File 'lib/spark_connect/functions.rb', line 59
def count(col)
  col.to_s == "*" ? Column.invoke("count", lit(1)) : Column.invoke("count", _col(col))
end
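The two modes give different answers as soon as a column contains NULLs. The sketch below reproduces both counts on an array of hashes; the `rows` data is made up for illustration, `nil` stands in for SQL NULL, and nothing here calls the gem.

```ruby
# Made-up rows; nil stands in for SQL NULL.
rows = [
  { id: 1, email: "a@example.com" },
  { id: 2, email: nil },
  { id: 3, email: "c@example.com" },
]

count_star  = rows.size                           # like count("*"): every row
count_email = rows.count { |r| !r[:email].nil? }  # like count("email"): non-null only

puts count_star, count_email  # 3, 2
```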
#count_distinct(*cols) ⇒ Column Also known as: countDistinct
Returns the count of distinct combinations of the given columns.
# File 'lib/spark_connect/functions.rb', line 64
def count_distinct(*cols)
  Column.invoke("count", *cols.map { |c| _col(c) }, is_distinct: true)
end
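With several columns, each distinct tuple of values counts once. A local sketch on plain hashes follows. `distinct_count` and the `rows` data are hypothetical, and dropping any combination that contains `nil` is an assumption based on how Spark's `COUNT(DISTINCT a, b)` treats NULLs.

```ruby
# Made-up rows; nil stands in for SQL NULL.
rows = [
  { city: "Oslo", year: 2023 },
  { city: "Oslo", year: 2023 },
  { city: "Oslo", year: 2024 },
  { city: nil,    year: 2024 },
]

# Count unique value combinations, ignoring any that contain a NULL.
def distinct_count(rows, *keys)
  rows.map { |r| r.values_at(*keys) }
      .reject { |combo| combo.any?(&:nil?) }
      .uniq
      .size
end

puts distinct_count(rows, :city, :year)  # 2
puts distinct_count(rows, :year)         # 2
```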
#count_if(*cols) ⇒ Column
The Spark SQL `count_if` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#covar_pop(*cols) ⇒ Column
The Spark SQL `covar_pop` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from UNIFORM)
#covar_samp(*cols) ⇒ Column
The Spark SQL `covar_samp` function. String arguments are treated as column names.
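The difference between `covar_pop` and `covar_samp` is the divisor: population covariance divides by the number of pairs, sample covariance by one less. The following plain-Ruby sketch illustrates the arithmetic only; it does not call Spark, and the `covariance` helper is a name introduced here for illustration.

```ruby
# Plain-Ruby illustration of the two covariance estimators; no Spark involved.
# `covariance` is a hypothetical helper, not part of SparkConnect.
def covariance(xs, ys, sample:)
  raise ArgumentError, "inputs must have equal length" unless xs.length == ys.length

  n = xs.length
  mean_x = xs.sum.to_f / n
  mean_y = ys.sum.to_f / n
  # Sum of the products of each pair's deviations from the means.
  cross = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  # Population covariance divides by n; sample covariance divides by n - 1.
  cross / (sample ? n - 1 : n)
end

xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]
pop  = covariance(xs, ys, sample: false) # corresponds to covar_pop
samp = covariance(xs, ys, sample: true)  # corresponds to covar_samp
```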
#crc32(*cols) ⇒ Column
The Spark SQL `crc32` function. String arguments are treated as column names.
#create_map(*cols) ⇒ Column
Returns a map from alternating key/value columns.
# File 'lib/spark_connect/functions.rb', line 98
def create_map(*cols) = Column.invoke("map", *cols.map { |c| _col(c) })
#csc(*cols) ⇒ Column
The Spark SQL `csc` function. String arguments are treated as column names.
#cume_dist ⇒ Column
The Spark SQL `cume_dist` function (takes no arguments).
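The overview says the `UNIFORM` and `NO_ARG` functions are generated programmatically. One way such generation could look is sketched below. This is an assumption about the technique, not the library's actual code: `UniformSketch`, its `Column` struct, and the shortened name lists are stand-ins introduced here.

```ruby
module UniformSketch
  # Hypothetical stand-in for SparkConnect::Column: it only records the call.
  Column = Struct.new(:fn, :args) do
    def self.invoke(fn, *args)
      new(fn, args)
    end
  end

  # Shortened lists for illustration; the real constants are much longer.
  UNIFORM = %w[covar_pop covar_samp crc32 csc].freeze
  NO_ARG  = %w[cume_dist current_date dense_rank].freeze

  # Assumed helper: a String names a column, a Column passes through unchanged.
  def self._col(c)
    c.is_a?(Column) ? c : Column.invoke("col", c)
  end

  # Every uniform function converts each argument with _col and forwards it.
  UNIFORM.each do |name|
    define_singleton_method(name) do |*cols|
      Column.invoke(name, *cols.map { |c| _col(c) })
    end
  end

  # No-argument functions forward only the function name.
  NO_ARG.each do |name|
    define_singleton_method(name) { Column.invoke(name) }
  end
end

expr = UniformSketch.covar_pop("height", "weight")
rank = UniformSketch.dense_rank
```

Generating the methods from a list keeps each definition identical, which is why every generated entry in this reference carries the same description.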
#current_catalog ⇒ Column
The Spark SQL `current_catalog` function (takes no arguments).
#current_database ⇒ Column
The Spark SQL `current_database` function (takes no arguments).
#current_date ⇒ Column
The Spark SQL `current_date` function (takes no arguments).
#current_schema ⇒ Column
The Spark SQL `current_schema` function (takes no arguments).
#current_timestamp ⇒ Column
The Spark SQL `current_timestamp` function (takes no arguments).
#current_timezone ⇒ Column
The Spark SQL `current_timezone` function (takes no arguments).
#current_user ⇒ Column
The Spark SQL `current_user` function (takes no arguments).
#date_add(col, days) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 156
def date_add(col, days) = Column.invoke("date_add", _col(col), lit(days))
#date_format(col, fmt) ⇒ Object
Date / time functions with literal arguments. The `fmt` argument is passed as a literal format string, not a column name.
# File 'lib/spark_connect/functions.rb', line 153
def date_format(col, fmt) = Column.invoke("date_format", _col(col), lit(fmt))
#date_from_unix_date(*cols) ⇒ Column
The Spark SQL `date_from_unix_date` function. String arguments are treated as column names.
#date_sub(col, days) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 157
def date_sub(col, days) = Column.invoke("date_sub", _col(col), lit(days))
#date_trunc(fmt, col) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 163
def date_trunc(fmt, col) = Column.invoke("date_trunc", lit(fmt), _col(col))
#datediff(end_col, start_col) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 158
def datediff(end_col, start_col) = Column.invoke("datediff", _col(end_col), _col(start_col))
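The date functions above mix two kinds of arguments: those wrapped in `_col` (a String names a column) and those wrapped in `lit` (the value is a literal). The sketch below restates four of them against stand-ins to make the distinction visible. `DateSketch`, its `Column` struct, and the behaviour shown for `_col` and `lit` are assumptions for illustration; the real helpers build Spark Connect expressions rather than strings.

```ruby
module DateSketch
  # Hypothetical stand-in for SparkConnect::Column; renders a SQL-like string.
  Column = Struct.new(:sql) do
    def self.invoke(fn, *args)
      new("#{fn}(#{args.map(&:sql).join(', ')})")
    end
  end

  # Assumed behaviour: a String is a column name, rendered unquoted.
  def self._col(c)
    c.is_a?(Column) ? c : Column.new(c.to_s)
  end

  # Assumed behaviour: a String is a literal, rendered in single quotes.
  def self.lit(v)
    Column.new(v.is_a?(String) ? "'#{v}'" : v.to_s)
  end

  # Same argument wrapping as the listings above.
  def self.date_add(col, days)
    Column.invoke("date_add", _col(col), lit(days))
  end

  def self.date_format(col, fmt)
    Column.invoke("date_format", _col(col), lit(fmt))
  end

  def self.date_trunc(fmt, col)
    Column.invoke("date_trunc", lit(fmt), _col(col))
  end

  def self.datediff(end_col, start_col)
    Column.invoke("datediff", _col(end_col), _col(start_col))
  end
end

added     = DateSketch.date_add("order_date", 7).sql
formatted = DateSketch.date_format("order_date", "yyyy-MM").sql
truncated = DateSketch.date_trunc("month", "order_date").sql
diff      = DateSketch.datediff("shipped", "ordered").sql
```

Note that `date_trunc` takes the literal format first and the column second, while `datediff` treats both of its String arguments as column names.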
#day(*cols) ⇒ Column
The Spark SQL `day` function. String arguments are treated as column names.
#dayofmonth(*cols) ⇒ Column
The Spark SQL `dayofmonth` function. String arguments are treated as column names.
#dayofweek(*cols) ⇒ Column
The Spark SQL `dayofweek` function. String arguments are treated as column names.
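Spark SQL's `dayofweek` numbers days from 1 (Sunday) to 7 (Saturday), which differs by one from Ruby's `Date#wday` (0 for Sunday to 6 for Saturday). The plain-Ruby sketch below shows the mapping for values computed locally; `spark_dayofweek` is a hypothetical helper, not part of SparkConnect.

```ruby
require "date"

# Spark SQL dayofweek numbering: 1 = Sunday ... 7 = Saturday.
# Ruby's Date#wday numbering:    0 = Sunday ... 6 = Saturday.
# `spark_dayofweek` is a hypothetical local helper for comparison only.
def spark_dayofweek(date)
  date.wday + 1
end

sunday   = Date.new(2024, 1, 7)
saturday = Date.new(2024, 1, 13)
sunday_number   = spark_dayofweek(sunday)
saturday_number = spark_dayofweek(saturday)
```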
#dayofyear(*cols) ⇒ Column
The Spark SQL `dayofyear` function. String arguments are treated as column names.
#degrees(*cols) ⇒ Column
The Spark SQL `degrees` function. String arguments are treated as column names.
#dense_rank ⇒ Column
The Spark SQL `dense_rank` function (takes no arguments).
#desc(col) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 43
def desc(col) = _col(col).desc
#desc_nulls_first(col) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 46
def desc_nulls_first(col) = _col(col).desc_nulls_first
#desc_nulls_last(col) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 47
def desc_nulls_last(col) = _col(col).desc_nulls_last
#element_at(col, extraction) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 214
def element_at(col, extraction) = Column.invoke("element_at", _col(col), lit(extraction))
#every(*cols) ⇒ Column
The Spark SQL `every` function. String arguments are treated as column names.
#exists(col, &block) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 246
def exists(col, &block) = Column.invoke("exists", _col(col), _lambda(block))
#exp(*cols) ⇒ Column
The Spark SQL `exp` function. String arguments are treated as column names.
#explode(*cols) ⇒ Column
The Spark SQL `explode` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#explode_outer(*cols) ⇒ Column
The Spark SQL `explode_outer` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#expm1(*cols) ⇒ Column
The Spark SQL `expm1` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#expr(sql) ⇒ Column
Parse a SQL expression string into a Column.
# File 'lib/spark_connect/functions.rb', line 37
def expr(sql)
  Column.from_expr(Proto::Expression.new(expression_string: Proto::Expression::ExpressionString.new(expression: sql)))
end
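The listing above wraps the SQL text in two nested messages without inspecting it on the client. The sketch below mirrors only that nesting. `ExpressionString`, `Expression`, and `sketch_expr` are hypothetical Struct-based stand-ins, not the generated protobuf classes.

```ruby
# Hypothetical stand-ins for the protobuf messages; only the nesting is mirrored.
ExpressionString = Struct.new(:expression, keyword_init: true)
Expression = Struct.new(:expression_string, keyword_init: true)

# Wrap raw SQL text in the two nested messages, as the listing above does.
def sketch_expr(sql)
  Expression.new(expression_string: ExpressionString.new(expression: sql))
end

message = sketch_expr("salary * 1.1")
puts message.expression_string.expression
```

Because the string is carried verbatim, a syntax error in the SQL would presumably surface only when the plan reaches the server.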
#factorial(*cols) ⇒ Column
The Spark SQL `factorial` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#filter(col, &block) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 248
def filter(col, &block) = Column.invoke("filter", _col(col), _lambda(block))
#first(*cols) ⇒ Column
The Spark SQL `first` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#first_value(*cols) ⇒ Column
The Spark SQL `first_value` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#flatten(*cols) ⇒ Column
The Spark SQL `flatten` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#floor(*cols) ⇒ Column
The Spark SQL `floor` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#forall(col, &block) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 247
def forall(col, &block) = Column.invoke("forall", _col(col), _lambda(block))
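`exists`, `forall`, and `filter` correspond to Spark SQL's higher-order array functions, which apply a predicate to each element of an array column. As a rough local analogy only (plain Ruby arrays, no Spark session, and ignoring Spark's null handling), their results resemble `any?`, `all?`, and `select`:

```ruby
# Plain-Ruby analogy for the element-wise semantics; no Spark session involved.
values = [1, 5, 12]
big = ->(x) { x > 10 }

any_big  = values.any?(&big)    # analogous to exists: at least one element matches
all_big  = values.all?(&big)    # analogous to forall: every element matches
only_big = values.select(&big)  # analogous to filter: keep the matching elements

puts any_big
puts all_big
puts only_big.inspect
```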
#format_number(col, d) ⇒ Column
Returns the number formatted to `d` decimal places.
# File 'lib/spark_connect/functions.rb', line 111
def format_number(col, d) = Column.invoke("format_number", _col(col), lit(d))
#format_string(fmt, *cols) ⇒ Column
Formats the given columns printf-style, using the literal format string `fmt`.
# File 'lib/spark_connect/functions.rb', line 109
def format_string(fmt, *cols) = Column.invoke("format_string", lit(fmt), *cols.map { |c| _col(c) })
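To give a feel for the output of the two formatting functions, here is a plain-Ruby approximation. `sketch_format_number` is a hypothetical helper, not part of the library. It assumes comma grouping of the integer digits, and its rounding of exact ties may differ from Spark's.

```ruby
# Plain-Ruby approximation of the formatting results; not the Spark implementation.

# Round to d decimal places and group the integer digits with commas.
def sketch_format_number(value, d)
  int_part, frac_part = format("%.#{d}f", value).split(".")
  grouped = int_part.gsub(/(\d)(?=(\d{3})+\z)/, '\1,')
  frac_part ? "#{grouped}.#{frac_part}" : grouped
end

puts sketch_format_number(12345.678, 2)

# printf-style substitution, similar in spirit to format_string's literal fmt.
puts format("%s earns %d", "Ana", 4200)
```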
#from_json(col, schema, options = {}) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 180
def from_json(col, schema, options = {})
  schema_col = schema.is_a?(Types::DataType) ? lit(schema.json) : lit(schema.to_s)
  args = [_col(col), schema_col] + options.flat_map { |k, v| [lit(k.to_s), lit(v.to_s)] }
  Column.invoke("from_json", *args)
end
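The listing above appends the options to the argument list as alternating key and value literals, converting both to strings. The sketch below isolates that flattening step. `sketch_lit` and `sketch_option_args` are hypothetical stand-ins; the real `lit` returns a Column.

```ruby
# Stand-in for the literal wrapper; the real `lit` returns a Column.
def sketch_lit(value)
  [:lit, value]
end

# Flatten an options Hash into alternating key/value literals, stringifying both.
def sketch_option_args(options)
  options.flat_map { |k, v| [sketch_lit(k.to_s), sketch_lit(v.to_s)] }
end

p sketch_option_args({ mode: "PERMISSIVE", allowComments: true })
```

Since every value passes through `to_s`, Symbol keys and boolean values are sent as their string forms.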
#from_unixtime(col, fmt = "yyyy-MM-dd HH:mm:ss") ⇒ Object
# File 'lib/spark_connect/functions.rb', line 164
def from_unixtime(col, fmt = "yyyy-MM-dd HH:mm:ss") = Column.invoke("from_unixtime", _col(col), lit(fmt))
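The default pattern `yyyy-MM-dd HH:mm:ss` renders epoch seconds as a date and time string. A plain-Ruby analogy follows. `sketch_from_unixtime` is a hypothetical helper that assumes UTC (Spark uses the session time zone) and uses `strftime` directives rather than Spark's pattern letters.

```ruby
# Plain-Ruby analogy; assumes UTC, whereas Spark uses the session time zone.
# "%Y-%m-%d %H:%M:%S" is the strftime counterpart of "yyyy-MM-dd HH:mm:ss".
def sketch_from_unixtime(seconds, fmt = "%Y-%m-%d %H:%M:%S")
  Time.at(seconds).utc.strftime(fmt)
end

puts sketch_from_unixtime(0)
puts sketch_from_unixtime(86_400, "%Y-%m-%d")
```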
#from_utc_timestamp(col, tz) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 170
def from_utc_timestamp(col, tz) = Column.invoke("from_utc_timestamp", _col(col), lit(tz))
#get_json_object(col, path) ⇒ Object
Extracts the JSON value at the literal JSON path `path` from the JSON string in `col`.
# File 'lib/spark_connect/functions.rb', line 176
def get_json_object(col, path) = Column.invoke("get_json_object", _col(col), lit(path))
#greatest(*cols) ⇒ Column
The Spark SQL `greatest` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#grouping(*cols) ⇒ Column
The Spark SQL `grouping` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#hash(*cols) ⇒ Column
The Spark SQL `hash` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#hex(*cols) ⇒ Column
The Spark SQL `hex` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#hour(*cols) ⇒ Column
The Spark SQL `hour` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#hypot(*cols) ⇒ Column
The Spark SQL `hypot` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#initcap(*cols) ⇒ Column
The Spark SQL `initcap` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#inline(*cols) ⇒ Column
The Spark SQL `inline` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#inline_outer(*cols) ⇒ Column
The Spark SQL `inline_outer` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#input_file_block_length ⇒ Column
The Spark SQL `input_file_block_length` function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822
#input_file_block_start ⇒ Column
The Spark SQL `input_file_block_start` function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822
#input_file_name ⇒ Column
The Spark SQL `input_file_name` function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822
#instr(col, substr) ⇒ Column
Returns 1-based position of literal `substr` within `col` (0 if absent).
# File 'lib/spark_connect/functions.rb', line 117
def instr(col, substr) = Column.invoke("instr", _col(col), lit(substr))
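The return convention (1-based, 0 when absent) can be modelled in plain Ruby. This is only a sketch: `instr_model` is a hypothetical helper that works on ordinary strings, whereas the library's `instr` builds a Column that Spark evaluates. The nil handling is an assumption based on Spark's usual null propagation.

```ruby
# Hypothetical plain-Ruby model of the documented instr semantics.
# It works on ordinary strings; the real function returns a Column.
def instr_model(str, substr)
  return nil if str.nil? || substr.nil? # assumption: null in, null out
  index = str.index(substr)             # 0-based index, or nil if absent
  index.nil? ? 0 : index + 1            # 1-based position, 0 if absent
end

found  = instr_model("spark connect", "con") # position of the first match
absent = instr_model("spark connect", "xyz") # no match
```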
#isnan(*cols) ⇒ Column
The Spark SQL `isnan` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#isnull(*cols) ⇒ Column
The Spark SQL `isnull` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#json_tuple(col, *fields) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 177
def json_tuple(col, *fields) = Column.invoke("json_tuple", _col(col), *fields.map { |f| lit(f) })
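The source line passes every entry of `fields` through `lit`, so field names are literal strings rather than column names. The sketch below models the result in plain Ruby, assuming Spark's usual `json_tuple` behaviour: one value per requested top-level field, returned as text, and nil for a missing field or unparseable JSON. `json_tuple_model` is a hypothetical helper, not part of the library.

```ruby
require "json"

# Hypothetical plain-Ruby model of json_tuple on a single JSON string.
# Returns one entry per requested top-level field.
def json_tuple_model(json, *fields)
  parsed = JSON.parse(json)
  return fields.map { nil } unless parsed.is_a?(Hash)

  fields.map do |field|
    value = parsed[field]
    case value
    when nil    then nil           # missing field (or JSON null)
    when String then value         # strings come back unquoted
    else             value.to_json # assumption: other values come back as JSON text
    end
  end
rescue JSON::ParserError, TypeError
  fields.map { nil }               # assumption: invalid input yields all nils
end

row = json_tuple_model('{"a": 1, "b": "x"}', "a", "b", "c")
```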
#kurtosis(*cols) ⇒ Column
The Spark SQL `kurtosis` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#lag(col, offset = 1, default = nil) ⇒ Object
---- Window / analytic functions ----
# File 'lib/spark_connect/functions.rb', line 225
def lag(col, offset = 1, default = nil) = Column.invoke("lag", _col(col), lit(offset), lit(default))
#last(*cols) ⇒ Column
The Spark SQL `last` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#last_day(*cols) ⇒ Column
The Spark SQL `last_day` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
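Spark's `last_day` returns the final day of the month that contains the given date. A minimal plain-Ruby model follows. `last_day_model` is a hypothetical helper operating on a Ruby `Date`; the library function itself takes a column (or column name) and returns a Column.

```ruby
require "date"

# Hypothetical plain-Ruby model: last day of the month containing `date`.
def last_day_model(date)
  # A day of -1 tells Date.new to count from the end of the month.
  Date.new(date.year, date.month, -1)
end

leap_feb = last_day_model(Date.new(2024, 2, 10))
```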
#last_value(*cols) ⇒ Column
The Spark SQL `last_value` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#lcase(*cols) ⇒ Column
The Spark SQL `lcase` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#lead(col, offset = 1, default = nil) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 226
def lead(col, offset = 1, default = nil) = Column.invoke("lead", _col(col), lit(offset), lit(default))
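`lag` and `lead` share the same signature: `offset` and `default` are passed through `lit`, so they are literal values, while `col` is a column or column name. Spark evaluates both over an ordered window. The sketch below models one such ordered partition as a Ruby array; `lag_model` and `lead_model` are hypothetical helpers, and how a window specification is attached to the returned Column is not shown on this page.

```ruby
# Hypothetical plain-Ruby model of lag/lead over one ordered partition.
# `rows` stands in for the values of `col` in window order.
def lag_model(rows, offset = 1, default = nil)
  rows.each_index.map do |i|
    j = i - offset # row `offset` positions before the current one
    j.between?(0, rows.length - 1) ? rows[j] : default
  end
end

# lead looks forward instead of backward, so it is lag with a negated offset.
def lead_model(rows, offset = 1, default = nil)
  lag_model(rows, -offset, default)
end

previous_values = lag_model([10, 20, 30])
next_values     = lead_model([10, 20, 30], 1, 0)
```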
#least(*cols) ⇒ Column
The Spark SQL `least` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#length(*cols) ⇒ Column
The Spark SQL `length` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#lit(value) ⇒ Column
A literal value column. See Column.lit for supported Ruby types.
# File 'lib/spark_connect/functions.rb', line 33
def lit(value) = Column.lit(value)
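The source lines on this page route each argument through either `_col` (column or column name) or `lit` (literal value). The toy model below illustrates that split. Everything in it is a stand-in: `ColumnRef`, `Literal`, `col_or_name`, `literal`, and `instr_args` are hypothetical names, and the behaviour of the real `_col` for non-String arguments is an assumption.

```ruby
# Toy stand-ins, NOT the library's classes: they only model the
# "String means column name" versus "String means literal" split.
ColumnRef = Struct.new(:name)
Literal   = Struct.new(:value)

# Assumed behaviour of a column-or-name helper:
# Strings become column references, anything else passes through.
def col_or_name(arg)
  arg.is_a?(String) ? ColumnRef.new(arg) : arg
end

# Literal helper: the value is wrapped as-is, even when it is a String.
def literal(value)
  Literal.new(value)
end

# Mirrors the argument handling in the instr source line:
# `col` takes the column-or-name path, `substr` the literal path.
def instr_args(col, substr)
  [col_or_name(col), literal(substr)]
end

args = instr_args("name", "a")
```

Under this model the same Ruby String ends up as a column reference in one position and as a literal in the other, which is the distinction the module overview describes.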
#ln(*cols) ⇒ Column
The Spark SQL `ln` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#locate(substr, col, pos = 1) ⇒ Column
Returns 1-based position of `substr` in `col` at/after `pos`.
# File 'lib/spark_connect/functions.rb', line 119
def locate(substr, col, pos = 1) = Column.invoke("locate", lit(substr), _col(col), lit(pos))
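Note the argument order: `locate` takes the literal `substr` first and the column second, the reverse of `instr`. A plain-Ruby model of the positional search is sketched below. `locate_model` is a hypothetical helper on ordinary strings, and returning 0 for a `pos` below 1 is an assumption about Spark's behaviour.

```ruby
# Hypothetical plain-Ruby model of locate on ordinary strings.
def locate_model(substr, str, pos = 1)
  return 0 if pos < 1                # assumption: positions below 1 yield 0
  index = str.index(substr, pos - 1) # search from the 0-based start offset
  index.nil? ? 0 : index + 1         # 1-based position, 0 if absent
end

first_a = locate_model("a", "banana")    # search from the start
later_a = locate_model("a", "banana", 3) # search at/after position 3
```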
#log(*cols) ⇒ Column
The Spark SQL `log` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#log10(*cols) ⇒ Column
The Spark SQL `log10` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#log1p(*cols) ⇒ Column
The Spark SQL `log1p` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#log2(*cols) ⇒ Column
The Spark SQL `log2` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#lower(*cols) ⇒ Column
The Spark SQL `lower` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#lpad(col, len, pad) ⇒ Column
Returns left-padded string.
# File 'lib/spark_connect/functions.rb', line 121
def lpad(col, len, pad) = Column.invoke("lpad", _col(col), lit(len), lit(pad))
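Here `len` and `pad` are literals and only `col` is a column or column name. The sketch below models left-padding in plain Ruby, assuming Spark's usual `lpad` behaviour of truncating the input when it is longer than `len`. `lpad_model` is a hypothetical helper, and its handling of an empty `pad` or a non-positive `len` is an assumption.

```ruby
# Hypothetical plain-Ruby model of lpad on ordinary strings.
def lpad_model(str, len, pad)
  return "" if len <= 0                                  # assumption: nothing to keep
  return str[0, len] if str.length >= len || pad.empty?  # truncate, or no padding available
  str.rjust(len, pad)                                    # repeat `pad` on the left up to `len`
end

padded    = lpad_model("hi", 5, "?")
truncated = lpad_model("hi", 1, "?")
```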
#ltrim(*cols) ⇒ Column
The Spark SQL `ltrim` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#make_date(year, month, day) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 172
def make_date(year, month, day) = Column.invoke("make_date", _col(year), _col(month), _col(day))
#map_concat(*cols) ⇒ Column
The Spark SQL `map_concat` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#map_contains_key(col, key) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 221
def map_contains_key(col, key) = Column.invoke("map_contains_key", _col(col), lit(key))
#map_entries(*cols) ⇒ Column
The Spark SQL `map_entries` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#map_filter(col, &block) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 252
def map_filter(col, &block) = Column.invoke("map_filter", _col(col), _lambda(block))
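`map_filter` keeps the map entries for which the block's predicate holds. In plain Ruby the same idea is `Hash#select`, as sketched below. `map_filter_model` is a hypothetical helper; in the library the block is converted by `_lambda`, so it presumably receives Column arguments and must return a Column expression rather than a Ruby boolean (an assumption, since `_lambda` is not documented here).

```ruby
# Hypothetical plain-Ruby model of map_filter on an ordinary Hash.
# The block receives each key and value and returns true to keep the entry.
def map_filter_model(map)
  map.select { |key, value| yield(key, value) }
end

large = map_filter_model({ "a" => 1, "b" => 5 }) { |_key, value| value > 2 }
```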
#map_from_arrays(keys, values) ⇒ Column
Returns a map from two array columns (keys, values).
# File 'lib/spark_connect/functions.rb', line 100
def map_from_arrays(keys, values) = Column.invoke("map_from_arrays", _col(keys), _col(values))
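Both arguments go through `_col`, so String arguments name array columns. The pairing itself can be modelled in plain Ruby with `zip`, as in the sketch below. `map_from_arrays_model` is a hypothetical helper; rejecting arrays of different lengths mirrors what Spark is assumed to do at runtime, and the exact error it raises is not modelled.

```ruby
# Hypothetical plain-Ruby model of map_from_arrays on ordinary arrays.
def map_from_arrays_model(keys, values)
  unless keys.length == values.length
    # Assumption: mismatched lengths are an error rather than padded with nil.
    raise ArgumentError, "keys and values must have the same length"
  end
  keys.zip(values).to_h # pair the i-th key with the i-th value
end

pairs = map_from_arrays_model(%w[a b], [1, 2])
```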
#map_from_entries(*cols) ⇒ Column
The Spark SQL `map_from_entries` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#map_keys(*cols) ⇒ Column
The Spark SQL `map_keys` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#map_values(*cols) ⇒ Column
The Spark SQL `map_values` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#map_zip_with(c1, c2, &block) ⇒ Column
Merges two map columns key by key, applying the block to each key and its two values.
# File 'lib/spark_connect/functions.rb', line 253
def map_zip_with(c1, c2, &block) = Column.invoke("map_zip_with", _col(c1), _col(c2), _lambda(block))
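As an illustration of the merge semantics, the sketch below models `map_zip_with` on plain Ruby hashes for a single row. `map_zip_with_model` is a hypothetical name; the model assumes the key sets are unioned and that a key missing from one map contributes nil.

```ruby
# Hypothetical per-row model of map_zip_with on plain hashes.
def map_zip_with_model(m1, m2)
  # Union of keys; a key absent from one map yields nil for that side.
  (m1.keys | m2.keys).to_h { |k| [k, yield(k, m1[k], m2[k])] }
end

map_zip_with_model({ "a" => 1, "b" => 2 }, { "a" => 10 }) { |_k, v1, v2| (v1 || 0) + (v2 || 0) }
# => {"a"=>11, "b"=>2}
```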
#max(*cols) ⇒ Column
The Spark SQL `max` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#max_by(*cols) ⇒ Column
The Spark SQL `max_by` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#md5(*cols) ⇒ Column
The Spark SQL `md5` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#mean(*cols) ⇒ Column
The Spark SQL `mean` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#median(*cols) ⇒ Column
The Spark SQL `median` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#min(*cols) ⇒ Column
The Spark SQL `min` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#min_by(*cols) ⇒ Column
The Spark SQL `min_by` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#minute(*cols) ⇒ Column
The Spark SQL `minute` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#mode(*cols) ⇒ Column
The Spark SQL `mode` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#monotonically_increasing_id ⇒ Column
The Spark SQL `monotonically_increasing_id` function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#month(*cols) ⇒ Column
The Spark SQL `month` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
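#months_between(d1, d2, round_off = true) ⇒ Column
Returns the number of months between `d1` and `d2`. When `round_off` is true, the result is rounded to 8 decimal places. `round_off` is passed as a literal.
# File 'lib/spark_connect/functions.rb', line 160
def months_between(d1, d2, round_off = true) = Column.invoke("months_between", _col(d1), _col(d2), lit(round_off))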
#named_struct(*cols) ⇒ Column
Returns a named struct from alternating name/value arguments.
# File 'lib/spark_connect/functions.rb', line 102
def named_struct(*cols) = Column.invoke("named_struct", *cols.map { |c| _col(c) })
#nanvl(col1, col2) ⇒ Column
Returns `col1` if it is not NaN, otherwise `col2`.
# File 'lib/spark_connect/functions.rb', line 89
def nanvl(col1, col2) = Column.invoke("nanvl", _col(col1), _col(col2))
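The selection rule can be pictured per row in plain Ruby. `nanvl_model` below is a hypothetical helper that operates on Ruby floats rather than columns; it is a sketch of the semantics, not of the library call.

```ruby
# Hypothetical per-row model of nanvl on Ruby numbers.
def nanvl_model(value, fallback)
  # Only Float can be NaN in Ruby; anything else is returned as-is.
  value.is_a?(Float) && value.nan? ? fallback : value
end

nanvl_model(Float::NAN, 0.0) # => 0.0
nanvl_model(1.5, 0.0)        # => 1.5
```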
#negate(*cols) ⇒ Column
The Spark SQL `negate` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#negative(*cols) ⇒ Column
The Spark SQL `negative` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
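#next_day(col, day_of_week) ⇒ Column
Returns the first date later than `col` that falls on the given day of the week. `day_of_week` is passed as a literal (e.g. "Mon"), not a column name.
# File 'lib/spark_connect/functions.rb', line 161
def next_day(col, day_of_week) = Column.invoke("next_day", _col(col), lit(day_of_week))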
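The "strictly later" rule is easy to get wrong when the input date already falls on the requested weekday. The sketch below models it with Ruby's `Date`. `next_day_model` and `DAY_INDEX` are hypothetical names, and the model only accepts English day names of three or more letters, which is narrower than what Spark itself accepts.

```ruby
require "date"

# Hypothetical lookup from a three-letter day prefix to Date#wday (Sunday = 0).
DAY_INDEX = { "sun" => 0, "mon" => 1, "tue" => 2, "wed" => 3,
              "thu" => 4, "fri" => 5, "sat" => 6 }.freeze

# Hypothetical per-row model of next_day.
def next_day_model(date, day_of_week)
  target = DAY_INDEX.fetch(day_of_week.downcase[0, 3])
  delta = (target - date.wday) % 7
  # Same weekday means a full week ahead, never the input date itself.
  delta = 7 if delta.zero?
  date + delta
end

next_day_model(Date.new(2015, 1, 14), "Tue") # => 2015-01-20
```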
#now ⇒ Column
The Spark SQL `now` function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
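#nth_value(col, offset, ignore_nulls = false) ⇒ Column
Window function: returns the value of `col` at the `offset`-th row of the window frame (counting from 1), or null if the frame has fewer rows. `offset` and `ignore_nulls` are passed as literals.
# File 'lib/spark_connect/functions.rb', line 228
def nth_value(col, offset, ignore_nulls = false) = Column.invoke("nth_value", _col(col), lit(offset), lit(ignore_nulls))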
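The effect of `ignore_nulls` is clearest on a small frame. The following plain-Ruby sketch treats a window frame as an array; `nth_value_model` is a hypothetical name and the model is an assumption about the semantics, not the library's code.

```ruby
# Hypothetical model of nth_value over one window frame given as an Array.
def nth_value_model(frame, offset, ignore_nulls = false)
  raise ArgumentError, "offset must be positive" unless offset.positive?

  # With ignore_nulls, nil entries are skipped before counting.
  values = ignore_nulls ? frame.compact : frame
  values[offset - 1] # nil when the frame is too short
end

nth_value_model([nil, 10, 20], 2)       # => 10
nth_value_model([nil, 10, 20], 2, true) # => 20
```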
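#ntile(n) ⇒ Column
Window function: returns the ntile group id (from 1 to `n` inclusive) of each row in an ordered window partition. `n` is passed as a literal.
# File 'lib/spark_connect/functions.rb', line 227
def ntile(n) = Column.invoke("ntile", lit(n))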
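When the partition size is not a multiple of `n`, the earlier buckets receive the extra rows. The sketch below models that assignment for an already ordered partition; `ntile_model` is a hypothetical name used only for illustration.

```ruby
# Hypothetical model: bucket ids for an ordered partition of row_count rows.
def ntile_model(row_count, n)
  base, extra = row_count.divmod(n)
  (1..n).flat_map do |bucket|
    # The first `extra` buckets each take one additional row.
    size = base + (bucket <= extra ? 1 : 0)
    [bucket] * size
  end
end

ntile_model(5, 2) # => [1, 1, 1, 2, 2]
```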
#octet_length(*cols) ⇒ Column
The Spark SQL `octet_length` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#overlay(col, replace, pos, len = -1) ⇒ Column
Overlays `replace` onto `col` starting at position `pos`, replacing `len` characters. `pos` and `len` are passed as literals.
# File 'lib/spark_connect/functions.rb', line 141
def overlay(col, replace, pos, len = -1) = Column.invoke("overlay", _col(col), _col(replace), lit(pos), lit(len))
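The interaction between `pos` and `len` is easier to see on plain strings. The sketch below is a hypothetical model, `overlay_model`, that assumes `pos` is 1-based and at least 1, and that a negative `len` means "the length of `replace`", matching the default of -1.

```ruby
# Hypothetical per-row model of overlay on Ruby strings (pos is 1-based, pos >= 1).
def overlay_model(input, replace, pos, len = -1)
  # A negative len falls back to the length of the replacement.
  len = replace.length if len.negative?
  head = input[0, pos - 1] || ""
  tail = input[(pos - 1 + len)..] || ""
  head + replace + tail
end

overlay_model("Spark SQL", "_", 6)        # => "Spark_SQL"
overlay_model("Spark SQL", "ANSI ", 7, 0) # => "Spark ANSI SQL"
```

With `len` set to 0 nothing is removed, so the call acts as an insertion at `pos`.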
#percent_rank ⇒ Column
The Spark SQL `percent_rank` function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#pmod(*cols) ⇒ Column
The Spark SQL `pmod` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#posexplode(*cols) ⇒ Column
The Spark SQL `posexplode` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically; see the UNIFORM constant)
#posexplode_outer(*cols) ⇒ Column
The Spark SQL `posexplode_outer` function. String arguments are treated as column names.
#positive(*cols) ⇒ Column
The Spark SQL `positive` function. String arguments are treated as column names.
#pow(*cols) ⇒ Column
The Spark SQL `pow` function. String arguments are treated as column names.
#power(*cols) ⇒ Column
The Spark SQL `power` function. String arguments are treated as column names.
#product(*cols) ⇒ Column
The Spark SQL `product` function. String arguments are treated as column names.
#quarter(*cols) ⇒ Column
The Spark SQL `quarter` function. String arguments are treated as column names.
#radians(*cols) ⇒ Column
The Spark SQL `radians` function. String arguments are treated as column names.
#rand(seed = nil) ⇒ Object
Randomness. Invokes the Spark SQL `rand` function; a non-nil `seed` is passed as a literal argument.
# File 'lib/spark_connect/functions.rb', line 236
def rand(seed = nil) = seed.nil? ? Column.invoke("rand") : Column.invoke("rand", lit(seed))
#randn(seed = nil) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 237
def randn(seed = nil) = seed.nil? ? Column.invoke("randn") : Column.invoke("randn", lit(seed))
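Both `rand` and `randn` omit the seed argument entirely when `seed` is nil instead of sending a null literal. The sketch below reproduces only that dispatch. `RecordedCall` and `sketch_rand` are hypothetical names used for illustration; the real methods build a Column through `Column.invoke`.

```ruby
# Hypothetical stand-in that records a function call; not the real Column.invoke.
RecordedCall = Struct.new(:fn, :args)

# Mirrors the dispatch in the source: no argument at all when seed is nil.
def sketch_rand(seed = nil)
  seed.nil? ? RecordedCall.new("rand", []) : RecordedCall.new("rand", [seed])
end

unseeded = sketch_rand
seeded = sketch_rand(42)

puts unseeded.args.length # 0
puts seeded.args.inspect  # [42]
```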
#rank ⇒ Column
The Spark SQL `rank` function (takes no arguments).
#regexp_count(col, pattern) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 138
def regexp_count(col, pattern) = Column.invoke("regexp_count", _col(col), lit(pattern))
#regexp_extract(col, pattern, idx = 0) ⇒ Column
Returns the `idx`-th group of `pattern` matched in `col`.
# File 'lib/spark_connect/functions.rb', line 131
def regexp_extract(col, pattern, idx = 0) = Column.invoke("regexp_extract", _col(col), lit(pattern), lit(idx))
#regexp_extract_all(col, pattern, idx = 1) ⇒ Column
Returns all matches of group `idx` of `pattern`.
# File 'lib/spark_connect/functions.rb', line 133
def regexp_extract_all(col, pattern, idx = 1) = Column.invoke("regexp_extract_all", _col(col), lit(pattern), lit(idx))
#regexp_like(col, pattern) ⇒ Column
Returns whether `col` matches `pattern`.
# File 'lib/spark_connect/functions.rb', line 137
def regexp_like(col, pattern) = Column.invoke("regexp_like", _col(col), lit(pattern))
#regexp_replace(col, pattern, replacement) ⇒ Column
Returns `col` with `pattern` replaced by `replacement`.
# File 'lib/spark_connect/functions.rb', line 135
def regexp_replace(col, pattern, replacement) = Column.invoke("regexp_replace", _col(col), lit(pattern), lit(replacement))
#regexp_substr(col, pattern) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 139
def regexp_substr(col, pattern) = Column.invoke("regexp_substr", _col(col), lit(pattern))
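The regexp functions above all follow the same split: the first argument goes through `_col` (a String names a column), while the pattern and any replacement go through `lit` (a String is a literal value). The sketch below illustrates that split with hypothetical stand-ins; `ColRef`, `Literal`, `sketch_col`, and `sketch_lit` are assumptions made for illustration and do not reflect the library's actual classes.

```ruby
# Hypothetical stand-ins: a column reference and a literal value.
ColRef = Struct.new(:name)
Literal = Struct.new(:value)

# A String is read as a column name unless it is already a column reference.
def sketch_col(arg)
  arg.is_a?(ColRef) ? arg : ColRef.new(arg.to_s)
end

# Any value is wrapped as a literal, never interpreted as a column name.
def sketch_lit(value)
  Literal.new(value)
end

def sketch_regexp_replace(col, pattern, replacement)
  ["regexp_replace", sketch_col(col), sketch_lit(pattern), sketch_lit(replacement)]
end

call = sketch_regexp_replace("full_name", "\\s+", " ")
puts call[1].class # ColRef: "full_name" is a column name
puts call[2].class # Literal: the pattern is a literal string
```

Under this convention, passing a column name where a pattern is expected would be treated as the literal pattern text, not as a column lookup.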
#repeat(col, n) ⇒ Column
Returns the string repeated `n` times.
# File 'lib/spark_connect/functions.rb', line 125
def repeat(col, n) = Column.invoke("repeat", _col(col), lit(n))
#reverse(*cols) ⇒ Column
The Spark SQL `reverse` function. String arguments are treated as column names.
#rint(*cols) ⇒ Column
The Spark SQL `rint` function. String arguments are treated as column names.
#round(col, scale = 0) ⇒ Column
Returns `col` rounded to `scale` decimal places using HALF_UP rounding.
# File 'lib/spark_connect/functions.rb', line 80
def round(col, scale = 0) = Column.invoke("round", _col(col), lit(scale))
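HALF_UP breaks ties away from zero, whereas the HALF_EVEN ("banker's") rule breaks ties toward the nearest even digit, so the two differ only at exact midpoints. The sketch below shows the difference locally with plain Ruby `Float#round`, not with Spark, on the assumption that the tie-breaking rules correspond.

```ruby
# Ruby's Float#round takes a half: keyword that selects the tie-breaking rule.
midpoints = [0.5, 1.5, 2.5, 3.5]

half_up   = midpoints.map { |x| x.round(half: :up) }
half_even = midpoints.map { |x| x.round(half: :even) }

puts half_up.inspect   # [1, 2, 3, 4]
puts half_even.inspect # [0, 2, 2, 4]
```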
#row_number ⇒ Column
The Spark SQL `row_number` function (takes no arguments).
#rpad(col, len, pad) ⇒ Column
Returns `col` right-padded with `pad` to a length of `len`.
# File 'lib/spark_connect/functions.rb', line 123
def rpad(col, len, pad) = Column.invoke("rpad", _col(col), lit(len), lit(pad))
#rtrim(*cols) ⇒ Column
The Spark SQL `rtrim` function. String arguments are treated as column names.
#schema_of_json(json, options = {}) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 191
def schema_of_json(json, options = {})
  Column.invoke("schema_of_json", _lit_or_col(json), *options.flat_map { |k, v| [lit(k.to_s), lit(v.to_s)] })
end
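`schema_of_json` flattens its options hash into alternating key and value arguments, converting both to strings first. The flattening step can be tried in isolation with plain Ruby. The option names below are example inputs only, and the `lit` wrapping is left out so the snippet runs without the library.

```ruby
# Example options; symbol keys and non-string values are both allowed.
options = { allowComments: true, "mode" => "PERMISSIVE" }

# Each pair becomes two adjacent strings, preserving insertion order.
flat = options.flat_map { |k, v| [k.to_s, v.to_s] }

puts flat.inspect # ["allowComments", "true", "mode", "PERMISSIVE"]
```

Because every value passes through `to_s`, a boolean `true` reaches the server as the string `"true"`.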
#sec(*cols) ⇒ Column
The Spark SQL `sec` function. String arguments are treated as column names.
#second(*cols) ⇒ Column
The Spark SQL `second` function. String arguments are treated as column names.
#sequence(start, stop, step = nil) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 217
def sequence(start, stop, step = nil)
  step.nil? ? Column.invoke("sequence", _col(start), _col(stop)) : Column.invoke("sequence", _col(start), _col(stop), _col(step))
end
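Note that `sequence` passes all three arguments through `_col`, so a String here names a column, not a literal. When `step` is omitted, the choice of step is left to Spark. The local sketch below imitates the resulting values for integer inputs, assuming Spark's documented behavior of inclusive bounds and a default step of 1 or -1 depending on direction. `local_sequence` is a hypothetical helper, not part of the library.

```ruby
# Local imitation of the values sequence is expected to produce for integers.
# Assumes inclusive bounds and a default step of 1 (ascending) or -1 (descending).
def local_sequence(start, stop, step = nil)
  step ||= start <= stop ? 1 : -1
  start.step(stop, step).to_a
end

puts local_sequence(1, 5).inspect    # [1, 2, 3, 4, 5]
puts local_sequence(5, 1).inspect    # [5, 4, 3, 2, 1]
puts local_sequence(1, 5, 2).inspect # [1, 3, 5]
```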
#sha(*cols) ⇒ Column
The Spark SQL `sha` function. String arguments are treated as column names.
#sha1(*cols) ⇒ Column
The Spark SQL `sha1` function. String arguments are treated as column names.
#sha2(col, num_bits) ⇒ Column
Returns the SHA-2 hash of `col` with the given bit length (224, 256, 384, or 512).
# File 'lib/spark_connect/functions.rb', line 143
def sha2(col, num_bits) = Column.invoke("sha2", _col(col), lit(num_bits))
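The `num_bits` argument selects the SHA-2 variant and therefore the digest length. As a local point of comparison, the sketch below computes hex digests with Ruby's standard `digest` library instead of Spark. It assumes the Spark result is a hex string and covers only the 256, 384, and 512 variants; the `DIGESTS` table is a name introduced for illustration.

```ruby
require "digest"

# Map each bit length to the matching class from Ruby's digest library.
DIGESTS = {
  256 => Digest::SHA256,
  384 => Digest::SHA384,
  512 => Digest::SHA512
}.freeze

DIGESTS.each do |bits, klass|
  hex = klass.hexdigest("abc")
  # A hex digest has one character per 4 bits.
  puts "#{bits} bits -> #{hex.length} hex characters"
end
```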
#shiftleft(col, num_bits) ⇒ Column
Returns `col` shifted left by `num_bits`, a literal bit count. `shiftright` and `shiftrightunsigned` shift right by a literal bit count in the same way.
# File 'lib/spark_connect/functions.rb', line 147
def shiftleft(col, num_bits) = Column.invoke("shiftleft", _col(col), lit(num_bits))
#shiftright(col, num_bits) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 148
def shiftright(col, num_bits) = Column.invoke("shiftright", _col(col), lit(num_bits))
#shiftrightunsigned(col, num_bits) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 149
def shiftrightunsigned(col, num_bits) = Column.invoke("shiftrightunsigned", _col(col), lit(num_bits))
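The three shift functions differ mainly in how negative values are handled. The sketch below imitates them locally for a 32-bit integer, assuming Java-style semantics (`<<`, `>>`, and `>>>` on `int`). The helper names are hypothetical, and the behavior of the real functions is determined by Spark and the column's type.

```ruby
MASK32 = 0xFFFFFFFF

# Reinterpret the low 32 bits of n as a signed two's-complement integer.
def to_signed32(n)
  n &= MASK32
  n >= 0x80000000 ? n - 0x100000000 : n
end

# Left shift, wrapping around at 32 bits.
def shiftleft32(value, bits)
  to_signed32(value << (bits & 31))
end

# Arithmetic right shift: the sign bit is preserved.
def shiftright32(value, bits)
  value >> (bits & 31)
end

# Logical right shift: vacated high bits are filled with zeros.
def shiftrightunsigned32(value, bits)
  to_signed32((value & MASK32) >> (bits & 31))
end

puts shiftleft32(1, 3)           # 8
puts shiftright32(-8, 1)         # -4
puts shiftrightunsigned32(-8, 1) # 2147483644
```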
#shuffle(*cols) ⇒ Column
The Spark SQL `shuffle` function. String arguments are treated as column names.
#signum(*cols) ⇒ Column
The Spark SQL `signum` function. String arguments are treated as column names.
#sin(*cols) ⇒ Column
The Spark SQL `sin` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#sinh(*cols) ⇒ Column
The Spark SQL `sinh` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#size(*cols) ⇒ Column
The Spark SQL `size` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#skewness(*cols) ⇒ Column
The Spark SQL `skewness` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#slice(col, start, length) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 215
def slice(col, start, length) = Column.invoke("slice", _col(col), _lit_or_col(start), _lit_or_col(length))
#some(*cols) ⇒ Column
The Spark SQL `some` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#sort_array(col, asc = true) ⇒ Object
Sorting helper: sorts the array in `col`, ascending when `asc` is true (the default).
# File 'lib/spark_connect/functions.rb', line 232
def sort_array(col, asc = true) = Column.invoke("sort_array", _col(col), lit(asc))
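To make the effect of the `asc` flag concrete, here is a small local model on a plain Ruby array. It is not the SparkConnect API: `sort_array_model` is a hypothetical name, and the null placement follows Spark's documented behaviour for `sort_array` (nulls first when ascending, last when descending).

```ruby
# Local model of sort_array on a plain Ruby array (illustration only).
def sort_array_model(values, asc = true)
  # Separate nil (null) elements from the comparable ones.
  nils, present = values.partition(&:nil?)
  sorted = present.sort
  # Ascending: nulls first. Descending: nulls last.
  asc ? nils + sorted : sorted.reverse + nils
end

sort_array_model([3, nil, 1, 2])        # => [nil, 1, 2, 3]
sort_array_model([3, nil, 1, 2], false) # => [3, 2, 1, nil]
```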
#soundex(*cols) ⇒ Column
The Spark SQL `soundex` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#spark_partition_id ⇒ Column
The Spark SQL `spark_partition_id` function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822
#split(col, pattern, limit = -1) ⇒ Column
Returns `col` split by the literal regex `pattern`.
# File 'lib/spark_connect/functions.rb', line 127
def split(col, pattern, limit = -1) = Column.invoke("split", _col(col), lit(pattern), lit(limit))
#sqrt(*cols) ⇒ Column
The Spark SQL `sqrt` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#stddev(*cols) ⇒ Column
The Spark SQL `stddev` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#stddev_pop(*cols) ⇒ Column
The Spark SQL `stddev_pop` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#stddev_samp(*cols) ⇒ Column
The Spark SQL `stddev_samp` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#struct(*cols) ⇒ Column
Returns a struct from the given columns.
# File 'lib/spark_connect/functions.rb', line 94
def struct(*cols) = Column.invoke("struct", *cols.map { |c| _col(c) })
#substring(col, pos, len) ⇒ Column
Returns substring of length `len` from 1-based `pos`.
# File 'lib/spark_connect/functions.rb', line 113
def substring(col, pos, len) = Column.invoke("substring", _col(col), lit(pos), lit(len))
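Because Ruby strings are 0-based, the 1-based `pos` is easy to get wrong. The sketch below models the indexing on a plain Ruby String; `substring_model` is a hypothetical name, and the sketch deliberately covers only `pos >= 1`.

```ruby
# Local model of 1-based substring extraction (illustration only).
def substring_model(str, pos, len)
  raise ArgumentError, "this sketch only covers pos >= 1" if pos < 1

  # Shift the 1-based position to Ruby's 0-based index.
  # String#[] returns nil past the end; treat that as an empty result.
  str[pos - 1, len] || ""
end

substring_model("Spark SQL", 1, 5) # => "Spark"
substring_model("Spark SQL", 7, 3) # => "SQL"
```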
#substring_index(col, delim, count) ⇒ Column
Returns substring before the `count`-th occurrence of `delim`.
# File 'lib/spark_connect/functions.rb', line 115
def substring_index(col, delim, count) = Column.invoke("substring_index", _col(col), lit(delim), lit(count))
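As a rough local model of the `count` semantics, the following plain-Ruby sketch mirrors Spark's documented behaviour: a positive `count` keeps everything to the left of the `count`-th delimiter, and a negative `count` counts delimiters from the right and keeps what follows. `substring_index_model` is a hypothetical name, and the sketch assumes a non-empty `delim` other than a single space (Ruby's `split(" ")` has special whitespace handling).

```ruby
# Local model of substring_index on a plain Ruby String (illustration only).
def substring_index_model(str, delim, count)
  return "" if count.zero?

  # Limit -1 keeps trailing empty fields so that joining restores the input.
  parts = str.split(delim, -1)
  kept = count.positive? ? parts.first(count) : parts.last(-count)
  kept.join(delim)
end

substring_index_model("www.apache.org", ".", 2)  # => "www.apache"
substring_index_model("www.apache.org", ".", -1) # => "org"
```

If `count` exceeds the number of delimiters, the whole string comes back unchanged in this model.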
#sum(*cols) ⇒ Column
The Spark SQL `sum` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#sum_distinct(col) ⇒ Column
Returns sum of distinct values.
# File 'lib/spark_connect/functions.rb', line 75
def sum_distinct(col) = Column.invoke("sum", _col(col), is_distinct: true)
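The source shows `sum_distinct` as the ordinary `sum` invoked with `is_distinct: true`. A minimal local model of what that means for the values, using a plain Ruby array (`sum_distinct_model` is a hypothetical name, and the empty-input case is not modelled):

```ruby
# Local model: drop nulls, de-duplicate, then sum (illustration only).
def sum_distinct_model(values)
  values.compact.uniq.sum
end

sum_distinct_model([10, 10, 20, nil, 30, 30]) # => 60
```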
#tan(*cols) ⇒ Column
The Spark SQL `tan` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#tanh(*cols) ⇒ Column
The Spark SQL `tanh` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#timestamp_micros(*cols) ⇒ Column
The Spark SQL `timestamp_micros` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#timestamp_millis(*cols) ⇒ Column
The Spark SQL `timestamp_millis` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#timestamp_seconds(*cols) ⇒ Column
The Spark SQL `timestamp_seconds` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
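The three `timestamp_*` functions differ only in the unit of the epoch offset they take. The sketch below models that conversion with Ruby's `Time`; the `*_model` names are hypothetical, and the results are shown in UTC, whereas Spark renders timestamps in the session time zone.

```ruby
require "time"

# Local models of the epoch-to-timestamp conversions (illustration only).
# Rational keeps the sub-second part exact.
def timestamp_seconds_model(seconds) = Time.at(seconds).utc
def timestamp_millis_model(millis) = Time.at(Rational(millis, 1000)).utc
def timestamp_micros_model(micros) = Time.at(Rational(micros, 1_000_000)).utc

timestamp_seconds_model(1_700_000_000).iso8601        # => "2023-11-14T22:13:20Z"
timestamp_millis_model(1_700_000_000_123).iso8601(3)  # => "2023-11-14T22:13:20.123Z"
```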
#to_date(col, fmt = nil) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 154
def to_date(col, fmt = nil) = fmt ? Column.invoke("to_date", _col(col), lit(fmt)) : Column.invoke("to_date", _col(col))
#to_json(col, options = {}) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 186
def to_json(col, options = {})
  args = [_col(col)] + options.flat_map { |k, v| [lit(k.to_s), lit(v.to_s)] }
  Column.invoke("to_json", *args)
end
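The `flat_map` in `to_json` turns the options Hash into a flat, alternating key/value argument list, with both sides stringified. The plain-Ruby sketch below shows just that step without the `lit` wrapping; `flatten_options` is a hypothetical name, and the option names are examples only.

```ruby
# Flatten a Hash of options into [key, value, key, value, ...] strings
# (illustration only; the real method wraps each string in lit).
def flatten_options(options = {})
  options.flat_map { |key, value| [key.to_s, value.to_s] }
end

flatten_options({ timestampFormat: "dd/MM/yyyy", pretty: true })
# => ["timestampFormat", "dd/MM/yyyy", "pretty", "true"]
```

Note that non-String values such as `true` are sent as the string `"true"`.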
#to_timestamp(col, fmt = nil) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 155
def to_timestamp(col, fmt = nil) = fmt ? Column.invoke("to_timestamp", _col(col), lit(fmt)) : Column.invoke("to_timestamp", _col(col))
#to_utc_timestamp(col, tz) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 171
def to_utc_timestamp(col, tz) = Column.invoke("to_utc_timestamp", _col(col), lit(tz))
#transform(col) {|element| ... } ⇒ Column
# File 'lib/spark_connect/functions.rb', line 245
def transform(col, &block) = Column.invoke("transform", _col(col), _lambda(block))
#transform_keys(col, &block) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 250
def transform_keys(col, &block) = Column.invoke("transform_keys", _col(col), _lambda(block))
#transform_values(col, &block) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 251
def transform_values(col, &block) = Column.invoke("transform_values", _col(col), _lambda(block))
#translate(col, matching, replace) ⇒ Column
Returns characters of `col` matching `matching` replaced per `replace`.
# File 'lib/spark_connect/functions.rb', line 129
def translate(col, matching, replace) = Column.invoke("translate", _col(col), lit(matching), lit(replace))
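The character-by-character mapping can be modelled locally as follows. This sketch follows Spark's documented `translate` behaviour: the i-th character of `matching` maps to the i-th character of `replace`, and characters with no counterpart are dropped. `translate_model` is a hypothetical name; the sketch assumes the first occurrence wins when `matching` repeats a character.

```ruby
# Local model of translate on a plain Ruby String (illustration only).
def translate_model(str, matching, replace)
  mapping = {}
  matching.each_char.with_index do |ch, i|
    # replace[i] is nil when replace is shorter than matching.
    mapping[ch] = replace[i] unless mapping.key?(ch)
  end
  # Mapped characters are substituted (nil becomes ""), others are kept.
  str.each_char.map { |ch| mapping.key?(ch) ? mapping[ch].to_s : ch }.join
end

translate_model("AaBbCc", "abc", "123") # => "A1B2C3"
translate_model("AaBbCc", "abc", "1")   # => "A1BC"
```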
#trim(*cols) ⇒ Column
The Spark SQL `trim` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#trunc(col, fmt) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 162
def trunc(col, fmt) = Column.invoke("trunc", _col(col), lit(fmt))
#typeof(*cols) ⇒ Column
The Spark SQL `typeof` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#ucase(*cols) ⇒ Column
The Spark SQL `ucase` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)
#udf ⇒ Object
UDFs require a server-side execution environment (Python/Scala) and are not supported by the pure-Ruby client.
# File 'lib/spark_connect/functions.rb', line 273
def udf(*)
  raise NotImplementedError, "User-defined functions are not supported by the Ruby Spark Connect client"
end
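A caller that wants to degrade gracefully might guard the call as sketched below. `UdfDemo` is a hypothetical stand-in that copies the documented method body so the example runs without the gem. One Ruby detail worth noting: `NotImplementedError` descends from `ScriptError`, not `StandardError`, so a bare `rescue` does not catch it.

```ruby
# Hypothetical stand-in mirroring the documented body of `udf`.
module UdfDemo
  def self.udf(*)
    raise NotImplementedError,
          "User-defined functions are not supported by the Ruby Spark Connect client"
  end
end

# Name the exception class explicitly; a bare `rescue` only covers StandardError.
message =
  begin
    UdfDemo.udf(->(x) { x + 1 })
  rescue NotImplementedError => e
    e.message
  end

puts message
```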
#unbase64(*cols) ⇒ Column
The Spark SQL `unbase64` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)
#unhex(*cols) ⇒ Column
The Spark SQL `unhex` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)
#unix_date(*cols) ⇒ Column
The Spark SQL `unix_date` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)
#unix_micros(*cols) ⇒ Column
The Spark SQL `unix_micros` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)
#unix_millis(*cols) ⇒ Column
The Spark SQL `unix_millis` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)
#unix_seconds(*cols) ⇒ Column
The Spark SQL `unix_seconds` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)
#unix_timestamp(col = nil, fmt = "yyyy-MM-dd HH:mm:ss") ⇒ Object
# File 'lib/spark_connect/functions.rb', line 166
def unix_timestamp(col = nil, fmt = "yyyy-MM-dd HH:mm:ss")
  col.nil? ? Column.invoke("unix_timestamp") : Column.invoke("unix_timestamp", _col(col), lit(fmt))
end
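The two branches in `unix_timestamp` can be traced with a small stand-in. In this sketch `DemoColumn` and `TimestampDemo` are hypothetical, and the behaviour of `_col` and `lit` is assumed from how the overview describes column names and literals. It shows that `fmt` is passed only when a column is given.

```ruby
# Hypothetical stand-in for Column: records the function name and its arguments.
DemoColumn = Struct.new(:name, :args)

module TimestampDemo
  def self.invoke(name, *args)
    DemoColumn.new(name, args)
  end

  # Assumed helper: wraps a plain value as a literal expression.
  def self.lit(value)
    invoke("lit", value)
  end

  # Assumed helper: a String names a column, anything else passes through.
  def self._col(arg)
    arg.is_a?(String) ? invoke("col", arg) : arg
  end

  # Same branching as the documented method body.
  def self.unix_timestamp(col = nil, fmt = "yyyy-MM-dd HH:mm:ss")
    col.nil? ? invoke("unix_timestamp") : invoke("unix_timestamp", _col(col), lit(fmt))
  end
end

no_args  = TimestampDemo.unix_timestamp
with_col = TimestampDemo.unix_timestamp("created_at", "yyyy-MM-dd")

puts no_args.args.length  # prints 0
puts with_col.args.length # prints 2
```

One consequence of this shape is that a call like `unix_timestamp(nil, "yyyy")` silently ignores the format, since the zero-argument form is chosen whenever `col` is nil.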
#upper(*cols) ⇒ Column
The Spark SQL `upper` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)
#uuid ⇒ Column
The Spark SQL `uuid` function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#var_pop(*cols) ⇒ Column
The Spark SQL `var_pop` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)
#var_samp(*cols) ⇒ Column
The Spark SQL `var_samp` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)
#variance(*cols) ⇒ Column
The Spark SQL `variance` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)
#version ⇒ Column
The Spark SQL `version` function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#weekday(*cols) ⇒ Column
The Spark SQL `weekday` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)
#weekofyear(*cols) ⇒ Column
The Spark SQL `weekofyear` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)
#when(condition, value) ⇒ Column
Starts a CASE WHEN expression. Chain Column#when to add further branches and Column#otherwise to set the default.
# File 'lib/spark_connect/functions.rb', line 51
def when(condition, value)
  Column.invoke("when", condition, value)
end
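The chaining pattern can be pictured with a toy model. The sketch below is not the library's Column implementation: `DemoCase` and `WhenDemo` are hypothetical, conditions and values are plain SQL-like strings rather than Column objects, and the rendering to text is purely for illustration.

```ruby
# Hypothetical stand-in: accumulates CASE WHEN branches and renders SQL-like text.
class DemoCase
  def initialize(branches = [], default = nil)
    @branches = branches
    @default = default
  end

  # Returns a new object with one more branch; the receiver is left unchanged.
  def when(condition, value)
    DemoCase.new(@branches + [[condition, value]], @default)
  end

  # Returns a new object with the ELSE value set.
  def otherwise(value)
    DemoCase.new(@branches, value)
  end

  def to_s
    parts = @branches.map { |cond, val| "WHEN #{cond} THEN #{val}" }
    parts << "ELSE #{@default}" unless @default.nil?
    "CASE #{parts.join(' ')} END"
  end
end

module WhenDemo
  # Entry point: starts the chain with its first branch.
  def self.when(condition, value)
    DemoCase.new.when(condition, value)
  end
end

expr = WhenDemo.when("age < 13", "'child'")
               .when("age < 20", "'teen'")
               .otherwise("'adult'")
puts expr
```

Returning a fresh object from each step keeps partially built expressions reusable, which is one plausible reason for a chainable design.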
#xxhash64(*cols) ⇒ Column
The Spark SQL `xxhash64` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)
#year(*cols) ⇒ Column
The Spark SQL `year` function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list shown in the Constant Summary)