Class: DataDrain::Storage::S3

Inherits:
Base
  • Object
Defined in:
lib/data_drain/storage/s3.rb

Overview

Storage adapter implementation for Amazon S3.

Instance Attribute Summary

Attributes inherited from Base

#config

Instance Method Summary

Methods inherited from Base

#initialize, #prepare_export_path

Constructor Details

This class inherits a constructor from DataDrain::Storage::Base

Instance Method Details

#build_path(bucket, folder_name, partition_path) ⇒ String

Parameters:

  • bucket (String)
  • folder_name (String)
  • partition_path (String, nil)

Returns:

  • (String)


# File 'lib/data_drain/storage/s3.rb', line 59

def build_path(bucket, folder_name, partition_path)
  base = File.join(bucket, folder_name)
  base = File.join(base, partition_path) if partition_path && !partition_path.empty?
  "s3://#{base}/**/*.parquet"
end
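
As a quick illustration of the glob that #build_path returns, the sketch below copies the method body into a standalone function so it can be run outside the adapter. The bucket, folder, and partition values are made-up examples.

```ruby
# Standalone copy of the listing above, for illustration only.
def build_path(bucket, folder_name, partition_path)
  base = File.join(bucket, folder_name)
  # The partition segment is appended only when it is present and non-empty.
  base = File.join(base, partition_path) if partition_path && !partition_path.empty?
  "s3://#{base}/**/*.parquet"
end

# Example values are hypothetical.
puts build_path("my-bucket", "events", nil)
# s3://my-bucket/events/**/*.parquet
puts build_path("my-bucket", "events", "year=2024/month=01")
# s3://my-bucket/events/year=2024/month=01/**/*.parquet
```

With a nil or empty partition_path the glob covers every Parquet file under the folder; otherwise it is narrowed to the given partition subtree.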

#destroy_partitions(bucket, folder_name, partition_keys, partitions) ⇒ Integer

Parameters:

  • bucket (String)
  • folder_name (String)
  • partition_keys (Array<Symbol>)
  • partitions (Hash)

Returns:

  • (Integer)


# File 'lib/data_drain/storage/s3.rb', line 70

def destroy_partitions(bucket, folder_name, partition_keys, partitions)
  client = Aws::S3::Client.new(
    region: @config.aws_region,
    access_key_id: @config.aws_access_key_id,
    secret_access_key: @config.aws_secret_access_key
  )

  regex_parts = partition_keys.map do |key|
    val = partitions[key]
    val.nil? || val.to_s.empty? ? "#{key}=[^/]+" : "#{key}=#{val}"
  end
  pattern_regex = Regexp.new("^#{folder_name}/#{regex_parts.join("/")}")

  objects_to_delete = []
  prefix = "#{folder_name}/"
  first_key = partition_keys.first
  prefix += "#{first_key}=#{partitions[first_key]}/" if partitions[first_key]

  client.list_objects_v2(bucket: bucket, prefix: prefix).each do |response|
    response.contents.each do |obj|
      objects_to_delete << { key: obj.key } if obj.key.match?(pattern_regex)
    end
  end

  delete_in_batches(client, bucket, objects_to_delete)
end
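
To make the key-matching step of #destroy_partitions easier to follow, the sketch below isolates the regex construction in a standalone helper and applies it to a few object keys. The helper name partition_pattern and the sample keys are hypothetical; no S3 call is made.

```ruby
# Hypothetical helper mirroring the pattern-building step of the listing above;
# it is not part of the DataDrain API.
def partition_pattern(folder_name, partition_keys, partitions)
  parts = partition_keys.map do |key|
    val = partitions[key]
    # A nil or empty value becomes a wildcard for that partition level.
    val.nil? || val.to_s.empty? ? "#{key}=[^/]+" : "#{key}=#{val}"
  end
  Regexp.new("^#{folder_name}/#{parts.join('/')}")
end

# Sample keys are made up for illustration.
keys = [
  "events/year=2024/month=01/part-0.parquet",
  "events/year=2024/month=02/part-0.parquet",
  "events/year=2023/month=12/part-0.parquet",
  "other/year=2024/month=01/part-0.parquet"
]

pattern = partition_pattern("events", %i[year month], { year: 2024, month: nil })
puts keys.select { |k| k.match?(pattern) }
# events/year=2024/month=01/part-0.parquet
# events/year=2024/month=02/part-0.parquet
```

In this sketch the pattern is anchored only at the start of the key, so a fixed value on the last partition key behaves as a prefix match: `month: 1` would also match a `month=12` key.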

#setup_duckdb(connection) ⇒ Object

Loads the httpfs extension into DuckDB and injects the AWS credentials. If aws_access_key_id and aws_secret_access_key are set, explicit credentials are used. Otherwise, credential_chain is used (IAM role, environment variables, ~/.aws/credentials).

Parameters:

  • connection (DuckDB::Connection)

Raises:



# File 'lib/data_drain/storage/s3.rb', line 14

def setup_duckdb(connection)
  connection.query("INSTALL httpfs; LOAD httpfs;")
  create_s3_secret(connection)
end
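
The body of create_s3_secret is not shown on this page. As a rough sketch of the credential choice described above, the hypothetical helper below builds a DuckDB CREATE SECRET statement as a string: explicit keys when both are set, the credential chain otherwise. The names S3Config, s3_secret_sql, and the secret name data_drain_s3 are assumptions made for illustration, not the library's actual implementation.

```ruby
# Hypothetical config object; attribute names follow those used on this page.
S3Config = Struct.new(:aws_region, :aws_access_key_id, :aws_secret_access_key,
                      keyword_init: true)

# Hypothetical helper: returns the SQL text only, it does not run it.
def s3_secret_sql(config)
  # Quote a value as a SQL string literal, doubling embedded single quotes.
  quote = ->(v) { "'#{v.to_s.gsub("'", "''")}'" }
  creds = [config.aws_access_key_id, config.aws_secret_access_key]
  explicit = creds.none? { |v| v.nil? || v.to_s.empty? }

  options = ["TYPE S3"]
  if explicit
    options << "KEY_ID #{quote.call(config.aws_access_key_id)}"
    options << "SECRET #{quote.call(config.aws_secret_access_key)}"
  else
    # Defer to the credential chain (IAM role, env vars, ~/.aws/credentials).
    options << "PROVIDER CREDENTIAL_CHAIN"
  end
  options << "REGION #{quote.call(config.aws_region)}" if config.aws_region
  "CREATE OR REPLACE SECRET data_drain_s3 (#{options.join(', ')});"
end

puts s3_secret_sql(S3Config.new(aws_region: "us-east-1"))
# CREATE OR REPLACE SECRET data_drain_s3 (TYPE S3, PROVIDER CREDENTIAL_CHAIN, REGION 'us-east-1');
```

The resulting string could then be passed to connection.query after httpfs is loaded. Depending on the DuckDB version, the credential chain provider may also need the aws extension to be installed.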