Class: Bulkrax::CsvEntry

Inherits:

Entry

Object
ActiveRecord::Base
ApplicationRecord
Entry
Bulkrax::CsvEntry

Defined in: app/models/bulkrax/csv_entry.rb

Overview

TODO: This class needs rework to address the Metrics/ClassLength rubocop offense. The entry classes do too much. The logic common to the various entry models should be extracted into a module that they can share.

Direct Known Subclasses

CsvCollectionEntry, CsvFileSetEntry

Defined Under Namespace

Modules: AttributeBuilderMethod

Classes: CsvPathError, CsvWrapper, MissingMetadata, RecordNotFound

Instance Attribute Summary

Attributes inherited from Entry

#all_attrs

Class Method Summary

.data_for_entry(data, _source_id, parser) ⇒ Object
.fields_from_data(data) ⇒ Object
.matcher_class ⇒ Object
.read_data(path) ⇒ Object

There is a risk that this reads the whole file into memory, which can cause memory problems with large files. Special characters are stripped out of the headers.

Instance Method Summary

#add_file ⇒ Object
#add_identifier ⇒ Object
#add_ingested_metadata ⇒ Object
#add_metadata_for_model ⇒ Object
#build_export_metadata ⇒ Object
#build_files_metadata ⇒ Object
#build_mapping_metadata ⇒ Object
#build_metadata ⇒ Object
#build_metadata_for_delete ⇒ Object

Only limited metadata is needed for delete jobs.
#build_object(_key, value) ⇒ Object
#build_relationship_metadata ⇒ Object
#build_system_metadata ⇒ Object

Metadata required by Bulkrax for round-tripping.
#build_thumbnail_files ⇒ Object
#build_value(property_name, mapping_config) ⇒ Object
#collection_identifiers ⇒ Object
#collections_created? ⇒ Boolean
#establish_factory_class ⇒ Object
#find_collection_ids ⇒ Object
#handle_join_on_export(key, values, join) ⇒ Object
#key_for_export(key) ⇒ Object

On export the key becomes the from and the from becomes the destination.
#object_metadata(data) ⇒ Object
#path_to_file(file) ⇒ Object

If only filename is given, construct the path (/files/my_file).
#prepare_export_data(datum) ⇒ Object
#prepare_export_data_with_join(data) ⇒ Object
#record ⇒ Object
#validate_record ⇒ Object

Class Method Details

.data_for_entry(data, _source_id, parser) ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 85

def self.data_for_entry(data, _source_id, parser)
  # If a multi-line CSV data is passed, grab the first row
  data = data.first if data.is_a?(CSV::Table)
  # model has to be separated so that it doesn't get mistranslated by to_h
  raw_data = data.to_h
  raw_data[:model] = data[:model] if data[:model].present?
  # If the parents/children field mapping uses a custom column name, alias it to the standard key
  # so downstream code can find it regardless of what the CSV column is named.
  raw_data[:parents] = raw_data[parser.related_parents_raw_mapping.to_sym] if parser.related_parents_raw_mapping.present? && raw_data.key?(parser.related_parents_raw_mapping.to_sym) && parser.related_parents_raw_mapping != 'parents'
  raw_data[:children] = raw_data[parser.related_children_raw_mapping.to_sym] if parser.related_children_raw_mapping.present? && raw_data.key?(parser.related_children_raw_mapping.to_sym) && parser.related_children_raw_mapping != 'children'
  return raw_data
end
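
The aliasing step above can be sketched with plain hashes. This is a minimal illustration, not the library's code: `FakeParser`, `alias_relationship_columns`, and the column name `parent_ids` are hypothetical, and the ActiveSupport `present?` checks are replaced with plain Ruby.

```ruby
# Hypothetical stand-in for the parser; only the two mapping readers are modeled.
FakeParser = Struct.new(:related_parents_raw_mapping, :related_children_raw_mapping)

# Copies a custom relationship column to the standard :parents / :children key
# so later code can look it up under one name.
def alias_relationship_columns(raw_data, parser)
  aliased = raw_data.dup
  { parents: parser.related_parents_raw_mapping,
    children: parser.related_children_raw_mapping }.each do |standard_key, custom_name|
    next if custom_name.nil? || custom_name.empty?
    next if custom_name == standard_key.to_s
    next unless aliased.key?(custom_name.to_sym)

    aliased[standard_key] = aliased[custom_name.to_sym]
  end
  aliased
end

parser = FakeParser.new('parent_ids', 'children')
row = { title: 'A report', parent_ids: 'col-1' }
puts alias_relationship_columns(row, parser).inspect
```

The custom column is kept alongside the alias, so nothing that reads the original column name breaks.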

.fields_from_data(data) ⇒ `Object`



# File 'app/models/bulkrax/csv_entry.rb', line 32

def self.fields_from_data(data)
  data.headers.flatten.compact.uniq
end

.matcher_class ⇒ `Object`



# File 'app/models/bulkrax/csv_entry.rb', line 384

def self.matcher_class
  Bulkrax::CsvMatcher
end

.read_data(path) ⇒ `Object`

There is a risk that this reads the whole file into memory, which can cause memory problems with large files. Special characters are stripped out of the headers (looking at you, Excel).

Raises:

(CsvPathError)

# File 'app/models/bulkrax/csv_entry.rb', line 40

def self.read_data(path)
  raise CsvPathError, 'CSV path empty' if path.blank?
  options = {
    headers: true,
    header_converters: ->(h) { h.to_s.gsub(/[^\w\d\. -]+/, '').strip.to_sym },
    encoding: 'utf-8'
  }.merge(csv_read_data_options)

  results = if path.respond_to?(:read)
              path.rewind if path.respond_to?(:rewind)
              CSV.parse(path.read, **options)
            else
              CSV.read(path, **options)
            end
  csv_wrapper_class.new(results)
end
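
To see what the header converter does in isolation, here is a small sketch using Ruby's standard `csv` library. The sample CSV text is invented for illustration; the converter lambda is copied from the listing above.

```ruby
require 'csv'

# Same converter as in read_data: drop everything except ASCII word characters,
# digits, dots, spaces and hyphens, then trim and symbolize.
HEADER_CONVERTER = ->(h) { h.to_s.gsub(/[^\w\d\. -]+/, '').strip.to_sym }

# A byte order mark and stray punctuation, as a spreadsheet export might produce.
csv_text = "\uFEFFtitle,source_identifier!,creator (1)\nA report,w1,Ada\n"

table = CSV.parse(csv_text, headers: true, header_converters: HEADER_CONVERTER)
puts table.headers.inspect
puts table.first[:title]
```

Because the headers become symbols, rows are then read with symbol keys such as `row[:title]`.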

Instance Method Details

#add_file ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 165

def add_file
  self.parsed_metadata['file'] ||= []
  if record['file']&.is_a?(String)
    self.parsed_metadata['file'] = record['file'].split(Bulkrax.multi_value_element_split_on)
  elsif record['file'].is_a?(Array)
    self.parsed_metadata['file'] = record['file']
  end
  self.parsed_metadata['file'] = self.parsed_metadata['file'].map do |f|
    next if f.blank?

    path_to_file(f.tr(' ', '_'))
  end.compact
end
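
The normalization in `add_file` can be sketched without Bulkrax as follows. `SPLIT_ON` is a hypothetical delimiter pattern (the real one comes from `Bulkrax.multi_value_element_split_on`), `normalize_file_names` is an invented name, and the final `path_to_file` lookup is left out.

```ruby
# Hypothetical delimiter pattern; the real value is configurable in Bulkrax.
SPLIT_ON = /\s*[;|]\s*/

# Returns file names with blanks dropped and spaces replaced by underscores.
def normalize_file_names(value)
  names = case value
          when String then value.split(SPLIT_ON)
          when Array then value
          else []
          end
  names.reject { |name| name.nil? || name.strip.empty? }
       .map { |name| name.tr(' ', '_') }
end

puts normalize_file_names('cat scan.jpg | notes.txt').inspect
```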

#add_identifier ⇒ `Object`



# File 'app/models/bulkrax/csv_entry.rb', line 132

def add_identifier
  self.parsed_metadata[work_identifier] = [record[source_identifier]]
end

#add_ingested_metadata ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 156

def add_ingested_metadata
  # we do not want to sort the values in the record before adding the metadata.
  # if we do, the factory_class will be set to the default_work_type for all values that come before "model" or "work type"
  record.each do |key, value|
    index = key[/\d+/].to_i - 1 if key[/\d+/].to_i != 0
    add_metadata(key_without_numbers(key), value, index)
  end
end
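
The numbered-column rule used above (for example, `creator_2` is the second value of `creator`) can be illustrated with a short sketch. `strip_number_suffix` is a hypothetical stand-in for Bulkrax's `key_without_numbers`, whose exact behavior is not shown on this page.

```ruby
# Hypothetical stand-in for key_without_numbers: drops a trailing "_<digits>".
def strip_number_suffix(key)
  key.sub(/_\d+\z/, '')
end

# Mirrors the index rule in add_ingested_metadata: the first number found in
# the key, minus one, or nil when the key has no non-zero number.
def index_for(key)
  number = key[/\d+/].to_i
  number.zero? ? nil : number - 1
end

%w[title creator_1 creator_2].each do |key|
  puts "#{key} -> #{strip_number_suffix(key)}, index #{index_for(key).inspect}"
end
```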

#add_metadata_for_model ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 142

def add_metadata_for_model
  if factory_class.present? && factory_class == Bulkrax.collection_model_class
    add_collection_type_gid if defined?(::Hyrax)
    # add any additional collection metadata methods here
  elsif factory_class == Bulkrax.file_model_class
    validate_presence_of_filename!
    add_path_to_file
    validate_presence_of_parent!
  else
    add_file unless importerexporter.metadata_only?
    add_admin_set_id
  end
end

#build_export_metadata ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 179

def build_export_metadata
  self.parsed_metadata = {}

  build_system_metadata
  build_files_metadata if Bulkrax.collection_model_class.present? && !hyrax_record.is_a?(Bulkrax.collection_model_class)
  build_relationship_metadata
  build_mapping_metadata
  self.save!

  self.parsed_metadata
end

#build_files_metadata ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 203

def build_files_metadata
  # attaching files to the FileSet row only so we don't have duplicates when importing to a new tenant
  if hyrax_record.work?
    build_thumbnail_files
  else
    file_mapping = key_for_export('file')
    file_sets = hyrax_record.file_set? ? Array.wrap(hyrax_record) : hyrax_record.file_sets
    filenames = map_file_sets(file_sets)

    handle_join_on_export(file_mapping, filenames, mapping['file']&.[]('join')&.present?)
  end
end

#build_mapping_metadata ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 272

def build_mapping_metadata
  mapping = fetch_field_mapping
  mapping.each do |key, value|
    method_name = AttributeBuilderMethod.for(key: key, value: value, entry: self)
    next unless method_name

    send(method_name, key, value)
  end
end

#build_metadata ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 98

def build_metadata
  validate_record

  self.parsed_metadata = {}
  add_identifier
  establish_factory_class
  add_ingested_metadata
  # TODO(alishaevn): remove the collections stuff entirely and only reference collections via the new parents code
  add_collections
  add_visibility
  add_metadata_for_model
  add_rights_statement
  sanitize_controlled_uri_values!
  add_local

  self.parsed_metadata
end

#build_metadata_for_delete ⇒ `Object`

Only limited metadata is needed for delete jobs.

# File 'app/models/bulkrax/csv_entry.rb', line 117

def build_metadata_for_delete
  self.parsed_metadata = {}
  establish_factory_class
  add_ingested_metadata
  self.parsed_metadata
end

#build_object(_key, value) ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 282

def build_object(_key, value)
  return unless hyrax_record.respond_to?(value['object'])

  data = hyrax_record.send(value['object'])
  return if data.empty?

  data = data.to_a if data.is_a?(ActiveTriples::Relation)
  object_metadata(Array.wrap(data))
end

#build_relationship_metadata ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 216

def build_relationship_metadata
  # Includes all relationship methods for all exportable record types (works, Collections, FileSets)
  # @TODO: this logic assumes that the relationships are all available via a method that can be called
  #        on the object. With Valkyrie, this is only true for Hyrax-based models which include the
  #        ArResource module. We need to consider reworking this logic into an object factory method
  #        that can handle different types of models.
  relationship_methods = {
    related_parents_parsed_mapping => %i[member_of_collection_ids member_of_work_ids in_work_ids parent],
    related_children_parsed_mapping => %i[member_collection_ids member_work_ids file_set_ids member_ids]
  }

  relationship_methods.each do |relationship_key, methods|
    next if relationship_key.blank?

    values = []
    methods.each do |m|
      value = hyrax_record.public_send(m) if hyrax_record.respond_to?(m)
      value_id = value.try(:id)&.to_s || value # get the id if it's an object
      values << value_id if value_id.present?
    end
    values = values.flatten.uniq
    next if values.blank?

    handle_join_on_export(relationship_key, values, mapping[related_parents_parsed_mapping]['join'].present?)
  end
end

#build_system_metadata ⇒ `Object`

Metadata required by Bulkrax for round-tripping.

# File 'app/models/bulkrax/csv_entry.rb', line 192

def build_system_metadata
  self.parsed_metadata['id'] = hyrax_record.id
  source_id = hyrax_record.send(work_identifier)
  # Because ActiveTriples::Relation does not respond to #to_ary we can't rely on Array.wrap universally
  source_id = source_id.to_a if source_id.is_a?(ActiveTriples::Relation)
  source_id = Array.wrap(source_id).first
  self.parsed_metadata[source_identifier] = source_id
  model_name = Bulkrax.object_factory.model_name(resource: hyrax_record)
  self.parsed_metadata[key_for_export('model')] = model_name
end

#build_thumbnail_files ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 359

def build_thumbnail_files
  return unless importerexporter.include_thumbnails
  thumbnail = Bulkrax.object_factory.thumbnail_for(resource: hyrax_record)
  return unless thumbnail

  filenames = map_file_sets(Array.wrap(thumbnail))
  thumbnail_mapping = 'thumbnail_file'
  handle_join_on_export(thumbnail_mapping, filenames, false)
end

#build_value(property_name, mapping_config) ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 292

def build_value(property_name, mapping_config)
  return unless hyrax_record.respond_to?(property_name.to_s)

  data = hyrax_record.send(property_name.to_s)

  if mapping_config['join'] || !data.is_a?(Enumerable)
    self.parsed_metadata[key_for_export(property_name)] = prepare_export_data_with_join(data)
  else
    data.each_with_index do |d, i|
      self.parsed_metadata["#{key_for_export(property_name)}_#{i + 1}"] = prepare_export_data(d)
    end
  end
end

#collection_identifiers ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 388

def collection_identifiers
  return @collection_identifiers if @collection_identifiers.present?

  parent_field_mapping = self.class.parent_field(parser)
  return [] unless parent_field_mapping.present? && record[parent_field_mapping].present?

  identifiers = []
  split_references = record[parent_field_mapping].split(Bulkrax.multi_value_element_split_on)
  split_references.each do |c_reference|
    matching_collection_entries = importerexporter.entries.select do |e|
      (e.raw_metadata&.[](source_identifier) == c_reference) &&
        e.is_a?(CsvCollectionEntry)
    end
    raise ::StandardError, 'Only expected to find one matching entry' if matching_collection_entries.count > 1
    identifiers << matching_collection_entries.first&.identifier
  end
  @collection_identifiers = identifiers.compact.presence || []
end

#collections_created? ⇒ `Boolean`

Returns:

(Boolean)

# File 'app/models/bulkrax/csv_entry.rb', line 407

def collections_created?
  # TODO: look into if this method is still needed after new relationships code
  true
end

#establish_factory_class ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 136

def establish_factory_class
  parser.model_field_mappings.each do |key|
    add_metadata('model', record[key]) if record.key?(key)
  end
end

#find_collection_ids ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 412

def find_collection_ids
  return self.collection_ids if collections_created?
  if collection_identifiers.present?
    collection_identifiers.each do |collection_id|
      c = find_collection(collection_id)
      skip = c.blank? || self.collection_ids.include?(c.id)
      self.collection_ids << c.id unless skip
    end
  end

  self.collection_ids
end

#handle_join_on_export(key, values, join) ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 369

def handle_join_on_export(key, values, join)
  if join
    parsed_metadata[key] = values.join(Bulkrax.multi_value_element_join_on)
  else
    values.each_with_index do |value, i|
      parsed_metadata["#{key}_#{i + 1}"] = value
    end
    parsed_metadata.delete(key)
  end
end
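
The two export shapes produced by this method, one joined cell or a set of numbered columns, can be sketched on a plain hash. `JOIN_ON` is a hypothetical delimiter (the real one comes from `Bulkrax.multi_value_element_join_on`) and `export_values` is an invented name.

```ruby
# Hypothetical join delimiter; the real value is configurable in Bulkrax.
JOIN_ON = ' | '

# Writes values into metadata either as one joined cell or as numbered columns.
def export_values(metadata, key, values, join)
  if join
    metadata[key] = values.join(JOIN_ON)
  else
    values.each_with_index { |value, i| metadata["#{key}_#{i + 1}"] = value }
    metadata.delete(key) # the unnumbered column is not kept alongside numbered ones
  end
  metadata
end

puts export_values({}, 'file', %w[a.jpg b.jpg], true).inspect
puts export_values({}, 'file', %w[a.jpg b.jpg], false).inspect
```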

#key_for_export(key) ⇒ `Object`

On export the key becomes the from and the from becomes the destination. This is the opposite of import, because the data moves in the opposite direction. Metadata that does not have a specific Bulkrax entry is mapped to the key name, since matching keys coming in are mapped by the CSV parser automatically.

# File 'app/models/bulkrax/csv_entry.rb', line 308

def key_for_export(key)
  clean_key = key_without_numbers(key)
  unnumbered_key = mapping[clean_key] ? mapping[clean_key]['from'].first : clean_key
  # Bring the number back if there is one
  "#{unnumbered_key}#{key.sub(clean_key, '')}"
end
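
As a rough illustration of the reversal described above, the sketch below maps a repository property back to its CSV column name and restores any numeric suffix. `MAPPING`, `base_key`, and `export_key` are hypothetical; `base_key` only approximates `key_without_numbers`.

```ruby
# Hypothetical field mapping: on import, the 'title' property is read
# from a CSV column named 'Title'.
MAPPING = { 'title' => { 'from' => ['Title'] } }.freeze

# Hypothetical stand-in for key_without_numbers.
def base_key(key)
  key.sub(/_\d+\z/, '')
end

# On export, look up the first 'from' name and put the number back.
def export_key(key, mapping = MAPPING)
  clean_key = base_key(key)
  unnumbered = mapping[clean_key] ? mapping[clean_key]['from'].first : clean_key
  "#{unnumbered}#{key.sub(clean_key, '')}"
end

puts export_key('title_2')
puts export_key('subject')
```

A key with no mapping entry, such as `subject` here, is exported under its own name.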

#object_metadata(data) ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 331

def object_metadata(data)
  # NOTE: What is `d` in this case:
  #
  #  "[{\"single_object_first_name\"=>\"Fake\", \"single_object_last_name\"=>\"Fakerson\", \"single_object_position\"=>\"Leader, Jester, Queen\", \"single_object_language\"=>\"english\"}]"
  #
  # The above is a stringified version of a Ruby string.  Using eval is a very bad idea as it
  # will execute the value of `d` within the full Ruby interpreter context.
  #
  # TODO: Would it be possible to store this as a non-string?  Maybe the actual Ruby Array and Hash?
  data = data.map { |d| eval(d) }.flatten # rubocop:disable Security/Eval

  data.each_with_index do |obj, index|
    next if obj.nil?
    # allow the object_key to be valid whether it's a string or symbol
    obj = obj.with_indifferent_access

    obj.each_key do |key|
      if obj[key].is_a?(Array)
        obj[key].each_with_index do |_nested_item, nested_index|
          self.parsed_metadata["#{key_for_export(key)}_#{index + 1}_#{nested_index + 1}"] = prepare_export_data(obj[key][nested_index])
        end
      else
        self.parsed_metadata["#{key_for_export(key)}_#{index + 1}"] = prepare_export_data(obj[key])
      end
    end
  end
end
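
The column naming used above can be sketched on data that is already an array of hashes, which sidesteps the `eval` step entirely. `flatten_objects` is an invented name, and the calls to `key_for_export` and `prepare_export_data` are omitted to keep the sketch self-contained.

```ruby
# Flattens an array of hashes into numbered export columns.
# Array values receive a second, nested index.
def flatten_objects(objects)
  columns = {}
  objects.each_with_index do |obj, index|
    next if obj.nil?

    obj.each do |key, value|
      if value.is_a?(Array)
        value.each_with_index do |item, nested_index|
          columns["#{key}_#{index + 1}_#{nested_index + 1}"] = item
        end
      else
        columns["#{key}_#{index + 1}"] = value
      end
    end
  end
  columns
end

people = [{ 'first_name' => 'Fake', 'position' => %w[Leader Jester] }]
puts flatten_objects(people).inspect
```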

#path_to_file(file) ⇒ `Object`

If only filename is given, construct the path (/files/my_file). If file contains a path separator (e.g. attachments/cat_scan.jpg), resolve relative to the CSV’s directory.

# File 'app/models/bulkrax/csv_entry.rb', line 427

def path_to_file(file)
  return file if File.exist?(file)

  # Relative path: resolve from CSV's directory (allows arbitrary subdirectory names, not just "files")
  return resolve_relative_file_path(file) if file.include?('/')

  # Bare filename: use legacy files/ directory for backward compatibility and round-tripping
  path = importerexporter.parser.path_to_files
  raise "Could not determine path to files directory. Ensure the import package contains a zip or a valid import_file_path." if path.nil?

  f = File.join(path, file)
  return f if File.exist?(f)
  raise "File not found: #{f}. Check the file column in your CSV and ensure the file exists in the import package or path_to_files directory."
end
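
The lookup order can be sketched as below. This is an approximation under stated assumptions: `locate_file`, `csv_dir`, and `files_dir` are hypothetical, and since `resolve_relative_file_path` is not shown on this page, the sketch simply joins the relative path onto the CSV's directory.

```ruby
require 'tmpdir'
require 'fileutils'

# csv_dir and files_dir are hypothetical parameters standing in for the
# CSV's directory and the parser's path_to_files.
def locate_file(file, csv_dir:, files_dir:)
  return file if File.exist?(file)                          # already a usable path
  return File.join(csv_dir, file) if file.include?('/')     # relative to the CSV (assumed)
  raise 'Could not determine path to files directory' if files_dir.nil?

  candidate = File.join(files_dir, file)                    # bare filename: files/ directory
  return candidate if File.exist?(candidate)

  raise "File not found: #{candidate}"
end

Dir.mktmpdir do |dir|
  files = File.join(dir, 'files')
  FileUtils.mkdir_p(files)
  FileUtils.touch(File.join(files, 'my_file.txt'))
  puts locate_file('my_file.txt', csv_dir: dir, files_dir: files)
end
```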

#prepare_export_data(datum) ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 323

def prepare_export_data(datum)
  if datum.is_a?(ActiveTriples::Resource)
    datum.to_uri.to_s
  else
    datum
  end
end

#prepare_export_data_with_join(data) ⇒ `Object`

# File 'app/models/bulkrax/csv_entry.rb', line 315

def prepare_export_data_with_join(data)
  # Yes...it's possible we're asking to coerce a multi-value but only have a single value.
  return data.to_s unless data.is_a?(Enumerable)
  return "" if data.empty?

  data.map { |d| prepare_export_data(d) }.join(Bulkrax.multi_value_element_join_on).to_s
end

#record ⇒ `Object`



# File 'app/models/bulkrax/csv_entry.rb', line 380

def record
  @record ||= raw_metadata
end

#validate_record ⇒ `Object`

Raises:

(RecordNotFound)

# File 'app/models/bulkrax/csv_entry.rb', line 124

def validate_record
  raise RecordNotFound, 'Record not found' if record.nil?
  unless importerexporter.parser.required_elements?(record)
    raise MissingMetadata, "Missing required elements, missing element(s) are: "\
"#{importerexporter.parser.missing_elements(record).join(', ')}"
  end
end

Methods inherited from Entry

Methods included from HasLocalProcessing

Methods included from StatusInfo

Methods included from ExportBehavior

Methods included from ImportBehavior

Methods included from HasMatchers
