Class: ROCrate::Reader
- Inherits: Object
- Defined in: lib/ro_crate/reader.rb
Overview
A class to handle reading of RO-Crates from Zip files or directories.
Constant Summary
- LEGACY_EXTRACT = Zip::VERSION.start_with?('2.').freeze
Class Method Summary
- .build_crate(entity_hash, source, crate_class: ROCrate::Crate, context:) ⇒ Crate
  Create and populate a crate from the given set of entities.
- .create_data_entity(crate, entity_class, source, entity_props) ⇒ ROCrate::File, ...
  Create a DataEntity of the given class.
- .detect_root_directory(source) ⇒ Pathname?
  Finds an RO-Crate’s root directory (where `ro-crate-metadata.json` is located) within a given directory.
- .entities_from_metadata(metadata) ⇒ Hash{String => Hash}
  Extracts all the entities from the @graph of the RO-Crate Metadata.
- .extract_contextual_entities(crate, entity_hash) ⇒ Array<ContextualEntity>
  Create appropriately specialized ContextualEntity objects from the given hash of entities and their properties.
- .extract_data_entities(crate, source, entity_hash) ⇒ Array<ROCrate::File, ROCrate::Directory>
  Discover data entities from the `hasPart` property of a crate, and create DataEntity objects for them.
- .extract_metadata_entity(entities) ⇒ nil, Hash{String => Hash}
  Extract the metadata entity from the entity hash, according to the rules defined here: www.researchobject.org/ro-crate/specification/1.2/root-data-entity.html#finding-the-root-data-entity. Returns the entity mapped by its @id, or nil if nothing is found.
- .extract_preview_entity(entities) ⇒ Hash{String => Hash}
  Extract the ro-crate-preview entity from the entity hash.
- .extract_root_entity(entities) ⇒ Hash{String => Hash}
  Extract the root entity from the entity hash, according to the rules defined here: www.researchobject.org/ro-crate/specification/1.2/root-data-entity.html#finding-the-root-data-entity. Returns the entity mapped by its @id.
- .extract_version(metadata_props) ⇒ String?
  Extract the spec version from the metadata entity’s `conformsTo`.
- .initialize_crate(entity_hash, source, crate_class: ROCrate::Crate, context:) ⇒ Crate
  Initialize a crate from the given set of entities.
- .read(source, target_dir: Dir.mktmpdir) ⇒ Crate
  Reads an RO-Crate from a directory or zip file.
- .read_directory(source) ⇒ Crate
  Reads an RO-Crate from a directory.
- .read_zip(source, target_dir: Dir.mktmpdir) ⇒ Crate
  Reads an RO-Crate from a zip file.
- .safe_join(base, path) ⇒ Pathname
  Safely joins a desired file path onto a base directory, raising an exception if the path attempts to traverse outside it.
- .unzip_file_to(source, target) ⇒ Object
  Extract the contents of the given Zip file to the given directory.
- .unzip_io_to(source, target) ⇒ Object
  Extract the given Zip file data to the given directory.
- .unzip_to(source, target) ⇒ Object
  Extract the contents of the given Zip file/data to the given directory.
Class Method Details
.build_crate(entity_hash, source, crate_class: ROCrate::Crate, context:) ⇒ Crate
Create and populate crate from the given set of entities.
# File 'lib/ro_crate/reader.rb', line 164

def self.build_crate(entity_hash, source, crate_class: ROCrate::Crate, context:)
  crate = initialize_crate(entity_hash, source, crate_class: crate_class, context: context)

  extract_data_entities(crate, source, entity_hash).each do |entity|
    crate.add_data_entity(entity)
  end

  # The remaining entities in the hash must be contextual.
  extract_contextual_entities(crate, entity_hash).each do |entity|
    crate.add_contextual_entity(entity)
  end

  crate
end
.create_data_entity(crate, entity_class, source, entity_props) ⇒ ROCrate::File, ...
Create a DataEntity of the given class.
# File 'lib/ro_crate/reader.rb', line 250

def self.create_data_entity(crate, entity_class, source, entity_props)
  id = entity_props.delete('@id')
  raise ROCrate::ReadException, "Data Entity missing '@id': #{entity_props.inspect}" unless id
  decoded_id = URI.decode_www_form_component(id)
  path = nil
  uri = URI(id) rescue nil
  if uri&.absolute?
    path = uri
    decoded_id = nil
  elsif !id.start_with?('#')
    [id, decoded_id].each do |i|
      fullpath = ::File.join(source, i)
      path = Pathname.new(fullpath) if ::File.exist?(fullpath)
    end
    if path.nil?
      raise ROCrate::ReadException, "Local Data Entity not found in crate: #{id}"
    end
  end

  entity_class.new(crate, path, decoded_id, entity_props)
end
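The method above branches on three kinds of `@id`: an absolute URI, a `#`-prefixed reference, and a local path that is tried both as written and percent-decoded. A minimal standalone sketch of that branching follows; `classify_entity_id` and `candidate_paths` are hypothetical helper names used only for illustration and are not part of ROCrate::Reader.

```ruby
require 'uri'

# Hypothetical helper: classify a data entity's @id along the same lines
# as the branches in create_data_entity.
def classify_entity_id(id)
  uri = begin
    URI(id)
  rescue URI::InvalidURIError
    nil # IDs that are not valid URIs fall through to the local checks
  end
  return :remote if uri&.absolute?        # e.g. https://example.com/data.csv
  return :fragment if id.start_with?('#') # no file lookup is attempted
  :local                                  # must exist under the crate directory
end

# Hypothetical helper: the relative paths that would be probed for a local ID.
def candidate_paths(id)
  [id, URI.decode_www_form_component(id)].uniq
end

puts classify_entity_id('https://example.com/data.csv')
puts classify_entity_id('#workflow-step')
puts candidate_paths('data/file%20one.txt').inspect
```

Trying both spellings means a file stored on disk as `file one.txt` is still found when the metadata refers to it as `file%20one.txt`.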
.detect_root_directory(source) ⇒ Pathname?
Finds an RO-Crate’s root directory (where `ro-crate-metadata.json` is located) within a given directory.
# File 'lib/ro_crate/reader.rb', line 335

def self.detect_root_directory(source)
  queue = [source]
  until queue.empty?
    entry = Pathname(queue.shift)
    if entry.file?
      name = entry.basename.to_s
      if name == ROCrate::Metadata::IDENTIFIER || name == ROCrate::Metadata::IDENTIFIER_1_0
        return entry.parent
      end
    elsif entry.directory?
      queue += entry.children
    end
  end

  nil
end
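The search above is breadth-first, so a metadata file nearer the top of the tree is found before one nested deeper. The sketch below reproduces the idea with the standard library only. `find_crate_root` is a hypothetical name, and the two file names in `METADATA_NAMES` are assumed stand-ins for `ROCrate::Metadata::IDENTIFIER` and `ROCrate::Metadata::IDENTIFIER_1_0`.

```ruby
require 'pathname'
require 'tmpdir'
require 'fileutils'

# Assumed file names; the real values come from ROCrate::Metadata.
METADATA_NAMES = ['ro-crate-metadata.json', 'ro-crate-metadata.jsonld'].freeze

# Hypothetical standalone version of the breadth-first search.
def find_crate_root(source)
  queue = [Pathname(source)]
  until queue.empty?
    entry = queue.shift
    if entry.file?
      return entry.parent if METADATA_NAMES.include?(entry.basename.to_s)
    elsif entry.directory?
      queue.concat(entry.children)
    end
  end
  nil
end

# A zip that wraps the crate in extra folders still resolves to the crate root.
Dir.mktmpdir do |dir|
  nested = File.join(dir, 'wrapper', 'my-crate')
  FileUtils.mkdir_p(nested)
  File.write(File.join(nested, 'ro-crate-metadata.json'), '{}')
  puts find_crate_root(dir)
end
```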
.entities_from_metadata(metadata) ⇒ Hash{String => Hash}
Extracts all the entities from the @graph of the RO-Crate Metadata.
# File 'lib/ro_crate/reader.rb', line 132

def self.entities_from_metadata(metadata)
  graph = metadata['@graph']

  if graph
    # Collect all the things in the graph, mapped by their @id
    entities = {}
    graph.each do |entity|
      entities[entity['@id']] = entity
    end

    # Do some normalization...
    entities[ROCrate::Metadata::IDENTIFIER] = extract_metadata_entity(entities)
    raise ROCrate::ReadException, "No metadata entity found in @graph!" unless entities[ROCrate::Metadata::IDENTIFIER]

    entities[ROCrate::Preview::IDENTIFIER] = extract_preview_entity(entities)
    entities[ROCrate::Crate::IDENTIFIER] = extract_root_entity(entities)
    raise ROCrate::ReadException, "No root entity (with @id: #{entities[ROCrate::Metadata::IDENTIFIER].dig('about', '@id')}) found in @graph!" unless entities[ROCrate::Crate::IDENTIFIER]

    entities
  else
    raise ROCrate::ReadException, "No @graph found in metadata!"
  end
end
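The first step above, indexing the `@graph` by `@id`, can be illustrated on its own. The metadata document below is a made-up minimal example, not taken from a real crate.

```ruby
require 'json'

# A made-up, minimal RO-Crate metadata document.
metadata = JSON.parse(<<~METADATA)
  {
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
      { "@id": "ro-crate-metadata.json", "@type": "CreativeWork",
        "conformsTo": { "@id": "https://w3id.org/ro/crate/1.1" },
        "about": { "@id": "./" } },
      { "@id": "./", "@type": "Dataset", "hasPart": [{ "@id": "data.csv" }] },
      { "@id": "data.csv", "@type": "File" }
    ]
  }
METADATA

# Index every entity in the @graph by its @id.
entities = metadata.fetch('@graph').each_with_object({}) do |entity, index|
  index[entity['@id']] = entity
end

puts entities.keys.inspect
# → ["ro-crate-metadata.json", "./", "data.csv"]
```

The normalization that follows in the real method then re-keys the metadata, preview and root entities under fixed identifiers, so later steps can look them up without knowing their original `@id`.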
.extract_contextual_entities(crate, entity_hash) ⇒ Array<ContextualEntity>
Create appropriately specialized ContextualEntity objects from the given hash of entities and their properties.
# File 'lib/ro_crate/reader.rb', line 231

def self.extract_contextual_entities(crate, entity_hash)
  entities = []

  entity_hash.each do |id, entity_props|
    entity_class = ROCrate::ContextualEntity.specialize(entity_props)
    entity = entity_class.new(crate, id, entity_props)
    entities << entity
  end

  entities
end
.extract_data_entities(crate, source, entity_hash) ⇒ Array<ROCrate::File, ROCrate::Directory>
Discover data entities from the `hasPart` property of a crate, and create DataEntity objects for them. Entities are looked up in the given `entity_hash` (and then removed from it).
# File 'lib/ro_crate/reader.rb', line 213

def self.extract_data_entities(crate, source, entity_hash)
  parts = crate.raw_properties['hasPart'] || []
  parts = [parts] unless parts.is_a?(Array)
  parts.map do |ref|
    entity_props = entity_hash.delete(ref['@id'])
    next unless entity_props
    entity_class = ROCrate::DataEntity.specialize(entity_props)
    entity = create_data_entity(crate, entity_class, source, entity_props)
    next if entity.nil?
    entity
  end.compact
end
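Two details of the method above are easy to miss: `hasPart` may hold a single reference rather than an array, and each referenced entity is deleted from `entity_hash` so that whatever remains can be treated as contextual. A small sketch of just that bookkeeping, using a hypothetical helper `take_parts` and made-up entities:

```ruby
# Hypothetical helper: return the properties of every entity referenced by
# hasPart, removing each one from entity_hash along the way.
def take_parts(root_props, entity_hash)
  parts = root_props['hasPart'] || []
  parts = [parts] unless parts.is_a?(Array) # tolerate a single, unwrapped reference
  parts.map { |ref| entity_hash.delete(ref['@id']) }.compact
end

entity_hash = {
  'data.csv' => { '@id' => 'data.csv', '@type' => 'File' },
  '#alice'   => { '@id' => '#alice', '@type' => 'Person' }
}
root = { 'hasPart' => { '@id' => 'data.csv' } }

parts = take_parts(root, entity_hash)
puts parts.map { |p| p['@id'] }.inspect # the data entities
puts entity_hash.keys.inspect           # what is left over is contextual
```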
.extract_metadata_entity(entities) ⇒ nil, Hash{String => Hash}
Extract the metadata entity from the entity hash, according to the rules defined here: www.researchobject.org/ro-crate/specification/1.2/root-data-entity.html#finding-the-root-data-entity. Returns the entity mapped by its @id, or nil if nothing is found.
# File 'lib/ro_crate/reader.rb', line 278

def self.extract_metadata_entity(entities)
  key = entities.detect do |_, props|
    conforms = props['conformsTo']
    conforms = [conforms] unless conforms.is_a?(Array)
    conforms.compact.any? { |c| c['@id']&.start_with?(ROCrate::Metadata::RO_CRATE_BASE) }
  end&.first

  return entities.delete(key) if key

  # Legacy support
  (entities.delete("./#{ROCrate::Metadata::IDENTIFIER}") ||
    entities.delete(ROCrate::Metadata::IDENTIFIER) ||
    entities.delete("./#{ROCrate::Metadata::IDENTIFIER_1_0}") ||
    entities.delete(ROCrate::Metadata::IDENTIFIER_1_0))
end
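The primary lookup above identifies the metadata entity by what it conforms to, not by its file name. The sketch below isolates that check. `find_descriptor_id` is a hypothetical name, and `CRATE_SPEC_BASE` is an assumed stand-in for `ROCrate::Metadata::RO_CRATE_BASE`.

```ruby
# Assumed value; the real constant is ROCrate::Metadata::RO_CRATE_BASE.
CRATE_SPEC_BASE = 'https://w3id.org/ro/crate/'

# Hypothetical helper: the @id of the first entity whose conformsTo
# points at an RO-Crate specification URI, or nil.
def find_descriptor_id(entities)
  entities.detect do |_id, props|
    conforms = props['conformsTo']
    conforms = [conforms] unless conforms.is_a?(Array)
    conforms.compact.any? { |c| c['@id']&.start_with?(CRATE_SPEC_BASE) }
  end&.first
end

entities = {
  './' => { '@type' => 'Dataset' },
  'custom-name.json' => {
    '@type' => 'CreativeWork',
    'conformsTo' => { '@id' => 'https://w3id.org/ro/crate/1.2' }
  }
}

puts find_descriptor_id(entities)
# → custom-name.json
```

Only when no entity matches does the real method fall back to the well-known file names, which is the "Legacy support" branch.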
.extract_preview_entity(entities) ⇒ Hash{String => Hash}
Extract the ro-crate-preview entity from the entity hash.
# File 'lib/ro_crate/reader.rb', line 315

def self.extract_preview_entity(entities)
  entities.delete("./#{ROCrate::Preview::IDENTIFIER}") || entities.delete(ROCrate::Preview::IDENTIFIER)
end
.extract_root_entity(entities) ⇒ Hash{String => Hash}
Extract the root entity from the entity hash, according to the rules defined here: www.researchobject.org/ro-crate/specification/1.2/root-data-entity.html#finding-the-root-data-entity. Returns the entity mapped by its @id.
# File 'lib/ro_crate/reader.rb', line 324

def self.extract_root_entity(entities)
  root_id = entities[ROCrate::Metadata::IDENTIFIER].dig('about', '@id')
  raise ROCrate::ReadException, "Metadata entity does not reference any root entity" unless root_id
  entities.delete(root_id)
end
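Because the root is resolved through the metadata entity's `about` reference, its `@id` need not be `./`. A standalone sketch, with a hypothetical helper `take_root_entity`, made-up entities, and a plain `ArgumentError` standing in for `ROCrate::ReadException`:

```ruby
# Hypothetical helper: follow the descriptor's `about` reference and
# remove the referenced root entity from the hash.
def take_root_entity(entities, descriptor_id)
  root_id = entities.fetch(descriptor_id).dig('about', '@id')
  raise ArgumentError, 'Metadata entity does not reference any root entity' unless root_id
  entities.delete(root_id)
end

entities = {
  'ro-crate-metadata.json' => { 'about' => { '@id' => 'https://example.com/crate/' } },
  'https://example.com/crate/' => { '@type' => 'Dataset', 'name' => 'Example' }
}

root = take_root_entity(entities, 'ro-crate-metadata.json')
puts root['name']
# → Example
```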
.extract_version(metadata_props) ⇒ String?
Extract the spec version from the metadata entity’s `conformsTo`. Looks for an `@id` matching `w3id.org/ro/crate/<version>` and returns `<version>`.
# File 'lib/ro_crate/reader.rb', line 299

def self.extract_version(metadata_props)
  return nil unless metadata_props
  conforms = metadata_props['conformsTo']
  conforms = [conforms] unless conforms.is_a?(Array)
  conforms.compact.each do |c|
    id = c.is_a?(Hash) ? c['@id'] : c
    next unless id&.start_with?(ROCrate::Metadata::RO_CRATE_BASE)
    version = id.sub(ROCrate::Metadata::RO_CRATE_BASE, '').split('/').first
    return version if version && !version.empty?
  end
  nil
end
.initialize_crate(entity_hash, source, crate_class: ROCrate::Crate, context:) ⇒ Crate
Initialize a crate from the given set of entities.
# File 'lib/ro_crate/reader.rb', line 188

def self.initialize_crate(entity_hash, source, crate_class: ROCrate::Crate, context:)
  crate_class.new.tap do |crate|
    crate.properties = entity_hash.delete(ROCrate::Crate::IDENTIFIER)
    metadata_props = entity_hash.delete(ROCrate::Metadata::IDENTIFIER)
    crate.metadata.properties = metadata_props
    parsed_version = extract_version(metadata_props)
    crate.metadata.version = parsed_version if parsed_version
    crate.metadata.context = context
    preview_properties = entity_hash.delete(ROCrate::Preview::IDENTIFIER)
    preview_path = ::File.join(source, ROCrate::Preview::IDENTIFIER)
    preview_path = ::File.exist?(preview_path) ? Pathname.new(preview_path) : nil
    if preview_properties || preview_path
      crate.preview = ROCrate::Preview.new(crate, preview_path, preview_properties || {})
    end
    crate.add_all(source, false)
  end
end
.read(source, target_dir: Dir.mktmpdir) ⇒ Crate
Reads an RO-Crate from a directory or zip file.
# File 'lib/ro_crate/reader.rb', line 15

def self.read(source, target_dir: Dir.mktmpdir)
  begin
    is_dir = ::File.directory?(source)
  rescue TypeError
    is_dir = false
  end

  if is_dir
    read_directory(source)
  else
    read_zip(source, target_dir: target_dir)
  end
end
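The `rescue TypeError` above matters when the source is in-memory zip data and not a path, since `File.directory?` cannot convert such an object to a path. A small sketch of that check in isolation, where `directory_source?` is a hypothetical helper name:

```ruby
require 'stringio'
require 'tmpdir'

# Hypothetical helper mirroring the directory check in .read.
def directory_source?(source)
  ::File.directory?(source)
rescue TypeError
  false # e.g. a StringIO of zip bytes is not convertible to a path
end

Dir.mktmpdir { |dir| puts directory_source?(dir) } # a real directory
puts directory_source?(StringIO.new('PK'))         # in-memory data
puts directory_source?('no/such/crate.zip')        # a non-directory path
```

Anything that is not a directory is handed to the zip branch, which then decides between file-based and stream-based extraction.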
.read_directory(source) ⇒ Crate
Reads an RO-Crate from a directory.
# File 'lib/ro_crate/reader.rb', line 103

def self.read_directory(source)
  raise ROCrate::ReadException, "Source is not a directory!" unless ::File.directory?(source)

  source = ::File.expand_path(source)
  metadata_file = Dir.entries(source).detect { |entry| entry == ROCrate::Metadata::IDENTIFIER ||
    entry == ROCrate::Metadata::IDENTIFIER_1_0 }

  if metadata_file
    metadata_json = ::File.read(::File.join(source, metadata_file))
    begin
      metadata = JSON.parse(metadata_json)
    rescue JSON::ParserError => e
      raise ROCrate::ReadException.new("Error parsing metadata", e)
    end

    entities = entities_from_metadata(metadata)
    context = metadata['@context']

    build_crate(entities, source, context: context)
  else
    raise ROCrate::ReadException, "No metadata found!"
  end
end
.read_zip(source, target_dir: Dir.mktmpdir) ⇒ Crate
Reads an RO-Crate from a zip file. It first extracts the Zip file to `target_dir` (a temporary directory by default), and then calls .read_directory.
# File 'lib/ro_crate/reader.rb', line 86

def self.read_zip(source, target_dir: Dir.mktmpdir)
  raise ROCrate::ReadException, "Target is not a directory!" unless ::File.directory?(target_dir)

  unzip_to(source, target_dir)

  # Traverse the unzipped directory to try and find the crate's root
  root_dir = detect_root_directory(target_dir)
  raise ROCrate::ReadException, "No metadata found!" unless root_dir

  read_directory(root_dir)
end
.safe_join(base, path) ⇒ Pathname
Safely joins a desired file path onto a base directory, raising an exception if the path attempts to traverse outside it.
# File 'lib/ro_crate/reader.rb', line 362

def self.safe_join(base, path)
  dest = base.join(path)
  # Guard against zip-slip attacks.
  begin
    unsafe = dest.expand_path.relative_path_from(base.expand_path).each_filename.first == '..'
  rescue ArgumentError
    # Handle unjoinable paths, e.g. on different drives.
    unsafe = true
  end
  raise ROCrate::ReadException, "Unsafe path in zip entry: #{path}" if unsafe
  dest
end
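The guard above works by expanding both paths and checking whether the relative path from the base to the destination starts with `..`. The sketch below applies the same test to a few sample entry names. `escapes_base?` is a hypothetical helper that returns a boolean where the real method raises.

```ruby
require 'pathname'

# Hypothetical standalone check: would joining `path` onto `base`
# land outside `base`?
def escapes_base?(base, path)
  base = Pathname(base).expand_path
  dest = base.join(path).expand_path
  dest.relative_path_from(base).each_filename.first == '..'
rescue ArgumentError
  true # no relative path exists, e.g. different drives on Windows
end

puts escapes_base?('/tmp/crate', 'data/file.txt')      # stays inside
puts escapes_base?('/tmp/crate', 'data/../ok.txt')     # still inside after expansion
puts escapes_base?('/tmp/crate', '../evil.sh')         # climbs out
puts escapes_base?('/tmp/crate', 'data/../../evil.sh') # climbs out via a subdirectory
puts escapes_base?('/tmp/crate', '/etc/passwd')        # absolute path replaces the base
```

Expanding before comparing is what catches entries such as `data/../../evil.sh`, which look harmless if only the first path segment is inspected.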
.unzip_file_to(source, target) ⇒ Object
Extract the contents of the given Zip file to the given directory.
# File 'lib/ro_crate/reader.rb', line 67

def self.unzip_file_to(source, target)
  target_path = Pathname(target)
  Zip::File.open(source) do |zipfile|
    zipfile.each do |entry|
      dest = safe_join(target_path, entry.name)
      next if dest.exist?
      FileUtils.mkdir_p(dest.dirname)
      LEGACY_EXTRACT ? entry.extract(dest) : entry.extract(entry.name, destination_directory: target_path)
    end
  end
end
.unzip_io_to(source, target) ⇒ Object
Extract the given Zip file data to the given directory.
# File 'lib/ro_crate/reader.rb', line 49

def self.unzip_io_to(source, target)
  target_path = Pathname(target)
  Zip::InputStream.open(source) do |input|
    while (entry = input.get_next_entry)
      next if entry.name_is_directory?
      dest = safe_join(target_path, entry.name)
      next if dest.exist?
      FileUtils.mkdir_p(dest.dirname)
      ::File.binwrite(dest, input.read)
    end
  end
end
.unzip_to(source, target) ⇒ Object
Extract the contents of the given Zip file/data to the given directory.
# File 'lib/ro_crate/reader.rb', line 34

def self.unzip_to(source, target)
  source = Pathname.new(::File.expand_path(source)) if source.is_a?(String)

  if source.is_a?(Pathname) || source.respond_to?(:path)
    unzip_file_to(source, target)
  else
    unzip_io_to(source, target)
  end
end