Class: Relaton::Index::FileIO
- Inherits: Object
  - Object
    - Relaton::Index::FileIO
- Includes: IdNumber
- Defined in: lib/relaton/index/file_io.rb
Overview
The FileIO class reads and writes index files. In search mode, `url` is used to fetch the index from an external repository and save it to storage. In index mode, `url` should be nil.
Defined Under Namespace
Classes: InvalidIndexError
Constant Summary
- @@file_locks = {}
- @@file_locks_mutex = Mutex.new
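These two class variables look like a registry of per-file locks guarded by a single mutex, which #read uses through `with_file_lock`. That helper's body is not shown on this page, so the code below is only a minimal sketch under that assumption. `LockRegistry`, `lock_for` and this `with_file_lock` signature are hypothetical.

```ruby
# Hypothetical per-path lock registry; the real with_file_lock in
# relaton-index is not shown here and may differ.
class LockRegistry
  @@file_locks = {}               # path => Mutex, one per index file
  @@file_locks_mutex = Mutex.new  # guards the registry itself

  # Return the Mutex for +path+, creating it on first use. Holding the
  # registry mutex makes lookup-or-create atomic across threads.
  def self.lock_for(path)
    @@file_locks_mutex.synchronize { @@file_locks[path] ||= Mutex.new }
  end

  # Run the block while holding the lock for +path+, so only one thread at a
  # time checks or downloads that particular index file.
  def self.with_file_lock(path, &block)
    lock_for(path).synchronize(&block)
  end
end

counter = 0
threads = 10.times.map do
  Thread.new do
    100.times { LockRegistry.with_file_lock("iso/index.yaml") { counter += 1 } }
  end
end
threads.each(&:join)
puts counter # => 1000
```

Keeping one mutex per path, rather than a single global one, lets different index files be refreshed in parallel. Only concurrent readers of the same file wait on each other.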
Instance Attribute Summary
- #pubid_class ⇒ Object (readonly)
  Returns the value of attribute pubid_class.
- #sorted ⇒ Object
  Returns the value of attribute sorted.
- #url ⇒ Object (readonly)
  Returns the value of attribute url.
Instance Method Summary
- #check_basic_format(index) ⇒ Object
- #check_file ⇒ Array<Hash>?
  Check if index file exists and is not older than 24 hours.
- #check_format(index) ⇒ Boolean
  Check if index has correct format.
- #deserialize_id(raw) ⇒ Object
  Deserialize one id and verify pubid understands it.
- #deserialize_pubid(index) ⇒ Object
  Deserialize and sort by the same narrowing key Type#search bsearches on, so binary search always has a consistent total order.
- #fetch_and_save ⇒ Array<Hash>
  Fetch index from external repository and save it to storage.
- #file ⇒ Object
- #id_supported?(obj, raw) ⇒ Boolean
  An id is supported when `from_hash` either resolves it to a concrete type (a subclass, meaning the polymorphic `_type` matched) or round-trips losslessly through `to_hash`.
- #initialize(dir, url, filename, _id_keys = nil, pubid_class = nil) ⇒ FileIO (constructor)
  Initialize FileIO.
- #load_index(yaml, save = false) ⇒ Object
- #normalize(value) ⇒ Object
  Stringify hash keys and scalar values so the comparison ignores YAML scalar typing (e.g. 1 vs "1") and string/symbol key differences, while still detecting dropped/added keys or genuinely changed values.
- #path_to_local_file ⇒ String
  Create path to local file.
- #progname ⇒ Object
- #read ⇒ Array<Hash>
  If url is a String, check if the index file exists and is not older than 24 hours.
- #read_file ⇒ Array<Hash>
  Read index from storage.
- #remove ⇒ Array
  Remove index file from storage.
- #report_invalid_index(save, reason) ⇒ Object
- #save(index) ⇒ void
  Save index to storage.
- #sort_structured_index(index) ⇒ Object
- #warn_local_index_error(reason) ⇒ Object
- #warn_remote_index_error(reason) ⇒ Object
- #warn_unless_sorted(index) ⇒ Object
  Log when the loaded index is not already in get_id_number order, so that the in-memory sort in #deserialize_pubid (and the unsorted index file behind it) is visible.
Methods included from IdNumber
Constructor Details
#initialize(dir, url, filename, _id_keys = nil, pubid_class = nil) ⇒ FileIO
Initialize FileIO
`_id_keys` is accepted for backward compatibility but is no longer used. Ids are now validated as they are deserialized (see #deserialize_id and #id_supported?). This handles the pubid v2 (lutaml) `_type` serialization, which the old key allowlist could not.
# File 'lib/relaton/index/file_io.rb', line 37
def initialize(dir, url, filename, _id_keys = nil, pubid_class = nil)
  @dir = dir
  @url = url
  @filename = filename
  @pubid_class = pubid_class
  @sorted = false
end
Instance Attribute Details
#pubid_class ⇒ Object (readonly)
Returns the value of attribute pubid_class.
# File 'lib/relaton/index/file_io.rb', line 16
def pubid_class
  @pubid_class
end
#sorted ⇒ Object
Returns the value of attribute sorted.
# File 'lib/relaton/index/file_io.rb', line 17
def sorted
  @sorted
end
#url ⇒ Object (readonly)
Returns the value of attribute url.
# File 'lib/relaton/index/file_io.rb', line 16
def url
  @url
end
Instance Method Details
#check_basic_format(index) ⇒ Object
# File 'lib/relaton/index/file_io.rb', line 105
def check_basic_format(index)
  return false unless index.is_a? Array

  keys = %i[file id]
  index.all? { |item| item.respond_to?(:keys) && item.keys.sort == keys }
end
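#check_basic_format accepts only an array whose items have exactly the Symbol keys `:id` and `:file`. #load_index parses with `YAML.safe_load(yaml, permitted_classes: [Symbol])`, so those keys stay Symbols only if the file writes them as `:id:` and `:file:`. The snippet below copies the method body into a standalone function to show which inputs pass. The sample data is made up.

```ruby
require "yaml"

# Standalone copy of FileIO#check_basic_format, for illustration only.
def check_basic_format(index)
  return false unless index.is_a? Array

  keys = %i[file id]
  index.all? { |item| item.respond_to?(:keys) && item.keys.sort == keys }
end

# Symbol keys are serialized as ":id:" / ":file:", so they survive a round trip.
good_yaml = [{ id: "ISO 1", file: "data/iso-1.yaml" }].to_yaml
good = YAML.safe_load(good_yaml, permitted_classes: [Symbol])

# Plain keys load as Strings and are rejected.
bad = YAML.safe_load("- id: ISO 1\n  file: data/iso-1.yaml\n")

p check_basic_format(good)        # => true
p check_basic_format(bad)         # => false
p check_basic_format({ id: "x" }) # => false (not an Array)
```

An empty array also passes, because `all?` on an empty collection returns true.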
#check_file ⇒ Array<Hash>?
Check if index file exists and is not older than 24 hours
# File 'lib/relaton/index/file_io.rb', line 83
def check_file
  ctime = Index.config.storage.ctime(file)
  return unless ctime && ctime > Time.now - 86400

  read_file
end
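#check_file treats a cached index as fresh when its ctime is less than 86 400 seconds (24 hours) old. An older file, or a missing one (`ctime` is nil), makes it return nil, and #read then falls through to #fetch_and_save. The comparison is isolated below. `fresh?` and its `now:` parameter are hypothetical, added so the check can be tested without touching storage.

```ruby
MAX_AGE = 86_400 # seconds in 24 hours, the literal used by #check_file

# Mirrors the guard in #check_file: no ctime (no file) or an old file means
# "not fresh", which makes #read fall back to fetch_and_save.
def fresh?(ctime, now: Time.now)
  !ctime.nil? && ctime > now - MAX_AGE
end

now = Time.at(1_700_000_000)
p fresh?(now - 3_600, now: now)  # => true  (one hour old)
p fresh?(now - 90_000, now: now) # => false (25 hours old)
p fresh?(nil, now: now)          # => false (no cached file)
```

The comparison is strict, so a file that is exactly 24 hours old already counts as stale.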
#check_format(index) ⇒ Boolean
Check if index has correct format
Structural check only. Per-id serialization is validated during deserialization (see #deserialize_id), which reuses the `from_hash` the index load performs anyway, so every row is checked at no extra parse cost.
# File 'lib/relaton/index/file_io.rb', line 101
def check_format(index)
  check_basic_format(index)
end
#deserialize_id(raw) ⇒ Object
Deserialize one id and verify pubid understands it. Reuses the `from_hash` deserialization the load performs anyway, so validating every row costs only the `to_hash`/compare for ids that need the round-trip clause. Raises InvalidIndexError when an id cannot be parsed or is unsupported, so `#load_index` rejects (and re-downloads) the whole index.
# File 'lib/relaton/index/file_io.rb', line 177
def deserialize_id(raw)
  obj = @pubid_class.from_hash(raw)
rescue StandardError => e
  raise InvalidIndexError, "cannot parse id #{raw.inspect}: #{e.message}"
else
  return obj if id_supported?(obj, raw)

  raise InvalidIndexError, "unsupported id #{raw.inspect}"
end
#deserialize_pubid(index) ⇒ Object
Deserialize and sort by the same narrowing key Type#search bsearches on, so binary search always has a consistent total order. The published index is only approximately sorted (generated under pubid 1.x base semantics); merely detecting sortedness left bsearch disabled and every search a full O(n) scan. Sorting here is one-time per load.
# File 'lib/relaton/index/file_io.rb', line 160
def deserialize_pubid(index)
  return index unless @pubid_class

  deserialized = index.map do |r|
    { id: deserialize_id(r[:id]), file: r[:file] }
  end
  warn_unless_sorted(deserialized)
  deserialized.sort_by! { |r| get_id_number(r[:id]) }
  @sorted = true
  deserialized
end
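Sorting once at load time is what makes binary search valid. `Array#bsearch` in find-minimum mode returns a correct result only when the block's answer is monotonic over the array. `get_id_number` comes from the included IdNumber module and its key format is not shown on this page. The sketch below therefore substitutes a hypothetical integer key `num`, and the real narrowing in Type#search may differ.

```ruby
# Hypothetical rows; :num stands in for get_id_number(r[:id]).
rows = [
  { num: 8601, file: "iso-8601.yaml" },
  { num: 9001, file: "iso-9001.yaml" },
  { num: 639,  file: "iso-639.yaml" },
  { num: 3166, file: "iso-3166.yaml" }
]

# One-time O(n log n) sort, as #deserialize_pubid does with sort_by!.
rows.sort_by! { |r| r[:num] }

# Each lookup is then O(log n): find the first row whose key is >= target,
# and keep it only if it is an exact match.
def find_row(sorted, target)
  row = sorted.bsearch { |r| r[:num] >= target }
  row if row && row[:num] == target
end

p find_row(rows, 3166)[:file] # => "iso-3166.yaml"
p find_row(rows, 1234)        # => nil
```

On an unsorted array the same `bsearch` call can silently skip the matching row, which is why detecting sortedness alone was not enough.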
#fetch_and_save ⇒ Array<Hash>
Fetch index from external repository and save it to storage
# File 'lib/relaton/index/file_io.rb', line 242
def fetch_and_save
  uri = URI.parse(url)
  body = Net::HTTP.get(uri)
  yaml = nil
  Zip::File.open_buffer(body) do |zip|
    entry = zip.entries.first
    yaml = entry.get_input_stream.read
  end
  Util.info "Downloaded index from `#{url}`", progname
  load_index(yaml, true)
end
#file ⇒ Object
# File 'lib/relaton/index/file_io.rb', line 65
def file
  @file ||= url ? path_to_local_file : @filename
end
#id_supported?(obj, raw) ⇒ Boolean
An id is supported when `from_hash` either resolves it to a concrete type (a subclass, meaning the polymorphic `_type` matched) or round-trips losslessly through `to_hash`. The subclass clause covers valid entries that pubid cannot fully rebuild on re-serialization (e.g. ISO directives drop a redundant subgroup number). The round-trip clause covers pubid classes without a subclass hierarchy. A wrong-format or garbled id satisfies neither: it falls back to the bare base class and fails to round-trip.
# File 'lib/relaton/index/file_io.rb', line 119
def id_supported?(obj, raw)
  # A concrete subtype means pubid recognized the `_type`; accept without
  # round-tripping. This both skips the false positive for valid-but-lossy
  # types (e.g. ISO directives) and avoids the costly hash compare for the
  # ~all rows that resolve to a subtype (it would otherwise add ~33%).
  return true unless obj.instance_of?(@pubid_class)

  normalize(obj.to_hash) == normalize(raw)
rescue StandardError
  false
end
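The acceptance rule shared by #deserialize_id and #id_supported? can be tried out with stand-in classes. `FakeId`, `FakeIsoId` and their `from_hash`/`to_hash` methods are hypothetical. They only mimic the shape of a pubid class with a `_type`-dispatched subclass and are not the pubid API. `InvalidIndexError` is redefined locally so the snippet runs on its own.

```ruby
class InvalidIndexError < StandardError; end

# Hypothetical base class: dispatches on "_type" the way a polymorphic pubid
# class might, and keeps only the fields it knows about.
class FakeId
  attr_reader :fields

  def initialize(fields)
    @fields = fields
  end

  def self.from_hash(hash)
    raise ArgumentError, "not a hash" unless hash.is_a?(Hash)

    klass = hash["_type"] == "iso" ? FakeIsoId : FakeId
    klass.new(hash.slice("_type", "publisher", "number"))
  end

  def to_hash
    fields
  end
end

class FakeIsoId < FakeId; end

def normalize(value)
  case value
  when Hash then value.to_h { |k, v| [k.to_s, normalize(v)] }
  when Array then value.map { |v| normalize(v) }
  when nil then nil
  else value.to_s
  end
end

# Same rule as FileIO#id_supported?: a concrete subclass is accepted as-is,
# while a bare base-class object must round-trip through to_hash.
def id_supported?(obj, raw, base)
  return true unless obj.instance_of?(base)

  normalize(obj.to_hash) == normalize(raw)
rescue StandardError
  false
end

# Same shape as FileIO#deserialize_id: parse failures and unsupported ids
# both surface as InvalidIndexError.
def deserialize_id(raw, base = FakeId)
  obj = base.from_hash(raw)
rescue StandardError => e
  raise InvalidIndexError, "cannot parse id #{raw.inspect}: #{e.message}"
else
  return obj if id_supported?(obj, raw, base)

  raise InvalidIndexError, "unsupported id #{raw.inspect}"
end

iso = deserialize_id({ "_type" => "iso", "publisher" => "ISO", "number" => 9001, "extra" => 1 })
p iso.class # => FakeIsoId
```

The subclass check short-circuits first, so the extra key on the `iso` row is tolerated. A base-class row carrying an unknown key fails the round trip and raises "unsupported id". Input that is not a hash at all raises "cannot parse id".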
#load_index(yaml, save = false) ⇒ Object
# File 'lib/relaton/index/file_io.rb', line 217
def load_index(yaml, save = false)
  index = YAML.safe_load(yaml, permitted_classes: [Symbol])
  save index if save
  return deserialize_pubid(index) if check_format(index)

  report_invalid_index(save, "Wrong structure of")
rescue Psych::SyntaxError
  report_invalid_index(save, "YAML parsing error when reading")
rescue InvalidIndexError
  report_invalid_index(save, "Wrong structure of")
end
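#load_index sends three failure modes through #report_invalid_index: a YAML syntax error, a parsed document with the wrong structure, and (via InvalidIndexError) an id the pubid class rejects. The first two can be reproduced with the standard library alone. `classify_index` and its return symbols are hypothetical, and the structural test inside it is the one from #check_basic_format.

```ruby
require "yaml"

# Hypothetical helper: reports which branch of #load_index a document would
# take, without saving, deserializing ids, or logging.
def classify_index(yaml)
  index = YAML.safe_load(yaml, permitted_classes: [Symbol])
  ok = index.is_a?(Array) &&
       index.all? { |i| i.respond_to?(:keys) && i.keys.sort == %i[file id] }
  ok ? :valid : :wrong_structure
rescue Psych::SyntaxError
  :yaml_syntax_error
end

p classify_index("- :id: ISO 1\n  :file: iso-1.yaml\n") # => :valid
p classify_index("- id: ISO 1\n")                        # => :wrong_structure
p classify_index("key: [1, 2")                           # => :yaml_syntax_error
```

In the real method the `save` flag decides only which warning is logged: a freshly downloaded index is reported as a remote problem, while a stored one is treated as corrupt locally.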
#normalize(value) ⇒ Object
Stringify hash keys and scalar values so the comparison ignores YAML scalar typing (e.g. 1 vs “1”) and string/symbol key differences, while still detecting dropped/added keys or genuinely changed values.
# File 'lib/relaton/index/file_io.rb', line 134
def normalize(value)
  case value
  when Hash then value.to_h { |k, v| [k.to_s, normalize(v)] }
  when Array then value.map { |v| normalize(v) }
  when nil then nil
  else value.to_s
  end
end
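The following worked example shows why the comparison normalizes first. YAML loads an unquoted `2015` as an Integer, a serializer may emit the same value as a String, and keys may come back as Symbols or Strings. The values are made up, and `normalize` is copied from the method above.

```ruby
require "yaml"

def normalize(value)
  case value
  when Hash then value.to_h { |k, v| [k.to_s, normalize(v)] }
  when Array then value.map { |v| normalize(v) }
  when nil then nil
  else value.to_s
  end
end

from_yaml = YAML.safe_load("publisher: ISO\nnumber: 9001\nyear: 2015\n")
serialized = { publisher: "ISO", number: "9001", year: "2015" }

p from_yaml == serialized                       # => false (key and value types differ)
p normalize(from_yaml) == normalize(serialized) # => true

# A dropped key is still detected after normalization.
without_year = serialized.reject { |k, _| k == :year }
p normalize(from_yaml) == normalize(without_year) # => false
```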
#path_to_local_file ⇒ String
Create path to local file
# File 'lib/relaton/index/file_io.rb', line 74
def path_to_local_file
  File.join(Index.config.storage_dir, ".relaton", @dir, @filename)
end
#progname ⇒ Object
# File 'lib/relaton/index/file_io.rb', line 213
def progname
  @progname ||= "relaton-#{@dir}"
end
#read ⇒ Array<Hash>
If `url` is a String, check whether the index file exists and is less than 24 hours old. If not, fetch the index from the external repository and save it to storage.
If `url` is true, read the index from the local storage path (#path_to_local_file). If `url` is nil, read the index from `filename`.
# File 'lib/relaton/index/file_io.rb', line 54
def read
  case url
  when String
    with_file_lock do
      check_file || fetch_and_save
    end
  else
    read_file || []
  end
end
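How #read and #file choose a source depends only on the type of `url`. The sketch below reproduces that decision as a pure function so the three cases can be compared side by side. `index_source`, its `storage_dir:` argument and the sample paths are hypothetical. The `.relaton` path segment comes from #path_to_local_file.

```ruby
# Hypothetical helper mirroring #read, #file and #path_to_local_file.
def index_source(url, storage_dir:, dir:, filename:)
  local = File.join(storage_dir, ".relaton", dir, filename)
  case url
  when String
    # Cached copy in storage, refreshed from the URL when older than 24h.
    { action: :check_file_or_fetch, path: local, remote: url }
  else
    # url == true -> read the storage path; url == nil -> read filename.
    { action: :read_file, path: url ? local : filename }
  end
end

remote = index_source("https://example.com/index.zip",
                      storage_dir: "/home/u", dir: "iso", filename: "index.yaml")
p remote[:path] # => "/home/u/.relaton/iso/index.yaml"

local = index_source(nil, storage_dir: "/home/u", dir: "iso", filename: "index.yaml")
p local[:path]  # => "index.yaml"
```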
#read_file ⇒ Array<Hash>
Read index from storage
# File 'lib/relaton/index/file_io.rb', line 148
def read_file
  yaml = Index.config.storage.read(file)
  return unless yaml

  load_index(yaml) || []
end
#remove ⇒ Array
Remove index file from storage
# File 'lib/relaton/index/file_io.rb', line 289
def remove
  Index.config.storage.remove file
  []
end
#report_invalid_index(save, reason) ⇒ Object
# File 'lib/relaton/index/file_io.rb', line 229
def report_invalid_index(save, reason)
  if save
    warn_remote_index_error reason
  else
    warn_local_index_error reason
  end
end
#save(index) ⇒ void
This method returns an undefined value.
Save index to storage
# File 'lib/relaton/index/file_io.rb', line 267
def save(index)
  yaml = sort_structured_index(index).map do |item|
    item.transform_values do |value|
      @pubid_class && value.is_a?(@pubid_class) ? value.to_hash : value
    end
  end.to_yaml
  Index.config.storage.write file, yaml
end
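#save converts any pubid objects back to plain hashes before calling `to_yaml`. The written file then contains only hashes, strings and symbols, which #load_index can read back with `YAML.safe_load`. The sketch uses a hypothetical `DemoId` struct with a `to_hash` method in place of a real pubid class, and keeps the YAML in a string instead of writing it to `Index.config.storage`.

```ruby
require "yaml"

# Hypothetical stand-in for a pubid class.
DemoId = Struct.new(:number) do
  def to_hash
    { "number" => number }
  end
end

index = [
  { id: DemoId.new(9001), file: "iso-9001.yaml" },
  { id: DemoId.new(639),  file: "iso-639.yaml" }
]

# Sort first (as sort_structured_index does), then serialize id objects.
yaml = index.sort_by { |item| item[:id].number }.map do |item|
  item.transform_values { |v| v.is_a?(DemoId) ? v.to_hash : v }
end.to_yaml

reloaded = YAML.safe_load(yaml, permitted_classes: [Symbol])
p reloaded.first[:id]["number"] # => 639
```

Without the `transform_values` step, Psych would tag each object as a Ruby struct, and `safe_load` would refuse to load the resulting file.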
#sort_structured_index(index) ⇒ Object
# File 'lib/relaton/index/file_io.rb', line 276
def sort_structured_index(index)
  if @pubid_class && index.first&.dig(:id).is_a?(@pubid_class)
    index.sort_by { |item| get_id_number item[:id] }
  else
    index
  end
end
#warn_local_index_error(reason) ⇒ Object
# File 'lib/relaton/index/file_io.rb', line 203
def warn_local_index_error(reason)
  Util.info "#{reason} file `#{file}`", progname
  if url.is_a? String
    Util.info "Considering `#{file}` file corrupt, re-downloading from `#{url}`", progname
  else
    Util.info "Considering `#{file}` file corrupt, removing it.", progname
    remove
  end
end
#warn_remote_index_error(reason) ⇒ Object
# File 'lib/relaton/index/file_io.rb', line 254
def warn_remote_index_error(reason)
  Util.info "#{reason} newly downloaded file `#{file}` at `#{url}`, " \
            "the remote index seems to be invalid. Please report this " \
            "issue at https://github.com/relaton/relaton-cli.", progname
end
#warn_unless_sorted(index) ⇒ Object
Log when the loaded index is not already in get_id_number order, so that the in-memory sort in #deserialize_pubid (and the unsorted index file behind it) is visible. Stops at the first out-of-order pair.
# File 'lib/relaton/index/file_io.rb', line 190
def warn_unless_sorted(index)
  prev = nil
  index.each do |r|
    num = get_id_number(r[:id])
    if prev && prev > num
      Util.warn "Index file `#{file}` is not sorted by id number; " \
                "sorting #{index.size} entries in memory.", progname
      return
    end
    prev = num
  end
end
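#warn_unless_sorted makes a single linear pass and stops at the first descending pair. The same check can be written with `each_cons(2)`. `first_descent` and the integer keys are hypothetical stand-ins for `get_id_number` results.

```ruby
# Returns the position of the first key that is smaller than its
# predecessor, or nil when the keys are already in non-decreasing order.
def first_descent(keys)
  keys.each_cons(2).with_index do |(prev, num), i|
    return i + 1 if prev > num
  end
  nil
end

p first_descent([639, 3166, 8601, 9001]) # => nil
p first_descent([639, 9001, 3166, 8601]) # => 2
```

As in the original, equal neighbouring keys do not count as out of order, because the comparison is a strict `>`.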