Class: Relaton::Index::FileIO

Inherits:
Object
Includes:
IdNumber
Defined in:
lib/relaton/index/file_io.rb

Overview

The FileIO class reads and writes index files. In search mode, url is used to fetch the index from an external repository and save it to storage. In index mode, url should be nil.

Defined Under Namespace

Classes: InvalidIndexError

Constant Summary

@@file_locks = {}
@@file_locks_mutex = Mutex.new

Instance Attribute Summary

Instance Method Summary

Methods included from IdNumber

#get_id_number, #id_base

Constructor Details

#initialize(dir, url, filename, _id_keys = nil, pubid_class = nil) ⇒ FileIO

Initialize FileIO

`id_keys` is accepted for backward compatibility but is no longer used: ids are now validated by deserializing every row through the pubid class (see #deserialize_id and #id_supported?), which understands the pubid v2 (lutaml) `_type` serialization that the old key allowlist could not.

Parameters:

  • dir (String)

    flavor-specific local directory in ~/.relaton in which to store the index

  • url (String, Boolean, nil)

    if a String, the URL is used to fetch an index from a Git repository and save it to the storage (if the file does not exist or is older than 24 hours)

    if true, the index is read from the storage (used to remove the index file)

    if nil, the filename is used to read and write the file (used to create the index in GitHub Actions)

  • filename (String)

    name of the index file; used as-is as the file path when url is nil

  • _id_keys (Object, nil)

    ignored; accepted only for backward compatibility

  • pubid_class (Class, nil)

    Pubid::Identifier class used for deserialization



# File 'lib/relaton/index/file_io.rb', line 37

def initialize(dir, url, filename, _id_keys = nil, pubid_class = nil)
  @dir = dir
  @url = url
  @filename = filename
  @pubid_class = pubid_class
  @sorted = false
end

Instance Attribute Details

#pubid_class ⇒ Object (readonly)

Returns the value of attribute pubid_class.



# File 'lib/relaton/index/file_io.rb', line 16

def pubid_class
  @pubid_class
end

#sorted ⇒ Object

Returns the value of attribute sorted.



# File 'lib/relaton/index/file_io.rb', line 17

def sorted
  @sorted
end

#url ⇒ Object (readonly)

Returns the value of attribute url.



# File 'lib/relaton/index/file_io.rb', line 16

def url
  @url
end

Instance Method Details

#check_basic_format(index) ⇒ Object



# File 'lib/relaton/index/file_io.rb', line 105

def check_basic_format(index)
  return false unless index.is_a? Array

  keys = %i[file id]
  index.all? { |item| item.respond_to?(:keys) && item.keys.sort == keys }
end
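To make the structural rule concrete, here is a standalone copy of the check (renamed `basic_format?` so it runs outside the class) applied to a few hypothetical rows. Key order does not matter, but String keys, extra keys, or a non-Array input fail; an empty Array passes because `all?` is vacuously true.

```ruby
# Standalone copy of check_basic_format, renamed so it runs outside FileIO.
def basic_format?(index)
  return false unless index.is_a?(Array)

  keys = %i[file id]
  index.all? { |item| item.respond_to?(:keys) && item.keys.sort == keys }
end

# Hypothetical sample rows.
good        = [{ id: "ISO 9001", file: "data/iso-9001.yaml" }]
reordered   = [{ file: "data/iso-9001.yaml", id: "ISO 9001" }]
extra_key   = [{ id: "ISO 9001", file: "data/iso-9001.yaml", title: "QMS" }]
string_keys = [{ "id" => "ISO 9001", "file" => "data/iso-9001.yaml" }]

basic_format?(good)        # => true
basic_format?(reordered)   # => true  (keys are sorted before comparing)
basic_format?(extra_key)   # => false (unexpected :title key)
basic_format?(string_keys) # => false (String keys, not Symbols)
basic_format?({})          # => false (not an Array)
basic_format?([])          # => true  (all? is vacuously true)
```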

#check_file ⇒ Array<Hash>?

Check if index file exists and is not older than 24 hours

Returns:

  • (Array<Hash>, nil)

    index or nil



# File 'lib/relaton/index/file_io.rb', line 83

def check_file
  ctime = Index.config.storage.ctime(file)
  return unless ctime && ctime > Time.now - 86400

  read_file
end
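The 24-hour rule in check_file reduces to one comparison. The sketch below isolates it in a hypothetical `fresh?` helper; it assumes, as the `ctime &&` guard suggests, that the storage returns nil for a missing file.

```ruby
MAX_AGE = 86_400 # seconds in 24 hours, the window check_file uses

# Hypothetical helper isolating check_file's guard: a cached index is reused
# only if its timestamp is known and newer than now - MAX_AGE.
def fresh?(ctime, now = Time.now)
  return false unless ctime

  ctime > now - MAX_AGE
end

now = Time.now
fresh?(now - 3_600, now)  # => true  (1 hour old)
fresh?(now - 90_000, now) # => false (25 hours old, so check_file returns nil)
fresh?(nil, now)          # => false (no file in storage)
```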

#check_format(index) ⇒ Boolean

Check if index has correct format

Structural check only. Per-id serialization is validated during deserialization (see #deserialize_id), which reuses the `from_hash` the index load performs anyway, so every row is checked at no extra parse cost.

Parameters:

  • index (Array<Hash>)

    index to check

Returns:

  • (Boolean)

    true if index is an Array in which every item has exactly the :file and :id keys



# File 'lib/relaton/index/file_io.rb', line 101

def check_format(index)
  check_basic_format(index)
end

#deserialize_id(raw) ⇒ Object

Deserialize one id and verify that pubid understands it. Reuses the `from_hash` deserialization the load performs anyway, so validating every row costs only the `to_hash`/compare for ids that need the round-trip clause. Raises InvalidIndexError when an id cannot be parsed or is unsupported, so `#load_index` rejects (and re-downloads) the whole index.



# File 'lib/relaton/index/file_io.rb', line 177

def deserialize_id(raw)
  obj = @pubid_class.from_hash(raw)
rescue StandardError => e
  raise InvalidIndexError, "cannot parse id #{raw.inspect}: #{e.message}"
else
  return obj if id_supported?(obj, raw)

  raise InvalidIndexError, "unsupported id #{raw.inspect}"
end
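deserialize_id relies on Ruby's `rescue`/`else` split: only failures inside `from_hash` are rewrapped as "cannot parse", while the "unsupported" error raised in `else` propagates as-is, because Ruby does not route exceptions raised in an `else` clause back through the same `rescue`. The sketch below reproduces that shape with hypothetical `parse` and `supported?` stand-ins for `from_hash` and `id_supported?`.

```ruby
class InvalidIndexError < StandardError; end

# Hypothetical stand-in for pubid_class.from_hash.
def parse(raw)
  raise ArgumentError, "not a hash" unless raw.is_a?(Hash)

  raw
end

# Hypothetical stand-in for id_supported?.
def supported?(obj)
  obj.key?(:type)
end

# Same shape as deserialize_id.
def deserialize(raw)
  obj = parse(raw)
rescue StandardError => e
  # Only errors raised by parse land here.
  raise InvalidIndexError, "cannot parse id #{raw.inspect}: #{e.message}"
else
  # Runs only if parse succeeded; an error raised here skips the rescue above.
  return obj if supported?(obj)

  raise InvalidIndexError, "unsupported id #{raw.inspect}"
end

# Helper returning the InvalidIndexError message, or nil on success.
def error_for(raw)
  deserialize(raw)
  nil
rescue InvalidIndexError => e
  e.message
end

error_for("ISO 1")        # => "cannot parse id \"ISO 1\": not a hash"
error_for({ number: 1 })  # message starts with "unsupported id"
error_for({ type: "is" }) # => nil
```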

#deserialize_pubid(index) ⇒ Object

Deserialize the ids and sort the rows by the same narrowing key that Type#search binary-searches on, so binary search always sees a consistent total order. The published index is only approximately sorted (it was generated under pubid 1.x base semantics); merely detecting sortedness left binary search disabled and made every search a full O(n) scan. The sort runs once per load.



# File 'lib/relaton/index/file_io.rb', line 160

def deserialize_pubid(index)
  return index unless @pubid_class

  deserialized = index.map do |r|
    { id: deserialize_id(r[:id]), file: r[:file] }
  end
  warn_unless_sorted(deserialized)
  deserialized.sort_by! { |r| get_id_number(r[:id]) }
  @sorted = true
  deserialized
end
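The one-time sort pays off because, once rows are ordered by their key, a lookup can use Ruby's find-minimum `Array#bsearch_index` instead of a linear scan. This is a sketch of the technique, not Type#search itself; it assumes Integer keys as a stand-in for whatever get_id_number returns.

```ruby
# Hypothetical rows whose ids are already reduced to an Integer key, standing
# in for get_id_number (its real return type is not shown on this page).
rows = [
  { id: 9001, file: "iso-9001.yaml" },
  { id: 216,  file: "iso-216.yaml" },
  { id: 639,  file: "iso-639.yaml" },
  { id: 216,  file: "iso-216-amd1.yaml" }
]

# One-time sort on load, like deserialize_pubid's sort_by!. sort_by! is not
# stable, so the two rows keyed 216 may come out in either order.
rows.sort_by! { |r| r[:id] }

# Find-minimum bsearch: position of the first row whose key is >= target.
def first_at_or_after(rows, target)
  rows.bsearch_index { |r| r[:id] >= target }
end

# All rows with exactly this key, in O(log n + k) instead of O(n).
def rows_for(rows, target)
  start = first_at_or_after(rows, target)
  return [] unless start

  rows[start..].take_while { |r| r[:id] == target }
end

rows_for(rows, 216).map { |r| r[:file] }.sort # => ["iso-216-amd1.yaml", "iso-216.yaml"]
rows_for(rows, 500)                           # => [] (falls between 216 and 639)
rows_for(rows, 10_000)                        # => [] (past the last row)
```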

#fetch_and_save ⇒ Array<Hash>

Fetch index from external repository and save it to storage

Returns:

  • (Array<Hash>)

    index



# File 'lib/relaton/index/file_io.rb', line 242

def fetch_and_save
  uri = URI.parse(url)
  body = Net::HTTP.get(uri)
  yaml = nil
  Zip::File.open_buffer(body) do |zip|
    entry = zip.entries.first
    yaml = entry.get_input_stream.read
  end
  Util.info "Downloaded index from `#{url}`", progname
  load_index(yaml, true)
end

#file ⇒ Object



# File 'lib/relaton/index/file_io.rb', line 65

def file
  @file ||= url ? path_to_local_file : @filename
end

#id_supported?(obj, raw) ⇒ Boolean

An id is supported when `from_hash` either resolves it to a concrete type (a subclass, meaning the polymorphic `_type` matched) or round-trips losslessly through `to_hash`. The subclass clause covers valid entries that pubid cannot fully rebuild on re-serialization (e.g. ISO directives drop a redundant subgroup number); the round-trip clause covers pubid classes without a subclass hierarchy. A wrong-format or garbled id satisfies neither: it falls back to the bare base class and fails to round-trip.

Returns:

  • (Boolean)


# File 'lib/relaton/index/file_io.rb', line 119

def id_supported?(obj, raw)
  # A concrete subtype means pubid recognized the `_type`; accept without
  # round-tripping. This both skips the false positive for valid-but-lossy
  # types (e.g. ISO directives) and avoids the costly hash compare for the
  # ~all rows that resolve to a subtype (it would otherwise add ~33%).
  return true unless obj.instance_of?(@pubid_class)

  normalize(obj.to_hash) == normalize(raw)
rescue StandardError
  false
end
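A minimal sketch of the acceptance rule, using a toy two-class hierarchy in place of a real pubid class. `BaseId`, `DirectiveId`, and their `from_hash`/`to_hash` are hypothetical; `normalize` is copied from #normalize. The directive loses its `subgroup` on re-serialization yet is accepted because it resolved to a subclass, while a garbled base-class id is rejected because an unknown key is dropped.

```ruby
# Copied from FileIO#normalize.
def normalize(value)
  case value
  when Hash then value.to_h { |k, v| [k.to_s, normalize(v)] }
  when Array then value.map { |v| normalize(v) }
  when nil then nil
  else value.to_s
  end
end

# Toy pubid-like hierarchy (hypothetical, not the Pubid API).
class BaseId
  KNOWN = %w[publisher number].freeze

  # Resolve a subclass from "_type"; anything else falls back to BaseId.
  def self.from_hash(hash)
    klass = hash["_type"] == "directive" ? DirectiveId : BaseId
    klass.new(hash)
  end

  def initialize(hash)
    @attrs = hash.slice(*KNOWN) # keys outside KNOWN are silently dropped
  end

  def to_hash
    @attrs.dup
  end
end

# A valid but lossy subtype: "subgroup" does not survive to_hash.
class DirectiveId < BaseId
  def to_hash
    super.merge("_type" => "directive")
  end
end

# Same logic as FileIO#id_supported?, with the base class passed in.
def id_supported?(obj, raw, base)
  return true unless obj.instance_of?(base)

  normalize(obj.to_hash) == normalize(raw)
rescue StandardError
  false
end

directive = { "_type" => "directive", "publisher" => "ISO", "number" => 1, "subgroup" => 2 }
plain     = { "publisher" => "ISO", "number" => 9001 }
garbled   = { "publisher" => "ISO", "nmbr" => 9001 }

results = [directive, plain, garbled].map do |raw|
  id_supported?(BaseId.from_hash(raw), raw, BaseId)
end
results # => [true, true, false]
```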

#load_index(yaml, save = false) ⇒ Object



# File 'lib/relaton/index/file_io.rb', line 217

def load_index(yaml, save = false)
  index = YAML.safe_load(yaml, permitted_classes: [Symbol])
  save index if save
  return deserialize_pubid(index) if check_format(index)

  report_invalid_index(save, "Wrong structure of")
rescue Psych::SyntaxError
  report_invalid_index(save, "YAML parsing error when reading")
rescue InvalidIndexError
  report_invalid_index(save, "Wrong structure of")
end
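The `permitted_classes: [Symbol]` argument matters because save writes rows with Symbol keys (`:id`, `:file`). The standalone sketch below, using only Ruby's bundled Psych, shows the round trip, the error a plain `safe_load` raises instead, and the Psych::SyntaxError that load_index rescues. Note that Psych::DisallowedClass is not among the exceptions load_index rescues.

```ruby
require "yaml"

# Rows shaped like save's output: Symbol keys, ids already plain hashes.
rows = [{ id: { "publisher" => "ISO", "number" => 9001 }, file: "iso-9001.yaml" }]
yaml = rows.to_yaml

# Permitting Symbol lets the :id/:file keys load back as Symbols.
loaded = YAML.safe_load(yaml, permitted_classes: [Symbol])
loaded == rows # => true

# A plain safe_load refuses the Symbol keys outright.
refused =
  begin
    YAML.safe_load(yaml)
    false
  rescue Psych::DisallowedClass
    true
  end
refused # => true

# Truncated YAML raises Psych::SyntaxError, which load_index rescues.
syntax_error =
  begin
    YAML.safe_load("- :id: [9001", permitted_classes: [Symbol])
    false
  rescue Psych::SyntaxError
    true
  end
syntax_error # => true
```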

#normalize(value) ⇒ Object

Stringify hash keys and scalar values so the comparison ignores YAML scalar typing (e.g. 1 vs “1”) and string/symbol key differences, while still detecting dropped/added keys or genuinely changed values.



# File 'lib/relaton/index/file_io.rb', line 134

def normalize(value)
  case value
  when Hash then value.to_h { |k, v| [k.to_s, normalize(v)] }
  when Array then value.map { |v| normalize(v) }
  when nil then nil
  else value.to_s
  end
end
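A few comparisons show what the normalization forgives and what it still catches. The `when nil` branch is what keeps nil distinct from an empty string, since `nil.to_s` would otherwise collapse both to `""`. The sample hashes are hypothetical.

```ruby
# Copied from FileIO#normalize.
def normalize(value)
  case value
  when Hash then value.to_h { |k, v| [k.to_s, normalize(v)] }
  when Array then value.map { |v| normalize(v) }
  when nil then nil
  else value.to_s
  end
end

# Scalar typing and Symbol/String keys are ignored...
normalize({ number: 9001 }) == normalize({ "number" => "9001" }) # => true
# ...but changed values, dropped keys, and nil-vs-empty are still caught.
normalize({ number: 9001 }) == normalize({ number: 9002 })       # => false
normalize({ a: 1, b: 2 }) == normalize({ a: 1 })                 # => false
normalize({ a: nil }) == normalize({ a: "" })                    # => false
```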

#path_to_local_file ⇒ String

Create path to local file

Returns:

  • (String)

    path of the form <Index.config.storage_dir>/.relaton/<dir>/<filename>



# File 'lib/relaton/index/file_io.rb', line 74

def path_to_local_file
  File.join(Index.config.storage_dir, ".relaton", @dir, @filename)
end

#progname ⇒ Object



# File 'lib/relaton/index/file_io.rb', line 213

def progname
  @progname ||= "relaton-#{@dir}"
end

#read ⇒ Array<Hash>

If url is a String, check (under a per-file lock) whether the index file exists and is not older than 24 hours. If not, fetch the index from the external repository and save it to storage.

If url is true, read the index from the path to the local file. If url is nil, read the index from filename.

Returns:

  • (Array<Hash>)

    index



# File 'lib/relaton/index/file_io.rb', line 54

def read
  case url
  when String
    with_file_lock do
      check_file || fetch_and_save
    end
  else
    read_file || []
  end
end
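with_file_lock is not listed on this page. The two class variables in the Constant Summary suggest a per-file Mutex registry guarded by one global Mutex; the sketch below is one plausible version of that design (the `LockedReader` class is hypothetical, not the gem's code). It shows why the lock matters for `check_file || fetch_and_save`: concurrent readers of the same file trigger only one download.

```ruby
# Hypothetical: one plausible use of the two class variables, not the gem's
# implementation of with_file_lock.
class LockedReader
  @@file_locks = {}
  @@file_locks_mutex = Mutex.new

  attr_reader :file

  def initialize(file)
    @file = file
  end

  # Fetch or create this file's Mutex while holding the registry mutex, so two
  # threads can never end up with different Mutexes for the same path; then
  # hold the per-file Mutex for the duration of the block.
  def with_file_lock(&block)
    lock = @@file_locks_mutex.synchronize { @@file_locks[file] ||= Mutex.new }
    lock.synchronize(&block)
  end
end

cache = {}
fetches = 0
threads = Array.new(5) do
  Thread.new do
    LockedReader.new("iso/index.yaml").with_file_lock do
      # Stand-in for `check_file || fetch_and_save`.
      cache["iso/index.yaml"] ||= begin
        fetches += 1
        sleep 0.01 # simulate a slow download
        [:index]
      end
    end
  end
end
threads.each(&:join)
fetches # => 1
```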

#read_file ⇒ Array<Hash>

Read index from storage

Returns:

  • (Array<Hash>)

    index



# File 'lib/relaton/index/file_io.rb', line 148

def read_file
  yaml = Index.config.storage.read(file)
  return unless yaml

  load_index(yaml) || []
end

#remove ⇒ Array

Remove index file from storage

Returns:

  • (Array)


# File 'lib/relaton/index/file_io.rb', line 289

def remove
  Index.config.storage.remove file
  []
end

#report_invalid_index(save, reason) ⇒ Object



# File 'lib/relaton/index/file_io.rb', line 229

def report_invalid_index(save, reason)
  if save
    warn_remote_index_error reason
  else
    warn_local_index_error reason
  end
end

#save(index) ⇒ void

This method returns an undefined value.

Save index to storage

Parameters:

  • index (Array<Hash>)

    index to save



# File 'lib/relaton/index/file_io.rb', line 267

def save(index)
  yaml = sort_structured_index(index).map do |item|
    item.transform_values do |value|
      @pubid_class && value.is_a?(@pubid_class) ? value.to_hash : value
    end
  end.to_yaml
  Index.config.storage.write file, yaml
end
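save converts pubid objects back to hashes before dumping. Without that step Psych would tag each id as a Ruby object, which the `YAML.safe_load` in load_index would refuse. A sketch with a hypothetical `ToyId` class standing in for a pubid identifier:

```ruby
require "yaml"

# Hypothetical stand-in for a pubid object; only to_hash matters here.
class ToyId
  def initialize(number)
    @number = number
  end

  def to_hash
    { "publisher" => "ISO", "number" => @number }
  end
end

# Same transform as save: pubid objects become plain hashes, everything else
# (the :file string) passes through unchanged.
def serializable(index, pubid_class)
  index.map do |item|
    item.transform_values do |value|
      pubid_class && value.is_a?(pubid_class) ? value.to_hash : value
    end
  end
end

rows = [{ id: ToyId.new(9001), file: "iso-9001.yaml" }]

raw_yaml  = rows.to_yaml                      # object dumped with a Ruby tag
safe_yaml = serializable(rows, ToyId).to_yaml # plain hash, safe to reload

raw_yaml.include?("!ruby/object:ToyId") # => true
YAML.safe_load(safe_yaml, permitted_classes: [Symbol]) ==
  [{ id: { "publisher" => "ISO", "number" => 9001 }, file: "iso-9001.yaml" }] # => true
```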

#sort_structured_index(index) ⇒ Object



# File 'lib/relaton/index/file_io.rb', line 276

def sort_structured_index(index)
  if @pubid_class && index.first&.dig(:id).is_a?(@pubid_class)
    index.sort_by { |item| get_id_number item[:id] }
  else
    index
  end
end

#warn_local_index_error(reason) ⇒ Object



# File 'lib/relaton/index/file_io.rb', line 203

def warn_local_index_error(reason)
  Util.info "#{reason} file `#{file}`", progname
  if url.is_a? String
    Util.info "Considering `#{file}` file corrupt, re-downloading from `#{url}`", progname
  else
    Util.info "Considering `#{file}` file corrupt, removing it.", progname
    remove
  end
end

#warn_remote_index_error(reason) ⇒ Object



# File 'lib/relaton/index/file_io.rb', line 254

def warn_remote_index_error(reason)
  Util.info "#{reason} newly downloaded file `#{file}` at `#{url}`, " \
       "the remote index seems to be invalid. Please report this " \
       "issue at https://github.com/relaton/relaton-cli.", progname
end

#warn_unless_sorted(index) ⇒ Object

Log a warning when the loaded index is not already in get_id_number order, so that the in-memory sort in #deserialize_pubid (and the unsorted index file behind it) is visible. Stops at the first out-of-order pair.



# File 'lib/relaton/index/file_io.rb', line 190

def warn_unless_sorted(index)
  prev = nil
  index.each do |r|
    num = get_id_number(r[:id])
    if prev && prev > num
      Util.warn "Index file `#{file}` is not sorted by id number; " \
                "sorting #{index.size} entries in memory.", progname
      return
    end
    prev = num
  end
end
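The early exit keeps the check to a single partial pass. The same scan, written as a hypothetical helper over plain keys (assumed to be comparable with `>`, as get_id_number values must be for the original to work), returns where the order first breaks:

```ruby
# Hypothetical helper with warn_unless_sorted's scan: return the position of
# the first key smaller than its predecessor, or nil when keys never decrease.
def first_out_of_order(keys)
  keys.each_cons(2).with_index do |(prev, cur), i|
    return i + 1 if prev > cur
  end
  nil
end

first_out_of_order([1, 2, 2, 5])    # => nil (equal neighbors are fine)
first_out_of_order([1, 3, 2, 4, 0]) # => 2   (stops at 3 > 2, never reaches 0)
```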