Class: Bundler::Spinel::Enricher

Inherits:
Object
Defined in:
lib/bundler/spinel/enricher.rb

Overview

Fetches public gem metadata from rubygems.org (description, total downloads, latest version and date, homepage, license) for a list of gems and writes it to a sidecar file, meta.jsonl, with one JSON line per gem. The catalog uses this data to show real descriptions, sort by popularity, and weed out low-signal or test gems with a minimum-downloads floor.
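
As a rough illustration of how a consumer might apply the downloads floor and popularity sort to the sidecar, here is a minimal sketch. The record schema is not shown in this page, so the field names ("name", "downloads"), the load_meta and catalog_entries helpers, the sample gem names, and the floor value are all assumptions made for illustration.

```ruby
require "json"

# Hypothetical helper: parse meta.jsonl-style text, one JSON object per line.
# Blank lines are skipped.
def load_meta(jsonl)
  jsonl.each_line.map(&:strip).reject(&:empty?).map { |line| JSON.parse(line) }
end

# Hypothetical helper: drop gems below the downloads floor, most popular first.
# The "downloads" key is an assumed field name; a missing value counts as 0.
def catalog_entries(records, floor:)
  records
    .select { |r| r["downloads"].to_i >= floor }
    .sort_by { |r| -r["downloads"].to_i }
end

# Made-up sample data, not real download counts.
sample = <<~JSONL
  {"name":"alpha","downloads":120}
  {"name":"beta","downloads":5000000}
  {"name":"gamma","downloads":98000}
JSONL

catalog_entries(load_meta(sample), floor: 1_000).each do |r|
  puts "#{r['name']}: #{r['downloads']}"
end
# prints:
#   beta: 5000000
#   gamma: 98000
```

Because every line is an independent JSON document, a reader like this can stream the file without loading a single large structure.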

Append-only and resumable: a re-run skips gems already recorded, so recovering from a flaky network just takes another pass. Transient responses (any status other than 200 or 404) are left unrecorded so that the next run retries them. The file is committed alongside the survey, so the deploy build renders the catalog offline, with no network access at build time.
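
The resume behavior described above can be sketched roughly as follows. This is not the class's actual code: recorded_names is a hypothetical stand-in for the `existing` helper that #run calls (its body is not shown here), the "name" key is an assumed field name, and record_response? assumes that a 404 is recorded as a final answer, which the overview implies but does not show.

```ruby
require "json"
require "set"
require "tempfile"

# Hypothetical stand-in for `existing`: collect the gem names already present
# in the sidecar file. Returns an empty set if the file does not exist yet.
def recorded_names(path)
  return Set.new unless File.exist?(path)

  File.foreach(path).each_with_object(Set.new) do |line, seen|
    next if line.strip.empty?
    begin
      seen << JSON.parse(line)["name"]
    rescue JSON::ParserError
      # A torn trailing line from an interrupted run is ignored, so that gem
      # is simply fetched again on the next pass.
    end
  end
end

# Assumed status policy: 200 and 404 are final answers worth recording;
# anything else (timeouts, 429, 5xx) is transient and left for a later run.
def record_response?(status)
  [200, 404].include?(status)
end

Tempfile.create("meta") do |f|
  f.puts(JSON.generate("name" => "alpha"))
  f.write('{"name":"be') # simulated partial write
  f.flush
  have = recorded_names(f.path)
  todo = %w[alpha beta].reject { |n| have.include?(n) }
  puts todo.inspect # prints ["beta"]
end
```

Leaving transient failures out of the file means the file itself is the retry list: whatever is missing is what the next run fetches.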

Constant Summary

HOST = "rubygems.org"

Instance Method Summary

Constructor Details

#initialize(out:, jobs: 8) ⇒ Enricher

Returns a new instance of Enricher.



# File 'lib/bundler/spinel/enricher.rb', line 20

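def initialize(out:, jobs: 8)
  @out = out          # path of the meta.jsonl sidecar to append to
  @jobs = jobs        # maximum number of concurrent fetch threads
  @write = Mutex.new  # serializes file writes and progress output
end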

Instance Method Details

#run(names, progress: $stderr) ⇒ Object

names: Array<String> of gem names. Appends one JSON line to the output file per newly fetched gem. progress: an IO for status output, or nil to silence it.



# File 'lib/bundler/spinel/enricher.rb', line 27

def run(names, progress: $stderr)
  have = existing
  # Skip gems already recorded by a previous run.
  todo = names.uniq.reject { |n| have.include?(n) }
  queue = Queue.new
  todo.each { |n| queue << n }
  total = todo.size
  done = 0
  progress&.puts("[enrich] #{have.size} already recorded, #{total} to fetch")

  File.open(@out, "a") do |f|
    # Never start more workers than there are gems to fetch (but at least one).
    workers = Array.new([@jobs, [total, 1].max].min) do
      Thread.new do
        http = open_http
        begin
          loop do
            # Non-blocking pop raises ThreadError once the queue is drained.
            name = begin
              queue.pop(true)
            rescue ThreadError
              break
            end
            rec = fetch(http, name)
            @write.synchronize do
              # Flush after every record so an interrupted run keeps what it fetched.
              if rec
                f.puts(JSON.generate(rec))
                f.flush
              end
              done += 1
              progress&.print("\r[enrich] #{done}/#{total}  #{name.ljust(30)}")
            end
          end
        ensure
          # Close the connection even if a fetch raised.
          http.finish if http.started?
        end
      end
    end
    workers.each(&:join)
  end
  progress&.puts("\r[enrich] #{done}/#{total} done#{' ' * 30}")
end
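
The worker-pool pattern that #run relies on (a shared Queue drained by a fixed number of threads, with writes serialized by a Mutex) can be tried in isolation with a sketch like the one below. The drain_in_parallel helper and its parameters are hypothetical and not part of Bundler::Spinel; the block stands in for the network fetch, and returning nil from it mimics a transient failure that is left unrecorded.

```ruby
require "json"
require "stringio"

# Hypothetical helper: drain `names` with up to `jobs` threads. Each name is
# passed to the block; non-nil results are written to `io` as JSON lines.
# Returns the number of records written.
def drain_in_parallel(names, io, jobs: 4)
  queue = Queue.new
  names.each { |n| queue << n }
  lock = Mutex.new
  written = 0

  workers = Array.new([jobs, [names.size, 1].max].min) do
    Thread.new do
      loop do
        name = begin
          queue.pop(true) # non-blocking pop raises ThreadError when empty
        rescue ThreadError
          break
        end
        rec = yield(name) # stand-in for the network fetch
        next unless rec   # nil means "transient failure, record nothing"

        lock.synchronize do
          io.puts(JSON.generate(rec))
          written += 1
        end
      end
    end
  end
  workers.each(&:join)
  written
end

out = StringIO.new
count = drain_in_parallel(%w[alpha beta gamma], out, jobs: 2) do |name|
  name == "beta" ? nil : { "name" => name }
end
puts count # prints 2
```

The order of lines in the output depends on thread scheduling, which is harmless for a format where each line stands alone.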