Module: RubyWorker::Fetcher

Defined in:
lib/ruby_worker/fetcher.rb

Overview

Fetcher clones the repo into a local cache directory when the worker is dispatched in self-fetch mode, that is, when the request carries a RepoSource. It returns the absolute path to the cached checkout so the rest of the worker (Parser) can operate against it exactly as if the bot had cloned it.

# Cache layout

$REVUND_WORKER_CACHE_DIR/<sha256(url@ref)>/

The default cache dir is /var/cache/revund-worker. The hash key includes both the URL and the ref, so two reviews targeting different commits of the same repo share nothing. This limits a tenant's blast radius to a single cache entry.
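The key derivation described above can be sketched as follows. This is a minimal illustration, not the module's actual `cache_key` helper (which is not shown on this page): `sketch_cache_key` is a hypothetical name, and the sketch assumes the key is the hex-encoded SHA-256 of the literal string `url@ref`. The example URL is a placeholder.

```ruby
require 'digest'

# Hypothetical stand-in for the module's cache_key helper.
# Assumption: the key is the hex SHA-256 digest of "<url>@<ref>".
def sketch_cache_key(url, ref)
  Digest::SHA256.hexdigest("#{url}@#{ref}")
end

url = 'https://example.com/acme/widgets.git' # placeholder URL

key_main = sketch_cache_key(url, 'main')
key_tag  = sketch_cache_key(url, 'v1.2.0')

# Same repo, different refs: two separate cache directories.
puts File.join('/var/cache/revund-worker', key_main)
puts File.join('/var/cache/revund-worker', key_tag)
```

Because the ref is part of the hashed input, a checkout for one ref is never reused for another, at the cost of cloning the same repo once per ref.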

# Token hygiene (security)

The token is used at clone time only:

1. Compose the authenticated URL via x-access-token convention.
2. Run `git clone --filter=blob:none --no-checkout <auth-url>`.
3. Immediately rewrite the remote URL to the un-authenticated
   form via `git remote set-url`. After this step the on-disk
   .git/config carries no token.
4. Fetch the requested ref and check it out.

Errors and log messages NEVER include the URL with the embedded token; the sanitizer strips it before raising.
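Steps 1 and the sanitizing rule above might look roughly like this. Both helpers are hypothetical sketches (`sketch_inject_token` and `sketch_sanitize` are invented names); the module's real `inject_token` and sanitizer are not shown on this page and may differ. The sketch assumes an HTTP(S) remote, a fallback user of `x-access-token` when no user is given, and a token made of URL-safe characters.

```ruby
require 'uri'

# Hypothetical: build the authenticated clone URL.
# Assumption: with no explicit user, the x-access-token convention applies.
def sketch_inject_token(clean_url, token, user = '')
  return clean_url if token.empty?

  uri = URI.parse(clean_url)
  uri.user = user.empty? ? 'x-access-token' : user # user must be set before password
  uri.password = token
  uri.to_s
end

# Hypothetical: drop any "user:password@" userinfo from URLs in a message
# before it is logged or raised.
def sketch_sanitize(message)
  message.gsub(%r{(\w+://)[^/@\s]+@}, '\1')
end

auth_url = sketch_inject_token('https://example.com/acme/widgets.git', 'ghs_exampletoken123')
puts sketch_sanitize("fatal: unable to access '#{auth_url}'")
```

Sanitizing by pattern rather than by searching for the known token also covers the case where git echoes the URL in a slightly different form, such as with a trailing slash.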

Constant Summary

DEFAULT_CACHE_DIR = '/var/cache/revund-worker'

DEFAULT_IDLE_TTL_SEC = 10 * 60 # 10 min

Class Method Summary

Class Method Details

.fetch_or_cache(src) ⇒ Object

Resolve the local checkout for `src` (a Hash with :url, :ref, :auth_token, :auth_user). Clones if cold, returns the cached path if warm. Idempotent within the process lifetime.



# File 'lib/ruby_worker/fetcher.rb', line 46

def self.fetch_or_cache(src)
  url = src[:url].to_s
  ref = src[:ref].to_s
  raise 'fetcher: repo_source.url is required' if url.empty?
  raise 'fetcher: repo_source.ref is required' if ref.empty?

  cache_dir = ENV['REVUND_WORKER_CACHE_DIR'] || DEFAULT_CACHE_DIR
  FileUtils.mkdir_p(cache_dir)

  key = cache_key(url, ref)
  repo_dir = File.join(cache_dir, key)

  if File.directory?(File.join(repo_dir, '.git'))
    touch(repo_dir)
    return repo_dir
  end

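  # Keep the token-free URL separate; only auth_url ever carries the credential.
  clean_url = url
  auth_url = inject_token(clean_url, src[:auth_token].to_s, src[:auth_user].to_s)
  FileUtils.mkdir_p(repo_dir)

  # Blobless partial clone with no working tree yet.
  run('git', 'clone', '--filter=blob:none', '--no-checkout', auth_url, repo_dir)
  # Strip the token BEFORE doing anything else.
  run('git', '-C', repo_dir, 'remote', 'set-url', 'origin', clean_url)

  # Fetch the requested ref explicitly, then populate the working tree.
  run('git', '-C', repo_dir, 'fetch', 'origin', ref)
  run('git', '-C', repo_dir, 'checkout', ref)

  # Record the access, then sweep idle entries off the calling thread.
  touch(repo_dir)
  Thread.new { evict_idle(cache_dir) }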

  repo_dir
end
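
The `touch` and `evict_idle` helpers called above are not shown on this page. One plausible reading, given DEFAULT_IDLE_TTL_SEC, is mtime-based eviction: each access refreshes the entry's modification time, and a sweep removes entries idle for longer than the TTL. The sketch below illustrates that idea under those assumptions; `sketch_touch` and `sketch_evict_idle` are hypothetical names, and the real helpers may work differently.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical: mark a cache entry as recently used by refreshing its mtime.
def sketch_touch(repo_dir)
  FileUtils.touch(repo_dir)
end

# Hypothetical: remove cache entries whose mtime is older than ttl seconds.
# The default ttl mirrors DEFAULT_IDLE_TTL_SEC (10 * 60).
# Returns the names of the evicted entries.
def sketch_evict_idle(cache_dir, now: Time.now, ttl: 10 * 60)
  Dir.children(cache_dir).each_with_object([]) do |name, evicted|
    path = File.join(cache_dir, name)
    next unless File.directory?(path)
    next if now - File.mtime(path) <= ttl

    FileUtils.rm_rf(path)
    evicted << name
  end
end

Dir.mktmpdir do |cache_dir|
  warm  = File.join(cache_dir, 'warm')
  stale = File.join(cache_dir, 'stale')
  FileUtils.mkdir_p([warm, stale])

  File.utime(Time.now, Time.now - 3600, stale) # pretend last use was an hour ago
  sketch_touch(warm)

  p sketch_evict_idle(cache_dir) # only the stale entry is removed
end
```

Under this reading, running the sweep in a background thread keeps eviction cost off the path that returns `repo_dir` to the caller.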