Module: RubyWorker::Fetcher
- Defined in:
- lib/ruby_worker/fetcher.rb
Overview
Fetcher clones the repo into a local cache directory when the worker is dispatched in self-fetch mode (RepoSource on the request). Returns the absolute path to the cached checkout so the rest of the worker (Parser) can operate against it exactly as if the bot had cloned it.
# Cache layout
$REVUND_WORKER_CACHE_DIR/<sha256(url@ref)>/
The default cache dir is /var/cache/revund-worker. The hash key includes both the URL and the ref, so two reviews targeting different commits of the same repo share nothing. This limits the tenant blast radius to a single cache entry.
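
The layout above can be sketched as follows. This is a minimal illustration, not the module's actual code: `cache_key` here is a stand-in for the private helper of the same name, `cache_path` is a hypothetical convenience method, and the hex encoding of the digest is an assumption (the documentation only says `sha256(url@ref)`).

```ruby
require 'digest'

# Stand-in for the module's private cache_key helper.
# Assumption: the key is the hex SHA-256 of "<url>@<ref>".
def cache_key(url, ref)
  Digest::SHA256.hexdigest("#{url}@#{ref}")
end

# Hypothetical helper: resolve the cache entry directory for a url/ref pair.
def cache_path(url, ref, cache_dir = ENV.fetch('REVUND_WORKER_CACHE_DIR', '/var/cache/revund-worker'))
  File.join(cache_dir, cache_key(url, ref))
end
```

Because the ref is part of the hashed string, the same repository at two different commits resolves to two unrelated directories.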
# Token hygiene (security)
The token is used at clone time only:
1. Compose the authenticated URL via x-access-token convention.
2. Run `git clone --filter=blob:none --no-checkout <auth-url>`.
3. Immediately rewrite the remote URL to the un-authenticated
form via `git remote set-url`. After this step the on-disk
.git/config carries no token.
4. Fetch the requested ref and check it out.
Errors and log messages NEVER include the URL with the embedded token; the sanitizer strips it before raising.
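
Steps 1 and 3, plus the sanitizing rule, might look roughly like the sketch below. Both method bodies are assumptions: `inject_token` mirrors the name used in the source but its real implementation is not shown, and `sanitize` is a hypothetical name for the sanitizer mentioned above.

```ruby
require 'uri'

# Assumed shape of inject_token: embed credentials in an HTTPS clone URL.
# The user falls back to "x-access-token" when none is given.
def inject_token(clean_url, token, user = '')
  return clean_url if token.empty?

  uri = URI.parse(clean_url)
  user = 'x-access-token' if user.empty?
  # Percent-encode both parts so reserved characters cannot break the URL.
  uri.userinfo = "#{URI.encode_www_form_component(user)}:#{URI.encode_www_form_component(token)}"
  uri.to_s
end

# Hypothetical sanitizer: drop any userinfo from URLs inside a message
# before it is logged or raised.
def sanitize(message)
  message.gsub(%r{(https?://)[^/@\s]+@}, '\1')
end
```

With these helpers, an error raised from a failed clone would pass through `sanitize` first, so the token never reaches logs even if git echoes the URL back.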
Constant Summary
- DEFAULT_CACHE_DIR = '/var/cache/revund-worker'
- DEFAULT_IDLE_TTL_SEC = 10 * 60 (10 min)
Class Method Summary
- .fetch_or_cache(src) ⇒ Object
Resolve the local checkout for `src` (a Hash with :url, :ref, :auth_token, :auth_user).
Class Method Details
.fetch_or_cache(src) ⇒ Object
Resolve the local checkout for `src` (a Hash with :url, :ref, :auth_token, :auth_user). Clones if cold, returns the cached path if warm. Idempotent within the process lifetime.
    # File 'lib/ruby_worker/fetcher.rb', line 46

    def self.fetch_or_cache(src)
      url = src[:url].to_s
      ref = src[:ref].to_s
      raise 'fetcher: repo_source.url is required' if url.empty?
      raise 'fetcher: repo_source.ref is required' if ref.empty?

      cache_dir = ENV['REVUND_WORKER_CACHE_DIR'] || DEFAULT_CACHE_DIR
      FileUtils.mkdir_p(cache_dir)
      key = cache_key(url, ref)
      repo_dir = File.join(cache_dir, key)

      if File.directory?(File.join(repo_dir, '.git'))
        touch(repo_dir)
        return repo_dir
      end

      clean_url = url
      auth_url = inject_token(clean_url, src[:auth_token].to_s, src[:auth_user].to_s)

      FileUtils.mkdir_p(repo_dir)
      run('git', 'clone', '--filter=blob:none', '--no-checkout', auth_url, repo_dir)
      # Strip the token BEFORE doing anything else.
      run('git', '-C', repo_dir, 'remote', 'set-url', 'origin', clean_url)
      run('git', '-C', repo_dir, 'fetch', 'origin', ref)
      run('git', '-C', repo_dir, 'checkout', ref)
      touch(repo_dir)

      Thread.new { evict_idle(cache_dir) }
      repo_dir
    end
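
The method calls `touch` on every hit and runs `evict_idle` in a background thread, but neither helper is shown. One plausible reading, given DEFAULT_IDLE_TTL_SEC, is mtime-based eviction. The sketch below is an assumption about how that could work, not the module's actual code; `IDLE_TTL_SEC` and the `ttl:` and `now:` parameters are hypothetical and exist here mainly to make the behavior testable.

```ruby
require 'fileutils'

IDLE_TTL_SEC = 10 * 60 # mirrors DEFAULT_IDLE_TTL_SEC (10 min)

# Assumed behavior of touch: record last use by bumping the directory mtime.
def touch(repo_dir, now = Time.now)
  File.utime(now, now, repo_dir)
end

# Assumed behavior of evict_idle: remove cache entries whose mtime is
# older than the idle TTL.
def evict_idle(cache_dir, ttl: IDLE_TTL_SEC, now: Time.now)
  Dir.children(cache_dir).each do |entry|
    path = File.join(cache_dir, entry)
    next unless File.directory?(path)

    FileUtils.rm_rf(path) if now - File.mtime(path) > ttl
  end
end
```

Under this reading, a warm hit refreshes the entry's timestamp, so only checkouts that no review has used for ten minutes are deleted.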