Class: Archaeo::HttpClient

Inherits:
Object
Defined in:
lib/archaeo/http_client.rb

Overview

HTTP client with retry logic, gzip decompression, and rotating realistic User-Agent profiles.

Intended to be injected into collaborating objects through their constructors, so tests can substitute a stub client.
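The constructor-injection idea can be illustrated with a minimal sketch. Everything here is hypothetical: `SnapshotFetcher`, `FakeHttpClient`, and `FakeResponse` are invented for the example, and the assumption that a response exposes `body` is not confirmed by this page. Only the `#get(url, headers: {})` signature is taken from the documentation.

```ruby
# Hypothetical collaborator: SnapshotFetcher is not part of the documented API.
# It receives its HTTP client through the constructor instead of building one.
class SnapshotFetcher
  def initialize(http_client:)
    @http_client = http_client
  end

  def fetch_body(url)
    # Assumes the response object responds to #body.
    @http_client.get(url).body
  end
end

# Test double that mimics only the #get(url, headers: {}) signature.
FakeResponse = Struct.new(:status, :body)

class FakeHttpClient
  attr_reader :requested_urls

  def initialize(body)
    @body = body
    @requested_urls = []
  end

  def get(url, headers: {})
    @requested_urls << url
    FakeResponse.new(200, @body)
  end
end

fake = FakeHttpClient.new("<html>archived</html>")
fetcher = SnapshotFetcher.new(http_client: fake)
body = fetcher.fetch_body("https://example.org/page")
```

In production code the same collaborator would presumably receive `Archaeo::HttpClient.new` instead of the fake, with no network access needed in tests.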

Defined Under Namespace

Classes: Response

Constant Summary

DEFAULT_TIMEOUT =
30
DEFAULT_MAX_RETRIES =
3
DEFAULT_RETRY_DELAY =
2
TRANSIENT_ERRORS =
[
  Net::ReadTimeout,
  Net::OpenTimeout,
  IOError,
  Errno::ECONNRESET,
  Errno::ECONNREFUSED,
].freeze
USER_AGENT_PROFILES =
[
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) " \
  "AppleWebKit/537.36 (KHTML, like Gecko) " \
  "Chrome/131.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " \
  "AppleWebKit/537.36 (KHTML, like Gecko) " \
  "Chrome/130.0.0.0 Safari/537.36",
  "Mozilla/5.0 (X11; Linux x86_64) " \
  "AppleWebKit/537.36 (KHTML, like Gecko) " \
  "Chrome/131.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) " \
  "AppleWebKit/537.36 (KHTML, like Gecko) " \
  "Chrome/129.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " \
  "AppleWebKit/537.36 (KHTML, like Gecko) " \
  "Chrome/131.0.0.0 Safari/537.36",
].freeze
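How the retry-related constants might fit together can be sketched as follows. This is not the library's implementation: `with_retries` and `SKETCH_TRANSIENT_ERRORS` are hypothetical names, and the choice to treat `max_retries` as the number of retries after the first attempt is an assumption, since the real `attempt_with_retries` is not shown on this page.

```ruby
require "net/http"

# Illustrative copy of the TRANSIENT_ERRORS list above.
SKETCH_TRANSIENT_ERRORS = [
  Net::ReadTimeout,
  Net::OpenTimeout,
  IOError,
  Errno::ECONNRESET,
  Errno::ECONNREFUSED,
].freeze

# Hypothetical helper: runs the block and retries it when a transient error
# is raised, sleeping retry_delay seconds between attempts. After max_retries
# retries the last error is re-raised. Non-transient errors propagate at once.
def with_retries(max_retries: 3, retry_delay: 2)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *SKETCH_TRANSIENT_ERRORS
    raise if attempts > max_retries
    sleep(retry_delay)
    retry
  end
end

calls = 0
result = with_retries(max_retries: 3, retry_delay: 0) do |attempt|
  calls += 1
  # Simulate two connection resets before a successful attempt.
  raise Errno::ECONNRESET if attempt < 3
  "ok after #{attempt} attempts"
end
```

A fixed delay mirrors the single `DEFAULT_RETRY_DELAY` value; whether the real client applies backoff on top of it is not stated here.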

Instance Method Summary

Constructor Details

#initialize(timeout: DEFAULT_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, retry_delay: DEFAULT_RETRY_DELAY, user_agent: nil) ⇒ HttpClient

Returns a new instance of HttpClient.



# File 'lib/archaeo/http_client.rb', line 55

def initialize(timeout: DEFAULT_TIMEOUT,
               max_retries: DEFAULT_MAX_RETRIES,
               retry_delay: DEFAULT_RETRY_DELAY,
               user_agent: nil)
  @timeout = timeout
  @max_retries = max_retries
  @retry_delay = retry_delay
  @user_agent = user_agent
end
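The `user_agent: nil` default suggests that a fixed value overrides profile rotation. A minimal sketch of that selection, assuming a random pick per call, is shown below. `pick_user_agent` and `SKETCH_PROFILES` are hypothetical, and the real rotation strategy (random, round-robin, or per-instance) is not documented on this page.

```ruby
# Shortened, illustrative stand-in for USER_AGENT_PROFILES.
SKETCH_PROFILES = [
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) " \
  "AppleWebKit/537.36 (KHTML, like Gecko) " \
  "Chrome/131.0.0.0 Safari/537.36",
  "Mozilla/5.0 (X11; Linux x86_64) " \
  "AppleWebKit/537.36 (KHTML, like Gecko) " \
  "Chrome/131.0.0.0 Safari/537.36",
].freeze

# Hypothetical helper: an explicit user_agent wins; nil falls back to a
# randomly sampled profile.
def pick_user_agent(user_agent, profiles: SKETCH_PROFILES)
  user_agent || profiles.sample
end

fixed = pick_user_agent("ArchiveBot/1.0")
rotated = pick_user_agent(nil)
```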

Instance Method Details

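#get(url, headers: {}) ⇒ Object

Sends a GET request to url. Caller-supplied headers are merged over default_headers, so caller values win on duplicate keys, and the request is then delegated to attempt_with_retries.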



# File 'lib/archaeo/http_client.rb', line 65

def get(url, headers: {})
  merged = default_headers.merge(headers)
  attempt_with_retries(url, merged)
end
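The merge step in `#get` relies on `Hash#merge`, where values from the argument replace values in the receiver for matching keys. The sketch below shows the effect with invented data: the contents of `default_headers` here are an assumption, since the real method is not shown on this page.

```ruby
# Hypothetical defaults: the real default_headers method is not documented here.
default_headers = {
  "Accept-Encoding" => "gzip",
  "User-Agent" => "Mozilla/5.0 (X11; Linux x86_64)",
}

# Headers a caller might pass as get(url, headers: ...).
caller_headers = { "User-Agent" => "ArchiveBot/1.0", "Accept" => "text/html" }

# Same call shape as in #get: caller values replace defaults on matching keys,
# and the receiver hash is left unmodified.
merged = default_headers.merge(caller_headers)
```

Because the keys are plain strings, matching is case-sensitive: passing `"user-agent"` would add a second entry to the hash rather than override `"User-Agent"`.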