Class: Archaeo::HttpClient

Inherits:
Object
Defined in:
lib/archaeo/http_client.rb

Overview

HTTP client with retry logic, gzip decompression, rotating realistic User-Agent profiles, and connection pooling.

Designed to be injected via constructor for testability. Connections to the same host are reused across requests for better performance.
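The constructor-injection idea can be illustrated with a minimal sketch. `PageFetcher` and `FakeHttpClient` are hypothetical names that do not appear in this library, and the fake returns a plain string where the real client presumably returns a Response; only the `#get(url, headers: {})` signature is taken from this page.

```ruby
# Hypothetical consumer: it receives its HTTP client through the constructor
# instead of building one itself.
class PageFetcher
  def initialize(http_client:)
    @http_client = http_client
  end

  def fetch(url)
    @http_client.get(url)
  end
end

# Hypothetical test double with the same #get(url, headers: {}) signature.
# It records requested URLs and returns a canned body without any network I/O.
class FakeHttpClient
  attr_reader :requested

  def initialize(canned_body)
    @canned_body = canned_body
    @requested = []
  end

  def get(url, headers: {})
    @requested << url
    @canned_body
  end
end

fake = FakeHttpClient.new("<html>stub</html>")
fetcher = PageFetcher.new(http_client: fake)
body = fetcher.fetch("https://example.com/")
```

In production code the same consumer would be handed an `Archaeo::HttpClient` instance instead of the fake.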

Defined Under Namespace

Classes: Response

Constant Summary

DEFAULT_TIMEOUT =
30
DEFAULT_MAX_RETRIES =
3
DEFAULT_RETRY_DELAY =
2
TRANSIENT_ERRORS =
[
  Net::ReadTimeout,
  Net::OpenTimeout,
  IOError,
  Errno::ECONNRESET,
  Errno::ECONNREFUSED,
  EOFError,
  Errno::EPIPE,
].freeze
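The private retry method is not shown on this page, so the following is only a sketch of how a list like TRANSIENT_ERRORS is commonly used. The helper name `with_retries` is hypothetical, and it assumes that `max_retries` counts retries after the first attempt and that the delay between attempts is fixed; the actual implementation may differ on both points.

```ruby
require "net/http" # defines Net::ReadTimeout and Net::OpenTimeout

# Same error classes as TRANSIENT_ERRORS above, repeated so the sketch runs alone.
TRANSIENT = [
  Net::ReadTimeout, Net::OpenTimeout, IOError,
  Errno::ECONNRESET, Errno::ECONNREFUSED, EOFError, Errno::EPIPE
].freeze

# Hypothetical helper: runs the block, retrying only on transient errors.
# Any other exception propagates immediately.
def with_retries(max_retries: 3, retry_delay: 2)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue *TRANSIENT
    raise if attempts > max_retries # retries exhausted: re-raise the last error
    sleep(retry_delay)
    retry
  end
end

# Usage: fail twice with a connection reset, then succeed on the third attempt.
attempts_seen = []
result = with_retries(retry_delay: 0) do |n|
  attempts_seen << n
  raise Errno::ECONNRESET if n < 3
  "ok"
end
```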
USER_AGENT_PROFILES =
[
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) " \
  "AppleWebKit/537.36 (KHTML, like Gecko) " \
  "Chrome/131.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " \
  "AppleWebKit/537.36 (KHTML, like Gecko) " \
  "Chrome/130.0.0.0 Safari/537.36",
  "Mozilla/5.0 (X11; Linux x86_64) " \
  "AppleWebKit/537.36 (KHTML, like Gecko) " \
  "Chrome/131.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) " \
  "AppleWebKit/537.36 (KHTML, like Gecko) " \
  "Chrome/129.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " \
  "AppleWebKit/537.36 (KHTML, like Gecko) " \
  "Chrome/131.0.0.0 Safari/537.36",
].freeze
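How the client chooses among these profiles is not shown on this page. One plausible approach, sketched below, is to honor an explicit `user_agent:` override and otherwise pick a profile at random. The helper `pick_user_agent` and the shortened placeholder strings are hypothetical.

```ruby
# Placeholder profiles; the real list is USER_AGENT_PROFILES above.
PROFILES = [
  "ExampleBrowser/1.0",
  "ExampleBrowser/2.0",
  "ExampleBrowser/3.0",
].freeze

# Hypothetical helper: an explicit override wins; otherwise choose a random profile.
def pick_user_agent(override = nil, profiles: PROFILES)
  override || profiles.sample
end

fixed = pick_user_agent("custom-agent/1.0")
rotated = pick_user_agent
```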

Instance Method Summary

Constructor Details

#initialize(timeout: DEFAULT_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, retry_delay: DEFAULT_RETRY_DELAY, user_agent: nil) ⇒ HttpClient

Returns a new instance of HttpClient.



# File 'lib/archaeo/http_client.rb', line 58

def initialize(timeout: DEFAULT_TIMEOUT,
               max_retries: DEFAULT_MAX_RETRIES,
               retry_delay: DEFAULT_RETRY_DELAY,
               user_agent: nil)
  @timeout = timeout
  @max_retries = max_retries
  @retry_delay = retry_delay
  @user_agent = user_agent
  # Open connections kept for reuse; closed and cleared by #shutdown.
  @connections = {}
  # Guards access to @connections.
  @mutex = Mutex.new
end
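The `@connections` hash and `@mutex` suggest a per-host connection cache. The lookup code is not shown here, so the sketch below is an assumption about how such a pool can work: `TinyPool`, `connection_for`, and the `"host:port"` key format are all hypothetical. Creating a `Net::HTTP` object does not open a socket, so the example runs without network access.

```ruby
require "net/http"
require "uri"

# Hypothetical pool: one Net::HTTP object per "host:port", guarded by a mutex.
class TinyPool
  def initialize
    @connections = {}
    @mutex = Mutex.new
  end

  # Returns the cached connection for the URI's host and port, creating it on first use.
  def connection_for(uri)
    key = "#{uri.host}:#{uri.port}"
    @mutex.synchronize do
      @connections[key] ||= Net::HTTP.new(uri.host, uri.port)
    end
  end

  def size
    @mutex.synchronize { @connections.size }
  end
end

pool = TinyPool.new
a = pool.connection_for(URI("https://example.com/one"))
b = pool.connection_for(URI("https://example.com/two"))
c = pool.connection_for(URI("https://example.org/"))
```

Here `a` and `b` are the same object because they share a host and port, while `c` is a separate connection.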

Instance Method Details

#get(url, headers: {}) ⇒ Object



# File 'lib/archaeo/http_client.rb', line 70

def get(url, headers: {})
  merged = default_headers.merge(headers)
  uri = URI(url)
  attempt_with_retries(uri, merged)
end
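The first two steps of `#get` rely on standard Ruby behavior that is easy to check in isolation: `Hash#merge` lets caller-supplied headers override defaults, and `URI()` parses the URL into its components. The private `default_headers` method is not shown on this page, so the default values below are placeholders rather than the library's actual headers.

```ruby
require "uri"

# Placeholder defaults; the real default_headers method is not documented here.
defaults = { "User-Agent" => "ExampleAgent/1.0", "Accept-Encoding" => "gzip" }

# Keys in the argument win over keys in the receiver, so callers can override defaults.
merged = defaults.merge("User-Agent" => "custom/2.0", "Accept" => "text/html")

# URI() parses the string into scheme, host, port, path, and query.
uri = URI("https://example.com/archive?page=2")
```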

#shutdown ⇒ Object



# File 'lib/archaeo/http_client.rb', line 76

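def shutdown
  # Hold the lock so no other thread touches the pool while it is drained.
  @mutex.synchronize do
    @connections.each_value do |http|
      http.finish
    rescue StandardError
      # Ignore errors raised while closing a connection.
      nil
    end
    @connections.clear
  end
end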
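One reason for the `rescue` inside the loop is that `Net::HTTP#finish` raises `IOError` when the session was never started, so a single bad entry would otherwise abort the cleanup. The sketch below reproduces that situation with a hand-built hash standing in for `@connections` (assuming the pooled values are `Net::HTTP` objects), and collects the errors instead of discarding them so the behavior is visible.

```ruby
require "net/http"

# Stand-in for @connections: one Net::HTTP object that was never started.
connections = {
  "example.com:443" => Net::HTTP.new("example.com", 443),
}

close_errors = []
connections.each_value do |http|
  begin
    http.finish # raises IOError because the session was never started
  rescue StandardError => e
    close_errors << e # the documented method swallows this instead
  end
end
connections.clear
```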