Class: Archaeo::HttpClient
- Inherits:
-
Object
- Object
- Archaeo::HttpClient
- Defined in:
- lib/archaeo/http_client.rb
Overview
HTTP client with retry logic, gzip decompression, and rotating realistic User-Agent profiles.
Injected via constructor for testability.
Defined Under Namespace
Classes: Response
Constant Summary collapse
- DEFAULT_TIMEOUT =
30- DEFAULT_MAX_RETRIES =
3- DEFAULT_RETRY_DELAY =
2- TRANSIENT_ERRORS =
[ Net::ReadTimeout, Net::OpenTimeout, IOError, Errno::ECONNRESET, Errno::ECONNREFUSED, ].freeze
- USER_AGENT_PROFILES =
[ "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) " \ "AppleWebKit/537.36 (KHTML, like Gecko) " \ "Chrome/131.0.0.0 Safari/537.36", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " \ "AppleWebKit/537.36 (KHTML, like Gecko) " \ "Chrome/130.0.0.0 Safari/537.36", "Mozilla/5.0 (X11; Linux x86_64) " \ "AppleWebKit/537.36 (KHTML, like Gecko) " \ "Chrome/131.0.0.0 Safari/537.36", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) " \ "AppleWebKit/537.36 (KHTML, like Gecko) " \ "Chrome/129.0.0.0 Safari/537.36", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) " \ "AppleWebKit/537.36 (KHTML, like Gecko) " \ "Chrome/131.0.0.0 Safari/537.36", ].freeze
Instance Method Summary collapse
- #get(url, headers: {}) ⇒ Object
-
#initialize(timeout: DEFAULT_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, retry_delay: DEFAULT_RETRY_DELAY, user_agent: nil) ⇒ HttpClient
constructor
A new instance of HttpClient.
Constructor Details
#initialize(timeout: DEFAULT_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, retry_delay: DEFAULT_RETRY_DELAY, user_agent: nil) ⇒ HttpClient
Returns a new instance of HttpClient.
55 56 57 58 59 60 61 62 63 |
# File 'lib/archaeo/http_client.rb', line 55 def initialize(timeout: DEFAULT_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, retry_delay: DEFAULT_RETRY_DELAY, user_agent: nil) @timeout = timeout @max_retries = max_retries @retry_delay = retry_delay @user_agent = user_agent end |
Instance Method Details
#get(url, headers: {}) ⇒ Object
65 66 67 68 |
# File 'lib/archaeo/http_client.rb', line 65 def get(url, headers: {}) merged = default_headers.merge(headers) attempt_with_retries(url, merged) end |