Class: Html2rss::RequestService::BotasaurusContract

Inherits:
Object
Defined in:
lib/html2rss/request_service/botasaurus_contract.rb

Overview

Maps html2rss request/response handling to the botasaurus-scrape-api contract.

Defined Under Namespace

Classes: ParsedResponse

Constant Summary

DEFAULT_OPTIONS =

Default Botasaurus scrape options, used when no explicit config is provided.

{
  navigation_mode: 'auto',
  max_retries: 2,
  headless: false
}.freeze

OPTION_KEYS =

Allowlisted request.botasaurus keys that are forwarded to the upstream API.

%i[
  navigation_mode
  max_retries
  wait_for_selector
  wait_timeout_seconds
  block_images
  block_images_and_css
  wait_for_complete_page_load
  headless
  proxy
  user_agent
  window_size
  lang
].freeze

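Instance Method Summary

  • #initialize(url:, options: {}) ⇒ BotasaurusContract (constructor)
  • #parse_response(transport_response) ⇒ ParsedResponse
  • #request_payload ⇒ Hash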

Constructor Details

#initialize(url:, options: {}) ⇒ BotasaurusContract

Returns a new instance of BotasaurusContract.

Parameters:

  • url (Html2rss::Url)

    canonical URL to scrape

  • options (Hash) (defaults to: {})

    validated request.botasaurus options

Options Hash (options:):

  • :navigation_mode (String)
  • :max_retries (Integer)
  • :wait_for_selector (String)
  • :wait_timeout_seconds (Integer)
  • :block_images (Boolean)
  • :block_images_and_css (Boolean)
  • :wait_for_complete_page_load (Boolean)
  • :headless (Boolean)
  • :proxy (String)
  • :user_agent (String)
  • :window_size (Array<Integer>)
  • :lang (String)


# File 'lib/html2rss/request_service/botasaurus_contract.rb', line 128

def initialize(url:, options: {})
  @url = url
  @options = options
end
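
As an illustration of the option types listed above, an options hash might look like the following. This is only a sketch: the constant name is hypothetical and every value is illustrative. In particular, the [width, height] order of window_size and the accepted navigation_mode strings other than the documented default 'auto' are assumptions, not confirmed by this page.

```ruby
# Hypothetical example values; only the keys and their types come from the
# documented options list. Value semantics are assumptions.
EXAMPLE_BOTASAURUS_OPTIONS = {
  navigation_mode: 'auto',       # String (the documented default)
  max_retries: 3,                # Integer
  wait_for_selector: 'article',  # String, hypothetical CSS selector
  wait_timeout_seconds: 10,      # Integer
  block_images: true,            # Boolean
  headless: true,                # Boolean
  user_agent: 'ExampleBot/1.0',  # String, hypothetical value
  window_size: [1280, 720],      # Array<Integer>, assumed [width, height]
  lang: 'en'                     # String
}.freeze

# With an Html2rss::Url instance in `url` (built elsewhere), the hash would
# be passed to the constructor as documented:
#   BotasaurusContract.new(url: url, options: EXAMPLE_BOTASAURUS_OPTIONS)
```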

Instance Method Details

#parse_response(transport_response) ⇒ ParsedResponse

Parameters:

  • transport_response (Faraday::Response)

    upstream HTTP response

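Returns:

  • (ParsedResponse)

    parsed JSON payload together with the transport HTTP status

Raises:

  • (BotasaurusConnectionFailed)

    if the response body is not valid JSON or is not a JSON object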

# File 'lib/html2rss/request_service/botasaurus_contract.rb', line 141

def parse_response(transport_response)
  payload = JSON.parse(transport_response.body.to_s)
  raise BotasaurusConnectionFailed, 'Botasaurus response must be a JSON object' unless payload.is_a?(Hash)

  ParsedResponse.new(payload:, transport_status: transport_response.status)
rescue JSON::ParserError => error
  raise BotasaurusConnectionFailed, "Botasaurus response JSON parse failed: #{error.message}"
end
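
The three outcomes of parse_response (a JSON object, valid JSON that is not an object, and unparseable input) can be tried in isolation with the sketch below. All Sketch* names are hypothetical stand-ins: the real BotasaurusConnectionFailed, ParsedResponse, and Faraday::Response are defined elsewhere and may differ in shape, and the "html" key in the sample body is invented for illustration.

```ruby
require 'json'

# Hypothetical stand-ins for classes that this page does not show.
class SketchConnectionFailed < StandardError; end
SketchParsedResponse = Struct.new(:payload, :transport_status, keyword_init: true)
SketchTransportResponse = Struct.new(:body, :status, keyword_init: true)

# Mirrors the control flow of parse_response using the stand-ins above.
def sketch_parse_response(transport_response)
  payload = JSON.parse(transport_response.body.to_s)
  raise SketchConnectionFailed, 'Botasaurus response must be a JSON object' unless payload.is_a?(Hash)

  SketchParsedResponse.new(payload: payload, transport_status: transport_response.status)
rescue JSON::ParserError => error
  raise SketchConnectionFailed, "Botasaurus response JSON parse failed: #{error.message}"
end

# Helper for the demo: returns the failure message, or nil on success.
def sketch_failure_message(body)
  sketch_parse_response(SketchTransportResponse.new(body: body, status: 200))
  nil
rescue SketchConnectionFailed => error
  error.message
end

parsed = sketch_parse_response(
  SketchTransportResponse.new(body: '{"html":"<p>hi</p>"}', status: 200)
)
puts parsed.payload.inspect             # the Hash parsed from the body
puts parsed.transport_status            # the status passed in above
puts sketch_failure_message('[1, 2]')   # valid JSON, but not an object
puts sketch_failure_message('not json') # rejected by the JSON parser
```

Both failure modes surface as the same error class, so a caller only needs a single rescue clause for a malformed upstream response.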

#request_payload ⇒ Hash

Returns payload for POST /scrape.

Returns:

  • (Hash)

    payload for POST /scrape



# File 'lib/html2rss/request_service/botasaurus_contract.rb', line 134

def request_payload
  DEFAULT_OPTIONS.merge(filtered_options).merge(url: url.to_s)
end
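
The merge order in request_payload can be illustrated with a standalone sketch. The private filtered_options helper is not shown on this page, so the version below assumes it simply keeps allowlisted keys via Hash#slice; the real helper may do more (for example, dropping nil values). The SKETCH_* constants and the sketch_request_payload name are hypothetical copies made so the example runs on its own.

```ruby
# Copies of the documented constants, renamed so the sketch is standalone.
SKETCH_DEFAULT_OPTIONS = {
  navigation_mode: 'auto',
  max_retries: 2,
  headless: false
}.freeze

SKETCH_OPTION_KEYS = %i[
  navigation_mode max_retries wait_for_selector wait_timeout_seconds
  block_images block_images_and_css wait_for_complete_page_load
  headless proxy user_agent window_size lang
].freeze

# Assumption: filtering means "keep only allowlisted keys".
def sketch_request_payload(url, options)
  filtered = options.slice(*SKETCH_OPTION_KEYS)
  # Defaults first, explicit options override them, url is merged last.
  SKETCH_DEFAULT_OPTIONS.merge(filtered).merge(url: url.to_s)
end

payload = sketch_request_payload(
  'https://example.com/news',
  { max_retries: 5, wait_for_selector: 'article', cookies: 'ignored' }
)
puts payload.inspect
```

In this sketch, max_retries from the caller replaces the default, wait_for_selector is added, and the non-allowlisted cookies key never reaches the payload. Because url is merged last, no option can overwrite it.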