Class: Resilientlink::Client

Inherits:

Object

Object
Resilientlink::Client

show all

Defined in:: lib/resilientlink.rb

Instance Method Summary collapse

#initialize(api_key:, base_url: DEFAULT_URL, timeout: DEFAULT_TIMEOUT) ⇒ Client constructor

A new instance of Client.
#scrape(url, **opts) ⇒ Hash

Scrape a URL and return structured metadata.

Constructor Details

#initialize(api_key:, base_url: DEFAULT_URL, timeout: DEFAULT_TIMEOUT) ⇒ `Client`

Returns a new instance of Client.

Parameters:

api_key (String) —

Your ResilientLink API key (required)
base_url (String) (defaults to: DEFAULT_URL) —

Override API base URL
timeout (Integer) (defaults to: DEFAULT_TIMEOUT) —

Request timeout in seconds (default: 60)

Raises:

(ArgumentError)

# File 'lib/resilientlink.rb', line 27

def initialize(api_key:, base_url: DEFAULT_URL, timeout: DEFAULT_TIMEOUT)
  raise ArgumentError, 'api_key is required.' if api_key.nil? || api_key.empty?

  @api_key  = api_key
  @base_url = base_url.chomp('/')
  @timeout  = timeout
end

Instance Method Details

#scrape(url, **opts) ⇒ `Hash`

Scrape a URL and return structured metadata.

Parameters:

url (String) —

The URL to scrape (required)
opts (Hash) —

Scrape options: :return_html [Boolean] Include raw HTML in response :screenshot [Boolean] Return base64 PNG screenshot (Pro/Enterprise) :full_page [Boolean] Full-page screenshot (default: true) :pdf [Boolean] Return base64 PDF (Pro/Enterprise) :pdf_format [String] ‘A4’, ‘Letter’, etc. (default: ‘A4’) :pdf_background [Boolean] Include background (default: true) :pdf_landscape [Boolean] Landscape PDF :bypass_cache [Boolean] Skip cache, force fresh scrape :js_render [Boolean] Enable JS rendering (Pro/Enterprise) :wait_for_selector [String] CSS selector to wait for :wait_until [String] ‘networkidle0’, ‘load’, ‘domcontentloaded’ :wait_ms [Integer] Extra ms to wait before scraping :custom_headers [Hash] HTTP headers to forward :custom_js [String] JavaScript to execute (Enterprise) :return_cookies [Boolean] Return cookies from the page :block_resources [Array] Resource types to block: [‘media’,‘font’] :timeout [Integer] Per-request timeout in ms (max 60000)

Returns:

(Hash) —

Scrape result

Raises:

(Resilientlink::Error)

# File 'lib/resilientlink.rb', line 59

def scrape(url, **opts)
  raise ArgumentError, 'url is required.' if url.nil? || url.empty?

  body = { url: url }
  body[:return_html]       = opts[:return_html]       unless opts[:return_html].nil?
  body[:screenshot]        = opts[:screenshot]        unless opts[:screenshot].nil?
  body[:full_page]         = opts[:full_page]         unless opts[:full_page].nil?
  body[:pdf]               = opts[:pdf]               unless opts[:pdf].nil?
  body[:pdf_format]        = opts[:pdf_format]        unless opts[:pdf_format].nil?
  body[:pdf_background]    = opts[:pdf_background]    unless opts[:pdf_background].nil?
  body[:pdf_landscape]     = opts[:pdf_landscape]     unless opts[:pdf_landscape].nil?
  body[:bypass_cache]      = opts[:bypass_cache]      unless opts[:bypass_cache].nil?
  body[:js_render]         = opts[:js_render]         unless opts[:js_render].nil?
  body[:wait_for_selector] = opts[:wait_for_selector] unless opts[:wait_for_selector].nil?
  body[:wait_until]        = opts[:wait_until]        unless opts[:wait_until].nil?
  body[:wait_ms]           = opts[:wait_ms]           unless opts[:wait_ms].nil?
  body[:custom_headers]    = opts[:custom_headers]    unless opts[:custom_headers].nil?
  body[:custom_js]         = opts[:custom_js]         unless opts[:custom_js].nil?
  body[:return_cookies]    = opts[:return_cookies]    unless opts[:return_cookies].nil?
  body[:block_resources]   = opts[:block_resources]   unless opts[:block_resources].nil?
  body[:timeout]           = opts[:timeout]           unless opts[:timeout].nil?

  request('POST', '/api/scrape', body)
end

Class: Resilientlink::Client

Instance Method Summary collapse

Constructor Details

#initialize(api_key:, base_url: DEFAULT_URL, timeout: DEFAULT_TIMEOUT) ⇒ Client

Instance Method Details

#scrape(url, **opts) ⇒ Hash

#initialize(api_key:, base_url: DEFAULT_URL, timeout: DEFAULT_TIMEOUT) ⇒ `Client`

#scrape(url, **opts) ⇒ `Hash`