Class: ContextDev::Models::WebWebScrapeHTMLParams

Inherits:

Internal::Type::BaseModel

Object
Internal::Type::BaseModel
ContextDev::Models::WebWebScrapeHTMLParams


Extended by: Internal::Type::RequestParameters::Converter

Includes: Internal::Type::RequestParameters

Defined in: lib/context_dev/models/web_web_scrape_html_params.rb

Overview

Defined Under Namespace

Modules: Country

Classes: Pdf

Instance Attribute Summary

#country ⇒ Symbol, ...

Two-letter ISO 3166-1 alpha-2 country code for the website request location.

#exclude_selectors ⇒ Array<String>?

CSS selectors to remove from the result.

#headers ⇒ Hash{Symbol=>String}?

Optional outbound HTTP headers forwarded only to the target URL, sent as deep-object query parameters of the form headers[Name]=value.

#include_frames ⇒ Boolean?

When true, iframes are rendered inline into the returned HTML.

#include_selectors ⇒ Array<String>?

CSS selectors that, when provided, keep only matching subtrees and drop everything else.

#max_age_ms ⇒ Integer?

Return a cached result if a prior scrape for the same parameters exists and is younger than this many milliseconds.

#pdf ⇒ ContextDev::Models::WebWebScrapeHTMLParams::Pdf?

PDF parsing controls.

#timeout_ms ⇒ Integer?

Optional timeout in milliseconds for the request.

#url ⇒ String

Full URL to scrape (must include http:// or https:// protocol).

#use_main_content_only ⇒ Boolean?

When true, return only the page’s main content in the HTML response, excluding headers, footers, sidebars, and navigation when detectable.

#wait_for_ms ⇒ Integer?

Optional browser wait time in milliseconds after initial page load.

Attributes included from Internal::Type::RequestParameters

#request_options

Class Method Summary

.values ⇒ Array<Symbol>

Instance Method Summary

#initialize(url:, country: nil, exclude_selectors: nil, headers: nil, include_frames: nil, include_selectors: nil, max_age_ms: nil, pdf: nil, timeout_ms: nil, use_main_content_only: nil, wait_for_ms: nil, request_options: {}) ⇒ Object (constructor)

Some parameter descriptions are abbreviated here; see Instance Attribute Details for the full text.

Constructor Details

#initialize(url:, country: nil, exclude_selectors: nil, headers: nil, include_frames: nil, include_selectors: nil, max_age_ms: nil, pdf: nil, timeout_ms: nil, use_main_content_only: nil, wait_for_ms: nil, request_options: {}) ⇒ `Object`

Some parameter descriptions below are abbreviated; see Instance Attribute Details for the full text.

Parameters:

url (String) —

Full URL to scrape (must include http:// or https:// protocol).

country (Symbol, ContextDev::Models::WebWebScrapeHTMLParams::Country) (defaults to: nil) —

Two-letter ISO 3166-1 alpha-2 country code for the website request location. When provided, Context.dev fetches the target page from that country.

exclude_selectors (Array<String>) (defaults to: nil) —

CSS selectors to remove from the result. Applied after includeSelectors. Exclusion takes precedence: an element matching both is removed.

headers (Hash{Symbol=>String}) (defaults to: nil) —

Optional outbound HTTP headers forwarded only to the target URL, sent as deep-object query parameters. When provided, caching is bypassed.

include_frames (Boolean) (defaults to: nil) —

When true, iframes are rendered inline into the returned HTML.

include_selectors (Array<String>) (defaults to: nil) —

CSS selectors. When provided, only matching subtrees (and their descendants) are kept and everything else is dropped.

max_age_ms (Integer) (defaults to: nil) —

Return a cached result if a prior scrape for the same parameters exists and is younger than this many milliseconds. Set to 0 to always scrape fresh.

pdf (ContextDev::Models::WebWebScrapeHTMLParams::Pdf) (defaults to: nil) —

PDF parsing controls. Use start/end to limit text extraction and OCR to an inclusive 1-based page range.

timeout_ms (Integer) (defaults to: nil) —

Optional timeout in milliseconds for the request. If the request takes longer than this value, it is aborted with a 408 status code.

use_main_content_only (Boolean) (defaults to: nil) —

When true, return only the page’s main content in the HTML response, excluding headers, footers, sidebars, and navigation when detectable.

wait_for_ms (Integer) (defaults to: nil) —

Optional browser wait time in milliseconds after initial page load. Min: 0. Max: 30000 (30 seconds).

request_options (ContextDev::RequestOptions, Hash{Symbol=>Object}) (defaults to: {})

# File 'lib/context_dev/models/web_web_scrape_html_params.rb', line 90
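The constructor takes `url:` as its only required keyword; every other option defaults to `nil`. To show that split without depending on the gem, the sketch below uses a hypothetical plain-Ruby method, `scrape_html_params`, that mirrors the documented keyword signature (leaving out `request_options:`). It is not the SDK class, and dropping `nil` options before sending is an assumption of this sketch, not something this page states.

```ruby
# Hypothetical stand-in (NOT ContextDev::Models::WebWebScrapeHTMLParams):
# mirrors the documented keyword signature so the required/optional split is visible.
def scrape_html_params(url:, country: nil, exclude_selectors: nil, headers: nil,
                       include_frames: nil, include_selectors: nil, max_age_ms: nil,
                       pdf: nil, timeout_ms: nil, use_main_content_only: nil,
                       wait_for_ms: nil)
  {
    url: url,
    country: country,
    exclude_selectors: exclude_selectors,
    headers: headers,
    include_frames: include_frames,
    include_selectors: include_selectors,
    max_age_ms: max_age_ms,
    pdf: pdf,
    timeout_ms: timeout_ms,
    use_main_content_only: use_main_content_only,
    wait_for_ms: wait_for_ms
  }.compact # assumption: options left at nil are dropped rather than sent as null
end

params = scrape_html_params(
  url: "https://example.com/blog/post",
  use_main_content_only: true,
  wait_for_ms: 2_000
)
p params
```

Calling the method without `url:` raises `ArgumentError` (missing keyword). Note that `Hash#compact` removes only `nil`, so an explicit `include_frames: false` is still kept.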

Instance Attribute Details

#country ⇒ `Symbol`, ...

Two-letter ISO 3166-1 alpha-2 country code for the website request location. When provided, Context.dev fetches the target page from that country.

Returns:

(Symbol, ContextDev::Models::WebWebScrapeHTMLParams::Country, nil)

# File 'lib/context_dev/models/web_web_scrape_html_params.rb', line 21
optional :country, enum: -> { ContextDev::WebWebScrapeHTMLParams::Country }

#exclude_selectors ⇒ `Array<String>`?

CSS selectors to remove from the result. Applied after includeSelectors. Exclusion takes precedence: an element matching both is removed. Examples: “nav”, “footer”, “.ad-banner”, “[aria-hidden=true]”.

Returns:

(Array<String>, nil)

# File 'lib/context_dev/models/web_web_scrape_html_params.rb', line 29
optional :exclude_selectors, ContextDev::Internal::Type::ArrayOf[String]

#headers ⇒ `Hash{Symbol=>String}`?

Optional outbound HTTP headers forwarded only to the target URL, sent as deep-object query parameters of the form headers[Name]=value. When provided, caching is bypassed: the result is neither read from nor written to cache.

Returns:

(Hash{Symbol=>String}, nil)

# File 'lib/context_dev/models/web_web_scrape_html_params.rb', line 37
optional :headers, ContextDev::Internal::Type::HashOf[String]
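The headers option travels as deep-object query parameters. Assuming the standard OpenAPI `deepObject` form (`headers[Name]=value`), the wire encoding looks roughly like the sketch below. `deep_object_query` is a hypothetical helper for illustration; the SDK performs this serialization itself, and its exact escaping may differ.

```ruby
require "uri"

# Hypothetical helper: serializes a hash as deepObject-style query pairs,
# turning each entry into a key of the form name[key]=value.
def deep_object_query(name, hash)
  pairs = hash.map { |key, value| ["#{name}[#{key}]", value.to_s] }
  URI.encode_www_form(pairs) # percent-encodes brackets, "=" and other reserved characters
end

query = deep_object_query("headers", { "Accept-Language": "en-US", Cookie: "session=abc" })
puts query
# prints headers%5BAccept-Language%5D=en-US&headers%5BCookie%5D=session%3Dabc
```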

#include_frames ⇒ `Boolean`?

When true, iframes are rendered inline into the returned HTML.

Returns:

(Boolean, nil)

# File 'lib/context_dev/models/web_web_scrape_html_params.rb', line 43
optional :include_frames, ContextDev::Internal::Type::Boolean

#include_selectors ⇒ `Array<String>`?

CSS selectors. When provided, only matching subtrees (and their descendants) are kept and everything else is dropped. When omitted, the entire document is kept. Examples: “article.main”, “#content”, “[role=main]”.

Returns:

(Array<String>, nil)

# File 'lib/context_dev/models/web_web_scrape_html_params.rb', line 51
optional :include_selectors, ContextDev::Internal::Type::ArrayOf[String]
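The interplay between `include_selectors` and `exclude_selectors` (keep matching subtrees first, then remove excluded elements, so exclusion wins on conflicts) can be modelled on a toy tree. `Node`, `matches?`, `included_roots`, `prune`, and `filter_tree` are hypothetical names. The matcher understands only a bare tag, `.class`, or `#id`, so this illustrates the documented rules, not Context.dev's actual implementation.

```ruby
# Toy document node for illustrating the include/exclude rules (hypothetical).
class Node
  attr_reader :tag, :id, :classes, :children

  def initialize(tag, id: nil, classes: [], children: [])
    @tag = tag
    @id = id
    @classes = classes
    @children = children
  end

  # Returns a copy of this node with a different child list.
  def with_children(new_children)
    Node.new(tag, id: id, classes: classes, children: new_children)
  end
end

# Minimal selector matcher: "tag", ".class" or "#id" only.
def matches?(node, selector)
  case selector[0]
  when "." then node.classes.include?(selector[1..])
  when "#" then node.id == selector[1..]
  else node.tag == selector
  end
end

# Step 1: with no include selectors keep the whole document; otherwise keep
# each topmost subtree whose root matches one of them (descendants included).
def included_roots(node, include_selectors)
  return [node] if include_selectors.nil? || include_selectors.empty?
  return [node] if include_selectors.any? { |s| matches?(node, s) }

  node.children.flat_map { |child| included_roots(child, include_selectors) }
end

# Step 2: remove every node matching an exclude selector, with its descendants.
# Running this after step 1 makes exclusion take precedence.
def prune(node, exclude_selectors)
  return nil if exclude_selectors.any? { |s| matches?(node, s) }

  node.with_children(node.children.filter_map { |child| prune(child, exclude_selectors) })
end

def filter_tree(root, include_selectors: nil, exclude_selectors: nil)
  included_roots(root, include_selectors).filter_map { |node| prune(node, exclude_selectors || []) }
end

page = Node.new("body", children: [
  Node.new("nav"),
  Node.new("article", classes: ["main"], children: [
    Node.new("p"),
    Node.new("div", classes: ["ad-banner"])
  ]),
  Node.new("footer")
])

kept = filter_tree(page, include_selectors: [".main"], exclude_selectors: [".ad-banner"])
```

Here `nav` and `footer` fall outside `.main` and are dropped by the include step, and the `.ad-banner` div inside the kept article is removed by the exclude step.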

#max_age_ms ⇒ `Integer`?

Return a cached result if a prior scrape for the same parameters exists and is younger than this many milliseconds. Defaults to 1 day (86400000 ms) when omitted. Max is 30 days (2592000000 ms). Set to 0 to always scrape fresh.

Returns:

(Integer, nil)

# File 'lib/context_dev/models/web_web_scrape_html_params.rb', line 59
optional :max_age_ms, Integer
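The cache rules above (one-day default, 30-day maximum, `0` forcing a fresh scrape, and `headers` bypassing the cache entirely) can be captured in a small predicate. `serve_from_cache?` is a hypothetical helper that assumes a cached entry for identical parameters has already been found. Raising on out-of-range values is also an assumption, because this page does not say how the service treats them.

```ruby
ONE_DAY_MS = 86_400_000
MAX_AGE_LIMIT_MS = 30 * ONE_DAY_MS # 2_592_000_000 ms, the documented maximum

# Hypothetical helper modelling the documented cache rules for one cached entry.
def serve_from_cache?(cached_age_ms, max_age_ms: nil, headers: nil)
  # Custom headers bypass the cache (assumption: any non-nil hash counts as "provided").
  return false unless headers.nil?

  max_age = max_age_ms.nil? ? ONE_DAY_MS : max_age_ms # default: 1 day
  unless max_age.between?(0, MAX_AGE_LIMIT_MS)
    raise ArgumentError, "max_age_ms must be between 0 and #{MAX_AGE_LIMIT_MS}"
  end

  # "Younger than" is strict, so max_age_ms: 0 never reuses a cached result.
  cached_age_ms < max_age
end
```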

#pdf ⇒ `ContextDev::Models::WebWebScrapeHTMLParams::Pdf`?

PDF parsing controls. Use start/end to limit text extraction and OCR to an inclusive 1-based page range.

Returns:

(ContextDev::Models::WebWebScrapeHTMLParams::Pdf, nil)

# File 'lib/context_dev/models/web_web_scrape_html_params.rb', line 66
optional :pdf, -> { ContextDev::WebWebScrapeHTMLParams::Pdf }
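An inclusive, 1-based page range means both endpoints are processed and the first page is page 1. The hypothetical `pdf_pages` helper below lists the pages such a range covers; `start_page`/`end_page` are illustrative names rather than the `Pdf` model's attributes, and defaulting an omitted bound to the first or last page (and clamping an overlong end) are assumptions of this sketch.

```ruby
# Hypothetical helper: pages covered by an inclusive, 1-based start/end range.
def pdf_pages(total_pages, start_page: nil, end_page: nil)
  first = start_page || 1                           # assumption: omitted start = first page
  last = [end_page || total_pages, total_pages].min # assumption: omitted or overlong end = last page
  raise ArgumentError, "pages are 1-based" if first < 1

  first > last ? [] : (first..last).to_a # inclusive on both ends
end

pdf_pages(10, start_page: 2, end_page: 4) # pages 2, 3 and 4
```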

#timeout_ms ⇒ `Integer`?

Optional timeout in milliseconds for the request. If the request takes longer than this value, it is aborted with a 408 status code. The maximum allowed value is 300000 ms (5 minutes).

Returns:

(Integer, nil)

# File 'lib/context_dev/models/web_web_scrape_html_params.rb', line 74
optional :timeout_ms, Integer
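A caller may want to reject an out-of-range `timeout_ms` before sending it and to recognise the 408 that signals an aborted scrape. Both helpers below are hypothetical. Only the 300000 ms maximum is documented; rejecting zero or negative values is an assumption of this sketch.

```ruby
MAX_TIMEOUT_MS = 300_000 # documented maximum: 5 minutes

# Hypothetical pre-flight check for the optional timeout_ms value.
def check_timeout_ms!(timeout_ms)
  return nil if timeout_ms.nil? # optional: nil means the option is not set

  unless timeout_ms.is_a?(Integer) && timeout_ms.positive? && timeout_ms <= MAX_TIMEOUT_MS
    raise ArgumentError, "timeout_ms must be an Integer in 1..#{MAX_TIMEOUT_MS}"
  end

  timeout_ms
end

# Per the description, a request that runs past timeout_ms is aborted with HTTP 408.
def scrape_timed_out?(status_code)
  status_code == 408
end
```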

#url ⇒ `String`

Full URL to scrape (must include the http:// or https:// protocol).

Returns:

(String)

# File 'lib/context_dev/models/web_web_scrape_html_params.rb', line 14
required :url, String
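Because `url` must carry an explicit `http://` or `https://` scheme, a bare host such as `example.com` does not qualify. The hypothetical `full_http_url?` helper sketches a client-side check with Ruby's standard `URI` library; the service's own validation may be stricter or looser.

```ruby
require "uri"

# Hypothetical pre-flight check: true only for absolute http(s) URLs with a host.
def full_http_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) && !uri.host.to_s.empty? # URI::HTTPS is a subclass of URI::HTTP
rescue URI::InvalidURIError
  false
end
```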

#use_main_content_only ⇒ `Boolean`?

When true, return only the page’s main content in the HTML response, excluding headers, footers, sidebars, and navigation when detectable.

Returns:

(Boolean, nil)

# File 'lib/context_dev/models/web_web_scrape_html_params.rb', line 81
optional :use_main_content_only, ContextDev::Internal::Type::Boolean

#wait_for_ms ⇒ `Integer`?

Optional browser wait time in milliseconds after initial page load. Min: 0. Max: 30000 (30 seconds).

Returns:

(Integer, nil)

# File 'lib/context_dev/models/web_web_scrape_html_params.rb', line 88
optional :wait_for_ms, Integer
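The documented bounds for `wait_for_ms` are 0 to 30000 ms. One way to keep a requested wait inside them is to clamp it, as in the hypothetical `clamp_wait_for_ms` helper below.

```ruby
WAIT_FOR_MIN_MS = 0
WAIT_FOR_MAX_MS = 30_000 # documented maximum: 30 seconds

# Hypothetical helper: clamps a requested post-load wait into the documented range.
def clamp_wait_for_ms(ms)
  ms.clamp(WAIT_FOR_MIN_MS, WAIT_FOR_MAX_MS)
end
```

Clamping is a design choice; raising an `ArgumentError` instead would surface a caller's mistake rather than silently shortening the wait.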

Class Method Details

.values ⇒ `Array<Symbol>`

Returns:

(Array<Symbol>)

# File 'lib/context_dev/models/web_web_scrape_html_params.rb', line 328


Methods included from Internal::Type::RequestParameters::Converter

Methods included from Internal::Type::RequestParameters

Methods inherited from Internal::Type::BaseModel

Methods included from Internal::Type::Converter

Methods included from Internal::Util::SorbetRuntimeSupport
