Class: ContextDev::Resources::Web

Inherits:
Object
Defined in:
lib/context_dev/resources/web.rb


Constructor Details

#initialize(client:) ⇒ Web

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.

Returns a new instance of Web.

Parameters:

  • client

    The client that this resource stores and forwards every request to.



# File 'lib/context_dev/resources/web.rb', line 438

def initialize(client:)
  @client = client
end
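Because the constructor only stores the client and each method forwards to `@client.request`, code that depends on this resource can be exercised with a stand-in client. The sketch below is stand-alone Ruby and does not load the gem: `RecordingClient` and `WebSketch` are hypothetical names, and `WebSketch#web_scrape_sitemap` imitates the call shape of the real method while skipping its parameter coercion (`dump_request`) and query encoding. Dropping `nil` options with `compact` is an assumption. It needs Ruby 3.0+ for `Hash#transform_keys` with a mapping.

```ruby
# Hypothetical test double: records each request instead of sending it.
class RecordingClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  # Accepts the keyword shape the resource passes to @client.request.
  def request(**kwargs)
    @calls << kwargs
    nil # a real client would return a parsed response model
  end
end

# Stand-alone imitation of the resource pattern (not ContextDev's class).
class WebSketch
  def initialize(client:)
    @client = client
  end

  # Same call shape as Web#web_scrape_sitemap, minus dump_request/encoding.
  def web_scrape_sitemap(domain:, max_links: nil, timeout_ms: nil, url_regex: nil)
    query = { domain: domain, max_links: max_links, timeout_ms: timeout_ms, url_regex: url_regex }.compact
    @client.request(
      method: :get,
      path: "web/scrape/sitemap",
      query: query.transform_keys(max_links: "maxLinks", timeout_ms: "timeoutMS", url_regex: "urlRegex")
    )
  end
end

client = RecordingClient.new
web = WebSketch.new(client: client)
web.web_scrape_sitemap(domain: "example.com", max_links: 100)
p client.calls.last
```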

Instance Method Details

#extract(schema:, url:, fact_check: nil, follow_subdomains: nil, include_frames: nil, instructions: nil, max_age_ms: nil, pdf: nil, stop_after_ms: nil, timeout_ms: nil, wait_for_ms: nil, request_options: {}) ⇒ ContextDev::Models::WebExtractResponse

Some parameter documentation has been truncated; see Models::WebExtractParams for more details.

Crawl a website, convert pages to Markdown using the scrape cache, and extract structured data into the provided JSON Schema. The schema must describe the response data object. This endpoint does not accept targeted page-type selection.

Parameters:

  • schema (Hash{Symbol=>Object})

    JSON Schema for the returned data object. TypeScript Zod users can pass a JSON S…

  • url (String)

    The starting website URL to crawl and extract from. Must include http:// or http…

  • fact_check (Boolean)

    When true (default), every returned value must be grounded in facts stated on th…

  • follow_subdomains (Boolean)

    When true, follow links on subdomains of the starting URL’s domain.

  • include_frames (Boolean)

    When true, iframe contents are included in Markdown before extraction.

  • instructions (String)

    Optional extraction guidance, such as which facts to prioritize or how to interp…

  • max_age_ms (Integer)

    Return cached scrape results if a prior scrape for the same parameters is younge…

  • pdf (ContextDev::Models::WebExtractParams::Pdf)

  • stop_after_ms (Integer)

    Soft time budget for the crawl in milliseconds.

  • timeout_ms (Integer)

    Optional timeout in milliseconds for the request. If the request takes longer th…

  • wait_for_ms (Integer)

    Optional browser wait time in milliseconds after initial page load for each craw…

  • request_options (ContextDev::RequestOptions, Hash{Symbol=>Object}, nil)

Returns:

  • (ContextDev::Models::WebExtractResponse)

# File 'lib/context_dev/resources/web.rb', line 43

def extract(params)
  parsed, options = ContextDev::WebExtractParams.dump_request(params)
  @client.request(
    method: :post,
    path: "web/extract",
    body: parsed,
    model: ContextDev::Models::WebExtractResponse,
    options: options
  )
end
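The `schema:` argument is a plain Ruby Hash holding a JSON Schema that describes the returned data object. A minimal sketch, assuming hypothetical field names (`company_name`, `founded_year`, `products`) and a hypothetical guard `object_schema?`. The JSON printed at the end shows the Ruby-side arguments only; the exact POST body produced by `WebExtractParams.dump_request` is not documented here.

```ruby
require "json"

# Example JSON Schema for Web#extract's `schema:` argument. Its top level is an
# object schema because it describes the returned data object.
# Field names are hypothetical.
schema = {
  type: "object",
  properties: {
    company_name: { type: "string" },
    founded_year: { type: "integer" },
    products: { type: "array", items: { type: "string" } }
  },
  required: ["company_name"]
}

# Hypothetical client-side guard: reject schemas whose top level is not an object.
def object_schema?(schema)
  schema[:type] == "object" && schema[:properties].is_a?(Hash)
end

raise ArgumentError, "schema must describe an object" unless object_schema?(schema)

# Keyword arguments as they would be passed to Web#extract.
args = { schema: schema, url: "https://example.com", instructions: "Prefer the About page." }
puts JSON.pretty_generate(args)
```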

#extract_fonts(direct_url: nil, domain: nil, max_age_ms: nil, timeout_ms: nil, request_options: {}) ⇒ ContextDev::Models::WebExtractFontsResponse

Some parameter documentation has been truncated; see Models::WebExtractFontsParams for more details.

Scrape font information from a website including font families, usage statistics, fallbacks, and element/word counts.

Parameters:

  • direct_url (String)

    A specific URL to fetch fonts from directly, bypassing domain resolution (e.g.,…

  • domain (String)

    Domain name to extract fonts from (e.g., ‘example.com’, ‘google.com’). The domai…

  • max_age_ms (Integer)

    Maximum age in milliseconds for cached data before the API performs a hard refre…

  • timeout_ms (Integer)

    Optional timeout in milliseconds for the request. If the request takes longer th…

  • request_options (ContextDev::RequestOptions, Hash{Symbol=>Object}, nil)

Returns:

  • (ContextDev::Models::WebExtractFontsResponse)

# File 'lib/context_dev/resources/web.rb', line 75

def extract_fonts(params = {})
  parsed, options = ContextDev::WebExtractFontsParams.dump_request(params)
  query = ContextDev::Internal::Util.encode_query_params(parsed)
  @client.request(
    method: :get,
    path: "web/fonts",
    query: query.transform_keys(
      direct_url: "directUrl",
      max_age_ms: "maxAgeMs",
      timeout_ms: "timeoutMS"
    ),
    model: ContextDev::Models::WebExtractFontsResponse,
    options: options
  )
end
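`extract_fonts` sends a GET request whose query keys are renamed from snake_case to the API's wire names. Note that the mapping uses `maxAgeMs` but `timeoutMS`, which is easy to get wrong when calling the HTTP endpoint directly. The hypothetical `fonts_query` helper below reproduces that renaming without the SDK; dropping `nil` options is an assumption about how unset parameters are handled.

```ruby
# Wire names copied from the transform_keys call in Web#extract_fonts.
# Note the inconsistent casing: "maxAgeMs" but "timeoutMS".
FONTS_WIRE_KEYS = {
  direct_url: "directUrl",
  max_age_ms: "maxAgeMs",
  timeout_ms: "timeoutMS"
}.freeze

# Hypothetical helper: drops unset options and renames keys; `domain` keeps
# its name because it has no entry in the mapping.
def fonts_query(direct_url: nil, domain: nil, max_age_ms: nil, timeout_ms: nil)
  { direct_url: direct_url, domain: domain, max_age_ms: max_age_ms, timeout_ms: timeout_ms }
    .compact
    .to_h { |key, value| [FONTS_WIRE_KEYS.fetch(key, key.to_s), value] }
end

p fonts_query(domain: "example.com", timeout_ms: 30_000)
```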

#extract_styleguide(direct_url: nil, domain: nil, max_age_ms: nil, timeout_ms: nil, request_options: {}) ⇒ ContextDev::Models::WebExtractStyleguideResponse

Some parameter documentation has been truncated; see Models::WebExtractStyleguideParams for more details.

Extract a comprehensive design system from a website including colors, typography, spacing, shadows, and UI components.

Parameters:

  • direct_url (String)

    A specific URL to fetch the styleguide from directly, bypassing domain resolutio…

  • domain (String)

    Domain name to extract styleguide from (e.g., ‘example.com’, ‘google.com’). The…

  • max_age_ms (Integer)

    Maximum age in milliseconds for cached data before the API performs a hard refre…

  • timeout_ms (Integer)

    Optional timeout in milliseconds for the request. If the request takes longer th…

  • request_options (ContextDev::RequestOptions, Hash{Symbol=>Object}, nil)

Returns:

  • (ContextDev::Models::WebExtractStyleguideResponse)

# File 'lib/context_dev/resources/web.rb', line 112

def extract_styleguide(params = {})
  parsed, options = ContextDev::WebExtractStyleguideParams.dump_request(params)
  query = ContextDev::Internal::Util.encode_query_params(parsed)
  @client.request(
    method: :get,
    path: "web/styleguide",
    query: query.transform_keys(
      direct_url: "directUrl",
      max_age_ms: "maxAgeMs",
      timeout_ms: "timeoutMS"
    ),
    model: ContextDev::Models::WebExtractStyleguideResponse,
    options: options
  )
end
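When calling `web/styleguide` without the SDK, the same key renaming applies and the values must also be URL-encoded. A sketch using the standard library's `URI.encode_www_form`; `styleguide_target` is a hypothetical name, the API's base URL is not given in this reference (so only the path and query are built), and the SDK's own serialization in `encode_query_params` may differ.

```ruby
require "uri"

# Hypothetical: build the relative request target for GET web/styleguide,
# using the wire key names from Web#extract_styleguide.
def styleguide_target(params)
  wire = { direct_url: "directUrl", max_age_ms: "maxAgeMs", timeout_ms: "timeoutMS" }
  # Unset options are dropped; unmapped keys such as `domain` keep their name.
  query = params.compact.map { |key, value| [wire.fetch(key, key.to_s), value] }
  "web/styleguide?#{URI.encode_www_form(query)}"
end

puts styleguide_target(domain: "example.com", max_age_ms: 3_600_000)
```

`URI.encode_www_form` percent-encodes reserved characters, so a `direct_url` value such as `https://example.com/brand` is sent as `https%3A%2F%2Fexample.com%2Fbrand`.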

#screenshot(direct_url: nil, domain: nil, full_screenshot: nil, handle_cookie_popup: nil, max_age_ms: nil, page: nil, timeout_ms: nil, viewport: nil, wait_for_ms: nil, request_options: {}) ⇒ ContextDev::Models::WebScreenshotResponse

Some parameter documentation has been truncated; see Models::WebScreenshotParams for more details.

Capture a screenshot of a website.

Parameters:

  • direct_url (String)

    A specific URL to screenshot directly, bypassing domain resolution (e.g., ‘https…

  • domain (String)

    Domain name to take screenshot of (e.g., ‘example.com’, ‘google.com’). The domai…

  • full_screenshot (Symbol, ContextDev::Models::WebScreenshotParams::FullScreenshot)

    Optional parameter to determine screenshot type. If ‘true’, takes a full page sc…

  • handle_cookie_popup (Symbol, ContextDev::Models::WebScreenshotParams::HandleCookiePopup)

    Optional parameter to control cookie/consent popup handling. If ‘true’, we dismi…

  • max_age_ms (Integer)

    Return a cached screenshot if a prior screenshot for the same parameters exists…

  • page (Symbol, ContextDev::Models::WebScreenshotParams::Page)

    Optional parameter to specify which page type to screenshot. If provided, the sy…

  • timeout_ms (Integer)

    Optional timeout in milliseconds for the request. If the request takes longer th…

  • viewport (ContextDev::Models::WebScreenshotParams::Viewport)

    Optional browser viewport dimensions for the screenshot. Defaults to 1920x1080.

  • wait_for_ms (Integer)

    Optional browser wait time in milliseconds after initial page load before taking…

  • request_options (ContextDev::RequestOptions, Hash{Symbol=>Object}, nil)

Returns:

  • (ContextDev::Models::WebScreenshotResponse)

# File 'lib/context_dev/resources/web.rb', line 158

def screenshot(params = {})
  parsed, options = ContextDev::WebScreenshotParams.dump_request(params)
  query = ContextDev::Internal::Util.encode_query_params(parsed)
  @client.request(
    method: :get,
    path: "web/screenshot",
    query: query.transform_keys(
      direct_url: "directUrl",
      full_screenshot: "fullScreenshot",
      handle_cookie_popup: "handleCookiePopup",
      max_age_ms: "maxAgeMs",
      timeout_ms: "timeoutMS",
      wait_for_ms: "waitForMs"
    ),
    model: ContextDev::Models::WebScreenshotResponse,
    options: options
  )
end
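`screenshot` lists only six keys in its `transform_keys` call, yet it also accepts `domain`, `page`, and `viewport`. That works because `Hash#transform_keys` with a mapping (Ruby 3.0+) renames only the keys found in the mapping and leaves every other key unchanged, as this small demonstration shows. The option values are illustrative.

```ruby
# The mapping copied from Web#screenshot.
SCREENSHOT_WIRE_KEYS = {
  direct_url: "directUrl",
  full_screenshot: "fullScreenshot",
  handle_cookie_popup: "handleCookiePopup",
  max_age_ms: "maxAgeMs",
  timeout_ms: "timeoutMS",
  wait_for_ms: "waitForMs"
}.freeze

query = { domain: "example.com", wait_for_ms: 2_000, timeout_ms: 45_000 }

# Mapped keys become wire-name strings; :domain is not in the mapping, so it
# passes through untouched (still a Symbol).
renamed = query.transform_keys(SCREENSHOT_WIRE_KEYS)
p renamed
```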

#search(query:, exclude_domains: nil, freshness: nil, include_domains: nil, markdown_options: nil, query_fanout: nil, timeout_ms: nil, request_options: {}) ⇒ ContextDev::Models::WebSearchResponse

Some parameter documentation has been truncated; see Models::WebSearchParams for more details.

Search the web and optionally scrape each result to Markdown in one round-trip.

Parameters:

  • query (String)

    Natural-language search query.

  • exclude_domains (Array<String>)

    Blocklist: drop results from these domains. Example: [“pinterest.com”, “reddit.…

  • freshness (Symbol, ContextDev::Models::WebSearchParams::Freshness)

    Restrict results to content published within this window.

  • include_domains (Array<String>)

    Allowlist: only return results from these domains. Example: [“arxiv.org”, “gith…

  • markdown_options (ContextDev::Models::WebSearchParams::MarkdownOptions)

    Inline Markdown scraping for each result. Set ‘enabled: true’ to activate.

  • query_fanout (Boolean)

    Expand the query into multiple parallel variants for broader recall.

  • timeout_ms (Integer)

    Optional timeout in milliseconds for the request. If the request takes longer th…

  • request_options (ContextDev::RequestOptions, Hash{Symbol=>Object}, nil)

Returns:

  • (ContextDev::Models::WebSearchResponse)

# File 'lib/context_dev/resources/web.rb', line 203

def search(params)
  parsed, options = ContextDev::WebSearchParams.dump_request(params)
  @client.request(
    method: :post,
    path: "web/search",
    body: parsed,
    model: ContextDev::Models::WebSearchResponse,
    options: options
  )
end
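`search` is a POST whose arguments travel in a JSON body, so list-valued filters and nested options such as `markdown_options` stay structured instead of being flattened into a query string. A sketch of assembling those arguments; the example values and the overlap check between `include_domains` and `exclude_domains` are hypothetical additions, not documented API behavior, and the body's final key names are decided by `WebSearchParams.dump_request`.

```ruby
require "json"

# Keyword arguments for Web#search. Only `enabled` is documented for
# markdown_options; all values here are examples.
search_args = {
  query: "ruby http client retry strategies",
  include_domains: ["github.com", "ruby-doc.org"],
  exclude_domains: ["pinterest.com"],
  query_fanout: true,
  markdown_options: { enabled: true }
}

# Hypothetical client-side check: a domain should not be both allowed and blocked.
overlap = search_args[:include_domains] & search_args[:exclude_domains]
raise ArgumentError, "conflicting domains: #{overlap.join(', ')}" unless overlap.empty?

# Ruby-side view of the structured body (symbol keys become JSON strings).
puts JSON.generate(search_args)
```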

#web_crawl_md(url:, follow_subdomains: nil, include_frames: nil, include_images: nil, include_links: nil, max_age_ms: nil, max_depth: nil, max_pages: nil, pdf: nil, shorten_base64_images: nil, stop_after_ms: nil, timeout_ms: nil, url_regex: nil, use_main_content_only: nil, wait_for_ms: nil, request_options: {}) ⇒ ContextDev::Models::WebWebCrawlMdResponse

Some parameter documentation has been truncated; see Models::WebWebCrawlMdParams for more details.

Performs a crawl starting from a given URL, extracts page content as Markdown, and returns results for all crawled pages.

Parameters:

  • url (String)

    The starting URL for the crawl (must include http:// or https:// protocol).

  • follow_subdomains (Boolean)

    When true, follow links on subdomains of the starting URL’s domain (e.g. docs.ex…

  • include_frames (Boolean)

    When true, the contents of iframes are rendered to Markdown for each crawled pag…

  • include_images (Boolean)

    Include image references in the Markdown output.

  • include_links (Boolean)

    Preserve hyperlinks in the Markdown output.

  • max_age_ms (Integer)

    Return a cached result if a prior scrape for the same parameters exists and is y…

  • max_depth (Integer)

    Maximum link depth from the starting URL (0 = only the starting page).

  • max_pages (Integer)

    Maximum number of pages to crawl. Hard cap: 500.

  • pdf (ContextDev::Models::WebWebCrawlMdParams::Pdf)

    PDF parsing controls. Use start/end to limit text extraction and OCR to an inclu…

  • shorten_base64_images (Boolean)

    Truncate base64-encoded image data in the Markdown output.

  • stop_after_ms (Integer)

    Soft time budget for the crawl in milliseconds. After each scrape, the crawler c…

  • timeout_ms (Integer)

    Optional timeout in milliseconds for the request. If the request takes longer th…

  • url_regex (String)

    Regex pattern. Only URLs matching this pattern will be followed and scraped.

  • use_main_content_only (Boolean)

    Extract only the main content, stripping headers, footers, sidebars, and navigat…

  • wait_for_ms (Integer)

    Optional browser wait time in milliseconds after initial page load for each craw…

  • request_options (ContextDev::RequestOptions, Hash{Symbol=>Object}, nil)

Returns:

  • (ContextDev::Models::WebWebCrawlMdResponse)

# File 'lib/context_dev/resources/web.rb', line 257

def web_crawl_md(params)
  parsed, options = ContextDev::WebWebCrawlMdParams.dump_request(params)
  @client.request(
    method: :post,
    path: "web/crawl",
    body: parsed,
    model: ContextDev::Models::WebWebCrawlMdResponse,
    options: options
  )
end
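Two documented limits of `web_crawl_md` are easy to enforce before sending a request: `max_pages` has a hard cap of 500, and a `max_depth` of 0 restricts the crawl to the starting page. `crawl_args` is a hypothetical helper; its lower bound of 1 page and its rejection of negative depths are its own assumptions, not documented API rules.

```ruby
# Documented hard cap for max_pages.
MAX_CRAWL_PAGES = 500

# Hypothetical helper that prepares keyword arguments for Web#web_crawl_md.
def crawl_args(url, max_pages:, max_depth:, url_regex: nil)
  raise ArgumentError, "max_depth must be >= 0" if max_depth.negative?

  {
    url: url,
    max_pages: max_pages.clamp(1, MAX_CRAWL_PAGES), # never ask for more than the cap
    max_depth: max_depth,                           # 0 = only the starting page
    url_regex: url_regex
  }.compact # omit url_regex when it is not set
end

p crawl_args("https://docs.example.com", max_pages: 2_000, max_depth: 2, url_regex: "/guides/")
```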

#web_scrape_html(url:, include_frames: nil, max_age_ms: nil, pdf: nil, timeout_ms: nil, wait_for_ms: nil, request_options: {}) ⇒ ContextDev::Models::WebWebScrapeHTMLResponse

Some parameter documentation has been truncated; see Models::WebWebScrapeHTMLParams for more details.

Scrapes the given URL and returns the raw HTML content of the page.

Parameters:

  • url (String)

    Full URL to scrape (must include http:// or https:// protocol).

  • include_frames (Boolean)

    When true, iframes are rendered inline into the returned HTML.

  • max_age_ms (Integer)

    Return a cached result if a prior scrape for the same parameters exists and is y…

  • pdf (ContextDev::Models::WebWebScrapeHTMLParams::Pdf)

    PDF parsing controls. Use start/end to limit text extraction and OCR to an inclu…

  • timeout_ms (Integer)

    Optional timeout in milliseconds for the request. If the request takes longer th…

  • wait_for_ms (Integer)

    Optional browser wait time in milliseconds after initial page load. Min: 0. Max:…

  • request_options (ContextDev::RequestOptions, Hash{Symbol=>Object}, nil)

Returns:

  • (ContextDev::Models::WebWebScrapeHTMLResponse)

# File 'lib/context_dev/resources/web.rb', line 292

def web_scrape_html(params)
  parsed, options = ContextDev::WebWebScrapeHTMLParams.dump_request(params)
  query = ContextDev::Internal::Util.encode_query_params(parsed)
  @client.request(
    method: :get,
    path: "web/scrape/html",
    query: query.transform_keys(
      include_frames: "includeFrames",
      max_age_ms: "maxAgeMs",
      timeout_ms: "timeoutMS",
      wait_for_ms: "waitForMs"
    ),
    model: ContextDev::Models::WebWebScrapeHTMLResponse,
    options: options
  )
end
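`web_scrape_html`, like the other URL-based methods in this class, needs a full URL that includes `http://` or `https://`. A hypothetical pre-flight check using the standard library's `URI.parse`; since `URI::HTTPS` is a subclass of `URI::HTTP`, one `is_a?` test covers both schemes. Whether the API rejects other inputs in exactly the same way is not documented here.

```ruby
require "uri"

# Hypothetical pre-flight check: accept only absolute http(s) URLs with a host.
def require_http_url!(url)
  uri = URI.parse(url)
  return url if uri.is_a?(URI::HTTP) && !uri.host.to_s.empty?

  raise ArgumentError, "expected an http:// or https:// URL, got #{url.inspect}"
rescue URI::InvalidURIError
  raise ArgumentError, "unparseable URL: #{url.inspect}"
end

puts require_http_url!("https://example.com/pricing")
```

A bare domain such as `example.com` parses as a generic URI without a scheme, so it is rejected rather than silently sent.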

#web_scrape_images(url:, enrichment: nil, max_age_ms: nil, timeout_ms: nil, wait_for_ms: nil, request_options: {}) ⇒ ContextDev::Models::WebWebScrapeImagesResponse

Some parameter documentation has been truncated; see Models::WebWebScrapeImagesParams for more details.

Extract image assets from a web page, including standard URLs, inline SVGs, data URIs, responsive image sources, metadata, CSS backgrounds, video posters, and embeds. The base request costs 1 credit. When enrichment is enabled, the entire call costs 5 credits.

Parameters:

  • url (String)

    Page URL to inspect. Must include http:// or https://.

  • enrichment (ContextDev::Models::WebWebScrapeImagesParams::Enrichment)

    Optional per-image processing, sent as deep-object query params such as enrichme…

  • max_age_ms (Integer)

    Reuse a cached result this many milliseconds old or newer. Default: 86400000 (1…

  • timeout_ms (Integer)

    Optional timeout in milliseconds for the request. If the request takes longer th…

  • wait_for_ms (Integer)

    Optional browser wait time in milliseconds after initial page load before collec…

  • request_options (ContextDev::RequestOptions, Hash{Symbol=>Object}, nil)

Returns:

  • (ContextDev::Models::WebWebScrapeImagesResponse)

# File 'lib/context_dev/resources/web.rb', line 334

def web_scrape_images(params)
  parsed, options = ContextDev::WebWebScrapeImagesParams.dump_request(params)
  query = ContextDev::Internal::Util.encode_query_params(parsed)
  @client.request(
    method: :get,
    path: "web/scrape/images",
    query: query.transform_keys(
      max_age_ms: "maxAgeMs",
      timeout_ms: "timeoutMS",
      wait_for_ms: "waitForMs"
    ),
    model: ContextDev::Models::WebWebScrapeImagesResponse,
    options: options
  )
end
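The pricing stated for `web_scrape_images` is simple arithmetic: 1 credit per request, or 5 credits in total per request when enrichment is enabled (not 1 + 5). A sketch of a budget estimate, assuming every call is billed (whether cache hits within `max_age_ms` are billed is not stated); the helper name `image_scrape_credits` is hypothetical. It also spells out that the documented default `max_age_ms` of 86400000 ms is one day.

```ruby
# Documented per-request costs.
BASE_CREDITS = 1
ENRICHED_CREDITS = 5 # total for the call when enrichment is enabled

# Hypothetical estimate, assuming one request per page and no free cache hits.
def image_scrape_credits(pages, enrichment:)
  pages * (enrichment ? ENRICHED_CREDITS : BASE_CREDITS)
end

# Default max_age_ms: 24 h * 60 min * 60 s * 1000 ms.
ONE_DAY_MS = 24 * 60 * 60 * 1000

puts image_scrape_credits(40, enrichment: false)
puts image_scrape_credits(40, enrichment: true)
puts ONE_DAY_MS
```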

#web_scrape_md(url:, include_frames: nil, include_images: nil, include_links: nil, max_age_ms: nil, pdf: nil, shorten_base64_images: nil, timeout_ms: nil, use_main_content_only: nil, wait_for_ms: nil, request_options: {}) ⇒ ContextDev::Models::WebWebScrapeMdResponse

Some parameter documentation has been truncated; see Models::WebWebScrapeMdParams for more details.

Scrapes the given URL into LLM-usable Markdown.

Parameters:

  • url (String)

    Full URL to scrape into LLM-usable Markdown (must include http:// or https:// pr…

  • include_frames (Boolean)

    When true, the contents of iframes are rendered to Markdown.

  • include_images (Boolean)

    Include image references in Markdown output.

  • include_links (Boolean)

    Preserve hyperlinks in Markdown output.

  • max_age_ms (Integer)

    Return a cached result if a prior scrape for the same parameters exists and is y…

  • pdf (ContextDev::Models::WebWebScrapeMdParams::Pdf)

    PDF parsing controls. Use start/end to limit text extraction and OCR to an inclu…

  • shorten_base64_images (Boolean)

    Shorten base64-encoded image data in the Markdown output.

  • timeout_ms (Integer)

    Optional timeout in milliseconds for the request. If the request takes longer th…

  • use_main_content_only (Boolean)

    Extract only the main content of the page, excluding headers, footers, sidebars,…

  • wait_for_ms (Integer)

    Optional browser wait time in milliseconds after initial page load before conver…

  • request_options (ContextDev::RequestOptions, Hash{Symbol=>Object}, nil)

Returns:

  • (ContextDev::Models::WebWebScrapeMdResponse)

# File 'lib/context_dev/resources/web.rb', line 382

def web_scrape_md(params)
  parsed, options = ContextDev::WebWebScrapeMdParams.dump_request(params)
  query = ContextDev::Internal::Util.encode_query_params(parsed)
  @client.request(
    method: :get,
    path: "web/scrape/markdown",
    query: query.transform_keys(
      include_frames: "includeFrames",
      include_images: "includeImages",
      include_links: "includeLinks",
      max_age_ms: "maxAgeMs",
      shorten_base64_images: "shortenBase64Images",
      timeout_ms: "timeoutMS",
      use_main_content_only: "useMainContentOnly",
      wait_for_ms: "waitForMs"
    ),
    model: ContextDev::Models::WebWebScrapeMdResponse,
    options: options
  )
end
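Most of `web_scrape_md`'s tuning options are optional booleans, so one convenient pattern is a shared preset merged with per-call overrides. The preset name `LLM_LEAN`, the helper `markdown_args`, and the chosen values are hypothetical; every key is one of the method's documented keywords, and `Hash#merge` gives later arguments precedence.

```ruby
# Hypothetical preset for compact, LLM-oriented Markdown.
LLM_LEAN = {
  use_main_content_only: true,
  include_links: false,
  include_images: false,
  shorten_base64_images: true
}.freeze

# Per-call overrides are merged last, so they win over the preset.
# merge returns a new Hash, leaving the frozen preset untouched.
def markdown_args(url, preset = LLM_LEAN, **overrides)
  preset.merge({ url: url }, overrides)
end

p markdown_args("https://example.com/blog/post", include_links: true)
```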

#web_scrape_sitemap(domain:, max_links: nil, timeout_ms: nil, url_regex: nil, request_options: {}) ⇒ ContextDev::Models::WebWebScrapeSitemapResponse

Some parameter documentation has been truncated; see Models::WebWebScrapeSitemapParams for more details.

Crawl an entire website’s sitemap and return all discovered page URLs.

Parameters:

  • domain (String)

    Domain to build a sitemap for.

  • max_links (Integer)

    Maximum number of links to return from the sitemap crawl. Defaults to 10,000. Mi…

  • timeout_ms (Integer)

    Optional timeout in milliseconds for the request. If the request takes longer th…

  • url_regex (String)

    Optional RE2-compatible regex pattern. Only URLs matching this pattern are retur…

  • request_options (ContextDev::RequestOptions, Hash{Symbol=>Object}, nil)

Returns:

  • (ContextDev::Models::WebWebScrapeSitemapResponse)

# File 'lib/context_dev/resources/web.rb', line 423

def web_scrape_sitemap(params)
  parsed, options = ContextDev::WebWebScrapeSitemapParams.dump_request(params)
  query = ContextDev::Internal::Util.encode_query_params(parsed)
  @client.request(
    method: :get,
    path: "web/scrape/sitemap",
    query: query.transform_keys(max_links: "maxLinks", timeout_ms: "timeoutMS", url_regex: "urlRegex"),
    model: ContextDev::Models::WebWebScrapeSitemapResponse,
    options: options
  )
end
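The sitemap `url_regex` must be RE2-compatible, and RE2 rejects some constructs that Ruby's own Regexp engine accepts, notably lookaround assertions and backreferences. A hypothetical pre-check (`check_sitemap_regex`) that flags those constructs with a simple heuristic, not a full RE2 validator, and then previews locally which sample URLs the pattern would keep.

```ruby
# Heuristic for constructs RE2 does not support: (?= (?! (?<= (?<! and \1..\9.
UNSUPPORTED_IN_RE2 = /\(\?<?[=!]|\\[1-9]/

# Hypothetical pre-check: reject obvious non-RE2 syntax, then filter samples
# with Ruby's Regexp as a rough preview of what the API would return.
def check_sitemap_regex(pattern, sample_urls)
  if pattern.match?(UNSUPPORTED_IN_RE2)
    raise ArgumentError, "pattern uses lookaround or a backreference: #{pattern}"
  end

  regex = Regexp.new(pattern)
  sample_urls.select { |url| regex.match?(url) }
end

samples = [
  "https://example.com/blog/hello-world",
  "https://example.com/pricing",
  "https://example.com/blog/2024/recap"
]
p check_sitemap_regex("/blog/", samples)
```

Ruby and RE2 also differ in smaller ways, so a local match is only a preview; the API's result is authoritative.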