Class: ContextDev::Models::WebWebCrawlMdParams
- Inherits: Internal::Type::BaseModel
  (ancestors: Object → Internal::Type::BaseModel → ContextDev::Models::WebWebCrawlMdParams)
- Extended by:
- Internal::Type::RequestParameters::Converter
- Includes:
- Internal::Type::RequestParameters
- Defined in:
- lib/context_dev/models/web_web_crawl_md_params.rb
Overview
Defined Under Namespace
Classes: Pdf
Instance Attribute Summary
- #follow_subdomains ⇒ Boolean?
  When true, follow links on subdomains of the starting URL’s domain (e.g. docs.example.com when starting from example.com).
- #include_frames ⇒ Boolean?
  When true, the contents of iframes are rendered to Markdown for each crawled page.
- #include_images ⇒ Boolean?
  Include image references in the Markdown output.
- #include_links ⇒ Boolean?
  Preserve hyperlinks in the Markdown output.
- #max_age_ms ⇒ Integer?
  Return a cached result if a prior scrape for the same parameters exists and is younger than this many milliseconds.
- #max_depth ⇒ Integer?
  Maximum link depth from the starting URL (0 = only the starting page).
- #max_pages ⇒ Integer?
  Maximum number of pages to crawl.
- #pdf ⇒ ContextDev::Models::WebWebCrawlMdParams::Pdf?
  PDF parsing controls.
- #shorten_base64_images ⇒ Boolean?
  Truncate base64-encoded image data in the Markdown output.
- #stop_after_ms ⇒ Integer?
  Soft time budget for the crawl in milliseconds.
- #timeout_ms ⇒ Integer?
  Optional timeout in milliseconds for the request.
- #url ⇒ String
  The starting URL for the crawl (must include http:// or https:// protocol).
- #url_regex ⇒ String?
  Regex pattern; only URLs matching it are followed and scraped.
- #use_main_content_only ⇒ Boolean?
  Extract only the main content, stripping headers, footers, sidebars, and navigation.
- #wait_for_ms ⇒ Integer?
  Optional browser wait time in milliseconds after initial page load for each crawled page.
Attributes included from Internal::Type::RequestParameters
Method Summary
Methods included from Internal::Type::RequestParameters::Converter
Methods included from Internal::Type::RequestParameters
Methods inherited from Internal::Type::BaseModel
==, #==, #[], coerce, #deconstruct_keys, #deep_to_h, dump, fields, hash, #hash, inherited, #initialize, inspect, #inspect, known_fields, optional, recursively_to_h, required, #to_h, #to_json, #to_s, to_sorbet_type, #to_yaml
Methods included from Internal::Type::Converter
#coerce, coerce, #dump, dump, #inspect, inspect, meta_info, new_coerce_state, type_info
Methods included from Internal::Util::SorbetRuntimeSupport
#const_missing, #define_sorbet_constant!, #sorbet_constant_defined?, #to_sorbet_type, to_sorbet_type
Constructor Details
This class inherits a constructor from ContextDev::Internal::Type::BaseModel
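Since the constructor is inherited from BaseModel, a request presumably supplies these attributes as keyword arguments. As an illustration only (hypothetical values, shown as a plain hash rather than a gem call):

```ruby
# Hypothetical parameter set for a crawl request; values are examples,
# not defaults from the gem.
params = {
  url: "https://example.com",   # required; must include the protocol
  max_depth: 2,                 # follow links up to two hops from the start
  max_pages: 50,                # well under the hard cap of 500
  follow_subdomains: true,      # also crawl docs.example.com, etc.
  use_main_content_only: true   # strip headers, footers, and navigation
}
```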
Instance Attribute Details
#follow_subdomains ⇒ Boolean?
When true, follow links on subdomains of the starting URL’s domain (e.g. docs.example.com when starting from example.com). www and apex are always treated as equivalent.
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 22
optional :follow_subdomains, ContextDev::Internal::Type::Boolean, api_name: :followSubdomains
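Each `optional` declaration maps a snake_case Ruby attribute to a camelCase wire name via `api_name:`. A hypothetical helper (not part of the gem) illustrating that convention:

```ruby
# Illustrative only: derive a camelCase api_name from a snake_case
# attribute symbol, matching the pattern seen in the declarations below.
def camelize(attr)
  head, *rest = attr.to_s.split("_")
  (head + rest.map(&:capitalize).join).to_sym
end

camelize(:follow_subdomains)  # => :followSubdomains
camelize(:max_age_ms)         # => :maxAgeMs
```

Note that `timeout_ms` is the one irregular mapping in this class: its declared wire name is `:timeoutMS`, not `:timeoutMs`.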
#include_frames ⇒ Boolean?
When true, the contents of iframes are rendered to Markdown for each crawled page.
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 29
optional :include_frames, ContextDev::Internal::Type::Boolean, api_name: :includeFrames
#include_images ⇒ Boolean?
Include image references in the Markdown output.
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 35
optional :include_images, ContextDev::Internal::Type::Boolean, api_name: :includeImages
#include_links ⇒ Boolean?
Preserve hyperlinks in the Markdown output.
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 41
optional :include_links, ContextDev::Internal::Type::Boolean, api_name: :includeLinks
#max_age_ms ⇒ Integer?
Return a cached result if a prior scrape for the same parameters exists and is younger than this many milliseconds. Defaults to 1 day (86400000 ms) when omitted. Max is 30 days (2592000000 ms). Set to 0 to always scrape fresh.
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 49
optional :max_age_ms, Integer, api_name: :maxAgeMs
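The documented default and maximum work out to the following millisecond values:

```ruby
# Cache-age bounds from the description above, written out in milliseconds.
ONE_DAY_MS     = 24 * 60 * 60 * 1000  # 86_400_000 ms (default when omitted)
THIRTY_DAYS_MS = 30 * ONE_DAY_MS      # 2_592_000_000 ms (maximum)
```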
#max_depth ⇒ Integer?
Maximum link depth from the starting URL (0 = only the starting page).
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 55
optional :max_depth, Integer, api_name: :maxDepth
#max_pages ⇒ Integer?
Maximum number of pages to crawl. Hard cap: 500.
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 61
optional :max_pages, Integer, api_name: :maxPages
#pdf ⇒ ContextDev::Models::WebWebCrawlMdParams::Pdf?
PDF parsing controls. Use start/end to limit text extraction and OCR to an inclusive 1-based page range.
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 68
optional :pdf, -> { ContextDev::WebWebCrawlMdParams::Pdf }
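Because the start/end pair is documented as an inclusive, 1-based page range, a value like the following (hypothetical, shown as a plain hash) would cover exactly three pages:

```ruby
# Hypothetical pdf sub-parameters: limit extraction to pages 2 through 4.
pdf = { start: 2, end: 4 }       # inclusive, 1-based
(pdf[:start]..pdf[:end]).to_a    # => [2, 3, 4]
```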
#shorten_base64_images ⇒ Boolean?
Truncate base64-encoded image data in the Markdown output.
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 74
optional :shorten_base64_images, ContextDev::Internal::Type::Boolean, api_name: :shortenBase64Images
#stop_after_ms ⇒ Integer?
Soft time budget for the crawl in milliseconds. After each scrape, the crawler checks the elapsed time and, if exceeded, returns the pages collected so far instead of continuing. Min: 10000 (10s). Max: 240000 (4 min). Default: 120000 (2 min).
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 83
optional :stop_after_ms, Integer, api_name: :stopAfterMs
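A minimal sketch of the soft-budget behavior described above: the elapsed-time check runs after each scrape, so the page in progress always completes before the crawl stops. Method and variable names here are assumptions, not the gem's internals.

```ruby
# Sketch only: check elapsed time *after* each scrape and stop once the
# budget is exceeded, returning whatever was collected so far.
def crawl_pages(urls, stop_after_ms)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
  collected = []
  urls.each do |url|
    collected << url  # stand-in for scraping the page to Markdown
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond) - start
    break if elapsed > stop_after_ms
  end
  collected
end
```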
#timeout_ms ⇒ Integer?
Optional timeout in milliseconds for the request. If the request takes longer than this value, it will be aborted with a 408 status code. Maximum allowed value is 300000ms (5 minutes).
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 91
optional :timeout_ms, Integer, api_name: :timeoutMS
#url ⇒ String
The starting URL for the crawl (must include http:// or https:// protocol).
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 14
required :url, String
#url_regex ⇒ String?
Regex pattern. Only URLs matching this pattern will be followed and scraped.
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 97
optional :url_regex, String, api_name: :urlRegex
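A hypothetical url_regex value restricting the crawl to a documentation path (the pattern and URLs are illustrative, not defaults):

```ruby
# Hypothetical pattern: follow only URLs under /docs/ on one host.
url_regex = 'https://example\.com/docs/.*'
re = Regexp.new(url_regex)
re.match?("https://example.com/docs/guide")  # => true
re.match?("https://example.com/blog/post")   # => false
```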
#use_main_content_only ⇒ Boolean?
Extract only the main content, stripping headers, footers, sidebars, and navigation.
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 104
optional :use_main_content_only, ContextDev::Internal::Type::Boolean, api_name: :useMainContentOnly
#wait_for_ms ⇒ Integer?
Optional browser wait time in milliseconds after initial page load for each crawled page. Min: 0. Max: 30000 (30 seconds).
# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 111
optional :wait_for_ms, Integer, api_name: :waitForMs