Class: ContextDev::Models::WebWebCrawlMdParams

Inherits: Internal::Type::BaseModel

Object → Internal::Type::BaseModel → ContextDev::Models::WebWebCrawlMdParams

Extended by: Internal::Type::RequestParameters::Converter

Includes: Internal::Type::RequestParameters

Defined in: lib/context_dev/models/web_web_crawl_md_params.rb

Overview

Instance Attribute Summary

#follow_subdomains ⇒ Boolean?

When true, follow links on subdomains of the starting URL’s domain (e.g. docs.example.com when starting from example.com).

#include_images ⇒ Boolean?

Include image references in the Markdown output.

#include_links ⇒ Boolean?

Preserve hyperlinks in the Markdown output.

#max_depth ⇒ Integer?

Maximum link depth from the starting URL (0 = only the starting page).

#max_pages ⇒ Integer?

Maximum number of pages to crawl.

#shorten_base64_images ⇒ Boolean?

Truncate base64-encoded image data in the Markdown output.

#url ⇒ String

The starting URL for the crawl (must include http:// or https:// protocol).

#url_regex ⇒ String?

Regex pattern. Only URLs matching this pattern will be followed and scraped.

#use_main_content_only ⇒ Boolean?

Extract only the main content, stripping headers, footers, sidebars, and navigation.

Attributes included from Internal::Type::RequestParameters

#request_options

Instance Method Summary

#initialize(url:, follow_subdomains: nil, include_images: nil, include_links: nil, max_depth: nil, max_pages: nil, shorten_base64_images: nil, url_regex: nil, use_main_content_only: nil, request_options: {}) ⇒ Object constructor

Some parameter documentation has been truncated; see WebWebCrawlMdParams for more details.

Constructor Details

#initialize(url:, follow_subdomains: nil, include_images: nil, include_links: nil, max_depth: nil, max_pages: nil, shorten_base64_images: nil, url_regex: nil, use_main_content_only: nil, request_options: {}) ⇒ `Object`

Some parameter documentation has been truncated; see ContextDev::Models::WebWebCrawlMdParams for more details.

Parameters:

url (String) —

The starting URL for the crawl (must include http:// or https:// protocol)
follow_subdomains (Boolean) (defaults to: nil) —

When true, follow links on subdomains of the starting URL’s domain (e.g. docs.ex…
include_images (Boolean) (defaults to: nil) —

Include image references in the Markdown output
include_links (Boolean) (defaults to: nil) —

Preserve hyperlinks in the Markdown output
max_depth (Integer) (defaults to: nil) —

Maximum link depth from the starting URL (0 = only the starting page)
max_pages (Integer) (defaults to: nil) —

Maximum number of pages to crawl. Hard cap: 500.
shorten_base64_images (Boolean) (defaults to: nil) —

Truncate base64-encoded image data in the Markdown output
url_regex (String) (defaults to: nil) —

Regex pattern. Only URLs matching this pattern will be followed and scraped.
use_main_content_only (Boolean) (defaults to: nil) —

Extract only the main content, stripping headers, footers, sidebars, and navigat…
request_options (ContextDev::RequestOptions, Hash{Symbol=>Object}) (defaults to: {})

# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 67
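The constructor takes one required keyword (`url:`) and leaves every other option as `nil` unless set. The following is a minimal sketch of assembling those keywords before constructing the params object. `build_crawl_params` and `MAX_PAGES_HARD_CAP` are hypothetical names introduced here, and clamping `max_pages` on the client side is an assumption about how one might respect the documented cap, not something the SDK is known to do.

```ruby
require "uri"

# Documented hard cap for max_pages; the constant name is hypothetical.
MAX_PAGES_HARD_CAP = 500

# Hypothetical helper (not part of the SDK): checks the URL scheme,
# drops unset (nil) options, and clamps max_pages to the documented cap.
def build_crawl_params(url:, **options)
  scheme = URI.parse(url).scheme
  unless %w[http https].include?(scheme)
    raise ArgumentError, "url must include http:// or https://: #{url.inspect}"
  end

  params = { url: url }.merge(options.compact)
  if params[:max_pages]
    params[:max_pages] = [params[:max_pages], MAX_PAGES_HARD_CAP].min
  end
  params
end

params = build_crawl_params(
  url: "https://example.com",
  max_depth: 2,
  max_pages: 1_000,   # above the cap, so it is clamped
  include_links: true,
  url_regex: nil      # unset, so it is dropped
)

# With the gem loaded, the hash could be splatted into the constructor:
#   ContextDev::Models::WebWebCrawlMdParams.new(**params)
```

Dropping `nil` values keeps the hash limited to options the caller actually chose, which mirrors the `nil` defaults in the signature above.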

Instance Attribute Details

#follow_subdomains ⇒ `Boolean`?

When true, follow links on subdomains of the starting URL’s domain (e.g. docs.example.com when starting from example.com). www and apex are always treated as equivalent.

Returns:

(Boolean, nil)

# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 22

optional :follow_subdomains, ContextDev::Internal::Type::Boolean, api_name: :followSubdomains
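To make the subdomain rule concrete, here is a toy model of the scoping decision described above. It is only a sketch: `normalize_host` and `in_scope?` are hypothetical helpers, the comparison uses the starting URL's host rather than a registrable-domain lookup, and the service's real logic may differ.

```ruby
require "uri"

# Strip a leading "www." so that www and apex hosts compare as equal.
def normalize_host(host)
  host.downcase.sub(/\Awww\./, "")
end

# Toy model of the documented rule; not the service's actual implementation.
def in_scope?(start_url, candidate_url, follow_subdomains: false)
  start = normalize_host(URI.parse(start_url).host)
  candidate = normalize_host(URI.parse(candidate_url).host)
  return true if candidate == start

  # Subdomains are in scope only when the option is enabled.
  follow_subdomains && candidate.end_with?(".#{start}")
end

start = "https://example.com"
www_ok     = in_scope?(start, "https://www.example.com/about")
docs_off   = in_scope?(start, "https://docs.example.com/")
docs_on    = in_scope?(start, "https://docs.example.com/", follow_subdomains: true)
other_site = in_scope?(start, "https://notexample.com/", follow_subdomains: true)
```

In this model `www_ok` and `docs_on` are true, while `docs_off` and `other_site` are false.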

#include_images ⇒ `Boolean`?

Include image references in the Markdown output.

Returns:

(Boolean, nil)

# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 28

optional :include_images, ContextDev::Internal::Type::Boolean, api_name: :includeImages

#include_links ⇒ `Boolean`?

Preserve hyperlinks in the Markdown output.

Returns:

(Boolean, nil)

# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 34

optional :include_links, ContextDev::Internal::Type::Boolean, api_name: :includeLinks

#max_depth ⇒ `Integer`?

Maximum link depth from the starting URL (0 = only the starting page).

Returns:

(Integer, nil)

# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 40

optional :max_depth, Integer, api_name: :maxDepth
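The meaning of "link depth" can be illustrated with a breadth-first walk over a small in-memory link graph. This is a sketch under stated assumptions: `LINKS` and `pages_within_depth` are hypothetical, and the walk only models the documented semantics (depth 0 is the starting page alone), not how the crawler is actually implemented.

```ruby
# Hypothetical link graph: page => pages it links to.
LINKS = {
  "/" => ["/docs", "/blog"],
  "/docs" => ["/docs/api"],
  "/blog" => [],
  "/docs/api" => []
}.freeze

# Breadth-first walk that does not expand links from pages at max_depth.
def pages_within_depth(start, max_depth)
  depth_of = { start => 0 }
  queue = [start]

  until queue.empty?
    page = queue.shift
    next if depth_of[page] >= max_depth

    LINKS.fetch(page, []).each do |link|
      next if depth_of.key?(link) # already discovered at a shallower depth

      depth_of[link] = depth_of[page] + 1
      queue << link
    end
  end

  depth_of.keys
end

depth0 = pages_within_depth("/", 0)
depth1 = pages_within_depth("/", 1)
depth2 = pages_within_depth("/", 2)
```

Here `depth0` contains only `"/"`, `depth1` adds `"/docs"` and `"/blog"`, and `depth2` also reaches `"/docs/api"`.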

#max_pages ⇒ `Integer`?

Maximum number of pages to crawl. Hard cap: 500.

Returns:

(Integer, nil)

# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 46

optional :max_pages, Integer, api_name: :maxPages

#shorten_base64_images ⇒ `Boolean`?

Truncate base64-encoded image data in the Markdown output.

Returns:

(Boolean, nil)

# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 52

optional :shorten_base64_images, ContextDev::Internal::Type::Boolean, api_name: :shortenBase64Images

#url ⇒ `String`

The starting URL for the crawl (must include http:// or https:// protocol).

Returns:

(String)

# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 14

required :url, String

#url_regex ⇒ `String`?

Regex pattern. Only URLs matching this pattern will be followed and scraped.

Returns:

(String, nil)

# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 58

optional :url_regex, String, api_name: :urlRegex
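Because `url_regex` is passed as a plain String, it can help to try a pattern locally before sending it. The sketch below uses Ruby's `Regexp` purely as a stand-in; the regex dialect the service applies is not stated here, so treat any dialect-specific behaviour as an assumption. The candidate URLs are made up for illustration.

```ruby
# The pattern is sent as a String; dots are escaped so they match literally.
url_regex = "^https://example\\.com/docs/"

# Ruby's Regexp stands in for whatever engine the service uses (assumption).
pattern = Regexp.new(url_regex)

# Hypothetical URLs a crawl might discover.
candidates = [
  "https://example.com/docs/intro",
  "https://example.com/blog/post",
  "https://example.com/docs/api/v1"
]

# Only URLs matching the pattern would be followed and scraped.
matching = candidates.select { |candidate| pattern.match?(candidate) }
```

With this pattern, `matching` keeps the two `/docs/` URLs and drops the blog post.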

#use_main_content_only ⇒ `Boolean`?

Extract only the main content, stripping headers, footers, sidebars, and navigation.

Returns:

(Boolean, nil)

# File 'lib/context_dev/models/web_web_crawl_md_params.rb', line 65

optional :use_main_content_only, ContextDev::Internal::Type::Boolean, api_name: :useMainContentOnly
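The `api_name:` options in the declarations above map each snake_case Ruby attribute to a camelCase name on the wire. The SDK declares these names explicitly; the `camelize` helper below is a hypothetical illustration of the naming pattern only, not how the SDK derives them.

```ruby
# Hypothetical helper: snake_case symbol -> camelCase symbol.
def camelize(name)
  head, *tail = name.to_s.split("_")
  (head + tail.map(&:capitalize).join).to_sym
end

# Attribute names as documented for WebWebCrawlMdParams.
ATTRIBUTES = %i[
  follow_subdomains include_images include_links max_depth max_pages
  shorten_base64_images url url_regex use_main_content_only
].freeze

# Ruby attribute => wire name, following the api_name values shown above.
WIRE_NAMES = ATTRIBUTES.to_h { |attribute| [attribute, camelize(attribute)] }

WIRE_NAMES.each { |ruby_name, wire_name| puts "#{ruby_name} -> #{wire_name}" }
```

`url` has no `api_name:` override in its declaration, and the helper likewise leaves it unchanged.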
