Class: ContextDev::Models::WebWebScrapeMdParams

Inherits:
Internal::Type::BaseModel
Extended by:
Internal::Type::RequestParameters::Converter
Includes:
Internal::Type::RequestParameters
Defined in:
lib/context_dev/models/web_web_scrape_md_params.rb

Overview

Defined Under Namespace

Modules: Country
Classes: Pdf

Instance Attribute Summary

Attributes included from Internal::Type::RequestParameters

#request_options

Class Method Summary

Instance Method Summary

Methods included from Internal::Type::RequestParameters::Converter

dump_request

Methods included from Internal::Type::RequestParameters

included

Methods inherited from Internal::Type::BaseModel

==, #==, #[], coerce, #deconstruct_keys, #deep_to_h, dump, fields, hash, #hash, inherited, inspect, #inspect, known_fields, optional, recursively_to_h, required, #to_h, #to_json, #to_s, to_sorbet_type, #to_yaml

Methods included from Internal::Type::Converter

#coerce, coerce, #dump, dump, #inspect, inspect, meta_info, new_coerce_state, type_info

Methods included from Internal::Util::SorbetRuntimeSupport

#const_missing, #define_sorbet_constant!, #sorbet_constant_defined?, #to_sorbet_type, to_sorbet_type

Constructor Details

#initialize(url:, country: nil, exclude_selectors: nil, headers: nil, include_frames: nil, include_images: nil, include_links: nil, include_selectors: nil, max_age_ms: nil, pdf: nil, shorten_base64_images: nil, timeout_ms: nil, use_main_content_only: nil, wait_for_ms: nil, request_options: {}) ⇒ Object

Some parameter documentation has been truncated; see ContextDev::Models::WebWebScrapeMdParams for more details.

Parameters:

  • url (String)

    Full URL to scrape into LLM-usable Markdown (must include the http:// or https:// protocol)

  • country (Symbol, ContextDev::Models::WebWebScrapeMdParams::Country) (defaults to: nil)

    Two-letter ISO 3166-1 alpha-2 country code for the website request location. When provided, Context.dev fetches the target page from that country.

  • exclude_selectors (Array<String>) (defaults to: nil)

    CSS selectors to remove before conversion to Markdown. Applied after includeSelectors. Exclusion takes precedence: an element matching both is removed.

  • headers (Hash{Symbol=>String}) (defaults to: nil)

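    Optional outbound HTTP headers forwarded only to the target URL, sent as deep-object query params such as headers=value. When provided, caching is bypassed: the result is neither read from nor written to cache.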

  • include_frames (Boolean) (defaults to: nil)

    When true, the contents of iframes are rendered to Markdown.

  • include_images (Boolean) (defaults to: nil)

    Include image references in Markdown output

  • include_links (Boolean) (defaults to: nil)

    Preserve hyperlinks in Markdown output

  • include_selectors (Array<String>) (defaults to: nil)

    CSS selectors. When provided, only matching HTML subtrees (and their descendants) are kept before conversion to Markdown. When omitted, the entire document is kept.

  • max_age_ms (Integer) (defaults to: nil)

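    Return a cached result if a prior scrape for the same parameters exists and is younger than this many milliseconds. Defaults to 1 day (86400000 ms) when omitted. Max is 30 days (2592000000 ms). Set to 0 to always scrape fresh.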

  • pdf (ContextDev::Models::WebWebScrapeMdParams::Pdf) (defaults to: nil)

    PDF parsing controls. Use start/end to limit text extraction and OCR to an inclusive 1-based page range.

  • shorten_base64_images (Boolean) (defaults to: nil)

    Shorten base64-encoded image data in the Markdown output

  • timeout_ms (Integer) (defaults to: nil)

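    Optional timeout in milliseconds for the request. If the request takes longer than this value, it is aborted with a 408 status code. The maximum allowed value is 300000 ms (5 minutes).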

  • use_main_content_only (Boolean) (defaults to: nil)

    Extract only the main content of the page, excluding headers, footers, sidebars, and navigation

  • wait_for_ms (Integer) (defaults to: nil)

    Optional browser wait time in milliseconds after initial page load before converting the page to Markdown. Min: 0. Max: 30000 (30 seconds).

  • request_options (ContextDev::RequestOptions, Hash{Symbol=>Object}) (defaults to: {})


# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 109
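The sketch below illustrates the constructor's keyword shape: url is the only required keyword and every other field defaults to nil. It is a hypothetical plain-Ruby helper, not the SDK itself (build_scrape_md_params is an illustrative name, and request_options is left out). It assumes that optional fields left at nil are simply not sent, which this page does not state explicitly.

```ruby
# Hypothetical helper that mirrors the keyword signature of
# WebWebScrapeMdParams#initialize (request_options left out).
# Assumption: optional keywords left at nil are simply not sent.
def build_scrape_md_params(url:, country: nil, exclude_selectors: nil, headers: nil,
                           include_frames: nil, include_images: nil, include_links: nil,
                           include_selectors: nil, max_age_ms: nil, pdf: nil,
                           shorten_base64_images: nil, timeout_ms: nil,
                           use_main_content_only: nil, wait_for_ms: nil)
  {
    url: url,
    country: country,
    exclude_selectors: exclude_selectors,
    headers: headers,
    include_frames: include_frames,
    include_images: include_images,
    include_links: include_links,
    include_selectors: include_selectors,
    max_age_ms: max_age_ms,
    pdf: pdf,
    shorten_base64_images: shorten_base64_images,
    timeout_ms: timeout_ms,
    use_main_content_only: use_main_content_only,
    wait_for_ms: wait_for_ms
  }.compact # drops nil entries only; an explicit false is kept
end

# url is the only required keyword; omitting it raises ArgumentError.
params = build_scrape_md_params(url: "https://example.com/blog", include_links: true, include_images: false)
```

Note that Hash#compact removes only nil values, so an explicit false (as for include_images above) survives and would still be sent.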

Instance Attribute Details

#country ⇒ Symbol, ...

Two-letter ISO 3166-1 alpha-2 country code for the website request location. When provided, Context.dev fetches the target page from that country.



# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 22

optional :country, enum: -> { ContextDev::WebWebScrapeMdParams::Country }
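A client may want to catch obviously malformed country codes before sending a request. The check below is a hypothetical shape-only test (iso_alpha2_shape? is an illustrative name). It verifies that a value looks like an ISO 3166-1 alpha-2 code, meaning exactly two letters, and makes no assumption about letter case. The Country enum remains the authority on which codes are actually accepted.

```ruby
# Hypothetical pre-flight check for the country parameter. It only checks the
# shape of an ISO 3166-1 alpha-2 code (exactly two letters); the Country enum
# remains the authority on which codes are actually accepted.
def iso_alpha2_shape?(code)
  return false if code.nil?

  code.to_s.match?(/\A[A-Za-z]{2}\z/)
end

iso_alpha2_shape?(:de)   # two letters: passes the shape check
iso_alpha2_shape?("USA") # three letters (an alpha-3 code): rejected
```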

#exclude_selectors ⇒ Array<String>?

CSS selectors to remove before conversion to Markdown. Applied after includeSelectors. Exclusion takes precedence: an element matching both is removed. Examples: “nav”, “footer”, “.ad-banner”, “[aria-hidden=true]”.

Returns:

  • (Array<String>, nil)


# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 30

optional :exclude_selectors, ContextDev::Internal::Type::ArrayOf[String]

#headers ⇒ Hash{Symbol=>String}?

Optional outbound HTTP headers forwarded only to the target URL, sent as deep-object query params such as headers=value. When provided, caching is bypassed: the result is neither read from nor written to cache.

Returns:

  • (Hash{Symbol=>String}, nil)


# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 38

optional :headers, ContextDev::Internal::Type::HashOf[String]
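"Deep-object" most likely refers to the OpenAPI deepObject query style, in which each hash entry becomes a name[key]=value pair. The sketch below assumes that convention, since the exact wire format is not shown on this page. deep_object_query is a hypothetical helper; URI.encode_www_form_component is Ruby's standard form encoder.

```ruby
require "uri"

# Sketch of OpenAPI "deepObject" style query encoding (name[key]=value).
# Assumption: the headers parameter is serialized this way; the exact
# wire format is not shown on this page.
def deep_object_query(name, hash)
  hash.map do |key, value|
    k = URI.encode_www_form_component(key.to_s)
    v = URI.encode_www_form_component(value.to_s)
    "#{name}[#{k}]=#{v}"
  end.join("&")
end

query = deep_object_query("headers", { "Accept-Language": "de-DE", "X-Trace": "a b" })
# query is "headers[Accept-Language]=de-DE&headers[X-Trace]=a+b"
```

Because any headers disable caching in both directions, max_age_ms has no effect on a request that also sends headers.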

#include_frames ⇒ Boolean?

When true, the contents of iframes are rendered to Markdown.

Returns:

  • (Boolean, nil)


# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 44

optional :include_frames, ContextDev::Internal::Type::Boolean

#include_images ⇒ Boolean?

Include image references in Markdown output

Returns:

  • (Boolean, nil)


# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 50

optional :include_images, ContextDev::Internal::Type::Boolean

#include_links ⇒ Boolean?

Preserve hyperlinks in Markdown output

Returns:

  • (Boolean, nil)


# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 56

optional :include_links, ContextDev::Internal::Type::Boolean

#include_selectors ⇒ Array<String>?

CSS selectors. When provided, only matching HTML subtrees (and their descendants) are kept before conversion to Markdown. When omitted, the entire document is kept. Examples: “article.main”, “#content”, “[role=main]”.

Returns:

  • (Array<String>, nil)


# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 64

optional :include_selectors, ContextDev::Internal::Type::ArrayOf[String]
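The interplay between include_selectors and exclude_selectors works like this: include first, keeping whole matching subtrees, then exclude, with exclusion winning on conflicts. The toy model below illustrates that order. It is a hypothetical sketch, not the service's implementation. To keep it short, a "selector" matches a node's tag name only, not full CSS syntax. Node, collect_included, prune_excluded and filter_document are illustrative names.

```ruby
# Toy model of include_selectors / exclude_selectors. Simplification (an
# assumption for brevity): a selector matches a node's tag name only, not
# full CSS. The order follows the docs: include first, then exclude.
Node = Struct.new(:tag, :children) do
  def initialize(tag, children = [])
    super(tag, children)
  end
end

# Collect subtrees whose root matches an include selector; a matched subtree
# is kept whole, so its descendants are not searched again.
def collect_included(node, selectors, found = [])
  if selectors.include?(node.tag)
    found << node
  else
    node.children.each { |child| collect_included(child, selectors, found) }
  end
  found
end

# Remove every node matching an exclude selector, together with its descendants.
def prune_excluded(node, selectors)
  return nil if selectors.include?(node.tag)

  Node.new(node.tag, node.children.filter_map { |child| prune_excluded(child, selectors) })
end

def filter_document(root, include_selectors: nil, exclude_selectors: [])
  roots = (include_selectors.nil? || include_selectors.empty?) ? [root] : collect_included(root, include_selectors)
  roots.filter_map { |node| prune_excluded(node, exclude_selectors) }
end

doc = Node.new("body", [
  Node.new("nav"),
  Node.new("article", [Node.new("p"), Node.new("aside")]),
  Node.new("footer")
])
kept = filter_document(doc, include_selectors: ["article"], exclude_selectors: ["aside"])
# kept holds one "article" node whose only child is "p"
```

Passing the same selector to both lists yields nothing. This matches the documented rule that an element matching both is removed.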

#max_age_ms ⇒ Integer?

Return a cached result if a prior scrape for the same parameters exists and is younger than this many milliseconds. Defaults to 1 day (86400000 ms) when omitted. Max is 30 days (2592000000 ms). Set to 0 to always scrape fresh.

Returns:

  • (Integer, nil)


# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 72

optional :max_age_ms, Integer
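The caching rules described for max_age_ms and headers can be summarized as a single predicate. The model below is a hypothetical sketch of those documented rules, not the service's code. serve_from_cache? and the constant names are illustrative. Treating any non-empty headers hash as "provided" is an assumption.

```ruby
DEFAULT_MAX_AGE_MS = 86_400_000    # 1 day: used when max_age_ms is omitted
MAX_MAX_AGE_MS     = 2_592_000_000 # 30 days: the documented maximum

# Hypothetical model of the documented cache rules, not the service's code.
# cached_age_ms is the age of a prior scrape with the same parameters,
# or nil if there is none.
def serve_from_cache?(cached_age_ms, max_age_ms: nil, headers: nil)
  # Assumption: any non-empty headers hash counts as "provided" and bypasses the cache.
  return false if headers && !headers.empty?
  return false if cached_age_ms.nil?

  limit = max_age_ms.nil? ? DEFAULT_MAX_AGE_MS : max_age_ms
  unless (0..MAX_MAX_AGE_MS).cover?(limit)
    raise ArgumentError, "max_age_ms must be in 0..#{MAX_MAX_AGE_MS}"
  end

  cached_age_ms < limit # strictly "younger than", so 0 always scrapes fresh
end
```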

#pdf ⇒ ContextDev::Models::WebWebScrapeMdParams::Pdf?

PDF parsing controls. Use start/end to limit text extraction and OCR to an inclusive 1-based page range.



# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 79

optional :pdf, -> { ContextDev::WebWebScrapeMdParams::Pdf }
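An "inclusive 1-based page range" means that start and end are both counted and that the first page is 1. The hypothetical helper below shows which pages such a range selects. Several details are assumptions not stated on this page: the first_page/last_page keyword names (chosen because end is a Ruby keyword), treating a missing start as page 1 and a missing end as the last page, and clamping an end that runs past the document.

```ruby
# Hypothetical helper: which pages an inclusive, 1-based start/end range selects.
# Assumptions: a missing start means page 1, a missing end means the last page,
# and an end past the last page is clamped to it.
def selected_pdf_pages(total_pages, first_page: nil, last_page: nil)
  first = first_page || 1
  last = [last_page || total_pages, total_pages].min
  raise ArgumentError, "pages are 1-based" if first < 1
  return [] if first > last

  (first..last).to_a
end

selected_pdf_pages(10, first_page: 2, last_page: 4) # pages 2, 3 and 4: both ends included
```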

#shorten_base64_images ⇒ Boolean?

Shorten base64-encoded image data in the Markdown output

Returns:

  • (Boolean, nil)


# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 85

optional :shorten_base64_images, ContextDev::Internal::Type::Boolean

#timeout_ms ⇒ Integer?

Optional timeout in milliseconds for the request. If the request takes longer than this value, it is aborted with a 408 status code. The maximum allowed value is 300000 ms (5 minutes).

Returns:

  • (Integer, nil)


# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 93

optional :timeout_ms, Integer
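A client can reject a timeout above the documented 300000 ms ceiling before sending it, and can react to a 408 by retrying with a larger budget. Both helpers below are hypothetical sketches. check_timeout_ms! assumes the value must be a positive Integer, because no minimum is documented. next_timeout_ms is one possible retry policy, not part of the SDK.

```ruby
MAX_TIMEOUT_MS = 300_000 # documented maximum: 5 minutes

# Hypothetical client-side check before sending timeout_ms.
# Assumption: the value must be a positive Integer (no minimum is documented).
def check_timeout_ms!(timeout_ms)
  return nil if timeout_ms.nil? # omitted: nothing is sent
  unless timeout_ms.is_a?(Integer) && timeout_ms.positive? && timeout_ms <= MAX_TIMEOUT_MS
    raise ArgumentError, "timeout_ms must be an Integer in 1..#{MAX_TIMEOUT_MS}, got #{timeout_ms.inspect}"
  end

  timeout_ms
end

# Hypothetical retry policy after a 408: double the budget, capped at the maximum.
def next_timeout_ms(previous)
  [previous * 2, MAX_TIMEOUT_MS].min
end
```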

#url ⇒ String

Full URL to scrape into LLM-usable Markdown (must include the http:// or https:// protocol)

Returns:

  • (String)


# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 15

required :url, String
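The url must carry an explicit http:// or https:// scheme, so a bare host such as example.com is not enough. The sketch below performs that check with Ruby's standard URI.parse. valid_scrape_url? is a hypothetical helper, and additionally requiring a non-empty host is an assumption.

```ruby
require "uri"

# Hypothetical pre-flight check for the required url parameter: the docs require
# an explicit http:// or https:// scheme. Requiring a host is an added assumption.
def valid_scrape_url?(url)
  uri = URI.parse(url)
  %w[http https].include?(uri.scheme&.downcase) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

valid_scrape_url?("https://example.com/pricing") # explicit scheme and host: accepted
valid_scrape_url?("example.com")                 # no scheme: rejected
```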

#use_main_content_only ⇒ Boolean?

Extract only the main content of the page, excluding headers, footers, sidebars, and navigation

Returns:

  • (Boolean, nil)


# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 100

optional :use_main_content_only, ContextDev::Internal::Type::Boolean

#wait_for_ms ⇒ Integer?

Optional browser wait time in milliseconds after initial page load before converting the page to Markdown. Min: 0. Max: 30000 (30 seconds).

Returns:

  • (Integer, nil)


# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 107

optional :wait_for_ms, Integer
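Because wait_for_ms is bounded to 0..30000, a caller that computes the wait dynamically may want to keep it inside that range. The hypothetical helper below clamps the value with Comparable#clamp. Clamping is a design choice made for this sketch; rejecting out-of-range values instead, as in a strict ArgumentError check, is an equally reasonable alternative.

```ruby
WAIT_FOR_MS_MIN = 0
WAIT_FOR_MS_MAX = 30_000 # documented maximum: 30 seconds

# Hypothetical helper that clamps a requested post-load wait into the
# documented 0..30000 ms range; nil (omitted) passes through unchanged.
def clamp_wait_for_ms(ms)
  return nil if ms.nil?

  Integer(ms).clamp(WAIT_FOR_MS_MIN, WAIT_FOR_MS_MAX)
end

clamp_wait_for_ms(45_000) # capped at 30000
```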

Class Method Details

.values ⇒ Array<Symbol>

Returns:

  • (Array<Symbol>)


# File 'lib/context_dev/models/web_web_scrape_md_params.rb', line 353