Class: Legate::Tools::ReadWebpage

Inherits:
Legate::Tool
Includes:
Base::HttpClient
Defined in:
lib/legate/tools/read_webpage_tool.rb

Overview

Fetches a web page and returns its readable text content with markup removed.

This tool is intended as the backbone of research and RAG agents: given a URL, it returns the page title and plain text (script and style elements stripped, entities decoded, whitespace collapsed), truncated to a bounded length (see DEFAULT_MAX_CHARS and HARD_MAX_CHARS). Requests are SSRF-safe via Base::SafeUrl.
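The clean-up steps described above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual implementation: `readable_text` is a hypothetical helper, the regex-based stripping is an assumption about the approach, and entity decoding is left out for brevity.

```ruby
# Hypothetical helper, not part of Legate: a regex-based sketch of
# "strip script/style, drop markup, collapse whitespace, cap the length".
def readable_text(html, max_chars: 20_000)
  # Capture the <title> contents separately so they can be returned on their own.
  title = html[%r{<title[^>]*>(.*?)</title>}mi, 1].to_s.strip

  text = html
         .gsub(%r{<(script|style|title)\b[^>]*>.*?</\1>}mi, ' ') # drop non-body text
         .gsub(/<[^>]+>/, ' ')                                   # drop remaining tags
         .gsub(/\s+/, ' ')                                       # collapse whitespace
         .strip

  { title: title, text: text[0, max_chars] }
end

page = readable_text('<title>Hi</title><p>Hello   <b>world</b></p>')
puts page[:title]
puts page[:text]
```

Regular expressions are adequate for a sketch like this, but they are not a robust HTML parser; malformed or deeply nested markup may need a real parsing library.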

Constant Summary

DEFAULT_MAX_CHARS = 20_000
HARD_MAX_CHARS = 200_000
ENTITIES = { '&amp;' => '&', '&lt;' => '<', '&gt;' => '>', '&quot;' => '"',
             '&#39;' => "'", '&apos;' => "'", '&nbsp;' => ' ' }.freeze
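One plausible way these constants might be used is shown below. The constant values are copied from the summary above, but the module `ReadWebpageSketch` and both method names are hypothetical, and the clamping behaviour (default when nothing is requested, hard ceiling otherwise) is an assumption rather than documented behaviour.

```ruby
# Hypothetical sketch; only the constant values come from the documentation.
module ReadWebpageSketch
  DEFAULT_MAX_CHARS = 20_000
  HARD_MAX_CHARS = 200_000
  ENTITIES = { '&amp;' => '&', '&lt;' => '<', '&gt;' => '>', '&quot;' => '"',
               '&#39;' => "'", '&apos;' => "'", '&nbsp;' => ' ' }.freeze

  module_function

  # Decode all known entities in a single pass, so that text such as
  # "&amp;lt;" becomes "&lt;" instead of being decoded twice into "<".
  def decode_entities(text)
    text.gsub(Regexp.union(ENTITIES.keys), ENTITIES)
  end

  # Assumed policy: fall back to the default, never exceed the hard limit.
  def clamp_max_chars(requested)
    (requested || DEFAULT_MAX_CHARS).to_i.clamp(1, HARD_MAX_CHARS)
  end
end

puts ReadWebpageSketch.decode_entities('Tom &amp; Jerry&nbsp;&lt;3')
puts ReadWebpageSketch.clamp_max_chars(1_000_000)
```

Passing the hash directly to `String#gsub` keeps the decode to one scan of the input, which avoids the double-decoding that chained replacements can cause.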

Instance Attribute Summary

Attributes included from Base::HttpClient

#http_base_url, #http_client

Attributes inherited from Legate::Tool

#description, #name, #parameters

Instance Method Summary

Methods included from Base::HttpClient

#http_delete, #http_get, #http_head, #http_post, #http_put

Methods inherited from Legate::Tool

define_metadata, #execute, inherited, #validate_and_coerce_params, #validate_params

Methods included from Legate::Tool::MetadataDsl

included

Constructor Details

#initialize(**options) ⇒ ReadWebpage

Returns a new instance of ReadWebpage.



# File 'lib/legate/tools/read_webpage_tool.rb', line 32

def initialize(**options)
  super(**options)
  setup_http_client(base_url: 'https://placeholder.invalid')
end