Class: LlmDocsBuilder::UrlFetcher

Inherits:
Object
Defined in:
lib/llm_docs_builder/url_fetcher.rb

Overview

Lightweight HTTP client for fetching remote documentation pages.

Provides functionality shared by multiple commands (transform, compare): strict URL scheme validation, redirect following (bounded by MAX_REDIRECTS), and fixed timeouts (10 seconds to open a connection, 30 seconds to read a response).
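
The scheme validation mentioned above lives in a private helper (`validate_and_parse_url`) whose body is not shown on this page. As a rough illustration only, a check of this kind might look like the following. The method name `sketch_validate_url`, the use of `ArgumentError`, and the exact rules are assumptions, not the library's implementation.

```ruby
require 'uri'

# Hypothetical stand-in for a strict scheme check; not the library's code.
# Accepts only absolute http/https URLs with a host and rejects everything else.
def sketch_validate_url(url_string)
  uri = URI.parse(url_string)
  # URI::HTTPS subclasses URI::HTTP, so this single check covers both schemes.
  unless uri.is_a?(URI::HTTP) && uri.host && !uri.host.empty?
    raise ArgumentError, "Unsupported URL: #{url_string}"
  end
  uri
rescue URI::InvalidURIError => e
  raise ArgumentError, "Invalid URL #{url_string}: #{e.message}"
end

uri = sketch_validate_url('https://example.com/docs/index.md')
puts uri.host # prints "example.com"

# Non-HTTP schemes and scheme-less strings are rejected.
%w[ftp://example.com/file file:///etc/passwd example.com/docs].each do |bad|
  begin
    sketch_validate_url(bad)
  rescue ArgumentError => e
    puts e.message
  end
end
```

Rejecting anything that is not http or https up front keeps schemes such as `file:` from ever reaching the HTTP client.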

Constant Summary

DEFAULT_USER_AGENT = 'llm-docs-builder/1.0 (+https://github.com/mensfeld/llm-docs-builder)'

    Default user agent string for HTTP requests

MAX_REDIRECTS = 10

    Maximum number of redirects to follow

Instance Method Summary

Constructor Details

#initialize(user_agent: DEFAULT_USER_AGENT, verbose: false, output: $stdout) ⇒ UrlFetcher

Returns a new instance of UrlFetcher.

Parameters:

  • user_agent (String) (defaults to: DEFAULT_USER_AGENT)

    HTTP user agent header value

  • verbose (Boolean) (defaults to: false)

    enable redirect logging

  • output (IO) (defaults to: $stdout)

    IO stream used for redirect logging



# File 'lib/llm_docs_builder/url_fetcher.rb', line 21

def initialize(user_agent: DEFAULT_USER_AGENT, verbose: false, output: $stdout)
  @user_agent = user_agent
  @verbose = verbose
  @output = output
end
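
Because `output` accepts any IO-like object, redirect logging can be captured instead of printed. The sketch below uses a stand-in class (`SketchFetcher`, hypothetical) with the same keyword arguments. Its `log_redirect` body and message format are assumptions, since the library's private logging method is not shown on this page.

```ruby
require 'stringio'

# Hypothetical stand-in mirroring the constructor's keyword arguments.
class SketchFetcher
  def initialize(user_agent: 'sketch/0.0', verbose: false, output: $stdout)
    @user_agent = user_agent
    @verbose = verbose
    @output = output
  end

  # Assumed behavior: write a line only when verbose is enabled.
  def log_redirect(url)
    @output.puts("Redirecting to #{url}") if @verbose
  end
end

# Verbose mode writes to the injected stream rather than $stdout.
log = StringIO.new
SketchFetcher.new(verbose: true, output: log).log_redirect('https://example.com/new')
puts log.string # prints "Redirecting to https://example.com/new"

# With the default verbose: false, nothing is written.
quiet = StringIO.new
SketchFetcher.new(output: quiet).log_redirect('https://example.com/new')
puts quiet.string.empty? # prints "true"
```

With the real class, the same pattern would presumably be `LlmDocsBuilder::UrlFetcher.new(verbose: true, output: StringIO.new)`, which keeps test output free of log noise.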

Instance Method Details

#fetch(url_string, redirect_count = 0) ⇒ String

Fetch remote URL content while following redirects.

Parameters:

  • url_string (String)

    URL to fetch

  • redirect_count (Integer) (defaults to: 0)

    current redirect depth (internal use)

Returns:

  • (String)

    response body

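Raises:

  • (Errors::GenerationError)

    if the redirect limit (MAX_REDIRECTS) is reached, the server responds with a status that is neither success nor redirect, or any other error occurs while fetching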



# File 'lib/llm_docs_builder/url_fetcher.rb', line 33

def fetch(url_string, redirect_count = 0)
  if redirect_count >= MAX_REDIRECTS
    raise(
      Errors::GenerationError,
      "Too many redirects (#{MAX_REDIRECTS}) when fetching #{url_string}"
    )
  end

  uri = validate_and_parse_url(url_string)

  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = uri.scheme == 'https'
  http.open_timeout = 10
  http.read_timeout = 30

  request = Net::HTTP::Get.new(uri.request_uri)
  request['User-Agent'] = @user_agent

  response = http.request(request)

  case response
  when Net::HTTPSuccess
    response.body
  when Net::HTTPRedirection
    redirect_url = absolute_redirect_url(uri, response['location'])
    log_redirect(redirect_url)
    fetch(redirect_url, redirect_count + 1)
  else
    raise(
      Errors::GenerationError,
      "Failed to fetch #{url_string}: #{response.code} #{response.message}"
    )
  end
rescue Errors::GenerationError
  raise
rescue StandardError => e
  raise(
    Errors::GenerationError,
    "Error fetching #{url_string}: #{e.message}"
  )
end
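
The control flow of `fetch` can be exercised without a network by swapping the HTTP call for an injected lambda. The following is a simplified sketch, not the library's code: `SketchError`, `sketch_fetch`, `MAX_SKETCH_REDIRECTS`, and the `transport` lambda (returning a `[status, value]` pair) are hypothetical names, and `URI.join` stands in for the private `absolute_redirect_url` helper, whose body is not shown here.

```ruby
require 'uri'

MAX_SKETCH_REDIRECTS = 10 # mirrors MAX_REDIRECTS

class SketchError < StandardError; end # stand-in for Errors::GenerationError

# transport: callable taking a URL string and returning [status, value],
# where value is the body for 2xx or the Location header for 3xx.
def sketch_fetch(url, transport, redirect_count = 0)
  if redirect_count >= MAX_SKETCH_REDIRECTS
    raise SketchError, "Too many redirects (#{MAX_SKETCH_REDIRECTS}) when fetching #{url}"
  end

  status, value = transport.call(url)
  case status
  when 200..299
    value
  when 300..399
    # Resolve relative Location values against the current URL.
    sketch_fetch(URI.join(url, value).to_s, transport, redirect_count + 1)
  else
    raise SketchError, "Failed to fetch #{url}: #{status}"
  end
rescue SketchError
  raise # already a domain error; pass it through unchanged
rescue StandardError => e
  raise SketchError, "Error fetching #{url}: #{e.message}"
end

# Canned responses keyed by URL, used in place of real HTTP requests.
pages = {
  'https://example.com/old' => [301, '/new'],
  'https://example.com/new' => [200, '# Docs'],
  'https://example.com/loop' => [302, '/loop']
}
transport = ->(url) { pages.fetch(url) }

puts sketch_fetch('https://example.com/old', transport) # prints "# Docs"

begin
  sketch_fetch('https://example.com/loop', transport)
rescue SketchError => e
  puts e.message
end
```

Rescuing the domain error first and re-raising it keeps its original message intact. Only unexpected errors are wrapped, so callers need to rescue a single error class.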