Class: Radioactive::Fetcher

Inherits:
Object
Defined in:
lib/radioactive/fetcher.rb

Constant Summary

REDIRECT_STATUSES =
[301, 302, 303, 307, 308].freeze
RESERVED_HEADERS =
%w[host user-agent accept-encoding].freeze
CHUNK_SIZE =
16 * 1024
DEFAULT_USER_AGENT =
"Radioactive/#{Radioactive::VERSION}"
NUMERIC_ONLY_HOST =

Single-label hosts that consist entirely of digits or of 0x-prefixed hex are not valid RFC 1123 hostnames. They are treated as SSRF-bypass attempts, because some libc getaddrinfo implementations have historically resolved such strings as IP addresses.

/\A(\d+|0x[\da-f]+)\z/i
HEADER_INVALID_CHAR =

CR, LF, and NUL are illegal in HTTP header names and values (RFC 9110); caller-supplied input containing any of them is treated as a header-injection attempt.

/[\r\n\0]/
DEFAULTS =
{
  schemes: %w[http https].freeze,
  max_size: 2_097_152,
  open_timeout: 5,
  read_timeout: 10,
  total_timeout: 30,
  max_redirects: 3,
  accept_encoding: "identity",
  user_agent: DEFAULT_USER_AGENT,
  private_ranges: AddressCheck::DEFAULT_PRIVATE_RANGES,
  allow_private: false,
  allow_credentials: false,
  headers: {}.freeze
}.freeze
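
As a rough illustration of how the validation patterns above behave, the sketch below copies the two regexes and the reserved-header list verbatim and applies them to sample inputs. The helper names `numeric_only_host?` and `safe_header?` are hypothetical and not part of Radioactive::Fetcher, and treating RESERVED_HEADERS as names a caller may not override is an assumption; this page does not show how the class itself applies these constants.

```ruby
# Patterns copied from the constant summary above.
NUMERIC_ONLY_HOST   = /\A(\d+|0x[\da-f]+)\z/i
HEADER_INVALID_CHAR = /[\r\n\0]/
RESERVED_HEADERS    = %w[host user-agent accept-encoding].freeze

# Hypothetical helper: true when the host is a single all-digit or 0x-hex label.
def numeric_only_host?(host)
  NUMERIC_ONLY_HOST.match?(host)
end

# Hypothetical helper: true when a caller-supplied header is not reserved
# (assumed meaning of RESERVED_HEADERS) and carries no CR, LF, or NUL.
def safe_header?(name, value)
  return false if RESERVED_HEADERS.include?(name.to_s.downcase)
  return false if HEADER_INVALID_CHAR.match?(name.to_s)
  return false if HEADER_INVALID_CHAR.match?(value.to_s)

  true
end

numeric_only_host?("2130706433")   # decimal spelling of 127.0.0.1 => true
numeric_only_host?("0x7f000001")   # hex spelling of 127.0.0.1     => true
numeric_only_host?("127.0.0.1")    # dotted form is not matched    => false
numeric_only_host?("example.com")  # => false

safe_header?("X-Trace", "abc")                  # => true
safe_header?("X-Trace", "abc\r\nX-Evil: 1")     # => false (CR/LF in value)
safe_header?("Host", "internal.example")        # => false (reserved name)
```

Dotted-quad literals such as 127.0.0.1 are deliberately outside this pattern; presumably they are handled by the private-range check configured through `private_ranges`.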

Instance Method Summary

Constructor Details

#initialize(**opts) ⇒ Fetcher

Returns a new instance of Fetcher.



# File 'lib/radioactive/fetcher.rb', line 42

def initialize(**opts)
  validate_opts!(opts)
  @opts = DEFAULTS.merge(opts)
  @resolver = opts[:resolver] || Resolv
  @clock = opts[:clock] || MonotonicClock
end
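
The constructor layers caller options over DEFAULTS with `Hash#merge` and accepts an injectable resolver. The sketch below shows both ideas in isolation, using a trimmed copy of DEFAULTS. `StaticResolver` is a hypothetical test double, and the assumption that the fetcher only calls `getaddresses(host)` on its resolver (as `Resolv.getaddresses` does) is not confirmed by this page.

```ruby
# Trimmed copy of DEFAULTS from the constant summary; the real hash has more keys.
DEFAULTS = {
  max_size: 2_097_152,
  open_timeout: 5,
  read_timeout: 10,
  max_redirects: 3,
  allow_private: false
}.freeze

# Hash#merge returns a new, unfrozen hash, so the frozen DEFAULTS is never mutated.
opts = DEFAULTS.merge(max_size: 512 * 1024, max_redirects: 0)

opts[:max_size]      # caller override wins    => 524288
opts[:read_timeout]  # falls back to default   => 10
DEFAULTS[:max_size]  # original is unchanged   => 2097152

# Hypothetical stand-in for Resolv, ASSUMING the fetcher only needs
# getaddresses(host). It answers from a fixed table instead of DNS.
class StaticResolver
  def initialize(table)
    @table = table
  end

  def getaddresses(host)
    @table.fetch(host, [])
  end
end

resolver = StaticResolver.new("example.test" => ["192.0.2.10"])
resolver.getaddresses("example.test")  # => ["192.0.2.10"]
resolver.getaddresses("missing.test")  # => []
```

Under that assumption, an object like this could be passed as `resolver:` to make tests deterministic without touching the network.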

Instance Method Details

#fetch(url, **call_opts) ⇒ Object



# File 'lib/radioactive/fetcher.rb', line 49

def fetch(url, **call_opts)
  body = String.new(capacity: CHUNK_SIZE)
  meta = run_streaming(url, call_opts) { |chunk| body << chunk }
  Result.new(
    url: meta[:url],
    final_url: meta[:final_url],
    status: meta[:status],
    headers: meta[:headers],
    body: body,
    hops: meta[:hops]
  )
end
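
`#fetch` accumulates streamed chunks into a single String. The following is a minimal sketch of that buffering pattern with a size cap, not the library's implementation: `each_chunk` is a hypothetical stand-in for the private `run_streaming`, `BodyTooLarge` is an invented error class, and where Radioactive actually enforces `max_size` is not shown in this page.

```ruby
CHUNK_SIZE = 16 * 1024
MAX_SIZE   = 2_097_152 # same value as DEFAULTS[:max_size]

# Invented error class for this sketch only.
class BodyTooLarge < StandardError; end

# Hypothetical stand-in for run_streaming: yields data in CHUNK_SIZE pieces.
def each_chunk(data)
  offset = 0
  while offset < data.bytesize
    yield data.byteslice(offset, CHUNK_SIZE)
    offset += CHUNK_SIZE
  end
end

# Appends chunks to one buffer, refusing to grow past max_size bytes.
def buffer_body(data, max_size: MAX_SIZE)
  body = String.new(capacity: CHUNK_SIZE) # starts as an empty binary string
  each_chunk(data) do |chunk|
    if body.bytesize + chunk.bytesize > max_size
      raise BodyTooLarge, "body exceeds #{max_size} bytes"
    end

    body << chunk
  end
  body
end

payload = "x".b * 40_000
buffer_body(payload).bytesize # => 40000
```

Checking the cap before appending means the buffer never holds more than `max_size` bytes, at the cost of discarding the chunk that would have crossed the limit.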

#open(url, **call_opts) ⇒ Object

The no-block form returns a StringIO of the fully buffered body (size-capped at max_size; matches `URI.open` semantics). The block form streams chunks straight to a Tempfile and yields it rewound, so peak memory per fetch is roughly CHUNK_SIZE rather than max_size, which is useful for high-concurrency or low-RAM callers. The Tempfile is closed and unlinked when the block returns, so it must not be used outside the block.



# File 'lib/radioactive/fetcher.rb', line 66

def open(url, **call_opts)
  return StringIO.new(fetch(url, **call_opts).body) unless block_given?

  io = Tempfile.new("radioactive")
  io.binmode
  begin
    run_streaming(url, call_opts) { |chunk| io.write(chunk) }
    io.rewind
    yield io
  ensure
    io.close
    io.unlink
  end
end
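
To illustrate how a caller might consume the block form, the sketch below reproduces the spool-to-Tempfile pattern with an in-memory chunk source and reads the yielded file back incrementally. `spool_to_tempfile` is a hypothetical stand-in that mirrors the structure of `#open`; it takes an array of chunks instead of a URL, so no network access or Radioactive install is assumed.

```ruby
require "tempfile"
require "digest"

# Hypothetical stand-in for the block form of #open: writes chunks to a
# Tempfile, yields it rewound, then closes and unlinks it.
def spool_to_tempfile(chunks)
  io = Tempfile.new("radioactive")
  io.binmode
  begin
    chunks.each { |chunk| io.write(chunk) }
    io.rewind
    yield io
  ensure
    io.close
    io.unlink
  end
end

chunks = ["a" * 1024] * 4
spooled_path = nil

# The block's value is returned to the caller, so results can be computed
# inside the block while the file still exists.
digest = spool_to_tempfile(chunks) do |io|
  spooled_path = io.path
  sha = Digest::SHA256.new
  while (piece = io.read(1024)) # read returns nil at end of file
    sha << piece
  end
  sha.hexdigest
end

File.exist?(spooled_path) # => false, the file is removed after the block
```

Reading in fixed-size pieces keeps the caller's memory use bounded as well; calling `io.read` with no length would pull the whole body back into memory and defeat the purpose of the block form.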