Module: WebStruct

Defined in:
lib/webstruct.rb,
lib/webstruct/http.rb,
lib/webstruct/page.rb,
lib/webstruct/errors.rb,
lib/webstruct/version.rb,
lib/webstruct/http/url.rb,
lib/webstruct/http/mime.rb,
lib/webstruct/http/shell.rb,
lib/webstruct/page/content.rb,
lib/webstruct/page/content_type.rb,
lib/webstruct/page/csv_header_sniffer.rb

Overview

Entry point for fetching page content over HTTP.

Defined Under Namespace

Modules: Http

Classes: BodyTooLargeError, InvalidUrlError, JavaScriptRequiredError, Page, ParseError

Constant Summary

VERSION = "0.1.0"

Semantic version of the installed gem (e.g. for diagnostics and gemspec).

Class Method Details

.scrape(url, **options) ⇒ Page

Fetches a URL over HTTP and returns a Page built from the response body.

Parameters:

  • url (String)

    absolute http(s) URL

  • options (Hash)

    keyword arguments forwarded to WebStruct::Http.get, e.g. :user_agent, :max_redirects, and :max_body_bytes (a positive Integer; omit for no limit)
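
The :max_body_bytes contract described above (a positive Integer, or omitted for no limit) can be sketched as a small check. This is only an illustration of the documented behavior, not the library's own code: check_max_body_bytes is a hypothetical helper, and the real validation presumably happens somewhere inside WebStruct::Http.get.

```ruby
# Hypothetical helper, not part of WebStruct: mirrors the documented contract
# for :max_body_bytes (positive Integer, or omitted for no limit).
def check_max_body_bytes(options)
  # Key omitted: no limit applies, so there is nothing to validate.
  return nil unless options.key?(:max_body_bytes)

  limit = options[:max_body_bytes]
  unless limit.is_a?(Integer) && limit.positive?
    raise ArgumentError,
          ":max_body_bytes must be a positive Integer, got #{limit.inspect}"
  end

  limit
end

check_max_body_bytes({ user_agent: "docs-example" })  # no limit requested
check_max_body_bytes({ max_body_bytes: 1_048_576 })   # 1 MiB cap
```

Note that under this reading an explicit max_body_bytes: nil counts as "present" and is rejected; callers who want no limit should leave the key out.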

Returns:

  • (Page)

    built from the response body

Raises:

  • (InvalidUrlError)

    from Url.verify! when the URL is invalid

  • (ArgumentError)

    when :max_body_bytes is present but not a positive Integer

  • (BodyTooLargeError)

    when the response body exceeds :max_body_bytes

  • (JavaScriptRequiredError)

    from Shell.detect! when the HTML looks like a JS shell

  • (ParseError)

    when a JSON body is invalid
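
A caller will usually want to tell the documented errors apart. The sketch below shows one way to do that. The error classes are stand-ins defined locally so the example runs without the gem, and it assumes they descend from StandardError, which this page does not confirm. scrape_outcome is a hypothetical wrapper, not part of WebStruct.

```ruby
# Stand-in error classes so this sketch runs without the gem installed.
# Assumption: the real classes descend from StandardError. With the gem
# available, `require "webstruct"` would replace this module definition.
module WebStruct
  class InvalidUrlError < StandardError; end
  class BodyTooLargeError < StandardError; end
  class JavaScriptRequiredError < StandardError; end
  class ParseError < StandardError; end
end

# Hypothetical wrapper: runs the given block (which would call
# WebStruct.scrape) and maps each documented error to a status symbol.
def scrape_outcome
  [:ok, yield]
rescue WebStruct::InvalidUrlError
  [:invalid_url, nil]
rescue ArgumentError
  [:bad_option, nil]
rescue WebStruct::BodyTooLargeError
  [:too_large, nil]
rescue WebStruct::JavaScriptRequiredError
  [:needs_javascript, nil]
rescue WebStruct::ParseError
  [:unparseable, nil]
end

# With the gem loaded, the block would look something like:
#   scrape_outcome { WebStruct.scrape("https://example.com", max_body_bytes: 1_048_576) }
status, _page = scrape_outcome { raise WebStruct::BodyTooLargeError, "body exceeded limit" }
puts status # prints "too_large"
```

Returning a status symbol keeps the calling code free of rescue clauses; whether that is preferable to letting the errors propagate depends on the application.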



# File 'lib/webstruct.rb', line 16

def self.scrape(url, **)
  Http.get(url, **)
end