Class: LlmScraper::Scraper

Inherits:
Object
Defined in:
lib/llm_scraper/scraper.rb

Constructor Details

#initialize(schema:, config: LlmScraper.configuration) ⇒ Scraper

Returns a new instance of Scraper.

Parameters:

  • schema
  • config (defaults to: LlmScraper.configuration)

# File 'lib/llm_scraper/scraper.rb', line 7

def initialize(schema:, config: LlmScraper.configuration)
  @schema = normalize_schema(schema)
  @config = config
end

Instance Method Details

#extract(content) ⇒ Result

Extracts from raw content, skipping the fetch step.

Parameters:

  • content (String)

Returns:

  • (Result)

# File 'lib/llm_scraper/scraper.rb', line 31

def extract(content)
  start = monotonic_now
  attach_timing(run_llm_pipeline(content: content), start)
end
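
The timing pattern used by #extract can be sketched in isolation. This is a minimal sketch, not the gem's code: it assumes that the private monotonic_now helper wraps Process.clock_gettime with the monotonic clock, and the timed helper is a hypothetical name introduced only for illustration.

```ruby
# Assumed implementation of the private monotonic_now helper; the real one
# is not shown on this page.
def monotonic_now
  Process.clock_gettime(Process::CLOCK_MONOTONIC)
end

# Hypothetical helper: runs the block and returns [value, elapsed_seconds].
def timed
  start = monotonic_now
  value = yield
  [value, monotonic_now - start]
end

value, elapsed = timed { "extracted" }
puts format("%s in %.6fs", value, elapsed)
```

A monotonic clock is the usual choice for measuring durations because it is not affected by wall-clock adjustments.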

#scrape(url, rescue_errors: false) ⇒ Result

Parameters:

  • url (String)
  • rescue_errors (Boolean) (defaults to: false)

    when true, returns an error Result instead of raising

Returns:

  • (Result)

Raises:

  • (LlmScraper::Error)

    unless rescue_errors is true

# File 'lib/llm_scraper/scraper.rb', line 16

def scrape(url, rescue_errors: false)
  start = monotonic_now
  result = run_pipeline(url: url)
  attach_timing(result, start)
rescue LlmScraper::Error => e
  raise unless rescue_errors

  Result.new(success: false, error: e.message, url: url,
             fetcher: @config.fetcher, provider: @config.llm_provider,
             model: @config.llm_model)
end
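
The rescue_errors switch can be illustrated with stand-in classes. This is a sketch of the control flow only: SketchError, SketchResult, and sketch_scrape are hypothetical names, and the failure condition (a URL containing "bad") is invented for the example. The real Result carries more fields, such as fetcher, provider, and model.

```ruby
# Stand-ins for illustration only; the real LlmScraper::Error and Result
# classes are defined by the gem and may differ.
class SketchError < StandardError; end
SketchResult = Struct.new(:success, :error, :url, keyword_init: true)

# Mirrors the control flow of #scrape: re-raise by default, or convert the
# failure into an error result when rescue_errors is true.
def sketch_scrape(url, rescue_errors: false)
  raise SketchError, "fetch failed for #{url}" if url.include?("bad")

  SketchResult.new(success: true, error: nil, url: url)
rescue SketchError => e
  raise unless rescue_errors

  SketchResult.new(success: false, error: e.message, url: url)
end

ok = sketch_scrape("https://example.com/good")
failed = sketch_scrape("https://example.com/bad", rescue_errors: true)
puts ok.success
puts failed.error
```

Calling sketch_scrape on the failing URL without rescue_errors: true raises SketchError, which matches the default behaviour of #scrape shown above.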

#scrape_batch(urls) ⇒ Array<Result>

Scrapes each URL with rescue_errors: true, so an LlmScraper::Error never propagates; it is captured in that result's error.

Parameters:

  • urls (Array<String>)

Returns:

  • (Array<Result>)

    one Result per URL, in input order; an LlmScraper::Error is captured in result.error instead of being raised

# File 'lib/llm_scraper/scraper.rb', line 38

def scrape_batch(urls)
  urls.map { |url| scrape(url, rescue_errors: true) }
end
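
Because failures come back as results rather than exceptions, callers typically split the batch output afterwards. The following is a sketch under assumptions: BatchResult is a stand-in that models only the success, error, and url fields seen in the Result.new call in #scrape, and the sample data is invented.

```ruby
# Stand-in result type; only fields that appear in the Result.new(...) call
# in the source are modelled here.
BatchResult = Struct.new(:success, :error, :url, keyword_init: true)

# Hypothetical batch output: one result per input URL, in input order.
results = [
  BatchResult.new(success: true,  error: nil,       url: "https://example.com/a"),
  BatchResult.new(success: false, error: "timeout", url: "https://example.com/b"),
  BatchResult.new(success: true,  error: nil,       url: "https://example.com/c")
]

# Split into successes and failures, then report each failure on stderr.
succeeded, failed = results.partition(&:success)
failed.each { |r| warn "#{r.url}: #{r.error}" }
puts "#{succeeded.size} ok, #{failed.size} failed"
```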

#with_fetcher(fetcher) ⇒ Scraper

Returns a new instance with the fetcher swapped.

Parameters:

  • fetcher (Symbol)

    :jina | :firecrawl | :markdownify | :local

Returns:

  • (Scraper)

    new instance with swapped fetcher

# File 'lib/llm_scraper/scraper.rb', line 50

def with_fetcher(fetcher)
  self.class.new(schema: @schema, config: clone_config(fetcher: fetcher))
end

#with_provider(provider) ⇒ Scraper

Returns a new instance with the LLM provider swapped.

Parameters:

  • provider (Symbol)

    :openai_compatible | :anthropic

Returns:

  • (Scraper)

    new instance with swapped LLM provider

# File 'lib/llm_scraper/scraper.rb', line 44

def with_provider(provider)
  self.class.new(schema: @schema, config: clone_config(llm_provider: provider))
end
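
Both #with_fetcher and #with_provider build a new Scraper from a modified copy of the configuration. The private clone_config helper is not shown on this page, so the sketch below assumes it duplicates the config and applies the overrides to the copy. SketchConfig and SketchScraper are hypothetical stand-ins, and the schema argument is omitted for brevity.

```ruby
# Stand-in config; the real configuration object is not shown on this page.
SketchConfig = Struct.new(:fetcher, :llm_provider, keyword_init: true)

class SketchScraper
  attr_reader :config

  def initialize(config:)
    @config = config
  end

  def with_fetcher(fetcher)
    self.class.new(config: clone_config(fetcher: fetcher))
  end

  def with_provider(provider)
    self.class.new(config: clone_config(llm_provider: provider))
  end

  private

  # Assumed behaviour: copy the config, then apply overrides to the copy only.
  def clone_config(**overrides)
    copy = @config.dup
    overrides.each { |key, value| copy[key] = value }
    copy
  end
end

base = SketchScraper.new(
  config: SketchConfig.new(fetcher: :jina, llm_provider: :anthropic)
)
local = base.with_fetcher(:local).with_provider(:openai_compatible)
puts base.config.fetcher
puts local.config.fetcher
```

Under this assumption the calls chain, and the original instance keeps its own fetcher and provider.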