Class: Firecrawl::Client
- Inherits: Object
- Defined in: lib/firecrawl/client.rb
Overview
Client for the Firecrawl v2 API.
Constant Summary
- DEFAULT_API_URL = "https://api.firecrawl.dev"
- DEFAULT_TIMEOUT = 300 (seconds)
- DEFAULT_MAX_RETRIES = 3
- DEFAULT_BACKOFF_FACTOR = 0.5
- DEFAULT_POLL_INTERVAL = 2 (seconds)
- DEFAULT_JOB_TIMEOUT = 300 (seconds)
Class Method Summary
- .from_env ⇒ Client
  Creates a client from the FIRECRAWL_API_KEY environment variable.

Instance Method Summary
- #agent(options, poll_interval: DEFAULT_POLL_INTERVAL, timeout: DEFAULT_JOB_TIMEOUT) ⇒ Models::AgentStatusResponse
  Runs an agent task and waits for completion (auto-polling).
- #batch_scrape(urls, options = nil, poll_interval: DEFAULT_POLL_INTERVAL, timeout: DEFAULT_JOB_TIMEOUT) ⇒ Models::BatchScrapeJob
  Batch-scrapes URLs and waits for completion (auto-polling).
- #cancel_agent(job_id) ⇒ Hash
  Cancels a running agent task.
- #cancel_batch_scrape(job_id) ⇒ Hash
  Cancels a running batch scrape job.
- #cancel_crawl(job_id) ⇒ Hash
  Cancels a running crawl job.
- #crawl(url, options = nil, poll_interval: DEFAULT_POLL_INTERVAL, timeout: DEFAULT_JOB_TIMEOUT) ⇒ Models::CrawlJob
  Crawls a website and waits for completion (auto-polling).
- #get_agent_status(job_id) ⇒ Models::AgentStatusResponse
  Gets the status of an agent task.
- #get_batch_scrape_status(job_id) ⇒ Models::BatchScrapeJob
  Gets the status and results of a batch scrape job.
- #get_concurrency ⇒ Models::ConcurrencyCheck
  Gets current concurrency usage.
- #get_crawl_errors(job_id) ⇒ Hash
  Gets errors from a crawl job.
- #get_crawl_status(job_id) ⇒ Models::CrawlJob
  Gets the status and results of a crawl job.
- #get_credit_usage ⇒ Models::CreditUsage
  Gets current credit usage.
- #initialize(api_key: nil, api_url: nil, timeout: DEFAULT_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, backoff_factor: DEFAULT_BACKOFF_FACTOR) ⇒ Client (constructor)
  Creates a new Firecrawl client.
- #interact(job_id, code, language: "node", timeout: nil) ⇒ Hash
  Interacts with the scrape-bound browser session for a scrape job.
- #map(url, options = nil) ⇒ Models::MapData
  Discovers URLs on a website.
- #scrape(url, options = nil) ⇒ Models::Document
  Scrapes a single URL and returns the document.
- #search(query, options = nil) ⇒ Models::SearchData
  Performs a web search.
- #start_agent(options) ⇒ Models::AgentResponse
  Starts an async agent task.
- #start_batch_scrape(urls, options = nil) ⇒ Models::BatchScrapeResponse
  Starts an async batch scrape job.
- #start_crawl(url, options = nil) ⇒ Models::CrawlResponse
  Starts an async crawl job and returns immediately.
- #stop_interactive_browser(job_id) ⇒ Hash
  Stops the interactive browser session for a scrape job.
Constructor Details
#initialize(api_key: nil, api_url: nil, timeout: DEFAULT_TIMEOUT, max_retries: DEFAULT_MAX_RETRIES, backoff_factor: DEFAULT_BACKOFF_FACTOR) ⇒ Client
Creates a new Firecrawl client.
# File 'lib/firecrawl/client.rb', line 31

def initialize(
  api_key: nil,
  api_url: nil,
  timeout: DEFAULT_TIMEOUT,
  max_retries: DEFAULT_MAX_RETRIES,
  backoff_factor: DEFAULT_BACKOFF_FACTOR
)
  resolved_key = api_key || ENV["FIRECRAWL_API_KEY"]
  if resolved_key.nil? || resolved_key.strip.empty?
    raise FirecrawlError,
          "API key is required. Provide api_key: or set FIRECRAWL_API_KEY environment variable."
  end

  resolved_url = api_url || ENV["FIRECRAWL_API_URL"] || DEFAULT_API_URL
  unless resolved_url.match?(%r{\Ahttps?://}i)
    raise FirecrawlError,
          "API URL must be a fully qualified HTTP or HTTPS URL (got: #{resolved_url})."
  end

  @http = HttpClient.new(
    api_key: resolved_key,
    base_url: resolved_url,
    timeout: timeout,
    max_retries: max_retries,
    backoff_factor: backoff_factor
  )
end
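The retry behavior implied by DEFAULT_MAX_RETRIES and DEFAULT_BACKOFF_FACTOR can be sketched as follows. This assumes the common exponential pattern (delay = backoff_factor × 2^attempt); the actual schedule used by HttpClient is not shown in this file, so treat the formula as illustrative.

```ruby
# Hypothetical sketch of an exponential backoff schedule, assuming
# delay = backoff_factor * 2**attempt. HttpClient's real schedule may differ.
def backoff_delays(max_retries, backoff_factor)
  (0...max_retries).map { |attempt| backoff_factor * (2**attempt) }
end

# With the client defaults (3 retries, factor 0.5):
backoff_delays(3, 0.5) # => [0.5, 1.0, 2.0]
```

With these defaults a request that keeps failing would wait roughly 3.5 seconds in total before raising.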
Class Method Details
.from_env ⇒ Client
Creates a client from the FIRECRAWL_API_KEY environment variable.
# File 'lib/firecrawl/client.rb', line 60

def self.from_env
  new
end
Instance Method Details
#agent(options, poll_interval: DEFAULT_POLL_INTERVAL, timeout: DEFAULT_JOB_TIMEOUT) ⇒ Models::AgentStatusResponse
Runs an agent task and waits for completion (auto-polling).
# File 'lib/firecrawl/client.rb', line 304

def agent(options, poll_interval: DEFAULT_POLL_INTERVAL, timeout: DEFAULT_JOB_TIMEOUT)
  start = start_agent(options)
  raise FirecrawlError, "Agent start did not return a job ID" if start.id.nil?

  deadline = Time.now + timeout
  while Time.now < deadline
    status = get_agent_status(start.id)
    return status if status.done?

    sleep(poll_interval)
  end
  raise JobTimeoutError.new(start.id, timeout, "Agent")
end
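The deadline-and-sleep loop in #agent is the same pattern the crawl and batch-scrape helpers rely on. A minimal self-contained sketch, with a stub job standing in for the API (`StubJob` and `poll_until_done` are illustrative names, not part of the gem):

```ruby
# Illustrative deadline-polling loop; StubJob stands in for the remote job,
# reporting done? as true after a fixed number of status checks.
StubJob = Struct.new(:checks_until_done) do
  def done?
    self.checks_until_done -= 1
    checks_until_done <= 0
  end
end

def poll_until_done(job, poll_interval:, timeout:)
  deadline = Time.now + timeout
  while Time.now < deadline
    return :completed if job.done?

    sleep(poll_interval)
  end
  :timed_out
end

poll_until_done(StubJob.new(3), poll_interval: 0.01, timeout: 5) # => :completed
```

Note that the status check happens before each sleep, so a job that is already done returns immediately without waiting a poll interval.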
#batch_scrape(urls, options = nil, poll_interval: DEFAULT_POLL_INTERVAL, timeout: DEFAULT_JOB_TIMEOUT) ⇒ Models::BatchScrapeJob
Batch-scrapes URLs and waits for completion (auto-polling).
# File 'lib/firecrawl/client.rb', line 219

def batch_scrape(urls, options = nil, poll_interval: DEFAULT_POLL_INTERVAL, timeout: DEFAULT_JOB_TIMEOUT)
  start = start_batch_scrape(urls, options)
  poll_batch_scrape(start.id, poll_interval, timeout)
end
#cancel_agent(job_id) ⇒ Hash
Cancels a running agent task.
# File 'lib/firecrawl/client.rb', line 322

def cancel_agent(job_id)
  raise ArgumentError, "Job ID is required" if job_id.nil?

  @http.delete("/v2/agent/#{job_id}")
end
#cancel_batch_scrape(job_id) ⇒ Hash
Cancels a running batch scrape job.
# File 'lib/firecrawl/client.rb', line 228

def cancel_batch_scrape(job_id)
  raise ArgumentError, "Job ID is required" if job_id.nil?

  @http.delete("/v2/batch/scrape/#{job_id}")
end
#cancel_crawl(job_id) ⇒ Hash
Cancels a running crawl job.
# File 'lib/firecrawl/client.rb', line 154

def cancel_crawl(job_id)
  raise ArgumentError, "Job ID is required" if job_id.nil?

  @http.delete("/v2/crawl/#{job_id}")
end
#crawl(url, options = nil, poll_interval: DEFAULT_POLL_INTERVAL, timeout: DEFAULT_JOB_TIMEOUT) ⇒ Models::CrawlJob
Crawls a website and waits for completion (auto-polling).
# File 'lib/firecrawl/client.rb', line 145

def crawl(url, options = nil, poll_interval: DEFAULT_POLL_INTERVAL, timeout: DEFAULT_JOB_TIMEOUT)
  start = start_crawl(url, options)
  poll_crawl(start.id, poll_interval, timeout)
end
#get_agent_status(job_id) ⇒ Models::AgentStatusResponse
Gets the status of an agent task.
# File 'lib/firecrawl/client.rb', line 291

def get_agent_status(job_id)
  raise ArgumentError, "Job ID is required" if job_id.nil?

  raw = @http.get("/v2/agent/#{job_id}")
  Models::AgentStatusResponse.new(raw)
end
#get_batch_scrape_status(job_id) ⇒ Models::BatchScrapeJob
Gets the status and results of a batch scrape job.
# File 'lib/firecrawl/client.rb', line 205

def get_batch_scrape_status(job_id)
  raise ArgumentError, "Job ID is required" if job_id.nil?

  raw = @http.get("/v2/batch/scrape/#{job_id}")
  Models::BatchScrapeJob.new(raw)
end
#get_concurrency ⇒ Models::ConcurrencyCheck
Gets current concurrency usage.
# File 'lib/firecrawl/client.rb', line 335

def get_concurrency
  raw = @http.get("/v2/concurrency-check")
  Models::ConcurrencyCheck.new(raw)
end
#get_crawl_errors(job_id) ⇒ Hash
Gets errors from a crawl job.
# File 'lib/firecrawl/client.rb', line 164

def get_crawl_errors(job_id)
  raise ArgumentError, "Job ID is required" if job_id.nil?

  @http.get("/v2/crawl/#{job_id}/errors")
end
#get_crawl_status(job_id) ⇒ Models::CrawlJob
Gets the status and results of a crawl job.
# File 'lib/firecrawl/client.rb', line 131

def get_crawl_status(job_id)
  raise ArgumentError, "Job ID is required" if job_id.nil?

  raw = @http.get("/v2/crawl/#{job_id}")
  Models::CrawlJob.new(raw)
end
#get_credit_usage ⇒ Models::CreditUsage
Gets current credit usage.
# File 'lib/firecrawl/client.rb', line 343

def get_credit_usage
  raw = @http.get("/v2/team/credit-usage")
  Models::CreditUsage.new(raw)
end
#interact(job_id, code, language: "node", timeout: nil) ⇒ Hash
Interacts with the scrape-bound browser session for a scrape job.
# File 'lib/firecrawl/client.rb', line 90

def interact(job_id, code, language: "node", timeout: nil)
  raise ArgumentError, "Job ID is required" if job_id.nil?
  raise ArgumentError, "Code is required" if code.nil?

  body = { "code" => code, "language" => language }
  body["timeout"] = timeout if timeout
  @http.post("/v2/scrape/#{job_id}/interact", body)
end
#map(url, options = nil) ⇒ Models::MapData
Discovers URLs on a website.
# File 'lib/firecrawl/client.rb', line 243

def map(url, options = nil)
  raise ArgumentError, "URL is required" if url.nil?

  body = { "url" => url }
  body.merge!(options.to_h) if options
  raw = @http.post("/v2/map", body)
  data = raw["data"] || raw
  Models::MapData.new(data)
end
#scrape(url, options = nil) ⇒ Models::Document
Scrapes a single URL and returns the document.
# File 'lib/firecrawl/client.rb', line 73

def scrape(url, options = nil)
  raise ArgumentError, "URL is required" if url.nil?

  body = { "url" => url }
  body.merge!(options.to_h) if options
  raw = @http.post("/v2/scrape", body)
  data = raw["data"] || raw
  Models::Document.new(data)
end
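#scrape, #map, and #search all unwrap the response the same way: if the payload carries a top-level "data" envelope, the model is built from that; otherwise the whole payload is used. A standalone sketch of this unwrapping (`unwrap` is an illustrative name, not a method of the gem):

```ruby
# Unwrap a Firecrawl-style response: prefer the "data" envelope when present,
# otherwise fall back to the raw payload itself.
def unwrap(raw)
  raw["data"] || raw
end

unwrap({ "success" => true, "data" => { "markdown" => "# Hi" } })
# => { "markdown" => "# Hi" }
unwrap({ "markdown" => "# Hi" })
# => { "markdown" => "# Hi" }
```

This makes the client tolerant of both enveloped and bare responses from the API.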
#search(query, options = nil) ⇒ Models::SearchData
Performs a web search.
# File 'lib/firecrawl/client.rb', line 262

def search(query, options = nil)
  raise ArgumentError, "Query is required" if query.nil?

  body = { "query" => query }
  body.merge!(options.to_h) if options
  raw = @http.post("/v2/search", body)
  data = raw["data"] || raw
  Models::SearchData.new(data)
end
#start_agent(options) ⇒ Models::AgentResponse
Starts an async agent task.
# File 'lib/firecrawl/client.rb', line 280

def start_agent(options)
  raise ArgumentError, "Agent options are required" if options.nil?

  raw = @http.post("/v2/agent", options.to_h)
  Models::AgentResponse.new(raw)
end
#start_batch_scrape(urls, options = nil) ⇒ Models::BatchScrapeResponse
Starts an async batch scrape job.
# File 'lib/firecrawl/client.rb', line 179

def start_batch_scrape(urls, options = nil)
  raise ArgumentError, "URLs list is required" if urls.nil?

  body = { "urls" => urls }
  extra_headers = {}
  if options
    opts_hash = options.to_h
    # idempotencyKey goes as a header, not in body
    if options.idempotency_key && !options.idempotency_key.empty?
      extra_headers["x-idempotency-key"] = options.idempotency_key
    end
    # Flatten nested scrape options to top level (API expects this)
    nested = opts_hash.delete("options")
    body.merge!(opts_hash)
    body.merge!(nested) if nested
  end
  raw = @http.post("/v2/batch/scrape", body, extra_headers: extra_headers)
  Models::BatchScrapeResponse.new(raw)
end
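The request shaping in #start_batch_scrape (idempotency key promoted to a header, nested scrape options flattened into the body) can be sketched with plain hashes. The option hash shapes below are assumptions made for illustration; the gem's actual options object is not shown here.

```ruby
# Sketch of start_batch_scrape's request shaping, using plain hashes in
# place of the gem's options object. Field names mirror the code above.
def build_batch_request(urls, opts_hash, idempotency_key: nil)
  body = { "urls" => urls }
  headers = {}
  # The idempotency key travels as a header, not in the JSON body.
  headers["x-idempotency-key"] = idempotency_key if idempotency_key && !idempotency_key.empty?
  # Nested scrape options are flattened to the top level of the body.
  nested = opts_hash.delete("options")
  body.merge!(opts_hash)
  body.merge!(nested) if nested
  [body, headers]
end

body, headers = build_batch_request(
  ["https://example.com"],
  { "maxConcurrency" => 2, "options" => { "formats" => ["markdown"] } },
  idempotency_key: "abc-123"
)
# body    => { "urls" => ["https://example.com"], "maxConcurrency" => 2, "formats" => ["markdown"] }
# headers => { "x-idempotency-key" => "abc-123" }
```

Sending the same idempotency key on a retried request lets the API deduplicate the job instead of starting a second batch.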
#start_crawl(url, options = nil) ⇒ Models::CrawlResponse
Starts an async crawl job and returns immediately.
# File 'lib/firecrawl/client.rb', line 118

def start_crawl(url, options = nil)
  raise ArgumentError, "URL is required" if url.nil?

  body = { "url" => url }
  body.merge!(options.to_h) if options
  raw = @http.post("/v2/crawl", body)
  Models::CrawlResponse.new(raw)
end
#stop_interactive_browser(job_id) ⇒ Hash
Stops the interactive browser session for a scrape job.
# File 'lib/firecrawl/client.rb', line 103

def stop_interactive_browser(job_id)
  raise ArgumentError, "Job ID is required" if job_id.nil?

  @http.delete("/v2/scrape/#{job_id}/interact")
end