Module: Html2rss::BlockedSurface
- Defined in: lib/html2rss/blocked_surface.rb
Overview
Shared anti-bot/interstitial signatures used by request and auto-source flows.
This module centralizes signature matching so request-time guards and auto-source surface classification stay consistent.
Constant Summary
- INTERSTITIAL_SIGNATURES =
Known interstitial fingerprints used to detect blocked or anti-bot surfaces.
[
  {
    key: :cloudflare_interstitial,
    min_matches: 2,
    patterns: [
      %r{<title>\s*just a moment\.\.\.\s*</title>}i,
      /checking your browser before accessing/i,
      /please (?:enable|turn on) javascript and cookies/i,
      %r{cdn-cgi/challenge-platform}i,
      /cloudflare ray id/i
    ],
    message: 'Blocked surface detected: Cloudflare anti-bot interstitial page. ' \
             'Retry with --strategy browserless, try a more specific public listing URL, ' \
             'or run from an environment that can complete anti-bot checks.'
  }
].freeze
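The matching helper that consumes `min_matches` is not shown on this page. The following is a minimal sketch under the assumption that a signature matches once at least `min_matches` of its patterns are found in the body. The module `BlockedSurfaceSketch` and its methods `match_count` and `match?` are hypothetical names used for illustration only, not part of html2rss.

```ruby
# Hypothetical illustration module; not part of html2rss.
module BlockedSurfaceSketch
  # Copy of the Cloudflare signature's matching fields from the constant above.
  SIGNATURE = {
    key: :cloudflare_interstitial,
    min_matches: 2,
    patterns: [
      %r{<title>\s*just a moment\.\.\.\s*</title>}i,
      /checking your browser before accessing/i,
      /please (?:enable|turn on) javascript and cookies/i,
      %r{cdn-cgi/challenge-platform}i,
      /cloudflare ray id/i
    ]
  }.freeze

  # Number of patterns in the signature that occur in the body.
  def self.match_count(body, signature = SIGNATURE)
    signature[:patterns].count { |pattern| pattern.match?(body) }
  end

  # Assumed semantics: matched when the hit count reaches min_matches.
  def self.match?(body, signature = SIGNATURE)
    match_count(body, signature) >= signature[:min_matches]
  end
end

challenge_page = '<title>Just a moment...</title>' \
                 '<script src="/cdn-cgi/challenge-platform/x.js"></script>'
ordinary_page  = '<title>Just a moment...</title><p>Our blog is loading.</p>'

puts BlockedSurfaceSketch.match?(challenge_page) # => true  (two patterns hit)
puts BlockedSurfaceSketch.match?(ordinary_page)  # => false (only the title hits)
```

Under this reading, the threshold of two keeps an ordinary page that merely shares a "Just a moment..." title from being classified as blocked.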
Class Method Summary
- .interstitial?(body) ⇒ Boolean
True when body matches a known interstitial signature.
- .interstitial_signature_for(body) ⇒ Hash?
Returns the first matching interstitial signature for the provided body.
Class Method Details
.interstitial?(body) ⇒ Boolean
Returns true when body matches a known interstitial signature.
# File 'lib/html2rss/blocked_surface.rb', line 41
def self.interstitial?(body)
  !interstitial_signature_for(body).nil?
end
.interstitial_signature_for(body) ⇒ Hash?
Returns the first matching interstitial signature for the provided body.
# File 'lib/html2rss/blocked_surface.rb', line 33
def self.interstitial_signature_for(body)
  normalized_body = normalize_body(body)
  INTERSTITIAL_SIGNATURES.find { |signature| interstitial_signature_match?(normalized_body, signature) }
end
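To show how a caller might use the two methods together, here is a sketch built on a stand-in module so that it runs without the html2rss gem. The two public method bodies mirror the source above. Everything else is an assumption made for illustration: the name `BlockedSurfaceStandIn`, the shortened pattern list and message, the bodies of `normalize_body` and `interstitial_signature_match?`, and the guard `guard_response_body!`.

```ruby
# Stand-in for Html2rss::BlockedSurface; helper bodies are assumptions.
module BlockedSurfaceStandIn
  # Shortened copy of the documented signature, for illustration only.
  INTERSTITIAL_SIGNATURES = [
    {
      key: :cloudflare_interstitial,
      min_matches: 2,
      patterns: [
        %r{<title>\s*just a moment\.\.\.\s*</title>}i,
        %r{cdn-cgi/challenge-platform}i,
        /cloudflare ray id/i
      ],
      message: 'Blocked surface detected: Cloudflare anti-bot interstitial page.'
    }
  ].freeze

  # Mirrors the documented source: true when any signature matches.
  def self.interstitial?(body)
    !interstitial_signature_for(body).nil?
  end

  # Mirrors the documented source: first matching signature, or nil.
  def self.interstitial_signature_for(body)
    normalized_body = normalize_body(body)
    INTERSTITIAL_SIGNATURES.find { |signature| interstitial_signature_match?(normalized_body, signature) }
  end

  # Assumed behavior: coerce nil and non-String bodies to a String.
  def self.normalize_body(body)
    body.to_s
  end

  # Assumed behavior: compare the number of matching patterns to min_matches.
  def self.interstitial_signature_match?(body, signature)
    signature[:patterns].count { |pattern| pattern.match?(body) } >= signature[:min_matches]
  end
end

# Hypothetical request-time guard: raise with the signature's message,
# otherwise pass the body through unchanged.
def guard_response_body!(body)
  signature = BlockedSurfaceStandIn.interstitial_signature_for(body)
  raise ArgumentError, signature[:message] if signature

  body
end
```

Returning the whole signature rather than a bare Boolean lets the caller reuse its `:key` and `:message`, while `interstitial?` remains a convenient predicate when only a yes/no answer is needed.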