Module: Html2rss::BlockedSurface

Defined in:
lib/html2rss/blocked_surface.rb

Overview

Shared anti-bot/interstitial signatures used by request and auto-source flows.

This module centralizes signature matching so request-time guards and auto-source surface classification stay consistent.

Constant Summary

INTERSTITIAL_SIGNATURES =

Known interstitial fingerprints used to detect blocked or anti-bot surfaces.

[
  {
    key: :cloudflare_interstitial,
    min_matches: 2,
    patterns: [
      %r{<title>\s*just a moment\.\.\.\s*</title>}i,
      /checking your browser before accessing/i,
      /please (?:enable|turn on) javascript and cookies/i,
      %r{cdn-cgi/challenge-platform}i,
      /cloudflare ray id/i
    ],
    message: 'Blocked surface detected: Cloudflare anti-bot interstitial page. ' \
             'Retry with --strategy browserless, try a more specific public listing URL, ' \
             'or run from an environment that can complete anti-bot checks.'
  }
].freeze
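
The min_matches field suggests that a signature only fires once enough of its patterns hit the same body, which would keep a single coincidental phrase from flagging an ordinary page. The sketch below illustrates that threshold idea under stated assumptions: SAMPLE_SIGNATURE is a shortened copy of the entry above, and pattern_hits and signature_match? are hypothetical helpers written for this example, not the library's private interstitial_signature_match? method, whose source is not shown here.

```ruby
# Shortened copy of the Cloudflare entry above, for illustration only.
SAMPLE_SIGNATURE = {
  key: :cloudflare_interstitial,
  min_matches: 2,
  patterns: [
    %r{<title>\s*just a moment\.\.\.\s*</title>}i,
    /checking your browser before accessing/i,
    %r{cdn-cgi/challenge-platform}i
  ]
}.freeze

# Hypothetical helper: number of patterns in the signature that match body.
def pattern_hits(body, signature)
  signature[:patterns].count { |pattern| pattern.match?(body) }
end

# Hypothetical helper: assumes a signature matches once at least
# min_matches of its patterns are found in the body.
def signature_match?(body, signature)
  pattern_hits(body, signature) >= signature[:min_matches]
end

# Two patterns hit here: the title and the challenge-platform path.
BLOCKED_BODY = '<title>Just a moment...</title>' \
               '<script src="/cdn-cgi/challenge-platform/x.js"></script>'

# Only the title pattern hits here, so the threshold of 2 is not reached.
ARTICLE_BODY = '<title>Just a moment...</title><p>A post about patience.</p>'

puts signature_match?(BLOCKED_BODY, SAMPLE_SIGNATURE)
puts signature_match?(ARTICLE_BODY, SAMPLE_SIGNATURE)
```

Under this reading, a page that merely shares a title with the Cloudflare challenge is not classified as blocked unless a second fingerprint is also present.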

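Class Method Summary

  • .interstitial?(body) ⇒ Boolean

    Returns true when body matches a known interstitial signature.

  • .interstitial_signature_for(body) ⇒ Hash?

    Returns the first matching interstitial signature for the provided body.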

Class Method Details

.interstitial?(body) ⇒ Boolean

Returns true when body matches a known interstitial signature.

Parameters:

  • body (String, nil)

    response body candidate

Returns:

  • (Boolean)

    true when body matches a known interstitial signature



# File 'lib/html2rss/blocked_surface.rb', line 41

def self.interstitial?(body)
  !interstitial_signature_for(body).nil?
end

.interstitial_signature_for(body) ⇒ Hash?

Returns the first matching interstitial signature for the provided body.

Parameters:

  • body (String, nil)

    response body candidate

Returns:

  • (Hash, nil)

    signature hash when matched, otherwise nil



# File 'lib/html2rss/blocked_surface.rb', line 33

def self.interstitial_signature_for(body)
  normalized_body = normalize_body(body)
  INTERSTITIAL_SIGNATURES.find { |signature| interstitial_signature_match?(normalized_body, signature) }
end
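
Because interstitial_signature_for returns the whole signature hash, a caller can surface the signature's :message instead of a generic failure. The following is a minimal sketch of such a request-time guard, not the gem's actual guard code. BlockedSurfaceError, ensure_not_blocked!, and FakeBlockedSurface are hypothetical names introduced for this example. The stand-in module only mirrors the two documented method names so the sketch runs without the gem, and its single-pattern check is a deliberate simplification.

```ruby
# Hypothetical error type for this sketch; not part of the documented API.
class BlockedSurfaceError < StandardError; end

# Stand-in detector exposing the same two class methods documented above.
# Its one-pattern check is a simplification, not the library's logic.
module FakeBlockedSurface
  SIGNATURE = {
    key: :cloudflare_interstitial,
    message: 'Blocked surface detected: Cloudflare anti-bot interstitial page.'
  }.freeze

  # Returns the signature hash on a match, otherwise nil. A nil body is
  # coerced to an empty string, which never matches.
  def self.interstitial_signature_for(body)
    SIGNATURE if body.to_s.match?(%r{cdn-cgi/challenge-platform}i)
  end

  def self.interstitial?(body)
    !interstitial_signature_for(body).nil?
  end
end

# Hypothetical guard: returns the body unchanged when no signature matches,
# otherwise raises with the matched signature's message.
def ensure_not_blocked!(body, detector: FakeBlockedSurface)
  signature = detector.interstitial_signature_for(body)
  raise BlockedSurfaceError, signature[:message] if signature

  body
end

begin
  ensure_not_blocked!('<script src="/cdn-cgi/challenge-platform/h/b.js"></script>')
rescue BlockedSurfaceError => e
  warn e.message
end
```

In real use the detector argument would presumably be Html2rss::BlockedSurface itself. When only a yes/no answer is needed, interstitial? avoids handling the hash.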