Class: Request

Inherits:

Object

Defined in: lib/Request.rb

Defined Under Namespace

Modules: InteractiveCloudflareRecovery
Classes: CloudflareBlockedError

Constant Summary

CLOUDFLARE_MITIGATION_VALUES =

%w[challenge block managed_challenge].freeze

CLOUDFLARE_RECOVERY_LIMIT =

Caps how many times a single self.URL call chain can fall through the Cloudflare-recovery branch, so a user who keeps answering yes to the prompt while Medium keeps blocking cannot loop forever.
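The cap can be pictured as a counter threaded through recursive retries. The following is a simplified, self-contained model rather than the gem's code: RECOVERY_LIMIT = 3, BlockedError, fetch_with_recovery and the fetch/recover lambdas are all hypothetical stand-ins (the real limit's value is not shown on this page). The counter is incremented before each attempt, the way Request.URL increments retryCount.

```ruby
# Hypothetical cap for illustration; the real CLOUDFLARE_RECOVERY_LIMIT
# value lives in lib/Request.rb.
RECOVERY_LIMIT = 3

# Hypothetical stand-in for Request::CloudflareBlockedError.
class BlockedError < StandardError; end

# Counts attempts the way Request.URL counts retryCount: increment first,
# then recurse only while under the cap and the (stubbed) prompt says yes.
# `fetch` returns :blocked or a body; `recover` returns true to retry.
def fetch_with_recovery(fetch, recover, attempt = 0)
  attempt += 1
  result = fetch.call
  return result unless result == :blocked

  if attempt <= RECOVERY_LIMIT && recover.call
    return fetch_with_recovery(fetch, recover, attempt)
  end
  raise BlockedError, "still blocked after #{attempt} attempts"
end

calls = 0
always_blocked = -> { calls += 1; :blocked } # server that never lets us through
always_yes     = -> { true }                 # user who always agrees to retry

begin
  fetch_with_recovery(always_blocked, always_yes)
rescue BlockedError => e
  puts e.message # => still blocked after 4 attempts
end
puts calls # => 4
```

With a prompt that always says yes and a server that always blocks, the chain stops after RECOVERY_LIMIT + 1 attempts and surfaces the error instead of recursing forever.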

Class Method Summary

.body(response) ⇒ Object
.cloudflareBlocked?(response) ⇒ Boolean

Cloudflare tags blocked responses via either the cf-mitigated header or the standard “Just a moment…” challenge HTML.
.html(response) ⇒ Object
.mediumGraphqlEndpoint ⇒ Object

GraphQL endpoint the gem should POST to.
.mediumProxiedURL(url) ⇒ Object

If the user has configured a Cloudflare Worker proxy via MEDIUM_HOST, rewrite any medium.com/<path> OR miro.medium.com/<path> URL to <worker-origin>/<path> so non-GraphQL hits (iframe metadata at /media/<id>, OG-image fallback to /<user>/<post>, miro image downloads, etc.) all benefit from the proxy.
.mediumProxyOrigin ⇒ Object

Extract the <scheme>://<host> of MEDIUM_HOST, or nil if no proxy is configured (or it still points at upstream medium.com).
.miroHost ⇒ Object

Resolve the host the gem should use for miro.medium.com image fetches.
.proxyURI?(uri) ⇒ Boolean

True iff uri is hosted by the configured Worker proxy, i.e. its host matches the host of MEDIUM_HOST’s origin.
.readBodyAsUTF8(response) ⇒ Object

Net::HTTP#read_body returns ASCII-8BIT (binary), so re-tag the body as UTF-8 before parsing.
.URL(url, method = 'GET', data = nil, retryCount = 0) ⇒ Object

Class Method Details

.body(response) ⇒ `Object`

# File 'lib/Request.rb', line 377

def self.body(response)
  readBodyAsUTF8(response)
end

.cloudflareBlocked?(response) ⇒ `Boolean`

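Cloudflare tags blocked responses via either the cf-mitigated header or the standard “Just a moment…” challenge HTML. We check both so we catch challenges even on Cloudflare deployments that don’t set the explicit header. Only 403 and 503 responses are considered; for those, a cf-mitigated value of challenge, block, or managed_challenge, or a body containing “Just a moment...”, cf-error-details, or “Attention Required”, counts as a block.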

Returns:

(Boolean)

# File 'lib/Request.rb', line 357

def self.cloudflareBlocked?(response)
    return false if response.nil?
    code = response.code.to_i
    return false unless code == 403 || code == 503

    mitigated = response['cf-mitigated'].to_s.downcase
    return true if CLOUDFLARE_MITIGATION_VALUES.include?(mitigated)

    body = response.body.to_s
    return false if body.empty?
    body.include?('Just a moment...') ||
        body.include?('cf-error-details') ||
        body.include?('Attention Required')
end
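To see which responses trip the check, here is a self-contained sketch that copies the detection rules above into a standalone method and runs them against hand-built responses. FakeResponse and cloudflare_blocked? are hypothetical: FakeResponse stands in for Net::HTTPResponse and implements only code, [] and body, and the sample headers and bodies are illustrative, not captured from Cloudflare.

```ruby
# Hypothetical stand-in for Net::HTTPResponse: only the three methods
# the detection logic touches (code, [] for headers, body).
FakeResponse = Struct.new(:code, :headers, :body) do
  def [](name)
    headers[name.downcase]
  end
end

CLOUDFLARE_MITIGATION_VALUES = %w[challenge block managed_challenge].freeze

# Same rules as Request.cloudflareBlocked?: only 403/503 qualify, then
# either the cf-mitigated header or a known challenge-page marker.
def cloudflare_blocked?(response)
  return false if response.nil?
  return false unless [403, 503].include?(response.code.to_i)

  mitigated = response['cf-mitigated'].to_s.downcase
  return true if CLOUDFLARE_MITIGATION_VALUES.include?(mitigated)

  body = response.body.to_s
  return false if body.empty?
  ['Just a moment...', 'cf-error-details', 'Attention Required'].any? { |marker| body.include?(marker) }
end

header_block = FakeResponse.new('403', { 'cf-mitigated' => 'challenge' }, '')
html_block   = FakeResponse.new('503', {}, '<title>Just a moment...</title>')
plain_403    = FakeResponse.new('403', {}, 'Forbidden')
ok_page      = FakeResponse.new('200', { 'cf-mitigated' => 'challenge' }, 'Just a moment...')

p cloudflare_blocked?(header_block) # => true
p cloudflare_blocked?(html_block)   # => true
p cloudflare_blocked?(plain_403)    # => false
p cloudflare_blocked?(ok_page)      # => false
```

The last case shows the status gate: a 200 page that happens to carry challenge markers is never treated as a block.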

.html(response) ⇒ `Object`

# File 'lib/Request.rb', line 372

def self.html(response)
  body = readBodyAsUTF8(response)
  body.nil? ? nil : Nokogiri::HTML(body)
end

.mediumGraphqlEndpoint ⇒ `Object`

GraphQL endpoint the gem should POST to. When MEDIUM_HOST configures a proxy, it’s <proxy-origin>/_/graphql regardless of whether the user set MEDIUM_HOST to the bare root or already with the /_/graphql suffix; otherwise it’s the upstream https://medium.com/_/graphql.

# File 'lib/Request.rb', line 327

def self.mediumGraphqlEndpoint
    origin = mediumProxyOrigin
    origin.nil? ? 'https://medium.com/_/graphql' : "#{origin}/_/graphql"
end

.mediumProxiedURL(url) ⇒ `Object`

If the user has configured a Cloudflare Worker proxy via MEDIUM_HOST, rewrite any medium.com/<path> OR miro.medium.com/<path> URL to <worker-origin>/<path> so non-GraphQL hits (iframe metadata at /media/<id>, OG-image fallback to /<user>/<post>, miro image downloads, etc.) all benefit from the proxy. GraphQL callers already hand us the proxy URL directly via mediumGraphqlEndpoint, so they short-circuit.

# File 'lib/Request.rb', line 296

def self.mediumProxiedURL(url)
    return url unless url.is_a?(String)
    origin = mediumProxyOrigin
    return url if origin.nil?
    if url.start_with?('https://medium.com/')
        url.sub(%r{\Ahttps://medium\.com}, origin)
    elsif url.start_with?('https://miro.medium.com/')
        url.sub(%r{\Ahttps://miro\.medium\.com}, origin)
    else
        url
    end
end
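The rewrite rules can be exercised without setting MEDIUM_HOST by passing the origin in directly. In this sketch proxied_url is a hypothetical standalone copy of mediumProxiedURL, and the Worker origin, media id and image path are made-up examples.

```ruby
# Hypothetical standalone copy of Request.mediumProxiedURL: the proxy origin
# is passed in rather than derived from ENV['MEDIUM_HOST'].
def proxied_url(url, origin)
  return url unless url.is_a?(String)
  return url if origin.nil? # no proxy configured: leave URLs untouched

  if url.start_with?('https://medium.com/')
    url.sub(%r{\Ahttps://medium\.com}, origin)
  elsif url.start_with?('https://miro.medium.com/')
    url.sub(%r{\Ahttps://miro\.medium\.com}, origin)
  else
    url # any other host is never proxied
  end
end

origin = 'https://medium-proxy.example.workers.dev' # example Worker origin

puts proxied_url('https://medium.com/media/abc123', origin)
# => https://medium-proxy.example.workers.dev/media/abc123
puts proxied_url('https://miro.medium.com/v2/resize:fit:700/1*xyz.png', origin)
# => https://medium-proxy.example.workers.dev/v2/resize:fit:700/1*xyz.png
puts proxied_url('https://example.com/other', origin)
# => https://example.com/other
puts proxied_url('https://medium.com/media/abc123', nil)
# => https://medium.com/media/abc123
```

Both upstream hosts collapse onto the same origin, which is why a single Worker has to tell medium.com and miro.medium.com traffic apart by path.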

.mediumProxyOrigin ⇒ `Object`

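Extract the <scheme>://<host> of MEDIUM_HOST, or nil if no proxy is configured (or it still points at upstream medium.com or miro.medium.com). Accepts MEDIUM_HOST in any form (bare root, with /_/graphql suffix, or any other path); only the origin matters here. A non-default port is kept in the origin, and a value that cannot be parsed as a URI, or has no host, yields nil.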

# File 'lib/Request.rb', line 313

def self.mediumProxyOrigin
    host = ENV['MEDIUM_HOST'].to_s
    return nil if host.empty?
    uri = URI.parse(host)
    return nil if uri.host.nil? || uri.host == 'medium.com' || uri.host == 'miro.medium.com'
    port = (uri.port && uri.port != uri.default_port) ? ":#{uri.port}" : ''
    "#{uri.scheme}://#{uri.host}#{port}"
rescue URI::InvalidURIError
    nil
end
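The origin extraction, and the GraphQL endpoint and miro host derived from it, can be checked against several MEDIUM_HOST shapes. This sketch takes the MEDIUM_HOST value as an argument instead of reading ENV; proxy_origin, graphql_endpoint and miro_host are hypothetical standalone copies of mediumProxyOrigin, mediumGraphqlEndpoint and miroHost, and the proxy hostnames are examples.

```ruby
require 'uri'

# Hypothetical standalone copy of Request.mediumProxyOrigin that takes the
# MEDIUM_HOST value as an argument instead of reading ENV.
def proxy_origin(medium_host)
  host = medium_host.to_s
  return nil if host.empty?

  uri = URI.parse(host)
  return nil if uri.host.nil? || %w[medium.com miro.medium.com].include?(uri.host)

  # Keep the port only when it differs from the scheme's default.
  port = (uri.port && uri.port != uri.default_port) ? ":#{uri.port}" : ''
  "#{uri.scheme}://#{uri.host}#{port}"
rescue URI::InvalidURIError
  nil
end

# Mirrors Request.mediumGraphqlEndpoint.
def graphql_endpoint(medium_host)
  origin = proxy_origin(medium_host)
  origin.nil? ? 'https://medium.com/_/graphql' : "#{origin}/_/graphql"
end

# Mirrors Request.miroHost.
def miro_host(medium_host)
  proxy_origin(medium_host) || 'https://miro.medium.com'
end

puts proxy_origin('https://proxy.example.com')               # => https://proxy.example.com
puts proxy_origin('https://proxy.example.com/_/graphql')     # => https://proxy.example.com
puts proxy_origin('http://localhost:8787/anything')          # => http://localhost:8787
p proxy_origin('https://medium.com/_/graphql')               # => nil
puts graphql_endpoint('https://proxy.example.com/_/graphql') # => https://proxy.example.com/_/graphql
puts graphql_endpoint(nil)                                   # => https://medium.com/_/graphql
puts miro_host(nil)                                          # => https://miro.medium.com
```

The bare root and the /_/graphql form yield the same origin, so both produce the same GraphQL endpoint; of the path and port, only a non-default port such as :8787 is carried over.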

.miroHost ⇒ `Object`

Resolve the host the gem should use for miro.medium.com image fetches. In single-Worker setups the same MEDIUM_HOST proxy handles both medium.com and miro.medium.com via path dispatch, so miro is always derived from MEDIUM_HOST’s origin. With no proxy configured, it falls back to upstream https://miro.medium.com.

# File 'lib/Request.rb', line 336

def self.miroHost
    mediumProxyOrigin || 'https://miro.medium.com'
end

.proxyURI?(uri) ⇒ `Boolean`

True iff uri is hosted by the configured Worker proxy, i.e. its host matches the host of MEDIUM_HOST’s origin (scheme and port are not compared). Used to gate the MEDIUM_HOST_SECRET auth header so the secret only leaves the process when heading to the user’s own proxy.

Returns:

(Boolean)

# File 'lib/Request.rb', line 344

def self.proxyURI?(uri)
    return false if uri.nil? || uri.host.nil?
    origin = mediumProxyOrigin
    return false if origin.nil?
    parsed = URI.parse(origin) rescue nil
    return false if parsed.nil? || parsed.host.nil?
    parsed.host == uri.host
end
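The effect of this gate on outgoing requests can be sketched with Net::HTTP request objects, which can be built and inspected without opening a connection. proxy_uri? and build_request are hypothetical helpers that mirror proxyURI? and the header logic in .URL; the origin and secret values are placeholders.

```ruby
require 'net/http'
require 'uri'

# Hypothetical copy of Request.proxyURI? with the proxy origin passed in.
def proxy_uri?(uri, origin)
  return false if uri.nil? || uri.host.nil? || origin.nil?
  parsed = URI.parse(origin) rescue nil
  return false if parsed.nil? || parsed.host.nil?
  parsed.host == uri.host # host-only comparison, as in the original
end

# Builds a GET request and attaches the secret only for proxy-bound URIs.
def build_request(url, origin, secret)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  if proxy_uri?(uri, origin) && !secret.to_s.empty?
    request['X-Medium-Proxy-Secret'] = secret
  end
  request
end

origin = 'https://medium-proxy.example.workers.dev' # placeholder Worker origin
secret = 's3cr3t'                                    # placeholder secret

to_proxy  = build_request("#{origin}/_/graphql", origin, secret)
to_medium = build_request('https://medium.com/_/graphql', origin, secret)

p to_proxy['X-Medium-Proxy-Secret']  # => "s3cr3t"
p to_medium['X-Medium-Proxy-Secret'] # => nil
```

Because only the host is compared, a request to the same hostname on a different scheme or port would also carry the secret.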

.readBodyAsUTF8(response) ⇒ `Object`

Net::HTTP#read_body returns ASCII-8BIT (binary). Without an explicit UTF-8 tag, downstream parsers misinterpret multi-byte sequences: Nokogiri’s encoding sniffer falls back to ISO-8859-1 for inline <script> contents, which then mojibakes the embedded JSON (e.g. CJK comes back as garbage like “ä½¿” instead of “使”). This method re-tags the body as UTF-8 in place and returns it; it returns nil when the response is nil or its status is not 200.

# File 'lib/Request.rb', line 386

def self.readBodyAsUTF8(response)
  return nil if response.nil? || response.code.to_i != 200
  body = response.read_body
  return body if body.nil? || body.empty?
  body.force_encoding(Encoding::UTF_8)
  body
end
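The mojibake described above can be reproduced in plain Ruby without Nokogiri. This sketch starts from the three UTF-8 bytes of “使” tagged as binary, as read_body would return them, then compares decoding them as ISO-8859-1 against re-tagging them as UTF-8 the way readBodyAsUTF8 does.

```ruby
# The three UTF-8 bytes of "使" (U+4F7F), tagged as binary the way
# Net::HTTP#read_body returns them.
raw = "\xE4\xBD\xBF".b
p raw.encoding == Encoding::ASCII_8BIT # => true

# Decoding those bytes as ISO-8859-1 yields one character per byte: mojibake.
garbled = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts garbled # => ä½¿

# Re-tagging the same bytes as UTF-8, as readBodyAsUTF8 does, restores the character.
fixed = raw.dup.force_encoding(Encoding::UTF_8)
puts fixed              # => 使
p fixed.valid_encoding? # => true
```

force_encoding changes only the tag, not the bytes, which is why it is cheap and also why it assumes the server really sent UTF-8.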

.URL(url, method = 'GET', data = nil, retryCount = 0) ⇒ `Object`

# File 'lib/Request.rb', line 175

def self.URL(url, method = 'GET', data = nil, retryCount = 0)
    retryCount += 1
    url = mediumProxiedURL(url)

    uri = URI(url)
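    https = Net::HTTP.new(uri.host, uri.port)
    # Only enable TLS for https URLs, so a plain-http MEDIUM_HOST origin
    # (mediumProxyOrigin preserves the scheme) still works.
    https.use_ssl = (uri.scheme == 'https')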

    # --- TLS / Certificate verification setup ---
    # Some OpenSSL builds/configs enable CRL checking, which can fail with:
    # "certificate verify failed (unable to get certificate CRL)".
    # Net::HTTP/OpenSSL does not automatically fetch CRLs, so we use a default
    # cert store and clear CRL-related flags to avoid hard failures while still
    # verifying the peer certificate.
    https.verify_mode = OpenSSL::SSL::VERIFY_PEER

    store = OpenSSL::X509::Store.new
    store.set_default_paths
    # Ensure no CRL-check flags are enabled by default
    store.flags = 0
    https.cert_store = store

    # Allow overriding CA bundle paths via environment variables if needed.
    if ENV['SSL_CERT_FILE'] && !ENV['SSL_CERT_FILE'].empty?
      https.ca_file = ENV['SSL_CERT_FILE']
    end
    if ENV['SSL_CERT_DIR'] && !ENV['SSL_CERT_DIR'].empty?
      https.ca_path = ENV['SSL_CERT_DIR']
    end

    # (Optional) timeouts to avoid hanging on network issues
    https.open_timeout = 10
    https.read_timeout = 30
    # --- end TLS setup ---

    if method.upcase == "GET"
        request = Net::HTTP::Get.new(uri)
    else
        request = Net::HTTP::Post.new(uri)
        request['Content-Type'] = 'application/json'
        if !data.nil?
            request.body = JSON.dump(data)
        end
    end

    request['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36 Edg/142.0.0.0'

    # Send every non-nil cookie from the global $cookies jar.
    cookiesString = $cookies.reject { |_, value| value.nil? }
                            .map { |key, value| "#{key}=#{value}" }
                            .join('; ')

    request['Cookie'] = cookiesString unless cookiesString.empty?

    # When the request is going to a configured Worker proxy (and only
    # then), attach the user's MEDIUM_HOST_SECRET as a header so the
    # Worker can authenticate the caller. Skipped for upstream
    # medium.com / miro.medium.com so the secret never leaks to Medium.
    if proxyURI?(uri) && (proxySecret = ENV['MEDIUM_HOST_SECRET'].to_s) && !proxySecret.empty?
        request['X-Medium-Proxy-Secret'] = proxySecret
    end

    response = https.request(request)

    # Merge any Set-Cookie values back into $cookies. Only the leading
    # name=value pair of each header is kept; attributes such as Path
    # or Expires are dropped. Split on the first '=' so values that
    # contain '=' survive intact.
    setCookieHeaders = response.get_fields('set-cookie')
    if !setCookieHeaders.nil? && !setCookieHeaders.empty?
      setCookieHeaders.each do |cookie|
        key, value = cookie.split('; ').first.split('=', 2)
        $cookies[key] = value
      end
    end

    if cloudflareBlocked?(response)
        # On every Cloudflare block — even when cookies are already
        # set — re-run the recovery flow on a TTY. ChromeAuth refreshes
        # sid/uid/cf_clearance/_cfuvid into $cookies + the cache, so
        # the next attempt usually succeeds. Bounded by retryCount so
        # a degenerate loop (user keeps clearing, Medium keeps blocking)
        # eventually surfaces the error. CI / non-TTY just raises.
        if retryCount <= CLOUDFLARE_RECOVERY_LIMIT && InteractiveCloudflareRecovery.available?
            if InteractiveCloudflareRecovery.run(url)
                return self.URL(url, method, data, retryCount)
            end
        end
        raise CloudflareBlockedError.new(response.code.to_i, url)
    end

    # 429 Too Many Requests: retry the same request until the shared
    # retry budget is spent.
    if response.code.to_i == 429
      if retryCount >= 10
        raise "Error: Too Many Requests, blocked by Medium. URL: #{url}"
      else
        response = self.URL(url, method, data, retryCount)
      end
    # 3XX Redirect: follow Location, resolving relative values against
    # the current URI so scheme, host and port are preserved.
    elsif response.code.to_i >= 300 && response.code.to_i <= 399 && !response['location'].nil? && response['location'] != ''
        if retryCount >= 10
            raise "Error: Retry limit reached. URL: #{url}"
        else
            location = URI.join(uri.to_s, response['location']).to_s
            response = self.URL(location, method, data, retryCount)
        end
    end

    response
end
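The cookie round trip in .URL (merging Set-Cookie values into $cookies, then rebuilding the Cookie header for the next request) can be sketched on its own. The header values and the local cookies hash are hypothetical stand-ins for a real response and the global jar; the parsing keeps only each header's leading name=value pair, as the method above does.

```ruby
# Hypothetical Set-Cookie header values, shaped like the array
# Net::HTTPResponse#get_fields('set-cookie') returns.
set_cookie_headers = [
  'uid=lo_abc123; Path=/; Domain=.medium.com; Secure; HttpOnly',
  'sid=1:xyz=; Path=/; Expires=Wed, 01 Jan 2031 00:00:00 GMT',
  'cf_clearance=token; Path=/; SameSite=None; Secure'
]

cookies = { 'uid' => 'old', 'nonce' => nil } # stand-in for the global $cookies jar

# Keep only the leading "name=value" pair; split on the first '=' so values
# that themselves contain '=' survive intact. Later values overwrite earlier ones.
set_cookie_headers.each do |header|
  key, value = header.split('; ').first.split('=', 2)
  cookies[key] = value
end

# Rebuild the outgoing Cookie header, skipping nil values as Request.URL does.
cookie_header = cookies.reject { |_, value| value.nil? }
                       .map { |key, value| "#{key}=#{value}" }
                       .join('; ')

puts cookie_header
# => uid=lo_abc123; sid=1:xyz=; cf_clearance=token
```

Splitting on the first = only is what keeps values such as 1:xyz= intact; attributes like Path, Expires and Secure are discarded rather than enforced.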
