Class: Request

Inherits:
Object
Defined in:
lib/Request.rb

Defined Under Namespace

Modules: InteractiveCloudflareRecovery
Classes: CloudflareBlockedError

Constant Summary

CLOUDFLARE_MITIGATION_VALUES =
%w[challenge block managed_challenge].freeze
CLOUDFLARE_RECOVERY_LIMIT =

Cap how many times a single self.URL call chain can fall through the Cloudflare-recovery branch, so a user who keeps saying yes to the prompt while Medium keeps blocking can’t loop forever.

5

Class Method Details

.body(response) ⇒ Object



# File 'lib/Request.rb', line 362

def self.body(response)
  readBodyAsUTF8(response)
end

.cloudflareBlocked?(response) ⇒ Boolean

Cloudflare tags blocked responses via either the cf-mitigated header or the standard “Just a moment…” challenge HTML. We check both so we catch challenges even on Cloudflare deployments that don’t set the explicit header.

Returns:

  • (Boolean)


# File 'lib/Request.rb', line 342

def self.cloudflareBlocked?(response)
    return false if response.nil?
    code = response.code.to_i
    return false unless code == 403 || code == 503

    mitigated = response['cf-mitigated'].to_s.downcase
    return true if CLOUDFLARE_MITIGATION_VALUES.include?(mitigated)

    body = response.body.to_s
    return false if body.empty?
    body.include?('Just a moment...') ||
        body.include?('cf-error-details') ||
        body.include?('Attention Required')
end
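The interaction between the status-code gate, the header check, and the body markers can be sketched as follows. This is only an illustration: `FakeResponse` is a hypothetical stand-in for `Net::HTTPResponse`, and `cloudflare_blocked?` is a local copy of the predicate so the snippet runs without the gem loaded.

```ruby
# Hypothetical stand-in for Net::HTTPResponse: exposes only what the
# predicate touches (code, body, and header lookup via []).
class FakeResponse
  attr_reader :code, :body

  def initialize(code, body = '', headers = {})
    @code = code
    @body = body
    @headers = headers
  end

  def [](name)
    @headers[name.downcase]
  end
end

MITIGATION_VALUES = %w[challenge block managed_challenge].freeze
BODY_MARKERS = ['Just a moment...', 'cf-error-details', 'Attention Required'].freeze

# Local copy of the check documented above (an assumption for illustration,
# not the gem's own entry point).
def cloudflare_blocked?(response)
  return false if response.nil?
  return false unless [403, 503].include?(response.code.to_i)
  return true if MITIGATION_VALUES.include?(response['cf-mitigated'].to_s.downcase)

  body = response.body.to_s
  BODY_MARKERS.any? { |marker| body.include?(marker) }
end

# Header alone is enough on a 403.
puts cloudflare_blocked?(FakeResponse.new('403', '', { 'cf-mitigated' => 'challenge' }))
# Body marker catches a 503 that lacks the header.
puts cloudflare_blocked?(FakeResponse.new('503', '<title>Just a moment...</title>'))
# An ordinary 403 is not treated as a Cloudflare block.
puts cloudflare_blocked?(FakeResponse.new('403', 'Forbidden'))
# A 200 is never a block, whatever the body says.
puts cloudflare_blocked?(FakeResponse.new('200', 'Just a moment...'))
```

Under these assumptions the four calls print true, true, false, false. The status gate runs first, so a page that merely quotes the challenge text under a 200 is not misclassified.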

.html(response) ⇒ Object



# File 'lib/Request.rb', line 357

def self.html(response)
  body = readBodyAsUTF8(response)
  body.nil? ? nil : Nokogiri::HTML(body)
end

.mediumProxiedURL(url) ⇒ Object

If the user has configured a Cloudflare Worker proxy via MEDIUM_HOST, rewrite any medium.com/<path> URL to <worker-origin>/<path> so non-GraphQL hits (iframe metadata at /media/<id>, OG-image fallback to /<user>/<post>, etc.) also benefit from the proxy. GraphQL callers already hand us the proxy URL directly via ENV, so they short-circuit the rewrite.



# File 'lib/Request.rb', line 296

def self.mediumProxiedURL(url)
    return url unless url.is_a?(String) && url.start_with?('https://medium.com/')
    origin = mediumProxyOrigin
    return url if origin.nil?
    url.sub(%r{\Ahttps://medium\.com}, origin)
end
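As a rough illustration of the rewrite, the sketch below applies the same substitution with the origin passed in explicitly. The `proxied` helper and the Worker origin are hypothetical; in the gem the origin is derived from MEDIUM_HOST.

```ruby
# Hypothetical Worker origin, used only for this example.
WORKER_ORIGIN = 'https://medium-proxy.example.workers.dev'

# Same rewrite rule as above, with the origin supplied by the caller.
def proxied(url, origin)
  return url unless url.is_a?(String) && url.start_with?('https://medium.com/')
  return url if origin.nil?

  url.sub(%r{\Ahttps://medium\.com}, origin)
end

puts proxied('https://medium.com/media/abc123', WORKER_ORIGIN)
# https://medium-proxy.example.workers.dev/media/abc123

puts proxied('https://miro.medium.com/max/800/1.png', WORKER_ORIGIN)
# https://miro.medium.com/max/800/1.png (other hosts are left alone)

puts proxied('https://medium.com/media/abc123', nil)
# https://medium.com/media/abc123 (no proxy configured)
```

Only URLs that begin with `https://medium.com/` are rewritten, so subdomains and non-HTTPS URLs pass through unchanged.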

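.mediumProxyOrigin ⇒ Object

Extracts the `<scheme>://<host>` origin of MEDIUM_HOST (with the port appended when it is not the scheme’s default), or returns nil if no proxy is configured, if the value does not parse as a URI with a host, or if it still points at medium.com itself.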



# File 'lib/Request.rb', line 305

def self.mediumProxyOrigin
    host = ENV['MEDIUM_HOST'].to_s
    return nil if host.empty?
    uri = URI.parse(host)
    return nil if uri.host.nil? || uri.host == 'medium.com'
    port = (uri.port && uri.port != uri.default_port) ? ":#{uri.port}" : ''
    "#{uri.scheme}://#{uri.host}#{port}"
rescue URI::InvalidURIError
    nil
end
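The origin extraction can be tried in isolation with the sketch below. `origin_of` is a hypothetical helper that takes the value as an argument instead of reading ENV, and the host names are made up.

```ruby
require 'uri'

# Same steps as above, but with the MEDIUM_HOST value passed in.
def origin_of(value)
  return nil if value.to_s.empty?

  uri = URI.parse(value)
  return nil if uri.host.nil? || uri.host == 'medium.com'

  # Only spell out the port when it differs from the scheme's default.
  port = uri.port && uri.port != uri.default_port ? ":#{uri.port}" : ''
  "#{uri.scheme}://#{uri.host}#{port}"
rescue URI::InvalidURIError
  nil
end

puts origin_of('https://proxy.example.dev/_/graphql')      # https://proxy.example.dev
puts origin_of('https://proxy.example.dev:8443/_/graphql') # https://proxy.example.dev:8443
p origin_of('https://medium.com/_/graphql')                # nil (points at upstream)
p origin_of('proxy.example.dev')                           # nil (no scheme, so no host)
p origin_of('not a url')                                   # nil (parse error is swallowed)
```

A value without a scheme parses as a bare path, so `uri.host` is nil and the proxy is treated as unconfigured.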

.miroHost ⇒ Object

Resolve the host the gem should use for miro.medium.com image fetches. Single-Worker setups: the same MEDIUM_HOST proxy handles both medium.com and miro.medium.com via path dispatch, so we always derive miro from MEDIUM_HOST’s origin. No proxy → upstream miro.medium.com.



# File 'lib/Request.rb', line 320

def self.miroHost
    mediumProxyOrigin || 'https://miro.medium.com'
end

.proxyURI?(uri) ⇒ Boolean

True iff `uri` is hosted by the configured Worker proxy, i.e. its host matches MEDIUM_HOST and MEDIUM_HOST is set to something other than upstream medium.com. Used to gate the MEDIUM_HOST_SECRET auth header so the secret only leaves the process when heading to the user’s own proxy.

Returns:

  • (Boolean)


# File 'lib/Request.rb', line 329

def self.proxyURI?(uri)
    return false if uri.nil? || uri.host.nil?
    envValue = ENV['MEDIUM_HOST'].to_s
    return false if envValue.empty?
    parsed = URI.parse(envValue) rescue nil
    return false if parsed.nil? || parsed.host.nil?
    parsed.host != 'medium.com' && parsed.host == uri.host
end

.readBodyAsUTF8(response) ⇒ Object

Net::HTTPResponse#read_body returns an ASCII-8BIT (binary) string. Without an explicit UTF-8 tag, downstream parsers misinterpret multi-byte sequences: Nokogiri’s encoding sniffer falls back to ISO-8859-1 for inline <script> contents, which turns the embedded JSON into mojibake (e.g. CJK comes back as garbage like “ä½¿” instead of “使”). Returns nil for a nil or non-200 response.



# File 'lib/Request.rb', line 371

def self.readBodyAsUTF8(response)
  return nil if response.nil? || response.code.to_i != 200
  body = response.read_body
  return body if body.nil? || body.empty?
  body.force_encoding(Encoding::UTF_8)
  body
end
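The effect of the encoding tag can be reproduced without any network call. The sketch below uses two hypothetical helpers, `misread_as_latin1` and `tag_as_utf8`, on the three UTF-8 bytes of “使”.

```ruby
# Interpret raw bytes as ISO-8859-1 (the fallback described above) and
# transcode to UTF-8 so the result can be printed.
def misread_as_latin1(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# Re-tag the same bytes as UTF-8; the bytes themselves are not changed.
def tag_as_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8)
end

# The three UTF-8 bytes of "使" (U+4F7F), tagged binary as a raw body would be.
raw = "\xE4\xBD\xBF".b

puts raw.encoding == Encoding::BINARY  # true
puts misread_as_latin1(raw)            # ä½¿
puts tag_as_utf8(raw)                  # 使
puts tag_as_utf8(raw).valid_encoding?  # true
```

`force_encoding` only changes the label on the string, so it is cheap, but it also means a body that is not actually UTF-8 would be mislabeled rather than converted.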

.URL(url, method = 'GET', data = nil, retryCount = 0) ⇒ Object



# File 'lib/Request.rb', line 175

def self.URL(url, method = 'GET', data = nil, retryCount = 0)
    retryCount += 1
    url = mediumProxiedURL(url)

    uri = URI(url)
    https = Net::HTTP.new(uri.host, uri.port)
    https.use_ssl = true

    # --- TLS / Certificate verification setup ---
    # Some OpenSSL builds/configs enable CRL checking, which can fail with:
    # "certificate verify failed (unable to get certificate CRL)".
    # Net::HTTP/OpenSSL does not automatically fetch CRLs, so we use a default
    # cert store and clear CRL-related flags to avoid hard failures while still
    # verifying the peer certificate.
    https.verify_mode = OpenSSL::SSL::VERIFY_PEER

    store = OpenSSL::X509::Store.new
    store.set_default_paths
    # Ensure no CRL-check flags are enabled by default
    store.flags = 0
    https.cert_store = store

    # Allow overriding CA bundle paths via environment variables if needed.
    if ENV['SSL_CERT_FILE'] && !ENV['SSL_CERT_FILE'].empty?
      https.ca_file = ENV['SSL_CERT_FILE']
    end
    if ENV['SSL_CERT_DIR'] && !ENV['SSL_CERT_DIR'].empty?
      https.ca_path = ENV['SSL_CERT_DIR']
    end

    # (Optional) timeouts to avoid hanging on network issues
    https.open_timeout = 10
    https.read_timeout = 30
    # --- end TLS setup ---

    if method.upcase == "GET"
        request = Net::HTTP::Get.new(uri)
    else
        request = Net::HTTP::Post.new(uri)
        request['Content-Type'] = 'application/json'
        if !data.nil?
            request.body = JSON.dump(data)
        end
    end

    request['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36 Edg/142.0.0.0'

    # Serialize the shared cookie jar, skipping entries without a value.
    cookiesString = $cookies.reject { |_, value| value.nil? }
                            .map { |key, value| "#{key}=#{value}" }
                            .join('; ')

    request['Cookie'] = cookiesString unless cookiesString.empty?

    # When the request is going to a configured Worker proxy (and only
    # then), attach the user's MEDIUM_HOST_SECRET as a header so the
    # Worker can authenticate the caller. Skipped for upstream
    # medium.com / miro.medium.com so the secret never leaks to Medium.
    if proxyURI?(uri) && (proxySecret = ENV['MEDIUM_HOST_SECRET'].to_s) && !proxySecret.empty?
        request['X-Medium-Proxy-Secret'] = proxySecret
    end

    response = https.request(request)

    # Merge any Set-Cookie headers into the shared jar. get_fields returns
    # nil when the header is absent, otherwise an array of raw header values.
    setCookieFields = response.get_fields('set-cookie')
    unless setCookieFields.nil?
      setCookieFields.each do |cookie|
        # Keep only the leading "name=value" pair and drop attributes
        # such as Path or Expires.
        key, value = cookie.split('; ').first.split('=', 2)
        $cookies[key] = value
      end
    end

    if cloudflareBlocked?(response)
        # On every Cloudflare block — even when cookies are already
        # set — re-run the recovery flow on a TTY. ChromeAuth refreshes
        # sid/uid/cf_clearance/_cfuvid into $cookies + the cache, so
        # the next attempt usually succeeds. Bounded by retryCount so
        # a degenerate loop (user keeps clearing, Medium keeps blocking)
        # eventually surfaces the error. CI / non-TTY just raises.
        if retryCount <= CLOUDFLARE_RECOVERY_LIMIT && InteractiveCloudflareRecovery.available?
            if InteractiveCloudflareRecovery.run(url)
                return self.URL(url, method, data, retryCount)
            end
        end
        raise CloudflareBlockedError.new(response.code.to_i, url)
    end

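    # 429 Too Many Requests: retry, bounded by the shared retryCount.
    if response.code.to_i == 429
      if retryCount >= 10
        raise "Error: Too Many Requests, blocked by Medium. URL: #{url}"
      else
        response = self.URL(url, method, data, retryCount)
      end
    # 3XX redirect: follow the Location header, bounded the same way.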
    elsif response.code.to_i >= 300 && response.code.to_i <= 399 && !response['location'].nil? && response['location'] != ''
        if retryCount >= 10
            raise "Error: Retry limit reached. URL: #{url}"
        else
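            location = response['location']
            # Resolve a relative Location against the current URL so the
            # scheme, host, and any non-default port are preserved.
            unless location.match?(%r{\Ahttps?://}i)
                location = URI.join(url, location).to_s
            end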

            response = self.URL(location, method, data, retryCount)
        end
    end

    response
end
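The cookie round trip inside this method (fold Set-Cookie values into a jar, then serialize the jar into a Cookie header) can be sketched on its own. The helper names `merge_set_cookies` and `cookie_header` and the sample header values are hypothetical; the gem keeps its jar in the global `$cookies` instead of passing a hash around.

```ruby
# Fold raw Set-Cookie header values into a name => value hash, keeping only
# the leading "name=value" pair of each header.
def merge_set_cookies(jar, fields)
  Array(fields).each do |field|
    # Limit the split to 2 so values that contain '=' stay intact.
    key, value = field.split('; ').first.split('=', 2)
    jar[key] = value
  end
  jar
end

# Serialize the jar for a Cookie request header, skipping nil values.
def cookie_header(jar)
  jar.reject { |_, value| value.nil? }
     .map { |key, value| "#{key}=#{value}" }
     .join('; ')
end

# Shaped like the array returned by Net::HTTPResponse#get_fields('set-cookie').
fields = [
  'sid=abc123; Path=/; Secure; HttpOnly',
  'uid=user-42; Domain=.medium.com; Path=/',
  'token=a=b==; Path=/'
]

jar = merge_set_cookies({ 'sid' => 'stale', 'lang' => nil }, fields)
puts cookie_header(jar)
# sid=abc123; uid=user-42; token=a=b==
```

Attributes such as Path, Domain, and Expires are discarded, so this approach treats every cookie as valid for every request. That is simple, and it is also why the proxy secret is sent as a separate header that is gated on the destination host.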