Class: Request

Inherits:
Object
Defined in:
lib/Request.rb

Defined Under Namespace

Modules: InteractiveCloudflareRecovery
Classes: CloudflareBlockedError

Constant Summary

CLOUDFLARE_MITIGATION_VALUES =
%w[challenge block managed_challenge].freeze
CLOUDFLARE_RECOVERY_LIMIT =

Cap how many times a single self.URL call chain can fall through the Cloudflare-recovery branch, so a user who keeps saying yes to the prompt while Medium keeps blocking can’t loop forever.

5

Class Method Summary

Class Method Details

.body(response) ⇒ Object



# File 'lib/Request.rb', line 377

def self.body(response)
  readBodyAsUTF8(response)
end

.cloudflareBlocked?(response) ⇒ Boolean

Cloudflare marks blocked responses with either the cf-mitigated header or its standard challenge HTML (“Just a moment...”, cf-error-details, or “Attention Required”). Only 403 and 503 responses are considered. Both signals are checked so that challenges are caught even on Cloudflare deployments that don’t set the explicit header.

Returns:

  • (Boolean)


# File 'lib/Request.rb', line 357

def self.cloudflareBlocked?(response)
    return false if response.nil?
    code = response.code.to_i
    return false unless code == 403 || code == 503

    mitigated = response['cf-mitigated'].to_s.downcase
    return true if CLOUDFLARE_MITIGATION_VALUES.include?(mitigated)

    body = response.body.to_s
    return false if body.empty?
    body.include?('Just a moment...') ||
        body.include?('cf-error-details') ||
        body.include?('Attention Required')
end
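
To illustrate the two detection paths described above, here is a minimal standalone sketch. It mirrors the logic of cloudflareBlocked? outside the Request class; FakeResponse and cloudflare_blocked? are hypothetical names introduced only for this example, and FakeResponse merely imitates the small part of Net::HTTPResponse the check relies on.

```ruby
# Hypothetical stand-in for Net::HTTPResponse: exposes #code, #body and
# header lookup via #[], which is all the check needs.
class FakeResponse
  attr_reader :code, :body

  def initialize(code, headers = {}, body = '')
    @code = code
    @headers = headers
    @body = body
  end

  def [](name)
    @headers[name.downcase]
  end
end

MITIGATION_VALUES = %w[challenge block managed_challenge].freeze
BODY_MARKERS = ['Just a moment...', 'cf-error-details', 'Attention Required'].freeze

# Same decision order as the method above: status code, then header, then body.
def cloudflare_blocked?(response)
  return false if response.nil?
  return false unless [403, 503].include?(response.code.to_i)
  return true if MITIGATION_VALUES.include?(response['cf-mitigated'].to_s.downcase)

  body = response.body.to_s
  BODY_MARKERS.any? { |marker| body.include?(marker) }
end

header_hit = FakeResponse.new('403', { 'cf-mitigated' => 'challenge' })
body_hit   = FakeResponse.new('503', {}, '<title>Just a moment...</title>')
plain_403  = FakeResponse.new('403', {}, '<h1>Forbidden</h1>')
ok_page    = FakeResponse.new('200', {}, 'Just a moment...')

puts cloudflare_blocked?(header_hit) # => true  (explicit header)
puts cloudflare_blocked?(body_hit)   # => true  (challenge HTML only)
puts cloudflare_blocked?(plain_403)  # => false (403 without Cloudflare markers)
puts cloudflare_blocked?(ok_page)    # => false (status is not 403/503)
```

The last case shows why the status check comes first: a 200 page that happens to contain the challenge phrase is not treated as a block.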

.html(response) ⇒ Object



# File 'lib/Request.rb', line 372

def self.html(response)
  body = readBodyAsUTF8(response)
  body.nil? ? nil : Nokogiri::HTML(body)
end

.mediumGraphqlEndpoint ⇒ Object

GraphQL endpoint the gem should POST to. When MEDIUM_HOST configures a proxy, it’s <proxy-origin>/_/graphql regardless of whether the user set MEDIUM_HOST to the bare root or already with the /_/graphql suffix.



# File 'lib/Request.rb', line 327

def self.mediumGraphqlEndpoint
    origin = mediumProxyOrigin
    origin.nil? ? 'https://medium.com/_/graphql' : "#{origin}/_/graphql"
end

.mediumProxiedURL(url) ⇒ Object

If the user has configured a Cloudflare Worker proxy via MEDIUM_HOST, rewrite any medium.com/<path> OR miro.medium.com/<path> URL to <worker-origin>/<path> so non-GraphQL hits (iframe metadata at /media/<id>, OG-image fallback to /<user>/<post>, miro image downloads, etc.) all benefit from the proxy. GraphQL callers already hand us the proxy URL directly via mediumGraphqlEndpoint, so they short-circuit.



# File 'lib/Request.rb', line 296

def self.mediumProxiedURL(url)
    return url unless url.is_a?(String)
    origin = mediumProxyOrigin
    return url if origin.nil?
    if url.start_with?('https://medium.com/')
        url.sub(%r{\Ahttps://medium\.com}, origin)
    elsif url.start_with?('https://miro.medium.com/')
        url.sub(%r{\Ahttps://miro\.medium\.com}, origin)
    else
        url
    end
end
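
The rewrite rule described above can be sketched in isolation as follows. This is an assumption-laden illustration, not the gem's code: proxied_url is a hypothetical helper that takes the proxy origin as an argument instead of reading MEDIUM_HOST, and https://worker.example.dev is a made-up Worker origin.

```ruby
# Hypothetical standalone version of the rewrite: swap the medium.com or
# miro.medium.com origin for the proxy origin and keep the path untouched.
def proxied_url(url, origin)
  return url unless url.is_a?(String) && origin

  # The lookahead requires a "/" after the host, matching the
  # start_with?('https://medium.com/') checks in the method above.
  url.sub(%r{\Ahttps://(?:miro\.)?medium\.com(?=/)}, origin)
end

origin = 'https://worker.example.dev' # assumed proxy origin

puts proxied_url('https://medium.com/media/abc123', origin)
# => https://worker.example.dev/media/abc123
puts proxied_url('https://miro.medium.com/v2/image.png', origin)
# => https://worker.example.dev/v2/image.png
puts proxied_url('https://example.com/page', origin)
# => https://example.com/page
```

Going by the method body above, only the exact hosts medium.com and miro.medium.com over https are rewritten; other hosts, including other medium.com subdomains, pass through unchanged.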

.mediumProxyOrigin ⇒ Object

Extract the `<scheme>://<host>` origin of MEDIUM_HOST (including the port when it is not the scheme's default), or nil if no proxy is configured or it still points at upstream medium.com. MEDIUM_HOST is accepted in any form (bare root, with the /_/graphql suffix, or with any other path); only the origin matters here.



# File 'lib/Request.rb', line 313

def self.mediumProxyOrigin
    host = ENV['MEDIUM_HOST'].to_s
    return nil if host.empty?
    uri = URI.parse(host)
    return nil if uri.host.nil? || uri.host == 'medium.com' || uri.host == 'miro.medium.com'
    port = (uri.port && uri.port != uri.default_port) ? ":#{uri.port}" : ''
    "#{uri.scheme}://#{uri.host}#{port}"
rescue URI::InvalidURIError
    nil
end
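
As a rough illustration of how different MEDIUM_HOST forms collapse to one origin, and how the GraphQL endpoint is then derived from it, consider the sketch below. proxy_origin and graphql_endpoint are hypothetical standalone helpers that take the host string as a parameter rather than reading ENV, and the example hostnames are invented.

```ruby
require 'uri'

# Hypothetical helper mirroring the origin extraction described above.
def proxy_origin(host)
  host = host.to_s
  return nil if host.empty?

  uri = URI.parse(host)
  return nil if uri.host.nil? || %w[medium.com miro.medium.com].include?(uri.host)

  # Keep the port only when it differs from the scheme's default.
  port = uri.port && uri.port != uri.default_port ? ":#{uri.port}" : ''
  "#{uri.scheme}://#{uri.host}#{port}"
rescue URI::InvalidURIError
  nil
end

# Hypothetical helper mirroring the endpoint selection of mediumGraphqlEndpoint.
def graphql_endpoint(host)
  origin = proxy_origin(host)
  origin.nil? ? 'https://medium.com/_/graphql' : "#{origin}/_/graphql"
end

puts proxy_origin('https://worker.example.dev')            # => https://worker.example.dev
puts proxy_origin('https://worker.example.dev/_/graphql')  # => https://worker.example.dev
puts proxy_origin('http://localhost:8787/any/path')        # => http://localhost:8787
p proxy_origin('https://medium.com/_/graphql')             # => nil
p proxy_origin('not a url')                                # => nil

puts graphql_endpoint('https://worker.example.dev/_/graphql') # => https://worker.example.dev/_/graphql
puts graphql_endpoint('')                                     # => https://medium.com/_/graphql
```

Note that a value without a scheme, such as worker.example.dev, parses with a nil host and is therefore treated as "no proxy configured".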

.miroHost ⇒ Object

Resolve the host the gem should use for miro.medium.com image fetches. Single-Worker setups: the same MEDIUM_HOST proxy handles both medium.com and miro.medium.com via path dispatch, so we always derive miro from MEDIUM_HOST’s origin. No proxy → upstream miro.medium.com.



# File 'lib/Request.rb', line 336

def self.miroHost
    mediumProxyOrigin || 'https://miro.medium.com'
end

.proxyURI?(uri) ⇒ Boolean

True iff `uri` is hosted by the configured Worker proxy, i.e. its host matches the host of MEDIUM_HOST’s origin (scheme and port are not compared). Used to gate the MEDIUM_HOST_SECRET auth header so the secret only leaves the process when heading to the user’s own proxy.

Returns:

  • (Boolean)


# File 'lib/Request.rb', line 344

def self.proxyURI?(uri)
    return false if uri.nil? || uri.host.nil?
    origin = mediumProxyOrigin
    return false if origin.nil?
    parsed = URI.parse(origin) rescue nil
    return false if parsed.nil? || parsed.host.nil?
    parsed.host == uri.host
end
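
The gating described above can be sketched without any network traffic by building a request and inspecting its headers. proxy_uri? and build_get are hypothetical standalone helpers that receive the origin and secret as arguments; the header name X-Medium-Proxy-Secret is the one used by self.URL, while the hostname and secret value are invented.

```ruby
require 'uri'
require 'net/http'

# Hypothetical helper: true when uri's host equals the proxy origin's host.
def proxy_uri?(uri, origin)
  return false if uri.nil? || uri.host.nil? || origin.nil?

  URI.parse(origin).host == uri.host
rescue URI::InvalidURIError
  false
end

# Hypothetical helper: build a GET request, attaching the secret only for
# requests that are headed to the proxy.
def build_get(url, origin:, secret:)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  if proxy_uri?(uri, origin) && !secret.to_s.empty?
    request['X-Medium-Proxy-Secret'] = secret
  end
  request
end

origin = 'https://worker.example.dev' # assumed proxy origin
to_proxy  = build_get('https://worker.example.dev/_/graphql', origin: origin, secret: 's3cret')
to_medium = build_get('https://medium.com/_/graphql', origin: origin, secret: 's3cret')

p to_proxy['X-Medium-Proxy-Secret']  # => "s3cret"
p to_medium['X-Medium-Proxy-Secret'] # => nil
```

Because only hosts are compared, a request to the same host on a different port or scheme would also carry the header in this sketch.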

.readBodyAsUTF8(response) ⇒ Object

Net::HTTPResponse#read_body returns an ASCII-8BIT (binary) string. Without an explicit UTF-8 tag, downstream parsers misinterpret multi-byte sequences: Nokogiri’s encoding sniffer falls back to ISO-8859-1 for inline <script> contents, which then mojibakes the embedded JSON (e.g. CJK comes back as garbage like “ä½¿” instead of “使”). Returns nil unless the response code is 200.



# File 'lib/Request.rb', line 386

def self.readBodyAsUTF8(response)
  return nil if response.nil? || response.code.to_i != 200
  body = response.read_body
  return body if body.nil? || body.empty?
  body.force_encoding(Encoding::UTF_8)
  body
end
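
The effect of the missing UTF-8 tag can be reproduced with plain strings. The sketch below assumes nothing about the gem; it takes the three UTF-8 bytes of the character U+4F7F (使), treats them as binary the way a raw response body would arrive, and compares an ISO-8859-1 reading with the force_encoding fix used above.

```ruby
# Bytes of U+4F7F as a binary string, similar to what read_body returns.
raw = "\u4F7F".b
p raw.encoding # => #<Encoding:ASCII-8BIT> (shown as BINARY on newer Rubies)
p raw.bytes    # => [228, 189, 191]

# Misreading the bytes as ISO-8859-1 turns each byte into its own character.
mojibake = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
p mojibake.length # => 3

# Tagging the same bytes as UTF-8 changes no bytes, only the interpretation.
fixed = raw.dup.force_encoding(Encoding::UTF_8)
p fixed.length          # => 1
p fixed.valid_encoding? # => true
p fixed == "\u4F7F"     # => true
```

force_encoding only relabels the bytes, so it is cheap, but it also means a body that is not actually UTF-8 would be tagged incorrectly; valid_encoding? is one way to check for that.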

.URL(url, method = 'GET', data = nil, retryCount = 0) ⇒ Object



# File 'lib/Request.rb', line 175

def self.URL(url, method = 'GET', data = nil, retryCount = 0)
    retryCount += 1
    url = mediumProxiedURL(url)

    uri = URI(url)
    https = Net::HTTP.new(uri.host, uri.port)
    https.use_ssl = true

    # --- TLS / Certificate verification setup ---
    # Some OpenSSL builds/configs enable CRL checking, which can fail with:
    # "certificate verify failed (unable to get certificate CRL)".
    # Net::HTTP/OpenSSL does not automatically fetch CRLs, so we use a default
    # cert store and clear CRL-related flags to avoid hard failures while still
    # verifying the peer certificate.
    https.verify_mode = OpenSSL::SSL::VERIFY_PEER

    store = OpenSSL::X509::Store.new
    store.set_default_paths
    # Ensure no CRL-check flags are enabled by default
    store.flags = 0
    https.cert_store = store

    # Allow overriding CA bundle paths via environment variables if needed.
    if ENV['SSL_CERT_FILE'] && !ENV['SSL_CERT_FILE'].empty?
      https.ca_file = ENV['SSL_CERT_FILE']
    end
    if ENV['SSL_CERT_DIR'] && !ENV['SSL_CERT_DIR'].empty?
      https.ca_path = ENV['SSL_CERT_DIR']
    end

    # (Optional) timeouts to avoid hanging on network issues
    https.open_timeout = 10
    https.read_timeout = 30
    # --- end TLS setup ---

    if method.upcase == "GET"
        request = Net::HTTP::Get.new(uri)
    else
        request = Net::HTTP::Post.new(uri)
        request['Content-Type'] = 'application/json'
        if !data.nil?
            request.body = JSON.dump(data)
        end
    end

    request['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36 Edg/142.0.0.0';

    cookiesString = $cookies.reject { |_, value| value.nil? }
    .map { |key, value| "#{key}=#{value}" }
    .join("; ");

    if !cookiesString.nil? && cookiesString != ""
      request['Cookie'] = cookiesString;
    end

    # When the request is going to a configured Worker proxy (and only
    # then), attach the user's MEDIUM_HOST_SECRET as a header so the
    # Worker can authenticate the caller. Skipped for upstream
    # medium.com / miro.medium.com so the secret never leaks to Medium.
    if proxyURI?(uri) && (proxySecret = ENV['MEDIUM_HOST_SECRET'].to_s) && !proxySecret.empty?
        request['X-Medium-Proxy-Secret'] = proxySecret
    end

    response = https.request(request);

    setCookieString = response.get_fields('set-cookie');
    if !setCookieString.nil? && setCookieString != ""
      setCookies = setCookieString.map { |cookie| cookie.split('; ').first }.each_with_object({}) do |cookie, hash|
        key, value = cookie.split('=', 2) # Split by '=' into key and value
        hash[key] = value
      end;

      setCookies.each do |key, value|
        $cookies[key] = value
      end
    end

    if cloudflareBlocked?(response)
        # On every Cloudflare block — even when cookies are already
        # set — re-run the recovery flow on a TTY. ChromeAuth refreshes
        # sid/uid/cf_clearance/_cfuvid into $cookies + the cache, so
        # the next attempt usually succeeds. Bounded by retryCount so
        # a degenerate loop (user keeps clearing, Medium keeps blocking)
        # eventually surfaces the error. CI / non-TTY just raises.
        if retryCount <= CLOUDFLARE_RECOVERY_LIMIT && InteractiveCloudflareRecovery.available?
            if InteractiveCloudflareRecovery.run(url)
                return self.URL(url, method, data, retryCount)
            end
        end
        raise CloudflareBlockedError.new(response.code.to_i, url)
    end

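    # 429 Too Many Requests: retry the same URL (bounded by retryCount).
    # 3XX Redirect: follow the Location header (also bounded by retryCount).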
    if response.code.to_i == 429
      if retryCount >= 10
        raise "Error: Too Many Requests, blocked by Medium. URL: #{url}"
      else
        response = self.URL(url, method, data, retryCount);
      end
    elsif response.code.to_i >= 300 && response.code.to_i <= 399 && !response['location'].nil? && response['location'] != ''
        if retryCount >= 10
            raise "Error: Retry limit reached. URL: #{url}"
        else
            location = response['location']
            # Relative Location: prefix it with the current scheme and host.
            if !location.match?(/^(http)/)
                location = "#{uri.scheme}://#{uri.host}#{location}"
            end

            response = self.URL(location, method, data, retryCount)
        end
    end

    response
end
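
The cookie handling inside self.URL (merging Set-Cookie fields into the jar, then serialising the jar into a Cookie header) can be sketched on its own as shown below. merge_set_cookies and cookie_header are hypothetical helper names, and a local hash stands in for the global $cookies; the cookie values are invented.

```ruby
# Hypothetical helper: keep only the leading name=value pair of each
# Set-Cookie field (attributes such as Path or HttpOnly are dropped)
# and store it in the jar, overwriting older values.
def merge_set_cookies(jar, set_cookie_fields)
  Array(set_cookie_fields).each do |field|
    pair = field.split(';').first.to_s.strip
    key, value = pair.split('=', 2) # split on the first "=" only
    next if key.nil? || key.empty?

    jar[key] = value
  end
  jar
end

# Hypothetical helper: serialise the jar, skipping cookies with nil values.
def cookie_header(jar)
  jar.reject { |_, value| value.nil? }
     .map { |key, value| "#{key}=#{value}" }
     .join('; ')
end

jar = { 'sid' => 'old', 'uid' => nil }
merge_set_cookies(jar, ['sid=new; Path=/; HttpOnly', 'cf_clearance=a=b; Secure'])

puts cookie_header(jar) # => sid=new; cf_clearance=a=b
```

Splitting with a limit of 2 matters for values that themselves contain "=", as the cf_clearance example shows. Passing nil for the fields (what get_fields returns when the header is absent) leaves the jar untouched.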