Class: Request
- Inherits: Object
  - Object
    - Request
- Defined in: lib/Request.rb
Defined Under Namespace
- Modules: InteractiveCloudflareRecovery
- Classes: CloudflareBlockedError
Constant Summary
- CLOUDFLARE_MITIGATION_VALUES = %w[challenge block managed_challenge].freeze
- CLOUDFLARE_RECOVERY_LIMIT = 5
  Caps how many times a single self.URL call chain can fall through the Cloudflare-recovery branch, so a user who keeps saying yes to the prompt while Medium keeps blocking can't loop forever.
Class Method Summary
- .body(response) ⇒ Object
- .cloudflareBlocked?(response) ⇒ Boolean
  Cloudflare tags blocked responses via either the cf-mitigated header or the standard “Just a moment…” challenge HTML.
- .html(response) ⇒ Object
- .mediumGraphqlEndpoint ⇒ Object
  GraphQL endpoint the gem should POST to.
- .mediumProxiedURL(url) ⇒ Object
  If the user has configured a Cloudflare Worker proxy via MEDIUM_HOST, rewrite any medium.com/<path> OR miro.medium.com/<path> URL to <worker-origin>/<path> so non-GraphQL hits (iframe metadata at /media/<id>, OG-image fallback to /<user>/<post>, miro image downloads, etc.) all benefit from the proxy.
- .mediumProxyOrigin ⇒ Object
  Extract the `<scheme>://<host>` of MEDIUM_HOST, or nil if no proxy is configured (or it still points at upstream medium.com).
- .miroHost ⇒ Object
  Resolve the host the gem should use for miro.medium.com image fetches.
- .proxyURI?(uri) ⇒ Boolean
  True iff `uri` is hosted by the configured Worker proxy, i.e. its host matches MEDIUM_HOST's origin.
- .readBodyAsUTF8(response) ⇒ Object
  Net::HTTPResponse#read_body returns ASCII-8BIT (binary).
- .URL(url, method = 'GET', data = nil, retryCount = 0) ⇒ Object
Class Method Details
.body(response) ⇒ Object
    # File 'lib/Request.rb', line 377
    def self.body(response)
      readBodyAsUTF8(response)
    end
.cloudflareBlocked?(response) ⇒ Boolean
Cloudflare tags blocked responses via either the cf-mitigated header or the standard “Just a moment…” challenge HTML. We check both so we catch challenges even on Cloudflare deployments that don’t set the explicit header.
    # File 'lib/Request.rb', line 357
    def self.cloudflareBlocked?(response)
      return false if response.nil?
      code = response.code.to_i
      return false unless code == 403 || code == 503

      mitigated = response['cf-mitigated'].to_s.downcase
      return true if CLOUDFLARE_MITIGATION_VALUES.include?(mitigated)

      body = response.body.to_s
      return false if body.empty?
      body.include?('Just a moment...') ||
        body.include?('cf-error-details') ||
        body.include?('Attention Required')
    end
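The two detection paths described above (header first, then body markers) can be tried out without a network call. The following is a minimal sketch, assuming a hypothetical `FakeResponse` stand-in for Net::HTTPResponse and a standalone copy of the check named `cloudflare_blocked?`; neither name is part of the gem.

```ruby
# Hypothetical stand-in for Net::HTTPResponse: exposes #code, #body and
# header lookup via #[] (this sketch expects lowercase header keys).
class FakeResponse
  attr_reader :code, :body

  def initialize(code, body, headers = {})
    @code = code
    @body = body
    @headers = headers
  end

  def [](name)
    @headers[name.downcase]
  end
end

MITIGATION_VALUES = %w[challenge block managed_challenge].freeze
BODY_MARKERS = ['Just a moment...', 'cf-error-details', 'Attention Required'].freeze

# Standalone copy of the documented check, for illustration only.
def cloudflare_blocked?(response)
  return false if response.nil?
  return false unless [403, 503].include?(response.code.to_i)
  return true if MITIGATION_VALUES.include?(response['cf-mitigated'].to_s.downcase)

  body = response.body.to_s
  BODY_MARKERS.any? { |marker| body.include?(marker) }
end

by_header = FakeResponse.new('403', '', { 'cf-mitigated' => 'challenge' })
by_body   = FakeResponse.new('503', '<title>Just a moment...</title>')
plain_403 = FakeResponse.new('403', 'Forbidden')
ok_page   = FakeResponse.new('200', 'Just a moment...')

p cloudflare_blocked?(by_header) # => true  (explicit cf-mitigated header)
p cloudflare_blocked?(by_body)   # => true  (challenge HTML, no header)
p cloudflare_blocked?(plain_403) # => false (403 without Cloudflare markers)
p cloudflare_blocked?(ok_page)   # => false (status is neither 403 nor 503)
```

Note that the status-code gate runs first, so a 200 page that merely mentions "Just a moment..." is never treated as a block.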
.html(response) ⇒ Object
    # File 'lib/Request.rb', line 372
    def self.html(response)
      body = readBodyAsUTF8(response)
      body.nil? ? nil : Nokogiri::HTML(body)
    end
.mediumGraphqlEndpoint ⇒ Object
GraphQL endpoint the gem should POST to. When MEDIUM_HOST configures a proxy, it’s <proxy-origin>/_/graphql regardless of whether the user set MEDIUM_HOST to the bare root or already with the /_/graphql suffix.
    # File 'lib/Request.rb', line 327
    def self.mediumGraphqlEndpoint
      origin = mediumProxyOrigin
      origin.nil? ? 'https://medium.com/_/graphql' : "#{origin}/_/graphql"
    end
.mediumProxiedURL(url) ⇒ Object
If the user has configured a Cloudflare Worker proxy via MEDIUM_HOST, rewrite any medium.com/<path> OR miro.medium.com/<path> URL to <worker-origin>/<path> so non-GraphQL hits (iframe metadata at /media/<id>, OG-image fallback to /<user>/<post>, miro image downloads, etc.) all benefit from the proxy. GraphQL callers already hand us the proxy URL directly via mediumGraphqlEndpoint, so they short-circuit.
    # File 'lib/Request.rb', line 296
    def self.mediumProxiedURL(url)
      return url unless url.is_a?(String)
      origin = mediumProxyOrigin
      return url if origin.nil?

      if url.start_with?('https://medium.com/')
        url.sub(%r{\Ahttps://medium\.com}, origin)
      elsif url.start_with?('https://miro.medium.com/')
        url.sub(%r{\Ahttps://miro\.medium\.com}, origin)
      else
        url
      end
    end
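To make the rewrite rule concrete, here is a small sketch. It assumes a hypothetical Worker origin `https://proxy.example.workers.dev` and a standalone helper `proxied_url` that takes the origin as an argument, whereas the real method reads it from MEDIUM_HOST via mediumProxyOrigin.

```ruby
# Standalone illustration of the rewrite rule. The origin is passed in
# explicitly here; the gem derives it from MEDIUM_HOST instead.
def proxied_url(url, origin)
  return url unless url.is_a?(String) && origin

  # Swap only the scheme + host of medium.com / miro.medium.com URLs that
  # have a path; everything after the host is kept as-is.
  url.sub(%r{\Ahttps://(?:miro\.)?medium\.com(?=/)}, origin)
end

origin = 'https://proxy.example.workers.dev' # hypothetical Worker origin

puts proxied_url('https://medium.com/media/abc123', origin)
# => https://proxy.example.workers.dev/media/abc123
puts proxied_url('https://miro.medium.com/v2/resize:fit:1200/1*abc.png', origin)
# => https://proxy.example.workers.dev/v2/resize:fit:1200/1*abc.png
puts proxied_url('https://example.com/page', origin)
# => https://example.com/page (other hosts are left untouched)
puts proxied_url('https://medium.com/media/abc123', nil)
# => https://medium.com/media/abc123 (no proxy configured)
```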
.mediumProxyOrigin ⇒ Object
Extract the `<scheme>://<host>` of MEDIUM_HOST, or nil if no proxy is configured (or it still points at upstream medium.com). Accepts MEDIUM_HOST in any form (bare root, with /_/graphql suffix, or any other path); only the origin matters here.

    # File 'lib/Request.rb', line 313
    def self.mediumProxyOrigin
      host = ENV['MEDIUM_HOST'].to_s
      return nil if host.empty?
      uri = URI.parse(host)
      return nil if uri.host.nil? || uri.host == 'medium.com' || uri.host == 'miro.medium.com'
      port = (uri.port && uri.port != uri.default_port) ? ":#{uri.port}" : ''
      "#{uri.scheme}://#{uri.host}#{port}"
    rescue URI::InvalidURIError
      nil
    end
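The accepted MEDIUM_HOST forms are easiest to see side by side. This sketch assumes a standalone helper `proxy_origin` that takes the value as an argument rather than reading ENV, and the host names are made up for illustration.

```ruby
require 'uri'

# Standalone copy of the origin extraction, taking the MEDIUM_HOST value
# as an argument instead of reading it from ENV.
def proxy_origin(host)
  host = host.to_s
  return nil if host.empty?

  uri = URI.parse(host)
  return nil if uri.host.nil? || %w[medium.com miro.medium.com].include?(uri.host)

  # Keep the port only when it differs from the scheme's default.
  port = uri.port && uri.port != uri.default_port ? ":#{uri.port}" : ''
  "#{uri.scheme}://#{uri.host}#{port}"
rescue URI::InvalidURIError
  nil
end

p proxy_origin('https://proxy.example.workers.dev')            # => "https://proxy.example.workers.dev"
p proxy_origin('https://proxy.example.workers.dev/_/graphql')  # => "https://proxy.example.workers.dev"
p proxy_origin('http://localhost:8787/anything')               # => "http://localhost:8787"
p proxy_origin('https://medium.com/_/graphql')                 # => nil (upstream, not a proxy)
p proxy_origin('proxy.example.workers.dev')                    # => nil (no scheme, so no host is parsed)
p proxy_origin(nil)                                            # => nil

# The GraphQL endpoint is then the origin plus the fixed /_/graphql path.
origin = proxy_origin('https://proxy.example.workers.dev/_/graphql')
puts origin.nil? ? 'https://medium.com/_/graphql' : "#{origin}/_/graphql"
# => https://proxy.example.workers.dev/_/graphql
```

One consequence worth noting: a value without a scheme parses with a nil host, so it is treated the same as an unset MEDIUM_HOST.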
.miroHost ⇒ Object
Resolve the host the gem should use for miro.medium.com image fetches. Single-Worker setups: the same MEDIUM_HOST proxy handles both medium.com and miro.medium.com via path dispatch, so we always derive miro from MEDIUM_HOST’s origin. No proxy → upstream miro.medium.com.
    # File 'lib/Request.rb', line 336
    def self.miroHost
      mediumProxyOrigin || 'https://miro.medium.com'
    end
.proxyURI?(uri) ⇒ Boolean
True iff `uri` is hosted by the configured Worker proxy, i.e. its host matches the host of MEDIUM_HOST's origin (scheme and port are not compared). Used to gate the MEDIUM_HOST_SECRET auth header so the secret only leaves the process when heading to the user's own proxy.

    # File 'lib/Request.rb', line 344
    def self.proxyURI?(uri)
      return false if uri.nil? || uri.host.nil?
      origin = mediumProxyOrigin
      return false if origin.nil?
      parsed = URI.parse(origin) rescue nil
      return false if parsed.nil? || parsed.host.nil?
      parsed.host == uri.host
    end
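How this predicate gates the secret header can be sketched as follows. The helpers `proxy_uri?` and `build_get`, the origin, and the secret value are all hypothetical; the real code reads the origin and secret from MEDIUM_HOST and MEDIUM_HOST_SECRET. No request is sent.

```ruby
require 'uri'
require 'net/http'

# Illustrative host comparison: only the host is compared, mirroring the
# documented method. The origin is passed in instead of read from ENV.
def proxy_uri?(uri, origin)
  return false if uri.nil? || uri.host.nil? || origin.nil?

  URI.parse(origin).host == uri.host
end

# Build a GET request and attach the secret only for proxy-bound URLs.
def build_get(url, origin, secret)
  uri = URI(url)
  request = Net::HTTP::Get.new(uri)
  if proxy_uri?(uri, origin) && !secret.to_s.empty?
    request['X-Medium-Proxy-Secret'] = secret
  end
  request
end

origin = 'https://proxy.example.workers.dev' # hypothetical Worker origin
secret = 's3cret'                            # hypothetical secret value

to_proxy    = build_get('https://proxy.example.workers.dev/media/abc', origin, secret)
to_upstream = build_get('https://medium.com/media/abc', origin, secret)

p to_proxy['X-Medium-Proxy-Secret']    # => "s3cret"
p to_upstream['X-Medium-Proxy-Secret'] # => nil (secret never sent upstream)
```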
.readBodyAsUTF8(response) ⇒ Object
Net::HTTPResponse#read_body returns ASCII-8BIT (binary). Without an explicit UTF-8 tag, downstream parsers misinterpret multi-byte sequences: Nokogiri's encoding sniffer falls back to ISO-8859-1 for inline <script> contents, which then mojibakes the embedded JSON (e.g. CJK comes back as garbage like “ä½¿” instead of “使”).
    # File 'lib/Request.rb', line 386
    def self.readBodyAsUTF8(response)
      return nil if response.nil? || response.code.to_i != 200
      body = response.read_body
      return body if body.nil? || body.empty?
      body.force_encoding(Encoding::UTF_8)
      body
    end
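The encoding issue can be reproduced without any HTTP traffic. The sketch below simulates a binary-tagged body with `String#b` and compares an ISO-8859-1 misreading against the `force_encoding` fix; the variable names are illustrative only.

```ruby
# Simulate a response body: the UTF-8 bytes of "使" (E4 BD BF), tagged binary.
raw = '使'.b
puts raw.encoding == Encoding::BINARY # true (BINARY is an alias of ASCII-8BIT)
puts raw.bytesize                     # 3

# What a parser sees if it assumes ISO-8859-1: each byte becomes one character.
mojibake = raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
puts mojibake == 'ä½¿'                # true

# The fix: retag the same bytes as UTF-8. No bytes are changed.
fixed = raw.dup.force_encoding(Encoding::UTF_8)
puts fixed == '使'                    # true
puts fixed.valid_encoding?            # true
```

`force_encoding` only changes the encoding tag, so it is cheap, but it does not validate the bytes; `valid_encoding?` can be used when the upstream charset is in doubt.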
.URL(url, method = 'GET', data = nil, retryCount = 0) ⇒ Object
    # File 'lib/Request.rb', line 175
    def self.URL(url, method = 'GET', data = nil, retryCount = 0)
      retryCount += 1

      url = mediumProxiedURL(url)
      uri = URI(url)
      https = Net::HTTP.new(uri.host, uri.port)
      https.use_ssl = true

      # --- TLS / Certificate verification setup ---
      # Some OpenSSL builds/configs enable CRL checking, which can fail with:
      # "certificate verify failed (unable to get certificate CRL)".
      # Net::HTTP/OpenSSL does not automatically fetch CRLs, so we use a default
      # cert store and clear CRL-related flags to avoid hard failures while still
      # verifying the peer certificate.
      https.verify_mode = OpenSSL::SSL::VERIFY_PEER

      store = OpenSSL::X509::Store.new
      store.set_default_paths
      # Ensure no CRL-check flags are enabled by default
      store.flags = 0
      https.cert_store = store

      # Allow overriding CA bundle paths via environment variables if needed.
      if ENV['SSL_CERT_FILE'] && !ENV['SSL_CERT_FILE'].empty?
        https.ca_file = ENV['SSL_CERT_FILE']
      end
      if ENV['SSL_CERT_DIR'] && !ENV['SSL_CERT_DIR'].empty?
        https.ca_path = ENV['SSL_CERT_DIR']
      end

      # (Optional) timeouts to avoid hanging on network issues
      https.open_timeout = 10
      https.read_timeout = 30
      # --- end TLS setup ---

      if method.upcase == "GET"
        request = Net::HTTP::Get.new(uri)
      else
        request = Net::HTTP::Post.new(uri)
        request['Content-Type'] = 'application/json'
        if !data.nil?
          request.body = JSON.dump(data)
        end
      end

      request['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36 Edg/142.0.0.0'

      # Serialize the in-memory cookie jar into a single Cookie header.
      cookieString = $cookies.reject { |_, value| value.nil? }
                             .map { |key, value| "#{key}=#{value}" }
                             .join("; ")

      if !cookieString.nil? && cookieString != ""
        request['Cookie'] = cookieString
      end

      # When the request is going to a configured Worker proxy (and only
      # then), attach the user's MEDIUM_HOST_SECRET as a header so the
      # Worker can authenticate the caller. Skipped for upstream
      # medium.com / miro.medium.com so the secret never leaks to Medium.
      if proxyURI?(uri) && (proxySecret = ENV['MEDIUM_HOST_SECRET'].to_s) && !proxySecret.empty?
        request['X-Medium-Proxy-Secret'] = proxySecret
      end

      response = https.request(request)

      # Merge any Set-Cookie headers from the response back into the jar.
      setCookieString = response.get_fields('set-cookie')
      if !setCookieString.nil? && setCookieString != ""
        setCookies = setCookieString.map { |cookie| cookie.split('; ').first }.each_with_object({}) do |cookie, hash|
          key, value = cookie.split('=', 2) # Split by '=' into key and value
          hash[key] = value
        end

        setCookies.each do |key, value|
          $cookies[key] = value
        end
      end

      if cloudflareBlocked?(response)
        # On every Cloudflare block, even when cookies are already
        # set, re-run the recovery flow on a TTY. ChromeAuth refreshes
        # sid/uid/cf_clearance/_cfuvid into $cookies + the cache, so
        # the next attempt usually succeeds. Bounded by retryCount so
        # a degenerate loop (user keeps clearing, Medium keeps blocking)
        # eventually surfaces the error. CI / non-TTY just raises.
        if retryCount <= CLOUDFLARE_RECOVERY_LIMIT && InteractiveCloudflareRecovery.available?
          if InteractiveCloudflareRecovery.run(url)
            return self.URL(url, method, data, retryCount)
          end
        end
        raise CloudflareBlockedError.new(response.code.to_i, url)
      end

      # 429 retry, then 3XX redirect
      if response.code.to_i == 429
        if retryCount >= 10
          raise "Error: Too Many Requests, blocked by Medium. URL: #{url}"
        else
          response = self.URL(url, method, data, retryCount)
        end
      elsif response.code.to_i >= 300 && response.code.to_i <= 399 && !response['location'].nil? && response['location'] != ''
        if retryCount >= 10
          raise "Error: Retry limit reached. URL: #{url}"
        else
          location = response['location']
          # Relative redirects are resolved against the current scheme and host.
          if !location.match?(/^(http)/)
            location = "#{uri.scheme}://#{uri.host}#{location}"
          end
          response = self.URL(location, method, data, retryCount)
        end
      end

      response
    end
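The cookie round trip inside this method (parse Set-Cookie fields into a jar, then serialize the jar into a Cookie header) can be sketched in isolation. The helper names `parse_set_cookies` and `cookie_header` and the sample header values are hypothetical; the gem keeps the jar in the `$cookies` global instead of a local hash.

```ruby
# Sample values shaped like response.get_fields('set-cookie'):
# one string per Set-Cookie header.
set_cookie_fields = [
  'sid=abc123; Path=/; Secure; HttpOnly',
  'uid=user42; Path=/; Expires=Wed, 01 Jan 2031 00:00:00 GMT',
  'token=a=b=c; Path=/'
]

# Keep only the leading "name=value" pair of each field and collect the
# pairs into a hash. Attributes such as Path or Expires are dropped.
def parse_set_cookies(fields)
  Array(fields).each_with_object({}) do |field, jar|
    pair = field.split('; ').first
    key, value = pair.split('=', 2) # split on the first '=' only
    jar[key] = value
  end
end

# Serialize the jar into a Cookie request header, skipping nil values.
def cookie_header(jar)
  jar.reject { |_, value| value.nil? }
     .map { |key, value| "#{key}=#{value}" }
     .join('; ')
end

jar = parse_set_cookies(set_cookie_fields)
p jar
# => {"sid"=>"abc123", "uid"=>"user42", "token"=>"a=b=c"}
puts cookie_header(jar.merge('stale' => nil))
# => sid=abc123; uid=user42; token=a=b=c
```

Splitting with a limit of 2 matters for values that themselves contain `=`, as in the `token` entry above.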