Class: Request
- Inherits: Object
  - Object
    - Request
- Defined in: lib/Request.rb
Defined Under Namespace
Modules: InteractiveCloudflareRecovery
Classes: CloudflareBlockedError
Constant Summary
- CLOUDFLARE_MITIGATION_VALUES = %w[challenge block managed_challenge].freeze
- CLOUDFLARE_RECOVERY_LIMIT = 5
  Cap how many times a single self.URL call chain can fall through the Cloudflare-recovery branch, so a user who keeps saying yes to the prompt while Medium keeps blocking can’t loop forever.
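The retry budget is easiest to see with a small simulation. This is a sketch, not the gem's code: `simulate_recovery` is a hypothetical helper that mirrors only the counting in `self.URL` (`retryCount` is incremented on entry, and recovery is attempted only while `retryCount <= CLOUDFLARE_RECOVERY_LIMIT`), assuming every attempt is blocked and every recovery prompt "succeeds".

```ruby
CLOUDFLARE_RECOVERY_LIMIT = 5

# Hypothetical stand-in for the Cloudflare branch of Request.URL:
# every attempt is blocked and recovery always reports success,
# like a user who keeps saying yes while Medium keeps blocking.
# Returns how many times the recovery prompt would have run.
def simulate_recovery(retry_count = 0, prompts = 0)
  retry_count += 1 # Request.URL increments before doing anything else
  if retry_count <= CLOUDFLARE_RECOVERY_LIMIT
    # Recovery "succeeded", so recurse like `return self.URL(...)`.
    return simulate_recovery(retry_count, prompts + 1)
  end
  prompts # the real method raises CloudflareBlockedError here
end

puts simulate_recovery # => 5
```

A chain that starts at `retryCount = 0` therefore prompts at most five times (six requests in total) before raising. Because the same `retryCount` is also consumed by the 429 and redirect branches, a chain that has already followed redirects has fewer recovery attempts left.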
Class Method Summary
- .body(response) ⇒ Object
  Return the response body as a UTF-8 string, or nil for a missing or non-200 response.
- .cloudflareBlocked?(response) ⇒ Boolean
  Cloudflare tags blocked responses via either the cf-mitigated header or the standard “Just a moment…” challenge HTML.
- .html(response) ⇒ Object
  Parse the response body into a Nokogiri HTML document, or return nil for a missing or non-200 response.
- .mediumProxiedURL(url) ⇒ Object
  If the user has configured a Cloudflare Worker proxy via MEDIUM_HOST, rewrite any medium.com/<path> URL to <worker-origin>/<path> so non-GraphQL hits (iframe metadata at /media/<id>, OG-image fallback to /<user>/<post>, etc.) also benefit from the proxy.
- .mediumProxyOrigin ⇒ Object
  Extract the `<scheme>://<host>` of MEDIUM_HOST, or nil if no proxy is configured (or it still points at medium.com itself).
- .miroHost ⇒ Object
  Resolve the host the gem should use for miro.medium.com image fetches.
- .proxyURI?(uri) ⇒ Boolean
  True iff `uri` is hosted by the configured Worker proxy.
- .readBodyAsUTF8(response) ⇒ Object
  Read a 200 response body and tag it as UTF-8, since Net::HTTP#read_body returns ASCII-8BIT (binary).
- .URL(url, method = 'GET', data = nil, retryCount = 0) ⇒ Object
  Send a GET or JSON POST request, handling cookies, the proxy secret header, Cloudflare recovery, 429 retries and redirects.
Class Method Details
.body(response) ⇒ Object
    # File 'lib/Request.rb', line 362
    def self.body(response)
      readBodyAsUTF8(response)
    end
.cloudflareBlocked?(response) ⇒ Boolean
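Cloudflare tags blocked responses via either the cf-mitigated header or the standard “Just a moment…” challenge HTML. We check both so we catch challenges even on Cloudflare deployments that don’t set the explicit header. Only 403 and 503 responses are considered, and the body check also matches the `cf-error-details` and “Attention Required” block-page markers.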
    # File 'lib/Request.rb', line 342
    def self.cloudflareBlocked?(response)
      return false if response.nil?
      code = response.code.to_i
      return false unless code == 403 || code == 503
      mitigated = response['cf-mitigated'].to_s.downcase
      return true if CLOUDFLARE_MITIGATION_VALUES.include?(mitigated)
      body = response.body.to_s
      return false if body.empty?
      body.include?('Just a moment...') ||
        body.include?('cf-error-details') ||
        body.include?('Attention Required')
    end
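To see which responses count as blocked without touching the network, the check can be restated as a standalone function and exercised against a stub. `FakeResponse` and `cloudflare_blocked?` are hypothetical names: the stub stands in for `Net::HTTPResponse` and exposes only `code`, `[]` and `body`, and the function mirrors the logic of the method above.

```ruby
# Hypothetical stand-in for Net::HTTPResponse. Header lookup is made
# case-insensitive by downcasing, as Net::HTTP does; keys must be lowercase.
FakeResponse = Struct.new(:code, :headers, :body) do
  def [](name)
    headers[name.downcase]
  end
end

CLOUDFLARE_MITIGATION_VALUES = %w[challenge block managed_challenge].freeze
CHALLENGE_MARKERS = ['Just a moment...', 'cf-error-details', 'Attention Required'].freeze

def cloudflare_blocked?(response)
  return false if response.nil?
  code = response.code.to_i
  # Only 403 and 503 are ever treated as Cloudflare blocks.
  return false unless code == 403 || code == 503
  # Explicit signal: the cf-mitigated header.
  return true if CLOUDFLARE_MITIGATION_VALUES.include?(response['cf-mitigated'].to_s.downcase)
  # Fallback: look for well-known challenge/block page markers in the HTML.
  body = response.body.to_s
  return false if body.empty?
  CHALLENGE_MARKERS.any? { |marker| body.include?(marker) }
end

puts cloudflare_blocked?(FakeResponse.new('403', { 'cf-mitigated' => 'challenge' }, '')) # => true
puts cloudflare_blocked?(FakeResponse.new('403', {}, 'Forbidden'))                      # => false
```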
.html(response) ⇒ Object
    # File 'lib/Request.rb', line 357
    def self.html(response)
      body = readBodyAsUTF8(response)
      body.nil? ? nil : Nokogiri::HTML(body)
    end
.mediumProxiedURL(url) ⇒ Object
If the user has configured a Cloudflare Worker proxy via MEDIUM_HOST, rewrite any medium.com/<path> URL to <worker-origin>/<path> so non-GraphQL hits (iframe metadata at /media/<id>, OG-image fallback to /<user>/<post>, etc.) also benefit from the proxy. GraphQL callers already hand us the proxy URL directly via ENV, so they short-circuit the rewrite.
    # File 'lib/Request.rb', line 296
    def self.mediumProxiedURL(url)
      return url unless url.is_a?(String) && url.start_with?('https://medium.com/')
      origin = mediumProxyOrigin
      return url if origin.nil?
      url.sub(%r{\Ahttps://medium\.com}, origin)
    end
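The rewrite rule can be tried in isolation with a variant that takes the Worker origin as an argument instead of reading MEDIUM_HOST. `medium_proxied_url` and the Worker origin below are hypothetical; the matching logic is the same as in the method above.

```ruby
# Hypothetical standalone version of Request.mediumProxiedURL: the Worker
# origin is passed in rather than derived from ENV['MEDIUM_HOST'].
def medium_proxied_url(url, origin)
  # Only literal https://medium.com/<path> URLs are candidates.
  return url unless url.is_a?(String) && url.start_with?('https://medium.com/')
  # No proxy configured: leave the URL pointing at Medium.
  return url if origin.nil?
  # Swap the scheme+host prefix, keeping the path and query intact.
  url.sub(%r{\Ahttps://medium\.com}, origin)
end

worker = 'https://medium-proxy.example.workers.dev' # hypothetical Worker origin
puts medium_proxied_url('https://medium.com/media/abc123', worker)
# => https://medium-proxy.example.workers.dev/media/abc123
puts medium_proxied_url('https://miro.medium.com/v2/x.png', worker)
# => https://miro.medium.com/v2/x.png
```

Because the match is a literal `https://medium.com/` prefix, `www.medium.com`, `miro.medium.com` and custom-domain publications pass through unchanged.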
.mediumProxyOrigin ⇒ Object
Extract the `<scheme>://<host>` of MEDIUM_HOST (plus `:<port>` when the port is not the scheme’s default), or nil if no proxy is configured, the value cannot be parsed, or it still points at medium.com itself.
    # File 'lib/Request.rb', line 305
    def self.mediumProxyOrigin
      host = ENV['MEDIUM_HOST'].to_s
      return nil if host.empty?
      uri = URI.parse(host)
      return nil if uri.host.nil? || uri.host == 'medium.com'
      port = (uri.port && uri.port != uri.default_port) ? ":#{uri.port}" : ''
      "#{uri.scheme}://#{uri.host}#{port}"
    rescue URI::InvalidURIError
      nil
    end
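The origin extraction can be checked against a few sample MEDIUM_HOST values with a variant that takes the value as an argument instead of reading ENV. `medium_proxy_origin` and the sample hosts are hypothetical.

```ruby
require 'uri'

# Hypothetical standalone version of Request.mediumProxyOrigin that takes
# the MEDIUM_HOST value as an argument instead of reading ENV.
def medium_proxy_origin(host)
  host = host.to_s
  return nil if host.empty?
  uri = URI.parse(host)
  # No host (e.g. a value without a scheme) or plain medium.com: no proxy.
  return nil if uri.host.nil? || uri.host == 'medium.com'
  # Keep the port only when it differs from the scheme's default.
  port = (uri.port && uri.port != uri.default_port) ? ":#{uri.port}" : ''
  "#{uri.scheme}://#{uri.host}#{port}"
rescue URI::InvalidURIError
  nil
end

p medium_proxy_origin('https://proxy.example.dev/graphql') # => "https://proxy.example.dev"
p medium_proxy_origin('http://localhost:8787/')            # => "http://localhost:8787"
p medium_proxy_origin('proxy.example.dev')                 # => nil
```

Note the last case: a value without a scheme parses as a bare path with a nil host, so it is silently treated as "no proxy".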
.miroHost ⇒ Object
Resolve the host the gem should use for miro.medium.com image fetches. Single-Worker setups: the same MEDIUM_HOST proxy handles both medium.com and miro.medium.com via path dispatch, so we always derive miro from MEDIUM_HOST’s origin. No proxy → upstream miro.medium.com.
    # File 'lib/Request.rb', line 320
    def self.miroHost
      mediumProxyOrigin || 'https://miro.medium.com'
    end
.proxyURI?(uri) ⇒ Boolean
True iff `uri` is hosted by the configured Worker proxy, i.e. its host matches the host of MEDIUM_HOST and MEDIUM_HOST is set to something other than upstream medium.com. Only hosts are compared; scheme and port are ignored. Used to gate the MEDIUM_HOST_SECRET auth header so the secret only leaves the process when heading to the user’s own proxy.
    # File 'lib/Request.rb', line 329
    def self.proxyURI?(uri)
      return false if uri.nil? || uri.host.nil?
      envValue = ENV['MEDIUM_HOST'].to_s
      return false if envValue.empty?
      parsed = URI.parse(envValue) rescue nil
      return false if parsed.nil? || parsed.host.nil?
      parsed.host != 'medium.com' && parsed.host == uri.host
    end
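The gating rule can be exercised without touching ENV by passing MEDIUM_HOST explicitly. `proxy_uri?` mirrors the method above; `proxy_headers` is a hypothetical helper showing how the secret header is attached only for proxy-bound requests, as in `self.URL`. The host and secret values are made up.

```ruby
require 'uri'

# Hypothetical standalone version of Request.proxyURI?: MEDIUM_HOST is
# passed explicitly instead of being read from ENV.
def proxy_uri?(uri, medium_host)
  return false if uri.nil? || uri.host.nil?
  env_value = medium_host.to_s
  return false if env_value.empty?
  parsed = URI.parse(env_value) rescue nil
  return false if parsed.nil? || parsed.host.nil?
  # Never treat upstream medium.com as "the proxy"; compare hosts only.
  parsed.host != 'medium.com' && parsed.host == uri.host
end

# Hypothetical helper: build the extra headers for a request, adding the
# secret only when the request is heading to the user's own proxy.
def proxy_headers(uri, medium_host, secret)
  headers = {}
  if proxy_uri?(uri, medium_host) && !secret.to_s.empty?
    headers['X-Medium-Proxy-Secret'] = secret
  end
  headers
end

medium_host = 'https://proxy.example.dev' # hypothetical MEDIUM_HOST value
p proxy_headers(URI('https://proxy.example.dev/_/graphql'), medium_host, 's3cret') # secret attached
p proxy_headers(URI('https://medium.com/_/graphql'), medium_host, 's3cret')        # empty hash
```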
.readBodyAsUTF8(response) ⇒ Object
Net::HTTP#read_body returns ASCII-8BIT (binary). Without an explicit UTF-8 tag, downstream parsers misinterpret multi-byte sequences: Nokogiri’s encoding sniffer falls back to ISO-8859-1 for inline <script> contents, which then mojibakes the embedded JSON (e.g. CJK comes back as garbage like “ä½¿” instead of “使”). Returns nil for a missing or non-200 response.
    # File 'lib/Request.rb', line 371
    def self.readBodyAsUTF8(response)
      return nil if response.nil? || response.code.to_i != 200
      body = response.read_body
      return body if body.nil? || body.empty?
      body.force_encoding(Encoding::UTF_8)
      body
    end
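The mojibake described above can be reproduced with the core String API alone. This snippet is illustrative and does not call the gem; it assumes a UTF-8 source file (Ruby's default) and uses `String#b` to simulate the binary-tagged body that `read_body` hands back.

```ruby
# Simulate what Net::HTTP returns: the UTF-8 bytes of "使" tagged as binary.
raw = "使".b # .b returns an ASCII-8BIT (binary) copy
puts raw.encoding == Encoding::BINARY                   # => true
puts raw.bytes.map { |b| format('%02X', b) }.join(' ') # => E4 BD BF

# Misreading those bytes as ISO-8859-1 turns each byte into its own character.
puts raw.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8) # => ä½¿

# Re-tagging as UTF-8 (what readBodyAsUTF8 does) recovers the character
# without copying or transcoding the bytes.
fixed = raw.dup.force_encoding(Encoding::UTF_8)
puts fixed                  # => 使
puts fixed.valid_encoding?  # => true
```

`force_encoding` only relabels the bytes and never validates them; if a response could contain invalid UTF-8, checking `valid_encoding?` (and falling back to `String#scrub`) is one option.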
.URL(url, method = 'GET', data = nil, retryCount = 0) ⇒ Object
    # File 'lib/Request.rb', line 175
    def self.URL(url, method = 'GET', data = nil, retryCount = 0)
      retryCount += 1
      url = mediumProxiedURL(url)
      uri = URI(url)
      https = Net::HTTP.new(uri.host, uri.port)
      https.use_ssl = true

      # --- TLS / Certificate verification setup ---
      # Some OpenSSL builds/configs enable CRL checking, which can fail with:
      # "certificate verify failed (unable to get certificate CRL)".
      # Net::HTTP/OpenSSL does not automatically fetch CRLs, so we use a default
      # cert store and clear CRL-related flags to avoid hard failures while still
      # verifying the peer certificate.
      https.verify_mode = OpenSSL::SSL::VERIFY_PEER
      store = OpenSSL::X509::Store.new
      store.set_default_paths
      # Ensure no CRL-check flags are enabled by default
      store.flags = 0
      https.cert_store = store

      # Allow overriding CA bundle paths via environment variables if needed.
      if ENV['SSL_CERT_FILE'] && !ENV['SSL_CERT_FILE'].empty?
        https.ca_file = ENV['SSL_CERT_FILE']
      end
      if ENV['SSL_CERT_DIR'] && !ENV['SSL_CERT_DIR'].empty?
        https.ca_path = ENV['SSL_CERT_DIR']
      end

      # (Optional) timeouts to avoid hanging on network issues
      https.open_timeout = 10
      https.read_timeout = 30
      # --- end TLS setup ---

      if method.upcase == "GET"
        request = Net::HTTP::Get.new(uri)
      else
        request = Net::HTTP::Post.new(uri)
        request['Content-Type'] = 'application/json'
        if !data.nil?
          request.body = JSON.dump(data)
        end
      end

      request['User-Agent'] = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36 Edg/142.0.0.0';

       = $cookies.reject { |_, value| value.nil? }
        .map { |key, value| "#{key}=#{value}" }
        .join("; ");
      if !.nil? && != ""
        request['Cookie'] = ;
      end

      # When the request is going to a configured Worker proxy (and only
      # then), attach the user's MEDIUM_HOST_SECRET as a header so the
      # Worker can authenticate the caller. Skipped for upstream
      # medium.com / miro.medium.com so the secret never leaks to Medium.
      if proxyURI?(uri) && (proxySecret = ENV['MEDIUM_HOST_SECRET'].to_s) && !proxySecret.empty?
        request['X-Medium-Proxy-Secret'] = proxySecret
      end

      response = https.request(request);

      setCookieString = response.get_fields('set-cookie');
      if !setCookieString.nil? && setCookieString != ""
        setCookies = setCookieString.map { || .split('; ').first }.each_with_object({}) do |, hash|
          key, value = .split('=', 2) # Split by '=' into key and value
          hash[key] = value
        end;
        setCookies.each do |key, value|
          $cookies[key] = value
        end
      end

      if cloudflareBlocked?(response)
        # On every Cloudflare block, even when cookies are already
        # set, re-run the recovery flow on a TTY. ChromeAuth refreshes
        # sid/uid/cf_clearance/_cfuvid into $cookies + the cache, so
        # the next attempt usually succeeds. Bounded by retryCount so
        # a degenerate loop (user keeps clearing, Medium keeps blocking)
        # eventually surfaces the error. CI / non-TTY just raises.
        if retryCount <= CLOUDFLARE_RECOVERY_LIMIT && InteractiveCloudflareRecovery.available?
          if InteractiveCloudflareRecovery.run(url)
            return self.URL(url, method, data, retryCount)
          end
        end
        raise CloudflareBlockedError.new(response.code.to_i, url)
      end

      # 429 Too Many Requests: retry the same URL
      if response.code.to_i == 429
        if retryCount >= 10
          raise "Error: Too Many Requests, blocked by Medium. URL: #{url}"
        else
          response = self.URL(url, method, data, retryCount);
        end
      # 3XX Redirect: follow the Location header
      elsif response.code.to_i >= 300 && response.code.to_i <= 399 && !response['location'].nil? && response['location'] != ''
        if retryCount >= 10
          raise "Error: Retry limit reached. URL: #{url}"
        else
          location = response['location']
          if !location.match? /^(http)/
            location = "#{uri.scheme}://#{uri.host}#{location}"
          end
          response = self.URL(location, method, data, retryCount)
        end
      end

      response
    end
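The cookie and redirect handling in `self.URL` can be summarized as three small helpers. The names `cookie_header`, `merge_set_cookie!` and `resolve_location`, and the `jar` hash standing in for the gem's global `$cookies`, are hypothetical; the rules (skip nil values, keep only the leading `name=value` pair of each Set-Cookie field, split on the first `=`, prefix Locations that don't start with "http" with the current scheme and host) are taken from the method above.

```ruby
require 'uri'

# Build the Cookie request header from the non-nil entries of the jar.
def cookie_header(jar)
  jar.reject { |_, value| value.nil? }
     .map { |key, value| "#{key}=#{value}" }
     .join('; ')
end

# Merge Set-Cookie values (as returned by response.get_fields('set-cookie'))
# into the jar, keeping only the leading name=value pair of each field and
# splitting on the first '=' so values may themselves contain '='.
def merge_set_cookie!(jar, set_cookie_fields)
  return jar if set_cookie_fields.nil?
  set_cookie_fields.each do |field|
    key, value = field.split('; ').first.split('=', 2)
    jar[key] = value
  end
  jar
end

# Resolve a Location header the way Request.URL does: anything not starting
# with "http" is treated as a path on the current scheme and host.
def resolve_location(uri, location)
  return location if location.match?(/\Ahttp/)
  "#{uri.scheme}://#{uri.host}#{location}"
end

jar = { 'uid' => 'abc', 'sid' => nil }
merge_set_cookie!(jar, ['sid=1:xyz; Path=/; HttpOnly'])
puts cookie_header(jar) # => uid=abc; sid=1:xyz
puts resolve_location(URI('https://medium.com/@a/p'), '/m/signin') # => https://medium.com/m/signin
```

Like the original, `resolve_location` drops any non-default port and does not handle protocol-relative (`//host/path`) or path-relative Locations; if those cases matter, `URI.join(uri, location)` is one option.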