Module: Oddb2xml::ProxyCheck
- Defined in:
- lib/oddb2xml/proxy_check.rb
Overview
Preflight connectivity check. Run once at the very start of a CLI run, it probes every outbound host oddb2xml needs (honouring the http(s)_proxy environment variables) and prints a loud warning if any host is blocked by the proxy (HTTP 407 on an allow-list proxy such as Aspectra’s Skyhigh gateway) or is otherwise unreachable. It never aborts the run – downloads still proceed and fail individually as before; the check only surfaces the cause up front instead of leaving the user to decode a later Errno or empty-output symptom. See issue #121.
Constant Summary
- BASE_HOSTS =
host => human-readable description of what breaks when it is unreachable. Hosts only needed for certain options are added conditionally (see #hosts_for).
    {
      "files.refdata.ch" => "Refdata articles",
      "www.swissmedic.ch" => "Swissmedic registrations",
      "raw.githubusercontent.com" => "ATC codes (cpp2sqlite)"
    }.freeze
- TIMEOUT =
seconds, per host (open + read); checks run concurrently
6
- PROBE_PATHS =
Representative resource path per host – the actual file the downloader fetches, NOT “/”. Probing “/” gives misleading host redirects (e.g. raw.githubusercontent.com/ -> github.com, while the real raw file path returns 200), whereas the genuine download paths reveal the real forwarder chain the proxy must allow (id.gs1.ch -> id.gs1.org -> apitools.gs1.ch; www.spezialitaetenliste.ch/File.axd -> sl.bag.admin.ch).
    {
      "files.refdata.ch" => "/simis-public-prod/Articles/1.0/Refdata.Articles.zip",
      "raw.githubusercontent.com" => "/zdavatz/oddb2xml_files/master/LPPV.txt",
      "id.gs1.ch" => "/01/07612345000961",
      "id.gs1.org" => "/01/07612345000961",
      "www.spezialitaetenliste.ch" => "/File.axd?file=XMLPublications.zip",
      "www.medregbm.admin.ch" => "/Publikation/"
    }.freeze
- FORWARDERS =
Redirect targets (“forwarders”) that an allow-list proxy must permit in addition to the host we actually request. id.gs1.ch 301-redirects every path to the global resolver id.gs1.org, so allowing only id.gs1.ch is not enough – the firstbase download follows the redirect and dies on the blocked target. The real firstbase chain is id.gs1.ch -> id.gs1.org -> apitools.gs1.ch, so the redirect is followed dynamically too (see check_host); this list just guarantees the known target is probed even when the redirect probe is short-circuited.
{ "id.gs1.org" => "GS1 global resolver (id.gs1.ch redirect target, --firstbase / -b)" }.freeze
Class Method Summary
- .all_hosts ⇒ Object
Full union of every host any run could need, regardless of options.
- .check_host(host, proxy, path = "/", hops = 4, origin = nil) ⇒ Object
Probe a host (following HTTP redirects to other hosts) and return a Hash: { result: :ok | :blocked | :unreachable, via: "final.host" | nil }. `:via` is set only when the host redirected to a different host, so the caller can surface that the redirect target (e.g. id.gs1.ch -> id.gs1.org) must be reachable too – a 301 to a blocked host used to be reported as OK.
- .hosts_for(options = {}) ⇒ Object
- .probe_path(host) ⇒ Object
- .proxy_uri ⇒ Object
- .report(_options = {}) ⇒ Object
Probe every host and print a full OK/BLOCKED/UNREACHABLE table.
- .run(options = {}) ⇒ Object
Probe all relevant hosts concurrently and warn about any that fail.
- .via_for(origin, host) ⇒ Object
The final host reached, but only when it differs from where we started.
- .warn_about(problems, proxy) ⇒ Object
Class Method Details
.all_hosts ⇒ Object
Full union of every host any run could need, regardless of options. Used by --proxy-check so the report covers everything in one go.
    # File 'lib/oddb2xml/proxy_check.rb', line 82
    def all_hosts
      BASE_HOSTS.merge(
        "epl.bag.admin.ch" => "BAG FHIR data (--fhir)",
        "id.gs1.ch" => "GS1 NONPHARMA (--firstbase / -b)",
        "www.spezialitaetenliste.ch" => "BAG Spezialitätenliste",
        "www.medregbm.admin.ch" => "Medizinalberuferegister (-x address)"
      ).merge(FORWARDERS)
    end
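Because the constants are frozen, the union is built with `Hash#merge`, which returns a new Hash and leaves the receiver untouched. A minimal sketch of that behaviour follows; `BASE` and `EXTRA` are hypothetical cut-down stand-ins for BASE_HOSTS and FORWARDERS, not the module's own constants.

```ruby
# Hypothetical cut-down stand-ins for BASE_HOSTS and FORWARDERS.
BASE = {"files.refdata.ch" => "Refdata articles"}.freeze
EXTRA = {"id.gs1.org" => "GS1 global resolver"}.freeze

# merge builds a new Hash each time; neither frozen constant is modified.
union = BASE.merge("id.gs1.ch" => "GS1 NONPHARMA").merge(EXTRA)

puts union.keys.inspect
puts "BASE still has #{BASE.size} entry and is frozen: #{BASE.frozen?}"
puts "union is frozen: #{union.frozen?}"
```

Insertion order is preserved, so the base hosts come first, followed by the option-specific hosts and then the forwarders.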
.check_host(host, proxy, path = "/", hops = 4, origin = nil) ⇒ Object
Probe a host (following HTTP redirects to other hosts) and return a Hash:
{ result: :ok | :blocked | :unreachable, via: "final.host" | nil }
`:via` is set only when the host redirected to a different host, so the caller can surface that the redirect target (e.g. id.gs1.ch -> id.gs1.org) must be reachable too – a 301 to a blocked host used to be reported as OK.
    # File 'lib/oddb2xml/proxy_check.rb', line 129
    def check_host(host, proxy, path = "/", hops = 4, origin = nil)
      http = if proxy
        Net::HTTP.new(host, 443, proxy.host, proxy.port, proxy.user, proxy.password)
      else
        Net::HTTP.new(host, 443)
      end
      http.use_ssl = true
      http.verify_mode = OpenSSL::SSL::VERIFY_NONE
      http.open_timeout = TIMEOUT
      http.read_timeout = TIMEOUT
      http.start do |h|
        res = h.head(path)
        return {result: :blocked, via: via_for(origin, host)} if res.code.to_s == "407"

        if res.code.to_s.start_with?("3") && res["location"] && hops > 0
          loc = URI.parse(res["location"])
          if loc.host && loc.host != host
            next_path = (loc.respond_to?(:request_uri) && loc.request_uri) ? loc.request_uri : "/"
            return check_host(loc.host, proxy, next_path, hops - 1, origin || host)
          end
        end
        # any other HTTP answer (200/403/404/...) means this host is reachable
        return {result: :ok, via: via_for(origin, host)}
      end
    rescue => error
      msg = error.message.to_s.downcase
      blocked = msg.include?("407") || msg.include?("authenticationrequired") || msg.include?("proxy")
      {result: blocked ? :blocked : :unreachable, via: via_for(origin, host)}
    end
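The redirect handling in check_host can be explored without a network. The sketch below is a simplified offline model, not the module's code: `FAKE_RESPONSES` and `classify` are hypothetical names, and the canned status codes are illustrative assumptions rather than responses recorded from the real hosts.

```ruby
require "uri"

# Hypothetical canned answers standing in for real HEAD requests.
FAKE_RESPONSES = {
  "files.refdata.ch" => {code: "200"},
  "id.gs1.ch" => {code: "301", location: "https://id.gs1.org/01/07612345000961"},
  "id.gs1.org" => {code: "407"}
}.freeze

# Same rule as the module's via_for: report the final host only if it differs.
def via_for(origin, host)
  (origin && origin != host) ? host : nil
end

# Hypothetical offline counterpart of check_host.
def classify(host, hops = 4, origin = nil)
  res = FAKE_RESPONSES[host]
  # A missing canned answer stands in for a connection error.
  return {result: :unreachable, via: via_for(origin, host)} if res.nil?
  return {result: :blocked, via: via_for(origin, host)} if res[:code] == "407"

  if res[:code].start_with?("3") && res[:location] && hops > 0
    loc = URI.parse(res[:location])
    if loc.host && loc.host != host
      # Keep the first host of the chain as origin while following redirects.
      return classify(loc.host, hops - 1, origin || host)
    end
  end
  {result: :ok, via: via_for(origin, host)}
end

direct = classify("files.refdata.ch") # result :ok, via nil
chained = classify("id.gs1.ch")       # result :blocked, via "id.gs1.org"
puts direct.inspect
puts chained.inspect
```

In this model the 301 from id.gs1.ch is not treated as success: the verdict comes from the redirect target, and `:via` names that target so the caller can report which host must be allowed as well.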
.hosts_for(options = {}) ⇒ Object
    # File 'lib/oddb2xml/proxy_check.rb', line 68
    def hosts_for(options = {})
      hosts = BASE_HOSTS.dup
      hosts["epl.bag.admin.ch"] = "BAG FHIR data (--fhir)" if options[:fhir]
      if options[:firstbase]
        hosts["id.gs1.ch"] = "GS1 NONPHARMA (--firstbase / -b)"
        hosts["id.gs1.org"] = FORWARDERS["id.gs1.org"]
      end
      hosts["www.spezialitaetenliste.ch"] = "BAG Spezialitätenliste" unless options[:fhir]
      hosts["www.medregbm.admin.ch"] = "Medizinalberuferegister (-x address)" if options[:address]
      hosts
    end
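To see how the options change the set of probed hosts, the following sketch copies the constants and hosts_for from this page into a standalone script so it runs without loading oddb2xml. The option keys are the ones the method reads; the three sample calls are illustrative.

```ruby
# Standalone copies of the constants and method documented on this page.
BASE_HOSTS = {
  "files.refdata.ch" => "Refdata articles",
  "www.swissmedic.ch" => "Swissmedic registrations",
  "raw.githubusercontent.com" => "ATC codes (cpp2sqlite)"
}.freeze
FORWARDERS = {
  "id.gs1.org" => "GS1 global resolver (id.gs1.ch redirect target, --firstbase / -b)"
}.freeze

def hosts_for(options = {})
  hosts = BASE_HOSTS.dup # dup of a frozen Hash is an unfrozen copy
  hosts["epl.bag.admin.ch"] = "BAG FHIR data (--fhir)" if options[:fhir]
  if options[:firstbase]
    hosts["id.gs1.ch"] = "GS1 NONPHARMA (--firstbase / -b)"
    hosts["id.gs1.org"] = FORWARDERS["id.gs1.org"]
  end
  hosts["www.spezialitaetenliste.ch"] = "BAG Spezialitätenliste" unless options[:fhir]
  hosts["www.medregbm.admin.ch"] = "Medizinalberuferegister (-x address)" if options[:address]
  hosts
end

default_hosts = hosts_for.keys
fhir_hosts = hosts_for(fhir: true).keys
firstbase_hosts = hosts_for(firstbase: true).keys

puts "default:   #{default_hosts.join(", ")}"
puts "fhir:      #{fhir_hosts.join(", ")}"
puts "firstbase: #{firstbase_hosts.join(", ")}"
```

With `fhir: true` the Spezialitätenliste host is dropped and epl.bag.admin.ch is added in its place, while `firstbase: true` adds both id.gs1.ch and its redirect target id.gs1.org.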
.probe_path(host) ⇒ Object
    # File 'lib/oddb2xml/proxy_check.rb', line 43
    def probe_path(host)
      PROBE_PATHS[host] || "/"
    end
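A short illustration of the fallback, using a cut-down copy of PROBE_PATHS (two of the entries listed on this page) so the snippet runs on its own:

```ruby
# Cut-down copy of PROBE_PATHS; the full table is in the Constant Summary.
PROBE_PATHS = {
  "files.refdata.ch" => "/simis-public-prod/Articles/1.0/Refdata.Articles.zip",
  "id.gs1.ch" => "/01/07612345000961"
}.freeze

def probe_path(host)
  PROBE_PATHS[host] || "/"
end

puts probe_path("id.gs1.ch")         # listed host: its real download path
puts probe_path("www.swissmedic.ch") # no entry: falls back to "/"
```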
.proxy_uri ⇒ Object
    # File 'lib/oddb2xml/proxy_check.rb', line 47
    def proxy_uri
      env = ENV["https_proxy"] || ENV["HTTPS_PROXY"] || ENV["http_proxy"] || ENV["HTTP_PROXY"]
      return nil if env.nil? || env.empty?
      env = "http://#{env}" unless env.start_with?("http")
      URI.parse(env)
    rescue URI::InvalidURIError
      nil
    end
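The normalisation in proxy_uri can be tried in isolation. In the sketch below, `parse_proxy` is a hypothetical helper that applies the same steps to an argument instead of reading ENV; the proxy host names and credentials are made-up examples.

```ruby
require "uri"

# Hypothetical helper: same steps as proxy_uri, but the raw value is passed in.
def parse_proxy(value)
  return nil if value.nil? || value.empty?
  # A bare "host:port" value gets an http:// scheme so URI.parse sees a host.
  value = "http://#{value}" unless value.start_with?("http")
  URI.parse(value)
rescue URI::InvalidURIError
  nil
end

bare = parse_proxy("proxy.example.com:3128")
puts "#{bare.host}:#{bare.port}"

with_auth = parse_proxy("http://alice:secret@proxy.example.com:8080")
puts "user=#{with_auth.user} password=#{with_auth.password}"

puts parse_proxy("").inspect                # empty value: nil
puts parse_proxy("http://bad host").inspect # unparsable value: nil
```

Credentials embedded in the proxy URL become available as `user` and `password`, which is what check_host hands to `Net::HTTP.new`.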
.report(_options = {}) ⇒ Object
Probe every host and print a full OK/BLOCKED/UNREACHABLE table. Returns true when all hosts are reachable. Used by `oddb2xml --proxy-check`.
    # File 'lib/oddb2xml/proxy_check.rb', line 93
    def report(_options = {})
      proxy = proxy_uri
      results = all_hosts.map do |host, desc|
        Thread.new { [host, desc, check_host(host, proxy, probe_path(host))] }
      end.map(&:value).sort_by { |(host, _desc, _status)| host }

      header = "oddb2xml connectivity check"
      header += proxy ? " (via proxy #{proxy.host}:#{proxy.port})" : " (no proxy configured)"
      puts header
      results.each do |(host, desc, status)|
        tag = case status[:result]
        when :ok then "OK "
        when :blocked then "BLOCKED" # proxy returned 407
        else "UNREACH"
        end
        label = status[:via] ? "#{host} -> #{status[:via]}" : host
        puts format(" [%s] %-36s %s", tag, label, desc)
      end
      unreachable = results.reject { |(_host, _desc, status)| status[:result] == :ok }
      if unreachable.empty?
        puts "All #{results.size} hosts reachable."
        true
      else
        puts "#{unreachable.size} of #{results.size} host(s) NOT reachable -- downloads using them will fail."
        results.select { |(_host, _desc, status)| status[:via] }.each do |(host, _desc, status)|
          puts " note: #{host} redirects to #{status[:via]} -- that host must be on the proxy allow-list too."
        end
        false
      end
    end
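The table layout relies on `format` with left-justified, fixed-width fields. The sketch below shows the idea on two made-up rows. It pads the tag with `%-7s` instead of using pre-padded literals, which is an assumption of this example rather than what the method does.

```ruby
# Made-up sample rows: tag, label (host or "host -> final host"), description.
rows = [
  ["OK", "files.refdata.ch", "Refdata articles"],
  ["BLOCKED", "id.gs1.ch -> id.gs1.org", "GS1 NONPHARMA (--firstbase / -b)"]
]

# %-7s and %-36s left-justify and pad with spaces, so columns line up.
lines = rows.map { |tag, label, desc| format("[%-7s] %-36s %s", tag, label, desc) }
puts lines
```

Both descriptions start at the same column because every field before them has a fixed width.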
.run(options = {}) ⇒ Object
Probe all relevant hosts concurrently and warn about any that fail.
    # File 'lib/oddb2xml/proxy_check.rb', line 165
    def run(options = {})
      return if defined?(RSpec) || defined?(VCR) # never touch the network in tests
      return if ENV["ODDB2XML_SKIP_PROXY_CHECK"]

      proxy = proxy_uri
      hosts = hosts_for(options)
      results = hosts.map do |host, desc|
        Thread.new { [host, desc, check_host(host, proxy, probe_path(host))] }
      end.map(&:value)

      problems = results.reject { |(_host, _desc, status)| status[:result] == :ok }
      return if problems.empty?

      warn_about(problems, proxy)
    end
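The concurrent probing is a plain thread-per-host fan-out, collected with `Thread#value`. A minimal sketch of that pattern follows; `fake_probe` and the host names are hypothetical, and the probe sleeps instead of touching the network.

```ruby
# Hypothetical stand-in for check_host: sleeps instead of making a request.
def fake_probe(host)
  sleep 0.2
  {result: host.end_with?(".invalid") ? :unreachable : :ok, via: nil}
end

hosts = {"a.example" => "first", "b.example" => "second", "c.invalid" => "third"}

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = hosts.map do |host, desc|
  Thread.new { [host, desc, fake_probe(host)] }
end.map(&:value) # Thread#value waits for the thread and returns its block's result
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

problems = results.reject { |(_host, _desc, status)| status[:result] == :ok }
puts "probed #{results.size} hosts in #{elapsed.round(2)}s, #{problems.size} problem(s)"
```

Because the waits overlap, the total time is governed by the slowest probe rather than the sum of all probes, and the results keep the order of the input Hash since the threads are collected in creation order.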
.via_for(origin, host) ⇒ Object
The final host reached, but only when it differs from where we started.
    # File 'lib/oddb2xml/proxy_check.rb', line 160
    def via_for(origin, host)
      (origin && origin != host) ? host : nil
    end
.warn_about(problems, proxy) ⇒ Object
    # File 'lib/oddb2xml/proxy_check.rb', line 181
    def warn_about(problems, proxy)
      line = "=" * 72
      warn line
      warn " oddb2xml CONNECTIVITY WARNING"
      warn " The following hosts could not be reached -- the corresponding"
      warn " downloads will FAIL or produce incomplete data:"
      problems.each do |(host, desc, status)|
        tag = (status[:result] == :blocked) ? "BLOCKED by proxy (407)" : "UNREACHABLE "
        label = status[:via] ? "#{host} -> #{status[:via]}" : host
        warn format(" [%s] %-34s %s", tag, label, desc)
      end
      if proxy
        warn ""
        warn " Proxy in use: #{proxy.host}:#{proxy.port}"
        # each status is a Hash, so compare its :result, not the Hash itself
        if problems.any? { |(_h, _d, s)| s[:result] == :blocked }
          warn " This looks like an allow-list proxy. Ask your admin to allow the"
          warn " hosts above (HTTPS/443), or set credentials in http(s)_proxy."
        end
      end
      warn " (Set ODDB2XML_SKIP_PROXY_CHECK=1 to silence this check.)"
      warn line
    end