Class: Swot
- Inherits:
-
Object
- Object
- Swot
- Extended by:
- SwotCollectionMethods
- Includes:
- NaughtyOrNice
- Defined in:
- lib/swot.rb,
lib/swot/academic_tlds.rb
Constant Summary
- VERSION =
"1.0.5"
- BLACKLIST =
These are domains that snuck into the edu registry but don’t pass the education sniff test. Note: a validated domain must not end with a blacklisted string.
File.readlines(File.join(__dir__, '../academic_data/stoplist.txt')).map(&:chomp).freeze
- ACADEMIC_TLDS =
Domains under these top-level domains are guaranteed to belong to academic institutions.
%w( ac.ae ac.at ac.bd ac.be ac.cn ac.cr ac.cy ac.fj ac.gg ac.gn ac.id ac.il ac.in ac.ir ac.jp ac.ke ac.kr ac.ma ac.me ac.mu ac.mw ac.mz ac.ni ac.nz ac.om ac.pa ac.pg ac.pr ac.rs ac.ru ac.rw ac.sz ac.th ac.tz ac.ug ac.uk ac.yu ac.za ac.zm ac.zw cc.al.us cc.ar.us cc.az.us cc.ca.us cc.co.us cc.fl.us cc.ga.us cc.hi.us cc.ia.us cc.id.us cc.il.us cc.in.us cc.ks.us cc.ky.us cc.la.us cc.md.us cc.me.us cc.mi.us cc.mn.us cc.mo.us cc.ms.us cc.mt.us cc.nc.us cc.nd.us cc.ne.us cc.nj.us cc.nm.us cc.nv.us cc.ny.us cc.oh.us cc.ok.us cc.or.us cc.pa.us cc.ri.us cc.sc.us cc.sd.us cc.tx.us cc.va.us cc.vi.us cc.wa.us cc.wi.us cc.wv.us cc.wy.us ed.ao ed.cr ed.jp edu edu.af edu.al edu.ar edu.au edu.az edu.ba edu.bb edu.bd edu.bh edu.bi edu.bn edu.bo edu.br edu.bs edu.bt edu.bz edu.ck edu.cn edu.co edu.cu edu.do edu.dz edu.ec edu.ee edu.eg edu.er edu.es edu.et edu.ge edu.gh edu.gr edu.gt edu.hk edu.hn edu.ht edu.in edu.iq edu.jm edu.jo edu.kg edu.kh edu.kn edu.kw edu.ky edu.kz edu.la edu.lb edu.lr edu.lv edu.ly edu.me edu.mg edu.mk edu.ml edu.mm edu.mn edu.mo edu.mt edu.mv edu.mw edu.mx edu.my edu.ni edu.np edu.om edu.pa edu.pe edu.ph edu.pk edu.pl edu.pr edu.ps edu.pt edu.pw edu.py edu.qa edu.rs edu.ru edu.sa edu.sc edu.sd edu.sg edu.sh edu.sl edu.sv edu.sy edu.tr edu.tt edu.tw edu.ua edu.uy edu.ve edu.vn edu.ws edu.ye edu.zm es.kr g12.br hs.kr ms.kr sc.kr sc.ug sch.ae sch.gg sch.id sch.ir sch.je sch.jo sch.lk sch.ly sch.my sch.om sch.ps sch.sa sch.uk school.nz school.za tec.ar.us tec.az.us tec.co.us tec.fl.us tec.ga.us tec.ia.us tec.id.us tec.il.us tec.in.us tec.ks.us tec.ky.us tec.la.us tec.ma.us tec.md.us tec.me.us tec.mi.us tec.mn.us tec.mo.us tec.ms.us tec.mt.us tec.nc.us tec.nd.us tec.nh.us tec.nm.us tec.nv.us tec.ny.us tec.oh.us tec.ok.us tec.pa.us tec.sc.us tec.sd.us tec.tx.us tec.ut.us tec.vi.us tec.wa.us tec.wi.us tec.wv.us vic.edu.au ).to_set.freeze
Class Method Summary
- .academic? ⇒ Object
- .domains_path ⇒ Object
-
.from_path(path_string_or_path) ⇒ Object
Returns a new Swot instance for the domain file at the given path.
- .get_institution_name(text) ⇒ Object (also: school_name)
- .is_academic? ⇒ Object
Instance Method Summary
-
#academic_domain? ⇒ Boolean
Figure out if a domain name is a known academic institution.
-
#institution_name ⇒ Object
(also: #school_name, #name)
Figure out the institution name based on the email address/domain.
-
#valid? ⇒ Boolean
Figure out if an email or domain belongs to an academic institution.
Methods included from SwotCollectionMethods
Class Method Details
.academic? ⇒ Object
# File 'lib/swot.rb', line 19
alias_method :academic?, :valid?
.domains_path ⇒ Object
# File 'lib/swot.rb', line 26
def domains_path
  @domains_path ||= File.expand_path "../academic_data", File.dirname(__FILE__)
end
.from_path(path_string_or_path) ⇒ Object
Returns a new Swot instance for the domain file at the given path.
Note that the path must be absolute.
Returns a Swot instance, or false if no domain is found at the given path.
# File 'lib/swot.rb', line 34
def from_path(path_string_or_path)
  path = Pathname.new(path_string_or_path)
  return false unless path.exist?
  path_dir, file = path.relative_path_from(Pathname.new(domains_path)).split
  backwards_path = path_dir.to_s.split('/').push(file.basename('.txt').to_s)
  domain = backwards_path.reverse.join('.')
  Swot.new(domain)
end
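The path arithmetic in from_path can be tried in isolation. The following is a minimal sketch, not part of the library: domain_from_path and the /data/academic_data root are hypothetical names chosen for illustration, and the existence check and the final Swot.new call are left out so the snippet runs without the gem or its data files.

```ruby
require 'pathname'

# Hypothetical helper (not part of Swot): repeats the path-to-domain
# steps from from_path, using only lexical Pathname operations.
def domain_from_path(path_string, domains_root)
  path = Pathname.new(path_string)
  # Split the path relative to the data root into directory and file name.
  path_dir, file = path.relative_path_from(Pathname.new(domains_root)).split
  # Directory segments run from TLD inward; the file name is the last label.
  backwards_path = path_dir.to_s.split('/').push(file.basename('.txt').to_s)
  # Reverse the labels to get the usual left-to-right domain order.
  backwards_path.reverse.join('.')
end

puts domain_from_path('/data/academic_data/edu/stanford.txt', '/data/academic_data')
puts domain_from_path('/data/academic_data/uk/ac/ox.txt', '/data/academic_data')
```

Pathname#relative_path_from raises ArgumentError when one path is absolute and the other is relative, which is likely why the path passed to from_path must be absolute.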
.get_institution_name(text) ⇒ Object Also known as: school_name
# File 'lib/swot.rb', line 21
def get_institution_name(text)
  Swot.new(text).institution_name
end
.is_academic? ⇒ Object
# File 'lib/swot.rb', line 18
alias_method :is_academic?, :valid?
Instance Method Details
#academic_domain? ⇒ Boolean
Figure out if a domain name is a known academic institution.
Returns true if the domain name belongs to a known academic institution;
false otherwise.
# File 'lib/swot.rb', line 77
def academic_domain?
  @academic_domain ||= File.exist?(file_path) || File.exist?(file_extended_path)
end
#institution_name ⇒ Object Also known as: school_name, name
Figure out the institution name based on the email address/domain.
Returns a string with the institution name; nil if nothing is found.
# File 'lib/swot.rb', line 65
def institution_name
  @institution_name ||= File.read(file_path, :mode => "rb", :external_encoding => "UTF-8").strip
rescue
  nil
end
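The read-or-nil behaviour described above can be seen in a small standalone sketch. It assumes, as the listing suggests, that each domain file holds the institution name as text; read_name and the temporary files are hypothetical and stand in for the instance method and the library's data directory.

```ruby
require 'tmpdir'

# Hypothetical stand-in for #institution_name: read the file as UTF-8,
# strip surrounding whitespace, and fall back to nil on any read error.
def read_name(file_path)
  File.read(file_path, :mode => "rb", :external_encoding => "UTF-8").strip
rescue
  nil
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "stanford.txt")
  File.write(path, "Stanford University\n")
  p read_name(path)                          # name with the newline stripped
  p read_name(File.join(dir, "missing.txt")) # nil, since the file is absent
end
```

Because the method-level rescue catches any StandardError, a missing file and an unreadable file both yield nil.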
#valid? ⇒ Boolean
Figure out if an email or domain belongs to an academic institution.
Returns true if the domain name belongs to an academic institution;
false otherwise.
# File 'lib/swot.rb', line 48
def valid?
  if domain.nil?
    false
  elsif BLACKLIST.any? { |d| to_s =~ /(\A|\.)#{Regexp.escape(d)}\z/ }
    false
  elsif ACADEMIC_TLDS.include?(domain.tld)
    true
  elsif academic_domain?
    true
  else
    false
  end
end
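The blacklist test in valid? anchors each entry at a label boundary, so an entry matches the domain itself and its subdomains but not a longer name that merely ends with the same characters. A minimal sketch of that check and of the order of the first three branches follows. SAMPLE_STOPLIST, SAMPLE_TLDS, blacklisted?, and sketch_valid? are hypothetical; the sample entries are invented and are not the contents of stoplist.txt, and the final academic_domain? file lookup is omitted.

```ruby
require 'set'

# Invented sample data for illustration only (not the gem's real lists).
SAMPLE_STOPLIST = %w[notaschool.edu].freeze
SAMPLE_TLDS = %w[edu ac.uk].to_set.freeze

# Same regex shape as in valid?: the entry must start at the beginning
# of the host or right after a dot, and must run to the end of the host.
def blacklisted?(host, stoplist = SAMPLE_STOPLIST)
  stoplist.any? { |d| host =~ /(\A|\.)#{Regexp.escape(d)}\z/ }
end

# Simplified decision chain: nil check, then blacklist, then TLD set.
# The caller supplies the TLD; the real method derives it from the domain.
def sketch_valid?(host, tld)
  return false if host.nil?
  return false if blacklisted?(host)
  SAMPLE_TLDS.include?(tld)
end

p blacklisted?("notaschool.edu")       # exact match
p blacklisted?("www.notaschool.edu")   # subdomain of an entry
p blacklisted?("definitelynotaschool.edu") # no label boundary before the entry
p sketch_valid?("notaschool.edu", "edu")   # blacklist wins over the TLD check
```

Checking the blacklist before the TLD set is what lets a stoplisted name be rejected even though its TLD is in ACADEMIC_TLDS.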