Class: Swot

Inherits:
Object
Extended by:
SwotCollectionMethods
Includes:
NaughtyOrNice
Defined in:
lib/swot.rb,
lib/swot/academic_tlds.rb

Constant Summary

VERSION =
"1.0.5"
BLACKLIST =

Domains that snuck into the edu registry but don’t pass the education sniff test. Note: a validated domain must not end with a blacklisted string.

File.readlines(File.join(__dir__, '../academic_data/stoplist.txt')).map(&:chomp).freeze
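The suffix rule in the note above can be illustrated with a minimal sketch. It reuses the regular expression shown in `#valid?`; the helper name `stoplisted?` and the sample entries are hypothetical and are not part of the library or of stoplist.txt.

```ruby
# Hypothetical stoplist entries for illustration only; the real entries are
# read from academic_data/stoplist.txt.
SAMPLE_STOPLIST = %w[si.edu america.edu].freeze

# True when host equals a stoplist entry or ends with "." followed by one.
# The (\A|\.) group forces the match to start on a label boundary.
def stoplisted?(host, stoplist = SAMPLE_STOPLIST)
  stoplist.any? { |d| host.match?(/(\A|\.)#{Regexp.escape(d)}\z/) }
end

puts stoplisted?("si.edu")      # true: exact match
puts stoplisted?("mail.si.edu") # true: subdomain of a stoplisted entry
puts stoplisted?("notsi.edu")   # false: "si.edu" does not start on a label boundary
```

Because each entry is passed through `Regexp.escape`, the dots in it are matched literally rather than as wildcards.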
ACADEMIC_TLDS =

Domains under these top-level domains are guaranteed to belong to academic institutions.

%w(
  ac.ae
  ac.at
  ac.bd
  ac.be
  ac.cn
  ac.cr
  ac.cy
  ac.fj
  ac.gg
  ac.gn
  ac.id
  ac.il
  ac.in
  ac.ir
  ac.jp
  ac.ke
  ac.kr
  ac.ma
  ac.me
  ac.mu
  ac.mw
  ac.mz
  ac.ni
  ac.nz
  ac.om
  ac.pa
  ac.pg
  ac.pr
  ac.rs
  ac.ru
  ac.rw
  ac.sz
  ac.th
  ac.tz
  ac.ug
  ac.uk
  ac.yu
  ac.za
  ac.zm
  ac.zw
  cc.al.us
  cc.ar.us
  cc.az.us
  cc.ca.us
  cc.co.us
  cc.fl.us
  cc.ga.us
  cc.hi.us
  cc.ia.us
  cc.id.us
  cc.il.us
  cc.in.us
  cc.ks.us
  cc.ky.us
  cc.la.us
  cc.md.us
  cc.me.us
  cc.mi.us
  cc.mn.us
  cc.mo.us
  cc.ms.us
  cc.mt.us
  cc.nc.us
  cc.nd.us
  cc.ne.us
  cc.nj.us
  cc.nm.us
  cc.nv.us
  cc.ny.us
  cc.oh.us
  cc.ok.us
  cc.or.us
  cc.pa.us
  cc.ri.us
  cc.sc.us
  cc.sd.us
  cc.tx.us
  cc.va.us
  cc.vi.us
  cc.wa.us
  cc.wi.us
  cc.wv.us
  cc.wy.us
  ed.ao
  ed.cr
  ed.jp
  edu
  edu.af
  edu.al
  edu.ar
  edu.au
  edu.az
  edu.ba
  edu.bb
  edu.bd
  edu.bh
  edu.bi
  edu.bn
  edu.bo
  edu.br
  edu.bs
  edu.bt
  edu.bz
  edu.ck
  edu.cn
  edu.co
  edu.cu
  edu.do
  edu.dz
  edu.ec
  edu.ee
  edu.eg
  edu.er
  edu.es
  edu.et
  edu.ge
  edu.gh
  edu.gr
  edu.gt
  edu.hk
  edu.hn
  edu.ht
  edu.in
  edu.iq
  edu.jm
  edu.jo
  edu.kg
  edu.kh
  edu.kn
  edu.kw
  edu.ky
  edu.kz
  edu.la
  edu.lb
  edu.lr
  edu.lv
  edu.ly
  edu.me
  edu.mg
  edu.mk
  edu.ml
  edu.mm
  edu.mn
  edu.mo
  edu.mt
  edu.mv
  edu.mw
  edu.mx
  edu.my
  edu.ni
  edu.np
  edu.om
  edu.pa
  edu.pe
  edu.ph
  edu.pk
  edu.pl
  edu.pr
  edu.ps
  edu.pt
  edu.pw
  edu.py
  edu.qa
  edu.rs
  edu.ru
  edu.sa
  edu.sc
  edu.sd
  edu.sg
  edu.sh
  edu.sl
  edu.sv
  edu.sy
  edu.tr
  edu.tt
  edu.tw
  edu.ua
  edu.uy
  edu.ve
  edu.vn
  edu.ws
  edu.ye
  edu.zm
  es.kr
  g12.br
  hs.kr
  ms.kr
  sc.kr
  sc.ug
  sch.ae
  sch.gg
  sch.id
  sch.ir
  sch.je
  sch.jo
  sch.lk
  sch.ly
  sch.my
  sch.om
  sch.ps
  sch.sa
  sch.uk
  school.nz
  school.za
  tec.ar.us
  tec.az.us
  tec.co.us
  tec.fl.us
  tec.ga.us
  tec.ia.us
  tec.id.us
  tec.il.us
  tec.in.us
  tec.ks.us
  tec.ky.us
  tec.la.us
  tec.ma.us
  tec.md.us
  tec.me.us
  tec.mi.us
  tec.mn.us
  tec.mo.us
  tec.ms.us
  tec.mt.us
  tec.nc.us
  tec.nd.us
  tec.nh.us
  tec.nm.us
  tec.nv.us
  tec.ny.us
  tec.oh.us
  tec.ok.us
  tec.pa.us
  tec.sc.us
  tec.sd.us
  tec.tx.us
  tec.ut.us
  tec.vi.us
  tec.wa.us
  tec.wi.us
  tec.wv.us
  vic.edu.au
).to_set.freeze
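As an illustration of how a set like this can be queried, here is a minimal sketch that finds the longest listed suffix of a host name. This is an assumption made for demonstration: the library itself compares `domain.tld`, which comes from its domain-parsing layer, and the names `SAMPLE_TLDS` and `academic_suffix` are hypothetical.

```ruby
require "set"

# A small sample of the entries listed above.
SAMPLE_TLDS = %w[edu ac.uk cc.ca.us].to_set.freeze

# Returns the longest suffix of host found in the set, or nil if none matches.
def academic_suffix(host)
  labels = host.downcase.split(".")
  labels.each_index do |i|
    candidate = labels[i..-1].join(".")
    return candidate if SAMPLE_TLDS.include?(candidate)
  end
  nil
end

p academic_suffix("www.ox.ac.uk")   # "ac.uk"
p academic_suffix("mit.edu")        # "edu"
p academic_suffix("example.com")    # nil
```

Storing the list as a frozen Set keeps each membership check constant-time regardless of how many suffixes are listed.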

Class Method Summary

Instance Method Summary

Methods included from SwotCollectionMethods

all_domains, each_domain

Class Method Details

.academic? ⇒ Object



# File 'lib/swot.rb', line 19

alias_method :academic?, :valid?

.domains_path ⇒ Object



# File 'lib/swot.rb', line 26

def domains_path
  @domains_path ||= File.expand_path "../academic_data", File.dirname(__FILE__)
end

.from_path(path_string_or_path) ⇒ Object

Returns a new Swot instance for the domain file at the given path.

Note that the path must be absolute.

Returns a Swot instance, or false if no domain is found at the given path.



# File 'lib/swot.rb', line 34

def from_path(path_string_or_path)
  path = Pathname.new(path_string_or_path)
  return false unless path.exist?
  path_dir, file = path.relative_path_from(Pathname.new(domains_path)).split
  backwards_path = path_dir.to_s.split('/').push(file.basename('.txt').to_s)
  domain = backwards_path.reverse.join('.')
  Swot.new(domain)
end
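The path-to-domain conversion above can be tried in isolation. The sketch below repeats the same Pathname steps without the existence check, so it runs without any data files; the helper name `domain_for` and the example paths are hypothetical.

```ruby
require "pathname"

# Converts a data-file path under root into a domain name by reversing the
# directory components and dropping the ".txt" extension.
def domain_for(path, root)
  path_dir, file = Pathname.new(path).relative_path_from(Pathname.new(root)).split
  labels = path_dir.to_s.split("/").push(file.basename(".txt").to_s)
  labels.reverse.join(".")
end

root = "/data/academic_data" # hypothetical location of the data directory
puts domain_for("/data/academic_data/edu/stanford.txt", root) # stanford.edu
puts domain_for("/data/academic_data/uk/ac/ox.txt", root)     # ox.ac.uk
```

The directory layout stores the most general label first (`uk/ac/ox.txt`), which is why the components are reversed before joining.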

.get_institution_name(text) ⇒ Object Also known as: school_name



# File 'lib/swot.rb', line 21

def get_institution_name(text)
  Swot.new(text).institution_name
end

.is_academic? ⇒ Object



# File 'lib/swot.rb', line 18

alias_method :is_academic?, :valid?

Instance Method Details

#academic_domain? ⇒ Boolean

Figure out if a domain name belongs to a known academic institution.

Returns true if the domain name belongs to a known academic institution; false otherwise.

Returns:

  • (Boolean)


# File 'lib/swot.rb', line 77

def academic_domain?
  @academic_domain ||= File.exist?(file_path) || File.exist?(file_extended_path)
end
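One property of the `||=` idiom used here is that it only caches truthy results: when the check returns false, the right-hand side runs again on the next call. The sketch below demonstrates this with a hypothetical `Probe` class (not part of the library) and contrasts it with a `defined?`-based variant that also caches false.

```ruby
# Hypothetical class that counts how often the underlying check runs.
class Probe
  attr_reader :calls

  def initialize
    @calls = 0
  end

  # Same shape as the method above: a false result is never cached.
  def with_or_equals
    @result ||= check
  end

  # Alternative that caches any result, including false.
  def with_defined
    return @cached if defined?(@cached)
    @cached = check
  end

  private

  # Stand-in for the file-existence test; always false here.
  def check
    @calls += 1
    false
  end
end

a = Probe.new
2.times { a.with_or_equals }
puts a.calls # 2: the check ran on both calls

b = Probe.new
2.times { b.with_defined }
puts b.calls # 1: the false result was cached
```

For a domain that is not in the data set, this means the file-existence checks are repeated on every call, which affects only the amount of work done and not the returned value.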

#institution_name ⇒ Object Also known as: school_name, name

Figure out the institution name based on the email address/domain.

Returns a string with the institution name; nil if nothing is found.



# File 'lib/swot.rb', line 65

def institution_name
  @institution_name ||= File.read(file_path, :mode => "rb", :external_encoding => "UTF-8").strip
rescue
  nil
end

#valid? ⇒ Boolean

Figure out if an email or domain belongs to an academic institution.

Returns true if the domain name belongs to an academic institution; false otherwise.

Returns:

  • (Boolean)


# File 'lib/swot.rb', line 48

def valid?
  if domain.nil?
    false
  elsif BLACKLIST.any? { |d| to_s =~ /(\A|\.)#{Regexp.escape(d)}\z/ }
    false
  elsif ACADEMIC_TLDS.include?(domain.tld)
    true
  elsif academic_domain?
    true
  else
    false
  end
end