Module: Uts58

Defined in:
lib/uts58.rb,
lib/uts58/version.rb,
lib/uts58/constants.rb,
lib/uts58/extractor.rb

Overview

Generated by tools/maketables.rb from the UTS58 17.0.0 data files at www.unicode.org/Public/17.0.0/linkification/. Do not edit by hand; rerun the generator.

Defined Under Namespace

Modules: Constants
Classes: Extractor

Constant Summary

VERSION = "0.2.3"

Class Method Details

.extract_email_addresses(text, options = {}) ⇒ Object

Like Uts58::Extractor#extract_email_addresses, but with overlapping results removed.

# File 'lib/uts58.rb', line 48

def extract_email_addresses(text, options = {})
  extract_email_addresses_with_indices(text, options).map { |r| r[:email] }
end
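The string-returning methods are projections of their _with_indices counterparts. The sketch below uses a hand-written entity rather than real gem output to show that projection and to check the indices against the text. It assumes, as in twitter-text, that indices is a [start, end) pair of character offsets; the source above does not state the convention.

```ruby
# Hand-written sample, not output captured from Uts58.
# Assumption: indices are [start, end) character offsets (twitter-text convention).
text = "contact info@example.com today"
entities = [{ email: "info@example.com", indices: [8, 24] }]

# The projection extract_email_addresses performs: keep only the strings.
emails = entities.map { |r| r[:email] }

# Under the assumed convention, the indices slice the match back out of text.
entities.each do |r|
  start, stop = r[:indices]
  raise "index mismatch" unless text[start...stop] == r[:email]
end

p emails # prints ["info@example.com"]
```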

.extract_email_addresses_with_indices(text, options = {}) ⇒ Object

Like Uts58::Extractor#extract_email_addresses_with_indices, but with overlapping results removed.

# File 'lib/uts58.rb', line 40

def extract_email_addresses_with_indices(text, options = {})
  extractor.remove_overlapping_entities(
    extractor.extract_email_addresses_with_indices(text, options)
  )
end

.extract_entities(text, options = {}) ⇒ Object

Like ::extract_entities_with_indices, but flattened to the bare URL strings, in the order they occur. Email addresses appear in their mailto: form, e.g. “contact info@example.com or look at example.com” returns ["mailto:info@example.com", "https://example.com"].

# File 'lib/uts58.rb', line 74

def extract_entities(text, options = {})
  extract_entities_with_indices(text, options).map { |e| e[:url] }
end
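The flattening step can be sketched as follows, assuming the two hash shapes described under ::extract_entities_with_indices. This sketch builds the mailto: form itself whenever an entry has no :url key; whether the gem's email hashes already carry a :url is not stated here. The helper name flatten_entities and the sample data are hypothetical, and the "https://" form of the bare domain is copied from the documented example, not computed.

```ruby
# Hypothetical helper, not part of Uts58: flattens mixed-shape entity
# hashes into URL strings in start-offset order.
def flatten_entities(entities)
  entities
    .sort_by { |e| e[:indices].first }
    # Assumption: an email entry without :url is rendered as a mailto: URL.
    .map { |e| e[:url] || "mailto:#{e[:email]}" }
end

# Hand-written entities for "contact info@example.com or look at example.com".
entities = [
  { url: "https://example.com", indices: [36, 47] },
  { email: "info@example.com", indices: [8, 24] }
]

p flatten_entities(entities)
# prints ["mailto:info@example.com", "https://example.com"]
```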

.extract_entities_with_indices(text, options = {}) ⇒ Object

Returns the URLs and email addresses in text as one list of mixed-shape hashes ({ url:, indices: } for links, { email:, indices: } for addresses), sorted by start offset with overlaps removed. The method name and the mixed-shape return value follow Twitter::TwitterText::Extractor#extract_entities_with_indices.

Handling overlap is the reason to call this method rather than the two extractors separately: “contact info@grå.org today” yields both an email address and the bare domain grå.org, and only one of them should survive. The earlier-starting candidate (the email address) wins.

# File 'lib/uts58.rb', line 62

def extract_entities_with_indices(text, options = {})
  extractor.remove_overlapping_entities(
    extractor.extract_urls_with_indices(text, options) +
    extractor.extract_email_addresses_with_indices(text, options)
  )
end
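A minimal sketch of the overlap rule described above: sort the combined candidates by start offset and drop any candidate that begins before the previous survivor ends, so the earlier-starting one wins. It reproduces the documented behavior, not necessarily Uts58::Extractor#remove_overlapping_entities line for line. The helper name drop_overlaps, the [start, end) index convention, the tie-breaking for equal starts, and the :url value in the sample are all assumptions.

```ruby
# Hypothetical stand-in for Uts58::Extractor#remove_overlapping_entities.
# Assumption: indices are [start, end) character offsets.
def drop_overlaps(entities)
  # Sort by start offset; the index tiebreak keeps input order for equal starts.
  sorted = entities.sort_by.with_index { |e, i| [e[:indices][0], i] }
  sorted.each_with_object([]) do |e, kept|
    # Keep e only if it begins at or after the end of the last survivor.
    kept << e if kept.empty? || e[:indices][0] >= kept.last[:indices][1]
  end
end

# Hand-written candidates for "contact info@grå.org today"; the :url value is illustrative.
urls   = [{ url: "grå.org", indices: [13, 20] }]
emails = [{ email: "info@grå.org", indices: [8, 20] }]

result = drop_overlaps(urls + emails)
puts result.map { |e| e[:email] || e[:url] } # prints info@grå.org
```

Sorting before filtering is what makes "earlier-starting wins" hold regardless of the order in which the URL and email candidate lists are concatenated.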

.extract_urls(text, options = {}) ⇒ Object

Like Uts58::Extractor#extract_urls, but with overlapping results removed.

# File 'lib/uts58.rb', line 34

def extract_urls(text, options = {})
  extract_urls_with_indices(text, options).map { |r| r[:url] }
end

.extract_urls_with_indices(text, options = {}) ⇒ Object

Like Uts58::Extractor#extract_urls_with_indices, but with overlapping results removed via Uts58::Extractor#remove_overlapping_entities.

# File 'lib/uts58.rb', line 26

def extract_urls_with_indices(text, options = {})
  extractor.remove_overlapping_entities(
    extractor.extract_urls_with_indices(text, options)
  )
end