Module: Uts58
Defined in:
- lib/uts58.rb
- lib/uts58/version.rb
- lib/uts58/constants.rb
- lib/uts58/extractor.rb
Overview
Generated by tools/maketables.rb from the UTS58 17.0.0 data files at www.unicode.org/Public/17.0.0/linkification/. Do not edit by hand; rerun the generator instead.
Defined Under Namespace
Modules: Constants
Classes: Extractor
Constant Summary
- VERSION = "0.2.3"
Class Method Summary
- .extract_email_addresses(text, options = {}) ⇒ Object
  Like Uts58::Extractor#extract_email_addresses, but with overlapping results removed.
- .extract_email_addresses_with_indices(text, options = {}) ⇒ Object
  Like Uts58::Extractor#extract_email_addresses_with_indices, but with overlapping results removed.
- .extract_entities(text, options = {}) ⇒ Object
  Like ::extract_entities_with_indices, but flattened to the bare URL strings, in the order they occur.
- .extract_entities_with_indices(text, options = {}) ⇒ Object
  Both the URLs and email addresses in text, as one list of mixed-shape hashes ({ url:, indices: } for links and { email:, indices: } for addresses), sorted by start offset with overlaps removed.
- .extract_urls(text, options = {}) ⇒ Object
  Like Uts58::Extractor#extract_urls, but with overlapping results removed.
- .extract_urls_with_indices(text, options = {}) ⇒ Object
  Like Uts58::Extractor#extract_urls_with_indices, but with overlapping results removed via Uts58::Extractor#remove_overlapping_entities.
Class Method Details
.extract_email_addresses(text, options = {}) ⇒ Object
Like Uts58::Extractor#extract_email_addresses, but with overlapping results removed.
# File 'lib/uts58.rb', line 48
def extract_email_addresses(text, options = {})
  extract_email_addresses_with_indices(text, options).map { |r| r[:email] }
end
.extract_email_addresses_with_indices(text, options = {}) ⇒ Object
Like Uts58::Extractor#extract_email_addresses_with_indices, but with overlapping results removed.
# File 'lib/uts58.rb', line 40
def extract_email_addresses_with_indices(text, options = {})
  extractor.remove_overlapping_entities(
    extractor.extract_email_addresses_with_indices(text, options)
  )
end
.extract_entities(text, options = {}) ⇒ Object
Like ::extract_entities_with_indices, but flattened to the bare URL strings, in the order they occur. Email addresses appear in their mailto: form, e.g. “contact info@example.com or look at example.com” returns ["mailto:info@example.com", "https://example.com"].
# File 'lib/uts58.rb', line 74
def extract_entities(text, options = {})
  extract_entities_with_indices(text, options).map { |e| e[:url] }
end
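The flattening step can be illustrated on hand-written sample data, without calling the library. This is only a sketch: the hashes below are invented for the example sentence, and the presence of a :url key holding the mailto: form on the email entry is an assumption inferred from the documented return value, not something this page confirms.

```ruby
# Hand-written stand-in for the result of extract_entities_with_indices on
# "contact info@example.com or look at example.com".
# ASSUMPTION: the email entity also carries a :url key in mailto: form.
entities = [
  { url: "mailto:info@example.com", email: "info@example.com", indices: [8, 24] },
  { url: "https://example.com", indices: [36, 47] }
]

# The same map that extract_entities applies: keep only the URL string of
# each entity, preserving the order of the list.
urls = entities.map { |e| e[:url] }
```

If an entity lacked a :url key, this map would yield nil for it, so callers that build their own entity lists may want to check for that.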
.extract_entities_with_indices(text, options = {}) ⇒ Object
Both the URLs and email addresses in text, as one list of mixed-shape hashes — { url:, indices: } for links and { email:, indices: } for addresses — sorted by start offset with overlaps removed. The name and mixed-shape return follow Twitter::TwitterText::Extractor#extract_entities_with_indices.
Overlap is the point of going through here rather than calling the two extractors yourself: “contact info@grå.org today” yields both an email and the bare domain grå.org, and only one of those should survive. The earlier-starting candidate (the email) wins.
# File 'lib/uts58.rb', line 62
def extract_entities_with_indices(text, options = {})
  extractor.remove_overlapping_entities(
    extractor.extract_urls_with_indices(text, options) +
    extractor.extract_email_addresses_with_indices(text, options)
  )
end
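The overlap rule described above (sort by start offset, earlier-starting candidate wins) can be sketched as follows. This is a minimal illustration and not the body of Uts58::Extractor#remove_overlapping_entities, which this page does not show. The helper name remove_overlapping_sketch, the reading of indices as a half-open [start, end) pair, the tie-breaking by input order, and the sample :url value are all assumptions.

```ruby
# Hypothetical stand-in for Uts58::Extractor#remove_overlapping_entities.
# ASSUMPTION: each entity has indices: [start, end) with an exclusive end.
def remove_overlapping_sketch(entities)
  kept = []
  # Sort by start offset; ties fall back to input order (an assumption),
  # because Ruby's sort_by is not guaranteed to be stable.
  ordered = entities.each_with_index.sort_by { |e, i| [e[:indices].first, i] }
  ordered.each do |entity, _position|
    last = kept.last
    # Keep an entity only if it starts at or after the end of the last kept one.
    kept << entity if last.nil? || entity[:indices].first >= last[:indices].last
  end
  kept
end

# Candidates for "contact info@grå.org today" (character offsets).
# The :url value is illustrative sample data only.
candidates = [
  { url: "https://grå.org", indices: [13, 20] },
  { email: "info@grå.org", indices: [8, 20] }
]

survivors = remove_overlapping_sketch(candidates)
# Only the email entity remains: it starts at 8, before the bare domain at 13.
```

Entities that merely touch (one ends where the next begins) do not count as overlapping in this sketch.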
.extract_urls(text, options = {}) ⇒ Object
Like Uts58::Extractor#extract_urls, but with overlapping results removed.
# File 'lib/uts58.rb', line 34
def extract_urls(text, options = {})
  extract_urls_with_indices(text, options).map { |r| r[:url] }
end
.extract_urls_with_indices(text, options = {}) ⇒ Object
Like Uts58::Extractor#extract_urls_with_indices, but with overlapping results removed via Uts58::Extractor#remove_overlapping_entities.
# File 'lib/uts58.rb', line 26
def extract_urls_with_indices(text, options = {})
  extractor.remove_overlapping_entities(
    extractor.extract_urls_with_indices(text, options)
  )
end