Class: Alexandria::BookProviders::WebsiteBasedProvider
- Inherits: GenericProvider
- Object
- AbstractProvider
- GenericProvider
- Alexandria::BookProviders::WebsiteBasedProvider
- Defined in:
- lib/alexandria/book_providers/website_based_provider.rb
Direct Known Subclasses
Instance Attribute Summary
Attributes inherited from AbstractProvider
Instance Method Summary
- #html_to_doc(html, source_data_charset = "ISO-8859-1") ⇒ Object
- #initialize(name, fullname = nil) ⇒ WebsiteBasedProvider (constructor)
  A new instance of WebsiteBasedProvider.
- #text_of(node) ⇒ Object
  from Palatina.
Methods inherited from AbstractProvider
#<=>, #abstract?, abstract?, #action_name, #enabled, #reinitialize, #remove, #toggle_enabled, #transport, unabstract, #variable_name
Constructor Details
#initialize(name, fullname = nil) ⇒ WebsiteBasedProvider
Returns a new instance of WebsiteBasedProvider.
# File 'lib/alexandria/book_providers/website_based_provider.rb', line 32
def initialize(name, fullname = nil)
  super
  @htmlentities = HTMLEntities.new
end
Instance Method Details
#html_to_doc(html, source_data_charset = "ISO-8859-1") ⇒ Object
# File 'lib/alexandria/book_providers/website_based_provider.rb', line 37
def html_to_doc(html, source_data_charset = "ISO-8859-1")
  html.force_encoding source_data_charset
  utf8_html = html.encode("utf-8")
  normalized_html = @htmlentities.decode(utf8_html)
  Nokogiri.parse(normalized_html)
end
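The first two steps of html_to_doc relabel the raw bytes with the source charset and then transcode them to UTF-8. The sketch below isolates that step using only the Ruby standard library. The helper name `to_utf8` is hypothetical and not part of Alexandria, and unlike the listed method it works on a copy, since `force_encoding` changes the receiver in place (and would raise on a frozen string).

```ruby
# Stand-alone sketch of the transcoding step in html_to_doc (stdlib only).
# `to_utf8` is a hypothetical helper name, not part of Alexandria.
def to_utf8(raw, source_charset = "ISO-8859-1")
  # dup first so the caller's string is not relabelled in place
  raw.dup.force_encoding(source_charset).encode("utf-8")
end

# "café" as ISO-8859-1 bytes: 0xE9 is "é" in that charset
latin1_bytes = "caf\xE9".b

utf8 = to_utf8(latin1_bytes)
puts utf8           # café
puts utf8.encoding  # UTF-8
```

Passing the wrong source charset does not raise for ISO-8859-1, because every byte is valid in that encoding; the result is simply mis-decoded text, so callers are assumed to know the charset of the page they fetched.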
#text_of(node) ⇒ Object
from Palatina
# File 'lib/alexandria/book_providers/website_based_provider.rb', line 45
def text_of(node)
  if node.nil?
    nil
  elsif node.text?
    node.to_html
  elsif node.elem?
    if node.children.nil?
      nil
    else
      node_text = node.children.map { |n| text_of(n) }.join
      node_text.strip.squeeze(" ")
    end
  end
  # node.inner_html.strip
end
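To see what text_of returns for a nested element, the sketch below runs the same logic against two hypothetical stand-in node types, `TextNode` and `ElemNode`. These are assumptions made so the example needs no gems; real callers would presumably pass nodes from the document returned by html_to_doc.

```ruby
# Hypothetical stand-ins for parsed nodes; they expose only the
# methods text_of calls (text?, elem?, to_html, children).
TextNode = Struct.new(:content) do
  def text?; true; end
  def elem?; false; end
  def to_html; content; end
end

ElemNode = Struct.new(:children) do
  def text?; false; end
  def elem?; true; end
end

# Same logic as the text_of method listed above.
def text_of(node)
  if node.nil?
    nil
  elsif node.text?
    node.to_html
  elsif node.elem?
    if node.children.nil?
      nil
    else
      # Recurse into children, then trim and collapse runs of spaces
      node_text = node.children.map { |n| text_of(n) }.join
      node_text.strip.squeeze(" ")
    end
  end
end

tree = ElemNode.new([
  TextNode.new("  The   Name of "),
  ElemNode.new([TextNode.new("the  Rose")]),
  TextNode.new("  \n")
])

result = text_of(tree)
puts result  # The Name of the Rose
```

Note that `squeeze(" ")` collapses only repeated space characters; repeated tabs or newlines inside the text are left as they are, and a node that is neither text nor element yields nil.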