Class: Scrapetor::TextNode

Inherits:
String
  • Object
Defined in:
lib/scrapetor/text_node.rb

Overview

Result type for `::text` and `::attr(name)` pseudo-element queries.

Scrapy / Parsel-style code expects strings directly from these selectors (`doc.css("h3::text").get`), but Nokogiri-style scrapers routinely chain a `.text` / `.content` accessor onto each result (`doc.css("h3::text").first.text` or `node.at("a::attr(href)").text`). Returning a bare String breaks the Nokogiri-style call path with NoMethodError, even though the String already is the text we would have returned.

TextNode is a thin String subclass that closes the gap: it equals, compares, splits, and concatenates exactly like a String, and adds the Node-shaped accessors (`text`, `content`, `inner_text`, `name`, `element?`, `text?`) plus the Parsel-shaped `get` / `getall`. The underlying string is the actual text content; the extra methods return self, a plain String copy, or a constant (nil, false, an empty collection), so chaining stays cheap.
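The two call styles described above can be sketched with a small stand-in. `SketchTextNode` is a hypothetical class written for this example; it copies only a few of the method bodies documented on this page and is not the real `lib/scrapetor/text_node.rb`.

```ruby
# Stand-in for illustration only (hypothetical name, not the real class).
class SketchTextNode < String
  # Node-shaped accessors return a plain String copy of the content.
  def text; String.new(self); end
  alias inner_text text
  alias content text

  # Parsel-shaped accessors.
  def get; String.new(self); end
  def getall; [String.new(self)]; end
end

heading = SketchTextNode.new("Latest news")

# Parsel-style: treat the result as the string itself.
parsel_value = heading.get

# Nokogiri-style: chain .text onto the result.
nokogiri_value = heading.text

# Plain String behaviour is inherited unchanged.
same_as_string = (heading == "Latest news")
words = heading.split(" ")
joined = heading + "!"
```

Both call paths yield the same text, which is the point of the subclass: neither style of scraper needs to know which kind of result it received.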

Instance Attribute Summary

Instance Method Summary

Instance Attribute Details

#parent_node ⇒ Object

Containing element (the node whose text or attribute this TextNode represents). Set by the css() boundary when the parent is known; left nil otherwise. Production code chains `result.at(::text).parent.css(…)` to navigate to siblings of the text node, mirroring the Nokogiri shape where text nodes carry a `.parent` back-reference.

# File 'lib/scrapetor/text_node.rb', line 63

def parent_node
  @parent_node
end
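A minimal sketch of the back-reference, under stated assumptions: `SketchElement` and `SketchTextNode` are hypothetical stand-ins, and the writer generated by `attr_accessor` is assumed (this page only shows the reader).

```ruby
# Hypothetical stand-ins; not the real Scrapetor classes.
SketchElement = Struct.new(:name)

class SketchTextNode < String
  # Assumption: the attribute is writable so the css() boundary can set it.
  attr_accessor :parent_node

  # Nokogiri-shaped alias for the same back-reference.
  def parent; @parent_node; end
end

title = SketchElement.new("h3")

text = SketchTextNode.new("Price: 10")
text.parent_node = title          # what the css() boundary would do

orphan = SketchTextNode.new("no parent known")

parent_name = text.parent.name    # navigate from the text back to its element
orphan_parent = orphan.parent     # nil when the parent was never set
```

Callers that navigate through `.parent` should guard against nil, since the reference is only populated when the parent is known.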

Instance Method Details

#[](*args) ⇒ Object



# File 'lib/scrapetor/text_node.rb', line 79

def [](*args)
  # Attribute-style access with a single String or Symbol key
  # (node["href"], node[:class]) returns nil; every other call
  # (index, range, start/length, regexp) falls through to String#[].
  if args.size == 1 && args.first.is_a?(String)
    nil
  elsif args.size == 1 && args.first.is_a?(Symbol)
    nil
  else
    super
  end
end
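The branching above can be exercised with a stand-in that reproduces the method body shown here. `SketchTextNode` is a hypothetical name used only for this example.

```ruby
# Stand-in reproducing the documented #[] body (hypothetical class name).
class SketchTextNode < String
  def [](*args)
    if args.size == 1 && args.first.is_a?(String)
      nil
    elsif args.size == 1 && args.first.is_a?(Symbol)
      nil
    else
      super
    end
  end
end

node = SketchTextNode.new("hello")

attr_by_string = node["href"]  # attribute-style lookup: nil
attr_by_symbol = node[:class]  # attribute-style lookup: nil
substring_key  = node["ell"]   # also nil, unlike plain String#[]
first_char     = node[0]       # "h"
slice          = node[1..3]    # "ell"
prefix         = node[0, 2]    # "he"
```

One consequence worth noting: substring lookup with a String argument (`"hello"["ell"]` on a plain String) is no longer available on the node, because that call shape is reserved for attribute-style access.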

#[]=(*_args) ⇒ Object



# File 'lib/scrapetor/text_node.rb', line 51

def []=(*_args);     nil; end

#add_class(_k) ⇒ Object



# File 'lib/scrapetor/text_node.rb', line 52

def add_class(_k);    self; end

#at(_selector) ⇒ Object



# File 'lib/scrapetor/text_node.rb', line 92

def at(_selector);          nil; end

#at_css(_selector) ⇒ Object



# File 'lib/scrapetor/text_node.rb', line 91

def at_css(_selector);      nil; end

#at_xpath(*_args) ⇒ Object



# File 'lib/scrapetor/text_node.rb', line 95

def at_xpath(*_args);       nil; end

#attribute(_name) ⇒ Object



# File 'lib/scrapetor/text_node.rb', line 74

def attribute(_name);       nil; end

#attribute_nodes ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 73

def attribute_nodes;        []; end

#attributes ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 72

def attributes;             {}; end

#cdata? ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/scrapetor/text_node.rb', line 36

def cdata?;     false; end

#children ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 70

def children;               []; end

#classes ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 77

def classes;                []; end

#comment? ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/scrapetor/text_node.rb', line 34

def comment?;   false; end

#content=(_v) ⇒ Object



# File 'lib/scrapetor/text_node.rb', line 50

def content=(_v);    _v; end

#css(_selector) ⇒ Object



# File 'lib/scrapetor/text_node.rb', line 90

def css(_selector);         []; end

#document? ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/scrapetor/text_node.rb', line 35

def document?;  false; end

#element? ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/scrapetor/text_node.rb', line 32

def element?;   false; end

#element_children ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 71

def element_children;       []; end

#get ⇒ Object

Parsel-style accessors.

# File 'lib/scrapetor/text_node.rb', line 26

def get;        String.new(self); end

#getall ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 27

def getall;     [String.new(self)]; end

#has_class?(_klass) ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/scrapetor/text_node.rb', line 78

def has_class?(_klass);     false; end

#inner_html=(_v) ⇒ Object

No-op mutation API. Heterogeneous selectors like `.foo > ::text, .bar` can hand a TextNode to a caller that assumes an Element interface (e.g. `node.inner_html = node.inner_html.gsub(…)`). The reassignment would crash on a bare String; we accept the write silently so the subsequent `.text` read still works. The mutation is intentionally dropped: TextNode wraps frozen content of the original element.

# File 'lib/scrapetor/text_node.rb', line 49

def inner_html=(_v); _v; end
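A minimal sketch of the dropped write, using a hypothetical stand-in (`SketchTextNode`) that copies the `inner_html`, `inner_html=` and `text` bodies documented on this page:

```ruby
# Stand-in for illustration only (hypothetical class name).
class SketchTextNode < String
  def text; String.new(self); end
  def to_html; self.to_s; end
  alias inner_html to_html

  # Accept the write and discard it.
  def inner_html=(_v); _v; end
end

node = SketchTextNode.new("a & b")

# Element-style rewrite; on a bare String this line would raise NoMethodError.
node.inner_html = node.inner_html.gsub("&", "and")

after = node.text   # still the original content
```

The caller's code path keeps running, but the replacement text is not stored, so code that relies on the mutation taking effect needs to check `text?` first.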

#inspect ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 97

def inspect
  "#<Scrapetor::TextNode #{super}>"
end

#keys ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 75

def keys;                   []; end

#name ⇒ Object

Node-shape predicates so duck-typing checks (`n.element?`, `n.text?`, `n.name == "#text"`) do not raise.

# File 'lib/scrapetor/text_node.rb', line 31

def name;       "#text"; end
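The duck-typing checks mentioned above might look like this in a caller that receives a mixed result list. Both classes are hypothetical stand-ins: `SketchElementNode` plays the role of an element, and `SketchTextNode` copies the predicate bodies documented on this page.

```ruby
# Hypothetical element stand-in with the same predicate surface.
SketchElementNode = Struct.new(:name, :text) do
  def element?; true; end
  def text?; false; end
end

# Stand-in copying the documented TextNode predicates.
class SketchTextNode < String
  def name; "#text"; end
  def element?; false; end
  def text?; true; end
  def text; String.new(self); end
end

results = [
  SketchElementNode.new("p", "Intro"),
  SketchTextNode.new("loose text")
]

# Every entry answers the same questions, so no type check is needed.
names     = results.map(&:name)
text_only = results.select(&:text?).map(&:text)
elements  = results.select(&:element?).map(&:name)
```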

#next_element_sibling ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 68

def next_element_sibling;   nil; end

#next_sibling ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 66

def next_sibling;           nil; end

#parent ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 65

def parent;                 @parent_node; end

#previous_element_sibling ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 69

def previous_element_sibling; nil; end

#previous_sibling ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 67

def previous_sibling;       nil; end

#remove ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 54

def remove;           self; end

#remove_class(*_) ⇒ Object



# File 'lib/scrapetor/text_node.rb', line 53

def remove_class(*_); self; end

#search(_selector) ⇒ Object



# File 'lib/scrapetor/text_node.rb', line 93

def search(_selector);      []; end

#text ⇒ Object Also known as: inner_text, content

# File 'lib/scrapetor/text_node.rb', line 21

def text;       String.new(self); end

#text? ⇒ Boolean

Returns:

  • (Boolean)

# File 'lib/scrapetor/text_node.rb', line 33

def text?;      true; end

#to_html ⇒ Object Also known as: outer_html, inner_html

# File 'lib/scrapetor/text_node.rb', line 38

def to_html;    self.to_s; end


#unlink ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 55

def unlink;           self; end

#values ⇒ Object

# File 'lib/scrapetor/text_node.rb', line 76

def values;                 []; end

#xpath(*_args) ⇒ Object



# File 'lib/scrapetor/text_node.rb', line 94

def xpath(*_args);          []; end
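Taken together, the query methods on this page (`css`, `search`, `xpath`, `at`, `at_css`, `at_xpath`) treat a TextNode as a leaf: collection queries return an empty Array and single-result queries return nil. A short sketch with a hypothetical stand-in that copies those bodies:

```ruby
# Stand-in copying the documented query methods (hypothetical class name).
class SketchTextNode < String
  def css(_selector);    []; end
  def search(_selector); []; end
  def xpath(*_args);     []; end
  def at(_selector);     nil; end
  def at_css(_selector); nil; end
  def at_xpath(*_args);  nil; end
end

node = SketchTextNode.new("leaf text")

# Collection queries can be iterated safely; the block never runs.
links = node.css("a").map { |a| a.to_s }

# Single-result queries return nil, so callers need a nil guard.
first_link = node.at("a")
href = first_link&.to_s
```

Note that, per the source shown above, the empty results are plain Arrays, so collection-specific helpers from other libraries are not guaranteed to be available on them.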