Class: Moxml::EntityRegistry
- Inherits: Object
- Inheritance chain: Object, Moxml::EntityRegistry
- Defined in: lib/moxml/entity_registry.rb
Overview
EntityRegistry maintains a knowledge base of XML entity definitions.
Data source: W3C XML Core WG Character Entities (bundled) www.w3.org/2003/entities/2007/htmlmathml
The W3C entity data is bundled in data/w3c_entities.json and loaded from the gem’s data directory. For development, MOXML_ENTITY_DEFINITIONS_PATH can be set to an external copy.
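The lookup order described above could be sketched as follows. The helper name `entity_data_path` and its `gem_root` and `env` parameters are hypothetical; this page does not show how the gem resolves the path, so treat this only as an illustration of "override first, bundled file otherwise".

```ruby
# Hypothetical helper, not part of Moxml: prefer the development override
# from the environment, otherwise fall back to the bundled data file.
def entity_data_path(gem_root, env = ENV)
  override = env["MOXML_ENTITY_DEFINITIONS_PATH"]
  return override if override && !override.empty?

  File.join(gem_root, "data", "w3c_entities.json")
end

# Explicit hashes stand in for ENV so the result does not depend on the shell.
puts entity_data_path("/gems/moxml", {})
puts entity_data_path("/gems/moxml", { "MOXML_ENTITY_DEFINITIONS_PATH" => "/tmp/entities.json" })
```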
Per W3C XML Core WG guidance:
- Character entities are XML internal general entities, each providing a name for a single Unicode character.
- Standard XML entities (amp, lt, gt, quot, apos) are implicitly declared per the XML specification.
- External entity sets (such as HTML and MathML) can be referenced via DTD parameter entities.
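As a small illustration of "a name for a single Unicode character", the five standard XML entities can be written as a name-to-codepoint table in the same shape as `#by_name`. The constant names below are made up for this sketch and are not part of Moxml.

```ruby
# The five standard XML entities, mapped to their Unicode codepoints.
# Constant names are illustrative only.
PREDEFINED_XML_ENTITIES = {
  "amp"  => 0x26, # &
  "lt"   => 0x3C, # <
  "gt"   => 0x3E, # >
  "quot" => 0x22, # "
  "apos" => 0x27, # '
}.freeze

# Convert each codepoint back into the character it names.
PREDEFINED_XML_CHARS =
  PREDEFINED_XML_ENTITIES.transform_values { |cp| [cp].pack("U") }.freeze

PREDEFINED_XML_CHARS.each { |name, char| puts "&#{name}; => #{char}" }
```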
Defined Under Namespace
Classes: EntityDataError
Constant Summary
- ENTITY_DATA_FILE = "w3c_entities.json"
  W3C entity data file name
Instance Attribute Summary
- #by_codepoint ⇒ Hash{Integer => Array<String>} (readonly)
  Codepoint to entity names mapping.
- #by_name ⇒ Hash{String => Integer} (readonly)
  Entity name to codepoint mapping.
Class Method Summary
- .default ⇒ EntityRegistry
  Get the default registry instance (lazy loaded).
- .entity_data ⇒ Hash{String => String}
  Get the raw entity data from the bundled JSON source.
- .reset ⇒ void
  Reset the default registry (mainly for testing).
Instance Method Summary
- #clear! ⇒ self
  Clear all entities (reset to empty).
- #codepoint_for_name(name) ⇒ Integer?
  Get the Unicode codepoint for an entity name.
- #declared?(name) ⇒ Boolean
  Check if an entity name is declared.
- #initialize(mode: :required, entity_provider: nil) ⇒ EntityRegistry (constructor)
  A new instance of EntityRegistry.
- #load_all ⇒ self
  Load all standard entity sets.
- #load_html5 ⇒ self
  Load all entities from the W3C HTMLMathML entity set. This is called automatically by initialize.
- #load_iso(_set_name = :iso8879) ⇒ self
  Load ISO entity sets (included in HTMLMathML).
- #load_mathml ⇒ self
  Load MathML entity set (included in HTMLMathML).
- #names_for_codepoint(codepoint) ⇒ Array<String>
  Get all entity names for a codepoint.
- #primary_name_for_codepoint(codepoint) ⇒ String?
  Get the primary (preferred) entity name for a codepoint.
- #register(entities) ⇒ self
  Register additional entities.
Constructor Details
#initialize(mode: :required, entity_provider: nil) ⇒ EntityRegistry
Returns a new instance of EntityRegistry.
    # File 'lib/moxml/entity_registry.rb', line 110
    def initialize(mode: :required, entity_provider: nil)
      @by_name = {}
      @by_codepoint = Hash.new { |h, k| h[k] = [] }
      @mode = mode
      @entity_provider = entity_provider

      case mode
      when :required
        load_from_entity_data
      when :optional
        load_from_entity_data_optional
      when :custom
        load_custom_entities
      when :disabled
        # Don't load anything - empty registry
      end
    end
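The mode dispatch in the constructor can be read as a small lookup. The helper below, `loader_for_mode`, is hypothetical and mirrors only the `case` statement shown above; what the private loaders themselves do is not shown on this page. One thing the listing does make visible is that the `case` has no `else` branch, so an unrecognised mode runs no loader at all, just like `:disabled`.

```ruby
# Hypothetical helper that mirrors the constructor's case statement.
# Returns the name of the private loader that would run, or nil when none does.
def loader_for_mode(mode)
  case mode
  when :required then :load_from_entity_data
  when :optional then :load_from_entity_data_optional
  when :custom   then :load_custom_entities
  when :disabled then nil
  end
end

%i[required optional custom disabled typo].each do |mode|
  puts "#{mode.inspect} -> #{loader_for_mode(mode).inspect}"
end
```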
Instance Attribute Details
#by_codepoint ⇒ Hash{Integer => Array<String>} (readonly)
Returns codepoint to entity names mapping.
    # File 'lib/moxml/entity_registry.rb', line 106
    def by_codepoint
      @by_codepoint
    end
#by_name ⇒ Hash{String => Integer} (readonly)
Returns entity name to codepoint mapping.
    # File 'lib/moxml/entity_registry.rb', line 103
    def by_name
      @by_name
    end
Class Method Details
.default ⇒ EntityRegistry
Get the default registry instance (lazy loaded)
    # File 'lib/moxml/entity_registry.rb', line 38
    def default
      @default ||= new
    end
.entity_data ⇒ Hash{String => String}
Get the raw entity data from the bundled JSON source
    # File 'lib/moxml/entity_registry.rb', line 32
    def entity_data
      @entity_data ||= load_entity_data
    end
.reset ⇒ void
This method returns an undefined value.
Reset the default registry (mainly for testing)
    # File 'lib/moxml/entity_registry.rb', line 44
    def reset
      @default = nil
      @entity_data = nil
    end
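The interplay of `.default` and `.reset` is plain memoization. The sketch below uses a stand-in class, `SketchRegistry`, which is not the gem's class and reproduces only the `@default` handling from the two listings above (the real `.reset` also clears the cached `@entity_data`).

```ruby
# Stand-in class, not Moxml::EntityRegistry: only .default and .reset.
class SketchRegistry
  class << self
    # Build the shared instance on first use, then reuse it.
    def default
      @default ||= new
    end

    # Drop the cached instance so the next .default call builds a new one.
    def reset
      @default = nil
    end
  end
end

first = SketchRegistry.default
again = SketchRegistry.default
SketchRegistry.reset
fresh = SketchRegistry.default

puts first.equal?(again) # same object before the reset
puts first.equal?(fresh) # a different object after the reset
```

In a test suite this is typically why `.reset` is called between examples: anything registered on the shared instance would otherwise leak into later tests.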
Instance Method Details
#clear! ⇒ self
Clear all entities (reset to empty)
    # File 'lib/moxml/entity_registry.rb', line 200
    def clear!
      @by_name = {}
      @by_codepoint = Hash.new { |h, k| h[k] = [] }
      self
    end
#codepoint_for_name(name) ⇒ Integer?
Get the Unicode codepoint for an entity name
    # File 'lib/moxml/entity_registry.rb', line 138
    def codepoint_for_name(name)
      @by_name[name]
    end
#declared?(name) ⇒ Boolean
Check if an entity name is declared
    # File 'lib/moxml/entity_registry.rb', line 131
    def declared?(name)
      @by_name.key?(name)
    end
#load_all ⇒ self
Load all standard entity sets
    # File 'lib/moxml/entity_registry.rb', line 193
    def load_all
      # All entities are loaded by default from initialize
      self
    end
#load_html5 ⇒ self
Load all entities from the W3C HTMLMathML entity set. This is called automatically by initialize.
    # File 'lib/moxml/entity_registry.rb', line 171
    def load_html5
      # All entities are loaded by default from initialize
      self
    end
#load_iso(_set_name = :iso8879) ⇒ self
Load ISO entity sets (included in HTMLMathML)
    # File 'lib/moxml/entity_registry.rb', line 186
    def load_iso(_set_name = :iso8879)
      # All entities are loaded by default from initialize
      self
    end
#load_mathml ⇒ self
Load MathML entity set (included in HTMLMathML)
    # File 'lib/moxml/entity_registry.rb', line 178
    def load_mathml
      # All entities are loaded by default from initialize
      self
    end
#names_for_codepoint(codepoint) ⇒ Array<String>
Get all entity names for a codepoint
    # File 'lib/moxml/entity_registry.rb', line 145
    def names_for_codepoint(codepoint)
      @by_codepoint[codepoint]
    end
#primary_name_for_codepoint(codepoint) ⇒ String?
Get the primary (preferred) entity name for a codepoint
    # File 'lib/moxml/entity_registry.rb', line 152
    def primary_name_for_codepoint(codepoint)
      @by_codepoint[codepoint]&.first
    end
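Both codepoint lookups read from a Hash built with a default block, which behaves differently from a plain Hash when the key is missing. The sketch below isolates that behaviour using standard Ruby only; `new_codepoint_index` is a hypothetical helper name that reproduces the construction used for `@by_codepoint` in the constructor.

```ruby
# Hypothetical helper: same construction as @by_codepoint in the constructor.
def new_codepoint_index
  Hash.new { |hash, key| hash[key] = [] }
end

index = new_codepoint_index
index[0x26] << "amp"

known   = index[0x26]    # the stored list of names
unknown = index[0x1F600] # an empty list; the block also stores the key

puts known.inspect
puts unknown.inspect
puts unknown.first.inspect     # nil, since [].first is nil
puts index.key?(0x1F600)       # true, the read created the entry
```

So, assuming the registry's Hash is built as shown in the constructor, looking up an unknown codepoint yields an empty Array (and a `nil` primary name) rather than raising, and it leaves an empty entry behind in the Hash.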
#register(entities) ⇒ self
Register additional entities
    # File 'lib/moxml/entity_registry.rb', line 159
    def register(entities)
      entities.each do |name, codepoint|
        @by_name[name] = codepoint
        @by_codepoint[codepoint] ||= []
        @by_codepoint[codepoint] << name unless @by_codepoint[codepoint].include?(name)
      end
      self
    end
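To see how `register` feeds the lookup methods without needing the gem or its data file, here is a minimal stand-in. `MiniRegistry` is not part of Moxml; it copies the method bodies from the listings above and starts empty, much as a registry created with `mode: :disabled` would. The entity names are only sample data.

```ruby
# Stand-in class, not Moxml::EntityRegistry: method bodies copied from the
# listings on this page so the example runs on its own.
class MiniRegistry
  def initialize
    @by_name = {}
    @by_codepoint = Hash.new { |h, k| h[k] = [] }
  end

  # Accepts a Hash of name => codepoint and returns self for chaining.
  def register(entities)
    entities.each do |name, codepoint|
      @by_name[name] = codepoint
      @by_codepoint[codepoint] << name unless @by_codepoint[codepoint].include?(name)
    end
    self
  end

  def declared?(name)
    @by_name.key?(name)
  end

  def codepoint_for_name(name)
    @by_name[name]
  end

  def names_for_codepoint(codepoint)
    @by_codepoint[codepoint]
  end

  def primary_name_for_codepoint(codepoint)
    @by_codepoint[codepoint]&.first
  end
end

registry = MiniRegistry.new
registry.register("hellip" => 0x2026, "mldr" => 0x2026)
        .register("hellip" => 0x2026) # duplicate name is not added twice

puts registry.declared?("hellip")
puts registry.codepoint_for_name("mldr").to_s(16)
puts registry.names_for_codepoint(0x2026).inspect
puts registry.primary_name_for_codepoint(0x2026)
```

Under this reading, the "primary" name is simply the first name registered for a codepoint. Note also that re-registering an existing name with a different codepoint updates the name lookup but does not remove the name from the old codepoint's list.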