Class: Moxml::EntityRegistry
- Inherits: Object
  - Object
    - Moxml::EntityRegistry
- Defined in: lib/moxml/entity_registry.rb
Overview
EntityRegistry maintains a knowledge base of XML entity definitions.
Data source: W3C XML Core WG Character Entities (bundled), from www.w3.org/2003/entities/2007/htmlmathml
The W3C entity data is bundled in data/w3c_entities.json and loaded from the gem’s data directory. For development, MOXML_ENTITY_DEFINITIONS_PATH can be set to an external copy.
Per W3C XML Core WG guidance:
- Character entities are XML internal general entities providing a name for a single Unicode character.
- Standard XML entities (amp, lt, gt, quot, apos) are implicitly declared per the XML specification.
- External entity sets (like HTML, MathML) can be referenced via DTD parameter entities.
Defined Under Namespace
Classes: EntityDataError
Constant Summary
- ENTITY_DATA_FILE = "w3c_entities.json"
  W3C entity data file name.
- STANDARD_CODEPOINTS = Set[0x26, 0x3C, 0x3E, 0x22, 0x27].freeze
  Standard XML predefined entities (XML spec §4.6): & (amp), < (lt), > (gt), " (quot) and ' (apos).
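The five values in STANDARD_CODEPOINTS are the codepoints of the characters behind the predefined entities. The standalone sketch below copies the constant instead of loading the gem; the STANDARD_NAMES table is a hypothetical helper added only to show the name-to-codepoint correspondence.

```ruby
require "set"

# Copy of the documented constant (not loaded from the gem).
STANDARD_CODEPOINTS = Set[0x26, 0x3C, 0x3E, 0x22, 0x27].freeze

# Hypothetical helper table: predefined entity name => codepoint.
STANDARD_NAMES = {
  "amp"  => 0x26, # &
  "lt"   => 0x3C, # <
  "gt"   => 0x3E, # >
  "quot" => 0x22, # "
  "apos" => 0x27  # '
}.freeze

STANDARD_NAMES.each do |name, cp|
  # Print the entity reference, its codepoint and the character it stands for.
  puts format("&%s; => U+%04X %s", name, cp, cp.chr(Encoding::UTF_8))
end

# Every predefined entity's codepoint is in the set, and nothing else is.
puts STANDARD_NAMES.values.to_set == STANDARD_CODEPOINTS
```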
Instance Attribute Summary
- #by_codepoint ⇒ Hash{Integer => Array<String>} (readonly)
  Codepoint to entity names mapping.
- #by_name ⇒ Hash{String => Integer} (readonly)
  Entity name to codepoint mapping.
Class Method Summary
- .default ⇒ EntityRegistry
  Get the default registry instance (lazy loaded).
- .entity_data ⇒ Hash{String => String}
  Get the raw entity data from the bundled JSON source.
- .reset ⇒ void
  Reset the default registry and the cached entity data (mainly for testing).
Instance Method Summary
- #clear! ⇒ self
  Clear all entities (reset to empty).
- #codepoint_for_name(name) ⇒ Integer?
  Get the Unicode codepoint for an entity name.
- #declared?(name) ⇒ Boolean
  Check if an entity name is declared.
- #initialize(mode: :required, entity_provider: nil) ⇒ EntityRegistry (constructor)
  A new instance of EntityRegistry.
- #load_all ⇒ self
  Load all standard entity sets (no-op; loading happens in initialize).
- #load_html5 ⇒ self
  Load all entities from the W3C HTMLMathML entity set (no-op; loading happens in initialize).
- #load_iso(_set_name = :iso8879) ⇒ self
  Load ISO entity sets (included in HTMLMathML; no-op).
- #load_mathml ⇒ self
  Load MathML entity set (included in HTMLMathML; no-op).
- #names_for_codepoint(codepoint) ⇒ Array<String>
  Get all entity names for a codepoint.
- #primary_name_for_codepoint(codepoint) ⇒ String?
  Get the primary (preferred) entity name for a codepoint.
- #register(entities) ⇒ self
  Register additional entities.
- #restorable_codepoints ⇒ Set<Integer>
  Returns the set of codepoints that could potentially be restored as entities.
- #should_restore?(codepoint, config:) ⇒ Boolean
  Determine if an entity reference should be restored for a codepoint.
- #standard_entity?(codepoint) ⇒ Boolean
  Check if a codepoint is one of the 5 standard XML predefined entities.
Constructor Details
#initialize(mode: :required, entity_provider: nil) ⇒ EntityRegistry
Returns a new instance of EntityRegistry.
# File 'lib/moxml/entity_registry.rb', line 114
def initialize(mode: :required, entity_provider: nil)
  @by_name = {}
  @by_codepoint = Hash.new { |h, k| h[k] = [] }
  @mode = mode
  @entity_provider = entity_provider

  case mode
  when :required
    load_from_entity_data
  when :optional
    load_from_entity_data_optional
  when :custom
    load_custom_entities
  when :disabled
    # Don't load anything - empty registry
  end
end
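Because the case statement has no else branch, a mode other than :required, :optional, :custom or :disabled raises no error and leaves the registry empty, exactly like :disabled. The stand-in class below is hypothetical: it copies only the dispatch shape, and its loaders are stubs that do not read the W3C data (in the gem, :required and :optional call different loaders).

```ruby
# Hypothetical stand-in that copies the constructor's case/when dispatch.
# The loaders are stubs; they do not read the W3C entity data.
class ModeDispatchSketch
  attr_reader :by_name

  def initialize(mode: :required)
    @by_name = {}
    case mode
    when :required, :optional
      load_stub # stands in for the two JSON loaders
    when :custom
      @by_name["custom"] = 0xE000 # placeholder for provider-supplied entries
    when :disabled
      # Don't load anything - empty registry
    end
    # No else branch: any other symbol also leaves @by_name empty.
  end

  private

  def load_stub
    @by_name["amp"] = 0x26
  end
end

# :requried is a deliberate typo to show the silent fall-through.
%i[required optional custom disabled requried].each do |mode|
  puts "#{mode}: #{ModeDispatchSketch.new(mode: mode).by_name.size} entries"
end
```

A caller that wants typos in the mode to fail loudly would need its own validation before constructing the registry.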
Instance Attribute Details
#by_codepoint ⇒ Hash{Integer => Array<String>} (readonly)
Returns codepoint to entity names mapping.
# File 'lib/moxml/entity_registry.rb', line 110
def by_codepoint
  @by_codepoint
end
#by_name ⇒ Hash{String => Integer} (readonly)
Returns entity name to codepoint mapping.
# File 'lib/moxml/entity_registry.rb', line 107
def by_name
  @by_name
end
Class Method Details
.default ⇒ EntityRegistry
Get the default registry instance (lazy loaded)
# File 'lib/moxml/entity_registry.rb', line 42
def default
  @default ||= new
end
.entity_data ⇒ Hash{String => String}
Get the raw entity data from the bundled JSON source
# File 'lib/moxml/entity_registry.rb', line 36
def entity_data
  @entity_data ||= load_entity_data
end
.reset ⇒ void
This method returns an undefined value.
Reset the default registry and the cached entity data (mainly for testing).
# File 'lib/moxml/entity_registry.rb', line 48
def reset
  @default = nil
  @entity_data = nil
end
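.default and .entity_data both memoize with ||=, and .reset clears both caches so the next call rebuilds them. The class below is a hypothetical stand-in for that pattern, not the gem's code; its entity_data returns a placeholder hash and counts how often it is "loaded".

```ruby
# Hypothetical stand-in for the class-level caching used by
# .default, .entity_data and .reset. Not the gem's code.
class LazyRegistrySketch
  @load_count = 0

  class << self
    attr_reader :load_count

    def default
      @default ||= new
    end

    def entity_data
      @entity_data ||= begin
        @load_count += 1 # counts how often the data source is "read"
        { "amp" => "&" } # placeholder for the parsed JSON
      end
    end

    def reset
      @default = nil
      @entity_data = nil
    end
  end
end

first = LazyRegistrySketch.default
LazyRegistrySketch.entity_data
LazyRegistrySketch.entity_data                # served from the cache
puts LazyRegistrySketch.default.equal?(first) # same instance before reset

LazyRegistrySketch.reset
puts LazyRegistrySketch.default.equal?(first) # a new instance after reset
LazyRegistrySketch.entity_data                # data is loaded again
puts LazyRegistrySketch.load_count
```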
Instance Method Details
#clear! ⇒ self
Clear all entities (reset to empty)
# File 'lib/moxml/entity_registry.rb', line 236
def clear!
  @by_name = {}
  @by_codepoint = Hash.new { |h, k| h[k] = [] }
  self
end
#codepoint_for_name(name) ⇒ Integer?
Get the Unicode codepoint for an entity name
# File 'lib/moxml/entity_registry.rb', line 142
def codepoint_for_name(name)
  @by_name[name]
end
#declared?(name) ⇒ Boolean
Check if an entity name is declared
# File 'lib/moxml/entity_registry.rb', line 135
def declared?(name)
  @by_name.key?(name)
end
#load_all ⇒ self
Load all standard entity sets. This is a no-op that returns self; entities are loaded by initialize according to mode.
# File 'lib/moxml/entity_registry.rb', line 229
def load_all
  # All entities are loaded by default from initialize
  self
end
#load_html5 ⇒ self
Load all entities from the W3C HTMLMathML entity set. This method does not load anything itself and just returns self; the entity data is loaded by initialize according to mode.
# File 'lib/moxml/entity_registry.rb', line 207
def load_html5
  # All entities are loaded by default from initialize
  self
end
#load_iso(_set_name = :iso8879) ⇒ self
Load ISO entity sets (included in HTMLMathML). This is a no-op that returns self; the _set_name argument is ignored.
# File 'lib/moxml/entity_registry.rb', line 222
def load_iso(_set_name = :iso8879)
  # All entities are loaded by default from initialize
  self
end
#load_mathml ⇒ self
Load MathML entity set (included in HTMLMathML). This is a no-op that returns self.
# File 'lib/moxml/entity_registry.rb', line 214
def load_mathml
  # All entities are loaded by default from initialize
  self
end
#names_for_codepoint(codepoint) ⇒ Array<String>
Get all entity names for a codepoint. Returns an empty array when no names are registered for it.
# File 'lib/moxml/entity_registry.rb', line 149
def names_for_codepoint(codepoint)
  @by_codepoint[codepoint]
end
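Since by_codepoint is built with an assigning default block (Hash.new { |h, k| h[k] = [] }), looking up an unknown codepoint returns an empty array and also stores that key in the hash. The plain-Ruby sketch below, independent of the gem, shows the effect; it also explains why the &. in primary_name_for_codepoint never actually receives nil.

```ruby
# Same default-block construction that initialize and clear! use.
by_codepoint = Hash.new { |h, k| h[k] = [] }
by_codepoint[0x26] << "amp"

names = by_codepoint[0x2603]   # codepoint with no registered entity
p names                        # an empty Array, not nil
p by_codepoint.key?(0x2603)    # the read stored the key as a side effect
p names.first                  # nil, as primary_name_for_codepoint would return
```

One consequence is that repeated lookups of many unknown codepoints grow by_codepoint with empty entries.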
#primary_name_for_codepoint(codepoint) ⇒ String?
Get the primary (preferred) entity name for a codepoint
# File 'lib/moxml/entity_registry.rb', line 156
def primary_name_for_codepoint(codepoint)
  @by_codepoint[codepoint]&.first
end
#register(entities) ⇒ self
Register additional entities
# File 'lib/moxml/entity_registry.rb', line 195
def register(entities)
  entities.each do |name, codepoint|
    @by_name[name] = codepoint
    @by_codepoint[codepoint] ||= []
    @by_codepoint[codepoint] << name unless @by_codepoint[codepoint].include?(name)
  end
  self
end
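register keeps names per codepoint in registration order and skips duplicates, so the first name registered for a codepoint becomes its primary name. It also never removes a name from its previous codepoint's list. The hypothetical stand-in below reproduces that bookkeeping so it can run without the gem; the entity names and codepoints are sample values, and the remap at the end is invented purely to show the stale entry.

```ruby
# Hypothetical stand-in reproducing register's bookkeeping, runnable
# without the gem. The `||= []` guard is omitted because the default
# block already supplies an array.
class RegisterSketch
  attr_reader :by_name, :by_codepoint

  def initialize
    @by_name = {}
    @by_codepoint = Hash.new { |h, k| h[k] = [] }
  end

  def register(entities)
    entities.each do |name, codepoint|
      @by_name[name] = codepoint
      # Append the name only once per codepoint, keeping registration order.
      @by_codepoint[codepoint] << name unless @by_codepoint[codepoint].include?(name)
    end
    self
  end

  def primary_name_for_codepoint(codepoint)
    @by_codepoint[codepoint].first
  end
end

reg = RegisterSketch.new
reg.register("rarr" => 0x2192, "rightarrow" => 0x2192)
reg.register("rarr" => 0x2192)           # duplicate: list unchanged
p reg.by_codepoint[0x2192]
p reg.primary_name_for_codepoint(0x2192) # first registered name wins

reg.register("rarr" => 0x27F6)           # hypothetical remap of an existing name
p reg.by_name["rarr"]                    # by_name now points at the new codepoint
p reg.by_codepoint[0x2192]               # but the old list still contains "rarr"
```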
#restorable_codepoints ⇒ Set<Integer>
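Returns the set of codepoints that could potentially be restored as entities. Used by DocumentBuilder for O(1) fast-path checks. When no entities are registered, this is STANDARD_CODEPOINTS. The result is memoized on the first call; in the code shown, neither register nor clear! resets @restorable_codepoints, so a set computed earlier does not reflect later changes.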
# File 'lib/moxml/entity_registry.rb', line 184
def restorable_codepoints
  @restorable_codepoints ||= if @by_name.empty?
                               STANDARD_CODEPOINTS
                             else
                               Set.new(@by_name.values).freeze
                             end
end
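How DocumentBuilder uses this set is not shown here. A minimal sketch of such a fast-path check, assuming a hypothetical needs_entity_pass? helper and a sample set, scans a string's codepoints and stops at the first one in the set. Each Set#include? is O(1) on average, so the whole scan is linear in the string length.

```ruby
require "set"

# Hypothetical helper: true only if some character in the text has a
# codepoint in the restorable set; any? stops at the first match.
def needs_entity_pass?(text, restorable)
  text.each_codepoint.any? { |cp| restorable.include?(cp) }
end

# Sample set: the five standard codepoints plus U+00A0 (no-break space).
restorable = Set[0x26, 0x3C, 0x3E, 0x22, 0x27, 0xA0].freeze

puts needs_entity_pass?("plain ascii", restorable)
puts needs_entity_pass?("caf\u00e9 & co", restorable)
puts needs_entity_pass?("a\u00a0b", restorable)
```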
#should_restore?(codepoint, config:) ⇒ Boolean
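Determine if an entity reference should be restored for a codepoint. If the registry has no name for the codepoint, the result is false (this includes the standard entities when the registry is empty, as in :disabled mode). Otherwise, standard XML entities are always restored (required by the XML spec), and non-standard entities are restored only when config.restore_entities is enabled.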
# File 'lib/moxml/entity_registry.rb', line 173
def should_restore?(codepoint, config:)
  name = primary_name_for_codepoint(codepoint)
  return false unless name

  return true if standard_entity?(codepoint)

  config.restore_entities
end
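The decision order (unknown name first, then standard entity, then the config flag) can be exercised in isolation. In this sketch a plain Hash stands in for primary_name_for_codepoint, and Config is a hypothetical Struct exposing only restore_entities; neither is the gem's own type.

```ruby
require "set"

STANDARD = Set[0x26, 0x3C, 0x3E, 0x22, 0x27].freeze
# Hypothetical config object; only restore_entities is consulted.
Config = Struct.new(:restore_entities)

# Mirrors the decision order of should_restore?, with a Hash of
# codepoint => primary name standing in for the registry lookup.
def restore?(codepoint, names, config)
  return false unless names[codepoint]        # no name: never restore
  return true if STANDARD.include?(codepoint) # standard entity: always
  config.restore_entities                     # otherwise: follow config
end

names = { 0x26 => "amp", 0xA9 => "copy" } # sample registry contents
on  = Config.new(true)
off = Config.new(false)

[0x26, 0xA9, 0x2603].each do |cp|
  puts format("U+%04X on=%s off=%s", cp, restore?(cp, names, on), restore?(cp, names, off))
end
```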
#standard_entity?(codepoint) ⇒ Boolean
Check if a codepoint is one of the 5 standard XML predefined entities
# File 'lib/moxml/entity_registry.rb', line 163
def standard_entity?(codepoint)
  STANDARD_CODEPOINTS.include?(codepoint)
end