Class: Moxml::EntityRegistry

Inherits:
Object
Defined in:
lib/moxml/entity_registry.rb

Overview

EntityRegistry maintains a knowledge base of XML entity definitions.

Data source: W3C XML Core WG Character Entities (bundled) www.w3.org/2003/entities/2007/htmlmathml

The W3C entity data is bundled in data/w3c_entities.json and loaded from the gem’s data directory. For development, MOXML_ENTITY_DEFINITIONS_PATH can be set to an external copy.

Per W3C XML Core WG guidance:

  • Character entities are XML internal general entities providing a name for a single Unicode character

  • Standard XML entities (amp, lt, gt, quot, apos) are implicitly declared per XML specification

  • External entity sets (like HTML, MathML) can be referenced via DTD parameter entities

Examples:

Basic usage

registry = Moxml::EntityRegistry.new
registry.declared?("amp")  # => true
registry.codepoint_for_name("amp")  # => 38

Defined Under Namespace

Classes: EntityDataError

Constant Summary

ENTITY_DATA_FILE = "w3c_entities.json"

W3C entity data file name

Constructor Details

#initialize(mode: :required, entity_provider: nil) ⇒ EntityRegistry

Returns a new instance of EntityRegistry.

Parameters:

  • mode (Symbol) (defaults to: :required)

    Loading mode: :required, :optional, :disabled, :custom

  • entity_provider (Proc, nil) (defaults to: nil)

    Custom entity provider proc/lambda



# File 'lib/moxml/entity_registry.rb', line 110

def initialize(mode: :required, entity_provider: nil)
  @by_name = {}
  @by_codepoint = Hash.new { |h, k| h[k] = [] }
  @mode = mode
  @entity_provider = entity_provider

  case mode
  when :required
    load_from_entity_data
  when :optional
    load_from_entity_data_optional
  when :custom
    load_custom_entities
  when :disabled
    # Don't load anything - empty registry
  end
end
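The dispatch above has no else branch, so a mode other than the four listed falls through the case statement and leaves the registry empty, just as :disabled does. The following is a minimal sketch of that behavior only. ModeDispatchSketch and its loaded_from attribute are hypothetical stand-ins; the real loader methods are not shown in this page, so each is replaced by a stub that records which branch ran.

```ruby
# Hypothetical stand-in that mirrors only the case dispatch shown above.
# The real loaders (load_from_entity_data and friends) are not reproduced;
# each branch just records where entities would have come from.
class ModeDispatchSketch
  attr_reader :loaded_from

  def initialize(mode: :required)
    @loaded_from = nil

    case mode
    when :required then @loaded_from = :bundled_data
    when :optional then @loaded_from = :bundled_data_if_available
    when :custom   then @loaded_from = :entity_provider
    when :disabled
      # Nothing is loaded: the registry starts empty.
    end
  end
end

ModeDispatchSketch.new.loaded_from                   # => :bundled_data
ModeDispatchSketch.new(mode: :disabled).loaded_from  # => nil
ModeDispatchSketch.new(mode: :typo).loaded_from      # => nil (no branch matched)
```

If silent fall-through is undesirable in calling code, validating the mode symbol before constructing the registry is one option.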

Instance Attribute Details

#by_codepoint ⇒ Hash{Integer => Array<String>} (readonly)

Returns codepoint to entity names mapping.

Returns:

  • (Hash{Integer => Array<String>})

    codepoint to entity names mapping



# File 'lib/moxml/entity_registry.rb', line 106

def by_codepoint
  @by_codepoint
end

#by_name ⇒ Hash{String => Integer} (readonly)

Returns entity name to codepoint mapping.

Returns:

  • (Hash{String => Integer})

    entity name to codepoint mapping



# File 'lib/moxml/entity_registry.rb', line 103

def by_name
  @by_name
end

Class Method Details

.default ⇒ EntityRegistry

Get the default registry instance (lazy loaded)

Returns:

  • (EntityRegistry)

    the shared default registry instance



# File 'lib/moxml/entity_registry.rb', line 38

def default
  @default ||= new
end

.entity_data ⇒ Hash{String => String}

Get the raw entity data from the bundled JSON source

Returns:

  • (Hash{String => String})

    entity name to character mapping



# File 'lib/moxml/entity_registry.rb', line 32

def entity_data
  @entity_data ||= load_entity_data
end
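The raw data maps names to characters, while by_name maps names to integer codepoints. A minimal sketch of one way to bridge the two is shown below. It assumes every value is a single character, and the sample hash is illustrative only, not the contents of the bundled file; how the library itself performs the conversion is not shown in this page.

```ruby
# Illustrative sample only; the bundled file's real contents are not shown here.
sample_entity_data = { "amp" => "&", "lt" => "<", "nbsp" => "\u00A0" }

# Assumption: each value is a single character, so String#ord
# yields the codepoint of that character.
by_name = sample_entity_data.transform_values(&:ord)

by_name["amp"]   # => 38
by_name["lt"]    # => 60
by_name["nbsp"]  # => 160
```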

.reset ⇒ void

This method returns an undefined value.

Reset the default registry (mainly for testing)



# File 'lib/moxml/entity_registry.rb', line 44

def reset
  @default = nil
  @entity_data = nil
end
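Taken together, default and reset form a memoize-and-invalidate pair on the class itself. The sketch below reproduces only that pattern in a hypothetical stand-in class (RegistrySketch is not part of the library) to show that repeated calls return the same object until reset is called.

```ruby
# Hypothetical stand-in reproducing the class-level memoization shown above.
class RegistrySketch
  class << self
    # Build the instance on first use, then reuse it.
    def default
      @default ||= new
    end

    # Drop the cached instance so the next call to default rebuilds it.
    def reset
      @default = nil
    end
  end
end

first = RegistrySketch.default
same  = RegistrySketch.default
RegistrySketch.reset
fresh = RegistrySketch.default

first.equal?(same)   # => true
first.equal?(fresh)  # => false
```

In a test suite, calling reset in a setup or teardown hook keeps one example's registrations from leaking into the next.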

Instance Method Details

#clear! ⇒ self

Clear all entities (reset to empty)

Returns:

  • (self)


# File 'lib/moxml/entity_registry.rb', line 200

def clear!
  @by_name = {}
  @by_codepoint = Hash.new { |h, k| h[k] = [] }
  self
end

#codepoint_for_name(name) ⇒ Integer?

Get the Unicode codepoint for an entity name

Parameters:

  • name (String)

    entity name

Returns:

  • (Integer, nil)

    codepoint or nil if not found



# File 'lib/moxml/entity_registry.rb', line 138

def codepoint_for_name(name)
  @by_name[name]
end

#declared?(name) ⇒ Boolean

Check if an entity name is declared

Parameters:

  • name (String)

    entity name (e.g., “amp”, “nbsp”)

Returns:

  • (Boolean)


# File 'lib/moxml/entity_registry.rb', line 131

def declared?(name)
  @by_name.key?(name)
end

#load_all ⇒ self

Load all standard entity sets

Returns:

  • (self)


# File 'lib/moxml/entity_registry.rb', line 193

def load_all
  # All entities are loaded by default from initialize
  self
end

#load_html5 ⇒ self

Load all entities from the W3C HTMLMathML entity set. The method performs no loading itself; in the default mode the entities are already loaded by initialize.

Returns:

  • (self)


# File 'lib/moxml/entity_registry.rb', line 171

def load_html5
  # All entities are loaded by default from initialize
  self
end

#load_iso(_set_name = :iso8879) ⇒ self

Load ISO entity sets (included in HTMLMathML)

Parameters:

  • _set_name (Symbol) (defaults to: :iso8879)

    (ignored, all loaded together)

Returns:

  • (self)


# File 'lib/moxml/entity_registry.rb', line 186

def load_iso(_set_name = :iso8879)
  # All entities are loaded by default from initialize
  self
end

#load_mathml ⇒ self

Load MathML entity set (included in HTMLMathML)

Returns:

  • (self)


# File 'lib/moxml/entity_registry.rb', line 178

def load_mathml
  # All entities are loaded by default from initialize
  self
end

#names_for_codepoint(codepoint) ⇒ Array<String>

Get all entity names for a codepoint

Parameters:

  • codepoint (Integer)

    Unicode codepoint

Returns:

  • (Array<String>)

    entity names mapping to this codepoint



# File 'lib/moxml/entity_registry.rb', line 145

def names_for_codepoint(codepoint)
  @by_codepoint[codepoint]
end

#primary_name_for_codepoint(codepoint) ⇒ String?

Get the primary (preferred) entity name for a codepoint

Parameters:

  • codepoint (Integer)

    Unicode codepoint

Returns:

  • (String, nil)

    primary entity name or nil



# File 'lib/moxml/entity_registry.rb', line 152

def primary_name_for_codepoint(codepoint)
  @by_codepoint[codepoint]&.first
end
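Both codepoint lookups read from a Hash built with a default block, which affects what a miss looks like. The sketch below uses a plain Hash constructed the same way as in initialize (no library code involved) to show that a missing key yields an empty array, that the lookup stores the key as a side effect, and that first on the empty array is what produces nil.

```ruby
# Same construction as @by_codepoint in initialize.
by_codepoint = Hash.new { |hash, key| hash[key] = [] }
by_codepoint[38] << "amp"

missing = by_codepoint[9999]          # the default block runs here
stored  = by_codepoint.key?(9999)     # the block also stored the key
primary = by_codepoint[12345]&.first  # [].first, not a nil receiver

missing                 # => []
stored                  # => true
primary                 # => nil
by_codepoint[38].first  # => "amp"
```

One consequence is that by_codepoint can accumulate empty entries for codepoints that were only ever looked up, which matters if calling code iterates over the hash.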

#register(entities) ⇒ self

Register additional entities

Parameters:

  • entities (Hash{String => Integer})

    name => codepoint mapping

Returns:

  • (self)


# File 'lib/moxml/entity_registry.rb', line 159

def register(entities)
  entities.each do |name, codepoint|
    @by_name[name] = codepoint
    @by_codepoint[codepoint] ||= []
    @by_codepoint[codepoint] << name unless @by_codepoint[codepoint].include?(name)
  end
  self
end
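The sketch below copies the register body above into a hypothetical stand-in class (RegisterSketch, not part of the library) so its behavior can be exercised in isolation. It shows three things: several names can share one codepoint, repeated registration does not duplicate a name, and calls can be chained because self is returned. The entity name "demo" and the private-use codepoints are made up for illustration.

```ruby
# Hypothetical stand-in holding only the two indexes and the register body.
class RegisterSketch
  attr_reader :by_name, :by_codepoint

  def initialize
    @by_name = {}
    @by_codepoint = Hash.new { |h, k| h[k] = [] }
  end

  def register(entities)
    entities.each do |name, codepoint|
      @by_name[name] = codepoint
      @by_codepoint[codepoint] ||= []
      @by_codepoint[codepoint] << name unless @by_codepoint[codepoint].include?(name)
    end
    self
  end
end

registry = RegisterSketch.new
registry
  .register({ "nbsp" => 160, "NonBreakingSpace" => 160 })
  .register({ "nbsp" => 160 }) # repeated name is not added twice

registry.by_codepoint[160]  # => ["nbsp", "NonBreakingSpace"]

# Re-registering a name under a different codepoint updates by_name,
# but the name stays in the old codepoint's list.
registry.register({ "demo" => 0xE000 }).register({ "demo" => 0xE001 })
registry.by_name["demo"]       # => 57345
registry.by_codepoint[0xE000]  # => ["demo"]
registry.by_codepoint[0xE001]  # => ["demo"]
```

As the last three lines suggest, the method as shown does not remove a name from its previous codepoint when that name is remapped, so callers that remap names may want to clear the registry first.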