Module: Coradoc
Extended by: Configurable
Defined in:
lib/coradoc/coradoc.rb,
lib/coradoc.rb,
lib/coradoc/cli.rb,
lib/coradoc/hooks.rb,
lib/coradoc/input.rb,
lib/coradoc/query.rb,
lib/coradoc/errors.rb,
lib/coradoc/logger.rb,
lib/coradoc/output.rb,
lib/coradoc/version.rb,
lib/coradoc/visitor.rb,
lib/coradoc/registry.rb,
lib/coradoc/transform.rb,
lib/coradoc/core_model.rb,
lib/coradoc/validation.rb,
lib/coradoc/configurable.rb,
lib/coradoc/format_module.rb,
lib/coradoc/core_model/toc.rb,
lib/coradoc/core_model/base.rb,
lib/coradoc/core_model/term.rb,
lib/coradoc/core_model/block.rb,
lib/coradoc/core_model/image.rb,
lib/coradoc/core_model/table.rb,
lib/coradoc/document_builder.rb,
lib/coradoc/core_model/builder.rb,
lib/coradoc/processor_registry.rb,
lib/coradoc/core_model/footnote.rb,
lib/coradoc/core_model/metadata.rb,
lib/coradoc/serializer/registry.rb,
lib/coradoc/core_model/list_item.rb,
lib/coradoc/document_manipulator.rb,
lib/coradoc/core_model/list_block.rb,
lib/coradoc/core_model/bibliography.rb,
lib/coradoc/core_model/toc_generator.rb,
lib/coradoc/core_model/inline_element.rb,
lib/coradoc/core_model/definition_item.rb,
lib/coradoc/core_model/definition_list.rb,
lib/coradoc/core_model/annotation_block.rb,
lib/coradoc/core_model/children_content.rb,
lib/coradoc/core_model/builder/detection.rb,
lib/coradoc/core_model/element_attribute.rb,
lib/coradoc/core_model/bibliography_entry.rb,
lib/coradoc/core_model/structural_element.rb,
lib/coradoc/core_model/builder/list_builder.rb,
lib/coradoc/core_model/builder/text_builder.rb,
lib/coradoc/core_model/builder/block_builder.rb,
lib/coradoc/core_model/builder/element_builder.rb
Overview
Coradoc - A hub-and-spoke document transformation library
Coradoc provides a unified document model (CoreModel) and transformation infrastructure for converting between document formats such as AsciiDoc, HTML, and Markdown.
## Architecture
Coradoc uses a hub-and-spoke architecture where CoreModel acts as the canonical document representation. Each format (AsciiDoc, HTML, Markdown) has its own model and transformers to/from CoreModel.
```
Source Format → Source Model → CoreModel → Target Model → Target Format
```
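As a rough illustration of the hub-and-spoke flow, the sketch below routes two toy formats through one shared intermediate model. `ToyCore`, the `FORMATS` table and `toy_convert` are hypothetical stand-ins, not Coradoc's API. The point is that each format supplies only one transformer into the hub and one out of it, so adding a format does not require a dedicated converter for every other format.

```ruby
# Hypothetical canonical "hub" model: an ordered list of paragraph strings.
ToyCore = Struct.new(:paragraphs)

# Each spoke provides one parser (text -> hub) and one serializer (hub -> text).
FORMATS = {
  plain: {
    parse: ->(text) { ToyCore.new(text.split(/\n{2,}/)) },
    serialize: ->(core) { core.paragraphs.join("\n\n") }
  },
  html: {
    # Toy HTML handling: no escaping, only <p> elements are recognised.
    parse: ->(text) { ToyCore.new(text.scan(%r{<p>(.*?)</p>}m).flatten) },
    serialize: ->(core) { core.paragraphs.map { |para| "<p>#{para}</p>" }.join("\n") }
  }
}.freeze

# Source text -> hub model -> target text, mirroring the pipeline diagram.
def toy_convert(text, from:, to:)
  core = FORMATS.fetch(from)[:parse].call(text)
  FORMATS.fetch(to)[:serialize].call(core)
end

puts toy_convert("One\n\nTwo", from: :plain, to: :html)
```

In Coradoc itself the same two steps are `parse` (source text to CoreModel) and `serialize` (CoreModel to target text), both documented below.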
## Quick Start
Defined Under Namespace
Modules: Configurable, CoreModel, FormatModule, Hooks, Input, Output, ProcessorRegistry, Query, Serializer, Transform, Validation, Visitor
Classes: CLI, DocumentBuilder, DocumentManipulator, Error, FileNotFoundError, Logger, ParseError, Registry, TransformationError, UnsupportedFormatError, ValidationError
Constant Summary
- ERROR_SUGGESTIONS =
Suggestion patterns for common parsing errors
These patterns are matched against error messages and source content to provide helpful suggestions for fixing common issues.
```ruby
[
  { pattern: /unterminated.*string|unexpected.*end.*of.*input|expected.*["']/i,
    suggestion: 'Check for unclosed quotes or strings',
    examples: ["'text'", '"text"'] },
  { pattern: /unexpected.*indentation|indentation.*error|inconsistent.*indent/i,
    suggestion: 'Check indentation - use consistent spaces or tabs',
    examples: [' indented line', ' nested item'] },
  { pattern: /missing.*separator|expected.*delimiter|missing.*comma/i,
    suggestion: 'Add missing separator between elements',
    examples: ['item1, item2', 'key: value'] },
  { pattern: /invalid.*attribute|unknown.*attribute|attribute.*not.*allowed/i,
    suggestion: 'Check attribute spelling and allowed values',
    examples: ['[role=example]', '[source,ruby]'] },
  { pattern: /invalid.*heading|heading.*level|expected.*heading/i,
    suggestion: 'Use valid heading syntax with = or # markers',
    examples: ['= Level 1', '== Level 2', '### Level 3'] },
  { pattern: /invalid.*list|list.*marker|expected.*list.*item/i,
    suggestion: 'Use correct list markers (*, -, ., or numbered)',
    examples: ['* bullet', '. ordered', 'term:: definition'] },
  { pattern: /invalid.*link|malformed.*url|link.*syntax/i,
    suggestion: 'Use correct link syntax: text[url] or link:url[]',
    examples: ['Google[https://google.com]', 'link:file.adoc[]'] },
  { pattern: /invalid.*table|table.*delimiter|expected.*separator/i,
    suggestion: 'Check table syntax with | delimiters',
    examples: ["|===\n| Cell 1 | Cell 2\n|==="] },
  { pattern: /invalid.*block|block.*delimiter|unterminated.*block/i,
    suggestion: 'Ensure block delimiters match (----, ****, ====, etc.)',
    examples: ["----\ncode\n----", "====\nexample\n===="] },
  { pattern: /invalid.*macro|unknown.*macro|macro.*syntax/i,
    suggestion: 'Check macro syntax: name:target[attributes]',
    examples: ['include::file.adoc[]', 'image::image.png[]'] }
].freeze
```
- VERSION =
'2.0.1'
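ERROR_SUGGESTIONS is described as a table of patterns matched against error messages. A minimal sketch of such a lookup, assuming a first-match scan over the error message only (the library also consults source content, and its actual matcher is not shown on this page). `SUGGESTIONS` copies two entries from the constant; `suggestion_for` is a hypothetical helper.

```ruby
# Two entries copied from ERROR_SUGGESTIONS, for illustration only.
SUGGESTIONS = [
  { pattern: /unterminated.*string|unexpected.*end.*of.*input|expected.*["']/i,
    suggestion: 'Check for unclosed quotes or strings',
    examples: ["'text'", '"text"'] },
  { pattern: /invalid.*block|block.*delimiter|unterminated.*block/i,
    suggestion: 'Ensure block delimiters match (----, ****, ====, etc.)',
    examples: ["----\ncode\n----", "====\nexample\n===="] }
].freeze

# Hypothetical helper: return the first entry whose pattern matches, or nil.
def suggestion_for(message, table = SUGGESTIONS)
  table.find { |entry| entry[:pattern].match?(message) }
end

hit = suggestion_for('Parse error: unterminated block at line 12')
puts hit[:suggestion] if hit
```

With a first-match rule, entry order matters: several patterns overlap (for example `expected.*separator` and `missing.*separator`), so earlier entries win.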
Class Method Summary
- .binary_format?(format) ⇒ Boolean
  Check if a format requires binary (file path) input.
- .build(&block) ⇒ Object
- .config ⇒ Configuration
  Shortcut to configuration.
- .configure {|Configuration| ... } ⇒ void
  Shortcut to configure.
- .convert(text, from:, to:, **options) ⇒ String
  Convert document text from one format to another.
- .convert_file(path, from: nil, to:, **options) ⇒ String
  Convert a file from one format to another.
- .describe_element(elem) ⇒ String
  Describe an element for display.
- .detect_format(filename) ⇒ Symbol?
  Detect format from a file extension.
- .document_stats(doc) ⇒ Hash
  Gather statistics about a parsed document.
- .file_info(path) ⇒ Hash
  Get file metadata for display.
- .format_capabilities ⇒ Hash<Symbol, Hash<Symbol, Boolean>>
  Get capability summary for all registered formats.
- .get_format(format_name) ⇒ Module?
  Get a registered format.
- .manipulate(document) ⇒ DocumentManipulator
  Create a DocumentManipulator for chainable operations.
- .normalize_format(name) ⇒ Symbol?
  Normalize a format name string to a symbol.
- .parse(text, format:) ⇒ Coradoc::CoreModel::Base, Object
  Parse text to a document model.
- .parse_file(path, format: nil) ⇒ Coradoc::CoreModel::Base
  Parse a document from a file path.
- .parse_format?(format) ⇒ Boolean
  Check if a format supports parsing (reading input).
- .register_format(format_name, format_module, **options) ⇒ void
  Register a format gem.
- .registered_formats ⇒ Array<Symbol>
  List all registered formats.
- .registry ⇒ Registry
  Get the format registry.
- .resolve_output_format(output_file, default: :html) ⇒ Symbol
  Resolve the output format from a filename, with a default.
- .serialize(model, to:, **options) ⇒ String
  Serialize a CoreModel to a specific format.
- .serialize_format?(format) ⇒ Boolean
  Check if a format supports serialization (writing output).
- .strip_unicode(string, only: nil) ⇒ String
  Strip unicode whitespace from a string.
- .to_core(model) ⇒ Coradoc::CoreModel::Base
  Transform a model to CoreModel.
- .validate_file(path, format: nil) ⇒ Coradoc::Validation::Result
  Validate a document file.
Methods included from Configurable
load_configuration, reset_configuration!
Class Method Details
.binary_format?(format) ⇒ Boolean
Check if a format requires binary (file path) input
```ruby
# File 'lib/coradoc/coradoc.rb', line 258

def binary_format?(format)
  opts = registry.(format)
  opts&.fetch(:binary, false) == true
end
```
.build(&block) ⇒ Object
```ruby
# File 'lib/coradoc/document_builder.rb', line 191

def self.build(&block)
  DocumentBuilder.build(&block)
end
```
.config ⇒ Configuration
Shortcut to configuration
```ruby
# File 'lib/coradoc/configurable.rb', line 512

def self.config
  Configurable.configuration
end
```
.configure {|Configuration| ... } ⇒ void
This method returns an undefined value.
Shortcut to configure
```ruby
# File 'lib/coradoc/configurable.rb', line 520

def self.configure(&block)
  Configurable.configure(&block) if block_given?
end
```
.convert(text, from:, to:, **options) ⇒ String
Convert document text from one format to another
This is the main entry point for format conversion. It handles the complete pipeline: parse → transform to CoreModel → transform to target → serialize.
```ruby
# File 'lib/coradoc/coradoc.rb', line 135

def convert(text, from:, to:, **)
  # Parse to CoreModel
  core = parse(text, format: from)

  # Convert to target format
  serialize(core, to: to, **)
end
```
.convert_file(path, from: nil, to:, **options) ⇒ String
Convert a file from one format to another
```ruby
# File 'lib/coradoc/coradoc.rb', line 246

def convert_file(path, from: nil, to:, **)
  source_format = from || detect_format(path)
  raise UnsupportedFormatError, "Could not detect format for: #{path}" unless source_format

  core = parse_file(path, format: source_format)
  serialize(core, to: to, **)
end
```
.describe_element(elem) ⇒ String
Describe an element for display
```ruby
# File 'lib/coradoc/coradoc.rb', line 378

def describe_element(elem)
  return elem.to_s unless elem.is_a?(CoreModel::Base)

  type = elem.class.name.split('::').last
  if elem.respond_to?(:title) && elem.title
    "#{type}: #{elem.title}"
  elsif elem.respond_to?(:content) && elem.content
    preview = elem.content.to_s[0..50]
    preview += '...' if elem.content.to_s.length > 50
    "#{type}: #{preview}"
  else
    type
  end
end
```
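To show what describe_element returns for each branch, here is a standalone copy of its logic run against hypothetical `Demo::Base`, `Demo::Section` and `Demo::Paragraph` classes (stand-ins for CoreModel types, which carry many more attributes). Note that `[0..50]` is an inclusive range, so the preview keeps 51 characters before the ellipsis.

```ruby
# Hypothetical stand-ins for CoreModel::Base and two subclasses.
module Demo
  class Base; end

  class Section < Base
    attr_reader :title

    def initialize(title)
      @title = title
    end
  end

  class Paragraph < Base
    attr_reader :content

    def initialize(content)
      @content = content
    end
  end
end

# Same branching as Coradoc.describe_element, with Demo::Base in place of CoreModel::Base.
def describe_element(elem)
  return elem.to_s unless elem.is_a?(Demo::Base)

  type = elem.class.name.split('::').last
  if elem.respond_to?(:title) && elem.title
    "#{type}: #{elem.title}"
  elsif elem.respond_to?(:content) && elem.content
    preview = elem.content.to_s[0..50]
    preview += '...' if elem.content.to_s.length > 50
    "#{type}: #{preview}"
  else
    type
  end
end

puts describe_element(Demo::Section.new('Introduction')) # Section: Introduction
puts describe_element(Demo::Paragraph.new(nil))          # Paragraph
puts describe_element(42)                                # 42
```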
.detect_format(filename) ⇒ Symbol?
Detect format from a file extension
```ruby
# File 'lib/coradoc/coradoc.rb', line 196

def detect_format(filename)
  ext = File.extname(filename).downcase
  registry.each do |name, _mod|
    opts = registry.(name)
    return name if opts[:extensions]&.include?(ext)
  end
  nil
end
```
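detect_format compares the output of `File.extname` (which keeps the leading dot) against each format's `:extensions` option, so extensions are matched in their dotted form such as `.adoc`. The sketch below swaps the registry for a hypothetical `FORMAT_OPTIONS` hash (the extension lists are assumptions, not the values registered by real format gems) and also mirrors resolve_output_format's fallback to a default.

```ruby
# Hypothetical per-format options, standing in for the registry's metadata.
FORMAT_OPTIONS = {
  asciidoc: { extensions: ['.adoc', '.asciidoc'] },
  markdown: { extensions: ['.md', '.markdown'] },
  html: { extensions: ['.html', '.htm'] }
}.freeze

# Mirrors Coradoc.detect_format: first format whose extensions include the file's.
def detect_format(filename)
  ext = File.extname(filename).downcase # keeps the dot: "README.MD" -> ".md"
  FORMAT_OPTIONS.each do |name, opts|
    return name if opts[:extensions]&.include?(ext)
  end
  nil
end

# Mirrors Coradoc.resolve_output_format: fall back when detection fails.
def resolve_output_format(output_file, default: :html)
  return default unless output_file

  detect_format(output_file) || default
end

p detect_format('docs/README.MD')   # :markdown
p detect_format('notes.txt')        # nil
p resolve_output_format(nil)        # :html
p resolve_output_format('out.adoc') # :asciidoc
```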
.document_stats(doc) ⇒ Hash
Gather statistics about a parsed document
```ruby
# File 'lib/coradoc/coradoc.rb', line 361

def document_stats(doc)
  stats = {}
  stats[:title] = doc.title if doc.respond_to?(:title) && doc.title

  if doc.respond_to?(:children)
    stats[:child_count] = count_elements(doc)
    stats[:element_counts] = count_element_types(doc)
  end

  stats
end
```
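document_stats delegates to the helpers `count_elements` and `count_element_types`, whose source is not on this page. One plausible shape, assuming a recursive walk over `children` that counts every descendant (the real helpers may count differently, for example only direct children), is sketched below with a hypothetical `Node` struct.

```ruby
# Hypothetical tree node with a type tag, optional title and child list.
Node = Struct.new(:type, :title, :children)

# Assumed behaviour: count every descendant, not just direct children.
def count_elements(node)
  (node.children || []).sum { |child| 1 + count_elements(child) }
end

# Assumed behaviour: tally descendants by their type tag.
def count_element_types(node, tally = Hash.new(0))
  (node.children || []).each do |child|
    tally[child.type] += 1
    count_element_types(child, tally)
  end
  tally
end

# Same body as Coradoc.document_stats.
def document_stats(doc)
  stats = {}
  stats[:title] = doc.title if doc.respond_to?(:title) && doc.title
  if doc.respond_to?(:children)
    stats[:child_count] = count_elements(doc)
    stats[:element_counts] = count_element_types(doc)
  end
  stats
end

doc = Node.new(:document, 'Guide', [
  Node.new(:section, 'Intro', [Node.new(:paragraph, nil, []), Node.new(:paragraph, nil, [])]),
  Node.new(:paragraph, nil, [])
])
p document_stats(doc)
```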
.file_info(path) ⇒ Hash
Get file metadata for display
```ruby
# File 'lib/coradoc/coradoc.rb', line 332

def file_info(path)
  fmt = detect_format(path)
  info = { size: File.size(path), format: fmt }
  info[:lines] = File.read(path).lines.count unless binary_format?(fmt)
  info
end
```
.format_capabilities ⇒ Hash<Symbol, Hash<Symbol, Boolean>>
Get capability summary for all registered formats
Returns a hash mapping each format name to its capabilities (parse: bool, serialize: bool). Useful for CLI display and introspection.
```ruby
# File 'lib/coradoc/coradoc.rb', line 308

def format_capabilities
  registered_formats.each_with_object({}) do |name, caps|
    caps[name] = {
      parse: parse_format?(name),
      serialize: serialize_format?(name)
    }
  end
end
```
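format_capabilities combines parse_format? and serialize_format?, both listed further down. The sketch copies their logic against two hypothetical modules, `FakeHtml` and `FakePdfReader`, held in a plain `CAP_FORMATS` hash instead of the registry. It shows that a module counts as parseable if it responds to `parse_to_core` or `parse`, and as serializable unless it defines `serialize?` returning false; the module does not even need a `serialize` method to be reported as serializable.

```ruby
# Hypothetical format modules standing in for registered format gems.
module FakeHtml
  def self.parse_to_core(text)
    text
  end

  def self.serialize(model, **_options)
    model.to_s
  end
end

module FakePdfReader
  def self.parse_to_core(path)
    path
  end

  # Opts out of serialization explicitly.
  def self.serialize?
    false
  end
end

CAP_FORMATS = { html: FakeHtml, pdf: FakePdfReader }.freeze

def parse_format?(format)
  mod = CAP_FORMATS[format]
  mod&.respond_to?(:parse_to_core) || mod&.respond_to?(:parse) || false
end

def serialize_format?(format)
  mod = CAP_FORMATS[format]
  return false unless mod
  return mod.serialize? if mod.respond_to?(:serialize?)

  true
end

def format_capabilities
  CAP_FORMATS.keys.each_with_object({}) do |name, caps|
    caps[name] = { parse: parse_format?(name), serialize: serialize_format?(name) }
  end
end

p format_capabilities
```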
.get_format(format_name) ⇒ Module?
Get a registered format
```ruby
# File 'lib/coradoc/coradoc.rb', line 78

def get_format(format_name)
  registry.get(format_name)
end
```
.manipulate(document) ⇒ DocumentManipulator
Create a DocumentManipulator for chainable operations
```ruby
# File 'lib/coradoc/coradoc.rb', line 184

def manipulate(document)
  DocumentManipulator.new(document)
end
```
.normalize_format(name) ⇒ Symbol?
Normalize a format name string to a symbol
Handles common aliases like “adoc” → :asciidoc, “md” → :markdown.
```ruby
# File 'lib/coradoc/coradoc.rb', line 269

def normalize_format(name)
  return nil unless name

  key = name.to_s.downcase
  registry.each do |fmt_name, _mod|
    opts = registry.(fmt_name)
    return fmt_name if opts[:aliases]&.include?(key)
  end
  key.to_sym
end
```
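normalize_format looks the lowercased name up in each format's `:aliases` option and otherwise just symbolizes it, so an unregistered name such as `"rst"` comes back as `:rst` rather than nil. A sketch with a hypothetical `ALIAS_OPTIONS` hash in place of the registry (alias values are assumed to be lowercase strings, since the lookup key is downcased):

```ruby
# Hypothetical alias options, standing in for the registry's metadata.
ALIAS_OPTIONS = {
  asciidoc: { aliases: %w[adoc asciidoc] },
  markdown: { aliases: %w[md markdown] }
}.freeze

# Same logic as Coradoc.normalize_format.
def normalize_format(name)
  return nil unless name

  key = name.to_s.downcase
  ALIAS_OPTIONS.each do |fmt_name, opts|
    return fmt_name if opts[:aliases]&.include?(key)
  end
  key.to_sym
end

p normalize_format('ADOC') # :asciidoc
p normalize_format(:md)    # :markdown
p normalize_format('rst')  # :rst
p normalize_format(nil)    # nil
```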
.parse(text, format:) ⇒ Coradoc::CoreModel::Base, Object
Parse text to a document model
This is the main entry point for parsing documents. It automatically selects the appropriate parser based on the format.
```ruby
# File 'lib/coradoc/coradoc.rb', line 105

def parse(text, format:)
  format_module = get_format(format)
  unless format_module
    raise UnsupportedFormatError,
          "Format '#{format}' is not registered. " \
          "Available formats: #{registered_formats.join(', ')}"
  end

  text = Hooks.invoke(:before_parse, text, format: format)
  result = format_module.parse_to_core(text)
  Hooks.invoke(:after_parse, result, format: format)
end
```
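parse threads the input through `Hooks.invoke(:before_parse, ...)` before handing it to the format module, and the parse result through `:after_parse`, using the return value in both cases. The Hooks API beyond `invoke` is not shown on this page, so the sketch below uses a hypothetical `MiniHooks` module with a `register` method, and assumes each hook receives the previous hook's return value. It illustrates the pattern only, not Coradoc's implementation.

```ruby
# Hypothetical hook registry; Coradoc's real Hooks module may differ.
module MiniHooks
  @hooks = Hash.new { |hash, event| hash[event] = [] }

  def self.register(event, &block)
    @hooks[event] << block
  end

  # Pass the value through every hook for the event, in registration order.
  def self.invoke(event, value, **context)
    @hooks[event].reduce(value) { |acc, hook| hook.call(acc, **context) }
  end
end

# Normalise line endings before parsing; tag the result afterwards.
MiniHooks.register(:before_parse) { |text, **_ctx| text.gsub("\r\n", "\n") }
MiniHooks.register(:after_parse) { |result, format:| result.merge(source_format: format) }

def toy_parse(text, format:)
  text = MiniHooks.invoke(:before_parse, text, format: format)
  result = { lines: text.lines.map(&:chomp) } # stand-in for format_module.parse_to_core
  MiniHooks.invoke(:after_parse, result, format: format)
end

p toy_parse("a\r\nb", format: :plain)
```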
.parse_file(path, format: nil) ⇒ Coradoc::CoreModel::Base
Parse a document from a file path
Handles both text formats (reads file content) and binary formats (passes file path directly to the format module).
```ruby
# File 'lib/coradoc/coradoc.rb', line 218

def parse_file(path, format: nil)
  raise FileNotFoundError, path unless File.exist?(path)

  source_format = format || detect_format(path)
  raise UnsupportedFormatError, "Could not detect format for: #{path}" unless source_format

  format_module = get_format(source_format)
  raise UnsupportedFormatError, "Format '#{source_format}' is not registered" unless format_module

  if binary_format?(source_format)
    format_module.parse_to_core(path)
  else
    content = File.read(path)
    parse(content, format: source_format)
  end
end
```
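The central branch in parse_file is the binary check: binary formats receive the file path, text formats receive the file contents. A self-contained sketch with hypothetical `:text` and `:docx` entries; only the `binary: true` option mirrors the listing (binary_format? reads `opts&.fetch(:binary, false)`), the parsers are lambdas invented for illustration, and `ArgumentError` stands in for FileNotFoundError. Tempfile is used so it runs without fixtures.

```ruby
require 'tempfile'

# Hypothetical options and parsers standing in for registered format modules.
DISPATCH_OPTIONS = { text: {}, docx: { binary: true } }.freeze
PARSERS = {
  text: ->(content) { { kind: :text, body: content } },
  docx: ->(path) { { kind: :binary, path: path } } # a real parser would open the file itself
}.freeze

def binary_format?(format)
  DISPATCH_OPTIONS[format]&.fetch(:binary, false) == true
end

def parse_file_sketch(path, format:)
  # Stand-in for Coradoc's FileNotFoundError.
  raise ArgumentError, "missing file: #{path}" unless File.exist?(path)

  if binary_format?(format)
    PARSERS.fetch(format).call(path)            # binary: hand over the path
  else
    PARSERS.fetch(format).call(File.read(path)) # text: hand over the contents
  end
end

Tempfile.create(['demo', '.txt']) do |file|
  file.write('hello')
  file.flush
  p parse_file_sketch(file.path, format: :text)
  p parse_file_sketch(file.path, format: :docx)[:kind] # :binary
end
```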
.parse_format?(format) ⇒ Boolean
Check if a format supports parsing (reading input)
```ruby
# File 'lib/coradoc/coradoc.rb', line 297

def parse_format?(format)
  mod = get_format(format)
  mod&.respond_to?(:parse_to_core) || mod&.respond_to?(:parse) || false
end
```
.register_format(format_name, format_module, **options) ⇒ void
This method returns an undefined value.
Register a format gem
```ruby
# File 'lib/coradoc/coradoc.rb', line 69

def register_format(format_name, format_module, **)
  registry.register(format_name, format_module, )
  FormatModule.validate!(format_module, format_name)
end
```
.registered_formats ⇒ Array<Symbol>
List all registered formats
```ruby
# File 'lib/coradoc/coradoc.rb', line 85

def registered_formats
  registry.list
end
```
.registry ⇒ Registry
Get the format registry
```ruby
# File 'lib/coradoc/coradoc.rb', line 59

def registry
  @registry ||= Registry.new
end
```
.resolve_output_format(output_file, default: :html) ⇒ Symbol
Resolve the output format from a filename, with a default
```ruby
# File 'lib/coradoc/coradoc.rb', line 322

def resolve_output_format(output_file, default: :html)
  return default unless output_file

  detect_format(output_file) || default
end
```
.serialize(model, to:, **options) ⇒ String
Serialize a CoreModel to a specific format
```ruby
# File 'lib/coradoc/coradoc.rb', line 165

def serialize(model, to:, **)
  format_module = get_format(to)
  raise UnsupportedFormatError, "Format '#{to}' is not registered" unless format_module

  model = Hooks.invoke(:before_serialize, model, format: to)
  result = format_module.serialize(model, **)
  Hooks.invoke(:after_serialize, result, format: to)
end
```
.serialize_format?(format) ⇒ Boolean
Check if a format supports serialization (writing output)
```ruby
# File 'lib/coradoc/coradoc.rb', line 284

def serialize_format?(format)
  mod = get_format(format)
  return false unless mod

  return mod.serialize? if mod.respond_to?(:serialize?)

  true
end
```
.strip_unicode(string, only: nil) ⇒ String
Strip unicode whitespace from a string
```ruby
# File 'lib/coradoc/coradoc.rb', line 398

def strip_unicode(string, only: nil)
  return string if string.nil?

  case only
  when :begin
    string.sub(/^\p{Zs}+/, '')
  when :end
    string.sub(/\p{Zs}+$/, '')
  else
    string.sub(/^\p{Zs}+/, '').sub(/\p{Zs}+$/, '')
  end
end
```
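strip_unicode removes only characters in the Unicode `Zs` (space separator) category, such as U+0020, the no-break space U+00A0 and the ideographic space U+3000. Tabs and newlines are control characters, not `Zs`, so they are left alone. A standalone copy of the method shows the three `only:` modes:

```ruby
# Same body as Coradoc.strip_unicode.
def strip_unicode(string, only: nil)
  return string if string.nil?

  case only
  when :begin
    string.sub(/^\p{Zs}+/, '')
  when :end
    string.sub(/\p{Zs}+$/, '')
  else
    string.sub(/^\p{Zs}+/, '').sub(/\p{Zs}+$/, '')
  end
end

padded = "\u00A0 hi \u3000"
p strip_unicode(padded)              # "hi"
p strip_unicode(padded, only: :begin) # "hi \u3000"
p strip_unicode("\thi")              # "\thi" (tab is not Zs)
```

Because `^` and `$` are line anchors in Ruby, a multi-line string such as `"a\n  b"` has the spaces at the start of its second line removed, giving `"a\nb"`. Anchoring with `\A` and `\z` would restrict stripping to the ends of the whole string.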
.to_core(model) ⇒ Coradoc::CoreModel::Base
Transform a model to CoreModel
```ruby
# File 'lib/coradoc/coradoc.rb', line 147

def to_core(model)
  return model if model.is_a?(CoreModel::Base)

  registry.each_value do |format_module|
    next unless format_module.respond_to?(:handles_model?) && format_module.handles_model?(model)

    return format_module.to_core(model)
  end

  raise TransformationError, "No transformer found for #{model.class}"
end
```
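to_core returns CoreModel instances unchanged, otherwise asks each registered format module whether it `handles_model?` the object and delegates to the first that does. The sketch reproduces that dispatch with a hypothetical `FakeMarkdown` module and `MarkdownDoc` model; `CoreBase`, `CoreDoc` and a local `TransformationError` stand in for the Coradoc classes.

```ruby
# Local stand-ins for CoreModel::Base and Coradoc::TransformationError.
class CoreBase; end
class TransformationError < StandardError; end

class CoreDoc < CoreBase
  attr_reader :text

  def initialize(text)
    @text = text
  end
end

# Hypothetical format-specific model and the format module that converts it.
MarkdownDoc = Struct.new(:text)

module FakeMarkdown
  def self.handles_model?(model)
    model.is_a?(MarkdownDoc)
  end

  def self.to_core(model)
    CoreDoc.new(model.text)
  end
end

TO_CORE_REGISTRY = { markdown: FakeMarkdown }.freeze

# Same dispatch as Coradoc.to_core, over the hypothetical registry hash.
def to_core(model)
  return model if model.is_a?(CoreBase)

  TO_CORE_REGISTRY.each_value do |format_module|
    next unless format_module.respond_to?(:handles_model?) && format_module.handles_model?(model)

    return format_module.to_core(model)
  end

  raise TransformationError, "No transformer found for #{model.class}"
end

p to_core(MarkdownDoc.new('# Title')).class # CoreDoc
begin
  to_core(42)
rescue TransformationError => e
  puts e.message # No transformer found for Integer
end
```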
.validate_file(path, format: nil) ⇒ Coradoc::Validation::Result
Validate a document file
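Parses the file and validates the result against a schema auto-generated from the document's class. Returns a Coradoc::Validation::Result; when no schema can be generated, a default Coradoc::Validation::Result is returned.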
```ruby
# File 'lib/coradoc/coradoc.rb', line 348

def validate_file(path, format: nil)
  doc = parse_file(path, format: format)
  schema = Validation::SchemaGenerator.generate(doc.class)
  return schema.validate(doc) if schema

  Validation::Result.new
end
```