Class: Coradoc::Docx::Transform::ToCoreModel
Inherits: Object
- Object
- Coradoc::Docx::Transform::ToCoreModel
Defined in: lib/coradoc/docx/transform/to_core_model.rb
Overview
Orchestrator for OOXML → CoreModel transformation.
Walks a Uniword::Wordprocessingml::DocumentRoot tree and dispatches to registered transform rules. Handles:
- Style-based heading detection (via StyleResolver)
- List grouping (consecutive numPr paragraphs → single ListBlock)
- Footnote content collection
- Image reference tracking
- Bookmark ID propagation
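The list-grouping step can be pictured with a minimal sketch. It is not the library's implementation: `Para`, `SketchListBlock`, and `group_lists` are hypothetical stand-ins, and the sketch assumes that a paragraph counts as a list item whenever its `num_pr` field is set.

```ruby
# Hypothetical stand-ins; the real paragraph and ListBlock classes are not shown on this page.
Para = Struct.new(:text, :num_pr)          # num_pr is nil when the paragraph has no numPr
SketchListBlock = Struct.new(:items)

# Collapse each run of consecutive numPr paragraphs into one block;
# paragraphs without numPr pass through unchanged.
def group_lists(paragraphs)
  paragraphs
    .chunk_while { |a, b| a.num_pr && b.num_pr }
    .flat_map { |run| run.first.num_pr ? [SketchListBlock.new(run)] : run }
end

paras = [
  Para.new("Intro", nil),
  Para.new("First", { num_id: 1 }),
  Para.new("Second", { num_id: 1 }),
  Para.new("Outro", nil)
]

grouped = group_lists(paras)
```

Here `grouped` holds three entries: the intro paragraph, one block containing both list items, and the closing paragraph. Whether the real orchestrator also splits runs when the numbering ID changes is not stated on this page.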
Dispatch strategy:
- HeadingRule and ListItemRule are dispatched directly by the orchestrator (they need context for style resolution).
- All other element types are dispatched via RuleRegistry.
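The two dispatch paths described above might look roughly like the following sketch. Every name in it (`SketchRegistry`, `SketchOrchestrator`, `Heading`, `Table`, and the lambda-based rules) is hypothetical; the actual RuleRegistry interface is not documented on this page.

```ruby
# Hypothetical registry: maps an element class to a callable rule.
class SketchRegistry
  def initialize
    @rules = {}
  end

  def register(element_class, rule)
    @rules[element_class] = rule
  end

  def rule_for(element)
    @rules[element.class]
  end
end

Heading = Struct.new(:text)
Table = Struct.new(:rows)

class SketchOrchestrator
  def initialize(registry, heading_rule)
    @registry = registry
    @heading_rule = heading_rule
  end

  def dispatch(element, context)
    # Context-dependent rules are called directly, bypassing the registry.
    return @heading_rule.call(element, context) if element.is_a?(Heading)

    # Everything else is looked up by element class; unknown types yield nil.
    rule = @registry.rule_for(element)
    rule ? rule.call(element, context) : nil
  end
end

registry = SketchRegistry.new
registry.register(Table, ->(el, _ctx) { [:table, el.rows.size] })
heading_rule = ->(el, ctx) { [:heading, el.text, ctx[:level]] }

orchestrator = SketchOrchestrator.new(registry, heading_rule)
heading_result = orchestrator.dispatch(Heading.new("Intro"), { level: 1 })
table_result = orchestrator.dispatch(Table.new([[1], [2]]), {})
```

Keeping the context-dependent rules outside the registry lets the registry stay a plain lookup table, while the orchestrator owns anything that needs style resolution.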
Class Method Details
.transform(document) ⇒ Object
# File 'lib/coradoc/docx/transform/to_core_model.rb', line 28

def transform(document)
  new.transform(document)
end
Instance Method Details
#transform(document) ⇒ Object
# File 'lib/coradoc/docx/transform/to_core_model.rb', line 33

def transform(document)
  registry = build_registry
  context = Context.new(
    styles_configuration: document.styles_configuration,
    numbering_configuration: document.numbering_configuration,
    footnotes: collect_footnotes(document),
    registry: registry
  )

  @heading_rule = Rules::HeadingRule.new
  @list_item_rule = Rules::ListItemRule.new

  body = document.body
  doc_title = extract_document_title(document, context)
  children = transform_elements(body, context)

  # If the first child is an H1 matching the doc title, skip the
  # duplicate — the document title already captures it
  if doc_title && children.first.is_a?(Coradoc::CoreModel::StructuralElement) &&
     children.first.section? && children.first.title == doc_title &&
     children.first.level == 1
    children.shift
  end

  doc = Coradoc::CoreModel::StructuralElement.new(
    element_type: 'document',
    title: doc_title,
    children: children
  )

  # Extract semantic content from headers/footers (document, doc)
  doc
end
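The duplicate-title check in the method above can be isolated into a small sketch. `Section` and `drop_duplicate_title` are hypothetical names used only for illustration, and this version returns a new array rather than mutating `children` with `shift` as the source does.

```ruby
# Hypothetical stand-in for a section-type StructuralElement.
Section = Struct.new(:title, :level)

# Return children without a leading level-1 section that repeats the document title.
def drop_duplicate_title(doc_title, children)
  first = children.first
  return children unless doc_title && first.is_a?(Section)
  return children unless first.title == doc_title && first.level == 1

  children.drop(1)
end

sections = [Section.new("Guide", 1), Section.new("Setup", 2)]

deduped = drop_duplicate_title("Guide", sections)
kept = drop_duplicate_title("Other title", sections)
```

With the matching title, only the "Setup" section remains. With a different title, or no title at all, the children are returned unchanged.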