Module: Sourcerer::Sync::BlockParser
Defined in: lib/sourcerer/sync/block_parser.rb
Overview
Parses tagged regions from any text file, regardless of comment style.
Recognizes AsciiDoc `tag::`/`end::` markers in HTML comments, AsciiDoc line comments,
and shell/Ruby/YAML comments.
The trailing `[]` is optional. See the project README for the full tag-syntax reference.
Defined Under Namespace
Classes: Block, ParseError, TextSegment
Constant Summary
- DEFAULT_CANONICAL_PREFIX = 'universal-'
  Default prefix that marks a block as canonical (managed by Sync/Cast).
- DEFAULT_TAG_SYNTAX_START = 'tag::<tagged_block_name>[]'
  Default opening tag marker template. `<tagged_block_name>` is the placeholder for the block name character class. A trailing `[]` is treated as optional in the compiled pattern.
- DEFAULT_TAG_SYNTAX_END = 'end::<tagged_block_name>[]'
  Default closing tag marker template.
- DEFAULT_COMMENT_SYNTAX_PATTERNS = [ '<!-- <tag_syntax> -->', '// <tag_syntax>', '# <tag_syntax>' ].freeze
  Default comment-wrapper templates. `<tag_syntax>` is the placeholder for the compiled tag marker pattern. A space between the comment delimiter and `<tag_syntax>` compiles as `\s*`.
- DEFAULT_TAG_PATTERNS = build_tag_patterns(DEFAULT_TAG_SYNTAX_START, DEFAULT_TAG_SYNTAX_END, DEFAULT_COMMENT_SYNTAX_PATTERNS).freeze
  Default compiled pattern set, built from the three DEFAULT_* template constants. Retained for backward compatibility; prefer the template constants for customisation.
- TAG_PATTERNS = DEFAULT_TAG_PATTERNS
  Backward-compatible alias for DEFAULT_TAG_PATTERNS.
Class Method Summary
- .build_tag_patterns(tag_start, tag_end, comment_patterns) ⇒ Array<Hash>
  Compile template strings into a patterns array compatible with BlockParser.parse.
- .comment_template_to_full_regex(comment_template, inner_regex) ⇒ String
  Wrap a compiled inner-tag regex fragment with a comment-wrapper template.
- .extract_canonical(segments, canonical_prefix: DEFAULT_CANONICAL_PREFIX) ⇒ Hash{String => Block}
  Extract all canonical blocks (those whose tag name starts with `canonical_prefix`) as a Hash keyed by tag name.
- .parse(text, canonical_prefix: DEFAULT_CANONICAL_PREFIX, tag_syntax_start: DEFAULT_TAG_SYNTAX_START, tag_syntax_end: DEFAULT_TAG_SYNTAX_END, comment_syntax_patterns: DEFAULT_COMMENT_SYNTAX_PATTERNS, tag_patterns: nil) ⇒ Array<TextSegment, Block>
  Parse a text string into an array of TextSegment and Block objects.
- .tag_template_to_inner_regex(template) ⇒ String
  Compile a tag marker template string into a plain regex fragment (no `\A` anchor).
Class Method Details
.build_tag_patterns(tag_start, tag_end, comment_patterns) ⇒ Array<Hash>
Compile template strings into a patterns array compatible with parse.
Each entry in the returned array is a `{ open: Regexp, close: Regexp }` hash. This is the same shape as DEFAULT_TAG_PATTERNS, so the array may be passed directly to parse via the `tag_patterns:` keyword to avoid recompiling the patterns on every call.
    # File 'lib/sourcerer/sync/block_parser.rb', line 95
    def self.build_tag_patterns tag_start, tag_end, comment_patterns
      open_inner = tag_template_to_inner_regex(tag_start)
      close_inner = tag_template_to_inner_regex(tag_end)
      comment_patterns.map do |cp|
        {
          open: Regexp.new(comment_template_to_full_regex(cp, open_inner)),
          close: Regexp.new(comment_template_to_full_regex(cp, close_inner))
        }
      end
    end
.comment_template_to_full_regex(comment_template, inner_regex) ⇒ String
Wrap a compiled inner-tag regex fragment with a comment-wrapper template.
`<tag_syntax>` in `comment_template` is replaced by `inner_regex`. Adjacent literal spaces around `<tag_syntax>` are compiled as `\s*`. The result is anchored to `\A`.
    # File 'lib/sourcerer/sync/block_parser.rb', line 73
    def self.comment_template_to_full_regex comment_template, inner_regex
      halves = comment_template.split('<tag_syntax>', 2)
      left_raw = halves[0]
      right_raw = halves[1].to_s
      left_trim = left_raw.rstrip
      right_trim = right_raw.lstrip
      left_re = Regexp.escape(left_trim) + (left_trim == left_raw ? '' : '\s*')
      right_re = (right_trim == right_raw ? '' : '\s*') + Regexp.escape(right_trim)
      "\\A#{left_re}#{inner_regex}#{right_re}"
    end
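To see what the wrapper produces, the sketch below copies the method body into a standalone method so it runs without loading the library. The names `wrap_comment` and `INNER` are illustrative only, and `INNER` is assumed to be the fragment compiled from the default opening template.

```ruby
# Standalone copy of the method body shown above; `wrap_comment` is an
# illustrative name, not part of the library.
def wrap_comment(comment_template, inner_regex)
  halves     = comment_template.split('<tag_syntax>', 2)
  left_raw   = halves[0]
  right_raw  = halves[1].to_s
  left_trim  = left_raw.rstrip
  right_trim = right_raw.lstrip
  left_re  = Regexp.escape(left_trim) + (left_trim == left_raw ? '' : '\s*')
  right_re = (right_trim == right_raw ? '' : '\s*') + Regexp.escape(right_trim)
  "\\A#{left_re}#{inner_regex}#{right_re}"
end

# Assumed inner fragment for the default opening template.
INNER = 'tag::(?<tag>[\w-]+)(?:\[\])?'

HTML_OPEN = Regexp.new(wrap_comment('<!-- <tag_syntax> -->', INNER))
HASH_OPEN = Regexp.new(wrap_comment('# <tag_syntax>', INNER))

# The template's single spaces became \s*, so zero or many spaces match.
puts HTML_OPEN.match?('<!--tag::universal-intro[]-->')  # → true
puts HASH_OPEN.match?('#    tag::universal-intro')      # → true

# The \A anchor means the comment delimiter must start the string.
puts HASH_OPEN.match?('x = 1 # tag::universal-intro[]') # → false
```

Note that only the start is anchored; nothing in the compiled string pins the end of the line.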
.extract_canonical(segments, canonical_prefix: DEFAULT_CANONICAL_PREFIX) ⇒ Hash{String => Block}
Extract all canonical blocks (those whose tag name starts with
`canonical_prefix`) as a Hash keyed by tag name.
Because parse already filters for canonical blocks when given the
same `canonical_prefix`, this method is largely a deduplication check.
It raises ParseError if more than one canonical block carries the same
tag name, which would make synchronization ambiguous.
    # File 'lib/sourcerer/sync/block_parser.rb', line 230
    def self.extract_canonical segments, canonical_prefix: DEFAULT_CANONICAL_PREFIX
      result = {}
      segments.each do |s|
        next unless s.is_a?(Block) && s.tag.start_with?(canonical_prefix)
        raise ParseError, "Duplicate canonical block '#{s.tag}'" if result.key?(s.tag)

        result[s.tag] = s
      end
      result
    end
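The duplicate check can be exercised in isolation. The following is a minimal sketch that assumes simplified Struct stand-ins for Block, TextSegment, and ParseError (the real classes are not shown on this page and carry more fields); the method body mirrors the source above.

```ruby
# Stand-ins for the library's classes so the snippet runs on its own.
# Only the fields used here are modelled; these are assumptions.
Block       = Struct.new(:tag, :content, keyword_init: true)
TextSegment = Struct.new(:content, keyword_init: true)
class ParseError < StandardError; end

# Same logic as the method body shown above.
def extract_canonical(segments, canonical_prefix: 'universal-')
  result = {}
  segments.each do |s|
    next unless s.is_a?(Block) && s.tag.start_with?(canonical_prefix)
    raise ParseError, "Duplicate canonical block '#{s.tag}'" if result.key?(s.tag)

    result[s.tag] = s
  end
  result
end

SEGMENTS = [
  TextSegment.new(content: "= Title\n"),
  Block.new(tag: 'universal-intro', content: "Shared intro.\n"),
  Block.new(tag: 'universal-license', content: "MIT\n")
].freeze

puts extract_canonical(SEGMENTS).keys.inspect
# → ["universal-intro", "universal-license"]

# A second block with an already-seen tag name is rejected.
begin
  extract_canonical(SEGMENTS + [Block.new(tag: 'universal-intro', content: "Again.\n")])
rescue ParseError => e
  puts e.message
  # → Duplicate canonical block 'universal-intro'
end
```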
.parse(text, canonical_prefix: DEFAULT_CANONICAL_PREFIX, tag_syntax_start: DEFAULT_TAG_SYNTAX_START, tag_syntax_end: DEFAULT_TAG_SYNTAX_END, comment_syntax_patterns: DEFAULT_COMMENT_SYNTAX_PATTERNS, tag_patterns: nil) ⇒ Array<TextSegment, Block>
Parse a text string into an array of TextSegment and Block objects.
The result is ordered and reconstructable: joining every element’s
serialized form reproduces the original text character-perfectly.
Only blocks whose tag name starts with `canonical_prefix` are parsed as
proper Block objects; all other tag markers (open and close) are
treated as ordinary text.
This makes the parser robust against files that use tag markers for unrelated
purposes (e.g. AsciiDoc `include::` target regions or non-canonical project sections)
regardless of whether those regions are properly closed or even nested.
When a canonical block is open, every line is treated as content until
the matching close marker appears (including any inner tag markers).
Canonical blocks therefore cannot be nested.
    # File 'lib/sourcerer/sync/block_parser.rb', line 144
    def self.parse text,
                   canonical_prefix: DEFAULT_CANONICAL_PREFIX,
                   tag_syntax_start: DEFAULT_TAG_SYNTAX_START,
                   tag_syntax_end: DEFAULT_TAG_SYNTAX_END,
                   comment_syntax_patterns: DEFAULT_COMMENT_SYNTAX_PATTERNS,
                   tag_patterns: nil
      patterns = tag_patterns || build_tag_patterns(tag_syntax_start, tag_syntax_end, comment_syntax_patterns)
      lines = text.lines
      segments = []
      text_acc = []
      block_state = nil # nil or { tag:, open_line:, content_lines: [] }

      lines.each do |line|
        stripped = line.chomp

        if block_state.nil?
          tag = detect_open_tag(stripped, patterns)
          if tag&.start_with?(canonical_prefix)
            segments << TextSegment.new(content: text_acc.join) unless text_acc.empty?
            text_acc = []
            block_state = { tag: tag, open_line: line, content_lines: [] }
          else
            # Non-canonical open tags and all close tags at the top level are
            # treated as ordinary text.
            text_acc << line
          end
        else
          close_tag = detect_close_tag(stripped, patterns)
          if close_tag == block_state[:tag]
            segments << Block.new(
              tag: block_state[:tag],
              open_line: block_state[:open_line],
              content: block_state[:content_lines].join,
              close_line: line)
            block_state = nil
          else
            # Nested open tags or mismatched close tags: treat as block content
            block_state[:content_lines] << line
          end
        end
      end

      raise ParseError, "Unclosed canonical tag '#{block_state[:tag]}'" if block_state

      segments << TextSegment.new(content: text_acc.join) unless text_acc.empty?
      segments
    end
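The round-trip and no-nesting behaviour can be illustrated with a cut-down version of the loop above. This is a sketch under several assumptions: it handles only the `# tag::name[]` comment style, it replaces the private `detect_open_tag`/`detect_close_tag` helpers (not shown on this page) with two hard-coded regexes, and the Struct classes with their `to_s` serializers are hypothetical stand-ins for the library's TextSegment and Block.

```ruby
# Hypothetical stand-ins; the real classes and their serializer are not shown.
TextSegment = Struct.new(:content, keyword_init: true) do
  def to_s
    content
  end
end

Block = Struct.new(:tag, :open_line, :content, :close_line, keyword_init: true) do
  def to_s
    "#{open_line}#{content}#{close_line}"
  end
end

class ParseError < StandardError; end

# One comment style only, hard-coded for the sketch.
OPEN_RE  = /\A\#\s*tag::(?<tag>[\w-]+)(?:\[\])?/
CLOSE_RE = /\A\#\s*end::(?<tag>[\w-]+)(?:\[\])?/

def mini_parse(text, canonical_prefix: 'universal-')
  segments = []
  text_acc = []
  state    = nil # nil, or { tag:, open_line:, content_lines: [] }

  text.lines.each do |line|
    if state.nil?
      m   = OPEN_RE.match(line.chomp)
      tag = m && m[:tag]
      if tag&.start_with?(canonical_prefix)
        segments << TextSegment.new(content: text_acc.join) unless text_acc.empty?
        text_acc = []
        state = { tag: tag, open_line: line, content_lines: [] }
      else
        text_acc << line # non-canonical markers stay ordinary text
      end
    else
      m = CLOSE_RE.match(line.chomp)
      if m && m[:tag] == state[:tag]
        segments << Block.new(tag: state[:tag], open_line: state[:open_line],
                              content: state[:content_lines].join, close_line: line)
        state = nil
      else
        state[:content_lines] << line # inner markers become block content
      end
    end
  end

  raise ParseError, "Unclosed canonical tag '#{state[:tag]}'" if state

  segments << TextSegment.new(content: text_acc.join) unless text_acc.empty?
  segments
end

SAMPLE = <<~TEXT
  # tag::notes[]
  local notes
  # end::notes[]
  # tag::universal-intro[]
  Shared intro.
  # tag::inner[]
  # end::universal-intro[]
  trailing text
TEXT

parsed = mini_parse(SAMPLE)
puts parsed.map { |s| s.class.name }.inspect
# → ["TextSegment", "Block", "TextSegment"]
puts parsed.map(&:to_s).join == SAMPLE
# → true
```

The non-canonical `notes` region ends up inside the first text segment, and the unmatched `# tag::inner[]` line is kept verbatim as content of the canonical block.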
.tag_template_to_inner_regex(template) ⇒ String
Compile a tag marker template string into a plain regex fragment (no `\A` anchor).
`<tagged_block_name>` is replaced with the `(?<tag>[\w-]+)` named capture group. A trailing `[]` in the template becomes `(?:\[\])?` (optional literal brackets).
    # File 'lib/sourcerer/sync/block_parser.rb', line 56
    def self.tag_template_to_inner_regex template
      parts = template.split('<tagged_block_name>', 2)
      left = Regexp.escape(parts[0])
      right = parts[1].to_s
      suffix = right == '[]' ? '(?:\[\])?' : Regexp.escape(right)
      "#{left}(?<tag>[\\w-]+)#{suffix}"
    end
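As a quick check of the compiled fragment, the sketch below copies the method body into a standalone method (`to_inner` is an illustrative name, not part of the library) and applies it to the default opening template and to a made-up custom template, `@region(<tagged_block_name>)`, which exercises the non-`[]` suffix branch.

```ruby
# Standalone copy of the method body shown above, so the snippet runs
# without loading the library. `to_inner` is an illustrative name.
def to_inner(template)
  parts  = template.split('<tagged_block_name>', 2)
  left   = Regexp.escape(parts[0])
  right  = parts[1].to_s
  suffix = right == '[]' ? '(?:\[\])?' : Regexp.escape(right)
  "#{left}(?<tag>[\\w-]+)#{suffix}"
end

DEFAULT_FRAGMENT = to_inner('tag::<tagged_block_name>[]')
puts DEFAULT_FRAGMENT
# → tag::(?<tag>[\w-]+)(?:\[\])?

DEFAULT_RE = Regexp.new(DEFAULT_FRAGMENT)
puts DEFAULT_RE.match('tag::universal-intro[]')[:tag] # → universal-intro
puts DEFAULT_RE.match('tag::universal-intro')[:tag]   # → universal-intro (brackets optional)

# Hypothetical custom template: any suffix other than '[]' is escaped
# and therefore required literally.
CUSTOM_RE = Regexp.new(to_inner('@region(<tagged_block_name>)'))
puts CUSTOM_RE.match('@region(universal-intro)')[:tag] # → universal-intro
puts CUSTOM_RE.match?('@region(universal-intro')       # → false
```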