Module: RSyntaxTree::FormatConverter
Defined in: lib/rsyntaxtree/format_converter.rb
Class Method Summary
- .detect_format(text) ⇒ Symbol
  Detect the format of the input string.
- .penn_to_bracket(text) ⇒ String
  Convert Penn Treebank format to bracket notation. Penn: (S (NP hello) (VP world)). Bracket: [S [NP hello] [VP world]]. Use \( and \) to include literal parentheses in text.
- .to_bracket(text) ⇒ String
  Convert any supported format to bracket notation.
Class Method Details
.detect_format(text) ⇒ Symbol
Detect the format of the input string. Returns :penn when the stripped text starts with "(" but not with "([", and :bracket otherwise.
    # File 'lib/rsyntaxtree/format_converter.rb', line 17
    def detect_format(text)
      stripped = text.strip
      if stripped.start_with?('(') && !stripped.start_with?('([')
        :penn
      else
        :bracket
      end
    end
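The detection rule can be tried in isolation. The following is a minimal sketch, not the gem's packaging: it copies the method body from the listing above into a hypothetical `FormatSketch` module (an assumed name, used only so the example runs without rsyntaxtree installed).

```ruby
# Hypothetical stand-in module; the method body is copied from the listing above.
module FormatSketch
  def self.detect_format(text)
    stripped = text.strip
    if stripped.start_with?('(') && !stripped.start_with?('([')
      :penn
    else
      :bracket
    end
  end
end

p FormatSketch.detect_format('(S (NP hello) (VP world))') # => :penn
p FormatSketch.detect_format('[S [NP hello] [VP world]]') # => :bracket
p FormatSketch.detect_format('  ([S [NP hello]])')        # => :bracket ("([" prefix is excluded)
p FormatSketch.detect_format('')                          # => :bracket (fallback branch)
```

Note that :bracket is the fallback: any input that does not look like Penn notation, including an empty string, is reported as bracket notation rather than rejected.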
.penn_to_bracket(text) ⇒ String
Convert Penn Treebank format to bracket notation.
Penn: (S (NP hello) (VP world))
Bracket: [S [NP hello] [VP world]]
Use \( and \) to include literal parentheses in text.
    # File 'lib/rsyntaxtree/format_converter.rb', line 44
    def penn_to_bracket(text)
      # Normalize whitespace (collapse multiple spaces/newlines to single space)
      normalized = text.gsub(/\s+/, ' ').strip

      # Protect escaped parentheses with placeholders
      result = normalized.gsub('\(', "\x00LPAREN\x00").gsub('\)', "\x00RPAREN\x00")

      # Replace structural parentheses with brackets
      result = result.gsub('(', '[').gsub(')', ']')

      # Restore escaped parentheses as literal parentheses
      result = result.gsub("\x00LPAREN\x00", '(').gsub("\x00RPAREN\x00", ')')

      # Clean up extra spaces after opening brackets
      result = result.gsub(/\[\s+/, '[')

      # Clean up extra spaces before closing brackets
      result = result.gsub(/\s+\]/, ']')

      result
    end
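To illustrate the conversion steps (whitespace normalization, escape protection, bracket substitution, cleanup), here is a small sketch. It assumes a hypothetical `FormatSketch` module holding a copy of the method body shown above, so it runs without the gem; the sample inputs are made up for illustration.

```ruby
# Hypothetical stand-in module; the method body mirrors the listing above.
module FormatSketch
  def self.penn_to_bracket(text)
    normalized = text.gsub(/\s+/, ' ').strip
    # '\(' in single quotes is a literal backslash followed by "("
    result = normalized.gsub('\(', "\x00LPAREN\x00").gsub('\)', "\x00RPAREN\x00")
    result = result.gsub('(', '[').gsub(')', ']')
    result = result.gsub("\x00LPAREN\x00", '(').gsub("\x00RPAREN\x00", ')')
    result = result.gsub(/\[\s+/, '[')
    result.gsub(/\s+\]/, ']')
  end
end

# Multi-line Penn input is collapsed onto one line.
puts FormatSketch.penn_to_bracket("(S\n  (NP hello)\n  (VP world))")
# => [S [NP hello] [VP world]]

# Escaped parentheses survive as literal parentheses.
puts FormatSketch.penn_to_bracket('(NP \(aside\))')
# => [NP (aside)]

# Padding inside the parentheses is removed.
puts FormatSketch.penn_to_bracket('( S ( NP hello ) )')
# => [S [NP hello]]
```

The placeholder approach relies on the input never containing the NUL-delimited tokens themselves, which seems a safe assumption for hand-written tree notation.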
.to_bracket(text) ⇒ String
Convert any supported format to bracket notation. Input detected as :penn is converted with penn_to_bracket; anything else is returned unchanged.
    # File 'lib/rsyntaxtree/format_converter.rb', line 29
    def to_bracket(text)
      case detect_format(text)
      when :penn
        penn_to_bracket(text)
      else
        text
      end
    end
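The dispatch behaviour can be sketched as follows. This again uses a hypothetical `FormatSketch` module with copies of the three method bodies from this page (condensed slightly), rather than the gem itself. It shows that bracket input is passed through verbatim, without whitespace normalization, and that applying the conversion twice gives the same result as applying it once.

```ruby
# Hypothetical stand-in module; bodies mirror the listings on this page.
module FormatSketch
  def self.detect_format(text)
    stripped = text.strip
    stripped.start_with?('(') && !stripped.start_with?('([') ? :penn : :bracket
  end

  def self.penn_to_bracket(text)
    result = text.gsub(/\s+/, ' ').strip
    result = result.gsub('\(', "\x00LPAREN\x00").gsub('\)', "\x00RPAREN\x00")
    result = result.gsub('(', '[').gsub(')', ']')
    result = result.gsub("\x00LPAREN\x00", '(').gsub("\x00RPAREN\x00", ')')
    result.gsub(/\[\s+/, '[').gsub(/\s+\]/, ']')
  end

  def self.to_bracket(text)
    case detect_format(text)
    when :penn then penn_to_bracket(text)
    else text
    end
  end
end

once = FormatSketch.to_bracket('(S (NP hello) (VP world))')
puts once
# => [S [NP hello] [VP world]]

# Bracket input is returned as-is, including its irregular spacing.
puts FormatSketch.to_bracket('[S  [NP hello]]')
# => [S  [NP hello]]

# A second pass sees bracket notation and leaves it alone.
puts FormatSketch.to_bracket(once) == once
# => true
```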