Module: RSyntaxTree::FormatConverter

Defined in:
lib/rsyntaxtree/format_converter.rb

Class Method Details

.detect_format(text) ⇒ Symbol

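Detect the format of the input string. After surrounding whitespace is stripped, input that starts with "(" but not with "([" is treated as Penn TreeBank notation; anything else is treated as bracket notation.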

Parameters:

  • text (String)

    the input tree notation

Returns:

  • (Symbol)

    :penn or :bracket



# File 'lib/rsyntaxtree/format_converter.rb', line 17

def detect_format(text)
  stripped = text.strip
  if stripped.start_with?('(') && !stripped.start_with?('([')
    :penn
  else
    :bracket
  end
end
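
The prefix check above can be tried on a few inputs. This is a minimal sketch: `FormatSketch` is a hypothetical stand-in module (not part of the gem) that copies the method body shown above so the snippet runs without the gem installed.

```ruby
# Hypothetical stand-in that mirrors the documented method body,
# so the example runs without loading the rsyntaxtree gem.
module FormatSketch
  module_function

  def detect_format(text)
    stripped = text.strip
    if stripped.start_with?('(') && !stripped.start_with?('([')
      :penn
    else
      :bracket
    end
  end
end

samples = [
  '(S (NP hello) (VP world))', # plain Penn input
  "  \n(S (NP hello))",        # leading whitespace is stripped before the check
  '[S [NP hello] [VP world]]', # bracket input
  '([S [NP hello]])',          # "([" prefix is not treated as Penn
  ''                           # empty input falls through to :bracket
]

samples.each do |sample|
  puts "#{sample.inspect} => #{FormatSketch.detect_format(sample).inspect}"
end
```

Because `:bracket` is the fallback branch, any input that does not match the Penn prefix, including an empty string, is reported as `:bracket`; the method does not validate the notation.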

.penn_to_bracket(text) ⇒ String

Convert Penn TreeBank format to bracket notation.

  Penn:    (S (NP hello) (VP world))
  Bracket: [S [NP hello] [VP world]]

Use \( and \) to include literal parentheses in text.

Parameters:

  • text (String)

    Penn TreeBank notation

Returns:

  • (String)

    bracket notation



# File 'lib/rsyntaxtree/format_converter.rb', line 44

def penn_to_bracket(text)
  # Normalize whitespace (collapse multiple spaces/newlines to single space)
  normalized = text.gsub(/\s+/, ' ').strip

  # Protect escaped parentheses with placeholders
  result = normalized.gsub('\(', "\x00LPAREN\x00").gsub('\)', "\x00RPAREN\x00")

  # Replace structural parentheses with brackets
  result = result.gsub('(', '[').gsub(')', ']')

  # Restore escaped parentheses as literal parentheses
  result = result.gsub("\x00LPAREN\x00", '(').gsub("\x00RPAREN\x00", ')')

  # Clean up extra spaces after opening brackets
  result = result.gsub(/\[\s+/, '[')
  # Clean up extra spaces before closing brackets
  result = result.gsub(/\s+\]/, ']')

  result
end
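
To see what each step of the conversion contributes, the sketch below replays the same substitutions and keeps every intermediate string. `trace_penn_to_bracket` is a hypothetical helper written for illustration only; it is not part of the gem.

```ruby
# Hypothetical helper: replays the steps of penn_to_bracket shown above
# and returns each intermediate string, keyed by step name.
def trace_penn_to_bracket(text)
  stages = {}
  # 1. Collapse runs of whitespace (including newlines) to single spaces
  stages[:normalized] = text.gsub(/\s+/, ' ').strip
  # 2. Hide escaped parentheses behind NUL-delimited placeholders
  stages[:protected] = stages[:normalized]
                       .gsub('\(', "\x00LPAREN\x00")
                       .gsub('\)', "\x00RPAREN\x00")
  # 3. Turn the remaining (structural) parentheses into brackets
  stages[:bracketed] = stages[:protected].gsub('(', '[').gsub(')', ']')
  # 4. Bring the escaped parentheses back as literal characters
  stages[:restored] = stages[:bracketed]
                      .gsub("\x00LPAREN\x00", '(')
                      .gsub("\x00RPAREN\x00", ')')
  # 5. Drop spaces just inside opening and closing brackets
  stages[:trimmed] = stages[:restored].gsub(/\[\s+/, '[').gsub(/\s+\]/, ']')
  stages
end

# Multi-line Penn input with escaped parentheses around "hello"
input = "(S\n  (NP \\(hello\\))\n  (VP world)\n)"

trace_penn_to_bracket(input).each do |step, value|
  puts "#{step}: #{value.inspect}"
end
# The final :trimmed stage is "[S [NP (hello)] [VP world]]"
```

The placeholder step has to run before the structural replacement; otherwise the escaped parentheses would be converted to brackets along with the structural ones.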

.to_bracket(text) ⇒ String

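Convert any supported format to bracket notation. Input detected as Penn TreeBank is converted with penn_to_bracket; any other input is returned unchanged.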

Parameters:

  • text (String)

    the input tree notation

Returns:

  • (String)

    bracket notation



# File 'lib/rsyntaxtree/format_converter.rb', line 29

def to_bracket(text)
  case detect_format(text)
  when :penn
    penn_to_bracket(text)
  else
    text
  end
end
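
The dispatch above can be sketched end to end as follows. `ConverterSketch` is a hypothetical stand-in (not part of the gem) that condenses the three method bodies shown on this page so the snippet runs on its own.

```ruby
# Hypothetical stand-in condensing the three documented methods,
# so the example runs without loading the rsyntaxtree gem.
module ConverterSketch
  module_function

  def detect_format(text)
    stripped = text.strip
    stripped.start_with?('(') && !stripped.start_with?('([') ? :penn : :bracket
  end

  def penn_to_bracket(text)
    text.gsub(/\s+/, ' ').strip
        .gsub('\(', "\x00LPAREN\x00").gsub('\)', "\x00RPAREN\x00")
        .gsub('(', '[').gsub(')', ']')
        .gsub("\x00LPAREN\x00", '(').gsub("\x00RPAREN\x00", ')')
        .gsub(/\[\s+/, '[').gsub(/\s+\]/, ']')
  end

  def to_bracket(text)
    detect_format(text) == :penn ? penn_to_bracket(text) : text
  end
end

penn    = '(S (NP hello) (VP world))'
bracket = "[S  [NP hello]\n  [VP world] ]"

# Penn input is converted to bracket notation
puts ConverterSketch.to_bracket(penn)

# Bracket input comes back exactly as given; its whitespace is not normalized
puts ConverterSketch.to_bracket(bracket).inspect

# Converting twice gives the same result as converting once
once = ConverterSketch.to_bracket(penn)
puts ConverterSketch.to_bracket(once) == once
```

Whitespace cleanup happens only on the Penn path, so callers that need normalized bracket input have to handle that separately.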