Module: MppReader::RtfText
- Defined in:
- lib/mpp_reader/rtf_text.rb
Overview
Minimal RTF-to-plain-text conversion for notes fields. MS Project stores notes as simple RTF (font and color tables, \par breaks, hex and Unicode escapes). Text that does not start with {\rtf is returned unchanged, as are nil and empty strings.
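The escapes themselves are decoded by the `control` helper, which this page does not document. The following is a standalone sketch of the two escape forms defined by the RTF specification, not the library's code. `\'hh` is one byte in the document's code page (assumed here to be Windows-1252). `\uN` is a code point written as a signed 16-bit decimal, followed by fallback text for readers without Unicode support (assumed here to be exactly one character, the `\uc1` default). The name `decode_rtf_escapes` is hypothetical.

```ruby
# Hypothetical helper, not part of MppReader. Assumes \uc1 (exactly one
# fallback character after each \uN) and Windows-1252 for \'hh bytes.
# Escaped backslashes (\\) and surrogate pairs are not handled.
def decode_rtf_escapes(rtf)
  rtf.gsub(/\\u(-?\d+) ?(?:\\'[0-9a-fA-F]{2}|.)|\\'([0-9a-fA-F]{2})/m) do
    if Regexp.last_match(1)
      cp = Regexp.last_match(1).to_i
      cp += 65_536 if cp.negative? # RTF writes N as a signed 16-bit value
      [cp].pack("U")               # the fallback character is matched and discarded
    else
      # One raw byte, interpreted as Windows-1252 and converted to UTF-8.
      [Regexp.last_match(2).hex].pack("C")
        .force_encoding(Encoding::Windows_1252)
        .encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
    end
  end
end

decode_rtf_escapes("caf\\'e9")   # => "café"
decode_rtf_escapes("\\u8364?5")  # => "€5"
```

In `strip`, the counter `pending_unicode_skip` appears to play the role of the fallback match here: after a `\uN`, that many following plain characters are dropped instead of emitted.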
Constant Summary
- SKIP_GROUPS =
Destination groups whose text is not document content.
%w[fonttbl colortbl stylesheet info pict object header footer generator].freeze
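`strip` keeps a stack with one boolean per open group: `{` pushes a copy of the current top, so a group nested inside a skipped destination is skipped too, and `}` pops. The sketch below reproduces that mechanism on a plain string. How the flag gets set happens inside `control`, so this part is an assumption: any control word listed in `SKIP_GROUPS` marks the group it appears in. `visible_text` is a hypothetical name; hex escapes are dropped rather than decoded and `\par` is ignored.

```ruby
# Same list as MppReader::RtfText::SKIP_GROUPS, copied so the sketch runs alone.
SKIP_GROUPS = %w[fonttbl colortbl stylesheet info pict object header footer generator].freeze

# Hypothetical helper: returns the text that lies outside skipped groups.
def visible_text(rtf)
  out = +""
  skip = [false] # one entry per open group; true while inside a skipped destination
  pos = 0
  while pos < rtf.size
    ch = rtf[pos]
    case ch
    when "{"
      skip.push(skip.last) # a nested group inherits its parent's state
      pos += 1
    when "}"
      skip.pop if skip.size > 1 # tolerate a stray closing brace
      pos += 1
    when "\\"
      word = rtf[(pos + 1)..][/\A[a-z]+/]
      if word
        # Assumption: a listed keyword marks the group it appears in as skipped.
        skip[-1] = true if SKIP_GROUPS.include?(word)
        pos += 1 + word.size
        pos += 1 while rtf[pos]&.match?(/[-\d]/) # optional numeric parameter
        pos += 1 if rtf[pos] == " "              # the delimiting space belongs to the control word
      elsif rtf[pos + 1] == "'"
        pos += 4 # \'hh hex escape: dropped in this sketch
      else
        # Control symbols \\, \{ and \} stand for literal characters.
        nxt = rtf[pos + 1]
        out << nxt if nxt && "\\{}".include?(nxt) && !skip.last
        pos += 2
      end
    else
      out << ch unless skip.last || ch == "\r" || ch == "\n"
      pos += 1
    end
  end
  out
end

visible_text("{\\rtf1{\\fonttbl{\\f0 Arial;}}\\f0 Hello \\b World\\b0}") # => "Hello World"
```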
Class Method Summary
Class Method Details
.strip(text) ⇒ Object
# File 'lib/mpp_reader/rtf_text.rb', line 12
def strip(text)
  return text if text.nil? || text.empty? || !text.start_with?("{\\rtf")
  out = +""
  pos = 0
  bytes = text.b
  # Each entry mirrors the group nesting: true while inside a skipped
  # destination group.
  skip_stack = [false]
  pending_unicode_skip = 0
  while pos < bytes.bytesize
    ch = bytes[pos]
    case ch
    when "{"
      skip_stack.push(skip_stack.last)
      pos += 1
    when "}"
      skip_stack.pop if skip_stack.size > 1
      pos += 1
    when "\\"
      pos = control(bytes, pos, out, skip_stack) { pending_unicode_skip = _1 }
    else
      if pending_unicode_skip.positive?
        pending_unicode_skip -= 1
      elsif !skip_stack.last && ch != "\r" && ch != "\n"
        out << ch
      end
      pos += 1
    end
  end
  out.sub!(/\n\z/, "") # notes RTF ends with one extra \par; drop the newline it produced
  # force_encoding mutates its receiver, so test validity on a copy and
  # keep the original bytes when they are not valid UTF-8.
  utf8 = out.dup.force_encoding(Encoding::UTF_8)
  utf8.valid_encoding? ? utf8 : out
end
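The last step of `strip` should return a UTF-8 string only when the collected bytes are valid UTF-8. `String#force_encoding` relabels its receiver in place, so a check written as `out.force_encoding(...).valid_encoding?` has already changed `out` by the time a fallback branch runs. The sketch below isolates the pattern with a hypothetical helper, `as_utf8_if_valid`, that runs the check on a copy so the fallback really returns the original binary string.

```ruby
# Hypothetical helper name; the pattern mirrors the final step of strip.
def as_utf8_if_valid(bytes)
  utf8 = bytes.dup.force_encoding(Encoding::UTF_8) # only the copy is relabelled
  utf8.valid_encoding? ? utf8 : bytes
end

good = "caf\xC3\xA9".b # valid UTF-8 bytes for "café"
bad  = "caf\xE9".b     # Latin-1 "é", not valid UTF-8

as_utf8_if_valid(good)                           # => "café"
as_utf8_if_valid(bad).encoding == Encoding::BINARY # => true

# The pitfall: relabelling without a copy changes the original string.
s = "caf\xE9".b
s.force_encoding(Encoding::UTF_8).valid_encoding? # => false
s.encoding                                        # => #<Encoding:UTF-8>
```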