Module: Clacky::Utils::ParserManager
- Defined in: lib/clacky/utils/parser_manager.rb
Overview
Manages user-space parsers in ~/.clacky/parsers/.
On first use, default parser scripts are copied from the gem’s default_parsers/ directory into ~/.clacky/parsers/. After that, the user-space version is always used — allowing the LLM to modify or extend parsers without touching the gem itself.
CLI interface contract (all parsers must follow):
ruby <parser>.rb <file_path>
stdout → extracted text (UTF-8)
stderr → error messages
exit 0 → success
exit 1 → failure
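A minimal sketch of a script that satisfies this contract is shown below. It is not one of the gem's default parsers: the plain-text "parser" and the `run` helper are hypothetical and only illustrate the stdout, stderr, and exit-code convention.

```ruby
# Hypothetical parser following the CLI contract above (not shipped with the gem).
# Intended usage once saved as a script: ruby plain_text_parser.rb <file_path>

# Writes extracted text to `out`, error messages to `err`,
# and returns the exit code (0 on success, 1 on failure).
def run(argv, out: $stdout, err: $stderr)
  path = argv.first
  if path.nil? || !File.file?(path)
    err.puts "File not found: #{path.inspect}"
    return 1
  end

  # Read as UTF-8 and replace invalid bytes so stdout is always valid UTF-8.
  text = File.read(path, encoding: "UTF-8").scrub.strip
  if text.empty?
    err.puts "No text extracted from #{path}"
    return 1
  end

  out.puts text
  0
rescue StandardError => e
  err.puts "#{e.class}: #{e.message}"
  1
end

# A real parser script would end with:
#   exit run(ARGV)
```

Keeping the logic in a method that returns the exit code makes the script easy to exercise without spawning a subprocess. Treating empty output as a failure matches how `.parse` handles a parser that exits 0 without printing anything.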
Constant Summary
- PARSERS_DIR =
File.expand_path("~/.clacky/parsers").freeze
- DEFAULT_PARSERS_DIR =
File.expand_path("../default_parsers", __dir__).freeze
- PARSER_FOR =
{
  ".pdf"  => "pdf_parser.rb",
  ".doc"  => "doc_parser.rb",
  ".docx" => "docx_parser.rb",
  ".xlsx" => "xlsx_parser.rb",
  ".xls"  => "xlsx_parser.rb",
  ".pptx" => "pptx_parser.rb",
  ".ppt"  => "pptx_parser.rb",
}.freeze
Class Method Summary
- .parse(file_path) ⇒ Hash
  Run the appropriate parser for the given file path.
- .parser_path_for(ext) ⇒ Object
  Returns the path to a parser script for a given extension.
- .setup! ⇒ Object
  Ensure ~/.clacky/parsers/ exists and all default parsers are present.
Class Method Details
.parse(file_path) ⇒ Hash
Run the appropriate parser for the given file path.
    # File 'lib/clacky/utils/parser_manager.rb', line 55

    def self.parse(file_path)
      ext = File.extname(file_path.to_s).downcase
      script = PARSER_FOR[ext]

      unless script
        return { success: false, text: nil, error: "No parser available for #{ext} files", parser_path: nil }
      end

      parser_path = File.join(PARSERS_DIR, script)

      unless File.exist?(parser_path)
        return { success: false, text: nil, error: "Parser not found: #{parser_path}", parser_path: parser_path }
      end

      raw_stdout, raw_stderr, status = Open3.capture3(RbConfig.ruby, parser_path, file_path)

      # capture3 returns ASCII-8BIT across the subprocess boundary on Ruby 2.6+.
      # Normalise both streams to UTF-8 immediately so all downstream code is clean.
      stdout = Clacky::Utils::Encoding.to_utf8(raw_stdout)
      stderr = Clacky::Utils::Encoding.to_utf8(raw_stderr)

      # Filter out Ruby/Bundler version warnings that pollute stderr
      clean_stderr = stderr.lines.reject { |l| l.match?(/warning:|already initialized constant/) }.join.strip

      if status.success? && stdout.strip.length > 0
        { success: true, text: stdout.strip, error: nil, parser_path: parser_path }
      else
        {
          success: false,
          text: nil,
          error: clean_stderr.empty? ? "Parser exited with code #{status.exitstatus}" : clean_stderr,
          parser_path: parser_path
        }
      end
    end
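The subprocess pattern used by `.parse` can be tried in isolation. The sketch below is a standalone approximation, not the gem's code: `run_script` is a hypothetical helper, the throwaway echo script stands in for a real parser, and `force_encoding` plus `scrub` are assumed as a substitute for `Clacky::Utils::Encoding.to_utf8`, whose implementation is not shown on this page.

```ruby
require "open3"
require "rbconfig"
require "tempfile"

# Hypothetical helper that mirrors the shape of the hash returned by .parse.
def run_script(script_path, file_path)
  # Run the script with the same Ruby interpreter as the current process.
  raw_out, raw_err, status = Open3.capture3(RbConfig.ruby, script_path, file_path)

  # Assumption: a simple stand-in for Clacky::Utils::Encoding.to_utf8.
  out = raw_out.dup.force_encoding("UTF-8").scrub.strip
  err = raw_err.dup.force_encoding("UTF-8").scrub

  # Drop interpreter warnings so they are not reported as parser errors.
  clean_err = err.lines.reject { |l| l.match?(/warning:|already initialized constant/) }.join.strip

  if status.success? && !out.empty?
    { success: true, text: out, error: nil }
  else
    error = clean_err.empty? ? "Parser exited with code #{status.exitstatus}" : clean_err
    { success: false, text: nil, error: error }
  end
end

# Throwaway script that follows the CLI contract by echoing the file's content.
script = Tempfile.new(["echo_parser", ".rb"])
script.write("print File.read(ARGV.fetch(0))")
script.close

input = Tempfile.new(["sample", ".txt"])
input.write("hello from a parser\n")
input.close

ok = run_script(script.path, input.path)          # file exists
failed = run_script(script.path, "/no/such/file") # script raises, exits non-zero

puts ok[:text]
puts failed[:error]
```

Callers are expected to branch on `:success` rather than on the presence of `:text`, since a parser that exits 0 with empty stdout is also reported as a failure.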
.parser_path_for(ext) ⇒ Object
Returns the path to a parser script for a given extension. Used by the agent to tell the LLM where to find or modify the parser. Returns nil when no parser is registered for the extension.
    # File 'lib/clacky/utils/parser_manager.rb', line 94

    def self.parser_path_for(ext)
      script = PARSER_FOR[ext.downcase]
      return nil unless script
      File.join(PARSERS_DIR, script)
    end
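The lookup is keyed on the extension including its leading dot, which is what `File.extname` returns. The sketch below reproduces the mapping locally so it runs without the gem. `DEMO_PARSERS_DIR` is a made-up directory standing in for `PARSERS_DIR`, and the top-level `parser_path_for` is a copy for illustration only.

```ruby
# Local copy of the mapping from the Constant Summary, so the sketch is standalone.
PARSER_FOR = {
  ".pdf"  => "pdf_parser.rb",
  ".doc"  => "doc_parser.rb",
  ".docx" => "docx_parser.rb",
  ".xlsx" => "xlsx_parser.rb",
  ".xls"  => "xlsx_parser.rb",
  ".pptx" => "pptx_parser.rb",
  ".ppt"  => "pptx_parser.rb",
}.freeze

# Assumption: placeholder directory used instead of ~/.clacky/parsers.
DEMO_PARSERS_DIR = "/tmp/clacky-parsers-demo"

def parser_path_for(ext)
  script = PARSER_FOR[ext.downcase]
  return nil unless script
  File.join(DEMO_PARSERS_DIR, script)
end

with_dot    = parser_path_for(File.extname("Report.PDF")) # case-insensitive match
legacy_xls  = parser_path_for(".xls")                     # shares the xlsx parser
without_dot = parser_path_for("pdf")                      # nil: leading dot missing
unsupported = parser_path_for(".txt")                     # nil: no parser registered

p with_dot, legacy_xls, without_dot, unsupported
```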
.setup! ⇒ Object
Ensure ~/.clacky/parsers/ exists and all default parsers are present. Called once at startup.
    # File 'lib/clacky/utils/parser_manager.rb', line 37

    def self.setup!
      FileUtils.mkdir_p(PARSERS_DIR)

      PARSER_FOR.values.uniq.each do |script|
        dest = File.join(PARSERS_DIR, script)
        next if File.exist?(dest)

        src = File.join(DEFAULT_PARSERS_DIR, script)
        if File.exist?(src)
          FileUtils.cp(src, dest)
        end
      end
    end
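The copy-if-missing behaviour can be checked against temporary directories. In the sketch below, `install_defaults` is a hypothetical, parameterised stand-in for `setup!` (the real method takes no arguments and uses `PARSERS_DIR` and `DEFAULT_PARSERS_DIR`), and the file contents are placeholders.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical stand-in for setup!, parameterised so it can run against temp dirs.
def install_defaults(defaults_dir, user_dir, scripts)
  FileUtils.mkdir_p(user_dir)

  scripts.each do |script|
    dest = File.join(user_dir, script)
    next if File.exist?(dest) # never overwrite an existing user-space parser

    src = File.join(defaults_dir, script)
    FileUtils.cp(src, dest) if File.exist?(src) # silently skip missing defaults
  end
end

results = {}
Dir.mktmpdir do |root|
  defaults = File.join(root, "default_parsers")
  user = File.join(root, "parsers")
  scripts = ["pdf_parser.rb", "docx_parser.rb"]

  # Only one of the two defaults actually exists.
  FileUtils.mkdir_p(defaults)
  File.write(File.join(defaults, "pdf_parser.rb"), "# default\n")

  install_defaults(defaults, user, scripts)
  results[:first_run] = File.read(File.join(user, "pdf_parser.rb"))
  results[:missing_default_copied] = File.exist?(File.join(user, "docx_parser.rb"))

  # Simulate a user (or LLM) edit, then run the setup step again.
  File.write(File.join(user, "pdf_parser.rb"), "# customised\n")
  install_defaults(defaults, user, scripts)
  results[:second_run] = File.read(File.join(user, "pdf_parser.rb"))
end

p results
```

One consequence of the skip-if-exists check is that a default updated in a later gem release would not replace an existing user copy; the user-space file would have to be removed first for the new default to be copied.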