Class: Mathpix::MCP::Tools::ConvertDocumentTool
- Defined in:
- lib/mathpix/mcp/tools/convert_document_tool.rb
Overview
Convert Document Tool
Converts documents (PDF, DOCX, PPTX) to Markdown, LaTeX, or other formats. A thin delegate to Mathpix::Document.
Constant Summary
- DEFAULT_MAX_INLINE_CHARS = 50_000
Above this many characters, returning the converted content inline would risk overflowing the LLM context window, so the result is written to a file and only a path + preview is returned.
- PREVIEW_CHARS = 2_000
Characters of the converted markdown to include as a preview when the full content is written to a file instead of returned inline.
Class Method Summary
- .call(document_path:, server_context:, formats: nil, include_tables: false, output_path: nil, max_inline_chars: DEFAULT_MAX_INLINE_CHARS, max_wait: 600, poll_interval: 3.0) ⇒ Object
- .preview_of(content) ⇒ Object
First PREVIEW_CHARS characters of the content, if any.
- .sanitize(value) ⇒ Object
Make a conversion id safe to use in a filename.
- .save_contents(contents, markdown_path) ⇒ Hash{Symbol=>Hash}
Write each available format to disk, deriving sibling paths for the non-markdown formats from the markdown target's name.
- .sibling_path(base, ext) ⇒ Object
Derive a sibling path with a different extension.
Class Method Details
.call(document_path:, server_context:, formats: nil, include_tables: false, output_path: nil, max_inline_chars: DEFAULT_MAX_INLINE_CHARS, max_wait: 600, poll_interval: 3.0) ⇒ Object
# File 'lib/mathpix/mcp/tools/convert_document_tool.rb', line 64

def self.call(document_path:, server_context:, formats: nil, include_tables: false,
              output_path: nil, max_inline_chars: DEFAULT_MAX_INLINE_CHARS,
              max_wait: 600, poll_interval: 3.0)
  safe_execute do
    client = mathpix_client(server_context)

    # Normalize path
    document_path = normalize_path(document_path) unless url?(document_path)

    # Extract formats or use defaults
    output_formats = extract_formats(formats, client)

    # Use Document class (new unified interface)
    doc = Mathpix::Document.new(client, document_path)
    doc.with_formats(*output_formats)
    doc.with_tables if include_tables

    # Start conversion and wait for completion
    conversion = doc.convert
    conversion.wait_until_complete(max_wait: max_wait, poll_interval: poll_interval)

    result = conversion.result
    contents = { markdown: result.markdown, latex: result.latex, html: result.html }.compact

    response_data = {
      success: true,
      document_path: document_path,
      formats: output_formats,
      conversion_id: conversion.conversion_id,
      metadata: {
        document_type: conversion.document_type,
        pages: result.page_count,
        processing_time: result.processing_time
      }
    }

    # Combined character count across every returned format
    total_chars = contents.values.sum(&:length)

    if output_path
      # Explicit save requested.
      # The method name is missing from this listing ("File.(output_path)");
      # File.expand_path is assumed here.
      response_data[:saved_files] = save_contents(contents, File.expand_path(output_path))
      response_data[:preview] = preview_of(result.markdown)
    elsif total_chars <= max_inline_chars
      # Small enough to return inline.
      response_data[:results] = contents
    else
      # Too large to inline safely: auto-save to a temp file so the
      # model's context isn't blown out.
      default_path = File.join(Dir.tmpdir, "mathpix_#{sanitize(conversion.conversion_id)}.md")
      response_data[:saved_files] = save_contents(contents, default_path)
      response_data[:preview] = preview_of(result.markdown)
      response_data[:note] =
        "Converted output is #{total_chars} characters, which exceeds max_inline_chars " \
        "(#{max_inline_chars}); it was written to a file to avoid exceeding the model " \
        'context. Read the file at saved_files for the full content, pass output_path to ' \
        'choose the destination, or raise max_inline_chars to force inline output.'
    end

    json_response(response_data)
  end
end
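The three-way choice at the end of .call (explicit save, inline return, automatic save) can be isolated into a small sketch. The helper name delivery_mode and its symbol return values are hypothetical and exist only for illustration; the real method builds the response hash directly. The constant is re-declared at top level so the snippet runs without the gem.

```ruby
# Standalone sketch of the output-routing rule used by .call.
# `delivery_mode` is a hypothetical helper, not part of the tool's API.
DEFAULT_MAX_INLINE_CHARS = 50_000

def delivery_mode(contents, output_path: nil, max_inline_chars: DEFAULT_MAX_INLINE_CHARS)
  # The limit applies to the combined length of all formats, not to each one.
  total_chars = contents.values.sum(&:length)

  return :explicit_save if output_path          # caller chose a destination
  return :inline if total_chars <= max_inline_chars

  :auto_save                                    # too large: write to a temp file
end

small = { markdown: 'a' * 1_000 }
large = { markdown: 'a' * 30_000, latex: 'b' * 30_000 }

puts delivery_mode(small)                        # inline
puts delivery_mode(large)                        # auto_save (60_000 > 50_000)
puts delivery_mode(small, output_path: 'out.md') # explicit_save
```

Because the check sums every format, two outputs that are each under the limit can still trigger the automatic save, as the second call shows. An explicit output_path takes precedence regardless of size.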
.preview_of(content) ⇒ Object
First PREVIEW_CHARS characters of the content, if any.
# File 'lib/mathpix/mcp/tools/convert_document_tool.rb', line 163

def self.preview_of(content)
  return nil unless content

  content.length > PREVIEW_CHARS ? "#{content[0, PREVIEW_CHARS]}…" : content
end
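As a quick illustration of the truncation rule, the sketch below re-declares PREVIEW_CHARS and preview_of as top-level definitions. That is an assumption made so the snippet runs without the gem; in the class they are a constant and a class method.

```ruby
# Standalone copy of the preview logic, for illustration only.
PREVIEW_CHARS = 2_000

def preview_of(content)
  return nil unless content

  content.length > PREVIEW_CHARS ? "#{content[0, PREVIEW_CHARS]}…" : content
end

short = preview_of('# Title')      # shorter than the limit: returned unchanged
long  = preview_of('x' * 5_000)    # longer: first 2_000 chars plus an ellipsis

puts short                         # "# Title"
puts long.length                   # 2001
p preview_of(nil)                  # nil
```

Note that a truncated preview is one character longer than PREVIEW_CHARS, because the ellipsis is appended after the cut.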
.sanitize(value) ⇒ Object
Make a conversion id safe to use in a filename.
# File 'lib/mathpix/mcp/tools/convert_document_tool.rb', line 170

def self.sanitize(value)
  value.to_s.gsub(/[^a-zA-Z0-9_-]/, '_')
end
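The effect of the character whitelist is easiest to see on a few inputs. The sketch below copies sanitize into a top-level method so it can run on its own; the sample ids are made up for the example.

```ruby
# Standalone copy of sanitize, for illustration only.
def sanitize(value)
  # Anything outside letters, digits, underscore, and hyphen becomes "_".
  value.to_s.gsub(/[^a-zA-Z0-9_-]/, '_')
end

puts sanitize('2024-05_abc123')    # 2024-05_abc123 (already safe)
puts sanitize('2024/05/abc.def')   # 2024_05_abc_def
puts sanitize('../../etc/passwd')  # ______etc_passwd
puts sanitize(nil).inspect         # "" (nil.to_s is the empty string)
```

Replacing path separators and dots means an unexpected id cannot move the auto-saved file out of the temp directory.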
.save_contents(contents, markdown_path) ⇒ Hash{Symbol=>Hash}
Write each available format to disk, deriving sibling paths for the non-markdown formats from the markdown target's name.
# File 'lib/mathpix/mcp/tools/convert_document_tool.rb', line 136

def self.save_contents(contents, markdown_path)
  ext_for = { markdown: nil, latex: 'tex', html: 'html' }
  saved = {}

  contents.each do |format, content|
    path = if format == :markdown
             markdown_path
           else
             sibling_path(markdown_path, ext_for[format] || format.to_s)
           end
    File.write(path, content)
    saved[format] = { path: path, bytes: content.bytesize }
  end

  saved
end
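To show the shape of the returned hash, the sketch below copies save_contents and sibling_path into top-level methods and writes into a throwaway directory created with Dir.mktmpdir. The file name paper.md and the sample contents are assumptions for the example.

```ruby
require 'tmpdir'

# Standalone copies of the two helpers, for illustration only.
def sibling_path(base, ext)
  dir = File.dirname(base)
  stem = File.basename(base, File.extname(base))
  File.join(dir, "#{stem}.#{ext}")
end

def save_contents(contents, markdown_path)
  ext_for = { markdown: nil, latex: 'tex', html: 'html' }
  saved = {}
  contents.each do |format, content|
    path = if format == :markdown
             markdown_path
           else
             sibling_path(markdown_path, ext_for[format] || format.to_s)
           end
    File.write(path, content)
    saved[format] = { path: path, bytes: content.bytesize }
  end
  saved
end

# The directory and everything in it is removed when the block exits.
Dir.mktmpdir do |dir|
  contents = { markdown: "# caf\u00e9\n", latex: '\\section{Hello}' }
  saved = save_contents(contents, File.join(dir, 'paper.md'))

  saved.each do |format, info|
    puts "#{format}: #{File.basename(info[:path])} (#{info[:bytes]} bytes)"
  end
end
```

The reported size is in bytes, while the inline threshold in .call is measured in characters, so the two numbers differ for non-ASCII content: the markdown sample above is 7 characters but 8 bytes.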
.sibling_path(base, ext) ⇒ Object
Derive a sibling path with a different extension.
# File 'lib/mathpix/mcp/tools/convert_document_tool.rb', line 156

def self.sibling_path(base, ext)
  dir = File.dirname(base)
  stem = File.basename(base, File.extname(base))
  File.join(dir, "#{stem}.#{ext}")
end
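A few sample inputs make the derivation concrete. The sketch below copies sibling_path into a top-level method so it runs on its own; the paths are made up for the example.

```ruby
# Standalone copy of sibling_path, for illustration only.
def sibling_path(base, ext)
  dir = File.dirname(base)
  stem = File.basename(base, File.extname(base))
  File.join(dir, "#{stem}.#{ext}")
end

puts sibling_path('/tmp/out/paper.md', 'tex')  # /tmp/out/paper.tex
puts sibling_path('notes', 'html')             # ./notes.html
puts sibling_path('archive.tar.gz', 'tex')     # ./archive.tar.tex
```

Only the last extension is swapped, so a multi-part name such as archive.tar.gz keeps its inner .tar. A bare file name gains a ./ prefix because File.dirname returns "." when the path has no directory part.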