Class: Mathpix::MCP::Tools::ConvertDocumentTool

Inherits:
BaseTool
  • Object
Defined in:
lib/mathpix/mcp/tools/convert_document_tool.rb

Overview

Convert Document Tool

Converts documents (PDF, DOCX, PPTX) to Markdown, LaTeX, or other formats. Thin delegate to Mathpix::Document.

Constant Summary

DEFAULT_MAX_INLINE_CHARS =

Above this many characters (counted across all returned formats), returning the converted content inline would risk overflowing the LLM context window, so the result is written to a file and only the path and a preview are returned.

50_000
PREVIEW_CHARS =

Characters of the converted markdown to include as a preview when the full content is written to a file instead of returned inline.

2_000
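The way the threshold selects a delivery mode can be sketched as follows. This is a minimal illustration, not the gem's code: `delivery_mode` is a hypothetical helper, and its `50_000` default simply mirrors DEFAULT_MAX_INLINE_CHARS. Note that the comparison uses the combined length of every format, so two 30,000-character outputs together exceed the limit.

```ruby
# Hypothetical helper (not part of the gem) that reports which delivery
# branch applies. The 50_000 default mirrors DEFAULT_MAX_INLINE_CHARS.
def delivery_mode(contents, output_path: nil, max_inline_chars: 50_000)
  # The threshold is compared against the combined length of every format.
  total_chars = contents.values.sum(&:length)

  return :explicit_file if output_path
  return :inline if total_chars <= max_inline_chars

  :auto_temp_file
end

small = { markdown: 'a' * 1_000 }
large = { markdown: 'a' * 30_000, latex: 'b' * 30_000 }

puts delivery_mode(small)                         # inline
puts delivery_mode(large)                         # auto_temp_file
puts delivery_mode(small, output_path: 'out.md')  # explicit_file
```

An explicit output_path takes precedence over the size check, so even small results are written to disk when a destination is given.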

Class Method Summary

Class Method Details

.call(document_path:, server_context:, formats: nil, include_tables: false, output_path: nil, max_inline_chars: DEFAULT_MAX_INLINE_CHARS, max_wait: 600, poll_interval: 3.0) ⇒ Object



# File 'lib/mathpix/mcp/tools/convert_document_tool.rb', line 64

def self.call(document_path:, server_context:, formats: nil, include_tables: false,
              output_path: nil, max_inline_chars: DEFAULT_MAX_INLINE_CHARS,
              max_wait: 600, poll_interval: 3.0)
  safe_execute do
    client = mathpix_client(server_context)

    # Normalize path
    document_path = normalize_path(document_path) unless url?(document_path)

    # Extract formats or use defaults
    output_formats = extract_formats(formats, client)

    # Use Document class (new unified interface)
    doc = Mathpix::Document.new(client, document_path)
    doc.with_formats(*output_formats)
    doc.with_tables if include_tables

    # Start conversion and wait for completion
    conversion = doc.convert
    conversion.wait_until_complete(max_wait: max_wait, poll_interval: poll_interval)
    result = conversion.result

    contents = {
      markdown: result.markdown,
      latex: result.latex,
      html: result.html
    }.compact

    response_data = {
      success: true,
      document_path: document_path,
      formats: output_formats,
      conversion_id: conversion.conversion_id,
      metadata: {
        document_type: conversion.document_type,
        pages: result.page_count,
        processing_time: result.processing_time
      }
    }

    total_chars = contents.values.sum(&:length)

    if output_path
      # Explicit save requested.
      response_data[:saved_files] = save_contents(contents, File.expand_path(output_path))
      response_data[:preview] = preview_of(result.markdown)
    elsif total_chars <= max_inline_chars
      # Small enough to return inline.
      response_data[:results] = contents
    else
      # Too large to inline safely — auto-save to a temp file so the
      # model's context isn't blown out.
      default_path = File.join(Dir.tmpdir, "mathpix_#{sanitize(conversion.conversion_id)}.md")
      response_data[:saved_files] = save_contents(contents, default_path)
      response_data[:preview] = preview_of(result.markdown)
      response_data[:note] =
        "Converted output is #{total_chars} characters, which exceeds max_inline_chars " \
        "(#{max_inline_chars}); it was written to a file to avoid exceeding the model " \
        'context. Read the file at saved_files for the full content, pass output_path to ' \
        'choose the destination, or raise max_inline_chars to force inline output.'
    end

    json_response(response_data)
  end
end
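On the consuming side, a caller has to handle both response shapes: `results` when the content is inline, `saved_files` when it was written to disk. The sketch below assumes the payload produced by `json_response` has already been extracted and parsed with `JSON.parse`, so keys are strings; the exact envelope around that payload is not shown in this page. `full_markdown` is a hypothetical helper, not part of the gem.

```ruby
require 'json'
require 'tmpdir'

# Hypothetical consumer-side helper (not part of the gem). Assumes the
# tool's payload was parsed from JSON, so all keys are strings.
def full_markdown(payload)
  # Inline case: the content is carried directly under "results".
  inline = payload.dig('results', 'markdown')
  return inline if inline

  # Saved case: only a path (plus a preview) is returned, so read the file.
  path = payload.dig('saved_files', 'markdown', 'path')
  path ? File.read(path) : nil
end

# Simulated payloads for the two cases.
inline_payload = JSON.parse({ success: true, results: { markdown: '# Title' } }.to_json)

saved_path = File.join(Dir.tmpdir, "mathpix_example_#{Process.pid}.md")
File.write(saved_path, '# Long document')
saved_payload = JSON.parse(
  { success: true,
    saved_files: { markdown: { path: saved_path, bytes: 15 } },
    preview: '# Long' }.to_json
)

puts full_markdown(inline_payload)  # "# Title"
puts full_markdown(saved_payload)   # "# Long document"
File.delete(saved_path)
```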

.preview_of(content) ⇒ Object

First PREVIEW_CHARS characters of the content, if any.



# File 'lib/mathpix/mcp/tools/convert_document_tool.rb', line 163

def self.preview_of(content)
  return nil unless content

  content.length > PREVIEW_CHARS ? content[0, PREVIEW_CHARS] : content
end
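The truncation behaviour can be checked in isolation with a standalone copy of the helper. In this sketch the limit is a positional parameter (an assumption made so the snippet runs without the class); its `2_000` default mirrors PREVIEW_CHARS.

```ruby
# Standalone copy of the helper for illustration; the 2_000 default mirrors
# PREVIEW_CHARS and is a parameter here only so the sketch runs on its own.
def preview_of(content, limit = 2_000)
  return nil unless content

  # String#[] with (start, length) slices by characters, not bytes.
  content.length > limit ? content[0, limit] : content
end

p preview_of(nil)                 # nil
p preview_of('short text')        # "short text"
p preview_of('x' * 2_500).length  # 2000
```

No truncation marker is appended, so a caller cannot tell from the preview alone whether the content was cut; comparing against the `bytes` entry in saved_files or reading the file is the reliable check.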

.sanitize(value) ⇒ Object

Make a conversion id safe to use in a filename.



# File 'lib/mathpix/mcp/tools/convert_document_tool.rb', line 170

def self.sanitize(value)
  value.to_s.gsub(/[^a-zA-Z0-9_-]/, '_')
end
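A few sample inputs show what the substitution does. The snippet is a standalone copy of the method, and the ids are invented for illustration; real conversion ids may look different.

```ruby
# Standalone copy of the helper so the sketch runs outside the class.
def sanitize(value)
  # Every character outside letters, digits, underscore, and hyphen
  # becomes an underscore, which also neutralises path separators.
  value.to_s.gsub(/[^a-zA-Z0-9_-]/, '_')
end

puts sanitize('2024/05:abc def')  # 2024_05_abc_def
puts sanitize('../etc/passwd')    # ___etc_passwd
puts sanitize('abc-123_XYZ')      # abc-123_XYZ
```

Because `/` and `.` are replaced, an id cannot escape the temp directory when it is interpolated into the default file name.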

.save_contents(contents, markdown_path) ⇒ Hash{Symbol=>Hash}

Write each available format to disk, deriving sibling paths for the non-markdown formats from the markdown target's name.

Parameters:

  • contents (Hash{Symbol=>String})

    format => content

  • markdown_path (String)

    target path for the markdown output

Returns:

  • (Hash{Symbol=>Hash})

    format => { path:, bytes: }



# File 'lib/mathpix/mcp/tools/convert_document_tool.rb', line 136

def self.save_contents(contents, markdown_path)
  ext_for = { markdown: nil, latex: 'tex', html: 'html' }
  saved = {}

  contents.each do |format, content|
    path =
      if format == :markdown
        markdown_path
      else
        sibling_path(markdown_path, ext_for[format] || format.to_s)
      end

    File.write(path, content)
    saved[format] = { path: path, bytes: content.bytesize }
  end

  saved
end
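The returned structure is easiest to see by running the logic against a scratch directory. The sketch below uses condensed standalone copies of `save_contents` and `sibling_path` (same behaviour, slightly different shape) and made-up content; it is an illustration rather than the gem's code.

```ruby
require 'tmpdir'

# Condensed standalone copies of the two helpers so the sketch runs
# outside the gem.
def sibling_path(base, ext)
  stem = File.basename(base, File.extname(base))
  File.join(File.dirname(base), "#{stem}.#{ext}")
end

def save_contents(contents, markdown_path)
  ext_for = { latex: 'tex', html: 'html' }

  contents.each_with_object({}) do |(format, content), saved|
    # Markdown goes to the requested path; other formats sit next to it.
    path =
      if format == :markdown
        markdown_path
      else
        sibling_path(markdown_path, ext_for[format] || format.to_s)
      end

    File.write(path, content)
    saved[format] = { path: path, bytes: content.bytesize }
  end
end

Dir.mktmpdir do |dir|
  contents = { markdown: '# Title', latex: '\section{Title}' }
  saved = save_contents(contents, File.join(dir, 'report.md'))

  saved.each do |format, info|
    puts "#{format}: #{File.basename(info[:path])} (#{info[:bytes]} bytes)"
  end
end
# markdown: report.md (7 bytes)
# latex: report.tex (15 bytes)
```

The `bytes` value is a byte count, whereas the inline threshold in `.call` counts characters, so the two numbers can differ for non-ASCII content.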

.sibling_path(base, ext) ⇒ Object

Derive a sibling path with a different extension.



# File 'lib/mathpix/mcp/tools/convert_document_tool.rb', line 156

def self.sibling_path(base, ext)
  dir = File.dirname(base)
  stem = File.basename(base, File.extname(base))
  File.join(dir, "#{stem}.#{ext}")
end
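A short sketch of the derivation on a few representative paths, using a standalone copy of the method. The paths are illustrative only.

```ruby
# Standalone copy of the helper so the sketch runs outside the class.
def sibling_path(base, ext)
  dir = File.dirname(base)
  # Only the last extension is stripped before the new one is appended.
  stem = File.basename(base, File.extname(base))
  File.join(dir, "#{stem}.#{ext}")
end

puts sibling_path('/tmp/report.md', 'tex')       # /tmp/report.tex
puts sibling_path('/tmp/archive.tar.md', 'tex')  # /tmp/archive.tar.tex
puts sibling_path('report', 'html')              # ./report.html
```

A bare file name gains a `./` prefix because `File.dirname` returns `.` for paths without a directory component.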