Class: SwarmSDK::V3::Tools::DocumentConverters::PdfConverter

Inherits:
Base < Object
Defined in:
lib/swarm_sdk/v3/tools/document_converters/pdf_converter.rb

Overview

PDF document converter

Converts PDF files to text and extracts embedded JPEG images, which are attached to the result when present. Requires the pdf-reader gem.

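Class Method Summary

  • .extensions ⇒ Array<String>
    Returns the file extensions this converter handles.
  • .format_name ⇒ String
    Returns the human-readable format name.
  • .gem_name ⇒ String
    Returns the name of the gem this converter requires.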

Instance Method Summary

  • #convert(file_path) ⇒ String, RubyLLM::Content
    Convert PDF to text with optional image attachments.

Methods inherited from Base

available?

Class Method Details

.extensions ⇒ Array<String>

Returns the file extensions this converter handles.

Returns:

  • (Array<String>)

    supported extensions



# File 'lib/swarm_sdk/v3/tools/document_converters/pdf_converter.rb', line 24

def extensions
  [".pdf"]
end
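Because `.extensions` is a class method, a caller can choose a converter before instantiating anything. The sketch below shows one way such a lookup could work in plain Ruby; `converter_for` and `StubPdfConverter` are hypothetical names, and the case-insensitive comparison is an assumption, not documented SwarmSDK behavior.

```ruby
# Hypothetical stand-in that mirrors PdfConverter's class-level metadata.
class StubPdfConverter
  def self.extensions
    [".pdf"]
  end

  def self.format_name
    "PDF"
  end
end

# Hypothetical dispatcher: returns the first converter class whose
# .extensions list includes the file's extension, or nil if none does.
# The extension is downcased (an assumption) so "Q3.PDF" matches ".pdf".
def converter_for(file_path, converters)
  ext = File.extname(file_path).downcase
  converters.find { |klass| klass.extensions.include?(ext) }
end

converter = converter_for("reports/Q3.PDF", [StubPdfConverter])
puts converter.format_name # prints "PDF"
```

Keeping the metadata on the class, rather than on instances, lets a registry filter candidates cheaply.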

.format_name ⇒ String

Returns the human-readable format name.

Returns:

  • (String)

    format name



# File 'lib/swarm_sdk/v3/tools/document_converters/pdf_converter.rb', line 19

def format_name
  "PDF"
end

.gem_name ⇒ String

Returns the name of the gem this converter requires.

Returns:

  • (String)

    gem name



# File 'lib/swarm_sdk/v3/tools/document_converters/pdf_converter.rb', line 14

def gem_name
  "pdf-reader"
end
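`.gem_name` is presumably what the inherited `available?` check consults, but `Base` is not documented on this page, so the following is only a sketch of two plain-Ruby ways such a check could be written. `gem_installed?` and `requirable?` are hypothetical helpers; note that `requirable?` actually loads the library when it succeeds.

```ruby
require "rubygems"

# Hypothetical helper: true if RubyGems knows about an installed gem with
# this name. Under Bundler, only gems in the bundle are visible.
def gem_installed?(gem_name)
  !Gem::Specification.find_all_by_name(gem_name).empty?
end

# Hypothetical helper: true if the feature can actually be required.
# LoadError is a ScriptError, not a StandardError, so it is rescued explicitly.
def requirable?(feature)
  require feature
  true
rescue LoadError
  false
end

puts gem_installed?("pdf-reader") # result depends on the local environment
puts requirable?("pdf-reader")    # loads pdf-reader if it is installed
```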

Instance Method Details

#convert(file_path) ⇒ String, RubyLLM::Content

Convert PDF to text with optional image attachments

Parameters:

  • file_path (String)

    path to PDF file

Returns:

  • (String, RubyLLM::Content)

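    the extracted text as a String, or a RubyLLM::Content wrapping that text plus JPEG image attachments when any images were extracted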
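Since `#convert` returns either a plain String or a `RubyLLM::Content`, callers need to branch on the type. The sketch below normalizes both shapes; `StubContent` is a hypothetical stand-in that assumes the real `RubyLLM::Content` exposes `text` and `attachments` readers, and `summarize_result` is a hypothetical name. Per the source below, a String can also be the unsupported-format or error message rather than page text.

```ruby
# Hypothetical stand-in for RubyLLM::Content, assuming the real class
# exposes `text` and `attachments` readers.
StubContent = Struct.new(:text, :attachments)

# Normalizes convert's return value into [text, attachment_count].
# A plain String means no images were attached (or that convert returned
# an unsupported-format or error message instead of page text).
def summarize_result(result)
  if result.is_a?(String)
    [result, 0]
  else
    [result.text, result.attachments.size]
  end
end

p summarize_result("Page 1 text")                                  # prints ["Page 1 text", 0]
p summarize_result(StubContent.new("Page 1 text", ["/tmp/a.jpg"])) # prints ["Page 1 text", 1]
```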



# File 'lib/swarm_sdk/v3/tools/document_converters/pdf_converter.rb', line 33

def convert(file_path)
  return unsupported_format_message unless self.class.available?

  require "pdf-reader"
  reader = PDF::Reader.new(file_path)

  # Extract text from all pages
  output = build_text_output(reader, file_path)

  # Extract JPEG images (inline - no separate class)
  image_paths = extract_jpeg_images(reader)

  # Return with images if any extracted
  if image_paths.any?
    content = RubyLLM::Content.new(output)
    image_paths.each { |path| content.add_attachment(path) }
    content
  else
    output
  end
rescue PDF::Reader::MalformedPDFError => e
  error("Malformed PDF: #{e.message}")
rescue StandardError => e
  error("PDF conversion failed: #{e.message}")
end
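The private `extract_jpeg_images` helper is not shown on this page; the source only shows that it returns file paths that are passed to `content.add_attachment`. A common technique relies on the fact that a PDF image XObject whose `/Filter` is `/DCTDecode` stores a complete JPEG in its stream, so the raw bytes can be written to disk unchanged. The sketch below covers only that final write step in plain Ruby, with a sanity check on the JPEG Start Of Image marker; `save_jpeg` and `JPEG_SOI` are hypothetical names, and locating the DCTDecode streams with pdf-reader is left out.

```ruby
require "tempfile"
require "tmpdir"

# JPEG data begins with the Start Of Image marker FF D8, followed by FF.
JPEG_SOI = "\xFF\xD8\xFF".b

# Hypothetical helper: writes raw DCTDecode stream bytes to a .jpg file and
# returns its path, or returns nil if the bytes do not look like a JPEG.
# Tempfile.create without a block does not delete the file, so the caller
# owns cleanup.
def save_jpeg(bytes, dir = Dir.tmpdir)
  data = bytes.b
  return nil unless data.start_with?(JPEG_SOI)

  file = Tempfile.create(["pdf_image", ".jpg"], dir)
  file.binmode
  file.write(data)
  file.close
  file.path
end

path = save_jpeg("\xFF\xD8\xFF\xE0".b + "not-a-real-image".b)
puts File.extname(path) # prints ".jpg"
File.delete(path)
```

Writing the bytes as-is avoids decoding and re-encoding the image, which is why only JPEG (DCTDecode) images are cheap to extract this way.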