Class: SwarmSDK::V3::Tools::DocumentConverters::PdfConverter
Defined in: lib/swarm_sdk/v3/tools/document_converters/pdf_converter.rb
Overview
PDF document converter
Converts PDF files to text and extracts JPEG images. Requires the pdf-reader gem.
Class Method Summary
- .extensions ⇒ Array<String>
  Supported extensions.
- .format_name ⇒ String
  Format name.
- .gem_name ⇒ String
  Gem name.
Instance Method Summary
- #convert(file_path) ⇒ String, RubyLLM::Content
  Convert PDF to text with optional image attachments.
Methods inherited from Base
Class Method Details
.extensions ⇒ Array<String>
Returns supported extensions.
    # File 'lib/swarm_sdk/v3/tools/document_converters/pdf_converter.rb', line 24
    def extensions
      [".pdf"]
    end
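A list of supported extensions lends itself to lookup by file extension. The following is a minimal sketch of that idea, not SwarmSDK's actual dispatch code: `StubPdfConverter` and `converter_for` are hypothetical names introduced here, and the stub mirrors only the documented `.extensions` and `.format_name` return values.

```ruby
# Hypothetical stand-in that mirrors only the documented class-method values.
class StubPdfConverter
  class << self
    def extensions
      [".pdf"]
    end

    def format_name
      "PDF"
    end
  end
end

# Hypothetical helper: return the first converter class that lists the
# file's extension (compared case-insensitively), or nil when none matches.
def converter_for(file_path, converters)
  ext = File.extname(file_path).downcase
  converters.find { |klass| klass.extensions.include?(ext) }
end

converter = converter_for("reports/Q3.PDF", [StubPdfConverter])
puts converter ? converter.format_name : "unsupported"
```

Lowercasing the extension before the lookup is a design choice of this sketch; the documented method only states that `".pdf"` is supported.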
.format_name ⇒ String
Returns format name.
    # File 'lib/swarm_sdk/v3/tools/document_converters/pdf_converter.rb', line 19
    def format_name
      "PDF"
    end
.gem_name ⇒ String
Returns gem name.
    # File 'lib/swarm_sdk/v3/tools/document_converters/pdf_converter.rb', line 14
    def gem_name
      "pdf-reader"
    end
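A gem name such as this one can be used to check whether the dependency is installed before attempting to load it. The sketch below shows one way to do that with RubyGems; `gem_installed?` is a hypothetical helper, and this page does not show how the inherited `available?` check is actually implemented.

```ruby
require "rubygems"

# Hypothetical helper, not part of SwarmSDK: true when a gem with this name
# is visible to RubyGems, false when the lookup raises a load error.
def gem_installed?(name)
  Gem::Specification.find_by_name(name)
  true
rescue Gem::LoadError
  false
end

puts gem_installed?("pdf-reader") ? "pdf-reader is installed" : "pdf-reader is missing"
```

Checking the specification avoids loading the gem just to test for its presence, which keeps the check cheap when the converter is never used.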
Instance Method Details
#convert(file_path) ⇒ String, RubyLLM::Content
Convert PDF to text with optional image attachments.
    # File 'lib/swarm_sdk/v3/tools/document_converters/pdf_converter.rb', line 33
    def convert(file_path)
      return unless self.class.available?

      require "pdf-reader"

      reader = PDF::Reader.new(file_path)

      # Extract text from all pages
      output = build_text_output(reader, file_path)

      # Extract JPEG images (inline - no separate class)
      image_paths = extract_jpeg_images(reader)

      # Return with images if any extracted
      if image_paths.any?
        content = RubyLLM::Content.new(output)
        image_paths.each { |path| content.(path) }
        content
      else
        output
      end
    rescue PDF::Reader::MalformedPDFError => e
      error("Malformed PDF: #{e.}")
    rescue StandardError => e
      error("PDF conversion failed: #{e.}")
    end
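Because `convert` can return a plain String, a content object carrying images, or nothing at all (the early `return unless self.class.available?`), callers may want to normalize the result. The sketch below illustrates one way to do so under stated assumptions: `StubContent` and `text_from` are hypothetical, and the `text` and `attachments` fields are stand-ins rather than a documented `RubyLLM::Content` interface.

```ruby
# Stand-in for a rich content object; the `text` and `attachments` fields are
# assumptions for this sketch, not a documented RubyLLM::Content interface.
StubContent = Struct.new(:text, :attachments)

# Hypothetical helper: reduce whatever a converter returned to plain text.
def text_from(result)
  case result
  when nil then ""         # converter returned early (dependency unavailable)
  when String then result  # text-only result
  else result.text         # text accompanied by image attachments
  end
end

puts text_from("Page 1: hello")
puts text_from(StubContent.new("Page 1: hello", ["/tmp/img1.jpg"]))
```

The sketch does not cover the value returned by `error(...)` on the rescue paths, since its shape is not shown on this page.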