Module: OllamaChat::SourceFetching

Included in:
Chat
Defined in:
lib/ollama_chat/source_fetching.rb

Overview

A module that provides functionality for fetching and processing various types of content sources.

The SourceFetching module encapsulates methods for retrieving content from URLs, file paths, and shell commands. It chooses the fetching method from the form of the source identifier and passes the retrieved content to a parser suited to its content type. The module also handles images, document importing, summarizing, and embedding, and it reports fetch errors on STDERR.

Examples:

Fetching content from a URL

chat.fetch_source('https://example.com/document.html') do |source_io|
  # Process the fetched content
end

Importing a local file

chat.fetch_source('/path/to/local/file.txt') do |source_io|
  # Process the imported file content
end

Executing a shell command

chat.fetch_source('!ls -la') do |source_io|
  # Process the command output
end

Instance Method Details

#add_image(images, source_io, source) ⇒ Object

Adds an image to the images collection from the given source IO and source identifier.

This method takes an IO object containing image data and associates it with a source, creating an Ollama::Image instance and appending it to the images array. Duplicate entries are then removed from the array with uniq!.

Parameters:

  • images (Array)

    The collection of images to which the new image will be added

  • source_io (IO)

    The input stream containing the image data

  • source (String, #to_s)

    The identifier or path for the source of the image
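
The append-then-deduplicate step used by add_image can be sketched in isolation. This is a minimal illustration, not the library's code: SketchImage is a hypothetical stand-in for Ollama::Image, and add_unique is a hypothetical helper. Whether real Ollama::Image objects are treated as duplicates depends on how that class defines equality, which this page does not show.

```ruby
# Hypothetical stand-in for Ollama::Image. A Struct gets value-based
# equality and hashing, which Array#uniq! relies on.
SketchImage = Struct.new(:path, :data)

# Hypothetical helper mirroring `(images << image).uniq!`.
def add_unique(images, image)
  (images << image).uniq! # uniq! returns nil when nothing was removed
  images                  # so return the array explicitly
end

images = []
add_unique(images, SketchImage.new('cat.png', 'abc'))
add_unique(images, SketchImage.new('cat.png', 'abc')) # duplicate, dropped
add_unique(images, SketchImage.new('dog.png', 'xyz'))
puts images.map(&:path).inspect
```

Because uniq! returns nil when no duplicate was removed, callers should not rely on the return value of an expression that ends in uniq!.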



# File 'lib/ollama_chat/source_fetching.rb', line 87

def add_image(images, source_io, source)
  STDERR.puts "Adding #{source_io&.content_type} image #{source.to_s.inspect}."
  image = Ollama::Image.for_io(source_io, path: source.to_s)
  (images << image).uniq!
end

#embed(source, tags: []) ⇒ String?

Embeds content from the specified source.

This method fetches content from a given source (command, URL, or file) and processes it for embedding using the embed_source method. If embedding is disabled, it falls back to generating a summary instead.

Parameters:

  • source (String)

    The source identifier which can be a command, URL, or file path

  • tags (Array) (defaults to: [])

    Tags passed on to embed_source along with the fetched content

Returns:

  • (String, nil)

    The formatted embedding result or summary message, or nil if the operation fails



# File 'lib/ollama_chat/source_fetching.rb', line 242

def embed(source, tags: [])
  if @embedding.on?
    STDOUT.puts "Now embedding #{source.to_s.inspect} in collection #{collection.to_s.inspect}."
    fetch_source(source) do |source_io|
      content = parse_source(source_io)
      content.present? or return
      source_io.rewind
      embed_source(source_io, source, tags:)
    end
    prompt(:embed).to_s % { source:, collection: collection }
  else
    STDOUT.puts "Embedding is off, so I will just give a small summary of this source."
    summarize(source)
  end
end

#embed_source(source_io, source, tags: [], count: nil) ⇒ Array, ...

Embeds content from the given source IO and source identifier.

This method processes document content by splitting it into chunks using various splitting strategies (Character, RecursiveCharacter, Semantic) and adds the chunks to a document store for embedding.
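
The dispatch on the configured splitter name can be sketched as follows. This is a simplified illustration under stated assumptions: split_fixed is a hypothetical fixed-size chunker and is not Documentrix's Character splitter, and chunks_for and embed_inputs are hypothetical names. The sketch shows one consequence of the case statement in the method below: a name that matches no branch leaves the chunk list nil, and the method then returns nil without embedding anything.

```ruby
# Hypothetical fixed-size chunker (not the Documentrix implementation).
def split_fixed(text, chunk_size:)
  text.scan(/.{1,#{chunk_size}}/m)
end

# Returns chunks for a known splitter name, or nil for an unknown one,
# because a case expression without a matching branch evaluates to nil.
def chunks_for(name, text, chunk_size:)
  case name
  when 'Character' then split_fixed(text, chunk_size: chunk_size)
  end
end

# Mirrors the `inputs or return` guard: stop early when nothing was split.
def embed_inputs(name, text, chunk_size: 4)
  inputs = chunks_for(name, text, chunk_size: chunk_size) or return
  inputs.size
end

puts embed_inputs('Character', 'abcdefghij').inspect
puts embed_inputs('Unknown', 'abcdefghij').inspect
```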

Parameters:

  • source_io (IO)

    The input stream containing the document content to embed

  • source (String, #to_s)

    The identifier or path for the source of the content

  • tags (Array) (defaults to: [])

    Tags passed to the document store along with the chunks

  • count (Integer, nil) (defaults to: nil)

    An optional counter for tracking processing order

Returns:

  • (Array, String, nil)

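    The embedded chunks, or the parsed content when embedding is disabled; nil if the source is already up to date, parsing yields nothing, or the configured splitter name is not recognized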



# File 'lib/ollama_chat/source_fetching.rb', line 175

def embed_source(source_io, source, tags: [], count: nil)
  @embedding.on? or return parse_source(source_io)
  m = "Embedding #{italic { source_io&.content_type }} document "\
    "#{source.to_s.inspect} in collection #{collection.to_s.inspect}."
  if count
    STDOUT.puts '%u. %s' % [ count, m ]
  else
    STDOUT.puts m
  end
  unless @documents.source_modified?(source)
    STDOUT.puts "Source #{source.to_s.inspect} already up-to-date. => Skipping."
    return
  end
  text = parse_source(source_io) or return
  splitter_config = config.embedding.splitter
  inputs = nil
  case splitter_config.name
  when 'Character'
    splitter = Documentrix::Documents::Splitters::Character.new(
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(text)
  when 'RecursiveCharacter'
    splitter = Documentrix::Documents::Splitters::RecursiveCharacter.new(
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(text)
  when 'Semantic'
    splitter = Documentrix::Documents::Splitters::Semantic.new(
      ollama:, model: config.embedding.model.name,
      chunk_size: splitter_config.chunk_size,
    )
    inputs = splitter.split(
      text,
      breakpoint: splitter_config.breakpoint.to_sym,
      percentage: splitter_config.percentage?,
      percentile: splitter_config.percentile?,
    )
  end
  inputs or return
  source = source.to_s
  command = false
  if source.start_with?(?!)
    source = Kramdown::ANSI::Width.truncate(
      source[1..-1].gsub(/\W+/, ?_),
      length: 10
    )
    command = true
  end
  if !command
    @documents.source_update(inputs, source:, tags:, batch_size: config.embedding.batch_size?)
  else
    @documents.add(inputs, source:, tags:, batch_size: config.embedding.batch_size?)
  end
end

#fetch_source(source, check_exist: false) {|tmp| ... } ⇒ Object

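The fetch_source method retrieves content from a shell command (prefixed with !), an http or https URL, a file:// URL, or a file path, and yields a temporary file handle for further processing. If fetching raises an error, the error and its backtrace are printed to STDERR, and the block is called with the object built by OllamaChat::Utils::Fetcher::ResponseMetadata.failed, which is also returned.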

Parameters:

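  • source (String, #to_path)

    the source identifier which can be a command, URL, or file path

  • check_exist (Boolean) (defaults to: false)

    if true, return without yielding when a file source does not exist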

Yields:

  • (tmp)
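
The way a source string is routed can be sketched without fetching anything. The helper below, classify_source, is hypothetical and not part of the module; it reuses the regular expressions from the case statement in the method below and returns a label plus the extracted command, URL, or path. Unlike fetch_source, it does not call File.expand_path and returns :invalid instead of raising.

```ruby
require 'uri'

# Hypothetical helper: classifies a source string with the same patterns
# fetch_source uses, but only reports what it found.
def classify_source(source)
  source = source.to_s
  case source
  when %r{\A!(.*)}
    [:command, $1]
  when %r{\Ahttps?://\S+}
    [:url, source]
  when %r{\Afile://([^\s#]+)}
    [:file, URI.decode_www_form_component($1)]
  when %r{\A((?:\.\.|[~.]?)/(?:\\ |\\|[^\\]+)+)}
    [:file, $1.gsub('\ ', ' ')]   # unescape backslash-escaped spaces
  when %r{\A"((?:\.\.|[~.]?)/(?:\\"|\\|[^"\\]+)+)"}
    [:file, $1.gsub('\"', '"')]   # unescape quotes inside a quoted path
  else
    [:invalid, source]
  end
end

p classify_source('!ls -la')
p classify_source('https://example.com/document.html')
p classify_source('file:///tmp/a%20b.txt')
p classify_source('"/tmp/my file.txt"')
```

Bare file paths are recognized only when they start with /, ./, ../, or ~/; a plain relative name such as notes.txt falls through to the invalid branch.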


# File 'lib/ollama_chat/source_fetching.rb', line 35

def fetch_source(source, check_exist: false, &block)
  source = source.ask_and_send_or_self(:to_path).to_s
  case source
  when %r{\A!(.*)}
    command = $1
    OllamaChat::Utils::Fetcher.execute(command) do |tmp|
      block.(tmp)
    end
  when %r{\Ahttps?://\S+}
    get_url(source, cache:) do |tmp|
      block.(tmp)
    end
  when %r{\Afile://([^\s#]+)}
    filename = $1
    filename = URI.decode_www_form_component(filename)
    filename = File.expand_path(filename)
    check_exist && !File.exist?(filename) and return
    fetch_source_as_filename(filename, &block)
  when  %r{\A((?:\.\.|[~.]?)/(?:\\ |\\|[^\\]+)+)}
    filename = $1
    filename = filename.gsub('\ ', ' ')
    filename = File.expand_path(filename)
    check_exist && !File.exist?(filename) and return
    fetch_source_as_filename(filename, &block)
  when %r{\A"((?:\.\.|[~.]?)/(?:\\"|\\|[^"\\]+)+)"}
    filename = $1
    filename = filename.gsub('\"', ?")
    filename = File.expand_path(filename)
    check_exist && !File.exist?(filename) and return
    fetch_source_as_filename(filename, &block)
  else
    raise "invalid source #{source.inspect}"
  end
rescue => e
  msg = "Fetching source #{source.to_s.inspect}: #{e.class} #{e}"
  STDERR.puts "#{msg}\n#{e.backtrace * ?\n}"
  confirm?(prompt: '⏎  Press any key to continue (%s). ', output: STDERR, timeout: 3)
  msg = OllamaChat::Utils::Fetcher::ResponseMetadata.failed(msg)
  block.(msg)
  msg
end

#import(source) ⇒ String?

Imports content from the specified source and processes it.

This method fetches content from a given source (command, URL, or file) and passes the resulting IO object to the import_source method for processing.

Parameters:

  • source (String)

    The source identifier which can be a command, URL, or file path

Returns:

  • (String, nil)

    A formatted message indicating the import result and parsed content, or nil if the operation fails



# File 'lib/ollama_chat/source_fetching.rb', line 120

def import(source)
  fetch_source(source) do |source_io|
    content = import_source(source_io, source) or return
    source_io.rewind
    content
  end
end

#import_source(source_io, source) ⇒ String

The import_source method processes and imports content from a given source, displaying information about the document type and returning a formatted string that indicates the import result along with the parsed content.

Parameters:

  • source_io (IO)

    the input stream containing the document content

  • source (String)

    the source identifier or path

Returns:

  • (String)

    a formatted message indicating the import result and the parsed content
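
The shape of the returned message can be illustrated with a small sketch. parse_plain is a hypothetical stand-in for parse_source that reads the stream as plain text, and import_message is a hypothetical name; the real method also prints a status line and handles other content types.

```ruby
require 'stringio'

# Hypothetical stand-in for parse_source: treat the stream as plain text.
def parse_plain(source_io)
  source_io.read
end

# Builds the same message layout as import_source: the inspected source
# name, a blank line, the content, and a trailing blank line.
def import_message(source_io, source)
  source = source.to_s
  "Imported #{source.inspect}:\n\n#{parse_plain(source_io)}\n\n"
end

message = import_message(StringIO.new('hello'), '/tmp/hello.txt')
puts message
```

Because the source goes through inspect, it appears in double quotes in the message.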



# File 'lib/ollama_chat/source_fetching.rb', line 102

def import_source(source_io, source)
  source        = source.to_s
  document_type = source_io&.content_type.full? { |ct| italic { ct } + ' ' }
  STDOUT.puts "Importing #{document_type}document #{source.to_s.inspect} now."
  source_content = parse_source(source_io)
  "Imported #{source.inspect}:\n\n#{source_content}\n\n"
end

#summarize(source, words: nil) ⇒ String?

Summarizes content from the specified source.

This method fetches content from a given source (command, URL, or file) and generates a summary using the summarize_source method.

Parameters:

  • source (String)

    The source identifier which can be a command, URL, or file path

  • words (Integer, nil) (defaults to: nil)

    The target number of words for the summary; nil or a value below 1 falls back to 100

Returns:

  • (String, nil)

    The formatted summary message or nil if the operation fails
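
The nil return comes from an `or return` inside the block, which leaves the enclosing method rather than just the block. A minimal sketch of that pattern follows; with_source is a hypothetical stand-in for fetch_source and assumes the fetcher passes the block's value through, which this page does not show.

```ruby
require 'stringio'

# Hypothetical stand-in for fetch_source: yields an IO and returns
# whatever the block returns.
def with_source(text)
  yield StringIO.new(text)
end

# Same shape as summarize and import: bail out of the whole method with
# nil when there is no content, otherwise rewind and return the content.
def first_line(text)
  with_source(text) do |io|
    line = io.gets or return # `return` exits first_line, not just the block
    io.rewind
    line.chomp
  end
end

p first_line("alpha\nbeta")
p first_line('')
```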



# File 'lib/ollama_chat/source_fetching.rb', line 155

def summarize(source, words: nil)
  fetch_source(source) do |source_io|
    content = summarize_source(source_io, source, words:) or return
    source_io.rewind
    content
  end
end

#summarize_source(source_io, source, words: nil) ⇒ String?

Summarizes content from the given source IO and source identifier.

This method takes an IO object containing document content and generates a summary based on the configured prompt template and word count.

Parameters:

  • source_io (IO)

    The input stream containing the document content to summarize

  • source (String, #to_s)

    The identifier or path for the source of the content

  • words (Integer, nil) (defaults to: nil)

    The target number of words for the summary; nil or a value below 1 falls back to 100

Returns:

  • (String, nil)

    The formatted summary message or nil if content is empty or cannot be processed
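
The word-count fallback and the template interpolation can be sketched on their own. SUMMARY_TEMPLATE is a hypothetical template (the real one comes from prompt(:summarize)), build_summary_prompt is a hypothetical name, and the blank check stands in for present?.

```ruby
# Hypothetical template with named references, filled in via String#%.
SUMMARY_TEMPLATE = "Summarize in %{words} words:\n\n%{source_content}"

def build_summary_prompt(source_content, words: nil)
  words = words.to_i        # nil.to_i is 0
  words < 1 and words = 100 # nil, 0, and negative values fall back to 100
  return if source_content.nil? || source_content.strip.empty?
  SUMMARY_TEMPLATE % { source_content: source_content, words: words }
end

puts build_summary_prompt('Some text')
puts build_summary_prompt('Some text', words: 25)
p build_summary_prompt('   ')
```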



# File 'lib/ollama_chat/source_fetching.rb', line 137

def summarize_source(source_io, source, words: nil)
  STDOUT.puts "Summarizing #{italic { source_io&.content_type }} document #{source.to_s.inspect} now."
  words = words.to_i
  words < 1 and words = 100
  source_content = parse_source(source_io)
  source_content.present? or return
  prompt(:summarize).to_s % { source_content:, words: }
end