Class: Rubino::Tools::SummarizeFileTool
- Defined in: lib/rubino/tools/summarize_file_tool.rb
Overview
Summarizes a large text file WITHOUT pulling its bytes into the main agent context. The file is chunked and map-reduced through the `summarize` auxiliary LLM; only the final summary string returns to the caller. This is the in-house realization of the “summarization subagent” pattern: the raw 30k-line document lives only in the aux calls, so it never bloats the primary prompt. A bloated prompt is what pushes time-to-first-token past the provider’s stream idle-timeout and gets a run cut mid-stream.
Algorithm (LangChain/OpenAI-cookbook map-reduce):
1. MAP: split the file into ~CHUNK_BYTES chunks, summarize each.
2. REDUCE: combine the chunk summaries; if the combined text still overflows a chunk, group + re-summarize recursively (capped).
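The two steps above can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the tool's actual implementation: `chunk_text`, `reduce_summaries`, and `summarize_text` are hypothetical names, the `summarizer` argument is any callable standing in for the auxiliary LLM, and the default values simply mirror the documented constants.

```ruby
# Hypothetical sketch of the MAP/REDUCE steps; names and defaults are
# illustrative stand-ins, not the tool's private methods.

# Split text on line boundaries into pieces of roughly `limit` bytes.
# A single line longer than `limit` becomes its own oversized piece.
def chunk_text(text, limit: 24_000)
  chunks = []
  current = +""
  text.each_line do |line|
    if !current.empty? && current.bytesize + line.bytesize > limit
      chunks << current
      current = +""
    end
    current << line
  end
  chunks << current unless current.empty?
  chunks
end

# REDUCE: if the joined summaries fit in one chunk (or the depth cap is
# reached), summarize them in one call; otherwise summarize groups of
# `group_size` and recurse on the shorter list.
def reduce_summaries(summaries, summarizer, limit: 24_000, group_size: 5, depth_cap: 4, depth: 0)
  combined = summaries.join("\n\n")
  return summarizer.call(combined) if combined.bytesize <= limit || depth >= depth_cap

  grouped = summaries.each_slice(group_size).map { |group| summarizer.call(group.join("\n\n")) }
  reduce_summaries(grouped, summarizer, limit: limit, group_size: group_size,
                                        depth_cap: depth_cap, depth: depth + 1)
end

# MAP each chunk to a partial summary, then REDUCE the partials.
def summarize_text(text, summarizer, limit: 24_000)
  partials = chunk_text(text, limit: limit).map { |chunk| summarizer.call(chunk) }
  reduce_summaries(partials, summarizer, limit: limit)
end
```

With a stub such as `->(text) { text.lines.first.to_s.strip }` standing in for the LLM, the flow can be exercised without any network access. The depth cap guarantees termination even if a summarizer fails to shrink its input.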
Constant Summary
- CHUNK_BYTES = 24_000
  ~6k tokens/chunk at 4 bytes/token; leaves room for the prompt and the chunk’s own summary inside a modest context window.
- MAX_FILE_BYTES = 8_000_000
  Refuse absurdly large inputs rather than fan out hundreds of LLM calls.
- REDUCE_DEPTH_CAP = 4
  Bound the reduce recursion so a pathological fan-in can’t loop forever.
- GROUP_SIZE = 5
- AUX_TASK = "summarize"
Instance Attribute Summary
- #aux_client ⇒ Object (writeonly)
  Test seam: inject a stub LLM client.
Attributes inherited from Base
#cancel_token, #read_tracker, #stream_chunk
Instance Method Summary
- #call(arguments) ⇒ Object
- #description ⇒ Object
- #input_schema ⇒ Object
- #name ⇒ Object
- #risk_level ⇒ Object
Methods inherited from Base
#cancellation_requested?, #config_key, #emit_chunk, #risky?, #to_tool_definition, workspace_root, workspace_roots
Instance Attribute Details
#aux_client=(value) ⇒ Object
Test seam: inject a stub LLM client. Production lazily builds the real AuxiliaryClient, which routes to the `auxiliary.summarize` config.
    # File 'lib/rubino/tools/summarize_file_tool.rb', line 30
    def aux_client=(value)
      @aux_client = value
    end
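The write-only seam with a lazily built default can be illustrated on a stand-in class. Everything here is an assumption made for the example: `SeamSketch`, its `summarize` method, and the `call`-style client interface are hypothetical, and the real AuxiliaryClient's API is not shown in this page.

```ruby
# Hypothetical stand-in showing a write-only test seam with a lazy default.
class SeamSketch
  attr_writer :aux_client # setter only, mirroring #aux_client=

  def summarize(text)
    client.call(text)
  end

  private

  # Use the injected client if one was set; otherwise build the default
  # on first use.
  def client
    @aux_client ||= build_default_client
  end

  # Placeholder for the production path, which would need provider config.
  def build_default_client
    raise NotImplementedError, "real client requires provider configuration"
  end
end

tool = SeamSketch.new
tool.aux_client = ->(text) { "stub summary of #{text.bytesize} bytes" }
tool.summarize("hello") # returns "stub summary of 5 bytes"
```

Keeping the attribute write-only means callers can swap the client in tests but cannot reach into the tool to use it directly.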
Instance Method Details
#call(arguments) ⇒ Object
    # File 'lib/rubino/tools/summarize_file_tool.rb', line 65
    def call(arguments)
      file_path = arguments["file_path"] || arguments[:file_path]
      focus = (arguments["focus"] || arguments[:focus]).to_s.strip
      focus = "the key facts, structure, decisions, and any errors" if focus.empty?
      max_words = (arguments["max_words"] || arguments[:max_words] || 500).to_i.clamp(50, 4000)

      return "Error: file_path is required" if file_path.nil? || file_path.to_s.empty?

      = File.(file_path)
      return "Error: File not found: #{file_path}" unless File.exist?()
      return "Error: Not a regular file: #{file_path}" unless File.file?()

      size = File.size()
      return "#{file_path} is empty — nothing to summarize." if size.zero?

      if binary?()
        return "Error: #{file_path} looks binary. Read it with the `read_attachment` tool " \
               "(it converts documents to text in-process and summarizes oversized output), " \
               "rather than summarizing raw bytes."
      end
      if size > MAX_FILE_BYTES
        return "Error: #{file_path} is #{size / 1_000_000}MB, over the " \
               "#{MAX_FILE_BYTES / 1_000_000}MB summarize cap. Split it (e.g. with split/sed) " \
               "or grep to the relevant section, then summarize that."
      end

      chunks = chunk_file()
      return "#{file_path} is empty — nothing to summarize." if chunks.empty?

      summaries = chunks.each_with_index.map do |chunk, i|
        raise Rubino::Interrupted if cancellation_requested?

        emit_chunk("summarizing chunk #{i + 1}/#{chunks.size}…\n")
        map_summarize(chunk, focus)
      end
      summary = reduce(summaries, focus, max_words)
      { output: summary, metrics: "#{chunks.size} chunk#{"s" if chunks.size != 1} → summary" }
    rescue Rubino::Interrupted
      raise
    rescue StandardError => e
      "Error summarizing #{file_path}: #{e.}"
    end
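The argument handling at the top of `#call` accepts string or symbol keys, substitutes defaults, and clamps `max_words` to the range 50..4000. The sketch below isolates that behaviour; `fetch_arg`, `normalize_max_words`, and `normalize_focus` are hypothetical helper names introduced for illustration, not methods of the tool.

```ruby
# Hypothetical helpers mirroring the argument normalization in #call.

# Look a key up under its string form first, then its symbol form.
def fetch_arg(arguments, key)
  arguments[key.to_s] || arguments[key.to_sym]
end

# Default to 500 words, coerce to Integer, and clamp into 50..4000.
def normalize_max_words(arguments)
  (fetch_arg(arguments, :max_words) || 500).to_i.clamp(50, 4000)
end

# Blank or missing focus falls back to the generic default.
def normalize_focus(arguments)
  focus = fetch_arg(arguments, :focus).to_s.strip
  focus.empty? ? "the key facts, structure, decisions, and any errors" : focus
end

normalize_max_words({})                        # 500 (default)
normalize_max_words({ "max_words" => "10" })   # 50 (clamped up)
normalize_max_words({ max_words: 99_999 })     # 4000 (clamped down)
```

Because `to_i` turns non-numeric strings into 0, a malformed `max_words` ends up at the lower bound rather than raising.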
#description ⇒ Object
    # File 'lib/rubino/tools/summarize_file_tool.rb', line 36
    def description
      "Summarize a large text file WITHOUT loading it into this conversation. " \
        "The file is read and map-reduced by a separate summarization model; only the " \
        "final summary returns here, so the raw bytes never enter context. " \
        "PREFER this over `read` whenever you need the gist of a big document — converted " \
        "PDFs, logs, transcripts, anything more than a few hundred lines. For binary docs " \
        "(PDF/DOCX/XLSX/PPTX) use the `read_attachment` tool, which converts them to text " \
        "in-process and summarizes oversized output automatically. " \
        "Use `focus` to steer what the summary must preserve."
    end
#input_schema ⇒ Object
    # File 'lib/rubino/tools/summarize_file_tool.rb', line 47
    def input_schema
      {
        type: "object",
        properties: {
          file_path: { type: "string", description: "Absolute or relative path to a text file" },
          focus: { type: "string",
                   description: "What the summary must preserve, e.g. 'chapter titles and page numbers' or 'API errors with timestamps'. Optional." },
          max_words: { type: "integer",
                       description: "Approximate length of the final summary in words (default 500)." }
        },
        required: %w[file_path]
      }
    end
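A payload that satisfies this schema might look like the one below. This is a sketch only: the path `logs/app.log` is made up, `missing_required` is a hypothetical helper rather than part of the tool, and the schema hash is trimmed to the fields needed for the check.

```ruby
require "json"

# Hypothetical helper: list required keys absent from a tool-call payload,
# accepting string or symbol keys as #call does.
def missing_required(schema, arguments)
  schema[:required].reject { |key| arguments.key?(key) || arguments.key?(key.to_sym) }
end

# Trimmed copy of the schema shape (descriptions omitted).
schema = {
  type: "object",
  properties: {
    file_path: { type: "string" },
    focus: { type: "string" },
    max_words: { type: "integer" }
  },
  required: %w[file_path]
}

# Example payload as a model might send it; the file path is invented.
payload = JSON.parse('{"file_path": "logs/app.log", "focus": "API errors with timestamps", "max_words": 200}')

missing_required(schema, payload)              # empty: file_path is present
missing_required(schema, { focus: "errors" })  # ["file_path"]
```

Only `file_path` is required; `focus` and `max_words` fall back to defaults inside `#call` when omitted.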
#name ⇒ Object
    # File 'lib/rubino/tools/summarize_file_tool.rb', line 32
    def name
      "summarize_file"
    end
#risk_level ⇒ Object
    # File 'lib/rubino/tools/summarize_file_tool.rb', line 61
    def risk_level
      :low
    end