Class: Rubino::LLM::InlineThinkFilter

Inherits:
Object
Defined in:
lib/rubino/llm/inline_think_filter.rb

Overview

Streaming filter that splits text into :content and :thinking events by recognising inline <think>…</think> sentinels emitted by MiniMax, DeepSeek-R1, Qwen, and similar reasoning models that don’t expose a dedicated reasoning channel.

Holds back up to TAG_MAX_LEN-1 chars across chunks so a tag split between chunks (e.g. “<thi” + “nk>”) still gets matched. Call #flush at end of stream to drain any tail.
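
The hold-back rule can be illustrated with a minimal sketch. The helper name split_safe_prefix is hypothetical and is not part of the class; it only assumes that the last TAG_MAX_LEN-1 characters of the buffer might be the start of a tag and therefore should not be emitted yet.

```ruby
TAG_MAX_LEN = "</think>".length # 8, the longer of the two tags

# Hypothetical helper (not from the class): split a buffer into the part
# that is safe to emit now and the tail that is held back because it could
# still turn out to be the beginning of a tag.
def split_safe_prefix(pending, hold = TAG_MAX_LEN - 1)
  keep = [pending.length - hold, 0].max
  [pending[0, keep], pending[keep..]]
end

emit, held = split_safe_prefix("Hello <thi")
# emit => "Hel", held => "lo <thi"

# When the next chunk arrives, the held tail plus the new text contains
# the complete tag, so a regex match can now find it.
rejoined = held + "nk>"
# rejoined => "lo <think>"
```

Holding back one character fewer than the longest tag is enough: a complete tag would already have matched, so only a strict prefix of a tag can be pending.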

A reasoning model emits its <think> block as the FIRST thing in the turn: the reasoning precedes the answer. A LITERAL <think> that a coding agent types mid-answer (echoing user input, writing docs/HTML, discussing the syntax) is content, not a control marker, and MUST survive verbatim. So we only honor an OPENING <think> as a reasoning sentinel while the turn still LEADS with it, i.e. before any visible content has been emitted and while not inside a fenced code block. Once real content (or a ``` fence) has appeared, every <think>/</think> is treated as ordinary text and is never dropped from the answer or the persisted transcript (STRM-1).
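
The "leading" rule described above can be sketched as a standalone predicate. This is an illustration only: leading_think? and its keyword arguments are hypothetical names, and the real class tracks the same facts in instance state rather than taking them as parameters.

```ruby
OPEN_RE = /<think>/i

# Hypothetical predicate (not part of the class): should the first <think>
# in `text` be honored as a reasoning sentinel?
def leading_think?(text, content_seen: false, in_fence: false)
  # Once visible content or a code fence has appeared, <think> is literal.
  return false if content_seen || in_fence

  match = text.match(OPEN_RE)
  return false unless match

  # Only whitespace may precede the tag for the turn to still "lead" with it.
  !text[0, match.begin(0)].match?(/\S/)
end

leading_think?("\n<think>plan")                # => true
leading_think?("Use <think> tags like this")   # => false
leading_think?("<think>", content_seen: true)  # => false
leading_think?("<think>", in_fence: true)      # => false
```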

Constant Summary

OPEN_RE =
/<think>/i
CLOSE_RE =
%r{</think>}i
FENCE_RE =

A ``` fence toggles “literal code” mode: backticks can appear mid-line (inline `code`) or open a block, so we only need to know a fence run STARTED to stop treating <think> as control inside it.

/```/
TAG_MAX_LEN =
"</think>".length

Instance Method Summary

Constructor Details

#initialize ⇒ InlineThinkFilter

Returns a new instance of InlineThinkFilter.



# File 'lib/rubino/llm/inline_think_filter.rb', line 32

def initialize
  @inside       = false  # currently inside a <think>...</think> reasoning span
  @content_seen = false  # any visible (:content) text already emitted this turn
  @in_fence     = false  # inside a ``` code fence (where <think> is literal)
  @pending      = +""
end

Instance Method Details

#feed(chunk, &block) ⇒ Object



# File 'lib/rubino/llm/inline_think_filter.rb', line 39

def feed(chunk, &block)
  @pending << chunk
  loop do
    # Outside a reasoning span, <think> is only a CONTROL marker while the
    # turn still LEADS with it: no visible content emitted yet and not
    # inside a ``` fence. Once content (or a fence) has appeared, every
    # <think> is literal — emit the safe prefix as content and never split.
    if !@inside && (@content_seen || @in_fence)
      emit_safe_prefix(:content, &block)
      break
    end

    re, sentinel = @inside ? [CLOSE_RE, :thinking] : [OPEN_RE, :content]
    match = @pending.match(re)

    if match
      idx = match.begin(0)
      # An OPEN <think> preceded by NON-BLANK content on this turn is not a
      # reasoning sentinel — it's literal text the user must keep. Emit the
      # whole pending span (prefix INCLUDING the tag) as content and treat
      # all that follows as literal too. (Whitespace-only prefix still
      # leads, so a genuine reasoning block can start after a newline.)
      if sentinel == :content && @pending[0, idx].match?(/\S/)
        emit_safe_prefix(:content, &block)
        break
      end

      tag_len = match[0].length
      emit    = @pending.slice!(0, idx)
      @pending.slice!(0, tag_len)
      unless emit.empty?
        note_content(emit) if sentinel == :content
        block.call(sentinel, emit)
      end
      @inside = !@inside
    else
      emit_safe_prefix(sentinel, &block)
      break
    end
  end
end
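
The match-and-toggle core of the loop above may be easier to follow on a complete, non-streaming string. The sketch below is a simplification under stated assumptions: split_think is a hypothetical name, and it deliberately omits the chunk hold-back, the leading-only rule, and fence handling that #feed performs.

```ruby
# Hypothetical, simplified illustration of the toggle logic in #feed.
# It alternates between looking for an opening and a closing tag and
# labels the text in between accordingly.
def split_think(text)
  events = []
  inside = false
  rest   = text.dup

  loop do
    re    = inside ? %r{</think>}i : /<think>/i
    match = rest.match(re)
    break unless match

    piece = rest.slice!(0, match.begin(0)) # text before the tag
    rest.slice!(0, match[0].length)        # drop the tag itself
    events << [inside ? :thinking : :content, piece] unless piece.empty?
    inside = !inside
  end

  # Whatever remains belongs to the span we are currently in.
  events << [inside ? :thinking : :content, rest] unless rest.empty?
  events
end

split_think("<think>plan</think>answer")
# => [[:thinking, "plan"], [:content, "answer"]]
```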

#flush {|sentinel, @pending| ... } ⇒ Object

Yields:

  • (sentinel, @pending)


# File 'lib/rubino/llm/inline_think_filter.rb', line 81

def flush
  return if @pending.empty?

  sentinel = @inside ? :thinking : :content
  yield sentinel, @pending
  @pending = +""
end
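
A consumer typically feeds every chunk and then calls #flush once so the held-back tail is not lost. The sketch below shows that driver pattern. PassthroughFilter is a hypothetical stand-in that only mimics the feed/flush block interface (it does no tag parsing) so the example can run on its own; collect_events is likewise an assumed helper name.

```ruby
# Hypothetical stand-in with the same feed/flush shape as InlineThinkFilter.
# It holds back the last 3 characters of the buffer to imitate a held tail.
class PassthroughFilter
  def initialize
    @pending = +""
  end

  def feed(chunk)
    @pending << chunk
    return if @pending.length <= 3

    yield :content, @pending.slice!(0, @pending.length - 3)
  end

  def flush
    return if @pending.empty?

    yield :content, @pending
    @pending = +""
  end
end

# Feed all chunks, then flush once, accumulating text per event kind.
def collect_events(filter, chunks)
  events   = Hash.new { |hash, kind| hash[kind] = +"" }
  on_event = ->(kind, text) { events[kind] << text }

  chunks.each { |chunk| filter.feed(chunk, &on_event) }
  filter.flush(&on_event) # without this, the held-back tail would be dropped
  events
end

result = collect_events(PassthroughFilter.new, ["Hel", "lo wor", "ld"])
# result[:content] => "Hello world"
```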