Class: BufferedTokenizer

Inherits:
Object
Defined in:
lib/em/buftok.rb

Overview

BufferedTokenizer takes a delimiter upon instantiation, or acts line-based by default. It lets input be spoon-fed from an outside source that receives arbitrary-length datagrams, which may or may not contain the token that delimits entities. In this respect it pairs well with something like EventMachine (rubyeventmachine.com/).

Instance Method Summary

Constructor Details

#initialize(delimiter = $/) ⇒ BufferedTokenizer

New BufferedTokenizers operate on lines separated by a delimiter, which defaults to the global input record separator $/ (“\n”).

The input buffer is stored as an array. This is by far the most efficient approach given language constraints (in C a linked list would be a more appropriate data structure). Segments of input data are stored in a list which is only joined when a token is reached, substantially reducing the number of objects required for the operation.



# File 'lib/em/buftok.rb', line 15

def initialize(delimiter = $/)
  @delimiter = delimiter
  @input = []
  @tail = ''
  @trim = @delimiter.length - 1
end

Instance Method Details

#extract(data) ⇒ Object

Extract takes an arbitrary string of input data and returns an array of tokenized entities, provided there were any available to extract. This makes for easy processing of datagrams using a pattern like:

tokenizer.extract(data).map { |entity| Decode(entity) }.each do ...

Passing -1 as the limit makes split return a trailing “” when the token is at the end of the string, so the last element is always the start of the next chunk.
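The effect of the negative limit can be checked with plain String#split, independent of the tokenizer. The helper name split_chunk below is hypothetical and exists only for this illustration:

```ruby
# Hypothetical helper: splits a chunk the same way #extract does,
# using a negative limit so trailing empty strings are kept.
def split_chunk(chunk, delimiter = "\n")
  chunk.split(delimiter, -1)
end

# Chunk ends with the delimiter: the last element is "", i.e. the
# (empty) start of the next entity.
p split_chunk("one\ntwo\n")  # => ["one", "two", ""]

# Chunk ends mid-entity: the last element is the partial entity.
p split_chunk("one\ntw")     # => ["one", "tw"]

# Without the limit, the trailing empty string is dropped, so the two
# cases above could not be told apart by looking at the last element.
p "one\ntwo\n".split("\n")   # => ["one", "two"]
```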



# File 'lib/em/buftok.rb', line 30

def extract(data)
  if @trim > 0
    tail_end = @tail.slice!(-@trim, @trim) # returns nil if string is too short
    data = tail_end + data if tail_end
  end

  @input << @tail
  entities = data.split(@delimiter, -1)
  @tail = entities.shift

  unless entities.empty?
    @input << @tail
    entities.unshift @input.join
    @input.clear
    @tail = entities.pop
  end

  entities
end
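The @trim handling at the top of extract covers multi-character delimiters that arrive split across two chunks. A minimal sketch of that step in isolation follows; rejoin_tail is a hypothetical helper, not part of the class, and it works on a copy of the tail rather than mutating an instance variable:

```ruby
# Hypothetical helper mirroring the first lines of #extract.
# Moves the last (delimiter.length - 1) characters of the buffered tail
# onto the front of the new data, so a delimiter that straddles the
# chunk boundary is seen whole by the following split.
def rejoin_tail(tail, data, delimiter)
  tail = tail.dup                       # keep the caller's string intact
  trim = delimiter.length - 1
  if trim > 0
    tail_end = tail.slice!(-trim, trim) # nil if tail is shorter than trim
    data = tail_end + data if tail_end
  end
  [tail, data]
end

# "\r\n" split across two chunks: the "\r" is moved back to the data.
p rejoin_tail("foo\r", "\nbar", "\r\n")  # => ["foo", "\r\nbar"]

# A single-character delimiter cannot straddle chunks, so nothing moves.
p rejoin_tail("foo", "bar", "\n")        # => ["foo", "bar"]
```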

#flush ⇒ Object

Flush the contents of the input buffer, i.e. return the buffered input even though a token has not yet been encountered.



# File 'lib/em/buftok.rb', line 52

def flush
  @input << @tail
  buffer = @input.join
  @input.clear
  @tail = "" # @tail.clear is slightly faster, but not supported on 1.8.7
  buffer
end
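Putting extract and flush together, the following sketch feeds two chunks through a tokenizer and then flushes the remainder. The class body is copied from the listings on this page so the example runs without the EventMachine gem; with the gem installed, loading lib/em/buftok.rb would presumably replace that copy. The driver method tokenize_chunks and the sample data are assumptions made for illustration only:

```ruby
# Copy of the class as listed on this page, included so the sketch is
# self-contained.
class BufferedTokenizer
  def initialize(delimiter = $/)
    @delimiter = delimiter
    @input = []
    @tail = ''
    @trim = @delimiter.length - 1
  end

  def extract(data)
    if @trim > 0
      tail_end = @tail.slice!(-@trim, @trim)
      data = tail_end + data if tail_end
    end

    @input << @tail
    entities = data.split(@delimiter, -1)
    @tail = entities.shift

    unless entities.empty?
      @input << @tail
      entities.unshift @input.join
      @input.clear
      @tail = entities.pop
    end

    entities
  end

  def flush
    @input << @tail
    buffer = @input.join
    @input.clear
    @tail = ""
    buffer
  end
end

# Hypothetical driver: feeds each chunk to #extract, collects the
# complete entities, then returns them with whatever #flush hands back.
def tokenize_chunks(chunks, delimiter = "\r\n")
  tokenizer = BufferedTokenizer.new(delimiter)
  entities = chunks.flat_map { |chunk| tokenizer.extract(chunk) }
  [entities, tokenizer.flush]
end

# The "\r\n" after "GET /" is split across the two chunks.
entities, leftover = tokenize_chunks(["GET /\r", "\nHost: a\r\npart"])
p entities  # => ["GET /", "Host: a"]
p leftover  # => "part"
```

Calling flush is only needed when the source closes before a final delimiter arrives; until then the partial entity stays buffered and is completed by a later extract call.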