Class: Rouge::Lexer Abstract

Inherits:
Object
  • Object
Includes:
Token::Tokens
Defined in:
lib/rouge/lexer.rb

Overview

This class is abstract.

A lexer transforms text into a stream of `[token, chunk]` pairs.

Constant Summary

Constants included from Token::Tokens

Token::Tokens::Num, Token::Tokens::Str

Methods included from Token::Tokens

token

Constructor Details

#initialize(opts = {}) ⇒ Lexer

Create a new lexer with the given options. Individual lexers may specify extra options. The only option currently accepted by every lexer is `:debug`.

Parameters:

  • opts (Hash) (defaults to: {})

    a customizable set of options

Options Hash (opts):

  • :debug (Object)

    Prints debug information to stdout. The particular info depends on the lexer in question. In regex lexers, this logs the state stack at the beginning of each step, along with each regex tried and each stream consumed. The option only takes effect when debugging has also been enabled globally with Lexer.enable_debug!. Try it, it’s pretty useful.



# File 'lib/rouge/lexer.rb', line 346

def initialize(opts={})
  @options = {}
  opts.each { |k, v| @options[k.to_s] = v }
  eager_load! unless self.class.skip_auto_load?

  @debug = Lexer.debug_enabled? && bool_option('debug')
end
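
The last line of the constructor means that the `:debug` option is honored only while debugging is switched on at the class level. The following is a minimal sketch of that gating. It uses a hypothetical stand-in class (`DebugGateSketch`, not part of Rouge) so that it runs without the gem, and it simplifies `bool_option` to a plain truthiness check.

```ruby
# Hypothetical stand-in that mirrors the debug gating shown above.
# It is NOT Rouge::Lexer; all names are for illustration only.
class DebugGateSketch
  class << self
    # Switch debugging on for every instance created afterwards.
    def enable_debug!
      @debug_enabled = true
    end

    # Switch debugging off again by dropping the flag entirely.
    def disable_debug!
      remove_instance_variable(:@debug_enabled) if defined?(@debug_enabled)
    end

    def debug_enabled?
      defined?(@debug_enabled) ? true : false
    end
  end

  def initialize(opts = {})
    # Simplification: a plain truthiness check instead of bool_option.
    @debug = DebugGateSketch.debug_enabled? && opts[:debug] ? true : false
  end

  def debug?
    @debug
  end
end

quiet = DebugGateSketch.new(debug: true) # class-level switch still off
DebugGateSketch.enable_debug!
loud = DebugGateSketch.new(debug: true)  # class-level switch now on
puts quiet.debug?
puts loud.debug?
```

In this sketch the first instance stays quiet even though it asked for debugging, because the class-level switch was off when it was created.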

Instance Attribute Details

#options ⇒ Object (readonly)

Returns the options this lexer was created with. Keys are stored as strings.



# File 'lib/rouge/lexer.rb', line 336

def options
  @options
end

Class Method Details

.aliases(*args) ⇒ Object

Used to specify alternate names this lexer class may be found by.

Examples:

class Erb < Lexer
  tag 'erb'
  aliases 'eruby', 'rhtml'
end

Lexer.find('eruby') # => Erb


# File 'lib/rouge/lexer.rb', line 284

def aliases(*args)
  args.map!(&:to_s)
  args.each { |arg| Lexer.register(arg, self) }
  (@aliases ||= []).concat(args)
end
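
To make the registration behavior concrete, here is a small sketch of a name registry in which both `tag` and `aliases` map names to the declaring class. `MiniLexer`, `MiniErb` and the `REGISTRY` constant are hypothetical stand-ins; Rouge's own registry is not shown on this page and may be organized differently.

```ruby
# Hypothetical stand-in for a lexer registry; this is not Rouge::Lexer.
class MiniLexer
  REGISTRY = {}

  def self.register(name, lexer_class)
    REGISTRY[name.to_s] = lexer_class
  end

  def self.find(name)
    REGISTRY[name.to_s]
  end

  # With no argument: return the canonical name.
  # With an argument: store it and register the class under it.
  def self.tag(name = nil)
    return @tag if name.nil?

    @tag = name.to_s
    MiniLexer.register(@tag, self)
  end

  # Register every alternate name, then remember the list.
  def self.aliases(*names)
    names.map!(&:to_s)
    names.each { |name| MiniLexer.register(name, self) }
    (@aliases ||= []).concat(names)
  end
end

class MiniErb < MiniLexer
  tag 'erb'
  aliases 'eruby', :rhtml # symbols are converted to strings
end

puts MiniLexer.find('eruby').inspect
```

Calling `aliases` with no arguments returns the names collected so far, which is how the same method doubles as a reader in this sketch.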

.all ⇒ Object

Returns a list of all lexers.

Returns:

  • a list of all lexers.



# File 'lib/rouge/lexer.rb', line 138

def all
  @all ||= registry.values.uniq
end

.assert_utf8!(str) ⇒ Object

Raises:

  • (EncodingError)


# File 'lib/rouge/lexer.rb', line 318

def assert_utf8!(str)
  encoding = str.encoding
  return if encoding == Encoding::US_ASCII || encoding == Encoding::UTF_8 || encoding == Encoding::BINARY

  raise EncodingError.new(
    "Bad encoding: #{str.encoding.names.join(',')}. " +
    "Please convert your string to UTF-8."
  )
end

.continue_lex(*a, &b) ⇒ Object

In case #continue_lex is called statically, we simply begin a new lex from the beginning, since there is no state.

See Also:

  • #continue_lex



# File 'lib/rouge/lexer.rb', line 25

def continue_lex(*a, &b)
  lex(*a, &b)
end

.debug_enabled? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/rouge/lexer.rb', line 212

def debug_enabled?
  (defined? @debug_enabled) ? true : false
end

.demo(arg = :absent) ⇒ Object

Specify or get a small demo string for this lexer. If none has been set, the string is read from demo_file.



# File 'lib/rouge/lexer.rb', line 131

def demo(arg=:absent)
  return @demo = arg unless arg == :absent

  @demo ||= File.read(demo_file, mode: 'rt:bom|utf-8')
end

.demo_file(arg = :absent) ⇒ Object

Specify or get the path name containing a small demo for this lexer (can be overridden by demo).



# File 'lib/rouge/lexer.rb', line 124

def demo_file(arg=:absent)
  return @demo_file = Pathname.new(arg) unless arg == :absent

  @demo_file ||= Pathname.new(File.join(__dir__, 'demos', tag))
end

.desc(arg = :absent) ⇒ Object

Specify or get this lexer’s description.



# File 'lib/rouge/lexer.rb', line 106

def desc(arg=:absent)
  if arg == :absent
    @desc
  else
    @desc = arg
  end
end

.detect?(text) ⇒ Boolean

This method is abstract.

Return true if there is an in-text indication (such as a shebang or DOCTYPE declaration) that this lexer should be used.

Parameters:

  • text

    the text to inspect for such an indication

Returns:

  • (Boolean)


# File 'lib/rouge/lexer.rb', line 548

def self.detect?(text)
  false
end

.detectable? ⇒ Boolean

Determine if a lexer has a method named :detect? defined in its singleton class.

Returns:

  • (Boolean)


# File 'lib/rouge/lexer.rb', line 218

def detectable?
  return @detectable if defined?(@detectable)
  @detectable = singleton_methods(false).include?(:detect?)
end
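
The check relies on `singleton_methods(false)`, which lists only the class methods defined directly on the receiver and ignores inherited ones. A minimal sketch of the effect, using hypothetical stand-in classes (`BaseSketch`, `ShellSketch`, `PlainSketch`) rather than real Rouge lexers:

```ruby
# Hypothetical stand-ins; these are not Rouge classes.
class BaseSketch
  # Default: no in-text indication is recognized.
  def self.detect?(_text)
    false
  end

  # True only when the class itself defines detect?, memoized per class.
  def self.detectable?
    return @detectable if defined?(@detectable)

    @detectable = singleton_methods(false).include?(:detect?)
  end
end

class ShellSketch < BaseSketch
  # Overrides detect?, so this class counts as detectable.
  def self.detect?(text)
    text.start_with?('#!/bin/sh', '#!/bin/bash')
  end
end

class PlainSketch < BaseSketch
  # Inherits detect? without redefining it, so it is not detectable.
end

puts ShellSketch.detectable?
puts PlainSketch.detectable?
```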

.disable_debug! ⇒ Object



# File 'lib/rouge/lexer.rb', line 208

def disable_debug!
  remove_instance_variable :@debug_enabled if defined? @debug_enabled
end

.eager_load! ⇒ Object



# File 'lib/rouge/lexer.rb', line 236

def eager_load!
  return if @_loaded
  @_loaded = true

  lazy_procs.each { |b| instance_eval(&b) }

  superclass.eager_load! unless superclass == Lexer

  self
end

.enable_debug! ⇒ Object



# File 'lib/rouge/lexer.rb', line 204

def enable_debug!
  @debug_enabled = true
end

.filenames(*fnames) ⇒ Object

Specify a list of filename globs associated with this lexer.

If a filename glob is associated with more than one lexer, this can cause a Guesser::Ambiguous error to be raised in various guessing methods. These errors can be avoided by disambiguation. Filename globs are disambiguated in one of two ways. Either the lexer will define a `self.detect?` method (intended for use with shebangs and doctypes) or a manual rule will be specified in Guessers::Disambiguation.

Examples:

class Ruby < Lexer
  filenames '*.rb', '*.ruby', 'Gemfile', 'Rakefile'
end


# File 'lib/rouge/lexer.rb', line 303

def filenames(*fnames)
  (@filenames ||= []).concat(fnames)
end

.find(name) ⇒ Class<Rouge::Lexer>?

Given a name as a string, return the matching lexer class, or nil if no lexer is registered under that name.

Parameters:

  • name (String)

Returns:

  • (Class<Rouge::Lexer>, nil)



# File 'lib/rouge/lexer.rb', line 32

def find(name)
  registry[name.to_s]
end

.find_fancy(str, code = nil, default_options = {}) ⇒ Object

Find a lexer, with fancy shiny features.

  • The string you pass can include CGI-style options

    Lexer.find_fancy('erb?parent=tex')
    
  • You can pass the special name ‘guess’ so we guess for you, and you can pass a second argument of the code to guess by

    Lexer.find_fancy('guess', "#!/bin/bash\necho Hello, world")
    

    If the code matches more than one lexer then Guesser::Ambiguous is raised.

This is used in the Redcarpet plugin as well as Rouge’s own markdown lexer for highlighting internal code blocks.



# File 'lib/rouge/lexer.rb', line 91

def find_fancy(str, code=nil, default_options={})
  lexer_class, opts = lookup_fancy(str, code, default_options)

  lexer_class && lexer_class.new(opts)
end

.guess(info = {}, &fallback) ⇒ Class<Rouge::Lexer>

Guess which lexer to use based on a hash of info.

Parameters:

  • fallback (Proc)

    called if multiple lexers are detected. If omitted, Guesser::Ambiguous is raised.

  • info (Hash) (defaults to: {})

    a customizable set of options

Options Hash (info):

  • :mimetype (Object)

    A mimetype to guess by

  • :filename (Object)

    A filename to guess by

  • :source (Object)

    The source itself, which, if guessing by mimetype or filename fails, will be searched for shebangs, <!DOCTYPE …> tags, and other hints.

Returns:

  • (Class<Rouge::Lexer>)

See Also:

  • .guesses



# File 'lib/rouge/lexer.rb', line 179

def guess(info={}, &fallback)
  lexers = guesses(info)

  return Lexers::PlainText if lexers.empty?
  return lexers[0] if lexers.size == 1

  if fallback
    yield(lexers)
  else
    raise Guesser::Ambiguous.new(lexers)
  end
end
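
The three outcomes of the method above (no candidates, exactly one, several) can be sketched in isolation. `resolve_guess` and `AmbiguousGuess` are hypothetical names standing in for the method body and for Guesser::Ambiguous, and symbols stand in for lexer classes.

```ruby
# Hypothetical stand-in for Guesser::Ambiguous.
class AmbiguousGuess < StandardError
  attr_reader :candidates

  def initialize(candidates)
    @candidates = candidates
    super("Ambiguous guess: #{candidates.join(', ')}")
  end
end

# Mirrors the decision logic of .guess on a plain list of candidates.
def resolve_guess(candidates, default: :plaintext)
  return default if candidates.empty?            # nothing matched
  return candidates.first if candidates.size == 1

  # Several matches: let the caller's block decide, or fail loudly.
  return yield(candidates) if block_given?

  raise AmbiguousGuess.new(candidates)
end

puts resolve_guess([]).inspect
puts resolve_guess(%i[c cpp]) { |list| list.last }.inspect
```

Passing a block is the way to supply a tie-breaking policy without rescuing the exception.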

.guess_by_filename(fname) ⇒ Object



# File 'lib/rouge/lexer.rb', line 196

def guess_by_filename(fname)
  guess :filename => fname
end

.guess_by_mimetype(mt) ⇒ Object



# File 'lib/rouge/lexer.rb', line 192

def guess_by_mimetype(mt)
  guess :mimetype => mt
end

.guess_by_source(source) ⇒ Object



# File 'lib/rouge/lexer.rb', line 200

def guess_by_source(source)
  guess :source => source
end

.guesses(info = {}) ⇒ Object

Guess which lexer to use based on a hash of info.

This accepts the same arguments as Lexer.guess, but will never raise an error. It returns a (possibly empty) list of potential lexers to use.



# File 'lib/rouge/lexer.rb', line 147

def guesses(info={})
  mimetype, filename, source = info.values_at(:mimetype, :filename, :source)
  custom_globs = info[:custom_globs]

  guessers = (info[:guessers] || []).dup

  guessers << Guessers::Mimetype.new(mimetype) if mimetype
  guessers << Guessers::GlobMapping.by_pairs(custom_globs, filename) if custom_globs && filename
  guessers << Guessers::Filename.new(filename) if filename
  guessers << Guessers::Modeline.new(source) if source
  guessers << Guessers::Source.new(source) if source
  guessers << Guessers::Disambiguation.new(filename, source) if source && filename

  Guesser.guess(guessers, Lexer.all)
end

.lazy(auto: true, &block) ⇒ Object



# File 'lib/rouge/lexer.rb', line 247

def lazy(auto: true, &block)
  @skip_auto_load = true unless auto
  lazy_procs << block
end

.lex(stream, opts = {}, &b) ⇒ Object

Lexes `stream` with the given options. The lex is delegated to a new instance.

See Also:

  • #lex



# File 'lib/rouge/lexer.rb', line 17

def lex(stream, opts={}, &b)
  new(opts).lex(stream, &b)
end

.lookup_fancy(str, code = nil, default_options = {}) ⇒ Object

Same as ::find_fancy, except instead of returning an instantiated lexer, returns a pair of [lexer_class, options], so that you can modify or provide additional options to the lexer.

Please note: the lexer class might be nil!



# File 'lib/rouge/lexer.rb', line 41

def lookup_fancy(str, code=nil, default_options={})
  if str && !str.include?('?') && str != 'guess'
    lexer_class = find(str)
    return [lexer_class, default_options]
  end

  name, opts = str ? str.split('?', 2) : [nil, '']

  # parse the options hash from a cgi-style string
  cgi_opts = Hash.new { |hash, key| hash[key] = [] }
  URI.decode_www_form(opts || '').each do |k, val|
    cgi_opts[k] << val
  end
  cgi_opts.transform_values! do |vals|
    case vals.size
    when 0 then true
    when 1 then vals[0]
    else vals
    end
  end

  opts = default_options.merge(cgi_opts)

  lexer_class = case name
  when 'guess', nil
    self.guess(:source => code, :mimetype => opts['mimetype'])
  when String
    self.find(name)
  end

  [lexer_class, opts]
end
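
The CGI-style parsing step can be tried on its own with Ruby's standard `uri` library. The sketch below is a simplified stand-alone version under a hypothetical name (`parse_fancy_sketch`); it covers only the name/option split and the grouping of repeated keys, not the lexer lookup or the merge with default options.

```ruby
require 'uri'

# Split "name?key=value&..." into a name and an options hash.
# A key given once maps to a string; a repeated key maps to an array.
def parse_fancy_sketch(str)
  name, query = str.split('?', 2)

  grouped = Hash.new { |hash, key| hash[key] = [] }
  URI.decode_www_form(query || '').each do |key, value|
    grouped[key] << value
  end

  opts = grouped.transform_values do |values|
    values.size == 1 ? values[0] : values
  end

  [name, opts]
end

p parse_fancy_sketch('erb?parent=tex')
p parse_fancy_sketch('x?a=1&a=2')
```

Note that every parsed value is a string, which is why option readers such as #as_bool treat strings like 'false' and '0' specially.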

.mimetypes(*mts) ⇒ Object

Specify a list of mimetypes associated with this lexer.

Examples:

class Html < Lexer
  mimetypes 'text/html', 'application/xhtml+xml'
end


# File 'lib/rouge/lexer.rb', line 313

def mimetypes(*mts)
  (@mimetypes ||= []).concat(mts)
end

.option(name, desc) ⇒ Object



# File 'lib/rouge/lexer.rb', line 118

def option(name, desc)
  option_docs[name.to_s] = desc
end

.option_docs ⇒ Object



# File 'lib/rouge/lexer.rb', line 114

def option_docs
  @option_docs ||= InheritableHash.new(superclass.option_docs)
end

.skip_auto_load? ⇒ Boolean

Returns:

  • (Boolean)


# File 'lib/rouge/lexer.rb', line 252

def skip_auto_load?
  return true if @skip_auto_load
  return superclass.skip_auto_load? unless superclass == Lexer
  false
end

.tag(t = nil) ⇒ Object

Used to specify or get the canonical name of this lexer class.

Examples:

class MyLexer < Lexer
  tag 'foo'
end

MyLexer.tag # => 'foo'

Lexer.find('foo') # => MyLexer


# File 'lib/rouge/lexer.rb', line 268

def tag(t=nil)
  return @tag if t.nil?

  @tag = t.to_s
  Lexer.register(@tag, self)
end

.title(t = nil) ⇒ Object

Specify or get this lexer’s title. Meant to be human-readable. Defaults to the capitalized tag.



# File 'lib/rouge/lexer.rb', line 98

def title(t=nil)
  if t.nil?
    t = tag.capitalize
  end
  @title ||= t
end

Instance Method Details

#as_bool(val) ⇒ Object



# File 'lib/rouge/lexer.rb', line 366

def as_bool(val)
  case val
  when nil, false, 0, '0', 'false', 'off'
    false
  when Array
    val.empty? ? true : as_bool(val.last)
  else
    true
  end
end
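
Since this method has no description, a few sample inputs may help. The sketch below copies the coercion rules into a free-standing function with a hypothetical name (`as_bool_sketch`) so they can be tried without a lexer instance.

```ruby
# Stand-alone copy of the coercion rules shown above (illustration only).
def as_bool_sketch(val)
  case val
  when nil, false, 0, '0', 'false', 'off'
    false
  when Array
    # An empty list counts as true; otherwise the last entry wins.
    val.empty? ? true : as_bool_sketch(val.last)
  else
    true
  end
end

['off', '0', 'no', '', [], %w[on false]].each do |input|
  puts "#{input.inspect} => #{as_bool_sketch(input)}"
end
```

Only the listed values are treated as false, so strings such as 'no' or the empty string still coerce to true.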

#as_lexer(val) ⇒ Object



# File 'lib/rouge/lexer.rb', line 394

def as_lexer(val)
  return as_lexer(val.last) if val.is_a?(Array)
  return val.new(@options) if val.is_a?(Class) && val < Lexer

  case val
  when Lexer
    val
  when String
    lexer_class = Lexer.find(val)
    lexer_class && lexer_class.new(@options)
  end
end

#as_list(val) ⇒ Object



# File 'lib/rouge/lexer.rb', line 383

def as_list(val)
  case val
  when Array
    val.flat_map { |v| as_list(v) }
  when String
    val.split(',')
  else
    []
  end
end
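
As a quick illustration of the list coercion, here is a free-standing copy under a hypothetical name (`as_list_sketch`). It shows that comma-separated strings are split, nested arrays are flattened, and anything else yields an empty list.

```ruby
# Stand-alone copy of the list coercion shown above (illustration only).
def as_list_sketch(val)
  case val
  when Array
    val.flat_map { |item| as_list_sketch(item) }
  when String
    val.split(',')
  else
    []
  end
end

p as_list_sketch('a,b')
p as_list_sketch(['a,b', ['c']])
p as_list_sketch(nil)
```

The split is on the comma only, so surrounding whitespace in an input such as 'a, b' is kept in the resulting items.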

#as_string(val) ⇒ Object



# File 'lib/rouge/lexer.rb', line 377

def as_string(val)
  return as_string(val.last) if val.is_a?(Array)

  val ? val.to_s : nil
end

#as_token(val) ⇒ Object



# File 'lib/rouge/lexer.rb', line 407

def as_token(val)
  return as_token(val.last) if val.is_a?(Array)
  case val
  when Token
    val
  else
    Token[val]
  end
end

#bool_option(name, &default) ⇒ Object



# File 'lib/rouge/lexer.rb', line 417

def bool_option(name, &default)
  name_str = name.to_s

  if @options.key?(name_str)
    as_bool(@options[name_str])
  else
    default ? yield : false
  end
end

#continue_lex(string) {|last_token, last_val| ... } ⇒ Object

Continue the lex from the current state without resetting.

Yields:

  • (last_token, last_val)


# File 'lib/rouge/lexer.rb', line 502

def continue_lex(string, &b)
  return enum_for(:continue_lex, string, &b) unless block_given?

  # consolidate consecutive tokens of the same type
  last_token = nil
  last_val = nil
  stream_tokens(string) do |tok, val|
    next if val.empty?

    if tok == last_token
      last_val << val
      next
    end

    yield(last_token, last_val) if last_token
    last_token = tok
    last_val = val
  end

  yield(last_token, last_val) if last_token
end
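
The consolidation step in the method above (dropping empty chunks and merging consecutive chunks of the same token type) can be sketched on a plain array of pairs. `consolidate_sketch` is a hypothetical helper, symbols stand in for token objects, and unlike the original it returns an array instead of yielding.

```ruby
# Merge consecutive [token, chunk] pairs that share a token type.
def consolidate_sketch(pairs)
  merged = []

  pairs.each do |token, chunk|
    next if chunk.empty? # empty chunks are dropped entirely

    if !merged.empty? && merged.last[0] == token
      merged.last[1] << chunk        # extend the previous chunk
    else
      merged << [token, chunk.dup]   # start a new run (copy the input)
    end
  end

  merged
end

pairs = [[:Name, 'fo'], [:Name, 'o'], [:Text, ''], [:Text, ' '], [:Name, 'bar']]
p consolidate_sketch(pairs)
```

Because empty chunks are skipped before the comparison, they never interrupt a run of same-typed tokens.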

#eager_load! ⇒ Object



# File 'lib/rouge/lexer.rb', line 354

def eager_load!
  self.class.eager_load!
end

#hash_option(name, defaults, &val_cast) ⇒ Object



# File 'lib/rouge/lexer.rb', line 443

def hash_option(name, defaults, &val_cast)
  name = name.to_s
  out = defaults.dup

  base = @options.delete(name.to_s)
  base = {} unless base.is_a?(Hash)
  base.each { |k, v| out[k.to_s] = val_cast ? yield(v) : v }

  @options.keys.each do |key|
    next unless key =~ /(\w+)\[(\w+)\]/ and $1 == name
    value = @options.delete(key)

    out[$2] = val_cast ? yield(value) : value
  end

  out
end
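
The method above gathers a hash-valued option from two places: a nested hash stored under the option name, and flat keys of the form `name[key]`. The sketch below reproduces that gathering on a plain hash, without the optional value cast. `extract_hash_option_sketch` and the sample keys are hypothetical.

```ruby
# Collect a hash option from `options`, removing the keys it consumes.
def extract_hash_option_sketch(options, name, defaults = {})
  name = name.to_s
  out = defaults.dup

  # 1. A nested hash stored directly under the option name.
  base = options.delete(name)
  base.each { |key, value| out[key.to_s] = value } if base.is_a?(Hash)

  # 2. Flat keys such as "vars[a]" that belong to this option.
  options.keys.each do |key|
    match = /(\w+)\[(\w+)\]/.match(key)
    next unless match && match[1] == name

    out[match[2]] = options.delete(key)
  end

  out
end

options = { 'vars[a]' => '1', 'vars' => { b: '2' }, 'other[c]' => '3' }
p extract_hash_option_sketch(options, :vars, 'z' => '0')
p options
```

As in the original, the consumed keys are deleted from the options hash, so asking for the same option a second time yields only the defaults.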

#lex(string, opts = nil, &b) ⇒ Object

Note:

The use of :continue => true has been deprecated. A warning is issued if run with `$VERBOSE` set to true.

Note:

The use of arbitrary `opts` has never been supported, but we previously ignored them with no error. We now warn unconditionally.

Given a string, yield [token, chunk] pairs. If no block is given, an enumerator is returned.

Parameters:

  • opts (Hash) (defaults to: nil)

    a customizable set of options

Options Hash (opts):

  • :continue (Object)

    Continue the lex from the previous state (i.e. don’t call #reset!)



# File 'lib/rouge/lexer.rb', line 479

def lex(string, opts=nil, &b)
  if opts
    if (opts.keys - [:continue]).size > 0
      # improper use of options hash
      warn('Improper use of Lexer#lex - this method does not receive options.' +
           ' This will become an error in a future version.')
    end

    if opts[:continue]
      warn '`lex :continue => true` is deprecated, please use #continue_lex instead'
      return continue_lex(string, &b)
    end
  end

  return enum_for(:lex, string) unless block_given?

  Lexer.assert_utf8!(string)
  reset!

  continue_lex(string, &b)
end

#lexer_option(name, &default) ⇒ Object



# File 'lib/rouge/lexer.rb', line 431

def lexer_option(name, &default)
  as_lexer(@options.delete(name.to_s, &default))
end

#list_option(name, &default) ⇒ Object



# File 'lib/rouge/lexer.rb', line 435

def list_option(name, &default)
  as_list(@options.delete(name.to_s, &default))
end

#reset! ⇒ Object

This method is abstract.

Called by #lex before each new lex begins, so that state left over from the previous run is discarded. The default implementation is a no-op.



# File 'lib/rouge/lexer.rb', line 465

def reset!
end

#stream_tokens(stream, &b) ⇒ Object

This method is abstract.

Yield `[token, chunk]` pairs, given a prepared input stream. This must be implemented.

Parameters:

  • stream (StringScanner)

    the stream



# File 'lib/rouge/lexer.rb', line 536

def stream_tokens(stream, &b)
  raise 'abstract'
end
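
To show the contract a subclass has to fulfil, here is a minimal sketch built on a hypothetical stand-in base class (`SketchLexer`, not Rouge::Lexer). It keeps only the enumerator fallback and the abstract hook, passes a plain String instead of a prepared stream, and uses symbols in place of Rouge token objects.

```ruby
# Hypothetical stand-in for the abstract base class (illustration only).
class SketchLexer
  # Yield [token, chunk] pairs, or return an enumerator without a block.
  def lex(string, &block)
    return enum_for(:lex, string) unless block_given?

    stream_tokens(string, &block)
  end

  # Subclasses must override this hook.
  def stream_tokens(_stream)
    raise 'abstract'
  end
end

# A toy lexer: runs of whitespace are :Text, everything else is :Name.
class WordLexer < SketchLexer
  def stream_tokens(stream)
    stream.scan(/\s+|\S+/) do |chunk|
      yield(chunk.strip.empty? ? :Text : :Name, chunk)
    end
  end
end

WordLexer.new.lex('puts  value') do |token, chunk|
  puts "#{token}: #{chunk.inspect}"
end
```

Calling `lex` without a block returns an enumerator, so the pairs can also be collected with `to_a` or chained with other Enumerable methods.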

#string_option(name, &default) ⇒ Object



# File 'lib/rouge/lexer.rb', line 427

def string_option(name, &default)
  as_string(@options.delete(name.to_s, &default))
end

#tag ⇒ Object

Delegated to the class method .tag.



# File 'lib/rouge/lexer.rb', line 525

def tag
  self.class.tag
end

#token_option(name, &default) ⇒ Object



# File 'lib/rouge/lexer.rb', line 439

def token_option(name, &default)
  as_token(@options.delete(name.to_s, &default))
end

#with(opts = {}) ⇒ Object

Returns a new lexer with the given options set. Useful for e.g. setting debug flags post hoc, or providing global overrides for certain options.



# File 'lib/rouge/lexer.rb', line 360

def with(opts={})
  new_options = @options.dup
  opts.each { |k, v| new_options[k.to_s] = v }
  self.class.new(new_options)
end
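
The copy-and-override behavior can be sketched with a small stand-in class. `OptionHolderSketch` is hypothetical and keeps only the option handling: keys are normalized to strings, and `with` builds a fresh object instead of mutating the receiver.

```ruby
# Hypothetical stand-in that keeps only the option handling shown above.
class OptionHolderSketch
  attr_reader :options

  def initialize(opts = {})
    @options = {}
    opts.each { |key, value| @options[key.to_s] = value }
  end

  # Return a new object whose options are the old ones plus overrides.
  def with(opts = {})
    merged = @options.dup
    opts.each { |key, value| merged[key.to_s] = value }
    self.class.new(merged)
  end
end

base = OptionHolderSketch.new(debug: false, parent: 'html')
debugging = base.with(debug: true)

p base.options
p debugging.options
```

Since the receiver is left untouched, one configured instance can serve as a template for several variants.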