Module: BioSyntax

Defined in:
lib/biosyntax.rb,
lib/biosyntax/version.rb,
ext/biosyntax/biosyntax_ext.c

Overview

Ruby bindings for the vendored `libbiosyntax` tokenizer/highlighter.

BioSyntax highlights biological text formats one line at a time. The parser and ANSI renderer are implemented by a native C extension; this module exposes Ruby value objects and convenience factories around that extension.

Examples:

Highlight a VCF line and inspect spans

highlighter = BioSyntax.vcf
line = "chr1\t42\trs1\tA\tT\t99\tPASS\tDP=10\n"
highlighter.highlight(line).each do |span|
  puts [span.start, span.end, span.kind_name, span.scope].join("\t")
end

Render ANSI-colored output

highlighter = BioSyntax.fastq
File.foreach("reads.fastq", chomp: false) do |line|
  print highlighter.colorize(line)
end

Defined Under Namespace

Modules: Native

Classes: Error, Format, Highlighter, Kind, Span, UnknownKindError, UnsupportedFormatError

Constant Summary

LIBBIOSYNTAX_VERSION =

Version string reported by the vendored native `libbiosyntax` core.

Returns:

  • (String)
Native.libbiosyntax_version.freeze
LIBBIOSYNTAX_ABI_VERSION =

ABI version reported by the vendored native `libbiosyntax` core.

Returns:

  • (Integer)
Native.abi_version
FORMATS =

Supported formats keyed by canonical format name.

Returns:

  • (Hash{Symbol => Format})
RAW_FORMATS.each_with_object({}) do |row, hash|
  next if row.fetch(:id).zero?

  format = Format.new(
    id: row.fetch(:id),
    name: row.fetch(:name),
    description: row.fetch(:description),
    stateful: row.fetch(:stateful)
  )
  hash[format.name] = format
end.freeze
KINDS =

Known token kinds keyed by canonical kind name.

Returns:

  • (Hash{Symbol => Kind})
RAW_KINDS.each_with_object({}) do |row, hash|
  kind = Kind.new(
    id: row.fetch(:id),
    name: row.fetch(:name),
    scope: row.fetch(:scope),
    foreground: row.fetch(:foreground),
    background: row.fetch(:background),
    font_style: row.fetch(:font_style),
    ansi_sgr: row.fetch(:ansi_sgr)
  )
  hash[kind.name] = kind
end.freeze
FORMAT_NAMES =

Returns supported canonical format names.

Returns:

  • (Array<Symbol>)

    supported canonical format names

FORMATS.keys.freeze
KIND_NAMES =

Returns known canonical kind names.

Returns:

  • (Array<Symbol>)

    known canonical kind names

KINDS.keys.freeze
SCOPES =

Token kinds grouped by semantic scope.

Returns:

  • (Hash{String => Array<Kind>})
KINDS.values.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |kind, hash|
  hash[kind.scope] << kind
end.each_with_object({}) do |(scope, kinds), hash|
  hash[scope.freeze] = kinds.freeze
end.freeze
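
The grouping used for SCOPES can be tried in isolation. The following is a minimal sketch, not the module's own code: `DemoKind`, `DEMO_KINDS`, and the kind names and scope strings in it are invented stand-ins for the real `Kind` objects, which come from the native tables. One plausible reading of the second pass in the constant is that it copies the groups into a plain hash without a default block; `group_by` gives the same property directly.

```ruby
# Stand-in for BioSyntax::Kind; every row below is invented for illustration.
DemoKind = Struct.new(:name, :scope, keyword_init: true)

DEMO_KINDS = {
  chrom: DemoKind.new(name: :chrom, scope: 'entity.name'),
  pos: DemoKind.new(name: :pos, scope: 'constant.numeric'),
  qual: DemoKind.new(name: :qual, scope: 'constant.numeric')
}.freeze

# group_by returns a plain Hash with no default block, so looking up an
# unknown scope yields nil instead of creating an empty entry.
DEMO_SCOPES = DEMO_KINDS.values
  .group_by(&:scope)
  .transform_values(&:freeze)
  .freeze

DEMO_SCOPES.each do |scope, kinds|
  puts "#{scope}: #{kinds.map(&:name).join(', ')}"
end
```

Freezing both the outer hash and each inner array keeps callers from mutating the shared lookup table by accident.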
VERSION =

Ruby gem version.

Returns:

  • (String)
'0.1.0'

Class Method Details

.[](format) ⇒ Highlighter Also known as: highlighter

Create a highlighter for a format.

Parameters:

  • format (Format, Symbol, String, Integer)

    format object, name, alias, or native id

Returns:

  • (Highlighter)

Raises:

  • (UnsupportedFormatError)


# File 'lib/biosyntax.rb', line 410

def [](format)
  Highlighter.new(format)
end

.format(value) ⇒ Format

Resolve a format object from a name, alias, id, or existing object.

Parameters:

  • value (Format, Symbol, String, Integer)

Returns:

  • (Format)

Raises:

  • (UnsupportedFormatError)


# File 'lib/biosyntax.rb', line 430

def format(value)
  return value if value.is_a?(Format)

  found = case value
          when Integer
            FORMATS_BY_ID[value]
          else
            name = value.to_s.downcase
            FORMATS[name.to_sym] ||
              FORMATS[name.tr('_', '-').to_sym] ||
              FORMATS_BY_ID[Native.format_id_from_name(name)]
          end

  return found if found

  raise UnsupportedFormatError, "unsupported format: #{value.inspect}"
end
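
The lookup order in the method body can be illustrated without the native extension. This is a minimal sketch under stated assumptions: `DemoFormat`, `DEMO_FORMATS`, `DEMO_FORMATS_BY_ID`, `DemoUnsupportedFormatError`, the ids, and the `fasta-nt` name are all hypothetical, and the final alias lookup through `Native.format_id_from_name` is left out because it needs the C extension.

```ruby
# Stand-ins for BioSyntax::Format and BioSyntax::UnsupportedFormatError.
DemoFormat = Struct.new(:id, :name, keyword_init: true)
class DemoUnsupportedFormatError < StandardError; end

# Invented table; the real one is built from the native format list.
DEMO_FORMATS = {
  vcf: DemoFormat.new(id: 1, name: :vcf),
  'fasta-nt': DemoFormat.new(id: 2, name: :'fasta-nt')
}.freeze
DEMO_FORMATS_BY_ID = DEMO_FORMATS.values.to_h { |f| [f.id, f] }.freeze

def demo_format(value)
  # Existing format objects pass through unchanged.
  return value if value.is_a?(DemoFormat)

  found = case value
          when Integer
            DEMO_FORMATS_BY_ID[value]
          else
            # Case-insensitive name, then the same name with '_' mapped to '-'.
            name = value.to_s.downcase
            DEMO_FORMATS[name.to_sym] || DEMO_FORMATS[name.tr('_', '-').to_sym]
          end
  return found if found

  raise DemoUnsupportedFormatError, "unsupported format: #{value.inspect}"
end

p demo_format('VCF').name
p demo_format(:fasta_nt).name
p demo_format(2).name
```

Strings, symbols, and ids all resolve to the same frozen objects, so callers can compare results by identity.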

.format_name(value) ⇒ Symbol?

Resolve the canonical name for a format.

Parameters:

  • value (Format, Symbol, String, Integer)

Returns:

  • (Symbol, nil)


# File 'lib/biosyntax.rb', line 452

def format_name(value)
  format(value).name
rescue UnsupportedFormatError
  nil
end

.format_supported?(value) ⇒ Boolean

Parameters:

  • value (Format, Symbol, String, Integer)

Returns:

  • (Boolean)


# File 'lib/biosyntax.rb', line 460

def format_supported?(value)
  !format_name(value).nil?
end

.formats ⇒ Array<Symbol>

Returns supported canonical format names.

Returns:

  • (Array<Symbol>)

    supported canonical format names



# File 'lib/biosyntax.rb', line 416

def formats
  FORMAT_NAMES
end

.guess(path_or_extension) ⇒ Highlighter?

Guess a format from a path or extension and create a highlighter.

Parameters:

  • path_or_extension (String, #to_s)

Returns:

  • (Highlighter, nil)


# File 'lib/biosyntax.rb', line 514

def guess(path_or_extension)
  name = guess_format(path_or_extension)
  name && Highlighter.new(name)
end
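
Because the result may be nil, callers need a fallback for unrecognized input. The following is a minimal sketch of that pattern only: `DEMO_EXTENSIONS`, `demo_guess`, and `demo_render` are hypothetical, the extension table is invented, and a lambda stands in for a real `Highlighter`. Unlike the real method, the stand-in handles paths only, not bare extensions.

```ruby
# Hypothetical extension table; the real mapping lives in the native library.
DEMO_EXTENSIONS = { '.vcf' => :vcf, '.fastq' => :fastq }.freeze

# Stand-in for BioSyntax.guess: returns a callable "highlighter" or nil.
def demo_guess(path)
  name = DEMO_EXTENSIONS[File.extname(path.to_s).downcase]
  name && ->(line) { "[#{name}] #{line}" }
end

# Fall back to the unmodified line when no format is recognized.
def demo_render(path, line)
  highlighter = demo_guess(path)
  highlighter ? highlighter.call(line) : line
end

puts demo_render('calls.vcf', 'chr1 42')
puts demo_render('notes.txt', 'plain text')
```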

.guess_format(path_or_extension) ⇒ Symbol?

Guess a format from a path or extension.

Parameters:

  • path_or_extension (String, #to_s)

Returns:

  • (Symbol, nil)

    canonical format name if recognized



# File 'lib/biosyntax.rb', line 504

def guess_format(path_or_extension)
  id = Native.guess_format_id(path_or_extension.to_s)
  format = FORMATS_BY_ID[id]
  format&.name
end

.kind(value) ⇒ Kind

Resolve token kind metadata from a name, id, or existing object.

Parameters:

  • value (Kind, Symbol, String, Integer)

Returns:

  • (Kind)

Raises:

  • (UnknownKindError)


# File 'lib/biosyntax.rb', line 469

def kind(value)
  return value if value.is_a?(Kind)

  found = case value
          when Integer
            KINDS_BY_ID[value]
          else
            KINDS[normalize_name(value)] || KINDS[value.to_s.downcase.tr('-', '_').to_sym]
          end

  return found if found

  raise UnknownKindError, "unknown kind: #{value.inspect}"
end

.kind_known?(value) ⇒ Boolean

Parameters:

  • value (Kind, Symbol, String, Integer)

Returns:

  • (Boolean)


# File 'lib/biosyntax.rb', line 496

def kind_known?(value)
  !kind_name(value).nil?
end

.kind_name(value) ⇒ Symbol?

Resolve the canonical name for a token kind.

Parameters:

  • value (Kind, Symbol, String, Integer)

Returns:

  • (Symbol, nil)


# File 'lib/biosyntax.rb', line 488

def kind_name(value)
  kind(value).name
rescue UnknownKindError
  nil
end
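
The strict lookup, the nil-returning wrapper, and the boolean predicate form a small layered pattern. The following is a minimal sketch of that layering, assuming an invented table: `DemoTokenKind`, `DEMO_TOKEN_KINDS`, `DemoUnknownKindError`, and the `demo_*` methods are hypothetical stand-ins, and the kind names are made up rather than taken from the native kind list.

```ruby
# Stand-ins for BioSyntax::Kind and BioSyntax::UnknownKindError.
DemoTokenKind = Struct.new(:id, :name, keyword_init: true)
class DemoUnknownKindError < StandardError; end

# Invented kinds; the real table comes from the native extension.
DEMO_TOKEN_KINDS = {
  header_key: DemoTokenKind.new(id: 1, name: :header_key),
  quality_low: DemoTokenKind.new(id: 2, name: :quality_low)
}.freeze

# Strict layer: normalize case and hyphens, raise when nothing matches.
def demo_kind(value)
  key = value.to_s.downcase.tr('-', '_').to_sym
  DEMO_TOKEN_KINDS.fetch(key) do
    raise DemoUnknownKindError, "unknown kind: #{value.inspect}"
  end
end

# Lenient layer: turn the error into nil.
def demo_kind_name(value)
  demo_kind(value).name
rescue DemoUnknownKindError
  nil
end

# Predicate layer: reduce the lenient result to true or false.
def demo_kind_known?(value)
  !demo_kind_name(value).nil?
end

p demo_kind_name('Quality-Low')
p demo_kind_known?(:missing)
```

Keeping the raising method as the single source of truth means the two wrappers cannot drift from its normalization rules.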

.kinds ⇒ Array<Symbol>

Returns known canonical kind names.

Returns:

  • (Array<Symbol>)

    known canonical kind names



# File 'lib/biosyntax.rb', line 421

def kinds
  KIND_NAMES
end