Purpose
The tbx Ruby gem allows you to parse, manipulate, and serialize TBX
(TermBase eXchange) documents as defined by ISO 30042:2019.
TBX is an international standard for representing structured terminological data in XML. This library provides complete coverage of the TBX core structure, including:
- DCA (Data Category as Attribute) style: standard TBX element names with
  type attributes (e.g., <descrip type="definition">)
- DCT (Data Category as Tag) style: module-namespaced elements (e.g.,
  <basic:definition>) (planned)
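To make the difference between the two styles concrete, the sketch below renders the same definition both ways. The helpers `dca_element` and `dct_element` are hypothetical names used for illustration; they are not part of the tbx gem, and they only format strings without any XML escaping.

```ruby
# Illustrative helpers only; they are NOT part of the tbx gem.

# DCA style: a standard element name, with the data category in the type attribute.
def dca_element(element, type, text)
  %(<#{element} type="#{type}">#{text}</#{element}>)
end

# DCT style: the data category becomes the element name, prefixed by its module.
def dct_element(mod, type, text)
  %(<#{mod}:#{type}>#{text}</#{mod}:#{type}>)
end

DCA_EXAMPLE = dca_element("descrip", "definition", "a group of stars")
DCT_EXAMPLE = dct_element("basic", "definition", "a group of stars")

puts DCA_EXAMPLE
puts DCT_EXAMPLE
```

Both strings carry the same data category; only where the category name lives (attribute value versus element name) changes.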
The library is built on lutaml-model
for declarative XML serialization.
Note: This is a work in progress.
Installation
Install the gem and add it to the application’s Gemfile:
bundle add tbx
Or install directly:
gem install tbx
Usage
require 'tbx'
# Parse a TBX file
doc = File.read('spec/fixtures/TBX_test_files/min_good.tbx')
tbx = Tbx::Document.from_xml(doc)
# Access document metadata
tbx.type # => "TBX-Min"
tbx.style # => "dca"
tbx.lang # => "en"
# Access header
tbx.tbx_header.file_desc.source_desc.p.first.content.join
# => "TBX file, created via MultiTerm Export"
# Navigate concept entries
entry = tbx.text.body.concept_entry.first
entry.id # => "c1"
entry.lang_sec.first.lang # => "en"
entry.lang_sec.first.term_sec.first.term.content.join
# => "open cluster"
# Serialize back to XML
puts tbx.to_xml(pretty: true)
# => round-tripped TBX document
API
# Parse
tbx = Tbx::Document.from_xml(xml_string)
# Access elements
tbx.tbx_header.file_desc.source_desc
tbx.text.body.concept_entry.each do |entry|
  entry.lang_sec.each do |lang|
    lang.term_sec.each do |ts|
      puts ts.term.content.join
    end
  end
end
# Serialize back
tbx.to_xml
tbx.to_xml(pretty: true, declaration: true, encoding: "utf-8")
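The nested traversal above can be wrapped in a small helper that groups terms by language. The sketch below is an assumption-laden illustration: `terms_by_language` is a hypothetical helper, and the `Struct` classes are stand-ins that mimic only the accessors used in this README (`text`, `body`, `concept_entry`, `lang_sec`, `term_sec`, `term`, `content`) so the example runs without the gem installed.

```ruby
# Stand-in structs that mimic only the accessors shown in this README.
# They are hypothetical test doubles, not classes from the tbx gem.
Term     = Struct.new(:content)
TermSec  = Struct.new(:term)
LangSec  = Struct.new(:lang, :term_sec)
Entry    = Struct.new(:id, :lang_sec)
Body     = Struct.new(:concept_entry)
Text     = Struct.new(:body)
Document = Struct.new(:text)

# Hypothetical helper: collect every term string, grouped by language code.
def terms_by_language(doc)
  result = Hash.new { |hash, key| hash[key] = [] }
  doc.text.body.concept_entry.each do |entry|
    entry.lang_sec.each do |lang_sec|
      lang_sec.term_sec.each do |term_sec|
        result[lang_sec.lang] << term_sec.term.content.join
      end
    end
  end
  result
end

SAMPLE_DOC = Document.new(Text.new(Body.new([
  Entry.new("c1", [
    LangSec.new("en", [TermSec.new(Term.new(["open cluster"]))]),
    LangSec.new("fr", [TermSec.new(Term.new(["amas ouvert"]))]),
  ]),
])))

GROUPED = terms_by_language(SAMPLE_DOC)
```

Because the helper relies only on duck typing, it should work equally on a parsed document, provided the accessors behave as shown in the Usage section.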
Using TYPES and VALUES constants
Each data-category element class exposes TYPES (permitted @type values)
and VALUES (permitted picklist values) constants composed from the
TBX module definitions:
# Permitted termNote types across all modules
Tbx::TermNote::TYPES
# => { administrative_status: "administrativeStatus", part_of_speech: "partOfSpeech",
# geographical_usage: "geographicalUsage", grammatical_gender: "grammaticalGender",
# term_location: "termLocation", term_type: "termType",
# grammatical_number: "grammaticalNumber", register: "register",
# transfer_comment: "transferComment" }
# Picklist values for a specific type
Tbx::TermNote::VALUES[:grammatical_gender]
# => { masculine: "masculine", feminine: "feminine", neuter: "neuter", other: "other" }
# Admin types from all modules
Tbx::Admin::TYPES
# => { customer_subset: "customerSubset", project_subset: "projectSubset",
# source: "source", reading: "reading" }
# Descrip types
Tbx::Descrip::TYPES
# => { subject_field: "subjectField", context: "context", definition: "definition" }
# Hi types from core RNG
Tbx::Hi::TYPES
# => { entailed_term: "entailedTerm", hotkey: "hotkey", italics: "italics",
# bold: "bold", superscript: "superscript", subscript: "subscript", math: "math" }
# Create validated elements using constants
note = Tbx::TermNote.new(
  type: Tbx::TermNote::TYPES[:part_of_speech],
  content: [Tbx::TermNote::VALUES[:part_of_speech][:noun]],
)
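The same constants can also be used to check input before building an element. The following is a minimal sketch, not the gem's own validation: `valid_term_note?` is a hypothetical helper, and the two hashes are local copies of a few entries listed above (with the gem loaded you would read `Tbx::TermNote::TYPES` and `Tbx::TermNote::VALUES` instead). It assumes that types without a picklist have no entry in `VALUES`.

```ruby
# Local copies of a few entries shown above, so the sketch runs without the gem.
TERM_NOTE_TYPES = {
  grammatical_gender: "grammaticalGender",
  transfer_comment: "transferComment",
}.freeze

TERM_NOTE_VALUES = {
  grammatical_gender: {
    masculine: "masculine", feminine: "feminine",
    neuter: "neuter", other: "other",
  },
}.freeze

# Hypothetical helper: true when the type is known and, for picklist types,
# the value is one of the permitted picklist entries.
def valid_term_note?(type, value)
  key = TERM_NOTE_TYPES.key(type)
  return false if key.nil?

  picklist = TERM_NOTE_VALUES[key]
  # Assumption: types without a picklist accept any string.
  picklist.nil? || picklist.value?(value)
end

puts valid_term_note?("grammaticalGender", "feminine")
puts valid_term_note?("grammaticalGender", "plural")
```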
Supported TBX elements
Structural elements (TBXcoreStructV03.rng)
| Ruby class | TBX element | Schema source |
|---|---|---|
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
Header elements (TBXcoreStructV03.rng)
| Ruby class | TBX element | Schema source |
|---|---|---|
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
Data category elements with TYPES constants
These elements carry a type attribute whose permitted values are defined
by the active TBX module composition. The TYPES constant on each class
contains the union of all applicable module types.
Admin element
| Module | Type value | Content | Source |
|---|---|---|---|
| Min | customerSubset | string | Min.tbxmd (DC-165) |
| Basic | projectSubset | string | Basic.tbxmd (DC-406) |
| Basic | source | string | Basic.tbxmd (DC-471) |
| Linguist | reading | string | Linguist.tbxmd |
AdminNote element
| Module | Type value | Content | Source |
|---|---|---|---|
| Linguist | readingNote | noteText | Linguist.tbxmd |
Descrip element
| Module | Type value | Content | Source |
|---|---|---|---|
| Min | subjectField | string | Min.tbxmd (DC-489) |
| Basic | context | noteText | Basic.tbxmd (DC-149) |
| Basic | definition | noteText | Basic.tbxmd (DC-168) |
TermNote element
| Module | Type value | Content | Source |
|---|---|---|---|
| Min | administrativeStatus | picklist [1] | Min.tbxmd (DC-168) |
| Min | partOfSpeech | picklist [2] | Min.tbxmd (DC-396) |
| Basic | geographicalUsage | string | Basic.tbxmd (DC-243) |
| Basic | grammaticalGender | picklist [3] | Basic.tbxmd (DC-245) |
| Basic | termLocation | picklist [4] | Basic.tbxmd (DC-1823) |
| Basic | termType | picklist [5] | Basic.tbxmd (DC-2677) |
| Linguist | grammaticalNumber | picklist [6] | Linguist.tbxmd (DC-251) |
| Linguist | register | picklist [7] | Linguist.tbxmd (DC-423) |
| Linguist | transferComment | string | Linguist.tbxmd (DC-520) |
Transac element
| Module | Type value | Values | Source |
|---|---|---|---|
| Basic | transactionType | picklist [8] | Basic.tbxmd (DC-1689) |
TransacNote element
| Module | Type value | Content | Source |
|---|---|---|---|
| Basic | responsibility | string | Basic.tbxmd (DC-451) |
Ref element
| Module | Type value | Content | Source |
|---|---|---|---|
| Basic | crossReference | string | Basic.tbxmd (DC-164) |
Xref element
| Module | Type value | Content | Source |
|---|---|---|---|
| Min | externalCrossReference | string | TBX-Min DCA Schematron |
| Basic | externalCrossReference | string | Basic.tbxmd (DC-226) |
| Basic | xGraphic | string | Basic.tbxmd (DC-2920) |
Hi element (core RNG, not module-specific)
| Type value | Description |
|---|---|
| entailedTerm | Cross-reference to another concept within term text |
| hotkey | Keyboard shortcut designation |
| italics | Italic text emphasis |
| bold | Bold text emphasis |
| superscript | Superscript text |
| subscript | Subscript text |
| math | Mathematical notation |
Grouping elements (TBXcoreStructV03.rng)
| Ruby class | TBX element | Schema source |
|---|---|---|
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
TermComp elements (TermComp-namespace.rng)
| Ruby class | TBX element | Schema source |
|---|---|---|
| | | TermComp-namespace.rng |
| | | TermComp-namespace.rng |
| | | TermComp-namespace.rng |
Tbx::TermCompSec::TYPES defines 5 decomposition types:
hyphenation, lemma, morphologicalElement, syllabification, termElement.
Reference object elements (TBXcoreStructV03.rng)
| Ruby class | TBX element | Schema source |
|---|---|---|
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
Inline and text elements (TBXcoreStructV03.rng)
| Ruby class | TBX element | Schema source |
|---|---|---|
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
| | | Core RNG |
Module architecture
TBX modules define sets of data categories. Dialects compose modules to
determine which @type values are permitted on each element.
Modules and dialects
| Module | Dialects that include it | Source files |
|---|---|---|
| Min | TBX-Min, TBX-Basic, TBX-Core, TBX-Linguist | |
| Basic | TBX-Basic, TBX-Linguist | |
| Linguist | TBX-Linguist | |
| TermComp | (add-on for any dialect) | |
| Core | TBX-Core | |
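As a rough illustration of how a dialect's module list determines the permitted `@type` values, the sketch below looks up termNote types per dialect. It is not the gem's implementation: `DIALECT_MODULES`, `MODULE_TERM_NOTE_TYPES`, and `permitted_term_note_types` are hypothetical names, the module membership follows the table above, and the type lists follow the per-module data categories named elsewhere in this README. TBX-Core is left out for brevity.

```ruby
# Module membership per dialect, following the table above (TBX-Core omitted).
DIALECT_MODULES = {
  "TBX-Min"      => %i[min],
  "TBX-Basic"    => %i[min basic],
  "TBX-Linguist" => %i[min basic linguist],
}.freeze

# termNote types contributed by each module, as named in this README.
MODULE_TERM_NOTE_TYPES = {
  min:      %w[administrativeStatus partOfSpeech],
  basic:    %w[geographicalUsage grammaticalGender termLocation termType],
  linguist: %w[grammaticalNumber register transferComment],
}.freeze

# Hypothetical helper: union of termNote types for every module in the dialect.
# Raises KeyError for a dialect that is not listed.
def permitted_term_note_types(dialect)
  DIALECT_MODULES.fetch(dialect).flat_map do |mod|
    MODULE_TERM_NOTE_TYPES.fetch(mod)
  end
end

puts permitted_term_note_types("TBX-Min").inspect
puts permitted_term_note_types("TBX-Linguist").length
```

A stricter dialect therefore accepts a subset of what `Tbx::TermNote::TYPES` lists, since that constant is described as the union across all modules.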
Ruby module constants
| Ruby module | Constants | Source schema |
|---|---|---|
| | | Min.tbxmd |
| | | Basic.tbxmd |
| | | Linguist.tbxmd |
| | | TBXcoreStructV03.rng |
Element classes compose these via merge:
| Element class | TYPES composition |
|---|---|
| Admin | Min::ADMIN_TYPES + Basic::ADMIN_TYPES + Linguist::ADMIN_TYPES |
| AdminNote | Linguist::ADMIN_NOTE_TYPES |
| Descrip | Min::DESCRIP_TYPES + Basic::DESCRIP_TYPES |
| TermNote | Min::TERM_NOTE_TYPES + Basic::TERM_NOTE_TYPES + Linguist::TERM_NOTE_TYPES |
| Ref | Basic::REF_TYPES |
| Xref | Min::XREF_TYPES + Basic::XREF_TYPES |
| Transac | Basic::TRANSAC_TYPES |
| TransacNote | Basic::TRANSAC_NOTE_TYPES |
| Hi | CoreTypes::HI_TYPES |
| TermCompSec | (self-contained: hyphenation, lemma, morphologicalElement, syllabification, termElement) |
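The composition can be pictured with plain `Hash#merge`. This is a minimal sketch under stated assumptions: the `*Sketch` modules and class are hypothetical stand-ins (the gem's real namespaces may differ), and the per-module entries are taken from the data categories this README attributes to each module.

```ruby
# Hypothetical stand-ins for the per-module constants; not the gem's own modules.
module MinSketch
  ADMIN_TYPES = { customer_subset: "customerSubset" }.freeze
end

module BasicSketch
  ADMIN_TYPES = { project_subset: "projectSubset", source: "source" }.freeze
end

module LinguistSketch
  ADMIN_TYPES = { reading: "reading" }.freeze
end

class AdminSketch
  # Merge the module hashes in order; later modules only add new keys here.
  TYPES = MinSketch::ADMIN_TYPES
          .merge(BasicSketch::ADMIN_TYPES)
          .merge(LinguistSketch::ADMIN_TYPES)
          .freeze
end

puts AdminSketch::TYPES.inspect
```

The merged hash has the same keys and values as the `Tbx::Admin::TYPES` listing shown earlier, which is what the `+` notation in the table stands for.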
Shared concern
Tbx::DataElement is a module that injects common attributes (id, lang,
target, datatype, type, content) via self.included(base).
Tbx::DataElement::InlineContent adds inline child attributes (hi, ec,
foreign, ph, sc) for elements with entity.noteText content.
| Element classes | Includes InlineContent? |
|---|---|
| | Yes (entity.noteText content) |
| | No (plain text only per RNG) |
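The `self.included(base)` hook mentioned above is a standard Ruby pattern. The sketch below shows the mechanism with plain `attr_accessor`; it is an assumption for illustration only, since the gem itself declares attributes through lutaml-model, and `DataElementSketch` and `NoteSketch` are hypothetical names.

```ruby
# Illustration of the included-hook pattern; not the gem's actual module.
module DataElementSketch
  COMMON_ATTRIBUTES = %i[id lang target datatype type content].freeze

  # Ruby calls this hook whenever a class includes the module.
  def self.included(base)
    base.class_eval do
      attr_accessor(*COMMON_ATTRIBUTES)
    end
  end

  # Assign any of the common attributes from keyword arguments.
  def initialize(**attrs)
    attrs.each { |name, value| public_send("#{name}=", value) }
  end
end

class NoteSketch
  include DataElementSketch
end

NOTE = NoteSketch.new(type: "partOfSpeech", content: ["noun"])
puts NOTE.type
```

Every class that includes the module gets the same six accessors, so the shared attribute list lives in one place.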
Schema source audit
Every element class in lib/tbx/ has YARD documentation tracing it to its
schema source. The authoritative sources are:
| Source file | Defines |
|---|---|
| | All structural elements, attributes, content models, and |
| | Min module data categories: customerSubset, subjectField, administrativeStatus, partOfSpeech, externalCrossReference |
| | Basic module data categories: projectSubset, source, context, definition, geographicalUsage, grammaticalGender, termLocation, termType, crossReference, externalCrossReference, xGraphic, transactionType, responsibility |
| | Linguist module data categories: reading, readingNote, grammaticalNumber, register, transferComment |
| | TermComp elements (termComp, termCompGrp, termCompSec) and type values |
| | TBX-Basic DCA Schematron: permitted |
| | TBX-Min DCA Schematron |
| | TBX-Linguist DCA Schematron |
| | TBX-Core Schematron |
| | Module-level structural extensions |
| | Module-level Schematron rules |
Reference documentation
The reference-docs/ directory contains official TBX schema, dialect, and
module files from LTAC Global. See
reference-docs/README.adoc for a complete
inventory with source repository links.
Official LTAC-Global repositories
| Repository | Description |
|---|---|
| | Core structural schema ( |
| | Min module data categories (RNG, Schematron, |
| | Basic module data categories (RNG, Schematron, |
| | Linguist module data categories (RNG, Schematron, |
| | Term composition module (namespace RNG) |
| | |
| | TBX-Min dialect (DCA + DCT Schematron, NVDL) |
| | TBX-Basic dialect (DCA + DCT Schematron, bundled modules) |
| | TBX-Core dialect (integrated RNG, Schematron) |
| | TBX-Linguist dialect (DCA + DCT Schematron, integrated RNG) |
| | TBX-Basic implementation guide |
| | Official TBX test fixture files |
Test data
Test fixtures are sourced from the
TBX_test_files repository
maintained by LTAC Global, and from the dialect
schemas and example files in reference-docs/.
Development
After checking out the repo, run bin/setup to install dependencies. Then,
run bundle exec rake to run the tests and linter.
# Run tests
bundle exec rspec
# Run linter
bundle exec rubocop
# Run both (default task)
bundle exec rake
Credits
This gem is developed, maintained and funded by Ribose Inc.
TBX schema, dialect, and module reference files are maintained by LTAC Global and sourced from the LTAC-Global GitHub organization.
License
The gem is available as open source under the terms of the 2-Clause BSD License.