SchemaGraphy is an aspirational framework for governing data and technical content in flat-file formats. It is based on a modestly extended form of YAML syntax and preprocessing, and is designed to be used in a variety of contexts.
For now, the released and supported version of this project is a Ruby gem that provides a simple API for processing and validating specially formatted YAML files, mainly for generating single-sourced documentation from them. This gem is being released to support divergent downstream projects (so far: ReleaseHx and DocOps Lab’s Jekyll extensions suite).
SchemaGraphy is basically an extension of YAML, enabling Ruby developers and end users more broadly to powerfully interpret and schematize YAML-based data.
Most relevant to our case, as enabled by the SchemaGraphy module in this gem, is its handling of YAML custom tags, attribute resolution, and what I am calling “templated fields”, where the value of a YAML node is a String that is intended to be further processed by a templating engine like Liquid or ERB, either immediately upon ingest or later in the runtime stack, when it can be mixed with additional data.
SchemaGraphy facilitates handling these and other quirky power-ups we use with our fully valid YAML files, so low-code users can pass some dynamism along in their YAML configs and so forth.
API reference documentation is published at rubydoc.info/gems/schemagraphy.
Attribute Resolution
SchemaGraphy provides attribute resolution capabilities through the AttributeResolver component.
This enables YAML files to reference external attributes using placeholder syntax like {attribute_name}.
When loading YAML with .load_yaml_with_attributes(file_path, attributes), SchemaGraphy:
- Loads the YAML file normally
- Searches for {attribute_name} patterns in String values
- Replaces them with corresponding values from the provided attributes Hash
- Returns the resolved YAML data structure
This is used extensively in ReleaseHx’s configuration system to enable single-sourcing of defaults from README attributes.
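The steps above can be sketched in plain Ruby. This is a minimal illustration, not the gem's AttributeResolver: the method name resolve_attributes, the sample keys, and the choice to leave unknown placeholders untouched are all assumptions made for the example.

```ruby
require 'yaml'

# Hypothetical helper (not part of the SchemaGraphy API): recursively replaces
# {name} placeholders in String values with entries from an attributes Hash.
def resolve_attributes(data, attributes)
  case data
  when Hash
    data.transform_values { |value| resolve_attributes(value, attributes) }
  when Array
    data.map { |value| resolve_attributes(value, attributes) }
  when String
    # Assumption: a placeholder with no matching attribute is left as written.
    data.gsub(/\{([\w-]+)\}/) do
      attributes.fetch(Regexp.last_match(1), Regexp.last_match(0))
    end
  else
    data # numbers, booleans, and nil pass through unchanged
  end
end

yaml = <<~YAML
  markup: "{default_markup}"
  title: "Notes for {product}"
  retries: 3
YAML

attributes = { 'default_markup' => 'asciidoc' }
resolved = resolve_attributes(YAML.safe_load(yaml), attributes)
# resolved['markup'] is now "asciidoc"; the title keeps its {product} placeholder
```

Resolving after the YAML is loaded, rather than substituting into the raw file text, means an attribute value cannot accidentally change the structure of the document.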
Custom YAML Tag Handling
To enable end users to pass meta-instructions along with their data, wherever it will make sense to do so, SchemaGraphy offers a straightforward handling system.
Wherever you parse YAML-formatted data using .load_yaml_with_tags, custom-tagged Scalar nodes are converted into Maps. For example, this source:
some_property: !liquid "{{ some_value }}"
is loaded as:
some_property:
  __tag__: liquid
  value: "{{ some_value }}"
Developers may therefore conditionally interpret ingested data based on user-defined classifications, wherever the developer supports such things.
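To make the conversion concrete, here is a rough sketch built on Ruby's standard Psych parser. It is not how load_yaml_with_tags is implemented. The method name tag_aware_build is hypothetical, and for brevity the sketch keeps every untagged scalar as a String and does not handle aliases.

```ruby
require 'yaml'

# Hypothetical walker over the Psych AST. Custom-tagged scalars become
# { '__tag__' => ..., 'value' => ... } Hashes; tags on mappings and
# sequences are ignored, in line with the behavior described in this README.
def tag_aware_build(node)
  case node
  when Psych::Nodes::Document
    tag_aware_build(node.root)
  when Psych::Nodes::Mapping
    # Mapping children alternate key, value, key, value...
    node.children.each_slice(2).to_h do |key, value|
      [key.value, tag_aware_build(value)]
    end
  when Psych::Nodes::Sequence
    node.children.map { |child| tag_aware_build(child) }
  when Psych::Nodes::Scalar
    if node.tag&.start_with?('!')
      { '__tag__' => node.tag.delete_prefix('!'), 'value' => node.value }
    else
      node.value # simplification: no type conversion for untagged scalars
    end
  end
end

source = <<~YAML
  some_property: !liquid "{{ some_value }}"
  plain_property: hello
YAML

data = tag_aware_build(Psych.parse(source))
```

Here data['some_property'] is a Map with __tag__ and value keys, while data['plain_property'] stays a plain String.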
Whether or not a Scalar has been transformed into a TagMap, you can resolve it to its plain value using:
SchemaGraphy::TagUtils.detag(some_property)
# Or, with a local alias
detag = ->(val) { SchemaGraphy::TagUtils.detag(val) }
detag.call(some_property) # a lambda is invoked with .call or .()
When tags are used this way, to convey a syntax/engine for processing a template or other dynamic content, SchemaGraphy can even help us handle the content in the manner designated by the tag. This will come up again in the next section.
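For readers who want a mental model of detagging, the following stand-in shows one plausible behavior: return the inner value of a TagMap and pass anything else through. The helper names (tag_map?, detag_sketch, tag_of) are hypothetical, and the real SchemaGraphy::TagUtils may behave differently in edge cases.

```ruby
# Hypothetical stand-ins, not the gem's TagUtils implementation.

# A TagMap is assumed here to be a Hash carrying a '__tag__' key.
def tag_map?(val)
  val.is_a?(Hash) && val.key?('__tag__')
end

# Returns the wrapped value for a TagMap, or the input itself otherwise.
def detag_sketch(val)
  tag_map?(val) ? val['value'] : val
end

# Returns the tag name for a TagMap, or nil for untagged data.
def tag_of(val)
  tag_map?(val) ? val['__tag__'] : nil
end

tagged = { '__tag__' => 'liquid', 'value' => '{{ some_value }}' }
plain  = 'just a string'

template_source = detag_sketch(tagged) # the Liquid source String
same_string     = detag_sketch(plain)  # unchanged
engine          = tag_of(tagged)       # "liquid"
```

Because untagged input passes straight through, calling code can detag every value it reads without first checking whether the user applied a tag.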
Note: This capability is only available on Scalar node values. For now, tags applied to other compound node types (Arrays/sequences, Maps/mappings) will be ignored by SchemaGraphy interpreters.
Warning: When you use load_yaml_with_tags, you will encounter errors downstream if a user places a tag on a node where you do not expect it.
Templated Property Values in YAML
We are calling these “templated fields” to specify that we are talking about enabling end users to use Liquid, ERB, or eventually other templating syntaxes in YAML node values.
In so doing, developers are able to designate that the value of certain YAML nodes should be handled by a templating engine, as well as when and how.
We’ll look at how this is done in Dynamic Templated-field Handling.
For now, the point is that sometimes files like specs/data/config-def.yml or an API-mapping file call for a little more runtime dynamism than a low-code solution like pure YAML can support.
Therefore, when the value of a user-configurable or environment-determined “setting” is a string that must be generated from data defined outside that field, we parse and render the template at runtime, using data from the environment or elsewhere. For now, it is up to our calling code to provide the appropriate variables to the template depending on the context.
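As a small illustration of calling code supplying the variables, the sketch below renders a templated setting with ERB from Ruby's standard library. The setting name, the template text, and the render_setting helper are invented for the example; a Liquid-tagged value would follow the same pattern with the Liquid gem.

```ruby
require 'erb'

# Hypothetical helper: renders one templated setting with caller-supplied data.
def render_setting(template, vars)
  ERB.new(template).result_with_hash(vars)
end

# Hypothetical setting whose value is an ERB template String.
settings = { 'output_file' => 'release-<%= version %>-<%= channel %>.adoc' }

# The calling code decides which variables exist in this context.
vars = { version: '1.2.0', channel: 'beta' }

output_file = render_setting(settings['output_file'], vars)
```

With these inputs, output_file is "release-1.2.0-beta.adoc". If the template refers to a variable the caller did not supply, ERB raises a NameError, which is one reason to document the variables available to each templated field.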
Configuration Definition (CFGYML)
All user-configurable settings have a definition, if not also a default value. For single-sourcing purposes, these are defined in a YAML format called CFGYML — a configuration-file modeling language.
The file is at ./specs/data/config-def.yml.
It is used to establish the literal default settings carried by the application, and also to document those settings for the user.
This practice lets developers give end users extremely detailed configurations, always well documented.
Attribute Resolution in CFGYML
CFGYML supports attribute resolution from AsciiDoc files (like this README) using placeholder syntax.
Default values can reference README attributes using {attribute_name} syntax:
properties:
  $meta:
    properties:
      markup:
        dflt: "{default_markup}" # Resolved from README.adoc :default_markup: attribute
This enables single-sourcing of configuration defaults from README attributes, ensuring that documentation and defaults stay synchronized.
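One way to picture the README side of this is a small extractor for AsciiDoc attribute entries. The sketch is deliberately naive: the method name asciidoc_attributes is hypothetical, and it ignores AsciiDoc features such as attribute unsetting, line continuation, and attribute references inside values. It does not describe how the gem actually reads attributes.

```ruby
# Hypothetical extractor: collects ":name: value" entries from AsciiDoc text.
def asciidoc_attributes(text)
  text.each_line.with_object({}) do |line, attrs|
    match = line.match(/\A:([\w-]+):\s*(.*?)\s*\z/)
    attrs[match[1]] = match[2] if match
  end
end

readme = <<~ADOC
  = Example Project
  :default_markup: asciidoc
  :toc:

  Body text with a colon: not an attribute.
ADOC

attrs = asciidoc_attributes(readme)

# Resolve a CFGYML-style default that references a README attribute.
dflt = '{default_markup}'.gsub(/\{([\w-]+)\}/) do
  attrs.fetch(Regexp.last_match(1), Regexp.last_match(0))
end
```

Here dflt resolves to "asciidoc", so the documented default and the effective default come from the same line of the README.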
CFGYML Schema Structure
The basic schema is somewhat straightforward.
Essentially, you’re nesting Map objects within a YAML key properties, and each property (setting) of the defined config file can be described and constrained.
Each setting can have a type, description (desc), default (dflt), and templated-field instructions (templating).
If the setting is itself of type Map (YAML “mapping”, JSON “object”), its own nested parameters can be established with a properties: block.
For now, you can designate the type, which you will have to enforce in your code, as well as a default value.
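The nesting described above lends itself to a short recursive walk. The sketch below collects default values from a definition into a plain Hash. The property names and the defaults_from method are made up for illustration, and the definition is a toy example rather than an excerpt from specs/data/config-def.yml.

```ruby
require 'yaml'

# Hypothetical helper: gathers every dflt value, following nested
# properties: blocks, and skips settings that define no default.
def defaults_from(definition)
  definition.fetch('properties', {}).each_with_object({}) do |(name, spec), out|
    if spec.key?('properties')
      nested = defaults_from(spec)
      out[name] = nested unless nested.empty?
    elsif spec.key?('dflt')
      out[name] = spec['dflt']
    end
  end
end

# Toy CFGYML-style definition; all names here are illustrative.
definition = YAML.safe_load(<<~YAML)
  properties:
    markup:
      type: String
      desc: Default markup language for generated drafts.
      dflt: asciidoc
    paths:
      type: Map
      desc: Where files are read and written.
      properties:
        output_dir:
          type: String
          dflt: _build
        cache_dir:
          type: String
          desc: No default is defined for this setting.
YAML

defaults = defaults_from(definition)
```

The result contains markup and paths.output_dir but omits paths.cache_dir, since that setting has no dflt. A user config could then be merged over these defaults.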
SGYML Schemas
Similar to but more complicated than CFGYML definition files are SchemaGraphy schema files. This is a partially specified, partially developed, and as-yet-incomplete syntax for designating and constraining YAML documents.
ReleaseHx and CFGYML at this time make active use of only minimal aspects of these schemas.
Each of the YAML formats used by ReleaseHx has its own schema in the repo.
The cfgyml-schema.yaml file will eventually be spun off, but the specs/data/rhyml-schema.yaml and specs/data/rhyml-mapping-schema.yaml files will stay here, defining valid formats for the types of files they apply to.
Since SchemaGraphy itself is still unreleased, CFGYML as introduced in this gem offers only a subset of what it will enable down the road.
Once SchemaGraphy is generally available, this gem will call it as a dependency.
At that point, a file like specs/data/config-def.yml (CFGYML) will be able to impose a more detailed $schema for any property.
Dynamic Templated-field Handling
The most powerful function of SchemaGraphy schemas that is now available in ReleaseHx is the ability to instruct how templated fields should be processed at different stages, and also to parse and render them as needed.
Templated-field handling can be established between a combination of (1) CFGYML definition files or SGYML schema files and (2) configuration files to be applied at runtime.
Developers can designate a given property to be type: Template in a schema or definition.
This “typing” can be a trigger for downstream parsing/rendering of the template.
Note: Liquid uses these two stages. The parse operation compiles a template into a Liquid::Template object. The render operation applies a dataset to the loaded template, generating a String with Liquid tags resolved.
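The two-stage idea can be sketched with ERB from the standard library standing in for Liquid, so the example runs without extra gems. The type: Template trigger comes from the text above. Everything else (the property names and the parse_templates and render_templates helpers) is hypothetical and not the ReleaseHx implementation.

```ruby
require 'erb'

# Stage 1 (parse): compile each value whose definition says type: Template.
def parse_templates(config, definition)
  config.to_h do |key, value|
    templated = definition.dig('properties', key, 'type') == 'Template'
    [key, templated ? ERB.new(value) : value]
  end
end

# Stage 2 (render): apply runtime data to the compiled templates.
def render_templates(parsed, vars)
  parsed.transform_values do |value|
    value.is_a?(ERB) ? value.result_with_hash(vars) : value
  end
end

# Illustrative definition and user config.
definition = {
  'properties' => {
    'title' => { 'type' => 'String' },
    'filename' => { 'type' => 'Template' }
  }
}
config = { 'title' => 'Release notes', 'filename' => 'notes-<%= version %>.adoc' }

parsed   = parse_templates(config, definition) # can happen at load time
rendered = render_templates(parsed, { version: '2.0' }) # later, with runtime data
```

Separating the stages lets a template be compiled once when the config loads and rendered later, possibly several times, as more data becomes available.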
Not Yet Implemented
Most aspects of SchemaGraphy/SGYML are planned but not yet available, but some are worth pointing out.
- data types: As of now, the type node of any property in specs/data/config-def.yml is not particularly functional. I do have a whole table of “data types” in SGYML, most of which are extremely self-explanatory and drawn from fairly platonic, cross-language terms. However, these are entirely unenforced in ReleaseHx. For now, data still has to be type checked explicitly in the Ruby code, and user configs are not validated against any kind of typing system.
- schema docs: The schema files do not yet generate complete reference docs for the subject files that they govern. So for instance, you’ll have to read files like specs/data/rhyml-schema.yaml and specs/data/rhyml-mapping-schema.yaml directly to understand the format of RHYML files and how they are mapped to REST response payloads.
- SGYML validation and resolvers: The SGYML schemas are not yet being used to validate YAML files or to dereference $ref nodes and other enhancements.
- URIx specification: A lightweight standard for designating and parsing extended URIs to enable identification of structured data/text in flat files at any address.
The Future of SchemaGraphy
- Add transclusion ($ref) capability to YAML documents.
- Govern serialized, nested data objects, including default properties and inter-object relationships.
- Constrain AsciiDoc, Markdown, reStructuredText, and any other lightweight markup text formatting to meet your custom standards.
- Generate linter definitions based on your schemas.
- Automatically generate documentation for APIs, config files, and marked-up datasets.
- Even validate HTML and rich-text output of your source markup!
- Port to and from third-party schema models like JSON Schema* and GraphQL.
- Render HTML forms for data collection with one click.
- Ingest default values from a config file definition.
- Generate stubs/drafts data and documents.
- All with human-friendly YAML files and objects, with AsciiDoc supported internally…