sas-lexer
A Ruby gem that wraps the sas-lexer Rust crate through FFI. It tokenizes SAS source code into a stream of typed tokens with full position metadata.
This gem is a thin Ruby binding only — all lexing logic lives in the upstream Rust crate. The gem ships prebuilt native shims for supported platforms; a runtime loader picks the matching one for the host.
Installation
Add to your Gemfile:
gem "sas-lexer"
Or install directly:
gem install sas-lexer
Usage
require "sas_lexer"
lexer = SasLexer::Lexer.new
tokens = lexer.tokenize("data test; set input; run;")
lexer.free # release the underlying Rust buffers
tokens.each do |t|
puts "#{t[:start_line]}:#{t[:start_column]} type=#{t[:type]} text=#{t[:text].inspect}"
end
Each token is a hash with the following keys:
| key | description |
|---|---|
| :index | 0-based position in the token stream |
| :text | raw source text the token spans |
| :type | token type integer (see SasLexer::Lexer::TokenType) |
| :channel | 0 default, 1 hidden (whitespace), 2 comment |
| :start | byte offset of token start in the source |
| :end | byte offset of token end |
| :start_line | 1-based line number of token start |
| :end_line | 1-based line number of token end |
| :start_column | 0-based column number of token start |
| :end_column | 0-based column number of token end |
Constants for token types and channels are exposed under SasLexer::Lexer::TokenType and SasLexer::Lexer::TokenChannel. They mirror the enums in crates/sas-lexer/src/lexer/token_type.rs of the upstream Rust crate.
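As a minimal sketch of consuming the token hashes described above, the snippet below keeps only default-channel tokens and formats their start positions. The sample array is hand-written, hypothetical data that carries only the keys the sketch uses, and the helper names significant_tokens and describe_token are illustrative, not part of the gem.

```ruby
# Keep only tokens on the given channel (0 is the default channel per the
# table above; 1 is hidden whitespace, 2 is comments).
def significant_tokens(tokens, channel = 0)
  tokens.select { |t| t[:channel] == channel }
end

# Format a token as "line:column text", using the 1-based line and
# 0-based column keys.
def describe_token(token)
  "#{token[:start_line]}:#{token[:start_column]} #{token[:text].inspect}"
end

# Hypothetical sample shaped like the output of tokenize for "data test;".
# Real tokens also carry :index, :type, :start, :end, :end_line, :end_column.
sample_tokens = [
  { text: "data", channel: 0, start_line: 1, start_column: 0 },
  { text: " ",    channel: 1, start_line: 1, start_column: 4 },
  { text: "test", channel: 0, start_line: 1, start_column: 5 },
  { text: ";",    channel: 0, start_line: 1, start_column: 9 }
]

significant_tokens(sample_tokens).each { |t| puts describe_token(t) }
```

With real output, the same filter could compare against the channel constants under SasLexer::Lexer::TokenChannel instead of the literal 0.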
Native library loading
SasLexer::Lexer probes for a shared library in this order:
1. lib/native/<platform>/libsas_lexer_ffi.{so,dylib,dll}: the prebuilt shipped inside the published universal gem for the host's platform.
2. lib/native/libsas_lexer_ffi.{so,dylib,dll}: the flat path produced by bundle exec rake sas_lexer:install for local development.
If neither is found, require "sas_lexer" raises SasLexer::Error immediately. SasLexer::Lexer::LIBRARY_PATH exposes the resolved path.
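The probe order above can be sketched roughly as follows. This is not the gem's actual loader: find_native_library is a hypothetical helper, and passing the platform directory name as a plain string is an assumption about how the match is made. The demo builds a throwaway directory tree so the sketch runs without the gem installed.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: return the first shared library found, checking the
# per-platform directory before the flat development path, or nil if none.
def find_native_library(root, platform)
  lib_glob = "libsas_lexer_ffi.{so,dylib,dll}"
  patterns = [
    File.join(root, "lib", "native", platform, lib_glob), # shipped prebuilt
    File.join(root, "lib", "native", lib_glob)            # local dev build
  ]
  patterns.each do |pattern|
    match = Dir.glob(pattern).first
    return match if match
  end
  nil
end

Dir.mktmpdir do |root|
  # Create both candidates; the per-platform one should win.
  flat = File.join(root, "lib", "native", "libsas_lexer_ffi.so")
  plat = File.join(root, "lib", "native", "x86_64-linux", "libsas_lexer_ffi.so")
  FileUtils.mkdir_p(File.dirname(plat))
  FileUtils.touch([flat, plat])

  puts find_native_library(root, "x86_64-linux") == plat
end
```

A real loader would raise SasLexer::Error rather than return nil when nothing matches, as described above.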
Building from source
You only need to build from source when contributing a prebuilt for a new platform — the published gem already ships every committed platform's artifact.
Prerequisites:
- Rust toolchain (https://rustup.rs/)
- Ruby 3.4+
bundle install
bundle exec rake sas_lexer:install
This:
- Clones the upstream sas-lexer repo into vendor/sas-lexer/.
- Compiles the FFI shim under ffi-wrapper/ against it.
- Installs the resulting shared library at lib/native/libsas_lexer_ffi.<ext>.
To ship the artifact in the universal gem, move it under lib/native/<platform>/ and commit it:
mkdir -p lib/native/$(ruby -e 'puts RUBY_PLATFORM')
mv lib/native/libsas_lexer_ffi.* lib/native/$(ruby -e 'puts RUBY_PLATFORM')/
git add lib/native/$(ruby -e 'puts RUBY_PLATFORM')/
To cross-build an x86_64-linux .so from any host (typically macOS), use the Docker helper:
bin/build_linux_extension
Testing
bundle exec rake spec
Why a universal gem (not platform-tagged)?
The published gem is platform: ruby and ships every committed lib/native/<platform>/ together. The loader globs at runtime to pick the matching one. Empirically, platform-tagged gems served from some private registries can be served under the generic <gem>-<version>.gem URL too, which causes Bundler to lock the version without a platform suffix and then fail to materialize at install time. A single universal gem sidesteps that failure mode at the cost of ~1–2 MB extra per platform shipped.
There is no source-build fallback at install time — a host without a committed prebuilt for its platform fails fast at FFI load.
License
sas-lexer (the gem) is licensed under the GNU Affero General Public License v3.0 or later, matching the upstream Rust crate. See LICENSE for the full text.
The upstream Rust crate is © Misha Perlov. The FFI shim and Ruby binding in this repository are © Mon Ami, Inc.