uea-stemmer
Ruby implementation of the UEA-Lite stemmer for conservative stemming in search and indexing workloads. The gem has no runtime dependencies.
UEA-Lite uses a rule set to normalize suffixes while avoiding aggressive stemming.
Behavior Notes
The stemmer operates on a single token at a time and returns a stemmed token.
Notable behavior of this implementation:
- possessive apostrophes are removed
- contractions are expanded by default (for example, "don't" becomes "do not")
- tokens beginning with uppercase letters are preserved, and pluralized acronyms ending in a lowercase "s" are singularized
- pure numbers, and tokens containing hyphens/underscores, are passed through unchanged
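The pass-through behavior in the last item can be pictured as a guard that runs before any suffix rule is tried. The following is only a sketch of that idea, not the gem's internal code: the name passthrough_token? is hypothetical, and it assumes that "pure number" means a token made up of digits only.

```ruby
# Hypothetical pre-filter illustrating the pass-through rule described above.
# This is NOT part of the uea-stemmer API.

# Assumption: a "pure number" is a token consisting only of digits.
PURE_NUMBER = /\A\d+\z/

# Returns true when a token would be left unchanged under the rule above:
# it is a pure number, or it contains a hyphen or an underscore.
def passthrough_token?(token)
  token.match?(PURE_NUMBER) || token.include?("-") || token.include?("_")
end

passthrough_token?("2024")       # => true
passthrough_token?("well-known") # => true
passthrough_token?("snake_case") # => true
passthrough_token?("helpers")    # => false
```

A guard like this can also be useful on the caller's side, for example to skip stemming entirely for identifiers and numeric fields during indexing.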
This is a port to Ruby from the Java port of the original Perl script by Marie-Claire Jenkins and Dr. Dan J. Smith at the University of East Anglia.
Installation
Requires Ruby 3.1 or newer.
Install the gem:
gem install uea-stemmer
Install from source:
git clone https://github.com/ealdent/uea-stemmer.git
cd uea-stemmer
gem build uea-stemmer.gemspec
gem install ./uea-stemmer-*.gem
Example Usage
Basic usage:
require "uea-stemmer"
stemmer = UEAStemmer.new
stemmer.stem("helpers") # => "helper"
stemmer.stem("dying") # => "die"
stemmer.stem("scarred") # => "scar"
You can extract the matching rule with stem_with_rule:
result = stemmer.stem_with_rule("invited")
result.word # => "invite"
result.rule_num # => "22.3"
result.rule # => #<UEAStemmer::Rule ...>
Disable contraction expansion:
UEAStemmer.new(nil, nil, skip_contractions: true).stem("don't")
# => "don't"
Use the singleton instance:
DefaultUEAStemmer.instance.stem("running") # => "run"
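Because the stemmer works on one token at a time, stemming a whole string means splitting it first. A minimal sketch of such a wrapper follows; stem_tokens is a hypothetical helper, not something the gem provides, and it assumes plain whitespace tokenization is good enough for the input.

```ruby
# Hypothetical helper, not part of the gem: stems every whitespace-separated
# token in a string. `stemmer` can be any object that responds to #stem,
# such as UEAStemmer.new or DefaultUEAStemmer.instance.
def stem_tokens(stemmer, text)
  # String#split with no arguments splits on runs of whitespace
  # and ignores leading and trailing whitespace.
  text.split.map { |token| stemmer.stem(token) }
end

# With the gem installed, usage would look like:
#   require "uea-stemmer"
#   stem_tokens(UEAStemmer.new, "helpers dying scarred")
```

Passing the stemmer in as an argument keeps the helper independent of how the instance was configured, so the same code works with skip_contractions: true or with the singleton. Note that with contraction expansion enabled, a single input token may come back as more than one word.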
Development
This project does not require Bundler or Rake for normal development. Run the tests directly:
ruby -Itest test/uea_stemmer_test.rb
Build the gem package:
gem build uea-stemmer.gemspec
GitHub Actions runs the test suite and gem build on supported Ruby versions.
Contributing
- Fork the project.
- Make your feature addition or bug fix.
- Add or update tests.
- Run ruby -Itest test/uea_stemmer_test.rb.
- Run gem build uea-stemmer.gemspec.
- Send a pull request.
Copyright
Copyright © 2005 by the University of East Anglia. Authored by Marie-Claire Jenkins and Dr. Dan J. Smith. This Ruby port was written by Jason Adams, based on the Java port by Richard Churchill.
This project is distributed under the Apache 2.0 License. See LICENSE for details.