uea-stemmer

A Ruby implementation of the UEA-Lite stemmer, intended for conservative stemming in search and indexing workloads. The gem has no runtime dependencies.

UEA-Lite uses a rule set to normalize suffixes while avoiding aggressive stemming.

Behavior Notes

The stemmer operates on a single token at a time and returns a stemmed token.

Notable behavior of this implementation:

  • possessive apostrophes are removed

  • contractions are expanded by default (for example, don't becomes do not)

  • tokens beginning with uppercase letters are preserved, and pluralized acronyms ending in a lowercase s are singularized

  • pure numbers, and tokens containing hyphens/underscores, are passed through unchanged
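The pass-through rule in the last bullet can be pictured with a small predicate. This is only a sketch: passes_through_unchanged? is a hypothetical helper, not part of the gem, and it assumes "pure number" means a token made up entirely of digits.

```ruby
# Hypothetical pre-check mirroring the documented pass-through rule.
# Illustration only; this is not taken from the gem's source.
def passes_through_unchanged?(token)
  # Assumption: a "pure number" consists of digits only.
  pure_number = token.match?(/\A\d+\z/)
  # Tokens containing a hyphen or an underscore anywhere.
  has_joiner = token.match?(/[-_]/)
  pure_number || has_joiner
end
```

A check like this can be handy in an indexing pipeline when you want to know in advance which tokens the stemmer is expected to leave alone.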

This is a Ruby port of the Java port of the original Perl script by Marie-Claire Jenkins and Dr. Dan J. Smith at the University of East Anglia.

Installation

Requires Ruby 3.1 or newer.

Install the gem:

gem install uea-stemmer

Install from source:

git clone https://github.com/ealdent/uea-stemmer.git
cd uea-stemmer
gem build uea-stemmer.gemspec
gem install ./uea-stemmer-*.gem

Example Usage

Basic usage:

require "uea-stemmer"
stemmer = UEAStemmer.new

stemmer.stem("helpers")   # => "helper"
stemmer.stem("dying")     # => "die"
stemmer.stem("scarred")   # => "scar"
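Because stem works on one token at a time, stemming a whole string means tokenizing it first. The following is a minimal sketch, assuming plain whitespace tokenization is good enough; stem_text is a hypothetical helper, not part of the gem, and it accepts any object that responds to stem.

```ruby
# Stem every whitespace-separated token in a string.
# `stemmer` is any object responding to #stem(token), e.g. UEAStemmer.new.
# stem_text is a hypothetical helper and is not provided by the gem.
def stem_text(stemmer, text)
  # String#split with no arguments splits on runs of whitespace.
  text.split.map { |token| stemmer.stem(token) }
end
```

Given the results shown above, stem_text(UEAStemmer.new, "helpers scarred") would be expected to return ["helper", "scar"]. Note that with contraction expansion enabled, a single stemmed token may itself contain a space.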

To get the matching rule along with the stem, use stem_with_rule:

result = stemmer.stem_with_rule("invited")
result.word      # => "invite"
result.rule_num  # => "22.3"
result.rule      # => #<UEAStemmer::Rule ...>
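One possible use of stem_with_rule is to see which rules fire most often on a word list, for example when debugging unexpected stems. The sketch below relies only on the stem_with_rule and rule_num names shown above; rule_counts is a hypothetical helper, not part of the gem.

```ruby
# Count how often each rule number is reported across a list of words.
# `stemmer` is any object responding to #stem_with_rule(word) whose result
# responds to #rule_num, e.g. UEAStemmer.new.
# rule_counts is a hypothetical helper and is not provided by the gem.
def rule_counts(stemmer, words)
  # Enumerable#tally builds a hash of value => occurrence count.
  words.map { |word| stemmer.stem_with_rule(word).rule_num }.tally
end
```

The result is a hash keyed by rule number, so rule_counts(UEAStemmer.new, words) can be sorted by count to find the dominant rules for a corpus.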

Disable contraction expansion:

UEAStemmer.new(nil, nil, skip_contractions: true).stem("don't")
# => "don't"

Use the singleton instance:

DefaultUEAStemmer.instance.stem("running")  # => "run"

Development

This project does not require Bundler or Rake for normal development. Run the tests directly:

ruby -Itest test/uea_stemmer_test.rb

Build the gem package:

gem build uea-stemmer.gemspec

GitHub Actions runs the test suite and gem build on supported Ruby versions.

Contributing

  • Fork the project.

  • Make your feature addition or bug fix.

  • Add or update tests.

  • Run ruby -Itest test/uea_stemmer_test.rb.

  • Run gem build uea-stemmer.gemspec.

  • Send a pull request.

Copyright and License

Copyright © 2005 by the University of East Anglia. The original stemmer was written by Marie-Claire Jenkins and Dr. Dan J. Smith. This Ruby port was done by Jason Adams, based on the Java port by Richard Churchill.

This project is distributed under the Apache 2.0 License. See LICENSE for details.