NilsimsaLite
Ruby wrapper for nilsimsa-lite.
Put simply, Nilsimsa produces similar digests for inputs that are "almost" identical. Contrast this with cryptographic hash functions (such as MD5), where two digests match only if the inputs are identical, and even a small change in the input yields an unrelated digest.
This makes it useful for tasks such as spam detection.
Installation
Install the gem and add it to the application's Gemfile by executing:
bundle add nilsimsa_lite
If Bundler is not being used to manage dependencies, install the gem by executing:
gem install nilsimsa_lite
Usage
require 'nilsimsa_lite'
text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
d1 = NilsimsaLite.digest(text)
#=> "5D9E4CB0061ECD485062E26FA8273991A7A9C9C36A57E33C2E18AA13F7E4F0EE"
d2 = NilsimsaLite.digest(text.gsub('dolor', 'delectatio'))
#=> "55DC5CB10716C8587022E275BC203991A781C9C36A56672C2E18AA03D3F4A467"
NilsimsaLite.compare(d1, d2)
#=> 92
According to the original implementation, "any nilsimsa over 24 (which is 3 sigma) indicates that the two messages are probably not independently generated."
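To make the score and the threshold concrete, here is a minimal pure-Ruby sketch. It assumes the classic Nilsimsa scoring (128 minus the number of bits that differ between the two digests), which reproduces the score of 92 for the two digests shown above. The helpers `nilsimsa_score` and `probably_related?` are hypothetical names used for illustration only; they are not part of the NilsimsaLite API, and the gem's own `compare` may be implemented differently.

```ruby
# Hypothetical helpers for illustration; not part of the NilsimsaLite API.

# Assumption: the score is 128 minus the Hamming distance (in bits)
# between the two hex-encoded digests.
def nilsimsa_score(hex_a, hex_b)
  raise ArgumentError, "digests must have equal length" unless hex_a.length == hex_b.length

  # XOR leaves a 1 in every bit position where the digests differ.
  differing_bits = (hex_a.to_i(16) ^ hex_b.to_i(16)).to_s(2).count("1")
  128 - differing_bits
end

# Threshold quoted from the original implementation's documentation.
THRESHOLD = 24

# True when the score suggests the two messages are probably
# not independently generated.
def probably_related?(score, threshold: THRESHOLD)
  score > threshold
end

d1 = "5D9E4CB0061ECD485062E26FA8273991A7A9C9C36A57E33C2E18AA13F7E4F0EE"
d2 = "55DC5CB10716C8587022E275BC203991A781C9C36A56672C2E18AA03D3F4A467"

score = nilsimsa_score(d1, d2)
puts score                    #=> 92
puts probably_related?(score) #=> true
```

In application code, the value returned by `NilsimsaLite.compare(d1, d2)` can be passed to a check like `probably_related?` directly; the bit-counting helper is only there to show where a number such as 92 could come from.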
Contributing
Bug reports and pull requests are welcome on Codeberg at https://codeberg.org/sgilperez/nilsimsa_lite. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the code of conduct.
Code of Conduct
Everyone interacting in the NilsimsaLite project's codebases, issue trackers, chat rooms and mailing lists is expected to follow the code of conduct.