yara-normalize

Normalizes YARA signatures into a repeatable, stable hash even when non-semantic changes are made (whitespace, comments, tag ordering, variable renaming, etc.).

To enable consistent comparisons between YARA rules, a uniform fingerprinting standard is applied:

  1. *Strings section*: the string values (the part after each ‘=’) are extracted and sorted alphabetically, and the sorted list is hashed with SHA-256. Variable names ($a, $mshtmlExec_1, …) are excluded from the hash, so renaming a variable does not change the fingerprint.

  2. *Condition section*: variable references ($name, #name) are replaced with positional tokens ($0, $1, …) in order of first appearance, so cosmetic renames do not affect the hash. The resulting text is hashed with SHA-256.

The rule fingerprint is:

yn<VERSION>:<last-16-hex-chars-of-strings-SHA256>:<last-10-hex-chars-of-condition-SHA256>
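
The layout above can be sketched in plain Ruby. This is an illustration only, not the gem's implementation: sketch_fingerprint is a hypothetical helper, the "%" separator used to join the sorted values is an assumption, and so is keeping the $ or # sigil in front of each positional token. Fingerprints returned by YaraRule#hash will therefore differ from the values this sketch produces.

```ruby
require 'digest'

# Hypothetical helper, NOT part of yara-normalize. It only illustrates the
# yn<VERSION>:<16 hex>:<10 hex> layout described above.
def sketch_fingerprint(string_values, condition, version: "02")
  # Strings section: sort the values (variable names already removed) and
  # hash the list. The "%" separator is an assumption.
  strings_digest = Digest::SHA256.hexdigest(string_values.sort.join("%"))

  # Condition section: replace $name / #name with positional tokens in
  # order of first appearance. Keeping the sigil is an assumption.
  seen = {}
  normalized = condition.gsub(/([$#])(\w+)/) do
    sigil, name = $1, $2
    seen[name] ||= seen.size
    "#{sigil}#{seen[name]}"
  end
  condition_digest = Digest::SHA256.hexdigest(normalized)

  # Keep the last 16 and last 10 hex characters of the two digests.
  "yn#{version}:#{strings_digest[-16, 16]}:#{condition_digest[-10, 10]}"
end

a = sketch_fingerprint(['"wtoi" nocase', '"wtol" nocase'], '$a and #b > 2')
b = sketch_fingerprint(['"wtol" nocase', '"wtoi" nocase'], '$x and #y > 2')
puts a == b # => true: reordering strings and renaming variables change nothing
```

Sorting before hashing is what makes string order irrelevant, and the positional tokens are what make variable names irrelevant.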

Prior to version 0.4.0 the fingerprint used MD5 and carried the prefix yn01. Since 0.4.0 the fingerprint uses SHA-256 and carries the prefix yn02. The two identifier series are not interchangeable.
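
Because the two series are not interchangeable, code that stores fingerprints may want to check the prefix before comparing them. A minimal sketch follows, assuming only the prefix:part:part shape shown above. fingerprint_series and comparable? are hypothetical helpers, not part of the gem, and the yn01 value in the example is a made-up placeholder.

```ruby
# Hypothetical helpers, NOT part of yara-normalize.
# The field lengths are deliberately not checked, since only the yn02
# layout is documented above.
FINGERPRINT_PATTERN = /\Ayn(\d+):(\h+):(\h+)\z/

# Returns the series prefix ("yn01", "yn02", ...) or nil if the input
# does not look like a fingerprint.
def fingerprint_series(fingerprint)
  match = FINGERPRINT_PATTERN.match(fingerprint)
  match && "yn#{match[1]}"
end

# Two fingerprints are only worth comparing when both parse and both
# belong to the same series.
def comparable?(left, right)
  series = fingerprint_series(left)
  !series.nil? && series == fingerprint_series(right)
end

current = "yn02:6783b7082bed88dc:6821e3f6a3"
legacy  = "yn01:abcdef:123456" # placeholder, not a real yn01 fingerprint

puts fingerprint_series(current)  # => yn02
puts comparable?(current, legacy) # => false
```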

Usage

require 'yara-normalize'

sig = <<~EOS
  rule DataConversion__wide : IntegerParsing DataConversion {
    meta:
      weight = 1
    strings:
      $ = "wtoi" nocase
      $ = "wtol" nocase
      $ = "wtof" nocase
      $ = "wtodb" nocase
    condition:
      any of them
  }
EOS

yn = YaraTools::YaraRule.new(sig)

puts yn.hash
# => yn02:6783b7082bed88dc:6821e3f6a3

puts yn.name    # => DataConversion__wide
pp   yn.tags    # => ["IntegerParsing", "DataConversion"]
pp   yn.meta    # => {"weight"=>"1"}
pp   yn.strings # => ["$ = \"wtoi\" nocase", ...]

puts yn.normalize
# => rule DataConversion__wide : IntegerParsing DataConversion {
#      meta:
#        weight = 1
#      strings:
#        $ = "wtoi" nocase
#        $ = "wtol" nocase
#        $ = "wtof" nocase
#        $ = "wtodb" nocase
#      condition:
#        any of them
#    }

Splitting a multi-rule file:

rules = YaraTools::Splitter.split(File.read("ruleset.yar"))
rules.each { |r| puts "#{r.name}: #{r.hash}" }

Security notes

  • Fingerprints use SHA-256 (as of yn02). MD5-based yn01 hashes should be considered legacy and re-computed.

  • YaraRule#hash overrides Ruby’s Object#hash and returns a String fingerprint, not the Integer that Ruby’s Hash expects from #hash. Do not use YaraRule objects as Hash keys; key on the fingerprint String instead.
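
One way to work with the second note is to key collections on the fingerprint String rather than on the rule object. The sketch below uses a stand-in class so that it runs without the gem. StubRule and duplicate_groups are hypothetical names, and the all-zero fingerprint is a placeholder value.

```ruby
# Stand-in for YaraTools::YaraRule so the sketch runs without the gem.
# Assumption: like the real class, #hash returns the String fingerprint.
StubRule = Struct.new(:name, :fingerprint) do
  def hash
    fingerprint
  end
end

# Hypothetical helper: group rules by fingerprint String and keep only
# the groups that contain more than one rule. The Hash keys are Strings,
# so the rule objects themselves are never used as keys.
def duplicate_groups(rules)
  rules.group_by(&:hash).select { |_fingerprint, group| group.size > 1 }
end

rules = [
  StubRule.new("DataConversion__wide", "yn02:6783b7082bed88dc:6821e3f6a3"),
  StubRule.new("Renamed_copy",         "yn02:6783b7082bed88dc:6821e3f6a3"),
  StubRule.new("Unrelated",            "yn02:0000000000000000:0000000000"), # placeholder
]

duplicate_groups(rules).each do |fingerprint, group|
  puts "#{fingerprint}: #{group.map(&:name).join(', ')}"
end
```

With real rules, the same grouping should apply to the objects returned by YaraTools::Splitter.split, since each of them responds to hash with its fingerprint.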

Contributing to yara-normalize

  • Check out the latest master to make sure the feature hasn’t been implemented or the bug hasn’t been fixed yet.

  • Check out the issue tracker to make sure no one has already requested it and/or contributed it.

  • Fork the project.

  • Start a feature/bugfix branch.

  • Commit and push until you are happy with your contribution.

  • Make sure to add tests for it. This is important so I don’t break it in a future version unintentionally.

  • Please try not to mess with the Rakefile, version, or history.

Copyright © 2012 chrislee35. See LICENSE.txt for further details.