GTCRN

Speech enhancement (denoising) using the GTCRN model.

SYNOPSIS

require "gtcrn"

output = GTCRN.new.enhance_speech("path/to/audio.wav", "path/to/output.wav")

# You may omit the output path
output = GTCRN.new.enhance_speech("path/to/audio.wav")
# => <Pathname:path/to/audio.enhanced.wav>

Audio files must have a 16 kHz sampling rate and 16 bits per sample. Currently, any file format supported by TorchAudio Ruby (TorchCodec Ruby) can be used.
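To check an input before enhancing it, the header of a WAV file can be inspected with plain Ruby. The following is a minimal sketch, not part of this gem: the helper names wav_format and gtcrn_compatible? are hypothetical, and it only handles RIFF/WAVE files (other container formats would need a different check).

```ruby
# Hypothetical helper (not part of the gtcrn gem): reads the "fmt " chunk of a
# RIFF/WAVE file and returns its channel count, sample rate and bit depth.
def wav_format(path)
  File.open(path, "rb") do |f|
    riff, _size, wave = f.read(12).to_s.unpack("a4Va4")
    unless riff == "RIFF" && wave == "WAVE"
      raise ArgumentError, "not a RIFF/WAVE file: #{path}"
    end

    # Walk the chunks until "fmt " is found.
    while (header = f.read(8)) && header.bytesize == 8
      id, size = header.unpack("a4V")
      if id == "fmt "
        # Fields: format tag, channels, sample rate, byte rate, block align, bits per sample
        _tag, channels, sample_rate, _byte_rate, _align, bits =
          f.read(16).to_s.unpack("vvVVvv")
        return { channels: channels, sample_rate: sample_rate, bits_per_sample: bits }
      end
      f.seek(size + (size % 2), IO::SEEK_CUR) # chunks are padded to an even size
    end
  end
  raise ArgumentError, "no fmt chunk found: #{path}"
end

# Hypothetical predicate: true when the file matches the requirements above.
def gtcrn_compatible?(path)
  fmt = wav_format(path)
  fmt[:sample_rate] == 16_000 && fmt[:bits_per_sample] == 16
end
```

A caller could run gtcrn_compatible?("path/to/audio.wav") first and resample or convert the file with an external tool when it returns false.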

INSTALLATION

This gem depends on Torch.rb, TorchAudio Ruby, and TorchCodec Ruby, all of which must be built against a precompiled libtorch.

% wget https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.10.0.zip # See https://pytorch.org/get-started/locally/ for the download URL for your environment
% unzip -d path/to/libtorch libtorch-macos-arm64-2.10.0.zip
% gem install torch-rb -- --with-torch-dir=path/to/libtorch
% gem install torchaudio -- --with-torch-dir=path/to/libtorch
% gem install torchcodec -- --with-torch-dir=path/to/libtorch
% gem install gtcrn

Or, with Bundler:

% bundle config set --local build.torch-rb --with-torch-dir=path/to/libtorch
% bundle config set --local build.torchaudio --with-torch-dir=path/to/libtorch
% bundle config set --local build.torchcodec --with-torch-dir=path/to/libtorch
% bundle install

These instructions might be outdated. Refer to each library's own instructions if you have trouble.

CLI

This gem ships with a gtcrn command.

% gtcrn path/to/audio.wav --output=path/to/output.wav
Enhanced file written to
path/to/output.wav

You may omit the output path:

% gtcrn path/to/audio.wav
Enhanced file written to
path/to/audio.enhanced.wav
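
Judging from the examples, the default output name is the input name with .enhanced inserted before the extension. The following sketch reproduces that naming with Pathname. It is an assumption inferred from the examples, not the gem's actual implementation, and default_output_path is a hypothetical helper name.

```ruby
require "pathname"

# Hypothetical helper (not part of the gtcrn gem): derives an output path
# by inserting ".enhanced" before the input file's extension.
def default_output_path(input)
  input = Pathname(input)
  input.sub_ext(".enhanced#{input.extname}")
end

puts default_output_path("path/to/audio.wav")
```

Computing the name up front can be useful when a script needs to know where the enhanced file will be written before running the command.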

ENHANCE AUDIO DATA

You can also enhance audio data in memory:

waveform, sample_rate = TorchAudio.load("path/to/audio.wav")
enhanced = GTCRN.new.enhance_speech_waveform(waveform)
TorchAudio.save("path/to/output.wav", enhanced.squeeze, sample_rate)

GTCRN#enhance_speech_waveform enhances each channel separately if you pass multi-channel audio.

LICENSE

MIT license. See LICENSE file.

The GTCRN ONNX model under the vendor/gtcrn directory is distributed under the MIT license by Rong Xiaobin. See the vendor/gtcrn/LICENSE file.