GTCRN
Speech enhancement (denoising) using the GTCRN model.
SYNOPSIS
require "gtcrn"
output = GTCRN.new.enhance_speech("path/to/audio.wav", "path/to/output.wav")
# You may omit the output path
output = GTCRN.new.enhance_speech("path/to/audio.wav")
# => <Pathname:path/to/audio.enhanced.wav>
The audio file must have a 16 kHz sampling rate and 16 bits per sample. Currently, any file format supported by TorchAudio Ruby (via TorchCodec Ruby) can be used.
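The sampling-rate and bit-depth requirement can be checked before a file is handed to the gem. The following is a minimal sketch that only handles RIFF/WAVE input and reads the header with plain Ruby. The helpers `wav_format` and `gtcrn_compatible?` are hypothetical names introduced here for illustration; they are not part of the gtcrn gem.

```ruby
require "stringio"

# Hypothetical helper (not part of the gtcrn gem): reads the "fmt " chunk of a
# RIFF/WAVE stream and returns the channel count, sample rate and bit depth.
def wav_format(io)
  riff, _size, wave = io.read(12).to_s.unpack("a4Va4")
  raise ArgumentError, "not a RIFF/WAVE stream" unless riff == "RIFF" && wave == "WAVE"

  # Walk the chunks until the "fmt " chunk is found.
  while (header = io.read(8)) && header.bytesize == 8
    id, size = header.unpack("a4V")
    if id == "fmt "
      _format, channels, sample_rate, _byte_rate, _block_align, bits =
        io.read(16).unpack("vvVVvv")
      return { channels: channels, sample_rate: sample_rate, bits_per_sample: bits }
    end
    # Skip this chunk; chunks are padded to an even number of bytes.
    io.seek(size + (size.odd? ? 1 : 0), IO::SEEK_CUR)
  end
  raise ArgumentError, "no fmt chunk found"
end

# Hypothetical helper: true when the stream is 16 kHz, 16 bits per sample.
def gtcrn_compatible?(io)
  fmt = wav_format(io)
  fmt[:sample_rate] == 16_000 && fmt[:bits_per_sample] == 16
end
```

With a real file, this could be called as `File.open("path/to/audio.wav", "rb") { |f| gtcrn_compatible?(f) }`. Other container formats would need a different check.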
INSTALLATION
This gem depends on Torch.rb, TorchAudio Ruby and TorchCodec Ruby, which require a precompiled libtorch and must be built against it.
% wget https://download.pytorch.org/libtorch/cpu/libtorch-macos-arm64-2.10.0.zip # See https://pytorch.org/get-started/locally/ for download URI for your environment
% unzip -d path/to/libtorch libtorch-macos-arm64-2.10.0.zip
% gem install torch-rb -- --with-torch-dir=path/to/libtorch
% gem install torchaudio -- --with-torch-dir=path/to/libtorch
% gem install torchcodec -- --with-torch-dir=path/to/libtorch
% gem install gtcrn
Or, with Bundler:
% bundle config set --local build.torch-rb --with-torch-dir=path/to/libtorch
% bundle config set --local build.torchaudio --with-torch-dir=path/to/libtorch
% bundle config set --local build.torchcodec --with-torch-dir=path/to/libtorch
% bundle install
These instructions might be outdated. Refer to each library's installation instructions if you run into trouble.
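If installation problems are suspected, a quick way to see which of the gems named above are not yet installed is to ask RubyGems directly. This is a rough sketch; `missing_gems` is a hypothetical helper, and it only checks that a gem is installed, not that it was built against the right libtorch.

```ruby
# Hypothetical helper: returns the names of gems that RubyGems cannot find.
def missing_gems(names)
  names.select { |name| Gem::Specification.find_all_by_name(name).empty? }
end

# Gem names taken from the installation commands above.
required = %w[torch-rb torchaudio torchcodec gtcrn]
missing = missing_gems(required)

if missing.empty?
  puts "All required gems are installed."
else
  puts "Missing gems: #{missing.join(', ')}"
end
```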
CLI
This gem ships with a gtcrn command.
% gtcrn path/to/audio.wav --output=path/to/output.wav
Enhanced file written to
path/to/output.wav
You may omit the output path:
% gtcrn path/to/audio.wav
Enhanced file written to
path/to/audio.enhanced.wav
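The default output name shown in these examples inserts `.enhanced` before the file extension. A minimal sketch of that naming rule, assuming it is derived from the input path alone, could look like the following. `default_enhanced_path` is a hypothetical helper, and the gem's actual logic may differ.

```ruby
require "pathname"

# Hypothetical helper: mirrors the naming seen in the examples
# (audio.wav -> audio.enhanced.wav). Not taken from the gem's source.
def default_enhanced_path(input)
  path = Pathname(input)
  # Replace the extension with ".enhanced" followed by the original extension.
  path.sub_ext(".enhanced#{path.extname}")
end

puts default_enhanced_path("path/to/audio.wav")
# prints path/to/audio.enhanced.wav
```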
ENHANCE AUDIO DATA
You can also enhance audio data in memory:
waveform, sample_rate = TorchAudio.load("path/to/audio.wav")
enhanced = GTCRN.new.enhance_speech_waveform(waveform)
TorchAudio.save("path/to/output.wav", enhanced.squeeze, sample_rate)
GTCRN#enhance_speech_waveform enhances each channel separately if you pass multi-channel audio.
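Per-channel processing means each channel is treated as an independent mono signal and the results are reassembled in the original order. The sketch below illustrates that idea with plain Ruby arrays. `ATTENUATE` is a stand-in processor and `process_each_channel` is a hypothetical helper; neither is the GTCRN model nor the gem's implementation.

```ruby
# Stand-in for a single-channel enhancer; it only halves the amplitude.
# This is NOT the GTCRN model, just a placeholder for illustration.
ATTENUATE = ->(samples) { samples.map { |s| s * 0.5 } }

# Applies a mono processor to every channel independently and
# returns the channels in their original order.
def process_each_channel(channels, processor)
  channels.map { |samples| processor.call(samples) }
end

stereo = [[0.2, -0.4, 0.6], [1.0, 0.0, -1.0]]
enhanced = process_each_channel(stereo, ATTENUATE)
p enhanced # => [[0.1, -0.2, 0.3], [0.5, 0.0, -0.5]]
```

Because the channels never interact, the output has the same number of channels and samples as the input.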
LICENSE
MIT license. See the LICENSE file.
The GTCRN ONNX model under the vendor/gtcrn directory is distributed under the MIT license by Rong Xiaobin. See the vendor/gtcrn/LICENSE file.