GTCRN

Speech enhancement (denoising) using the GTCRN model.

SYNOPSIS

require "gtcrn"

output = GTCRN.new.enhance_speech("path/to/audio.wav", "path/to/output.wav")

# You may omit the output path
output = GTCRN.new.enhance_speech("path/to/audio.wav")
# => <Pathname:path/to/audio.enhanced.wav>
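The default output name shown above follows a simple pattern: the input's extension is replaced with .enhanced.wav. A minimal sketch of that naming convention, using only Ruby's standard Pathname class, is given below. The helper name default_enhanced_path is hypothetical and is not part of the gem's API; the gem may compute the path differently.

```ruby
require "pathname"

# Hypothetical helper (not part of the gtcrn gem): mirrors the naming
# convention shown in the synopsis by swapping the final extension
# for ".enhanced.wav".
def default_enhanced_path(input)
  Pathname(input).sub_ext(".enhanced.wav")
end

puts default_enhanced_path("path/to/audio.wav")
```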

The audio file must be a monaural (single-channel) WAV file with a 16 kHz sampling rate and 16 bits per sample.
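To check a file against these requirements before enhancing it, the WAV header can be inspected directly. The following is a rough sketch using only the Ruby standard library; WavFormat and read_wav_format are hypothetical names introduced for illustration and are not provided by this gem. It reads the "fmt " chunk of a RIFF/WAVE file and compares the channel count, sampling rate, and bit depth with the values stated above.

```ruby
# Hypothetical helper (not part of the gtcrn gem): holds the fields of a
# WAV "fmt " chunk that matter for the requirements stated above.
WavFormat = Struct.new(:audio_format, :channels, :sample_rate, :bits_per_sample) do
  # True when the file is mono, 16 kHz, 16 bits per sample.
  def mono_16k_16bit?
    channels == 1 && sample_rate == 16_000 && bits_per_sample == 16
  end
end

# Reads the "fmt " chunk of a RIFF/WAVE file and returns a WavFormat.
def read_wav_format(path)
  File.open(path, "rb") do |io|
    riff, _size, wave = io.read(12).to_s.unpack("a4Va4")
    raise ArgumentError, "not a RIFF/WAVE file: #{path}" unless riff == "RIFF" && wave == "WAVE"

    # Walk the chunks until the "fmt " chunk is found.
    while (header = io.read(8)) && header.bytesize == 8
      id, size = header.unpack("a4V")
      if id == "fmt "
        audio_format, channels, sample_rate, _byte_rate, _block_align, bits =
          io.read(16).to_s.unpack("vvVVvv")
        return WavFormat.new(audio_format, channels, sample_rate, bits)
      end
      # Skip this chunk; chunk bodies are padded to an even number of bytes.
      io.seek(size + (size.odd? ? 1 : 0), IO::SEEK_CUR)
    end
    raise ArgumentError, "no fmt chunk found: #{path}"
  end
end
```

Usage might look like this:

```ruby
format = read_wav_format("path/to/audio.wav")
warn "expected mono 16 kHz 16-bit WAV" unless format.mono_16k_16bit?
```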

CLI

This gem ships with a gtcrn command.

% gtcrn path/to/audio.wav path/to/output.wav
Enhanced file written to
path/to/output.wav

You may omit the output path:

% gtcrn path/to/audio.wav
Enhanced file written to
path/to/audio.enhanced.wav

ENHANCE AUDIO DATA

You can also enhance audio data in memory:

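require "torchaudio"
require "gtcrn"

# Load the audio into a tensor, enhance it in memory, then write it back out
waveform, sample_rate = TorchAudio.load("path/to/audio.wav")
enhanced = GTCRN.new.enhance_speech_waveform(waveform)
TorchAudio.save("path/to/output.wav", enhanced.squeeze, sample_rate)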

LICENSE

MIT license. See the LICENSE file.

The GTCRN ONNX model under the vendor/gtcrn directory is distributed under the MIT license by Rong Xiaobin. See the vendor/gtcrn/LICENSE file.