GTCRN
Speech enhancement (denoising) using the GTCRN model.
SYNOPSIS
require "gtcrn"
output = GTCRN.new.enhance_speech("path/to/audio.wav", "path/to/output.wav")
# You may omit the output path
output = GTCRN.new.enhance_speech("path/to/audio.wav")
# => <Pathname:path/to/audio.enhanced.wav>
The audio file must be a monaural (single-channel) WAV file with a 16 kHz sampling rate and 16 bits per sample.
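To check an input file against this requirement before enhancing it, the WAV header can be inspected with plain Ruby. The following is a minimal sketch, not part of the gem: `wav_format` and `gtcrn_compatible?` are hypothetical helper names, and the parser assumes a standard RIFF/WAVE layout with a `fmt ` chunk.

```ruby
# Hypothetical helper (not part of the gtcrn gem): reads the "fmt " chunk of a
# RIFF/WAVE stream and returns its channel count, sample rate and bit depth.
def wav_format(io)
  riff, _size, wave = io.read(12).to_s.unpack("a4Va4")
  raise ArgumentError, "not a RIFF/WAVE file" unless riff == "RIFF" && wave == "WAVE"

  # Walk the chunk list until the "fmt " chunk is found.
  while (header = io.read(8)) && header.bytesize == 8
    id, size = header.unpack("a4V")
    if id == "fmt "
      # Fields: audio format, channels, sample rate, byte rate, block align, bits per sample.
      _format, channels, sample_rate, _byte_rate, _block_align, bits =
        io.read(16).to_s.unpack("vvVVvv")
      return { channels: channels, sample_rate: sample_rate, bits_per_sample: bits }
    end
    # Skip other chunks; chunk bodies are padded to an even number of bytes.
    io.seek(size + (size.odd? ? 1 : 0), IO::SEEK_CUR)
  end
  raise ArgumentError, "fmt chunk not found"
end

# Hypothetical check mirroring the stated requirement: mono, 16 kHz, 16-bit.
def gtcrn_compatible?(io)
  wav_format(io) == { channels: 1, sample_rate: 16_000, bits_per_sample: 16 }
end
```

It could be used as `File.open("path/to/audio.wav", "rb") { |f| gtcrn_compatible?(f) }`. Files that fail the check would need to be converted (downmixed and resampled) with an external tool first.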
CLI
This gem ships with a gtcrn command.
% gtcrn path/to/audio.wav path/to/output.wav
Enhanced file written to
path/to/output.wav
You may omit the output path:
% gtcrn path/to/audio.wav
Enhanced file written to
path/to/audio.enhanced.wav
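Judging from the examples above, the default output name replaces the `.wav` extension with `.enhanced.wav`. If the expected path is needed in advance, for instance to skip files that have already been processed, the same pattern can be reproduced with Ruby's standard `Pathname`. This is a sketch of the naming convention as shown in the examples, not the gem's actual implementation, and `default_enhanced_path` is a hypothetical name.

```ruby
require "pathname"

# Hypothetical helper (not part of the gtcrn gem): mirrors the default output
# naming seen in the examples, e.g. audio.wav becomes audio.enhanced.wav.
def default_enhanced_path(input)
  Pathname(input).sub_ext(".enhanced.wav")
end
```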
ENHANCE AUDIO DATA
You can also enhance audio data in memory:
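require "torchaudio"
require "gtcrn"

# Load the waveform, enhance it in memory, then write the result back to disk
waveform, sample_rate = TorchAudio.load("path/to/audio.wav")
enhanced = GTCRN.new.enhance_speech_waveform(waveform)
TorchAudio.save("path/to/output.wav", enhanced.squeeze, sample_rate)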
LICENSE
MIT license. See the LICENSE file.
The GTCRN ONNX model under the vendor/gtcrn directory is distributed under the MIT license by Rong Xiaobin. See the vendor/gtcrn/LICENSE file.