WaveSplit: A Multi-Stage Framework for Audio Enhancement and Audio Denoising – Combining Deep Learning with Psychoacoustic Principles and Adaptive Noise Processing
DOI: https://doi.org/10.65138/ijresm.v9i2.3413
Abstract
In audio processing, noise interference is a significant challenge that degrades speech intelligibility and communication quality across many domains. Current audio denoising methods often struggle to balance noise removal against speech preservation. This paper presents WaveSplit, a novel multi-stage framework for audio enhancement and denoising that addresses these limitations by combining deep learning techniques with psychoacoustic principles and adaptive noise processing. Building on the CleanUNet architecture, our approach introduces several new components: adaptive SNR-based processing, harmonic enhancement that preserves critical speech components, vocal clarity enhancement, and perceptual processing that exploits characteristics of human hearing. Evaluations show that our framework outperforms baseline models, with significant gains in SNR (76.36 dB compared to 7.20-8.10 dB for the baseline models), PESQ (an improvement of 1.05 versus 0.77-0.91), and STOI (0.15 versus 0.09-0.13), while reducing the "robotic" artifacts common in traditional methods. This research has significant implications for applications including telecommunications, hearing assistive technologies, content production, and speech recognition systems. By addressing both objective quality metrics and perceptual factors, WaveSplit is a step toward more effective, natural-sounding audio enhancement for real-world environments.
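The SNR figures above are given in decibels. For reference, the standard definition of the SNR of an enhanced signal against a clean reference is 10·log10 of the clean-signal energy divided by the residual-error energy, and an "SNR improvement" is usually the enhanced SNR minus the SNR of the unprocessed noisy input. The sketch below assumes exactly that definition and is not the authors' evaluation code. The function names `snr_db` and `snr_improvement_db` are hypothetical, and PESQ and STOI need dedicated implementations that are not shown here.

```python
import math
from typing import Sequence


def snr_db(clean: Sequence[float], estimate: Sequence[float]) -> float:
    """SNR of `estimate` relative to the reference `clean`, in dB.

    Assumes the common definition 10*log10(sum(s^2) / sum((s - s_hat)^2)).
    Returns +inf when the estimate matches the reference exactly.
    """
    if len(clean) != len(estimate):
        raise ValueError("clean and estimate must have the same length")
    signal_energy = sum(s * s for s in clean)
    error_energy = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    if error_energy == 0.0:
        return math.inf
    return 10.0 * math.log10(signal_energy / error_energy)


def snr_improvement_db(
    clean: Sequence[float],
    noisy: Sequence[float],
    enhanced: Sequence[float],
) -> float:
    """Hypothetical helper: SNR gain of `enhanced` over the unprocessed `noisy` input."""
    return snr_db(clean, enhanced) - snr_db(clean, noisy)


if __name__ == "__main__":
    clean = [1.0] * 100
    noisy = [1.2] * 100      # residual error 0.2 per sample
    enhanced = [1.1] * 100   # residual error 0.1 per sample
    print(f"noisy SNR:    {snr_db(clean, noisy):.2f} dB")
    print(f"enhanced SNR: {snr_db(clean, enhanced):.2f} dB")
    print(f"improvement:  {snr_improvement_db(clean, noisy, enhanced):.2f} dB")
```

Because the measure is logarithmic, halving the residual error amplitude (from 0.2 to 0.1 in the toy example) raises SNR by about 6 dB, so differences of tens of decibels correspond to very large reductions in residual error energy.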
License
Copyright (c) 2026 Parth Gandhi, Kevin Doshi, Anand Godbole

This work is licensed under a Creative Commons Attribution 4.0 International License.
