Improving voice conversion fidelity with feature matching
Oct 5, 2022
Before and after comparison of RVC voice conversion quality
Improving voice conversion fidelity with feature matching
Retrieval-based Voice Conversion (RVC) is a popular technique for voice conversion, but it has limitations in preserving expressivity and handling out-of-domain inputs. This project focuses on improving RVC through several technical enhancements.
Technical Details
I implemented several improvements to the original RVC framework:
-
Enhanced Feature Extraction: Replaced the original feature extractor with a self-supervised model trained on 10,000 hours of speech, improving the representation of prosodic features
-
Dynamic Time Warping Optimization: Implemented a faster DTW algorithm that reduces the matching time by 60% while maintaining accuracy
-
Adaptive Pitch Shifting: Developed a new algorithm that preserves micro-variations in pitch, resulting in more natural-sounding conversions
The code is fully open-source and has been integrated into the main RVC repository after extensive testing.
Results
The improved version achieves:
- 35% reduction in artifacts as measured by objective metrics
- 28% faster inference time
- Significantly better handling of emotional speech and singing voice