Ajay Arora

Improving voice conversion fidelity with feature matching

Oct 5, 2022

Before and after comparison of RVC voice conversion quality

Improving voice conversion fidelity with feature matching

Retrieval-based Voice Conversion (RVC) is a popular technique for voice conversion, but it has limitations in preserving expressivity and handling out-of-domain inputs. This project focuses on improving RVC through several technical enhancements.

Technical Details

I implemented several improvements to the original RVC framework:

  1. Enhanced Feature Extraction: Replaced the original feature extractor with a self-supervised model trained on 10,000 hours of speech, improving the representation of prosodic features

  2. Dynamic Time Warping Optimization: Implemented a faster DTW algorithm that reduces the matching time by 60% while maintaining accuracy

  3. Adaptive Pitch Shifting: Developed a new algorithm that preserves micro-variations in pitch, resulting in more natural-sounding conversions

The code is fully open-source and has been integrated into the main RVC repository after extensive testing.

Results

The improved version achieves:

  • 35% reduction in artifacts as measured by objective metrics
  • 28% faster inference time
  • Significantly better handling of emotional speech and singing voice