SongGen: Controllable Song Generation (MEng Thesis)
Dec 10, 2022
Demonstration of SongGen creating a complete song with controllable parameters
SongGen is a system for generating complete songs with controllable parameters, developed as part of my MEng thesis at MIT.
Technical Details
The system generates music hierarchically, in four stages, each building on the output of the one before it:
- Lyric Generation: Fine-tuned GPT-3 model conditioned on topic, mood, and style
- Melody Composition: Transformer-based model that generates melodies aligned with lyrical content
- Arrangement: Neural network that expands the melody into a full arrangement with multiple instruments
- Production: DSP pipeline for mixing and mastering the generated tracks
The key innovation is a control mechanism that lets users specify high-level attributes (genre, mood, tempo, instrumentation) and applies them consistently at every stage of the generation process.
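To make the idea of consistently applied controls concrete, here is a minimal sketch of how one set of attributes could be threaded through the four stages. It is not the thesis implementation: the `Controls` class, its field names, the stage functions, and `generate_song` are all hypothetical, and each stage body is a placeholder string standing in for the real model or DSP step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Controls:
    """High-level attributes shared by every stage (hypothetical field names)."""
    genre: str
    mood: str
    tempo_bpm: int
    instrumentation: Tuple[str, ...]


# A stage receives the shared controls plus everything produced so far.
Stage = Callable[[Controls, Dict], Dict]


def generate_lyrics(controls: Controls, song: Dict) -> Dict:
    # Placeholder for the fine-tuned language model.
    return {**song, "lyrics": f"<{controls.mood} {controls.genre} lyrics>"}


def compose_melody(controls: Controls, song: Dict) -> Dict:
    # Placeholder for the melody model; it can see the lyrics it must align with.
    melody = f"<melody for {song['lyrics']} at {controls.tempo_bpm} BPM>"
    return {**song, "melody": melody}


def arrange(controls: Controls, song: Dict) -> Dict:
    # Placeholder for the arrangement network: one part per requested instrument.
    tracks = {inst: f"<{inst} part>" for inst in controls.instrumentation}
    return {**song, "tracks": tracks}


def produce(controls: Controls, song: Dict) -> Dict:
    # Placeholder for the mixing and mastering pipeline.
    return {**song, "master": f"<mix of {len(song['tracks'])} tracks>"}


PIPELINE: List[Stage] = [generate_lyrics, compose_melody, arrange, produce]


def generate_song(controls: Controls) -> Dict:
    """Run every stage in order, passing the same immutable controls to each."""
    song: Dict = {}
    for stage in PIPELINE:
        song = stage(controls, song)
    return song


if __name__ == "__main__":
    demo = Controls("folk", "wistful", 90, ("guitar", "violin"))
    print(generate_song(demo))
```

Making `Controls` frozen is one simple way to guarantee that no stage can alter the attributes another stage sees, which is the consistency property described above. How the real models condition on these attributes is not covered by this sketch.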
Evaluation
We conducted a user study with 50 participants, including professional musicians and producers. The results showed that:
- 72% of participants could not reliably distinguish SongGen outputs from human compositions in a blind test
- The controllability features were rated 4.2/5 for effectiveness
- Professional producers rated the production quality at 3.8/5