Ajay Arora

SongGen: Controllable Song Generation (MEng Thesis)

Dec 10, 2022

Demonstration of SongGen creating a complete song with controllable parameters

SongGen is a system for generating complete songs with controllable parameters, developed as part of my MEng thesis at MIT.

Technical Details

The system uses a hierarchical approach to music generation:

  1. Lyric Generation: Fine-tuned GPT-3 model conditioned on topic, mood, and style
  2. Melody Composition: Transformer-based model that generates melodies aligned with lyrical content
  3. Arrangement: Neural network that expands the melody into a full arrangement with multiple instruments
  4. Production: Digital signal processing (DSP) pipeline for mixing and mastering the generated tracks

The key innovation is the control mechanism: users specify high-level attributes (genre, mood, tempo, instrumentation), and the system applies them consistently at every stage of generation.
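To make the idea of shared controls concrete, here is a minimal sketch of how the four stages could be chained so that each one receives the same control object. This is not the thesis code: the names Controls, Song, generate_song, and the four stage functions are hypothetical, and each stage is a string-building placeholder standing in for the real model or DSP step.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Controls:
    """High-level attributes from the text; field names are assumptions."""
    genre: str
    mood: str
    tempo_bpm: int
    instrumentation: Tuple[str, ...]


@dataclass(frozen=True)
class Song:
    """Accumulates the output of each stage (placeholder strings here)."""
    lyrics: str = ""
    melody: str = ""
    arrangement: str = ""
    mix: str = ""


Stage = Callable[[Song, Controls], Song]


def tag(c: Controls) -> str:
    # Compact label showing which controls a stage was conditioned on.
    return f"{c.genre}/{c.mood}/{c.tempo_bpm}bpm"


def generate_lyrics(song: Song, c: Controls) -> Song:
    # Placeholder for the fine-tuned language model.
    return replace(song, lyrics=f"lyrics[{tag(c)}]")


def compose_melody(song: Song, c: Controls) -> Song:
    # Placeholder for the melody model; depends on the lyrics.
    return replace(song, melody=f"melody[{tag(c)}] for {song.lyrics}")


def arrange(song: Song, c: Controls) -> Song:
    # Placeholder for the arrangement network; uses the instrumentation.
    parts = "+".join(c.instrumentation)
    return replace(song, arrangement=f"arrangement[{tag(c)}; {parts}] of {song.melody}")


def produce(song: Song, c: Controls) -> Song:
    # Placeholder for the mixing and mastering step.
    return replace(song, mix=f"mix[{tag(c)}] of {song.arrangement}")


PIPELINE: Tuple[Stage, ...] = (generate_lyrics, compose_melody, arrange, produce)


def generate_song(c: Controls) -> Song:
    # Every stage sees the same immutable Controls instance.
    song = Song()
    for stage in PIPELINE:
        song = stage(song, c)
    return song


if __name__ == "__main__":
    controls = Controls("folk", "wistful", 90, ("guitar", "violin"))
    print(generate_song(controls).mix)
```

Making Controls immutable (frozen) is one simple way to guarantee that no stage can alter the settings seen by later stages. How the real models consume these attributes is not shown here.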

Evaluation

We conducted a user study with 50 participants, including professional musicians and producers. The results showed that:

  • 72% of participants could not reliably distinguish SongGen outputs from human compositions in a blind test
  • The controllability features were rated 4.2/5 for effectiveness
  • Professional producers rated the production quality at 3.8/5