Music and AI - Part 4
Last Updated: 01 Dec 2022

Just some closing thoughts on this topic, as well as some random notes and questions I’ve left for my future self.
In the second post on AI and Music, we discussed why it’s difficult to generate music with AI. Along the way, we presented different learning architectures and the strategies through which they were applied to the problem. Here is a quick summary of everything we went through:
List of Architectures
- Feedforward
- Autoencoder
- Variational Autoencoder
- Restricted Boltzmann Machine
- Recurrent Networks
- Convolutional Networks
- Conditioning Convolutional Networks
- Generative Adversarial Networks
- Reinforcement Learning
List of Strategies
- Single step feedforward
- Decoder feedforward
- Sampling based methods
- Iterative feedforward
- Input manipulation
- Reinforcement
- Unit Selection
List of Challenges
- Creatio ex nihilo (creation from nothing)
- Length variability (music of different lengths)
- Content variability (is it deterministic?)
- Expressiveness (dynamics and feeling)
- Melody-harmony consistency
- Control (user ability to affect outcome)
- Style transfer (applicable to images, but can it apply to music?)
- Structure (song formats, AABA, etc.)
- Originality (not just repeating information from training data)
- Incrementality
- Adaptability
- Explainability
Recurrent vs Convolutional Networks
Convolutional networks have not been explored as thoroughly as recurrent networks for generating or recognizing music. Maybe this is because musical representations are complex enough that it’s hard to visualize how a CNN would apply to them.
That being said, convolutional networks do have two advantages:
- Faster to train and easier to parallelize, since all timesteps of a convolution can be computed at once rather than one after another (see the sketch after this list)
- By nature of weight sharing, each convolutional filter is trained against every position of the input, so a single piece effectively yields many training signals, increasing the volume of usable data
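To make the contrast concrete, here is a minimal sketch, assuming PyTorch (these posts don’t commit to any framework), of next-step prediction on a toy piano-roll. The model names, layer sizes, and piano-roll shape are all illustrative assumptions, not taken from any particular paper:

```python
# Toy piano-roll: (batch, time, 128 pitches). Sizes are illustrative only.
import torch
import torch.nn as nn

PITCHES, HIDDEN, TIME = 128, 64, 32

class RNNMusicModel(nn.Module):
    """Recurrent version: timesteps are processed one after another."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=PITCHES, hidden_size=HIDDEN, batch_first=True)
        self.head = nn.Linear(HIDDEN, PITCHES)

    def forward(self, roll):                # roll: (batch, time, pitches)
        hidden, _ = self.lstm(roll)         # inherently sequential over time
        return self.head(hidden)            # next-step pitch logits

class CNNMusicModel(nn.Module):
    """Causal 1-D convolutions: every timestep is computed in parallel."""
    def __init__(self, kernel=3):
        super().__init__()
        self.pad = kernel - 1               # left-pad so filters never see the future
        self.conv = nn.Conv1d(PITCHES, HIDDEN, kernel)
        self.head = nn.Conv1d(HIDDEN, PITCHES, 1)

    def forward(self, roll):                # roll: (batch, pitches, time)
        x = nn.functional.pad(roll, (self.pad, 0))
        return self.head(torch.relu(self.conv(x)))  # all timesteps at once

roll = torch.rand(1, TIME, PITCHES)
print(RNNMusicModel()(roll).shape)                   # torch.Size([1, 32, 128])
print(CNNMusicModel()(roll.transpose(1, 2)).shape)   # torch.Size([1, 128, 32])
```

The LSTM has to consume timesteps sequentially, while the causal convolution (left-padded so no filter sees the future) produces every timestep in one parallel pass over the whole roll, which is where the training-speed advantage comes from.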
Transfer Learning
Transfer learning for music is another area that hasn’t been explored in depth, but it could be enormously useful if we take the (more developed) field of image generation as a sign of things to come.
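As one hedged illustration of what this could look like, here is a minimal sketch assuming PyTorch/torchvision: an ImageNet-pretrained ResNet reused as a frozen feature extractor for mel-spectrograms treated as images. The 10-class genre head and the random input tensors are hypothetical placeholders, not a real dataset or method from these posts:

```python
# Transfer learning sketch: reuse pretrained image features on spectrograms.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 10                                    # e.g. genres; illustrative

# Pretrained weights API from recent torchvision; downloads on first use.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in backbone.parameters():                 # freeze the image features
    param.requires_grad = False
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_CLASSES)  # new trainable head

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One toy training step: 1-channel spectrograms repeated to 3 channels so they
# match what the ImageNet-trained convolutions expect.
spectrograms = torch.rand(8, 1, 224, 224).repeat(1, 3, 1, 1)
labels = torch.randint(0, NUM_CLASSES, (8,))

logits = backbone(spectrograms)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()
```

Freezing the backbone and training only the new head is the cheapest form of transfer; a common next step is to unfreeze the last few blocks once the head has converged.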