‘AI music’ directory

See Also

Gwern

“GPT-2 Preference Learning for Music Generation”, Gwern 2019

“GPT-2 Folk Music”, Gwern & Presser 2019

Links

“Crossing the Uncanny Valley of Conversational Voice”, Iribe et al 2025

“NotaGen: Advancing Musicality in Symbolic Music Generation With Large Language Model Training Paradigms”, Wang et al 2025

“Continuous Autoregressive Models With Noise Augmentation Avoid Error Accumulation”, Pasini et al 2024

https://minimaxir.com/2024/10/speech-prompt-engineering/: “Generating Distinct AI Voice Performances By Prompt Engineering GPT-4o”

“GSoC 2024: Differentiable Logic for Interactive Systems and Generative Music”

“A.L.S. Stole His Voice. AI Retrieved It.”

“SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound”, Liu et al 2024

“Long-Form Music Generation With Latent Diffusion”, Evans et al 2024

“An Accurate and Rapidly Calibrating Speech Neuroprosthesis”, Card et al 2024

“OpenVoice: Versatile Instant Voice Cloning”, Qin et al 2023

“A Disney Director Tried—And Failed—To Use an AI Hans Zimmer to Create a Soundtrack”, Heikkilä 2023

“Whisper-AT: Noise-Robust Automatic Speech Recognizers Are Also Strong General Audio Event Taggers”, Gong et al 2023

“High-Fidelity Audio Compression With Improved RVQGAN”, Kumar et al 2023

“Vocos: Closing the Gap between Time-Domain and Fourier-Based Neural Vocoders for High-Quality Audio Synthesis”, Siuzdak 2023

“Voice Conversion With Just Nearest Neighbors”, Baas et al 2023

“FST: Improving Speech Translation by Fusing Speech and Text”, Yin et al 2023

“SoundStorm: Efficient Parallel Audio Generation”, Borsos et al 2023

“ImageBind: One Embedding Space To Bind Them All”, Girdhar et al 2023

“TANGO: Text-To-Audio Generation Using Instruction-Tuned LLM and Latent Diffusion Model”, Ghosal et al 2023

“CLaMP: Contrastive Language-Music Pre-Training for Cross-Modal Symbolic Music Information Retrieval”, Wu et al 2023

“Speak, Read and Prompt (SPEAR-TTS): High-Fidelity Text-To-Speech With Minimal Supervision”, Kharitonov et al 2023

“Msanii: High Fidelity Music Synthesis on a Shoestring Budget”, Maina 2023

“Archisound: Audio Generation With Diffusion”, Schneider 2023

“Rock Guitar Tablature Generation via Natural Language Processing”, Casco-Rodriguez 2023

“VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”, Wang et al 2023

“High Fidelity Neural Audio Compression”, Défossez et al 2022

“Hierarchical Diffusion Models for Singing Voice Neural Vocoder”, Takahashi et al 2022

“RealSinger: Ultra-Realistic Singing Voice Generation via Stochastic Differential Equations”, Anonymous 2022

“AudioLM: a Language Modeling Approach to Audio Generation”, Borsos et al 2022

“MeloForm: Generating Melody With Musical Form Based on Expert Systems and Neural Networks”, Lu et al 2022

“AI Composer Bias: Listeners like Music Less When They Think It Was Composed by an AI”, Shank et al 2022

“Musika! Fast Infinite Waveform Music Generation”, Pasini & Schlüter 2022

“Multitrack Music Transformer: Learning Long-Term Dependencies in Music With Diverse Instruments”, Dong et al 2022

“BigVGAN: A Universal Neural Vocoder With Large-Scale Training”, Lee et al 2022

“CLAP: Learning Audio Concepts From Natural Language Supervision”, Elizalde et al 2022

“Tradformer: A Transformer Model of Traditional Music Transcriptions”, Casini & Sturm 2022

“SymphonyNet: Symphony Generation With Permutation Invariant Language Model”, Liu et al 2022

“It’s Raw! Audio Generation With State-Space Models”, Goel et al 2022

“General-Purpose, Long-Context Autoregressive Modeling With Perceiver AR”, Hawthorne et al 2022

“FIGARO: Generating Symbolic Music With Fine-Grained Artistic Control”, Rütte et al 2022

“Steerable Discovery of Neural Audio Effects”, Steinmetz & Reiss 2021

“Semi-Supervised Music Tagging Transformer”, Won et al 2021

“AudioCLIP: Extending CLIP to Image, Text and Audio”, Guzhov et al 2021

“MLP Singer: Towards Rapid Parallel Korean Singing Voice Synthesis”, Tae et al 2021

“PriorGrad: Improving Conditional Denoising Diffusion Models With Data-Dependent Adaptive Prior”, Lee et al 2021

“DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism”, Liu et al 2021

“Symbolic Music Generation With Diffusion Models”, Mittal et al 2021

“Interacting With GPT-2 to Generate Controlled and Believable Musical Sequences in ABC Notation”, Geerlings & Meroño-Peñuela 2020

“AI Song Contest: Human-AI Co-Creation in Songwriting”, Huang et al 2020

“HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis”, Kong et al 2020

“GiantMIDI-Piano: A Large-Scale MIDI Dataset for Classical Piano Music”, Kong et al 2020

“DeepSinger: Singing Voice Synthesis With Data Mined From the Web”, Ren et al 2020

“Learning Music Helps You Read: Using Transfer to Study Linguistic Structure in Language Models”, Papadimitriou & Jurafsky 2020

“2020-04-12-gwern-gpt-2-117m-midi-30588051.tar.xz”

“Pony Voice Event—What People Forced Ponies to Say!”, Daily 2020

“15.ai”, Fifteen-kun & Pony Preservation Project 2020

“Pop Music Transformer: Beat-Based Modeling and Generation of Expressive Pop Piano Compositions”, Huang & Yang 2020

“Writing the Next American Hit: Using GPT-2 to Explore the Possibility of Creating Successful AI-Generated Song Lyrics”, Barrio 2020

“Encoding Musical Style With Transformer Autoencoders”, Choi et al 2019

“Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks With Multi-Resolution Spectrogram”, Yamamoto et al 2019

“Low-Dimensional Embodied Semantics for Music and Language”, Raposo et al 2019

“MuseNet: a Deep Neural Network That Can Generate 4-Minute Musical Compositions With 10 Different Instruments, and Can Combine Styles from Country to Mozart to the Beatles”, Payne 2019

“Generative Modeling With Sparse Transformers: We’ve Developed the Sparse Transformer, a Deep Neural Network Which Sets New Records at Predicting What Comes next in a Sequence—Whether Text, Images, or Sound. It Uses an Algorithmic Improvement of the attention Mechanism to Extract Patterns from Sequences 30× Longer Than Possible Previously”, Child & Gray 2019

“Music Transformer: Generating Music With Long-Term Structure”, Huang et al 2018

“FloWaveNet: A Generative Flow for Raw Audio”, Kim et al 2018

“Piano Genie”, Donahue et al 2018

“Music Transformer”, Huang et al 2018

“This Time With Feeling: Learning Expressive Musical Performance”, Oore et al 2018

“The Challenge of Realistic Music Generation: Modeling Raw Audio at Scale”, Dieleman et al 2018

“Samples”

“The Sound of Pixels”, Zhao et al 2018

“Efficient Neural Audio Synthesis”, Kalchbrenner et al 2018

“Generating Structured Music through Self-Attention”, Huang et al 2018

“Towards Deep Modeling of Music Semantics Using EEG Regularizers”, Raposo et al 2017

“Objective-Reinforced Generative Adversarial Networks (ORGAN) for Sequence Generation Models”, Guimaraes et al 2017

“Neural Audio Synthesis of Musical Notes With WaveNet Autoencoders”, Engel et al 2017

“Tuning Recurrent Neural Networks With Reinforcement Learning”, Jaques et al 2017

“SampleRNN: An Unconditional End-To-End Neural Audio Generation Model”, Mehri et al 2016

“WaveNet: A Generative Model for Raw Audio”, Oord et al 2016

“The Abc Music Standard 2.1: §3.1.1: `X:`—Reference Number”, Walshaw 2011

“Staring Emmy Straight in the Eye—And Doing My Best Not to Flinch”, Hofstadter & Cope 2001

“Connectionist Music Composition Based on Melodic, Stylistic, and Psychophysical Constraints [Technical Report CU-CS-495-90]”, Mozer 1990

“DarwinTunes”

“ai-forever/music-composer”

https://magenta.tensorflow.org/perceiver-ar: “Autoregressive Long-Context Music Generation With Perceiver AR”

/doc/www/pitchfork.com/a16a34cce623b5f7d3484472a21f74b98077239d.html: “Will AI Take the Pleasure Out of Music?”

/doc/www/qualiacomputing.com/db09ee889af2ac8f2ee381e58913359052c46a3d.html: “Qualia Research Institute: The Musical Album of 2024 (V1)”

“Stream uncoolbob aka DarwinTunes”

“Introducing V4”, AI 2025

“Curious about You”, translucentaudiosynthesis319 2025

“Sydney Misbehaving”, Whiton 2025

“Waifu Synthesis: Real Time Generative Anime”

/doc/www/www.danieldjohnson.com/0108f05c124cfb8b547e784dba32a6a5be813f44.html: “Composing Music With Recurrent Neural Networks”

“Old Musicians Never Die. They Just Become Holograms.”

/doc/www/www.systutorials.com/5f146d901fb193d0015c0cc380488e464a9ef2d6.html: “`midi2abc`: Program to Convert MIDI Format Files to Abc Notation”

“‘It’s the Screams of the Damned!’ The Eerie AI World of Deepfake Music”

“Inside the Discord Where Thousands of Rogue Producers Are Making AI Music”

Sort By Magic

Annotations are sorted by machine learning into inferred 'tags'. This provides an alternative way to browse: instead of date order, one can browse in topic order. The 'sorted' list has been automatically clustered into multiple sections and auto-labeled for easier browsing.

Beginning with the newest annotation, the sort uses the embedding of each annotation to build a list of nearest-neighbor annotations, creating a progression of topics. For more details, see the link.
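
The nearest-neighbor ordering described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the site's actual implementation: the function names, the toy 2-D vectors, and the choice of cosine similarity are all hypothetical (the page does not say which similarity measure is used), and the clustering and auto-labeling steps are left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def sort_by_similarity(embeddings, start=0):
    """Greedy nearest-neighbor ordering (hypothetical helper).

    Starting from the annotation at index `start` (assumed to be the
    newest), repeatedly append the not-yet-visited annotation whose
    embedding is most similar to the one just appended.
    """
    order = [start]
    remaining = set(range(len(embeddings))) - {start}
    while remaining:
        current = embeddings[order[-1]]
        # Iterate in index order so that ties resolve to the lowest index.
        nearest = max(
            sorted(remaining),
            key=lambda i: cosine_similarity(current, embeddings[i]),
        )
        order.append(nearest)
        remaining.remove(nearest)
    return order


# Toy 2-D "embeddings" (made-up values); index 0 plays the newest annotation.
toy = [
    [1.0, 0.0],  # 0
    [0.0, 1.0],  # 1
    [0.9, 0.1],  # 2
    [0.1, 0.9],  # 3
]
print(sort_by_similarity(toy))  # [0, 2, 3, 1]
```

Each step only looks at the most recently placed item, which is what produces a gradual "progression of topics" rather than a ranking by distance from the starting annotation.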

`speech-synthesis prosthesis generative-modeling sequence-prediction neural-networks 15ai`

`denoising-models`

`ai-composition`

`music-generation`

Wikipedia (5)

Miscellaneous

Bibliography

https://arxiv.org/abs/2305.09636#google: “SoundStorm: Efficient Parallel Audio Generation”, Zalán Borsos, Matt Sharifi, Damien Vincent, Eugene Kharitonov, Neil Zeghidour, Marco Tagliasacchi

https://arxiv.org/abs/2305.05665#facebook: “ImageBind: One Embedding Space To Bind Them All”, Rohit Girdhar, Alaaeldin El-Nouby, Zhuang Liu, Mannat Singh, Kalyan Vasudev Alwala, Armand Joulin, Ishan Misra

https://arxiv.org/abs/2304.13731: “TANGO: Text-To-Audio Generation Using Instruction-Tuned LLM and Latent Diffusion Model”, Deepanway Ghosal, Navonil Majumder, Ambuj Mehrish, Soujanya Poria

https://raw.githubusercontent.com/flavioschneider/master-thesis/main/audio_diffusion_thesis.pdf: “Archisound: Audio Generation With Diffusion”, Flavio Schneider

https://arxiv.org/abs/2301.02111#microsoft: “VALL-E: Neural Codec Language Models Are Zero-Shot Text to Speech Synthesizers”, Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei

https://arxiv.org/abs/2210.13438#facebook: “High Fidelity Neural Audio Compression”, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

https://arxiv.org/abs/2210.07508#sony: “Hierarchical Diffusion Models for Singing Voice Neural Vocoder”, Naoya Takahashi, Mayank Kumar Singh, Yuki Mitsufuji

2022-shank.pdf: “AI Composer Bias: Listeners like Music Less When They Think It Was Composed by an AI”, Daniel B. Shank, Courtney Stefanik, Cassidy Stuhlsatz, Kaelyn Kacirek, Amy M. Belfi

https://arxiv.org/abs/2206.04658#nvidia: “BigVGAN: A Universal Neural Vocoder With Large-Scale Training”, Sang-gil Lee, Wei Ping, Boris Ginsburg, Bryan Catanzaro, Sungroh Yoon

https://arxiv.org/abs/2202.09729: “It’s Raw! Audio Generation With State-Space Models”, Karan Goel, Albert Gu, Chris Donahue, Christopher Ré

https://arxiv.org/abs/2202.07765#deepmind: “General-Purpose, Long-Context Autoregressive Modeling With Perceiver AR”, Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, Charlie Nash, Mateusz Malinowski, Sander Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Simon, Hannah Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, João Carreira, Jesse Engel

https://arxiv.org/abs/2106.13043: “AudioCLIP: Extending CLIP to Image, Text and Audio”, Andrey Guzhov, Federico Raue, Jörn Hees, Andreas Dengel

https://fifteen.ai/: “15.ai”, Fifteen-kun, Pony Preservation Project

https://arxiv.org/abs/1910.11480#naver: “Parallel WaveGAN: A Fast Waveform Generation Model Based on Generative Adversarial Networks With Multi-Resolution Spectrogram”, Ryuichi Yamamoto, Eunwoo Song, Jae-Min Kim

https://openai.com/research/musenet: “MuseNet: a Deep Neural Network That Can Generate 4-Minute Musical Compositions With 10 Different Instruments, and Can Combine Styles from Country to Mozart to the Beatles”, Christine Payne

https://magenta.tensorflow.org/music-transformer: “Music Transformer: Generating Music With Long-Term Structure”, Cheng-Zhi Anna Huang, Ian Simon, Monica Dinculescu

https://arxiv.org/abs/1811.02155: “FloWaveNet: A Generative Flow for Raw Audio”, Sungwon Kim, Sang-gil Lee, Jongyoon Song, Jaehyeon Kim, Sungroh Yoon

2018-huang.pdf: “Generating Structured Music through Self-Attention”, Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, Andrew Dai, Matt Hoffman, Curtis Hawthorne, Douglas Eck
