Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

Abstract

Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling structure at many different timescales. Fortunately, most music is also highly structured and can be represented as discrete note events played on musical instruments. Herein, we show that by using notes as an intermediate representation, we can train a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude (~0.1 ms to ~100 s), a process we call Wave2Midi2Wave. This large advance in the state of the art is enabled by our release of the new MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) dataset, composed of over 172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. The networks and the dataset together present a promising approach toward creating new expressive and interpretable neural models of music.
