AutoOsu: Audio-Aware Action Generation for Rhythm Games

AutoOsu is a CRNN-based model that generates beatmaps for osu! mania. Its architecture is similar to that of automatic music transcription models, and it was trained on 1,126 public beatmaps collected from the game's official website.
Inference requires an audio file, its tempo and offset values, and a target star rating (difficulty). The inference code outputs an .osz file, which can be loaded directly into the game. Currently, AutoOsu supports only songs in 4/4 time with a constant tempo.
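To illustrate why tempo and offset are needed and why a constant tempo simplifies things, here is a minimal sketch of how a timing grid can be derived from those two values. This is not AutoOsu's inference code; the function names `beat_grid` and `measure_starts`, the millisecond units, and the choice of four subdivisions per beat are assumptions made for the example.

```python
def beat_grid(bpm, offset_ms, duration_ms, subdivisions=4):
    """Return grid timestamps (in ms) for a constant-tempo track.

    bpm          -- tempo in beats per minute (assumed constant)
    offset_ms    -- time of the first beat, in milliseconds
    duration_ms  -- length of the audio, in milliseconds
    subdivisions -- grid points per beat (hypothetical default: 4)
    """
    if bpm <= 0 or subdivisions <= 0:
        raise ValueError("bpm and subdivisions must be positive")
    step_ms = 60000.0 / bpm / subdivisions  # spacing between grid points
    times = []
    i = 0
    while offset_ms + i * step_ms <= duration_ms:
        times.append(round(offset_ms + i * step_ms))
        i += 1
    return times


def measure_starts(bpm, offset_ms, duration_ms, beats_per_measure=4):
    """Return the start time (ms) of each measure, assuming 4/4 by default."""
    beats = beat_grid(bpm, offset_ms, duration_ms, subdivisions=1)
    return beats[::beats_per_measure]


if __name__ == "__main__":
    # 120 BPM means one beat every 500 ms; the first beat is at 500 ms.
    print(beat_grid(120, 500, 2500, subdivisions=1))
    print(measure_starts(120, 0, 4000))
```

With a single tempo and offset, every candidate note position is a simple arithmetic function of the grid index. A song with tempo changes would need one such grid per timing section, which is one way to read the constant-tempo restriction.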
We observe inconsistencies in the note density of generated charts; the density seems to be most affected by the first few seconds of the input audio track. We plan to address this issue with a different training method. We also plan to experiment with machine learning-based beat tracking methods to fully automate inference. Some gameplay examples of generated charts are presented below.
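One simple way to quantify the note density mentioned above is to count notes per fixed-length window of a chart. The sketch below is only a diagnostic illustration, not part of AutoOsu; the function name `notes_per_window`, the 5-second window, and the assumption that note times are non-negative milliseconds are all hypothetical choices.

```python
from collections import Counter


def notes_per_window(note_times_ms, window_ms=5000):
    """Return notes per second for each consecutive window of a chart.

    note_times_ms -- iterable of note onset times in ms (assumed >= 0)
    window_ms     -- window length in ms (hypothetical default: 5 s)
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    # Count how many notes fall into each window index.
    counts = Counter(int(t // window_ms) for t in note_times_ms)
    if not counts:
        return []
    n_windows = max(counts) + 1
    # Convert raw counts to notes per second; empty windows yield 0.0.
    return [counts.get(i, 0) * 1000.0 / window_ms for i in range(n_windows)]


if __name__ == "__main__":
    # Three notes in the first 5 s, one note in the next 5 s.
    print(notes_per_window([0, 1000, 2000, 6000]))
```

Comparing these per-window values across a chart, or across charts generated from the same audio with different starting points, would make density drift visible as a large spread between windows.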
Inference Examples
Ke$ha - Die Young
Red Velvet - On a Ride
Yukopi - Kyoufuu All Back (Covered by Nayuta)
NCT 127 - Superhuman
YOASOBI - Shukufuku
Knife Party - Bonfire
Krewella - Come & Get It