Audio loading

We recently had to tackle a particularly knotty problem with how Unity was loading in Mini Metro’s audio. For what seemed like a relatively small problem it turned out to take a good chunk of time to (partially!) solve, so I thought it would make a good illustration of why Mini Metro’s taken so long to finish!

The problem

The audio system in Mini Metro is mixing up to 30 samples at once. For the mixing to occur efficiently, all of the samples need to be available uncompressed in memory. Disasterpeace is using roughly 2,200 short samples for the audio; at 16-bit 44.1kHz, it weighs in at 200MiB. So not too bad!

Normally Unity resamples your audio to whatever sample rate the desktop system is playing at, so if your system is set to 48kHz, Unity will play a 44.1kHz sample fine. However, we’re bypassing most of Unity’s audio system, so we need to provide the audio samples in all of the common sample rates. In addition to 200MiB of 44.1kHz samples, we also need to distribute 218MiB of 48kHz samples and 110MiB of 24kHz samples. That’s getting to be a massive download size for a small game like Mini Metro.
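As a rough check on those figures: uncompressed PCM size scales linearly with sample rate when the bit depth, channel count and duration stay the same. The sketch below starts from the 200MiB figure for the 44.1kHz set. The function and constant names are made up for illustration and are not from the game's code.

```python
BASE_RATE_HZ = 44_100   # sample rate of the original set
BASE_SIZE_MIB = 200     # approximate size quoted above

def scaled_size_mib(target_rate_hz):
    """Estimate raw PCM size at another rate (same bit depth, channels, duration)."""
    return BASE_SIZE_MIB * target_rate_hz / BASE_RATE_HZ

sizes = {rate: scaled_size_mib(rate) for rate in (44_100, 48_000, 24_000)}
for rate, size in sizes.items():
    print(f"{rate / 1000:g}kHz: about {size:.0f}MiB")
print(f"total: about {sum(sizes.values()):.0f}MiB")
```

The estimates land within a couple of MiB of the numbers above (the 200MiB starting point is itself rounded), and the three sets together come to over 500MiB before compression.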

Unity’s solution

No problem! By default Unity compresses raw audio data using Ogg Vorbis, and gives us the option of decompressing all samples on load. This was getting our builds down to well under 300MiB, including all three sets of audio data, font textures, and of course the actual game itself. However, it was taking forever to load: decompressing everything back into 200MiB of raw audio data is slow.

Enter asset bundles! Unity has a nice way of packaging assets together into asset bundles. Bundles can be loaded all at once, either synchronously or asynchronously (i.e., you can load them in the background while the game is still running). We packaged up the audio data, kicked off a synchronous load and … 10 seconds to load. From an SSD. Async? 15 seconds. Hmm. Not ideal.

We got thinking. Is Ogg Vorbis decompression really *that* slow? We did more investigation into where exactly all that time was going. Loading the asset bundle from disk was almost instant. Decompression took about 60% of the load time; the other 40% went on actually creating the Unity assets post-decompression. That rankled. There was no need for that delay, as our audio mixer uses raw sample data. Unity was no doubt doing no end of resampling and data conversion that we didn’t need.

Our solution

So the obvious thing to do was implement our own audio bundling. I wrote a script to take all of the raw audio samples, compress them all to Ogg Vorbis, and write the compressed data into one huge stream. I started out processing the WAV data myself, and using libogg and libvorbis to compress the data, but that turned out to be a rabbit hole. We had a couple of 24-bit WAVs, one stereo WAV, and even a bext extension chunk thrown in for good measure, so eventually I gave up and just used sox. The odd thing? Even at maximum quality, our audio banks are roughly half the size of Unity’s asset bundles.
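To give a feel for what "one huge stream" can look like, here is a minimal sketch of packing compressed blobs into a single bank and reading them back. The layout (a four-byte tag, an entry count, then length-prefixed names and blobs) is invented for this example and is not the actual Mini Metro bank format. The sox invocation assumes a sox build with Vorbis support, using `-c 1` for mono output and `-C 10` for the highest Vorbis quality.

```python
import io
import struct

MAGIC = b"ABNK"  # hypothetical tag identifying the bank, not the real format

def sox_command(wav_path, ogg_path, quality=10):
    """Build (but do not run) a sox call that encodes a WAV to mono Ogg Vorbis."""
    return ["sox", wav_path, "-c", "1", "-C", str(quality), ogg_path]

def write_bank(stream, entries):
    """Write (name, compressed_bytes) pairs into one length-prefixed stream."""
    stream.write(MAGIC)
    stream.write(struct.pack("<I", len(entries)))
    for name, blob in entries:
        encoded = name.encode("utf-8")
        stream.write(struct.pack("<H", len(encoded)))
        stream.write(encoded)
        stream.write(struct.pack("<I", len(blob)))
        stream.write(blob)

def read_bank(stream):
    """Read the pairs back in the order they were written."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an audio bank")
    (count,) = struct.unpack("<I", stream.read(4))
    entries = []
    for _ in range(count):
        (name_len,) = struct.unpack("<H", stream.read(2))
        name = stream.read(name_len).decode("utf-8")
        (blob_len,) = struct.unpack("<I", stream.read(4))
        entries.append((name, stream.read(blob_len)))
    return entries

# Round trip with placeholder bytes standing in for real Ogg data.
bank = io.BytesIO()
write_bank(bank, [("click", b"\x01\x02\x03"), ("chime", b"\x04\x05")])
bank.seek(0)
loaded = read_bank(bank)
```

Keeping everything in one sequential stream means the loader can read the file front to back without seeking, which suits a load-everything-at-startup approach.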

Then we had to write a custom importer. This had to be a native plugin written in C, as it involved calls to libogg and libvorbis. We already have a bunch of functions in a native plugin so that wasn’t a huge deal; the main problem was getting both libraries compiling on Windows, OS X, and Linux, as well as iOS.

Finally, after ironing out all the various bugs, we got the audio loading in roughly 5 seconds. Noticeably faster than with Unity’s asset bundles, but still not ideal. I wanted to improve it further by only synchronously loading the samples the game needs immediately, and loading the others asynchronously, but that’s error-prone work and I’d already spent an entire week on this. We deemed the reduction in load time good enough for release. We’ll finish off the load time improvements for the mobile release (and port it back to desktop, don’t worry!).
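For the curious, the split-loading idea looks roughly like this. It is only a sketch of the approach, not something we shipped: `decode` is a stand-in for the real Vorbis decoding in the native plugin, and the function and parameter names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def decode(blob):
    """Stand-in for real Vorbis decoding; here it just copies the bytes."""
    return bytes(blob)

def load_samples(bank, needed_now, executor):
    """Decode the samples needed immediately, queue the rest in the background.

    bank: dict mapping sample name -> compressed bytes
    needed_now: names that must be ready before the game starts
    Returns (ready, pending): decoded samples, and futures for the others.
    """
    ready = {name: decode(bank[name]) for name in needed_now}
    pending = {
        name: executor.submit(decode, blob)
        for name, blob in bank.items()
        if name not in ready
    }
    return ready, pending

bank = {"click": b"\x01", "chime": b"\x02", "train": b"\x03"}
with ThreadPoolExecutor(max_workers=2) as pool:
    ready, pending = load_samples(bank, {"click"}, pool)
    # The game could start here with `ready`; the others arrive later.
    late = {name: future.result() for name, future in pending.items()}
```

The error-prone part is everything this sketch leaves out: the mixer has to cope with a sample being requested before its background decode has finished.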

An unexpected win was also getting the decompressed audio data loaded in the original 16-bit format. Unity was unnecessarily giving us 32-bit floating-point samples, and therefore taking twice the memory for no gain. This isn’t a huge problem for the desktop platforms of course (they can spare another 200MiB of RAM), but will prove critical in getting Mini Metro running on older mobile devices.
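The doubling is easy to see with a toy buffer. This sketch converts a few 16-bit samples to 32-bit floats and compares the storage. It uses Python's `array` module purely for illustration (typecodes `h` and `f` are 2 and 4 bytes on common platforms) and has nothing to do with how Unity stores samples internally.

```python
from array import array

def pcm16_to_float32(samples):
    """Convert signed 16-bit samples to 32-bit floats in [-1.0, 1.0)."""
    return array("f", (s / 32768.0 for s in samples))

def size_in_bytes(samples):
    """Storage used by the sample data itself."""
    return len(samples) * samples.itemsize

pcm = array("h", [0, 16384, -32768, 32767])
as_float = pcm16_to_float32(pcm)
print(size_in_bytes(pcm), size_in_bytes(as_float))
```

The float buffer holds the same audio in twice the bytes, so 200MiB of 16-bit samples becomes roughly 400MiB once converted.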