Procedural Sound Design in Video Games


This blog update is about Procedural Sound Design: defining what it is, how it functions, what its use cases and limitations are, and why it is used in video games.
What is Procedural Sound Design in terms of video games? “Procedural Sound Design is about sound design as a system, an algorithm, or a procedure that re-arranges, combines or manipulates sound assets” (Stevens and Raybould, 2016). It is the concept of audio becoming a dynamic and interactive experience for the player, typically achieved through parameterisation within a system such as a game engine or Max/MSP. The main benefits are efficiency, detailed control, variation and flexibility in the sound design.

Whilst doing the initial research for this blog post, it became apparent that the term procedural sound design is not comprehensively definable. Misconceptions about procedural audio, procedural synthesis and procedural music are all prevalent. It could be argued it is a matter of opinion whether to define the exact differences or not to concern yourself with them, as long as the resulting audio works as intended, however you wish to define the process. “Andy Farnell, who coined or at least popularised the term ‘Procedural Audio’ sees it as any kind of system where the sound produced is the result of a process…” “…So under that definition, as soon as you set up any kind of system of playback you could see it as being procedural audio.” (Asbjoern Andersen, 2016)

Why use a Procedural approach to sound?
These sound systems are essentially a list of instructions that recall audio files or create synthesis in specific ways, which makes them an adaptable framework that can be repurposed for mechanically similar systems.

See Fig 1 for an example of a car system created by Andy Farnell.

Fig 1 (at 9:30 in the video below):

 https://www.youtube.com/watch?v=-Ucv7EXwnCM&list=PLLHtPBwbWUW5EG-4ajfz5BQBOIm31ClhC&index=5

Andy goes on to say that this system can be reused for another type of car by building on the existing framework.
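To make the reuse idea concrete, here is a minimal C++ sketch of one parameterised engine framework driving two different vehicles. The parameter names and mappings are my own invention for illustration, not Farnell's actual Pure Data patch:

    #include <cstdio>

    // Hypothetical parameter set for a procedural engine sound. The names
    // and maths are illustrative only, not Farnell's actual patch.
    struct EngineParams {
        const char* name;
        float idleRpm;   // engine speed at idle
        float maxRpm;    // engine speed at full throttle
        float basePitch; // playback pitch at idle (1.0 = unmodified)
    };

    // One shared "procedure": map a continuous throttle input (0..1) to
    // playback parameters. The framework stays the same for every vehicle;
    // only the data describing each vehicle changes.
    void updateEngine(const EngineParams& p, float throttle) {
        float rpm   = p.idleRpm + throttle * (p.maxRpm - p.idleRpm);
        float pitch = p.basePitch * (rpm / p.idleRpm); // crude linear mapping
        float gain  = 0.5f + 0.5f * throttle;          // louder at high throttle
        std::printf("%-10s rpm=%4.0f pitch=%.2f gain=%.2f\n",
                    p.name, rpm, pitch, gain);
    }

    int main() {
        // Two different vehicles reusing the identical framework.
        EngineParams sportsCar {"sports car", 900.0f, 8000.0f, 1.0f};
        EngineParams truck     {"truck",      600.0f, 3500.0f, 0.6f};
        const float throttles[] = {0.0f, 0.5f, 1.0f};
        for (float t : throttles) {
            updateEngine(sportsCar, t);
            updateEngine(truck, t);
        }
    }

The point is that swapping the sports car for the truck needs only new data, not a new system.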

“…What we’re doing now is moving from a state-based interaction model to a continuous interaction model…” “…In the tests that I’ve done with players, it’s the interactivity of the object, the way it responds to your input, which defines realism; realism is not a sonic quality, realism is a behavioural quality.” (Matthew Cheshire, 2013, part 4 at 11:05)
Flexibility, in terms of future-proofing your sounds as they relate to other in-game assets that may be changed by another member of a development team (such as a character animation), is a possible consideration. My point is that it is easier to tweak a system to get a slight timing adjustment on a sound than it is to go back to a DAW and rework the asset. “Procedural Audio is a philosophy about sound being a process and not data. In its broadest sense, if I were to say it is the philosophy of sound design in a dynamic space, and the method is as irrelevant to Procedural Audio as whether you use oils or water colours is to painting.” (Andy Farnell, 2012) Farnell states that whichever method gets the sound designer the desired result is the correct one, and that procedural audio is just another tool available alongside traditional methods of sound design.

Limitations have always been a factor when creating sound for video games. When computer games were first being created, the specific sound chip of the platform being developed for was the limitation. At the start of the CD era of gaming, with the PlayStation™, it was having sounds load from the disc responsively enough. Today the main limitations are platform-specific hardware restrictions: how much RAM and processing power the audio system needs to run.

[Image: a footstep Sound Cue graph]
Above is an example of a footstep Sound Cue for Unreal Engine 4.

Inherently, having lots of different variations of sounds adds to the audio department's memory budget, and this is where a lot of the benefits of a procedural system can be found. In Unreal Engine 4, Sound Cues give you control over single-shot assets, alleviating the need for larger looping sections of audio, for example. In this footstep example, the heel and toe sections of the footsteps have been separated and are recombined randomly by the system; this technique, along with adding modulators for pitch and volume, creates a far larger variation of sounds than would be possible with conventional means.
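The Sound Cue itself is authored visually in UE4, but as a rough stand-alone illustration of what the Random and Modulator nodes are doing, here is a sketch with hypothetical asset names:

    #include <cstdio>
    #include <random>
    #include <vector>

    int main() {
        // Hypothetical asset names; in UE4 this selection is authored
        // visually inside the Sound Cue rather than written in code.
        std::vector<const char*> heels {"heel_01.wav", "heel_02.wav", "heel_03.wav"};
        std::vector<const char*> toes  {"toe_01.wav",  "toe_02.wav",  "toe_03.wav"};

        std::mt19937 rng(std::random_device{}());
        std::uniform_int_distribution<int> pickHeel(0, (int)heels.size() - 1);
        std::uniform_int_distribution<int> pickToe(0, (int)toes.size() - 1);
        std::uniform_real_distribution<float> pitch(0.95f, 1.05f); // subtle pitch modulation
        std::uniform_real_distribution<float> volume(0.8f, 1.0f);  // subtle volume modulation

        // 3 heel samples x 3 toe samples already gives 9 combinations, and
        // the continuous pitch/volume ranges make exact repeats unlikely.
        for (int step = 0; step < 5; ++step) {
            std::printf("step %d: %s + %s, pitch %.2f, vol %.2f\n",
                        step, heels[pickHeel(rng)], toes[pickToe(rng)],
                        pitch(rng), volume(rng));
        }
    }

Six small one-shot files stand in for what would otherwise need many long pre-rendered variations.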
“In general, the fact that with procedural audio not all voices are created equal – and that some patches will use more CPU cycles than others – is usually not a very good selling point against the predictability of a fixed voice architecture.” (Nicolas Fournel, 2012)


Bibliography

Stevens, R. and Raybould, D. (2016) Game Audio Implementation. 1st edn. Boca Raton, FL: CRC Press/Taylor & Francis.

Nair, V. (2012) Procedural Audio: Interview with Andy Farnell. designingsound.org. [Online] Available at: http://designingsound.org/2012/01/procedural-audio-interview-with-andy-farnell/ [Accessed 13 February 2017].

Nair, V. (2012) Procedural Audio: An Interview with Nicolas Fournel. designingsound.org. [Online] Available at: http://designingsound.org/2012/06/procedural-audio-an-interview-with-nicolas-fournel/ [Accessed 13 February 2017].

Andersen, A. (2016) Why Procedural Game Sound Design Is So Useful – Demonstrated in the Unreal Engine. ASoundEffect.com. [Online] Available at: https://www.asoundeffect.com/procedural-game-sound-design/ [Accessed 13 February 2017].

Cheshire, M. (2013) Andy Farnell Designing Sound Procedural/Computational Audio Lecture, Parts 1–5. [Online video] 7 March 2013. Playlist: https://www.youtube.com/playlist?list=PLLHtPBwbWUW5EG-4ajfz5BQBOIm31ClhC ; part 4: https://www.youtube.com/watch?v=kwK7OSkg4Gs&list=PLLHtPBwbWUW5EG-4ajfz5BQBOIm31ClhC&index=4 ; part 5 (at 9:30): https://www.youtube.com/watch?v=-Ucv7EXwnCM&list=PLLHtPBwbWUW5EG-4ajfz5BQBOIm31ClhC&index=5 [Accessed 13 February 2017].


Overwatch: Play by Sound

Overwatch is my personal favourite game right now, so I just had to cover it in one aspect or another. A fantastic talk from the Game Developers Conference 2016, “Overwatch – The Elusive Goal: Play by Sound” by Scott Lawlor and Thomas Neumann, is what I will mostly use to explain the technical side of the implementation in this article.

(along with my 400+ hours of play time in the game)

Scott Lawlor and Thomas Neumann talk about how, at the early stages of design, they were given the goal of “being able to play with sound alone” by Game Director Jeff Kaplan, and how the sound design should give as much relevant information to the player as possible.

They organise the main goals for the audio team into five categories, or “pillars”, of importance:

Pillars:

      • A Clear Mix

      • Pinpoint Accuracy

      • Gameplay Information

      • Informative Hero Voice Over

      • Pavlovian Response

Each of these topics deserves its own in-depth discussion, but I would like to focus mainly on the mixing system and the gameplay information, as these two link to each other quite well.

Overwatch is a First Person Shooter based around two teams of six players.

All of the game modes are fairly similar from a sound point of view, so the mode shouldn't change anything in the overall discussion. The most intense moments of action can mean 12 or more sounds happening all at once in a very small area of combat. Without appropriate mixing this would sound overwhelming and cause the “A Clear Mix” pillar to fail at exactly these moments, which are very common whilst playing the game.

The mixing process for most games is done either in-engine or through a piece of middleware such as Audiokinetic's Wwise™. Custom software can also be developed to fit a specific need in any area of a game, and the Overwatch audio team developed a few good examples worth exploring further.

[Image: the “Clear Mix” importance slide from the GDC talk]

The first is a simple program that maps a specific piece of audio to an assigned “importance” value ranging from 0 to 120, with 120 being the most important. Pictured above is an example of what may be considered when assigning these values in real time. This data is further organised by the software into four different “buckets” of importance depending on the number value: High, Medium, Low and Cull. The debug view shown in the talk displays which of these four buckets each sound is currently in, the character and the importance number, and then how many sounds are placed in each bucket.


The reason the developers decided to group the audio into these buckets is that, without a clear winner for what is most important, the mix would be muddied by too many important things happening at one time. They limit this by allowing only one High-priority sound at any one time, with the rest following in the other buckets.
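As a hedged sketch of how such a bucketing pass might work: the 0–120 scale and the four buckets come from the talk, but the thresholds and per-bucket limits below are my own guesses:

    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <string>
    #include <vector>

    enum class Bucket { High, Medium, Low, Cull };

    struct ActiveSound {
        std::string name;
        int importance;               // 0..120, 120 = most important
        Bucket bucket = Bucket::Cull;
    };

    void assignBuckets(std::vector<ActiveSound>& sounds) {
        // Rank every currently playing sound, most important first.
        std::sort(sounds.begin(), sounds.end(),
                  [](const ActiveSound& a, const ActiveSound& b) {
                      return a.importance > b.importance;
                  });
        // Only one sound may sit in the High bucket at any one time, so
        // there is always a single clear winner in the mix.
        for (std::size_t i = 0; i < sounds.size(); ++i) {
            if (i == 0)      sounds[i].bucket = Bucket::High;
            else if (i < 4)  sounds[i].bucket = Bucket::Medium; // guessed limit
            else if (i < 10) sounds[i].bucket = Bucket::Low;    // guessed limit
            else             sounds[i].bucket = Bucket::Cull;
        }
    }

    int main() {
        std::vector<ActiveSound> sounds {
            {"enemy ultimate", 120}, {"enemy footsteps", 80},
            {"ally gunfire", 40},    {"distant explosion", 25},
        };
        assignBuckets(sounds);
        const char* names[] = {"High", "Medium", "Low", "Cull"};
        for (const ActiveSound& s : sounds)
            std::printf("%-18s importance=%3d -> %s\n", s.name.c_str(),
                        s.importance, names[(int)s.bucket]);
    }

Re-running a pass like this every frame would let the single High slot change hands instantly as the fight changes.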

This bucket value is the data that is actually sent to Wwise, which then adjusts volume and filtering parameters accordingly.

A Real-Time Parameter Control (RTPC) is used in Wwise to change the make-up gain on a sound depending on which of the four buckets it has been placed in.

Each category of sound in the game has its own specific values for how much or how little gain will be applied to the sound in game.
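Here is a small sketch of that mapping with invented gain values; in the real game the bucket would be passed to Wwise as an RTPC (for example via AK::SoundEngine::SetRTPCValue) and the per-category curve authored in the Wwise project would apply the gain:

    #include <cstdio>

    enum class Bucket { High, Medium, Low, Cull };

    // Map each bucket onto a make-up gain in dB. The actual per-category
    // curves live inside the Wwise project; these numbers are invented.
    float bucketToGainDb(Bucket b) {
        switch (b) {
            case Bucket::High:   return  3.0f;  // pushed forward in the mix
            case Bucket::Medium: return  0.0f;
            case Bucket::Low:    return -6.0f;
            case Bucket::Cull:   return -96.0f; // effectively silent
        }
        return 0.0f;
    }

    int main() {
        const Bucket all[]  = {Bucket::High, Bucket::Medium, Bucket::Low, Bucket::Cull};
        const char* names[] = {"High", "Medium", "Low", "Cull"};
        for (int i = 0; i < 4; ++i)
            std::printf("%-6s bucket -> make-up gain %6.1f dB\n",
                        names[i], bucketToGainDb(all[i]));
    }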

An Ultimate ability, for example, will have the highest importance when activated. This is true for both friendly and enemy Ultimate ability usage.

A clever way of differentiating whether the ability currently being used is friendly or not is to have two different voice lines: one team hears one distinct line and the other team hears the other. An example:

When the character Lúcio uses his Ult (Ultimate ability):
allies hear “Oh, let’s break it down!”

the enemy will hear “Drop the beat!”

Allied lines will still be quieter than enemy lines, for the reasons above, but both are very high in importance.
There are 23 different heroes (playable characters) in the game, and the same character can be used on both teams at the same time, so this simple difference helps the player understand what is going on in the fight with audio cues alone.

Of these heroes, six speak a language other than English, from all across the globe, and Bastion the Omnic (the robots of the Overwatch universe) talks in a series of beeps and boops (think WALL-E). These heroes use the same idea as above, saying their voice line in, for example, French from the enemy perspective and in English for allied players.
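A minimal sketch of this team-relative line selection, assuming a hypothetical data layout where each Ultimate carries an ally line and an enemy line:

    #include <cstdio>
    #include <string>

    // Hypothetical data layout: each Ultimate carries two authored lines,
    // and which one a player hears depends only on their team relative to
    // the caster.
    struct UltVoiceLines {
        std::string allyLine;  // heard by the caster's team
        std::string enemyLine; // heard by the opposing team
    };

    const std::string& pickLine(const UltVoiceLines& lines,
                                int casterTeam, int listenerTeam) {
        return (casterTeam == listenerTeam) ? lines.allyLine : lines.enemyLine;
    }

    int main() {
        UltVoiceLines lucioUlt {"Oh, let's break it down!", "Drop the beat!"};
        std::printf("ally hears:  %s\n", pickLine(lucioUlt, 0, 0).c_str());
        std::printf("enemy hears: %s\n", pickLine(lucioUlt, 0, 1).c_str());
    }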

“Pavlovian conditioning: a method to cause a reflex response or behaviour by training with repetitive action.”

This example supports the “Pavlovian Response” pillar: these voice lines are consistent every time you hear them, so they are quickly learned by the player. This is very important, as Ult usage is key to winning games in Overwatch, hence the need to communicate it clearly every time.

Repetitive sounds in modern games are usually thought of as a bad idea, as they may become annoying or be seen as lazy development. The developers felt it was more important that players could instantly know what something is from a fraction of a second of listening than to avoid repetition and risk confusing players.
[Image: the Overwatch hero roster]

But repetition isn't the only way to teach players the sounds in the game.
Each hero is very different in all aspects, be it nationality, age, race, size, weapon type or the materials worn on their feet. This diversity in character attributes helps the player discern, from audio alone, who is walking up behind them ready to strike. This is very important information for winning a fight, as you may be dying or gaining the upper hand depending on audio alone. Footstep audio matters in a lot of online First Person Shooters, but here you know which specific character will be walking round that corner and whether it is even a winnable fight.

This gameplay mechanic can be counter-played by crouch-walking, which mutes your character's movement sounds at the expense of moving a lot slower than walking speed. Again, this is a common mechanic in other titles, but worth noting.
Knowing who is approaching is crucial, but what about where they are in the map?

Instead of using the standard occlusion settings in Wwise, the team repurposed a ray-tracing technique from another part of the game: the AI pathfinding for the hero Pharah (who flies around the maps). This means the rays cast for the audio can travel around corners from the source to the listener, and the path distance is measured to determine volume, instead of tracing through walls in a straight line. The audio is filtered with a high-pass filter when behind walls and large obstacles, as needed.
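As a toy illustration of attenuating by travelled path distance rather than straight-line distance, here is a sketch with invented numbers (the real system reuses Overwatch's pathfinding rays and Wwise's filtering):

    #include <cmath>
    #include <cstdio>

    // Toy version of path-based occlusion: volume falls off with the
    // distance the sound actually travels around the geometry, and a
    // filter is engaged when no direct line of sight exists. All numbers
    // are invented.
    struct OcclusionResult {
        float gain;     // 0..1 linear attenuation
        bool  filtered; // filter the sound when the direct path is blocked
    };

    OcclusionResult computeOcclusion(float pathDistanceM, bool directLineBlocked) {
        const float maxAudibleM = 60.0f; // hypothetical audible range in metres
        float gain = 1.0f - std::fmin(pathDistanceM / maxAudibleM, 1.0f);
        return {gain, directLineBlocked};
    }

    int main() {
        // A source 10 m away in a straight line, but whose sound travels
        // 18 m around a corner: quieter, and filtered because occluded.
        OcclusionResult direct   = computeOcclusion(10.0f, false);
        OcclusionResult occluded = computeOcclusion(18.0f, true);
        std::printf("direct:   gain=%.2f filtered=%d\n", direct.gain, (int)direct.filtered);
        std::printf("occluded: gain=%.2f filtered=%d\n", occluded.gain, (int)occluded.filtered);
    }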

The team also developed a custom quad delay to give realistic tails to the sounds heard in game. Its four delay channels are assigned to the four surround speakers, if the user has that setup. Four rays are cast every frame at 45-degree angles, and each in-game metre of measured distance corresponds to one millisecond of delay time on that specific channel of audio. This technique is useful for understanding the space you are in whilst playing, and it helps the realistic and immersive feel of the game world too.
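A small sketch of that metre-to-millisecond mapping, using made-up ray distances:

    #include <cstdio>

    // One ray per surround channel: each in-game metre of measured
    // distance becomes one millisecond of delay on that channel, per the
    // talk. The distances themselves are made up here.
    int main() {
        const char* channels[4] = {"front-left", "front-right",
                                   "rear-left",  "rear-right"};
        // Ray hit distances (metres) this frame, e.g. a space that is
        // deeper behind the listener than in front.
        float rayDistanceM[4] = {4.0f, 4.5f, 12.0f, 11.0f};

        for (int i = 0; i < 4; ++i) {
            float delayMs = rayDistanceM[i] * 1.0f; // 1 m -> 1 ms
            std::printf("%-11s ray=%5.1f m -> delay %5.1f ms\n",
                        channels[i], rayDistanceM[i], delayMs);
        }
    }

Longer rays (a bigger space) mean longer delays, which is what makes the size of the room audible.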

[Image: the quad delay diagram from the GDC talk]

Overwatch also supports Dolby Atmos speaker setups, which is 9.1.2.
This support enables the player to hear sounds directly above and below them, in addition to a 9.1 speaker array. At the time of writing this technology is relatively new and largely unexplored in gaming, but I'm sure it makes for a very immersive experience. Dolby Atmos for stereo headphones is also supported, and for listening with standard equipment it is the best option, giving better differentiation in the stereo field.

To summarise

Points I haven't focused on are the music, the more standard user interface sounds and the hero dialogue. These help communicate a lot of additional information about objectives, Ultimate ability readiness and other helpful features. The dialogue system also automatically and randomly chooses context-sensitive hero-to-hero story/lore dialogue. This brings a sense of fun, world-building and immersion in the universe that is great to hear at the start of rounds (when you are basically just waiting a minute for the action to start); it fills these gaps in the action nicely. That's all for now, thank you for reading.

Calum Grant