Spatial Audio for VR with a Ricoh Theta S Camera and Zoom H2n Audio Recorder

I had a try adding spatial audio to a VR video. In theory this should add realism to a 360 VR video by adding audio that can be processed to play back differently depending on the direction of the viewer.

I updated the Zoom H2n to firmware version 2.00 as described here, and set it to record to uncompressed WAV at 48KHz, 16-bit.

I attached the audio recorder to my Ricoh Theta S camera. I orientated the camera so that the record button was facing toward me, and the Zoom H2n’s LCD display was facing away from me. I pressed record on the sound recorder and then the video camera. I then needed a sound and visual indicator to be able to synchronize the two together in post production, and clicking my fingers worked perfectly.

I installed the I created a new project in Adobe Premiere, and a new sequence with Audio Master set to Multichannel, and 4 adaptive channels. Next I imported the audio and video tracks, and cut them to synchronize to when I clicked my fingers together.

Exporting was slightly more involved. I exported two files, one for video and one for audio.

For the video export, I used the following settings:

  • Format: H264
  • Width: 2048 Height: 1024
  • Frame Rate: 30
  • Field Order: Progressive
  • Aspect; Square Pixels (1.0)
  • Profile: Main
  • Bitrate: CBR 40Mbps
  • Audio track disabled

For the audio export, I used the following settings:

  • Format: Waveform Audio
  • Audio codec: Uncompressed
  • Sample rate: 48000 Hz
  • Channels: 4 channel
  • Sample Size: 16 bit

I then used FFmpeg to combine the two files with the following command:

ffmpeg -i ambisonic_video.mp4 -i ambisonic_audio.wav -channel_layout 4.0 -c:v copy -c:a copy

And finally injected 360 metadata using the 360 Video Metadata app, making sure to tick both ‘My video is spherical (360)’ and ‘My video has spatial audio (ambiX ACN/SN3D format).

And finally uploaded it to YouTube. It took an extra five hours of waiting for the spatial audio track to be processed by YouTube. Both the web player and native Android and iOS apps appear to support spatial audio.

If you have your sound recorder orientated incorrectly, you can correct it using the plugins. In my case, I used the Z-axis rotation to effectively turn the recorder around.

There are a lot of fascinating optimizations and explanations of ambisonic and spatial audio processing available to read at Wikipedia:

The original in-camera audio (Ricoh Theta S records in mono) to compare can be viewed here: