Recording Audio Data From Nest Camera
Technical details on the implementation of audio recording in FoggyCam - a Nest camera recorder.
By Den in Projects
February 15, 2021
Right before the end of the year, I wrote about my updated approach to recording Nest video streams without having to worry about the Nest Aware subscription - reading the video stream directly with the help of FoggyCam, a .NET-based application I wrote.
In this blog post, I have an exciting update for those who rely on the tool as an experimental way to keep their Nest recordings local - it can now record audio as well. I’ll walk through the technical implementation and share some code snippets showing how the content is processed on the machine where the recording is done.
First things first: as I mentioned in my earlier blog post, instead of relying on a hacky way to get the video stream by stitching together the output of the “snapshot” API, I switched to using the Nest stream directly, which, in turn, is channeled to both the apps and the website using protocol buffers.
There are several packet types that Google sends to the client application, captured in PacketType.cs:
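Abridged, the enum looks roughly like this - a sketch based on the packet types the Nexus streaming protocol is known to define, so treat the exact values as illustrative:

```csharp
// Sketch of PacketType.cs - packet types used by the Nexus streaming
// protocol that Nest cameras speak. Values mirror the protocol as
// commonly documented and may differ slightly from the actual file.
public enum PacketType
{
    PING = 1,
    HELLO = 100,
    PING_CAMERA = 101,
    AUDIO_PAYLOAD = 102,
    START_PLAYBACK = 103,
    STOP_PLAYBACK = 104,
    CLOCK_SYNC_ECHO = 105,
    LATENCY_MEASURE = 106,
    TALKBACK_LATENCY = 107,
    METADATA = 108,
    PLAYBACK_BEGIN = 200,
    PLAYBACK_END = 201,
    PLAYBACK_PACKET = 202,
    LONG_PLAYBACK_PACKET = 203,
    CLOCK_SYNC = 204,
    REDIRECT = 205,
    TALKBACK_BEGIN = 206,
    TALKBACK_END = 207,
    ERROR = 208
}
```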
So far, I’ve been dealing with LONG_PLAYBACK_PACKET and PLAYBACK_PACKET, which is great, but that only captures the video part of the stream. If you have a Nest camera in your household, you probably already know that it also captures the audio from its surroundings, so wouldn’t it be neat if I had a way to grab that as well?
The trick was in properly identifying existing packets. Audio data is still carried in standard playback packets, but on different channels, which are determined by the channel ID:
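In rough terms, the dispatch looks like this (PlaybackPacket, Payload, and the buffer names are my shorthand for the relevant protobuf fields and lists, not necessarily the exact names in the code):

```csharp
// Channel IDs resolved from the starting playback packet (see below).
private int videoChannelId;
private int audioChannelId;

// Global buffers that accumulate raw codec payloads until the
// content is flushed to a file.
private List<byte[]> videoBuffer = new List<byte[]>();
private List<byte[]> audioBuffer = new List<byte[]>();

private void HandlePlaybackPacket(PlaybackPacket packet)
{
    if (packet.ChannelId == this.videoChannelId)
    {
        // H264 payload - belongs to the video stream.
        this.videoBuffer.Add(packet.Payload);
    }
    else if (packet.ChannelId == this.audioChannelId)
    {
        // AAC payload - belongs to the audio stream.
        this.audioBuffer.Add(packet.Payload);
    }
}
```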
What’s the process of getting the channel IDs? Parsing the starting playback packet:
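Sketched out, with PlaybackBegin, Channels, and CodecType standing in for the actual protobuf message and field names:

```csharp
private void HandlePlaybackBegin(PlaybackBegin packet)
{
    // The starting playback packet enumerates every channel in the
    // stream, along with the codec each channel carries.
    foreach (var channel in packet.Channels)
    {
        if (channel.CodecType == CodecType.H264)
        {
            this.videoChannelId = channel.ChannelId;
        }
        else if (channel.CodecType == CodecType.Aac)
        {
            this.audioChannelId = channel.ChannelId;
        }
    }
}
```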
Awesome - depending on the codec (video in H264, audio in AAC), I am now able to identify the exact channels and parse playback packets accordingly.
I just needed to actually get the data, store it in a local buffer, and then merge it with the video stream. To start, I need to make sure that the camera has the audio stream enabled; for that, I can check the camera properties when playback is started, inside the StartPlayback function:
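Along these lines - the audio.enabled property key and the CameraInfo type are illustrations here; the point is that the camera metadata exposes a flag indicating whether the microphone is on:

```csharp
private void StartPlayback(CameraInfo camera)
{
    // Only bother capturing an audio channel if the camera
    // actually has its microphone enabled.
    this.audioEnabled =
        camera.Properties.TryGetValue("audio.enabled", out var enabled)
        && (bool)enabled;

    // ...build and send the START_PLAYBACK packet as before...
}
```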
When packets are received, they are written to a generic list of byte arrays that serves as the “dumping ground” until the content is written to a file.
To actually write the content to a file, I first use a ProcessBuffers call, which copies the content of the existing global buffers into a local instance before writing them to a file:
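Something like this, assuming a lock guards the global buffers so packets can keep arriving while the file is written:

```csharp
private void ProcessBuffers()
{
    List<byte[]> localVideo;
    List<byte[]> localAudio;

    // Snapshot the global buffers and reset them, so incoming
    // packets land in fresh lists while we write to disk.
    lock (this.bufferLock)
    {
        localVideo = new List<byte[]>(this.videoBuffer);
        localAudio = new List<byte[]>(this.audioBuffer);
        this.videoBuffer.Clear();
        this.audioBuffer.Clear();
    }

    this.DumpToFile(localVideo, localAudio);
}
```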
DumpToFile is then called to process the binary content - it uses the ffmpeg process to first create the video file, and then “mux” the audio stream into the content:
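A simplified sketch of the idea, assuming the raw payloads can simply be concatenated into files that ffmpeg can read (an H264 elementary stream and an AAC stream):

```csharp
// Requires: System.Collections.Generic, System.Diagnostics,
// System.IO, System.Linq.
private void DumpToFile(List<byte[]> video, List<byte[]> audio)
{
    // Concatenate the raw payloads into intermediate files.
    File.WriteAllBytes("video.h264", video.SelectMany(b => b).ToArray());
    File.WriteAllBytes("audio.aac", audio.SelectMany(b => b).ToArray());

    // Step 1: wrap the raw H264 elementary stream into an MP4 container.
    RunFfmpeg("-y -f h264 -i video.h264 -c:v copy video.mp4");

    // Step 2: mux the AAC audio into the MP4, copying both streams
    // without re-encoding.
    RunFfmpeg("-y -i video.mp4 -i audio.aac -c copy recording.mp4");
}

private void RunFfmpeg(string arguments)
{
    var process = Process.Start(new ProcessStartInfo
    {
        FileName = "ffmpeg",
        Arguments = arguments,
        UseShellExecute = false
    });
    process.WaitForExit();
}
```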
This method assumes that all packets were received in order, for both video and audio - that might not always hold, but as a quick and easy recording approach it works pretty well. That’s all it took to add audio support for stream recording in FoggyCam - the data was there, it just needed to be captured.
I also haven’t found a reliable way just yet to write two streams at once - you can check out my question on Stack Overflow on this topic. I am very much open to suggestions on optimizing my current implementation and removing the need for intermediate files.
Feedback
Have any thoughts? Let me know over email by sending a note to hi followed by the domain of this website.