Detect silence in a wave?

Dec 30, 2009 at 6:51 PM


I'm trying to use NAudio to capture streaming voice dialog via Skype, stuff it in a WaveStream, and process it. So far, it seems to be working really nicely, but I'd like to do some processing before the conversation ends, like after the caller stops speaking for approximately 1-2 seconds, but I'm not sure how to detect silence with a WaveStream. Anyone have any advice to offer?

Dec 30, 2009 at 10:57 PM

One idea: take a look at the MeteringStream in the demo app. It periodically raises an event with the maximum volume over a period of time. If the volume exceeds a threshold, you can assume speech; if not, you can assume silence. It only works on floating point samples though, so you will need to convert the stream beforehand.
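A minimal sketch of that threshold idea, assuming each notification delivers one peak value in the 0.0 to 1.0 range and covers a known stretch of audio. `SilenceDetector`, its `Update` method, and the example threshold and hold time are hypothetical names and values for illustration; they are not part of NAudio.

```csharp
using System;

// Hypothetical helper (not an NAudio class): tracks how long the peak
// volume has stayed below a threshold.
public class SilenceDetector
{
    private readonly float threshold;          // peaks below this count as silence
    private readonly TimeSpan requiredSilence; // how long the silence must last
    private TimeSpan silentFor = TimeSpan.Zero;

    public SilenceDetector(float threshold, TimeSpan requiredSilence)
    {
        this.threshold = threshold;
        this.requiredSilence = requiredSilence;
    }

    // Call once per volume notification. 'peak' is the maximum absolute
    // sample value seen in the period; 'period' is how much audio it covers.
    // Returns true once silence has lasted at least requiredSilence.
    public bool Update(float peak, TimeSpan period)
    {
        if (peak >= threshold)
            silentFor = TimeSpan.Zero; // speech heard, restart the count
        else
            silentFor += period;       // another quiet period
        return silentFor >= requiredSilence;
    }
}
```

With notifications every 100 ms, something like `new SilenceDetector(0.02f, TimeSpan.FromSeconds(1.5))` would report silence after fifteen consecutive quiet notifications. The threshold would need tuning against the real noise floor of the call.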

Dec 31, 2009 at 6:29 AM

I tried implementing a solution like that, but I never got it to work properly. The volumeMeter control I placed on my form remained blank. I replaced it with a text field that printed the float values returned via the volume notifications, but it just printed out "NaN". Turns out the values being determined via the method used in the MeteringStream are garbage (like 9.18383E-41). I think this solution might not work so well with my streaming implementation.

Dec 31, 2009 at 12:00 PM

Sounds like you may not be converting the samples to floating point correctly. Can you post your code?


Dec 31, 2009 at 7:12 PM

My code's on my home computer and I'm at work at the moment, but I had a duh moment this morning. After saving the recorded wave and feeding it back into the MeteringStream, I realized that Skype outputs 16-bit audio and the MeteringStream is expecting 32-bit. Any advice on how to extract volume information from a 16-bit audio stream? I don't suppose it would be as simple as pulling out two bytes from the buffer into a 4-byte array and padding it with leading or trailing zeros before converting it to a float?

Jan 1, 2010 at 7:29 PM

I tinkered around with this a bit, and padding a 4 byte array with zeros didn't help at all. The best way I've found so far to get fairly decent values out of a 16 bit audio stream was to use the BitConverter.ToInt16 method, cast it as a float and divide by 1000 to get the values to come out similar to the 32 bit stream. This is how my ProcessData method looks. I'd be happy to hear if anyone has a cleaner solution.



        private void ProcessData(byte[] buffer, int offset, int count)
        {
            int index = 0;
            while (index < count)
            {
                for (int channel = 0; channel < maxSamples.Length; channel++)
                {
                    // Cast to float before Abs: Math.Abs(short.MinValue) throws OverflowException.
                    float sampleValue = Math.Abs((float)BitConverter.ToInt16(buffer, offset + index)) / 1000.0f;
                    maxSamples[channel] = (float.IsNaN(sampleValue)) ? maxSamples[channel] : Math.Max(maxSamples[channel], sampleValue);
                    index += 2; // two bytes per 16-bit sample
                }
                if (sampleCount >= SamplesPerNotification)
                {
                    sampleCount = 0;
                    Array.Clear(maxSamples, 0, maxSamples.Length);
                }
            }
        }



Jan 4, 2010 at 11:21 AM

Use the WaveChannel32 stream to convert from 16-bit to 32-bit floating point (1.0 represents full volume in an IEEE floating point stream).
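For illustration only, here is a hand-written sketch of the arithmetic behind a 16-bit to float conversion, assuming little-endian signed 16-bit PCM. `Pcm16`, `ToFloat`, and `Peak` are hypothetical names, and this is not WaveChannel32's own code. Dividing by 32768 maps the full 16-bit range onto roughly -1.0 to 1.0, so full volume comes out as 1.0. Padding the two bytes with zeros does not help because the padded bit pattern is reinterpreted as a float rather than converted numerically.

```csharp
using System;

// Hypothetical helper (not an NAudio class) for 16-bit PCM buffers.
public static class Pcm16
{
    // Converts one little-endian signed 16-bit sample to a float in [-1.0, 1.0).
    public static float ToFloat(byte[] buffer, int index)
    {
        short sample = unchecked((short)(buffer[index] | (buffer[index + 1] << 8)));
        return sample / 32768f;
    }

    // Peak absolute level over 'count' bytes starting at 'offset'.
    // A trailing odd byte, if any, is ignored.
    public static float Peak(byte[] buffer, int offset, int count)
    {
        float peak = 0f;
        for (int i = offset; i + 1 < offset + count; i += 2)
            peak = Math.Max(peak, Math.Abs(ToFloat(buffer, i)));
        return peak;
    }
}
```

A peak computed this way can be compared directly against a threshold expressed as a fraction of full volume.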