Talking recognition

Jun 28, 2014 at 3:19 PM
I'm using MediaFoundationReader to read from a video file.
My question is: how can I find the approximate times when someone is talking in a movie? It doesn't have to be 100% accurate; a reasonably close estimate is enough.

Jun 29, 2014 at 11:14 PM
Hi, I'm afraid that is going to be a really difficult problem to solve. You can detect silence versus sound, but it is very hard to tell the difference between speech and other types of noise. Some kind of FFT analysis might help you profile the frequency content of speech versus other sounds.
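
To make the two ideas above concrete, here is a rough sketch in Python rather than NAudio code. It assumes you have already decoded the audio to mono float samples in the range -1 to 1. It first gates each frame on loudness (silence versus sound), then checks what fraction of the frame's spectral energy falls in the 300 to 3400 Hz telephone voice band. The function names, the frame length, both thresholds, and the choice of band are all assumptions made for illustration, and a plain DFT stands in for a real FFT to keep it dependency-free. Music or other sounds in the same band will pass the test too, so treat the result as a crude hint, not speech recognition.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of float samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def band_energy_ratio(frame, sample_rate, low_hz=300.0, high_hz=3400.0):
    """Fraction of spectral energy inside [low_hz, high_hz].

    Uses a naive O(n^2) DFT for clarity; a real FFT would be much faster.
    The default band is an assumed rough voice band, not a tuned value.
    """
    n = len(frame)
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2):  # skip the DC and Nyquist bins
        re = sum(frame[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(frame[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        power = re * re + im * im
        total += power
        if low_hz <= k * sample_rate / n <= high_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0


def speech_like_segments(samples, sample_rate, frame_len=256,
                         rms_threshold=0.02, ratio_threshold=0.6):
    """Return (start_sec, end_sec) spans of frames that are both loud enough
    and dominated by energy in the assumed voice band.

    Both thresholds are illustrative guesses and need tuning on real audio.
    """
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        active = (frame_rms(frame) >= rms_threshold
                  and band_energy_ratio(frame, sample_rate) >= ratio_threshold)
        t = i * frame_len / sample_rate
        if active and start is None:
            start = t                      # a candidate segment begins
        elif not active and start is not None:
            segments.append((start, t))    # the segment ended at this frame
            start = None
    if start is not None:                  # audio ended mid-segment
        segments.append((start, n_frames * frame_len / sample_rate))
    return segments
```

Calling speech_like_segments(samples, sample_rate) gives a list of start and end times in seconds. On real film audio you would probably also want to merge segments separated by very short gaps and drop very short ones, since single frames flicker on and off.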