How to use FFT?

Jan 24, 2011 at 3:57 AM


I'm new to NAudio and so far it looks great! Can any of you help me with how to use NAudio's FFT functionality? I'm having difficulty understanding it from the SampleAggregator and AudioGraph .cs files. One reason is that the NAudioWpfDemo project doesn't run for me (it reports an error in the MainWindow.xaml file), so I was wondering if you could explain how to use it.

Basically, what I want to do is play back a .wav file and perform an FFT on it while it is playing.


Jan 24, 2011 at 10:55 AM

Have a look at my recent article on Coding4Fun, which goes into a bit more depth on the use of FFT:



Jan 24, 2011 at 12:09 PM


Great! This is a very nicely written article! I'm sure this will help me very much.

Thanks very much!

Jan 25, 2011 at 5:39 AM


I tried to make my own pitch detector app using information from the article and I am having a problem. Whenever I try to detect pitch, it always returns 0 as the frequency. Can you take a look and see if I am doing something wrong?

This is the method that runs when you start the pitch detector:

private void btnPitch_Click(object sender, EventArgs e)
{
    String filename = "";
    OpenFileDialog openFileDialog = new OpenFileDialog();
    openFileDialog.Filter = "All Supported Files (*.wav, *.mp3)|*.wav;*.mp3|All Files (*.*)|*.*";
    openFileDialog.FilterIndex = 1;
    if (openFileDialog.ShowDialog() == DialogResult.OK)
    {
        filename = openFileDialog.FileName;
    }

    using (WaveFileReader reader = new WaveFileReader(filename))
    {
        IWaveProvider stream32 = new Wave16ToFloatProvider(reader);
        IWaveProvider streamEffect = new FFTDetector(stream32);

        byte[] buffer = new byte[8192];
        int bytesRead;
        do
        {
            bytesRead = streamEffect.Read(buffer, 0, buffer.Length);
        } while (bytesRead != 0);
    }
}

This is the FFTDetector class:

public FFTDetector(IWaveProvider source)
{
    this.source = source;
    this.sampleRate = source.WaveFormat.SampleRate;
}

public int Read(byte[] buffer, int offset, int count)
{
    if (waveBuffer == null || waveBuffer.MaxSize < count)
    {
        waveBuffer = new WaveBuffer(count);
    }

    int bytesRead = source.Read(waveBuffer, 0, count);

    if (bytesRead > 0) bytesRead = count;

    int frames = bytesRead / sizeof(float);
    float pitch = DetectPitch(waveBuffer.FloatBuffer, frames);

    Console.WriteLine("Freq: " + pitch);
    return frames * 4;
}

Finally, here is the code for DetectPitch method (which I got from the article):

public float DetectPitch(float[] buffer, int inFrames)
{
    Func<int, int, float> window = HammingWindow;
    if (prevBuffer == null)
    {
        prevBuffer = new float[inFrames];
    }

    // double frames since we are combining present and previous buffers
    int frames = inFrames * 2;
    if (fftBuffer == null)
    {
        fftBuffer = new float[frames * 2]; // times 2 because it is complex input
    }

    for (int n = 0; n < frames; n++)
    {
        if (n < inFrames)
        {
            fftBuffer[n * 2] = prevBuffer[n] * window(n, frames);
            fftBuffer[n * 2 + 1] = 0; // need to clear out as fft modifies buffer
        }
        else
        {
            fftBuffer[n * 2] = buffer[n - inFrames] * window(n, frames);
            fftBuffer[n * 2 + 1] = 0; // need to clear out as fft modifies buffer
        }
    }

    float binSize = sampleRate / frames;
    int minBin = (int)(85 / binSize);
    int maxBin = (int)(300 / binSize);

    float maxIntensity = 0f;
    int maxBinIndex = 0;

    for (int bin = minBin; bin <= maxBin; bin++)
    {
        float real = fftBuffer[bin * 2];
        float imaginary = fftBuffer[bin * 2 + 1];
        float intensity = real * real + imaginary * imaginary;
        if (intensity > maxIntensity)
        {
            maxIntensity = intensity;
            maxBinIndex = bin;
        }
    }

    return binSize * maxBinIndex;
}

private float HammingWindow(int n, int N)
{
    return 0.54f - 0.46f * (float)Math.Cos((2 * Math.PI * n) / (N - 1));
}
Thank you!

Feb 7, 2011 at 8:46 AM


I managed to make it work. However, the pitch detection is inaccurate. For example, the note should be C4 but it outputs A#3 (generally, a few notes below the expected note).

To check that it was not completely broken (i.e. just returning random notes), I fed it a progression from E2 up to E6. In the lower frequency range (around 80 Hz to 150 Hz), the results are inaccurate and follow no pattern whatsoever. However, from a certain note (C3) onwards, the output progresses properly through C#3, D3, D#3 and so on. On closer inspection, though, each detected note is still not the note that was actually played.

What might be the problem? I am using a block size of 8192, a minBin of 80/binSize and a maxBin of 1300/binSize. My audio is 44.1 kHz, 16-bit, mono.

Hope you can help me! Thanks!

Feb 7, 2011 at 1:23 PM

Hi bloodfire, I would recommend asking this question on a DSP forum, as they will be better able to explain the maths to you.
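
A minimal sketch of the arithmetic behind the question above, not taken from any post in this thread. It assumes an FFT length of 8192 samples at 44.1 kHz (the thread does not say whether 8192 is counted in bytes or in samples), and the class and method names are hypothetical. It shows two things that may be worth checking: the width of one FFT bin compared with the spacing between adjacent semitones, and what happens to an expression like sampleRate / frames if both operands are declared as int (the declaration of sampleRate is not shown in the thread, so this is only a possibility).

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class FftResolution
{
    // Width of one FFT bin in Hz. The cast matters: int / int truncates in C#.
    public static float BinWidth(int sampleRate, int fftLength)
    {
        return (float)sampleRate / fftLength;
    }

    // The same expression with two int operands: the integer division
    // happens first, and only the truncated result is converted to float.
    public static float TruncatedBinWidth(int sampleRate, int fftLength)
    {
        return sampleRate / fftLength;
    }

    // Distance in Hz from a note to the next semitone up (equal temperament).
    public static double SemitoneGap(double noteHz)
    {
        return noteHz * (Math.Pow(2.0, 1.0 / 12.0) - 1.0);
    }
}
```

Under these assumptions, BinWidth(44100, 8192) is about 5.38 Hz, while SemitoneGap(82.41) (E2) is about 4.9 Hz and SemitoneGap(130.81) (C3) is about 7.8 Hz, so picking the strongest bin cannot reliably separate neighbouring notes at the bottom of the range. TruncatedBinWidth(44100, 8192) returns 5, which would scale every reported frequency down by roughly 7%.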


Mar 4, 2011 at 3:42 PM
Edited Mar 4, 2011 at 3:43 PM

I've finally got the FFT figured out, but I used the publicly available FFTPACK at  The library is written in Fortran, but you can build a DLL from it that can be used from whatever language you want, and it runs extremely fast. I used rfftf.f because I'm only dealing with real values. Instructions on creating a DLL from a Fortran library can be found here: Once you have your DLL working, to use it you just do the following. At the top of your code, insert:

using System.Runtime.InteropServices;
Then in your code, insert:
        [DllImport("FFT.dll")] public static extern Int32 rffti(ref Int32 size, float[] workArray);
        [DllImport("FFT.dll")] public static extern Int32 rfftf(ref Int32 size, float[] mainArray, float[] workArray);
Here FFT.dll is your compiled Fortran library, and rffti and rfftf are the functions I exported from the DLL. Below is the code for the actual usage:
            int sampleRate = 2048;
            float frequency = 500;
            float[] data = new float[sampleRate];

            // Fill the array with one second of a sine wave at the test frequency
            for (int j = 0; j < sampleRate; j++)
                data[j] = (float)Math.Sin(2 * Math.PI * frequency * j / sampleRate);

            float[] workArray = new float[(sampleRate * 2) + 15];

            rffti(ref sampleRate, workArray);
            rfftf(ref sampleRate, data, workArray);
To get your actual frequency bin values back, you'll need to take the magnitude of each (real, imaginary) pair as your final value. For example, to write them to a file:
                // requires "using System.IO;"; the using block closes the file when done
                using (StreamWriter striter = new StreamWriter(@"C:\fft.txt"))
                {
                    for (int i = 1; i < data.Length - 1; i += 2)
                        striter.WriteLine(Math.Sqrt(data[i] * data[i] + data[i + 1] * data[i + 1]));
                }
You'll notice that I started at 1 rather than 0. The zero spot holds the DC term of your FFT (the sum of the input samples). Hope this is useful to someone other than myself. Also, you'll have to map the output to frequencies: if I remember correctly, bin k sits at k times the sample rate divided by the FFT length, which works out to 1 Hz per bin in the example above.
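
The unpacking described above can be sketched as a small helper. This is an illustration rather than code from the post: the class and method names are hypothetical, and it assumes the usual FFTPACK rfftf packed layout (zero-based: element 0 is the DC term, elements 2k-1 and 2k are the real and imaginary parts of bin k, and for an even length the last element is the Nyquist term). It works on any non-empty array in that layout, so it can be tried without the DLL.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class RfftfOutput
{
    // Magnitude of every bin from 0 (DC) up to Nyquist, given a non-empty
    // array in the packed layout assumed above.
    public static double[] Magnitudes(float[] packed)
    {
        int n = packed.Length;
        double[] mags = new double[n / 2 + 1];
        mags[0] = Math.Abs(packed[0]); // DC term has no imaginary part

        for (int k = 1; 2 * k < n; k++)
        {
            double re = packed[2 * k - 1];
            double im = packed[2 * k];
            mags[k] = Math.Sqrt(re * re + im * im);
        }

        if (n % 2 == 0)
        {
            mags[n / 2] = Math.Abs(packed[n - 1]); // Nyquist term, real only
        }
        return mags;
    }

    // Centre frequency in Hz of a given bin.
    public static double BinFrequency(int bin, double sampleRate, int fftLength)
    {
        return bin * sampleRate / fftLength;
    }
}
```

With the values from the post (a sample rate of 2048 and an FFT length of 2048), BinFrequency(500, 2048, 2048) is 500, so the test tone would be expected to peak at index 500 of the array returned by Magnitudes.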
Mar 7, 2011 at 6:24 AM

Hi and thanks for this!

I finally managed to get FFT working using Mark's article. But still thanks for your input!

Feb 16, 2013 at 6:17 PM
Hi Mark,

In your article, this code:
float sample = ((oldIndex < 0) ? prevBuffer[frames + 
            corr += (sample * buffer[i]);
it shows errors,
"Cannot implicitly convert type 'float' to 'int'. An explicit conversion exists (Are you missing a cast?)"
"Syntax error, ']' expected"

Any idea?
Feb 18, 2013 at 7:35 AM
Well, all the code is in the .NET voice recorder application, and that definitely compiles. I suspect that line has been truncated somehow. I'd have a look at the source code itself.
Sep 6, 2013 at 11:38 PM
Edited Sep 6, 2013 at 11:39 PM

I am performing speech recognition using the Windows API and I'd like to perform speaker detection/recognition to know who is speaking.

I don't need a voice print or any advanced features. I just need to know that there are three users (A, B, and C) and which one is speaking.

I took the PitchTracker from the article's project and got it working.
MidiCents: 20 MidiNote: 46 Pitch: 115,1978 RecordIndex: 17
MidiCents: -41 MidiNote: 45 Pitch: 112,6709 RecordIndex: 18
MidiCents: -4 MidiNote: 45 Pitch: 110,2652 RecordIndex: 19
MidiCents: 0 MidiNote: 0 Pitch: 0 RecordIndex: 20
MidiCents: 0 MidiNote: 0 Pitch: 0 RecordIndex: 21
MidiCents: -38 MidiNote: 45 Pitch: 112,4994 RecordIndex: 22
MidiCents: 27 MidiNote: 46 Pitch: 114,678 RecordIndex: 23
MidiCents: -8 MidiNote: 46 Pitch: 117,14 RecordIndex: 24
MidiCents: -7 MidiNote: 47 Pitch: 123,9888 RecordIndex: 25
MidiCents: -36 MidiNote: 47 Pitch: 126,1081 RecordIndex: 26
MidiCents: -39 MidiNote: 47 Pitch: 126,3387 RecordIndex: 27
MidiCents: 11 MidiNote: 47 Pitch: 122,6507 RecordIndex: 28
MidiCents: 0 MidiNote: 46 Pitch: 116,5315 RecordIndex: 29
MidiCents: -27 MidiNote: 45 Pitch: 111,7825 RecordIndex: 30
MidiCents: 8 MidiNote: 57 Pitch: 218,8634 RecordIndex: 31
And now I don't know how to compare these results from user A with another sample from user B.
Maybe I'm completely on the wrong track?

Sep 9, 2013 at 12:37 PM
Working out who is speaking is a really hard problem. It cannot be done on the basis of pitch alone.