signal processing - Java - Questions about estimating fundamental frequency


I'm trying to estimate the fundamental frequency of a .wav file that contains a recording of speech of one word.

What I've tried is to read the file into an AudioInputStream. The format is PCM_SIGNED 44100.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian.

Therefore I have made a new buffer to contain only one channel. This is the code that achieves that:

    double[] audioright = new double[audiobytes.length/2];
    for (int i = 0, k = 0; i <= audiobytes.length-1; i+=4, k+=2) {
        audioright[k] = audiobytes[i];
        audioright[k+1] = audiobytes[i+1];
    }

Then the data is moved into an fftbuffer, which is twice the size, and a DFT is applied. The library used is JTransforms. The function used is called realForwardFull.

    DoubleFFT_1D fftdo = new DoubleFFT_1D(audioleft.length);
    double[] fftbuffer = new double[audioleft.length*2];
    for (int i = 0; i < audioleft.length; i++) {
        fftbuffer[i] = audioleft[i];
    }
    fftdo.realForwardFull(fftbuffer);

This gives me a list of complex numbers, which I use to calculate the magnitude/amplitude of each complex number in order to make a power spectrum.

The formula used for the amplitude is amplitude = sqrt(im*im + re*re).
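A minimal sketch of that magnitude step, for comparison. It assumes the transformed buffer is laid out as interleaved pairs [re0, im0, re1, im1, ...], which is my understanding of what realForwardFull produces; the post does not show this step, and the class and method names are illustrative only. For real input, only the first half of the resulting bins (up to the Nyquist frequency) carries unique information.

```java
final class Magnitudes {
    /**
     * Converts an interleaved complex buffer [re0, im0, re1, im1, ...]
     * into one amplitude per complex bin.
     * The interleaved layout is an assumption, not confirmed by the post.
     */
    static double[] fromInterleaved(double[] fftbuffer) {
        int bins = fftbuffer.length / 2;
        double[] amp = new double[bins];
        for (int k = 0; k < bins; k++) {
            double re = fftbuffer[2 * k];
            double im = fftbuffer[2 * k + 1];
            // hypot computes sqrt(re*re + im*im) without intermediate overflow
            amp[k] = Math.hypot(re, im);
        }
        return amp;
    }
}
```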

This provides me with an array of amplitudes, which I apply the harmonic summation method to. The index whose amplitude, summed with the amplitudes of its 3 harmonics, gives the highest sum is the index that represents the fundamental frequency.

    double top_sum = 0;
    double first_index = 0;
    double sum = 0;
    double f_0 = 0;
    double fr = audioinputstream.getFormat().getSampleRate()/2/ampbuffer.length;

    for (int i = 50; i <= ampbuffer.length/4-1; i++) {
        sum = ampbuffer[i] + ampbuffer[i*2] + ampbuffer[i*3] + ampbuffer[i*4];
        if (top_sum < sum) {
            top_sum = sum;
            first_index = i;
        }
    }

This index then needs to be mapped to the correct frequency in the frequency domain. My understanding is that this should be done by saying (index / fftbuffer.length) * samplerate.
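For reference, the usual relation between a DFT bin and its frequency is f = k * fs / N, where N is the number of time-domain samples that were transformed. The sketch below encodes that relation; `binToHz` and its parameter names are illustrative and not from the post. Two things may be worth checking against it, offered as possibilities rather than confirmed diagnoses: if the buffer holds interleaved complex values, its length is 2N rather than N, and in Java dividing one `int` by another truncates before any later multiplication.

```java
final class BinFrequency {
    /**
     * Maps a DFT bin index to a frequency in Hz.
     *
     * @param bin        index of the complex bin (0 = DC)
     * @param fftSize    number of time-domain samples transformed (N),
     *                   not the length of an interleaved re/im buffer
     * @param sampleRate sample rate in Hz
     */
    static double binToHz(int bin, int fftSize, double sampleRate) {
        // Multiply first so the arithmetic is done in double, not int.
        return bin * sampleRate / fftSize;
    }
}
```

With a 4410-sample transform at 44100 Hz the bins are 10 Hz apart, so bin 30 would correspond to 300 Hz.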

This provides me with an estimate of the fundamental frequency.

The result is not "correct". I have several different .wav files to test on, and the results are way outside the expected range. For the same female voice, 3 different words give results of 40, 13 and 360. All 3 results are expected to be in the range 250 to 350, approximately.

I think some of the issues are caused by the amplitude buffer values. When plotted, the graph doesn't show clear peaks that represent harmonics.

Here's an image of the graph:

amplitudes http://i39.tinypic.com/29wkg7.png

I know this is a lot of information, but I believe more information makes it easier to understand what has been done.

To recap: I am unsure of the amplitude data. Do the values make sense? Are they plotted correctly? Do I need to do anything to the data before I search for harmonics and find the fundamental frequency?

I have considered applying some kind of windowing, because I have a suspicion that leakage might be why the peaks in the plot I have aren't harmonics of each other.
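If leakage is the suspicion, one common option is a Hann window applied to each frame before the transform. Below is a minimal sketch; `HannWindow.apply` is a hypothetical helper name, and whether windowing alone brings out the missing peaks here is not something the post confirms.

```java
final class HannWindow {
    /**
     * Returns a copy of the frame multiplied by a symmetric Hann window,
     * w[i] = 0.5 * (1 - cos(2*pi*i / (n - 1))).
     * Frames shorter than 2 samples are returned unchanged.
     */
    static double[] apply(double[] frame) {
        int n = frame.length;
        double[] out = new double[n];
        if (n < 2) {
            System.arraycopy(frame, 0, out, 0, n);
            return out;
        }
        for (int i = 0; i < n; i++) {
            double w = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1)));
            out[i] = frame[i] * w;
        }
        return out;
    }
}
```

The window tapers both ends of the frame towards zero, which reduces the discontinuity that a plain cut-out segment has at its edges.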

Any help or suggestions are appreciated. In advance, thank you for the help!

Edit: My attempt at what was suggested:

    ByteBuffer buf = ByteBuffer.wrap(audiobytes);
    buf.order(ByteOrder.LITTLE_ENDIAN);
    double[] audio = new double[audiobytes.length/2];
    for (int i = 0; i < audiobytes.length/2; i++) {
        short s = buf.getShort();
        double mono = (double) s;
        double mono_norm = mono / 32768.0;
        audio[i] = mono_norm;
    }

Now one channel of PCM data should be saved in the array audio[].
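For comparison, a minimal sketch of pulling a single channel out of this format. It assumes the 4-byte frames are interleaved as one 16-bit left sample followed by one 16-bit right sample, which is the usual layout for stereo PCM but is not stated in the post; under that assumption, reading every consecutive short yields both channels alternating. `ChannelExtractor` and `extractChannel` are illustrative names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ChannelExtractor {
    /**
     * Extracts one channel from interleaved 16-bit little-endian stereo PCM.
     *
     * @param audiobytes raw PCM bytes, 4 bytes per frame (assumed L then R)
     * @param channel    0 for the first sample in each frame, 1 for the second
     * @return samples of that channel scaled to the range [-1, 1)
     */
    static double[] extractChannel(byte[] audiobytes, int channel) {
        final int bytesPerFrame = 4; // 2 channels * 2 bytes per sample
        ByteBuffer buf = ByteBuffer.wrap(audiobytes).order(ByteOrder.LITTLE_ENDIAN);
        int frames = audiobytes.length / bytesPerFrame;
        double[] audio = new double[frames];
        for (int i = 0; i < frames; i++) {
            // Absolute read: start of frame i plus 2 bytes per skipped channel.
            short s = buf.getShort(i * bytesPerFrame + channel * 2);
            audio[i] = s / 32768.0;
        }
        return audio;
    }
}
```

The result has one entry per frame, so its length is a quarter of the byte count rather than half.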

Some general hints:

You are trying to estimate the fundamental frequency of one spoken word. A "word" consists of several consonants and vowels (or better, phonemes). Each of the "vowels" may have a different fundamental frequency, and in most cases the frequency will change within one vowel (which generates the "melody" of our sentences). This means that you should estimate the fundamental frequency / pitch of a short interval of speech and make sure you are looking at a vowel (consonants are a form of noise and have no cyclic components).

So the first step should be to generate a spectrogram of the word.

Then you may calculate short-term FFTs of the interesting parts and proceed with the harmonic summation.
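A rough sketch of harmonic summation on a single short frame, under stated assumptions: the frame is already one channel of samples cut from a vowel, the search is limited to a plausible pitch range, and a naive DFT is used for the few bins needed so that the example has no library dependency. The names `HarmonicSum`, `magnitude` and `estimate` are illustrative, not from any library.

```java
final class HarmonicSum {
    /** Magnitude of DFT bin k of a real frame, computed directly. */
    static double magnitude(double[] frame, int k) {
        int n = frame.length;
        double re = 0.0, im = 0.0;
        for (int i = 0; i < n; i++) {
            double angle = 2.0 * Math.PI * k * i / n;
            re += frame[i] * Math.cos(angle);
            im -= frame[i] * Math.sin(angle);
        }
        return Math.sqrt(re * re + im * im);
    }

    /**
     * Estimates f0 in Hz as the bin k in [minHz, maxHz] that maximises
     * |X[k]| + |X[2k]| + |X[3k]| + |X[4k]|. Returns -1 if nothing is found.
     */
    static double estimate(double[] frame, double sampleRate,
                           double minHz, double maxHz) {
        int n = frame.length;
        int minBin = Math.max(1, (int) Math.ceil(minHz * n / sampleRate));
        // Keep the 4th harmonic below the Nyquist bin.
        int maxBin = Math.min((int) Math.floor(maxHz * n / sampleRate),
                              (n / 2 - 1) / 4);
        if (maxBin < minBin) {
            return -1.0;
        }
        double[] mag = new double[4 * maxBin + 1];
        for (int k = minBin; k <= 4 * maxBin; k++) {
            mag[k] = magnitude(frame, k);
        }
        int bestBin = -1;
        double best = 0.0;
        for (int k = minBin; k <= maxBin; k++) {
            double sum = mag[k] + mag[2 * k] + mag[3 * k] + mag[4 * k];
            if (sum > best) {
                best = sum;
                bestBin = k;
            }
        }
        return bestBin < 0 ? -1.0 : bestBin * sampleRate / n;
    }
}
```

The frequency resolution is sampleRate / frame.length, so a 50 ms frame at 44100 Hz (2205 samples) gives bins 20 Hz apart; a real FFT routine would normally replace the naive `magnitude` loop.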

You might get better results with a short-term autocorrelation function, however.
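A minimal sketch of what a short-term autocorrelation estimate could look like, assuming a single-channel frame of a few tens of milliseconds taken from a vowel and a pitch search range chosen by the caller. `AutocorrelationPitch` is an illustrative name. This version uses the unnormalised sum, which mildly favours shorter lags, and does no voicing detection or peak interpolation.

```java
final class AutocorrelationPitch {
    /**
     * Estimates f0 in Hz by finding the lag with the largest
     * autocorrelation inside the lag range implied by [minHz, maxHz].
     * Returns -1 if no lag has a positive correlation.
     */
    static double estimate(double[] frame, double sampleRate,
                           double minHz, double maxHz) {
        int minLag = Math.max(1, (int) Math.floor(sampleRate / maxHz));
        int maxLag = Math.min((int) Math.ceil(sampleRate / minHz),
                              frame.length - 1);
        int bestLag = -1;
        double best = 0.0;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double sum = 0.0;
            // Correlate the frame with a copy of itself shifted by 'lag'.
            for (int i = 0; i + lag < frame.length; i++) {
                sum += frame[i] * frame[i + lag];
            }
            if (sum > best) {
                best = sum;
                bestLag = lag;
            }
        }
        return bestLag > 0 ? sampleRate / bestLag : -1.0;
    }
}
```

For a female voice, a search range such as 100 to 500 Hz at 44100 Hz corresponds to lags of roughly 88 to 441 samples, so the frame should be at least a few hundred samples longer than the largest lag.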

Other things to research: pitch detection, cepstrum

