I'm trying to estimate the fundamental frequency of a .wav file that contains a recording of one spoken word.
What I've tried so far is to read the file into an AudioInputStream. The format is PCM_SIGNED 44100.0 Hz, 16 bit, stereo, 4 bytes/frame, little-endian.
Therefore I have made a new buffer that contains only one channel. The code that achieves that:
    double[] audioRight = new double[audioBytes.length / 2];
    for (int i = 0, k = 0; i <= audioBytes.length - 1; i += 4, k += 2) {
        audioRight[k] = audioBytes[i];
        audioRight[k + 1] = audioBytes[i + 1];
    }

Then the data is moved to fftBuffer, which is twice the size, and the DFT is applied. The library used is JTransforms. The function used is called realForwardFull.
    DoubleFFT_1D fftDo = new DoubleFFT_1D(audioLeft.length);
    double[] fftBuffer = new double[audioLeft.length * 2];
    for (int i = 0; i < audioLeft.length; i++) {
        fftBuffer[i] = audioLeft[i];
    }
    fftDo.realForwardFull(fftBuffer);

This gives a list of complex numbers, which I use to calculate the magnitude/amplitude of each complex number in order to make a power spectrum.
The formula used for the amplitude is amplitude = sqrt(im*im + re*re).
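For reference, here is a minimal sketch of that magnitude step. It assumes the transformed buffer holds interleaved pairs (re at even indices, im at odd indices), which is how I understand the output of realForwardFull; please check that against the JTransforms documentation. The class and method names are made up for illustration. Only the first half of the bins is returned, because the spectrum of a real signal is mirrored above the Nyquist frequency.

```java
final class Magnitudes {
    /**
     * Computes amplitudes from an interleaved complex buffer
     * [re0, im0, re1, im1, ...] (assumed layout, see the note above).
     * Returns bins 0 .. n/2 - 1, where n is the number of complex values.
     */
    static double[] fromInterleaved(double[] interleaved) {
        int n = interleaved.length / 2;   // number of complex bins
        int bins = n / 2;                 // keep only the bins below Nyquist
        double[] amp = new double[bins];
        for (int k = 0; k < bins; k++) {
            double re = interleaved[2 * k];
            double im = interleaved[2 * k + 1];
            amp[k] = Math.hypot(re, im);  // sqrt(re*re + im*im)
        }
        return amp;
    }
}
```

With this layout the amplitude array has one entry per frequency bin, so index k of the amplitude array corresponds directly to bin k of the DFT.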
This provides an array of amplitudes to which I apply the harmonic summation method: the index whose amplitude, summed with the amplitudes of its next 3 harmonics, gives the highest sum is taken to represent the fundamental frequency.
    double top_sum = 0;
    double first_index = 0;
    double sum = 0;
    double f_0 = 0;
    double fr = audioInputStream.getFormat().getSampleRate() / 2 / ampBuffer.length;
    for (int i = 50; i <= ampBuffer.length / 4 - 1; i++) {
        sum = ampBuffer[i] + ampBuffer[i * 2] + ampBuffer[i * 3] + ampBuffer[i * 4];
        if (top_sum < sum) {
            top_sum = sum;
            first_index = i;
        }
    }

This index needs to be mapped to the correct frequency in the frequency domain. My understanding is that this should be done as (index / fftBuffer.length) * sampleRate.
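A hedged sketch of the harmonic summation together with the bin-to-frequency mapping follows. For an N-point DFT, bin k corresponds to k * sampleRate / N, where N is the number of time-domain samples that were transformed. If the FFT buffer holds interleaved re/im pairs, its length is 2N, so dividing by the buffer length would give half the intended frequency. The names (HarmonicSum, estimateF0, fftSize) are hypothetical, and amp is assumed to hold one amplitude per bin.

```java
final class HarmonicSum {
    /**
     * Returns the frequency (Hz) in [minHz, maxHz] whose bin, summed with its
     * next harmonics, has the largest total amplitude. Returns NaN if the
     * search range is empty.
     *
     * amp      one amplitude per DFT bin (bin 0 = DC)
     * fftSize  number of time-domain samples N that were transformed
     */
    static double estimateF0(double[] amp, int fftSize, double sampleRate,
                             double minHz, double maxHz, int harmonics) {
        double binHz = sampleRate / fftSize;              // width of one bin in Hz
        int lo = Math.max(1, (int) Math.ceil(minHz / binHz));
        // Upper limit keeps k * harmonics inside the array.
        int hi = Math.min((int) Math.floor(maxHz / binHz), (amp.length - 1) / harmonics);

        int bestBin = -1;
        double bestSum = Double.NEGATIVE_INFINITY;
        for (int k = lo; k <= hi; k++) {
            double sum = 0.0;
            for (int h = 1; h <= harmonics; h++) {
                sum += amp[k * h];                        // fundamental plus harmonics
            }
            if (sum > bestSum) {
                bestSum = sum;
                bestBin = k;
            }
        }
        return bestBin < 0 ? Double.NaN : bestBin * binHz;
    }
}
```

Restricting the search to a plausible pitch range (for example 80 to 400 Hz for speech, which is my assumption and not a value from the question) avoids picking very low bins that are dominated by DC offset or rumble.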
This provides the estimate of the fundamental frequency.
The result is not "correct". I have several different .wav files to test on, and for all of them the result is way outside the expected range. For the same female voice, 3 different words give the results 40, 13 and 360. All 3 results are expected to be in the range 250 to 350, approximately.
I think some of the issues are caused by the amplitude buffer values. When plotted, the graph doesn't show clear peaks that represent the harmonics.
Here's an image of the graph:
amplitudes http://i39.tinypic.com/29wkg7.png
I know this is a lot of information, but I believe more information makes it easier to understand what has been done.
To recap: I am unsure of the amplitude data. Do the values make sense? Are they plotted correctly? Do I need to do something to the data before I search for the harmonics and find the fundamental frequency?
I have considered applying some kind of windowing, because I have a suspicion that leakage might be why the peaks in my plot aren't harmonics of each other.
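As an illustration of the windowing idea, here is a minimal sketch of a Hann window applied to one frame before the FFT. The Hann window is just one common choice to reduce spectral leakage; nothing in the question fixes which window to use, and the class name is made up.

```java
final class HannWindow {
    /** Returns a copy of the frame multiplied by a symmetric Hann window. */
    static double[] apply(double[] frame) {
        int n = frame.length;
        double[] out = new double[n];
        if (n < 2) {
            // A window over fewer than 2 samples is not meaningful; return a plain copy.
            System.arraycopy(frame, 0, out, 0, n);
            return out;
        }
        for (int i = 0; i < n; i++) {
            // w[i] = 0.5 * (1 - cos(2*pi*i / (n - 1))), zero at both ends
            double w = 0.5 * (1.0 - Math.cos(2.0 * Math.PI * i / (n - 1)));
            out[i] = frame[i] * w;
        }
        return out;
    }
}
```

The windowed frame would then be copied into the FFT buffer in place of the raw samples.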
Any help or suggestions are appreciated. In advance, thank you for your help!
Edit: my attempt at what was suggested:
    ByteBuffer buf = ByteBuffer.wrap(audioBytes);
    buf.order(ByteOrder.LITTLE_ENDIAN);
    double[] audio = new double[audioBytes.length / 2];
    for (int i = 0; i < audioBytes.length / 2; i++) {
        short s = buf.getShort();
        double mono = (double) s;
        double mono_norm = mono / 32768.0;
        audio[i] = mono_norm;
    }

Now one channel of PCM data should be saved in the array audio[].
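One thing to watch: a loop that calls getShort() for every 16-bit value reads the left and right samples alternately, so the resulting array would still be interleaved stereo. A minimal sketch of picking out a single channel, assuming the format stated in the question (16-bit little-endian stereo, 4 bytes per frame), might look like this. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class StereoToMono {
    /**
     * Extracts one channel (0 = left, 1 = right) from 16-bit little-endian
     * stereo PCM and scales it to the range [-1, 1).
     */
    static double[] extractChannel(byte[] audioBytes, int channel) {
        final int bytesPerFrame = 4;                    // 2 channels * 2 bytes each
        int frames = audioBytes.length / bytesPerFrame; // incomplete trailing frame is ignored
        ByteBuffer buf = ByteBuffer.wrap(audioBytes).order(ByteOrder.LITTLE_ENDIAN);
        double[] samples = new double[frames];
        for (int i = 0; i < frames; i++) {
            // Absolute read: skip to frame i, then to the requested channel.
            short s = buf.getShort(i * bytesPerFrame + channel * 2);
            samples[i] = s / 32768.0;
        }
        return samples;
    }
}
```

The result has one sample per frame, so its length is audioBytes.length / 4 rather than audioBytes.length / 2.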
Some general hints:
You are trying to estimate the fundamental frequency of one spoken word. A "word" consists of several consonants and vowels (or better, phonemes). Each of the "vowels" will have a different fundamental frequency, and in most cases the frequency will change within one vowel (which generates the "melody" of our sentences). This means you should estimate the fundamental frequency / pitch of a short interval of speech and make sure you are looking at a vowel (consonants are a form of noise and have no cyclic components).
So the first step should be to generate a spectrogram of the word.
Then you may calculate short-term FFTs of the interesting parts and proceed with the harmonic summation.
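The short-term analysis above can be sketched as splitting the signal into overlapping frames, each of which would then be windowed and transformed on its own. The frame length and hop size are parameters you would have to choose; values around 20 to 40 ms per frame are a common starting point for speech, but that is my assumption. The class name is made up.

```java
import java.util.Arrays;

final class Framing {
    /**
     * Splits a signal into frames of frameLen samples, advancing by hop
     * samples each time. A trailing partial frame is dropped.
     */
    static double[][] split(double[] signal, int frameLen, int hop) {
        if (frameLen <= 0 || hop <= 0 || signal.length < frameLen) {
            return new double[0][];
        }
        int count = 1 + (signal.length - frameLen) / hop;
        double[][] frames = new double[count][];
        for (int f = 0; f < count; f++) {
            int start = f * hop;
            frames[f] = Arrays.copyOfRange(signal, start, start + frameLen);
        }
        return frames;
    }
}
```

At 44100 Hz, for example, a frame length of 1024 samples covers roughly 23 ms.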
You will get better results with the short-term autocorrelation function, however.
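To make the autocorrelation suggestion concrete, here is a minimal, unoptimized sketch for a single frame. It searches only the lags that correspond to a chosen pitch range and returns the frequency of the lag with the largest autocorrelation. The names and the search range are my own assumptions, and a practical detector would also need a voicing decision and some normalization.

```java
final class AutocorrPitch {
    /**
     * Estimates the pitch (Hz) of one frame by finding the lag with the
     * largest autocorrelation within [sampleRate/maxHz, sampleRate/minHz].
     * Returns NaN if no lag has a positive autocorrelation.
     */
    static double estimateF0(double[] frame, double sampleRate, double minHz, double maxHz) {
        int minLag = Math.max(1, (int) Math.floor(sampleRate / maxHz));
        int maxLag = Math.min(frame.length - 1, (int) Math.ceil(sampleRate / minHz));

        int bestLag = -1;
        double best = 0.0;
        for (int lag = minLag; lag <= maxLag; lag++) {
            double r = 0.0;
            for (int i = 0; i + lag < frame.length; i++) {
                r += frame[i] * frame[i + lag];   // correlation of the frame with its shifted copy
            }
            if (r > best) {
                best = r;
                bestLag = lag;
            }
        }
        return bestLag > 0 ? sampleRate / bestLag : Double.NaN;
    }
}
```

The resolution is limited to whole-sample lags, so at 44100 Hz the estimate near 300 Hz moves in steps of about 2 Hz; interpolating around the peak would refine that.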
Other things to research: pitch detection, cepstrum.