UNIANAL, UNISYNT - universal speech analysis and synthesis

created by Jan Cernocky 7/12/96

modifications:
  4/8/97    - added frame energy.
  7/3/98    - also writes the number of VAD parts to stdout.
  26/6/2000 - exit code 0 for unianal.

*****************************************************************************

=============================================================================
UNIANAL - LPC speech analysis
=============================================================================

unianal is a program for universal speech analysis. It can compute various
kinds of spectral coefficients as well as the pitch, residual energy and
voice activity.

The speech is split into (possibly overlapping) frames, preaccentuated and
passed through an analysis window (rectangular or Hamming). The reflection
coefficients are computed using correlation analysis and the Leroux
algorithm. The other coefficients are obtained by conversion from LPC or
reflection coefficients. If LSF are required, they are computed from the
correlation coefficients using the Split Levinson algorithm and Saoudi's
method (reference given in the lpc.c file).

The energy written is always computed on the preaccentuated but not windowed
frame. The user can choose either the original frame energy or the residual
energy (computed by the well-known formula with reflection coefficients).
Both energies are normalized (divided by the frame length) and logarithmic
(natural log).

The pitch detector does not use preaccentuation. The cepstrum is computed
over a larger frame and the maximum is detected in the region of allowed
lags (20..160). The voicing decision is based on 3 criteria computed over
the LPC analysis frame from the non-preaccentuated signal: number of zero
passages (zero crossings), prediction gain and relative frame energy.

The voice activity decision works with a simple energy comparison (absolute
and relative). The decisions are smoothed using an 11-tap "all-must-be-zero"
filter.
This means that for a frame to be classified "inactive", the 11 frames around
it must also be inactive. This prevents the energy gaps within words (before
plosives) from being classified as silences.

Inputs of unianal are:
**********************

file.l16 - signal

Outputs of unianal are:
***********************

different types of LPC parameters:
----------------------------------
file.lpc - LPC a-coefficients
file.rfl - reflection coefficients
file.lar - log area ratios
file.lsf - line spectrum frequencies
file.cps - LPC-cepstrum coefficients (their number can be different from
           the analysis order !)
file.mfc - MEL-cepstral coefficients (their number can be TOTALLY different
           from the LPC analysis order, as it is a different analysis)
           !!! MFCC coeffs not yet implemented. !!!

Other parameters:
-----------------
file.pit - pitch and voicing information (pitch=0 means UNVOICED)
file.ene - residual or frame energy
file.vad - VAD information

FILE FORMATS
************

Signal (file.l16) - bin, shorts, Fs=8000 Hz. Beware of differences (e.g.
  byte order) between PC and UNIX machines!
Parameters (file.lpc, rfl, lar, lsf, mfc, cps) - bin, floats. Vectors of
  params are written one after another.
    LPC:  does not write a0=1.
    LPCC: does not write c0=0.
    LSF:  Fs/2 is normalized to 1.
Pitch/Voicing: - bin, shorts. Number of samples giving the lag, or 0 when
  unvoiced.
VAD: - bin, shorts. The 1st short gives the number of frames with the same
  activity. The 2nd short gives the type of activity. Example:
     5 0  - frames 0-4 mute
    11 1  - frames 5-15 active
     1 0  - frame 16 mute
    ... etc ...

UNIANAL PARAMETERS
******************

Name:
-----
-i string   input (and also output) file name. Without extension. No default.

Lpc analysis:
-------------
-x short    length of analysis window. Must be even. Default 160.
-y short    overlap of analysis windows. Must be even and < analysis
            window. Default 0.
-z short    preaccentuation coeff. * 10000. Default 9500 (mu=0.95).
-o short    order of LPC analysis. Default 10.
-w short    type of window for LPC analysis.
            0=rectangular, 1=Hamming. Default 1 (Hamming).

Pitch and VAD:
--------------
-s short    frame length for pitch detection. Must be even, > the LPC frame
            length and > the max. pitch period. Default 320.
-q short    Zero passages threshold for pitch detection. For the frame to be
            classified as "voiced", the number of zero passages over the LPC
            analysis frame (not the long one) must not exceed the threshold.
            Default 40.
-f short    Threshold for prediction gain * 10000. For the frame to be
            classified as "voiced", the prediction gain (on the LPC analysis
            frame, without preaccentuation) must not exceed the threshold.
            Default 3800 (0.38 thr).
-d short    Threshold for relative frame energy * 10000. For the frame to be
            classified as "voiced", its energy must exceed the maximal frame
            energy multiplied by the threshold. Default 200 (0.02 thr).
-k short    Relative threshold for voice activity [dB]. For "active", the
            frame energy must exceed the maximal frame energy * 10^(-thr/10).
            Default 30 (-30 dB of the max frame energy).
-u short    Absolute threshold for voice activity [dB]. For "active", the
            frame energy must exceed the maximal possible frame energy
            * 10^(-thr/10). The maximal possible energy is that of a pure
            sinusoid with the maximal amplitude (A^2 / 2).
            Default 77 (-77 dB of the max. energy possible).

Outputs:
--------
-a short    write LPC coefficients. 0=no, 1=yes. Default 0.
-r short    write reflection coefficients. 0=no, 1=yes. Default 0.
-g short    write LAR coefficients. 0=no, 1=yes. Default 0.
-l short    write LSF coefficients. 0=no, 1=yes. If LSF coeffs are required,
            the LPC analysis order must be even. Default 0.
-c short    write LPCC coefficients. 0=no, >0 - number of coeffs. Default 0.
-m short    write MFCC coefficients. 0=no, >0 - number of coeffs. Default 0.
            !!! MFCC coeffs not yet implemented. !!!
-p short    write pitch/voicing. 0=no, 1=yes. Default 0.
-e short    write energy. 0=no, 1=frame, 2=residual. Default 0.
-v short    write voice activity information. 0=no, 1=yes. Default 0.
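
The voice activity thresholds (-k, -u) and the 11-tap "all-must-be-zero"
smoothing described above can be sketched as follows. This is only an
illustration in Python, not the code from vad.c. It assumes that the frame
energy is a linear mean-square value, that a frame must pass both the relative
and the absolute test, that the maximal amplitude A is 32768 (16-bit samples),
and that the smoothing window is centered on the frame and truncated at the
ends of the file. All function names are hypothetical.

```python
def vad_thresholds(max_frame_energy, rel_db=30, abs_db=77, max_amplitude=32768.0):
    """Convert the dB thresholds (-k, -u) into linear energy thresholds."""
    # Relative threshold: maximal frame energy * 10^(-thr/10).
    relative = max_frame_energy * 10.0 ** (-rel_db / 10.0)
    # Absolute threshold: energy of a full-scale sinusoid (A^2 / 2), scaled down.
    # max_amplitude = 32768 is an assumption for 16-bit samples.
    max_possible = max_amplitude ** 2 / 2.0
    absolute = max_possible * 10.0 ** (-abs_db / 10.0)
    return relative, absolute


def is_active(frame_energy, max_frame_energy, rel_db=30, abs_db=77):
    """Raw per-frame decision. Requiring BOTH tests to pass is an assumption."""
    relative, absolute = vad_thresholds(max_frame_energy, rel_db, abs_db)
    return frame_energy > relative and frame_energy > absolute


def smooth_vad(raw, taps=11):
    """A frame stays inactive (0) only if every frame in the window is inactive.

    The centered window and the truncation at the edges are assumptions.
    """
    half = taps // 2
    smoothed = []
    for i in range(len(raw)):
        window = raw[max(0, i - half): i + half + 1]
        smoothed.append(1 if any(window) else 0)
    return smoothed
```

With this kind of smoothing, a short run of low-energy frames inside a word
(shorter than the window) is bridged and remains "active".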

Standard output:
****************

If everything goes right, unianal prints only basic information:

  file name (without extension)
  number of frames
  number of samples
  number of VAD parts (both types), 1 if VAD not computed

Note that the number of samples is generally lower than the number of samples
in the original file. This is a consequence of the frame-by-frame analysis.
The number of samples processed is given by LSEG+(FN-1)*WS, where FN is the
number of frames, LSEG is the length of the analysis frame, RSEG is the
overlap and WS=LSEG-RSEG is the window shift.

Standard error:
****************

If some reflection coefficients are unstable, a warning and the number of
the bad frame are printed.

If a file is shorter than one frame, a message is printed to stderr and the
numbers of frames and samples (0,0) are printed to stdout.

=============================================================================
UNISYNT - LPC speech synthesis
=============================================================================

unisynt is a program for a very primitive LPC speech synthesis. It reads
spectral parameters, pitch and residual energy and constructs the synthetic
speech. If the frame is voiced, the synthesis filter is excited by a
Kronecker pulse train (possibly something to improve !); if it is not
voiced, the filter is excited by Gaussian white noise. The signal may be
de-accentuated (filtering by 1/(1-mu*z^-1) ).

Inputs of unisynt are:
**********************

file.pit - pitch and voicing information (pitch=0 means UNVOICED)
file.ene - residual energy
file.lpc - LPC a-coefficients or
file.rfl - reflection coefficients or
file.lar - log area ratios or
file.lsf - line spectrum frequencies or
file.cps - LPC-cepstrum coefficients (their number can be different from
           the analysis order !) or
file.mfc - MEL-cepstral coefficients (their number can be TOTALLY different
           from the LPC analysis order, as it is a different analysis).
           !!! NOT YET IMPLEMENTED !!!
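
The excitation and de-accentuation steps described for unisynt can be
illustrated by the short Python sketch below. It is not the code from synth.c:
the function names, the unit pulse height, the unit noise variance and the way
the pulse position is carried from one frame to the next are all assumptions
made for the example. Gain scaling from the residual energy and the LPC
synthesis filter itself are left out.

```python
import random


def excitation(frame_len, pitch_lag, rng, phase=0):
    """Build one frame of excitation.

    Voiced (pitch_lag > 0): unit pulses every pitch_lag samples, starting at
    'phase'. Unvoiced (pitch_lag == 0): Gaussian white noise.
    Returns (samples, offset of the next pulse in the following frame).
    """
    if pitch_lag > 0:
        samples = [0.0] * frame_len
        n = phase
        while n < frame_len:
            samples[n] = 1.0
            n += pitch_lag
        return samples, n - frame_len
    return [rng.gauss(0.0, 1.0) for _ in range(frame_len)], 0


def deaccentuate(x, mu=0.95, prev=0.0):
    """Filter by 1/(1 - mu*z^-1), i.e. y[n] = x[n] + mu*y[n-1]."""
    y = []
    for sample in x:
        prev = sample + mu * prev
        y.append(prev)
    return y


# Usage: one voiced frame (lag 40) followed by one unvoiced frame.
rng = random.Random(0)
voiced, next_phase = excitation(160, 40, rng)
unvoiced, _ = excitation(160, 0, rng)
```

Carrying the pulse offset between frames keeps the pulse spacing regular
across frame boundaries, which is one possible way to handle consecutive
voiced frames.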

Outputs of unisynt are:
***********************

file.syn.l16 - synthetic speech.

FILE FORMATS
************

are the same as for unianal.

UNISYNT PARAMETERS
******************

Name:
-----
-i string   input (and also output) file name. Without extension. No default.

Frame information and deaccentuation:
-------------------------------------
-x short    length of analysis window. Must be even. Default 160.
-y short    overlap of analysis windows. Must be even and < analysis
            window. Default 0.
-z short    preaccentuation coeff. * 10000. Default 9500 (mu=0.95).
-o short    order of LPC analysis. Default 10.

Type of input coefficients:
---------------------------
-e short    type of energy. Must be 2 (residual energy).
-a short    read LPC coefficients. 0=no, 1=yes. Default 0.
-r short    read reflection coefficients. 0=no, 1=yes. Default 0.
-g short    read LAR coefficients. 0=no, 1=yes. Default 0.
-l short    read LSF coefficients. 0=no, 1=yes. If LSF coeffs are read, the
            LPC analysis order must be even. Default 0.
-c short    read LPCC coefficients. 0=no, >0 - number of coeffs. Default 0.
-m short    read MFCC coefficients. 0=no, >0 - number of coeffs. Default 0.
            !!! MFCC coeffs not yet implemented. !!!

Note that the options -a1, -r1, -g1, -l1, -c1, -m1 are EXCLUSIVE !

Standard output
***************

If everything goes right, unisynt prints only basic information:

  file name (without extension)
  number of frames
  number of samples

Note that the number of samples is generally lower than the number of samples
in the original file. This is a consequence of the frame-by-frame analysis.
The number of samples processed is given by LSEG+(FN-1)*WS, where FN is the
number of frames, LSEG is the length of the analysis frame, RSEG is the
overlap and WS=LSEG-RSEG is the window shift.

=============================================================================
FILES
=============================================================================

README        what you are reading right now
Makefile      UNIX makefile.
              Modify the compiler name and parameters according to your
              system.

Executables and Batch:
----------------------
unianal
unisynt
run.bat       batch file for a test (analysis and synthesis)

C-files:
--------
inutile.c     some older and not well functioning functions (NOT USED)
io.c          input/output of speech
lpc.c         LPC analysis and parameter conversion
pitch.c       pitch detection
pitch_gsm.c   pitch detection using GSM HR method (NOT USED)
synth.c       synthesis functions (noise, pulses)
unianal.c     main for unianal
unisynt.c     main for unisynt
vad.c         voice activity detection functions
common.c      common functions
getargs.c     functions for getting arguments from command line

Headers:
--------
constants.h   constants (debugging, pi, etc)
defaults.h    default values for parameters
globdefs.h    global definitions for all c-files
io.h          header for io.c
lpc.h         header for lpc.c
pitch.h       header for pitch.c
pitch_gsm.h   header for pitch_gsm.c (NOT USED)
synth.h       header for synth.c
vad.h         header for vad.c
common.h      header for common.c
getargs.h     header for getargs.c
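
As a closing illustration of the file formats and the sample-count formula
given earlier in this README, the following Python sketch decodes the
run-length pairs of a .vad file and evaluates LSEG+(FN-1)*WS. It is not part
of the package: the function names are hypothetical, and the byte order is
left as a parameter because it depends on the machine that wrote the file
("<" little-endian, ">" big-endian).

```python
import struct


def samples_processed(num_frames, lseg=160, rseg=0):
    """Number of samples covered by the analysis: LSEG + (FN-1)*WS."""
    if num_frames <= 0:
        return 0
    window_shift = lseg - rseg  # WS = LSEG - RSEG
    return lseg + (num_frames - 1) * window_shift


def decode_vad(data, byte_order="<"):
    """Expand (run length, activity) pairs of shorts into one label per frame."""
    count = len(data) // 2
    values = struct.unpack(f"{byte_order}{count}h", data[:count * 2])
    labels = []
    for run_length, activity in zip(values[0::2], values[1::2]):
        labels.extend([activity] * run_length)
    return labels


# Usage with the example from FILE FORMATS: 5 mute, 11 active, 1 mute.
example = struct.pack("<6h", 5, 0, 11, 1, 1, 0)
labels = decode_vad(example)
```

For a real file, the bytes would be read with open(name, "rb").read() and the
byte order chosen to match the machine on which unianal was run.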