Our first stop in speech analysis is usually the short-time Fourier transform, or STFT1: Figure 1: Top: an STFT of speech (it's me, saying: "Es war einmal ein Mann"). Bottom: the signal's waveform. As we can see, this speech signal has a strong fundamental frequency track around 120 Hz, and harmonics at integer multiples of the fundamental frequency. We perceive the frequency of the fundamental as the speech's pitch. The magnitude of the harmonics varies over time, which we perceive as the so...