Recap and Today’s Theme
Hello! In the previous episode, we explored waveform visualization, learning how to display an audio signal over time as changes in amplitude. By visualizing a waveform, we could gauge the loudness of a sound and spot events such as sudden onsets or stretches of silence.
Today, we’ll dive into spectrograms, a powerful tool for visualizing the frequency components of audio data over time. Spectrograms are widely used for detailed analysis of sound, providing insights into the frequencies that make up a signal and how they change. In this episode, we’ll explain what spectrograms are, how they are generated, and how to analyze them using Python’s LibROSA library.
What is a Spectrogram?
A spectrogram is a graph that displays the frequency components of an audio signal over time. Since sound is composed of various frequencies, visualizing these components helps in understanding the characteristics and changes of the audio over time.
Components of a Spectrogram
- Time Axis (X-axis): Represents time, showing how sound evolves as the audio plays.
- Frequency Axis (Y-axis): Displays the frequencies present in the sound, ranging from low to high frequencies.
- Color Intensity (Amplitude): Represents the strength (amplitude) of each frequency component, with brighter colors indicating higher amplitude.
Spectrograms are useful for various applications such as music analysis, speech recognition, detecting abnormal sounds, and even analyzing animal sounds.
How Spectrograms Are Generated
Spectrograms are generated using a technique called Short-Time Fourier Transform (STFT). STFT divides the audio signal into small segments (windows) and applies the Fourier Transform to each window to extract the frequency components.
What is Short-Time Fourier Transform (STFT)?
The Fourier Transform converts a signal from the time domain into the frequency domain, but applied to an entire signal at once it tells us nothing about when each frequency occurs. STFT keeps both time and frequency information by applying the Fourier Transform to short, successive time intervals.
- Window Size: The length of the time interval. Shorter windows give better time resolution, while longer windows provide better frequency resolution.
- Hop Size: The step between successive windows. Making the hop smaller than the window causes the windows to overlap, giving smoother, more continuous coverage along the time axis.
By performing STFT, we can extract frequency spectra from each window and arrange them over time to form a spectrogram.
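To make these parameters concrete, here is a minimal sketch of calling librosa.stft with an explicit window and hop. The values n_fft=2048 and hop_length=512 are only illustrative (they happen to match LibROSA's defaults), and example.wav is the same file used in the next section.
import librosa
# Load an audio file (example.wav is the file used later in this article)
y, sr = librosa.load('example.wav', sr=None)
# Window of 2048 samples, advancing 512 samples per frame (75% overlap);
# these values are illustrative and match LibROSA's defaults
D = librosa.stft(y, n_fft=2048, hop_length=512, win_length=2048)
# Rows are frequency bins (n_fft/2 + 1 = 1025), columns are time frames
print(D.shape)
Enlarging n_fft sharpens frequency resolution but blurs timing, while shrinking hop_length produces more frames and finer coverage along the time axis.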
Creating a Spectrogram with LibROSA
Let’s use LibROSA to load an audio file, apply STFT, and visualize the spectrogram.
1. Loading the Audio File
import librosa
# Specify the path to the audio file
audio_path = 'example.wav'
# Load the audio file
y, sr = librosa.load(audio_path, sr=None)
- librosa.load(audio_path, sr=None): Loads the audio file and returns the waveform (y) and the sampling rate (sr).
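Passing sr=None keeps the file's native sampling rate, whereas passing a number such as sr=22050 would resample the audio on load. As a quick sanity check, here is a minimal sketch that reuses y and sr from above; the printed values naturally depend on your file.
# Inspect what was loaded
print(sr)                                # native sampling rate of the file
print(len(y))                            # number of samples in the waveform
print(librosa.get_duration(y=y, sr=sr))  # duration in seconds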
2. Calculating STFT and Generating the Spectrogram
import numpy as np
import matplotlib.pyplot as plt
import librosa.display
# Compute the Short-Time Fourier Transform (STFT)
D = librosa.stft(y)
# Convert the amplitude spectrogram to a decibel scale
S_db = librosa.amplitude_to_db(np.abs(D))
# Display the spectrogram
plt.figure(figsize=(10, 4))
librosa.display.specshow(S_db, sr=sr, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Spectrogram')
plt.xlabel('Time (seconds)')
plt.ylabel('Frequency (Hz)')
plt.show()
- librosa.stft(): Performs the STFT on the waveform, producing complex-valued spectral data.
- librosa.amplitude_to_db(): Converts the amplitude spectrum to decibels for easier visualization.
- librosa.display.specshow(): Displays the spectrogram, with time on the X-axis and frequency on the Y-axis.
Running this code will produce a spectrogram, allowing you to visually analyze the frequency components of the audio over time.
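If you want to relate the rows and columns of S_db back to physical units, the sketch below shows one way to do so; it assumes the default n_fft=2048 and hop_length=512 that librosa.stft used above.
# Frequency (Hz) of each row and time (seconds) of each column of S_db
freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)
times = librosa.frames_to_time(np.arange(S_db.shape[1]), sr=sr, hop_length=512)
print(S_db.shape)            # (1025, number_of_frames)
print(freqs[0], freqs[-1])   # 0 Hz up to sr/2 (the Nyquist frequency)
print(times[-1])             # roughly the duration of the audio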
Frequency Axis Scales
You can adjust the frequency axis depending on your needs:
- linear: Shows frequencies on a linear scale, highlighting changes in the higher frequencies.
- log: Uses a logarithmic scale, better suited to analyzing audio across a wide range of frequencies.
- mel: Uses the Mel scale, which mimics how humans perceive pitch and is often used in speech recognition (a short sketch follows below).
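As a sketch of the mel option, you can compute a mel spectrogram directly and display it with y_axis='mel'; the choice of n_mels=128 bands here is illustrative rather than required.
# Compute a mel spectrogram (power) and convert it to decibels
S_mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
S_mel_db = librosa.power_to_db(S_mel, ref=np.max)
plt.figure(figsize=(10, 4))
librosa.display.specshow(S_mel_db, sr=sr, x_axis='time', y_axis='mel')
plt.colorbar(format='%+2.0f dB')
plt.title('Mel Spectrogram')
plt.show()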
Analyzing the Spectrogram
By examining a spectrogram, you can gain insights into various characteristics of the sound:
1. Emphasized Frequencies
Certain sounds or instruments produce strong frequency components. By looking at a spectrogram, you can identify which frequency bands are emphasized, providing clues about the sound source.
2. Duration of Sounds
Spectrograms show how long specific frequencies are sustained. This allows you to visualize the length of notes or speech sounds and their rhythmic patterns.
3. Noise Detection
Noise often appears as wideband, irregular frequency components. High-frequency noise is particularly easy to spot on a spectrogram, making it useful for analyzing and removing unwanted sounds.
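As one concrete way to locate emphasized frequencies (point 1 above) programmatically, the sketch below averages the decibel spectrogram over time and reports the strongest bins; printing the top five is an arbitrary choice for illustration, and S_db is the array computed earlier.
# Average level of each frequency bin over the whole clip
mean_db = S_db.mean(axis=1)
# Map bin indices to frequencies in Hz (default n_fft=2048 assumed)
freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)
# Report the five strongest bins
top_bins = np.argsort(mean_db)[-5:][::-1]
for b in top_bins:
    print(f'{freqs[b]:.1f} Hz: {mean_db[b]:.1f} dB')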
Practical Example: Analyzing Audio Events
In real-world audio data, different events such as speech, music, or noise are often mixed. A spectrogram can help isolate these events for analysis.
# Display the spectrogram for analysis
plt.figure(figsize=(10, 6))
librosa.display.specshow(S_db, sr=sr, x_axis='time', y_axis='log')
plt.colorbar(format='%+2.0f dB')
plt.title('Spectrogram')
plt.xlabel('Time (seconds)')
plt.ylabel('Frequency (Hz)')
plt.show()
By analyzing this spectrogram, you can identify:
- Instruments: Certain frequency bands will be emphasized, often appearing with rhythmic patterns.
- Human Voice: Voiced speech shows stacks of closely spaced harmonics in the lower frequencies, while consonants add brief bursts of energy higher up; the patterns shift from moment to moment as the words change.
- Noise: Noise components are often continuous across frequencies, appearing irregularly in the spectrogram.
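To inspect one of these events more closely, you can cut out a region of the spectrogram by time and frequency. In the sketch below, the 1.0-2.0 second window and the 5 kHz cutoff are hypothetical values chosen purely for illustration; S_db, sr, and the hop length of 512 come from the earlier STFT code.
# Convert a hypothetical time window (1.0 s to 2.0 s) to frame indices
start_frame, end_frame = librosa.time_to_frames([1.0, 2.0], sr=sr, hop_length=512)
# Keep only the bins below a hypothetical 5 kHz cutoff
freqs = librosa.fft_frequencies(sr=sr, n_fft=2048)
max_bin = np.searchsorted(freqs, 5000)
# Slice the region and report its average level
region = S_db[:max_bin, start_frame:end_frame]
print(region.shape)
print(f'Average level in the region: {region.mean():.1f} dB')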
Summary
In this episode, we explored spectrograms, learning how to visualize frequency components over time. Spectrograms are a valuable tool for analyzing audio, providing detailed information about the characteristics and events in a sound signal. In the next episode, we will take a closer look at Mel-Frequency Cepstral Coefficients (MFCC), an important feature in speech recognition.
Next Episode Preview
In the next episode, we’ll discuss MFCC (Mel-Frequency Cepstral Coefficients), explaining how to extract these key features from audio and their significance in speech processing.
Notes
- STFT (Short-Time Fourier Transform): A method for analyzing both time and frequency components of an audio signal.
- Decibel Scale: Converts amplitude to a logarithmic scale to better visualize sound intensity.
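For reference, the decibel conversion used in this episode corresponds roughly to 20 * log10(amplitude). The minimal sketch below reuses D from the STFT code; note that librosa.amplitude_to_db also applies a small amplitude floor and clips very quiet values, so the two results can differ for near-silent bins.
# Manual decibel conversion: 20 * log10(amplitude), with a small floor
manual_db = 20 * np.log10(np.maximum(np.abs(D), 1e-5))
# LibROSA's version additionally clips values far below the maximum (top_db)
library_db = librosa.amplitude_to_db(np.abs(D))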