Article
Chapter 11
[AI from Scratch] Episode 326: Applications of Speech Processing — Smart Speakers and Automated Response Systems
Recap and Today's Theme: Hello! In the last episode, we discussed speech recognition in noisy environments, covering techniques such as noise reduction and data augmentation to build robust models. Today, we will explore applications of s...
Chapter 11
[AI from Scratch] Episode 324: Basics of Audio Codecs — Compression Techniques for Audio Data
Recap and Today's Theme: Hello! In the previous episode, we explored real-time audio processing, learning how to perform speech recognition and synthesis with low latency. Today, we will shift our focus to audio codecs, technologies that c...
Chapter 11
[AI from Scratch] Episode 322: Multimodal Learning — Combining Speech, Images, and Text for Enhanced Learning
Recap and Today's Theme: Hello! In the previous episode, we explored audio data augmentation, where we enhanced the diversity of audio datasets using techniques like pitch shifting and time-stretching. Today, we will delve into an excitin...
Chapter 11
[AI from Scratch] Episode 323: Real-Time Speech Processing — Techniques for Low-Latency Speech Recognition and Synthesis
Recap and Today's Theme: Hello! In the previous episode, we explored multimodal learning, which combines data from different modalities (audio, images, text) to build more accurate and robust models. This integration of multiple data type...
Chapter 11
[AI from Scratch] Episode 321: Audio Data Augmentation — Techniques Using Pitch Shift and Time Stretching
Recap and Today's Theme: Hello! In the previous episode, we covered speaker recognition, a technique used to identify individual speakers based on voice patterns. Speaker recognition plays a key role in voice assistants and security syste...
Chapter 11
[AI from Scratch] Episode 319: Speech Emotion Recognition — How to Estimate Emotions from Speech
Recap and Today's Theme: Hello! In the previous episode, we covered keyword spotting (KWS), a technique used to detect specific keywords in real time, which plays a critical role in voice assistants and smart devices. Today, we will focus...
Chapter 11
[AI from Scratch] Episode 320: Speaker Recognition — Identifying Speakers from Audio
Recap and Today's Theme: Hello! In the previous episode, we discussed emotion recognition from speech, focusing on how audio data can be analyzed to detect the emotions of a speaker. Today, we will explore another important topic: Speaker...
Chapter 11
[AI from Scratch] Episode 318: Keyword Spotting — Detecting Specific Keywords in Speech
Recap and Today's Theme: Hello! In the previous episode, we discussed evaluation metrics for speech recognition systems, such as Word Error Rate (WER), and how these metrics help assess model performance. Today, we will focus on a technol...
Chapter 11
[AI from Scratch] Episode 316: Overview of WaveGlow — Real-Time Speech Synthesis Model
Recap and Today's Theme: Hello! In the previous episode, we discussed Tacotron 2, a model that generates high-quality speech by converting text into Mel-spectrograms. While Tacotron 2 excels at generating Mel-spectrograms, it still requir...
Chapter 11
[AI from Scratch] Episode 317: Evaluation Metrics for Speech Recognition Models — Understanding Word Error Rate (WER) and Other Metrics
Recap and Today's Theme: Hello! In the previous episode, we explained WaveGlow, a vocoder model that enables high-quality and real-time speech synthesis. By combining WaveGlow with Tacotron 2, we can generate natural-sounding speech effic...
Chapter 11
[AI from Scratch] Episode 315: Implementing Tacotron 2 — A High-Quality Speech Synthesis Model
Recap and Today's Theme: Hello! In the previous episode, we introduced Text-to-Speech (TTS) technology, which converts text into natural-sounding speech. With advances in deep learning, modern TTS systems, such as Tacotron 2, can generate...
Chapter 11
[AI from Scratch] Episode 314: Basics of Speech Synthesis (Text-to-Speech) — Generating Audio from Text
Recap and Today's Theme: Hello! In the previous episode, we discussed Wav2Vec, a self-supervised learning model used to extract features from audio data, which significantly improves the accuracy of speech recognition systems. Today, we w...
Chapter 11
[AI from Scratch] Episode 312: Introduction to DeepSpeech — A Deep Learning-Based Speech Recognition Model
Recap and Today's Theme: Hello! In the previous episode, we discussed Connectionist Temporal Classification (CTC), a method that solves the problem of differing input and output sequence lengths in speech recognition. CTC allows deep lear...
Chapter 11
[AI from Scratch] Episode 313: Understanding Wav2Vec — Self-Supervised Learning for Speech Representation Learning
Recap and Today's Theme: Hello! In the previous episode, we discussed DeepSpeech, a speech recognition model that utilizes CTC (Connectionist Temporal Classification) and deep learning to convert audio data into text in an end-to-end fash...
Chapter 11
[AI from Scratch] Episode 311: Connectionist Temporal Classification (CTC) — Maintaining Label Alignment in Speech Recognition
Recap and Today's Theme: Hello! In the previous episode, we discussed Hidden Markov Models (HMMs), a classical approach to speech recognition. While HMMs decompose speech into phonemes, they face limitations when capturing the complex and...
