Category: Learning AI from scratch
Chapter 11
[AI from Scratch] Episode 319: Speech Emotion Recognition — How to Estimate Emotions from Speech
Recap and Today's Theme: Hello! In the previous episode, we covered keyword spotting (KWS), a technique used to detect specific keywords in real time, which plays a critical role in voice assistants and smart devices. Today, we will focus...
Chapter 11
[AI from Scratch] Episode 320: Speaker Recognition — Identifying Speakers from Audio
Recap and Today's Theme: Hello! In the previous episode, we discussed emotion recognition from speech, focusing on how audio data can be analyzed to detect the emotions of a speaker. Today, we will explore another important topic: Speaker...
Chapter 11
[AI from Scratch] Episode 318: Keyword Spotting — Detecting Specific Keywords in Speech
Recap and Today's Theme: Hello! In the previous episode, we discussed evaluation metrics for speech recognition systems, such as Word Error Rate (WER), and how these metrics help assess model performance. Today, we will focus on a technol...
Chapter 11
[AI from Scratch] Episode 316: Overview of WaveGlow — Real-Time Speech Synthesis Model
Recap and Today's Theme: Hello! In the previous episode, we discussed Tacotron 2, a model that generates high-quality speech by converting text into Mel-spectrograms. While Tacotron 2 excels at generating Mel-spectrograms, it still requir...
Chapter 11
[AI from Scratch] Episode 317: Evaluation Metrics for Speech Recognition Models — Understanding Word Error Rate (WER) and Other Metrics
Recap and Today's Theme: Hello! In the previous episode, we explained WaveGlow, a vocoder model that enables high-quality and real-time speech synthesis. By combining WaveGlow with Tacotron 2, we can generate natural-sounding speech effic...
Chapter 11
[AI from Scratch] Episode 315: Implementing Tacotron 2 — A High-Quality Speech Synthesis Model
Recap and Today's Theme: Hello! In the previous episode, we introduced Text-to-Speech (TTS) technology, which converts text into natural-sounding speech. With advances in deep learning, modern TTS systems, such as Tacotron 2, can generate...
Chapter 11
[AI from Scratch] Episode 314: Basics of Speech Synthesis (Text-to-Speech) — Generating Audio from Text
Recap and Today's Theme: Hello! In the previous episode, we discussed Wav2Vec, a self-supervised learning model used to extract features from audio data, which significantly improves the accuracy of speech recognition systems. Today, we w...
Chapter 11
[AI from Scratch] Episode 312: Introduction to DeepSpeech — A Deep Learning-Based Speech Recognition Model
Recap and Today's Theme: Hello! In the previous episode, we discussed Connectionist Temporal Classification (CTC), a method that solves the problem of differing input and output sequence lengths in speech recognition. CTC allows deep lear...
Chapter 11
[AI from Scratch] Episode 313: Understanding Wav2Vec — Self-Supervised Learning for Speech Representation Learning
Recap and Today's Theme: Hello! In the previous episode, we discussed DeepSpeech, a speech recognition model that utilizes CTC (Connectionist Temporal Classification) and deep learning to convert audio data into text in an end-to-end fash...
Chapter 11
[AI from Scratch] Episode 311: Connectionist Temporal Classification (CTC) — Maintaining Label Alignment in Speech Recognition
Recap and Today's Theme: Hello! In the previous episode, we discussed Hidden Markov Models (HMMs), a classical approach to speech recognition. While HMMs decompose speech into phonemes, they face limitations when capturing the complex and...
Chapter 11
[AI from Scratch] Episode 310: Hidden Markov Model (HMM) — Classical Speech Recognition Model Explained
Recap and Today's Theme: Hello! In the previous episode, we explored the basics of speech recognition, understanding how audio is converted into text and how different components like acoustic models and language models work together. Tod...
Chapter 11
[AI from Scratch] Episode 308: Preprocessing Audio Data — Normalization and Filtering Techniques
Recap and Today's Theme: Hello! In the previous episode, we discussed noise reduction, learning various techniques to remove noise from audio data to improve its quality. Noise reduction is an essential step in enhancing the overall clari...
Chapter 11
[AI from Scratch] Episode 309: Introduction to Speech Recognition — Converting Speech into Text
Recap and Today's Theme: Hello! In the previous episode, we covered audio preprocessing techniques such as normalization, filtering, and adjusting the sampling rate. These steps are essential for improving the quality of audio data and en...
Chapter 11
[AI from Scratch] Episode 306: Mel-Frequency Cepstral Coefficients (MFCC) — Extracting Audio Features
Recap and Today's Theme: Hello! In the previous episode, we explored spectrograms, learning how to break down audio signals into frequency components and display them over time. Spectrograms are a crucial tool for visually understanding t...
Chapter 11
[AI from Scratch] Episode 307: Noise Reduction — Techniques for Removing Noise from Audio Data
Recap and Today's Theme: Hello! In the previous episode, we discussed Mel-Frequency Cepstral Coefficients (MFCC), a crucial tool for extracting features from audio data used in speech recognition and acoustic analysis. Today, we’ll focus ...
