Speech Recognition and Speech Processing (301-330): Learn the basics of audio data processing and speech recognition.
Chapter 11
[AI from Scratch] Episode 314: Basics of Speech Synthesis (Text-to-Speech) — Generating Audio from Text
Recap and Today's Theme: Hello! In the previous episode, we discussed Wav2Vec, a self-supervised learning model used to extract features from audio data, which significantly improves the accuracy of speech recognition systems. Today, we w...
Chapter 11
[AI from Scratch] Episode 313: Understanding Wav2Vec — Self-Supervised Learning for Speech Representation Learning
Recap and Today's Theme: Hello! In the previous episode, we discussed DeepSpeech, a speech recognition model that utilizes CTC (Connectionist Temporal Classification) and deep learning to convert audio data into text in an end-to-end fash...
Chapter 11
[AI from Scratch] Episode 312: Introduction to DeepSpeech — A Deep Learning-Based Speech Recognition Model
Recap and Today's Theme: Hello! In the previous episode, we discussed Connectionist Temporal Classification (CTC), a method that solves the problem of differing input and output sequence lengths in speech recognition. CTC allows deep lear...
Chapter 11
[AI from Scratch] Episode 311: Connectionist Temporal Classification (CTC) — Maintaining Label Alignment in Speech Recognition
Recap and Today's Theme: Hello! In the previous episode, we discussed Hidden Markov Models (HMMs), a classical approach to speech recognition. While HMMs decompose speech into phonemes, they face limitations when capturing the complex and...
Chapter 11
[AI from Scratch] Episode 310: Hidden Markov Model (HMM) — Classical Speech Recognition Model Explained
Recap and Today's Theme: Hello! In the previous episode, we explored the basics of speech recognition, understanding how audio is converted into text and how different components like acoustic models and language models work together. Tod...
Chapter 11
[AI from Scratch] Episode 309: Introduction to Speech Recognition — Converting Speech into Text
Recap and Today's Theme: Hello! In the previous episode, we covered audio preprocessing techniques such as normalization, filtering, and adjusting the sampling rate. These steps are essential for improving the quality of audio data and en...
Chapter 11
[AI from Scratch] Episode 308: Preprocessing Audio Data — Normalization and Filtering Techniques
Recap and Today's Theme: Hello! In the previous episode, we discussed noise reduction, learning various techniques to remove noise from audio data to improve its quality. Noise reduction is an essential step in enhancing the overall clari...
Chapter 11
[AI from Scratch] Episode 307: Noise Reduction — Techniques for Removing Noise from Audio Data
Recap and Today's Theme: Hello! In the previous episode, we discussed Mel-Frequency Cepstral Coefficients (MFCC), a crucial tool for extracting features from audio data used in speech recognition and acoustic analysis. Today, we’ll focus ...
Chapter 11
[AI from Scratch] Episode 306: Mel-Frequency Cepstral Coefficients (MFCC) — Extracting Audio Features
Recap and Today's Theme: Hello! In the previous episode, we explored spectrograms, learning how to break down audio signals into frequency components and display them over time. Spectrograms are a crucial tool for visually understanding t...
Chapter 11
[AI from Scratch] Episode 305: What is a Spectrogram? — Visualizing Frequency Components Over Time
Recap and Today's Theme: Hello! In the previous episode, we explored waveform visualization, learning how to display audio signals over time as changes in amplitude. By visualizing waveforms, we could understand the intensity and certain ...
Chapter 11
[AI from Scratch] Episode 304: Visualizing Waveform Data — How to Graph Audio Signals
Recap and Today's Theme: Hello! In the previous episode, we explored LibROSA, a powerful Python library for audio processing. We learned how to load, play, and extract features from audio files easily. Now, it's time to move on to a more ...
Chapter 11
[AI from Scratch] Episode 303: Introduction to LibROSA — Basics of the Audio Processing Library
Recap and Today's Theme: Hello! In the previous episode, we covered the basics of audio data, discussing key concepts such as sampling rate and bit depth. Understanding these fundamentals allows you to handle audio data properly and manag...
Chapter 11
[AI from Scratch] Episode 302: Basics of Audio Data — Understanding Sampling Rate and Bit Depth
Recap and Today's Theme: Hello! In the previous episode, we discussed audio processing and covered the foundational technologies of digital audio, including speech recognition and speech synthesis. We learned about how audio is digitized ...
Chapter 11
[AI from Scratch] Episode 301: What is Speech Processing? — A Guide to Working with Audio Data
Recap and Today's Theme: Hello! In the previous episode, we summarized Chapter 10 and conducted a knowledge check to review and deepen our understanding. Now, we’re moving into Chapter 11, where we will learn about speech recognition and ...
