Speech Recognition and Speech Processing (301-330): Learn the basics of audio data processing and speech recognition.

Chapter 11

[AI from Scratch] Episode 314: Basics of Speech Synthesis (Text-to-Speech) — Generating Audio from Text

Recap and Today's Theme: Hello! In the previous episode, we discussed Wav2Vec, a self-supervised learning model used to extract features from audio data, which significantly improves the accuracy of speech recognition systems. Today, we w...

October 13, 2024
Chapter 11

[AI from Scratch] Episode 313: Understanding Wav2Vec — Self-Supervised Learning for Speech Representation Learning

Recap and Today's Theme: Hello! In the previous episode, we discussed DeepSpeech, a speech recognition model that utilizes CTC (Connectionist Temporal Classification) and deep learning to convert audio data into text in an end-to-end fash...

October 13, 2024
Chapter 11

[AI from Scratch] Episode 312: Introduction to DeepSpeech — A Deep Learning-Based Speech Recognition Model

Recap and Today's Theme: Hello! In the previous episode, we discussed Connectionist Temporal Classification (CTC), a method that solves the problem of differing input and output sequence lengths in speech recognition. CTC allows deep lear...

October 13, 2024
Chapter 11

[AI from Scratch] Episode 311: Connectionist Temporal Classification (CTC) — Maintaining Label Alignment in Speech Recognition

Recap and Today's Theme: Hello! In the previous episode, we discussed Hidden Markov Models (HMMs), a classical approach to speech recognition. While HMMs decompose speech into phonemes, they face limitations when capturing the complex and...

October 13, 2024
Chapter 11

[AI from Scratch] Episode 310: Hidden Markov Model (HMM) — Classical Speech Recognition Model Explained

Recap and Today's Theme: Hello! In the previous episode, we explored the basics of speech recognition, understanding how audio is converted into text and how different components like acoustic models and language models work together. Tod...

October 13, 2024
Chapter 11

[AI from Scratch] Episode 309: Introduction to Speech Recognition — Converting Speech into Text

Recap and Today's Theme: Hello! In the previous episode, we covered audio preprocessing techniques such as normalization, filtering, and adjusting the sampling rate. These steps are essential for improving the quality of audio data and en...

October 13, 2024
Chapter 11

[AI from Scratch] Episode 308: Preprocessing Audio Data — Normalization and Filtering Techniques

Recap and Today's Theme: Hello! In the previous episode, we discussed noise reduction, learning various techniques to remove noise from audio data to improve its quality. Noise reduction is an essential step in enhancing the overall clari...

October 13, 2024
Chapter 11

[AI from Scratch] Episode 307: Noise Reduction — Techniques for Removing Noise from Audio Data

Recap and Today's Theme: Hello! In the previous episode, we discussed Mel-Frequency Cepstral Coefficients (MFCC), a crucial tool for extracting features from audio data used in speech recognition and acoustic analysis. Today, we’ll focus ...

October 13, 2024
Chapter 11

[AI from Scratch] Episode 306: Mel-Frequency Cepstral Coefficients (MFCC) — Extracting Audio Features

Recap and Today's Theme: Hello! In the previous episode, we explored spectrograms, learning how to break down audio signals into frequency components and display them over time. Spectrograms are a crucial tool for visually understanding t...

October 13, 2024
Chapter 11

[AI from Scratch] Episode 305: What is a Spectrogram? — Visualizing Frequency Components Over Time

Recap and Today's Theme: Hello! In the previous episode, we explored waveform visualization, learning how to display audio signals over time as changes in amplitude. By visualizing waveforms, we could understand the intensity and certain ...

October 13, 2024
Chapter 11

[AI from Scratch] Episode 304: Visualizing Waveform Data — How to Graph Audio Signals

Recap and Today's Theme: Hello! In the previous episode, we explored LibROSA, a powerful Python library for audio processing. We learned how to load, play, and extract features from audio files easily. Now, it's time to move on to a more ...

October 13, 2024
Chapter 11

[AI from Scratch] Episode 303: Introduction to LibROSA — Basics of the Audio Processing Library

Recap and Today's Theme: Hello! In the previous episode, we covered the basics of audio data, discussing key concepts such as sampling rate and bit depth. Understanding these fundamentals allows you to handle audio data properly and manag...

October 13, 2024
Chapter 11

[AI from Scratch] Episode 302: Basics of Audio Data — Understanding Sampling Rate and Bit Depth

Recap and Today's Theme: Hello! In the previous episode, we discussed audio processing and covered the foundational technologies of digital audio, including speech recognition and speech synthesis. We learned about how audio is digitized ...

October 13, 2024
Chapter 11

[AI from Scratch] Episode 301: What is Speech Processing? — A Guide to Working with Audio Data

Recap and Today's Theme: Hello! In the previous episode, we summarized Chapter 10 and conducted a knowledge check to review and deepen our understanding. Now, we’re moving into Chapter 11, where we will learn about speech recognition and ...

October 13, 2024
