[AI from Scratch] Episode 228: Implementing a CNN — Building a Convolutional Neural Network

Recap and Today’s Theme

Hello! In the previous episode, we explored how to build models using Keras. We discovered how Keras makes it simple to construct anything from basic fully connected networks to complex custom networks.

Today, we will dive into one of the most prominent models in deep learning: the Convolutional Neural Network (CNN). CNNs are particularly effective in the fields of image recognition and image classification, using filters to extract features from images for classification and detection tasks. In this episode, we’ll use Keras to build a simple CNN and gain a practical understanding of how it works.

What Is a CNN?

A Convolutional Neural Network (CNN) is a type of neural network designed to efficiently extract patterns and features from image data for classification and detection. CNNs are composed of the following essential layers:

  1. Convolutional Layer: Slides filters (kernels) over the image data to produce feature maps.
  2. Pooling Layer: Downsamples the output of the convolutional layer, reducing computation while keeping the strongest responses.
  3. Fully Connected Layer: Uses the extracted features to classify the image.

By stacking these layers, CNNs learn hierarchical features of images, enabling them to recognize complex patterns.
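To make the first two layer types concrete, here is a minimal sketch in plain Python, with no Keras involved. The 5×5 toy image and the hand-picked `vertical_edge` kernel are assumptions chosen for illustration; in a real CNN the kernel values are learned during training.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` with stride 1 and no padding.

    Like Keras' Conv2D, this is a cross-correlation: the kernel is not flipped.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool_2x2(feature_map):
    """Keep the largest value in each non-overlapping 2x2 window.

    A trailing odd row or column is dropped.
    """
    return [
        [
            max(feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1])
            for c in range(0, len(feature_map[0]) - 1, 2)
        ]
        for r in range(0, len(feature_map) - 1, 2)
    ]


# Toy 5x5 image: dark (0) on the left, bright (1) on the right.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

# Hypothetical kernel that responds where brightness increases left to right.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = conv2d_valid(image, vertical_edge)  # 3x3 feature map
pooled = max_pool_2x2(feature_map)                # 1x1 after pooling

print(feature_map)
print(pooled)
```

The feature map is large where the window straddles the dark-to-bright edge and zero where the window is uniformly bright, which is the sense in which a filter "extracts a feature".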

Implementing a CNN with Keras

Let’s build a simple CNN using Keras to classify the MNIST dataset (handwritten digit images).

1. Importing the Necessary Libraries

First, import TensorFlow and Keras libraries.

import tensorflow as tf
from tensorflow.keras import layers, models

2. Preparing the Dataset

Load the MNIST dataset and preprocess the data to fit the CNN input format.

# Loading the dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

# Reshaping data (28x28 -> 28x28x1) and scaling pixel values to 0-1
x_train = x_train.reshape((x_train.shape[0], 28, 28, 1)).astype('float32') / 255
x_test = x_test.reshape((x_test.shape[0], 28, 28, 1)).astype('float32') / 255

# Converting labels to categorical (one-hot encoding)
y_train = tf.keras.utils.to_categorical(y_train, 10)
y_test = tf.keras.utils.to_categorical(y_test, 10)
  • reshape(): Adds a channel axis so each grayscale image goes from (28, 28) to (28, 28, 1), the shape Conv2D expects.
  • astype('float32') / 255: Converts the integer pixel values (0–255) to floats and scales them to the range 0 to 1.
  • to_categorical(): Converts the integer labels (0–9) into one-hot vectors of length 10.
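The three preprocessing steps can be sketched on a tiny hand-made example in plain Python. The helper names `add_channel_axis`, `scale_pixels`, and `one_hot` are hypothetical stand-ins for what `reshape`, the division by 255, and `to_categorical` do to the real arrays.

```python
def add_channel_axis(image):
    """Turn an H x W image into H x W x 1 by wrapping each pixel in a list."""
    return [[[pixel] for pixel in row] for row in image]


def scale_pixels(image):
    """Map integer pixel values 0..255 to floats in 0..1."""
    return [[pixel / 255 for pixel in row] for row in image]


def one_hot(label, num_classes=10):
    """Return a vector with 1.0 at index `label` and 0.0 everywhere else."""
    return [1.0 if k == label else 0.0 for k in range(num_classes)]


# A made-up 2x2 "image" standing in for a 28x28 MNIST digit.
tiny_image = [[0, 51],
              [255, 0]]

scaled = scale_pixels(tiny_image)
with_channel = add_channel_axis(scaled)
label_vector = one_hot(3)

print(with_channel)
print(label_vector)
```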

3. Building the CNN Model

Next, define a simple CNN using Keras.

# Defining the model
model = models.Sequential()

# Adding the first convolutional and pooling layers
model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))

# Adding the second convolutional and pooling layers
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))

# Adding the third convolutional layer
model.add(layers.Conv2D(64, (3, 3), activation='relu'))

# Flattening and adding fully connected layers
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))
  • Conv2D: Adds convolutional layers. The numbers (32, 64) indicate the number of filters, and (3, 3) is the filter size.
  • MaxPooling2D: Adds pooling layers to downsample feature maps.
  • Flatten: Flattens 3D data into 1D for the fully connected layers.
  • Dense: Adds fully connected layers, using softmax in the output layer to produce probabilities for each class.
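It can help to trace how the tensor shape and the number of trainable parameters change from layer to layer. The sketch below does that arithmetic in plain Python for the model above, assuming the Keras defaults used there: stride 1 and no padding for Conv2D, and non-overlapping 2×2 windows for MaxPooling2D. The helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Output width/height of an unpadded, stride-1 convolution."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output width/height of non-overlapping pooling."""
    return size // pool


def conv_params(kernel, in_channels, filters):
    """One kernel x kernel x in_channels weight block plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """One weight per input-unit pair plus one bias per unit."""
    return inputs * units + units


size, channels, total_params = 28, 1, 0

# The convolutional part of the model: ("conv", filters) or ("pool", None).
for kind, filters in [("conv", 32), ("pool", None),
                      ("conv", 64), ("pool", None),
                      ("conv", 64)]:
    if kind == "conv":
        total_params += conv_params(3, channels, filters)
        size, channels = conv_out(size), filters
    else:
        size = pool_out(size)
    print(f"{kind}: {size}x{size}x{channels}")

flattened = size * size * channels          # input length of the first Dense layer
total_params += dense_params(flattened, 64)
total_params += dense_params(64, 10)

print(f"flattened length: {flattened}")
print(f"total trainable parameters: {total_params}")
```

Calling `model.summary()` on the Keras model prints the same kind of table, so the two can be compared directly.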

4. Compiling the Model

Compile the model by specifying the loss function, optimizer, and evaluation metrics.

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
  • optimizer='adam': Adam is an optimizer that adapts the step size for each parameter and is a common default in deep learning.
  • loss='categorical_crossentropy': A loss function for multi-class classification with one-hot encoded labels.
  • metrics=['accuracy']: Reports the fraction of correctly classified samples during training and evaluation.
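To see what the loss measures, here is a minimal plain-Python sketch of softmax followed by categorical cross-entropy for a single sample. The three-class `logits` are made up for illustration, and the small `eps` clamp is an assumption added here to avoid taking the logarithm of zero.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Negative log-probability assigned to the true class (one-hot y_true)."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


logits = [2.0, 1.0, 0.1]        # made-up raw scores for 3 classes
y_true = [1.0, 0.0, 0.0]        # the true class is class 0

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)

print([round(p, 3) for p in probs])
print(round(loss, 3))
```

The loss shrinks toward zero as the probability assigned to the true class approaches 1, and grows as that probability falls.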

5. Training the Model

Train the model using the dataset.

history = model.fit(x_train, y_train, epochs=10, batch_size=64, validation_split=0.2)
  • epochs=10: Makes 10 full passes over the training data.
  • batch_size=64: Uses 64 samples for each gradient update.
  • validation_split=0.2: Holds out 20% of the training data for validation; the model is not trained on this portion.
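These three settings determine how many weight updates the training run performs. A quick sketch of the arithmetic, assuming the standard MNIST training set of 60,000 images:

```python
import math

num_train_images = 60_000   # size of the MNIST training set
validation_split = 0.2
batch_size = 64
epochs = 10

# Samples held out for validation and samples actually used for training.
num_val = round(num_train_images * validation_split)
num_fit = num_train_images - num_val

# One gradient update per batch; the last batch may be smaller than batch_size.
steps_per_epoch = math.ceil(num_fit / batch_size)
total_updates = steps_per_epoch * epochs

print(f"training samples:   {num_fit}")
print(f"validation samples: {num_val}")
print(f"updates per epoch:  {steps_per_epoch}")
print(f"updates in total:   {total_updates}")
```

The per-epoch count should match the step counter Keras shows in its progress bar during `fit()`.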

6. Evaluating the Model

Evaluate the model’s performance using the test dataset.

test_loss, test_accuracy = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_accuracy:.4f}")

7. Making Predictions with the Model

Use the trained model to make predictions on the test data.

predictions = model.predict(x_test)

# Displaying the first 5 predictions
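for i in range(5):
    # int() unwraps the scalar tensor so the digit itself is printed
    actual = int(tf.argmax(y_test[i]))
    predicted = int(tf.argmax(predictions[i]))
    print(f"Actual: {actual}, Predicted: {predicted}")
  • predict(): Returns, for each test image, a vector of 10 class probabilities.
  • tf.argmax(): Returns the index of the largest value, which turns a one-hot label or a probability vector back into a class label (0–9).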
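The same argmax idea also gives the accuracy metric: take the most probable class for each sample and count how often it matches the label. A plain-Python sketch on made-up three-class data (the `predictions` and `labels` below are hypothetical, not model output):

```python
def argmax(values):
    """Index of the largest value (the first one if there is a tie)."""
    return max(range(len(values)), key=lambda k: values[k])


# Hypothetical probability vectors for 4 samples and 3 classes.
predictions = [[0.1, 0.7, 0.2],
               [0.8, 0.1, 0.1],
               [0.3, 0.3, 0.4],
               [0.2, 0.5, 0.3]]

# One-hot labels for the same 4 samples.
labels = [[0, 1, 0],
          [1, 0, 0],
          [0, 1, 0],
          [0, 1, 0]]

predicted_classes = [argmax(p) for p in predictions]
actual_classes = [argmax(y) for y in labels]

correct = sum(p == a for p, a in zip(predicted_classes, actual_classes))
accuracy = correct / len(labels)

print(predicted_classes, actual_classes, accuracy)
```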

Extending and Customizing the CNN

Based on the simple CNN above, you can expand and customize the network further.

Adding a Dropout Layer

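Dropout helps prevent overfitting by randomly setting a fraction of a layer's outputs to zero during training (50% in the line below). In Keras it is a single layer. Place it inside the model definition, for example between the two Dense layers, rather than appending it after the softmax output layer.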

model.add(layers.Dropout(0.5))
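What the layer does to its inputs can be sketched in a few lines of plain Python. This is an illustrative version of so-called inverted dropout, in which the surviving values are scaled by 1 / (1 - rate) so the expected sum stays the same; the function name and the seeded generator are assumptions for the example.

```python
import random


def dropout(activations, rate, rng, training=True):
    """Zero each value with probability `rate`; scale the survivors by 1 / (1 - rate)."""
    if not training:
        # At inference time dropout does nothing.
        return list(activations)
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]


rng = random.Random(0)                 # seeded only to make the run repeatable
activations = [1.0] * 8

during_training = dropout(activations, rate=0.5, rng=rng)
during_inference = dropout(activations, rate=0.5, rng=rng, training=False)

print(during_training)
print(during_inference)
```

With a rate of 0.5, each output during training is either 0.0 or 2.0, while at inference the values pass through unchanged.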

Adding More Convolutional and Pooling Layers

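Adding convolutional and pooling layers increases the model's capacity to learn more complex features. Insert them in the model definition before Flatten. Note that every unpadded 3×3 convolution and every 2×2 pooling step shrinks the feature map, and in the model above it is already only 3×3 after the third convolution, so deeper stacks on 28×28 inputs generally need padding='same' in Conv2D.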

model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
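Before adding layers, it is worth checking how much spatial size is left. The sketch below traces the width of the feature map through the three convolutions and two pooling steps of the model above, then compares what an extra convolution-plus-pooling pair would leave with and without `padding='same'`. The helper names are hypothetical, and stride 1 is assumed for the convolutions.

```python
def conv_size(size, kernel=3, padding="valid"):
    """Width after a stride-1 convolution; 'same' padding keeps the width."""
    return size if padding == "same" else size - kernel + 1


def pool_size(size, pool=2):
    """Width after non-overlapping pooling."""
    return size // pool


# conv -> pool -> conv -> pool -> conv, as in the model above.
size = 28
for step in (conv_size, pool_size, conv_size, pool_size, conv_size):
    size = step(size)
after_third_conv = size

# Extra Conv2D + MaxPooling2D with the default 'valid' padding.
valid_then_pool = pool_size(conv_size(after_third_conv))

# The same pair with padding='same' on the convolution.
same_then_pool = pool_size(conv_size(after_third_conv, padding="same"))

print(after_third_conv, valid_then_pool, same_then_pool)
```

A width of zero means there is nothing left to pool, so with the default padding the extra pair does not fit after the third convolution of this model.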

Summary

In this episode, we explained how to implement a CNN (Convolutional Neural Network) using Keras. CNNs are highly effective for image recognition and classification because of their structure, which extracts hierarchical features from image data. By learning the basic construction, you’ve built a solid foundation for applying these skills to other datasets and models.

Next Episode Preview

Next time, we will discuss the implementation of RNNs (Recurrent Neural Networks). RNNs are useful for time series data and natural language processing. We’ll explore their features and basic implementation!


Annotations

  • Convolutional Layer: Applies filters to images to extract features.
  • Pooling Layer: Downsamples feature maps, reducing computation.
Author of this article

PROMPT Inc. provides a variety of information related to generative AI.
If there is a topic you would like us to write an article about or research, please contact us using the inquiry form.
