
[AI from Scratch] Episode 237: Environment Setup with Docker — Ensuring Reproducibility Using Container Technology


Recap and Today’s Theme

Hello! In the previous episode, we discussed running models in cloud environments such as AWS and GCP, and saw how cloud services let us build scalable, reliable systems capable of handling large traffic volumes.

Today, we will explore Docker, a powerful tool for ensuring the reproducibility of development and production environments. Docker minimizes errors caused by environmental differences, significantly enhancing development and operational efficiency. This episode explains the basics of environment setup using Docker and demonstrates how to run a machine learning model using Docker containers.

What Is Docker?

Docker is an open-source platform for creating, deploying, and managing containers that run applications. It allows packaging an application and its dependencies into a self-contained environment, ensuring that it runs consistently across different platforms.

Benefits of Docker

  1. Reproducibility: By creating an environment within a container, the same settings can be used across development, testing, and production environments.
  2. Dependency Management: Docker packages the application and its dependencies together, preventing version conflicts or installation errors.
  3. Portability: Docker images can run on any platform, ensuring high compatibility.
  4. Lightweight and Fast: Unlike virtual machines, containers share the host OS kernel, resulting in faster startup times and lower resource usage.

Basics of Environment Setup with Docker

Let’s go through the basic steps of environment setup using Docker.

1. Installing Docker

First, download and install Docker from the official Docker website, following the installation instructions for your operating system.

2. Creating a Dockerfile

A Dockerfile is like a blueprint for building Docker images. It specifies the application dependencies and configurations needed to create a reproducible environment. Below is an example Dockerfile for a simple Python application.

Dockerfile

# Specify the base image
FROM python:3.9-slim

# Set the working directory
WORKDIR /app

# Copy the dependency file
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy the application code
COPY . .

# Define the command to run the application
# Define the command to run the application
CMD ["python", "app.py"]

  • FROM: Specifies the base image (a lightweight version of Python 3.9).
  • WORKDIR: Sets the working directory inside the container.
  • COPY: Copies local files into the container.
  • RUN: Executes commands like installing dependencies.
  • CMD: Specifies the command to run when the container starts (e.g., running the application).
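
Because COPY . . copies everything in the build context into the image, it is worth excluding files the application does not need. Below is a minimal .dockerignore sketch placed next to the Dockerfile; the entries are illustrative assumptions, so adjust them to your project:

```
# .dockerignore — keep these out of the build context
__pycache__/
*.pyc
.git/
.venv/
```

This both shrinks the image and avoids accidentally baking local artifacts (virtual environments, version-control history) into it.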

3. Building the Docker Image

Once the Dockerfile is ready, build the Docker image using the following command:

docker build -t my-python-app .

  • docker build: Builds the Docker image.
  • -t my-python-app: Tags the image with a name.
  • .: Specifies the directory containing the Dockerfile.

4. Running the Docker Container

After building the image, start a container based on the image:

docker run -p 5000:5000 my-python-app

  • docker run: Runs the Docker container.
  • -p 5000:5000: Binds the local port 5000 to the container’s port 5000.
  • my-python-app: Specifies the image name to run.

This setup runs app.py, and the application becomes accessible at http://localhost:5000.

Running a Machine Learning Model with Docker

Let’s see how to run a machine learning model using Docker. We will create a Flask-based API and set up the environment using Docker.

1. Preparing the Required Files

Prepare the following files:

  1. app.py: The main code for the Flask application.
  2. requirements.txt: A file listing the Python library dependencies.
  3. Dockerfile: The Docker configuration file.

(1) app.py

This code uses Flask to serve a pre-trained model as an API:

from flask import Flask, request, jsonify
import tensorflow as tf
import numpy as np

app = Flask(__name__)

# Load the trained model
model = tf.keras.models.load_model('model.h5')

@app.route('/predict', methods=['POST'])
def predict():
    data = request.get_json()
    input_data = np.array(data['input']).reshape(1, -1)
    prediction = model.predict(input_data)
    predicted_class = int(np.argmax(prediction))
    return jsonify({'prediction': predicted_class})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
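
To see what the preprocessing and postprocessing steps inside predict() do, here is a small standalone sketch using NumPy only. No model is involved; the prediction array is a made-up stand-in for model.predict() output:

```python
import numpy as np

# A request body like {"input": [0.1, 0.4, 0.2]} arrives as a Python list.
data = {'input': [0.1, 0.4, 0.2]}

# reshape(1, -1) turns the flat list into a single-row batch,
# which is the input shape Keras models expect for one sample.
input_data = np.array(data['input']).reshape(1, -1)
print(input_data.shape)  # (1, 3)

# model.predict() returns class probabilities; argmax picks the
# most likely class index. Here we fake the model output.
prediction = np.array([[0.1, 0.7, 0.2]])
predicted_class = int(np.argmax(prediction))
print(predicted_class)  # 1
```

The int() conversion matters because NumPy integer types are not JSON-serializable, so jsonify would fail without it.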

(2) requirements.txt

List the necessary libraries:

flask
tensorflow
numpy
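
For stronger reproducibility, pin exact versions rather than listing bare package names; otherwise each image build may pull different releases. The version numbers below are purely illustrative — pin whatever your project actually uses (for example, via pip freeze):

```
flask==2.3.3
tensorflow==2.13.0
numpy==1.24.3
```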

(3) Dockerfile

Use this Dockerfile to build an environment containing Flask and TensorFlow:

FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "app.py"]

2. Building and Running the Docker Container

Build the Docker image and run the container:

docker build -t flask-ml-app .
docker run -p 5000:5000 flask-ml-app

This setup launches the Flask application inside a Docker container, exposing the trained model as an API.
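
Once the container is running, the API can be exercised with any HTTP client. The snippet below builds the JSON body the /predict endpoint expects (the feature values are placeholders) and shows the equivalent curl command as a comment:

```python
import json

# The endpoint expects {"input": [...]} with one flat feature vector.
payload = json.dumps({'input': [5.1, 3.5, 1.4, 0.2]})
print(payload)

# Equivalent request from the command line:
# curl -X POST http://localhost:5000/predict \
#      -H "Content-Type: application/json" \
#      -d '{"input": [5.1, 3.5, 1.4, 0.2]}'
```

The server responds with a JSON object of the form {"prediction": <class_index>}.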

3. Managing Docker Containers

Docker provides commands for container management. Here are some basic examples:

  • List running containers: docker ps
  • Stop a container: docker stop [container_id]
  • Restart a container: docker restart [container_id]

Best Practices for Using Docker

  1. Use Lightweight Base Images: To keep the image size small, use lightweight versions like python:3.9-slim.
  2. Avoid Caching Pip Downloads: Use RUN pip install --no-cache-dir so pip's download cache is not stored in the image layer, keeping the image small.
  3. Multi-stage Builds: Use multi-stage builds to remove development dependencies from the production image.
  4. Environment Variables: Store sensitive information (e.g., API keys) as environment variables rather than hardcoding them in the Dockerfile.
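
As a sketch of practices 3 and 4, the multi-stage Dockerfile below installs dependencies in a builder stage and copies only the installed packages into the final image, while secrets are supplied at run time instead of being written into the Dockerfile. The stage name and install prefix are illustrative assumptions:

```
# --- Build stage: install dependencies into an isolated prefix ---
FROM python:3.9-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# --- Final stage: copy only what the application needs ---
FROM python:3.9-slim
WORKDIR /app
COPY --from=builder /install /usr/local
COPY . .
# Read secrets from the environment, not from the Dockerfile:
#   docker run -e API_KEY=... flask-ml-app
CMD ["python", "app.py"]
```

Build tools and intermediate files from the builder stage never reach the final image, which keeps it lean.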

Summary

In this episode, we covered environment setup with Docker, explaining how container technology ensures reproducibility and minimizes environmental discrepancies. Docker allows seamless deployment across development and production environments, making it efficient for developing and operating machine learning models. Use this foundation to further develop your applications!

Next Episode Preview

Next time, we will introduce Version Control with Git, explaining the basics of code management and collaborative development. Learn how to manage projects efficiently and collaborate with your team!


Notes

  • Container: A virtualized environment that packages an application and its dependencies.
  • Multi-stage Build: A method of building Docker images in multiple stages to keep only the necessary parts for the final image.

Author of this article

PROMPT Inc. provides a variety of information related to generative AI.
If there is a topic you would like us to write an article about or research, please contact us using the inquiry form.
