Classification and Regression (Learning AI from Scratch: Part 13)

Recap of Last Time and Today’s Topic

Hello! In the last session, we discussed labels (targets), the “correct answers” used in supervised learning that directly impact the accuracy of AI models. Today, we’ll explore the two main categories of AI prediction problems: classification and regression.

Classification assigns data to categories, while regression predicts numerical values. Both appear in many fields, and knowing which one a problem calls for shapes the choice of model and evaluation metric. Let’s dive into the differences and applications of classification and regression.

What Is Classification?

Identifying Categories

Classification is the task of assigning data to predefined categories. This can involve binary classification, where data is split into two categories, or multi-class classification, where data is divided into three or more categories.

For example, spam email filtering is a typical binary classification task, where emails are classified as either “spam” or “not spam.” Meanwhile, recognizing handwritten digits is an example of multi-class classification, where each digit from “0” to “9” is identified.

Classification Algorithms

Several algorithms are commonly used for classification tasks. Some of the most popular ones include:

  • Logistic Regression: Despite its name, logistic regression is used for classification problems. It is particularly useful for binary classification, predicting the probability that input data belongs to a certain class.
  • Support Vector Machine (SVM): SVM finds the boundary that separates classes with the largest possible margin. With a kernel function, it can implicitly map data points into a higher-dimensional space, which lets it handle data that is not linearly separable.
  • Decision Tree: This algorithm predicts classes by splitting data based on conditions, forming a tree structure. Its visual representation makes it easy to understand and interpret.
  • Random Forest: An ensemble learning method that uses multiple decision trees to make stable predictions. Each tree makes a prediction, and the final classification is based on majority voting.
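
To make the first item on this list concrete, here is a minimal sketch of logistic regression written in plain Python. It is an illustration under stated assumptions, not a production implementation: the toy data (counts of suspicious words per email), the learning rate, the number of epochs, and all function names are made up for this example. The model learns one weight and one bias by gradient descent, then outputs a probability that we turn into a class label with a 0.5 threshold.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit a weight w and a bias b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # predicted probability minus true label
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def predict_proba(x, w, b):
    """Return the estimated probability that x belongs to class 1."""
    return sigmoid(w * x + b)


# Hypothetical toy data: suspicious words in an email -> spam (1) or not spam (0).
suspicious_word_counts = [0, 1, 2, 3, 6, 7, 8, 9]
is_spam = [0, 0, 0, 0, 1, 1, 1, 1]

w, b = train_logistic_regression(suspicious_word_counts, is_spam)
for count in (1, 8):
    p = predict_proba(count, w, b)
    label = "spam" if p >= 0.5 else "not spam"
    print(f"{count} suspicious words -> P(spam) = {p:.2f} -> {label}")
```

In practice you would normally reach for a library implementation rather than writing the training loop by hand, but the loop shows what "predicting the probability that input data belongs to a certain class" means.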

Applications of Classification

Classification is widely used in various real-world scenarios, including:

Medical Diagnosis

In healthcare, classification models are commonly used to determine whether a patient has a certain disease. For instance, AI models can analyze X-ray images to classify whether a patient has pneumonia, distinguishing between “normal” and “abnormal” images.

Customer Segmentation

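In marketing, customers can be classified into predefined groups such as “potential customers,” “existing customers,” or “inactive customers.” This helps companies tailor their marketing strategies for different customer segments. Note that when the groups are not defined in advance and must be discovered from the data, the task is clustering rather than classification.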

What Is Regression?

Predicting Numerical Values

Regression is the task of predicting a continuous numerical value from input data. Examples include predicting house prices or stock prices.

Regression Algorithms

Regression tasks also utilize several algorithms. Some of the most well-known ones include:

  • Linear Regression: The most basic regression algorithm, which models the relationship between input data and the predicted value as a straight line (or, with several input features, a flat plane).
  • Ridge Regression: An extension of linear regression that adds an L2 regularization term, a penalty on the squared size of the coefficients, to prevent overfitting and improve the model’s generalizability.
  • Lasso Regression: Similar to ridge regression, but it uses an L1 regularization term, a penalty on the absolute size of the coefficients. This can shrink the coefficients of less useful features to exactly zero, so lasso regression also acts as a form of automatic feature selection.
  • Support Vector Machine (SVM): SVM can also be applied to regression problems, where it is known as support vector regression (SVR). With a kernel function, it implicitly maps data into a higher-dimensional space, which makes it particularly powerful for non-linear regression tasks.
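
The first two items on this list can be sketched in a few lines of plain Python. The example below is a minimal illustration under stated assumptions: it uses a single input feature, the toy data (house size and price) is invented, and the function names and the penalty value are hypothetical. With one feature, least squares has a simple closed-form solution, and ridge regression only changes the denominator of the slope.

```python
def fit_line(xs, ys, ridge_penalty=0.0):
    """Fit y = slope * x + intercept by least squares on one input feature.

    ridge_penalty = 0 gives ordinary linear regression. A positive value
    adds an L2 penalty on the slope (ridge), which shrinks it toward zero.
    The intercept is left unpenalized.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + ridge_penalty)
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Return the predicted numerical value for input x."""
    return slope * x + intercept


# Hypothetical toy data: house size in square meters -> price in thousands.
sizes = [50, 60, 80, 100, 120]
prices = [155, 178, 242, 297, 362]

slope, intercept = fit_line(sizes, prices)
ridge_slope, ridge_intercept = fit_line(sizes, prices, ridge_penalty=1000.0)

print(f"linear: price = {slope:.2f} * size + {intercept:.2f}")
print(f"ridge:  price = {ridge_slope:.2f} * size + {ridge_intercept:.2f}")
print(f"predicted price for 90 square meters: {predict(90, slope, intercept):.1f}")
```

The ridge slope comes out smaller than the plain least-squares slope, which is the shrinking effect of the regularization term. Lasso and SVR have no equally short closed form, so they are left out of this sketch.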

Applications of Regression

Regression is also widely applied in various fields. Here are a couple of examples:

House Price Prediction

Predicting house prices is a common regression problem. The model uses input data such as the size of the house, the number of rooms, and the location to estimate its price. These predictions can help guide real estate decisions.

Sales Forecasting

Sales forecasting is another example of a regression problem. By analyzing past sales data and market trends, companies can predict future sales. This helps in managing inventory and planning marketing strategies.

Differences Between Classification and Regression

Different Prediction Outcomes

The main difference between classification and regression lies in the type of prediction they generate. Classification assigns data to categories, while regression predicts continuous numerical values. For instance, predicting whether a customer will purchase a product is a classification task, while predicting the amount they will spend is a regression task.
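
The purchase example can be sketched in code. To keep the contrast as clear as possible, this sketch uses a k-nearest-neighbors approach, which is not one of the algorithms covered in this article; it was picked only because the same procedure handles both tasks. The customer records, the "minutes on site" feature, and the function names are all hypothetical. The only thing that changes between the two predictions is the target: a category label in one case, a number in the other.

```python
from collections import Counter

# Hypothetical customer records: (minutes on site, bought?, amount spent)
customers = [
    (2, "no", 0.0),
    (3, "no", 0.0),
    (5, "no", 0.0),
    (12, "yes", 40.0),
    (15, "yes", 55.0),
    (20, "yes", 80.0),
]


def nearest_neighbors(minutes, k=3):
    """Return the k records whose visit length is closest to `minutes`."""
    return sorted(customers, key=lambda record: abs(record[0] - minutes))[:k]


def predict_purchase(minutes, k=3):
    """Classification: majority vote over the neighbors' category labels."""
    votes = Counter(bought for _, bought, _ in nearest_neighbors(minutes, k))
    return votes.most_common(1)[0][0]


def predict_spend(minutes, k=3):
    """Regression: average of the neighbors' numerical targets."""
    neighbors = nearest_neighbors(minutes, k)
    return sum(spent for _, _, spent in neighbors) / len(neighbors)


print(predict_purchase(14))          # a category: "yes" or "no"
print(round(predict_spend(14), 2))   # a continuous number
```

The classifier can only ever return one of the labels it has seen, while the regressor can return any value between the amounts in the data.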

Model Evaluation

The evaluation metrics for classification and regression models are different. Classification models are evaluated with metrics such as accuracy, F1 score, and the area under the ROC curve (AUC), which are based on how often the predicted categories are correct. Regression models are evaluated with metrics such as mean squared error (MSE), mean absolute error (MAE), and R-squared, which are based on how far the predicted values fall from the actual values.
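
Here is a minimal sketch of how several of these metrics are computed, written as hand-made helper functions in plain Python rather than calls to any library. The labels and values are invented for illustration. Accuracy and F1 score compare predicted categories with true categories, while MSE, MAE, and R-squared compare predicted numbers with true numbers.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences; large errors are punished more."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences, in the same unit as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of the target's variance explained by the predictions."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical classification results (1 = spam, 0 = not spam).
true_labels = [1, 0, 1, 1, 0, 0, 1, 0]
pred_labels = [1, 0, 0, 1, 0, 1, 1, 0]
print("accuracy:", accuracy(true_labels, pred_labels))
print("F1 score:", f1_score(true_labels, pred_labels))

# Hypothetical regression results (for example, prices in thousands).
true_values = [100, 150, 200, 250]
pred_values = [110, 140, 210, 240]
print("MSE:", mean_squared_error(true_values, pred_values))
print("MAE:", mean_absolute_error(true_values, pred_values))
print("R-squared:", r_squared(true_values, pred_values))
```

For accuracy, F1 score, and R-squared, higher is better; for MSE and MAE, lower is better.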

Different Application Areas

Both classification and regression are used across many fields; which one applies depends on what is being predicted. Classification is common in tasks like medical diagnosis, spam filtering, and image recognition, where the answer is a category. Regression is common in tasks like financial forecasting, economic analysis, and supply-demand prediction, where the answer is a quantity.

The Future of Classification and Regression

Classification and regression are fundamental tasks in AI and machine learning and will continue to evolve. With the advancements in deep learning, more complex classification and regression models are being developed to address real-world challenges.

Techniques such as hybrid models and ensemble learning (the idea behind random forests) continue to be refined to further improve accuracy and generalizability. This should allow AI to make more precise predictions across various industries, broadening its range of applications.

Coming Up Next

Now that we have explored classification and regression, in the next session, we’ll dive into the issue of overfitting. Overfitting occurs when a model becomes too specialized to its training data, losing its ability to generalize. We will learn about its causes and how to address it.

Summary

In this session, we covered the two main categories of AI prediction problems: classification and regression. Classification involves categorizing data, while regression predicts continuous numerical values. Understanding the differences and applications of these methods provides valuable insights for designing AI models. Next time, we’ll explore the topic of overfitting in greater detail, so stay tuned!


Notes

  • Binary Classification: A classification task that divides data into two categories. For example, spam email filtering classifies emails as “spam” or “not spam.”
  • Multi-Class Classification: A classification task that divides data into three or more categories. For example, handwritten digit recognition classifies each digit as one of “0” to “9.”
  • Linear Regression: A basic regression algorithm that models the relationship between input data and the predicted value using a straight line.
Author of this article

PROMPT Inc. provides a variety of information related to generative AI.
If there is a topic you would like us to write an article about or research, please contact us using the inquiry form.
