Lesson 161: Coefficient of Determination (R²)

Recap: Mean Absolute Error (MAE)

In the previous lesson, we discussed Mean Absolute Error (MAE), a metric that averages the absolute differences between predicted and actual values. Because it does not square the errors, MAE is less sensitive to outliers than squared-error metrics and gives a stable picture of overall error. It suits tasks where occasional outliers should not dominate the evaluation, but it is a weaker choice when large errors need extra emphasis.

Today, we’ll cover another key evaluation metric for regression models: the Coefficient of Determination (R²), which measures how much of the variance in the data the model can explain.


What is the Coefficient of Determination (R²)?

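The Coefficient of Determination (R²) quantifies how well a regression model explains the variance in the data, indicating the model’s “explanatory power.” R² is at most 1, and for a model that does at least as well as always predicting the mean it falls between 0 and 1: values closer to 1 indicate higher explanatory power, and values closer to 0 indicate lower explanatory power. R² can also be negative when the model’s predictions are worse than simply predicting the mean of the actual values.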

R² is calculated using the following formula:

\[
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
\]

Where:

  • \(y_i\) is the actual value,
  • \(\hat{y}_i\) is the predicted value,
  • \(\bar{y}\) is the mean of all actual values.

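This formula compares the model’s squared prediction error (the numerator, also called the residual sum of squares) to the total squared variation of the actual values around their mean (the denominator, the total sum of squares). The smaller this ratio, the closer R² is to 1.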
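
The formula translates almost line for line into code. The following is a minimal sketch in plain Python; the function name `r_squared` and the small `actual` list are illustrative assumptions, not part of the lesson.

```python
def r_squared(y_true, y_pred):
    """Return R² = 1 - SS_res / SS_tot for two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    mean_y = sum(y_true) / len(y_true)
    # Numerator: sum of squared prediction errors (residuals).
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    # Denominator: total squared deviation of the actual values from their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all actual values are identical")
    return 1 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
print(r_squared(actual, actual))     # perfect predictions -> 1.0
print(r_squared(actual, [2.5] * 4))  # always predicting the mean -> 0.0
```

Predicting the mean for every point makes the numerator equal to the denominator, which is why that baseline scores exactly 0.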

Example: Understanding R²

R² can be compared to predicting “exam performance.” Imagine guessing every student’s exam score. The simplest guess is the class average for everyone. If your predictions land much closer to the actual scores than that guess does, R² is close to 1. If they are no better than guessing the class average, R² is close to 0.


Example Calculation of R²

Let’s calculate R² using a practical example.

Example: House Price Prediction Model

Consider a model predicting house prices with the following data:

  • Actual house prices: $300,000, $400,000, $500,000
  • Predicted house prices: $320,000, $390,000, $510,000

First, calculate the sum of squared errors:

  1. \((300,000 - 320,000)^2 = 400,000,000\)
  2. \((400,000 - 390,000)^2 = 100,000,000\)
  3. \((500,000 - 510,000)^2 = 100,000,000\)

The total sum of squared errors is 600,000,000.

Next, calculate the mean of the actual values:

\[
\bar{y} = \frac{300,000 + 400,000 + 500,000}{3} = 400,000
\]

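Now, calculate the total sum of squares, the squared deviation of each actual value from the mean:

  1. \((300,000 - 400,000)^2 = 10,000,000,000\)
  2. \((400,000 - 400,000)^2 = 0\)
  3. \((500,000 - 400,000)^2 = 10,000,000,000\)

The total sum of squares is 20,000,000,000.

Finally, calculate R²:

\[
R^2 = 1 - \frac{600,000,000}{20,000,000,000} = 1 - 0.03 = 0.97
\]

An R² value of 0.97 indicates that the model explains about 97% of the variance in these house prices, a strong fit on this small sample. Had the sum of squared errors exceeded the total sum of squares, R² would have been negative, meaning the predictions were worse than simply using the mean of the dataset.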
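
Recomputing the house price example in code is a useful check on the arithmetic, since squaring a deviation of 100,000 gives 10,000,000,000 and it is easy to drop zeros by hand. This is a small sketch in plain Python using the three prices from the example; the variable names are assumptions chosen for readability.

```python
actual = [300_000, 400_000, 500_000]
predicted = [320_000, 390_000, 510_000]

mean_actual = sum(actual) / len(actual)
# Sum of squared prediction errors (numerator of the R² ratio).
ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
# Total sum of squares around the mean (denominator of the R² ratio).
ss_tot = sum((y - mean_actual) ** 2 for y in actual)
r2 = 1 - ss_res / ss_tot

print(f"mean of actual prices: {mean_actual:,.0f}")  # 400,000
print(f"sum of squared errors: {ss_res:,.0f}")       # 600,000,000
print(f"total sum of squares:  {ss_tot:,.0f}")       # 20,000,000,000
print(f"R^2: {r2:.2f}")                              # 0.97
```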

When R² is Important

R² is crucial for evaluating the performance of regression models when the goal is to assess how much of the variance in the data the model can explain. A model with an R² value close to 1 has high explanatory power: its predictions track the actual values closely. Conversely, an R² value close to 0 suggests the model explains little more of the variance than always predicting the mean would.


Advantages and Disadvantages of R²

Advantages

  1. Clear Measure of Explanatory Power: R² provides a straightforward way to quantify the model’s explanatory power, making it easy to understand how much variance in the data is explained by the model.
  2. Model Comparison: R² is useful for comparing different regression models on the same dataset. Higher R² values indicate models that fit the data better.
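
Model comparison can be sketched as scoring each candidate on the same actual values and picking the higher R². The two sets of predictions below, labeled `model_a` and `model_b`, are made-up numbers for illustration only, not output from real models.

```python
def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, computed in plain Python."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


actual = [10, 12, 15, 18, 20]

# Hypothetical predictions from two candidate models on the same data.
candidates = {
    "model_a": [11, 12, 14, 18, 21],
    "model_b": [13, 13, 15, 16, 17],
}

scores = {name: r_squared(actual, preds) for name, preds in candidates.items()}
best = max(scores, key=scores.get)

for name, score in scores.items():
    print(f"{name}: R^2 = {score:.3f}")
print(f"better fit on this dataset: {best}")
```

The comparison is only meaningful because both models are scored against the same actual values; R² values computed on different datasets are not directly comparable.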

Disadvantages

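  1. Risk of Overfitting: A very high R² on the training data may indicate that the model is overfitting, which can lead to poor generalization on new data. Checking R² on held-out data helps reveal this.
  2. Less Reliable for Non-Linear Models: The interpretation of R² as the “share of variance explained” holds for linear regression models fitted by least squares with an intercept. For models with non-linear relationships, R² can still be computed but may not accurately reflect performance.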

Example: Understanding R²’s Limitations

The limitations of R² can be likened to a “sports team’s performance.” A team might perform exceptionally well at home games but struggle during away games. Similarly, a high R² value might indicate that the model is overly tuned to the training data (home games) but fails to generalize to new data (away games), highlighting the risk of overfitting.
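
The home-versus-away gap can be illustrated with a deliberately extreme model. The sketch below assumes a hypothetical "memorizing" predictor that simply copies the target of the nearest training point, and the data values are invented for illustration. It scores a perfect R² on the training data while doing worse on points it has not seen.

```python
def r_squared(y_true, y_pred):
    """R² = 1 - SS_res / SS_tot, computed in plain Python."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def memorizing_model(x_train, y_train):
    """Return a predictor that copies the target of the nearest training point."""
    def predict(x):
        nearest = min(range(len(x_train)), key=lambda i: abs(x_train[i] - x))
        return y_train[nearest]
    return predict


# Invented data: roughly y = 2x, with noise in the training targets.
x_train = [1, 2, 3, 4, 5, 6]
y_train = [2.1, 3.9, 6.3, 7.8, 10.4, 11.7]
x_test = [1.4, 2.6, 3.4, 4.6, 5.4]
y_test = [2.8, 5.2, 6.8, 9.2, 10.8]

predict = memorizing_model(x_train, y_train)
train_r2 = r_squared(y_train, [predict(x) for x in x_train])
test_r2 = r_squared(y_test, [predict(x) for x in x_test])

print(f"train R^2: {train_r2:.3f}")  # 1.000, every training point is reproduced exactly
print(f"test R^2:  {test_r2:.3f}")   # lower than the training score
```

A training R² of exactly 1 says nothing here about how the model behaves on new inputs, which is why the score on held-out data is the more informative number.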


Applications of R²

R² is widely used in regression analysis, particularly in the following scenarios:

  1. House Price Prediction: Evaluates how much the model explains fluctuations in house prices.
  2. Stock Price Forecasting: Measures the model’s ability to explain past stock price variability.
  3. Sales Forecasting: Assesses how well the model explains sales trends and predicts future sales.

Summary

In this lesson, we covered the Coefficient of Determination (R²), an important metric that evaluates the explanatory power of regression models. R² indicates how much of the variance in the data the model can explain, making it a widely used tool for measuring the performance of linear regression models. However, a very high R² on the training data may suggest overfitting, and R² can fall below 0 when a model does worse than predicting the mean, so it is essential to use other metrics alongside it for a comprehensive evaluation.


Next Topic: Analyzing Learning Curves

In the next lesson, we will explore Learning Curves, which visualize the training process and provide insights into how the model is learning. Stay tuned!


Notes

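  1. Coefficient of Determination (R²): A metric indicating how much variance in the data is explained by the model. It is at most 1 and usually falls between 0 and 1, with higher values indicating greater explanatory power; it can be negative when the model performs worse than predicting the mean.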
  2. Overfitting: When a model fits the training data too closely, leading to poor performance on new, unseen data.
  3. Linear Regression Model: A regression model that predicts outcomes based on linear relationships between variables.
  4. Variance: A measure of how much data points deviate from the mean.
  5. Outliers: Data points significantly different from others in the dataset.

Author of this article

PROMPT Inc. provides a variety of information related to generative AI.
If there is a topic you would like us to write an article about or research, please contact us using the inquiry form.
