[AI from Scratch] Episode 218: Advanced Visualization with Seaborn — Creating Beautiful Graphs

Recap and Today’s Theme

Hello! In the previous episode, we explored the basics of data visualization using Python’s Matplotlib library. We learned how to create various graphs, such as line plots, bar charts, and scatter plots, to visualize data trends.

Today, we will use Seaborn, a library built on top of Matplotlib, to create more advanced and polished visualizations. Seaborn offers many features tailored to data analysis and statistical visualization, so attractive graphs take only a few lines of code. Let’s dive into Seaborn’s basic operations!

What Is Seaborn?

Seaborn is a Python data visualization library particularly suited for visualizing statistical data. It has several key features:

  1. High-Quality Default Styles: Beautiful graph styles are set by default, allowing you to create visually appealing graphs easily.
  2. Compatibility with Pandas: You can directly use Pandas DataFrames to create graphs, making the visualization process efficient.
  3. Advanced Visualization Functions: Offers specialized plots such as heatmaps (for example, of correlation matrices), pair plots, and categorical plots, which are well suited to statistical data analysis.

Installing and Setting Up Seaborn

Seaborn is often pre-installed in Anaconda environments, but you can manually install it using the following command:

pip install seaborn

Importing Seaborn

It is common practice to import Seaborn with the alias sns.

import seaborn as sns
import matplotlib.pyplot as plt

Matplotlib’s pyplot is imported as well because Seaborn draws onto Matplotlib figures; here it is used to set titles and display the graphs.
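
To confirm that the installation and import work, a quick check like the following can help. It is only a sketch: it imports the library and prints its version, and the exact version string depends on your environment.

```python
import seaborn as sns

# Print the installed Seaborn version; the value depends on your environment
print(sns.__version__)
```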

Basic Graph Creation with Seaborn

1. Line Plot

Let’s create a line plot using Seaborn. The lineplot function allows you to easily draw line graphs.

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Preparing data
x = np.linspace(0, 10, 100)
y = np.sin(x)

# Storing data in a DataFrame
data = pd.DataFrame({'x': x, 'y': y})

# Drawing the line plot
sns.lineplot(x='x', y='y', data=data)
plt.title("Sine Wave Line Plot")
plt.show()
  • sns.lineplot(): Seaborn function that creates a line plot directly from a DataFrame.

2. Scatter Plot

You can create scatter plots to visualize the relationship between two variables using the scatterplot function.

# Loading a sample dataset
tips = sns.load_dataset('tips')

# Drawing the scatter plot
sns.scatterplot(x='total_bill', y='tip', data=tips)
plt.title("Total Bill vs Tip Scatter Plot")
plt.show()
  • sns.scatterplot(): Draws a scatter plot using a Pandas DataFrame, as shown with the tips dataset.
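
A scatter plot can also encode additional columns through the hue and size parameters. The sketch below uses a tiny made-up table (the sample DataFrame, with illustrative values only) in place of the tips dataset so that it runs without downloading anything.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up rows that mimic a few columns of the tips dataset (illustrative values only)
sample = pd.DataFrame({
    'total_bill': [10.5, 21.0, 15.2, 30.4, 18.9, 25.7],
    'tip':        [1.5, 3.5, 2.0, 5.0, 3.0, 4.2],
    'time':       ['Lunch', 'Dinner', 'Lunch', 'Dinner', 'Lunch', 'Dinner'],
    'size':       [2, 3, 2, 4, 2, 3],
})

# hue colors the points by category; size scales the markers by a numeric column
ax = sns.scatterplot(x='total_bill', y='tip', hue='time', size='size', data=sample)
ax.set_title("Tip vs Total Bill by Time and Party Size")
plt.show()
```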

3. Bar Plot

Use the barplot function to compare categorical data.

# Drawing a bar plot
sns.barplot(x='day', y='total_bill', data=tips)
plt.title("Average Total Bill per Day")
plt.show()
  • sns.barplot(): Draws one bar per category showing an aggregate of the values, which is the mean by default, with error bars indicating the uncertainty of that estimate. Here it shows the average total_bill per day in the tips dataset.
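
To see what the bars represent, the sketch below computes the per-category means with pandas and then draws the same data with barplot. The sample DataFrame holds made-up values chosen so the means are easy to check by hand.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up data: two bills per day (illustrative values only)
sample = pd.DataFrame({
    'day': ['Thur', 'Thur', 'Fri', 'Fri', 'Sat', 'Sat'],
    'total_bill': [10.0, 20.0, 12.0, 18.0, 25.0, 35.0],
})

# The per-category means that barplot draws by default
means = sample.groupby('day', sort=False)['total_bill'].mean()
print(means)

# Bar heights correspond to the means computed above
ax = sns.barplot(x='day', y='total_bill', data=sample)
ax.set_title("Average Total Bill per Day (sample data)")
plt.show()
```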

4. Histogram

The histplot function is effective for visualizing the distribution of data.

# Drawing a histogram
sns.histplot(tips['total_bill'], bins=20, kde=True)
plt.title("Total Bill Histogram with KDE")
plt.show()
  • sns.histplot(): Creates a histogram. bins=20 divides the data range into 20 intervals, and kde=True overlays a Kernel Density Estimate (KDE), a smooth curve that approximates the shape of the distribution.

Advanced Visualization Features in Seaborn

1. Pair Plot

The pairplot function is useful for visualizing relationships between multiple variables at once.

# Drawing a pair plot
sns.pairplot(tips)
plt.show()
  • sns.pairplot(): Draws a grid with a scatter plot for each pair of numeric variables in a DataFrame and a histogram of each variable on the diagonal, making it easy to explore variable relationships.

2. Heatmap

The heatmap function is ideal for visualizing correlation matrices or 2D data.

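# Creating a correlation matrix from the numeric columns only
# (tips also has categorical columns, which corr() cannot process in pandas 2.0+)
corr = tips.corr(numeric_only=True)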

# Drawing the heatmap
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap")
plt.show()
  • sns.heatmap(): Displays the correlation matrix as a heatmap. The annot=True parameter shows the values of each cell, and cmap specifies the colormap.
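
The sketch below shows one way to prepare a correlation matrix when a DataFrame mixes numeric and text columns, and fixes the color scale to the range of a correlation coefficient. The sample DataFrame contains made-up values, and the vmin and vmax settings are a suggestion rather than part of the original example.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up rows with both numeric and text columns (illustrative values only)
sample = pd.DataFrame({
    'total_bill': [10.5, 21.0, 15.2, 30.4, 18.9, 25.7],
    'tip':        [1.5, 3.5, 2.0, 5.0, 3.0, 4.2],
    'size':       [2, 3, 2, 4, 2, 3],
    'day':        ['Thur', 'Fri', 'Sat', 'Sun', 'Sat', 'Sun'],
})

# Keep only the numeric columns before computing Pearson correlations
corr = sample.select_dtypes(include='number').corr()

# vmin/vmax pin the colors to [-1, 1], so 0 maps to the neutral midpoint of coolwarm
ax = sns.heatmap(corr, annot=True, cmap='coolwarm', vmin=-1, vmax=1)
ax.set_title("Correlation Heatmap (sample data)")
plt.show()
```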

3. Boxplot

To check data distribution and outliers, use the boxplot function.

# Drawing a boxplot
sns.boxplot(x='day', y='total_bill', data=tips)
plt.title("Boxplot of Total Bill by Day")
plt.show()
  • sns.boxplot(): Visually displays the median, quartiles, and outliers of data, showing the distribution of total_bill by day.
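
The numbers behind a boxplot can be reproduced with pandas. The sketch below uses a made-up series of bills and applies the 1.5 × IQR rule, which is the default whisker setting (whis=1.5) in sns.boxplot, to find the points that would be drawn as outliers.

```python
import pandas as pd

# Made-up bills with one unusually large value (illustrative values only)
bills = pd.Series([10, 12, 14, 15, 16, 18, 20, 50], name='total_bill')

# The box spans the first to third quartile; the line inside is the median
q1, median, q3 = bills.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1
print(q1, median, q3)  # 13.5 15.5 18.5

# Points beyond 1.5 * IQR from the box are treated as outliers
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = bills[(bills < lower) | (bills > upper)]
print(outliers.tolist())  # [50]
```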

4. Count Plot

To visualize the frequency of categorical variables, use the countplot function.

# Drawing a count plot
sns.countplot(x='day', data=tips)
plt.title("Count of Days")
plt.show()
  • sns.countplot(): Displays the number of occurrences for each category, making it easy to see the frequency distribution.
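
The counts that countplot draws correspond to what pandas reports with value_counts. In the sketch below, the days DataFrame is made up for illustration, and the order parameter is used to fix the order of the bars.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Made-up category column (illustrative values only)
days = pd.DataFrame({'day': ['Thur', 'Fri', 'Sat', 'Sat', 'Sun', 'Sun', 'Sun']})

# Number of rows per category, the same quantity countplot displays
counts = days['day'].value_counts()
print(counts)

# order= controls the left-to-right order of the bars
ax = sns.countplot(x='day', data=days, order=['Thur', 'Fri', 'Sat', 'Sun'])
ax.set_title("Number of Records per Day (sample data)")
plt.show()
```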

Customizing Seaborn Graphs

1. Changing Styles

Seaborn allows you to change graph styles using the set_style function.

# Setting the style
sns.set_style('whitegrid')

# Drawing a bar plot
sns.barplot(x='day', y='total_bill', data=tips)
plt.title("Styled Bar Plot")
plt.show()
  • sns.set_style(): Changes the overall graph style for plots drawn afterwards. The built-in options are darkgrid, whitegrid, dark, white, and ticks.

2. Setting a Color Palette

You can differentiate data using color palettes to make your graphs more visually appealing.

# Setting the color palette
sns.set_palette('pastel')

# Drawing a bar plot
sns.barplot(x='day', y='total_bill', hue='sex', data=tips)
plt.title("Bar Plot with Pastel Palette")
plt.show()
  • sns.set_palette(): Sets the default color palette for subsequent plots. Seaborn provides various built-in palettes, such as deep, muted, pastel, bright, dark, and colorblind.
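
To inspect which colors a palette contains, sns.color_palette returns them as a list of RGB tuples. The sketch below is only an illustration; the variable name pastel is arbitrary.

```python
import seaborn as sns

# Fetch the first three colors of the pastel palette as RGB tuples
pastel = sns.color_palette('pastel', n_colors=3)
print(len(pastel))

# The same colors as hex strings, convenient for reuse elsewhere
print(pastel.as_hex())
```

When a plot uses hue, a palette can also be applied to that plot alone by passing palette='pastel' to the plotting function instead of changing the global default.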

Summary

In this episode, we explored advanced data visualization techniques using Seaborn. With Seaborn, you can easily create visually appealing graphs that enhance data understanding and communication. From basic plots to advanced visualizations like heatmaps and boxplots, Seaborn offers a wide array of tools for deep data exploration.

Next Episode Preview

Next time, we will discuss the basics of Scikit-learn, a library for machine learning. We’ll learn how to load data, preprocess it, and build basic models, taking our first steps in AI development!


Annotations

  • Pair Plot: A method for visualizing the relationships and distributions of multiple variables in a dataset simultaneously.
  • Heatmap: A graph that visualizes correlations or 2D data using color.

Author of this article

PROMPT Inc. provides a variety of information related to generative AI.
If there is a topic you would like us to write an article about or research, please contact us using the inquiry form.
