Ultimate Guide to Data Visualization in Python: Exploring the Top 3 Libraries

Quick and Easy Data Visualizations in Python for Unlocking the Power of Your Data


If you want to learn data visualization in Python in one place, then you’ll LOVE this guide.

Have you ever struggled to make sense of a large dataset? Data visualization can be your secret weapon!

Data visualization can help make sense of large, high-dimensional datasets and facilitate clearer understanding, particularly during the Exploratory Data Analysis (EDA) phase of a project.

When it comes to presenting final results to non-technical audiences, it’s important to be able to communicate findings in a concise and compelling manner.

By turning data into pictures, you can quickly and easily understand trends, patterns, and relationships in your data.

However, setting up the data, parameters, and figures before plotting can become cumbersome.

Python’s data visualization libraries make the plotting itself easy, although the setup process can still be a bit tricky.

In this blog post, we’ll show you how to use Matplotlib, Seaborn, and Plotly, three of the most popular Python libraries for data visualization, to create quick and easy visualizations that will help you extract insights from your data.

Don’t let a confusing dataset stand in the way of your understanding – learn how to visualize your data and unlock its full potential!

Whether you’re a beginner looking to get started with data visualization in Python or an experienced data scientist looking to add new tools to your toolkit, this tutorial has something for you.


Let’s get started!



The datasets used for visualization

We use two Kaggle datasets for the Matplotlib examples and the tips dataset for the Seaborn and Plotly examples.

1. Chocolate bar ratings 2022 Dataset

This dataset was scraped from Flavors of Cacao. It contains chocolate reviews from 2006 to 2022.

There are ten columns in the dataset as follows:

  • REF (reference number): Higher REF numbers belong to more recent entries. The values are not unique.
  • Company name or manufacturer
  • Company location (Country)
  • Date of review of the chocolate ratings
  • Origin of bean (Country)
  • Specific bean origin or bar name
  • Cocoa percent
  • Ingredients: The number of ingredients in the chocolate and their codes (B = Beans, S = Sugar, S* = Sweetener other than white cane or beet sugar, C = Cocoa Butter, V = Vanilla, L = Lecithin, Sa = Salt)
  • Most memorable characteristics
  • Rating: The ratings are between 1 and 5, where 5 is considered the highest and 1 is the lowest.
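The ingredient codes listed above can be turned into readable names with a small lookup table. The sketch below is only an illustration: the helper name `decode_ingredients` is hypothetical, and the `"3- B,S,C"` layout (count, dash, comma-separated codes) is an assumption about how the raw column is written, so check a few real rows before relying on it.

```python
# Ingredient codes as listed in the column description above.
INGREDIENT_CODES = {
    "B": "Beans",
    "S": "Sugar",
    "S*": "Sweetener other than white cane or beet sugar",
    "C": "Cocoa Butter",
    "V": "Vanilla",
    "L": "Lecithin",
    "Sa": "Salt",
}


def decode_ingredients(value):
    """Split a string such as '3- B,S,C' into (count, [ingredient names]).

    Hypothetical helper: the '<count>- <codes>' layout is an assumption
    about the raw CSV, not something the dataset description guarantees.
    """
    count_part, _, codes_part = value.partition("-")
    codes = [code.strip() for code in codes_part.split(",") if code.strip()]
    return int(count_part), [INGREDIENT_CODES[code] for code in codes]


print(decode_ingredients("3- B,S,C"))
```

A lookup dictionary keeps the mapping in one place, so an unknown code raises a `KeyError` instead of being silently skipped.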

You can download the chocolate bar ratings dataset from here.

import pandas as pd

# reading the database
df = pd.read_csv("chocolate_bar_ratings.csv")

# printing the first 5 rows
df.head()
[Image: first five rows of the chocolate bar ratings dataset]

2. Alcohol Consumption around the World

This data was collected from the World Health Organisation (WHO) and the Global Information System on Alcohol and Health (GISAH).

The dataset has the following five columns:

  • country
  • beer_servings
  • spirit_servings
  • wine_servings
  • total_litres_of_pure_alcohol

You can download the alcohol consumption dataset from here.

import pandas as pd

# reading the database
df = pd.read_csv("drinks.csv")

# printing the first 5 rows
df.head()
[Image: first five rows of the alcohol consumption dataset]

Visualization With Matplotlib

In this section, we’ll show you how to use Matplotlib, one of the most popular Python libraries, to create quick and easy data visualizations.

Matplotlib Python Install

To install the Matplotlib library for Python, you will need to have pip, the package manager for Python, installed on your system.

Matplotlib provides several color schemes that you can use to style your plots. You can check different color schemes here.

Once you have pip installed, you can use pip to install Matplotlib by running the following command in your terminal/command prompt:

pip install matplotlib

This will install the latest version of Matplotlib and all required dependencies.


Alternatively, you can install a specific version of Matplotlib by specifying the version number in the command:

pip install matplotlib==3.5.1

This will install version 3.5.1 of Matplotlib.

If you are using Jupyter Notebook, you can install Matplotlib directly from a notebook cell.

!pip install matplotlib
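After installing, it can be worth confirming which version Python actually imports, especially if you pinned a specific release. A minimal check, assuming Matplotlib was installed into the same environment you run this from:

```python
import matplotlib

# Show which version of Matplotlib this Python environment imports.
print(matplotlib.__version__)
```

If the import fails, pip most likely installed the package into a different Python environment than the one running your script or notebook.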

Matplotlib Scatter Plots in Python

Scatter plots are a type of data visualization that can be used to show the relationship between two variables.

In a scatter plot, each data point is represented by a dot, and the position of the dot on the x-axis and y-axis corresponds to the values of the two variables.

Matplotlib draws scatter plots with the scatter() method. Here’s an example of how you can use it:

import pandas as pd
import matplotlib.pyplot as plt

# read the dataset
df = pd.read_csv("chocolate_bar_ratings.csv")

# strip the trailing % symbol and convert the values to float
df['Cocoa Percent'] = df['Cocoa Percent'].str.rstrip('%').astype(float)

# scatter plot of rating (x-axis) against cocoa percent (y-axis)
plt.scatter(df['Rating'], df['Cocoa Percent'])

# add a title to the plot
plt.title("Matplotlib Scatter Plot")

# set the x and y labels
plt.xlabel("Rating")
plt.ylabel("Cocoa Percent")

plt.show()

Output –

[Figure: Matplotlib scatter plot of rating against cocoa percent]
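The example above assumes every Cocoa Percent value is a string ending in %. If some rows turn out to be missing or malformed, the float conversion will raise an error. A more defensive sketch uses `pd.to_numeric` with `errors="coerce"`, which turns unparseable values into NaN. The sample values below are made up for illustration and are not rows from the real dataset.

```python
import pandas as pd

# Made-up sample values, including a missing and a malformed entry.
raw = pd.Series(["70%", "72.5%", None, "n/a"])

# Strip the trailing "%" and convert; anything that is not a number becomes NaN.
cleaned = pd.to_numeric(raw.str.rstrip("%"), errors="coerce")

print(cleaned)
```

Rows that end up as NaN can then be inspected or dropped with `dropna()` before plotting.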

To make this graph more useful, we can add color to the scatter plot. Pass an array of values to the c parameter of scatter() and then call plt.colorbar(), which is a shortcut for the colorbar() method of the Figure object.

This will create a scatter plot with a color bar on the right side of the plot. The color of each dot is set by the corresponding value in the c array, and the color bar shows the matching color scale.

We can also change the size of the points by using the s parameter of the scatter() function.

Here’s an example of how you can create a scatter plot with a color bar using Matplotlib:

import pandas as pd
import matplotlib.pyplot as plt

# read the dataset
df = pd.read_csv("chocolate_bar_ratings.csv")

# strip the trailing % symbol and convert the values to float
df['Cocoa Percent'] = df['Cocoa Percent'].str.rstrip('%').astype(float)

# scatter plot of rating against cocoa percent,
# colored by review date and sized by cocoa percent
plt.scatter(df['Rating'], df['Cocoa Percent'],
            c=df['Review Date'], s=df['Cocoa Percent'])

# add a title to the plot
plt.title("Matplotlib Scatter Plot")

# set the x and y labels
plt.xlabel("Rating")
plt.ylabel("Cocoa Percent")

# show the color bar
plt.colorbar()

plt.show()

Output –

[Figure: Matplotlib scatter plot with points colored by review date and a color bar]
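The same plot can also be built with Matplotlib's object-oriented interface, where the color bar is added through the Figure object directly. The sketch below uses randomly generated stand-in data (not the chocolate dataset) so it runs without the CSV, and it saves to a hypothetical file name `scatter_colorbar.png` instead of opening a window.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in data, for illustration only.
rng = np.random.default_rng(0)
rating = rng.uniform(1, 5, 50)
cocoa = rng.uniform(50, 100, 50)
year = rng.integers(2006, 2023, 50)

fig, ax = plt.subplots()

# c sets the color of each point, s sets its size (in points squared).
points = ax.scatter(rating, cocoa, c=year, s=cocoa, cmap="viridis")

ax.set_title("Matplotlib Scatter Plot")
ax.set_xlabel("Rating")
ax.set_ylabel("Cocoa Percent")

# Add the color bar through the Figure object and label it.
cbar = fig.colorbar(points, ax=ax)
cbar.set_label("Review Date")

fig.savefig("scatter_colorbar.png")
```

Keeping explicit `fig` and `ax` handles tends to pay off once a figure has more than one subplot, because each call states which axes it applies to.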