January 20, 2021

Politics and Commentary News Aggregator

A beginner’s guide to data visualization with Python and Seaborn

Data visualization is a technique that allows data scientists to convert raw data into charts and plots that generate valuable insights. Charts reduce the complexity of the data and make it easier to understand for any user.

There are many no-code tools for data visualization, such as Tableau, Power BI, and ChartBlocks. They are powerful and have their audience. However, when you work with raw data that needs transformation and you want a flexible playground to explore it, Python is an excellent choice.

Though more complicated as it requires programming knowledge, Python allows you to perform any manipulation, transformation, and visualization of your data. It is ideal for data scientists.

There are many reasons why Python is a popular choice for data science, but one of the most important is its ecosystem of libraries. Many great Python libraries are available for working with data, such as numpy, pandas, matplotlib, and tensorflow.

Matplotlib is probably the most widely recognized plotting library for Python. Its level of customization and control is what earned it that position. However, some actions and customizations can be cumbersome when using it directly.

To address this, developers created seaborn, a library built on top of matplotlib. Seaborn keeps the power of matplotlib while providing an abstraction that simplifies plotting and adds some unique features.

In this article, we will focus on how to work with Seaborn to create best-in-class plots. If you want to follow along, you can create your own project or simply check out my seaborn guide project on GitHub.

What is Seaborn?

Seaborn is a library for making statistical graphics in Python. It builds on top of matplotlib and integrates closely with pandas data structures.

Seaborn's design lets you explore and understand your data quickly. You pass it an entire data frame or array, and it handles the semantic mapping and statistical aggregation needed to turn that data into informative plots.

It abstracts complexity while allowing you to design your plots to your requirements.

Installing Seaborn

Installing seaborn takes a single command with your favorite Python package manager. The package manager also installs seaborn's dependencies, which at the time of writing include matplotlib, pandas, numpy, and scipy.

Let’s install seaborn along with the notebook package, which provides Jupyter Notebook as our data playground.

pipenv install seaborn notebook

Additionally, we are going to import a few modules before we get started.

import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib

Building your first plots

Before we can start plotting anything, we need data. The beauty of seaborn is that it works directly with pandas dataframes, which makes it very convenient. Even better, the library comes with some built-in datasets that you can load from code, with no need to download files manually.

Let’s see how that works by loading a dataset that contains information about flights.
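A minimal sketch of that loading step follows. It uses seaborn's load_dataset function with the "flights" example dataset and stores the result in flights_data, the name used in the plotting code later. Because load_dataset fetches the file over the network on first use, the sketch adds an offline fallback, which is an assumption of this example: a tiny stand-in data frame with the same three columns and made-up values.

```python
import pandas as pd
import seaborn as sns

try:
    # Downloads the example dataset on first use and caches it locally.
    flights_data = sns.load_dataset("flights")
except Exception:
    # Offline fallback (not part of seaborn): a hand-typed stand-in with the
    # same columns. The passenger counts are made up for illustration only.
    flights_data = pd.DataFrame(
        {
            "year": [1949, 1949, 1950, 1950],
            "month": ["Jan", "Feb", "Jan", "Feb"],
            "passengers": [100, 110, 120, 130],
        }
    )

# Peek at the first rows to confirm the columns: year, month, passengers.
print(flights_data.head())
```

In a notebook, leaving flights_data.head() as the last expression of a cell renders the same preview as a formatted table.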

Scatter Plot

A scatter plot is a diagram that displays points based on two dimensions of the dataset. Creating a scatter plot with Seaborn takes just one line of code.

sns.scatterplot(data=flights_data, x="year", y="passengers")
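The call returns a matplotlib Axes object, so the plot can be adjusted afterwards with regular matplotlib methods. The sketch below shows this with a title and axis labels. To stay self-contained it uses a hand-typed stand-in named flights_sample (a hypothetical name, with made-up values) instead of the real dataset.

```python
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for flights_data; values are made up for illustration.
flights_sample = pd.DataFrame(
    {
        "year": [1949, 1949, 1950, 1950, 1951, 1951],
        "passengers": [100, 110, 120, 130, 150, 160],
    }
)

# One point per row: year on the x-axis, passengers on the y-axis.
ax = sns.scatterplot(data=flights_sample, x="year", y="passengers")

# scatterplot returns a matplotlib Axes, so standard matplotlib calls apply.
ax.set_title("Monthly passengers by year")
ax.set_xlabel("Year")
ax.set_ylabel("Passengers")
```

In a notebook the figure renders below the cell. In a script, ax.figure.savefig("scatter.png") would write it to disk instead.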