Palmer’s Penguins Data Visualization

matplotlib
Author: Mario

Published: January 25, 2025

Introduction

In this post, we’ll walk through how to visualize the classic Palmer’s Penguins dataset with histograms of penguin body mass, split by island and sex.

Reading in the Data

First, we use pandas to read the data from a URL into a DataFrame.

import pandas as pd
url = "https://raw.githubusercontent.com/pic16b-ucla/24W/main/datasets/palmer_penguins.csv"
penguins = pd.read_csv(url) # read in data

islands = penguins["Island"].unique() # get a list of the different islands

penguins.head()
|   | studyName | Sample Number | Species | Region | Island | Stage | Individual ID | Clutch Completion | Date Egg | Culmen Length (mm) | Culmen Depth (mm) | Flipper Length (mm) | Body Mass (g) | Sex | Delta 15 N (o/oo) | Delta 13 C (o/oo) | Comments |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | PAL0708 | 1 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N1A1 | Yes | 11/11/07 | 39.1 | 18.7 | 181.0 | 3750.0 | MALE | NaN | NaN | Not enough blood for isotopes. |
| 1 | PAL0708 | 2 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N1A2 | Yes | 11/11/07 | 39.5 | 17.4 | 186.0 | 3800.0 | FEMALE | 8.94956 | -24.69454 | NaN |
| 2 | PAL0708 | 3 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N2A1 | Yes | 11/16/07 | 40.3 | 18.0 | 195.0 | 3250.0 | FEMALE | 8.36821 | -25.33302 | NaN |
| 3 | PAL0708 | 4 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N2A2 | Yes | 11/16/07 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | Adult not sampled. |
| 4 | PAL0708 | 5 | Adelie Penguin (Pygoscelis adeliae) | Anvers | Torgersen | Adult, 1 Egg Stage | N3A1 | Yes | 11/16/07 | 36.7 | 19.3 | 193.0 | 3450.0 | FEMALE | 8.76651 | -25.32426 | NaN |
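
Before plotting, it can help to check how many penguins fall into each island and sex group, and how many values are missing (row 3 above has no measurements at all). The sketch below uses a small hand-made DataFrame so that it runs without a download; `toy` and its values are made up for illustration and are not the real dataset. The same two calls should work on `penguins`.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the penguins DataFrame (illustrative values only)
toy = pd.DataFrame({
    "Island": ["Torgersen", "Torgersen", "Torgersen", "Biscoe", "Biscoe", "Dream"],
    "Sex": ["MALE", "FEMALE", np.nan, "MALE", "MALE", "FEMALE"],
    "Body Mass (g)": [3750.0, 3800.0, np.nan, 5000.0, 5200.0, 3400.0],
})

# Number of rows per (Island, Sex) pair; rows with a missing Sex are left out
counts = toy.groupby(["Island", "Sex"]).size()

# Number of missing values in the two columns used for the histograms
missing = toy[["Sex", "Body Mass (g)"]].isna().sum()

print(counts)
print(missing)
```

Group sizes like these give a rough upper bound on how tall any histogram bar can get, which is useful when choosing a shared y-axis limit.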

Plotting

Now, we plot our data using matplotlib.

import matplotlib.pyplot as plt # import pyplot 

After importing pyplot, we can build up to our final plot: a grid of histograms with one row per island, males on the left and females on the right. Let’s start with the make_hist function, which draws the two histograms for one row.

def make_hist(row, island):
    '''
    This function plots body mass histograms, one row of the figure at a time.
    Each row shows the data from a different island.
    It relies on the global penguins DataFrame and the global ax array of subplots.
    '''
    # rows that belong to the given island, used for both histograms
    on_island = penguins["Island"] == island

    # make histogram for males on the left from a specific island
    males = penguins.loc[on_island & (penguins["Sex"] == "MALE"), "Body Mass (g)"]
    ax[row, 0].hist(males.dropna())

    # set consistent limits of histograms
    ax[row, 0].set_xlim([2500, 6500])
    ax[row, 0].set_ylim([0, 25])

    # make histogram for females on the right from a specific island
    females = penguins.loc[on_island & (penguins["Sex"] == "FEMALE"), "Body Mass (g)"]
    ax[row, 1].hist(females.dropna(), color="orange")

    # set consistent limits of histograms
    ax[row, 1].set_xlim([2500, 6500])
    ax[row, 1].set_ylim([0, 25])

Setting the same limits on every histogram makes them directly comparable. Without consistent limits, each subplot would pick its own scale, and our sense of scale would be thrown off when comparing islands or sexes.
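
As an alternative to calling set_xlim and set_ylim on every subplot, matplotlib can link the axes when the grid is created. This is a minimal sketch, assuming a 3-by-2 grid like the one in this post; `fig_demo` and `ax_demo` are illustrative names, and the `Agg` backend line is only there so the snippet runs without a display (leave it out in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# sharex/sharey link the x and y limits of every subplot in the grid
fig_demo, ax_demo = plt.subplots(3, 2, sharex=True, sharey=True, figsize=(13, 17))

# Setting the limits on one subplot now applies to all of them
ax_demo[0, 0].set_xlim(2500, 6500)
ax_demo[0, 0].set_ylim(0, 25)

for axis in ax_demo.flat:
    print(axis.get_xlim(), axis.get_ylim())
```

One side effect to keep in mind: with shared axes, matplotlib hides the tick labels on the inner subplots by default, so the explicit limits used in this post may be preferable if we want every histogram to show its own ticks.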

Next, we will create a function to label everything and create our titles.

def set_text(row, island):
    '''
    This function labels the axes and titles both histograms in one row of our final figure
    '''
    # set the labels for the x-axis
    ax[row, 0].set_xlabel("Body Mass (g)")
    ax[row, 1].set_xlabel("Body Mass (g)")

    # set the labels for the y-axis
    ax[row, 0].set_ylabel("Number of Penguins")
    ax[row, 1].set_ylabel("Number of Penguins")

    # set the titles for the histograms
    ax[row, 0].set_title("Male Penguins from " + island)
    ax[row, 1].set_title("Female Penguins from " + island)
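
The labels and title of a subplot can also be set in a single call with `Axes.set`, which accepts `xlabel`, `ylabel`, and `title` as keyword arguments. A short sketch of that variation, using a standalone one-row figure; `fig_lbl`, `ax_lbl`, and the hard-coded island name are illustrative assumptions, and the `Agg` line is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt

# One row with two subplots: males on the left, females on the right
fig_lbl, ax_lbl = plt.subplots(1, 2)

for axis, sex in zip(ax_lbl, ["Male", "Female"]):
    # Axes.set applies several properties at once
    axis.set(
        xlabel="Body Mass (g)",
        ylabel="Number of Penguins",
        title=f"{sex} Penguins from Torgersen",
    )
```

Both styles produce the same labels; the single call simply keeps the text for each subplot in one place.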

Now we combine these two functions into a bigger function that makes a complete row when it is run.

def make_row(row, island):
    '''
    This function completely creates a row by 
    plotting the data and labeling it. 
    '''
    make_hist(row, island)
    set_text(row, island)

The last thing to do is to create our subplots, make a row for each island, and top it off by titling the whole figure.

# create one plot for males, one for females, and make a row for each island
fig, ax = plt.subplots(len(islands), 2, figsize=(13, 17))

for index, island in enumerate(islands):
    make_row(index, island) # make each row

fig.suptitle("Body Mass (g) of Palmer's Penguins") # set title of the whole figure
Text(0.5, 0.98, "Body Mass (g) of Palmer's Penguins")
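
The functions above read penguins and ax from the global scope, which is convenient in a notebook. As one possible variation (not the approach used in this post), the sketch below passes the axes and the data in explicitly, so the helper can be reused with other figures. `plot_island_row` and the `toy` DataFrame are hypothetical, with made-up values so that the snippet runs without a download; the `Agg` line is only needed outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_island_row(axes_row, data, island):
    """Hypothetical helper: draw male and female histograms for one island."""
    for axis, sex, color in zip(axes_row, ["MALE", "FEMALE"], ["tab:blue", "orange"]):
        # Keep only the rows for this sex and island, then drop missing masses
        mask = (data["Sex"] == sex) & (data["Island"] == island)
        axis.hist(data.loc[mask, "Body Mass (g)"].dropna(), color=color)

        # Same limits on every subplot so the histograms are comparable
        axis.set_xlim(2500, 6500)
        axis.set_ylim(0, 25)

        axis.set_xlabel("Body Mass (g)")
        axis.set_ylabel("Number of Penguins")
        axis.set_title(f"{sex.capitalize()} Penguins from {island}")


# Made-up stand-in for the penguins DataFrame (illustrative values only)
toy = pd.DataFrame({
    "Island": ["Torgersen"] * 4 + ["Biscoe"] * 4,
    "Sex": ["MALE", "MALE", "FEMALE", "FEMALE"] * 2,
    "Body Mass (g)": [3750.0, 4000.0, 3250.0, 3450.0, 5500.0, 5700.0, 4600.0, 4800.0],
})
toy_islands = toy["Island"].unique()

# squeeze=False keeps the axes array 2-D even if there is only one island
fig2, axes2 = plt.subplots(len(toy_islands), 2, figsize=(13, 17), squeeze=False)

for row, island in enumerate(toy_islands):
    plot_island_row(axes2[row], toy, island)

fig2.suptitle("Body Mass (g) of Palmer's Penguins")
```

To try this on the real data, we would pass penguins and islands in place of toy and toy_islands.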