Visualizing Categorical Data

Based on DataCamp

Introduction to categorical plots using seaborn

Categorical plot

import seaborn as sns
import matplotlib.pyplot as plt

sns.catplot(...)
plt.show()

The catplot function

Parameters

  • x: name of the variable in data.

  • y: name of the variable in data.

  • data: a DataFrame

  • kind: type of plot to create - one of ("strip", "swarm", "box", "violin", "boxen", "point", "bar", "count")
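As a minimal sketch of how these four parameters fit together, the snippet below builds a small made-up DataFrame and passes it to catplot. The `toy` frame and its `group` and `value` columns are illustrative assumptions, not part of the reviews dataset, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: one categorical column and one numeric column
toy = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c", "c"],
    "value": [1, 3, 2, 6, 4, 8],
})

# x and y are column names in data; kind selects the plot type
g = sns.catplot(x="group", y="value", data=toy, kind="box")

# catplot returns a FacetGrid; with no col/row it holds a single Axes
n_axes = len(g.axes.flat)
x_label = g.ax.get_xlabel()
plt.close("all")
```

Swapping kind="box" for any of the other values listed above changes the plot type without touching the rest of the call.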

Creating a box plot

# Set the font scale to 1.25
sns.set(font_scale=1.25)

# Set the background to "darkgrid"
sns.set_style("darkgrid")

# Create a boxplot
sns.catplot(x="Traveler type", y="Helpful votes", data=reviews, kind="box")

plt.show()

Seaborn barplot

The hue parameter

  • hue:

    • name of a variable in data

    • used to split the data into a second category

    • used to color the graphic

    sns.catplot(
        x="Traveler type", 
        y="Score", 
        data=reviews,
        kind="bar",
        hue="Tennis court" # <--- New parameter
    )

Creating a bar plot

# Print the frequency counts of "Period of stay"
print(reviews["Period of stay"].value_counts())

sns.set(font_scale=1.4)
sns.set_style("whitegrid")

# Create a bar plot of "Helpful votes" by "Period of stay"
sns.catplot(x="Period of stay", y="Helpful votes", data=reviews, kind="bar")
plt.show()

Ordering categories

# Set style
sns.set(font_scale=.9)
sns.set_style("whitegrid")

# Print the frequency counts for "User continent"
print(reviews["User continent"].value_counts())

# Convert "User continent" to a categorical variable
reviews["User continent"] = reviews["User continent"].astype("category")

# Reorder "User continent" using continent_categories and rerun the graphic
continent_categories = list(reviews["User continent"].value_counts().index)
reviews["User continent"] = reviews["User continent"].cat.reorder_categories(new_categories=continent_categories)
sns.catplot(x="User continent", y="Score", data=reviews, kind="bar")
plt.show()
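To see what reorder_categories does independently of the plot, here is a small sketch on made-up data. The `toy` frame and its `continent` values are hypothetical stand-ins for the reviews dataset, and the sketch assumes, as above, that the desired order is the descending frequency order returned by value_counts().

```python
import pandas as pd

# Hypothetical stand-in for the "User continent" column
toy = pd.DataFrame({
    "continent": ["Europe", "Asia", "Europe", "Africa", "Europe", "Asia"],
})

toy["continent"] = toy["continent"].astype("category")
before = list(toy["continent"].cat.categories)  # alphabetical by default

# value_counts() sorts by frequency, most common first
by_frequency = list(toy["continent"].value_counts().index)
toy["continent"] = toy["continent"].cat.reorder_categories(new_categories=by_frequency)
after = list(toy["continent"].cat.categories)

print(before)  # ['Africa', 'Asia', 'Europe']
print(after)   # ['Europe', 'Asia', 'Africa']
```

Only the category order changes; the values stored in the column stay the same.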

Bar plot using hue

#1
# Add a second category to split the data on: "Free internet"
sns.set(font_scale=2)
sns.set_style("darkgrid")
sns.catplot(x="Casino", y="Score", data=reviews, kind="bar", hue="Free internet")
plt.show()

#2
# Switch the x and hue categories
sns.set(font_scale=2)
sns.set_style("darkgrid")
sns.catplot(x="Free internet", y="Score", data=reviews, kind="bar", hue="Casino")
plt.show()

#3
# Update x to be "User continent"
sns.set(font_scale=2)
sns.set_style("darkgrid")
sns.catplot(x="User continent", y="Score", data=reviews, kind="bar", hue="Casino")
plt.show()

#4
# Lower the font size so that all text fits on the screen.
sns.set(font_scale=1.0)
sns.set_style("darkgrid")
sns.catplot(x="User continent", y="Score", data=reviews, kind="bar", hue="Casino")
plt.show()

Point and count plots

Point plot

A point plot helps users focus on how values differ across categories by adding a line that connects the points, and the y-axis range is adjusted to focus on those points rather than starting at zero as in a bar plot.
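By default, each point in a seaborn point plot is the mean of y within one x category, with error bars showing a confidence interval. The sketch below computes those means directly with pandas on made-up data; the `stars` and `n_reviews` columns are hypothetical stand-ins, not the reviews dataset.

```python
import pandas as pd

# Hypothetical toy data: a categorical rating and a numeric count
toy = pd.DataFrame({
    "stars": [3, 3, 4, 4, 5, 5],
    "n_reviews": [10, 20, 30, 50, 80, 100],
})

# The values a point plot would place on the y-axis for each x category
means = toy.groupby("stars")["n_reviews"].mean()
print(means.to_dict())  # {3: 15.0, 4: 40.0, 5: 90.0}
```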

Creating a point plot

# Create a point plot with catplot using "Hotel stars" and "Nr. reviews"
sns.catplot(
  # Split the data across Hotel stars and summarize Nr. reviews
  x='Hotel stars',
  y="Nr. reviews",
  data=reviews,
  # Specify a point plot
  kind="point",
  hue="Pool",
  # Make sure the lines and points don't overlap
  dodge=True
)
plt.show()

Creating a count plot

sns.set(font_scale=1.4)
sns.set_style("darkgrid")

# Create a catplot that will count the frequency of "Score" across "Traveler type"
sns.catplot(
  x="Score",
  data=reviews,
  kind="count",
  hue="Traveler type",
)
plt.show()
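The bar heights in a count plot with hue are the number of rows for each combination of the x and hue categories. As a rough check, the same tally can be sketched with pandas; the `toy` frame and its `score` and `traveler` columns are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for "Score" and "Traveler type"
toy = pd.DataFrame({
    "score": [4, 5, 5, 4, 5, 3],
    "traveler": ["Solo", "Solo", "Couples", "Couples", "Couples", "Solo"],
})

# Rows are x categories, columns are hue categories, cells are row counts
counts = pd.crosstab(toy["score"], toy["traveler"])
print(counts)
```

Each cell of the table corresponds to one bar in the equivalent count plot, and a cell of zero corresponds to a missing bar.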

Additional catplot() options

Difficulties with categorical plots

Trying to visualize multiple categories at once can be difficult. Instead of creating six separate plots, one for each continent, we can do better with the following steps:

  • Using the catplot() facetgrid

      sns.catplot(
          x="Traveler type",
          kind="count",
          data=reviews,
          col="User continent",
          col_wrap=3,
          palette=sns.color_palette("Set1")
      )
    

    Common palettes: "Set1", "Set2", "tab10", "Paired"

  • Updating plots

    • Setup: save the graphic returned by catplot() (a FacetGrid) as an object: ax

    • Plot title: ax.fig.suptitle("Super Title")

    • Axis labels: ax.set_axis_labels("x-axis-label", "y-axis-label")

    • Title height: plt.subplots_adjust(top=.9)
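The two steps above can be sketched end to end on made-up data. The `toy` frame, its `kind_of_trip` and `region` columns, and the label strings are all hypothetical; the returned FacetGrid (called ax in these notes) is named g here, and the Agg backend is used only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data with two categorical columns
toy = pd.DataFrame({
    "kind_of_trip": ["Solo", "Solo", "Family", "Family", "Solo", "Family"],
    "region": ["North", "North", "North", "South", "South", "South"],
})

# One count plot per region, wrapping after every 2nd panel
g = sns.catplot(x="kind_of_trip", kind="count", data=toy,
                col="region", col_wrap=2)

# Update the figure: title, axis labels, and room for the title
title = g.fig.suptitle("Trips per region")
g.set_axis_labels("Trip type", "Number of rows")
plt.subplots_adjust(top=0.85)

n_panels = len(g.axes.flat)
title_text = title.get_text()
first_xlabel = g.axes.flat[0].get_xlabel()
first_ylabel = g.axes.flat[0].get_ylabel()
plt.close("all")
```

The value passed to top is a judgment call: smaller values leave more space between the title and the panels.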

One visualization per group

# Create a catplot for each "Period of stay" broken down by "Review weekday"
ax = sns.catplot(
  # Make sure Review weekday is along the x-axis
  x="Review weekday",
  # Specify Period of stay as the column to create individual graphics for
  col="Period of stay",
  # Specify that a count plot should be created
  kind="count",
  # Wrap the plots after every 2nd graphic.
  col_wrap=2,
  data=reviews
)
plt.show()

Updating categorical plots

# Adjust the color
ax = sns.catplot(
  x="Free internet", y="Score",
  hue="Traveler type", kind="bar",
  data=reviews,
  palette=sns.color_palette("Set2")
)

# Add a title
ax.fig.suptitle("Hotel Score by Traveler Type and Free Internet Access")
# Update the axis labels
ax.set_axis_labels("Free Internet", "Average Review Rating")

# Adjust the starting height of the graphic
plt.subplots_adjust(top=0.93)
plt.show()