GRAPH SERIES: CIRCULAR BARPLOT

ME: Circular Barplot.

Hello, Letícia from Minha Estatística here :). Click here to go to the post in Portuguese.

This week for the Graph Series we have the Circular Barplot!

Circular barplots are an excellent choice for visualizing data when you want to emphasize cyclical patterns or relationships in a visually appealing way. They are particularly useful for representing time-based data, such as months or seasons, and datasets whose categories can be arranged around a circle; they are also often used for ranking data. By arranging the bars in a circular format, these plots can make patterns and trends easier to spot, while also adding an aesthetic dimension that can make your visualization stand out. However, it’s important to ensure the design remains clear and easy to interpret for your audience.

They might not carry as much statistical information as you would like a plot to have. As an example, the following charts illustrate the birthwt dataset from R's MASS library; they show how the data is divided into "Race" groups, but they don't give as much information as a boxplot or histogram. The following charts were created in Python and R, respectively.

While visually appealing, circular barplots make it more difficult to discern important data aspects, such as which bar is larger, and they lack the ability to display key statistical measures like medians or distributions. For these plots, it's important that the inner circle size is large enough; otherwise, the bars may appear asymmetric and skewed, inaccurately reflecting the observations or data.

The boxplot and histogram provide more useful statistics for analysis, such as the median and quartiles for the boxplot, and data distribution for the histogram. For deeper insights, combining circular bar plots with complementary plots like line graphs, histograms, or boxplots can offer a more comprehensive understanding and analysis of your data.

The circular barplot can be useful if you have cyclical data, i.e., if your data exhibits periodicity such as time of day, days of the week, months, seasons, and even compass directions. For example, it can be applied to analyze consumer spending patterns over time to identify peaks or, in business, to highlight which days have the highest or lowest productivity. Circular barplots excel at providing a quick and visually engaging overview of periodic trends!

To show how to create this plot, I chose the AirPassengers dataset, a time series containing the monthly number of airline passengers from 1949 to 1960. Now, let’s get started!

1) Plot in R

Start by loading the required libraries as well as the dataset:


# Load required libraries and AirPassengers dataset
library(ggplot2)
data("AirPassengers") # Time series dataset
    
      

Since this dataset is a time series, we need to convert it into a data frame and also make some adjustments to the variables, such as converting the "month" variable into a factor with the month names as labels.


# Convert dataset into a dataframe
airpassenger_data <- data.frame( 
  month = rep(1:12, length.out = length(AirPassengers)),
  passengers = as.numeric(AirPassengers)
)
# Prepare the data
airpassenger_data$month <- factor(airpassenger_data$month,
                                  levels = 1:12,
                                  labels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
                                             "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
  
  

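Now, to create the plot, the ggplot2 package will be used with its base function, ggplot, along with the geom_bar function. The argument stat = "identity" tells geom_bar to use the values in the data as the bar heights, while stat = "count" (the default) sets each bar to the number of occurrences of each x-value. Because the data frame has one row per month for each of the 12 years, the rows for a given month are stacked on top of each other, so the total height of each bar is the sum of that month's passengers across all years.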


ggplot(airpassenger_data, aes(x = month, y = passengers, fill = month)) +
  geom_bar(stat = "identity", color = "black", alpha = 0.7, width = 1) +
  coord_polar(start = 0) +  # Makes the plot circular
  labs(title = "Monthly Airline Passengers", 
       x = "Month", 
       y = "Number of Passengers", 
       fill = "Month") +
  theme_minimal() +
  theme(
    axis.title = element_blank(),  # Remove the axis titles
    plot.title = element_text(hjust = 0.5) # Center title
  )
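
To make the difference between the two stat settings concrete, here is a small plain-Python sketch. Python is used only because it is the other language in this post; this is not what ggplot2 runs internally. The rows and the helper names identity_heights and count_heights are made up for illustration. With stat = "identity" and the default stacking, the total bar height per category is the sum of its y-values; with stat = "count", it is the number of rows.

```python
from collections import Counter, defaultdict

# Made-up rows in the same shape as airpassenger_data: (month, passengers)
rows = [("Jan", 112), ("Feb", 118), ("Jan", 115), ("Feb", 126), ("Jan", 145)]

def identity_heights(rows):
    """Total stacked height per category: the sum of the y-values."""
    totals = defaultdict(int)
    for month, passengers in rows:
        totals[month] += passengers
    return dict(totals)

def count_heights(rows):
    """Bar height per category: how many rows fall in it."""
    return dict(Counter(month for month, _ in rows))

print(identity_heights(rows))  # {'Jan': 372, 'Feb': 244}
print(count_heights(rows))     # {'Jan': 3, 'Feb': 2}
```

For the full AirPassengers data, the count version would simply give 12 for every month, which is why the values themselves are plotted here.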
  
  

The coord_polar function transforms the plot into a polar coordinate system. By default, geom_bar generates plots in a Cartesian coordinate system (i.e., x and y axes forming a rectangular grid). Removing coord_polar() results in a regular bar chart instead of a circular one, as the next code and plot show. The geom_bar call stays the same apart from dropping width = 1.


ggplot(airpassenger_data, aes(x = month, y = passengers, fill = month)) +
  geom_bar(stat = "identity", color = "black", alpha = 0.7) +
  labs(title = "Monthly Airline Passengers", 
       x = "Month", 
       y = "Number of Passengers", 
       fill = "Month") +
  theme_minimal()
  
  

Now, to be more analytical, you can create a boxplot as well, simply by changing the geom_bar function to geom_boxplot. In this case there's no need to specify the stat argument, because geom_boxplot computes its own summary statistics from the raw values. Where the bar chart shows totals, the boxplot highlights the median for each category, as well as its quartiles, while also revealing outliers if present:


ggplot(airpassenger_data, aes(x = month, y = passengers, fill = month)) +
  geom_boxplot(color = "black", alpha = 0.7) +
  labs(title = "Monthly Airline Passengers", 
       x = "Month", 
       y = "Number of Passengers", 
       fill = "Month") +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5) # Center title
  )
  
  

Having explored how to create these visualizations in R, we can now shift to Python and learn how to generate the same plots using matplotlib for the circular barplot and seaborn for the bar chart and boxplot! So let’s walk through the process of replicating these plots in Python.

2) Plot in Python

To create the circular barplot in Python, three main libraries are needed: pandas, for importing, cleaning, and organizing the data into a DataFrame; numpy, for numerical operations such as generating arrays and working with angles; and matplotlib, the primary library for plotting. Additionally, the seaborn library will be used for color palettes and to create the bar chart and boxplot.


# Imports necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sea
    
      

With this setup, you'll be able to load the dataset regardless of the environment you're using. To load the dataset, use pd.read_csv, and then convert it into a DataFrame with the columns needed for the plot. It’s also important to properly prepare the data; in this case, we need to convert the "month" column into a categorical variable and update the labels for each month.


# Load AirPassengers dataset
air_passengers = pd.read_csv('path_to_your_dataset/airline-passengers.csv', delimiter=',')
# Convert dataset into a dataframe
# (assumes the rows are in chronological order and cover whole years, starting in January)
airpassenger_data = pd.DataFrame({
    'month': np.tile(np.arange(1, 13), len(air_passengers) // 12),
    'passengers': air_passengers['Passengers'].astype(int)
})
# Prepare the data  
airpassenger_data['month'] = airpassenger_data['month'].astype('category')
airpassenger_data['month'] = airpassenger_data['month'].cat.rename_categories({1: 'Jan', 2: 'Feb', 3: 'Mar',
                                                                             4: 'Apr', 5:'May',6:'Jun',
                                                                             7:'Jul', 8:'Aug', 9:'Sep',
                                                                             10:'Oct', 11:'Nov', 12:'Dec'})
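
If the CSV also carries a date column, the month can be read from it instead of relying on the row order. The sketch below assumes a column named "Month" holding strings such as "1949-01"; that layout is an assumption about the file, and the three inline rows only stand in for the real file. The names example_data and MONTHS are also just for this illustration.

```python
import io
import pandas as pd

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

# Stand-in for the CSV file; the "Month" column and its format are assumed
csv_text = "Month,Passengers\n1949-01,112\n1949-02,118\n1949-03,132\n"
raw = pd.read_csv(io.StringIO(csv_text))

# Parse the date strings and pull out the month number (1 to 12)
month_number = pd.to_datetime(raw['Month'], format='%Y-%m').dt.month

# Ordered categorical so that plots keep the calendar order
example_data = pd.DataFrame({
    'month': pd.Categorical.from_codes(month_number.to_numpy() - 1,
                                       categories=MONTHS, ordered=True),
    'passengers': raw['Passengers'].astype(int),
})
print(example_data)
```

This version keeps working if the file starts in a month other than January or has a missing row, at the cost of depending on the date format.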
 
      

The following code calculates the angle for each bar in the circular plot. The operation (np.arange(len(airpassenger_data)) % 12) maps every row to a month index from 0 to 11, so the angles repeat every 12 rows, i.e., every year. These values are then scaled to fit into a circle by multiplying by 2 * np.pi (the total angle of a circle, in radians) and dividing by the number of months. The width of each bar is calculated by dividing the full circle (2 * np.pi) into 12 equal segments. This results in evenly spaced angles and bar widths for each month in the circular bar plot.


# Calculate angles for the polar plot
angles = 2 * np.pi * (np.arange(len(airpassenger_data)) % 12) / 12
width = 2 * np.pi / 12
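
As a quick check of the arithmetic, the same calculation can be spelled out with only the standard library. The helper name month_angle is an assumption made for this sketch; it mirrors the numpy expression above for a single row index.

```python
import math

N_MONTHS = 12

def month_angle(row_index, n_months=N_MONTHS):
    """Angle in radians for a row, assuming row 0 is January and the
    rows continue month by month without gaps."""
    return 2 * math.pi * (row_index % n_months) / n_months

width = 2 * math.pi / N_MONTHS  # each bar spans 1/12 of the circle (30 degrees)

for i in (0, 3, 11, 12):
    print(i, round(math.degrees(month_angle(i)), 1))
# 0 0.0     (January of the first year)
# 3 90.0    (April)
# 11 330.0  (December)
# 12 0.0    (January of the second year, same angle as row 0)
```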
      
    

To create the plot, first, set up the figure (fig) and the set of axes (ax) for plotting. The argument subplot_kw={'projection': 'polar'} tells matplotlib to create a polar plot, arranging the bars in a circular layout. The values for the angles and width are used to position and space the bars along the circular axis. Next, you'll set the x-ticks to correspond to each month and adjust their labels. Finally, you can make additional adjustments, such as removing the y-axis lines and grid for a cleaner appearance.



# Set up the polar plot
fig, ax = plt.subplots(figsize=(8, 10), subplot_kw={'projection': 'polar'})
# Create the barplot
bars = ax.bar(
    angles,
    airpassenger_data['passengers'],
    width=width,
    color=sea.color_palette("Set2", n_colors=12),  # "Set2" color palette from seaborn
    edgecolor='black',
    alpha=0.7
)
# Adjust the x-ticks and radial ticks
ax.set_xticks(np.linspace(0, 2 * np.pi, 12, endpoint=False))
ax.set_xticklabels(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
ax.set_yticks([])  # Hide the radial y-ticks
ax.grid(False)

plt.title("Monthly Airline Passengers", ha='center', fontsize=14,y=1.1) # Add a title
plt.show() # Show the plot
      
    

The result is a circular bar plot in which the bars are evenly spaced around the circle, with each month placed at its own angle. Because the data frame holds one value per month for each of the 12 years, twelve semi-transparent bars are drawn on top of each other at every angle, so the outer edge of each month corresponds to its largest value. The plot allows for a quick comparison of passenger numbers throughout the year, and by removing the radial ticks and grid, the chart becomes more focused on the data, providing a visually appealing way to display how the values change over the months.
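
If you would rather have a single bar per month, one option is to aggregate the values first and leave an inner circle so the short bars are not squeezed toward the center. The following is only a sketch under a few assumptions: the function name circular_barplot and the inner_radius value are made up, the numbers are placeholders rather than the real dataset, and the mean is just one possible summary.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

MONTHS = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
          'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']

def circular_barplot(monthly_values, inner_radius=50):
    """Draw one bar per month; monthly_values holds 12 numbers, Jan to Dec."""
    monthly_values = np.asarray(monthly_values, dtype=float)
    angles = np.linspace(0, 2 * np.pi, 12, endpoint=False)
    fig, ax = plt.subplots(figsize=(6, 6), subplot_kw={'projection': 'polar'})
    ax.set_theta_zero_location('N')  # January at 12 o'clock
    ax.set_theta_direction(-1)       # months run clockwise
    bars = ax.bar(angles, monthly_values, width=2 * np.pi / 12,
                  bottom=inner_radius, edgecolor='black', alpha=0.7)
    ax.set_ylim(0, inner_radius + monthly_values.max())  # keeps the hole visible
    ax.set_xticks(angles)
    ax.set_xticklabels(MONTHS)
    ax.set_yticks([])
    ax.grid(False)
    return fig, ax, bars

# Placeholder numbers: two made-up "years" in month order, averaged per month
year_1 = [10, 12, 14, 13, 12, 15, 17, 17, 15, 13, 11, 12]
values = np.array(year_1 + [v + 2 for v in year_1])
monthly_mean = values.reshape(-1, 12).mean(axis=0)

fig, ax, bars = circular_barplot(monthly_mean)
```

With the real data, the input could be airpassenger_data['passengers'].to_numpy().reshape(-1, 12).mean(axis=0); remove the matplotlib.use line in a notebook, and call fig.savefig or plt.show() to see the result. Starting at the top and running clockwise also matches the orientation of coord_polar(start = 0) in the R version.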

Matplotlib is used to create the circular barplot because seaborn's plotting functions do not have an option to change the coordinate system to polar, so seaborn is typically used for plots like bar charts and boxplots. The .barplot function creates a bar chart: it takes the x and y arguments, draws the mean of y for each x category by default, and accepts ci=None to disable the confidence interval drawn on top of the bars (in seaborn 0.12 and later, ci is deprecated in favor of errorbar=None).

  
plt.figure(figsize=(10, 8))
sea.barplot(x='month', y='passengers', data=airpassenger_data, palette="Set2", ci=None)

# Add title and labels
plt.title("Monthly Airline Passengers", fontsize=14)
plt.xlabel(" ", fontsize=10)
plt.ylabel("Number of Passengers", fontsize=10)
sea.despine()  # Remove the top and right borders
# Show the plot
plt.show()
       
    

To create the boxplot, the .boxplot function is used with the x and y values to plot, and the palette="Set2" argument customizes the colors. The other functions, which were also applied to the bar chart, adjust the labels and title of the plot, and .despine() removes the top and right borders for a cleaner appearance.

 
plt.figure(figsize=(8, 6))
sea.boxplot(x='month', y='passengers', data=airpassenger_data, palette="Set2")

# Add title and labels
plt.title("Monthly Airline Passengers", fontsize=14)
plt.xlabel(" ", fontsize=10)
plt.ylabel("Number of Passengers", fontsize=10)
sea.despine()  # Remove the top and right borders
# Show the plot
plt.show()
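
To see what the boxplot summarizes for a single month, the numbers behind one box can be computed by hand. This is a minimal sketch, assuming the common 1.5 * IQR rule for whiskers and outliers (as far as I know, that is also the default in seaborn and ggplot2); the function name box_stats and the input values are made up.

```python
import numpy as np

def box_stats(values):
    """Quartiles, whisker ends and outliers using the 1.5 * IQR rule."""
    values = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme points that are still inside the fences
    inside = values[(values >= low_fence) & (values <= high_fence)]
    outliers = values[(values < low_fence) | (values > high_fence)]
    return {
        'q1': float(q1), 'median': float(median), 'q3': float(q3),
        'whisker_low': float(inside.min()),
        'whisker_high': float(inside.max()),
        'outliers': outliers.tolist(),
    }

# Made-up values for a single month
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats)
```

Here the box runs from 3 to 7 with the median at 5, the whiskers reach 1 and 8, and the value 100 is flagged as an outlier.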

           
    

Conclusion

Creating circular barplots in both R and Python offers an engaging way to visualize cyclical or periodic data. Whether you're using R's ggplot2 package or Python's matplotlib, both provide powerful tools to design and customize these unique plots. Circular barplots allow you to display data over time or any cyclical process in an aesthetically appealing manner, making them ideal for showing patterns in monthly sales, customer traffic, weather trends, or any other repetitive data.

While R provides the flexibility of seamless integration with other statistical plots and the ability to easily modify visual elements, Python offers a more detailed and customizable approach, particularly when working with additional libraries like seaborn for improved styling.

I hope you enjoyed this week's post as much as I enjoyed making it! Thank you for being here and always feel free to share this content and comment!

We're also on Instagram @minhaestatistica, and I look forward to seeing you there!

I wish you all a happy new year, with lots of grace, happiness and success. See you next year!

Letícia - Minha Estatística.
