Tuesday, March 15, 2016

One of the Best and Most Underutilized Graphs in ggplot2



Showing how the distribution of a variable changes over time can make a great visualization. These highly intuitive graphics display a lot of information and are simple to render in R using ggplot2. However, in my experience, they are among the most underutilized graphs in R.

A good example of this style of graph comes from my research, which studies how data analysis can be used to improve product design and manufacturing processes. This style of graph is extremely useful for showing how design specifications change over time. Below you can see an example of how the specifications of secondary cameras on cellphones have changed over time. Before 2011 there were almost no secondary cameras, and by 2015 almost all cellphones released had some form of secondary camera.

To create these plots, first let's load ggplot2 and the diamonds data set.

library(ggplot2)
data(diamonds)
head(diamonds)


When creating these plots, I like to make sure I understand how the data is distributed over the x-axis. This is helpful because if a section of the x-axis has far fewer data points, the distribution on the y-axis can change rapidly there simply because of the small sample size.
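One quick way to check this is to count how many observations fall into each slice of the x-axis before drawing the density plots. The sketch below does that for the diamonds data. The bin width of 2000 and the names price_bins and bin_counts are my own arbitrary choices for illustration, not something the plots later in this post depend on.

```r
library(ggplot2)
data(diamonds)

# Split price into equal-width bins; the width of 2000 is an arbitrary choice
price_bins <- cut(diamonds$price, breaks = seq(0, 20000, by = 2000))

# Number of diamonds in each price bin; small counts flag regions where
# the estimated distribution is likely to be noisy
bin_counts <- table(price_bins)
print(bin_counts)

# The same check as a plot: a plain histogram of price
ggplot(diamonds, aes(x = price)) +
  geom_histogram(binwidth = 500)
```

Bins with only a handful of diamonds are the regions where I would read the later plots with some caution.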

The plot below shows the distribution of diamonds grouped by cut as the price changes.

# Stack the per-cut densities; position is a geom argument, not an aesthetic
ggplot(data=diamonds, aes(x=price, group=cut, fill=cut)) + 
  geom_density(adjust=1.5, position="stack")


In the next plots, the y-axis shows the proportion of each group (cut in the first example, clarity in the second) at each price, rather than the density.

# position="fill" rescales the stacked densities to sum to 1 at each price
ggplot(data=diamonds, aes(x=price, group=cut, fill=cut)) + 
  geom_density(adjust=1.5, position="fill")


# Same plot, grouped by clarity instead of cut
ggplot(data=diamonds, aes(x=price, group=clarity, fill=clarity)) +
  geom_density(adjust=1.5, position="fill")
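One caveat worth knowing: by default geom_density() scales each group's curve to an area of one, so the filled bands compare the shapes of the groups without accounting for how many diamonds each group contains. If you want the bands to read as the estimated share of each cut at a given price, one option is to map y to the computed count. This is a minimal sketch, assuming ggplot2 3.3.0 or later for after_stat(); the object name p_share, the percent labels, and the axis title are my own additions.

```r
library(ggplot2)
data(diamonds)

# Weight each cut's density by its number of diamonds (after_stat(count)),
# so the filled bands estimate the share of each cut at a given price
p_share <- ggplot(diamonds, aes(x = price, y = after_stat(count), fill = cut)) +
  geom_density(adjust = 1.5, position = "fill") +
  scale_y_continuous(labels = function(x) paste0(100 * x, "%")) +
  labs(y = "share of diamonds")

p_share
```

The same change works for the clarity plot by swapping cut for clarity in the fill aesthetic.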


Hallway Mathlete Data Scientist

I am a PhD student in Industrial Engineering at Penn State University. I did my undergrad at Iowa State in Industrial Engineering and Economics. My academic website can be found here.