This post discusses box plots: how to read them, and an example of drawing useful insights from one.
Box plots are a different kind of visualization. They are not meant for presenting data to novices or to people who may not understand the statistical concepts behind them, so they should not be your first choice if your audience is a general one within your company. But if the potential stakeholders are familiar with them, these plots can reveal insights that other visualizations can't.
How to read a Box Plot:
There are two parts of a box plot:
Box: The bottom and the top of the box are the first and third quartiles.
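Lines extending from the box: These lines are also called whiskers. They reach out from each end of the box up to a distance of 1.5 times the inter-quartile range (IQR, the difference between the third and the first quartile), and stop at the most extreme data point within that distance. The limits at 1.5 times the IQR are commonly called fences; the term hinges usually refers to the edges of the box rather than to the ends of the whiskers.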
Points lying beyond the whiskers are considered outliers and are candidates for removal from the dataset during outlier treatment.
The line inside the box is the second quartile (median) of the dataset. The spacing between the different parts of the plot shows the dispersion (spread) and skewness of the data.
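To make the parts above concrete, here is a minimal sketch of how they could be computed by hand. It assumes Python's standard `statistics.quantiles` with the "inclusive" method; plotting libraries may use slightly different quartile rules, so the numbers can differ a little from what a given tool draws. The function name `box_plot_summary` and the sample values are made up for illustration.

```python
import statistics

def box_plot_summary(values):
    """Return the quantities a box plot displays for a list of numbers."""
    # Quartiles: the box runs from q1 to q3, with the median line inside it.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1

    # Fences: the furthest the whiskers are allowed to reach.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr

    # Whiskers stop at the most extreme data points inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Made-up sample with one obviously extreme value.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]
summary = box_plot_summary(sample)
print(summary)
```

Whether a flagged point should actually be dropped is a judgment call; the 1.5 times IQR rule only marks it as worth a second look.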
Case: Visualizing the Growth Percentage of startups in different Regions using Box Plots
This simple box plot shows the growth percentage of 30 different startups in 4 different regions: NSW, QLD, VIC and WA.
This plot could be used for a competitive analysis, to figure out which region would be the best place to start a business in order to maximize the chances of success.
In NSW, there is very large variability in the growth percentage, even though the median growth is the highest of the four. At first this may seem to be the ideal state to start a business in, but the large inter-quartile range means that a startup there is also quite likely to end up near the bottom of the graph, if other parameters are kept constant.
The variability is a little lower in QLD, but the median growth rate is also quite low.
In VIC, the median is not as high as that of NSW, but the variability is very low. Also, the median is shifted towards the upper side of the box, meaning that the chances of achieving a high growth rate are better.
WA is the worst state to invest in, as its median is the lowest and its variability in growth rate is the largest of the four.
The insights that can be drawn from box plots are sometimes not visible in other types of visualizations.
Hope this conveys the importance of looking at data using box plots.
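The comparison above comes down to weighing each region's median against its spread. Here is a rough sketch of that reasoning in code. The region names and growth figures below are invented placeholders, not the data behind the plot in this post, and `centre_and_spread` is a hypothetical helper name.

```python
import statistics

# Hypothetical growth percentages, invented for illustration only.
# They are NOT the figures behind the plot discussed in this post.
growth_by_region = {
    "Region A": [2, 5, 9, 14, 20, 27, 35],
    "Region B": [10, 11, 12, 13, 15, 16, 17],
}

def centre_and_spread(values):
    """Return the median and inter-quartile range of a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"median": median, "iqr": q3 - q1}

region_stats = {
    region: centre_and_spread(values)
    for region, values in growth_by_region.items()
}

# The region with the best typical outcome is not necessarily the steadiest one.
highest_median = max(region_stats, key=lambda r: region_stats[r]["median"])
smallest_spread = min(region_stats, key=lambda r: region_stats[r]["iqr"])

for region, stats in region_stats.items():
    print(f"{region}: median={stats['median']}, IQR={stats['iqr']}")
```

In this made-up example one region has the higher median while the other has the much tighter box, which is the same kind of trade-off as the NSW versus VIC comparison above.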
How informative was this post?
Let me know.