Pareto Chart in Tableau
A Pareto Chart is a statistical chart, which is basically a vertical bar chart with a line graph over it. The values in this chart are plotted from left to right in decreasing order of their relative frequency. Pareto chart is basically used to identify the most frequent values. So, the longest bar on the left represent the most frequent value, whereas the shortest bar on the right represent the least frequent value. Here we will learn how to create Pareto chart in tableau.
The bars in the chart represent the individual values, whereas the line represents the ascending cumulative or running total. The horizontal axis in the chart represent the values to be measured, and the vertical axis represents the frequency of occurrence (or cost, time or money).
The chart is basically a pictorial representation of the Pareto principle, a theory stated by Vilfredo Pareto. According to the Pareto principle, i.e. the “80/20 rule”, 80% of the output is produced by 20% input. For instance, 80% of the wealth is owned by only 20% of the people. This principle can be extended or observed in many other instances. With the help of the longest and smallest bars, Pareto chart gives a visualization of the most significant or critical values that comprise 20% and are so called as “vital few”, and the remaining 80% values are known as “trivial many”.
Below is a simple illustration of a Pareto chart.
When is a Pareto Chart generally used?
- When we need to analyze data about the frequency of a process’s causes or problems, or the most significant problem is to be identified and resolved.
- It is generally used for quality control and thus to identify the biggest improvement opportunities.
Steps to Create a Pareto Chart in Tableau
- Let us say we have 793 customers and we wish to identify which products/customers account for most of the total sales, and see if 80% of the sales is from the top 20% customers. After loading the required customer data in tableau, the following steps can be used to create a Pareto chart:
- In order to create a bar chart, we pull out the customer and sales information from the data. To do this, we first drag the ‘Customer Name’ from ‘Dimensions’ in the ‘Data’ pane to ‘Columns’ section, and drag ‘Sales’ as values:
- So we have to expand the ‘Show Me’ drop down on the top right extreme and select the horizontal bar chart layout. Now since we require a vertical bar chart (since we know that the Pareto graph has vertical bars with a line graph on it), we swap the rows and columns of the horizontal chart layout by selecting the swap button as shown in the below screenshots:
- Now sort this in descending order by sales by selecting the ‘Sort’ button:
So we can see in the below screenshot that the maximum sales belong to ‘Sean Miller’ (25,043), and the second highest sales belong to ‘Tamara Chand’ and so on:
This way we have listed all our customers by sales in descending order.
Next we wish to see the cumulative effect or running total, (i.e. Sales of Sean Miller + Tamara Chand, Sales of Sean Miller + Tamara Chand + .Raymond Buch, and so on). Sales of top two, top three and so on).
- If we see the ‘Entire View’ of the chart, we see at the bottom that the total sum of sales is around 2.2 million
- Expand the ‘Sum(Sales)’ drop down in ‘Rows’ section and then select ‘Quick Table Calculation’ -> ‘Running Total’ :
This will give the below chart visual where the vertical axis represents running total:
Now let’s see 80% of the sales is from how many customers. We know that 80% * 2.2 million is close to 1700,000. So we again see the ‘Entire View’, and see which customer on x-axis cuts the y-axis at 1700,000.
This implies that share of all the customers falling on the left side from this point (close to Stephanie Upright) is 80% in the business, and share of those to the right side of this point is 20% in the business.
However, as clear from the visual, we can see that customers falling on the left side from this point, i.e. those whose share is 80% make up approximately 50% of the customers and not 20%. So to check this, we will convert the x and y axis in percentage. So let us first convert the y-axis (running total) in percentage and calculate running percentage of sum of sales. To do this, we follow the below steps:
- We apply a secondary calculation on the running total by expanding the ‘Sum(Sales)’ dropdown in ‘Rows’ section and then selecting ‘Edit Table Calculation’ :
- On doing this, a dialog box appears. Select ‘Running Total’ and check mark on ‘Add Secondary Calculation’. Select ‘Running Total’ under ‘Primary Calculation Type’ and check mark on ‘Customer name’ under ‘Specific Dimension’. Now select ‘Percent of Total’ under ‘Secondary Calculation Type’ and check mark on ‘Customer name’ under ‘Specific Dimension’:
This will give a graph visual as below:
In this scenario, we can clearly see a cutoff of 80%.
Now we convert x-axis also in percentage to see how much customers fall on left side of this point. So to do this, we first convert the graph in ‘Standard’ view and then assign indexing (across the table or customers) to each customer and divide each index by total number of customers.
- So now we have to create the calculated field, say ‘Customer%’. Use formula as index ()/size () for the same, where index () function will assign indexing to each customer across the table. size () function will calculate total number of customers.
- We will now have ‘Customer%’ in the ‘Measures’ pane. So we drag the ‘Customer%’ from ‘Measures’ in the ‘Data’ pane to ‘Columns’ section. Now drag ‘Customer Name’ over ‘Detail’ as we do not wish to see the ‘Customer Name’ in our visual but we need it in our calculation.
Since we have computed sum of sales using ‘Customer Name’, thus ‘Customer%’ also needs to be computed using the same. So we expand the dropdown of ‘Customer%’ in ‘Columns’ section and then click on ‘Compute Using’ -> ‘Customer Name’:
This will give a graph visual as below:
This is not sorted by ‘Customer%’. So we click on drop down of ‘Customer%’ in ‘Details’ section, and select ‘Sort’:
Select ‘Field’ in the ‘Sort’ dialog box, and then check mark on ‘Descending’, select : ‘Field name’: ‘Sales’ , ‘Aggregation’: ‘Sum’ :
So we will have a Pareto curve as below:
Now if we select the ‘Entire View’, the Pareto curve will look like:
We format the x-axis into percentage: Click on ‘Customer%’ dropdown and select ‘Format’:
Click on ‘Axis’ on the left pane, and select ‘Percentage’ in ‘Numbers’ dropdown:
So our x-axis is now in percentage, and we can see that 80% cutoff point falls at 50%. Now we can add reference line at 80% on y-axis as follows:
- Right click on the vertical axis and click on ‘Add Reference Line’:
Now make the following changes in the generated dialog box:
Similarly, add a reference line for 20% too. So we will have a Pareto curve with 80/20 reference lines as follows:
From the above screenshot, it is quite evident that customers falling on the left side from the 20% cutoff point is 44% of the customers and it not 80%.
Also, data (complete or partial) from the Pareto graph can even be exported to a csv file by right clicking on the graph and selecting ‘View data’ -> ‘Export All’. We can even select the area/portion of the left 50%( as the left 50% area is significant or important in this case), and then export only that portion to the csv file by selecting that portion and then selecting ‘View data’ -> ‘Export All’.
In the above example, we have successfully drawn a Pareto curve. which is not completely satisfied here. As 50% of the customers account for 80% of the total sales. 80% of the sales is from top 50% customers (and not top 20% customers), and sales from top 20% customers is approximately 48.33% (and not 80%):
So here we have covered, How to Create Pareto Chart in Tableau.