Box plot

Box plots allow you to visualize and compare the distribution and central tendency of numeric values through their quartiles. Quartiles are a method of splitting numeric values into four equal groups based on five key values: minimum, first quartile, median, third quartile, and maximum. Box plots use the percentile calculation to determine quartile values. For example, the first quartile is equal to the 25th percentile.
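As an illustration of the five key values, the following Python sketch computes them for a small list of numbers. The function names are hypothetical, and the linear interpolation between neighboring values is an assumption; the percentile method the chart actually uses is not specified here and may give slightly different quartiles.

```python
def percentile(values, p):
    """Return the p-th percentile (0-100) using linear interpolation (assumed method)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    # Fractional position of the percentile within the sorted values.
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def five_number_summary(values):
    """Return the five values a box plot is built from."""
    return {
        "minimum": min(values),
        "q1": percentile(values, 25),      # first quartile = 25th percentile
        "median": percentile(values, 50),  # median = 50th percentile
        "q3": percentile(values, 75),      # third quartile = 75th percentile
        "maximum": max(values),
    }


# Made-up sample values, for illustration only.
summary = five_number_summary([2, 4, 4, 5, 7, 9, 10, 12, 15])
print(summary)
```

The difference between q3 and q1 in this summary is the interquartile range.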

The box portion of the diagram below illustrates the middle 50 percent of the data values, also known as the interquartile range (IQR). The median of the values is depicted as a line splitting the box in half. The IQR illustrates the variability in a set of values. A large IQR indicates a large spread in values, while a smaller IQR indicates most values fall near the center. Box plots also illustrate the minimum and maximum data values through whiskers, or lines, extending from the box, and optionally, outliers as points extending beyond the whiskers.

Box plot diagram

Example

The box plot below shows the distribution of life expectancy by continent at 20-year increments from 1800 to 2040.

  • Numeric fields—Life expectancy
  • Category—Year
  • Split by—Continent
  • Show outliers—Enabled
Box plot of life expectancy by continent

Data

The Data tab configurations include the variables that are used to create the box plot.

Variables

Box plots are composed of an x-axis and a y-axis. The x-axis assigns one box for each category or numeric variable. The y-axis is used to measure the minimum, first quartile, median, third quartile, and maximum value in a set of numbers.

You can use box plots to visualize one or many distributions. To visualize a single distribution, add one Numeric fields variable. This results in a chart with one box plot visualizing the distribution of the chosen numeric attribute.

You can add other Numeric fields variables to compare multiple distributions from different attribute fields in a table. For example, in a county dataset, Population2010 and Population2015 are added as Numeric fields variables. The resulting chart displays two box plots, one visualizing the distribution of Population2010, and the other visualizing the distribution of Population2015, for all counties in the dataset.

When you create a box plot from multiple numeric fields, a z-score standardization is applied by default. Standardization allows numeric variables of different units to be comparable.

For example, a box plot comparing the distributions of income (with values in the tens of thousands) and unemployment rate (values ranging between 0 and 1.0) would be difficult to read without standardization because the unemployment rate values are so much smaller than the income values.

Standardization of the attribute values involves a z-transform: the mean of all values in a field is subtracted from each value, and the result is divided by the standard deviation of all values in that field. The z-score standardization puts all the attributes on the same scale, allowing multiple distributions to be visualized in the same chart. To visualize the raw values instead, turn off Standardize values (z-score).
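The z-transform can be sketched in Python as follows. This is a minimal sketch, not the chart's implementation: the function name and the sample values are made up, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values: subtract the mean, then divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        raise ValueError("cannot standardize values that are all identical")
    return [(v - mu) / sigma for v in values]


# Made-up values on very different scales, for illustration only.
income = [32000, 45000, 51000, 58000, 74000]
unemployment = [0.03, 0.05, 0.06, 0.08, 0.11]

income_z = z_scores(income)
unemployment_z = z_scores(unemployment)

# After standardization both fields have a mean of about 0 and a standard
# deviation of about 1, so they can share one y-axis.
print([round(z, 2) for z in income_z])
print([round(z, 2) for z in unemployment_z])
```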

When only a single Numeric fields variable is added, you can add a Category variable as a method of comparing distributions across categories. For example, Population2010 is set as the Numeric fields variable and StateName as the Category variable for a county dataset. The resulting chart displays a box plot for each state, visualizing the distribution of Population2010 for all counties belonging to each state.

Multiple series

You can use multiple series box plots to compare distributions of different types, or by different categories.

Multiple series box plots can be created by specifying a category field and multiple numeric fields or by specifying a split by category field.

When using a Category variable with multiple Numeric fields variables, each numeric field added to the series table creates a series. For example, in a county dataset, StateName is set as the Category variable and Population2010, Population2015, and Population2020 are set as the Numeric fields variables. The resulting chart will have states as categories along the x-axis, with three series each (Population2010, Population2015, and Population2020).

Alternatively, a Split by variable can be added as a way to further divide the data and create multiple series. For example, Population2010 is set as the Numeric fields variable, StateName as the Category variable, and ElectionWinner as a Split by field for a county dataset. The resulting chart displays two side-by-side box plots for each state (100 box plots in total for 50 states, provided both ElectionWinner values occur in every state): one visualizing the distribution of Population2010 for all counties in each state with the ElectionWinner value of Democrat, and one for all counties in each state with the ElectionWinner value of Republican.

You can also use Split by fields when multiple Numeric fields variables are used instead of a Category variable. For example, Population2010, Population2015, and Population2020 are set as the Numeric fields variables and ElectionWinner is set as the Split by field for a county dataset. The resulting chart will display the three Numeric fields variables along the x-axis (Population2010, Population2015, and Population2020), each with two side-by-side box plots: one displaying the distribution for all counties with the ElectionWinner value of Democrat, and the other for all counties with the ElectionWinner value of Republican.
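The way a Category variable and a Split by field divide records into separate boxes can be pictured as a grouping step. The sketch below is illustrative only: the helper function is hypothetical, the field names are taken from the examples above, and the county records are made-up values.

```python
from collections import defaultdict
from statistics import median


def group_values(rows, numeric_field, category_field, split_field=None):
    """Collect the numeric values that would feed each box."""
    groups = defaultdict(list)
    for row in rows:
        if split_field is None:
            key = (row[category_field],)                   # one box per category
        else:
            key = (row[category_field], row[split_field])  # one box per category and series
        groups[key].append(row[numeric_field])
    return dict(groups)


# Made-up county records, for illustration only.
counties = [
    {"StateName": "State A", "ElectionWinner": "Democrat", "Population2010": 1200},
    {"StateName": "State A", "ElectionWinner": "Democrat", "Population2010": 1800},
    {"StateName": "State A", "ElectionWinner": "Republican", "Population2010": 900},
    {"StateName": "State B", "ElectionWinner": "Democrat", "Population2010": 3000},
    {"StateName": "State B", "ElectionWinner": "Republican", "Population2010": 500},
    {"StateName": "State B", "ElectionWinner": "Republican", "Population2010": 700},
]

by_state = group_values(counties, "Population2010", "StateName")
by_state_and_winner = group_values(
    counties, "Population2010", "StateName", "ElectionWinner"
)

# Each group corresponds to one box; here only the median is computed.
medians = {key: median(values) for key, values in by_state_and_winner.items()}
print(medians)
```

With two states and two ElectionWinner values, the split produces four groups, which corresponds to two side-by-side boxes per state.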

Outliers

You can show outliers as points extending beyond the whiskers by enabling Show outliers. When Show outliers is not enabled, the whiskers extend to the minimum and maximum values so that they encompass all data points.
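The effect of the Show outliers setting on the whiskers can be sketched as follows. The threshold used to classify a point as an outlier is not stated here, so this sketch assumes the common convention of 1.5 times the IQR beyond the first and third quartiles; the function name, the `k` parameter, and the sample values are hypothetical.

```python
from statistics import quantiles


def whiskers_and_outliers(values, show_outliers=True, k=1.5):
    """Return (lower whisker, upper whisker, outliers) for a list of numbers."""
    ordered = sorted(values)
    if not show_outliers:
        # Whiskers encompass all data points; nothing is drawn as an outlier.
        return ordered[0], ordered[-1], []
    q1, _median, q3 = quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr    # assumed outlier threshold, not from the source
    high_fence = q3 + k * iqr
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    outliers = [v for v in ordered if v < low_fence or v > high_fence]
    # Whiskers stop at the most extreme values that are not outliers.
    return inside[0], inside[-1], outliers


# Made-up sample values, for illustration only.
data = [2, 4, 4, 5, 7, 9, 10, 12, 15, 40]
with_outliers = whiskers_and_outliers(data, show_outliers=True)
without_outliers = whiskers_and_outliers(data, show_outliers=False)
print(with_outliers)
print(without_outliers)
```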

Sort order

Box plots are automatically sorted alphabetically by category (X-axis ascending). The sorting can be changed using the Sort order parameter. The following sort options are available for box plots:

  • X-axis ascending—Categories are arranged alphabetically from left to right.
  • X-axis descending—Categories are arranged in reverse alphabetical order.
  • Mean ascending—Boxes are arranged by the mean statistic in ascending order.
  • Mean descending—Boxes are arranged by the mean statistic in descending order.
  • Median ascending—Boxes are arranged by the median statistic in ascending order.
  • Median descending—Boxes are arranged by the median statistic in descending order.
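The sort options listed above can be illustrated with a short Python sketch. The function and the sample values are hypothetical; the option names are taken from the list, and the sketch only shows how ordering by name, mean, and median can produce different results.

```python
from statistics import mean, median


def sort_boxes(groups, sort_order="X-axis ascending"):
    """Return category names in the order the boxes would be drawn."""
    sort_keys = {
        "X-axis": lambda item: item[0],         # alphabetical by category name
        "Mean": lambda item: mean(item[1]),     # by the mean of each box
        "Median": lambda item: median(item[1])  # by the median of each box
    }
    statistic, direction = sort_order.rsplit(" ", 1)
    ordered = sorted(
        groups.items(),
        key=sort_keys[statistic],
        reverse=(direction == "descending"),
    )
    return [name for name, _values in ordered]


# Made-up values per category, for illustration only.
groups = {
    "Asia": [1, 2, 30],
    "Africa": [4, 5, 6],
    "Europe": [7, 8, 9],
}

print(sort_boxes(groups, "X-axis ascending"))
print(sort_boxes(groups, "Mean ascending"))
print(sort_boxes(groups, "Median descending"))
```

In this sample, one large value raises the mean of the first category without moving its median, which is why the mean and median orderings differ.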

Series

The Series tab configurations are used to change the color and label of boxes on the box plot.

Axes

The Axes tab configurations are used to change the specifications for the x-axis and y-axis.

X-axis

Category labels are truncated at 11 characters by default. When labels are truncated, you can see the full text by hovering over the label. To display the entire label text on the chart, increase the Label character limit value.

Y-axis

Default y-axis bounds are based on the range of data values represented on the y-axis. You can customize these values by typing a Minimum bounds or Maximum bounds value. Set a y-axis bound to keep the scale of the chart consistent for comparison. Click the Reset button to revert the axis bound to the default value.

You can format the way the y-axis displays numeric values by specifying the number of decimal places and whether to include a thousands separator.
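The effect of these two settings on a label can be approximated with Python's format specification. This is only an analogy: the function and parameter names are hypothetical, and the chart's own rounding and locale behavior may differ.

```python
def format_axis_value(value, decimal_places=0, thousands_separator=False):
    """Format a number with a fixed number of decimals and an optional separator."""
    if thousands_separator:
        return f"{value:,.{decimal_places}f}"
    return f"{value:.{decimal_places}f}"


label_plain = format_axis_value(1234567.891, decimal_places=2)
label_grouped = format_axis_value(1234567.891, decimal_places=2, thousands_separator=True)
print(label_plain)    # 1234567.89
print(label_grouped)  # 1,234,567.89
```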

Guides

The Guides tab configurations are used to add guides or guide ranges to the chart.

Guide lines or ranges can be added to charts as a reference or way to highlight significant values. Guides are added to the y-axis by clicking the Add guide button.

To create a guide line, enter a Start value where you want the line to draw. To create a guide range, enter a Start value and an End value. You can also change the appearance of the guide line or range. For lines, the style, width, and color can be updated. For ranges, the fill color can be updated.

You can optionally change the name of the guide using the Guide name parameter and add text to your guide using the Guide label parameter (for example, Median).

You can choose whether the guide renders in front of the chart or behind the chart using the In front and In back buttons in the Display parameter.

Format

The Format tab configurations are used to change the look of the chart by formatting text and symbol elements.

Chart formatting options include the following:

  • Text elements—Size, color, and style of the font used for the chart title, x-axis title, y-axis title, legend title, description text, legend text, axis labels, and data labels. You can change the format for multiple elements at once by pressing Ctrl and clicking to select the elements.
  • Symbol elements—Color, width, and style (Solid, Dot, or Dash) for grid and axis lines and the background color of the chart.

General

The General tab configurations are used to update the titles for the chart, axes, and legend.

The default titles for charts and axes are based on the variable names and chart type. You can edit or turn off the titles on the General tab. You can also provide a title in the Legend title parameter. The Legend alignment can be set as Right, Left, Top, or Bottom. You can also add a chart description in the Description parameter. A description is a block of text that appears at the bottom of the chart window.

Resources

Use the following resources to learn more about charts: