13.4 - Box Plots

On the last page, we learned how to determine the first quartile, the median, and the third quartile for a sample of data. These three percentiles, along with a data set's minimum and maximum values, make up what is called the five-number summary. One nice way of graphically depicting a data set's five-number summary is by way of a box plot (or box-and-whisker plot).

Here are some general guidelines for drawing a box plot:

  1. Draw a horizontal axis scaled to the data.
  2. Above the axis, draw a rectangular box with the left side of the box at the first quartile \(q_1\) and the right side of the box at the third quartile \(q_3\).
  3. Draw a vertical line connecting the lower and upper horizontal lines of the box at the median \(m\).
  4. For the left whisker, draw a horizontal line from the minimum value to the midpoint of the left side of the box.
  5. For the right whisker, draw a horizontal line from the maximum value to the midpoint of the right side of the box.

Drawn this way, a box plot does a nice job of dividing the data graphically into fourths. Note, for instance, that the horizontal length of the box is the interquartile range IQR (the box spans the middle half of the data), the left whisker represents the first quarter of the data, and the right whisker represents the fourth quarter of the data.
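The drawing steps above can be mimicked in plain text. The following Python sketch is only an illustration: the function name `ascii_box_plot`, its axis endpoints `lo` and `hi`, and the default character `width` are hypothetical choices, not part of any software discussed here. It maps each value of a five-number summary onto a character column and draws the box, the median line, and the whiskers, here using the IQ summary from the example that follows.

```python
def ascii_box_plot(minimum, q1, median, q3, maximum, lo, hi, width=81):
    """Return a three-row text drawing of a standard box plot.

    lo and hi are the ends of a hypothetical horizontal axis; each value is
    mapped linearly onto one of `width` character columns.
    """
    def col(x):
        # Step 1: scale a data value onto the horizontal axis.
        return round((x - lo) / (hi - lo) * (width - 1))

    c_min, c_q1, c_med, c_q3, c_max = (
        col(v) for v in (minimum, q1, median, q3, maximum)
    )
    edge = [" "] * width    # top and bottom sides of the box
    middle = [" "] * width  # whiskers, box sides, and median line

    # Step 2: the top and bottom of the box run from q1 to q3.
    for c in range(c_q1, c_q3 + 1):
        edge[c] = "-"
    # Corners at q1 and q3, and where the median line (step 3) meets the box.
    for c in (c_q1, c_med, c_q3):
        edge[c] = "+"

    # Steps 4 and 5: whiskers from the minimum to q1 and from q3 to the maximum.
    for c in range(c_min, c_q1):
        middle[c] = "-"
    for c in range(c_q3 + 1, c_max + 1):
        middle[c] = "-"
    # Steps 2 and 3: left and right sides of the box, plus the median line.
    for c in (c_q1, c_med, c_q3):
        middle[c] = "|"

    top = "".join(edge).rstrip()
    return "\n".join([top, "".join(middle).rstrip(), top])


# IQ five-number summary (68, 91, 99.5, 107, 141) on an axis from 65 to 145.
print(ascii_box_plot(68, 91, 99.5, 107, 141, lo=65, hi=145))
```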

Example 13-3 Revisited

Let's return to our random sample of 64 people selected to take the Stanford-Binet Intelligence Test. The resulting 64 IQs were sorted as follows:

68 75 78 83 85 85 85 86 86 87
87 88 90 91 91 91 91 93 93 93
94 94 94 96 96 97 98 98 99 99
99 99 100 101 101 102 102 104 104 105
105 105 106 106 106 107 107 107 107 107
108 109 110 110 111 114 116 116 117 122
123 128 136 141

We previously determined that the first quartile is 91, the median is 99.5, and the third quartile is 107. The interquartile range IQR is 16. Use these numbers, as well as the minimum value (68) and maximum value (141), to create a box plot of these data.
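As a check on these numbers, here is a short Python sketch using the standard library's `statistics.quantiles`. Its default `'exclusive'` method interpolates at the \((n+1)p\)th ordered observation, which, for this data set, reproduces the quartiles quoted above; other quartile conventions (including some software defaults) can give slightly different values. The helper name `five_number_summary` is hypothetical.

```python
import statistics

# The 64 sorted IQs from the example above.
iqs = [68, 75, 78, 83, 85, 85, 85, 86, 86, 87,
       87, 88, 90, 91, 91, 91, 91, 93, 93, 93,
       94, 94, 94, 96, 96, 97, 98, 98, 99, 99,
       99, 99, 100, 101, 101, 102, 102, 104, 104, 105,
       105, 105, 106, 106, 106, 107, 107, 107, 107, 107,
       108, 109, 110, 110, 111, 114, 116, 116, 117, 122,
       123, 128, 136, 141]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    statistics.quantiles with n=4 returns the three quartile cut points,
    using the default 'exclusive' ((n + 1)p) method.
    """
    q1, median, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)


summary = five_number_summary(iqs)
print(summary)                  # (68, 91.0, 99.5, 107.0, 141)
print(summary[3] - summary[1])  # IQR: 16.0
```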

Solution

By following the guidelines given above, a hand-drawn box plot of these data looks something like this:

[Figure: hand-drawn box plot of the 64 IQs on a horizontal axis running from 65 to 145]

In practice, you will probably almost always want to use a statistical software package, such as Minitab, to create your box plots. If we ask Minitab to create a box plot for this data set, this is what we get:

[Figure: Minitab box plot of IQ on a horizontal axis running from 65 to 145, with two observations marked by asterisks]

Hmm. Why does Minitab's box plot look different from ours? Well, by default, Minitab creates what is called a modified box plot. In a modified box plot, the box is drawn just as in a standard box plot, but the whiskers are defined differently. For a modified box plot, the whiskers are the lines that extend from the left and right of the box to the adjacent values. The adjacent values are defined as the lowest and highest observations that are still inside the region defined by the following limits:

  • Lower Limit: \(Q1-1.5\times IQR\)
  • Upper Limit: \(Q3+1.5\times IQR\)

In this example, the lower limit is calculated as \(Q1-1.5\times IQR=91-1.5(16)=67\). Therefore, in this case, the lower adjacent value turns out to be the same as the minimum value, 68, because 68 is the lowest observation still inside the region defined by the lower bound of 67. Now, the upper limit is calculated as \(Q3+1.5\times IQR=107+1.5(16)=131\). Therefore, the upper adjacent value is 128, because 128 is the highest observation still inside the region defined by the upper bound of 131. In general, values that fall outside of the adjacent value region are deemed outliers. In this case, the IQs of 136 and 141 are greater than the upper limit of 131 and are thus deemed outliers. In Minitab's modified box plots, outliers are identified using asterisks.
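The rule for adjacent values and outliers translates directly into code. In the Python sketch below, the function name `modified_box_plot_parts` is hypothetical, and the multiplier `k` defaults to the 1.5 used in the limits above; the quartiles 91 and 107 are passed in from the text rather than recomputed.

```python
# The 64 sorted IQs from Example 13-3.
iqs = [68, 75, 78, 83, 85, 85, 85, 86, 86, 87,
       87, 88, 90, 91, 91, 91, 91, 93, 93, 93,
       94, 94, 94, 96, 96, 97, 98, 98, 99, 99,
       99, 99, 100, 101, 101, 102, 102, 104, 104, 105,
       105, 105, 106, 106, 106, 107, 107, 107, 107, 107,
       108, 109, 110, 110, 111, 114, 116, 116, 117, 122,
       123, 128, 136, 141]


def modified_box_plot_parts(data, q1, q3, k=1.5):
    """Return (lower limit, upper limit, lower adjacent, upper adjacent, outliers)."""
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    outliers = sorted(x for x in data if x < lower_limit or x > upper_limit)
    # Adjacent values: the most extreme observations still inside the limits.
    return lower_limit, upper_limit, min(inside), max(inside), outliers


print(modified_box_plot_parts(iqs, 91, 107))  # (67.0, 131.0, 68, 128, [136, 141])
```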

Example 13-4 Revisited


Let's return to the example in which we have a random sample of 20 concentrations of calcium carbonate (\(CaCO_3\)) in milligrams per liter:

130.8 129.9 131.5 131.2 129.5 132.7 131.5 127.8 133.7
132.2 134.8 131.7 133.9 129.8 131.4 128.8 132.7 132.8
131.4 131.3

With a little bit of work, it can be shown that the five-number summary is as follows:

  • Minimum: 127.8
  • First quartile: 130.12
  • Median: 131.45
  • Third quartile: 132.70
  • Maximum: 134.8
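The numbers above are consistent with the \((n+1)p\) interpolation rule: sort the data, go to position \(r=(n+1)p\), and interpolate between the neighboring ordered observations. With \(n=20\), the first quartile sits at position \(r=21(0.25)=5.25\), a quarter of the way from the 5th smallest value (129.9) to the 6th (130.8), giving \(129.9+0.25(0.9)=130.125\), which is reported above as 130.12. A minimal Python sketch of the rule follows; the function name `percentile_n_plus_1` and its clamping of out-of-range positions are illustrative assumptions.

```python
def percentile_n_plus_1(data, p):
    """(100p)th sample percentile via the (n + 1)p interpolation rule.

    Positions outside 1..n are clamped to the extremes (an assumption made
    here for convenience; the rule itself only covers 1 <= r <= n).
    """
    xs = sorted(data)
    n = len(xs)
    r = min(max((n + 1) * p, 1), n)
    k = int(r)      # integer part: the lower order statistic (1-based)
    frac = r - k    # fractional part: how far toward the next one
    if k == n:
        return xs[-1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])


caco3 = [130.8, 129.9, 131.5, 131.2, 129.5, 132.7, 131.5, 127.8, 133.7,
         132.2, 134.8, 131.7, 133.9, 129.8, 131.4, 128.8, 132.7, 132.8,
         131.4, 131.3]

for label, p in [("Q1", 0.25), ("median", 0.5), ("Q3", 0.75)]:
    print(label, round(percentile_n_plus_1(caco3, p), 3))
```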

Use the five-number summary to create a box plot of these data.

Solution

By following the guidelines given above, a hand-drawn box plot of these data looks something like this:

[Figure: hand-drawn box plot of the 20 calcium carbonate concentrations on a horizontal axis running from 128 to 136]

In this case, the interquartile range is \(IQR=132.7-130.12=2.58\). Therefore, the lower limit is calculated as \(Q1-1.5\times IQR=130.12-1.5(2.58)=126.25\), and the lower adjacent value is the same as the minimum value, 127.8, because 127.8 is the lowest observation still inside the region defined by the lower bound of 126.25. The upper limit is calculated as \(Q3+1.5\times IQR=132.7+1.5(2.58)=136.57\). Therefore, the upper adjacent value is the same as the maximum value, 134.8, because 134.8 is the highest observation still inside the region defined by the upper bound of 136.57. Because the lower and upper adjacent values are the same as the minimum and maximum values, respectively, our hand-drawn box plot looks the same as the modified box plot:

[Figure: Box Plot of Calcium Carbonate Concentrations, a modified box plot on a CaCO3 axis running from 128 to 135]
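The reasoning in the calcium carbonate example only needs the five-number summary: if the minimum and maximum already lie inside the limits, they are the adjacent values, so the standard and modified box plots coincide. A small Python sketch of that check follows; the function name `whiskers_coincide` is hypothetical, and because the inputs are rounded decimals, the computed limits may show floating-point noise in the last digits.

```python
def whiskers_coincide(minimum, q1, q3, maximum, k=1.5):
    """Return (lower limit, upper limit, whether the two box plots coincide).

    If minimum >= lower limit and maximum <= upper limit, the minimum and
    maximum are themselves the adjacent values, so the modified box plot's
    whiskers match the standard box plot's whiskers.
    """
    iqr = q3 - q1
    lower_limit = q1 - k * iqr
    upper_limit = q3 + k * iqr
    coincide = lower_limit <= minimum and maximum <= upper_limit
    return lower_limit, upper_limit, coincide


# Calcium carbonate summary (the median is not needed for this check).
print(whiskers_coincide(127.8, 130.12, 132.70, 134.8))
# IQ summary: the maximum, 141, lies above the upper limit of 131.
print(whiskers_coincide(68, 91, 107, 141))
```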