Interquartile Range (IQR)

GCSE Statistics spread quartiles
\( IQR=Q_3-Q_1 \)

Statement

The interquartile range (IQR) is a measure of statistical spread. It shows how widely the middle 50% of the data values are distributed. The IQR is defined as the difference between the upper quartile (Q3) and the lower quartile (Q1):

\[ IQR = Q_3 - Q_1 \]

Why it’s true

  • The lower quartile \(Q_1\) is the value at the 25th percentile, meaning 25% of the data lies below it.
  • The upper quartile \(Q_3\) is the value at the 75th percentile, meaning 75% of the data lies below it.
  • Subtracting gives the width of the interval that holds the central 50% of the data (75% − 25% = 50%). The lowest and highest quarters, where any outliers sit, do not affect it.

Recipe (how to use it)

  1. Order the dataset from smallest to largest.
  2. Find the median (this is \(Q_2\)).
  3. Find the median of the lower half → \(Q_1\).
  4. Find the median of the upper half → \(Q_3\).
  5. Subtract: \(IQR = Q_3 - Q_1\).
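
The recipe above can be sketched in Python. This is a minimal illustration, not a standard library routine: the names `quartiles`, `iqr` and the `include_median` flag are made up for this note. The flag is an assumption that covers the two common conventions for an odd number of values (leave the overall median out of both halves, or count it in both).

```python
from statistics import median


def quartiles(data, include_median=False):
    """Return (Q1, Q2, Q3) using the median-of-halves recipe.

    include_median is a hypothetical flag: for an odd number of values it
    decides whether the overall median is counted in both halves.
    """
    values = sorted(data)                 # step 1: order the data
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    if n % 2 == 0 or not include_median:
        lower = values[:half]             # values below the middle
        upper = values[n - half:]         # values above the middle
    else:
        lower = values[:half + 1]         # the median is shared by both halves
        upper = values[half:]
    # steps 2 to 4: median of everything, of the lower half, of the upper half
    return median(lower), median(values), median(upper)


def iqr(data, include_median=False):
    """Step 5: subtract the lower quartile from the upper quartile."""
    q1, _, q3 = quartiles(data, include_median)
    return q3 - q1
```

For the data 1, 3, 5, 7, 9, 11, `iqr(...)` returns 6. For 2, 4, 6, 8, 10 it returns 6.0 when the median is left out and 4 with `include_median=True`, which is why the convention should always be stated.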

Spotting it

Use the IQR when a question asks about spread or variability, or when box plots are involved. It is often preferred to the full range because it is not distorted by a few extreme values.

Common pairings

  • Box-and-whisker plots (the box spans from \(Q_1\) to \(Q_3\)).
  • Comparing variability of two datasets.
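  • Identifying outliers (often using the 1.5 × IQR rule: a value is flagged if it lies below \(Q_1 - 1.5 \times IQR\) or above \(Q_3 + 1.5 \times IQR\)).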
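
As a rough sketch of the 1.5 × IQR rule, the Python below takes the quartiles as inputs and flags any value outside the two "fences". The function names and the default multiplier argument `k` are hypothetical, chosen only for this illustration, and the sample data in the usage note is invented.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) fences for the k x IQR rule."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def find_outliers(data, q1, q3, k=1.5):
    """List the values that fall strictly outside the fences."""
    low, high = outlier_fences(q1, q3, k)
    return [x for x in data if x < low or x > high]
```

With Q1 = 3 and Q3 = 9 the fences are −6 and 18. For the invented data 1, 3, 5, 7, 9, 11, 30, taking Q1 = 3 and Q3 = 11 (median left out of both halves) gives fences of −9 and 23, so `find_outliers` returns `[30]`.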

Mini examples

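  1. Data: 2, 4, 6, 8, 10. The median is 6. Counting the median in both halves, the lower half is 2, 4, 6 and the upper half is 6, 8, 10, so Q1=4, Q3=8 → IQR=4. (Leaving the median out of both halves gives Q1=3, Q3=9 → IQR=6, so state which convention you are using.)
  2. Data: 1, 3, 5, 7, 9, 11. The lower half is 1, 3, 5 and the upper half is 7, 9, 11, so Q1=3, Q3=9 → IQR=6.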

Pitfalls

  • Forgetting to order data: Quartiles are meaningless without sorting first.
  • Confusing median with quartiles: The IQR uses Q1 and Q3, not the overall median.
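  • Small data sets: With an odd number of values, decide whether the overall median is included in or excluded from each half. The two conventions can give different quartiles, so follow the one your course or exam board uses.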

Exam strategy

  • Always sort the data before calculating quartiles.
  • Check whether the question wants exact values or estimates from a graph (like cumulative frequency curves).
  • Remember: IQR = spread of the middle 50%, so it resists the effect of outliers.
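
When quartiles are estimated from a cumulative frequency graph, the usual reading points are one quarter and three quarters of the total frequency. The sketch below imitates that reading in Python by joining the plotted points with straight lines, which is an assumption: a hand-drawn smooth curve may give slightly different estimates. The function names and the grouped data are hypothetical.

```python
def value_at_cumulative_frequency(boundaries, cum_freq, target):
    """Read a value off a cumulative frequency polygon by linear interpolation.

    boundaries: class boundaries, starting at the lower boundary of the first class
    cum_freq:   cumulative frequency at each boundary (so it starts at 0)
    """
    points = list(zip(boundaries, cum_freq))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("target lies outside the cumulative frequency range")


def estimate_iqr(boundaries, cum_freq):
    """Estimate Q3 - Q1 by reading across at n/4 and 3n/4."""
    total = cum_freq[-1]
    q1 = value_at_cumulative_frequency(boundaries, cum_freq, total / 4)
    q3 = value_at_cumulative_frequency(boundaries, cum_freq, 3 * total / 4)
    return q3 - q1


# Invented grouped data: 40 values in classes 0-10, 10-20, 20-30, 30-40
boundaries = [0, 10, 20, 30, 40]
cum_freq = [0, 5, 15, 35, 40]
print(estimate_iqr(boundaries, cum_freq))  # 12.5 (Q1 = 15.0, Q3 = 27.5)
```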

Summary

The interquartile range is defined as \(Q_3 - Q_1\). It measures the spread of the middle 50% of the data and, because extreme values do not affect it, is a robust statistic for comparing the spread of distributions. It is especially useful with box plots and for describing variability in real-world datasets.