Statistics

Introduction to Statistics

Statistics is the science of collecting, organizing, analyzing, and interpreting data. In a data-driven world, statistical literacy is an essential skill, useful for everything from understanding medical research to making informed decisions in everyday life.

Descriptive statistics summarizes data using numbers and visualizations, while inferential statistics uses sample data to draw conclusions about larger populations. This article focuses on descriptive statistics, followed by the basics of probability.

Types of Data

Categorical vs. Numerical Data

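Categorical (Qualitative) data represents characteristics or qualities. It can be nominal (categories with no natural order) or ordinal (categories with a meaningful order).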

Numerical (Quantitative) data represents counts or measurements. It can be discrete (countable values) or continuous (any value within a range).

Measures of Central Tendency

Central tendency describes the center or typical value of a dataset. The three main measures are mean, median, and mode.

Mean (Average)

The mean is the sum of all values divided by the count of values.

Mean (x̄) = (Σx) / n

where Σx is the sum of all values and n is the number of values.

Example 1: Calculating Mean

Find the mean of: 4, 8, 6, 5, 3, 8, 9

Sum = 4 + 8 + 6 + 5 + 3 + 8 + 9 = 43

n = 7 values

Mean = 43 / 7 ≈ 6.14
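The same calculation can be checked in Python. This is a minimal sketch using only the standard library; `manual_mean` and `library_mean` are illustrative names, and `statistics.mean` is the standard-library function for the arithmetic mean.

```python
from statistics import mean

data = [4, 8, 6, 5, 3, 8, 9]

# The formula directly: sum of the values divided by the count
manual_mean = sum(data) / len(data)

# The standard library's statistics.mean gives the same result
library_mean = mean(data)

print(sum(data), len(data))                            # 43 7
print(round(manual_mean, 2), round(library_mean, 2))   # 6.14 6.14
```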

Weighted Mean: When values have different importance:

x̄w = (Σwx) / (Σw)

where w is the weight for each value.
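The weighted-mean formula can be sketched as a small function. The `weighted_mean` helper and the course-grade numbers below are hypothetical, chosen only to illustrate the formula: multiply each value by its weight, add the products, and divide by the total weight.

```python
def weighted_mean(values, weights):
    """Hypothetical helper: sum of (w * x) divided by sum of w."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight

# Hypothetical grades: homework 90 (weight 20), midterm 80 (weight 30), final 70 (weight 50)
scores = [90, 80, 70]
weights = [20, 30, 50]
print(weighted_mean(scores, weights))  # 77.0

# With equal weights, the weighted mean reduces to the ordinary mean
print(weighted_mean([1, 2, 3], [1, 1, 1]))  # 2.0
```

The arithmetic is (90·20 + 80·30 + 70·50) / (20 + 30 + 50) = 7700 / 100 = 77, slightly below the unweighted mean of 80 because the lowest score carries the most weight.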

Median

The median is the middle value when data is arranged in order.

Example 2: Finding the Median

Data: 3, 7, 2, 9, 5, 8, 1

Step 1: Arrange in order: 1, 2, 3, 5, 7, 8, 9

Step 2: Find the middle value: with 7 values, the 4th value is in the middle

Median = 5

Example 3: Median with Even Number of Values

Data: 3, 7, 2, 9, 5, 8

Step 1: Arrange: 2, 3, 5, 7, 8, 9

Step 2: With 6 values, average the two middle values (the 3rd and 4th): (5 + 7) / 2

Median = 6
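Both median examples can be reproduced with a short Python sketch. `median_by_hand` is a hypothetical helper that follows the steps above (sort, then take the middle value or average the two middle values); `statistics.median` is the standard-library equivalent.

```python
from statistics import median

def median_by_hand(values):
    """Sort, then take the middle value (odd n) or average the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [3, 7, 2, 9, 5, 8, 1]   # Example 2
even_data = [3, 7, 2, 9, 5, 8]     # Example 3

print(median_by_hand(odd_data), median(odd_data))     # 5 5
print(median_by_hand(even_data), median(even_data))   # 6.0 6.0
```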

Mode

The mode is the most frequently occurring value(s) in a dataset.

Example 4: Finding the Mode

Data: 3, 5, 7, 5, 2, 5, 9, 5

Count: 3(1), 5(4), 7(1), 2(1), 9(1)

Mode = 5 (appears 4 times)
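Counting occurrences is easy to automate. This sketch uses `collections.Counter` to tally values and `statistics.multimode` (Python 3.8+), which returns every value tied for the highest count.

```python
from collections import Counter
from statistics import multimode

data = [3, 5, 7, 5, 2, 5, 9, 5]

# Tally how often each value occurs
counts = Counter(data)
print(counts.most_common(1))        # [(5, 4)]

# multimode returns every value tied for the highest count
print(multimode(data))              # [5]
print(multimode([1, 1, 2, 2, 3]))   # [1, 2]  (two modes)
print(multimode([1, 2, 3]))         # [1, 2, 3]  (every value ties)
```

Note that `multimode` never reports "no mode": when every value appears equally often it returns all of them, so that case has to be checked separately if you treat such data as having no mode.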

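A dataset can have one mode (unimodal), two modes (bimodal), more than two modes (multimodal), or no mode at all when every value appears the same number of times.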

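Choosing the Right Measure

The mean uses every value, so outliers and skew pull it toward the tail of the data. The median is resistant to outliers, which makes it the better summary for skewed data. The mode is the only one of the three that also works for categorical data.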

Measures of Spread (Variation)

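Central tendency alone doesn't fully describe data: two datasets can have the same mean while one is tightly clustered and the other widely scattered. Measures of spread describe how much the values vary.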

Range

The simplest measure of spread:

Range = Maximum - Minimum

Data: 3, 7, 2, 9, 5, 8

Range = 9 - 2 = 7
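In Python the range is one line with the built-in `max` and `min`. The variable is named `data_range` (an illustrative name) to avoid shadowing Python's built-in `range`.

```python
data = [3, 7, 2, 9, 5, 8]

# Range = maximum - minimum
data_range = max(data) - min(data)
print(data_range)  # 7
```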

Variance and Standard Deviation

These measure how far values typically are from the mean.

Population Variance (σ²)

σ² = Σ(x - μ)² / N

where μ is the population mean and N is the population size.

Sample Variance (s²)

s² = Σ(x - x̄)² / (n - 1)

where x̄ is the sample mean and n is the sample size.

Note: Dividing by (n - 1) instead of n (Bessel's correction) makes the sample variance an unbiased estimate of the population variance.

Standard Deviation

σ = √σ² or s = √s²

Standard deviation is in the same units as the original data.

Example 5: Calculating Variance and Standard Deviation

Data: 4, 8, 6, 5, 3 (n = 5)

Step 1: Mean = (4 + 8 + 6 + 5 + 3) / 5 = 26/5 = 5.2

Step 2: Find deviations from mean:

(4-5.2) = -1.2, (8-5.2) = 2.8, (6-5.2) = 0.8, (5-5.2) = -0.2, (3-5.2) = -2.2

Step 3: Square deviations:

1.44, 7.84, 0.64, 0.04, 4.84

Step 4: Sum of squared deviations = 14.8

Step 5: Sample variance = 14.8 / (5 - 1) = 3.7

Step 6: Standard deviation = √3.7 ≈ 1.92
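The six steps translate directly into Python. This sketch computes the sample variance by hand and, for contrast, the population version that would apply only if these five values were the entire population. It then checks both against the standard library's `statistics.variance`/`stdev` (sample, n - 1) and `pvariance`/`pstdev` (population, N).

```python
import math
from statistics import pstdev, pvariance, stdev, variance

data = [4, 8, 6, 5, 3]
n = len(data)

# Steps 1-4: mean, deviations, squared deviations, and their sum
x_bar = sum(data) / n
squared_devs = [(x - x_bar) ** 2 for x in data]
sum_sq = sum(squared_devs)                 # about 14.8 (floating point)

# Step 5: sample variance divides by n - 1; population variance divides by n
sample_var = sum_sq / (n - 1)
population_var = sum_sq / n

# Step 6: standard deviation is the square root of the variance
sample_sd = math.sqrt(sample_var)

print(round(sample_var, 4), round(sample_sd, 2))            # 3.7 1.92
print(round(population_var, 4))                             # 2.96

# The statistics module provides both versions
print(round(variance(data), 4), round(stdev(data), 2))      # 3.7 1.92
print(round(pvariance(data), 4), round(pstdev(data), 2))    # 2.96 1.72
```

Rounding is used in the printouts because floating-point arithmetic can leave tiny errors (for example, the sum of squares may print as 14.799999999999999).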

The Empirical Rule (68-95-99.7 Rule)

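For normally distributed data, about 68% of values fall within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations.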
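The rule's three percentages are rounded, and they can be recomputed from the normal distribution itself. This sketch uses `statistics.NormalDist` (Python 3.8+): the probability of landing within k standard deviations of the mean is cdf(k) - cdf(-k) for the standard normal.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Fraction of a normal distribution lying within k standard deviations of the mean
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} SD: {p:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```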

Quartiles and the Interquartile Range

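Quartiles divide ordered data into four parts, each holding about a quarter of the values. Q1 (the first quartile) has about 25% of the data below it, Q2 is the median (50%), and Q3 (the third quartile) has about 75% below it.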

Interquartile Range (IQR)

IQR = Q3 - Q1

The IQR spans the middle 50% of the data. Because it ignores the lowest and highest quarters, it is resistant to outliers.

Example 6: Finding Quartiles

Data: 2, 4, 5, 6, 7, 8, 9, 12

Q2 (median): (6 + 7) / 2 = 6.5

Q1: Median of lower half (2, 4, 5, 6) = (4 + 5) / 2 = 4.5

Q3: Median of upper half (7, 8, 9, 12) = (8 + 9) / 2 = 8.5

IQR: 8.5 - 4.5 = 4
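The "median of each half" method from Example 6 can be sketched as follows. The helpers `median_of` and `quartiles` are hypothetical names, and excluding the overall median from both halves when n is odd is an assumed convention (Example 6 has an even count, so it does not settle this). Other conventions exist: the standard library's `statistics.quantiles` interpolates between values and gives different Q1 and Q3 for the same data.

```python
from statistics import quantiles

def median_of(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method (assumed: odd n drops the median from both halves)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + (n % 2):]
    return median_of(lower), median_of(ordered), median_of(upper)

data = [2, 4, 5, 6, 7, 8, 9, 12]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, q3 - q1)      # 4.5 6.5 8.5 4.0

# A different convention: interpolation with the default method='exclusive'
print(quantiles(data, n=4))     # [4.25, 6.5, 8.75]
```

When comparing quartiles from different calculators or libraries, it is worth checking which method each one uses.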

Box Plots

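A box plot visually displays the five-number summary: minimum, Q1, median, Q3, and maximum. The box spans Q1 to Q3 (the IQR) with a line at the median, and in the basic form the whiskers extend out to the minimum and maximum.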

Probability Basics

Probability measures how likely an event is to occur, on a scale from 0 (impossible) to 1 (certain).

Basic Probability Formula

P(event) = Number of favorable outcomes / Total number of outcomes

All outcomes must be equally likely and exhaustive.

Example 7: Basic Probability

What is the probability of rolling an even number on a fair die?

Favorable outcomes: 2, 4, 6 (3 outcomes)

Total outcomes: 6

P(even) = 3/6 = 1/2 or 0.5 or 50%
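Counting favorable outcomes can be done programmatically. This sketch assumes a fair die, so all six faces are equally likely, and uses `fractions.Fraction` to keep the probability exact.

```python
from fractions import Fraction

die = [1, 2, 3, 4, 5, 6]  # outcomes of a fair die, assumed equally likely

favorable = [face for face in die if face % 2 == 0]
p_even = Fraction(len(favorable), len(die))  # Fraction reduces 3/6 automatically

print(favorable)               # [2, 4, 6]
print(p_even, float(p_even))   # 1/2 0.5
```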

Probability Rules

Complement Rule:

P(not A) = 1 - P(A)

Addition Rule (for mutually exclusive events):

P(A or B) = P(A) + P(B)

Multiplication Rule (for independent events):

P(A and B) = P(A) × P(B)
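All three rules can be checked by brute-force counting over a small sample space. This sketch assumes two fair dice rolled independently (36 equally likely ordered pairs). The helper `prob` and the event functions are hypothetical names that treat an event as a predicate on outcomes.

```python
from fractions import Fraction
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely ordered pairs
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) by counting outcomes."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))

def first_is_six(o):
    return o[0] == 6

def second_is_six(o):
    return o[1] == 6

def total_is_2(o):
    return sum(o) == 2

def total_is_12(o):
    return sum(o) == 12

# Complement rule: P(not A) = 1 - P(A)
p_not_six = prob(lambda o: not first_is_six(o))
print(p_not_six, 1 - prob(first_is_six))                    # 5/6 5/6

# Addition rule: totals of 2 and 12 cannot occur on the same roll (mutually exclusive)
p_2_or_12 = prob(lambda o: total_is_2(o) or total_is_12(o))
print(p_2_or_12, prob(total_is_2) + prob(total_is_12))      # 1/18 1/18

# Multiplication rule: the two dice are independent
p_double_six = prob(lambda o: first_is_six(o) and second_is_six(o))
print(p_double_six, prob(first_is_six) * prob(second_is_six))  # 1/36 1/36
```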

Example 8: Probability Rules

A bag has 3 red and 7 blue balls. What is P(not red)?

P(red) = 3/10

P(not red) = 1 - 3/10 = 7/10

Expected Value

The expected value is the long-run average of a random variable:

E(X) = Σ[x × P(x)]

In words: multiply each outcome by its probability, then add the products.

Example 9: Expected Value

A game costs $5 to play. With probability 0.2 it pays out $20; otherwise it pays nothing. What is your expected net gain?

E(gain) = ($20 - $5)(0.2) + (-$5)(0.8) = $3 - $4 = -$1

On average, you lose $1 per game.
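This sketch computes the expected gain exactly with `fractions.Fraction` and then simulates many plays to illustrate the "long-run average" interpretation. The constant names are hypothetical, and the simulated average depends on the random seed, so only its closeness to -1 is meaningful.

```python
import random
from fractions import Fraction

COST = 5
PRIZE = 20
P_WIN = Fraction(1, 5)   # 0.2 from the example, kept exact

# Net gain for each outcome, paired with its probability
outcomes = [(PRIZE - COST, P_WIN), (-COST, 1 - P_WIN)]
assert sum(p for _, p in outcomes) == 1  # probabilities must sum to 1

expected_gain = sum(gain * p for gain, p in outcomes)
print(expected_gain)  # -1

# Long-run average: simulate many plays; the average should land near -1
rng = random.Random(42)  # fixed seed so the run is repeatable
plays = 100_000
total = sum((PRIZE - COST) if rng.random() < 0.2 else -COST for _ in range(plays))
print(total / plays)  # close to -1 (exact value depends on the seed)
```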

Key Takeaways
