Confidence Interval

Source: FAD1015 L21-L22 — Estimation of Population Mean, §3–6.

A confidence interval (CI) is an interval estimate of a population parameter, constructed at a specified confidence level. Estimation is one of the two main branches of statistical inference; the other is hypothesis testing.

Structure (Source: §3)

$$\text{CI for } \mu:\quad \text{point estimate} \pm \text{margin of error}$$

  • Lower Confidence Limit (LCL) = point estimate − margin of error
  • Upper Confidence Limit (UCL) = point estimate + margin of error
  • Width = UCL − LCL = 2 × margin of error

Confidence Level

The confidence level is denoted $(1 - \alpha)100\%$, where $\alpha$ is the total area left in the two tails:

| Confidence level | $1 - \alpha$ | $\alpha$ | $z_{\alpha/2}$ |
|---|---|---|---|
| 90% | 0.90 | 0.10 | 1.645 |
| 95% | 0.95 | 0.05 | 1.96 |
| 99% | 0.99 | 0.01 | 2.576 |
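
The tabulated critical values can be reproduced from the standard normal quantile function. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is an illustrative choice, not something from the lecture.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    alpha = 1 - confidence
    # An upper-tail area of alpha/2 corresponds to the (1 - alpha/2) quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

The printed values should agree with the table to three decimal places.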

Interpretation (Source: §3)

Correct: "We are 95% confident that the true population mean $\mu$ lies between [LCL] and [UCL]."

Incorrect: "There is a 95% probability that $\mu$ is in this interval."

The population mean $\mu$ is fixed but unknown; it is the interval that varies from sample to sample. The 95% refers to the procedure: if you repeated the sampling many times, approximately 95% of the constructed intervals would capture $\mu$. From the lecture simulation (20 studies, $n = 100$ each), about 95% of the 95% CIs (roughly 19 of the 20) contain the true mean.
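
The repeated-sampling interpretation can be checked with a small simulation. This is a sketch, not the lecture's own simulation: the population values `mu = 50` and `sigma = 10`, the seed, and the function name `coverage` are all assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(n_studies, n, mu=50.0, sigma=10.0, confidence=0.95, seed=0):
    """Fraction of known-sigma z-intervals that capture the true mean mu."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    hits = 0
    for _ in range(n_studies):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        if xbar - margin <= mu <= xbar + margin:   # did this interval capture mu?
            hits += 1
    return hits / n_studies

print(coverage(n_studies=20, n=100))      # few studies: the fraction is coarse
print(coverage(n_studies=2000, n=100))    # many studies: settles near 0.95
```

With only 20 studies the observed fraction can only move in steps of 0.05, so it may differ noticeably from 95%; the long-run fraction is what the confidence level describes.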

Procedure: Constructing a CI for $\mu$ (Source: §4–7)

Step 1 — Identify $\sigma$ Status

Follow the lecture's decision tree:

graph TD
    Q1["Is sigma known?"] -->|Yes| Z["Use Z-distribution<br/>CI = xbar +/- z_alpha/2 * sigma/sqrt(n)"]
    Q1 -->|No| Q2["Is n >= 30?"]
    Q2 -->|Yes| Z2["Use Z-distribution<br/>approximate sigma with s<br/>CI = xbar +/- z_alpha/2 * s/sqrt(n)"]
    Q2 -->|No| T["Use t-distribution<br/>df = n - 1<br/>CI = xbar +/- t_alpha/2,n-1 * s/sqrt(n)"]
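
The same decision tree can be written as a small function. This is a sketch under the tree's own rules; the name `choose_distribution` is hypothetical, and the function does not check the normality assumptions discussed in the cases below.

```python
def choose_distribution(sigma_known, n):
    """Mirror the decision tree: return 'z' or 't' for a CI on the mean."""
    if sigma_known:
        return "z"    # sigma known: use sigma directly
    if n >= 30:
        return "z"    # sigma unknown, large sample: approximate sigma with s
    return "t"        # sigma unknown, small sample: t with df = n - 1

print(choose_distribution(sigma_known=True, n=11))     # z
print(choose_distribution(sigma_known=False, n=200))   # z
print(choose_distribution(sigma_known=False, n=25))    # t
```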

Step 2 — Find Critical Value

  • $z$: use the table above or standard normal table
  • $t$: use the $t$-table with $df = n - 1$ at column $\alpha/2$ (Source: §6 — rows = df, columns = upper-tail probability)

Step 3 — Compute Margin of Error

$$E = \text{critical value} \times \frac{\text{standard deviation}}{\sqrt{n}}$$

  • $\sigma$ known: $E = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$
  • $\sigma$ unknown, $n < 30$: $E = t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$
  • $\sigma$ unknown, $n \ge 30$: $E = z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

Step 4 — Construct the Interval

$$\bar{x} - E \le \mu \le \bar{x} + E$$

or equivalently:

$$(\bar{x} - E,\; \bar{x} + E)$$

Step 5 — Interpret

State the interval in context using the correct interpretation wording from §3.
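
Steps 3 and 4 reduce to a few lines of arithmetic once the critical value is known. The sketch below assumes the critical value is supplied by the caller (from the $z$ or $t$ table); the function name and the input numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
from math import sqrt

def confidence_interval(xbar, sd, n, critical):
    """Return (lcl, ucl, margin) for xbar ± critical * sd / sqrt(n)."""
    margin = critical * sd / sqrt(n)               # Step 3: margin of error
    return xbar - margin, xbar + margin, margin    # Step 4: the interval

# Hypothetical inputs: xbar = 10, sd = 2, n = 64, 95% level (z = 1.96)
lcl, ucl, margin = confidence_interval(10.0, 2.0, 64, 1.96)
print(f"10.0 ± {margin:.2f} -> ({lcl:.2f}, {ucl:.2f})")
```

Here the margin is $1.96 \times 2 / 8 = 0.49$, so the interval is $(9.51,\; 10.49)$.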

Case 1: $\sigma$ Known (Source: §4)

When $\sigma$ is known (rare in practice — usually from historical data):

$$\bar{x} \pm z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

Assumptions (Source: §4):

  • Population is normally distributed, OR
  • $n \ge 30$ (Central Limit Theorem applies)

Example — Circuit Resistance (Source: §4, Example 1)

A sample of 11 circuits from a large normal population has $\bar{x} = 2.20\ \Omega$, $\sigma = 0.35\ \Omega$.

95% CI:

$$2.20 \pm 1.96 \times \frac{0.35}{\sqrt{11}} = 2.20 \pm 0.207$$

$$95\%\ \text{CI} = (1.993,\; 2.407)$$

99% CI:

$$2.20 \pm 2.576 \times \frac{0.35}{\sqrt{11}} = 2.20 \pm 0.272$$

$$99\%\ \text{CI} = (1.928,\; 2.472)$$

Observation from the lecture: Higher confidence level produces a wider interval.
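
The arithmetic in this example can be checked with a short script. It is a sketch that takes the example's values as given and obtains $z_{\alpha/2}$ from the standard-library `statistics.NormalDist` rather than the table; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 2.20, 0.35, 11    # values from the circuit example
margins = {}

for confidence in (0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / sqrt(n)
    margins[confidence] = margin
    print(f"{confidence:.0%} CI: ({xbar - margin:.3f}, {xbar + margin:.3f}), "
          f"width = {2 * margin:.3f}")
```

The 99% interval comes out wider than the 95% interval because only the critical value changes while $\sigma/\sqrt{n}$ stays the same.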

Case 2: $\sigma$ Unknown, Small $n$ (Source: §6)

When $\sigma$ is unknown, substitute $s$ (sample SD). This adds uncertainty, so we use the Student's $t$-distribution with $df = n - 1$:

$$\bar{x} \pm t_{\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$$

Student's $t$-Distribution (Source: §6)

  • Bell-shaped and symmetric, but has fatter tails than $z$
  • As $n$ increases, $t \to z$ (standard normal = $t$ with $df = \infty$)

How to read the $t$-table (Source: §6):

  • Rows: degrees of freedom $\nu = n - 1$
  • Columns: upper-tail probability (for a two-sided CI, look up $\alpha/2$, not $\alpha$)
  • For 95% CI with $n = 25$ ($df = 24$), $\alpha/2 = 0.025$, so use the 0.025 column → $t = 2.064$
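
For readers without a $t$-table at hand, the table entry can be approximated numerically. The sketch below is one possible approach, not the lecture's method: it integrates the $t$ density with Simpson's rule and finds the critical value by bisection, using only the standard library. All function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3        # by symmetry, P(T > 0) = 0.5

def t_critical(upper_tail, df):
    """Value t with P(T > t) = upper_tail, found by bisection."""
    lo, hi = 0.0, 1.0
    while t_upper_tail(hi, df) > upper_tail:   # widen until the root is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > upper_tail:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(f"{t_critical(0.025, 24):.3f}")     # table value for df = 24 is 2.064
print(f"{t_critical(0.025, 1000):.3f}")   # close to z = 1.96 for large df
```

The second call illustrates the convergence of $t$ to $z$ as the degrees of freedom grow.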

Assumptions (Source: §6):

  • Population is normally distributed
  • If not normal, need $n \ge 30$ (fall back to Case 3)

Example — Random Sample (Source: §6, Example 3)

$n = 25$, $\bar{x} = 50$, $s = 8$. 95% CI:

  • $df = 24$, $t_{0.025,\,24} = 2.064$
  • $50 \pm 2.064 \times \frac{8}{\sqrt{25}} = 50 \pm 3.30$
  • $95\%\ \text{CI} = (46.70,\; 53.30)$
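
To see what the $t$ critical value costs in width, the same sample can be run through both critical values. This is an illustrative comparison, not part of the lecture example: the $t$ value is the table entry quoted above, and the $z$ value comes from the standard-library `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

n, xbar, s = 25, 50.0, 8.0             # values from the example
t_crit = 2.064                         # t_{0.025, 24}, read from the t-table
z_crit = NormalDist().inv_cdf(0.975)   # about 1.96

se = s / sqrt(n)                       # standard error: 8 / 5 = 1.6
t_margin = t_crit * se
z_margin = z_crit * se

print(f"t-interval: {xbar} ± {t_margin:.2f}")
print(f"z-interval: {xbar} ± {z_margin:.2f}  (too narrow for this case)")
```

The $z$ margin is smaller, which would overstate the precision here: with $\sigma$ unknown and $n = 25$, the $t$-interval is the appropriate one.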

Case 3: $\sigma$ Unknown, $n \ge 30$ (Source: §5)

The CLT ensures the sampling distribution of $\bar{x}$ is approximately normal. We approximate $\sigma$ with $s$ and use $z$:

$$\bar{x} \pm z_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$$

Example — Tea Boxes (Source: §5, Example 2)

$n = 200$, $\bar{x} = 101.0$, $s = 2.78$. 99% CI:

$$101.0 \pm 2.576 \times \frac{2.78}{\sqrt{200}} = 101.0 \pm 0.506$$

$$99\%\ \text{CI} = (100.494,\; 101.506)$$

$z$ vs $t$ Summary (Source: §7)

| Condition | Distribution | Formula |
|---|---|---|
| $\sigma$ known, normal pop (or $n \ge 30$) | $z$ | $\bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$ |
| $\sigma$ unknown, normal pop | $t$ ($df = n - 1$) | $\bar{x} \pm t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}}$ |
| $\sigma$ unknown, $n \ge 30$ (CLT) | $z$ (approx with $s$) | $\bar{x} \pm z_{\alpha/2} \frac{s}{\sqrt{n}}$ |

Standardized forms from the lecture:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \quad \text{and} \quad t = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
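
The $z$-interval follows from the first standardized form by a short rearrangement, sketched here for the known-$\sigma$ case. Since $Z \sim N(0,1)$,

$$P\left(-z_{\alpha/2} \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right) = 1 - \alpha$$

Multiplying through by $\sigma/\sqrt{n}$ and isolating $\mu$ gives

$$P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$

The same steps applied to the $t$ statistic, with $t_{\alpha/2,\,n-1}$ in place of $z_{\alpha/2}$ and $S$ in place of $\sigma$, give the $t$-interval.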

Factors Affecting Interval Width (Source: §4)

  • Higher confidence level → wider interval (larger $z_{\alpha/2}$ or $t$)
  • Larger sample size → narrower interval (smaller $\sigma/\sqrt{n}$ or $s/\sqrt{n}$)
  • More variable population → wider interval (larger $\sigma$ or $s$)
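
Each of the three effects can be seen by changing one input at a time. The sketch below uses the known-$\sigma$ width $2 z_{\alpha/2}\,\sigma/\sqrt{n}$; the baseline values (`sd = 10`, `n = 100`) and the function name `ci_width` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sd, n, confidence=0.95):
    """Width of the z-interval: 2 * z_{alpha/2} * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sd / sqrt(n)

base = ci_width(sd=10.0, n=100)                        # hypothetical baseline
print(f"baseline width:        {base:.3f}")
print(f"99% instead of 95%:    {ci_width(10.0, 100, 0.99):.3f}")   # wider
print(f"n = 400 instead of 100: {ci_width(10.0, 400):.3f}")        # half as wide
print(f"sd = 20 instead of 10:  {ci_width(20.0, 100):.3f}")        # twice as wide
```

Because the width scales with $1/\sqrt{n}$, halving the width requires four times the sample size.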

Connection to Hypothesis Testing (Source: FAD1015 L23-L24 — Hypothesis Testing About the Mean, §5.3)

A CI can be used for a two-tailed test at significance level $\alpha$:

  • $\mu_0$ falls outside the $(1-\alpha)$ CI → reject $H_0$
  • $\mu_0$ falls inside the $(1-\alpha)$ CI → do not reject $H_0$
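
The agreement between the CI rule and the two-tailed $z$-test can be checked directly. This sketch covers the known-$\sigma$ case only; it reuses the circuit example's sample values, while the null values $\mu_0$ and both function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ci_rejects(mu0, xbar, sigma, n, alpha=0.05):
    """Decision via the CI: reject H0 if mu0 lies outside the (1 - alpha) interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * sigma / sqrt(n)
    return not (xbar - margin <= mu0 <= xbar + margin)

def z_test_rejects(mu0, xbar, sigma, n, alpha=0.05):
    """Decision via the test statistic: reject H0 if |z| > z_{alpha/2}."""
    z_stat = (xbar - mu0) / (sigma / sqrt(n))
    return abs(z_stat) > NormalDist().inv_cdf(1 - alpha / 2)

# Sample values from the circuit example; the mu0 values are made up.
for mu0 in (2.00, 2.30, 2.45):
    print(mu0, ci_rejects(mu0, 2.20, 0.35, 11), z_test_rejects(mu0, 2.20, 0.35, 11))
```

Both functions return the same decision for each $\mu_0$, since $\mu_0$ lying outside $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ is the same condition as $|z| > z_{\alpha/2}$.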

Related