Deriving Simple Linear Regression
Throughout my working life, I've seen so many charts with regression lines. But how does regression work and when should we apply it?
Well, let's see how it's derived.
Background
Two variables $Y_i$ and $x_i$ have the following linear relationship
$$Y_i = \alpha + \beta x_i + e_i \qquad \text{where} \quad e_i \sim N(0,\sigma^2)$$
Let's call $x_i$ the input or independent variable
Let's call $Y_i$ the response or dependent variable
Examples:
- the temperature $x_i$ of a steel-making process and the hardness $Y_i$ of the steel
- the temperature $x_i$ of a chemical process and the yield $Y_i$
We want to know what $\alpha$ and $\beta$ are, since knowing them could save a lot of money. Unfortunately we can't observe them directly; we have to estimate their values from the observations $(x_i, Y_i)$.
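To make the setup concrete, here is a minimal sketch in Python that simulates observations from this model. The "true" values $\alpha = \beta = 2$, $\sigma = 1$, the sample size, and the input range are all made up purely for illustration.

```python
import numpy as np

# A minimal simulation of the model Y_i = alpha + beta*x_i + e_i,
# with made-up "true" parameters just for illustration.
rng = np.random.default_rng(0)

alpha_true, beta_true, sigma = 2.0, 2.0, 1.0   # assumed values, not from real data
n = 30

x = rng.uniform(0, 10, size=n)                 # inputs (e.g. process temperature)
e = rng.normal(0, sigma, size=n)               # errors e_i ~ N(0, sigma^2)
Y = alpha_true + beta_true * x + e             # observed responses

print(list(zip(x[:3].round(2), Y[:3].round(2))))  # a few (x_i, Y_i) pairs
```

In practice we only ever see the $(x_i, Y_i)$ pairs; the point of the rest of the derivation is to recover good guesses for $\alpha$ and $\beta$ from them.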
What is a good estimator?
So we have to come up with estimators $A$ and $B$ for $\alpha$ and $\beta$, and the question is: what makes an estimator "good"?
Note that you can do whatever you want. For example, let's say God tells us that $\alpha = \beta = 2$ and thus $Y_i = 2 + 2 x_i + e_i$.
By sheer dumb luck I might choose $A = B = 2$, estimates not based on any observations at all, and yet they would be exactly right.
Another dumb idea for an estimator is to just draw a line through the first two observations $(x_1, Y_1)$ and $(x_2, Y_2)$ and ignore everything else.
Enter the method of least squares. Let
$$ SS_R = \sum_{i=1}^{n} (Y_i - A - B x_i)^2 \qquad \text{Sum of Squared Residuals} $$ $$ = (Y_1 - A - B x_1)^2 + (Y_2 - A - B x_2)^2 + \cdots + (Y_n - A - B x_n)^2 $$
Choose $A$ and $B$ to minimize $SS_R$. To do this we can take derivatives and set equal to zero. Note that the $(x_i, Y_i)$ pairs are constants and our variables are $A$ and $B$.
$$ \frac{\partial SS_R}{\partial A} = -2 \sum_{i=1}^{n} (Y_i - A - B x_i) = 0$$ $$ \sum_{i=1}^{n} Y_i - \sum_{i=1}^{n} A - \sum_{i=1}^{n} B x_i = 0 $$ $$ \sum_{i=1}^{n} Y_i - n A - B \sum_{i=1}^{n} x_i = 0 $$ $$ \sum_{i=1}^{n} Y_i - B \sum_{i=1}^{n} x_i = nA $$ $$ A = \overline{Y} - B \overline{x}$$
Now let's take the partial derivative with respect to $B$ and set equal to zero.
$$ \frac{\partial SS_R}{\partial B} = -2 \sum_{i=1}^{n} x_i (Y_i - A - B x_i) = 0$$ $$ \sum_{i=1}^{n} x_i (Y_i - A - B x_i) = \sum_{i=1}^{n} \left( x_i Y_i - A x_i - B x_i^2 \right) = 0$$
Now plugging in $A = \overline{Y} - B \overline{x}$ from above,
$$ \sum_{i=1}^{n} \left( x_i Y_i - (\overline{Y} - B \overline{x}) x_i - B x_i^2 \right) = 0$$ $$ \sum_{i=1}^{n} \left( x_i Y_i - \overline{Y} x_i + B \overline{x} x_i - B x_i^2 \right) = 0$$ $$ \sum_{i=1}^{n} \left( x_i Y_i - \overline{Y} x_i + B (\overline{x} x_i - x_i^2) \right) = 0$$ $$ \sum_{i=1}^{n} (x_i Y_i - \overline{Y} x_i ) + \sum_{i=1}^{n} B (\overline{x} x_i - x_i^2) = 0$$ $$ B \sum_{i=1}^{n} (\overline{x} x_i - x_i^2) = \sum_{i=1}^{n} (\overline{Y} x_i - x_i Y_i)$$ $$ B = \frac{ \sum_{i=1}^{n} (x_i \overline{Y} - x_i Y_i) } { \sum_{i=1}^{n} (x_i \overline{x} - x_i^2) } = \frac{ \sum_{i=1}^{n} x_i (\overline{Y} - Y_i) } { \sum_{i=1}^{n} x_i (\overline{x} - x_i) } = \frac{ \sum_{i=1}^{n} x_i (Y_i - \overline{Y}) } { \sum_{i=1}^{n} x_i (x_i - \overline{x}) } $$
I know that the standard way to write this is
$$ B = \frac{S_{xY}}{S_{xx}} = \frac{\sum_{i=1}^{n} (x_i - \overline{x})(Y_i - \overline{Y})}{\sum_{i=1}^{n} (x_i - \overline{x})^2}$$
and the two forms agree because $\sum_{i=1}^{n} \overline{x} (Y_i - \overline{Y}) = \overline{x} \sum_{i=1}^{n} (Y_i - \overline{Y}) = 0$ and likewise $\sum_{i=1}^{n} \overline{x} (x_i - \overline{x}) = 0$, so swapping $x_i$ for $(x_i - \overline{x})$ in the numerator and denominator changes nothing. Expanding the square in the denominator,
$$ S_{xx} = \sum_{i=1}^{n} (x_i - \overline{x})^2 = \sum_{i=1}^{n} \left( x_i^2 - 2 x_i \overline{x} + \overline{x}^2 \right) = \sum_{i=1}^{n} x_i^2 - n \overline{x}^2 $$
a form that will come in handy later.
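Here is a small sketch of how these closed-form estimates could be computed in Python. The data are simulated with made-up parameters (not from any real process), and the result is cross-checked against numpy's degree-1 `polyfit`, which fits the same least-squares line.

```python
import numpy as np

# Least-squares estimates from the closed-form solution:
# B = S_xY / S_xx and A = Ybar - B * xbar.
# Simulated data with made-up true parameters (alpha=2, beta=2, sigma=1).
rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 10, size=n)
Y = 2.0 + 2.0 * x + rng.normal(0, 1.0, size=n)

xbar, Ybar = x.mean(), Y.mean()
S_xx = np.sum((x - xbar) ** 2)
S_xY = np.sum((x - xbar) * (Y - Ybar))

B = S_xY / S_xx
A = Ybar - B * xbar
print(f"A = {A:.3f}, B = {B:.3f}")

# Cross-check against numpy's least-squares line fit (degree-1 polyfit).
B_np, A_np = np.polyfit(x, Y, 1)
print(f"polyfit: A = {A_np:.3f}, B = {B_np:.3f}")
```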
So are we done now? Well, let's say you want to do hypothesis testing, for example $H_0: \beta = 0$, which says there is no regression on the input variable. Note that $R^2$, the usual goodness-of-fit measure, is closely related to correlation (in simple linear regression it is the squared sample correlation). If $\beta = 0$ then there is no regression on the input variable, and the variation of the $Y_i$'s comes entirely from the error term $e$. A standard measure of the variation of the $Y_i$'s is $S_{YY} = \sum (Y_i - \overline{Y})^2$.
Anyway, to do hypothesis testing on $\beta$, or to build a confidence interval for $\alpha$, we need to know the distributions of $A$ and $B$.
$$ A \sim N \left( \alpha, \frac{\sigma^2 \sum_{i=1}^n x_i^2}{n S_{xx}}\right)$$ $$ B \sim N \left( \beta, \frac{\sigma^2}{ S_{xx}}\right)$$
And note that
$$ \sqrt{S_{xx}}\frac{(B-\beta)}{\sigma} \sim N(0,1) \qquad \text{which is independent of} \qquad \frac{SS_R}{\sigma^2} \sim \chi_{n-2}^2$$
and so we have
$$ \frac{\sqrt{S_{xx}}\frac{(B-\beta)}{\sigma} }{\sqrt{\frac{\frac{SS_R}{\sigma^2}}{n-2}}} = \sqrt{\frac{(n-2)S_{xx}}{SS_R}} (B-\beta) \sim t_{n-2} $$
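As a quick illustration, here is a sketch of testing $H_0: \beta = 0$ with this statistic on simulated data (made-up parameters again). The p-value uses the $t_{n-2}$ distribution, and `scipy.stats.linregress` is included only as a sanity check, since it performs the same two-sided test on the slope.

```python
import numpy as np
from scipy import stats

# Test H0: beta = 0 using sqrt((n-2) * S_xx / SS_R) * (B - 0) ~ t_{n-2}.
# Data are simulated with made-up parameters purely for illustration.
rng = np.random.default_rng(2)
n = 50
x = rng.uniform(0, 10, size=n)
Y = 2.0 + 2.0 * x + rng.normal(0, 1.0, size=n)

xbar, Ybar = x.mean(), Y.mean()
S_xx = np.sum((x - xbar) ** 2)
B = np.sum((x - xbar) * (Y - Ybar)) / S_xx
A = Ybar - B * xbar

SS_R = np.sum((Y - A - B * x) ** 2)              # residual sum of squares
t_stat = np.sqrt((n - 2) * S_xx / SS_R) * (B - 0.0)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-sided p-value
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")

# Sanity check against scipy's built-in simple linear regression test.
res = stats.linregress(x, Y)
print(f"scipy: t = {res.slope / res.stderr:.2f}, p = {res.pvalue:.3g}")
```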
Predicting future responses
Suppose we have some specified input value $x_0$. For example, now that you have your glorious model, what is the mean response when the input is $42$, and in what range should a single new observation at that input fall?
There are two different intervals you might want:
1. A confidence interval (CI) for $E[Y \mid x_0]$, the mean response, which is an interval for a parameter
2. A prediction interval (PI) for $Y(x_0)$, a future observation, which is an interval for a random variable
$$ \text{CI} = A+B x_0 \pm \sqrt{\frac{SS_R}{n-2}} \sqrt{\frac{1}{n} + \frac{(x_0 - \overline{x})^2}{S_{xx}}}\, \, t_{\alpha/2,n-2}$$ $$ \text{PI} = A+B x_0 \pm \sqrt{\frac{SS_R}{n-2}} \sqrt{1+\frac{1}{n} + \frac{(x_0 - \overline{x})^2}{S_{xx}}}\, \, t_{\alpha/2,n-2}$$
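Here is a sketch of computing both intervals at a given $x_0$ from these formulas, again on simulated data with made-up parameters; $x_0 = 42$ just echoes the example above, and the 95% level is an arbitrary choice.

```python
import numpy as np
from scipy import stats

# 95% CI for the mean response E[Y | x0] and 95% PI for a new observation
# Y(x0), from the formulas above. Data and parameters are made up.
rng = np.random.default_rng(3)
n = 50
x = rng.uniform(0, 100, size=n)
Y = 2.0 + 2.0 * x + rng.normal(0, 5.0, size=n)

xbar, Ybar = x.mean(), Y.mean()
S_xx = np.sum((x - xbar) ** 2)
B = np.sum((x - xbar) * (Y - Ybar)) / S_xx
A = Ybar - B * xbar
SS_R = np.sum((Y - A - B * x) ** 2)

x0 = 42.0
conf = 0.95
t_crit = stats.t.ppf(1 - (1 - conf) / 2, df=n - 2)
s = np.sqrt(SS_R / (n - 2))                      # estimate of sigma
fit = A + B * x0

half_ci = t_crit * s * np.sqrt(1 / n + (x0 - xbar) ** 2 / S_xx)
half_pi = t_crit * s * np.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / S_xx)
print(f"CI: {fit:.1f} +/- {half_ci:.1f}")
print(f"PI: {fit:.1f} +/- {half_pi:.1f}")
```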
What happens as $n \rightarrow \infty$?
1) Well we would expect the CI to shrink down to a single point, namely $\alpha + \beta x_0$
2) We would expect the PI to turn into $\alpha + \beta x_0 \pm \sigma z_{\alpha/2}$
And this does happen because clearly $\frac{1}{n} \rightarrow 0$ and
$$\frac{(x_0 - \overline{x})^2}{S_{xx}} = \frac{(x_0 - \overline{x})^2}{\sum_{i=1}^n (x_i-\overline{x})^2} = \frac{(x_0 - \overline{x})^2}{ \left(\sum x_i^2 \right) - n\overline{x}^2} \rightarrow 0 \qquad \text{as} \quad n \rightarrow \infty$$
since the numerator stays bounded while $S_{xx}$ keeps growing as we add more (non-identical) observations. Meanwhile $\sqrt{SS_R/(n-2)} \rightarrow \sigma$ and $t_{\alpha/2,n-2} \rightarrow z_{\alpha/2}$, which gives exactly the limits above.
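To see this numerically, the sketch below recomputes the two 95% half-widths at $x_0$ for increasing $n$ on simulated data (made-up parameters as before): the CI half-width heads toward zero while the PI half-width settles near $\sigma z_{\alpha/2}$.

```python
import numpy as np
from scipy import stats

# Watch the 95% interval half-widths at x0 as n grows: the CI half-width
# shrinks toward 0 while the PI half-width settles near sigma * z_{0.025}.
# Everything here is simulated with made-up parameters for illustration.
rng = np.random.default_rng(4)
sigma, x0 = 5.0, 42.0

for n in (10, 100, 1000, 10000):
    x = rng.uniform(0, 100, size=n)
    Y = 2.0 + 2.0 * x + rng.normal(0, sigma, size=n)
    xbar = x.mean()
    S_xx = np.sum((x - xbar) ** 2)
    B = np.sum((x - xbar) * (Y - Y.mean())) / S_xx
    A = Y.mean() - B * xbar
    s = np.sqrt(np.sum((Y - A - B * x) ** 2) / (n - 2))
    t_crit = stats.t.ppf(0.975, df=n - 2)
    ci = t_crit * s * np.sqrt(1 / n + (x0 - xbar) ** 2 / S_xx)
    pi = t_crit * s * np.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / S_xx)
    print(f"n={n:>5}: CI half-width {ci:.3f}, PI half-width {pi:.3f}")

print(f"sigma * z_0.025 = {sigma * stats.norm.ppf(0.975):.3f}")
```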