Polynomial Regression
Sometimes the relationship between $Y$ (response) and $x$ (input) is not linear, and it isn't of a form you can easily linearize by taking logs, such as $Y_1 \approx A_1e^{-B_1x_1}$. Suppose instead that $Y_i$ and $x_i$ have some sort of polynomial relationship. That is
$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 +... + \beta_r x_i^r + e_i$$We'd like to estimate the values of the $\beta_i$'s with $B_i$'s. What are good estimators?
Let's say those that minimize the sum of squared residuals.
$$ SS_R = \sum_{i=1}^n (Y_i - B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r)^2$$We can get the vector $B = (B_0 \ B_1 \ ... \ B_r)$ by taking the partial derivative with respect to each $B_j$ and setting it equal to zero.
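Just to make $SS_R$ concrete, here's a minimal sketch in Python/NumPy; the data and the candidate coefficients are made up for illustration.

```python
import numpy as np

def ss_r(x, y, b):
    """Sum of squared residuals for a polynomial with coefficients
    b = (B_0, B_1, ..., B_r), i.e. yhat_i = B_0 + B_1*x_i + ... + B_r*x_i**r."""
    r = len(b) - 1
    # Design matrix with columns 1, x, x^2, ..., x^r
    X = np.vander(x, N=r + 1, increasing=True)
    resid = y - X @ b
    return np.sum(resid ** 2)

# Hypothetical data and a candidate quadratic (r = 2)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 9.2, 19.1, 33.0])
print(ss_r(x, y, np.array([1.0, 0.0, 2.0])))  # SS_R for B = (1, 0, 2)
```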
\begin{align} \frac{\partial{SS_R}} {\partial{B_0}} &= -2 \sum_{i=1}^n (Y_i - B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 \\ \frac{\partial{SS_R}}{\partial{B_1}} &=-2 \sum_{i=1}^n x_i(Y_i - B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 \\ \frac{\partial{SS_R}}{\partial{B_2}} &=-2 \sum_{i=1}^n x_i^2(Y_i - B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 \\ &\vdots \\ \frac{\partial{SS_R}}{\partial{B_r}} &=-2 \sum_{i=1}^n x_i^r(Y_i - B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 \end{align}After getting rid of the $-2$ and some rearrangement you can get $r+1$ normal equations!
Let's work out the first one.
$$\sum_{i=1}^n (Y_i - B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 $$ $$\sum_{i=1}^n Y_i + \sum_{i=1}^n (- B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 $$ $$\sum_{i=1}^n Y_i = \sum_{i=1}^n (B_0 + B_1 x_i + B_2 x_i^2 + ... + B_r x_i^r) $$ $$\sum_{i=1}^n Y_i = nB_0 + B_1 \sum_{i=1}^n x_i + B_2 \sum_{i=1}^n x_i^2 + ... + B_r \sum_{i=1}^n x_i^r $$Note that the LHS is a sum of independent normal RVs, which means the RHS is also normal.
And one more just for illustration. Let's do the third equation.
$$ \sum_{i=1}^n x_i^2 (Y_i - B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 $$ $$ \sum_{i=1}^n x_i^2 Y_i + \sum_{i=1}^n x_i^2(- B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 $$ $$ \sum_{i=1}^n x_i^2 Y_i = \sum_{i=1}^n x_i^2(B_0 + B_1 x_i + B_2 x_i^2 + ... + B_r x_i^r)$$ $$ \sum_{i=1}^n x_i^2 Y_i = B_0 \sum_{i=1}^n x_i^2 + B_1 \sum_{i=1}^n x_i^3 + B_2 \sum_{i=1}^n x_i^4 + ... + B_r \sum_{i=1}^n x_i^{r+2}$$Note that the LHS is a linear combination of independent normal RVs, which means the RHS is also normal.
The last normal equation, the $(r+1)$th, is
$$ \sum_{i=1}^n x_i^r Y_i = B_0 \sum_{i=1}^n x_i^r + B_1 \sum_{i=1}^n x_i^{r+1} + B_2 \sum_{i=1}^n x_i^{r+2} + ... + B_r \sum_{i=1}^n x_i^{2r}$$Note that we can write all this stuff in matrix notation.
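Here's one way to see why matrix notation is natural: the sums showing up in the normal equations are exactly the entries of $X^TX$ and $X^TY$, where $X$ is the design matrix built from powers of the $x_i$'s. A small NumPy sketch (with made-up data) checking that:

```python
import numpy as np

x = np.array([0.5, 1.0, 1.5, 2.0, 2.5])   # hypothetical inputs
Y = np.array([1.0, 2.1, 4.2, 7.9, 13.1])  # hypothetical responses
r = 2                                      # polynomial degree

# Design matrix with columns 1, x, x^2, ..., x^r
X = np.vander(x, N=r + 1, increasing=True)

# Entry (j, k) of X^T X is sum_i x_i^{j+k}; entry j of X^T Y is sum_i x_i^j Y_i
XtX = np.array([[np.sum(x ** (j + k)) for k in range(r + 1)] for j in range(r + 1)])
XtY = np.array([np.sum((x ** j) * Y) for j in range(r + 1)])

print(np.allclose(XtX, X.T @ X))  # True
print(np.allclose(XtY, X.T @ Y))  # True
```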
Small Example
Let's say we have $3$ observations, and that, based on a scatter diagram (with more than $3$ data points, which somehow we magically have), we know that a degree $3$ polynomial is a good fit. So for $(x_1,Y_1),(x_2,Y_2),(x_3,Y_3)$, we have the relationship
$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + e_i$$with $Y = (Y_1 \ Y_2 \ Y_3)^T$ and $x = (x_1 \ x_2 \ x_3)^T$:
\[ \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \end{bmatrix} = \begin{bmatrix} 1 & x_{1} & x_{1}^2 & x_{1}^3 \\ 1 & x_{2} & x_{2}^2 & x_{2}^3 \\ 1 & x_{3} & x_{3}^2 & x_{3}^3 \\ \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ \end{bmatrix} \]Let's call this $Y=X \beta + e$ which is $\mathbb{R}^{3 \times 1} = \mathbb{R}^{3 \times 4} \times \mathbb{R}^{4 \times 1} + \mathbb{R}^{3 \times 1}$
Take a look at the first row of the matrix $X$. There's only one independent variable, $x_i$, and functions of it. However, we want to solve for $B$, and we can do it by treating this like a Multiple Linear Regression problem if we think of $x_i, x_i^2, x_i^3$ as $3$ different inputs.
Then $B = (X^TX)^{-1} X^TY$
You can convince yourself of this by looking at the $r+1$ normal equations for polynomial regression and the $r+1$ normal equations for multiple linear regression.
You can express the normal equations as $X^TY = X^TXB$ and from there get the solution for $B$.
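As a sanity check, here's a minimal sketch of that solution in NumPy. The data are made up (and there are more observations than coefficients, so $X^TX$ is actually invertible); in practice a routine like `np.linalg.lstsq` is numerically preferable to forming $(X^TX)^{-1}$ directly.

```python
import numpy as np

# Hypothetical data: 6 observations, to be fit with a cubic (r = 3)
x = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 2.5])
Y = np.array([0.9, 1.3, 2.2, 4.1, 7.8, 13.6])

# Design matrix with columns 1, x, x^2, x^3
X = np.vander(x, N=4, increasing=True)

# Normal-equations solution B = (X^T X)^{-1} X^T Y
B = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check with a numerically stabler least-squares routine
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(B)
print(np.allclose(B, B_lstsq))  # True
```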
General Example
More generally, if we have
$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 +... + \beta_r x_i^r + e_i$$then
\[ \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \\ \end{bmatrix} = \begin{bmatrix} 1 & x_{1} & x_{1}^2 & x_{1}^3 & \dots & x_{1}^r \\ 1 & x_{2} & x_{2}^2 & x_{2}^3 & \dots & x_{2}^r \\ & & & \vdots \\ 1 & x_{n} & x_{n}^2 & x_{n}^3 & \dots & x_{n}^r \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \\ \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \\ \end{bmatrix} \] $$ Y = X \beta + e$$After taking partials of $SS_R$ and setting them equal to zero, you get the $r+1$ normal equations. Solving just as in Multiple Linear Regression gives $B = (X^TX)^{-1} X^TY$.
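And a minimal sketch of the general degree-$r$ fit via the normal equations (the function name and the data are mine, just for illustration):

```python
import numpy as np

def poly_fit(x, Y, r):
    """Least-squares estimates B = (X^T X)^{-1} X^T Y for a degree-r polynomial."""
    X = np.vander(np.asarray(x, dtype=float), N=r + 1, increasing=True)
    return np.linalg.solve(X.T @ X, X.T @ np.asarray(Y, dtype=float))

# Hypothetical usage: recover the coefficients of noise-free data from Y = 1 + 2x - 0.5x^2
x = np.linspace(0, 5, 20)
Y = 1 + 2 * x - 0.5 * x ** 2
print(poly_fit(x, Y, r=2))  # approximately [1.0, 2.0, -0.5]
```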