Sat 4 March 2017

Polynomial Regression

Written by Hongjinn Park in Articles

Sometimes the relationship between $Y$ (response) and $x$ (input) is not linear, and it's not in a form like $Y_1 \approx A_1e^{-B_1x_1}$ where you could take logs to linearize it. Instead, suppose $Y_i$ and $x_i$ have some sort of polynomial relationship. That is

$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 +... + \beta_r x_i^r +e_i$$

We'd like to estimate the values of the $\beta_i$'s with $B_i$'s. What are good estimators?


Let's say those that minimize the sum of squared residuals.

$$ SS_R = \sum_{i=1}^n (Y_i - B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r)^2$$
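Just to make this objective concrete, here's a minimal sketch (my own illustration, not part of the original notes) that evaluates $SS_R$ for a candidate coefficient vector $B$ on some made-up data:

```python
import numpy as np

# Minimal sketch: evaluate the sum of squared residuals SS_R for a
# candidate coefficient vector B = (B_0, B_1, ..., B_r).
def ss_r(B, x, Y):
    x, Y, B = np.asarray(x), np.asarray(Y), np.asarray(B)
    fitted = sum(B[k] * x**k for k in range(len(B)))  # B_0 + B_1 x + ... + B_r x^r
    return np.sum((Y - fitted) ** 2)

# made-up data and a made-up candidate B, purely for illustration
x = np.array([0.0, 1.0, 2.0, 3.0])
Y = np.array([1.1, 2.9, 9.2, 19.0])
print(ss_r([1.0, 0.0, 2.0], x, Y))  # SS_R for the candidate fit 1 + 2x^2
```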

We can get the vector $B = (B_0 \ B_1 \ ... \ B_r)$ by taking the partial derivatives and setting them equal to zero.

\begin{align} \frac{\partial{SS_R}} {\partial{B_0}} &= -2 \sum_{i=1}^n (Y_i - B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 \\ \frac{\partial{SS_R}}{\partial{B_1}} &=-2 \sum_{i=1}^n x_i(Y_i - B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 \\ \frac{\partial{SS_R}}{\partial{B_2}} &=-2 \sum_{i=1}^n x_i^2(Y_i - B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 \\ &\vdots \\ \frac{\partial{SS_R}}{\partial{B_r}} &=-2 \sum_{i=1}^n x_i^r(Y_i - B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 \end{align}

After getting rid of the $-2$ and doing some rearranging, you get the $r+1$ normal equations! Here's the first one:

$$\sum_{i=1}^n (Y_i - B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 $$ $$\sum_{i=1}^n Y_i + \sum_{i=1}^n (- B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 $$ $$\sum_{i=1}^n Y_i = \sum_{i=1}^n (B_0 + B_1 x_i + B_2 x_i^2 + ... + B_r x_i^r) $$ $$\sum_{i=1}^n Y_i = nB_0 + B_1 \sum_{i=1}^n x_i + B_2 \sum_{i=1}^n x_i^2 + ... + B_r \sum_{i=1}^n x_i^r $$

Note that the LHS is a sum of independent normal RVs (the $Y_i$'s), which means the RHS is also normal.

And one more just for illumination. Let's do the third equation, the one that comes from the partial with respect to $B_2$.

$$ \sum_{i=1}^n x_i^2 (Y_i - B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 $$ $$ \sum_{i=1}^n x_i^2 Y_i + \sum_{i=1}^n x_i^2(- B_0 - B_1 x_i - B_2 x_i^2 - ... - B_r x_i^r) = 0 $$ $$ \sum_{i=1}^n x_i^2 Y_i = \sum_{i=1}^n x_i^2(B_0 + B_1 x_i + B_2 x_i^2 + ... + B_r x_i^r)$$ $$ \sum_{i=1}^n x_i^2 Y_i = B_0 \sum_{i=1}^n x_i^2 + B_1 \sum_{i=1}^n x_i^3 + B_2 \sum_{i=1}^n x_i^4 + ... + B_r \sum_{i=1}^n x_i^{r+2}$$

Note that the LHS is a linear combination of independent normal RVs, which means the RHS is also normal.

The last normal equation $(r+1)$ is

$$ \sum_{i=1}^n x_i^r Y_i = B_0 \sum_{i=1}^n x_i^r + B_1 \sum_{i=1}^n x_i^{r+1} + B_2 \sum_{i=1}^n x_i^{r+2} + ... + B_r \sum_{i=1}^n x_i^{2r}$$
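As a concrete illustration (mine, not from the original notes), here's a short NumPy sketch that builds this $(r+1) \times (r+1)$ system straight from the power sums $\sum_i x_i^{j+k}$ and $\sum_i x_i^j Y_i$ and solves it. The data and the function name are made up:

```python
import numpy as np

# Build the r+1 normal equations directly from the power sums and solve them.
# The (j, k) entry of the system matrix is sum_i x_i^(j+k), and the j-th
# right-hand side entry is sum_i x_i^j * Y_i, exactly as derived above.
def normal_equation_fit(x, Y, r):
    x, Y = np.asarray(x, dtype=float), np.asarray(Y, dtype=float)
    A = np.empty((r + 1, r + 1))
    b = np.empty(r + 1)
    for j in range(r + 1):
        b[j] = np.sum(x**j * Y)
        for k in range(r + 1):
            A[j, k] = np.sum(x**(j + k))
    return np.linalg.solve(A, b)   # B_0, B_1, ..., B_r

# toy data (made up): quadratic trend plus a little noise
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 20)
Y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.1, x.size)
print(normal_equation_fit(x, Y, r=2))   # roughly [1.0, 2.0, -0.5]
```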

Note that we can write all this stuff in matrix notation.

Small Example

Let's say there are $3$ observations of data, and based on a scatter diagram (one with more than $3$ data points that we somehow magically have) we know that a degree $3$ polynomial is a good fit. So for $(x_1,Y_1),(x_2,Y_2),(x_3,Y_3)$, we have the relationship

$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 +e_i$$

with $Y = (Y_1 \ Y_2 \ Y_3)^T$ and $x = (x_1 \ x_2 \ x_3)^T$

\[ \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ \end{bmatrix} = \begin{bmatrix} 1 & x_{1} & x_{1}^2 & x_{1}^3 \\ 1 & x_{2} & x_{2}^2 & x_{2}^3 \\ 1 & x_{3} & x_{3}^2 & x_{3}^3 \\ \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \\ \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ \end{bmatrix} \]

Let's call this $Y=X \beta + e$ which is $\mathbb{R}^{3 \times 1} = \mathbb{R}^{3 \times 4} \times \mathbb{R}^{4 \times 1} + \mathbb{R}^{3 \times 1}$

Take a look at the first row of the matrix $X$. There's only one independent variable, $x_i$, plus functions of it. However, we want to solve for $B$, and we can do it by treating this like a Multiple Linear Regression problem, thinking of $x_i, x_i^2, x_i^3$ as $3$ different inputs.

Then $B = (X^TX)^{-1} X^TY$

You can convince yourself of this by looking at the $r+1$ normal equations for polynomial regression and the $r+1$ normal equations for multiple linear regression.

You can express the normal equations as $X^TY = X^TXB$ and from there get the solution for $B$.
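For what it's worth, here's a hedged NumPy sketch of that matrix solution for the degree-$3$ case. The data is made up, and I've used $8$ observations instead of $3$, since with only $3$ points and $4$ unknowns $X^TX$ would be singular and $(X^TX)^{-1}$ wouldn't exist:

```python
import numpy as np

# Made-up data following a rough cubic trend, purely for illustration
rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 8)
Y = 0.5 - 1.0 * x + 0.3 * x**3 + rng.normal(0, 0.05, x.size)

# Design matrix with columns 1, x, x^2, x^3 (the Vandermonde matrix)
X = np.vander(x, 4, increasing=True)

# B = (X^T X)^{-1} X^T Y, computed as a linear solve of X^T Y = X^T X B
B = np.linalg.solve(X.T @ X, X.T @ Y)
print(B)   # roughly recovers [0.5, -1.0, 0.0, 0.3]
```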

General Example

More generally if we have

$$Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 +... + \beta_r x_i^r +e_i$$

then

\[ \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \\ \end{bmatrix} = \begin{bmatrix} 1 & x_{1} & x_{1}^2 & x_{1}^3 & \dots & x_{1}^r \\ 1 & x_{2} & x_{2}^2 & x_{2}^3 & \dots & x_{2}^r \\ & & & \vdots \\ 1 & x_{n} & x_{n}^2 & x_{n}^3 & \dots & x_{n}^r \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_r \\ \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \\ \end{bmatrix} \] $$ Y = X \beta + e$$

After taking partials of the $SS_R$ and setting them equal to zero, you get the $r+1$ normal equations. Solving just like for Multiple Linear Regression, you get $B = (X^TX)^{-1} X^TY$.
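As a quick sanity check (again my own sketch, not part of the original notes), the closed-form $B = (X^TX)^{-1} X^TY$ should agree with NumPy's built-in least-squares polynomial fit:

```python
import numpy as np

# Compare the closed-form normal-equation solution with np.polyfit.
# np.polyfit returns coefficients highest power first, so reverse it
# to compare with our (B_0, ..., B_r) ordering.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
Y = 2.0 + 1.0 * x - 3.0 * x**2 + rng.normal(0, 0.02, x.size)

r = 2
X = np.vander(x, r + 1, increasing=True)
B_closed_form = np.linalg.solve(X.T @ X, X.T @ Y)
B_polyfit = np.polyfit(x, Y, r)[::-1]

print(np.allclose(B_closed_form, B_polyfit))   # True
```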


