Linear Regression
1. Multiple Linear Regression
model
- response variable $y$, regressor variables $\lbrace x_i \rbrace_{i=1}^{n}$
- $y = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + \epsilon$ for each of the $m$ observations, with
  $\epsilon \overset{\text{iid}}{\sim} N(0, \sigma^2)$
- matrix notation (see the sketch below):
  $\mathbf{y} = X\mathbf{w} + \mathbf{\epsilon}$, where $X$ is the $m \times (n+1)$ design matrix
  $\mathbf{\epsilon} \overset{\text{iid}}{\sim} N(\mathbf{0}, \sigma^2 I_m)$
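A minimal NumPy sketch of this matrix form; the sizes $m$, $n$, the coefficient values, and $\sigma$ below are illustrative assumptions, not values from the text:

```python
import numpy as np

# simulate y = Xw + eps with m observations and n regressors
rng = np.random.default_rng(0)          # fixed seed, illustrative
m, n = 100, 3
sigma = 0.5

X = np.column_stack([np.ones(m), rng.normal(size=(m, n))])  # first column = intercept
w_true = np.array([2.0, -1.0, 0.5, 3.0])                    # [w0, w1, ..., wn], assumed values
eps = rng.normal(0.0, sigma, size=m)                        # iid N(0, sigma^2)
y = X @ w_true + eps                                        # response vector
```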
Least-Squares Estimation
\[\underset{\mathbf{w}}{\text{min}}\; S(\mathbf{w}) = \sum_{i=1}^{m} \epsilon_i^2\]
normal equation
\[\begin{align*} \sum_i \epsilon_i^2 &= \mathbf{\epsilon}^T \mathbf{\epsilon} \\ &= (\mathbf{y} - X\mathbf{w})^T (\mathbf{y} - X\mathbf{w}) \\ &= \mathbf{y}^T \mathbf{y} - 2\mathbf{y}^T X \mathbf{w} + \mathbf{w}^T X^T X \mathbf{w} \quad (\because \mathbf{w}^T X^T \mathbf{y} = \mathbf{y}^T X \mathbf{w} \text{, a scalar}) \end{align*}\]
Thus, from
\[\left. \frac{\partial S(\mathbf{w})}{\partial \mathbf{w}} \right\vert _{\hat{\mathbf{w}}} = - 2 X^T \mathbf{y} + 2 X^T X \hat{\mathbf{w}} = 0\]
we obtain
\[X^T X \hat{\mathbf{w}} = X^T \mathbf{y}\]
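As a sketch, the normal equation can be solved directly, reusing the simulated `X`, `y` above; `np.linalg.lstsq` is the numerically safer route when $X^T X$ is ill-conditioned:

```python
# hat(w) from X^T X w = X^T y
w_hat = np.linalg.solve(X.T @ X, X.T @ y)

# equivalent but more robust least-squares solve
w_hat_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```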
gradient descent
\[\mathcal{L}(\mathbf{w}) = \frac{1}{2m} S(\mathbf{w})\]
parameter update: $\mathbf{w} \leftarrow \mathbf{w} - \alpha \mathcal{L}^\prime(\mathbf{w})$ for a learning rate $\alpha$
From
\[\begin{align*} \frac{\partial \mathcal{L}(\mathbf{w})}{\partial \mathbf{w}} &= \frac{1}{m} X^T (X \mathbf{w} - \mathbf{y}) \\ &= \frac{1}{m} X^T (\hat{\mathbf{y}} - \mathbf{y}) \end{align*}\]
we obtain
\[\mathbf{w} \leftarrow \mathbf{w} - \alpha \cdot \frac{1}{m} X^T (\hat{\mathbf{y}} - \mathbf{y})\]
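A sketch of the update loop, again reusing `X`, `y`; the learning rate and iteration count are illustrative choices:

```python
# gradient descent on L(w) = S(w) / (2m)
alpha, n_iters = 0.1, 1000
w = np.zeros(X.shape[1])
for _ in range(n_iters):
    y_hat = X @ w
    w -= alpha * (X.T @ (y_hat - y)) / m   # w <- w - (alpha/m) X^T (y_hat - y)
```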
2. Polynomial Regression
kth-order polynomial regression model in one variable
\[y = w_0 + w_1 x + w_2 x^2 + \cdots + w_k x^k + \epsilon\]
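Since the model is linear in the coefficients, it fits with ordinary least squares on a design matrix of powers of $x$; a sketch with assumed data (the generating cubic below is illustrative):

```python
# kth-order polynomial regression in one variable
k = 3
x = rng.uniform(-1, 1, size=m)
y1 = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(0.0, 0.1, size=m)  # assumed data

X_poly = np.vander(x, k + 1, increasing=True)   # columns: 1, x, x^2, ..., x^k
w_poly, *_ = np.linalg.lstsq(X_poly, y1, rcond=None)
```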
second-order polynomial regression model in two variables
\[y = w_0 + w_1 x_1 + w_2 x_2 + w_{11} x_1^2 + w_{12} x_1 x_2 + w_{22} x_2^2 + \epsilon\]
piecewise linear regression
\[y = w_0 + w_1 x + w_2 (x-t)_+ + \epsilon\]
where $(x-t)_+ = \max(0,\, x-t)$ and $t$ is the knot at which the slope changes
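The hinge term is just one extra column in the design matrix, so the same least-squares machinery applies; a sketch reusing `x`, `y1` with an assumed knot location:

```python
# piecewise linear regression with a single knot t (t = 0.0 is an assumption)
t = 0.0
X_pw = np.column_stack([np.ones(m), x, np.maximum(0.0, x - t)])
w_pw, *_ = np.linalg.lstsq(X_pw, y1, rcond=None)   # slope changes by w_pw[2] at t
```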
orthogonal polynomial regression
\[y = \alpha_0 P_0(x) + \alpha_1 P_1(x) + \cdots + \alpha_n P_n(x) + \epsilon\]
where $P_j(x)$ is the jth-order orthogonal polynomial, defined by
\[\sum_{i=1}^m P_r(x_i) P_s(x_i) = 0, \quad r \ne s \quad \text{and} \quad P_0(x_i) = 1\]
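One numerical way to build such a basis is QR factorization of the Vandermonde matrix: column $j$ of $Q$ is a degree-$j$ polynomial evaluated at the data points, and distinct columns are orthogonal. Note this gives an orthonormal scaling rather than the $P_0(x_i) = 1$ normalization above; the sketch assumes the `x`, `y1` data from the polynomial example:

```python
# orthogonal polynomial basis over the data points via QR
Q, R = np.linalg.qr(np.vander(x, k + 1, increasing=True))
assert np.allclose(Q.T @ Q, np.eye(k + 1))   # sum_i P_r(x_i) P_s(x_i) = 0 for r != s

# coefficients decouple: each alpha_j depends only on P_j,
# which is the practical appeal of the orthogonal parameterization
alpha_hat = Q.T @ y1
```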