Linear regression


Linear regression is the approximation of a set of data points by a single linear function of one or more variables.

Mean squared error

Suppose we are given a multiset of dependent variable values $Y = \{y_1, y_2, \dots , y_n\}$ and a multiset of corresponding independent variable values $X = \{x_1, x_2, \dots , x_n\}$. We want to create a function $f(x)$ that predicts $y$ with as little overall error as possible. We can quantify the error using mean squared error, defined by \[\mathrm{MSE} = \frac{\sum_{i=1}^{n} (y_i - f(x_i))^2}{n}.\] Sometimes $f(x_i)$ is denoted $\hat{y}_i$, because $f(x_i)$ is a prediction of $y_i$.
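
As a concrete illustration, here is a minimal Python sketch computing $\mathrm{MSE}$; the data points and the linear predictor $f(x) = 2x + 1$ are hypothetical, chosen only for demonstration.

```python
# A minimal sketch of computing mean squared error.
# The data and the predictor f(x) = 2x + 1 are hypothetical.

xs = [1.0, 2.0, 3.0, 4.0]   # independent variable values x_i
ys = [3.2, 4.8, 7.1, 8.9]   # dependent variable values y_i

def f(x):
    """Hypothetical linear predictor."""
    return 2 * x + 1

# MSE = (1/n) * sum of squared residuals (y_i - f(x_i))^2
mse = sum((y - f(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(mse)  # average squared prediction error
```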

Note the similarity of $\mathrm{MSE}$ to the distance formula $d$ in $n$-dimensional Euclidean space, where $d$ is the distance between the point whose coordinates are the $y_i$ and the point whose coordinates are the $f(x_i)$. In fact, $\mathrm{MSE} = \frac{d^2}{n}$, so $\mathrm{MSE}$ increases monotonically with $d$ (which is always nonnegative). Thus, minimizing $\mathrm{MSE}$ corresponds to minimizing the Euclidean distance between these two points.
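
Concretely, writing the distance between these two points out in coordinates gives \[d = \sqrt{\sum_{i=1}^{n} (y_i - f(x_i))^2}, \qquad \text{so} \qquad \frac{d^2}{n} = \frac{\sum_{i=1}^{n} (y_i - f(x_i))^2}{n} = \mathrm{MSE}.\]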

If $f(x)$ is a constant function equal to the arithmetic mean of $Y$, then the $\mathrm{MSE}$ equals the variance of $Y$.
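
That is, taking $f(x_i) = \bar{y}$ for every $i$, where $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$, yields \[\mathrm{MSE} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n},\] which is exactly the (population) variance of $Y$.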

Vector-valued functions

Sometimes multiple values are to be predicted in conjunction (for example, in a weather forecast, the wind components in both the north and east directions). In this case each $y_i$ is represented by a vector $\mathbf{y}_i$, so the predictor should also be a vector-valued function $\mathbf{f}(x)$. The $\mathrm{MSE}$ formula is altered slightly to use vector magnitudes: \[\mathrm{MSE} = \frac{\sum_{i=1}^{n} \lVert \mathbf{y}_i - \mathbf{f}(x_i) \rVert ^2}{n}.\] By the vector magnitude formula, each summand in the numerator is itself a sum of squares of differences between components.
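
The following Python sketch (using NumPy) illustrates this formula; the two components of each $\mathbf{y}_i$ and the predictor below are hypothetical, standing in for quantities like the north and east wind components.

```python
# A minimal sketch of MSE for vector-valued predictions, using NumPy.
# The data and the predictor are hypothetical.

import numpy as np

# Each row is one observation y_i = (north component, east component).
ys = np.array([[1.0, 2.0],
               [2.5, 0.5],
               [3.0, 1.5]])

def f(x):
    """Hypothetical vector-valued linear predictor."""
    return np.array([0.5 * x, 1.0 + 0.2 * x])

xs = np.array([1.0, 2.0, 3.0])
preds = np.array([f(x) for x in xs])

# Sum of ||y_i - f(x_i)||^2 over all i, divided by n: squaring and
# summing over both components of each row expands the squared norms.
mse = np.sum((ys - preds) ** 2) / len(xs)
print(mse)
```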

Multiple regression

If multiple independent variables are used in conjunction (for example, if a student's past three AMC scores are used to predict the next score), then the regression becomes a multiple regression. Each $x_i$ must then be viewed as a sequence $x_{i1}, x_{i2}, \dots, x_{im}$, where $m$ is the number of predictors. The terms of this sequence are passed to $f$ as separate arguments; $f$ is therefore a function of $m$ variables.
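
As a sketch of how such a function might be fitted in practice, the following Python example uses NumPy's least squares solver, which finds the coefficients minimizing the sum of squared residuals (equivalently, the $\mathrm{MSE}$). The score data here are entirely made up for illustration.

```python
# A minimal sketch of multiple linear regression via least squares.
# All score data below are hypothetical.

import numpy as np

# Each row holds one student's m = 3 past AMC scores.
X = np.array([[96.0, 102.0, 108.0],
              [75.0,  81.0,  90.0],
              [110.0, 114.0, 120.0],
              [60.0,  66.0,  72.0],
              [88.0,  91.0,  99.0]])
y = np.array([111.0, 93.0, 123.0, 78.0, 102.0])  # next scores

# Append a column of ones so the fitted function includes a constant
# term: f(x_1, ..., x_m) = a_1 x_1 + ... + a_m x_m + b.
A = np.hstack([X, np.ones((X.shape[0], 1))])

# Solve for the coefficients minimizing the sum of squared residuals.
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
print(coeffs)  # a_1, a_2, a_3, b

# Predict the next score for a new student (hypothetical inputs,
# with a trailing 1 for the constant term).
new_x = np.array([90.0, 99.0, 105.0, 1.0])
print(new_x @ coeffs)
```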