# Linear regression

Linear regression is the approximation of a set of data points by a single linear function of one or more variables.

## Mean squared error

Suppose we are given a multiset of dependent variable values $Y = \{y_1, y_2, \dots , y_n\}$ and a multiset of corresponding independent variable values $X = \{x_1, x_2, \dots , x_n\}$. We want to create a function $f(x)$ that predicts $y$ with as little overall error as possible. We can quantify the error using mean squared error, defined by $$\mathrm{MSE} = \frac{\sum_{i=1}^{n} (y_i - f(x_i))^2}{n}.$$ Sometimes $f(x_i)$ is denoted $\hat{y}_i$, because $f(x_i)$ is a prediction of $y_i$.
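The definition above translates directly into code. Here is a minimal sketch (the helper name `mse` and the sample data are illustrative, not from the text):

```python
def mse(ys, preds):
    """Mean squared error: average of squared differences y_i - f(x_i)."""
    return sum((y, p) == () or (y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def mse(ys, preds):
    """Mean squared error: average of squared differences y_i - f(x_i)."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

ys = [2.0, 4.0, 6.0]       # observed y_i
preds = [2.5, 3.5, 6.5]    # predictions f(x_i)
print(mse(ys, preds))      # (0.25 + 0.25 + 0.25) / 3 = 0.25
```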

Note the similarity of $\mathrm{MSE}$ to the distance formula $d$ in $n$-dimensional Euclidean space; in fact, $\mathrm{MSE} = \frac{d^2}{n}$, so $\mathrm{MSE}$ increases monotonically with $d$ (which is always nonnegative). Thus, minimizing $\mathrm{MSE}$ corresponds to minimizing the Euclidean distance between the point whose coordinates are the $y_i$ and the point whose coordinates are the $f(x_i)$.
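The identity $\mathrm{MSE} = d^2/n$ can be checked numerically; this sketch treats the $y_i$ and $f(x_i)$ as coordinates of two points in $\mathbb{R}^4$ (the data values are made up for illustration):

```python
import math

ys    = [1.0, 3.0, 5.0, 7.0]   # observed y_i
preds = [1.5, 2.5, 5.5, 6.5]   # predictions f(x_i)

# Euclidean distance between the two points in R^4
d = math.dist(ys, preds)
mse = sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)
print(math.isclose(mse, d ** 2 / len(ys)))  # True
```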

If $f(x)$ is a constant function equal to the arithmetic mean of $Y$, then the $\mathrm{MSE}$ equals the (population) variance of $Y$.
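This fact can be verified with the standard library's `statistics` module, which provides the population variance as `pvariance` (the sample data below are arbitrary):

```python
from statistics import mean, pvariance

ys = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mu = mean(ys)  # the constant predictor f(x) = mean(Y)

# MSE of the constant-mean predictor
mse_of_mean = sum((y - mu) ** 2 for y in ys) / len(ys)
print(mse_of_mean, pvariance(ys))  # both equal the population variance, 4.0
```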

### Vector-valued functions

Sometimes multiple values are to be predicted in conjunction (for example, in a weather forecast, the wind components in both the north and east directions). In this case the $y_i$ are represented by vectors, so the predictor function $f(x)$ should also be a vector-valued function. The $\mathrm{MSE}$ formula is altered slightly to include magnitudes: $$\mathrm{MSE} = \frac{\sum_{i=1}^{n} \lVert \mathbf{y}_i - \mathbf{f}(x_i) \rVert ^2}{n}.$$ The summands in the numerator, by the vector magnitude formula, are themselves the sum of squares of differences between components.
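A sketch of the vector-valued version, with each $\mathbf{y}_i$ stored as a tuple of components (the helper name `vector_mse` and the two-component wind example are illustrative):

```python
def vector_mse(ys, preds):
    """MSE for vector targets: mean of squared Euclidean norms of residuals."""
    return sum(
        # squared magnitude ||y_i - f(x_i)||^2, summed over components
        sum((yc - pc) ** 2 for yc, pc in zip(y, p))
        for y, p in zip(ys, preds)
    ) / len(ys)

# e.g. (north, east) wind components per observation
ys    = [(1.0, 2.0), (3.0, 4.0)]
preds = [(1.0, 1.0), (2.0, 4.0)]
print(vector_mse(ys, preds))  # ((0 + 1) + (1 + 0)) / 2 = 1.0
```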

### Multiple regression

If there are multiple independent variables in conjunction (for example, if a student's past three AMC scores are used to predict the next score), then the regression is called a multiple regression. Each $x_i$ must then be viewed as a sequence $x_{i1}, x_{i2}, \dots, x_{im}$, where $m$ is the number of predictors. The terms of this sequence are passed one by one into $f$ as arguments; $f$ is therefore a function of $m$ variables.
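As a sketch of a multiple regression with $m = 2$ predictors, assuming NumPy is available: the targets below are generated exactly by $y = 1 + 2x_{i1} + 3x_{i2}$, so a least-squares fit recovers those coefficients. The data and coefficients are made up for illustration.

```python
import numpy as np

# Each row holds the predictors (x_{i1}, x_{i2}) for one observation.
X = np.array([[1.0, 2.0],
              [2.0, 0.0],
              [3.0, 1.0],
              [4.0, 3.0]])
# Targets generated exactly (no noise) by y = 1 + 2*x1 + 3*x2.
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]

A = np.column_stack([np.ones(len(X)), X])     # prepend an intercept column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)  # least-squares coefficients
print(coef)  # approximately [1., 2., 3.]
```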