We have a collection of labeled examples \(\{(\mathbf x_i, y_i)\}_{i=1}^N\), where \(N\) is the size of the collection, \(\mathbf x_i\) is the \(D\)-dimensional feature vector of example \(i = 1, \ldots, N\), \(y_i\) is a real-valued target, and every feature \(x^{(j)}_i\), \(j = 1, \ldots, D\), is also a real number. We want to build a model \(f_{\mathbf w,b} (\mathbf x)\) as a linear combination of the features of example \(\mathbf x\):
\(f_{\mathbf w,b} (\mathbf x) = \mathbf w \mathbf x + b\),
where \(\mathbf w\) is a \(D\)-dimensional vector of parameters and \(b\) is a real number. The notation \(f_{\mathbf w,b} (\mathbf x)\) means that the model \(f\) is parametrized by two values: \(\mathbf w\) and \(b\).
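To make the formula concrete, here is a small worked example. It assumes that \(\mathbf w \mathbf x\) denotes the dot product \(\sum_{j=1}^{D} w^{(j)} x^{(j)}\); the numbers are purely illustrative and do not come from any dataset. Take \(D = 2\), \(\mathbf w = (2, -1)\), \(b = 0.5\) and \(\mathbf x = (3, 4)\). Then
\(f_{\mathbf w,b} (\mathbf x) = 2 \cdot 3 + (-1) \cdot 4 + 0.5 = 2.5\).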
We will use the model to predict the unknown \(y\) for a given \(\mathbf x\) like this: \(y \leftarrow f_{\mathbf w,b} ( \mathbf{x} )\). Two models parametrized by two different pairs \(( \mathbf w, b )\) will likely produce two different predictions when applied to the same example. We want to find the optimal values \(( \mathbf w^\ast, b^\ast )\), that is, the parameter values that define the model making the most accurate predictions.
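The following minimal sketch illustrates how two parameter pairs can give different predictions for the same example. The function name `predict` and all numeric values are hypothetical, chosen only for illustration, and \(\mathbf w \mathbf x\) is assumed to be the dot product.

```python
def predict(w, b, x):
    """Linear model f_{w,b}(x) = w . x + b for one example x.

    w and x are sequences of D real numbers; b is a real number.
    """
    if len(w) != len(x):
        raise ValueError("w and x must have the same dimension D")
    # Dot product of w and x, plus the scalar b.
    return sum(w_j * x_j for w_j, x_j in zip(w, x)) + b


# One example with D = 2 features (illustrative values).
x = [3.0, 4.0]

# Two hypothetical parameter pairs (w, b).
w_a, b_a = [2.0, -1.0], 0.5
w_b, b_b = [1.0, 0.5], -1.0

y_a = predict(w_a, b_a, x)  # 2*3 + (-1)*4 + 0.5
y_b = predict(w_b, b_b, x)  # 1*3 + 0.5*4 - 1

print(y_a, y_b)
```

Which of the two predictions is better cannot be decided from the model alone; it depends on how close each one is to the true target \(y\) of the example.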