When finding a least squares solution, we minimize the sum of the squared errors that the model makes in estimating y at each data point. Looking at it geometrically gives a uniquely intuitive sense of what is happening and why we do it.
- We want to minimize the square of || y - Xw ||, where w is the vector of weights (the parameters of the least squares model) learnt from the data, y is the observed dependent variable, and X is the matrix whose columns are the input vectors.
- Xw is the estimate of y obtained from the data. Equivalently, Xw is a weighted combination of the columns of X, with the weights given by the entries of w (learnt from data), chosen to approximate y.
- The LS solution returns the value of w such that the approximated values of y, i.e., Xw, are as close to the real values of y as possible in the Euclidean sense.
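The points above can be sketched numerically. This is a minimal illustration, not a reference implementation: the data, the seed, the names `w_true` and `squared_error`, and the choice of NumPy's `np.linalg.lstsq` as the solver are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed (assumed) so the run is repeatable

# Hypothetical data: 50 observations, 2 input vectors stored as the columns of X
X = rng.normal(size=(50, 2))
w_true = np.array([2.0, -1.0])               # weights used to generate y (assumed)
y = X @ w_true + 0.1 * rng.normal(size=50)   # observed y = Xw plus a little noise

# Least squares solution: the w that minimizes || y - Xw ||^2
w, *_ = np.linalg.lstsq(X, y, rcond=None)

def squared_error(w_candidate):
    """Sum of squared errors || y - X w_candidate ||^2."""
    residual = y - X @ w_candidate
    return float(residual @ residual)

best = squared_error(w)

# Any other choice of weights gives an error at least as large
perturbed = [squared_error(w + rng.normal(scale=0.5, size=2)) for _ in range(5)]

print("w =", w)
print("least squares error:", best)
print("errors at perturbed weights:", perturbed)
```

Every perturbed weight vector should report a larger squared error than the least squares w, which is what "as close as possible in the Euclidean sense" means in practice.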
Fig 1: Geometry of Least Squares
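In the above figure, y-hat is the point closest to y in the subspace spanned by the input vectors X1 and X2. In other words, y-hat is the orthogonal projection of y onto that subspace, so the error y - y-hat is perpendicular to both X1 and X2.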
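The picture can be checked with a small example. The closest point to y in the subspace is its orthogonal projection, so the error y - y-hat should be perpendicular to X1 and X2. The sketch below assumes two hypothetical input vectors in 3-D space and uses NumPy's `np.linalg.lstsq`; the specific numbers are made up for illustration.

```python
import numpy as np

# Hypothetical input vectors X1 and X2 in 3-D space; together they span the x-y plane
X1 = np.array([1.0, 0.0, 0.0])
X2 = np.array([1.0, 1.0, 0.0])
X = np.column_stack([X1, X2])

# A target y that lies outside that plane
y = np.array([2.0, 3.0, 4.0])

# Least squares weights and the resulting estimate y-hat = Xw
w, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ w
residual = y - y_hat

# The residual is orthogonal to every input vector (dot products are ~0)
print("X1 . residual =", X1 @ residual)
print("X2 . residual =", X2 @ residual)

# Any other point in the subspace is farther from y than y-hat is
other_point = X @ (w + np.array([0.3, -0.2]))
print("distance to y-hat:      ", np.linalg.norm(y - y_hat))
print("distance to other point:", np.linalg.norm(y - other_point))
```

Here y-hat comes out as approximately (2, 3, 0), the "shadow" of y on the plane, and the residual points straight out of the plane along the third axis.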
So when we use least squares, we are actually looking for the w for which Xw, the combination of the columns of X weighted by w, best approximates y.
Hope this gives a more intuitive sense of least squares.