Given a dataset D = \{(X_1, Y_1), \dots, (X_N, Y_N)\} such that X_i and Y_i are continuous, the goal of "Linear Regression" is to find the best line that fits this data.
In other words, we want to create the model:
\hat{y} = a_0 + a_1 x_1 + \cdots + a_p x_p
where p is the number of dimensions of the variable X.
In this article we will see how to solve this problem in three scenarios:
When X is one-dimensional, i.e. p = 1.
When X is multi-dimensional, i.e. p > 1.
Using gradient descent.
X is one-dimensional (Ordinary Least Squares)
The model that we want to create has the form:
\hat{y} = a_0 + a_1 x
Remember that the goal of linear regression is to find the line that best fits the data. In other words, we need to minimize the sum of squared distances between the data points and the line.
(\hat{a}_0, \hat{a}_1) = \arg\min_{(a_0, a_1)} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
= \arg\min_{(a_0, a_1)} \sum_{i=1}^{N} (y_i - (a_0 + a_1 x_i))^2
Let's define:
L = \sum_{i=1}^{N} (y_i - (a_0 + a_1 x_i))^2
In order to find the minimum, we need to solve the following equations:

\frac{\partial L}{\partial a_0} = -2 \sum_{i=1}^{N} (y_i - (a_0 + a_1 x_i)) = 0

\frac{\partial L}{\partial a_1} = -2 \sum_{i=1}^{N} x_i (y_i - (a_0 + a_1 x_i)) = 0

Solving this system gives the classical closed-form estimates:

\hat{a}_1 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{N} (x_i - \bar{x})^2}, \quad \hat{a}_0 = \bar{y} - \hat{a}_1 \bar{x}

where \bar{x} and \bar{y} are the means of the x_i and the y_i.
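As a quick sanity check, here is a minimal NumPy sketch of these closed-form estimates (fit_ols_1d and the synthetic data are my own illustration, not part of the derivation above):

```python
import numpy as np

def fit_ols_1d(x, y):
    """Closed-form OLS estimates for y ~ a0 + a1 * x (1-D case)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # a1_hat = cov(x, y) / var(x), obtained from dL/da1 = 0
    a1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # a0_hat = mean(y) - a1_hat * mean(x), obtained from dL/da0 = 0
    a0 = y_mean - a1 * x_mean
    return a0, a1

# Synthetic example: noisy points around the line y = 2 + 3x
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 + 3 * x + rng.normal(0, 0.5, size=100)
print(fit_ols_1d(x, y))  # expected to be close to (2.0, 3.0)
```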
X is multi-dimensional
In this case, X_i is no longer a real number; instead, it is a vector of size p:
X_i = (X_{i1}, X_{i2}, \dots, X_{ip})
So, the model is written as follows:
\hat{y} = a_0 + a_1 x_1 + a_2 x_2 + \cdots + a_p x_p
or, it can be written in matrix form:
\hat{Y} = X W
where:
Y is of shape (N, 1).
X is of shape (N, p+1): each row is (1, X_{i1}, \dots, X_{ip}); the leading column of ones absorbs the intercept a_0.
W is of shape (p+1, 1): this is the parameter vector (a_0, a_1, \dots, a_p).
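To make these shapes concrete, here is a small NumPy sketch (the column of ones is the standard trick for absorbing the intercept; the variable names are my own):

```python
import numpy as np

N, p = 100, 3
rng = np.random.default_rng(0)
X_features = rng.normal(size=(N, p))          # raw features, shape (N, p)
ones = np.ones((N, 1))                        # intercept column
X = np.hstack([ones, X_features])             # design matrix, shape (N, p+1)
W = rng.normal(size=(p + 1, 1))               # parameters (a0, ..., ap), shape (p+1, 1)
Y_hat = X @ W                                 # predictions, shape (N, 1)
print(X.shape, W.shape, Y_hat.shape)          # (100, 4) (4, 1) (100, 1)
```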
Similarly to the first case, we aim to minimize the following quantity:
\hat{W} = \arg\min_W \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
Again, let's define:
L = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2
= (Y - XW)^T (Y - XW)
= Y^T Y - Y^T X W - W^T X^T Y + W^T X^T X W
= Y^T Y - 2 W^T X^T Y + W^T X^T X W

(the two middle terms can be merged because Y^T X W is a scalar, so it equals its own transpose W^T X^T Y)
Since we want to minimize L with respect to W, we can ignore the first term Y^T Y (it does not depend on W) and solve the following equation:
\frac{\partial}{\partial W} \left( -2 W^T X^T Y + W^T X^T X W \right) = 0
-2 X^T Y + 2 X^T X \hat{W} = 0
\hat{W} = (X^T X)^{-1} X^T Y
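This closed form assumes X^T X is invertible. In code, it is common to solve the normal equations with a linear solver rather than forming the inverse explicitly; here is a minimal NumPy sketch (fit_ols and the synthetic data are my own illustration):

```python
import numpy as np

def fit_ols(X, Y):
    """Solve the normal equations (X^T X) W = X^T Y for W.

    np.linalg.solve is preferred over computing the explicit inverse
    (X^T X)^{-1}: it is faster and numerically more stable.
    """
    return np.linalg.solve(X.T @ X, X.T @ Y)

# Recover known parameters from noisy observations
N, p = 200, 2
rng = np.random.default_rng(1)
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, p))])
W_true = np.array([[1.0], [2.0], [-3.0]])
Y = X @ W_true + rng.normal(0, 0.1, size=(N, 1))
print(fit_ols(X, Y).ravel())  # expected to be close to [1., 2., -3.]
```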
Using gradient descent
Here is the formulation of the gradient descent algorithm:
w_{n+1} = w_n - lr \times \frac{\partial f}{\partial w_n}

where lr is the learning rate.
Now all we have to do is apply it to the two parameters a_0 and a_1 (in the case of a one-dimensional X), using the partial derivatives of L computed in the one-dimensional section:

a_0 \leftarrow a_0 - lr \times \frac{\partial L}{\partial a_0} = a_0 + 2 \, lr \sum_{i=1}^{N} (y_i - (a_0 + a_1 x_i))

a_1 \leftarrow a_1 - lr \times \frac{\partial L}{\partial a_1} = a_1 + 2 \, lr \sum_{i=1}^{N} x_i (y_i - (a_0 + a_1 x_i))
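Here is a minimal NumPy sketch of this loop (fit_gd_1d, the learning rate, and the iteration count are my own illustrative choices; the gradients are averaged over the N points, which simply rescales lr):

```python
import numpy as np

def fit_gd_1d(x, y, lr=0.01, n_iters=5000):
    """Fit y ~ a0 + a1 * x by gradient descent on the squared-error loss."""
    a0, a1 = 0.0, 0.0
    for _ in range(n_iters):
        residuals = y - (a0 + a1 * x)
        # Gradients of the loss, averaged over the N points
        # (dividing by N just rescales the learning rate).
        grad_a0 = -2.0 * residuals.mean()
        grad_a1 = -2.0 * (x * residuals).mean()
        a0 -= lr * grad_a0
        a1 -= lr * grad_a1
    return a0, a1

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 + 3 * x + rng.normal(0, 0.5, size=100)
print(fit_gd_1d(x, y))  # should match the closed-form estimates (~2, ~3)
```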