Linear Regression Training with One Step

Salih Talha Akgün
4 min read · Oct 24, 2020


In every machine learning example I have ever seen, there was at least one for loop. But thanks to linear algebra, this formula makes it possible to learn the weights for linear regression in just one step.

Normal Equation: θ = (XᵀX)⁻¹ Xᵀy

Even though I was surprised to see a formula like this, it is very intuitive. You can think of it as calculus written in linear algebra notation. I know this formula is not such a big deal, but it made me excited. We usually visualize the loss function (when there are two weights) as a 3D surface, where the weights span the floor and the loss is the height above it. Since the linear regression loss function has exactly one local minimum in this space (which is also the global minimum), we can find that point by looking for where the derivative of the function equals zero.

So what we are trying to do is solve for the weight vector θ from the condition that the derivative of the loss function with respect to θ is zero:

∇θ J(θ) = 0

and we will find the parameters (weights) for the global minimum of the loss function.

If you continue to solve this equation you will get the normal equation, but I will skip the math for now and write some code. I've also added the resources I used at the end; I recommend checking out at least one of them.

Creating Artificial Dataset

import numpy as np

true_vector = np.array([1.0, 2.0, 3.0])  # Vector that we want to learn
d = len(true_vector)
points = []
for i in range(10000):  # We're going to have 10000 data points
    x = np.random.rand(d)
    # add a little bit of noise
    y = true_vector.dot(x) + np.random.rand() / 10.0
    points.append((x, y))

I've created artificial data to learn from. With the data ready, I'll fit the weights with gradient descent, stochastic gradient descent, and the normal equation, and compare them.
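The gradient descent and stochastic gradient descent baselines are not shown here, but a minimal sketch of what the batch gradient descent loop could look like is below. It assumes the points list from the snippet above and the mean squared error loss F(w) = (1/n) Σ (w·x − y)²; the learning rate and epoch count are illustrative choices, not the ones behind the results later in the post.

def gradient_descent(points, d, lr=0.1, epochs=500):
    # Batch gradient descent on F(w) = (1/n) * sum((w . x - y)^2)
    w = np.zeros(d)
    n = len(points)
    for _ in range(epochs):
        grad = np.zeros(d)
        for x, y in points:
            grad += 2.0 * (w.dot(x) - y) * x  # gradient contribution of one point
        w -= lr * grad / n  # step along the averaged negative gradient
    return w

Stochastic gradient descent is the same idea, except that w is updated right after each (x, y) pair instead of accumulating the full-batch gradient first.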

Normal Equation Function

import time

start = time.time()  # start timer
Xt = np.transpose(X)
w = np.linalg.inv(Xt.dot(X)).dot(Xt.dot(Y))  # normal equation: w = (XᵀX)⁻¹ XᵀY
stop = time.time()  # stop timer

I've also done some data preparation to use the data as numpy arrays, so don't copy and paste this code on its own to try it yourself. You can check all the code from here:
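For completeness, that preparation essentially boils down to stacking the (x, y) pairs into numpy arrays; a minimal version, assuming the points list from the first snippet, could look like this:

X = np.array([x for x, y in points])  # design matrix, shape (10000, d)
Y = np.array([y for x, y in points])  # target vector, shape (10000,)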

Results

As a result, we can see that our weight vector approaches [1, 2, 3], as we want (times below are in seconds).

Normal Equation Results:

w = [1.029 2.030 3.031], F(w) = 0.00111, time = 0.00099

Gradient Descent Results:

w = [1.224 2.030 2.836], F(w) = 0.00733, time = 52.349

Stochastic Gradient Descent Results:

w = [1.043 2.046 3.001], F(w) = 0.00142, time = 71.673

You can see that, compared to the others, the normal equation finds the closest result in just a millisecond. We can also calculate the derivative of the loss function at the weights found by the normal equation:

[3.2e-15, 7.7e-17, 6.3e-16] (which is nearly zero, as we want).
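If you want to reproduce that check, a small sketch (assuming the X, Y, and w arrays from above and the mean squared error loss F(w) = (1/n)·‖Xw − Y‖²) would be:

n = len(Y)
gradient = (2.0 / n) * X.T.dot(X.dot(w) - Y)  # derivative of F at the learned w
print(gradient)  # should be numerically very close to the zero vector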

You can try this yourself, but don't forget that the data is generated randomly, which can produce misleading results from run to run. Be careful about the randomness (and the amount of noise) you've added to the dataset.

Math Behind This

I recommend watching this video until the end, but if you only want to see the equations, you can skip the video.

I'll continue by writing out the equations as handwritten notes.

Intermediate Steps Mentioned Above:
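Since the handwritten notes are embedded as an image, here is a compact text version of the standard derivation. Writing the loss as J(θ) = ‖Xθ − y‖² (dropping any 1/n factor, which does not change the minimizer):

J(θ) = (Xθ − y)ᵀ(Xθ − y)
∇θ J(θ) = 2 Xᵀ(Xθ − y) = 0
XᵀX θ = Xᵀy
θ = (XᵀX)⁻¹ Xᵀy

The last step assumes XᵀX is invertible, which holds as long as the columns of X are linearly independent.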

Resources:

University of Toronto Linear Regression Notes: https://www.cs.toronto.edu/~guerzhoy/321/lec/W01/linear_regression.pdf

Normal Equation Wiki on MLWiki: http://mlwiki.org/index.php/Normal_Equation

Stanford's CS229 Course Playlist
