

Gradient Descent

Fri Dec 23 2022


Introduction

Imagine that we have a function $f(x)$ and we would like to find its minimum. What would you do?

Simple, right? We only need to solve the following equation: $$f'(x) = 0$$

The problem is that finding a formula for $f'$ is not always easy: derivatives tend to be extremely complicated, especially in deep learning, where we deal with very complex functions. So we need another method, one that can find the minimum of a function without requiring an explicit formula for the derivative $f'$.
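To see why this is still tractable, note that we can always approximate the derivative *at a single point* numerically, even with no formula for $f'$ in hand. Here is a minimal sketch in Python (the helper name and the example function are illustrative choices, not from the original text):

```python
def numerical_derivative(f, x, h=1e-6):
    """Approximate f'(x) with a central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Example: a function whose derivative formula we never write down.
f = lambda x: x**4 - 3 * x**2 + 2

print(numerical_derivative(f, 2.0))  # ~20.0 (the exact value f'(2) = 20)
```

Evaluating the derivative at one point, rather than deriving its general formula, is exactly the kind of information gradient descent will rely on below.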

Let's build some intuition

Let's suppose that we have a function $f$ with the corresponding graph:

Graph 1

Let's start with a random point $x_{0}$. The goal is to move this point closer and closer to $x^*$, the point where $f'(x^*) = 0$. So the problem can be divided into two parts:

  • In which direction should we move the point $x_{0}$? Left or right?
  • How far should we move it?

The direction

Let's build some intuition in order to answer the first question. Take a look at the following points:

Graph 2 Graph 3

Note that:

  • when the point $x_{0}$ is to the right of the optimal point $x^*$, its tangent line goes up.
  • when the point $x_{0}$ is to the left of the optimal point $x^*$, its tangent line goes down.

The direction of a line is determined by the sign of its slope:

  • A line goes up $\implies$ the slope $a$ is positive.
  • A line goes down $\implies$ the slope $a$ is negative.

Note that the slope of the tangent line to a function at a point $x_{0}$ is simply the derivative at that point, $f'(x_{0})$: $$tangent(x_{0}):\quad g(x) = f'(x_{0}) \cdot (x - x_{0}) + f(x_{0})$$
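For instance, with an illustrative function $f(x) = (x-1)^2$ (not from the original figures) and $x_{0} = 3$: we have $f'(x_{0}) = 2(3-1) = 4$ and $f(x_{0}) = 4$, so the tangent line is $g(x) = 4(x - 3) + 4$. Its slope $a = 4$ is positive, so the line goes up, consistent with $x_{0}$ being to the right of $x^* = 1$.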

So, as an answer to the question "In which direction should we move $x_{0}$?" (checked numerically in the sketch after this list):

  • $f'(x_{0}) < 0$ $\implies$ $x_{0}$ is to the left of $x^*$ $\implies$ We need to move $x_{0}$ to the right.
  • $f'(x_{0}) > 0$ $\implies$ $x_{0}$ is to the right of $x^*$ $\implies$ We need to move $x_{0}$ to the left.
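As a quick numerical check of these two rules, here is a small sketch reusing the finite-difference helper from earlier, with the illustrative function $f(x) = (x-1)^2$, whose minimum is at $x^* = 1$:

```python
def numerical_derivative(f, x, h=1e-6):
    """Approximate f'(x) with a central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

f = lambda x: (x - 1) ** 2  # illustrative function, minimum at x* = 1

for x0 in (3.0, -2.0):  # one point to the right of x*, one to the left
    slope = numerical_derivative(f, x0)
    direction = "left" if slope > 0 else "right"
    print(f"x0 = {x0:+.1f}: f'(x0) = {slope:+.2f} -> move {direction}")
# x0 = +3.0: f'(x0) = +4.00 -> move left
# x0 = -2.0: f'(x0) = -6.00 -> move right
```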

The steps

Now for the second question: how far should we move $x_{0}$? Take a look at the following examples:

Graph 4 Graph 5

We can conclude that (see the sketch after this list):

  • $x_{0}$ is close to $x^*$ $\implies$ the slope of the tangent is small in magnitude $\implies$ $|f'(x_{0})|$ is small.
  • $x_{0}$ is far from $x^*$ $\implies$ the slope of the tangent is large in magnitude $\implies$ $|f'(x_{0})|$ is large.
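A matching sketch for the step size (same illustrative function; the exact derivative $f'(x) = 2(x-1)$ is written out here only so we can print it):

```python
f = lambda x: (x - 1) ** 2  # illustrative function, minimum at x* = 1
df = lambda x: 2 * (x - 1)  # its derivative, used here only for display

for x0 in (1.1, 1.5, 3.0, 6.0):  # points increasingly far from x*
    print(f"x0 = {x0}: distance to x* = {abs(x0 - 1):.1f}, |f'(x0)| = {abs(df(x0)):.1f}")
```

The farther $x_{0}$ is from $x^*$, the larger $|f'(x_{0})|$ gets: the derivative's magnitude is a natural (if rough) proxy for how big a step to take.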

By answering both questions, we conclude that the derivative at the point $x_{0}$ alone gives us a lot of insight about both the direction and the distance of the optimal point $x^*$.

Gradient descent

Gradient descent is the formulation of the answers to the previous two questions. It is an iterative optimization algorithm that approximates the minimum $x^*$ of a function, starting from a random initial point $x_{0}$. The algorithm is stated as follows:

$$ x_{n+1} = x_{n} - lr \times \frac{\mathrm{d} f}{\mathrm{d} x_{n}} $$

where:

  • $\frac{\mathrm{d} f}{\mathrm{d} x_{n}}$ is simply the derivative of $f$ at the point $x_{n}$.
  • $lr$ is a positive constant, called the learning rate, that determines how big the steps are going to be.

Notice that (the sketch after this list puts both ideas together in code):

  • $x_{n}$ is to the right of $x^*$ $\implies$ $\frac{\mathrm{d} f}{\mathrm{d} x_{n}} > 0$ $\implies$ $x_{n+1} = x_{n} - \text{positive}$ $\implies$ $x_{n}$ moves to the left.
  • $x_{n}$ is to the left of $x^*$ $\implies$ $\frac{\mathrm{d} f}{\mathrm{d} x_{n}} < 0$ $\implies$ $x_{n+1} = x_{n} + \text{positive}$ $\implies$ $x_{n}$ moves to the right.
  • $x_{n}$ is close to $x^*$ $\implies$ $\frac{\mathrm{d} f}{\mathrm{d} x_{n}}$ is close to $0$ $\implies$ small update to $x_{n}$.
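Putting both answers together, here is a minimal sketch of the full algorithm in Python. The function, starting point, learning rate $lr$, and stopping tolerance are illustrative choices; the derivative is approximated numerically with the helper from earlier, so no formula for $f'$ is ever needed:

```python
def numerical_derivative(f, x, h=1e-6):
    """Approximate f'(x) with a central finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

def gradient_descent(f, x0, lr=0.1, tol=1e-6, max_iters=10_000):
    """Iterate x_{n+1} = x_n - lr * f'(x_n) until the derivative is ~0."""
    x = x0
    for _ in range(max_iters):
        grad = numerical_derivative(f, x)
        if abs(grad) < tol:  # derivative close to 0 -> we are near x*
            break
        x = x - lr * grad    # the gradient descent update rule
    return x

f = lambda x: (x - 1) ** 2          # illustrative function, minimum at x* = 1
print(gradient_descent(f, x0=6.0))  # ~1.0
```

The stopping test also answers the first quiz question below: the iteration stops when $\frac{\mathrm{d} f}{\mathrm{d} x_{n}}$ is (approximately) zero.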

Quiz

  • When does gradient descent stop iterating?
    • When $x_{n}$ is small enough.
    • When $x_{n}$ is close to $x_{0}$.
    • When $\frac{\mathrm{d} f}{\mathrm{d} x_{n}} = 0$. ✓
  • How do we choose $x_{0}$?
    • We pick it randomly. ✓
    • We take it in the neighborhood of $x_{n}$.
    • It depends on the problem.
  • Why do we need gradient descent?
    • Because computers are not powerful enough to calculate derivatives.
    • Because it's extremely hard to find the derivative formulas of deep learning models. ✓
    • Because functions have more than one local minimum.
