GD vs. SGD

What are the differences between gradient descent and stochastic gradient descent? When would you use one over the other?

Junior

Machine Learning


Gradient descent (GD) and stochastic gradient descent (SGD) are optimization algorithms used to minimize a function, typically a model's loss (error) function.

The primary differences between the two are the following:

Gradient Descent (GD)

- Computes the gradient of the loss over the entire training set, so each parameter update requires a full pass through the data.
- Produces smooth, deterministic updates that converge steadily toward the minimum (the global minimum for convex losses).
- Becomes slow and memory-intensive on very large datasets, because the cost of every single step grows with the dataset size.

Stochastic Gradient Descent (SGD)

- Estimates the gradient from a single randomly chosen training example at a time, so each update is very cheap.
- Performs many updates per pass over the data, which often reaches a good solution faster in wall-clock time on large datasets.
- Updates are noisy: the loss fluctuates rather than decreasing monotonically, which can help escape shallow local minima but usually requires a decaying learning rate to settle near the minimum; the sketch below contrasts the two update loops.
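To make the contrast concrete, here is a minimal NumPy sketch on a toy least-squares problem. The synthetic data, the learning rate, the epoch counts, and the `grad` helper are all illustrative choices, not prescribed values; the point is only that GD makes one update per pass over the whole dataset, while SGD makes one update per example.

```python
import numpy as np

# Toy least-squares problem (synthetic data, for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                      # 1000 samples, 5 features
true_w = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_w + rng.normal(scale=0.1, size=1000)   # noisy linear targets

lr = 0.01  # illustrative learning rate


def grad(w, X_batch, y_batch):
    """Gradient of mean squared error on the given batch."""
    return 2.0 * X_batch.T @ (X_batch @ w - y_batch) / len(y_batch)


# Gradient Descent: one update per epoch, each using the entire dataset.
w_gd = np.zeros(X.shape[1])
for epoch in range(200):
    w_gd -= lr * grad(w_gd, X, y)

# Stochastic Gradient Descent: one update per (shuffled) training example.
w_sgd = np.zeros(X.shape[1])
for epoch in range(5):
    for i in rng.permutation(len(y)):
        w_sgd -= lr * grad(w_sgd, X[i:i + 1], y[i:i + 1])

print("GD  weights:", np.round(w_gd, 2))
print("SGD weights:", np.round(w_sgd, 2))
```

Note that SGD touches the data 5 times here yet performs 5,000 updates, whereas GD performs only 200 updates despite 200 full passes; that difference in update frequency is what makes SGD attractive at scale.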

When to use one over the other:

- Prefer GD when the dataset is small enough that a full pass per update is cheap and you want stable, reproducible convergence.
- Prefer SGD when the dataset is large or arrives as a stream, when memory is limited, or when noisy updates are acceptable (or even helpful, as they often are when training deep neural networks).

Moreover, variations such as mini-batch gradient descent, which balances the benefits of GD and SGD by computing each update on a small subset of the data, are often used in practice. The choice between these algorithms ultimately depends on computational resources, dataset size, and the characteristics of the specific problem.
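Continuing the sketch above (reusing the same `X`, `y`, `lr`, `rng`, and `grad`; the batch size of 32 is an arbitrary illustrative choice), a mini-batch loop might look like this:

```python
# Mini-batch gradient descent, reusing X, y, lr, rng, and grad from the sketch above.
batch_size = 32  # arbitrary illustrative choice
w_mb = np.zeros(X.shape[1])
for epoch in range(20):
    order = rng.permutation(len(y))            # reshuffle each epoch
    for start in range(0, len(y), batch_size):
        idx = order[start:start + batch_size]
        w_mb -= lr * grad(w_mb, X[idx], y[idx])
```

Reshuffling before each epoch keeps the batches representative, and the batch size trades gradient noise against per-step cost, which is why mini-batch updates are the default in most modern training pipelines.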