A First Course in Optimization (PENDING)

Gradient descent is a very powerful optimization algorithm that is used almost everywhere in machine learning, from fitting logistic regression in the 1950s to training GPT-3!

You can play with the algorithms here and check out the implementation details here.

Problem Statement

Consider a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$; our aim is to find the minimum value of this function. Before we begin, we could generalise a little further and ask about functions whose domain is only some subset of $\mathbb{R}^n$, but for the time being let us stick with the former problem statement.

Let $f(x) = x^2$ be a one-dimensional function whose domain is the real line. One can quickly infer that its minimum value is $0$ and is attained at $x = 0$. One way to see this follows from a careful inspection of the derivative, namely by setting it to zero.
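
The post links an interactive implementation above rather than showing code, but as a minimal sketch, gradient descent on this example can be written in a few lines of Python. The starting point, the step size of $0.1$, and the iteration count below are illustrative choices, not taken from the post.

```python
def f(x):
    """Objective: f(x) = x^2."""
    return x ** 2

def grad_f(x):
    """Derivative: f'(x) = 2x."""
    return 2 * x

x = 3.0          # arbitrary starting point
step_size = 0.1  # illustrative choice; too large a step would overshoot
for _ in range(100):
    x -= step_size * grad_f(x)   # step opposite to the derivative

print(x, f(x))   # both approach 0: the minimizer and the minimum value
```

Since $f'(x) = 2x$, each update shrinks the iterate by a constant factor $1 - 2 \cdot \text{step\_size} = 0.8$, so the sequence converges geometrically to the minimizer $x = 0$.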

Theorem 1 (First-Order Necessary Conditions) [1]
If $x^*$ is a local minimizer and $f$ is continuously differentiable in an open neighborhood of $x^*$, then $\nabla f(x^*) = 0$.

Proof. We argue by contradiction. Suppose the gradient is non-zero at $x^*$. If we take a sufficiently small step in the direction opposite to the gradient, we attain a strictly lower value of the function, contradicting the assumption that $x^*$ is a local minimizer. Here we are using the fact that the gradient points in the direction of steepest ascent, and consequently the opposite direction is one of steepest descent. Hence $\nabla f(x^*) = 0$.
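
To make the descent step precise, one can write out the standard first-order Taylor argument; the step length $\alpha > 0$ and direction $p$ below are introduced only for this sketch and do not appear in the proof above. With $p = -\nabla f(x^*)$ and $f$ continuously differentiable near $x^*$,

$$
f(x^* + \alpha p) = f(x^*) + \alpha \nabla f(x^*)^{\top} p + o(\alpha) = f(x^*) - \alpha \|\nabla f(x^*)\|^2 + o(\alpha),
$$

so $f(x^* + \alpha p) < f(x^*)$ for all sufficiently small $\alpha > 0$, which contradicts $x^*$ being a local minimizer.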

Further Resources

References

This post is licensed under CC BY 4.0 by the author.