{"id":2051,"date":"2024-07-04T21:23:21","date_gmt":"2024-07-04T21:23:21","guid":{"rendered":"https:\/\/www.w3computing.com\/articles\/?p=2051"},"modified":"2024-07-04T21:23:27","modified_gmt":"2024-07-04T21:23:27","slug":"implementing-gradient-descent-from-scratch-in-python","status":"publish","type":"post","link":"https:\/\/www.w3computing.com\/articles\/implementing-gradient-descent-from-scratch-in-python\/","title":{"rendered":"Implementing Gradient Descent from Scratch in Python"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\">Introduction<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Gradient Descent is one of the most fundamental and widely-used optimization algorithms in machine learning and deep learning. It is the backbone of many algorithms used in supervised learning. Understanding Gradient Descent is crucial for anyone aiming to delve deeper into machine learning, as it provides insights into how models learn from data. This tutorial will guide you through implementing Gradient Descent from scratch in Python, ensuring you understand each step of the process.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Prerequisites<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Before we start, it is essential to have a solid understanding of the following concepts:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Basic Python programming<\/li>\n\n\n\n<li>Basic linear algebra (vectors and matrices)<\/li>\n\n\n\n<li>Calculus (partial derivatives)<\/li>\n\n\n\n<li>Basic understanding of machine learning models (especially linear regression)<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">1. Understanding Gradient Descent<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">What is Gradient Descent?<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Gradient Descent is an optimization algorithm used to minimize the cost function in machine learning models. The cost function, also known as the loss function, measures the difference between the predicted values and the actual values. 
Gradient Descent iteratively adjusts the model parameters to minimize this cost function.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Types of Gradient Descent<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">There are three main types of Gradient Descent:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Batch Gradient Descent<\/strong>: Uses the entire dataset to compute the gradient at each step.<\/li>\n\n\n\n<li><strong>Stochastic Gradient Descent (SGD)<\/strong>: Uses one data point at a time to compute the gradient.<\/li>\n\n\n\n<li><strong>Mini-Batch Gradient Descent<\/strong>: Uses a small subset of the dataset to compute the gradient at each step.<\/li>\n<\/ol>\n\n\n\n<h3 class=\"wp-block-heading\">The Mathematics Behind Gradient Descent<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">The core idea of Gradient Descent is to use the gradient (the vector of partial derivatives) of the cost function to adjust the parameters in the opposite direction of the gradient. This is because the gradient points in the direction of the steepest increase in the cost function, so moving in the opposite direction decreases the cost.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The update rule for the parameters (<img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Ctheta&#038;bg=ffffff&#038;fg=000&#038;s=2&#038;c=20201002\" alt=\"&#92;theta\" class=\"latex\" \/>) in Gradient Descent is:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Ctheta+%3A%3D+%5Ctheta+-+%5Calpha+%5Cnabla+J%28%5Ctheta%29&#038;bg=ffffff&#038;fg=000&#038;s=2&#038;c=20201002\" alt=\"&#92;theta := &#92;theta - &#92;alpha &#92;nabla J(&#92;theta)\" class=\"latex\" \/><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Ctheta&#038;bg=ffffff&#038;fg=000&#038;s=2&#038;c=20201002\" alt=\"&#92;theta\" class=\"latex\" \/> are 
the model parameters.<\/li>\n\n\n\n<li><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Calpha&#038;bg=ffffff&#038;fg=000&#038;s=2&#038;c=20201002\" alt=\"&#92;alpha\" class=\"latex\" \/> is the learning rate, a small positive number that controls the step size.<\/li>\n\n\n\n<li><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=J%28%5Ctheta%29&#038;bg=ffffff&#038;fg=000&#038;s=2&#038;c=20201002\" alt=\"J(&#92;theta)\" class=\"latex\" \/> is the cost function.<\/li>\n\n\n\n<li><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Cnabla+J%28%5Ctheta%29&#038;bg=ffffff&#038;fg=000&#038;s=2&#038;c=20201002\" alt=\"&#92;nabla J(&#92;theta)\" class=\"latex\" \/> is the gradient of the cost function with respect to <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Ctheta&#038;bg=ffffff&#038;fg=000&#038;s=2&#038;c=20201002\" alt=\"&#92;theta\" class=\"latex\" \/>.<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">2. Implementing Gradient Descent in Python<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Setting Up the Environment<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">First, let&#8217;s set up our Python environment. Ensure you have Python installed, along with NumPy and Matplotlib; the evaluation step at the end of the tutorial also uses scikit-learn. You can install them using pip if you haven&#8217;t already:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"Bash\" data-shcb-language-slug=\"bash\"><span><code class=\"hljs language-bash\">pip install numpy matplotlib scikit-learn<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Bash<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">bash<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Dataset Preparation<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">For this tutorial, we&#8217;ll use a simple synthetic dataset for linear regression. 
Let&#8217;s create a dataset with a single feature and a target variable.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">import<\/span> numpy <span class=\"hljs-keyword\">as<\/span> np\n<span class=\"hljs-keyword\">import<\/span> matplotlib.pyplot <span class=\"hljs-keyword\">as<\/span> plt\n\n<span class=\"hljs-comment\"># Seed for reproducibility<\/span>\nnp.random.seed(<span class=\"hljs-number\">42<\/span>)\n\n<span class=\"hljs-comment\"># Generate synthetic data<\/span>\nX = <span class=\"hljs-number\">2<\/span> * np.random.rand(<span class=\"hljs-number\">100<\/span>, <span class=\"hljs-number\">1<\/span>)\ny = <span class=\"hljs-number\">4<\/span> + <span class=\"hljs-number\">3<\/span> * X + np.random.randn(<span class=\"hljs-number\">100<\/span>, <span class=\"hljs-number\">1<\/span>)\n\n<span class=\"hljs-comment\"># Plot the data<\/span>\nplt.scatter(X, y)\nplt.xlabel(<span class=\"hljs-string\">'Feature'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Target'<\/span>)\nplt.title(<span class=\"hljs-string\">'Synthetic Data'<\/span>)\nplt.show()<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Linear Regression and Cost Function<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Next, we define our linear regression model and the cost function. 
The model can be represented as:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Chat%7By%7D+%3D+%5Ctheta_0+%2B+%5Ctheta_1+x&#038;bg=ffffff&#038;fg=000&#038;s=2&#038;c=20201002\" alt=\"&#92;hat{y} = &#92;theta_0 + &#92;theta_1 x\" class=\"latex\" \/><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The cost function for linear regression is the Mean Squared Error (MSE):<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=J%28%5Ctheta%29+%3D+%5Cfrac%7B1%7D%7B2m%7D+%5Csum_%7Bi%3D1%7D%5E%7Bm%7D+%28%5Chat%7By%7D%5E%7B%28i%29%7D+-+y%5E%7B%28i%29%7D%29%5E2&#038;bg=ffffff&#038;fg=000&#038;s=2&#038;c=20201002\" alt=\"J(&#92;theta) = &#92;frac{1}{2m} &#92;sum_{i=1}^{m} (&#92;hat{y}^{(i)} - y^{(i)})^2\" class=\"latex\" \/><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Where:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=m&#038;bg=ffffff&#038;fg=000&#038;s=2&#038;c=20201002\" alt=\"m\" class=\"latex\" \/> is the number of training examples.<\/li>\n\n\n\n<li><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=%5Chat%7By%7D%5E%7B%28i%29%7D&#038;bg=ffffff&#038;fg=000&#038;s=2&#038;c=20201002\" alt=\"&#92;hat{y}^{(i)}\" class=\"latex\" \/> is the predicted value for the <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=i&#038;bg=ffffff&#038;fg=000&#038;s=2&#038;c=20201002\" alt=\"i\" class=\"latex\" \/>-th training example.<\/li>\n\n\n\n<li><img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=y%5E%7B%28i%29%7D&#038;bg=ffffff&#038;fg=000&#038;s=2&#038;c=20201002\" alt=\"y^{(i)}\" class=\"latex\" \/> is the actual value for the <img decoding=\"async\" src=\"https:\/\/s0.wp.com\/latex.php?latex=i&#038;bg=ffffff&#038;fg=000&#038;s=2&#038;c=20201002\" alt=\"i\" class=\"latex\" \/>-th training example.<\/li>\n<\/ul>\n\n\n\n<p 
class=\"wp-block-paragraph\">Let&#8217;s implement these in Python:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">predict<\/span><span class=\"hljs-params\">(X, theta)<\/span>:<\/span>\n    <span class=\"hljs-keyword\">return<\/span> X.dot(theta)\n\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">compute_cost<\/span><span class=\"hljs-params\">(X, y, theta)<\/span>:<\/span>\n    m = len(y)\n    predictions = predict(X, theta)\n    cost = (<span class=\"hljs-number\">1<\/span> \/ (<span class=\"hljs-number\">2<\/span> * m)) * np.sum((predictions - y) ** <span class=\"hljs-number\">2<\/span>)\n    <span class=\"hljs-keyword\">return<\/span> cost<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Implementing Batch Gradient Descent<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Now, we&#8217;ll implement Batch Gradient Descent. 
This involves updating the parameters using the entire dataset at each step.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">batch_gradient_descent<\/span><span class=\"hljs-params\">(X, y, theta, learning_rate, iterations)<\/span>:<\/span>\n    m = len(y)\n    cost_history = np.zeros(iterations)\n\n    <span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> range(iterations):\n        gradients = (<span class=\"hljs-number\">1<\/span> \/ m) * X.T.dot(predict(X, theta) - y)\n        theta = theta - learning_rate * gradients\n        cost_history&#91;i] = compute_cost(X, y, theta)\n\n    <span class=\"hljs-keyword\">return<\/span> theta, cost_history\n\n<span class=\"hljs-comment\"># Prepare the data (add a column of ones for the bias term)<\/span>\nX_b = np.c_&#91;np.ones((<span class=\"hljs-number\">100<\/span>, <span class=\"hljs-number\">1<\/span>)), X]\ntheta = np.random.randn(<span class=\"hljs-number\">2<\/span>, <span class=\"hljs-number\">1<\/span>)\nlearning_rate = <span class=\"hljs-number\">0.1<\/span>\niterations = <span class=\"hljs-number\">1000<\/span>\n\ntheta_final, cost_history = batch_gradient_descent(X_b, y, theta, learning_rate, iterations)\n\nprint(<span class=\"hljs-string\">\"Final parameters:\"<\/span>, theta_final)\nplt.plot(range(iterations), cost_history)\nplt.xlabel(<span class=\"hljs-string\">'Iterations'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Cost'<\/span>)\nplt.title(<span class=\"hljs-string\">'Cost Function History'<\/span>)\nplt.show()<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span 
class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Implementing Stochastic Gradient Descent<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Stochastic Gradient Descent (SGD) updates the parameters using one data point at a time. Each update is far cheaper than a full pass over the data, which can lead to faster progress on large datasets, though the individual updates are noisier.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">stochastic_gradient_descent<\/span><span class=\"hljs-params\">(X, y, theta, learning_rate, iterations)<\/span>:<\/span>\n    m = len(y)\n    cost_history = np.zeros(iterations)\n\n    <span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> range(iterations):\n        <span class=\"hljs-keyword\">for<\/span> j <span class=\"hljs-keyword\">in<\/span> range(m):\n            <span class=\"hljs-comment\"># Sample one training example at random (with replacement)<\/span>\n            random_index = np.random.randint(m)\n            xi = X&#91;random_index:random_index+<span class=\"hljs-number\">1<\/span>]\n            yi = y&#91;random_index:random_index+<span class=\"hljs-number\">1<\/span>]\n            <span class=\"hljs-comment\"># Single-example gradient of the cost: x * (prediction - y)<\/span>\n            gradients = xi.T.dot(predict(xi, theta) - yi)\n            theta = theta - learning_rate * gradients\n        cost_history&#91;i] = compute_cost(X, y, theta)\n\n    <span class=\"hljs-keyword\">return<\/span> theta, cost_history\n\ntheta = np.random.randn(<span class=\"hljs-number\">2<\/span>, <span class=\"hljs-number\">1<\/span>)\nlearning_rate = <span class=\"hljs-number\">0.01<\/span>\niterations = <span class=\"hljs-number\">50<\/span>\n\ntheta_final, cost_history = stochastic_gradient_descent(X_b, y, theta, learning_rate, iterations)\n\nprint(<span class=\"hljs-string\">\"Final 
parameters:\"<\/span>, theta_final)\nplt.plot(range(iterations), cost_history)\nplt.xlabel(<span class=\"hljs-string\">'Iterations'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Cost'<\/span>)\nplt.title(<span class=\"hljs-string\">'Cost Function History'<\/span>)\nplt.show()<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Implementing Mini-Batch Gradient Descent<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Mini-Batch Gradient Descent is a compromise between Batch Gradient Descent and SGD. It uses a small subset (batch) of the dataset at each step to compute the gradient.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">mini_batch_gradient_descent<\/span><span class=\"hljs-params\">(X, y, theta, learning_rate, iterations, batch_size)<\/span>:<\/span>\n    m = len(y)\n    cost_history = np.zeros(iterations)\n\n    <span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> range(iterations):\n        <span class=\"hljs-comment\"># Reshuffle every epoch so the batches differ between passes<\/span>\n        shuffled_indices = np.random.permutation(m)\n        X_shuffled = X&#91;shuffled_indices]\n        y_shuffled = y&#91;shuffled_indices]\n\n        <span class=\"hljs-keyword\">for<\/span> j <span class=\"hljs-keyword\">in<\/span> range(<span class=\"hljs-number\">0<\/span>, m, batch_size):\n            xi = X_shuffled&#91;j:j+batch_size]\n            yi = y_shuffled&#91;j:j+batch_size]\n            <span class=\"hljs-comment\"># Average over the actual batch length; the last batch may be smaller<\/span>\n            gradients = (<span class=\"hljs-number\">1<\/span> \/ len(xi)) * xi.T.dot(predict(xi, 
theta) - yi)\n            theta = theta - learning_rate * gradients\n        cost_history&#91;i] = compute_cost(X, y, theta)\n\n    <span class=\"hljs-keyword\">return<\/span> theta, cost_history\n\ntheta = np.random.randn(<span class=\"hljs-number\">2<\/span>, <span class=\"hljs-number\">1<\/span>)\nlearning_rate = <span class=\"hljs-number\">0.1<\/span>\niterations = <span class=\"hljs-number\">200<\/span>\nbatch_size = <span class=\"hljs-number\">20<\/span>\n\ntheta_final, cost_history = mini_batch_gradient_descent(X_b, y, theta, learning_rate, iterations, batch_size)\n\nprint(<span class=\"hljs-string\">\"Final parameters:\"<\/span>, theta_final)\nplt.plot(range(iterations), cost_history)\nplt.xlabel(<span class=\"hljs-string\">'Iterations'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Cost'<\/span>)\nplt.title(<span class=\"hljs-string\">'Cost Function History'<\/span>)\nplt.show()<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">3. Advanced Topics<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Learning Rate Schedules<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">A static learning rate may not always be optimal. 
A learning rate schedule adjusts the learning rate over time, typically shrinking it so that early steps are large and later steps settle near the minimum.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-7\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">learning_rate_schedule<\/span><span class=\"hljs-params\">(t, t0=<span class=\"hljs-number\">5<\/span>, t1=<span class=\"hljs-number\">50<\/span>)<\/span>:<\/span>\n    <span class=\"hljs-comment\"># Decays from t0 \/ t1 toward zero as the step count t grows<\/span>\n    <span class=\"hljs-keyword\">return<\/span> t0 \/ (t + t1)\n\n<span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">sgd_with_schedule<\/span><span class=\"hljs-params\">(X, y, theta, iterations)<\/span>:<\/span>\n    m = len(y)\n    cost_history = np.zeros(iterations)\n\n    <span class=\"hljs-keyword\">for<\/span> epoch <span class=\"hljs-keyword\">in<\/span> range(iterations):\n        <span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> range(m):\n            random_index = np.random.randint(m)\n            xi = X&#91;random_index:random_index+<span class=\"hljs-number\">1<\/span>]\n            yi = y&#91;random_index:random_index+<span class=\"hljs-number\">1<\/span>]\n            <span class=\"hljs-comment\"># Single-example gradient of the cost: x * (prediction - y)<\/span>\n            gradients = xi.T.dot(predict(xi, theta) - yi)\n            <span class=\"hljs-comment\"># t counts every update made so far, across all epochs<\/span>\n            learning_rate = learning_rate_schedule(epoch * m + i)\n            theta = theta - learning_rate * gradients\n        cost_history&#91;epoch] = compute_cost(X, y, theta)\n\n    <span class=\"hljs-keyword\">return<\/span> theta, cost_history\n\ntheta = np.random.randn(<span class=\"hljs-number\">2<\/span>, <span class=\"hljs-number\">1<\/span>)\niterations = <span class=\"hljs-number\">50<\/span>\n\ntheta_final, cost_history = sgd_with_schedule(X_b, y, theta, iterations)\n\nprint(<span class=\"hljs-string\">\"Final parameters:\"<\/span>, theta_final)\nplt.plot(range(iterations), 
cost_history)\nplt.xlabel(<span class=\"hljs-string\">'Iterations'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Cost'<\/span>)\nplt.title(<span class=\"hljs-string\">'Cost Function History with Learning Rate Schedule'<\/span>)\nplt.show()<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-7\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Gradient Descent with Momentum<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Momentum helps accelerate Gradient Descent in the relevant direction and dampens oscillations.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-8\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">gradient_descent_with_momentum<\/span><span class=\"hljs-params\">(X, y, theta, learning_rate, iterations, beta=<span class=\"hljs-number\">0.9<\/span>)<\/span>:<\/span>\n    m = len(y)\n    cost_history = np.zeros(iterations)\n    v = np.zeros(theta.shape)\n\n    <span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> range(iterations):\n        gradients = (<span class=\"hljs-number\">1<\/span> \/ m) * X.T.dot(predict(X, theta) - y)\n        v = beta * v + (<span class=\"hljs-number\">1<\/span> - beta) * gradients\n        theta = theta - learning_rate * v\n        cost_history&#91;i] = compute_cost(X, y, theta)\n\n    <span class=\"hljs-keyword\">return<\/span> theta, cost_history\n\ntheta = np.random.randn(<span class=\"hljs-number\">2<\/span>, <span class=\"hljs-number\">1<\/span>)\nlearning_rate = <span class=\"hljs-number\">0.1<\/span>\niterations = <span 
class=\"hljs-number\">1000<\/span>\n\ntheta_final, cost_history = gradient_descent_with_momentum(X_b, y, theta, learning_rate, iterations)\n\nprint(<span class=\"hljs-string\">\"Final parameters:\"<\/span>, theta_final)\nplt.plot(range(iterations), cost_history)\nplt.xlabel(<span class=\"hljs-string\">'Iterations'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Cost'<\/span>)\nplt.title(<span class=\"hljs-string\">'Cost Function History with Momentum'<\/span>)\nplt.show()<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-8\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Adaptive Gradient Descent Methods (AdaGrad, RMSProp, Adam)<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Adaptive methods adjust the learning rate based on past gradients.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>AdaGrad<\/strong>:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-9\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">adagrad<\/span><span class=\"hljs-params\">(X, y, theta, learning_rate, iterations, epsilon=<span class=\"hljs-number\">1e-8<\/span>)<\/span>:<\/span>\n    m = len(y)\n    cost_history = np.zeros(iterations)\n    G = np.zeros(theta.shape)\n\n    <span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> range(iterations):\n        gradients = (<span class=\"hljs-number\">1<\/span> \/ m) * X.T.dot(predict(X, theta) - y)\n        G += gradients ** <span class=\"hljs-number\">2<\/span>\n        adjusted_gradients = gradients \/ (np.sqrt(G) + epsilon)\n        theta = theta - learning_rate 
* adjusted_gradients\n        cost_history&#91;i] = compute_cost(X, y, theta)\n\n    <span class=\"hljs-keyword\">return<\/span> theta, cost_history\n\ntheta = np.random.randn(<span class=\"hljs-number\">2<\/span>, <span class=\"hljs-number\">1<\/span>)\nlearning_rate = <span class=\"hljs-number\">0.1<\/span>\niterations = <span class=\"hljs-number\">1000<\/span>\n\ntheta_final, cost_history = adagrad(X_b, y, theta, learning_rate, iterations)\n\nprint(<span class=\"hljs-string\">\"Final parameters:\"<\/span>, theta_final)\nplt.plot(range(iterations), cost_history)\nplt.xlabel(<span class=\"hljs-string\">'Iterations'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Cost'<\/span>)\nplt.title(<span class=\"hljs-string\">'Cost Function History with AdaGrad'<\/span>)\nplt.show()<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-9\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\"><strong>RMSProp<\/strong>:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-10\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">rmsprop<\/span><span class=\"hljs-params\">(X, y, theta, learning_rate, iterations, beta=<span class=\"hljs-number\">0.9<\/span>, epsilon=<span class=\"hljs-number\">1e-8<\/span>)<\/span>:<\/span>\n    m = len(y)\n    cost_history = np.zeros(iterations)\n    E = np.zeros(theta.shape)\n\n    <span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> range(iterations):\n        gradients = (<span class=\"hljs-number\">1<\/span> \/ m) * X.T.dot(predict(X, theta) - y)\n        E = beta * E + 
(<span class=\"hljs-number\">1<\/span> - beta) * gradients ** <span class=\"hljs-number\">2<\/span>\n        adjusted_gradients = gradients \/ (np.sqrt(E) + epsilon)\n        theta = theta - learning_rate * adjusted_gradients\n        cost_history&#91;i] = compute_cost(X, y, theta)\n\n    <span class=\"hljs-keyword\">return<\/span> theta, cost_history\n\ntheta = np.random.randn(<span class=\"hljs-number\">2<\/span>, <span class=\"hljs-number\">1<\/span>)\nlearning_rate = <span class=\"hljs-number\">0.01<\/span>\niterations = <span class=\"hljs-number\">1000<\/span>\n\ntheta_final, cost_history = rmsprop(X_b, y, theta, learning_rate, iterations)\n\nprint(<span class=\"hljs-string\">\"Final parameters:\"<\/span>, theta_final)\nplt.plot(range(iterations), cost_history)\nplt.xlabel(<span class=\"hljs-string\">'Iterations'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Cost'<\/span>)\nplt.title(<span class=\"hljs-string\">'Cost Function History with RMSProp'<\/span>)\nplt.show()<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-10\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p class=\"wp-block-paragraph\"><strong>Adam<\/strong>:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-11\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-function\"><span class=\"hljs-keyword\">def<\/span> <span class=\"hljs-title\">adam<\/span><span class=\"hljs-params\">(X, y, theta, learning_rate, iterations, beta1=<span class=\"hljs-number\">0.9<\/span>, beta2=<span class=\"hljs-number\">0.999<\/span>, epsilon=<span class=\"hljs-number\">1e-8<\/span>)<\/span>:<\/span>\n    m = len(y)\n    cost_history = np.zeros(iterations)\n    v = 
np.zeros(theta.shape)\n    s = np.zeros(theta.shape)\n\n    <span class=\"hljs-keyword\">for<\/span> i <span class=\"hljs-keyword\">in<\/span> range(iterations):\n        gradients = (<span class=\"hljs-number\">1<\/span> \/ m) * X.T.dot(predict(X, theta) - y)\n        v = beta1 * v + (<span class=\"hljs-number\">1<\/span> - beta1) * gradients\n        s = beta2 * s + (<span class=\"hljs-number\">1<\/span> - beta2) * gradients ** <span class=\"hljs-number\">2<\/span>\n        v_corrected = v \/ (<span class=\"hljs-number\">1<\/span> - beta1 ** (i + <span class=\"hljs-number\">1<\/span>))\n        s_corrected = s \/ (<span class=\"hljs-number\">1<\/span> - beta2 ** (i + <span class=\"hljs-number\">1<\/span>))\n        theta = theta - learning_rate * v_corrected \/ (np.sqrt(s_corrected) + epsilon)\n        cost_history&#91;i] = compute_cost(X, y, theta)\n\n    <span class=\"hljs-keyword\">return<\/span> theta, cost_history\n\ntheta = np.random.randn(<span class=\"hljs-number\">2<\/span>, <span class=\"hljs-number\">1<\/span>)\nlearning_rate = <span class=\"hljs-number\">0.01<\/span>\niterations = <span class=\"hljs-number\">1000<\/span>\n\ntheta_final, cost_history = adam(X_b, y, theta, learning_rate, iterations)\n\nprint(<span class=\"hljs-string\">\"Final parameters:\"<\/span>, theta_final)\nplt.plot(range(iterations), cost_history)\nplt.xlabel(<span class=\"hljs-string\">'Iterations'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Cost'<\/span>)\nplt.title(<span class=\"hljs-string\">'Cost Function History with Adam'<\/span>)\nplt.show()<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-11\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">4. 
Testing and Evaluation<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Visualizing the Gradient Descent Process<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Plotting the cost against the iteration count shows how the algorithm progresses: a curve that flattens out suggests convergence, while one that rises or oscillates usually points to a learning rate that is too high.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-12\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\">plt.plot(range(iterations), cost_history)\nplt.xlabel(<span class=\"hljs-string\">'Iterations'<\/span>)\nplt.ylabel(<span class=\"hljs-string\">'Cost'<\/span>)\nplt.title(<span class=\"hljs-string\">'Cost Function History'<\/span>)\nplt.show()<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-12\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h3 class=\"wp-block-heading\">Evaluating the Model Performance<\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Finally, let&#8217;s evaluate our model&#8217;s performance by calculating the Mean Absolute Error (MAE) and R-squared (R\u00b2) score with scikit-learn. The code below uses the parameters from the most recent run (Adam, if you ran the sections in order) and scores them on the training data itself, so the numbers say nothing about performance on unseen data.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-13\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"><span class=\"hljs-keyword\">from<\/span> sklearn.metrics <span class=\"hljs-keyword\">import<\/span> mean_absolute_error, r2_score\n\n<span class=\"hljs-comment\"># Predictions on the training data<\/span>\ny_pred = predict(X_b, theta_final)\n\n<span class=\"hljs-comment\"># Evaluation metrics<\/span>\nmae = mean_absolute_error(y, y_pred)\nr2 = r2_score(y, y_pred)\n\nprint(<span class=\"hljs-string\">f\"Mean Absolute Error: <span class=\"hljs-subst\">{mae}<\/span>\"<\/span>)\nprint(<span class=\"hljs-string\">f\"R-squared: <span 
class=\"hljs-subst\">{r2}<\/span>\"<\/span>)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-13\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<h2 class=\"wp-block-heading\">5. Conclusion and Further Reading<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In this tutorial, we covered:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>The basics of Gradient Descent and its variants (Batch, Stochastic, Mini-Batch)<\/li>\n\n\n\n<li>Implementing these variants from scratch in Python<\/li>\n\n\n\n<li>Advanced optimization techniques like Momentum, AdaGrad, RMSProp, and Adam<\/li>\n\n\n\n<li>Visualizing and evaluating the Gradient Descent process<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Gradient Descent is a cornerstone of many machine learning algorithms, and mastering its implementation provides a solid foundation for further exploration in the field.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">For further reading, consider exploring:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>&#8220;Deep Learning&#8221; by Ian Goodfellow, Yoshua Bengio, and Aaron Courville<\/li>\n\n\n\n<li>&#8220;Pattern Recognition and Machine Learning&#8221; by Christopher M. Bishop<\/li>\n\n\n\n<li>Online courses like Andrew Ng&#8217;s Machine Learning on Coursera or the Deep Learning Specialization<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">By understanding and implementing Gradient Descent, you now have a powerful tool at your disposal to optimize various machine learning models.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction Gradient Descent is one of the most fundamental and widely-used optimization algorithms in machine learning and deep learning. It is the backbone of many algorithms used in supervised learning. 
Understanding Gradient Descent is crucial for anyone aiming to delve deeper into machine learning, as it provides insights into how models learn from data. This [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_post_was_ever_published":false},"categories":[18,4,6],"tags":[],"class_list":["post-2051","post","type-post","status-publish","format-standard","category-artificial-intelligence","category-programming-languages","category-python","entry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.8 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Implementing Gradient Descent from Scratch in Python<\/title>\n<meta name=\"description\" content=\"Gradient Descent is one of the most fundamental and widely-used optimization algorithms in machine learning and deep learning.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.w3computing.com\/articles\/implementing-gradient-descent-from-scratch-in-python\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Implementing Gradient Descent from Scratch in Python\" \/>\n<meta property=\"og:description\" content=\"Gradient Descent is one of the most fundamental and widely-used 
optimization algorithms in machine learning and deep learning.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.w3computing.com\/articles\/implementing-gradient-descent-from-scratch-in-python\/\" \/>\n<meta property=\"article:published_time\" content=\"2024-07-04T21:23:21+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-07-04T21:23:27+00:00\" \/>\n<meta name=\"author\" content=\"w3compadmin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"w3compadmin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"TechArticle\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-gradient-descent-from-scratch-in-python\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-gradient-descent-from-scratch-in-python\\\/\"},\"author\":{\"name\":\"w3compadmin\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\"},\"headline\":\"Implementing Gradient Descent from Scratch in Python\",\"datePublished\":\"2024-07-04T21:23:21+00:00\",\"dateModified\":\"2024-07-04T21:23:27+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-gradient-descent-from-scratch-in-python\\\/\"},\"wordCount\":873,\"articleSection\":[\"Artificial Intelligence\",\"Programming 
Languages\",\"Python\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-gradient-descent-from-scratch-in-python\\\/\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-gradient-descent-from-scratch-in-python\\\/\",\"name\":\"Implementing Gradient Descent from Scratch in Python\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#website\"},\"datePublished\":\"2024-07-04T21:23:21+00:00\",\"dateModified\":\"2024-07-04T21:23:27+00:00\",\"author\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\"},\"description\":\"Gradient Descent is one of the most fundamental and widely-used optimization algorithms in machine learning and deep learning.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-gradient-descent-from-scratch-in-python\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-gradient-descent-from-scratch-in-python\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/implementing-gradient-descent-from-scratch-in-python\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Articles Home\",\"item\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Programming Languages\",\"item\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/programming-languages\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Implementing Gradient Descent from Scratch in Python\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#website\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/\",\"name\":\"Developer Articles 
Hub\",\"description\":\"\",\"alternateName\":\"Developer Articles\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/#\\\/schema\\\/person\\\/a550b3e20d78bb4f79b7c6b7b53f0561\",\"name\":\"w3compadmin\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1781352167\",\"url\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1781352167\",\"contentUrl\":\"https:\\\/\\\/www.w3computing.com\\\/articles\\\/wp-content\\\/litespeed\\\/avatar\\\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1781352167\",\"caption\":\"w3compadmin\"},\"sameAs\":[\"http:\\\/\\\/w3computing.com\\\/articles\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Implementing Gradient Descent from Scratch in Python","description":"Gradient Descent is one of the most fundamental and widely-used optimization algorithms in machine learning and deep learning.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.w3computing.com\/articles\/implementing-gradient-descent-from-scratch-in-python\/","og_locale":"en_US","og_type":"article","og_title":"Implementing Gradient Descent from Scratch in Python","og_description":"Gradient Descent is one of the most fundamental and widely-used optimization algorithms in machine learning and deep learning.","og_url":"https:\/\/www.w3computing.com\/articles\/implementing-gradient-descent-from-scratch-in-python\/","article_published_time":"2024-07-04T21:23:21+00:00","article_modified_time":"2024-07-04T21:23:27+00:00","author":"w3compadmin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"w3compadmin","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"TechArticle","@id":"https:\/\/www.w3computing.com\/articles\/implementing-gradient-descent-from-scratch-in-python\/#article","isPartOf":{"@id":"https:\/\/www.w3computing.com\/articles\/implementing-gradient-descent-from-scratch-in-python\/"},"author":{"name":"w3compadmin","@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561"},"headline":"Implementing Gradient Descent from Scratch in Python","datePublished":"2024-07-04T21:23:21+00:00","dateModified":"2024-07-04T21:23:27+00:00","mainEntityOfPage":{"@id":"https:\/\/www.w3computing.com\/articles\/implementing-gradient-descent-from-scratch-in-python\/"},"wordCount":873,"articleSection":["Artificial Intelligence","Programming Languages","Python"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.w3computing.com\/articles\/implementing-gradient-descent-from-scratch-in-python\/","url":"https:\/\/www.w3computing.com\/articles\/implementing-gradient-descent-from-scratch-in-python\/","name":"Implementing Gradient Descent from Scratch in Python","isPartOf":{"@id":"https:\/\/www.w3computing.com\/articles\/#website"},"datePublished":"2024-07-04T21:23:21+00:00","dateModified":"2024-07-04T21:23:27+00:00","author":{"@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561"},"description":"Gradient Descent is one of the most fundamental and widely-used optimization algorithms in machine learning and deep 
learning.","breadcrumb":{"@id":"https:\/\/www.w3computing.com\/articles\/implementing-gradient-descent-from-scratch-in-python\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.w3computing.com\/articles\/implementing-gradient-descent-from-scratch-in-python\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.w3computing.com\/articles\/implementing-gradient-descent-from-scratch-in-python\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Articles Home","item":"https:\/\/www.w3computing.com\/articles\/"},{"@type":"ListItem","position":2,"name":"Programming Languages","item":"https:\/\/www.w3computing.com\/articles\/programming-languages\/"},{"@type":"ListItem","position":3,"name":"Implementing Gradient Descent from Scratch in Python"}]},{"@type":"WebSite","@id":"https:\/\/www.w3computing.com\/articles\/#website","url":"https:\/\/www.w3computing.com\/articles\/","name":"Developer Articles Hub","description":"","alternateName":"Developer 
Articles","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.w3computing.com\/articles\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.w3computing.com\/articles\/#\/schema\/person\/a550b3e20d78bb4f79b7c6b7b53f0561","name":"w3compadmin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1781352167","url":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1781352167","contentUrl":"https:\/\/www.w3computing.com\/articles\/wp-content\/litespeed\/avatar\/bd481d404e42caa2763662a3bfe825f8.jpg?ver=1781352167","caption":"w3compadmin"},"sameAs":["http:\/\/w3computing.com\/articles"]}]}},"featured_image_src":null,"featured_image_src_square":null,"author_info":{"display_name":"w3compadmin","author_link":"https:\/\/www.w3computing.com\/articles\/author\/w3compadmin\/"},"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/2051","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/comments?post=2051"}],"version-history":[{"count":6,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/2051\/revisions"}],"predecessor-version":[{"id":2057,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/posts\/2051\/revisions\/2
057"}],"wp:attachment":[{"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/media?parent=2051"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/categories?post=2051"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.w3computing.com\/articles\/wp-json\/wp\/v2\/tags?post=2051"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}