Cracking Logistic Regression from Scratch

Rahul Jain
May 23, 2024 · 4 min read


Introduction

Ever tried to predict if your cat will jump on your keyboard during a Zoom call? Well, that’s a classification problem! Logistic regression is a great tool for binary classification tasks like this. In this blog post, we’ll walk you through implementing logistic regression from scratch, sprinkle in some optimization magic, and share some amusing and practical applications. Ready? Let’s dive in!

Understanding Logistic Regression

Logistic regression might sound like something straight out of a spaceship’s control panel, but it’s actually quite simple. It’s a method to predict binary outcomes (yes/no, true/false, cat on keyboard/not on keyboard) using a logistic function. This function outputs probabilities that are then thresholded to make final predictions.

Here’s a quick rundown of how it works (a tiny numeric sketch follows the list):

  1. Linear Combination: Compute a weighted sum of the input features.
  2. Sigmoid Function: Apply the sigmoid function to squash the linear combination into a probability (between 0 and 1).
  3. Thresholding: Convert the probability to a binary outcome (e.g., 1 if probability > 0.5, else 0).
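
To make those three steps concrete, here is a tiny numeric sketch. The features, weights, and bias are made-up values for illustration, not fitted from data:

import numpy as np

# Toy example with made-up numbers (purely illustrative)
x = np.array([0.5, 1.2])       # input features
w = np.array([0.8, -0.4])      # weights
b = 0.1                        # bias

z = np.dot(w, x) + b           # 1. linear combination
p = 1 / (1 + np.exp(-z))       # 2. sigmoid squashes z into (0, 1)
y_hat = 1 if p > 0.5 else 0    # 3. threshold at 0.5

print(f"z = {z:.3f}, probability = {p:.3f}, prediction = {y_hat}")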

Implementing Logistic Regression from Scratch

Let’s break down the code step by step:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def initialize_weights(dim):
    w = np.zeros(dim)
    b = 0.0
    return w, b

def propagate(w, b, X, Y):
    # Forward pass: predicted probabilities and cross-entropy cost
    m = X.shape[0]
    A = sigmoid(np.dot(X, w) + b)
    cost = -1/m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

    # Backward pass: gradients of the cost with respect to w and b
    dw = 1/m * np.dot(X.T, (A - Y))
    db = 1/m * np.sum(A - Y)

    grads = {"dw": dw, "db": db}

    return grads, cost

def optimize(w, b, X, Y, num_iterations, learning_rate):
    costs = []

    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)

        # Gradient descent update
        w -= learning_rate * grads["dw"]
        b -= learning_rate * grads["db"]

        if i % 100 == 0:
            costs.append(cost)
            print(f"Cost after iteration {i}: {cost}")

    params = {"w": w, "b": b}
    grads = {"dw": grads["dw"], "db": grads["db"]}

    return params, grads, costs

def predict(w, b, X):
    m = X.shape[0]
    Y_prediction = np.zeros((m,))
    A = sigmoid(np.dot(X, w) + b)

    # Threshold the probabilities at 0.5
    for i in range(A.shape[0]):
        Y_prediction[i] = 1 if A[i] > 0.5 else 0

    return Y_prediction

# Example usage
X_train = np.random.rand(100, 2)        # Example training data
Y_train = np.random.randint(0, 2, 100)  # Example binary labels

w, b = initialize_weights(X_train.shape[1])
params, grads, costs = optimize(w, b, X_train, Y_train, num_iterations=1000, learning_rate=0.01)

Y_prediction = predict(params["w"], params["b"], X_train)
print("Train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction - Y_train)) * 100))

Optimization Techniques

  1. Gradient Descent Variants: Using Stochastic Gradient Descent (SGD) or Mini-batch Gradient Descent can speed up convergence on large datasets, since each update only looks at a slice of the data. (The logistic loss is convex, so there are no local minima to escape here, but the noisier, cheaper updates often reach a good solution faster.) Imagine hiking down a mountain and surveying the whole landscape before every single step; that’s batch gradient descent. Now imagine you just check the slope under your feet and keep moving; that’s mini-batch!
def mini_batch_gradient_descent(X, Y, w, b, learning_rate, batch_size, num_iterations):
    m = X.shape[0]
    for i in range(num_iterations):
        # Shuffle the data at the start of each pass
        idx = np.random.permutation(m)
        X_shuffled = X[idx]
        Y_shuffled = Y[idx]

        for j in range(0, m, batch_size):
            X_batch = X_shuffled[j:j+batch_size]
            Y_batch = Y_shuffled[j:j+batch_size]

            # Compute gradients on the mini-batch only, then update
            grads, cost = propagate(w, b, X_batch, Y_batch)
            w -= learning_rate * grads["dw"]
            b -= learning_rate * grads["db"]

            if i % 100 == 0 and j == 0:
                print(f"Cost after iteration {i}, batch {j}: {cost}")
    return w, b
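
For completeness, here is how this might be called, reusing X_train, Y_train, initialize_weights, and predict from the earlier snippet; the batch size and learning rate are arbitrary illustrative choices:

w, b = initialize_weights(X_train.shape[1])
w, b = mini_batch_gradient_descent(X_train, Y_train, w, b,
                                   learning_rate=0.01, batch_size=16,
                                   num_iterations=500)
Y_prediction = predict(w, b, X_train)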

  2. Regularization: Add a regularization term (like L2 regularization) to the cost to avoid overfitting. This is like putting blinders on a horse to keep it focused on the path and not the distractions.

def propagate_with_regularization(w, b, X, Y, lambda_):
    m = X.shape[0]
    A = sigmoid(np.dot(X, w) + b)
    # Cross-entropy cost plus an L2 penalty on the weights (the bias is not penalized)
    cost = -1/m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) + (lambda_ / (2 * m)) * np.sum(np.square(w))

    # The penalty adds lambda_/m * w to the weight gradient
    dw = 1/m * np.dot(X.T, (A - Y)) + (lambda_ / m) * w
    db = 1/m * np.sum(A - Y)

    grads = {"dw": dw, "db": db}

    return grads, cost
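
The regularized propagate still needs to be wired into a training loop. Here is one way you might do that; this is a sketch, and the lambda_ value of 0.1 is an arbitrary illustrative choice:

def optimize_with_regularization(w, b, X, Y, num_iterations, learning_rate, lambda_):
    # Same loop as optimize, but using the regularized cost and gradients
    for i in range(num_iterations):
        grads, cost = propagate_with_regularization(w, b, X, Y, lambda_)
        w -= learning_rate * grads["dw"]
        b -= learning_rate * grads["db"]
        if i % 100 == 0:
            print(f"Regularized cost after iteration {i}: {cost}")
    return w, b

w, b = initialize_weights(X_train.shape[1])
w, b = optimize_with_regularization(w, b, X_train, Y_train,
                                    num_iterations=1000, learning_rate=0.01, lambda_=0.1)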

Real-World Applications and Fun Facts

  1. Spam Detection: Logistic regression helps email services predict whether an email is spam or not. Fun fact: simple linear classifiers like this powered many early spam filters, and large providers still combine them with more complex models to keep your inbox clean.
  2. Medical Diagnosis: It’s used to predict the likelihood of diseases based on patient data. For instance, predicting diabetes based on factors like age, weight, and lifestyle.
  3. Credit Scoring: Banks use logistic regression to determine the likelihood of a customer defaulting on a loan. It’s like a digital fortune teller for financial health!
  4. Social Media Virality: Platforms like Instagram and TikTok might use logistic regression to predict which posts are likely to go viral. Who knew your cat videos could be analyzed so thoroughly?

Conclusion

Logistic regression is a powerful yet simple tool for binary classification tasks. By understanding and implementing it from scratch, you gain a deeper appreciation of its workings and can fine-tune it for specific needs. Plus, knowing about optimization techniques and real-world applications makes you a data science rockstar!

Remember, data science is not just about crunching numbers; it’s about solving real-world problems and sometimes predicting if your cat will make a grand appearance in your next Zoom call. Happy coding!

Written by Rahul Jain

Lead Data Scientist @ Rockwell Automation
