Logistic Regression

Logistic regression is a classification algorithm used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variables. In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function.

Hence, logistic regression is also known as logit regression. Since it predicts a probability, its output values lie between 0 and 1 (as expected).

Let’s say your friend gives you a puzzle to solve. There are only two outcome scenarios – either you solve it or you don’t. Coming to the math, the log odds of the outcome are modeled as a linear combination of the predictor variables.

odds = p / (1 - p) = probability of event occurrence / probability of no event occurrence
ln(odds) = ln(p / (1 - p))
logit(p) = ln(p / (1 - p)) = b0 + b1*X1 + b2*X2 + b3*X3 + ... + bk*Xk

Above, p is the probability of presence of the characteristic of interest. Logistic regression chooses the parameters that maximize the likelihood of observing the sample values, rather than parameters that minimize the sum of squared errors (as in ordinary regression).

Now, you may ask, why take a log? For the sake of simplicity, let’s just say that this is one of the best mathematical ways to replicate a step function.
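As a minimal sketch of the idea above, the logit model can be inverted with the sigmoid function to turn a linear combination back into a probability (the coefficients b0 and b1 below are made-up illustrative values, not fitted ones):

```python
import math

def sigmoid(z):
    # Inverse of the logit: maps a log-odds value z to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(x, b0, b1):
    # logit(p) = b0 + b1 * x  =>  p = sigmoid(b0 + b1 * x)
    return sigmoid(b0 + b1 * x)

# Illustrative coefficients, one predictor variable.
p = predict_probability(x=2.0, b0=-1.0, b1=0.8)
print(round(p, 3))  # 0.646
```

Whatever value the linear part takes, the sigmoid squashes it into (0, 1), which is what lets us read the output as a probability.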

What if we just use linear regression on discrete classes? A straight-line fit can output values below 0 or above 1, and a few extreme points can tilt the line enough to shift the decision boundary badly.

(Figure: linear fit on binary data. Pink: a workable case; blue: a problematic case.)

(Figure: Linear Regression vs Logistic Regression output value range.)

Logistic Function

A logistic function or logistic curve is a common "S" shape (sigmoid curve), with equation:

f(x) = L / (1 + e^(-k(x - x0)))

where

  • e = the base of the natural logarithm (also known as Euler's number),

  • x0 = the x-value of the sigmoid's midpoint,

  • L = the curve's maximum value, and

  • k = the steepness of the curve.
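A direct sketch of this general logistic function, with parameter names following the definitions above:

```python
import math

def logistic(x, L=1.0, k=1.0, x0=0.0):
    # General logistic curve: L / (1 + e^(-k * (x - x0))).
    # L = maximum value, k = steepness, x0 = x-value of the midpoint.
    return L / (1.0 + math.exp(-k * (x - x0)))

print(logistic(0.0))  # 0.5 — the standard sigmoid at its midpoint
```

With L = 1, k = 1, x0 = 0 this reduces to the standard sigmoid used for binary logistic regression.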

We do not use the least-squares cost function here, since plugging the sigmoid into it makes the optimization problem non-convex.

Two-Class Cross-Entropy Loss

J = -(1/m) * Σ [ y * ln(h(x)) + (1 - y) * ln(1 - h(x)) ]

h(x) is the binary-class logistic function, and the outputs for the two classes are already normalized: P(y=1|x) = h(x) and P(y=0|x) = 1 - h(x).
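A minimal sketch of the two-class cross-entropy loss in code (the eps clamp is an added numerical-stability detail to avoid log(0), not part of the formula itself):

```python
import math

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    # Average two-class cross-entropy over a dataset:
    # J = -(1/m) * sum( y*ln(p) + (1-y)*ln(1-p) )
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)

loss = binary_cross_entropy([1, 0, 1], [0.9, 0.2, 0.8])  # about 0.184
```

Confident, correct predictions (probability near the true label) contribute little to the loss; confident wrong ones are penalized heavily.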

Multi-Class: One vs. All
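One-vs-all reduces a K-class problem to K binary logistic classifiers, one per class, and predicts the class whose classifier gives the highest probability. A hedged sketch, assuming the K classifiers have already been trained (the (b0, b1) weight pairs below are made-up illustrative values for a single feature):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(x, classifiers):
    # Score x under each binary classifier and return the index of the
    # class whose classifier assigns the highest probability.
    scores = [sigmoid(b0 + b1 * x) for (b0, b1) in classifiers]
    return max(range(len(scores)), key=lambda k: scores[k])

classifiers = [(-2.0, 1.0), (0.0, -0.5), (1.0, 0.2)]  # illustrative weights
label = predict_one_vs_all(1.0, classifiers)  # class 2 wins here
```

Note that the K scores need not sum to 1, since each classifier was trained independently; that is what the softmax formulation below fixes.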

Multinomial Logistic Regression - SoftMax
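Multinomial (softmax) regression replaces the sigmoid with the softmax function, which maps a vector of K class scores to K probabilities that sum to 1. A minimal sketch (subtracting the max score is a standard numerical-stability trick, not part of the definition):

```python
import math

def softmax(scores):
    # softmax(z)_j = e^(z_j) / sum_k e^(z_k)
    # Subtracting max(scores) avoids overflow without changing the result.
    m = max(scores)
    exps = [math.exp(z - m) for z in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # probabilities summing to 1
```

Unlike one-vs-all, the softmax outputs form a single normalized distribution over the K classes.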
