Logistic Regression
Last updated
Last updated
It is a classification algorithm used to estimate discrete values ( binary values like 0/1, yes/no, true/false ) based on given set of independent variable(s). In simple words, it predicts the probability of occurrence of an event by fitting data to a logit function.
Hence, logistic regression is also known as logit regression. Since, it predicts the probability, its output values lies between 0 and 1 (as expected).
Let’s say your friend gives you a puzzle to solve. There are only 2 outcome scenarios – either you solve it or you don’t.Coming to the math, the log odds of the outcome is modeled as a linear combination of the predictor variables.
Above, p is the probability of presence of the characteristic of interest. It chooses parameters that maximize the likelihood of observing the sample values rather than that minimize the sum of squared errors (like in ordinary regression).
Now, you may ask, why take a log? For the sake of simplicity, let’s just say that this is one of the best mathematical way to replicate a step function.
Pink: a workable case
Blue: a problematic case
A logistic function or logistic curve is a common "S" shape (sigmoid curve), with equation:
where
e = the natural logarithm base (also known as Euler's number),
x0 = the x-value of the sigmoid's midpoint,
L = the curve's maximum value, and
k = the steepness of the curve.
h(x) is a binary class logistic function and the output of two classes are already normalized.