# Logistic Regression

Logistic regression is a **classification** algorithm used to estimate discrete values (binary values like 0/1, yes/no, true/false) based on a given set of independent variable(s). In simple words, it predicts the probability that an event occurs by fitting data to a logit function.

Hence, **logistic regression** is also known as **logit regression**. Since it predicts a probability, its output values lie between 0 and 1 (as expected).

Let’s say your friend gives you a puzzle to solve. There are only two possible outcomes: either you solve it or you don’t. Coming to the math, the log odds of the outcome are modeled as a linear combination of the predictor variables.

```
odds = p / (1 - p) = probability of event occurrence / probability of no event occurrence
ln(odds) = ln(p / (1 - p))
logit(p) = ln(p / (1 - p)) = b0 + b1*X1 + b2*X2 + b3*X3 + ... + bk*Xk
```

Above, p is the probability of presence of the characteristic of interest. *Logistic regression chooses parameters that maximize the likelihood of observing the sample values, rather than those that minimize the sum of squared errors (as in ordinary regression).*
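The odds and logit definitions above can be sketched directly; the function names here are illustrative, not from any particular library:

```python
import math

def odds(p):
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1 - p)

def logit(p):
    """Log-odds (the logit function): ln(p / (1 - p))."""
    return math.log(odds(p))

# p = 0.8 gives odds of 4 to 1; even odds (p = 0.5) give a logit of exactly 0
print(odds(0.8))   # approximately 4.0
print(logit(0.5))  # 0.0
```

Note that the logit maps probabilities in (0, 1) onto the whole real line, which is what lets the right-hand side be an unbounded linear combination b0 + b1\*X1 + ... + bk\*Xk.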

Now, you may ask, why take a log? For the sake of simplicity, let’s just say that this is one of the best mathematical ways to replicate a step function.

### What if we just use linear regression on discrete classes?

![](https://443921002-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LGHUhl6VYqrZm4Re77O%2F-LTKjN_GzJDoRWte1sjI%2F-LTJo8VzqWjUQQuFZNub%2FScreen%20Shot%202018-12-09%20at%201.09.51%20PM.png?alt=media\&token=adf1131b-985c-41b1-9c1b-da910f903e95)

**Pink**: a workable case

**Blue**: a problematic case

### **Linear Regression vs Logistic Regression output value range:**

![](https://443921002-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LGHUhl6VYqrZm4Re77O%2F-LTKjN_GzJDoRWte1sjI%2F-LTJp0y49Z5-8vGnEsm8%2FScreen%20Shot%202018-12-09%20at%201.15.45%20PM.png?alt=media\&token=99cc105a-94fb-4ba3-ac9a-a651215a0f0a)

### Logistic Function

![](https://443921002-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LGHUhl6VYqrZm4Re77O%2F-LTKxy37hNQ68Ltk2sPK%2F-LTKy3wQgcYgg-b688kg%2FScreen%20Shot%202018-12-09%20at%206.34.12%20PM.png?alt=media\&token=81a06e6d-34b6-4e69-9000-39770c4d5784)

A **logistic function** or **logistic curve** is a common "S" shape ([sigmoid curve](https://en.wikipedia.org/wiki/Sigmoid_function)), with equation:

&#x20;                                                 <img src="https://wikimedia.org/api/rest_v1/media/math/render/svg/6f42e36c949c94189976ae00853af9a1b618e099" alt="{\displaystyle f(x)={\frac {L}{1+e^{-k(x-x_{0})}}}}" data-size="original">

where

* e = the [natural logarithm](https://en.wikipedia.org/wiki/Natural_logarithm) base (also known as [Euler's number](https://en.wikipedia.org/wiki/E_\(mathematical_constant\))),
* x0 = the x-value of the sigmoid's midpoint,
* L = the curve's maximum value, and
* k = the steepness of the curve.
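The general logistic curve defined above is a one-liner; this sketch keeps the same parameter names (L, k, x0) as the equation:

```python
import math

def logistic(x, L=1.0, k=1.0, x0=0.0):
    """General logistic curve: f(x) = L / (1 + e^{-k(x - x0)})."""
    return L / (1 + math.exp(-k * (x - x0)))

# At the midpoint x = x0 the curve is exactly L/2;
# far to the right it saturates toward its maximum L.
print(logistic(0.0))    # 0.5
print(logistic(100.0))  # approaches L = 1.0
```

With L = 1, k = 1, x0 = 0 this reduces to the standard sigmoid used in logistic regression; a larger k makes the "S" steeper, approaching a step function.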

&#x20;                                 <img src="https://443921002-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LGHUhl6VYqrZm4Re77O%2F-LOBduJI0U3ErwK9tLHD%2F-LOBiefxMZ86TONKvtbu%2FScreen%20Shot%202018-10-06%20at%209.12.20%20PM.png?alt=media&#x26;token=3e6ab284-7904-404d-ae71-ff8fa72cf725" alt="" data-size="original">

![](https://wikimedia.org/api/rest_v1/media/math/render/svg/c9ccf5c48fc073952bbbafe5e2a11d4eaddb90cb)

### Why not use the least-squares cost function? It would be non-convex

![](https://443921002-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LGHUhl6VYqrZm4Re77O%2F-LTKjN_GzJDoRWte1sjI%2F-LTK-fVzeMbzqPsnyfTf%2FScreen%20Shot%202018-12-09%20at%202.04.57%20PM.png?alt=media\&token=65d82405-ee9b-410d-bc78-75f0466a2ccb)

### Two class cross entropy loss

h(x) is the binary-class logistic function, and the outputs for the two classes are already normalized: P(y=0) = 1 − P(y=1), so they sum to 1.

![](https://443921002-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LGHUhl6VYqrZm4Re77O%2F-LTKjN_GzJDoRWte1sjI%2F-LTK-t9v0a3sQUkjFk5H%2FScreen%20Shot%202018-12-09%20at%202.08.05%20PM.png?alt=media\&token=b7d30f35-d7c3-4fe4-a669-4e11e8b63934)

![](https://443921002-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LGHUhl6VYqrZm4Re77O%2F-LTKjN_GzJDoRWte1sjI%2F-LTK05pK7B0YztSObxqx%2FScreen%20Shot%202018-12-09%20at%202.09.05%20PM.png?alt=media\&token=c16179d9-018c-45f0-8644-71f096a40423)
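A minimal NumPy sketch of the two-class cross-entropy (log loss) shown above; the `eps` clipping is a common numerical safeguard, not part of the formula itself:

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean two-class cross-entropy: -[y*log(h) + (1-y)*log(1-h)].

    y_true: array of 0/1 labels; y_pred: predicted P(y=1) = h(x).
    eps clips predictions away from 0 and 1 to avoid log(0).
    """
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([1, 0, 1])
y_pred = np.array([0.9, 0.1, 0.8])
print(binary_cross_entropy(y_true, y_pred))  # small loss, since predictions are confident and correct
```

Confident correct predictions give a loss near 0, while confident wrong ones are penalized heavily, and unlike least squares this loss is convex in the model parameters.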

### Multi-Class one vs all

![](https://443921002-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LGHUhl6VYqrZm4Re77O%2F-LTKjN_GzJDoRWte1sjI%2F-LTK2PJjpT5qfIpLNBEX%2FScreen%20Shot%202018-12-09%20at%202.18.47%20PM.png?alt=media\&token=f3bad9d8-6ab3-4a66-8b3e-286b9584935f)
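One-vs-all reduces a K-class problem to K binary logistic classifiers, then predicts the class whose classifier is most confident. A sketch of the prediction step, with hypothetical hand-set weights for illustration (in practice each row of `W` would be fit on its own "class k vs rest" problem):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_vs_all_predict(X, W):
    """W: (K, d) matrix, one binary classifier per class.
    Score each class with its own sigmoid and pick the most confident."""
    scores = sigmoid(X @ W.T)      # (n, K) per-class probabilities
    return np.argmax(scores, axis=1)

# hypothetical weights for 3 classes over 2 features
W = np.array([[ 2.0, -1.0],
              [-1.0,  2.0],
              [ 0.5,  0.5]])
X = np.array([[3.0, 0.0],   # strongly matches class 0
              [0.0, 3.0]])  # strongly matches class 1
print(one_vs_all_predict(X, W))  # [0 1]
```

Note the K sigmoid scores need not sum to 1; they come from K independent classifiers, which is the key difference from the softmax approach below.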

### Multinomial Logistic Regression - SoftMax

<div align="center"><img src="https://443921002-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LGHUhl6VYqrZm4Re77O%2F-LGHYwvncirBGkeyLS8z%2F-LGH_1zvfJXzh6IBv8dZ%2FScreen%20Shot%202018-06-26%20at%204.57.26%20PM.png?alt=media&#x26;token=568a9ec5-1a55-4e63-8a9e-931e9ea21b1b" alt="What is Cross Entropy Loss"></div>

![Differences](https://443921002-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LGHUhl6VYqrZm4Re77O%2F-LGHYwvncirBGkeyLS8z%2F-LGHa7nGk0U1Ch6QxT1i%2FScreen%20Shot%202018-06-29%20at%204.04.45%20PM.png?alt=media\&token=8e7530ab-4947-4562-abe3-f76f5bde71f3)
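Unlike one-vs-all, the softmax normalizes all K class scores jointly into a single probability distribution. A minimal sketch, using the standard max-subtraction trick for numerical stability:

```python
import numpy as np

def softmax(z):
    """Softmax over a vector of logits; subtracting the max
    logit before exponentiating avoids overflow without
    changing the result."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs)        # roughly [0.659, 0.242, 0.099]
print(probs.sum())  # 1.0 -- a valid probability distribution
```

With K = 2 classes, the softmax reduces exactly to the binary sigmoid, so multinomial logistic regression generalizes the two-class case above.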
