Linear Regression

Linear regression is used to estimate real values from one or more continuous variables. In statistics, linear regression is a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is called simple linear regression; for more than one explanatory variable, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.

Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of the response given the values of the predictors, rather than on the joint probability distribution of all of these variables, which is the domain of multivariate analysis.

The most common applications of linear regression are:

  1. Prediction and forecasting. Linear regression can be used to fit a predictive model to an observed data set of values of the response and explanatory variables. After developing such a model, if additional values of the explanatory variables are collected without an accompanying response value, the fitted model can be used to make a prediction of the response (see the sketch after this list).

  2. Explaining how much of the variation in the response variable can be attributed to each explanatory variable. In particular, regression analysis can:

    1. quantify the strength of the relationship between the response and the explanatory variables;

    2. determine whether some explanatory variables may have no linear relationship with the response at all;

    3. identify which subsets of explanatory variables may contain redundant information about the response.
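
Below is a minimal sketch of both uses, assuming scikit-learn and NumPy are available; the data values are made up purely for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Observed data: explanatory variables (X) and response (y); values are made up.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 8.1, 10.8])

model = LinearRegression().fit(X, y)

# Use 1: predict the response for new explanatory values with no observed response.
X_new = np.array([[6.0, 4.0]])
print(model.predict(X_new))

# Use 2: inspect the fitted coefficients and R^2 to quantify how strongly
# each explanatory variable relates to the response.
print(model.coef_, model.intercept_)
print(model.score(X, y))  # R^2 on the training data
```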

When fitting a linear regression, the goal is to find the best-fit line through the observed data points. This best-fit line is known as the regression line and is represented by the linear equation Y = a*X + b. The coefficients a and b are derived by minimizing the sum of squared distances between the data points and the regression line. Conversely, the least squares approach can also be used to fit models that are not linear models; thus, although the terms "least squares" and "linear model" are closely linked, they are not synonymous.
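
As a from-scratch illustration, the closed-form least-squares solution for a and b can be computed directly. This is a minimal sketch with made-up data, checked against NumPy's own polynomial fit.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Minimizing the sum of squared residuals gives the closed form:
#   a = cov(x, y) / var(x),   b = mean(y) - a * mean(x)
a = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b = y.mean() - a * x.mean()
print(f"Y = {a:.3f} * X + {b:.3f}")

# Sanity check against NumPy's built-in least-squares polynomial fit.
print(np.polyfit(x, y, 1))  # returns [a, b]
```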

Linear Regression is mainly of two types: 1) Simple Linear Regression and 2) Multiple Linear Regression. Simple Linear Regression is characterized by a single independent variable, while Multiple Linear Regression (as the name suggests) is characterized by multiple (more than one) independent variables. When finding the best-fit line, you can also fit a polynomial or curvilinear relationship; this is known as polynomial or curvilinear regression (a sketch follows below).
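
The sketch below fits a quadratic curve this way, assuming scikit-learn is available; the data are synthetic. Expanding the input into polynomial features keeps the model linear in its coefficients, which is why ordinary linear regression still applies.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = np.linspace(0, 4, 20).reshape(-1, 1)
# Synthetic curvilinear data: a quadratic trend plus a little noise.
y = 1.5 * x.ravel() ** 2 - 2.0 * x.ravel() + 0.5 + rng.normal(0, 0.3, size=20)

# PolynomialFeatures expands x into [1, x, x^2]; the model remains linear
# in its coefficients, so it is still fitted by ordinary least squares.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict(np.array([[5.0]])))  # predict from the fitted curve
```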

Linear Regression Categories:

  1. Simple and multiple linear regression:

    1. single input + single output: simple regression

    2. multiple input + single output: multiple regression

  2. General linear models / Multivariate linear models: multiple input + multiple output

  3. Generalized linear models: for handling response variables that are bounded or discrete

    1. Strictly positive predictions (e.g., price estimation)

    2. Categorical data prediction (candidate selection, which is better described using a Bernoulli distribution/binomial distribution for binary choices, or a categorical distribution/multinomial distribution for multi-way choices)

    3. Predicting ordinal data (e.g., restaurant ratings, where a 4-star rating is not strictly twice as good as a 2-star one)

    4. Some common examples of GLMs are logistic regression (binary outcomes), multinomial logistic regression (categorical outcomes), Poisson regression (counts and other strictly positive responses), and ordinal regression (ordered ratings); a sketch follows below.
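
As a minimal sketch of one such GLM, the example below fits a Poisson regression (log link) assuming the statsmodels package is available; the log link keeps every prediction strictly positive, which suits count-like data. The dataset is synthetic, for illustration only.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=100)
y = rng.poisson(lam=np.exp(0.5 + 1.2 * x))  # counts: bounded below at 0

X = sm.add_constant(x)  # add the intercept column
# Poisson family with its default log link guarantees positive predictions.
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.params)          # estimated intercept and slope
print(result.predict(X[:5]))  # predicted counts, always > 0
```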
