PART 1: Assumptions of the Simple Linear Regression Model and Estimation of Unknown Parameters Using the Method of Least-Squares

AYŞE DUMAN
Yıldız Technical University - Sky Lab
Oct 16, 2020



╰☆╮What is the purpose of regression analysis?

Regression analysis is a set of statistical methods used to describe the relationships between independent variables and a dependent variable. Our main goal is to estimate the relationship between the dependent variable (also known as the response variable) y and one or more independent (also known as predictor or explanatory) variables (X). Regression can be used to evaluate the strength of the relationship between variables and to model the future relationship between them.

╰☆╮Simple Linear Regression Model


Suppose we observe bivariate data (X, Y), but we do not know the regression function E(y | X = x). In many cases it is reasonable to assume that the function is linear:

E(y | X = x) = β0 + β1 x

✤ β1 is the slope, which determines whether the relationship between x and y is positive or negative.

✤ β0 is the intercept or constant term, which determines where the line intersects the y axis.
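
For example (with made-up values), if β0 = 2 and β1 = 0.5, then E(y | X = 10) = 2 + 0.5 · 10 = 7: observations with x = 10 average a response of 7, and each one-unit increase in x raises the expected response by 0.5.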

╰☆╮ Is it possible that this is a definite, “deterministic” relationship?

The answer is no. Observed data (almost) never fit exactly along a line. A few reasons for this are as follows.

✤ Measurement error (incorrect definition or mismeasurement)

✤ Other variables that affect y

✤ The relationship is not entirely linear

✤ The relationship may differ across observations

The statistical model therefore determines the expected value of y, not its exact value.

Adding an error term for a “stochastic” relationship gives us the actual value of y. This expression is given below for the first-order linear model.

y = β0 + β1 x + e
The first-order linear model

╰☆╮ The error term (or residual) e covers all of the problems above.

✤ The error term is treated as a random variable and is not observed directly.

✤ The variance of e is σ², the conditional variance of y given x, i.e., the variance of the conditional distribution of y given x.

✤ The simplest, though not always valid, assumption is that this conditional variance is the same for all observations in our sample (homoskedasticity -> equal variance).
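
To make the stochastic model concrete, here is a minimal simulation sketch of the first-order linear model with homoskedastic errors. The parameter values (β0 = 2, β1 = 0.5, σ = 1) are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

b0, b1, sigma = 2.0, 0.5, 1.0        # illustrative true intercept, slope, error std. dev.
x = rng.uniform(0, 10, size=100)     # predictor values
e = rng.normal(0, sigma, size=100)   # homoskedastic errors: same variance at every x
y = b0 + b1 * x + e                  # observed responses scatter around the true line

# E(y | x) is the true regression line b0 + b1*x; e is each point's
# vertical deviation from it. Recover the first five error realizations:
print(y[:5] - (b0 + b1 * x[:5]))
```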

╰☆╮The Goal of Regression Models

True regression line: the actual relationship in the population

✤ The true β coefficients and the error distribution f(e | x)

✤ A sample of observations from the population comes from drawing random realizations of e from f(e | x) and plotting points accordingly above and below the true regression line.

❝ Based on the observed sample of the y and x pairs, we want to find an estimated regression line that comes as close as possible to the true regression line. ❞

✤ Estimate the values of the parameters β0 and β1.

✤ Estimate the properties of the probability distribution of the error term or residual (e).

✤ Make inferences about the above estimates.

✤ Use the estimates to make conditional predictions of y.

✤ Determine the statistical reliability of these estimates.

╰☆╮Assumptions of simple regression model

Linear regression analysis is based on six fundamental assumptions:

1. The regression model is linear in parameters

2. The mean of residuals is zero: E(e) = 0

3. Homoscedasticity of residuals, or equal variance: Var(e) = σ²

If X is random this becomes Var(e | X = x) = σ².

✤ Homoscedasticity describes a situation in which the variance of the error term (the “noise” or random disturbance in the relationship between the independent variables and the dependent variable) is the same for all values of the independent variables.

4. No autocorrelation of residuals: Cov(e_i, e_j) = 0 for i ≠ j

✤ This is especially relevant for time series data. Autocorrelation is the correlation of a time series with lags of itself. When the residuals are autocorrelated, the current value depends on the previous (historic) values, and there is a systematic unexplained pattern in the Y variable that shows up in the disturbances.

5. The X variables and residuals are uncorrelated: Cov(x, e) = 0

✤ x is non-random and takes on at least two values

We will allow random x later and see that E(e | x) = 0 implies that e must be uncorrelated with x.

6. Normality of residuals: e ~ N(0, σ²)

✤ The residuals should be normally distributed. If the maximum likelihood method (rather than OLS) is used to compute the estimates, this also implies that Y and X are normally distributed.
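
The assumptions above can be checked empirically from the fitted residuals. Below is a minimal diagnostic sketch using statsmodels and scipy on simulated data; the data-generating values are made up, and the specific tests (Breusch-Pagan, Durbin-Watson, Shapiro-Wilk) are common choices rather than the only ones.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy import stats

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)   # simulated data satisfying the assumptions

X = sm.add_constant(x)                 # assumption 1: model linear in parameters
resid = sm.OLS(y, X).fit().resid       # fitted residuals

print(resid.mean())                    # assumption 2: residual mean should be near zero
print(het_breuschpagan(resid, X)[1])   # assumption 3: large p-value -> homoskedasticity not rejected
print(durbin_watson(resid))            # assumption 4: statistic near 2 -> no autocorrelation
print(np.corrcoef(x, resid)[0, 1])     # assumption 5: correlation of x and residuals near zero
print(stats.shapiro(resid).pvalue)     # assumption 6: large p-value -> normality not rejected
```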

╰☆╮ Strategies for estimating coefficients

What is an estimator?

A rule (formula) for calculating an estimate of a parameter (β0, β1, σ²) based on the sample values y and x.

How might we estimate the β coefficients of the simple regression model?

There are three methods for this:

✤ Method of least-squares

✤ Method of moments

✤ Method of maximum likelihood

╰☆╮ Method of least-squares

Estimation strategy: it works by making the sum of the squared errors as small as possible.

Given coefficient estimates β0 and β1, the residuals are defined as

e_i = y_i − β0 − β1 x_i

Why not minimize the sum of the residuals?

✤ We don’t want the sum of residuals to be a large negative number: we could “minimize” the sum by making all residuals infinitely negative.

✤ Many alternative lines make the sum of residuals zero (which is desirable), because positives and negatives cancel out.
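
For example (with made-up numbers), residuals of (+3, −3) and (+100, −100) both sum to zero, yet the second line fits far worse; the sums of squares, 18 versus 20,000, tell them apart immediately.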

Why do we use squares rather than absolute values to deal with the cancellation of positives and negatives?

✤ This is because the square function is continuously differentiable; the absolute value function is not.

✤ Least-squares estimation is much easier than least-absolute deviation estimation.

➤ The least-squares estimators of β0 and β1 are the solution that minimizes the S function.

S(β0, β1) = Σ e_i² = Σ (y_i − β0 − β1 x_i)²
The sum of squared residuals (SSR) function

➤ Since S is a continuously differentiable function of the estimated parameters, we can differentiate and set the partial derivatives equal to zero to get the least-squares normal equations:

∂S/∂β1 = −2 Σ x_i (y_i − β0 − β1 x_i) = 0
Least-Squares Normal Equation (1)

∂S/∂β0 = −2 Σ (y_i − β0 − β1 x_i) = 0
Least-Squares Normal Equation (2)

If we multiply each term in equation (2) by 1/N, the following expression is obtained.

β0 = y_mean − β1 x_mean
Ordinary Least-Squares (OLS) coefficient estimator (1)

Substituting this expression for β0 into equation (1) divided by N, the following expression is obtained.

β1 = Σ (x_i − x_mean)(y_i − y_mean) / Σ (x_i − x_mean)²
Ordinary Least-Squares (OLS) coefficient estimator (2)

The β1 estimator is the sample covariance of x and y divided by the sample variance of x.

Note that the OLS line passes through the mean point (x_mean, y_mean).
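
As a check on the derivation, here is a minimal sketch that computes the OLS estimates directly from the formulas above and compares them with NumPy's built-in least-squares fit; the simulated data values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=100)

x_mean, y_mean = x.mean(), y.mean()

# slope: sample covariance of x and y over sample variance of x
b1_hat = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
# intercept: forces the fitted line through the mean point (x_mean, y_mean)
b0_hat = y_mean - b1_hat * x_mean

# cross-check against NumPy's least-squares polynomial fit (degree 1)
b1_np, b0_np = np.polyfit(x, y, deg=1)
print(b0_hat, b1_hat)                     # should match (b0_np, b1_np)
print(b0_hat + b1_hat * x_mean - y_mean)  # ~0: the line passes through the means
```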


What if x is constant in all observations in our example?

✤ The denominator, Σ (x_i − x_mean)², is zero, and therefore we can’t calculate β1 (see the numeric example after this list).

✤ This is our first encounter with the problem of collinearity: if x is a constant, then x is a linear combination of the “other regressor,” the constant 1 that is multiplied by β0.

✤ Collinearity (or multicollinearity) will be more of a problem in multiple regression. If it is extreme (or perfect), it means that we can’t compute the slope estimates.
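
For instance, if every observation has x = 5, then x_mean = 5 and Σ (x_i − x_mean)² = 0, so the slope formula requires dividing by zero: any line through the point (5, y_mean) produces the same fitted values, and no unique slope can be chosen.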

╰☆╮╰☆╮╰☆╮╰☆╮╰☆╮╰☆╮╰☆╮╰☆╮╰☆╮╰☆╮╰☆╮

In my next post,
I’ll explain the “method of moments” and the “method of maximum likelihood” as different approaches to estimating the β coefficients.

References:

https://www.statisticssolutions.com

https://365datascience.com

https://corporatefinanceinstitute.com

http://r-statistics.co

https://www.reed.edu


I am an undergraduate student at Yıldız Technical University, Department of Mathematical Engineering and Statistics.