Simple Regression Model

Contents

Slide 2

This is an example plot of a linear function.

The nature of the relationship between variables can take many forms, ranging from simple mathematical functions to extremely complicated ones. The simplest relationship is a straight-line, or linear, relationship (a linear function).

Slide 3

SIMPLE REGRESSION MODEL

[Figure: the true line Y = β0 + β1X, with intercept β0 on the Y axis]

Suppose that a variable Y is a linear function of another variable X, with unknown parameters β0 and β1 that we wish to estimate. Suppose that we have a sample of 4 observations with X values as shown.

Slide 4

SIMPLE REGRESSION MODEL

[Figure: the line Y = β0 + β1X with observations Q1, Q2, Q3, Q4 lying exactly on it]

If the relationship were an exact one, the observations would lie on a straight line and we would have no trouble obtaining accurate estimates of β0 and β1. When all empirical X-Y pairs lie on a straight line, the relationship is called a functional or deterministic one.

Slide 5

SIMPLE REGRESSION MODEL

[Figure: actual observations P1–P4 lying off the line, with Q1–Q4 the corresponding points on the line]

In practice, most economic relationships are not exact, and the actual values of Y differ from those corresponding to the straight line.

Slide 6

SIMPLE REGRESSION MODEL

[Figure: the same scatter of points P1–P4 around the line Y = β0 + β1X]

To allow for such divergences, we will write the model as Y = β0 + β1X + e, where e is a disturbance term.

Slide 7

SIMPLE REGRESSION MODEL

[Figure: the first observation P1 split into its point on the line, Q1, and the disturbance e1]

Each value of Y thus has a nonrandom component, β0 + β1X, and a random component, e. The first observation has been decomposed into these two components.

Slide 8

SIMPLE REGRESSION MODEL

[Figure: only the observed points P1–P4 are shown; the true line is hidden]

In practice we can see only the P points.

Slide 9

SIMPLE REGRESSION MODEL

[Figure: a fitted line with intercept b0 drawn through the P points]

Obviously, we can use the P points to draw a line which is an approximation to the line Y = β0 + β1X. If we write this fitted line as Ŷ = b0 + b1X, then b0 is an estimate of β0 and b1 is an estimate of β1.

Slide 11

SIMPLE REGRESSION MODEL

However, we have obtained data from only a random sample of the population. For a sample, b0 and b1 can be used as estimates (estimators) of the respective population parameters β0 and β1.

The intercept b0 and the slope b1 are the coefficients of the regression line. The slope b1 is the change in Y (an increase if b1 > 0, a decrease if b1 < 0) associated with a unit change in X. The intercept is the value of Y when X = 0; it is the point at which the population regression line intersects the Y axis. In some cases the intercept has no real-world meaning (for example, when X is class size and Y is test score, the intercept is the predicted test score when there are no students in the class!).

The random error contains all the factors other than X that determine the value of the dependent variable Y for a specific observation.

Slide 12

SIMPLE REGRESSION MODEL

[Figure: the fitted line with intercept b0; the heights of R1–R4 are the fitted values of Y, the heights of P1–P4 the actual values]

The line is called the fitted model, and the values of Y predicted by it are called the fitted values of Y. They are given by the heights of the R points.

Slide 13

SIMPLE REGRESSION MODEL

[Figure: residuals e1–e4 shown as the vertical distances between the actual points P1–P4 and the fitted points R1–R4]

The discrepancies between the actual and fitted values of Y are known as the residuals.

Slide 14

SIMPLE REGRESSION MODEL

Least squares criterion: minimize SSE, the residual sum of squares.

To begin with, we will draw the fitted line so as to minimize the sum of the squares of the residuals, SSE. This is described as the least squares criterion.
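The formula that accompanied the criterion on the slide did not survive extraction; in standard notation it reads

$$ \min_{b_0,\, b_1} \; SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left( y_i - b_0 - b_1 x_i \right)^2 . $$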

Slide 15

SIMPLE REGRESSION MODEL

Least squares criterion: minimize SSE, the residual sum of squares.

Why the squares of the residuals? Why not just minimize the sum of the residuals themselves, Σei?

Slide 16

SIMPLE REGRESSION MODEL

[Figure: a horizontal line through the mean of Y, with the P points scattered around it]

The answer is that you would get an apparently perfect fit by drawing a horizontal line through the mean value of Y. The sum of the residuals would be zero.

Slide 17

SIMPLE REGRESSION MODEL

[Figure: the same horizontal line, with positive and negative residuals offsetting one another]

You must prevent negative residuals from cancelling positive ones, and one way to do this is to use the squares of the residuals.

Slide 18

SIMPLE REGRESSION MODEL

Since we are minimizing SSE, which has two unknowns, b0 and b1, we need a mathematical technique that determines the values of b0 and b1 that best fit the observed data; it is known as the Ordinary Least Squares (OLS) method.

Ordinary Least Squares is a procedure that selects the best-fitting line for a set of data points by minimizing the sum of the squared deviations of the points from the line. That is, if Ŷ = b0 + b1X is the equation of the best-fitting line, then for each data point (xi, yi) the residual is ei = yi − ŷi, where ŷi = b0 + b1xi; ei is the amount by which the data point deviates from the line. The least squares criterion finds the slope b1 and the intercept b0 that minimize the sum of the squared deviations, Σei².
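As a minimal sketch of this procedure in code (the numbers are hypothetical, purely for illustration; the closed-form estimates used here are derived on the following slides):

```python
# A from-scratch OLS fit for one regressor.
# The data are hypothetical, for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form least squares estimates (derived on the next slides):
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
     / sum((xi - x_bar) ** 2 for xi in x)
b0 = y_bar - b1 * x_bar   # the fitted line passes through (x_bar, y_bar)

# Residuals e_i = y_i - (b0 + b1*x_i) and the minimized SSE.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, SSE = {sse:.4f}")
```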

Slide 19

SIMPLE REGRESSION MODEL

For the mathematically curious, I provide a condensed derivation of the coefficients.

To minimize SSE, determine the partial derivatives with respect to b0 and with respect to b1. Setting them equal to zero and solving for b0 and b1 results in the equations given below.
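The derivative formulas themselves did not survive extraction; for this model they take the standard form

$$ \frac{\partial SSE}{\partial b_0} = -2\sum_{i=1}^{n}\left(y_i - b_0 - b_1 x_i\right), \qquad \frac{\partial SSE}{\partial b_1} = -2\sum_{i=1}^{n} x_i\left(y_i - b_0 - b_1 x_i\right). $$

Setting both equal to zero gives the two normal equations

$$ \sum_{i=1}^{n} y_i = n\,b_0 + b_1 \sum_{i=1}^{n} x_i, \qquad \sum_{i=1}^{n} x_i y_i = b_0 \sum_{i=1}^{n} x_i + b_1 \sum_{i=1}^{n} x_i^2 . $$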

Slide 20

SIMPLE REGRESSION MODEL

Since there are two equations with two unknowns, we can solve these equations simultaneously for b0 and b1 as follows (THIS HOLDS ONLY FOR REGRESSION MODELS WITH ONE INDEPENDENT VARIABLE!). We also note that the regression line always goes through the mean point (x̄, ȳ).
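The solution formulas were images on the slide and are not reproduced in the extracted text; the standard closed forms, consistent with the normal equations above, are

$$ b_1 = \frac{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)}{\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}, $$

and the expression for b0 is precisely the statement that the fitted line passes through the mean point (x̄, ȳ).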

Slide 21

SIMPLE REGRESSION MODEL

In matrix notation the OLS model may be written as:

Y = Xb + e

The normal equations in matrix form are now

XᵀY = XᵀXb

and when we solve for b we get:

b = (XᵀX)⁻¹XᵀY

where Y is a column vector of the Y values, X is a matrix containing a column of ones (to pick up the intercept) followed by a column of the observations on the X variable, and b is a vector containing the estimators of the regression parameters.
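A sketch of this formula evaluated directly in NumPy (the data are hypothetical, for illustration only):

```python
import numpy as np

# Hypothetical observations, for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])

# Design matrix X: a column of ones (intercept) next to the X column.
X = np.column_stack([np.ones_like(x), x])

# b = (X'X)^(-1) X'Y -- the normal-equations solution from the slide.
b = np.linalg.inv(X.T @ X) @ (X.T @ y)
print(b)   # [b0, b1]

# np.linalg.lstsq(X, y, rcond=None) is the numerically preferred
# routine, but the explicit formula mirrors the slide.
```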

Slide 22

SIMPLE REGRESSION MODEL

How do we invert XᵀX? We can state the steps as follows:

1. Compute the matrix determinant.
2. Form the matrix of minors.
3. Form the cofactor matrix.
4. Obtain the inverse (divide the transposed cofactor matrix by the determinant).
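For the 2 × 2 matrix XᵀX that arises in simple regression, the four steps reduce to the familiar determinant-and-adjugate formula; a sketch:

```python
import numpy as np

def inverse_2x2(A):
    """Invert a 2x2 matrix via the determinant and cofactor steps."""
    a, b = A[0, 0], A[0, 1]
    c, d = A[1, 0], A[1, 1]
    det = a * d - b * c                  # step 1: determinant
    if det == 0:
        raise ValueError("matrix is singular")
    # steps 2-3: minors with alternating signs give the cofactor
    # matrix; transposing it gives the adjugate.
    adjugate = np.array([[d, -b], [-c, a]])
    return adjugate / det                # step 4: inverse

A = np.array([[4.0, 10.0], [10.0, 30.0]])   # e.g. an X'X matrix
print(inverse_2x2(A) @ A)                    # ~ identity matrix
```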

Slide 23

SIMPLE REGRESSION MODEL

EXAMPLE

In this problem we are looking at the way home size is affected by family income. We will use this model to try to predict the value of the dependent variable from the independent variable. The slope will also help us understand how the Y variable changes for each unit change in the X variable.

Assume a real-estate developer is interested in determining the relationship between family income (X, in thousands of dollars) of local residents and the square footage of their homes (Y, in hundreds of square feet). A random sample of ten families is obtained with the following results:

[Data table not reproduced in the extracted text]

Slide 24

SIMPLE REGRESSION MODEL

[Worked calculations for the example: slide content not recoverable from the extracted text]

Slide 25

SIMPLE REGRESSION MODEL

[Worked calculations, continued: slide content not recoverable]

Slide 26

SIMPLE REGRESSION MODEL

[Worked calculations, continued: slide content not recoverable]

Slide 27

Let's try another example:

X – commercial time (minutes)
Y – sales ($ hundred thousand)

Slide 29

REGRESSION MODEL WITH TWO EXPLANATORY VARIABLES

Slide 30

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

[Figure: three-dimensional axes Y, X1, X2, with intercept β0]

This sequence provides a geometrical interpretation of a multiple regression model with two explanatory variables.

Y – weekly salary ($)
X1 – length of employment (in months)
X2 – age (in years)

Specifically, we will look at a weekly salary model in which weekly salary, Y, depends on length of employment, X1, and age, X2.

Slide 31

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

[Figure: the same three-dimensional axes, highlighting the intercept β0]

The model has three dimensions, one each for Y, X1, and X2. The starting point for investigating the determination of Y is the intercept, β0.

Y – weekly salary ($)
X1 – length of employment (in months)
X2 – age (in years)

Slide 32

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

[Figure: the intercept β0 on the Y axis]

Literally, the intercept gives the weekly salary for those respondents who have no age (??) and no length of employment (??). Hence a literal interpretation of β0 would be unwise.

Y – weekly salary ($)
X1 – length of employment (in months)
X2 – age (in years)

Slide 33

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

[Figure: the line β0 + β1X1 in the (X1, Y) plane, labeled the pure X1 effect]

The next term on the right side of the equation gives the effect of X1. A one-month increase in length of employment, X1, causes weekly salary to increase by β1 dollars, holding X2 constant.

Y – weekly salary ($)
X1 – length of employment (in months)
X2 – age (in years)

Slide 34

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

[Figure: the line β0 + β2X2 in the (X2, Y) plane, labeled the pure X2 effect]

Similarly, the third term gives the effect of variations in X2. A one-year increase in age, X2, causes weekly salary to increase by β2 dollars, holding X1 constant.

Slide 35

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

[Figure: the plane β0 + β1X1 + β2X2, combining the pure X1 effect (β0 + β1X1), the pure X2 effect (β0 + β2X2), and the combined effect of X1 and X2]

Different combinations of X1 and X2 give rise to values of weekly salary which lie on the plane shown in the diagram, defined by the equation Y = β0 + β1X1 + β2X2. This is the nonrandom component of the model.

Slide 36

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

[Figure: an observation β0 + β1X1 + β2X2 + e lying off the plane by the amount e]

The final element of the model is the error term, e. This causes the actual values of Y to deviate from the plane. In this observation, e happens to have a positive value.

Slide 37

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

[Figure: the plane β0 + β1X1 + β2X2 with an observation displaced from it by e]

A sample consists of a number of observations generated in this way. Note that the interpretation of the model does not depend on whether X1 and X2 are correlated or not.

Slide 38

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

[Figure: the same plane and displaced observation]

However, we do assume that the effects of X1 and X2 on salary are additive: the impact of a difference in X1 on salary is not affected by the value of X2, and vice versa.

Slide 39

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

Slope coefficients are interpreted as partial slope / partial regression coefficients:

· b1 = average change in Y associated with a unit change in X1, with the other independent variables held constant (all else equal);
· b2 = average change in Y associated with a unit change in X2, with the other independent variables held constant (all else equal).

Slide 40

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

The regression coefficients are derived using the same least squares principle used in simple regression analysis. The fitted value of Y in observation i depends on our choice of b0, b1, and b2.

Slide 41

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

The residual ei in observation i is the difference between the actual and fitted values of Y: ei = yi − ŷi, where ŷi = b0 + b1x1i + b2x2i.

Slide 42

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

We define SSE, the sum of the squares of the residuals, and choose b0, b1, and b2 so as to minimize it.

Slide 43

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

First we expand SSE as shown, and then we use the first-order conditions for minimizing it.

Slide 44

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

We thus obtain three equations in three unknowns. Solving for b0, b1, and b2, we obtain the expressions shown above.
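The equations themselves were images on the slide; for two regressors the three first-order conditions are the standard normal equations

$$ \sum y_i = n\,b_0 + b_1 \sum x_{1i} + b_2 \sum x_{2i}, $$
$$ \sum x_{1i} y_i = b_0 \sum x_{1i} + b_1 \sum x_{1i}^2 + b_2 \sum x_{1i} x_{2i}, $$
$$ \sum x_{2i} y_i = b_0 \sum x_{2i} + b_1 \sum x_{1i} x_{2i} + b_2 \sum x_{2i}^2 . $$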

Slide 45

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

The expression for b0 is a straightforward extension of the corresponding expression in simple regression analysis: b0 = ȳ − b1x̄1 − b2x̄2.

Slide 46

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

However, the expressions for the slope coefficients are considerably more complex than that for the slope coefficient in simple regression analysis.

Slide 47

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

For the general case, when there are many explanatory variables, ordinary algebra is inadequate. It is necessary to switch to matrix algebra.

Slide 48

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES

In matrix notation the OLS model may be written as:

Y = Xb + e

The normal equations in matrix form are now

XᵀY = XᵀXb

and when we solve for b we get:

b = (XᵀX)⁻¹XᵀY

where Y is a column vector of the Y values, X is a matrix containing a column of ones (to pick up the intercept) followed by one column per X variable containing the observations on it, and b is a vector containing the estimators of the regression parameters.
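A sketch of the matrix solution with two explanatory variables, using made-up salary data (not the table from the lecture):

```python
import numpy as np

# Hypothetical sample: length of employment (months), age (years)
# and weekly salary ($). Illustrative numbers only.
x1 = np.array([10.0, 24.0, 36.0, 48.0, 60.0, 72.0])   # employment
x2 = np.array([22.0, 25.0, 30.0, 34.0, 40.0, 45.0])   # age
y  = np.array([450.0, 510.0, 560.0, 600.0, 660.0, 700.0])

# Design matrix: a column of ones, then one column per regressor.
X = np.column_stack([np.ones(len(y)), x1, x2])

b = np.linalg.inv(X.T @ X) @ (X.T @ y)   # b = (X'X)^(-1) X'Y
b0, b1, b2 = b
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}, b2 = {b2:.2f}")

residuals = y - X @ b
print("SSE =", residuals @ residuals)
```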

Slide 49

MATRIX ALGEBRA: SUMMARY

A vector is a collection of n numbers or elements, collected either in a column (a column vector) or in a row (a row vector).

A matrix is a collection, or array, of numbers or elements in which the elements are laid out in columns and rows. The dimension of a matrix is n × m, where n is the number of rows and m is the number of columns.

Types of matrices

A matrix is said to be square if the number of rows equals the number of columns. A square matrix is said to be symmetric if its (i, j) element equals its (j, i) element. A diagonal matrix is a square matrix in which all the off-diagonal elements equal zero; that is, if the square matrix A is diagonal, then aij = 0 for i ≠ j.

The transpose of a matrix switches the rows and the columns. That is, the transpose turns the n × m matrix A into the m × n matrix denoted Aᵀ, where the (i, j) element of A becomes the (j, i) element of Aᵀ; said differently, the transpose of A turns the rows of A into the columns of Aᵀ. The inverse of the matrix A is defined as the matrix A⁻¹ for which A⁻¹A = I. If the inverse matrix A⁻¹ exists, then A is said to be invertible or nonsingular.

Vector and matrix multiplication

The matrices A and B can be multiplied together if they are conformable, that is, if the number of columns of A equals the number of rows of B. In general, matrix multiplication does not commute; that is, in general AB ≠ BA.
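A short sketch of these definitions in NumPy:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])        # a square 2 x 2 matrix

print(A.T)                        # transpose: rows become columns

A_inv = np.linalg.inv(A)          # exists, so A is nonsingular
print(A_inv @ A)                  # ~ the identity matrix I

B = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(np.array_equal(A @ B, B @ A))   # False: AB != BA in general
```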

Slide 50

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

Data for weekly salary based upon the length of employment and age of employees of a large industrial corporation are shown in the table.

[Data table not reproduced in the extracted text]

Calculate the OLS estimates for the regression coefficients for the available sample. Comment on your results.

Slide 51

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

Y – weekly salary ($)
X1 – length of employment (months)
X2 – age (years)

Slide 52

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

[Slide content not recoverable from the extracted text]

Slide 53

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

Y – weekly salary ($)
X1 – length of employment (months)
X2 – age (years)

Our regression equation with two predictors (X1, X2):

[Fitted equation not reproduced in the extracted text]

Slide 54

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

[Figure: scatter of the data points in (X1, X2, Y) space]

These are our data points in 3-dimensional space (graph drawn using Statistica 6.0).

Slide 55

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

[Figure: the data points with the fitted regression surface and intercept b0]

Data points with the regression surface (Statistica 6.0).

Slide 56

MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE

[Figure: the same plot after rotation]

Data points with the regression surface (Statistica 6.0) after rotation.

Slide 57

Dummy variables in econometric models

There are times when a variable of interest in a regression cannot possibly be considered quantitative. An example is the variable gender. Although this variable may be considered important in predicting a quantitative dependent variable, it cannot be regarded as quantitative.

The best course of action in such a case is to take separate samples of males and females and conduct two separate regression analyses. The results for the males can then be compared with the results for the females to see whether the same predictor variables and the same regression coefficients result.

Slide 58

If a large sample size is not possible, a dummy variable can be employed to introduce a qualitative variable into the analysis.

A DUMMY VARIABLE IN A REGRESSION ANALYSIS IS A QUALITATIVE OR CATEGORICAL VARIABLE THAT IS USED AS A PREDICTOR VARIABLE.

Slide 59

For example, a male could be designated with the code 0 and a female could be coded as 1. Each person sampled could then be measured as either a 0 or a 1 for the variable gender, and this variable, along with the quantitative variables for the persons, could be entered into a multiple regression program and analyzed.

Slide 60

Example 1

Returning to the real-estate developer, we noticed that all the houses in the population were from three neighborhoods, A, B, and C.

Slide 61

Using these data, we can construct the necessary dummy variables and determine whether they contribute significantly to the prediction of home size (Y).

One way to code neighborhoods would be to define a single variable taking the value 0 for neighborhood A, 1 for neighborhood B, and 2 for neighborhood C.

Slide 62

However, this type of coding has many problems. First, because 0 < 1 < 2, the codes imply that neighborhood A is smaller than neighborhood B, which is smaller than neighborhood C. A better procedure is to use the necessary number of dummy variables to represent the neighborhood.

Slide 63

To represent the three neighborhoods, we use two dummy variables: one equal to 1 for a house in neighborhood A (and 0 otherwise), and one equal to 1 for a house in neighborhood B (and 0 otherwise).

Slide 64

What happened to neighborhood C? It is not necessary to develop a third dummy variable. IT IS VERY IMPORTANT THAT YOU NOT INCLUDE IT!! If you attempted to use three such dummy variables in your model, you would receive a message in your computer output informing you that no solution exists for this model.

Slide 65

Why? If one predictor variable is a linear combination (including a constant term) of one or more other predictors, then mathematically no solution exists for the least squares coefficients. To arrive at a usable equation, any such predictor variable must not be included. We don't lose any information: the excluded category is the reference category, and the coefficients measure the included categories in comparison to the excluded one.
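A sketch of why the system breaks down: with an intercept plus a dummy for every category, the columns of X are linearly dependent, so XᵀX has no inverse (the observations below are made up for illustration):

```python
import numpy as np

# Six hypothetical observations from neighborhoods A, A, B, B, C, C.
dA = np.array([1, 1, 0, 0, 0, 0])
dB = np.array([0, 0, 1, 1, 0, 0])
dC = np.array([0, 0, 0, 0, 1, 1])
ones = np.ones(6)

# With all three dummies, ones = dA + dB + dC is an exact linear
# combination, so X'X is singular and has no inverse.
X_bad = np.column_stack([ones, dA, dB, dC])
print(np.linalg.matrix_rank(X_bad.T @ X_bad))  # 3, not 4 -> singular

# Dropping the reference category C restores a unique solution.
X_ok = np.column_stack([ones, dA, dB])
print(np.linalg.matrix_rank(X_ok.T @ X_ok))    # 3 = full rank
```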

Slide 66

The final array of data is:

[Data table not reproduced in the extracted text]

Slide 67

· If family income increases by $1,000, the average home size will increase by about 0.082 hundred square feet (holding family size constant).

· If family size increases by 1 person, the average home size will increase by about 3.27 hundred square feet (holding family income constant).

Slide 68

· The houses located in neighborhood A are 1.613 hundred square feet bigger than houses in neighborhood C.

· The houses located in neighborhood B are 0.9 hundred square feet smaller than houses in neighborhood C.

Slide 69

Example 2

Joanne Herr, an analyst for the Best Foods grocery chain, wanted to know whether three stores have the same average dollar amount per purchase or not. Stores can be thought of as a single qualitative variable set at 3 levels – A, B, and C.

Slide 70

A model can be set up to predict the dollar amount per purchase:

Ŷ = b0 + b1X1 + b2X2

where Ŷ is the expected dollar amount per purchase.

Slide 71

The data:

[Data table not reproduced in the extracted text]

The variables X1 and X2 are dummy variables representing purchases in store A or B, respectively.

Note that the three levels of the qualitative variable have been described with only two variables.
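A sketch of fitting such a model with dummy coding; the purchase amounts are invented for illustration and are not the Best Foods data:

```python
import numpy as np

# Hypothetical purchases: two from each store, in dollars.
store = ["A", "A", "B", "B", "C", "C"]
y = np.array([22.0, 24.0, 21.5, 23.0, 12.5, 13.5])

x1 = np.array([1.0 if s == "A" else 0.0 for s in store])  # store A dummy
x2 = np.array([1.0 if s == "B" else 0.0 for s in store])  # store B dummy

X = np.column_stack([np.ones(len(y)), x1, x2])
b0, b1, b2 = np.linalg.inv(X.T @ X) @ (X.T @ y)

# b0 is the mean purchase in the excluded store C; b1 and b2 are
# the differences of stores A and B from store C.
print(f"store C mean: {b0:.2f}")
print(f"A vs C: {b1:+.2f},  B vs C: {b2:+.2f}")
```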

Slide 72

The regression equation:

[Fitted equation not reproduced in the extracted text]

Slide 73

· The average dollar amount per purchase for store A is $10.01 higher compared with store C.

· The average dollar amount per purchase for store B is $9.42 higher compared with store C.

Always compare to the excluded category!!
