Porsche Price
y = f(x) + ϵ
function: a relationship that maps a set of inputs to a set of outputs
What function do you think we are using to get the mean value of y with simple linear regression?
What is the equation that represents this line?
What is β0?
What is β1?
y = β0 + β1x + ϵ
If we had the whole population of sparrows, we could quantify the exact relationship between y and x.
ŷ = β̂0 + β̂1x
In reality, we have a sample of sparrows with which to estimate the relationship between x and y. The "hats" indicate that these are estimated (fitted) values.
How can you tell the difference between a parameter that is from the whole population versus a sample?
library(Stat2Data)
data(Sparrows)
lm(Weight ~ WingLength, data = Sparrows)
## 
## Call:
## lm(formula = Weight ~ WingLength, data = Sparrows)
## 
## Coefficients:
## (Intercept)   WingLength  
##      1.3655       0.4674
What is β̂0?
lm(Weight ~ WingLength, data = Sparrows)
## 
## Call:
## lm(formula = Weight ~ WingLength, data = Sparrows)
## 
## Coefficients:
## (Intercept)   WingLength  
##      1.3655       0.4674
What is β̂1?
lm(Weight ~ WingLength, data = Sparrows)
## 
## Call:
## lm(formula = Weight ~ WingLength, data = Sparrows)
## 
## Coefficients:
## (Intercept)   WingLength  
##      1.3655       0.4674
y_hat <- lm(Weight ~ WingLength, data = Sparrows) %>%
  predict()

Sparrows %>%
  mutate(y_hat = y_hat) %>%
  select(WingLength, y_hat) %>%
  slice(1:5)
##   WingLength    y_hat
## 1         29 14.92020
## 2         31 15.85501
## 3         25 13.05059
## 4         29 14.92020
## 5         30 15.38761
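The fitted values from predict() match what you get by plugging a wing length into the fitted equation by hand. A quick check using the rounded coefficients printed above (1.3655 and 0.4674):

```r
# Predicted weight for a sparrow with WingLength = 29,
# using the rounded coefficients from the lm() output above
b0 <- 1.3655
b1 <- 0.4674
b0 + b1 * 29  # approximately 14.92, the first row of the predict() output
```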
How did we decide on THIS line?
eᵢ = yᵢ − ŷᵢ
e₁ = y₁ − ŷ₁
−0.02 = 14.9 − 14.92
y_hat <- lm(Weight ~ WingLength, data = Sparrows) %>%
  predict()

Sparrows %>%
  mutate(y_hat = y_hat,
         residual = Weight - y_hat) %>%
  select(Weight, y_hat, residual) %>%
  slice(1:5)
##   Weight    y_hat    residual
## 1   14.9 14.92020 -0.02020496
## 2   15.0 15.85501 -0.85501292
## 3   14.3 13.05059  1.24941095
## 4   17.0 14.92020  2.07979504
## 5   16.0 15.38761  0.61239106
y_hat <- lm(Weight ~ WingLength, data = Sparrows) %>%
  predict()

Sparrows %>%
  mutate(y_hat = y_hat,
         residual = Weight - y_hat,
         residual_2 = residual^2) %>%
  select(Weight, y_hat, residual, residual_2) %>%
  slice(1:5)
##   Weight    y_hat    residual   residual_2
## 1   14.9 14.92020 -0.02020496 0.0004082405
## 2   15.0 15.85501 -0.85501292 0.7310470869
## 3   14.3 13.05059  1.24941095 1.5610277150
## 4   17.0 14.92020  2.07979504 4.3255474012
## 5   16.0 15.38761  0.61239106 0.3750228116
y_hat <- lm(Weight ~ WingLength, data = Sparrows) %>%
  predict()

Sparrows %>%
  mutate(y_hat = y_hat,
         residual = Weight - y_hat,
         residual_2 = residual^2) %>%
  summarise(sse = sum(residual_2))
##        sse
## 1 223.3107
y_hat <- lm(Weight ~ WingLength, data = Sparrows) %>%
  predict()

Sparrows %>%
  mutate(y_hat = y_hat,
         residual = Weight - y_hat,
         residual_2 = residual^2) %>%
  summarise(sse = sum(residual_2),
            n = n(),
            rse = sqrt(sse / (n - 2)))
##        sse   n      rse
## 1 223.3107 116 1.399595
lm(Weight ~ WingLength, data = Sparrows) %>% summary()
## 
## Call:
## lm(formula = Weight ~ WingLength, data = Sparrows)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5440 -0.9935  0.0809  1.0559  3.4168 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.36549    0.95731   1.426    0.156
## WingLength   0.46740    0.03472  13.463   <2e-16
## 
## Residual standard error: 1.4 on 114 degrees of freedom
## Multiple R-squared:  0.6139, Adjusted R-squared:  0.6105 
## F-statistic: 181.3 on 1 and 114 DF,  p-value: < 2.2e-16
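The "Residual standard error: 1.4 on 114 degrees of freedom" line is the same rse we computed by hand: summary() stores it as sigma. A minimal check of this equivalence (using the built-in mtcars data as a stand-in so it runs without Stat2Data installed):

```r
# summary()$sigma should equal sqrt(SSE / (n - 2)) for a simple
# linear regression; mtcars stands in for Sparrows here
fit <- lm(mpg ~ wt, data = mtcars)
sse <- sum(resid(fit)^2)
n   <- nrow(mtcars)
rse <- sqrt(sse / (n - 2))
all.equal(rse, summary(fit)$sigma)  # TRUE
```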
Set the eval chunk option to TRUE and knit.

The overall relationship between the variables has a linear pattern. The average values of the response y for each value of x fall on a common straight line.
The error distribution is centered at zero. This means that the points are scattered at random above and below the line. (Note: By using least squares regression, we force the residual mean to be zero. Other techniques would not necessarily satisfy this condition.)
The variability in the errors is the same for all values of the predictor variable. This means that the spread of points around the line remains fairly constant.
The errors are assumed to be independent from one another. Thus, one point falling above or below the line has no influence on the location of another point. When we are interested in using the model to make formal inferences (conducting hypothesis tests or providing confidence intervals), additional assumptions are needed.
The data are obtained using a random process. Most commonly, this arises either from random sampling from a population of interest or from the use of randomization in a statistical experiment.
In order to use standard distributions for confidence intervals and hypothesis tests, we often need to assume that the random errors follow a normal distribution.
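These conditions are usually assessed with residual plots. A minimal sketch in base R (mtcars stands in as the dataset so the code is self-contained); note that least squares itself forces the residuals to average to zero:

```r
fit <- lm(mpg ~ wt, data = mtcars)

# Residuals vs fitted values: look for random scatter around zero
# (linearity) with roughly constant spread (equal variance)
plot(fitted(fit), resid(fit))
abline(h = 0, lty = 2)

# Normal quantile plot of the residuals: points close to the line
# support the normality condition needed for formal inference
qqnorm(resid(fit))
qqline(resid(fit))

# By construction, least squares residuals have mean zero
mean(resid(fit))  # essentially 0
```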
For a quantitative response variable y and a single quantitative explanatory variable x, the simple linear regression model is
y = β0 + β1x + ϵ
where ϵ follows a normal distribution, that is, ϵ ∼ N(0, σ_ϵ), and the errors are independent of one another.
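One way to internalize this model is to simulate from it: pick true values of β0, β1, and σ_ϵ, generate data, and watch lm() recover them. A sketch with made-up parameters (1.4, 0.47, and 1.4, loosely echoing the sparrow fit):

```r
set.seed(212)
n <- 500
x <- runif(n, 20, 35)              # predictor, e.g. wing lengths in mm
e <- rnorm(n, mean = 0, sd = 1.4)  # ϵ ~ N(0, σ_ϵ) with σ_ϵ = 1.4
y <- 1.4 + 0.47 * x + e            # y = β0 + β1 x + ϵ
coef(lm(y ~ x))                    # estimates land close to 1.4 and 0.47
```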
library(tidyverse)
The broom package takes the messy output of built-in R functions, such as lm, and turns it into tidy data frames.

library(broom)