The Bivariate Linear Model
This lab script introduces students to the basic linear model in its bivariate form. Rather than use real-world data, we’ll be using fake data we can create. Let this underscore that much of what we know about the linear model and what it tells us comes from simulation. Here, we can play God. We are not afforded that luxury in the real world, but the tools we use come from situations where we can cosplay as the Almighty with respect to a population we created.
R Packages/Data for This Session
You should’ve already installed the R packages for this lab session. {tidyverse} will be for all things workflow, and that’s all we’ll be using today.
library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.2 ✔ readr 2.1.4
#> ✔ forcats 1.0.0 ✔ stringr 1.5.0
#> ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
#> ✔ purrr 1.0.2
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
For those of you wanting the packages we’ll be using today without loading all of the {tidyverse}, you could run this instead. I won’t, but here are the component packages.
# library(dplyr) # the workhorse
# library(magrittr) # for the pipe
# library(tibble) # for tibbles
# library(ggplot2) # for plots
We’ll run one other command, because you may know how I feel by now about {ggplot2}’s default theme. I have {stevethemes} and I’d really like to use it here, but we want to stay strictly within {tidyverse} today. So, we’ll do this.
theme_set(theme_minimal())
Cool beans. Let’s get going.
Creating Fake Data to Learn About the Linear Model
The lecture uses actual (toy) data from the World Bank to introduce you to the linear model in the bivariate case. Now, I think it might be time for you to get it in the completely “fabricated” sense. In this lab script, you are going to create data that you will use to learn more about what the linear model with its OLS estimator is telling you (by way of lm() output).
Take careful inventory of what’s going to happen here. We’re going to set a reproducible seed, and then we’re going to create two new “random” variables, called x and e, from a normal distribution.
set.seed(8675309) # Jenny, I got your number...
tibble(x = rnorm(100),
e = rnorm(100)) -> Fake
Fake
#> # A tibble: 100 × 2
#> x e
#> <dbl> <dbl>
#> 1 -0.997 0.745
#> 2 0.722 -1.35
#> 3 -0.617 0.330
#> 4 2.03 -0.0127
#> 5 1.07 -0.464
#> 6 0.987 0.205
#> 7 0.0275 0.464
#> 8 0.673 -0.757
#> 9 0.572 -1.07
#> 10 0.904 -0.263
#> # ℹ 90 more rows
x is going to be our independent variable, which is exogenously created prior to an observed outcome. e is going to be our error term, which includes everything not included in the linear model. The presence of an error term is an important assumption in OLS, and notice here that the error term is created independently of the x variable. Yes, they’re in the same data set, and yes, they were created by the same command. However, x and e are generated simultaneously but completely independently of each other. If they were correlated with each other, that would violate a major assumption about endogeneity in the linear model (one we’ll belabor in another lab script).
You can test that we successfully pulled this off with a correlation test.
cor.test(Fake$x, Fake$e)
#>
#> Pearson's product-moment correlation
#>
#> data: Fake$x and Fake$e
#> t = 0.63308, df = 98, p-value = 0.5282
#> alternative hypothesis: true correlation is not equal to 0
#> 95 percent confidence interval:
#> -0.1342806 0.2570170
#> sample estimates:
#> cor
#> 0.06382077
Notice that the correlation between x and e is about .06, but the 95% interval includes 0. The modest correlation we do observe is due to random chance.
Now, watch this. We’re going to create a new variable, y, that is a linear function of x and e, plus some baseline value (intercept) when x is 0.
Fake %>%
mutate(y = 5 + x + e) -> Fake
Fake
#> # A tibble: 100 × 3
#> x e y
#> <dbl> <dbl> <dbl>
#> 1 -0.997 0.745 4.75
#> 2 0.722 -1.35 4.38
#> 3 -0.617 0.330 4.71
#> 4 2.03 -0.0127 7.02
#> 5 1.07 -0.464 5.60
#> 6 0.987 0.205 6.19
#> 7 0.0275 0.464 5.49
#> 8 0.673 -0.757 4.92
#> 9 0.572 -1.07 4.51
#> 10 0.904 -0.263 5.64
#> # ℹ 90 more rows
Here’s how I want you to think about what you’re going to do here. By simulating data to evaluate a model, you’re playing God to see how well the model can capture the “true” parameters you’re generating. The “true” value of y is 5 when x is 0. The “true” effect of x on y is to increase y by 1 for every one-unit increase in x. The error term here is on equal footing with x, but recall that the error term (e) includes everything else that influences y but is not included in the model. It’s ultimately going to introduce noise into how well we can capture the other true parameters.
The data are fake, sure, but the process of simulating so-called “random variables” (drawn from a normal distribution with pre-defined parameters) is useful for illustrating the properties of a statistical model like this. It’ll also help to show what these data “look like.” Here, for example, is the distribution of x. When you have granularity in a variable like this (with so many trailing decimals), it’s useful to jettison a bar chart or histogram (which has somewhat arbitrary bin width) in favor of the density plot. The density plot may not have the same easy takeaway as to what’s happening on the y-axis, but it’s arguably more useful for getting the basic shape of a distribution.
There are two ways of doing this. Here’s the base R way.
plot(density(Fake$x))
Here, we see the data look nice and “normal” with a clear central tendency and a shape approximating a true bell curve.
Here’s the prettier ggplot way. ggplot stuff (by way of {ggplot2}) comes default in {tidyverse}. Once you load {tidyverse}, you load all the {ggplot2} functionality you could want.
Fake %>%
ggplot(.,aes(x)) +
geom_density()
ggplot()’s defaults want to draw attention to the fact that there’s more of a left skew than a right skew. In other words, the maximum of x is ~2.02 whereas the minimum is about -2.59, but the shape of the data still looks normal. Any irregularities in shape could be attributed to the fact that we only drew 100 observations, which is a fairly small set of observations (all things considered).
We can do the same thing for the outcome variable y.
plot(density(Fake$y))
Also, in {ggplot2} format…
Fake %>%
ggplot(.,aes(y)) +
geom_density()
It’s worth reiterating that y is “endogenously” created, and it is the only endogenous part of our data set. It is an outcome of two exogenous variables (x, which we can measure, and e, which serves as a plug-in for background noise in the underlying population).
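If you want to convince yourself of that, here’s a quick sketch (not something the lab requires): y should correlate with both of its ingredients, even though x and e are, by construction, uncorrelated with each other.
# y should correlate with both x and e, because it is built from both,
# while x and e themselves should be (roughly) uncorrelated.
Fake %>%
  summarize(cor_xy = cor(x, y),
            cor_ey = cor(e, y),
            cor_xe = cor(x, e))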
Can OLS capture these known attributes about the data? We’ll use the lm() function in R to find out. lm() is the base R function for estimating linear models. Its syntax is simple, especially in a simple case like this. The two basic things you’ll want to feed it are the formula (regressing your dependent variable on your independent variable(s)) and, separated by a comma, the name of the data set you’re using. Here, we’re doing a super simple regression of y on x. Then, we’ll get a summary of the model with summary().
M1 <- lm(y ~ x, data=Fake)
summary(M1)
#>
#> Call:
#> lm(formula = y ~ x, data = Fake)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.5346 -0.7053 0.1223 0.7281 1.9631
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 4.94563 0.09956 49.677 <2e-16 ***
#> x 1.06807 0.10752 9.934 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.994 on 98 degrees of freedom
#> Multiple R-squared: 0.5017, Adjusted R-squared: 0.4967
#> F-statistic: 98.69 on 1 and 98 DF, p-value: < 2.2e-16
Our question was whether OLS can capture the known population parameters
that we created. The answer is yes. Yes, it can. In a simple bivariate
case, you can see what this would look like visually with a scatterplot.
I want you to see what this is doing. First, we’re going to use our data and, with the pipe operator (%>%), we’re going to pass the data to a ggplot() function with a simple aesthetic (aes()) argument creating an x-axis for x and a y-axis for y. Then, using + (which is the {ggplot2} continuation operator), we’re going to create a scatterplot with geom_point(). Then, using another continuation operator (+), we’re going to use geom_smooth() to create a line of best fit for the data. Importantly, you have to specify you want a linear smoother (method = "lm") to communicate what a simple bivariate case is doing. The default is method = "loess", which has its own nice features, but it’s not what we want here. Finally, we’re going to draw two lines over the plot. The first uses geom_vline(xintercept = mean(Fake$x)) to add a vertical line through the mean of x. The second uses geom_hline(yintercept = mean(Fake$y)) to add a horizontal line through the mean of y. Notice that the line of best fit runs through the intersection of both of them.
Fake %>%
ggplot(.,aes(x, y)) +
geom_point() +
geom_smooth(method = "lm") +
geom_vline(xintercept = mean(Fake$x)) +
geom_hline(yintercept = mean(Fake$y))
#> `geom_smooth()` using formula = 'y ~ x'
How to Evaluate Your Linear Model Output
This graph is great for visualizing a super simple bivariate case, but things are never so simple. So, let’s walk through a little bit of the important output from the model on its own terms. Let’s start with the “coefficients”, since they are the main reason you’re doing this. We mainly wanted to know if we could capture the “true” effect of x on y. The “true” effect is 1. The OLS estimator says it’s about 1.06. That’s technically wrong, sure, but the coefficient is what it is because it’s also capturing the correlation between x and the error term e, which we don’t get to include in the model. So, OLS says it’s 1.06, which comes with a standard error of about .10. A standard error below the estimate is about .961. A standard error above the estimate is about 1.175.
Put another way, given estimation and fundamental uncertainty about the model, our estimated effect of x on y is within a standard error of what we know it actually is. We captured the true population parameter of interest to us.
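If you want to pull those numbers straight from the model object rather than read them off the printed summary, coef(summary(M1)) returns the coefficient matrix. Here’s a minimal sketch (coefs, b_x, and se_x are just names I’m making up for illustration).
# Extract the coefficient matrix from the model summary
coefs <- coef(summary(M1))

# The estimate and standard error for x
b_x  <- coefs["x", "Estimate"]
se_x <- coefs["x", "Std. Error"]

# One standard error below and above the estimate
c(lower = b_x - se_x, estimate = b_x, upper = b_x + se_x)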
The other thing you want to know from your regression model is whether the effect of your independent variable of interest on the dependent variable is “statistically significant.” This term is very misleading, to be frank. About 10 years ago, it appeared on this list of scientific ideas that scientists wish you’d stop misusing. Briefly:
“Statistically significant” is one of those phrases scientists would love to have a chance to take back and rename. “Significant” suggests importance; but the test of statistical significance, developed by the British statistician R.A. Fisher, doesn’t measure the importance or size of an effect; only whether we are able to distinguish it, using our keenest statistical tools, from zero. “Statistically noticeable” or “Statistically discernible” would be much better.
Notice what this means for what you’re doing. You’re not testing, necessarily, whether you’ve identified the “true” effect when you evaluate statistical significance. You’re testing whether you could rule out something else: 0. That’s at least what gets returned to you here, and the t-statistic that gets reported is conveniently just the coefficient over the standard error. When the absolute value of that t-statistic is large, and its associated p-value falls below some threshold of interest to you, you can reject the null hypothesis and assert that what you observed would be highly improbable if the true effect were 0. That’s indeed the case here. The true effect is 1. We observed about 1.06. The t-statistic associated with that coefficient and standard error is about 9.9. That’s comically large, given the distribution of Student’s t for 98 degrees of freedom. The probability of observing what we got is basically 0 if the true effect were 0 (and it’s not, because it’s 1). So, you can reject the null hypothesis and assert that what you got is closer to what it truly is, assuming repeated sampling and fixed population parameters. The first part of that may not apply here, but the second one does. We created the data.
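If you’d like to see where those two numbers come from, here’s a minimal sketch that reproduces the t-statistic and two-sided p-value for x by hand (t_x and p_x are just illustrative names).
# The t-statistic is the coefficient over its standard error
b_x  <- coef(summary(M1))["x", "Estimate"]
se_x <- coef(summary(M1))["x", "Std. Error"]
t_x  <- b_x / se_x

# Two-sided p-value from Student's t with the model's residual degrees of freedom
p_x <- 2 * pt(abs(t_x), df = df.residual(M1), lower.tail = FALSE)

c(t_statistic = t_x, p_value = p_x)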
What About the Intercept?
You don’t evaluate “statistical significance” in the same way for the y-intercept. There is no “null hypothesis” to note in the context of the intercept. It’s just an estimate of y when x is 0. Here, we are adequately capturing its true value (i.e. it’s actually 5, but we got about 4.94). Dividing that estimated intercept by its standard error returns a ridiculously large t-statistic that means you can comfortably rule out 0. Then again, you weren’t testing that it’s 0. It’s just some constant baseline for y.
Here’s another way of illustrating this. Let’s create another variable, y2. It has this formula.
# I could alternatively write this as just `y2 = x + e`.
Fake %>%
mutate(y2 = 0 + x + e) -> Fake
Fake
#> # A tibble: 100 × 4
#> x e y y2
#> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.997 0.745 4.75 -0.252
#> 2 0.722 -1.35 4.38 -0.625
#> 3 -0.617 0.330 4.71 -0.287
#> 4 2.03 -0.0127 7.02 2.02
#> 5 1.07 -0.464 5.60 0.602
#> 6 0.987 0.205 6.19 1.19
#> 7 0.0275 0.464 5.49 0.491
#> 8 0.673 -0.757 4.92 -0.0838
#> 9 0.572 -1.07 4.51 -0.493
#> 10 0.904 -0.263 5.64 0.641
#> # ℹ 90 more rows
Let’s run another regression.
M2 <- lm(y2 ~ x, Fake)
summary(M2)
#>
#> Call:
#> lm(formula = y2 ~ x, data = Fake)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.5346 -0.7053 0.1223 0.7281 1.9631
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -0.05437 0.09956 -0.546 0.586
#> x 1.06807 0.10752 9.934 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.994 on 98 degrees of freedom
#> Multiple R-squared: 0.5017, Adjusted R-squared: 0.4967
#> F-statistic: 98.69 on 1 and 98 DF, p-value: < 2.2e-16
Notice that x is unaffected, but now the y-intercept is “statistically insignificant”? Except that it’s not, because that’s not how you should evaluate the y-intercept. The statistics you get back just tell you the estimated value of y when x is 0 could be 0 (because that’s what it is).
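One optional way to see this is with confint(), which returns 95% confidence intervals for both parameters. The intercept’s interval here should comfortably straddle 0 (its true value in this simulation), while the interval for x should not.
# 95% confidence intervals for the intercept and slope in M2
confint(M2)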
This is often why you’ll read regression tables that just outright omit the y-intercept and make no effort to model it. I’ll only retort that you may encounter, or want to learn, more complicated models, so-called “mixed (effects) models”, that include random intercepts for different grouping clusters in a data set. For those models to converge, they’ll often need a so-called “global” intercept in order to know against what baseline a group may vary. This is why I encourage students to get comfortable with scaling inputs into the model in order to create more meaningful y-intercepts that, with real-world data, might tell the reader (and yourself) something about the typical value of y.
Indeed, that’s what the intercept is communicating here, plus-or-minus error.
Fake %>%
summarize(mean_y = mean(y),
mean_y2 = mean(y2)) %>% data.frame
#> mean_y mean_y2
#> 1 5.001493 0.001493478
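If you wanted the intercept to communicate that mean exactly, one option (just a sketch of the “scaling inputs” point above, with x_c as a made-up name) is to center x on its mean before fitting the model. With a mean-centered predictor, the intercept is the mean of y.
# A sketch: center x on its mean and re-estimate the model.
# The slope is unchanged; the intercept is now the mean of y.
Fake %>%
  mutate(x_c = x - mean(x)) %>%
  lm(y ~ x_c, data = .) %>%
  coef()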
Other Things You Should Know About Your OLS Model
I’ll go into the other things you should know about the OLS model, ordered loosely by their importance.
Fitted Values and Residuals
The OLS model draws a line of best fit through the data, and that line of best fit is the one that minimizes the sum of squared prediction errors. This means there are estimates for a particular data point, given this line of “best fit”, and invariably an “error” (which is the distance between the estimated value of y and the observed value of y).
In R speak, these are the fitted values and the residuals. The fitted values are the estimated values of y for each data point (given the regressors). You may also call these the y-hats. The residuals are the distances between what was fitted and what the observation really is. These are the estimates of e from the data-generating process.
You can extract them both pretty easily in R. Observe, for M1.
Fake %>%
mutate(fitted = fitted(M1),
resid = resid(M1)) -> Fake
Fake
#> # A tibble: 100 × 6
#> x e y y2 fitted resid
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 -0.997 0.745 4.75 -0.252 3.88 0.867
#> 2 0.722 -1.35 4.38 -0.625 5.72 -1.34
#> 3 -0.617 0.330 4.71 -0.287 4.29 0.427
#> 4 2.03 -0.0127 7.02 2.02 7.11 -0.0965
#> 5 1.07 -0.464 5.60 0.602 6.08 -0.482
#> 6 0.987 0.205 6.19 1.19 6.00 0.192
#> 7 0.0275 0.464 5.49 0.491 4.97 0.516
#> 8 0.673 -0.757 4.92 -0.0838 5.66 -0.748
#> 9 0.572 -1.07 4.51 -0.493 5.56 -1.05
#> 10 0.904 -0.263 5.64 0.641 5.91 -0.270
#> # ℹ 90 more rows
Here’s a way to wrap your head around what this is telling you. For that first row, the value of x is about -.997. The value of y is 4.75. Given the line that minimizes the sum of squared prediction errors, the coefficient for a one-unit change of x on y is about 1.06. The y-intercept we estimate to be about 4.94. 4.94 + 1.06*(-.997) = about 3.88. That’s the fitted value of y. In actuality, it’s about 4.75. 4.75 - 3.88 (alternatively: y - y-hat for that observation) is about .867. That’s the error, or the residual you see there. Do note there’s rounding error here for presentation, but hopefully the principle is clear.
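If you’d rather skip the rounding error, here’s a quick sketch that redoes that arithmetic with the stored coefficients (b, yhat_1, and resid_1 are just names I’m making up).
# Recreate the fitted value and residual for the first observation by hand
b <- coef(M1)                      # intercept and slope from M1
yhat_1  <- b[1] + b[2] * Fake$x[1] # intercept + slope*x for row 1
resid_1 <- Fake$y[1] - yhat_1      # observed y minus fitted y

c(fitted = unname(yhat_1), residual = unname(resid_1))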
Of note, I don’t really know why R’s default summary() output includes that kind of summary of the residuals first, but that’s the top thing you see. Observe:
tibble(x = resid(M1)) %>%
summarize(min = min(x),
q1 = quantile(x, .25),
median = median(x),
q3 = quantile(x, .75),
max = max(x)) %>% data.frame
#> min q1 median q3 max
#> 1 -2.5346 -0.7053101 0.1223389 0.7280966 1.963144
summary(M1)
#>
#> Call:
#> lm(formula = y ~ x, data = Fake)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -2.5346 -0.7053 0.1223 0.7281 1.9631
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 4.94563 0.09956 49.677 <2e-16 ***
#> x 1.06807 0.10752 9.934 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.994 on 98 degrees of freedom
#> Multiple R-squared: 0.5017, Adjusted R-squared: 0.4967
#> F-statistic: 98.69 on 1 and 98 DF, p-value: < 2.2e-16
Keep your fitted values and residuals handy, as they’ll be very useful (really: necessary) information for model diagnostics. Another good way of extracting them is with the augment() function in the {broom} package, which is installed when you install {tidyverse}.
broom::augment(M1)
#> # A tibble: 100 × 8
#> y x .fitted .resid .hat .sigma .cooksd .std.resid
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4.75 -0.997 3.88 0.867 0.0229 0.995 0.00911 0.882
#> 2 4.38 0.722 5.72 -1.34 0.0152 0.990 0.0143 -1.36
#> 3 4.71 -0.617 4.29 0.427 0.0152 0.998 0.00145 0.432
#> 4 7.02 2.03 7.11 -0.0965 0.0557 0.999 0.000295 -0.0999
#> 5 5.60 1.07 6.08 -0.482 0.0220 0.998 0.00270 -0.490
#> 6 6.19 0.987 6.00 0.192 0.0202 0.999 0.000394 0.195
#> 7 5.49 0.0275 4.97 0.516 0.0100 0.998 0.00138 0.522
#> 8 4.92 0.673 5.66 -0.748 0.0145 0.996 0.00423 -0.758
#> 9 4.51 0.572 5.56 -1.05 0.0132 0.993 0.00754 -1.06
#> 10 5.64 0.904 5.91 -0.270 0.0185 0.999 0.000708 -0.274
#> # ℹ 90 more rows
You obviously get other output with this function, which will come in handy when we delve into model diagnostics in earnest.
R-Squared
OLS textbooks sometimes encourage new students to read too much into R-squared. Minimally, it tells you how well the model fits the data. It’s bound between 0 and 1, and higher values mean better fit. However, ask yourself what the primary purpose of the regression model is. Is it to identify a causal effect? If so, those minimal OLS models regressing some outcome y on a treatment will typically have very low R-squareds, even if the causal effect is identified. Is the goal to fit the data? Then a higher R-squared is obviously very nice, but saturating the model with regressors for the sake of juicing the R-squared may come at the cost of biased coefficients (making it more difficult to identify signals from the din).
I do want to draw your attention to a few things. For one, in the simple bivariate case, R-squared is quite literally Pearson’s r, squared. Observe.
Fake %>% summarize(cor = cor(x, y),
cor2 = cor^2) -> correlations
correlations
#> # A tibble: 1 × 2
#> cor cor2
#> <dbl> <dbl>
#> 1 0.708 0.502
correlations %>% pull(2)
#> [1] 0.5017422
summary(M1)$r.squared
#> [1] 0.5017422
Second, you’ll see something called an adjusted R-squared. Simply, think of this as a kind of penalty that downweights the normal (“multiple”) R-squared for the presence of additional parameters. It’s why it’ll always be smaller than the other R-squared metric you see. Here’s the hard way of calculating what comes default in R.
(1 - ((1-summary(M1)$r.squared)*(nobs(M1)-1)/(nobs(M1)-length(M1$coefficients))))
#> [1] 0.496658
In plain English: adjusted R-squared = 1 - (1 - R^2)*(n - 1)/(n - k), where n is the number of observations and k is the number of parameters (including the intercept). In this case, the difference is small, but the adjusted R-squared is telling you that you have more than one parameter here (i.e. the intercept counts too).
Long story short: a high R-squared is nice, but it’s not necessary for identifying a causal effect of x on y. There’s no agreement on what a “good R-squared” looks like.
Residual Standard Error
The residual standard error is the standard error of the residuals. It’s not too hard to calculate in a simple case like this. Take the sum of squared residuals, divide it by the residual degrees of freedom in the model, and take the square root. There’s your residual standard error. You’ll see the sum of squared residuals often referred to as the deviance of the model.
sqrt(deviance(M1)/df.residual(M1))
#> [1] 0.9939681
sqrt(sum(resid(M1)^2)/98)
#> [1] 0.9939681
There’s more you can do with that deviance. For example, does this look familiar?
1 - (sum(resid(M1)^2)/(sum((Fake$y - mean(Fake$y))^2)))
#> [1] 0.5017422
It’s R-squared, right? You can also calculate R-squared as 1 minus the deviance over the total sum of squares. You might see that formula as 1 - RSS/TSS, where the deviance is also known as the residual sum of squares. You can also calculate R-squared, more generally, as the explained sum of squares (the sum of squared deviations of the fitted values from the mean of y) over the total sum of squares.
(sum((M1$fitted.values - mean(Fake$y))^2))/(sum((Fake$y - mean(Fake$y))^2))
#> [1] 0.5017422
F-statistic
Simply, you’ll never use this. The F-statistic tests against a null hypothesis that the model you estimated performs just as well as a model that guesses just the mean of y. The moment you know anything about your data-generating process is the moment you can start to safely ignore this statistic. In our case, the F-statistic of 98.69 on 1 and 98 degrees of freedom comes with a p-value that is basically 0. This model is better than an intercept-only model.
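If you ever want to see that comparison made explicit, you can fit the intercept-only model yourself and compare the two with anova(). A minimal sketch (M0 is just a name I’m making up); the F-statistic it reports should match the one at the bottom of summary(M1).
# Compare M1 against an intercept-only model of y
M0 <- lm(y ~ 1, data = Fake)
anova(M0, M1)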
Want to fail this statistic? Do this.
Fake %>%
mutate(noise = rnorm(100)) -> Fake
M3 <- lm(y ~ noise, data=Fake)
summary(M3)
#>
#> Call:
#> lm(formula = y ~ noise, data = Fake)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.2105 -0.6361 -0.1454 0.8449 3.5082
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 5.00114 0.14062 35.565 <2e-16 ***
#> noise -0.07235 0.13833 -0.523 0.602
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 1.406 on 98 degrees of freedom
#> Multiple R-squared: 0.002783, Adjusted R-squared: -0.007392
#> F-statistic: 0.2735 on 1 and 98 DF, p-value: 0.6022
See that p-value with that F-statistic? I can’t say that this model performs any better than an intercept-only model. Well, duh.
Conclusion
For now, take stock of the following things:
- formula: lm() is the function you want. You regress whatever your DV is on (~) whatever your independent variable (or variables) is called. If you have multiple IVs, it’ll be something like y ~ x1 + x2 (see the short sketch after this list). Tell the formula your data come from whatever your data object is called.
- coefficient: the coefficient is the rise over run from the slope-intercept equation, in as many words. A one-unit change in x coincides with an estimated change of [INSERT COEFFICIENT VALUE HERE] in y.
- statistical significance: you’re assessing the probability of what you observe (with the standard error) if the true effect is zero. Assess statistical significance by whatever evidentiary threshold you choose, but the convention is .05. Don’t bother interpreting “significance” for the intercept.
- other model parameters: R-squared is nice when it’s high, but it’s not necessary for identifying causal effects. Grabbing the fitted values and residuals from the model will be useful for important diagnostics.
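As a parting illustration of that multi-IV formula syntax, here’s a minimal sketch using the noise column we created earlier (M4 is just a name I’m making up; this isn’t part of this lab’s analysis).
# Purely illustrative of the formula syntax with two IVs
M4 <- lm(y ~ x + noise, data = Fake)
summary(M4)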