Seems like a generic selection-on-observables design (explanation). But they don’t seem to have any discussion of what those observables actually are or what drives propensity to play violent video games. In short: I wouldn’t put much stock in it.
>But the study you linked doesn't say that housing allowance has no effect on rents; it says that, looking within a given area, people don't move to higher-rent areas in order to get more housing allowance.
I didn't claim that housing allowance doesn't affect rent levels. It surely does, and the effect is positive, but unfortunately that effect is impossible to estimate.
If you come up with a way to estimate it, you'll probably bag yourself a tenured professorship abroad.
The research design is essentially an RDD, so this effect is in fact an estimate of the causal effect of housing allowance at the margin. More on the topic in this classic.
>The study is irrelevant to this question, because no comparison study in which housing allowance has never existed is available.
No, it isn't. The study gives an estimate of the marginal effect. In practice, based on these results one can say that cutting housing allowance is unlikely to lower rent levels, as long as the cut is not "too large".
I'd recommend brushing up a bit on how you interpret the research literature. Blog posts by Eerola and Lyytikäinen can be found on VATT's website.
Also a really good companion book to any econometrics text -- it's been my reference throughout my PhD program and it's better than google most of the time :)
So then how do you explain the correlation between belly button position and running speed? If there's a statistically significant relationship, there's usually an explanation for it. I noticed all of your explanations for why "the article is BS" don't take into account the relationship between belly button position and running speed.
My favorite quote on the matter is from this book:
>Less thoughtful observers fall back on the truism that "correlation is not causality." Like most people who work with data for a living, we believe that correlation can sometimes provide pretty good evidence of a causal relation, even when the variable of interest has not been manipulated by a researcher or experimenter.
I haven't looked at their research (and I probably won't), but if the authors separated the racial groups to avoid lurking variables-- which I assume (hope?) they did-- and used a large enough sample size, then have fun explaining the relationship away with "but correlation doesn't imply causation!!!"
As it stands, it seems belly button position indicates leg and torso length. You could say that "technically" it's not belly button position but leg and torso length. But assuming the authors performed their research correctly, then that's about as unnecessarily nitpicky as you can get. But you can't just dismiss the entire relationship away.
As an addendum to this, I would recommend Mostly Harmless Econometrics
But, +1 for having such concise answers.
some suggestions:
Ben Lambert's channel has lectures on econometrics, at an undergrad level, and a graduate level, as well as some linear algebra and math stat review. This can help with a broad overview and intuition.
As for books, Mostly Harmless Econometrics is a good look into how modern applied microeconometrics is typically thought about and conducted. If the material is a little too technical (i.e. you don't have mathematical statistics/probability theory under your belt), the same authors have a book, Mastering Metrics, that you can look at as well.
Additionally, when I took master's econometrics courses in undergrad we consulted Microeconometrics Using Stata. Perhaps you can get a cheaper version on Amazon, but I'm linking it from Stata itself. It covers quite a bit of metrics and implementation in Stata, so it is a more practical book and reference as you start trying to apply these concepts to actual data and research.
Basically, two of the last three Nobel prizes in economics (Kremer, Duflo, Banerjee and now Card, Imbens, Angrist) have gone to Causal Inference. Go Causal Inference!!
If interested in their line of work, Angrist probably has the most practical books in Mostly Harmless Econometrics and Mastering Metrics. Imbens has a good reference in Causal Inference for Statistics, Social and Biomedical Sciences. To learn of its application to poverty reduction, Poor Economics.
This is all from the social science/economics side of applied causal methods which isn't exactly the same approach as the CS approach. They are more or less equivalent and the CS (Pearl) approach feels methodologically cleaner and more consistent but fwiw, the economists have done a better job actually producing this applied causal inference research.
Except in some special circumstances, you can almost never assert with complete certainty that one factor causes another in a statistical analysis. In statistics, the basic issue is known as "the fundamental problem of causal inference" (Holland 1986). In plain language, the fundamental problem refers to the fact that we only observe the world as it is; we cannot observe how the world would be under different circumstances (i.e. the "counterfactual" world).
For example, consider the question of whether taking a painkiller medication will ease your headache. If you take a painkiller, and your headache goes away, you might be tempted to say that the painkiller caused your headache to go away. However, you only observe yourself under the condition of having taken the painkiller. You do not observe yourself under the condition of not having taken the painkiller (i.e. the counterfactual scenario). It could be that your headache would have gone away on its own, but you will never know. For any given cause (or "treatment"), you only ever observe one of several potential outcomes. This way of thinking is called the "potential outcomes" framework in statistics.
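In symbols, the painkiller example can be written roughly as follows. The notation ($D_i$ for whether person $i$ took the painkiller, $Y_i(1)$ and $Y_i(0)$ for the two potential outcomes) is my own shorthand for illustration, not something taken from the sources above:

```latex
% Individual causal effect: the difference between the two potential outcomes
\tau_i = Y_i(1) - Y_i(0)

% What we actually observe: only the outcome for the treatment received
Y_i^{\text{obs}} = D_i \, Y_i(1) + (1 - D_i) \, Y_i(0)
```

Since only one of $Y_i(1)$ and $Y_i(0)$ ever shows up in $Y_i^{\text{obs}}$, $\tau_i$ can never be computed directly for any single person.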
To know whether the painkiller really caused your headache to go away, we would need to observe both a world in which you took the painkiller and a world in which you did not. But we can never observe both the actual world and the counterfactual world simultaneously. Fortunately, we have come up with clever ways to simulate counterfactual scenarios. The most basic is the randomized control trial. Here, we assume that if we have two identical groups of people with headaches, and we give half of them a painkiller (the treatment group) and half of them no painkiller (the control group), then any difference in headache outcomes between the two groups must have been caused by the painkiller. We make sure the groups are identical through randomization, but that's another topic for another time.
The randomized control trial is considered the "gold standard" for causal inference because it does a very good job simulating a counterfactual world. Assuming everything else is held equal, there is no difference between the treatment group and the control group except their exposure to the treatment, so any difference in observed outcomes has to be attributable to the treatment. Now, this isn't the same as complete certainty that the treatment had a causal effect, because we haven't resolved the fundamental problem of causal inference. However, we have mitigated the problem to an extent, and we can use statistical tests to estimate our degree of certainty (or uncertainty).
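To make the logic concrete, here is a minimal simulation sketch. Everything in it is made up for illustration (the function name `simulate_rct`, the severity scale, the effect size of -2); the point is only that when assignment is a coin flip, the simple difference in group means lands close to the true effect, even though each person reveals only one potential outcome.

```python
import random
import statistics


def simulate_rct(n=20_000, true_effect=-2.0, seed=0):
    """Simulate headache severity under both potential outcomes, then
    reveal only one outcome per person via coin-flip assignment.

    All numbers are hypothetical and chosen purely for illustration.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        y0 = rng.gauss(5.0, 1.5)   # severity without the painkiller
        y1 = y0 + true_effect      # severity with the painkiller
        if rng.random() < 0.5:     # randomization: assignment ignores y0
            treated.append(y1)     # we only ever see y1 for this person
        else:
            control.append(y0)     # we only ever see y0 for this person
    return statistics.mean(treated) - statistics.mean(control)


estimate = simulate_rct()
print(f"difference in means: {estimate:.2f} (true effect: -2.00)")
```

The estimate is not exactly -2 because of sampling noise, which is the part the statistical tests are there to quantify.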
There are a number of statistical methods and models that build on the basic logic of randomized control trials, which are variously called "quasi-experimental" or "causal inference" methods. You can read about them in Counterfactuals and Causal Inference by Morgan and Winship, Mostly Harmless Econometrics by Joshua Angrist and Jörn-Steffen Pischke, or this encyclopedia article by Jennifer Hill and Elizabeth Stuart. In general, all of these methods are trying to do the same thing: control for any variables (observed or unobserved) that might somehow be correlated with a (hypothesized) causal factor and the outcome we are interested in. Once we are confident that we have excluded all potential confounders, we infer that any remaining correlation between the hypothesized causal factor and the outcome is, indeed, causal. But it is still an inference, made under strong assumptions.
tl;dr: You are right that most statistical tests only measure correlation. Our ability to infer that a correlation between an X variable and a Y variable is due to a causal relationship (X caused Y) depends on good research design, careful analysis, and certain assumptions. Statistics by themselves cannot prove causation, but our application and interpretation of statistics can make us more-or-less confident that a correlation is causal.
I can't speak for epi, but no one in economics would ever use stepwise regression for inference. They would be laughed out of the room. If you're interested in some of the methods used in economics, this is a good primer aimed at an undergraduate audience (PDF). If you're at a graduate level, Mostly Harmless Econometrics provides a good overview of causal inference in economics.
TMLE seems super interesting! But it's still conditional on normal causal assumptions. Fundamentally, I don't think that prediction answers the questions that I would want to ask.
As a motivating example, if I'm interested in something like "how much does a college education affect wages?" prediction is not particularly helpful. The decision to attend college is affected by observable characteristics (like parental income, where you went to high school, etc.) but also unobservable ones like ability, enjoyment of schooling, and what careers interest you. Under this example, even if you can predict someone's income with 100% accuracy, you will never recover how much college matters because of the unobservable characteristics and selection bias, so TMLE and Super Learner are not very helpful.
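A quick simulated sketch of that selection problem, with entirely invented numbers: `ability` is the unobserved characteristic, it raises both the chance of attending college and wages directly, and the true college effect is set to 10. The function name `naive_college_gap` is hypothetical.

```python
import random
import statistics


def naive_college_gap(n=50_000, true_effect=10.0, seed=1):
    """Compare mean wages of college vs. non-college people when an
    unobserved trait drives both attendance and wages.

    Hypothetical data-generating process, for illustration only.
    """
    rng = random.Random(seed)
    college_wages, other_wages = [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)                # unobserved by the analyst
        attends = ability + rng.gauss(0.0, 1.0) > 0  # selection on ability
        wage = (30.0
                + true_effect * attends              # causal effect of college
                + 8.0 * ability                      # direct effect of ability
                + rng.gauss(0.0, 2.0))
        (college_wages if attends else other_wages).append(wage)
    return statistics.mean(college_wages) - statistics.mean(other_wages)


gap = naive_college_gap()
print(f"naive wage gap: {gap:.1f} (true effect of college: 10.0)")
```

The naive gap comes out well above 10 because college-goers also have higher ability on average, and nothing in the observed data lets you separate the two.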
To get around this, an economist might look at people who were right at and right below an SAT cutoff for college (link is to a paper that does that). There you just have to assume that someone who got a 999 and didn't get into college is basically identical to someone who got a 1000, and under that assumption (plus a few more) the causal effect is well defined.
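Here is a stripped-down sketch of that cutoff idea on simulated data. The cutoff of 1000, the bandwidth, the wage equation, and the function name `rdd_estimate` are all assumptions of mine, and real RDD work typically fits local regressions on each side of the cutoff rather than just comparing means in a window, so treat this as the intuition only.

```python
import random
import statistics


def rdd_estimate(n=200_000, cutoff=1000, bandwidth=10,
                 true_effect=5.0, seed=2):
    """Compare mean wages just above and just below an admission cutoff.

    Hypothetical setup: wages rise smoothly with the score (a stand-in
    for ability) and jump by `true_effect` for admitted students.
    """
    rng = random.Random(seed)
    above, below = [], []
    for _ in range(n):
        score = rng.randint(800, 1200)
        admitted = score >= cutoff
        wage = (30.0
                + 0.01 * (score - cutoff)   # smooth relationship with score
                + true_effect * admitted    # jump at the cutoff
                + rng.gauss(0.0, 2.0))
        if cutoff <= score < cutoff + bandwidth:
            above.append(wage)              # barely admitted
        elif cutoff - bandwidth <= score < cutoff:
            below.append(wage)              # barely rejected
    return statistics.mean(above) - statistics.mean(below)


print(f"estimated jump at the cutoff: {rdd_estimate():.2f} (true: 5.00)")
```

A narrower bandwidth makes the two groups more comparable but leaves fewer observations, which is the usual bias/variance trade-off in this kind of design.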
I don't understand how my example of spurious correlation among randomly generated numbers doesn't already meet that burden. That's a data generating process that is not causal by design but produces your preferred observed signal.
Your additions of "repeated", "different times" and "different places" only reduce the likelihood of finding a set with your preferred signal (or, equivalently, require checking more pairs). There's literally a cottage industry around finding these funny noncausal relationships http://tylervigen.com/page?page=1
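A minimal sketch of what I mean, assuming nothing beyond the standard library: generate a bunch of series that are independent noise by construction, then go looking for the most correlated pair. The series count, length, and the function names are arbitrary choices for illustration.

```python
import itertools
import math
import random


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def strongest_spurious_pair(n_series=200, length=10, seed=3):
    """Search all pairs of independent noise series for the largest
    absolute correlation. No series causes any other, by design."""
    rng = random.Random(seed)
    series = [[rng.gauss(0.0, 1.0) for _ in range(length)]
              for _ in range(n_series)]
    return max(abs(pearson(a, b))
               for a, b in itertools.combinations(series, 2))


print(f"strongest |correlation| among pure noise: "
      f"{strongest_spurious_pair():.2f}")
```

With 200 series there are 19,900 pairs to check, so some pair moving tightly together is close to guaranteed. Demanding longer or repeated series just means you need to search more pairs to find one.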
If you're imagining something more elaborate about what it means to move "reliably" together, Mostly Harmless Econometrics walks through how every single thing you might be thinking of is really just trying to get back to Rubin style randomized treatment assignment https://www.amazon.com/Mostly-Harmless-Econometrics-Empiricists-Companion/dp/0691120358