How to calculate and use predicted Y-values in multiple regression

I had planned to publish this article after the release of my paper on the black-white vocabulary gap in the GSS, but I have changed my mind. Here, I explain how to calculate and use the predicted values of Y, the so-called “Yhat”, in regression (OLS, logistic, and multilevel).
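As a rough illustration of what a predicted value is in the OLS case only, here is a minimal sketch. The data and the variable names (`education`, `age`, `score`) are made up for the example and are not taken from the GSS or from the paper. Logistic and multilevel models are not covered here.

```python
from statistics import mean

def fit_ols_two_predictors(x1, x2, y):
    """Least-squares fit of y = a + b1*x1 + b2*x2; returns (a, b1, b2)."""
    m1, m2, my = mean(x1), mean(x2), mean(y)
    # Centered sums of squares and cross-products.
    s11 = sum((u - m1) ** 2 for u in x1)
    s22 = sum((v - m2) ** 2 for v in x2)
    s12 = sum((u - m1) * (v - m2) for u, v in zip(x1, x2))
    s1y = sum((u - m1) * (w - my) for u, w in zip(x1, y))
    s2y = sum((v - m2) * (w - my) for v, w in zip(x2, y))
    # Solve the 2x2 normal equations in closed form.
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = my - b1 * m1 - b2 * m2
    return a, b1, b2

def predict(coefs, x1_value, x2_value):
    """Yhat for one combination of predictor values."""
    a, b1, b2 = coefs
    return a + b1 * x1_value + b2 * x2_value

# Hypothetical data, for illustration only.
education = [8, 10, 12, 12, 14, 16, 16, 18]
age = [25, 40, 31, 52, 45, 29, 60, 38]
score = [4, 5, 5, 7, 7, 7, 9, 8]

coefs = fit_ols_two_predictors(education, age, score)

# Yhat for each observed case, and the residual (observed minus predicted).
yhat = [predict(coefs, e, a) for e, a in zip(education, age)]
residuals = [obs - fit for obs, fit in zip(score, yhat)]

# Yhat at chosen predictor values: education varies, age is fixed at 40.
yhat_at_age_40 = {e: predict(coefs, e, 40) for e in (8, 12, 16)}
for e, value in yhat_at_age_40.items():
    print(f"education={e}, age=40 -> Yhat={value:.2f}")
```

The last step is the typical use of Yhat: plug chosen values of the predictors into the fitted equation and compare the predicted outcomes.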


The Fallacy of Significance Tests

It must be understood that a p-value, or any other statistic based on the Chi-Square, is not a useful number. It has two components: sample size and effect size. Its ability to detect a non-zero difference increases when either sample size or effect size increases. If sample size alone increases while effect size stays constant, the statistic still becomes inflated. There is also a problem with the underlying assumption. If the test is only about detecting a “non-zero” difference, it is of no use when the magnitude of that difference, i.e., the effect size, is of no practical importance. I will provide several examples of the dangers of significance tests.
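The dependence on sample size can be sketched with a simple two-group z-test. This is an illustration under stated assumptions, not one of the article's examples: two groups of equal size, a known standard deviation of 1, and a fixed standardized difference of 0.05 chosen arbitrarily as a "tiny" effect.

```python
import math

def z_for_mean_difference(d, n_per_group):
    """z statistic for a standardized mean difference d between two
    equal-sized groups with unit standard deviation (assumed known)."""
    return d * math.sqrt(n_per_group / 2)

def two_sided_p_from_z(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

effect_size = 0.05  # held constant throughout (assumed value)
sample_sizes = [100, 1_000, 10_000, 100_000]

p_values = {
    n: two_sided_p_from_z(z_for_mean_difference(effect_size, n))
    for n in sample_sizes
}

for n, p in p_values.items():
    print(f"n per group = {n:>7}: d = {effect_size}, p = {p:.3g}")
```

The effect size never changes, yet the p-value moves from clearly non-significant to far below any conventional threshold as the groups grow.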

Multiple Regression, Multiple Fallacies

It goes without saying that multiple regression is one of the most popular and widely applied statistical methods. Thus, it would be odd if most practitioners among scientists and researchers did not understand it and misapplied it. And yet, this provocative conclusion seems the most likely.

Because a simple bivariate correlation does not disentangle confounding effects, multiple regression is said to be preferable. The technique attempts to evaluate the strength of an independent (predictor) variable in the prediction of an outcome (dependent) variable when controlling for, i.e., holding constant, every other variable entered as an independent variable into the regression model, either progressively, step by step, or all together at the same time. The rationale is to obtain the effect of an independent variable that belongs to it alone. But this is a fallacy.
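What "holding constant" does mechanically can be sketched as follows. The coefficient of a predictor in a multiple regression equals the slope obtained after the other predictor has been regressed out of both that predictor and the outcome (the Frisch-Waugh-Lovell result). The data and names below are hypothetical. The sketch shows only the arithmetic and does not settle how the resulting coefficient should be interpreted.

```python
from statistics import mean

def simple_fit(x, y):
    """Bivariate least squares; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope

def residualize(control, target):
    """Part of `target` left over after regressing it on `control`."""
    a, b = simple_fit(control, target)
    return [t - (a + b * c) for c, t in zip(control, target)]

# Hypothetical data, for illustration only.
x1 = [8, 10, 12, 12, 14, 16, 16, 18]
x2 = [25, 40, 31, 52, 45, 29, 60, 38]
y = [4, 5, 5, 7, 7, 7, 9, 8]

# Slope of y on x1 alone (no control).
bivariate_slope = simple_fit(x1, y)[1]

# Slope of y on x1 after x2 is removed from both variables.
partial_slope = simple_fit(residualize(x2, x1), residualize(x2, y))[1]

# Same coefficient from the two-predictor normal equations, as a check.
m1, m2, my = mean(x1), mean(x2), mean(y)
s11 = sum((a - m1) ** 2 for a in x1)
s22 = sum((b - m2) ** 2 for b in x2)
s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
multiple_b1 = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

print(f"bivariate slope of y on x1:     {bivariate_slope:.3f}")
print(f"slope after partialling out x2: {partial_slope:.3f}")
print(f"b1 from multiple regression:    {multiple_b1:.3f}")
```

In other words, the "controlled" coefficient is computed only from the portion of the predictor that does not overlap with the other predictors.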

What does it mean to have a low R-squared? A warning about misleading interpretation

This is a common argument, one we read all the time and everywhere, always with the same mistake: squaring the correlation. For example: “Your brain-IQ correlation is r=0.40, so if you square it, that only amounts to a tiny 16% (r²=0.40*0.40=0.16) of variance explained, which is not impressive”. Or something in this vein. The use and abuse of R² has caused enough damage. It is high time to put an end to this utter fallacy.
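One way to see why squaring changes the picture is that, in a bivariate regression on standardized variables, the slope is r itself and not r². A correlation of 0.40 therefore means that a difference of one standard deviation in X predicts a difference of 0.40 standard deviations in Y. The sketch below checks this identity on made-up data. It illustrates the arithmetic only and is not necessarily the argument the article develops.

```python
from statistics import mean, pstdev

def pearson_r(x, y):
    """Pearson correlation computed from deviations around the means."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def standardize(values):
    """Convert values to z-scores (mean 0, standard deviation 1)."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

r = pearson_r(x, y)
r_squared = r ** 2

# Slope of standardized y on standardized x.
zx, zy = standardize(x), standardize(y)
standardized_slope = (sum(a * b for a, b in zip(zx, zy))
                      / sum(a * a for a in zx))

print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
print(f"standardized slope = {standardized_slope:.3f}")
```

For any correlation strictly between 0 and 1, r² is smaller than r, which is why a squared value always looks less impressive than the slope it comes from.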
