PhDmama New Recruit 9 Posts |
I'm running analyses on a data set of 300+ cases.
I have 6 predictors that are positively correlated with my DV in bivariate correlations. When I enter these variables into a multiple regression (simple enter method), one of my predictors now has a [i]negative[/i] beta coefficient. How is this possible?
I found a few general forum discussions suggesting this might be a case of a suppressor variable, but my advisor says that's close to impossible because suppression is extremely rare. Meanwhile, all indicators show there is no multicollinearity problem among the predictors (tolerance and VIF are acceptable). Any ideas, or has anyone else dealt with a similar issue?
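For reference, this is roughly what I mean by the tolerance/VIF check. This is just a minimal sketch with made-up data (not my actual variables), showing that a predictor can share quite a bit of variance with another and still produce "acceptable" numbers:

```python
import numpy as np

def tolerance_and_vif(X):
    """For each column j, regress it on the other columns.
    tolerance_j = 1 - R^2_j and VIF_j = 1 / tolerance_j."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])    # include an intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - resid.var() / y.var()
        tol = 1 - r2
        out.append((tol, 1 / tol))
    return out

# Made-up example: X3 overlaps substantially with X1
rng = np.random.default_rng(42)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + rng.normal(size=n)   # shares a component with x1
stats = tolerance_and_vif(np.column_stack([x1, x2, x3]))
for j, (tol, vif) in enumerate(stats, 1):
    print(f"X{j}: tolerance = {tol:.2f}, VIF = {vif:.2f}")
```

Even though X1 and X3 share a component here, the VIFs land around 1–2, well under the usual rule-of-thumb cutoff of 10, so "acceptable VIF" apparently doesn't rule out strange coefficient behavior.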
[Edited on February 20, 2014 at 6:41 PM. Reason : ./.] 2/20/2014 6:41:28 PM |
y0willy0 All American 7863 Posts |
i have a hole in my butt. 2/20/2014 11:16:13 PM |
neolithic All American 706 Posts |
Could you give us a little more information? What are the magnitudes of the estimated coefficients? What are the associated p-values?
Here is an example where all the bivariate correlations are positive and yet the linear regression still produces a negative sign. Say we have three variables, X1, X2, and X3, and we are modeling Y's linear dependence on them. Suppose further that the true relationship between Y and these variables is given by:
Y = X1 + X2
X1 and X2 are independent of each other but X3 is related to X1 by the following relationship:
X3 = X1 + e
where e is just random Gaussian noise. With a small enough sample size (300 would qualify as small enough here), it's entirely possible that fitting the full linear model (i.e., Y = X1 + X2 + X3) generates exactly the situation you described. This is why bivariate correlations can be misleading: here X1, X2, and X3 would all have positive correlations with Y, but the regression can still return a negative (albeit small) coefficient estimate for X3.
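If it helps, here's a quick simulation of exactly that setup in Python/NumPy. The unit noise scales are arbitrary choices; the point is just how often the X3 estimate comes out negative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300          # sample size, as in the original question
trials = 50      # repeat the experiment to see how often the sign flips

def ols_coefs(X, y):
    # ordinary least squares with an intercept; returns the slope estimates
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[1:]

negatives = 0
for _ in range(trials):
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)              # independent of x1
    x3 = x1 + rng.normal(size=n)         # X3 = X1 + e
    y = x1 + x2 + rng.normal(size=n)     # true model: Y = X1 + X2 (plus noise)

    # every bivariate correlation with Y is positive...
    assert all(np.corrcoef(x, y)[0, 1] > 0 for x in (x1, x2, x3))

    # ...yet the multiple-regression coefficient on X3 is often negative
    b1, b2, b3 = ols_coefs(np.column_stack([x1, x2, x3]), y)
    if b3 < 0:
        negatives += 1

print(f"negative X3 coefficient in {negatives} of {trials} runs")
```

Since X3's true partial effect is zero once X1 and X2 are in the model, its estimate hovers around zero and comes out negative in roughly half the runs, even though every bivariate correlation stays solidly positive.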
[Edited on February 21, 2014 at 10:06 AM. Reason : stuff] 2/21/2014 9:54:21 AM |
PhDmama New Recruit 9 Posts |
More information: my "problem variable" (call it X6, since I have six predictors; it's the one switching signs) has r = 0.131, p = .008 with the DV.
When I add four other predictors to the regression (all of which also correlate positively and significantly with the DV), that same predictor gets β = -.094, p = .063.
Interestingly, it stays positive when I add X1, X2, X3, and X4, but once X5 enters, the β turns negative. Reliability is very high for X5 but so-so for the DV. X6 is a single-item (7-point Likert) measure.
Thanks for any thoughts or ideas. 2/21/2014 8:47:36 PM |
neolithic All American 706 Posts |
I'm not sure what the units are for your problematic variable X6, so I don't know whether -0.094 is a large effect. My guess is that this variable has little to no effect on your DV while being slightly correlated with some other variables that are related to your DV, and you're just getting a noisy estimate of this small effect. If X6's true effect is near zero, you can imagine that across samples of 300 observations the estimated sign will sometimes be negative and sometimes positive, because you're estimating a small effect with a relatively small sample size. The situation you describe doesn't strike me as all that strange.
Another question you might think about: why care about bivariate correlations at all when you are doing the full linear regression? Is reporting these values standard practice in your field? In general, I would trust the results from a full regression over any sort of univariate measure, because univariate measures can often be misleading or confounded.
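Here's a small made-up illustration of that last point (hypothetical data, not yours): a variable can have a solid positive bivariate correlation with Y purely because both share a common cause, while its coefficient in the full regression is essentially zero.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

z = rng.normal(size=n)         # common cause (confounder)
x = z + rng.normal(size=n)     # x has no direct effect on y
y = z + rng.normal(size=n)     # y depends only on z

# bivariate view: x looks like a decent predictor of y
r_xy = np.corrcoef(x, y)[0, 1]

# full regression of y on both x and z: x's partial effect is ~0
A = np.column_stack([np.ones(n), x, z])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
b_x = beta[1]

print(f"corr(x, y) = {r_xy:.2f}, regression coefficient on x = {b_x:.3f}")
```

Relying on corr(x, y) alone would badly overstate x's importance; the regression, which holds z fixed, gets it right.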
If you really want to get a wide variety of thoughts, I would suggest posting your question to the stats Stack Exchange (Cross Validated). Those guys live to answer questions like this, so I bet you would get some high-quality responses.
http://stats.stackexchange.com
[Edited on February 22, 2014 at 11:26 AM. Reason : ] 2/22/2014 11:25:29 AM |
PhDmama New Recruit 9 Posts |
Thanks so much! I think I will post it on Stack Exchange.
I also put it on Reddit (the stats subreddit), and they said the same thing about correlations. Reporting correlations seems to be the typical thing in the social sciences, maybe not so much for making interpretations as for deciding which variables to keep for the regression when we're dealing with a ton of them. Technically I started out with 8 predictors and kept only the significant ones for the regression (it could have been even more if I'd looked at some other demographic variables and added more "exploratory research" questions). Some folks would say to put variables in the regression if they're part of the hypotheses or research questions, regardless of significance at the bivariate level. 2/22/2014 8:19:33 PM |
lewisje All American 9196 Posts |
2/24/2014 12:26:48 AM |
neolithic All American 706 Posts |
^Describes me perfectly right now. 2/24/2014 9:50:15 AM |