Message Boards » Study Hall » Stat Q: Sign flipping in bivariate to regression
PhDmama
New Recruit
9 Posts

I'm running analyses on a data set of 300+ cases.

I have 6 predictors that are positively correlated with my DV in bivariate correlations. Upon entering these variables into a multiple regression (simple Enter method), one of my predictors now has a negative beta coefficient. How is this possible?

I found a few general forum discussions suggesting this might be a case of a suppressor variable, but my advisor says that's close to impossible because it's extremely rare. Meanwhile, all indicators show there's no multicollinearity problem among the predictors (tolerance and VIF are acceptable). Any ideas, or has anyone else dealt with a similar issue?
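
(In case anyone wants to reproduce the check: here's a minimal numpy sketch of how tolerance and VIF can be computed by regressing each predictor on the others. The array layout and function name are just for illustration, not my actual setup.)

[code]
# Tolerance/VIF check: regress each predictor on all the others,
# then tolerance = 1 - R^2 and VIF = 1 / tolerance.
# X is assumed to be an (n, p) numpy array, one column per predictor.
import numpy as np

def tolerance_and_vif(X):
    n, p = X.shape
    out = []
    for j in range(p):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])   # intercept + other predictors
        coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
        resid = X[:, j] - A @ coef
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        tol = 1 - r2
        out.append((tol, 1 / tol))
    return out   # list of (tolerance, VIF), one pair per predictor
[/code]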

[Edited on February 20, 2014 at 6:41 PM. Reason : ./.]

2/20/2014 6:41:28 PM

y0willy0
All American
7863 Posts

i have a hole in my butt.

2/20/2014 11:16:13 PM

neolithic
All American
706 Posts

Could you give us a little more information? What are the magnitudes of the estimated coefficients? What are the associated p-values?

Here is an example where you can have all positive bivariate correlations and still get a negative sign in the linear regression. Let's say we have 3 variables - X1, X2, and X3 - and we are modeling the linear dependence of Y on them. Let's further suppose that the true relationship between Y and these variables is given by:

Y = X1 + X2

X1 and X2 are independent of each other but X3 is related to X1 by the following relationship:

X3 = X1 + e

where e is just random Gaussian noise. With a small enough sample size (300 would qualify as small enough here), it's entirely possible that fitting the full linear model (i.e., Y ~ X1 + X2 + X3) generates the situation you described. X1, X2, and X3 all have positive correlations with Y, but X3's true partial effect is zero once X1 is in the model, so the regression can easily return a negative coefficient estimate (albeit a small one) for X3. This is why bivariate correlations can be misleading.
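
Here's a quick numpy sketch of that setup if you want to play with it. One assumption on my part: I've added noise to Y, since with an exact Y = X1 + X2 the fit is perfect and X3's coefficient comes out exactly zero. Seed and scales are arbitrary:

[code]
# Y depends only on X1 and X2; X3 = X1 + e is a noisy copy of X1.
# All three correlate positively with Y, but X3's true partial
# effect is zero, so at n = 300 its estimate hovers around zero
# and comes out negative on many seeds.
import numpy as np

rng = np.random.default_rng(0)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + rng.normal(size=n)          # e: Gaussian noise
y = x1 + x2 + rng.normal(size=n)      # noise added so the fit isn't exact

for name, x in [("X1", x1), ("X2", x2), ("X3", x3)]:
    print(name, "corr with Y:", round(float(np.corrcoef(x, y)[0, 1]), 3))

A = np.column_stack([np.ones(n), x1, x2, x3])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print("full-model coefficients (intercept, X1, X2, X3):", coef.round(3))
[/code]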

[Edited on February 21, 2014 at 10:06 AM. Reason : stuff]

2/21/2014 9:54:21 AM

PhDmama
New Recruit
9 Posts

More information: the correlation of my "problem variable" with the DV (call it X6, since I have six predictors; it's the one switching signs) is r = 0.131, p = .008.

Upon adding the other predictors to the regression (all of which are also positively and significantly related to the DV as correlations), the same predictor gets β = -.094, p = .063.

Interestingly, it remains positive when I add X1, X2, X3, and X4, but once X5 enters, the β turns negative. Reliability is very high for X5, but so-so for the DV. X6 is a single-item (7-point Likert) measure.
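
(For anyone who wants to watch this happen, a sketch of refitting as each predictor enters and tracking the coefficient on X6 - the column layout here is an assumed convention for illustration, not my actual data.)

[code]
# Refit Y on a growing set of predictors and print the coefficient
# on the "problem" variable at each step. Assumed layout: X6 is
# column 0 of X, with X1..X5 in the following columns.
import numpy as np

def track_x6_beta(X, y):
    n, p = X.shape
    for k in range(1, p + 1):
        A = np.column_stack([np.ones(n), X[:, :k]])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        print(f"first {k} predictor(s): beta_X6 = {coef[1]:.3f}")
[/code]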

Thanks for any thoughts or ideas.

2/21/2014 8:47:36 PM

neolithic
All American
706 Posts

I'm not sure what the units are for your problematic variable X6, so I don't know whether -0.094 is a large effect. My guess is that this variable has little to no effect on your DV while being slightly correlated with other variables that are related to your DV, so you're just getting a noisy estimate of a small effect. If X6 has essentially no effect, then across samples of 300 observations the estimated sign will sometimes be negative and sometimes positive, because you're estimating a small effect with a relatively small sample size. The situation you describe doesn't strike me as all that strange.
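
To see how much of that is just sampling noise, here's a little simulation sketch (made-up setup where the X6 analogue has zero true effect but is correlated with a predictor that does matter):

[code]
# With a true coefficient of zero, the estimated sign at n = 300
# is close to a coin flip. Count negative signs across resamples.
import numpy as np

rng = np.random.default_rng(42)
n, trials = 300, 1000
negatives = 0
for _ in range(trials):
    x_real = rng.normal(size=n)                # predictor with a real effect
    x6 = 0.5 * x_real + rng.normal(size=n)     # correlated, but no effect of its own
    y = x_real + rng.normal(size=n)
    A = np.column_stack([np.ones(n), x_real, x6])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    negatives += coef[2] < 0
print(f"negative estimate for X6 in {negatives}/{trials} samples")
[/code]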

Another question you might think about is why would you care about bivariate correlations at all when you are doing the full linear regression? Is this standard practice in your field to report these values? In general, I would trust the results from a full regression over any sort of univariate measure, because univariate measures can often be misleading or confounded.

If you really want to get a wide variety of thoughts, I would suggest you post your question to the stats section of stack exchange. Those guys live to answer questions like this, so I bet you would get some high quality responses to your question.

http://stats.stackexchange.com

[Edited on February 22, 2014 at 11:26 AM. Reason : ]

2/22/2014 11:25:29 AM

PhDmama
New Recruit
9 Posts

Thanks so much! I think I will post it on the stats Stack Exchange.

I also put it on Reddit (the stats subreddit), and they said the same thing about correlations. It seems to be typical in social science to report correlations -- maybe not so much for interpretation as to decide which variables to keep for the regression when we're dealing with a ton of variables. Technically I started out with 8 predictors and kept only the ones significant at the bivariate level (it could have been even more if I'd looked at some other demographic variables and added more "exploratory research" questions). Some folks would say to put variables in the regression if they're part of the hypotheses or research questions -- regardless of significance at the bivariate level.

2/22/2014 8:19:33 PM

lewisje
All American
9196 Posts

2/24/2014 12:26:48 AM

neolithic
All American
706 Posts

^Describes me perfectly right now.

2/24/2014 9:50:15 AM
