About that Mani et al. study

I regularly see the Mani et al. (2013) study, “Poverty impedes cognitive function,” thrown around. That should be no surprise: in just a few years it has gathered over 1,400 citations and been covered by multiple mainstream news outlets (see here, here and here for example). I see it all the time in debates about socioeconomic effects on IQ, despite the fact that it is, all around, a very bad study.

Study Plan

Mani et al. propose a causal relationship between poverty and intelligence, in which poverty, stress about financial concerns, and the like place a load on the brain and prevent it from functioning as well as it otherwise could. They tested their hypothesis in two experimental settings. The first:

“We induced richer and poorer participants to think about everyday financial demands. We hypothesized that for the rich, these run-of-the-mill financial snags are of little consequence. For the poor, however, these demands can trigger persistent and distracting concerns (18, 19). The laboratory study is designed to show that similarly sized financial challenges can have different cognitive impacts on the poor and the rich.” (Mani et al. 2013: 976)

As they note, this laboratory setting may leave out important details about the real world, so they supplemented it with a real-life sample. The plan for the second, field-based experiment is explained as follows:

“Our second study takes a different approach and allows us to assess what happens when income varies naturally. We conducted a field study that used quasi-experimental variation in actual wealth. Indian sugarcane farmers receive income annually at harvest time and find it hard to smooth their consumption (20). As a result, they experience cycles of poverty: poor before harvest and richer after. This allows us to compare cognitive capacity for the same farmer when poor (pre-harvest) versus richer (post-harvest). Because harvest dates are distributed arbitrarily across farmers, we can further control for calendar effects. In this study, we did not experimentally induce financial concerns; we relied on whatever concerns occurred naturally. We were careful to control for other possible changes, such as nutrition and work effort. Additionally, we accounted for the impact of stress. Any effect on cognitive performance then observed would thus illustrate a causal relationship between actual income and cognitive function in situ. As such, the two studies are highly complementary. The laboratory study has a great deal of internal validity and illustrates our proposed mechanism, whereas the field study boosts the external validity of the laboratory study.” (Ibid.)

To start, these may sound like solid plans for a straightforward study. The first simulates financial pressure in a laboratory setting, while the second uses a matched-pairs (within-farmer) design to test the effects of reduced income, and the stress associated with it, on measures of intelligence. But once they describe the studies they actually conducted, problems begin to arise.

Laboratory Studies

First, we can address the samples used in the laboratory studies. The sample was essentially a convenience/voluntary-response sample drawn from New Jersey malls. It cannot support generalizable inference; the results can be strongly biased. Not only was the sampling method flawed, but the sample for each laboratory experiment was also very small: all of them were under 200 people. Such sampling is not only inappropriate for a high-quality study, but also irresponsible.

Their method of measuring cognitive function has been criticized by Dang et al. (2015). While I agree with the use of a Raven’s test, Dang et al. point out that “financial concerns increased selectivity of attention, away from irrelevant tasks (which IQ tests arguably are) toward relevant task (which financial decision making arguably is)”. Normally I wouldn’t take this argument too seriously, because most people are committed when taking an IQ test (because such tests do matter). But it is of particular importance here for two reasons. First, the sample was just a collection of random people from a mall. They were offered money to participate, but they had little incentive to actually care about their performance in the study setting, at least in the first laboratory experiment (we’ll get to the third in a moment). Second, the researchers did not control for test motivation. Combining these two facts with the criticism raised by Dang et al. makes the first two laboratory studies unreliable.

In the third laboratory study, they offered money for each question answered correctly to adjust for this motivation problem. The results were barely significant, generally hovering around the p = 0.01 or p < 0.01 range. Some correlations are presented as real even though they are not actually statistically significant. This experiment still suffers from the same sample bias. And if these criticisms do not seem like enough, there’s more.

In the laboratory studies, they used median splits to divide the income groups. This is problematic: median splits reduce statistical power and introduce unreliability (they fail to pick up non-linearity, add unnecessary random error, etc.). Wicherts and Scholten (2013) point this out in a commentary on Mani et al. They go on to re-analyze the data, stating:

“Of the two measures of cognitive functioning in Mani et al.’s studies, only the Raven’s scores are fairly symmetrically distributed. We therefore submitted these data to linear regressions involving family income (mean-centered to facilitate interpretation) and an interaction between income and the type of scenario. Results are given in Table 1. In none of the three core experiments (1, 3, and 4) was the interaction significant when analyzed without unnecessary dichotomization of income. We also analyzed data from study 2, which aimed to show that the effect of poverty-related worries could be distinguished from a form of test anxiety and would not occur in similar, but nonfinancial, scenarios.” (Wicherts and Scholten, 2013)
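
To see why the dichotomization matters, here is a minimal simulation, with made-up effect sizes and nothing taken from Mani et al.’s data, roughly in the spirit of the income × scenario interaction they test. It compares the power to detect the interaction when income is kept continuous versus split at the median:

```python
# Illustrative simulation (made-up parameters, not Mani et al.'s data):
# dichotomizing a continuous moderator at the median discards information and
# lowers power to detect an interaction, relative to keeping it continuous.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def one_trial(n=100, beta_interaction=0.3):
    income = rng.normal(size=n)                # standardized income
    hard = rng.integers(0, 2, size=n)          # 0 = easy scenario, 1 = hard scenario
    raven = 0.2 * income + beta_interaction * income * hard + rng.normal(size=n)

    # Continuous-moderator model: income x scenario interaction
    X_cont = sm.add_constant(np.column_stack([income, hard, income * hard]))
    p_cont = sm.OLS(raven, X_cont).fit().pvalues[-1]

    # Median-split model: poor/rich indicator x scenario interaction
    poor = (income < np.median(income)).astype(float)
    X_split = sm.add_constant(np.column_stack([poor, hard, poor * hard]))
    p_split = sm.OLS(raven, X_split).fit().pvalues[-1]
    return p_cont < 0.05, p_split < 0.05

results = np.array([one_trial() for _ in range(2000)])
print("power, continuous income :", results[:, 0].mean())
print("power, median split      :", results[:, 1].mean())
```

Under these made-up parameters, the continuous model detects the interaction noticeably more often; splitting at the median only throws information away.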

The use of median splits has been criticized elsewhere. McClelland et al. (2015) are very harsh on Mani et al. for relying on them, stating:

“The next issue of Science printed a criticism of those findings by Wicherts and Scholten (2013). They reported that when the dichotomized indicators were replaced by the original continuous variables, the critical interactions were not significant at p < .05 in any of the three core studies: p values were .084, .323, and .164. In a reply to Wicherts and Scholten, Mani, Mullainathan, Shafir, and Zhao (2013b) justified their use of median splits by citing papers published in Science and other prestigious journals that also used median splits. This “Officer, other drivers were speeding too” defense is often tried but rarely persuasive, especially here when the results of the (nonsignificant) continuous analyses were known. Though Mani et al. further noted their effect reached the .05 level if one pooled the three studies, we would guess that the editor poured himself or herself a stiff drink the night after reading Wicherts and Scholten’s critique and the Mani et al. reply. It is hard to imagine that Science or many less prestigious journals would have published the paper had the authors initially reported the correct analyses with a string of three nonsignificant findings conventionally significant only by meta-analysis at the end of the paper. The reader considering the use of median splits should consider living through a similarly deflating experience. Splitting the data at the median resulted in an inaccurate sense of the magnitude of the fragile and small interaction effect (in this case, an interaction that required the goosing of a meta-analysis to reach significance), and a publication that was unfortunately subject to easy criticism.” (McClelland et al. 2015: 3)

Overall, the laboratory studies fail both statistically and methodologically. There is no reason to put faith in them. As Wicherts and Scholten showed, when the data were re-analyzed without dichotomization, there was no significant effect.

Field Study

After testing the effect of poverty on cognitive function in a lab setting, it’s time for the real world. And what better representative sample to pick than… sugarcane farmers. Let’s run down the study:

They sampled 464 Indian sugarcane farmers who derived roughly 60% of their income from selling their sugarcane. They interviewed the farmers before and after the harvest, noting that there was much more financial stress before than after. The result is simple: the farmers scored lower on measures of intelligence before the harvest than after. But, of course, this does not mean that a change in financial stress caused the difference.

Wicherts et al. point out that Mani et al. do not account for retest effects:

“We note that a highly relevant potential confound in the field study presented by Mani et al. is the possibility of retesting effects. The lack of any retesting effect in Mani et al.’s field study involving Indian farmers is clearly at odds with one of the more robust findings in the literature on cognitive testing (9). Retesting effects on the Raven’s tests are particularly profound among test-takers with little education (10).” (Wicherts et al. 2013)
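
To make the concern concrete, here is a toy sketch (purely made-up numbers, not the farmer data): in a single-group pre/post design, a practice effect alone can produce a “significant” improvement that a naive comparison would attribute to the harvest.

```python
# Toy illustration (made-up numbers): in a pre/post design with no control
# group, a retest (practice) effect is indistinguishable from an effect of the
# harvest itself.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 464                       # same n as the farmer sample, purely for scale
true_harvest_effect = 0.0     # assume the harvest does nothing
practice_effect = 0.3         # gain (in SD) purely from having seen the test before

ability = rng.normal(size=n)
pre = ability + rng.normal(scale=0.5, size=n)
post = ability + true_harvest_effect + practice_effect + rng.normal(scale=0.5, size=n)

t, p = stats.ttest_rel(post, pre)
print(f"mean post - pre gain = {np.mean(post - pre):.2f} SD, p = {p:.2g}")
# The paired test reports a highly 'significant' improvement even though the
# harvest effect is zero by construction; the gain is entirely retest practice.
```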

The source Wicherts et al. cite as (10) is Wicherts et al. (2010), which may be of interest. Note also that test-retest reliability of the Stroop task, which Mani et al. used, is very low when difference scores are used (Strauss et al. 2005).
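
As a rough illustration of why difference scores tend to be unreliable, here is the standard psychometric formula for the reliability of a difference score, evaluated with made-up numbers (not Strauss et al.’s estimates):

```python
# Reliability of a pre/post difference score, D = X - Y (standard psychometric
# formula; the numbers below are illustrative only).
def diff_score_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
    num = sd_x**2 * r_xx + sd_y**2 * r_yy - 2 * sd_x * sd_y * r_xy
    den = sd_x**2 + sd_y**2 - 2 * sd_x * sd_y * r_xy
    return num / den

# Even with decent test reliability (0.75), a high pre/post correlation (0.65)
# leaves the difference score with poor reliability:
print(diff_score_reliability(r_xx=0.75, r_yy=0.75, r_xy=0.65))  # ~0.29
```

The closer the pre- and post-test scores correlate, the less reliable their difference becomes, which is exactly the situation in a short-interval retest design.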

Mani et al. go on to discuss the conclusions of the field study and list factors that hereditarians have long accepted as plausible causes of IQ deficits (nutrition, as Richard Lynn has argued) but that are not complete measures of poverty. In fact, introducing nutrition makes their case entirely unsatisfying, as it is an obvious confounding variable: nutrition would be expected to improve considerably after the harvest. They claim the relationship persists after controlling for nutrition, but, as stated above, there are plenty of reasons to find the result unreliable, or at the very least ungeneralizable.

A much better study to test the effect of poverty on cognitive function has since been done. Carvalho et al. (2016) used a large, random American sample and measured cognitive function through working memory, which they show is strongly correlated with other measures of cognitive ability. Participants were tested before and after payday, similar to the harvest setting but with less potential for confounding (particularly given the shorter time span). The differences between the groups were extremely small and generally statistically insignificant. So the Mani et al. study is contradicted by a much better one.

Also worth noting, though not the same design, is a study of Bangladeshi children by Hamadani et al. (2014), which found that poverty generally had little effect on the children’s IQs once a number of time-varying confounders were controlled for (something Mani et al. did not do).

Finally, it’s not even statistically probable for poverty to have a large effect on IQ, especially to the degree they speak of, considering over 80% of the variance in g is heritable and the far majority of the remainder is non-shared environmental influences. This creates a major problem for environmentalists who offer public policy suggestions for raising overall IQ (Yang has repeatedly used this Mani et al. study, unfortunately).

Altogether, this study is bad. Like very, very, very bad. It holds no academic worth and should not have made it far in the peer review process. But here we are in 2019, still seeing it everywhere.
