Saturday, 30 August 2014

How NDTV’s Claims about “Love Jihad” Don’t Add Up

UP BJP president Laxmikant Bajpai addressing the party's working committee (source: PTI)
Rupa Subramanya 
(With inputs from Saurav Chatterjee) 
Recently, NDTV’s Sreenivasan Jain and Niha Masih published two stories, with the following provocative headlines: 
Unfortunately for Jain and Masih, incomplete data analysis, in particular a basic conceptual error in interpreting their own data, means that their claims don’t add up and instead constitute a form of spin all their own.
But this didn’t stop those sympathetic to their findings from sharing them widely and uncritically — without, apparently, checking if Jain and Masih’s analysis made sense. 
Is it true, as some in the UP unit of the BJP seem to allege, that Muslim men exhibit a greater propensity to commit violence against women, such as rape, than Hindu men? 
The difficulty in assessing this claim is that official statistics of violence against women do not report the religious affiliation of perpetrator or victim. So it’s impossible to either confirm or refute the claims made by the BJP in any statistically meaningful way given the publicly available data.
This is how the first of the two stories by Jain and Masih explains the above: “…unclear on what basis the BJP has made that claim, given that the data of violence against women….is never compiled on the basis of religion of the accused”.
Then, they give us official UP police data on rapes in Meerut in 2013, which, of course, are not coded by religion. 
Note that there’s a careless error here: they claim that the 389 rape cases registered last year were in Meerut district, when in fact the UP police data show that the figure for Meerut district is 109.
Their number seemed so out of line with previous years that I checked the UP police data (Table 65, p. 228).
In fact, the number 389 refers to the Meerut range, which comprises six districts. The Meerut zone adds three more districts, bringing the total to 506.
This seems like a rather elementary error and one wonders how it wasn’t picked up by simple fact checking.
The more basic problem is what they conclude: “There is nothing to suggest that minorities are responsible for these high numbers.” 
This contradicts what they correctly told us earlier: that it’s impossible to identify the religious affiliations of those involved in rape cases from the official data.
How, then, can they assert that there is “nothing to suggest”…? There is also nothing to suggest that minorities aren’t responsible, given the absence of religion-wise data. 
As worded, their claim is misleading and incomplete.
A more accurate way to report this is to simply say there’s no way to assess the BJP’s claim, one way or another, with publicly available data.
Then, they tell us that they have managed to construct data to answer the question. 
In particular, Jain and Masih claim to have reviewed police data covering January to August of 2014. In the first piece, they offer us data from Meerut district only, and in the second, they give us pooled data for all nine districts comprising Meerut zone, which I analyse below.
But since religious affiliation is not reported, how did NDTV get this breakdown? 
Presumably, they inferred religious identity from the names of alleged perpetrators and victims. All NDTV tells us in their second piece is: “We specifically asked for it, to check political claims. This is a time-consuming and sensitive exercise that should be avoided.”
It’s not quite clear to me what this means.
I’ve asked both Jain and Masih on Twitter to share details of their methodology, so one could independently assess whether their coding method is accurate. Neither one has replied to me. In fact, Jain has blocked me!
Thus, there’s no way to verify the data — in particular, the coding of perpetrator and victim by religion. We are simply asked to take their word that the coding method, which they don’t reveal, is accurate.
Let’s leave that aside and assume for the sake of argument that their data are completely accurate. Do they support the story that Jain and Masih try to tell?
According to NDTV’s pooled data, reported in the second piece, there have been 334 rape cases from January to August of this year in the Meerut zone. In 25 cases, Muslims are the accused and alleged victims are Hindus. There are 23 cases of Hindu accused and Muslim alleged victims. In 96 cases both accused and alleged victims are Muslim, whereas in 190 cases, the accused and alleged victims are Hindus. 
Jain and Masih draw attention to the fact that “the highest number” of incidents involve Hindu-on-Hindu violence, as if this tells us something meaningful. But in doing so they reveal a basic fallacy in how to interpret their own numbers, one that, as I will show shortly, actually undermines their own story!
Their fundamental error is failing to perform a basic statistical adjustment to the raw numbers, an adjustment that is necessary before drawing any inferences.
Specifically, what they fail to do is to adjust the raw numbers to show us the percentage of crimes committed by each community and then compare the adjusted numbers to each community’s share in the total population. 
This very basic statistical adjustment, something a high school student could do, is vital for the following reason.
Claims around “love jihad” boil down to claims comparing the propensity of the two communities to commit crimes, not claims comparing raw numbers, for the obvious reason that the population shares of the two communities aren’t equal.
If Jain and Masih had wanted to check whether these claims about the propensities to commit rapes differed between Hindus and Muslims, they had the data at their fingertips — but failed to use them correctly.
What they should have done was to compare the propensity rates in their own data to data on the population shares of the two communities.
If, for example, community A comprises x% of the population but commits more than x% of rapes, then there is prima facie evidence that members of that community have a greater propensity to commit rape than their share of the population would imply.
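The comparison just described can be written as a simple over-representation ratio. The notation here is ours, not Jain and Masih’s or the official data’s: for community A, let c_A be its share of the accused in the rape cases and p_A its share of the population (p_A = x/100 in the example above). A ratio above 1 is the prima facie signal described in the text; like the text, it says nothing on its own about statistical significance or causation.

```latex
R_A = \frac{c_A}{p_A}, \qquad
R_A > 1 \iff c_A > p_A
\quad \text{(community } A \text{ accounts for a larger share of the accused than of the population)}
```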
Ideally, even this propensity rate should be subject to further statistical controls — as I discuss later — but, at a bare minimum, it is necessary to convert raw numbers to propensity rates to say anything at all meaningful.
The difficulty is that district-wise 2011 census data on religion are not yet publicly available. The best we can do is to use data from the 2001 census, which is accurate only if population shares have not changed much since then. A further wrinkle is that two of the nine districts now comprising the Meerut zone didn’t exist in 2001; they were carved out later.
However, with the data we have available, this is the best we can do.
With these caveats in mind, it is possible to compute population shares for the Meerut zone from district-wise data for each of its districts, weighting each district by its size to obtain a final weighted average. That exercise gives us 69% Hindus and 29% Muslims.
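The weighting step above amounts to a population-weighted mean. The symbols are ours and are only meant to make the arithmetic explicit: P_d is the 2001 population of district d, s_{r,d} is the share of religion r in that district, and N_{r,d} = P_d s_{r,d} is the corresponding head count. The piece does not say exactly how the two districts created after 2001 were mapped onto 2001 boundaries, so that detail is left unspecified here.

```latex
s_r \;=\; \frac{\sum_{d} P_d \, s_{r,d}}{\sum_{d} P_d} \;=\; \frac{\sum_{d} N_{r,d}}{\sum_{d} P_d}
```

Weighting by district size is therefore the same as adding up each community’s head count across the zone and dividing by the zone’s total population; a simple unweighted average of district percentages would overstate the influence of small districts.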
Of the total crimes committed (i.e., whether against someone from one’s own or the other community), we can easily calculate from Jain and Masih’s own data that 64% of attacks are by Hindu men and 36% by Muslim men.
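The 64% and 36% figures can be checked directly against the four counts NDTV reported for the Meerut zone. A minimal Python sketch, assuming those published counts are accurate; the dictionary layout and the helper name accused_count are ours, purely for illustration:

```python
# NDTV's pooled Meerut zone figures, January to August 2014, as reported above.
# Keys are (accused community, alleged victim community).
cases = {
    ("Muslim", "Hindu"): 25,
    ("Hindu", "Muslim"): 23,
    ("Muslim", "Muslim"): 96,
    ("Hindu", "Hindu"): 190,
}

total = sum(cases.values())  # should equal the 334 cases reported


def accused_count(community):
    """Number of cases in which the accused belongs to `community` (hypothetical helper)."""
    return sum(n for (accused, _victim), n in cases.items() if accused == community)


hindu_accused = accused_count("Hindu")    # 190 + 23
muslim_accused = accused_count("Muslim")  # 96 + 25

hindu_share = hindu_accused / total
muslim_share = muslim_accused / total

print(f"Total cases: {total}")
print(f"Hindu accused:  {hindu_accused} ({hindu_share:.1%})")
print(f"Muslim accused: {muslim_accused} ({muslim_share:.1%})")
```

This prints 334 total cases, 213 with Hindu accused (63.8%) and 121 with Muslim accused (36.2%), which round to the 64% and 36% used in the text.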
Putting it all together: Hindus comprise 69% of the population in Meerut zone but commit only 64% of the total number of rapes. Muslims comprise only 29% of the population but commit 36% of rapes.
This shows us that Hindus actually have a slightly lower propensity to commit rapes and Muslims have a slightly higher propensity, compared against their respective population shares.
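Converting these shares into over-representation ratios (share of accused divided by share of population) makes the comparison explicit. A short Python sketch, assuming NDTV’s counts (213 Hindu and 121 Muslim accused out of 334) and the 2001-based weighted population shares of 69% and 29%; it computes point ratios only and, consistent with the caveat that follows, does not test statistical significance:

```python
# Shares of accused from NDTV's pooled figures, and 2001-census-based
# weighted population shares for the Meerut zone, as used in the text.
crime_share = {"Hindu": 213 / 334, "Muslim": 121 / 334}
population_share = {"Hindu": 0.69, "Muslim": 0.29}

# A ratio above 1 means the community accounts for more of the accused than
# its population share alone would imply; below 1 means fewer.
ratios = {c: crime_share[c] / population_share[c] for c in crime_share}

for community, ratio in ratios.items():
    print(f"{community}: {crime_share[community]:.1%} of accused, "
          f"{population_share[community]:.0%} of population, ratio {ratio:.2f}")
```

On these inputs the ratio is about 0.92 for Hindus and about 1.25 for Muslims.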
It is important to note that we cannot say if these differences are statistically significant, since we have no way to put confidence bounds around the relevant ratios we are comparing. The difference in magnitudes is at least noteworthy.
There’s a further interesting nugget one can extract from Jain and Masih’s own data, which again they fail to do, and again it goes against the story they’re trying to tell. 
If you break down each community’s crimes to isolate those committed against members of the opposite community, Jain and Masih’s own data tell us that in 11% of cases where the attackers are Hindu, they’re attacking someone of the opposite community. By contrast, in 21% of cases where the attackers are Muslim, Muslim men are attacking women of the other community.
This means that Muslim men who commit rape are almost twice as likely as Hindu men who commit rape to attack someone from the opposite community. That is what NDTV’s own data show once their raw numbers are converted to propensity ratios, as one must do to say anything statistically meaningful.
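The 11% and 21% figures, and the “almost twice” comparison, follow from the same four counts, and population shares play no role in them. A minimal Python sketch, assuming NDTV’s published counts; the helper name cross_community_share is ours, for illustration only:

```python
# NDTV's pooled figures: cases by (accused community, alleged victim community).
cases = {("Muslim", "Hindu"): 25, ("Hindu", "Muslim"): 23,
         ("Muslim", "Muslim"): 96, ("Hindu", "Hindu"): 190}


def cross_community_share(accused):
    """Fraction of cases with accused from `accused` in which the alleged
    victim belongs to the other community (hypothetical helper)."""
    all_cases = sum(n for (a, _v), n in cases.items() if a == accused)
    cross = sum(n for (a, v), n in cases.items() if a == accused and v != accused)
    return cross / all_cases


hindu = cross_community_share("Hindu")    # 23 / 213
muslim = cross_community_share("Muslim")  # 25 / 121

print(f"Hindu accused, victim from other community:  {hindu:.1%}")
print(f"Muslim accused, victim from other community: {muslim:.1%}")
print(f"Ratio: {muslim / hindu:.2f}")
```

This gives about 10.8% and 20.7%, a ratio of roughly 1.91.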
This is the relevant information in their own data — not Jain and Masih’s illogical reference to the “highest number”.
This is certainly an inconvenient implication of Jain and Masih’s own data, in a piece trying to debunk claims that Muslims have a greater propensity to commit rape against Hindus — when their own data, correctly interpreted, tell exactly the opposite tale.
It’s important again to note all of the caveats, given the understandable sensitivities on a subject which has become politicised. 
First, we’re going by NDTV’s data — which there’s no way to verify independently. 
Second, we’re using population data from the 2001 census on Hindu and Muslim population shares, which might well have changed in the intervening years.
Specifically, if the Muslim share of the population in the Meerut zone is actually higher today than in 2001, that would attenuate the discrepancy in propensities I calculated earlier. If, however, the Muslim share has gone down, that would magnify the discrepancy.
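The direction of this effect is easy to see by recomputing the Muslim over-representation ratio under different population shares. A Python sketch: 0.29 is the 2001-based figure used above, while 0.25 and 0.33 are purely hypothetical scenarios, not census data; the helper name muslim_ratio is ours:

```python
# Share of accused who are Muslim, from NDTV's pooled figures (121 of 334 cases).
muslim_crime_share = 121 / 334


def muslim_ratio(population_share):
    """Over-representation ratio for a given Muslim population share (hypothetical helper)."""
    return muslim_crime_share / population_share


# Illustrative scenarios only: 0.29 is the 2001-based figure used in the text;
# 0.25 and 0.33 are hypothetical values, not census data.
for share in (0.25, 0.29, 0.33):
    print(f"Muslim population share {share:.0%}: ratio {muslim_ratio(share):.2f}")

# The discrepancy would vanish only if the Muslim population share had risen
# to equal the Muslim share of the accused.
print(f"Break-even population share: {muslim_crime_share:.1%}")
```

A higher population share pulls the ratio toward 1 and a lower one pushes it further above 1; on NDTV’s counts, the ratio reaches 1 only at a Muslim population share of about 36.2%.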
Since population shares can change due to different rates of fertility, mortality, and migration, we simply have no way to know whether or in what direction population shares might have changed until the new census data are released. 
Note that this would not affect the comparison of propensities to attack members of the opposite community, since these are based on shares of crimes — contained in Jain and Masih’s data — not population shares.
The further caveat, and this is important, is that even propensity ratios do not allow us to make statistically sound causal inferences, since factors other than religion could be driving those differences.
For example, if Muslim women are more likely to be cloistered within their community than Hindu women, which seems plausible, then part of the propensity difference may just reflect a different level of access to the other community, not a different intrinsic level of intended violence against the other. 
What one would really need to do is to perform a multiple regression analysis, including all of the factors that economists and demographers normally take to be drivers of violence against women, and then see if there’s some residual effect that the difference in religion picks up. Even this can’t conclusively establish causality, for the well-known reason that regressions show us correlation, not causation. 
The truth of the matter is we simply don’t have the quality of data, or data analysis, to conclusively either confirm or refute claims that one community or the other has a greater propensity to commit violence. The hype comes when Jain and Masih claim to have debunked the BJP’s claims using “cold hard facts” — when, as I have shown, they exhibit a basic failure in how to understand and interpret their own facts.
So in this pair of pieces by Jain and Masih, what’s the “truth”, and what’s the “hype”? You decide.
Rupa Subramanya is co-author of Indianomix: Making Sense of Modern India (Random House India, 2012). On Twitter: @rupasubramanya
Saurav Chatterjee is on Twitter @p_adic_Saurav

