UP BJP president Laxmikant Bajpai addressing the party working committee (Source: PTI)
Rupa Subramanya
(With inputs from Saurav Chatterjee)
Recently, NDTV’s Sreenivasan Jain and Niha Masih published two stories with provocative headlines.
Unfortunately for Jain and Masih, their incomplete data analysis, and in particular a basic conceptual error in how they interpret their own data, means that their claims don’t add up and constitute a form of spin all their own.
But this didn’t stop those
sympathetic to their findings from sharing them widely and uncritically —
without, apparently, checking if Jain and Masih’s analysis made sense.
Is it true, as some in the UP unit
of the BJP seem to allege, that Muslim men exhibit a greater propensity
to commit violence against women, such as rape, than Hindu men?
The difficulty in assessing this
claim is that official statistics of violence against women do not
report the religious affiliation of perpetrator or victim. So it’s
impossible to either confirm or refute the claims made by the BJP in any
statistically meaningful way given the publicly available data.
This is how the first of the two stories by Jain and Masih explains the above: “…unclear
on what basis the BJP has made that claim, given that the data of
violence against women….is never compiled on the basis of religion of
the accused”.
Then, they give us official UP police data on rapes in Meerut in 2013, which, of course, are not coded by religion.
Note that there’s a careless error here: they claim that the 389 rape cases registered last year were in Meerut district, when in fact the UP police data show that the number for Meerut district is 109.
Their number seemed so out of line with previous years that I checked the UP police data (Table 65, p. 228).
In fact, the number 389 refers to the Meerut range, which comprises six districts. The Meerut zone includes three additional districts, which brings the total up to 506.
This seems like a rather elementary error, and one wonders how it wasn’t picked up by simple fact-checking.
The more basic problem is what they conclude: “There is nothing to suggest that minorities are responsible for these high numbers.”
This contradicts what they earlier
tell us — correctly — which is that it’s impossible to identify the
religious affiliations in rape cases.
How, then, can they assert that there is “nothing to suggest”…? There is also nothing to suggest that minorities aren’t responsible, given the absence of religion-wise data.
As worded, their claim is misleading and incomplete.
A more accurate way to report this is to simply say there’s no way to assess the BJP’s claim, one way or another, with publicly available data.
Then, they tell us that they have managed to construct data to answer the question.
In particular, Jain and Masih claim
to have reviewed police data covering January to August of 2014. In the
first piece, they offer us data from Meerut district only, and in the
second, they give us pooled data for all nine districts comprising
Meerut zone, which I analyse below.
But since religious affiliation is not reported, how did NDTV get this breakdown?
Presumably, they inferred religious
identity from the names of alleged perpetrators and victims. All NDTV
tells us in their second piece is: “We specifically asked for it,
to check political claims. This is a time-consuming and sensitive
exercise that should be avoided.”
It’s not quite clear to me what this means.
I’ve asked both Jain and Masih on Twitter to share details of their methodology, so that one could independently assess whether their coding method is accurate. Neither has replied. In fact, Jain has blocked me!
Thus, there’s no way to verify the
data — in particular, the coding of perpetrator and victim by religion.
We are simply asked to take their word that the coding method, which
they don’t reveal, is accurate.
Let’s leave that aside and assume
for the sake of argument that their data are completely accurate. Do
they support the story that Jain and Masih try to tell?
According to NDTV’s pooled data,
reported in the second piece, there have been 334 rape cases from
January to August of this year in the Meerut zone. In 25 cases, Muslims
are the accused and alleged victims are Hindus. There are 23 cases of
Hindu accused and Muslim alleged victims. In 96 cases both accused and
alleged victims are Muslim, whereas in 190 cases, the accused and
alleged victims are Hindus.
Jain and Masih draw attention to the fact that “the highest number” of incidents involve Hindu-against-Hindu violence, as if this tells us something meaningful. Instead, it reveals a basic fallacy in how they interpret their own numbers, one that, as I will show shortly, actually undermines their own story!
Their fundamental error is the failure to perform a basic statistical adjustment to the raw numbers, an adjustment that is necessary before drawing any inferences.
Specifically, what they fail to do
is to adjust the raw numbers to show us the percentage of crimes
committed by each community and then compare the adjusted numbers to
each community’s share in the total population.
This very basic statistical adjustment, something a high school student could do, is vital for the following reason.
Claims around “love jihad” boil down to claims comparing the propensity to commit crimes across two communities, not to claims comparing raw numbers, for the obvious reason that the population shares of the two communities aren’t equal.
If Jain and Masih had wanted to
check whether these claims about the propensities to commit rapes
differed between Hindus and Muslims, they had the data at their
fingertips — but failed to use them correctly.
What they should have done was to compare the propensity rates in their own data to data on the population shares of the two communities.
If, for example, community A comprises x% of the population but commits more than x% of rapes, then there is prima facie evidence that members of that community have a greater propensity to commit rape than their share of the population would imply.
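The comparison just described can be written as a simple ratio. The notation is ours, introduced only for illustration: $n_A$ is the number of cases with an accused from community A, $N$ is the total number of cases, $c_A$ is A's share of cases, and $p_A$ is A's share of the population.

```latex
\[
  c_A = \frac{n_A}{N}, \qquad R_A = \frac{c_A}{p_A}
\]
```

A ratio $R_A$ above 1 means community A accounts for a larger share of cases than of the population; a ratio below 1 means a smaller share.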
Ideally, even this propensity rate should be subject to further statistical controls — as I discuss later — but, at a bare minimum, it is necessary to convert raw numbers to propensity rates to say anything at all meaningful.
The difficulty is that district-wise data by religion from the 2011 census are not yet publicly available. The best we can do is to use data from the 2001 census, which is accurate only if population shares have not changed substantially since then. A further wrinkle is that two of the nine districts that now make up Meerut zone didn’t exist in 2001, as they were carved out later.
However, with the data we have available, this is the best we can do.
With these caveats in mind, it is possible to compute population shares for the Meerut zone from district-wise data on each of the individual districts, weighting each district by its population to obtain a final weighted average. That exercise gives us 69% Hindus and 29% Muslims.
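The weighting step can be sketched as follows. This is a minimal illustration, not the author’s actual calculation: the two districts and their figures are hypothetical placeholders, not 2001 census values, and `weighted_share` is a name we introduce here.

```python
def weighted_share(districts, key):
    """Population-weighted average of a per-district share.

    Each district is a dict with a "population" count and a share
    (between 0 and 1) stored under `key`. A district's weight is its
    population divided by the combined population of all districts.
    """
    total_population = sum(d["population"] for d in districts)
    return sum(d["population"] / total_population * d[key] for d in districts)


# Hypothetical figures for illustration only (not census data).
toy_districts = [
    {"name": "District A", "population": 1_000_000, "hindu": 0.75, "muslim": 0.24},
    {"name": "District B", "population": 500_000, "hindu": 0.60, "muslim": 0.38},
]

print(round(weighted_share(toy_districts, "hindu"), 3))   # 0.7
print(round(weighted_share(toy_districts, "muslim"), 3))  # 0.287
```

Weighting by population in this way gives the same result as pooling the religion counts across districts and dividing by the total population; a plain unweighted mean of the district shares would over-weight the smaller districts.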
Of the total crimes committed
(i.e., whether against someone from one’s own or the other community),
we can easily calculate from Jain and Masih’s own data that 64% of
attacks are by Hindu men and 36% by Muslim men.
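The 64/36 split can be reproduced directly from the four counts NDTV reports for the Meerut zone (January to August 2014). This sketch takes those counts at face value; as noted above, the religion coding behind them cannot be independently checked. The names `CASES` and `accused_shares` are ours.

```python
# NDTV's reported counts, keyed by (religion of accused, religion of alleged victim).
CASES = {
    ("Muslim", "Hindu"): 25,
    ("Hindu", "Muslim"): 23,
    ("Muslim", "Muslim"): 96,
    ("Hindu", "Hindu"): 190,
}


def accused_shares(cases):
    """Share of all cases in which the accused belongs to each community."""
    total = sum(cases.values())
    counts = {}
    for (accused, _victim), n in cases.items():
        counts[accused] = counts.get(accused, 0) + n
    return {community: n / total for community, n in counts.items()}


print(sum(CASES.values()))  # 334
print({k: round(100 * v) for k, v in accused_shares(CASES).items()})
# {'Muslim': 36, 'Hindu': 64}
```

Hindu accused appear in 213 of the 334 cases and Muslim accused in 121.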
Putting it all together: Hindus
comprise 69% of the population in Meerut zone but commit only 64% of the
total number of rapes. Muslims comprise only 29% of the population but
commit 36% of rapes.
This shows us that Hindus actually have a slightly lower propensity to commit rapes and Muslims have a slightly higher propensity, compared against their respective population shares.
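Dividing each community’s share of cases by its share of the population gives the comparison made above. The inputs are the rounded percentages quoted in the text (64/36 from NDTV’s counts, 69/29 from the 2001-based weighting), so the results inherit every caveat attached to those numbers.

```python
crime_share = {"Hindu": 0.64, "Muslim": 0.36}       # share of cases, by community of accused
population_share = {"Hindu": 0.69, "Muslim": 0.29}  # 2001-based weighted shares, Meerut zone

# A ratio above 1 means a larger share of cases than of the population.
ratios = {k: crime_share[k] / population_share[k] for k in crime_share}
print({k: round(v, 2) for k, v in ratios.items()})  # {'Hindu': 0.93, 'Muslim': 1.24}
```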
It is important to note that we cannot say whether these differences are statistically significant, since we have no way to put confidence bounds around the ratios we are comparing. The difference in magnitudes is nonetheless noteworthy.
There’s a further interesting nugget one can extract from Jain and Masih’s own data, which again they fail to notice, and which again goes against the story they’re trying to tell.
If you break down crimes within
each community to those committed against members of the opposite
community, Jain and Masih’s own data tell us that in 11% of cases where
the attackers are Hindu, they’re attacking someone of the opposite
community. By contrast, in 21% of cases where the attackers are Muslim,
Muslim men are attacking women of the other community.
This means that, in NDTV’s own data, Muslims who commit rape have almost twice the propensity of Hindus who commit rape to attack someone of the opposite community. This becomes visible only once the raw numbers are converted to proportions, as one must do to say anything statistically meaningful.
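The 11% and 21% figures can be checked the same way, again taking NDTV’s four counts as given. The helper name `cross_community_share` is ours, for illustration.

```python
# NDTV's reported counts, keyed by (religion of accused, religion of alleged victim).
CASES = {
    ("Muslim", "Hindu"): 25,
    ("Hindu", "Muslim"): 23,
    ("Muslim", "Muslim"): 96,
    ("Hindu", "Hindu"): 190,
}


def cross_community_share(cases, community):
    """Among cases whose accused is from `community`, the fraction in which
    the alleged victim is from the other community."""
    accused_total = sum(n for (a, _v), n in cases.items() if a == community)
    cross = sum(n for (a, v), n in cases.items() if a == community and v != community)
    return cross / accused_total


hindu = cross_community_share(CASES, "Hindu")    # 23 / 213
muslim = cross_community_share(CASES, "Muslim")  # 25 / 121
print(round(100 * hindu), round(100 * muslim), round(muslim / hindu, 2))  # 11 21 1.91
```

Because both figures are shares of crimes rather than of population, this comparison does not depend on the census data.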
This is the relevant information in their own data — not Jain and Masih’s illogical reference to the “highest number”.
This is certainly an inconvenient
implication of Jain and Masih’s own data, in a piece trying to debunk
claims that Muslims have a greater propensity to commit rape against
Hindus — when their own data, correctly interpreted, tell exactly the opposite tale.
It’s important again to note all of
the caveats, given the understandable sensitivities on a subject which
has become politicised.
First, we’re going by NDTV’s data — which there’s no way to verify independently.
Second, we’re using population data
from the 2001 census on Hindu and Muslim population shares, which might
well have changed in the intervening years.
Specifically, if the Muslim share of the population in the Meerut zone were actually higher today than in 2001, that would attenuate the discrepancy in the propensities I calculated earlier. If, however, the Muslim share has gone down, that would magnify the discrepancy.
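The direction of this effect follows from the ratio itself: holding the 36% share of cases fixed, a larger population share shrinks the ratio and a smaller one enlarges it. In the sketch below, 0.29 is the 2001-based figure; 0.25 and 0.33 are hypothetical values chosen only to show the direction, not estimates of the current population.

```python
CASE_SHARE = 0.36  # Muslim share of accused in NDTV's data


def ratio_for(population_share, case_share=CASE_SHARE):
    """Share of cases divided by share of population."""
    return case_share / population_share


# 0.29 is the 2001-based share; 0.25 and 0.33 are hypothetical alternatives.
scenarios = (0.25, 0.29, 0.33)
ratios = [ratio_for(p) for p in scenarios]
for p, r in zip(scenarios, ratios):
    print(f"population share {p:.2f} -> ratio {r:.2f}")
# population share 0.25 -> ratio 1.44
# population share 0.29 -> ratio 1.24
# population share 0.33 -> ratio 1.09
```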
Since population shares can change
due to different rates of fertility, mortality, and migration, we simply
have no way to know whether or in what direction population shares
might have changed until the new census data are released.
Note that this would not affect
the comparison of propensities to attack members of the opposite
community, since these are based on shares of crimes — contained in Jain
and Masih’s data — not population shares.
The further caveat, and this is important, is that even propensity ratios do not allow us to make statistically sound causal inferences, since factors other than religion could be driving those differences.
For example, if Muslim women are
more likely to be cloistered within their community than Hindu women,
which seems plausible, then part of the propensity difference may just
reflect a different level of access to the other community, not a
different intrinsic level of intended violence against the other.
What one would really need to do is
to perform a multiple regression analysis, including all of the factors
that economists and demographers normally take to be drivers of
violence against women, and then see if there’s some residual
effect that the difference in religion picks up. Even this can’t
conclusively establish causality, for the well-known reason that
regressions show us correlation, not causation.
The truth of the matter is we
simply don’t have the quality of data, or data analysis, to conclusively
either confirm or refute claims that one community or the other has a
greater propensity to commit violence. The hype comes when Jain and
Masih claim to have debunked the BJP’s claims using “cold hard facts” —
when, as I have shown, they exhibit a basic failure in how to understand
and interpret their own facts.
So in this pair of pieces by Jain and Masih, what’s the “truth”, and what’s the “hype”? You decide.
Rupa Subramanya is co-author
of Indianomix: Making Sense of Modern India (Random House India, 2012).
On Twitter: @rupasubramanya
Saurav Chatterjee is on Twitter @p_adic_Saurav