Conducting Quantitative Research in a Region of Limited Data Validity

by Petr Čermák


In the last two decades, research on peace and conflict has undergone its ‘local turn’. The global conflict environment changed rapidly after the end of the Cold War as traditional inter-state conflicts all but vanished, while the occurrence and intensity of intra-state conflicts rapidly increased. Many scholars, from both the quantitative and the qualitative camps, have responded to the changing nature of violent conflict by shifting their focus from the once dominant state level towards the largely understudied local level.

This shift has provided scholars with the opportunity to retest and redefine established theories and develop new arguments on locally rooted causal mechanisms. It has also opened a new space methodologically, allowing us to work with much larger samples than state-level analyses permit. This has been most visible within the quantitative branch of research, where states have frequently been replaced with micro-regions as primary units of analysis and disaggregated, geo-referenced local data have been used instead of country-level data (see Dorussen and Raleigh 2009; Raleigh et al. 2010; Eck 2012; Sundberg and Melander 2013).

A model area for such locally focused research is the post-Yugoslav region, which features a number of different conflict and post-conflict processes. The region can be easily divided into comparable local units, and it also presents sufficient variance in both explanatory and dependent variables. In theory, this is an ideal case for large-N exploratory comparative research or theory testing.

In practice, however, this opportunity remains largely unexploited. While there are some pioneering quantitative works on local-level conflict dynamics in Bosnia (Slack and Doyon 2001; Costalli and Moro 2012) and post-conflict developments in Macedonia (Ringdal et al. 2007; Dyrstad et al. 2011), the field generally remains understudied and cross-regional analyses are virtually non-existent. One reason for this is the limited availability of valid and reliable data, the essential element of any quantitative research.

Data validity in the post-Yugoslav space

A case in point is the elementary demographic data on the ethnic structure of populations. The collection of valid data on ethnic demography is a necessary first step for any local-level analysis of socio-political developments related to ethnic conflict. Even critics of the concept of ethnicity would probably agree that ethnic categories are still relevant for the description and analysis of the recent post-conflict reality in the former Yugoslav states. Besides, ethnic figures have extremely high political relevance in the local political environment, as they often serve as the governing principle for many institutional structures.

However, as a direct consequence of this politicization, this category of data is produced and used more as a political weapon than as a statistical figure in much of the post-Yugoslav space. As a result, most states of the region are unable to produce reliable data on the ethnic structure of their post-conflict areas.

In Bosnia, the long-expected results of the first post-war census were published earlier this year, more than three years after the census was conducted. After years of political fighting over methodological issues, it is not surprising that there was little if any scholarly interest in the new data. Macedonia avoided such post-census volatility simply by cancelling the census while it was under way. Amidst ongoing political debate, data showing the actual ethnic structure of a country struggling with latent ethnic tensions have never been released.

In contrast, Kosovo, a state born from ethnic conflict, technically managed to conduct its first post-independence census in 2011. Nevertheless, the official data cannot be used even to estimate the number of Serbs who remained in the country, as most of them boycotted the census. The situation is similar in the adjacent post-conflict region of Preshevo Valley in southern Serbia, where the local Albanian community refused to take part in the official census in 2011.

Croatia seems to be the only post-conflict former Yugoslav state where the census was conducted without such apparent controversies. Still, the figures on the number of Serbs living in the once contested regions of the country have also been frequently politicized and challenged by Croat nationalists.

The following table shows some of the most alarming examples of discrepancies between official figures on particular groups in ethnically mixed areas and secondary estimates of their population share.

As can be seen, data can be biased in both directions. On the one hand, Serbs in Kosovo and Albanians in Serbia boycott the census to express their resistance to state structures, which makes their demographic figures appear much lower than they really are. On the other hand, local minorities in Bosnia and Croatia allegedly try to exaggerate their numbers by including people who do not actually reside in the area, in order to claim a bigger share of local power. While these sorts of political manipulation may present interesting research questions (as was shown in a special issue of the journal Contemporary Southeastern Europe), they seriously limit the use of census-based data for any further analysis.

Can we improve data validity?

The pressing question, then, is how to clear the data of its politicized components and improve its validity. The natural choice in cases of uncertainty about data validity is triangulation with other available sources. This, however, is far from an easy task. Refining the demographic figures through triangulation is not only extremely time-consuming, but also carries the risk of new misinterpretation and unintended manipulation of the figures.

In the regional context of the former Yugoslavia, there is a wide range of estimates of ethnic demography with varying reliability. These range from scholarly calculations based on well-grounded indirect measurements (such as Bochsler and Schläpfer 2016) to locally produced estimates, often by unknown authors, which circulate on internet forums. Triangulating such different sources to produce a qualified estimate necessarily leads us into a spiral of cleaning, verifying and interpreting messy figures of varying quality. The search for more detailed and accurate data can then seem endless. In the end, the researcher may be left to wonder whether it would not be easier to collect new data directly in the field.
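To make the triangulation step more concrete, the following is a minimal sketch in Python of how several estimates of one group's population share might be combined and checked for disagreement. Everything in it is an assumption made for illustration: the `Estimate` class, the `triangulate` function, the subjective reliability weights, the 5-percentage-point tolerance and the numbers themselves are hypothetical and do not describe any real municipality or source.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Estimate:
    source: str         # label of the source (hypothetical)
    share: float        # estimated population share of a group, between 0 and 1
    reliability: float  # analyst-assigned weight in (0, 1]; a subjective judgment


def triangulate(estimates, tolerance=0.05):
    """Summarise several estimates and flag cases where the sources disagree."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    total_weight = sum(e.reliability for e in estimates)
    # Reliability-weighted mean: more trusted sources pull the result harder.
    weighted_mean = sum(e.share * e.reliability for e in estimates) / total_weight
    shares = [e.share for e in estimates]
    # Spread between the highest and lowest estimate, in share units.
    spread = max(shares) - min(shares)
    return {
        "weighted_mean": weighted_mean,
        "median": median(shares),
        "spread": spread,
        # A wide spread signals that the figure needs manual verification.
        "needs_review": spread > tolerance,
    }


# Purely illustrative values, not real data.
example = [
    Estimate("official census", 0.10, 0.5),
    Estimate("scholarly indirect estimate", 0.30, 1.0),
    Estimate("local unofficial estimate", 0.40, 0.25),
]
result = triangulate(example)
print(result)
```

The weights are the weak point of any such procedure: they encode the researcher's own judgment about each source, which is exactly where new misinterpretation can creep in. Reporting the spread alongside the combined figure at least keeps that uncertainty visible.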

What would probably be a straightforward technical procedure for a researcher working on ethnic issues in other regions can become week-long detective work for a scholar investigating the same phenomena in the post-Yugoslav space. Yet we simply cannot avoid it. Until someone does the thankless job of producing accurate data from the field, we will have to make do with uncertain estimates. In the meantime, scholars who intend to employ data on ethnic demography in their analyses should be aware that official figures cannot be taken for granted, that census results need to be treated with suspicion, and that alternative sources should be explored.


