Indexes of Data Quality and Openness

This article reviews three indexes that assess the openness or quality of data produced by national governments. The Open Data Barometer (ODB), produced by the Open Data Institute and the World Wide Web Foundation, and the Open Data Index (ODI), produced by the Open Knowledge Foundation, rate the openness of heterogeneous sets of data produced by governments, of which the outputs of the national statistical system are only a part. The third, the World Bank’s Statistical Capacity Indicator (SCI), rates the capacity of a national statistical system to produce reliable statistics but does not consider whether the data meet the criteria for openness. Although the three differ in design and content, their ratings across countries are, for the most part, highly correlated. The purpose of this article is not to rate the raters or pick a winner among the three approaches. Rather, it is part of an ongoing effort to develop a measure that captures both the quality and the openness of development statistics. As is so often the case, progress can be made more quickly by learning from what others have done.

 

Open Data Barometer 

The Open Data Barometer (ODB) is an expert assessment system. It relies on scoring by local informants on questions concerning the policies, implementation, and impacts of open government data initiatives, and on a scored assessment of the openness of fourteen types of data in each country. In addition, secondary data are used to complement the expert survey data and to assess the readiness of countries to implement open government data initiatives. (See Open Data Barometer 2013, pages 9-10 and 37-42.) Results are summarized in three sub-indexes and an overall score scaled from 0 to 100.

 

The fourteen datasets reviewed by the ODB are: 

1. Map data
2. Land ownership
3. Census
4. Government budget
5. Government spend
6. Company register
7. Legislation
8. Public transport timetables
9. International trade
10. Health sector performance
11. Primary or secondary education performance
12. Crime statistics
13. National environmental statistics
14. National election results


All these data types are of importance to a variety of users, and responsibility for their production is distributed across many units of government. Broadly speaking, the data of concern to us here are the traditional products of a national statistical office: the census, international trade statistics, health and education statistics (although not necessarily “performance” statistics), crime statistics, and environmental statistics. The business register also falls within the purview of the statistical office. Missing from this list, but of great importance to the management of the macro economy and development programs, are monetary statistics, labor market statistics, price statistics, and measures of poverty or economic welfare.

 

The underlying data and scores from the ODB were taken from the Open Data Barometer 2013 Global Report datasets. Data from the 2013 report are available from Zenodo.

 

Open Data Index 

The Open Data Index (ODI) is a crowd-sourced indicator of the openness of datasets, produced by the Open Knowledge Foundation. Information on datasets is gathered through the Open Data Census. The census is “… compiled using contributions from civil society members and open data practitioners around the world, to which the public is invited to contribute at any time; it is then peer-reviewed and checked periodically by a team of 60+ expert country editors.” (See About the Open Data Index.)

 

To create the index, each country dataset is scored against a set of nine attributes derived from the Open Definition. Weights adding up to 100 are assigned to the attributes. The greatest weight (30 points) is assigned to openly licensed data. Datasets that are not available are assigned a score of 0. The country score is the sum of the dataset scores. Perhaps because the Open Data Census is an ongoing activity to which anyone is invited to contribute, the scores computed from the archived database do not always match those tabulated in the on-line report.
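The scoring rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the Open Knowledge Foundation's implementation: the attribute names and every weight other than the 30 points for open licensing are assumptions chosen only so that the nine weights sum to 100, and the example country is invented.

```python
# Illustrative attribute weights. Only the 30 points for open licensing and
# the total of 100 come from the text; the other names and values are assumed.
WEIGHTS = {
    "exists": 5,
    "digital": 5,
    "publicly_available": 5,
    "free_of_charge": 15,
    "online": 5,
    "machine_readable": 15,
    "bulk_download": 10,
    "openly_licensed": 30,
    "up_to_date": 10,
}
assert sum(WEIGHTS.values()) == 100


def dataset_score(attributes):
    """Sum the weights of the attributes a dataset satisfies.

    `attributes` maps attribute name -> bool; None means the dataset is
    not available at all, which scores 0.
    """
    if attributes is None:
        return 0
    return sum(w for name, w in WEIGHTS.items() if attributes.get(name, False))


def country_score(datasets):
    """The country score is the sum of its dataset scores."""
    return sum(dataset_score(attrs) for attrs in datasets.values())


# Invented example country with three of the ten dataset types.
example = {
    # Satisfies every attribute except open licensing: 100 - 30 = 70.
    "government_budget": {name: name != "openly_licensed" for name in WEIGHTS},
    # Not available: 0.
    "postcodes": None,
    # Exists and is digital only: 5 + 5 = 10.
    "legislation": {"exists": True, "digital": True},
}
print(country_score(example))  # 70 + 0 + 10 = 80
```

Because the country score is a plain sum, a dataset that is missing entirely costs a country as much as a dataset that exists but fails every attribute.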

 

Ten types of datasets are included in the Open Data Census: 

1. Government budget
2. Company registers
3. Election results
4. Emissions of (air) pollutants
5. Legislation
6. National map
7. Postcodes
8. Government spending
9. National statistics
10. Transport timetables

 

The types of data covered by the Open Data Census and included in the ODI are similar to those included in the ODB, but the category of national statistics is limited to “a reasonable amount …” of “key demographic and economic indicators” such as “GDP, unemployment, population, etc.” The implicit weight of the conventional products of a national statistical office appears to be lower in the ODI than in the ODB.

 

Statistical Capacity Indicator 

The World Bank’s Statistical Capacity Indicator (SCI) differs from the ODB and ODI in several respects. It considers only the datasets that are traditionally the responsibility of the national statistical office, although modern statistical systems may produce many other kinds of information; the criteria by which datasets are evaluated are derived from published information, rather than the judgment of experts or data users; and it is available for 149 developing countries but not for countries classified by the World Bank as high income. Finally, it does not explicitly consider whether the datasets satisfy the criteria for openness.

 

The SCI was designed to provide a measure of the capacity of national statistical systems. It is not a measure of the quality of individual datasets, although it considers factors that provide the basis for quality. The methodology underlying the SCI is described in a 2012 “Note on the Statistical Capacity Indicator.” The indicator is composed of three sub-indexes:

 

Statistical methodology, a ten-point measure of a country’s adherence to international standards. Statistical domains considered in this measure include the national accounts; balance of payments; external debt; consumer price index; industrial production index; export and import prices; government finance statistics; education and health statistics; and participation in the IMF’s Special Data Dissemination Standard.

 

Source data, a five-point measure that assesses whether a country conducts data collection activities at internationally recommended intervals and whether data from administrative systems are available and reliable. Statistical activities considered in this measure are the population census; agricultural census; poverty surveys; health-related surveys; and vital registration systems.

 

Periodicity and timeliness, a ten-point measure of the availability and periodicity of key socio-economic indicators. Evaluated here are the frequency or currency of indicators of poverty; child malnutrition and mortality; immunization; HIV/AIDS prevalence; maternal mortality; educational enrollments by sex; primary education completion rate; access to water; and GDP growth rate. Within each sub-index, each element is equally weighted.

 

The overall index value is an equally weighted average of the three sub-indexes scaled from 0 to 100. Country scores are available from 1999 through 2012 from the World Bank’s Data Catalog.
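As a rough illustration of this aggregation, the sketch below rescales each sub-index to a 0 to 100 range and then takes the simple average. The rescaling step is an assumption about how the "scaled from 0 to 100" result is obtained, the dictionary keys are hypothetical names, and the example country is invented; the World Bank note should be consulted for the actual procedure.

```python
# Maximum points per sub-index, as described in the text above.
MAX_POINTS = {"methodology": 10, "source_data": 5, "periodicity": 10}


def overall_sci(points):
    """Equally weighted average of the three sub-indexes on a 0-100 scale.

    Assumption: each sub-index is first rescaled to 0-100 by dividing the
    points earned by the maximum points available.
    """
    scaled = [100.0 * points[name] / MAX_POINTS[name] for name in MAX_POINTS]
    return sum(scaled) / len(scaled)


# Invented country: 6 of 10, 4 of 5, and 9 of 10 points.
hypothetical = {"methodology": 6, "source_data": 4, "periodicity": 9}
print(round(overall_sci(hypothetical), 1))
```

Under this assumption a single source-data element moves the overall score twice as far as a single methodology or periodicity element, because the source-data sub-index has only five elements but carries the same one-third weight.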

 

Comparing the indexes 

Table 1 shows the overlap of coverage by the three indexes. The ODB and ODI include a large number of high-income countries, while the SCI includes only developing countries. The ODB has 46 countries in common with the ODI and 47 in common with the SCI. The ODI, which includes fewer developing countries, has only 28 countries in common with the SCI. Both the ODB and ODI omit more than half the countries classified by the World Bank as low- or middle-income.

 

Table 1: Country coverage 

| | Open Data Barometer 2013 | Open Data Index 2013 | World Bank Statistical Capacity Index 2012 |
|---|---|---|---|
| Number of countries rated | 77 | 65 | 149 |
| Number of developing countries rated* | 47 | 28 | 149 |
| Number of countries in common | | | |
| Open Data Barometer | -- | 46 | 47 |
| Open Data Index | 46 | -- | 28 |
| World Bank Statistical Capacity Index | 47 | 28 | -- |

*Developing countries are countries or territories with GNI per capita in 2012 of less than $12,616 as reported by the World Bank.
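The "countries in common" counts in Table 1 are pairwise set intersections of the country lists. A minimal sketch, assuming each index's coverage is available as a set of country codes; the three-letter codes below are placeholders, not the real country lists:

```python
from itertools import combinations


def overlap_counts(coverage):
    """Count the countries shared by each pair of indexes."""
    return {
        (a, b): len(coverage[a] & coverage[b])
        for a, b in combinations(sorted(coverage), 2)
    }


# Placeholder codes for illustration only.
coverage = {
    "ODB": {"AAA", "BBB", "CCC", "DDD"},
    "ODI": {"BBB", "CCC", "EEE"},
    "SCI": {"CCC", "DDD", "FFF"},
}
counts = overlap_counts(coverage)
for pair, n in counts.items():
    print(pair, n)
```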

 

As we have already seen, the three measures are constructed quite differently. However, their scores and rankings for the countries they have in common are, for the most part, highly correlated. See Table 2.

 

Table 2: Correlations of country scores and rankings 

| | Open Data Barometer 2013 | Open Data Index 2013 | World Bank Statistical Capacity Index 2012 |
|---|---|---|---|
| Correlation of scores | | | |
| Open Data Barometer | -- | 84.4% | 62.2% |
| Open Data Index | 84.4% | -- | 78.2% |
| World Bank Statistical Capacity Index | 62.2% | 78.2% | -- |
| Rank correlation | | | |
| Open Data Barometer | -- | 84.4% | 69.5% |
| Open Data Index | 84.4% | -- | 65.9% |
| World Bank Statistical Capacity Index | 69.5% | 65.9% | -- |
| Correlation of scores: 18 developing countries* | | | |
| Open Data Barometer | -- | 30.5% | 47.0% |
| Open Data Index | 30.5% | -- | 73.9% |
| World Bank Statistical Capacity Index | 47.0% | 73.9% | -- |

*The 18 developing countries are Bangladesh, Brazil, Burkina Faso, China, Costa Rica, Ecuador, Hungary, India, Indonesia, Kenya, Mexico, Nepal, Nigeria, Russian Federation, Senegal, South Africa, Tunisia, Yemen.

 

When all countries in common are taken into account, the correlation between the ODB and ODI is somewhat higher than their correlations with the SCI, but all three are fairly close to each other. This is, perhaps, unsurprising: countries with the capacity to produce reliable social and economic statistics are also more likely to have statistical systems capable of producing and disseminating the datasets evaluated by the ODB and ODI. However, the similarities disappear when we consider only the 18 developing countries that are common across the three indexes. The ODB appears to be most affected: its correlation with the ODI falls to 30 percent and is less than its correlation with the SCI at 47 percent. The correlation between the ODI and SCI remains at nearly the same value as in the larger, common subset. This suggests that the rating system used by the ODB may be less robust in developing countries. It may be harder, for example, to find and train local experts to review datasets. It is notable that the 18 developing countries included in the three datasets are among the largest and wealthiest. These are likely to be countries for which the ODI can recruit a larger “crowd” of evaluators. But given the small number of countries, other factors may be involved.
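For readers who want to reproduce comparisons of this kind, the sketch below computes a score correlation and a rank correlation for the countries two indexes share. It assumes the score correlation is Pearson's coefficient and the rank correlation is Spearman's (the article does not say which were used), and the scores are invented toy values, not data from any of the three indexes.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Ranks starting at 1 for the smallest value; ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def compare(scores_a, scores_b):
    """Correlate two indexes over the countries they have in common."""
    common = sorted(scores_a.keys() & scores_b.keys())
    x = [scores_a[c] for c in common]
    y = [scores_b[c] for c in common]
    return len(common), pearson(x, y), spearman(x, y)


# Invented toy scores keyed by placeholder country labels.
index_a = {"A": 10, "B": 20, "C": 30, "D": 40, "E": 99}
index_b = {"A": 1, "B": 4, "C": 9, "D": 16, "F": 50}
n, r, rho = compare(index_a, index_b)
print(n, round(r, 3), round(rho, 3))
```

Restricting both dictionaries to a subsample, such as the 18 developing countries, before calling `compare` gives the corresponding sub-sample correlations. With so few observations, a single country can shift either coefficient noticeably.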

 

Toward a comprehensive index 

The high-level panel on the post-2015 sustainable development goals called for a data revolution in developing countries. Such a revolution requires change on three fronts: data coverage, data quality, and data openness. New methods for compiling statistics (big data, for example, or crowdsourcing) may come to compete with traditional methods, but standards, definitions, and common nomenclature, the hallmarks of data quality, are still required. And open data without reliable data are of little value.

A comprehensive assessment of a statistical system's outputs should take in all three dimensions (coverage, quality, and openness), along with the functionality of its dissemination systems. National statistical systems should be construed more broadly than the traditional emphasis on economic and social indicators, although greater weight (and priority) should be given to the core data needed to guide the development of a modern state.

 

Composite indexes provide a compact way of summarizing many kinds of information, rendering them more amenable to analysis and display. Notwithstanding their somewhat arbitrary construction, they can be useful for characterizing the performance of large complex systems. When applied to the performance of public institutions, an index may be used to diagnose gaps in policies or practices; encourage competition to improve results; or inform users of the reliability or potential defects of its outputs. The ODB, ODI, and SCI provide important but incomplete elements of our evaluation of statistical systems. To lead a data revolution, we need a measure that encompasses the totality of a statistical system.