From Principles to Practice: a consultation on joined-up data standards
About this paper
This consultation paper aims to start a discussion on what technical and political solutions to joining up data standards could look like at a global level.
The consultation builds on the research that we have undertaken within the Joined-up Data Standards (JUDS) project over the past year. It works on the assumption that data standards are relevant wherever data exists and is used and recognises the opportunity that the 2030 Agenda for Sustainable Development presents to help develop lasting solutions for currently entrenched problems. Our thematic focus has been on how global data standards impact on the needs of developing countries in both meeting and monitoring these goals. We recommend what policy coordination mechanisms are needed at the international level to ensure that suitable technical solutions are found and implemented.
There are many different types of standards, and as many interpretations of how they should be defined and classified. We focus on ‘measurement standards’ – not basic units of measure, but those that define, classify and group data – that are used in more than one country, often across multiple subject matter domains, and are governed by global or representative institutions that play credible and authoritative roles in upholding standards.
The consultation paper is divided into five chapters. Chapter 1 explains the data challenges that exist, the rationale for this project and the methodology we have employed in our research. Chapter 2 sets out our research findings on joining up data standards; Chapter 3 describes the international policy landscape; and Chapters 4 and 5 present our preliminary conclusions and recommendations. Consultation questions can be found at the end of each chapter and are also collated at the end of the paper.
We hope to receive a broad range of responses to this consultation from international standard-setting institutions, open data standard initiatives, data-producing organisations and the users of development data themselves. We invite responses to be submitted using this feedback form by 31st April 2017, and we will hold a series of international events to consult on the paper in the coming months. More details on the consultation process will be published on our website, juds.joinedupdata.org, in the near future.
This paper was prepared by Beata Lisowska and Tom Orrell and edited by Bill Anderson. We would like to thank the many people who have helped in putting this consultation paper together: it is the culmination of our collective work to date. We would like to particularly thank all of our colleagues at Development Initiatives, Publish What You Fund, and Jen Claydon, our copy-editor. We would also like to thank the Omidyar Network for its support.
We would also like to express our collective gratitude to all the statisticians, data and policy experts from governments, multilaterals, think tanks and civil society who have taken the time to engage with us and have helped shape our thinking over the past year. The opinions expressed here are, however, our own.
API: Application program interface
COFOG: UN Classification of Functions of Government
CRS: Creditor Reporting System
DAC: Development Assistance Committee
DHS: Demographic and Health Survey
EUROSTAT: Statistical Office of the European Union
FAO: Food and Agriculture Organization (UN)
FS: Fragile state
HIPC: Heavily indebted poor country
HLG: High-Level Group
IAEG: Inter-Agency and Expert Group
IATI: International Aid Transparency Initiative
ICT: Information and communications technology
IDB: Inter-American Development Bank
IMF: International Monetary Fund
JUDS: Joined-up Data Standards
LDC: Least developed country
LHD: Low human development country
LIC: Low income country
LSMS: Living Standards Measurement Study
MDG: Millennium Development Goal
MICS: Multiple Indicator Cluster Survey
NSO: National statistical office
ODA: Official development assistance
OECD: Organisation for Economic Co-operation and Development
SDG: Sustainable Development Goal
SIDS: Small island developing state
SKOS: Simple Knowledge Organisation System
UN: United Nations
UN-OHRLLS: UN Office of the High Representative for the Least Developed Countries, Landlocked Developing Countries and Small Island Developing States
UNCTAD: UN Conference on Trade and Development
UNDP: UN Development Programme
UNECA: UN Economic Commission for Africa
UNESCO: UN Educational, Scientific and Cultural Organization
UNICEF: UN Children’s Fund
UNSC: UN Statistical Commission
USAID: US Agency for International Development
WHO: World Health Organization
Data has immense potential to help drive poverty eradication and international development, but it remains incredibly difficult to join up data on resources, people and results because it is published in different formats and to different standards. Overcoming this challenge requires both technical solutions and political will. The challenge facing governments, international institutions, civil society, academics and the private sector alike is how to make sense of the vast quantities of data now being generated in order to create a coherent, holistic picture.
The aims of the Joined-up Data Standards project are to: explore the problems caused by incompatible data in international development; work with partners to find common solutions to these problems; and build international consensus that all data should be joined up. To date we have produced four discussion papers that cumulatively explore: the way in which global institutions define and classify geographic, sectoral and results data; the overlaps that occur between competing standards; and the policy landscape that governs international data standard setting.
This paper aims to start a discussion on what practical solutions to joining up data standards could look like.
We have reached the following preliminary conclusions.
- The policy environment is conducive to joining up data
Global and regional institutions are recognising the value of joined-up data and interoperability is now an internationally accepted principle. Official statistics bodies increasingly recognise the importance of embracing all producers and users of data as partners in their work.
- Turning new principles into practice is a challenge
While international commitments on interoperability are now a given, standard-setting work still takes place in highly specialised forums and data silos persist, limiting the comparability of data.
- Solutions are demonstrably achievable
Technologies are allowing machines to speak more easily to each other, to understand different languages and to translate between them.
We suggest three preliminary recommendations that form the starting point of our consultation.
- New standards must be joined up
There is no longer any reason why new standards that duplicate existing standards should be developed. All standard-setting bodies must commit to making new standards and their components fully compatible with existing standards and build interoperability into their architecture from the outset.
- We need joined-up leadership
We need united and integrated leadership, structures and mechanisms to drive the Data Revolution and the Transformative Agenda for Official Statistics forward at speeds commensurate with both the aspirations and urgency of current global ambitions.
- Translation services are urgently needed
In the immediate future, the many disparate standards that relate to the goals, targets and indicators of the Sustainable Development Goals (SDGs) need to be mapped and compared so that data from different standards can be ‘cross-walked’ through a translating machine. This is the responsibility of all standard-setting bodies.
We are inviting responses to a number of questions outlined below and will also be holding a series of events to consult on the paper in the coming months. For more details on the consultation process, see juds.joinedupdata.org.
More data is being produced than ever before, and more of it is publicly available. Improvements in connectivity and data-processing technology mean that a growing number of users, with increasingly powerful machines, can access a growing number of datasets.
Most of this expansion has taken place in the developed global North – the United States (US) government portal, for instance, contains over 180,000 datasets from over 70 separate departments or agencies. However, it is becoming a global phenomenon. On a different scale, but no less important, 700 unique primary sources of development-related data have been identified in Uganda with 87% of these accessible online.
One outcome of this digital revolution is the increasing use of data to formulate evidence of needs and performance that is fed into policy and decision-making processes: a process that requires joining up data on inputs with data on outcomes.
Another outcome of the digital revolution is the increasing use of data by civil society to hold government and the private sector to account. The emerging professions of data journalism and data science are indicative of this trend. They fulfil the function of translating and contextualising complex data from different sources into meaningful information. They join up data.
The production of vast quantities of data does not automatically guarantee its usefulness. One-third of the datasets found on the United Kingdom (UK) government’s portal have never been used. In the period March to May 2016, one-third of the datasets on the US government portal were accessed, but only 4% were downloaded more than 10 times (Figure 2). There are a number of reasons why data usage is not keeping up with production: the data may not be particularly useful; users may be unaware of its existence; or the sheer volume of data may be overwhelming. Most importantly, data users can no longer rely on manual processes, and the machines that produce data one dataset at a time are not very good at reading and combining other machines’ datasets.
In most cases this is not the fault of the machine, but of the person who designed the dataset in the first place. Machines are good at technical jobs: reading data from different operating systems on different platforms, in different file formats and through different application program interface (API) syntaxes. There is a relatively finite set of mostly standardised rules that a machine needs to be programmed with to navigate these hurdles.
When it comes to the data itself, the complexity mounts. Figure 3 shows a complex data ecosystem: a mapping of how international and domestic resource flows relate to one another and contribute to (or, in the case of illicit flows, undermine) development. The challenge facing governments, international institutions, researchers and campaigners alike is how to make sense of the data generated by the institutions involved to create a coherent understanding of the whole. Most of the flows involve financial transactions, so amounts of money, currencies and dates are a common starting point. That is where the simplicity ends. For example:
- Data from the Organisation for Economic Co-operation and Development (OECD) uses country codes that are incompatible with those of the International Organization for Standardization (ISO) and the UN
- The UN and OECD define supranational regions differently
- There is no global standard for identifying institutions
- There is no global system for classifying financial flow types
- Functional sectors (such as health and education) are classified in many different ways by, among others, the OECD (Creditor Reporting System [CRS] purpose codes), the UN (Classification of the Functions of Government), the World Bank (themes and sectors) and the US Internal Revenue Service (the National Taxonomy of Exempt Entities, used by US foundations)
- There is no standard for defining data labels (the column header in a spreadsheet)
In November 2013, Development Initiatives and Open Knowledge presented a paper at the Open Government Partnership London Summit that scoped out the data intersections between five transparency initiatives. It argued that although the construction, extractives, aid and contracting standards operated in different spheres, they shared many of the same building blocks – such as geo-referencing, organisation identifiers and sector definitions – and that it was in the interests of data producers and users alike for these standards to therefore share, where appropriate, the same methodologies and coding systems.
Notes: CoST: Construction Sector Transparency Initiative; EITI: Extractive Industries Transparency Initiative; IATI: International Aid Transparency Initiative
In taking the work further it was recognised that most of these building blocks were developed by global institutions, the traditional curators of data standards, and that there are many instances where, as described in the previous section, it is very difficult to compare data from different sources as a result of the competing standards that underpin them. Supported by the Omidyar Network, Development Initiatives and Publish What You Fund – who have collaborated on the International Aid Transparency Initiative (IATI) standard from its beginnings in 2009 – joined forces to work on Joined-up Data Standards in April 2015. The aims of the project are to:
- explore the challenges of joining up standards
- work with partners to find common solutions to these challenges
- build international consensus that all data should be joined up.
Chapter 3 describes the policy gains that have been made in the last couple of years in establishing the principles of data interoperability and comparability. Unlike in 2013, these are now a given in the joined-up data debate. We no longer have to persuade anyone that joining up data is a good idea. We only have to figure out how to do it. The challenge facing all concerned is how to turn these principles into practice.
The aim of this paper is to stimulate this debate. It attempts to distil the findings of our research and engagements over the past year into questions and challenges. It aims to generate discussions with:
- the official statistics community, both at global and national levels
- all those involved in the 2030 Agenda for Sustainable Development (2030 Agenda) who realise that data is a critical element for both meeting and monitoring the Sustainable Development Goals (SDGs)
- the data revolution and open data communities including all those working with innovative technologies and new data sources.
In preparing for the 2013 Open Government Partnership summit, two interoperability problems were explored. The first concerned intersections between datasets. Put simply, if two datasets share the same concept (such as ‘country’) you can only compare the data if both sets use the same definition of ‘country’. Moreover, the ‘country’ in both sets must not only look the same to the human eye, but must be identical so that machines recognise that they are the same.
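The ‘country’ problem can be illustrated in a few lines. In this minimal sketch, two hypothetical datasets spell the same countries differently: joining on names finds no matches, while joining on a shared code list (ISO 3166-1 alpha-3 is assumed here as the common standard) matches both. The name-to-code lookup is itself a small hand-built crosswalk, and the values are placeholders rather than real statistics.

```python
# Two hypothetical datasets that share the concept 'country' but spell it differently.
# Values are placeholders, not real statistics.
aid = {"Côte d'Ivoire": 10, "Viet Nam": 20}
health = {"Cote d'Ivoire": 1, "Vietnam": 2}

# Hand-built crosswalk (an assumption of this sketch) from name variants
# to ISO 3166-1 alpha-3 codes.
NAME_TO_ISO3 = {
    "Côte d'Ivoire": "CIV", "Cote d'Ivoire": "CIV",
    "Viet Nam": "VNM", "Vietnam": "VNM",
}

def join_on(left, right, key):
    """Inner-join two {label: value} dicts after normalising labels with key()."""
    right_by_key = {key(label): value for label, value in right.items()}
    return {
        key(label): (value, right_by_key[key(label)])
        for label, value in left.items()
        if key(label) in right_by_key
    }

# Exact strings only: the labels look alike to a person but not to a machine.
by_name = join_on(aid, health, key=lambda name: name)
# Shared codes: an unmapped name raises KeyError, so gaps surface instead of hiding.
by_code = join_on(aid, health, key=lambda name: NAME_TO_ISO3[name])
print(by_name)  # {}
print(by_code)  # {'CIV': (10, 1), 'VNM': (20, 2)}
```

The design point is that the join only works once both datasets resolve to the same identifier; agreeing on that identifier is exactly what a shared standard provides.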
The second concerned identifying entities, such as institutions and activities. It is almost impossible, for example, to create a single finite list of all institutions or contracts in the world. It is, however, possible to agree a standard methodology that recognises identifiers issued by national agencies that maintain authorised registers of organisations.
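The ‘agree a methodology, not a master list’ idea can be sketched as follows. The sketch is loosely modelled on the convention IATI uses for organisation identifiers (a registration-agency prefix followed by the identifier that agency issued). The agency prefixes and the identifier below are hypothetical placeholders, not entries from any real list of registration agencies.

```python
from dataclasses import dataclass

# Hypothetical list of recognised registration agencies: prefix -> description.
REGISTRATION_AGENCIES = {
    "XA-COMP": "Country A companies register (hypothetical)",
    "XB-CHAR": "Country B charities register (hypothetical)",
}

@dataclass(frozen=True)
class OrgId:
    agency: str    # prefix of the national registration agency
    local_id: str  # identifier issued by that agency

    def __str__(self) -> str:
        return f"{self.agency}-{self.local_id}"

def parse_org_id(text: str) -> OrgId:
    """Split an identifier into a recognised agency prefix and the agency's own ID."""
    # Try longer prefixes first so that a short prefix cannot shadow a longer one.
    for prefix in sorted(REGISTRATION_AGENCIES, key=len, reverse=True):
        if text.startswith(prefix + "-") and len(text) > len(prefix) + 1:
            return OrgId(prefix, text[len(prefix) + 1:])
    raise ValueError(f"no recognised registration agency prefix in {text!r}")

org = parse_org_id("XA-COMP-00123456")
print(org.agency, org.local_id)  # XA-COMP 00123456
print(str(org))                  # XA-COMP-00123456
```

No single global list of organisations is needed: each national register stays authoritative for its own identifiers, and the shared convention only fixes how those identifiers are qualified.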
Over the past year our scoping research has explored these and other barriers to joined-up data, seeking out contradictions, attempting to understand the timelines and circumstances that have led to unnecessary disconnects, highlighting the challenges and, occasionally, proposing possible solutions.
- We have explored the different ways in which global institutions subdivide the world into a variety of geopolitical and socioeconomic groupings.
- We have explored the ways in which sectoral classifications used for resource flows are difficult to match with the indicator classifications used to describe development impacts.
- We have monitored the progress (and sometimes the lack of progress) in developing the SDG indicators and their connections with existing indicator databases.
- We have investigated the data overlaps that occur between competing household survey programmes.
- We have also explored a number of issues without success, pursuing narratives that did not add up. Two such forays, into how data is being shared to track the Zika virus and into how satellite imagery data is digitally connected to subnational administrative districts, have not produced any meaningful insights.
The research methodology for this project is rooted firmly in semantic technologies using linked data that enables computers to discover and share data. Standards are grouped into projects and mapped as concept schemes in the JUDS Thesaurus Manager. The Simple Knowledge Organisation System (SKOS) is used to define relationships between concepts both within and across concept schemes. The Thesaurus Manager also hosts a project containing government identifiers maintained by the Natural Resource Governance Institute.
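To make the approach concrete, here is a minimal sketch of how concept schemes and SKOS relationships can be written as RDF triples in the N-Triples format. The SKOS and RDF namespace URIs are the W3C ones; the `example.org` base URI, the scheme names and the concepts are hypothetical, and the sketch does not describe the Thesaurus Manager’s internals.

```python
# SKOS concept schemes written as RDF triples (N-Triples).
# Namespaces are the W3C SKOS/RDF ones; everything under BASE is hypothetical.
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/juds/"

triples = []

def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"

def literal(text: str, lang: str = "en") -> str:
    """Language-tagged string literal with quotes and backslashes escaped."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'

def add(subject: str, predicate: str, obj: str) -> None:
    triples.append((subject, predicate, obj))

# Two hypothetical standards, each modelled as a concept scheme.
SCHEMES = {
    "schemeA": [("health", "Health"), ("health-policy", "Health policy")],
    "schemeB": [("basic-health", "Basic health")],
}
for scheme, concepts in SCHEMES.items():
    add(iri(BASE + scheme), iri(RDF_TYPE), iri(SKOS + "ConceptScheme"))
    for slug, label in concepts:
        concept = iri(f"{BASE}{scheme}/{slug}")
        add(concept, iri(RDF_TYPE), iri(SKOS + "Concept"))
        add(concept, iri(SKOS + "inScheme"), iri(BASE + scheme))
        add(concept, iri(SKOS + "prefLabel"), literal(label))

# Within a scheme: skos:broader links a concept to its parent concept.
add(iri(BASE + "schemeA/health-policy"), iri(SKOS + "broader"), iri(BASE + "schemeA/health"))
# Across schemes: scheme B's concept is narrower than scheme A's "Health".
add(iri(BASE + "schemeA/health"), iri(SKOS + "narrowMatch"), iri(BASE + "schemeB/basic-health"))

ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
print(ntriples)
```

Because every statement is a plain subject-predicate-object line using published vocabularies, any linked-data tool can load the output without knowing anything about the standards it describes.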
SKOS provides the facility to define precisely the relationships between concepts in different standards: mappings can be exact, close, broader, narrower or related. The Thesaurus Manager stores these relationships in both human- and machine-readable formats (Figure 4). Machines are thus able to navigate automatically from one standard to another. A user-friendly interface will be in place soon to provide manual and automatic ‘translation services’ between standards.
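Machine navigation between standards can be sketched with a small, hypothetical mapping table. Per the W3C SKOS Reference, `broadMatch` and `narrowMatch` are inverses of each other, while `exactMatch`, `closeMatch` and `relatedMatch` are symmetric. Each mapping can therefore be indexed in both directions, so either standard can act as the source. Concept codes such as `A:health` are placeholders for two unnamed standards.

```python
# Inverse of each SKOS mapping property: broadMatch/narrowMatch are inverses,
# the other three are symmetric.
INVERSE = {
    "exactMatch": "exactMatch",
    "closeMatch": "closeMatch",
    "relatedMatch": "relatedMatch",
    "broadMatch": "narrowMatch",
    "narrowMatch": "broadMatch",
}

# Hypothetical mappings from standard A's concepts to standard B's.
MAPPINGS = [
    ("A:education", "exactMatch", "B:education"),
    ("A:health", "narrowMatch", "B:basic-health"),
    ("A:health", "narrowMatch", "B:health-policy"),
    ("A:water", "closeMatch", "B:water-sanitation"),
]

def build_index(mappings):
    """Index every mapping in both directions so either standard can be the source."""
    index = {}
    for subject, relation, obj in mappings:
        index.setdefault(subject, []).append((relation, obj))
        index.setdefault(obj, []).append((INVERSE[relation], subject))
    return index

def translate(index, concept, target_prefix):
    """Return (relation, concept) pairs that land in the target standard."""
    return [(rel, c) for rel, c in index.get(concept, []) if c.startswith(target_prefix)]

idx = build_index(MAPPINGS)
print(translate(idx, "A:health", "B:"))
# [('narrowMatch', 'B:basic-health'), ('narrowMatch', 'B:health-policy')]
print(translate(idx, "B:basic-health", "A:"))
# [('broadMatch', 'A:health')]
```

Asking for `A:health` returns two narrower candidates: the mappings describe the relationship accurately but leave the choice of a single target open.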
Joined-up metadata from the Thesaurus Manager can also be exported in formats to drive visualisations such as Chord and Sankey diagrams that simplify the narratives needed to explain relationships between standards.
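One plausible shape for such an export is sketched below. Mappings are aggregated into flows from a source standard, through the type of relationship, to a target standard, producing the nodes-and-links JSON that Sankey libraries such as d3-sankey commonly accept. The mapping table and the output layout are assumptions, not the Thesaurus Manager’s actual export format.

```python
import json
from collections import Counter

# Hypothetical mappings between two standards, "A" and "B".
MAPPINGS = [
    ("A:education", "exactMatch", "B:education"),
    ("A:health", "narrowMatch", "B:basic-health"),
    ("A:health", "narrowMatch", "B:health-policy"),
    ("A:water", "closeMatch", "B:water-sanitation"),
    ("A:energy", "relatedMatch", "B:infrastructure"),
]

def to_sankey(mappings):
    """Aggregate mappings into standard -> relation -> standard flows.

    Assumes source and target standards have different names; otherwise
    the resulting diagram would contain a cycle.
    """
    flows = Counter()
    for source, relation, target in mappings:
        source_std = source.split(":")[0]
        target_std = target.split(":")[0]
        flows[(source_std, relation)] += 1
        flows[(relation, target_std)] += 1

    names = []  # node names in first-seen order; links refer to them by index
    for pair in flows:
        for name in pair:
            if name not in names:
                names.append(name)

    links = [
        {"source": names.index(a), "target": names.index(b), "value": count}
        for (a, b), count in flows.items()
    ]
    return {"nodes": [{"name": n} for n in names], "links": links}

print(json.dumps(to_sankey(MAPPINGS), indent=2))
```

A Sankey diagram built from this data shows at a glance how much of the mapping between two standards is exact and how much relies on close, narrower or merely related concepts.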
The goals, targets and indicators of the SDGs are viewed as three interlinked standards. What are the challenges inherent in creating new standards in the context of their predecessors (the MDGs) and, in the case of the indicators, other existing global statistical databases?
Both the choice of indicator and its associated methodology pose a number of dilemmas to the data expert.
- Do you design a ‘pure’ indicator that is theoretically precise in its accurate interpretation of the target?
- Do you make a pragmatic choice based on the availability of data?
- Do you make a choice based on the potential interoperability of the data?
In an ideal world all three conditions would be met, but this is rarely possible. At the very least, it should be recognised that the designers of indicators must be fully aware of the importance of getting this balance as ‘right’ as possible.
Taking a deeper, pragmatic look at one of the 17 goals, SDG 2, highlights the huge amount of work that still needs to be done to build functional indicators that have a credible methodology and access to both current and historical data.
The biggest lesson learned is that the starting point in the design of standards must involve a review of existing standards. In many instances relevant data sources and methodologies do already exist and need to be more effectively harnessed. An effective monitoring framework can be compiled through conducting a global analysis of all existing indicators, assessing their universality and quality, and co-opting the best and most appropriate into the SDGs.
Small island developing states: a case study of standards in defining supranational regions and groupings
The differences in how international organisations classify small island developing states exemplify a wider problem facing the interoperability of geopolitical data standards. While a joined-up data standards ‘cross-walk’ approach can be used to link many similar terms between the standards, it is the unmappable gaps that make comparing data challenging, and these gaps can have a real economic impact on the countries affected. This plethora of classifications may be navigable for large global institutions, but developing countries may be paying the price for its complexity.
There are three major international household survey programmes in use around the world. These have become increasingly similar – two-thirds of the questions in the two most widely used surveys are either identical or similar enough to be practically comparable – but each contains unique, useful modules.
This presents developing countries with a dilemma. Do they, at great expense, commission multiple surveys, or do they accept that they cannot afford to collect all the data they require?
There are two ways to solve the problem of competing standards: combine them into one, or establish functional links between them. We argue that the interests of developing countries would be best served by the integration of the three programmes, and that until this is possible it is critical that data from different surveys is capable of being joined up.
The existence of competing standards has emerged in different ways within the various pieces of research we have undertaken to date. Three examples highlight this issue.
The Alliance of Small Island States has 44 members and observers. Of these, the UN Office of the High Representative for the Least Developed Countries, Landlocked Developing Countries and Small Island Developing States (UN-OHRLLS) recognises 43. UNESCO’s list of small island developing states (SIDS) includes 39 of them; the UN Conference on Trade and Development (UNCTAD)’s list only 28. The World Bank maintains a category called ‘small states’ that contains 31, and the IMF has a similar list that contains 29. The UN, World Bank (WB), IMF and OECD all maintain different systems of classifying the poorest developing countries. The geographic region into which Afghanistan falls is also open to interpretation.

The institutions responsible for these categorisations have perfectly reasonable explanations as to why they need to divide up the world in different ways to fulfil their specific policy and operational objectives. This argument would hold if these listings were used only for internal administration. They are not. Firstly, global institutions are keen to project the intellectual frameworks that underpin their policies. Secondly, developing countries need to negotiate the overlapping interests of these institutions. Thirdly, an increasing number of data users are attempting to make sense of the world through regional as well as country-level analysis.
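The unmappable gaps between groupings can be surfaced mechanically with set operations. The three groupings and their member codes below are placeholders, not the real SIDS or small-states lists. The sketch shows that only countries common to all lists can be compared like for like, while members unique to one list have no counterpart in the others.

```python
# Placeholder memberships for three hypothetical country groupings.
GROUPINGS = {
    "grouping_1": {"X01", "X02", "X03", "X04", "X05"},
    "grouping_2": {"X01", "X02", "X03", "X06"},
    "grouping_3": {"X02", "X03", "X04"},
}

def compare_groupings(groupings):
    """Return all members, members common to every grouping, and each grouping's unique members."""
    everyone = set().union(*groupings.values())
    common = set.intersection(*groupings.values())
    unique = {
        name: members - set().union(*(m for other, m in groupings.items() if other != name))
        for name, members in groupings.items()
    }
    return everyone, common, unique

everyone, common, unique = compare_groupings(GROUPINGS)
print(sorted(common))  # ['X02', 'X03']
for name, members in unique.items():
    print(name, sorted(members))
# grouping_1 ['X05']
# grouping_2 ['X06']
# grouping_3 []
```

Regional totals computed over any two of these groupings silently cover different countries, which is why a cross-walk of terms alone cannot make them comparable.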
In the early 1970s two new systems of classification were introduced to describe activities related to resource flows. The OECD CRS created a list of purpose codes to describe aid flows. The UN Statistics Division created its Classification of Functions of Government (COFOG) to describe government expenditure. They are very different. The OECD also uses a coding system for countries and regions that is unique to itself. Forty years ago, when data was collected on paper and the connections between aid and national development were weak, these anomalies were understandable. That it is still virtually impossible to access a single country dataset containing both international and domestic resource flows illustrates the obstacles that stand in the way of joined-up data.
The United States Agency for International Development (USAID), UNICEF and the World Bank run household survey programmes around the world that are steadily converging in their content as they seek to meet both country needs and the requirements of global statistical databases. As Table 6 shows, 77% of all questions found in the UNICEF Multiple Indicator Cluster Survey (MICS) are the same as, or sufficiently similar to, questions in the USAID Demographic and Health Survey (DHS).

Data from the DHS and MICS surveys is available for 98 countries. However, not only do the two programmes label their data differently, so that it can only be merged manually, but different versions of the same survey are also coded differently. As Table 7 illustrates, this global dataset is, from a data user’s point of view, effectively siloed across six incompatible standards. The argument for pooling resources is strong. Yet, although the agencies involved have signed a public statement of collaboration, an integrated solution does not appear to be part of the discussion.
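The manual merge described above can be turned into a repeatable step, as in this minimal sketch: one crosswalk per programme and survey version maps source labels onto a common set, after which records can be pooled. All variable names and values are hypothetical; they are not actual DHS or MICS variable codes.

```python
# Hypothetical crosswalk: (programme, version) -> {source label: common label}.
# None of these labels are real DHS or MICS variable names.
CROSSWALK = {
    ("DHS", "round-6"): {"age": "respondent_age", "wq": "wealth_quintile"},
    ("DHS", "round-7"): {"age_years": "respondent_age", "wealth": "wealth_quintile"},
    ("MICS", "round-5"): {"WAGE": "respondent_age", "WQ5": "wealth_quintile"},
}

def harmonise(records, programme, version):
    """Relabel records to the common labels and tag each one with its source.

    Labels with no agreed mapping are dropped here; a production pipeline
    would need to report them rather than discard them silently.
    """
    mapping = CROSSWALK[(programme, version)]
    harmonised = []
    for record in records:
        row = {mapping[label]: value for label, value in record.items() if label in mapping}
        row["source"] = f"{programme} {version}"
        harmonised.append(row)
    return harmonised

pooled = (
    harmonise([{"age": 31, "wq": 2}], "DHS", "round-6")
    + harmonise([{"age_years": 24, "wealth": 5}], "DHS", "round-7")
    + harmonise([{"WAGE": 27, "WQ5": 4, "HH_ONLY": 1}], "MICS", "round-5")
)
for row in pooled:
    print(row)
```

Once each survey version has a crosswalk, adding a new round means writing one more mapping rather than repeating a manual merge.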
We have used the metaphor of language as a simple way of framing the joined-up data debate. If two people who speak different languages want to understand each other they have two options: either to agree to speak the same language or to use an interpreter.
Data standards must accommodate the needs of different stakeholders. The interests of stakeholders in different constituencies are not necessarily the same. The way in which data is organised, classified and described may therefore need to be different.
While we argue that the time is ripe for an integrated household survey, the competition between the three programmes (Living Standards Measurement Study/LSMS, DHS and MICS) has been a key driver in improving overall content, methodologies and quality.
If the content of competing standards allows for accurate one-to-one or one-to-many mappings between all their elements, then it is entirely feasible for machine translation to join them up. Even if the standards contain some unmappable elements, this may still be a pragmatic approach that covers most use cases.
The full range of SKOS mappings adopted by this project is useful for providing an accurate portrayal of the relationships between standards, but does not necessarily allow a machine to decide how to translate a particular concept (for example, where a concept in one standard is similar to two concepts in another). For this reason, we are exploring the use of a new relationship, ‘Best Match’, which tells a machine which of multiple options to use.
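Here is a minimal sketch of how a ‘Best Match’ designation could sit alongside SKOS mappings. `bestMatch` is the project’s proposed relationship and is not part of the SKOS standard. The fallback rule used when no best match is recorded (accept a single exact or close match, otherwise defer to a human) is an assumption of this sketch, and the concept codes are placeholders.

```python
# Hypothetical SKOS mappings from standard A to standard B.
MAPPINGS = {
    "A:education": [("exactMatch", "B:education")],
    "A:water": [("closeMatch", "B:water-sanitation")],
    "A:health": [("narrowMatch", "B:basic-health"), ("narrowMatch", "B:health-policy")],
    "A:energy": [("relatedMatch", "B:infrastructure")],
}

# Hypothetical editorial 'Best Match' choices for concepts with several candidates.
BEST_MATCH = {"A:health": "B:basic-health"}

# Assumed rule: only these relations are safe to follow without a Best Match.
AUTO_RELATIONS = {"exactMatch", "closeMatch"}

def machine_translate(concept):
    """Return a single target concept, or None when a human decision is still needed."""
    if concept in BEST_MATCH:
        return BEST_MATCH[concept]
    usable = [target for relation, target in MAPPINGS.get(concept, [])
              if relation in AUTO_RELATIONS]
    return usable[0] if len(usable) == 1 else None

for concept in MAPPINGS:
    print(concept, "->", machine_translate(concept))
```

Keeping the Best Match choice separate from the SKOS relations preserves the accurate description of each relationship while still giving the machine an unambiguous instruction.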
There is, however, a set of conditions that, if all met, make a compelling case for integrating existing standards into one:
- It improves the ability of data users to extract maximum benefit
- It allows for a wider range of data to be compared
- It is cost effective
- Disruption to existing users of a standard that is being made redundant is not excessive
- Translation tools are available to handle legacy data
- There is sufficient political consensus and will to drive the integration.
Table 8 applies these conditions to the case of household surveys.
The increasing prevalence of accessible data, machine-readable interfaces (application program interfaces/APIs), and technologies such as Linked Open Data in general and SKOS in particular, provide new opportunities for connecting standards. The work that the JUDS project is doing to build an online repository of mapped standards as a shared public good provides a proof of concept, showing how easy it now is for standard-setting institutions to engage in such collaborative exercises.
This work requires three core commitments:
- the collaboration of subject matter experts
- investment in sound technical infrastructures
- the political will to make it happen.
The goals of the 2030 Agenda have been adopted and embraced with almost unanimous consensus and support. Constructing the associated indicators has proved to be far more challenging. The meeting of the UN Statistical Commission in March 2016 recognised that work had only reached “a practical starting point” and that “the development of a robust and high-quality indicator framework is a technical process that will need to continue over time”. The Inter-Agency and Expert Group (IAEG) tasked with developing the indicators wisely devised a tier system to assess the status of the indicators. The preliminary work done on these tiers in the lead-up to the IAEG’s meeting at the end of March 2016 produced a startling picture (Table 9): little more than one-third of the indicators were assessed as having both a credible methodology and data available from most countries.

What makes this more surprising is that the architects were not starting from a blank page. They inherited 15 years of data and experience from the MDGs and could draw on the existing corpus of global statistics. Moreover, a pragmatic approach to SDG 3 (Health) could be to rely more heavily on the World Health Organization’s (WHO) Indicator and Measurement Registry. Similarly, a pragmatic approach to filling the SDG 2 (Hunger) indicators that lack methodologies could be to adopt similar indicators already maintained by the Food and Agriculture Organization (FAO) or others.
These assertions are made purely from a standards point of view. The subject matter experts who have proposed Tier II and III indicators have no doubt excellent reasons for challenging statisticians and development agencies to produce the data that will provide the correct answers to their most pressing questions.
There is, however, a danger that the ‘purist’ approach will lead to elegantly defined indicators for which no data exists, or, even worse, for which elaborate algorithms will be created to forge the appearance of data when in fact none exists.
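The tier assessment discussed above can be expressed as a simple decision rule. This sketch follows the two criteria this paper uses (a credible methodology; data available from most countries). The 50% coverage threshold and the indicators are hypothetical, and the IAEG-SDGs’ formal criteria are more detailed than this.

```python
from collections import Counter
from enum import Enum

class Tier(Enum):
    I = "Tier I"      # methodology exists and data is available from most countries
    II = "Tier II"    # methodology exists, but data is not regularly available
    III = "Tier III"  # no established methodology yet

def assign_tier(has_methodology: bool, country_coverage: float, threshold: float = 0.5) -> Tier:
    """Simplified tier rule; the default coverage threshold is an assumption."""
    if not has_methodology:
        return Tier.III  # data alone cannot lift an indicator out of Tier III
    return Tier.I if country_coverage >= threshold else Tier.II

# Hypothetical indicators: name -> (methodology agreed?, share of countries with data).
INDICATORS = {
    "indicator_a": (True, 0.9),
    "indicator_b": (True, 0.2),
    "indicator_c": (False, 0.0),
}

counts = Counter(assign_tier(*facts) for facts in INDICATORS.values())
share_tier_i = counts[Tier.I] / len(INDICATORS)
print({tier.value: counts[tier] for tier in Tier})
print(f"Tier I share: {share_tier_i:.0%}")  # Tier I share: 33%
```

Written this way, the trade-off is explicit: a ‘pure’ indicator without a methodology stays in Tier III regardless of data, while a pragmatic choice with wide coverage moves straight to Tier I.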
The lessons learned from our research in general, and particularly from monitoring the technical and political work of the IAEG-SDGs, lead us to suggest a work-in-progress checklist for consideration when embarking on a new data standard:
- Is there a clear need and demand?
- Does it duplicate the efforts of or compete directly with standards that already exist?
- Is the design of the architecture and individual elements intellectually, logically and methodologically sound?
- Do components (building blocks) within the standard adopt other existing standards wherever possible?
- Is it designed to ensure comparability and interoperability with other standards?
- Will the data be available through open, sustainable and easily accessible channels?
- Is there political buy-in from the institutions that will drive the standard until it gains acceptance?
- Is there political buy-in from the institutions that need to produce the data?
- Are timelines for development, implementation and adoption realistic?
- Does the data already exist that can feed the standard?
- Is it realistic to expect that new data can be produced to feed the standard?
- Does any historical data exist that can act as a ‘rear view mirror’ for the standard?
A number of major documents published in the past two years dealing with data and development have recognised the importance of joined-up data standards (Box 1). The call has moved beyond the need for open data alone, as typified in the G8 Open Data Charter of June 2013, to the recognition that standards are the essential foundations for comparable and interoperable data.
While there is a clear global recognition of the need to move beyond data silos, there is no clear consensus yet on what a global solution looks like.
The UN Statistical Commission (UNSC), representing the interests of the national statistics offices of all UN members, sits at the centre of a constellation of global and regional standard-setting institutions. It hosts an array of technical initiatives. Their names reflect the complexities of UN governance arrangements. Much of the work of these groups, supported by a relatively small secretariat, the UN Statistics Division (UNSD), is concerned with maintaining and coordinating standards in their specific areas of expertise. They report back to the annual UNSC meetings but do not, on the face of it, appear to collaborate with one another.
The Intersecretariat Working Group on National Accounts, for example, comprises five members: the European Commission, IMF, OECD, UN and World Bank. For over 30 years it has overseen the UN System of National Accounts, the internationally agreed standard set of recommendations on how to compile measures of economic activity.
Over the past couple of decades, the UNSD has facilitated the establishment of informal and ad hoc City Groups, tasked with addressing “selected problems in statistical methods” in an agile and timely manner. According to the UNSD, “It is important to note also that technical expertise and certainly practical experience resides mainly with national statistical offices. It was, therefore, recognised that these informal consultation groups are an innovative way to use country resources to improve and speed up the international standards development process.”
The Washington Group on Disability Statistics, for example, has been tasked with coordinating international cooperation “in the area of health statistics by focusing on disability measures suitable for censuses and national surveys. The aim is to provide basic necessary information on disability which is comparable throughout the world.” The Group’s work in developing a discrete set of internationally comparable indicators has been very successful, with new survey modules containing these disability indicators being agreed within a relatively short space of time.
Other UN departments and agencies (notably WHO and FAO), the World Bank, the IMF, the OECD and Eurostat also play key roles in global standards setting. These international and supranational agencies are part of the Committee for the Coordination of Statistical Activities (CCSA) hosted by the UNSC.
The African official statistical landscape is a good example of where the need for cooperation to enable harmonisation between standards is explicitly recognised. The three pan-African institutions – the African Union Commission, the UN Economic Commission for Africa (UNECA) and the African Development Bank – are tasked with coordinating statistical activities across the continent.
For example, the Committee of the Directors-General of African National Statistical Offices, established by the African Union Commission, now shares its annual conference with the Statistical Commission for Africa established by UNECA. Together they represent the highest decision-making body for statistical coordination and supervise the work of 14 technical working groups. They also oversee the work of the AU Institute for Statistics and the Economic Commission for Africa-led African Group on Statistical Training and Human Resources. An annual Africa Symposium on Statistical Development is also held, which focuses on specific challenges facing African statisticians.
This political commitment to an integrated approach is further reflected at a policy level. The African Charter on Statistics and the Strategy for the Harmonisation of Statistics in Africa are forward-looking documents that rise above the capacity challenges faced by many African countries and lay out road maps towards data quality and compatibility.
Much is made of the role of technology in creating a new level playing field for the whole world, and much has been written about the benefits that internet connectivity, innovative ICT and big data can bring to developing countries. This is all true, but it is still a work in progress. In Uganda, as outlined in Table 10, radio remains the main source of information, and while 75% of adults in Africa own mobile phones, fewer than 20% of them have smartphones with internet connectivity (see Figure 7). The SDGs call for universal and affordable internet access in the least developed countries by 2020, yet a report by the Alliance for Affordable Internet estimates that, at current growth rates, this target will only be reached by 2042.
This digital divide applies to data standards as well. The leading standard setters in the world are global institutions – the UN; the World Bank and IMF; the OECD and Eurostat – based in the developed North and engaged in collecting data to fill global datasets. National statistics offices in developing countries spend much of their limited resources servicing global databases rather than the needs of their own governments.
Developing countries need socioeconomic data disaggregated down to the lowest level of geographic administration in order to plan and monitor the delivery of services. Global databases of comparative statistics only require national aggregates. Household surveys have been a pragmatic and relatively cheap way of building a corpus of global statistics. Their results are also of use to national policy makers, but are of little benefit for subnational planning and service delivery. Uganda, for example, is governed through 112 district administrations, but its household surveys provide data disaggregated by only 10 statistical domains. Because the sample size, and therefore the cost, of a survey rises with the number of domains for which reliable estimates are needed, producing district data of use to local government would require roughly a tenfold increase in cost.
Standards do not generate funding, but the acceptance by standards bodies of a status quo that prioritises globally compatible national statistics over subnational data – global monitoring over national planning – is, to a large extent, responsible for the underinvestment in sustainable data infrastructures in developing countries. Developing countries need the same systems as their more developed counterparts: civil registration systems that record actual births, deaths and causes of death; and health and education management information systems that are used to make health facilities and schools function properly. Data has become political, and standard setters are, whether they choose to be or not, involved in the debate.
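The tenfold figure can be checked with a back-of-the-envelope calculation. The sketch below assumes, for illustration only, that each reporting domain needs roughly the same number of households to reach a given precision and that cost is proportional to the number of households interviewed. The per-domain sample size, the per-household cost and the function names are hypothetical placeholders, not figures from Uganda's surveys. Under those assumptions, moving from 10 domains to 112 districts multiplies the sample, and hence the cost, by 11.2, consistent with the "roughly tenfold" estimate.

```python
# Back-of-the-envelope sketch: survey cost versus number of reporting domains.
# ASSUMPTIONS (illustrative, not from the paper): every domain needs the same
# number of households for a target precision, and cost is linear in the
# number of households interviewed. Both constants below are hypothetical.

HOUSEHOLDS_PER_DOMAIN = 1_000   # hypothetical per-domain sample size
COST_PER_HOUSEHOLD = 50.0       # hypothetical cost per household interviewed


def survey_sample_size(domains: int, per_domain: int = HOUSEHOLDS_PER_DOMAIN) -> int:
    """Total households needed if every domain gets the same sample."""
    return domains * per_domain


def survey_cost(domains: int) -> float:
    """Approximate survey cost under a linear cost-per-household model."""
    return survey_sample_size(domains) * COST_PER_HOUSEHOLD


current_domains = 10  # statistical domains in the existing survey design
districts = 112       # district administrations in Uganda

ratio = survey_cost(districts) / survey_cost(current_domains)
print(f"Current design:  {survey_sample_size(current_domains):,} households")
print(f"District design: {survey_sample_size(districts):,} households")
print(f"Cost multiplier: {ratio:.1f}x")
```

In practice, design effects, fixed costs and differences in district population would change the exact multiplier; the sketch only illustrates why cost scales with the number of units for which estimates are required.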
Multi-stakeholder transparency initiatives have emerged over the past two decades – the International Budget Partnership in 1997, the Extractive Industries Transparency Initiative in 2002, the International Aid Transparency Initiative (IATI) in 2009, OpenCorporates in 2010, the Open Government Partnership in 2011, and the Open Contracting Partnership in 2012. These have led to new voluntary open data standards that borrow ‘building-block’ components from, and require interfaces to, official statistics.
These initiatives do not only borrow from the official world; they also see the benefit of engaging with traditional standard setters in order to improve their content. The relationship between IATI and the OECD is a case in point. When the IATI publishing standard was drafted it borrowed heavily from the reporting guidelines of the OECD Creditor Reporting System (CRS). One key building block inherited from the CRS is the list of purpose codes used to classify functional sectors. An IATI working group has spent the last five years establishing a better method of linking aid flows to recipient country budgets and, in April 2016, the OECD DAC’s Working Party on Development Finance Statistics agreed to include over 50 new codes in the CRS standard – an unprecedented breakthrough in the relationship between a voluntary multi-stakeholder standard and a mandatory reporting system used by the world’s wealthiest countries.
At a national level the dividing line between official and ‘non-official’ statistics is also becoming increasingly blurred. National statistics offices are beginning to reach out to other data producers and users while civil society, the private sector and academia are looking to more inclusive relationships with government.
In September 2015 the UN General Assembly adopted the 17 SDGs together with their accompanying 169 targets “to stimulate action over the next 15 years in areas of critical importance for humanity and the planet”. Some 230 indicators (still not finalised at the time of writing) are being designed so that progress towards the goals can be monitored.
The adopted document recognised the critical role of data, and, in particular, the need to overcome considerable challenges:
“We will support developing countries, particularly African countries, least developed countries, small island developing states and landlocked developing countries, in strengthening the capacity of national statistical offices and data systems to ensure access to high-quality, timely, reliable and disaggregated data. We will promote transparent and accountable scaling-up of appropriate public–private cooperation to exploit the contribution to be made by a wide range of data, including earth observation and geospatial information, while ensuring national ownership in supporting and tracking progress.”
2030 is an important political target to focus the minds and activities of the world. It would, however, be naïve to assume that the data challenges facing all developing countries – for example, fully functioning civil registration and vital statistics systems producing credible data on the causes of death – can be solved within 15 years. There is a danger that, in satisfying the short-term data needs of short-term political goals, longer-term investments in sustainable data systems will be overlooked. There is, however, increasing evidence that this will not happen.
In July 2012 the UN Secretary-General created a High Level Panel of Eminent Persons to make recommendations that led to the 2030 Agenda. Its report, published in May 2013, called for “a data revolution for sustainable development, with a new international initiative to improve the quality of statistics and information available to citizens.”
In response to this, in August 2014 the UN Secretary-General established an Independent Expert Advisory Group (IEAG) on the Data Revolution for Sustainable Development to provide him with inputs to shape “an ambitious and achievable vision” for a future development agenda.
Within three months the group had published its report, A World That Counts, which recommended establishing a UN-led “Global Partnership for Sustainable Development Data”.
The IEAG worked on a tight deadline in order to feed into the UN Secretary-General’s synthesis report, The road to dignity by 2030: ending poverty, transforming all lives and protecting the planet, published in December 2014. The report recommended that “under the auspices of the Statistical Commission of the United Nations, a comprehensive programme of action on data be established. This includes the building of a global consensus, applicable principles and standards for data, a web of data innovation networks to advance innovation and analysis, a new innovative financing stream to support national data capacities and a global data partnership to promote leadership and governance.”
Well before the publication of these two reports, the UNSD was busy considering its response to the High Level Panel’s recommendations. At the end of February 2014 a seminar was held on ‘Managing the Data Revolution’. The objective was “to clearly emphasise the strategic necessity of modernising the national statistical systems in order to respond not only to the regular requests for sound official statistics, but also to emerging needs.” A few days later the UNSC itself adopted a resolution that led to the convening of the Global Conference on a Transformative Agenda for Official Statistics in January 2015.
A dominant theme throughout the conference, and notably in the outcome document, was integration and coordination as the approach through which official statistics can make the data revolution happen.
The conference also recognised the importance of stakeholders beyond the world of official statistics. A working group identified the challenge of:
“how to move from a silos system to an integrated system and who drives the process. The head of the [national statistical office] NSO was considered an important and leading figure in this process: the head of the NSO should drive the process and engage his staff within the institution and stakeholders outside its institution.”
In March 2015, the UN Statistical Commission supported the formation of:
“a new high-level group (HLG) to provide strategic leadership for the SDG implementation process; such a group should consist of national statistical offices, and regional and international organisations as observers operating under the auspices of the Statistical Commission. The high-level group is tasked with promoting national ownership of the post-2015 monitoring system and foster capacity-building, partnership and coordination for post-2015 monitoring.”
As the above brief history shows, what brings the Transformative Agenda for Official Statistics, the Data Revolution and the 2030 Agenda together is that they all recognise the need for clearer coordination between stakeholder groups at the international level. In addition to the efforts made by these UN-led processes to develop coordination mechanisms such as the HLG, the Global Partnership for Sustainable Development Data (GPSDD) was launched in September 2015, having grown out of a broad base of stakeholders responding to the A World That Counts report.
The HLG and GPSDD complement each other. Following the announcements made by the HLG in March 2016 of a Global Action Plan for Data and a World Data Forum, there is now a real opportunity for the two processes to develop closer ties.
The principles of comparability and interoperability are now a given. Since the end of 2014 every major global initiative relating to data has included a commitment to the principle of interoperability.
There is an increasing awareness that open data is not an end in and of itself: unless it can be converted into contextualised, usable information it has no benefit. Context invariably involves joining up different data, and if those data are not compatible they cannot be joined up.
Global institutions are beginning to recognise the value of opening up. National statistics offices as well as regional and global statistics bodies increasingly recognise the importance of embracing all producers and users of data. Parts of the private sector are willing to explore how their data can contribute to the public good. Academia and civil society are beginning to seek partnerships with government. Multi-stakeholder initiatives with representative leaderships can unify divergent interests.
There is currently a lack of mutual understanding and appreciation between many of the advocates of the Data Revolution and the Transformative Agenda for Official Statistics. They are in fact two sides of the same coin, and a great opportunity exists for their language and focus to converge.
Most importantly the aspirations of the 2030 Agenda have, for the moment, captured the imagination of the world. This optimism will not last forever. The moment needs to be seized.
At the top of the evolving data ecosystem is a coherent suite of joined-up thinking ready to be turned into effective programmes. At the bottom are a number of highly proficient technical working groups in which expert professionals work on the nuts and bolts, but often pursue their specialisations in isolation from one another. There appears to be a disconnect between the top and the bottom, and operationalising new policy consensuses takes longer than it needs to. Standards bodies are, correctly, cautious and slow-moving by nature, and not naturally suited to revolution.
Notwithstanding the evidence presented in the first two chapters of this paper, there is much to be hopeful about. The number of technical working groups involved in the coordination of standards is testament to the energy that is being thrown into this field. Technologies are allowing machines to speak more easily to one another, to understand their different languages and to translate between them. Our project has demonstrated the utility of semantic mappings and there are many similar initiatives taking root. Discrete pieces of work being done by some of the UNSD City Groups show how focused approaches can have far-reaching successes.
There is no longer any reason for new standards to be created that unnecessarily duplicate existing standards, or that do not build where possible on existing components and methodologies.
All global and regional data standards bodies, official or otherwise, should commit to a simple undertaking: new standards must be joined up.
We need an integrated leadership body to drive the Data Revolution and the Transformative Agenda forward at speeds commensurate with both the aspirations and urgency of current global ambitions. As recognised by both A World That Counts and the Secretary-General’s synthesis report, this should be led by the UN. As recognised by all the policy instruments discussed in this paper, it should be a multi-stakeholder partnership.
A solution involving the High-level Group for Partnership, Coordination and Capacity-Building for Statistics for the 2030 Agenda and the Global Partnership for Sustainable Development Data would appear fitting.
This solution should include a specific commitment and programme to actively promote joined-up data standards throughout the ecosystem.
Solving the problems caused by existing incompatibilities between data standards will involve a journey with two destinations. The more distant one involves thorough rationalisation between standards, which may cause temporary disruptions in data flows, require changes in the culture of producers and users, and tread on the toes of those resistant to change. In many cases this process may well be deemed permanently unfeasible.
The closer destination can, however, always be reached, if only the travellers are prepared to make the effort. We urgently need all standard setters to recognise the benefits available to data users when data from different standards is ‘cross-walked’ through a translation machine.
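At its simplest, a crosswalk is a lookup table from codes in one classification to codes in another. The sketch below is a minimal illustration, not the JUDS Thesaurus Manager’s implementation: the code lists, the `CROSSWALK` table and the `crosswalk_totals` function are hypothetical, and splitting one-to-many mappings equally is a simplifying assumption that a real crosswalk would replace with documented allocation rules.

```python
from collections import defaultdict

# Hypothetical crosswalk: each code in a source classification maps to one or
# more codes in a target classification. All codes are invented for
# illustration; a real crosswalk would be agreed by the standard setters.
CROSSWALK: dict[str, list[str]] = {
    "SRC-110": ["TGT-09"],            # one-to-one mapping
    "SRC-120": ["TGT-07"],            # one-to-one mapping
    "SRC-140": ["TGT-06", "TGT-07"],  # one-to-many: amount must be split
}


def crosswalk_totals(records, crosswalk):
    """Re-express (code, amount) records in the target classification.

    Amounts under one-to-many mappings are split equally across target codes
    (a simplifying assumption). Source codes missing from the crosswalk are
    returned separately so that gaps stay visible instead of being dropped.
    """
    totals = defaultdict(float)
    unmapped = []
    for code, amount in records:
        targets = crosswalk.get(code)
        if not targets:
            unmapped.append(code)
            continue
        share = amount / len(targets)
        for target in targets:
            totals[target] += share
    return dict(totals), unmapped


records = [("SRC-110", 100.0), ("SRC-140", 50.0), ("SRC-999", 10.0)]
totals, unmapped = crosswalk_totals(records, CROSSWALK)
print(totals)    # {'TGT-09': 100.0, 'TGT-06': 25.0, 'TGT-07': 25.0}
print(unmapped)  # ['SRC-999']
```

Keeping unmapped codes visible, rather than discarding them, is a deliberate design choice: gaps in a crosswalk are themselves useful information for the standard setters who maintain it.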
This is particularly urgent for the many disparate standards governing data that now need to be mapped to, and compared with, the SDGs’ goals, targets and indicators.
- Is there a clear need and demand for joined-up data standards?
- What, if any, are the main interoperability challenges that you experience in your sector/work?
- What do you think is the most effective technical approach to enabling interoperability between international data standards?
- Should similar international standards be integrated? What would be the benefits/drawbacks of this?
- Is there a need for a globally recognised checklist for new data standards to ensure interoperability is built into new standards from the outset? If so, what should it include?
- What are the main changes you would like to see at the international level when it comes to the governance of standard-setting processes?
- Should there be a global strategy for the harmonisation of statistics, based on the African example?
- How do you think that specialised working and expert groups working on specific standards or thematic areas could coordinate their work more closely?
- How can formal UN and other official statistics bodies better share knowledge and experience with multi-stakeholder initiatives at the global, regional and/or national levels?
- What do you consider to be the ‘next steps’ in transforming international commitments on the need for interoperability into workable technical solutions?
- What do you think is the most effective way of joining up the standards that underpin the SDG framework?
 Russell A (2014) “Open Standards and the Digital Age”. Cambridge University Press. www.amazon.com/Open-Standards-Digital-Age-Enterprise/dp/1107612047.
 Data.Gov (2016) “Analysis of visitor metrics”. Available at: https://catalog.data.gov/dataset/data-gov-visitor-metrics, accessed on: 22 June 2016
 Author’s analysis based on Data.gov.uk (2016) and Data.Gov (2016), op. cit.
 From unpublished research by Development Initiatives on international and domestic resource flows
 Construction Sector Transparency Initiative (CoST), Extractive Industries Transparency Initiative (EITI), Global Initiative for Fiscal Transparency (GIFT), International Aid Transparency Initiative (IATI) and Open Contracting Partnership
 Joined-Up Data Standards (2016) “Small island developing states: a case-study of standards in defining supranational regions and groupings”. Accessed at: http://juds.joinedupdata.org/discussion-papers/paper-3-sids/
 Alliance of Small Island States 2016 membership list. Accessed on 5 July 2016 at: http://aosis.org/about/members/
 UNOHRLLS (2016) membership list. Accessed on: 5 July 2016 at: http://unohrlls.org/about-sids/country-profiles/
 UNESCO (2016) SIDS list. Accessed on: 5 July 2016 at: www.unesco.org/new/en/natural-sciences/priority-areas/sids/about-unesco-and-sids/sids-list/
 UNCTAD (2016) SIDS list. Accessed on: 5 July 2016 at: http://unctad.org/en/Pages/ALDC/Small Island Developing States/SIDS-map.aspx
 World Bank (2016) Small states list. Accessed on: 5 July 2016 at: www.worldbank.org/en/country/smallstates
 IMF (2016) Small states list. Accessed on: 5 July 2016 at: www.imf.org/external/np/pp/eng/2013/022013.pdf
 Fialho D and van Bergeijk PAG (2014) “Noodles and Spaghetti: why is the developing country differentiation landscape so complex?” Accessed at: wp.peio.me/wp-content/uploads/PEIO8/Fialho,%20van%20Bergeijk%2004.09.2014.pdf
 Joined-Up Data Standards (2015) Afghanistan entry in the Supranational Regions and Groupings Project in the JUDS Thesaurus Manager. Available at: http://joinedupdata.org/geo-pol/af.html
 Joined-Up Data Standards (2015) Chord diagram created from mappings in the JUDS Thesaurus Manager. http://joinedupdata.org/Sectors/cofog.html
 Joined-Up Data Standards (2016) “Household surveys: do competing standards serve country needs?” Available at: http://juds.joinedupdata.org/household-surveys-do-competing-standards-serve-country-needs/
 UN Statistics Division (2015) “The concept note on The Transformative Agenda for Official Statistics repeatedly calls for the mainstreaming of an integrated household survey programme”. Available at: http://unstats.un.org/unsd/nationalaccount/workshops/2015/NewYork/Outcome.pdf
 From research by data scientists at Development Initiatives working on global poverty data
 UN Statistical Commission (2016) “Report on the forty-seventh session”. Available at: http://unstats.un.org/unsd/statcom/47th-session/documents/Report-on-the-47th-session-of-the-statistical-commission-E.pdf
 IAEG-SDG (2016) “Provisional Proposed Tiers Global SDG Indicators”. Available at: http://unstats.un.org/sdgs/files/meetings/iaeg-sdgs-meeting-03/Provisional-Proposed-Tiers-for-SDG-Indicators-24-03-16.pdf
 The way in which maternal mortality is calculated in many countries is a case in point. An algorithm that employs GDP and the fertility rate is used in the absence of real data.
 UN Statistics Division (2016) City Groups. Available at: http://unstats.un.org/unsd/methods/citygroup/index.htm
 UN Statistics Division (2016) Washington Group on Disability Statistics. Available at: http://unstats.un.org/unsd/methods/citygroup/washington.htm
 UN Economic Commission for Africa (2014) Joint ECA-AUC Statistical Commission for Africa and Committee of Directors General of National Statistics Offices StatCom-Africa/CoDG. www.uneca.org/statcomdodgggim-africa
 UN Economic Commission for Africa (2014) “First Joint Session of the Committee of Directors General of National Statistics Offices and the Statistical Commission for Africa: Draft annotated agenda”. Available at: www.uneca.org/sites/default/files/uploaded-documents/Statistics/statcom2014/annotated_agenda_statcom_final_en.pdf
 UN Economic Commission for Africa (2014) “African Group on Statistical Training and Human Resources”. Available at: http://www.uneca.org/sites/default/files/uploaded-documents/Statistics/statcom2014/agrost_report_-_statcom_iv_-_2014_en.pdf
 UN Economic Commission for Africa, 2010, Strategy for Harmonisation of Statistics in Africa (SHaSA). Available at: www.afdb.org/fileadmin/uploads/afdb/Documents/Publications/AfDB,%20SHaSA_web.pdf
 Uganda Census (2014) “Percentage distribution of Main Source of Information in the Household, 2002–2014”. Available at: www.ubos.org/onlinefiles/uploads/ubos/NPHC/NPHC%202014%20FINAL%20RESULTS%20REPORT.pdf
 UN General Assembly (2015) “Transforming our world: the 2030 Agenda for Sustainable Development. Resolution adopted by the General Assembly on 25 September 2015”. Available at: www.un.org/ga/search/view_doc.asp?symbol=A/RES/70/1&Lang=E
 Anderson, B (2015) “What to do with one and a half billion dollars”. Available at: http://devinit.org/#!/post/what-to-do-with-one-and-a-half-billion-dollars
 UN General Assembly (2014) “Synthesis report of the Secretary-General on the post-2015 sustainable development agenda”. Available at: www.un.org/ga/search/view_doc.asp?symbol=A/69/700&Lang=E
 UN General Assembly (2014) “Synthesis report of the Secretary-General on the post-2015 sustainable development agenda”. Available at: www.un.org/ga/search/view_doc.asp?symbol=A/69/700&Lang=E
 UN Statistical Commission (2015) Global Conference on a Transformative Agenda for Official Statistics: “Towards a Strategic Framework for Statistics in Support of the Post-2015 Development Agenda”. Available at: http://unstats.un.org/unsd/nationalaccount/workshops/2015/NewYork/NY_D2.pdf
 UN Statistical Commission (2014) “Decision 45/103 – Programme review: broader measures of progress. Report on the forty-fifth session of the UN Statistical Commission”. Available at: http://unstats.un.org/unsd/statcom/45th-session/documents/statcom-2014-45th-report-E.pdf
 Joined-Up Data Standards (2016) Analysis of the outcome document for occurrences of the words ‘integrate’ (107 times) and ‘coordinate’ (114 times). Source of document: http://unstats.un.org/unsd/nationalaccount/workshops/2015/NewYork/Outcome.pdf
 UN Statistics Division and the Statistical Office of the EU (Eurostat) (2015) “Proceedings of the Global Conference on a Transformative Agenda for Official Statistics: Outcomes and Summaries of Sessions” Session 3, Group 2. Available at: http://unstats.un.org/unsd/nationalaccount/workshops/2015/NewYork/Outcome.pdf
 UN Statistical Commission (2015) “Decision 46/101. Report on the 46th session of the UN Statistical Commission”. Available at: http://unstats.un.org/unsd/statcom/46th-session/documents/statcom-2015-46th-report-E.pdf