Data Analysis / ICT4D

Big Data, Big Challenges

There has been a common theme discussed in international development and humanitarian aid circles for decades: how do we leverage new technologies to improve our program effectiveness? The answer is twofold: developing appropriate technology to monitor programs, and evaluating the data those programs produce. Many NGOs, foundations, private sector partners, research institutions, and government agencies are embracing the technology side of the equation. New apps, ICT4D tools, web-based platforms, and GIS mapping initiatives are launched every day to improve how we work in the aid world. But even with big data search algorithms and petabyte-scale cluster computing at our fingertips, we are still struggling with the second half of the solution: aggregating the data and evaluating it.

Whenever you get a group of data wonks in a room, the conversation inevitably turns to the same questions: Who has new data? How big is your dataset? How does it compare to mine? And most importantly, why isn't there a central catalog of everyone's data so I can access it whenever I get a new research idea?

This isn’t a conversation that only happens at WaSH Happy Hours. Every field of research on the cutting edge of development struggles with these same questions. Humanitarian aid research just has an extra handicap to play with. Medical conglomerates are willing to foot the bill for big data costs of biological and genome research. Google and Apple have no spending cap as they compete to build the ultimate datasets of human social marketing. But who is willing to foot the bill to catalog rural water systems in countries with a GDP smaller than the annual revenue of Amazon.com?

There are some foundations and OECD donor agencies contributing to international development data management projects, but WaSH is still lagging behind. Initiatives such as WASHFunders, which tracks private sector WaSH investment, and AKVO OpenAid, a partnership to build a digital library of IATI datasets, are making an effort to build open data resources. Recent E-Discussions hosted by the RWSN have led participants to share water point mapping datasets. The challenge is taking these projects to a much larger scale. The aggregate size of the WaSH databases open to the public is a few hundred gigabytes, perhaps a few terabytes at most. In the grand scheme of other Big Data projects, we are a little league team from Kansas competing against datasets funded like the New York Yankees.

The task of scaling from terabytes to petabytes will obviously require financial investment, but more importantly it will take collaboration from the international, national, and local stakeholders involved in managing systems and providing WaSH services. Local stakeholders are the gatekeepers of the data, and yet they have the most difficult path to sharing their knowledge.

The first step in unleashing the power of Big Data in WaSH will be modernizing the field data we already have. Smartphones are becoming more common in developing countries and are great tools for collecting new data; digitizing data from the past 40 years is unfortunately not as hip. Time and money will have to be invested in scanning paper reports and uploading local files to cloud servers. Ask any regional water manager about functionality data and they will most likely point to a stack of binders and notebooks gathering dust on their shelves. Incentives need to be put in place to turn this data into a digital format so this rich history and local context is not lost. Aggregating this data will allow us to look beyond individual efforts and examine the impact of WaSH across entire community clusters, countries, and continents.
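Once those binders are digitized, even a minimal script can roll individual survey rows up into district-level functionality rates. The sketch below is illustrative only: the field names (`district`, `status`) are assumptions, not a real survey schema.

```python
# Hypothetical sketch: aggregating digitized water-point records by district.
# The "district" and "status" field names are assumed for illustration.
from collections import defaultdict

def functionality_by_district(records):
    """Return the share of functional water points per district."""
    totals = defaultdict(int)
    functional = defaultdict(int)
    for rec in records:
        totals[rec["district"]] += 1
        if rec["status"] == "functional":
            functional[rec["district"]] += 1
    return {d: functional[d] / totals[d] for d in totals}

# Example: three digitized survey rows (made-up place names)
records = [
    {"district": "Kisoro", "status": "functional"},
    {"district": "Kisoro", "status": "broken"},
    {"district": "Mbale",  "status": "functional"},
]
print(functionality_by_district(records))  # {'Kisoro': 0.5, 'Mbale': 1.0}
```

The same pattern scales from a handful of rows to millions once the paper records exist in any machine-readable form.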

The second (and more difficult) step will be sharing and managing the data. Trying to understand rural water usage, treatment technologies, sustainability of schemes, and other system-related engineering metrics from the amount of information potentially available could keep researchers busy for years. However, this is only one small part of what WaSH research is about. The end goal of any international aid intervention is to improve the lives of people. These beneficiaries are more than just data points, and incredible care must be taken in how we curate any dataset that includes personal, financial, or private information. The complexities of working with human subjects will make database security, terms of use, and reporting requirements a priority. Academic institutions already have these systems for their own internal research and siloed data, but we will need to extend these concepts across organizations to capture the quantity of data we need. The greatest question lies in who will bear the burden of protecting the interests of the communities in which we work, and what protocols must be put in place to convince data-holders that sharing their knowledge is a safe investment.
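One concrete curation step before any dataset leaves an organization is stripping direct identifiers and replacing household keys with pseudonyms. The sketch below is a simplified illustration, not a full anonymization protocol: the field names and the salt handling are assumptions, and real deployments would need proper key management and a formal disclosure-risk review.

```python
# Hypothetical sketch: pseudonymizing a household record before sharing.
# Field names and salt handling are assumed for illustration only.
import hashlib

SENSITIVE_FIELDS = {"name", "phone"}  # direct identifiers to drop entirely

def pseudonymize(record, salt):
    """Drop direct identifiers and replace the household ID with a salted hash."""
    cleaned = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    digest = hashlib.sha256((salt + record["household_id"]).encode()).hexdigest()
    cleaned["household_id"] = digest[:12]  # short pseudonymous key
    return cleaned

row = {"household_id": "HH-0042", "name": "A. Example",
       "phone": "000-0000", "tariff_paid": True}
print(pseudonymize(row, salt="project-secret"))
```

Keeping the salt private within the data-holding organization lets records from repeat visits be linked without ever exposing who the household is, which is exactly the kind of guarantee that can make sharing feel like a safe investment.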

Debates over Big Data concepts like common indicators, and over who is responsible for monitoring and collecting data, have already occupied experts for too long. My hope is that we can move beyond these obstacles and make real strides in changing the way we analyze data.

We have the technology for meta-analysis, now let's go get the data!
