There has been a common theme discussed in international development and humanitarian aid circles for decades: How do we leverage new technologies to improve our program effectiveness? The answer to this question is twofold: developing appropriate technology to monitor programs, and evaluating the data. Many NGOs, foundations, private sector partners, research institutions, and government agencies are embracing the technology side of the equation. New apps, ICT4D tools, web-based platforms, and GIS mapping initiatives are launched every day to improve how we work in the aid world. But even with Big Data search algorithms and petabyte-scale cluster computing resources at our fingertips, we are still struggling to address the second half of the solution: aggregating the data and evaluating it.
Whenever you get a group of data wonks in a room, we inevitably arrive at the same conversation: Who has new data? How big is your data set? How does it compare to my data set? And most importantly: why isn’t there a central catalog of everyone’s data so I can access it whenever I get a new research idea?
This isn’t a conversation that only happens at WaSH Happy Hours. Every field of research on the cutting edge of development struggles with these same questions. Humanitarian aid research just has an extra handicap to play with. Medical conglomerates are willing to foot the bill for big data costs of biological and genome research. Google and Apple have no spending cap as they compete to build the ultimate datasets of human social marketing. But who is willing to foot the bill to catalog rural water systems in countries with a GDP smaller than the annual revenue of Amazon.com?
There are some foundations and OECD donor agencies that are contributing to international development data management projects, but WaSH is still lagging behind. Initiatives such as WASHFunders, which tracks private sector WaSH investment, and AKVO OpenAid, a partnership to build a digital library of IATI data sets, are making an effort to build open data resources. Recent E-Discussions hosted by the RWSN have led to the sharing of water point mapping datasets by participants. The challenge is taking these projects to a much larger scale. The aggregate size of the WaSH databases that are open to the public is roughly a couple hundred gigabytes, possibly a few terabytes at most. In the grand scheme of other Big Data projects, we are a little league team from Kansas competing against datasets that are funded like the New York Yankees.
The task of scaling from terabytes to petabytes will obviously require financial investment, but more importantly it will take collaboration from the international, national, and local stakeholders involved in managing systems and providing WaSH services. The local stakeholders are the gatekeepers of data, and yet they have the most difficult path to sharing their knowledge.
The first step in unleashing the power of Big Data in WaSH will be modernizing the field data that we already have. Sexy smartphones are becoming more common in developing countries and are great tools for collecting new data; digitizing data from the past 40 years is unfortunately not as hip. Time and money will have to be invested in scanning paper reports and uploading local files to cloud servers. Ask any regional water manager about functionality data and they will most likely point to a stack of binders and notebooks gathering dust on their shelves. Incentives need to be put in place to turn this data into a digital format so that its rich history and local context are not lost. Aggregating this data will allow us to look beyond individual efforts and see the impact of WaSH across entire community clusters, countries, and continents.
Big Data analysis has already kept experts debating for too long over concepts like common indicators and who is responsible for monitoring and collecting data. My hope is that we can move beyond these obstacles and make strides to change the way we analyze data.
We have the technology for meta-analysis; now let’s go get the data!