Rethinking Data: Part 2- Graphing “Open” Data

a continuation from Rethinking Data: Part 1

Two weeks ago I posted a blog about open data in WaSH: the challenges in gaining sector buy-in and our responsibility to promote open data (read it here: The Wild-Card: Open Data). After the Akvo Track-Day event, I had a great conversation with Henry Jewell about what we do with open data once we have it. Once it’s on the internet, is anyone really using it? What are they using it for? Is it making an impact?

I realized that while I promote open data, I do very little with it. I’ve used some open data to generate statistics for reports and infographics. I’ve even used some as a primary research data source. But the real point of open data is to improve public understanding of a given topic and influence future policy/decision making, from the national level to the community level. And I haven’t used it for that.

Which brings us to today’s post. It’s time for an open data project! My research question: Within the private sector, who is funding who in WaSH? How do all of the various organizations connect via different funding streams? And most importantly, how do they compare in size and scope of funding given/received?

Gathering the data.

Surprisingly, or maybe not so surprisingly depending on your experience with it, open data comes in a lot of shapes and forms. Data is only truly open when it is easily extractable and in a user-friendly format. Data is often (sadly) promoted as open when it’s locked in a .pdf or hidden within Flash objects. Let’s lay out a ground rule: if an API can’t read it, it’s not really “open”.

The data I decided to use was what I like to call “semi-open”. WASHFunders.org has a great dataset that the Foundation Center has curated, tracking grants and financial distributions from donors to recipients. It is hosted on their site in a great map format and can be viewed in a table format as well. However, the table is segmented into pages and there is no option to export any of the data. When I contacted the site manager, I was informed that the raw data could be accessed for a nominal fee. In an effort to harness the data in its public-facing format, I decided to copy and paste the data from each table page (194 pages, to be exact). While time-consuming, this gave me the raw data set I needed.

Data? Check.

Analyzing the data.

I’ve been diving into graphing databases and tools, and what better way to put some of these skills to work than in a project that combines open source with open data? I stumbled upon a graphing tool called Gephi a few weeks ago and have been tinkering with it on random data sets. Gephi allows users to upload edge and node tables, run layout algorithms, calculate all sorts of fun stats (connectedness, centrality, graph density, etc.), and put together very visually appealing graphs.
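For readers wondering what a couple of those stats actually measure, here is a minimal sketch in plain Python. It is not Gephi's implementation, and the funder and recipient names and amounts are made up for illustration. It treats each funder-to-recipient relationship as one directed edge, computes graph density as edges divided by the number of possible directed edges, and computes a simple normalized degree centrality (incoming plus outgoing links, divided by the number of other nodes).

```python
from collections import defaultdict

# Hypothetical edge list: (funder, recipient, grant amount in USD).
EDGES = [
    ("Funder A", "Recipient X", 500_000),
    ("Funder A", "Recipient Y", 250_000),
    ("Funder B", "Recipient X", 100_000),
]


def graph_stats(edges):
    """Return (density, degree_centrality) for a directed graph."""
    nodes = set()
    pairs = set()               # distinct (source, target) links
    degree = defaultdict(int)   # in-degree + out-degree per node

    for source, target, _amount in edges:
        nodes.update((source, target))
        if (source, target) not in pairs:
            pairs.add((source, target))
            degree[source] += 1
            degree[target] += 1

    n = len(nodes)
    if n < 2:
        return 0.0, {node: 0.0 for node in nodes}

    # A directed graph with n nodes has at most n * (n - 1) edges.
    density = len(pairs) / (n * (n - 1))
    centrality = {node: degree[node] / (n - 1) for node in nodes}
    return density, centrality


density, centrality = graph_stats(EDGES)
print(f"density = {density:.2f}")
for name, score in sorted(centrality.items()):
    print(f"{name}: {score:.2f}")
```

A node with a high score here is simply one that is linked to many others, which is one way to read "who is connected to whom" before looking at the amounts involved.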

Tool? Check.

Graphing the data!

After some scrubbing, merging, and re-formatting, I was able to build two CSVs with edge and node information needed for Gephi.  A combination of Force Atlas and Noverlap layouts gave the base for my graph.  There was still a decent amount of overlap in labels, so a couple hours of manual tweaking took place to finalize the graph format.
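The merging step can be sketched in a few lines of Python. This is an illustration under assumptions rather than the exact process used here: the grant rows are invented, and `TotalFunding` is a made-up attribute column name. The `Id`/`Label` and `Source`/`Target`/`Type`/`Weight` headers are the ones Gephi's spreadsheet importer recognizes. Multiple grants between the same pair are summed into one weighted edge, and each node carries the total funding it gave or received so it can drive node size.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw rows copied from the paged table: (donor, recipient, USD).
GRANTS = [
    ("Funder A", "Recipient X", 500_000),
    ("Funder A", "Recipient X", 250_000),
    ("Funder B", "Recipient X", 100_000),
    ("Funder B", "Recipient Y", 50_000),
]


def build_tables(grants):
    """Collapse grant rows into node rows and weighted edge rows."""
    edge_weight = defaultdict(int)  # (donor, recipient) -> summed amount
    total = defaultdict(int)        # organization -> funding given or received

    for donor, recipient, amount in grants:
        edge_weight[(donor, recipient)] += amount
        total[donor] += amount
        total[recipient] += amount

    nodes = [(name, name, total[name]) for name in sorted(total)]
    edges = [(s, t, "Directed", w) for (s, t), w in sorted(edge_weight.items())]
    return nodes, edges


def to_csv_text(header, rows):
    """Render a header and rows as CSV text."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


nodes, edges = build_tables(GRANTS)
nodes_csv = to_csv_text(["Id", "Label", "TotalFunding"], nodes)
edges_csv = to_csv_text(["Source", "Target", "Type", "Weight"], edges)
print(nodes_csv)
print(edges_csv)
```

Saving the two strings as `nodes.csv` and `edges.csv` gives files that can be loaded as a node table and an edge table.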

And voilà! Here is the graph:

Click here for an interactive zoom version.

 

[Image: Private WASH Funding_final]

Warning: Technical Part Beginning

This is not how I originally wanted to display my graphing work. As you can see, it is very detailed and rather difficult to view in its current format. I was hoping to put out something interactive (like this: http://diseasome.eu/map.html) that allowed for more data to be displayed with each node. Unfortunately, it is beyond my programming skills to put out something like Diseasome’s graphing map. My second option was to use SeaDragon/Zoom.it, which I was able to do successfully, but WordPress has a strict “No JavaScript” policy for free blogs. Since I currently do not have a place to host the needed files and a web page that allows me to embed JS, I am stuck with the static image above and a link to zoom.it (although it’s a lower-res PNG output and not as high quality as I’d like). If any of my tech-savvy readers have suggestions for getting an interactive version online, please leave me a comment/suggestion.

11 thoughts on “Rethinking Data: Part 2- Graphing “Open” Data”

  1. Ben, thanks for posting this, hope you or someone else can add more interactive elements.

    I think readers should be aware that there is a strong US bias in the selection of private funders. Missing for example is HSBC (the bank involved in drugs money laundering), which launched a US$ 100 million, five year partnership with WaterAid, WWF and the Earthwatch Institute last year (see http://www.source.irc.nl/page/72419). Another large non-US private funder is Dubai Cares (http://www.dubaicares.ae/), which supports a global WASH in Schools program.

    Secondly, a number of “recipients” like charity:water, BRAC USA and the Coca-Cola Africa Foundation merely channel funds through to other organisations which actually implement projects.

    Cor

    • Cor-

      I 100% agree. The data came from WASHFunders.org, which is US based (I imagine that is largely responsible for the bias). I had no idea that so much of the focus was on US sources until I was knee-deep in the data.

      Unfortunately, open funding data that is aggregated in one place is hard to come by. I’ve been trying to add OECD funding to the graph as well, to also show the relatively small amount that is given through private funding compared to government spending. The only easily accessible data stream that separates out government funding to specific organizations for WASH is from DGIS (thanks to openaid.nl). The DFID and USAID open data systems are clunkier, and it is hard to extract pertinent information (like project-level budgets) from them.

      If there are other good resources available to improve & expand this graph, I’d love to include them. Feel free to post your ideas or links to other data sources!

  2. Very nice mapping exercise, even though it can be improved on if data becomes available in a format that allows it to be used. I just had a clarifying question. Are the surfaces (e.g. mm^2/USD) of the circles for providing funding the same as for receiving funding? It seems to me there is more blue surface (funding given) from some funding agencies than funding available (green surfaces).

    I ask because if the scale is the same for both I could for example conclude from your nice analysis that the amount of funding received by the IRC International Water and Sanitation Centre from The Bill & Melinda Gates Foundation equals the amount the Foundation of PepsiCo provides to the WASH sector.

    Kind regards
    Kristof

    • All of the node sizes are on the same scale. They are determined by total funding related to WASH that is either given or received. The scaling, however, is quite large (tens of millions for Gates Foundation, for example). The graph only shows comparative size. For specific amounts, check out the original data set at http://www.WASHfunders.org

  3. Hi Chris,

    Cor Dietvorst pointed me again to this post, as I forgot I had even commented on Ben’s post some time ago. Like Ben, I’m an advocate for open data but always find myself analysing data on standalone desktop software, not having the programming skills or even the time to make such interactive networks as you created. A really nice piece of work. While the dynamic version is great, it stops short of allowing analysis that would help make sense of what the network really teaches us. The reason Cor reminded me of this blog is that we are considering mapping the flow of funding for sanitation projects within the WASH sector worldwide. The goal is to analyse the various degrees of centrality of funders and receivers of aid in the sanitation sector, but I would like to find a way in which such an effort can be built on rather than becoming another nice but one-off effort. With the IATI data standard for open data on development, it would be nice to find a convenient format in which data can be added to a database, which then leads “automatically” to visualisations and allows possible analysis. Currently I use Pajek and Gephi for SNA, the latter being multi-platform software, but ideally I would like to see an online version Kine as a more appropriate tool to actually use the information available online. I can clearly see what kind of tool would entice people to contribute data and allow sensemaking of the data analysis, but have no idea how hard it would be to build.
    What do you think, Chris: could such a tool be a reality, or is it still a distant dream for the time being?

    Kind regards
    K.

  4. Kristof,

    If I recall correctly, the visualization tool which the other Chris put together ingests JSON data files (JavaScript Object Notation). JSON is a pretty standard data format which could be dynamically extracted from an existing and readily-updated database. You wouldn’t have to build the visualization from scratch every time you updated your dataset.

    So, in short, not a distant dream at all: Totally doable.

    Regards,

    -Chris Cline
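As a rough illustration of the point about generating JSON from a database, here is a minimal Python sketch using only the standard library. Everything specific in it is an assumption: the `grants` table, its columns, the sample rows, and the `nodes`/`edges` JSON layout are hypothetical, and the layout the visualization tool mentioned above actually expects is not known here.

```python
import json
import sqlite3


def export_graph_json(conn):
    """Build a nodes/edges JSON document from a hypothetical grants table."""
    rows = conn.execute(
        "SELECT donor, recipient, SUM(amount) "
        "FROM grants GROUP BY donor, recipient "
        "ORDER BY donor, recipient"
    ).fetchall()

    # Every organization that appears as a donor or a recipient is a node.
    names = sorted({name for donor, recipient, _ in rows
                    for name in (donor, recipient)})

    graph = {
        "nodes": [{"id": name} for name in names],
        "edges": [
            {"source": donor, "target": recipient, "weight": total}
            for donor, recipient, total in rows
        ],
    }
    return json.dumps(graph, indent=2)


# In-memory database with made-up rows, standing in for a live data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grants (donor TEXT, recipient TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO grants VALUES (?, ?, ?)",
    [
        ("Funder A", "Recipient X", 500_000),
        ("Funder A", "Recipient X", 250_000),
        ("Funder B", "Recipient Y", 50_000),
    ],
)

graph_json = export_graph_json(conn)
print(graph_json)
```

Re-running the export after new rows are added to the table produces an updated JSON file without rebuilding the visualization by hand.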

  5. I agree with Chris Cline- the technology is there and the tools are available. The limiting factor in Kristof’s idea is the data itself. IATI standardized data makes all of these ideas more possible, but the data streams are still only flowing at a trickle. I haven’t seen a robust data set of private, bilateral, and/or multilateral donations/grants/contract awards that would give insight into funding for sanitation. Without a regularly updated data set (or sets), it will be a challenge to really capture a high percentage of funding flowing into the sanitation sector.

  6. Hi Kristof,

    Seconding what Ben and Chris Cline have said. Storing data on a web server is cheap, and analyzing it on the fly and rendering it in a web browser is surprisingly fast and fairly easy. Making it more than a one-off effort means live data streams and giving organizations a reason to update their information. For contrast, updating the examples that Ben and I built would mean manually copying and pasting data from washfunders.org into a spreadsheet or database table.

    The first step in motivating WASH organizations to share their data, it would seem, is to try to keep a record of all active WASH organizations, period, and then highlight individual organizations by their data openness. Perhaps this recognition could apply to various levels of data sharing, like (a) regularly submitting updates to contact information, (b) publishing data on their own website, (c) sharing live data streams in their own format, (d) sharing data in a standardized format like IATI.

    Does anyone know of an organization that does this already — that attempts to comprehensively index WASH organizations?
