4 Things I Thought About At Transparency Camp

Following up from last week with a more in-depth post. While I heard a bunch of compelling stories and found out about a ton of amazing projects, ultimately what I appreciated most about TCamp was the chance to talk about the common issues that have arisen for those of us doing open government work.

The largest issue, to me, is accessibility. Obviously the first step to making data accessible is making it available – getting it out of government hands and into the public’s, whether through open data initiatives, FOIA requests, or asking very nicely. But how that information is made available very much impacts who will access it:

Not all obstacles are the result of malevolence, but they diminish accessibility just the same. Mike Morisy of MuckRock often talks about how they’ll request digitized information – databases, emails – and receive it printed on thousands of pages of paper. PDFs aren’t much better: at our last open government meetup we spent an hour debating the best optical character recognition (OCR) program for searching meeting minutes and proposed regulations that are only available as PDFs. Non-programmers may face the opposite problem. I know plenty of people who’d be confounded by a CSV file, never mind the raw contents of an Access or MySQL database. They’d be happy to get their information on printed paper or in PDF files. What works best for one person with one set of skills will be a constant frustration for another.

Once you’ve got the data in your preferred format, you need the training to manipulate it: you have to know software like Excel, Calc, SPSS, or JMP, or scripting languages like R, MATLAB, or Octave. And you’ll need to understand at least some statistics – no small feat when people who do analysis for a living often fall prey to common mistakes. Not to mention the healthy amount of civic literacy necessary to understand the meaning behind the numbers: how bills are amended, how federal contracts are awarded, or how the FDA’s clinical trial system works.

Accessibility isn’t simple. But it isn’t something we can ignore either, not if we want to be truthful when we say that we’re advocating for better data access for all. As transparency bloggers have talked about before, efforts to increase transparency can have unexpectedly oppressive effects:

A very interesting and well-documented example of this empowering of the empowered can be found in the work of Solly Benjamin and his colleagues looking at the impact of the digitization of land records in Bangalore. Their findings were that newly available access to land ownership and title information in Bangalore was primarily being put to use by middle and upper income people and by corporations to gain ownership of land from the marginalized and the poor. The newly digitized and openly accessible data allowed the well to do to take the information provided and use that as the basis for instructions to land surveyors and lawyers and others to challenge titles, exploit gaps in title, take advantage of mistakes in documentation, identify opportunities and targets for bribery, among others. They were able to directly translate their enhanced access to the information along with their already available access to capital and professional skills into unequal contests around land titles, court actions, offers of purchase and so on for self-benefit and to further marginalize those already marginalized.

The digital divide exists within the United States as well. Last year Boston unveiled a new initiative called Street Bump, which aimed to map potholes by collecting and analyzing accelerometer data from smartphones. My friends and I were eager to join in the effort, until one of them pointed out that smartphone users were likely to live in – and therefore travel over the potholes of – mostly well-off neighborhoods. And in a session at TCamp, an activist (whose name I unfortunately didn’t catch) pointed out how increased access to crime rate data and the creation of apps like CrimeReports have the potential to stigmatize and further disadvantage poorer neighborhoods.

Which brings me to another issue I’ve been wrestling with – how can you build a narrative while maintaining accuracy? Data is, by itself, meaningless. Take this table:

Group A 0 0 1 0
Group B 1 1 0 1

What does it mean? Is it a count of men and women at a particular event? Is it a record of coin flips? Is it the political affiliation of people being canvassed at a sporting event? Even if we label our variables (say, assigning “Group A” as men and “Group B” as women), we’re still creating a narrative. We’re implying that men and women are the only two possible categories. We’re assuming that our sample is representative. We’re asserting that sex ratios at this event are a topic worth considering, if only for a moment. And this with a set of only four data points!

More concretely, let’s look at a semi-random dataset from Data.gov, labeled EPA Toxics Release Inventory Program. It’s a dataset with a few dozen columns and nearly two million rows – there’s no way a human mind could take it in holistically. We have to group the data somehow: by location, say, or by parent company, or by whether the parent company was military, or by whether the released chemical was a carcinogen. And as we organize, stories emerge. Maybe we see that toxics are disproportionately released in southern states, or that the vast majority of toxics are released by the military (or vice versa – I have not actually analyzed this data). These are good and useful stories, but they come at a price: lost nuance. A quick skim of this EPA page suggests that facilities do not have to report toxic releases that fall under a certain amount per year. So we can’t necessarily say that releases are greater in southern states, only that reported releases from larger facilities are. And what are the potencies of the various carcinogens? If we say that some areas or companies or industries release “more carcinogens” than others, we may be misleading people, if others are releasing small amounts of much more hazardous materials.
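To make the threshold problem concrete, here’s a small Python sketch. The facilities, regions, and release amounts are invented for illustration, and the single reporting cutoff is an assumption (the real TRI reporting rules are more complicated). When only facilities at or above the cutoff show up in the data, a region full of small emitters can look much cleaner than it really is.

```python
from collections import defaultdict

# Hypothetical facilities: (region, annual release in pounds).
# These numbers are made up for illustration, not taken from TRI data.
FACILITIES = [
    ("South", 25_000), ("South", 20_000),
    ("North", 12_000), ("North", 9_000), ("North", 9_000),
    ("North", 8_000), ("North", 8_000), ("North", 7_000),
]

# Assumed single reporting cutoff; facilities below it never appear in the dataset.
THRESHOLD = 10_000


def totals_by_region(facilities, threshold=0):
    """Sum releases per region, counting only facilities at or above `threshold`."""
    totals = defaultdict(int)
    for region, amount in facilities:
        if amount >= threshold:
            totals[region] += amount
    return dict(totals)


true_totals = totals_by_region(FACILITIES)                 # what was actually released
reported_totals = totals_by_region(FACILITIES, THRESHOLD)  # what the dataset would show
print("true:", true_totals)
print("reported:", reported_totals)
```

In this made-up case the North releases more in total, but the reported data make the South look like the bigger polluter by a wide margin.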

This might seem like nit-picking. In many ways it is. But the language that some transparency advocates use worries me. From the Data Journalism Handbook, which was released freely and very recently – on the first day of TCamp, as it happens:

Data analysis can reveal “a story’s shape” (Sarah Cohen), or provides us with a “new camera” (David McCandless). Using data the job of journalists shifts its main focus from being the first ones to report to being the ones telling us what a certain development might actually mean. The range of topics can be far and wide. The next financial crisis that is in the making. The economics behind the products we use. The misuse of funds or political blunders, presented in a compelling data visualization that leaves little room to argue with it.

I haven’t had the chance to read the book “cover to cover”, though I have skimmed it. I see a lot of quotes like the above, and not much about how to interrogate data or avoid common statistical mistakes. (Although to be fair, there is some discussion.) If we want to set ourselves up as storytellers, if we want to turn data into something meaningful, then we have a responsibility to make sure that what we’re saying is, well, true. Or as close to true as we can get it, with qualifications and caveats as our epilogue.

One last issue I found myself talking about a lot was, ironically, communication. This came up first in an early session with the creator of Purple Binder, an online directory of Chicago social services. There is apparently a wealth of information stored in the paper directories of social workers around the country – information that is duplicated and deprecated a little more every day. An online directory seems like an obvious solution, but it’s hard to get social workers to buy in to the process: they’re already stressed to the limit with heavy workloads, and for early adopters, entering data online brings more cost (transferring information, adapting to new formats, dealing with bugs) than benefit. And there have been efforts like this before, which failed due to disagreements about how to organize information and power struggles over who gets to play gatekeeper.

There were also debates about data standards. If we can agree on taxonomies and formats, we can combine and share data more easily, making it more accessible, more meaningful, more powerful. I’m not going to talk much more about this, because it’s not an area of expertise for me (although one talk at TCamp about RDF organization has me bookmarking pages to learn more.)

Finally, there’s a question of community. TCamp was, as most conferences are, primarily a networking opportunity. I met so many people working on a variety of projects, most of which I’d never heard of before. I exchanged email addresses and project URLs with people who were doing work very similar to mine, and left the camp each day pleased to have all these new resources. But where are the tools for building a community beyond TCamp, and for those who couldn’t get there? The Sunlight Foundation has a Google Group, which is well-trafficked but difficult to search through, and an IRC channel, which is pretty quiet. OpenCongress has a wiki, but I haven’t seen it promoted much and, consequently, it stores only a fraction of the community’s knowledge.

My very last TCamp session was with a dozen other local open government organizers from around the country. During the session, we did an exercise where we wrote down what we need to be successful. When we compared notes, we realized that we were all starved for communication. We needed to talk to our communities and find out what they wanted from the open government movement. We needed to talk to our local officials and figure out if they were willing to work with us, and how. We needed to talk to experts who could give us legal and technical advice. And most of all, we needed to talk to each other. To share resources and insights, to keep ourselves from needlessly duplicating others’ hard work, and, perhaps most importantly, to build a community. Because organizing can be hard, frustrating work, and it’s good to do it with friends. And you don’t really need a better reason to do something than that.

11 Things I Learned About At Transparency Camp

I just spent the last two days at Transparency Camp. My capacity for coherent thought is pretty much used up after all the talks and sessions, and I want to save what small amount is left for the hackathon tomorrow. So here’s a list of some neat things I learned about, and I’ll save the insightful analysis for later.

1. Participatory Budgeting
Developed in Porto Alegre, Brazil, this process allows community members to decide how to spend a portion of the budget. Since 2009, Chicago Alderman Joe Moore has been doing this with his ward, and as of just a week ago, Vallejo, California will be doing participatory budgeting as well.

2. Opening Up Atlanta
Matthew Cardinale told the story of how he sued the city of Atlanta for violating the state’s open meetings law. Warned by every lawyer he approached that his activism was futile, he argued as a pro se litigant all the way up to the state supreme court, where he won. In its decision, the court made clear that the default state of government in Georgia is openness.

3. Superfastmatch
An API for a new algorithm which finds overlap between large blocks of text, well, super fast. Why is this good for transparency advocates? It can help detect when journalists are regurgitating press releases, for one. It can also uncover when legislators use “model bills” given to them by think tanks, as was the case with some of the recent ultrasound bills and the ALEC-backed Stand Your Ground laws.
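For a rough sense of how text-overlap detection can work, here’s a sketch of one common technique: splitting each text into overlapping word sequences (“shingles”) and measuring how many the two texts share (Jaccard similarity). This is not necessarily how Superfastmatch works; the function names, the shingle length `n`, and the sample sentences are my own hypothetical choices.

```python
import re


def shingles(text, n=5):
    """Return the set of n-word sequences ("shingles") in `text`, lowercased with punctuation dropped."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(a, b, n=5):
    """Jaccard similarity of two texts' shingle sets: 0.0 = nothing shared, 1.0 = same shingles."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0  # at least one text is shorter than n words
    return len(sa & sb) / len(sa | sb)


# Hypothetical sentences standing in for a "model bill" and a state bill.
model_bill = "A person who is attacked has no duty to retreat and may meet force with force."
state_bill = "A person who is attacked in a place where he has a right to be has no duty to retreat."
print(f"{overlap(model_bill, state_bill, n=3):.2f}")
```

A high score between a state bill and a model bill would be a flag worth checking by hand, not proof of copying.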

4. The Hacker Bus
A project of Transparencia Hacker, a Brazilian open government group, this bus travels around hosting transparency awareness events and arranging meetings between local hackers and government authorities.

5. FollowTheMoney.org
Where you can find some truly disheartening reports, such as: “The top five recipients of $3.7 billion in federal corporate tax breaks paid $0 in 2009 federal taxes and enjoyed a combined profit of $77.16 billion in 2010. This report reveals that these corporations also gave $78.7 million to state political campaigns and $45.3 million to federal campaigns in the last decade.”

6. Lots of funny t-shirts
This one was my favorite:

7. The Plain Writing Act of 2010
A perfect example of the difference between availability and accessibility. If you’re not already convinced we needed this, take a look at the example on the Wikipedia page linked above.

8. France’s Literal Underground of Hacker-Artists
I forget what this has to do with transparency, but it’s still pretty cool. From the linked article: “This stealthy undertaking was not an act of robbery or espionage but rather a crucial operation in what would become an association called UX, for “Urban eXperiment.” UX is sort of like an artist’s collective, but far from being avant-garde—confronting audiences by pushing the boundaries of the new—its only audience is itself. More surprising still, its work is often radically conservative, intemperate in its devotion to the old. Through meticulous infiltration, UX members have carried out shocking acts of cultural preservation and repair.”

9. Politwoops
An API of tweets deleted by politicians.

10. Programming Metaphors
Unit-testing and system-testing legislation. That is, documenting the stated purpose of legislation on a modular and holistic level and evaluating whether it “passes”. Related, Gitlaw: using git (or git-like) version control systems to help track/comprehend incremental changes to legislation. (I feel like maybe this would only be helpful to programmers, and just confuse everyone else more – but I’d like to see it done.)
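Here’s roughly what the Gitlaw idea could look like, using Python’s standard `difflib` module rather than git itself. The bill text and the `bill_diff` helper are hypothetical; the point is that a line-level diff shows exactly which lines changed between two versions of a bill.

```python
import difflib

# Two hypothetical versions of one section of a bill (invented text for illustration).
introduced = [
    "Sec. 2. The department shall publish meeting minutes.",
    "Minutes shall be published within 30 days.",
]
amended = [
    "Sec. 2. The department shall publish meeting minutes online.",
    "Minutes shall be published within 10 days.",
]


def bill_diff(old, new, old_name="introduced", new_name="amended"):
    """Return a unified diff, as a list of lines, between two versions of a bill's text."""
    return list(difflib.unified_diff(old, new, fromfile=old_name, tofile=new_name, lineterm=""))


# Lines starting with "-" were removed, lines starting with "+" were added.
for line in bill_diff(introduced, amended):
    print(line)
```

Whether this helps non-programmers is an open question, but even a plain "-"/"+" listing is easier to read than a sentence like “strike ‘30’ and insert ‘10’ in section 2.”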

11. Citizen Science
Okay, I already knew about most of the projects mentioned in this session, but I was still glad to talk about them! I see a lot of overlap in the open science and open government movements, and not just in the names. ;) And I did learn about a couple of new projects: Be a Martian and Leafsnap.

That’s not nearly all, but I’ll stop there for now.

Laptop Labs

I was going to title this post ‘How the Internet Is Facilitating Public Participation in Scientific Research’ but I decided the above was more catchy.

Over the last week, I’ve come across a couple of new ‘science experiments’ – that is, experiments in improving science by encouraging the involvement of the online public. The first one, Petri Dish, is basically a Kickstarter for science projects:

Right now, it’s skewed pretty heavily towards ecology and animal behavior. Which makes me curious about how different scientific fields have embraced online innovation in different ways. For instance, there’s arXiv, which provides open access to articles in a handful of fields, including nearly all articles published in most subfields of mathematics and physics. However, many other fields hide the bulk of their research away in closed-access journals that charge $10–20 per article.

Then there are crowdsourced science projects like Foldit and Galaxy Zoo, which allow the public to participate in large-scale protein folding and astronomical classification experiments, respectively. These may just be pioneers at the forefront of a wave of crowdsourced experiments in all fields, but so far it doesn’t look like it. So why these projects, in these fields?

My own field of psychology is in some ways ideal for small-scale, independent research. It can be quite cheap to do a study, doesn’t require years of training or elaborate equipment just to get started, and understanding the published work in the field can be fairly intuitive compared to, say, physics. We’ve all got minds of some kind, after all. But I haven’t seen many efforts to involve the public in psychology as scientists. The field has embraced the internet as a source of subjects (see PRO, Project Implicit, the Moral Sense Test and of course Mechanical Turk for just a few examples), but I don’t recall ever seeing an effort to get the public involved in the research itself.

Until a few days ago. A friend linked me to the Reproducibility Project, an effort by the Open Science Framework, which is in turn the brainchild of University of Virginia psychologist Brian Nosek. The project is an effort to replicate all articles published in three major psychology journals in the year 2008. This seems largely in response to growing concern over the accuracy of published psychology findings (see here, here, here and here for, again, just a few examples), although Nosek is quick to point out that a failure to replicate does not mean that the original article was fraudulent or even incorrect.

Although the actual replications appear to be done entirely within academic institutions, the meta-analysis is being conducted openly on the web. The public can review the full introduction and methods sections for each replication, as well as the “bare bones” results; participate in discussions of the project as a whole on the group mailing list; help with administrative work such as coding; and possibly much more.

I am planning on getting involved with the project, so I will report back. While the researcher in me would love to be able to run a replication on my own, independently, I understand the ethical problems (though they could be solved by a public IRB! I’m just sayin’) and I appreciate just how open the rest of this project promises to be.

At the same time, though, each of these projects is a far cry from, say, the Public Lab, which as far as I know does not require institutional affiliation for any part of the research process. They work primarily with aerial mapping. Is it possible to achieve that level of independence in other fields? If not, what’s stopping us?

There are so many roles that people can play in the research process, even without being academics. They can be subjects, funders, reviewers, data collectors, analysts – and, I am sure, visionaries. But I’m not sure how we get there.

I think that I shall never see…

The other day I was cutting up cauliflower for soup. It looked rather a lot like this:

Cauliflower Tree

I turned to the other people in the room and said, “Look, it’s a brain! A braaaaaaain!”

I was thinking specifically of the cerebellum which, when cut sagittally, looks like this:

Of course, what both a cauliflower and a cerebellum look like, when cut right, is this:

The term for this type of structure, dendritic, comes from the Greek word for tree, dendron. Anyone passingly familiar with neuroscience will recognize the term dendrite:

But the most gorgeous natural dendritic structures I’ve seen recently are these rivers in California:

Any dendrites I’ve missed?

Chametzcakes

When searching for recipes for our Passover seder – an extra challenge with vegetarians and people with nut and gluten allergies sharing the table – we stumbled across this abomination:


Gefilte fish cupcakes with horseradish whipped cream. Gefilte fish cupcakes.

That said, this site is actually pretty cool, and I do want to try out a bunch of her recipes (once Passover is over). On the list are:

Tomato Cupcakes with Basil Filling
Margarita Cupcakes
Parmesan Sour Cream Cupcakes with Whipped Raspberry Frosting
Olive Oil Cupcakes with Lemon & Thyme Frosting
Wine and Cheese Cupcakes

If you want to make them with me (or just taste test the results!) let me know.