Just one in a crowd

A few months ago, I joined a project called Crowdstorming a Dataset. It’s a project affiliated with the Center for Open Science and its basic premise is this: what if you gave a single dataset to dozens of researchers, and asked them to prove or disprove a particular hypothesis? What are the different analytical approaches they might take? Would they all give similar answers? Once they’re given the opportunity to give and receive feedback, would their answers and methodologies converge?

The project is far from finished, and those answers are still mostly unknown, but as I finished with my role this week, I thought I’d take a moment to reflect.

Deciding to Join

I hesitated before joining this project. Not because I thought it wasn’t valuable, but because I worried my skills would be inadequate. Many of the researchers involved have a lot more training, experience, and resources than I, when it comes to data analysis. What if my proposal was flat-out wrong?

In the end, I decided that any contribution I made would be valuable. While a typical researcher might have more than a single stats class to their credit, education and experience are no guarantee against making mistakes. If my analysis plan was poor, it would test whether reviewers could identify those flaws. If my execution was off, it would signal that conceptual review is insufficient without technical review.

To facilitate the discovery of errors – and also because I like to use and promote good tools – I did my analysis in the form of an IPython Notebook, with ample documentation and commentary. You can find the notebooks here.

The Basic Structure

Researchers were given two research questions: (1) Are soccer referees more likely to give red cards to dark skin toned players than light skin toned players? and (2) Are soccer referees from countries high in skin-tone prejudice more likely to award red cards to dark skin toned players?

We were also supplied with a large dataset of player-referee dyads, which included information such as the number of red cards given by said referee to said player, the number of games in which they both participated, the number of goals scored by the player during those games, bias scores for the referee’s country of origin, skin tone ratings of the player by two independent raters, and more.

We were asked to create and implement an analysis plan. We reported the plan and results separately to the organizers, who set up a system for us to peer-review the former. Each research group was asked to give feedback on at least three other analysis plans. We then altered our own analysis plans as we felt we needed in response to feedback, re-did our analysis, and reported back to the organizers. We also rated our confidence in the hypotheses at various points throughout the study.

You can read more details about the project here.

Flaws

There were a few hiccups along the way, which is perfectly natural for a first-time project. Hopefully if there are future iterations they will be addressed.

  • The dataset was not thoroughly described. Most importantly, the organizers did not document the exact nature of the ‘club’ and ‘leagueCountry’ variables included in the dataset. Many researchers, including myself, assumed that these variables meant “the club and league that the player was in when this data was gathered”, but it turned out to mean “the club and league that the player began their career with” which covered an unknown fraction of the data. As a result, the many comments during feedback about how to address the multi-level nature of the data (with players nested in clubs nested in leagues) may have been inappropriate or even inaccurate. It’s worth thinking about best practices for documenting datasets and methodologies. How can we minimize omissions like these?
  • Some plans did not receive enough feedback. One of the most interesting aspects of this project was the opportunity to see if participants converged on which analysis plans were most likely to be effective/accurate. However, due to how this process was designed, there was significant variation in the number of ratings received. The average team received 5 ratings and responses, but many received only 2 or 3. How much is enough to indicate consensus? Surely it was too much to ask everyone to rate all 31 approaches, but I’m not sure how informative the ratings data actually are. I also found the qualitative feedback to be somewhat lacking, with some groups skipping it entirely and a few providing commentary that was too terse to be particularly useful.
  • For the final analysis, the organizers requested that we provide our results in the format of an odds ratio or Cohen’s d. This presented a problem for me, as the result of Poisson regression is not easily converted to either of these statistics. I ended up submitting an incidence rate ratio, which will hopefully be useful. There is a tension here: to constrain result format too tightly is to falsely limit the kind of approaches researchers take, but to accept many different formats is to practically limit the ways in which results can be compared.
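
For readers who have not met an incidence rate ratio before, here is a minimal sketch of what the statistic measures. The counts are invented for illustration and are not from the project's dataset, and the function names are my own. The idea is simply "red cards per game" in one group divided by "red cards per game" in another. In a Poisson regression with a log link, the coefficient on a group indicator is the log of this ratio.

```python
import math


def incidence_rate(events, exposure):
    """Events per unit of exposure (here: red cards per game)."""
    return events / exposure


def incidence_rate_ratio(events_a, exposure_a, events_b, exposure_b):
    """Rate in group A divided by rate in group B."""
    return incidence_rate(events_a, exposure_a) / incidence_rate(events_b, exposure_b)


# Hypothetical counts, invented for illustration only.
irr = incidence_rate_ratio(events_a=30, exposure_a=2000,
                           events_b=20, exposure_b=2000)

# A Poisson regression coefficient beta corresponds to IRR = exp(beta),
# so the log of the ratio is what the model would report as a coefficient.
beta = math.log(irr)

print(f"incidence rate ratio: {irr:.2f}")
print(f"equivalent Poisson coefficient: {beta:.3f}")
```

An IRR above 1 means group A receives more red cards per game than group B; an IRR of exactly 1 means no difference.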

Educational Value

Regardless of the meta-analytical results, I think this protocol has strong value as an educational tool. Here are just a few topics I gained further understanding of:

  • Possibly the most helpful piece of feedback I received was that ‘games’ should have been an offset or exposure variable in my regression. This was not a concept I had heard of before, but a little reading made clear that the reviewer was absolutely correct. Offset/exposure variables are used with count data when the opportunity for events to occur – usually time – differs between observations. Hence the term ‘exposure’.
  • Although I was familiar with multicollinearity before this project, I had never grappled with it in a practical context. Multicollinearity is when two or more predictor variables in a model are strongly correlated with each other. Including multicollinear variables in a model doesn’t harm the predictive power of the model as a whole, but it can make the estimates for individual predictor variables unstable and wildly inaccurate. Since this hypothesis was a question not about how to predict red cards as a whole, but about the influence of predictor variables skin tone rating, mean implicit bias, and mean explicit bias in particular, this was a serious issue. One site I read suggested splitting the data and comparing coefficient values, but it was not clear to me how to interpret these results. Couldn’t high variance in a regression coefficient mean that there’s no true effect, as opposed to an effect being obscured by multicollinearity?
  • A piece of feedback on one of the other analysis plans mentioned the Akaike Information Criterion (AIC). This turns out to be a sort of abstracted way to compare models for a given dataset. It combines the likelihood of the observed dataset given a specific model with the number of parameters in that model, penalizing each extra parameter and so discouraging overfitting. Lower values are better. I would be interested in seeing the AIC values for the different models submitted in this project!
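
Two of the ideas above, exposure and AIC, fit together in a small worked example. What follows is a rough sketch under stated assumptions, not the analysis I submitted: the dyads are invented, the helper names are hypothetical, and the models are simple enough that their maximum-likelihood rates can be written down directly instead of being fitted with a stats library. Treating ‘games’ as exposure means the expected number of red cards for a dyad is a rate multiplied by the games played. AIC is computed as 2k − 2·ln(L), where k is the number of parameters and L is the maximized likelihood.

```python
import math

# Hypothetical player-referee dyads, invented for illustration:
# (skin tone group, games played together, red cards given)
dyads = [
    (0, 10, 0), (0, 40, 1), (0, 25, 0), (0, 60, 1),
    (1, 15, 0), (1, 30, 1), (1, 50, 2), (1, 20, 1),
]


def poisson_loglik(counts, means):
    """Log-likelihood of observed counts under the given Poisson means."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1)
               for y, mu in zip(counts, means))


def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L). Lower is better."""
    return 2 * n_params - 2 * loglik


games = [g for _, g, _ in dyads]
cards = [c for _, _, c in dyads]

# Model A: one red-card rate shared by everyone (1 parameter).
# With games as exposure, the expected count is rate * games.
shared_rate = sum(cards) / sum(games)
means_a = [shared_rate * g for g in games]


def group_rate(group):
    """Red cards per game within one group."""
    group_cards = sum(c for grp, _, c in dyads if grp == group)
    group_games = sum(g for grp, g, _ in dyads if grp == group)
    return group_cards / group_games


# Model B: a separate rate for each group (2 parameters).
rates = {grp: group_rate(grp) for grp in (0, 1)}
means_b = [rates[grp] * g for grp, g, _ in dyads]

loglik_a = poisson_loglik(cards, means_a)
loglik_b = poisson_loglik(cards, means_b)
aic_a = aic(loglik_a, n_params=1)
aic_b = aic(loglik_b, n_params=2)

print(f"Model A (shared rate): AIC = {aic_a:.2f}")
print(f"Model B (rate per group): AIC = {aic_b:.2f}")
```

Model B always fits at least as well as Model A because it has an extra parameter to work with. AIC asks whether the improvement in fit is large enough to justify that parameter.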

Looking Forward

Although we await the organizers’ report, I can already say that I found this to be a valuable and informative project. I thank Raphael Silberzahn, Eric Luis Uhlmann, Dan Martin and Brian Nosek for conducting it, and I hope it is not the last of its kind.

Attention Rob Thomas

I could write plenty about my experiences at Hackers on Planet Earth (HOPE) this weekend, and I probably will, later. But I had a quick thought during Steve Rambam’s talk on privacy loss yesterday that I wanted to share with you all.

How great would it have been if the Veronica Mars movie had focused on big data, surveillance and privacy issues instead of that throwaway murder mystery plot? I’m imagining Mac as a major player – a sys-admin for the government and perhaps a Snowden-esque leaker, or a corporate whistleblower. The Kanes are, in story, technology tycoons – you could easily make them a stand in for Facebook, Google, etc. There could even be a subplot about the Neptune police department buying drones and using new tools to invade people’s privacy.

It would have been relevant, provocative and even educational for viewers. Mac could’ve name-dropped real privacy tools like Tor, SecureDrop, CryptoCat, etc., and Veronica could’ve wrestled with the hypocrisy of fighting for privacy rights while she invades people’s privacy all the damn time.

Bonus: the focus would be on the two smart, complicated female leads and their friendship, rather than on predictable romantic subplots.

Maybe there’s hope for the next movie?

It’s my time

A little over a year ago I did a survey at the job fair of a major tech conference. At each booth I asked whether they were hiring people part time. The response was almost entirely no way, nuh-uh, never.

There is plenty of research showing that more time at work does not equal more productivity. (Caveat: I have not read the primary research here.)

I want to offer some anecdotal evidence.

I am a freelancer (or a contractor, or self-employed, whatever you want to call it). My main client right now is OpenHatch. Last year the hours I spent on OpenHatch worked out to approximately quarter time. This year my hours are the equivalent of half time. This means that at the end of June I had worked the same number of hours as if I had been hired on full-time for six months. So, what did I have to show for myself at my internal “six month review”?

  • I organized or co-organized 21 Open Source Comes to Campus events, personally running 15 of them.
  • I spoke at two conferences on behalf of OpenHatch, and wrote a proposal to speak at Grace Hopper this year. (The Grace Hopper proposal took an unexpectedly long time, as the process is quite competitive – approximately 20% of submissions are accepted.) I have also run workshops at three conferences.
  • I improved and documented our event planning process and made it far more efficient to use, both internally and for those who want to “fork” the project.
  • I created multiple tools which have been useful both for OpenHatch, and which other projects have shown interest in using/adapting. These are our In Person Event Guide, WelcomeBot, and Merge Stories.
  • I wrote 25 blog posts for the OpenHatch blog.
  • I helped redesign and maintain the program website.
  • I’ve done interviews (Wired, In Beta, Linux Magazine) which resulted in good publicity for OpenHatch.
  • Other small improvements including leading documentation sprints, creating and instituting a Code of Conduct for the IRC channel, organizational planning, and helping with the fundraising drive.
  • I have answered an uncountable number of questions via email and IRC.

I am almost certainly forgetting things.

I think this is more than most people can do in six months of full time work. It’s more than I could do in six months of full time work! Clearly, OpenHatch is benefiting from this arrangement.

And I’m benefiting too. I have tons of free time with which I can pursue other opportunities, whether that means working for other clients, or personal pursuits such as writing novels and children’s books, maintaining and writing for the Open Science Collaboration blog, taking online classes and reading non-fiction, and being there for family in the hardest times.

I wish more organizations were open to hiring contractors, because I know that I – and others! – can be amazing assets when given flexibility and independence. It’s funny how the capitalist desire to wring every last drop of productivity out of a worker often extinguishes the spark that makes them productive.

I’m guarding my spark. If I never work full time again, I don’t think I’ll regret it.

This is just to say

I got interviewed about OpenHatch / Open Source Comes to Campus for an article in Wired. Pretty cool!

How to hit a softball

I did a lightning talk at AdaCamp called “how to hit a softball”. This talk was born of frustration: the only activity I participate in which is less gender-diverse than open source software is softball. I’ve played in games where I’m the only woman on either side, out of 20+ people.

(I find this especially heartbreaking as softball currently functions as a way to keep women out of baseball. The rules, equipment, and pitching motions are just different enough that it is incredibly difficult to switch from one to the other. Talented girls are then forced to decide between the limited but attainable rewards of softball, which provides many with college scholarships, and the risks of baseball, which has given riches and fame but only so far to men. I could go on – and have – but for now, let’s get back to my lightning talk.)

There are three simple tips I give completely new players that can get them from “missing 90% of the time” to “solidly connecting 90% of the time”. They are:

1) Watch the ball

This may seem obvious, but many novice players take their eyes off the ball when they swing. This not only makes it harder to hit the ball, it also messes up the motion: with the correct swing, it should be possible to watch the ball wherever it goes. If you can’t follow the ball with your eyes, it’s a sign that your swing is off.

You can see a demonstration here.

2) Keep your back foot planted

The power in a swing comes from shifting weight from the back of your body to the front, not from throwing your body at the ball. While the front foot may move, the back foot should never leave the ground (though it often will twist in place). If you find yourself twirling when you miss a pitch, you are surely making this error.

You can see a demonstration here.

3) Line up your knuckles

The above two tips should help you connect with the ball most of the time. To help the ball go farther when you connect with it, there are a number of subtle things you can do with both your upper and lower body. Perhaps the easiest of these things to learn is: when you grip a bat, line up your knuckles. This position forces you to hold the bat in your fingers, not your wrists, which allows your wrists to move fluidly, an important part of having a powerful swing.

You can see a demonstration here.

(Extra tip: your back hand should be higher up on the bat than the hand which is closer to the pitcher. Most people will place their hands this way naturally, without being told to, as it is the most comfortable.)

This has less of an impact on your swing than the above two tips, so if you can only remember a few things at once, practice the above before working on this one.

*

What I like about these tips is that they’re something anyone can do: you don’t need to lift weights, study pitches for hours, or memorize dozens of muscle movements in order to make these changes. Just remember: watch, plant, align (or eyes, back foot, knuckles) and your swing will go from embarrassing to serviceable very quickly.

Bonus! This gorgeous video of a fastpitch softball pitch in slow motion: