We’re recognizing Love Data Week (February 11-15) and this year’s theme is ‘data in everyday life.’ We’ve asked several researchers who participated in our Better Science Through Better Data event to reflect on the importance of data sharing in their own lives. We’ll be sharing their stories all week so keep checking back!
I’m supposed to write all about how I love data and how it can change the world. But I’m not going to. Not because I’m grumpy, but because I think we’re thinking about it all wrong. You see, a lot of the buzz around data seems to suggest that if only we had better data everything would be okay, as if people need data and that open data itself is the answer. I think we’ve got this back to front. If we want to make things better, we need to understand how data needs people to make it useful and meaningful, to test it, and to question it.
So, I love data in the same way that I love iron ore: i.e. not very much at all. It’s a nice raw ingredient, but it does need a good bit of work before it becomes useful. Yet I do love what data can do, and I love the new world of open data analysis and the people trying to do good things with it. I might even go so far as to say I love some open source software. But I don’t want anyone thinking I’m weird. Instead, let me tell you a very short story about how I got a bit carried away with myself at work and ended up with new friends and a Tampa Bay Rays headband. My conclusion to all this is that data needs people more than people need data.
I’m quite into mapping and spatial analysis, and this is one of my go-to procrastination methods. In early 2016 a paper I wrote on commuting in the United States was very sensibly rejected by a leading academic journal, so instead of moping around I decided to put the paper online, alongside the data I used. Into the void, as they say. Some time after this, a brilliant American scholar called Garrett Dash Nelson stumbled across the data and paper and did something cool with it. It involved algorithms and code, so I was obviously going to have to take a closer look. Put simply, Garrett applied a network partitioning algorithm to the dataset I carefully assembled, cleaned and swore at for several months and, to be honest, he did it a lot better than I ever could have.
So, naturally, I asked him if instead of just doing it for Massachusetts, he wanted to attempt it for the whole United States and write a paper together. With a PhD to finish, a job to find and no spare time, Garrett somehow still said ‘yes’. We proceeded to write a paper on the ‘economic geography of the United States’, and found that the US can be divided into 50 or so ‘megaregions’, kind of like states but their boundaries are based on where people work. Fast forward to January 2019 and our paper has now been viewed almost 290,000 times and one of our megaregion maps featured on a recent cover of the Proceedings of the National Academy of Sciences. I’d normally consider 500 views as ‘going viral’, so the reaction to our paper was quite nice. We also published all our files as open data on Figshare and this now has the highest Altmetric Attention score of any dataset in the UK. We even hosted a successful AMA on Reddit.
The results resonated far and wide and we were contacted by people working in industries we couldn’t possibly have imagined would find our work useful. An executive from the Tampa Bay Rays said our work was useful as he planned a new baseball stadium (thanks for the merch!), an epidemiologist said our new boundaries helped him understand disease transmission, a renewable energy expert said our work was just what she needed, a Silicon Valley infrastructure planner said our boundaries made perfect sense, and someone called ‘monkeychef’ on Reddit said “Holy crap they drew the definitive line to divide north and south Jersey”.
Why do I think this story demonstrates that data needs people more than people need data? The first reason is that this wasn’t new data. It just needed people with the inclination and time to make it useful. The fact that it was open data wasn’t enough. The second reason is that it needed human input to make it appealing and accessible. A third reason is that it demonstrates how it needed human interaction to take it beyond the realm of data into information and knowledge. The knowledge came about through human interaction. A fourth reason is that the story shows how it needed people from different disciplines and backgrounds to see the richness in it and the value of what it could tell us. But perhaps the most important reason in this story is that the data needed people to create it in the first place. The daily grind of the commute for 130 million Americans is what made this data. Without them, there would be nothing.
If this reads something like a manifesto, that’s not entirely coincidental. In Love Data Week I think we should focus on what data can help us do, and the people who help us achieve it.
Watch Alasdair’s lightning talk at #SciData18.
Alasdair Rae is a Professorial Fellow in the Department of Urban Studies and Planning at the University of Sheffield. His research focuses on cities, regions, housing markets, neighbourhoods, inequality, transport and spatial analysis. He uses data a lot, and he likes to make maps. You can find out more about his work on his website and frequently updated Stats, Maps n Pix blog. Follow him on