It’s been just over two months since I began an internship in the Science Program at Creative Commons. CC was definitely on my radar as I sought to become more involved in the open access movement, and I’m really grateful for the opportunity to rub shoulders with such driven and dedicated people. But the Science Program? Well yeah, Science and Data.
The title of this post has two meanings. On one hand there is the idea of learning data, information about how we learn.
Massive online courses like those served up on Coursera and Udacity provide access to educational content from top universities, at no cost to the user. But the user is actually giving the “MOOC” platforms highly valuable information. Every detail about how users move through content is recorded, tracking every correct answer, each clicked video, and of course…how often users lose interest and drop out of a course. If you drop out of a course you didn’t pay for, there’s no direct loss to you. And dropping out is exactly what hundreds of thousands of users are doing.
I’m sure you’re more than aware of this: every time you log onto Facebook, FB knows whose profiles you view. They know who you interact with. They know so much about how you use their platform that they can make somewhat-reliable guesses about advertisements you might click on. Every picture of your friend’s food you click on, every time you (incessantly) check the FB app on your phone: they know it.
What if these 100,000-student-strong courses helped the aforementioned education ventures develop metrics (points of measurement) and analytics (discovering patterns) to understand our behavior when it comes to learning? Well, they do. And they are. They’re recording all that learner-contributed goodness. Which quiz questions do students get wrong? Which might be too easy? And why? They see which lessons and courses stimulate the most discussion, driving interaction between students all over the world.
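To make the idea of a metric a little more concrete, here is a minimal sketch in Python of how one might answer "which quiz questions do students get wrong?" from a log of quiz attempts. Everything in it is an assumption for illustration: the records are invented, the function names `error_rates` and `flag_questions` are hypothetical, and the thresholds are arbitrary. It is not how any particular MOOC platform actually does it.

```python
from collections import defaultdict

# Hypothetical quiz log: (student_id, question_id, answered_correctly).
# These records are made up purely for illustration.
attempts = [
    ("s1", "q1", True),
    ("s2", "q1", True),
    ("s3", "q1", True),
    ("s1", "q2", False),
    ("s2", "q2", False),
    ("s3", "q2", True),
]


def error_rates(records):
    """Metric: the fraction of wrong answers for each question."""
    wrong = defaultdict(int)
    total = defaultdict(int)
    for _student, question, correct in records:
        total[question] += 1
        if not correct:
            wrong[question] += 1
    return {q: wrong[q] / total[q] for q in total}


def flag_questions(rates, too_easy=0.1, too_hard=0.6):
    """Analytics: label questions whose error rate looks unusual.

    The thresholds are arbitrary assumptions, not established values.
    """
    flags = {}
    for question, rate in rates.items():
        if rate <= too_easy:
            flags[question] = "possibly too easy"
        elif rate >= too_hard:
            flags[question] = "possibly too hard"
    return flags


rates = error_rates(attempts)
flags = flag_questions(rates)
print(rates)
print(flags)
```

The error rate is the metric (a point of measurement), and the flagging step is a very small piece of analytics (looking for a pattern in that measurement). The "why?" behind a flagged question still needs a human to answer.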
So then there’s the other hand. I, personally, need to know more about how to work with data. That is, I need to be learning about data. Fortunately, the internship at CC has given me the opportunity to examine data repositories and databases chock-full of research data. Boring? Nah, not at all. There’s variety in repository interfaces, disciplines that offer data (everything from NASA to World Bank to NIH), and a number of methods and tools to work with the data. That’s the issue I see: the tools to work with data need some help. They are unfriendly, especially for non-techie people.
But what if we could develop tools that help people work with data in a visual way, manipulating large datasets and discovering patterns more easily? Then we’d really be onto something. Not sure how to manage a Hadoop installation? Don’t want to mess around with Amazon’s Elastic Compute Cloud? Yeah, I don’t really, either. We all know how stoked people are about data visualization. Infographics, which used to be exciting, are now expected. We love beautiful visual representations of information and we hate staring at tables of data. And beyond being able to read these visuals, we need to be able to create them.
There should be a happy flavor of software that allows skilled learners to make the most of their talent but does not require all users to have professional-level computer skills to communicate ideas through data. Everyone should be able to mash up, stretch, squeeze, and eyeball newly opened data. Imagine an elementary school teacher who runs an after-school science club. What should stop him from teaching labs using NASA’s data? Or a high school teacher focusing on the Civil Rights Movement. What should stop her from having students work with US Census Bureau data to understand socioeconomic factors during that period? The tools to work with the data need to be friendlier so more of these learners can use them. Not everyone is going to be a computer science major, nor will they have an innate desire to rock an Excel spreadsheet. But they do need powerful tools to manipulate data. And now, let’s combine both: data about learning and learning about data.
Through the process of learning about data by using data tools, the tools themselves can contribute to our knowledge of how people engage with abstract ideas. Design a well-thought-out data visualization tool. Find the best metrics, analytics, and algorithms, understand learning processes, and turn out meaningful updates regularly. Oh, and make the tool mobile (duh). And one more thing: be very agile throughout the development, release, and life of the tool.
This post was inspired by some lively conversations I’ve had recently with bright folk. In addition, this GigaOM article caught my attention and helped me to refine my idea of what could make a big impact.
Once domain experts are able to work directly with machine learning systems, we may enter a new age of big data where we learn from each other. Maybe then, big data will actually solve more problems than it creates.