Earlier today, I spoke at the first-ever White House Safety Datapalooza, an event organized by the White House Office of Public Engagement, the Office of Science and Technology Policy and the U.S. Department of Transportation. Invited speakers came from the public and private sectors, universities and non-profits.

I was asked to give a “TED-style talk” about data journalism. In case you missed the live stream of it on White House Live, here’s the prepared script:

What is data journalism?

Lots of people — including journalists themselves — have different opinions of what it is right now. There are blogs about it, and websites about it, and lately, new degrees at universities about it. There are even big conferences where “What is data journalism?” is a keynote topic.

Most of us understand what journalism is, what “news” is. But “data journalism”?

Me? I think it’s a new buzzword for a very old process: gathering, examining and finding meaning within collections of information, and then letting people know.

So let’s talk about data. You’re probably familiar with that word, “data.” You’re probably thinking data means numbers. Piles of numbers. And you’re right. But there are other kinds of data too: Factoids. Photos. Video. Names and locations. Time logs. Points in space. Things that we collect, electronic files we save that may not seem individually relevant or interesting, but when interconnected and cross-referenced and analyzed, show us patterns, tell us stories, reveal truths.

Why is access to data important? You get up in the morning, you’re getting ready for work and you ask, should I take the car or the Metro? Should I take the highway or drive on surface streets? Luckily, the local transportation agency is giving out data. Imagine how much more frustrating it would be if you didn’t get a traffic report or a subway system alert at all. Imagine how disastrous it would be if public safety officials didn’t know where emergencies were happening.

That’s just one example, but you get the picture. Access to data is important because it keeps us informed when we need the information most. It can also help us in hindsight: we can use data to understand how things happened and why. And it can help us for the future: give us something to work from so we can improve outcomes, create safer conditions, propose new ideas.

But if the data isn’t available in an easy-to-examine electronic format, that makes the work harder. And I’m not just talking about getting a stack of paper documents that you have to scan. In journalism, we sometimes talk about bad data or messy data. It’s data that can’t be easily imported into the software we use to organize it. PDFs, for example. PDFs make it hard to do data journalism.

There are other examples too:

  • Spreadsheets that have missing information.
  • Text files that have multiple ways of spelling one thing. Or typos.
  • Files with meaningless garble in them.

Messy data.
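
For readers who want to see what those problems look like in practice, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the sample rows, the column names `city` and `crashes`, and the `CORRECTIONS` lookup are hypothetical, not taken from any real dataset. It shows one simple way to normalize inconsistent spellings and set aside rows with missing values before counting anything.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
RAW = """city,crashes
Washington,12
washington ,7
Wahsington,3
Baltimore,
"""

# Hypothetical lookup of known misspellings mapped to one canonical spelling.
CORRECTIONS = {"wahsington": "washington"}


def clean_rows(text):
    """Return (cleaned_rows, skipped_rows) parsed from CSV text."""
    cleaned, skipped = [], []
    for row in csv.DictReader(io.StringIO(text)):
        # Normalize case and stray whitespace, then fix known typos.
        city = row["city"].strip().lower()
        city = CORRECTIONS.get(city, city)
        count = (row["crashes"] or "").strip()
        if not count.isdigit():
            # Missing or garbled value: set the row aside for manual review.
            skipped.append(row)
            continue
        cleaned.append({"city": city, "crashes": int(count)})
    return cleaned, skipped


def totals_by_city(rows):
    """Sum the crash counts for each normalized city name."""
    totals = {}
    for r in rows:
        totals[r["city"]] = totals.get(r["city"], 0) + r["crashes"]
    return totals


cleaned, skipped = clean_rows(RAW)
print(totals_by_city(cleaned))
print(len(skipped))
```

Without the cleanup step, the three spellings of one city would be counted as three different places, and the row with the missing number would either break the sum or quietly vanish. Setting such rows aside rather than deleting them keeps a record of what still needs checking.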

It’s important to have clean, structured, easy-to-find data — because journalism is about getting things right while beating the clock. Even if you’re not a journalist, I’m sure you’re familiar with this kind of need for speed. The quicker we can get to the examination and analysis stage, the faster we can see the patterns, unearth the stories, and explain what happened.

With data, journalists are producing all kinds of things that help people understand complex issues: Maps explaining the seriousness of national drought. Charts and interactive graphs showing the relationship between money and politics. Games that really bring home what terms like “distracted driving” mean.

Data is at the heart of what journalism is — and the more substantive it is, the more organized it is, the more easily accessible it is, the better we all can understand the events that affect our world, our nation, our communities and ourselves.