Archives for category: Data + Graphics

The short link to this list is (case sensitive).

Consider donating to IRE
This is the fifth anniversary of my NICAR Links List! If you’ve found the lists helpful, consider donating some money to IRE to help them continue training people and bringing NICAR to you. Donate today. You know you want to.

And now, on to business…
It’s back! The annual collection of presentations, tutorials, and resources from IRE’s CAR conference. This year’s event comes to you from Atlanta, March 5 – 8. Keep up with the chatter on Twitter at #NICAR15.

For attendees, IRE has created a schedule in The Guidebook app (iOS, Android & Web). Very helpful for planning the tactical mission known as “managing your time.”

Jeremy Singer-Vine also created CSV & JSON outputs of the schedule, along with the Python scraper to DIY. And there’s a Google spreadsheet with all the sessions. Awesome.

If you’re presenting at NICAR and would like this list to include your resource (presentation, tutorial repo, etc.), please send it using this form, or ping me on Twitter @MacDiva.

If you’re looking for a job, IRE keeps a list of open positions as does Knight-Mozilla OpenNews at Source Jobs. If you’re specifically interested in data visualization jobs, look here.

And finally, NICAR in the Peach State will see its first California Code Rush. The Golden State’s campaign finance and lobbying database is online, and the California Code Rush aims to make the data easier to download, review and republish. It’s an open source project with lots of opportunities to help.

For previous years’ tutorials, videos, presentations and tips see the lists from 2014, 2013, 2012 and 2011.

Jump to
Presentations & Tutorials | Software & Tools | References & Additional Resources | Lightning Talks | Work Samples

Presentations & Tutorials

Software & Tools

  • Tarbell – Google spreadsheets-based website publishing tool
  • Landsat-util: A utility to search, download and process Landsat 8 satellite imagery
  • JPL’s SMAP Viewer (SMAP is “Soil Moisture Active Passive” satellite imaging)
  • Plotly – graph and share your data
  • Plug Tableau into Excel with Tableau’s Reshaper
  • Bokeh Python interactive visualization library
  • markdowneyjr turns Markdown into JSON for slightly easier copyediting of data files
  • tracks website page changes and notifies you.
  • The New York Times graphics desk’s ai2html changes Adobe Illustrator files into HTML & CSS | example output
  • The New York Times graphics desk’s ArchieML – a structured text format optimized for human writability
  • Minezy email exploration tool (prototype by T. Christian Miller)
  • TimelineCurator works with TimelineJS to extract temporal references in freeform text to generate a visual timeline
  • The Upshot’s Bedfellows command-line tool for exploring the PAC donor-recipient relationship

References & Other Resources from NICARians

Lightning Talks

Work Samples

Last week, PBS MediaShift invited me to answer some questions about teaching visualization for its #EdShift Twitter chat. I was part of a virtual panel that included Meredith Broussard (Temple University), Alberto Cairo (University of Miami), Hannah Fairfield (The New York Times), Susan McGregor (Columbia University) and Molly Steenson (University of Wisconsin-Madison; Go Badgers). Katy Culver, MediaShift’s education curator, moderated.

The compiled Q&A is on MediaShift’s website. Some of my tweeted answers were a bit longer than what was captured, so I’m putting them in paragraph form here, lightly edited for typos and readability. My answers may be a bit stilted due to the constraints of tweeting in 140-character bursts.

We should teach dataviz in J-schools (and other schools as well) because it’s a valuable way to tell people information. Dataviz is most powerful when it elicits emotion and understanding, and makes people remember information.

When first starting out, I think the challenge is getting data that’s relatively clean and easy to understand, so students can focus on the examination/inquiry tasks of data visualization.

Lots of universities collect cleanish datasets, e.g.: University of Edinburgh School of Informatics, Stanford Network Analysis Project (SNAP), and the Open Science Data Cloud public datasets. And one of my favorite data scientists, Hilary Mason, has a collection of research-quality datasets.

For students/dataviz novices, I’d suggest using datasets for things students are already familiar with. Familiar datasets let students dive into analysis and examination without having to climb a high hurdle of subject expertise they don’t yet have. As the analysis and examination techniques become more familiar, start using less familiar and dirtier (messed up, error-laden, not normalized) datasets.

I love museums & I find museum datasets interesting. Do a web search for “museum collection dataset” & you’ll find things like Sydney’s Powerhouse Museum science and design dataset and the Canadian Museum of Nature collection data. There are many more museum APIs and datasets on this wiki.

With regard to data visualization tools… Honestly? OpenRefine and Microsoft Excel. They’re not flashy, but OpenRefine is the most useful tool available for cleaning and exploration. And Excel is a workhorse. Can’t afford an Excel license? LibreOffice will serve you well.

Hannah Fairfield recommended this New York Times lesson plan by Shannon Doyne, Holly Epstein Ojalvo and Katherine Schulten, and I do too. It’s a great resource.

I’d also suggest looking at WTF Visualizations and asking students to point out what’s wrong.

The biggest mistake? Assuming everything you need to know is held within the dataset itself.

It’s important to remember that data collection is often skewed in some way. Sometimes it’s malicious. More often it’s not. Regardless, always ask yourself questions not just about the data, but about who gathered it, how it was gathered, why it was gathered, and what other data or information might complement, enhance or refute the dataset you have.

(Question 7 was directed to Alberto, so we move on to question 8…)

You could look at the “should it be interactive or not” question a few ways:

1) “Should it move?”
Things that move tend to get a lot of attention. But are you making things move for movement’s sake? Don’t bother.

2) “Does it serve the story well to be presented as a slow reveal?”
Perhaps. For example, look at “Riding the New Silk Road,” which allows the reader to concentrate on the story, be pulled through the narrative, and understand geographic location all at once.

3) “Does the dataviz have something ‘about me’ in it that’s important to discover?”
If it does, then yes, make it interactive. Take, for example, this piece on “The Jobless Rate for People Like You.”

I’d highly recommend that people interested in data visualization also work through John Foreman’s “Data Smart: Using Data Science to Transform Information into Insight.” It’s readable and practical, and will teach you great techniques for analyzing data. John is Mailchimp’s chief data scientist.

(Photo: Cory M. Grenier/Flickr)

NICAR13 brings together some of the sharpest minds and most experienced hands in investigative journalism. Over four days, people share, discuss and teach techniques for hunting leads, gathering data, and presenting stories. Of all the conferences I go to, this one gets the highest marks from attendees for intensive, immediately applicable learning; networking and fun.

No one could possibly absorb and remember everything presented, so below is your memory card. If you’re looking for highlights from this list, read my NICAR13 roundup for Nieman Lab, “Data science, commoditized backends, and the need to know code.”

Have links from sessions you attended? Post them in comments or ping me on Twitter @MacDiva and I’ll add them to this list.

If you’re looking for a job, IRE keeps a list of open positions. Here’s who’s hiring.

NICAR 2014 will be in Baltimore from Feb. 27 to March 2. You should be there.

For additional tutorials, videos, presentations and tips see the lists from 2012 and 2011.

Jump to
Presentations & Tutorials | Software & Tools | References | Work Samples

Presentations & Tutorials

Dashboards for Reporting (from Aaron Bycoffe, Jacob Harris & Derek Willis)
Data Science for Nerdy Journalists (from Hadley Wickham)
  – Sisi Wei shares her class notes
Data Scraping with Google Docs (from Sean Sposito)
How to create an automatically updating Google spreadsheet (from Sharon Machlis)
Demystifying Web Scraping (from Sean Sposito & Acton Gordon)
Campaign Finance the Data Science Way (from Chase Davis)
Exploratory Data Analysis (from Chase Davis)
Hone your Google Fusion Tables training skills tutorial (from Sreeram Balakrishnan)
Data Mining Machine Learning (from Jeff Larson)
Practical Machine Learning (from Chase Davis & Jeff Larson)
Journalism, Branding & Social Media (from Mandy Jenkins) 
Social media search tips and tools (from Doug Haddix)
How the Los Angeles Times uses DocumentCloud (from Ben Welsh)
Using Excel for Data Analysis (from Krista Kjellman Schmidt)
Excel I: Sorting and filtering (from Linda Johnson)
Excel II: Rates and Ratios (from Denise Malan)
Excel Magic: Advanced functions for data cleaning and more | Excel data (from MaryJo Webster)
Make Your First News App with Django
Data on the Fly (from John Keefe & Mark Wert)
Digging Deep with Data Journalism (from Jill Riepenhoff)
Information Design & Crossing the Digital Divide (from Helene Sears)
Dataviz on a shoestring (from Sharon Machlis)
Introduction to Ruby (from Al Shaw)
The Data Driven Story: Conceiving & Launching (from Jennifer LaFleur & David Donald)
Dataviz, Responsive Web Design + Mobile: Friends or Frenemies? (from Miranda Mulligan & Pete Karl II)
• Quick steps to mastering SQL through SQLite (from Troy Thibodeaux)
  – Emma Carew Grovum shares her notes from the tutorial
Reporting without revealing: Tools for hiding your tracks (from Paula Lavigne)
Covert reporting using technology to cover your tracks (from Mike Tigas)
Learning Python for journalists (from Jeremy Bowers & Serdar Tumgoren)
  – Ask to join the Google group
Fun with data in sports journalism (from Jack Gillum)
After the game: Top data ideas for investigating $port through $pending (from Paula Lavigne)
Is 911 a Joke in Your Town? (from Ben Welsh)
• Sample code for Introduction to JavaScript the Right Way (from Jeff Larson)
Food waste investigations (from Erin Jordan)
Government waste investigations (from Tim Eberly)
Investigating government waste (from Josh Sweigart)
OpenRefine (formerly Google Refine) slides and cheat sheet (from Tom Meagher)
How can we get the widest impact out of software projects? (from Rich Gordon)
How to be ready for your social media Sandy (on discovery, validation and publication) (from Steve Myers)
Github repo and example code from Developing reusable visualization components using D3 and Backbone.js (from Alastair Dant)
Code for drought maps & Data & code .zip file (from Amanda Cox)
Web scraping with Node.js (from Al Shaw)
• Zip file for Python workshops 1 & 3 | Github repo (from Ron Campbell)
• Tip sheet for Python workshop 2, plus dataset for the workshop (from Christopher Schnaars)
• Mike Ball shares his notes from Tasneem Raja’s Smarter interactive Web projects with Google Spreadsheets and Tabletop.js talk
Data Roadmaps: Priming your desktop with certain data slices helps you spot trends, find people and understand your city (from T.L. Langford)
Making Health Data Sexy (from Charles Ornstein)
Infect the CMS (from Heather Billings, Jacob Harris and Al Shaw)
Making interactives fun | List of interactives shown during the talk (from Tasneem Raja and Sisi Wei)
Covering public pensions (from MaryJo Webster)
• Learn to use Git and Github and fork this cheat sheet (from Tom Meagher)
Making Timelines (from Krista Kjellman Schmidt and Lena Groeger)
Inside baseball: What data journalism can learn from sports (from Jeremy Bowers, Ryan Pitts and Matt Waite)
Disasters: Preparing for and digging in after the storm (from Ben Poston)
5 data journalism projects you might not have seen before and why they matter in Europe (from Sebastian Mondial)
The One-Query Story (from Kate Martin)
Mapping Best Practices (from Dave Cole, John Keefe and Matt Stiles)
Web Scraping (and more) with Google Apps Script (from Steven Melendez)
NodeXL for Network Analysis (from Peter Aldhous)
Data-driven Beats (from Chris Amico)
Bringing Excel to the Web with SkyDrive (from Cathy Harley)
Navigating U.S. Census Data (from Erran F. Persley)
How to Serve Mad Traffic, Part I (from Jeremy Bowers)
How to Serve Mad Traffic, Part II (from Jacqui Maher) 

Lightning Talks
5 Algorithms in 5 Minutes | Video (from Chase Davis)
Let’s make games for news | Video (from Sisi Wei)
Big datasets, small streams | Video (from Katie Park)
Z-Scores: How You Can Compare Apples With Oranges (downloads a PowerPoint file) | Video (from Robert Gebeloff)
Casino-Driven Design | Video (from Al Shaw)
Be your own Nate Silver | Video (from Jeff Larson)
ILENE, the polite coding language | Video (from Jennifer LaFleur and Jeff Larson)
Every State is Weird: A selection of election edge cases | Video (from Jacob Harris)
Dude Who Stole My Congressman? (Data in .xls | Visualization) (from Paul Parker)
• Code for the Arduino Baggage Handler | Video (from Matt Waite)
• “Django Retrained: 5 ways coding like a web developer can make you a better investigative reporter” | Slides (from Ben Welsh)

Software & Tools

BatchGeo – monitor website changes
Citizen Quotes – A project to demonstrate maximum entropy models for extracting quotes from news articles in Python.
CometDocs converts PDFs to Word and Excel docs
Tabula for pulling data out of PDFs
• Tried and true XPDF (PDFtoText)
DocHive PDF to XML converter
Python wrapper for the Document Cloud API
DownThemAll Firefox plug-in for downloading website assets (photos, video, etc.)
• Embed Excel Interactive View into your site
Fast Cluster, a command line tool for grouping documents by similarity (from Jeff Larson)
FOIA Machine (automate your Freedom of Information requests)
Geofeedia search and monitor social media by location
iWitness from Adaptive Path – search social media content by time and place
OpenRefine (the open source repo of the data cleaning tool formerly known as Google Refine)
Overview Project | Read the getting started guide
Scrape screen scraper Chrome extension. Journalist Jens Finnäs wrote a tutorial for it on Dataists.
Time Flow by Martin Wattenberg & Fernanda Viegas
Stately – a symbol font to create a map of the U.S. using HTML & CSS
Weka 3: Data mining software in Java
Cascading Tree Sheets
Dataset (part of the Miso Project) – grabs data from Google Spreadsheets and helps visualize the data
Datawrapper (open source)
Google Chart Tools
Tableau Public (Windows only)
Mapbox and Tilemill
Adobe Edge Animate free tool for creating interactive content
Spoofcard caller ID spoofing
Trap Call unblocks private numbers
Burner iPhone app creates disposable phone numbers
• Tools for hiding an IP address:
  – Anonymizer ($80)
  – Privoxy
  – BeHidden
  – Anonymous
  – IxQuick
Orbot provides Tor proxying on Android phones
Silent Circle encrypted communication app for iPhone and Android
Whois (search for domain name owners)
SpiderOak private, secure data stored in the cloud
who to follow on social platforms (mobile app)
Hachi social platform search tool
R Project for Statistical Computing
R Studio
• Learn to unlock government data with Sunlight Academy offered by the Sunlight Foundation
JS Console for debugging JavaScript
Programming Ruby 1.9 & 2.0 (4th edition): The Pragmatic Programmers’ Guide
• Production code for Overview Server, which does visual document mining
mitmproxy (“man in the middle” proxy) inspect and edit traffic flows on the fly. SSL compatible.
Python Social Auth social authentication/registration mechanism
XCode iPhone simulator
jQuery Vertical Timeline by MinnPost
Rubular regular expression editor for Ruby
UltraEdit text editor (Windows only)
• Tom MacWright’s Mistakes interactive JS editor
Sphinx open source search engine
• NPR’s App Template project template for client-side apps
ILENE the polite coding language (from Jeff Larson)
Django Bakery helps bake your Django site out as flat files
Invar generates map tiles from a Mapnik configuration
Table Capture Chrome extension grabs table HTML and drops it into a Google doc
TableTools2 Firefox extension allows you to copy and manipulate table data from the Web
Haystax point-and-click data collection
• Sisi Wei’s presentation framework
Bank Tracker contains data on every FDIC bank
Shpescape converts shape files to TopoJSON
Numeric.js JavaScript library for numerical calculations
Pixel Ping pixel tracker
Helium Scraper extracts website data into structured formats such as CSV and XML
Choose Your Own Adventure plug-in from Mother Jones
Timeline JS
• The WNYC interactive Bingo card generator
Proof Finder search email and other unstructured data (designed for lawyers and investigators)
Paper of the Congressional Record (requires a key from Sunlight Labs)
YUI, an open source JavaScript and CSS library for developing interactive applications
Tarbell Google docs-driven CMS from the Chicago Tribune apps team (currently in alpha)
• Chase Davis’s FEC Standardizer code and explainer
• Al Shaw’s Dirtyword Ruby script cleans HTML from Word docs.


References

• Jeff Larson recommends “Eloquent JavaScript” as the best book for learning JS
Mike Bostock’s d3.js tutorials (from Sharon Machlis)
Scott Murray’s d3.js tutorials (from Sharon Machlis)
How to select, create & remove elements in d3.js (from Jerome Cukier and Scott Murray)
Computational Journalism syllabus from Journalism and Media Studies Center at the University of Hong Kong, Spring 2013 (from Jonathan Stray)
Connected China from Fathom & Reuters (background)
  – Notes on Connected China by Chris Amico
How to Bulletproof Your Data (from Jennifer LaFleur, ProPublica)
Federal Reserve Economic Data (includes international data and an API; from Federal Reserve Bank of St. Louis)
Little Sis, a database of relationships between people in business and government
OpenMissouri a collection of state and local government data from Missouri, some of which isn’t ordinarily made available online
Privacy Rights Clearinghouse
• ProPublica’s News Apps Style Guide
TheyRule shows the relationships between people in corporations
• Hadley Wickham’s academic paper on tidy data
• Hadley Wickham’s guide to using regular expressions in R
• ProPublica News Apps Desk Coding Manifesto
• ProPublica’s Principles of News App Design Structure
Pretty Good Privacy (PGP) data encryption
Tor Project
OpenElections Project, certified historical election results for everyone
Open Innovation and open APIs in Digital Journalism (academic paper by Tanja Aitamurto and Seth C. Lewis)
• Chart of the differences between PHP, Python and Ruby
How to build a stepper visualization
How to install MySQL on Mac OS or Windows
R for Journalists
A journalists’ guide to verifying images
Finding the Wisdom in the Crowd (on verifying images found on social platforms)
How to visualize your backlinks with Google Fusion Tables (network visualization tutorial)
Design Patterns: Elements of Reusable Object-Oriented Software
Hospital Compare from
• Winners of Kaggle’s campaign finance interactive reporting contest
Working with Tabletop.js and Handlebars.js
Impact of Responsive Designs
• Drew Conway’s Data Science Venn Diagram (now in d3.js!)
How to Not Screw Up Your Data
• Did you watch Ben Welsh’s lightning talk? Here’s the presentation he credits for changing his life: Writing reusable code by James Bennett, now at Mozilla. Read the revamped slides

Work Samples

The Year in CAR presentation by Mark Horvit and Megan Luther, IRE
  (7.1 MB PDF)
The Year in CAR wrap by Ryan Graff, Knight Lab
The Evolution of Sandy’s Path (
Parallax Scrolling: James Bond (BBC)
How the Chicago Tribune News Apps team made the Chicago Crime site
Chinese Chemicals Flow Unchecked Onto World Drug Market (The New York Times)
Income Inequality in America (Reuters)
Australians who don’t pay tax: what would Romney say? (Financial Review)
Mid-Year Economic and Fiscal Outlook (Financial Review)
Workout at Work (Washington Post)
Ad Libs (PBS Newshour)
Could you be an Olympic medalist (from The Guardian)
Fake medical providers slip through Medicare loophole (Atlanta Journal-Constitution)
Medicare fraudsters used UPS boxes to fleece millions from taxpayers (Dayton Daily News)
The Killing Roads 10 years of traffic accidents in Norway (

Just take a look at this beauty:
Courier Prime by Alan Dague-Greene
This is Courier Prime, designed for screenwriters — people who must format their manuscripts in 12-point Courier so their productions can estimate timing and length. Courier Prime is sharper, has proper bold and italic faces, and prints and displays on screen more crisply.

Best of all, it’s free. And if you’re a font geek like me, you’ll enjoy the backstory on Courier Prime’s design.

Earlier today, I spoke at the first-ever White House Safety Datapalooza, an event organized by the White House Office of Public Engagement, Office of Science and Technology Policy and the U.S. Department of Transportation. Invited speakers came from the public and private sector, universities and non-profits.

I was asked to give a “TED-style talk” about data journalism. In case you missed the live stream of it on White House Live, here’s the prepared script:

What is data journalism?

Lots of people — including journalists themselves — have different opinions of what it is right now. There are blogs about it, and websites about it, and lately, new degrees at universities about it. There are even big conferences where “What is data journalism?” is a keynote topic.

Most of us understand what journalism is, what “news” is. But “data journalism”?

Me? I think it’s a new buzz word for a very old process: gathering, examining and finding meaning within collections of information — and letting people know.

So let’s talk about data. You’re probably familiar with that word, “data.” You’re probably thinking data means numbers. Piles of numbers. And you’re right. But there are other kinds of data too: Factoids. Photos. Video. Names and locations. Time logs. Points in space. Things that we collect, electronic files we save that may not seem individually relevant or interesting, but when interconnected and cross-referenced and analyzed, show us patterns, tell us stories, reveal truths.

Why is access to data important? You get up in the morning, you’re getting ready for work and you ask, should I take the car or the Metro? Should I take the highway or drive on surface streets? Luckily, the local transportation agency is giving out data. Imagine how much more frustrating it would be if you didn’t get a traffic report or a subway system alert at all? Imagine how disastrous it would be if public safety officials didn’t know where emergencies were happening?

That’s just one example, but you get the picture. Access to data is important because it keeps us informed when we need the information most. It can also help us in hindsight: we can use data to understand how things happened and why. And it can help us for the future: give us something to work from so we can improve outcomes, create safer conditions, propose new ideas.

But if the data isn’t available in an easy-to-examine electronic format, that makes the work harder. And I’m not just talking about getting a stack of paper documents that you have to scan. In journalism, we sometimes talk about bad data or messy data. It’s data that can’t be easily imported into the software we use to organize it. PDFs, for example. PDFs make it hard to do data journalism.

There are other examples too:

  • Spreadsheets that have missing information.
  • Text files that have multiple ways of spelling one thing. Or typos.
  • Files with meaningless garble in them.

Messy data.

It’s important to have clean, structured, easy-to-find data — because journalism is about getting things right while beating the clock. Even if you’re not a journalist, I’m sure you’re familiar with this kind of need for speed. The quicker we can get to the examination and analysis stage, the faster we can see the patterns, unearth the stories, and explain what happened.

With data, journalists are producing all kinds of things that help people understand complex issues: Maps explaining the seriousness of national drought. Charts and interactive graphs showing the relationship between money and politics. Games that really bring home what terms like “distracted driving” mean.

Data is at the heart of what journalism is — and the more substantive it is, the more organized it is, the more easily accessible it is, the better we all can understand the events that affect our world, our nation, our communities and ourselves.

The graphics desk at The New York Times gets high praise amongst journalists and visual information specialists for their clear, clean and often creative graphics that explain and enhance news.

Among the team is the group’s resident statistician, Amanda Cox, who’s been hailed as the “queen of infographics” and has been responsible for some of the high concept pieces published by NYT.

Any time you can hear Amanda speak or learn from her, you should. At the Eyeo Festival in June, she looked at the evolution of data graphics, particularly within the history of the Times graphics department.

Want to learn more? Kevin Quealy, Amanda’s coworker, posts fantastic explanations of how she and other members of the graphics desk do some of their work. Follow along on Charts’n’Things.

One of the most popular posts on Ricochet was the collection of dataviz tools, slides and links from last year’s NICAR conference.

It was so popular, in fact, that people have asked me to make a similar collection again. So from Feb. 23–26, I’ll be updating this post with all the great things NICARians have to share this year.

Follow #NICAR12 on Twitter for the buzz; come to this page for the goods. And if you’re attending the conference, be sure to buy a T-shirt to support IRE, the organization that puts this fantastic event together. Ben Welsh of The Los Angeles Times is taking candid photos and posting them on Flickr.

Have links from sessions you attended? Post them in comments or ping me on Twitter @MacDiva and I’ll add them to this list.

Jump to Presentations & Tutorials | Software & Tools | References | Work Samples

Presentations & Tutorials

Bringing Maps to Fruition (from Michelle Minkoff)
Free tools for scraping data without programming (from Chris Keller and Michelle Minkoff)
Instructions for Hands-on Web Scraping Without Programming (from Chris Keller and Michelle Minkoff)
Locating the Story: The Latest in Online Maps and mapping links (from Ben Welsh)
Mapping links & presentation (from David Herzog)
Social Media Sleuthing (from Doug Haddix)
freeDive Tips & Tricks (from the Knight Digital Media Center)
CAR on a Shoestring (from Kevin Crowe, Patrick Sweet and Mary Jo Webster)
Regular Expressions: An Introduction (from Kevin Crowe, Patrick Sweet and Mary Jo Webster)
Create a moderation form using Google Forms and Fusion Tables
Scraping with Django (from Kevin Schaul)
How to turn PDFs into a searchable, sortable table (from Kevin Schaul)
Get the Most Out of Fusion Tables (from Rebecca Shapley)
Data viz in 20 minutes: jQuery DataTables (from Christopher Schnaars)
How to set up Python in Windows 7 (from Anthony DeBarros)
Data visualization best practices (from Kat Downs)
NodeXL for Network analysis (from Peter Aldhous)
Network Analysis for News (from Peter Aldhous and Peggy Heinkel-Wolfe)
Network analysis for news (video of Peter Aldhous’s NICAR12 talk)
How to Use Google Refine for Investigative Journalism (from Dan Nguyen)
Mapping is for Everyone – How to make all kinds of maps (from Sharon Machlis)
Advanced Excel techniques tipsheet (from MaryJo Webster)
How do you edit a story made of software? (from Alexander Howard)
Election Night Results & Maps (from John Keefe)
Covering Elections presentation (from Al Shaw)
Making friends with map projections (from Ben Welsh and Michael Corey)
Database validation (from JT Johnson)
Web scraping with Node.js (from Al Shaw)
Who is John Doe — and where to get the paper on him
Practical TastyPie for the Modern Djangonaut (from Jeremy Bowers)
Weathering the Storm: Using data to bolster the traditional weather story (from Stephen Stirling)
Build your first Django news app (from the IRE NICAR12 Django workshop)
GeoCommons walkthrough (from Paul Monies)
QGIS 1 workshop tutorial (from Michael Corey)
Tell Me a Story! – storytelling and data journalism (from Anthony DeBarros)
Human-assisted reporting: How to create robot reporters in your own image (from Ben Welsh)
How I learned to stop worrying and love flat files (from Ben Welsh)
Infect the CMS (from Jacob Harris)
Inspect the Web With Your Browser’s Web Inspector (from Dan Nguyen)
An Intro to R (from Jacob Fenton)
Slides from “Mapping is Hard” (from Brian Boyer)
TileMill hands-on tutorial (from Chris Amico, Brian Boyer and Matt Stiles)
Own Your Map Stack (from Chris Amico, Brian Boyer and Matt Stiles)
Natural Language Toolkit (NLTK) basics (from Jacob Perkins)
Connecting to state data using (from David Herzog)
How to convert PDFs to Excel in Windows (from IRE)
Quantum GIS (QGIS) 2 workshop (from Michael Corey)
How to turn PDFs into text (from Dan Nguyen)
Web scraping in Python workshop tutorial (from Mark Ng)
Infiltrate the Ad Department (from Ryan Pitts)
Map Graphics for Video (from Michael Corey)
What We Can Find Out from Elections (from Aaron Bycoffe)
The Latest in Mapping with Javascript and jQuery (from Timothy Barmann)
How to Make a PANDA (from Brian Boyer)
The Farenthold Surprise (election panel presentation from Derek Willis)
Displaying data geographically: Creating a one-layer map in ArcMap (from Tom Meagher)
An intro to csvKit (from Christopher Groskopf and Anthony DeBarros)
Integrating CAR into a daily Beat (from Kate Martin)
How to use the SIMILE Exhibit timeline framework (from David Karger)
Tableau training handouts (from Tableau)
CAR Training 2012 including mapping data sets, practice data sets and tip sheets (from Jennifer LaFleur)

Software & Tools

Twazzup – find breaking news, popular hashtags, influential users
Reporters’ Lab Reviews – a link list of tools, techniques and research for public affairs reporting
Twellow – a yellow pages for Twitter
Twiangulate – find sources and groups of people on Twitter
Crowdbooster – monitor and analyze buzz on social media sites
KnowEm Username Search – finds the social networks a person or organization/brand is using
Muck Rack Pro – add yourself to the list of journalists or find journalists covering a particular topic
The Archivist – save tweets and export to Excel to analyze later
PowerPivot for Excel – “Load massive amounts of data from virtually any source, process in seconds and model with powerful analytical capabilities”
Pandoc – a universal document converter
HTML-to-PDF – converts HTML to PDF docs for free
Mr. Data Converter – converts Excel data into one of several Web-friendly formats, including HTML, JSON and XML.
Natural Language Toolkit – for natural language text analysis
Voyant Tools – Web-based document analysis
ClearForest Gnosis – Firefox plugin that uses OpenCalais for data extraction
Exhibit – a publishing framework for data-rich interactive web pages
DocumentCloud – store, analyze and annotate PDFs
DataTables – jQuery plugin to create sortable datasets
Ben Welsh’s triumvirate of tools that allow you to copy Google Maps’ functionality:
   – a data source, like OpenStreetMap
   – a tile set, like what you can make with TileMill
   – a JavaScript interface, like Leaflet
OpenOffice – open source office suite software (word processor, spreadsheet, presentation/slide deck, database)
QGIS – Open source geographic information system
Shape to Fusion (a.k.a. Shpescape) – Import shapefiles to Fusion Tables
MySQL – Database software
Google Refine – data cleaner
Junar – Discover and track data
The Overview Project
Vischeck – ensures your graphics are visible to the colorblind
Colorbrewer – in case you need help with color schemes for your design
Color Oracle – colorblindness simulator for Mac OS, Windows and Linux
0 to 255 – find variations of any color
Beautiful Soup – useful for many things, including parsing HTML
Weave – Web-based analysis and visualization environment, made by a partnership between the University of Massachusetts Lowell and the Open Indicators Consortium
Highcharts – create interactive JavaScript charts (free for non-commercial use)
Indiemapper – Upload shapefiles and convert them to create static, thematic maps
CSV-to-JSON converter
Sinatra – a lightweight Ruby framework for creating web apps
Use Google Docs, XPath and the =importxml() function to put data in a spreadsheet
PANDA Project
Timemap – syncs a SIMILE timeline to a web-based map
Tabletop – allows you to use Google spreadsheets as your app backend
Js2Coffee – converts JavaScript to CoffeeScript and back
CoffeeScript sandbox
iPL2 – ask a librarian, search through the Internet Public Library (IPL) and the Librarians’ Internet Index (LII) websites.
“Lesson of the night: Want to put census geos in fusion tables? Keep it stupid simple: convert US Census data from TIGER into shape files with shpescape” — tip from Matt Kiefer
Rubular – a Ruby regular expression editor
Timeline Setter – makes timelines from spreadsheets
Spoofcard – changes your voice and gives you a temporary phone number
Tablechart – turns HTML tables into charts
Spam Mimic – hide a message in spam
FEC scraper/FEC parser – Chris Schnaars’ script on GitHub


References

The American Library Association’s wiki of government databases (from Dan Nguyen)
Penn Treebank Project reference – Use it in conjunction with the Natural Language Toolkit (NLTK)
Geomedia Google Group
NICAR-L mailing list
Google Public Data Explorer
InfoVis Wiki – a catchall list of papers, conferences, patterns and jobs in information visualization
Spatial Reference – an IMDB-like catalog of spatial reference systems
22 free visualization tools collected by ComputerWorld
Free Data Visualization tools – a collection from Sharon Machlis
8 cool tools for data analysis, visualization and presentation (from Sharon Machlis)
Chart and image gallery: 30 free tools for data visualization and analysis (from Sharon Machlis)
– find health data from more than 70 sources and 300+ datasets
Analytic Journalism – “It’s not ‘all about story’ if you don’t have anything to say.”
How to install MySQL and Navicat on Windows
Freebase – an entity graph/Wikipedia-like collection of data
Save the Post Office – records U.S. post office consolidations and closures
Los Angeles Times datadesk GitHub repository with code for you to use
– Official record of the Federal Funding Accountability and Transparency Act (Transparency Act)
Data for the Public Good by Alexander Howard (free eBook)
– shows what Illinois congressional candidates are tweeting about
Civic Commons Marketplace collects open government efforts in the U.S.
OpenCorporates is in the process of collecting information on every corporate entity in the world
USA Today’s Developer Network

Work Samples

Bailed out banks profit from tax liens (Arizona Star heat maps showed property locations, making the story very clear)
Race gap found in traffic stops (Milwaukee Journal Sentinel showed the racial disparity in traffic stops and, on further examination, in municipal maintenance requests)
Texas redistricting map and slider code (Texas Tribune)
The Poverty Gap shows a clear correlation between poverty and access to education (ProPublica)
2012 Election Results big board – one approach to visual presentation of election info that tells you the story of the election immediately (The New York Times)
Little Loving County grabs a bit of Texas’ growth – a census story unlike the usual census stories (The Dallas Morning News)
Riot rumours: how misinformation spread on Twitter during a time of crisis – uses data analysis to track the spread and suppression of rumors about the London riots (The Guardian)
Discover Boston Public Schools (Code for America)
SchoolBook makes teacher data reports for New York City schools
Redistricting: New lines leave some voters without a senator (The [Riverside, Calif.] Press-Enterprise)

And finally, no journalism nerdfest would be complete without a demonstration of the latest hotness: Drone journalism by Matt Waite.

Drone Journalism Demo – Matt Waite from John Keefe on Vimeo.

Last week, Alastair Dant, lead interactive technologist at The Guardian, came to Hacks/Hackers NYC to show how his team produces its informative and award-winning interactive graphics.

It’s a wide-ranging talk about what’s new and inspiring about news technology, and how each team member’s unique skills contribute to the whole.

Well worth watching. And if you want to deeply nerd out with The Guardian, check out their Developer Blog.

The projects mentioned in Alastair’s talk:

Alastair’s team is Martin Shuttleworth, Mariana Santos, Jonathan Richards and Alex Graul.

I’ve started writing a few how-tos for The first piece is a step-by-step illustrated tutorial on how to make an interactive heat map with Google Fusion Tables, like the one below.

If you’d like to see other tutorials, let me know what you’re looking for.