Archives for category: General Journalism

PBS Off Book wanted to know: “Is code the most important language in the world?” They asked Adda Birnir of Skillcrush, Edd Dumbill of Silicon Valley Data Science, Evan Korth of NYU and me to weigh in.

NICAR13 brings together some of the sharpest minds and most experienced hands in investigative journalism. Over four days, people share, discuss and teach techniques for hunting leads, gathering data, and presenting stories. Of all the conferences I go to, this one gets the highest marks from attendees for intensive, immediately applicable learning, networking and fun.

No one could possibly absorb and remember everything presented, so below is your memory card. If you’re looking for highlights from this list, read my NICAR13 roundup for Nieman Lab, “Data science, commoditized backends, and the need to know code.”

Have links from sessions you attended? Post them in comments or ping me on Twitter @MacDiva and I’ll add them to this list.

If you’re looking for a job, IRE keeps a list of open positions. Here’s who’s hiring.

NICAR 2014 will be in Baltimore from Feb. 27 to March 2. You should be there.

For additional tutorials, videos, presentations and tips see the lists from 2012 and 2011.

Jump to
Presentations & Tutorials | Software & Tools | References | Work Samples
 

Presentations & Tutorials


Dashboards for Reporting (from Aaron Bycoffe, Jacob Harris & Derek Willis)
Data Science for Nerdy Journalists (from Hadley Wickham)
  – Sisi Wei shares her class notes
Data Scraping with Google Docs (from Sean Sposito)
How to create an automatically updating Google spreadsheet (from Sharon Machlis)
Demystifying Web Scraping (from Sean Sposito & Acton Gordon)
Campaign Finance the Data Science Way (from Chase Davis)
Exploratory Data Analysis (from Chase Davis)
Hone your Google Fusion Tables training skills tutorial (from Sreeram Balakrishnan)
Data Mining Machine Learning (from Jeff Larson)
Practical Machine Learning (from Chase Davis & Jeff Larson)
Journalism, Branding & Social Media (from Mandy Jenkins) 
Social media search tips and tools (from Doug Haddix)
How the Los Angeles Times uses DocumentCloud (from Ben Welsh)
Using Excel for Data Analysis (from Krista Kjellman Schmidt)
Excel I: Sorting and filtering (from Linda Johnson)
Excel II: Rates and Ratios (from Denise Malan)
Excel Magic: Advanced functions for data cleaning and more | Excel data (from MaryJo Webster)
Make Your First News App with Django
Data on the Fly (from John Keefe & Mark Wert)
Digging Deep with Data Journalism (from Jill Riepenhoff)
Information Design & Crossing the Digital Divide (from Helene Sears)
Dataviz on a shoestring (from Sharon Machlis)
Introduction to Ruby (from Al Shaw)
The Data Driven Story: Conceiving & Launching (from Jennifer LaFleur & David Donald)
Dataviz, Responsive Web Design + Mobile: Friends or Frenemies? (from Miranda Mulligan & Pete Karl II)
• Quick steps to mastering SQL through SQLite (from Troy Thibodeaux)
  – Emma Carew Grovum shares her notes from the tutorial
Reporting without revealing: Tools for hiding your tracks (from Paula Lavigne)
Covert reporting using technology to cover your tracks (from Mike Tigas)
Learning Python for journalists (from Jeremy Bowers & Serdar Tumgoren)
  – Ask to join the Google group
Fun with data in sports journalism (from Jack Gillum)
After the game: Top data ideas for investigating $port through $pending (from Paula Lavigne)
Is 911 a Joke in Your Town? (from Ben Welsh)
• Sample code for Introduction to JavaScript the Right Way (from Jeff Larson)
Food waste investigations (from Erin Jordan)
Government waste investigations (from Tim Eberly)
Investigating government waste (from Josh Sweigart)
OpenRefine (formerly Google Refine) slides and cheat sheet (from Tom Meagher)
How can we get the widest impact out of software projects? (from Rich Gordon)
How to be ready for your social media Sandy (on discovery, validation and publication) (from Steve Myers)
Github repo and example code from Developing reusable visualization components using D3 and Backbone.js (from Alastair Dant)
Code for drought maps & Data & code .zip file (from Amanda Cox)
Web scraping with Node.js (from Al Shaw)
• Zip file for Python workshops 1 & 3 | Github repo (from Ron Campbell)
• Tip sheet for Python workshop 2, plus dataset for the workshop (from Christopher Schnaars)
• Mike Ball shares his notes from Tasneem Raja’s Smarter interactive Web projects with Google Spreadsheets and Tabletop.js talk
Data Roadmaps: Priming your desktop with certain data slices helps you spot trends, find people and understand your city (from T.L. Langford)
Making Health Data Sexy (from Charles Ornstein)
Infect the CMS (from Heather Billings, Jacob Harris and Al Shaw)
Making interactives fun | List of interactives shown during the talk (from Tasneem Raja and Sisi Wei)
Covering public pensions (from MaryJo Webster)
• Learn to use Git and Github and fork this cheat sheet (from Tom Meagher)
Making Timelines (from Krista Kjellman Schmidt and Lena Groeger)
Inside baseball: What data journalism can learn from sports (from Jeremy Bowers, Ryan Pitts and Matt Waite)
Disasters: Preparing for and digging in after the storm (from Ben Poston)
5 data journalism projects you might not have seen before and why they matter in Europe (from Sebastian Mondial)
The One-Query Story (from Kate Martin)
Mapping Best Practices (from Dave Cole, John Keefe and Matt Stiles)
Web Scraping (and more) with Google Apps Script (from Steven Melendez)
NodeXL for Network Analysis (from Peter Aldhous)
Data-driven Beats (from Chris Amico)
Bringing Excel to the Web with SkyDrive (from Cathy Harley)
Navigating U.S. Census Data (from Erran F. Persley)
How to Serve Mad Traffic, Part I (from Jeremy Bowers)
How to Serve Mad Traffic, Part II (from Jacqui Maher) 

Lightning Talks
5 Algorithms in 5 Minutes | Video (from Chase Davis)
Let’s make games for news | Video (from Sisi Wei)
Big datasets, small streams | Video (from Katie Park)
Z-Scores: How You Can Compare Apples With Oranges (downloads a PowerPoint file) | Video (from Robert Gebeloff)
Casino-Driven Design | Video (from Al Shaw)
Be your own Nate Silver | Video (from Jeff Larson)
ILENE, the polite coding language | Video (from Jennifer LaFleur and Jeff Larson)
Every State is Weird: A selection of election edge cases | Video (from Jacob Harris)
Dude Who Stole My Congressman? (Data in .xls | Visualization) (from Paul Parker)
• Code for the Arduino Baggage Handler | Video (from Matt Waite)
• “Django Retrained: 5 ways coding like a web developer can make you a better investigative reporter” | Slides (from Ben Welsh)



Software & Tools


BatchGeo
ChangeDetection.com – monitor website changes
Citizen Quotes – A project to demonstrate maximum entropy models for extracting quotes from news articles in Python.
CometDocs converts PDFs to Word and Excel docs
Tabula for pulling data out of PDFs
• Tried and true XPDF (PDFtoText)
DocHive PDF to XML converter
Python wrapper for the DocumentCloud API
DownThemAll Firefox plug-in for downloading website assets (photos, video, etc.)
• Embed Excel Interactive View into your site
Fast Cluster, a command line tool for grouping documents by similarity (from Jeff Larson)
FOIA Machine (automate your Freedom of Information requests)
Geofeedia search and monitor social media by location
iWitness from Adaptive Path – search social media content by time and place
OpenRefine (the open source repo of the data cleaning tool formerly known as Google Refine)
Overview Project | Read the getting started guide
Scrape screen scraper Chrome extension. Journalist Jens Finnäs wrote a tutorial for it on Dataists.
Time Flow by Martin Wattenberg & Fernanda Viegas
Stately – a symbol font to create a map of the U.S. using HTML & CSS
Weka 3: Data mining software in Java
Cascading Tree Sheets
Dataset (part of the Miso Project) – grabs data from Google Spreadsheets and helps visualize the data
Datawrapper (open source)
Google Chart Tools
Infogram
ManyEyes
Tabletop.js
Tableau Public (Windows only)
Mapbox and Tilemill
Statwing
Adobe Edge Animate free tool for creating interactive content
Spoofcard caller ID spoofing
Trap Call unblocks private numbers
Burner iPhone app creates disposable phone numbers
• Tools for hiding an IP address:
  – Anonymizer ($80)
  – Privoxy
  – BeHidden
  – Anonymous
  – IxQuick
Orbot provides Tor proxying on Android phones
Silent Circle encrypted communication app for iPhone and Android
Whois (search for domain name owners)
SpiderOak private, secure data stored in the cloud
Foller.me who to follow on social platforms
Twazzup.com
Ban.jo (mobile app)
Hachi social platform search tool
R Project for Statistical Computing
R Studio
• Learn to unlock government data with Sunlight Academy offered by the Sunlight Foundation
JS Console for debugging JavaScript
Programming Ruby 1.9 & 2.0 (4th edition): The Pragmatic Programmers’ Guide
• Production code for Overview Server, which does visual document mining
mitmproxy (“man in the middle” proxy) inspect and edit traffic flows on the fly. SSL compatible.
Python Social Auth social authentication/registration mechanism
Xcode iPhone simulator
jQuery Vertical Timeline by MinnPost
Rubular regular expression editor for Ruby
UltraEdit text editor (Windows only)
• Tom MacWright’s Mistakes interactive JS editor
Sphinx open source search engine
• NPR’s App Template project template for client-side apps
ILENE the polite coding language (from Jeff Larson)
Django Bakery helps bake your Django site out as flat files
Invar generates map tiles from a Mapnik configuration
Table Capture Chrome extension grabs table HTML and drops it into a Google doc
TableTools2 Firefox extension allows you to copy and manipulate table data from the Web
Haystax point-and-click data collection
• Sisi Wei’s presentation framework
Bank Tracker contains data on every FDIC bank
Shpescape converts shape files to TopoJSON
Numeric.js JavaScript library for numerical calculations
Pixel Ping pixel tracker
Helium Scraper extracts website data into structured formats such as CSV and XML
Choose Your Own Adventure plug-in from Mother Jones
Timeline JS
• The WNYC interactive Bingo card generator
Proof Finder search email and other unstructured data (designed for lawyers and investigators)
Paper of the Congressional Record (requires a key from Sunlight Labs)
YUI, an open source JavaScript and CSS library for developing interactive applications
Tarbell Google docs-driven CMS from the Chicago Tribune apps team (currently in alpha)
• Chase Davis’s FEC Standardizer code and explainer
• Al Shaw’s Dirtyword Ruby script cleans HTML from Word docs.



References


• Jeff Larson recommends “Eloquent JavaScript” as the best book for learning JS
Mike Bostock’s d3.js tutorials (from Sharon Machlis)
Scott Murray’s d3.js tutorials (from Sharon Machlis)
How to select, create & remove elements in d3.js (from Jerome Cukier and Scott Murray)
Computational Journalism syllabus from Journalism and Media Studies Center at the University of Hong Kong, Spring 2013 (from Jonathan Stray)
Connected China from Fathom & Reuters (background)
  – Notes on Connected China by Chris Amico
How to Bulletproof Your Data (from Jennifer LaFleur, ProPublica)
Federal Reserve Economic Data (includes international data and an API; from Federal Reserve Bank of St. Louis)
Little Sis, a database of relationships between people in business and government
OpenMissouri, a collection of state and local government data from Missouri, some of which isn’t ordinarily made available online
Privacy Rights Clearinghouse
• ProPublica’s News Apps Style Guide
TheyRule shows the relationships between people in corporations
• Hadley Wickham’s academic paper on tidy data
• Hadley Wickham’s guide to using regular expressions in R
• ProPublica News Apps Desk Coding Manifesto
• ProPublica’s Principles of News App Design Structure
Pretty Good Privacy (PGP) data encryption
Tor Project
OpenElections Project, certified historical election results for everyone
Open Innovation and open APIs in Digital Journalism (academic paper by Tanja Aitamurto and Seth C. Lewis)
• Chart of the differences between PHP, Python and Ruby
How to build a stepper visualization
How to install MySQL on Mac OS or Windows
R for Journalists
A journalist’s guide to verifying images
Finding the Wisdom in the Crowd (on verifying images found on social platforms)
How to visualize your backlinks with Google Fusion Tables (network visualization tutorial)
Design Patterns: Elements of Reusable Object-Oriented Software
Hospital Compare from Medicare.gov
• Winners of Kaggle’s campaign finance interactive reporting contest
Working with Tabletop.js and Handlebars.js
Impact of Responsive Designs
• Drew Conway’s Data Science Venn Diagram (now in d3.js!)
How to Not Screw Up Your Data
• Did you watch Ben Welsh’s lightning talk? Here’s the presentation he credits with changing his life: Writing reusable code by James Bennett, now at Mozilla. Read the revamped slides.



Work Samples


The Year in CAR presentation by Mark Horvit and Megan Luther, IRE (7.1 MB PDF)
The Year in CAR wrap by Ryan Graff, Knight Lab
The Evolution of Sandy’s Path (Weather.com)
Parallax Scrolling: James Bond (BBC)
How the Chicago Tribune News Apps team made the Chicago Crime site
Chinese Chemicals Flow Unchecked Onto World Drug Market (The New York Times)
Income Inequality in America (Reuters)
Australians who don’t pay tax: what would Romney say? (Financial Review)
Mid-Year Economic and Fiscal Outlook (Financial Review)
Workout at Work (Washington Post)
Ad Libs (PBS Newshour)
Could you be an Olympic medalist? (The Guardian)
Fake medical providers slip through Medicare loophole (Atlanta Journal-Constitution)
Medicare fraudsters used UPS boxes to fleece millions from taxpayers (Dayton Daily News)
The Killing Roads: 10 years of traffic accidents in Norway (bt.no)


In a single slide deck, LinkedIn co-founder Reid Hoffman offers salient and practical advice for working and continuing to be employed in changing times.

Even with the slightly salesy stuff about using LinkedIn at the end, this is worth flipping through and applying to your life, no matter where you are in your career: staffer, entrepreneur or freelancer (which, in my book, is an entrepreneur, but I’ll save that for another time).

Asking a question at IRE Las Vegas (2010), photo by Ben Welsh
Can you believe it? The annual Computer Assisted Reporting conference (also known as NICAR) is about three short weeks away.

Of all the events I’ve been to, this is the one I get the most out of. All of the sessions are meant to teach you skills you can apply immediately and reveal deep insights that will help you grow as a journalist.

As in years past, I’ll be collecting links to the tutorials, presentations, slide decks and video from NICAR13 and posting them here. In preparation, especially for new attendees, here’s some stuff you should know:

  • There will be 5-minute lightning talks. You could give one. In fact, IRE is taking talk proposals and votes right now. The most popular talks will be presented on Friday, March 1, at 4 p.m.
  • If you want one-on-one mentoring at the conference, sign up by Feb. 7. Organizers will then pair mentees up with mentors. Mentees: Bring work samples and story ideas. Mentorship slots fill up quickly, so apply today.
  • If you’re taking any hands-on training sessions or Hadley Wickham’s data science masterclass, you might receive emails insisting you install a bunch of software before you arrive. Take the instructions seriously. Do not wait until the last minute or you will be very sad and very, very lost during class.
  • Esri is offering a free ArcGIS for Desktop license (worth $1,500) if you attend all four of its 50-minute training and demo sessions. If you’re doing a lot of cartography and GIS work, you might want to consider it.
  • There’s Q&A after almost every session, and there’s always a pause before someone speaks up. So prepare a question (and please, not one of the “see how I’m smarter than you?” variety) and use your first-mover advantage.

NICAR is really friendly. If you’ve got a question or you have a reporting problem you’re trying to solve, just ask someone for help.

And if you want to be really prepared, Chris Fralic of First Round Capital has great advice on how to work a conference.

(Photo from IRE 2010 by Ben Welsh/Flickr)

No, no. It’s not entirely about cuteness. Though it does help.

Upworthy shares its great advice below.

Adorable kitten photo via Brett Jordan/Flickr

Want to send your startup to SXSW Interactive (March 11-15, 2013)?

Apply to the 2013 SXSW Accelerator, an opportunity to showcase your emerging technology product or service in front of industry leaders.

The event takes place March 11 and 12 as a part of SXSW Interactive. If chosen for Accelerator, you can improve your product launch, attract venture capitalists, polish your elevator pitch, receive media exposure, build brand awareness and network. And you’ll get two comped registrations to SXSW Interactive.

The deadline to register is Friday, Nov. 9, so get details and enter as soon as you can.

I’ve joined the SXSW Accelerator board this year, so let me know once you’ve applied — and if you have any questions before the deadline, first check the Accelerator FAQ.

If the FAQ doesn’t answer your questions, post a comment or send an email and I’ll do my best to get you an answer or direct you to someone who can.

The data team at WNYC has one for you. The data team is led by John Keefe, who in three short years went from being the public radio website’s code-curious news director to full-on news developer.

Google Politics & Elections is heavily promoting its offerings. Among them is a live stream of tonight’s presidential debate on domestic policy between Democratic incumbent Barack Obama and Republican challenger Mitt Romney.

It’ll start at 9 p.m. Eastern Time (your local time equivalent is below). Print a few bingo cards made by Erica Smith to turn it into a social event.

Convert time zones with worldtimebuddy.com

Earlier today, I spoke at the first-ever White House Safety Datapalooza, an event organized by the White House Office of Public Engagement, the Office of Science and Technology Policy and the U.S. Department of Transportation. Invited speakers came from the public and private sectors, universities and non-profits.

I was asked to give a “TED-style talk” about data journalism. In case you missed the live stream of it on White House Live, here’s the prepared script:

What is data journalism?

Lots of people — including journalists themselves — have different opinions of what it is right now. There are blogs about it, and websites about it, and lately, new degrees at universities about it. There are even big conferences where “What is data journalism?” is a keynote topic.

Most of us understand what journalism is, what “news” is. But “data journalism”?

Me? I think it’s a new buzzword for a very old process: gathering, examining and finding meaning within collections of information, and letting people know.

So let’s talk about data. You’re probably familiar with that word, “data.” You’re probably thinking data means numbers. Piles of numbers. And you’re right. But there are other kinds of data too: Factoids. Photos. Video. Names and locations. Time logs. Points in space. Things that we collect, electronic files we save that may not seem individually relevant or interesting, but when interconnected and cross-referenced and analyzed, show us patterns, tell us stories, reveal truths.

Why is access to data important? You get up in the morning, you’re getting ready for work and you ask, should I take the car or the Metro? Should I take the highway or drive on surface streets? Luckily, the local transportation agency is giving out data. Imagine how much more frustrating it would be if you didn’t get a traffic report or a subway system alert at all. Imagine how disastrous it would be if public safety officials didn’t know where emergencies were happening.

That’s just one example, but you get the picture. Access to data is important because it keeps us informed when we need the information most. It can also help us in hindsight: we can use data to understand how things happened and why. And it can help us for the future: give us something to work from so we can improve outcomes, create safer conditions, propose new ideas.

But if the data isn’t available in an easy-to-examine electronic format, that makes the work harder. And I’m not just talking about getting a stack of paper documents that you have to scan. In journalism, we sometimes talk about bad data or messy data. It’s data that can’t be easily imported into the software we use to organize it, for example, PDFs. PDFs make it hard to do data journalism.

There are other examples too:

  • Spreadsheets that have missing information.
  • Text files that have multiple ways of spelling one thing. Or typos.
  • Files with meaningless garble in them.

Messy data.

It’s important to have clean, structured, easy-to-find data — because journalism is about getting things right while beating the clock. Even if you’re not a journalist, I’m sure you’re familiar with this kind of need for speed. The quicker we can get to the examination and analysis stage, the faster we can see the patterns, unearth the stories, and explain what happened.

With data, journalists are producing all kinds of things that help people understand complex issues: Maps explaining the seriousness of national drought. Charts and interactive graphs showing the relationship between money and politics. Games that really bring home what terms like “distracted driving” mean.

Data is at the heart of what journalism is — and the more substantive it is, the more organized it is, the more easily accessible it is, the better we all can understand the events that affect our world, our nation, our communities and ourselves.

The Livingston Awards for Young Journalists announced its nominees yesterday.

The annual prizes recognize outstanding reporting by journalists under 35. Winners of the $10,000 prizes for local, national and international reporting will be announced June 6.

Sadly, the official announcement doesn’t include links to the entries so I’m collecting them here. I’ve started digging, but this seems like a relatively “quiet” award (unlike The Pulitzer Prizes, which get a ton of coverage).

You can help me by sending a link to the entry plus some verification (a press release or story from the outlet, for example). This year’s goal is to improve upon the list I made for the 2009 awards. Thanks for your help.

Finalists for the 2011 Livingston Awards prize