Archives for category: Code & Development

Over the last few years, I have walked into The New York Times building so often that the security guards know me by name. Today, though, will be the first time I walk in to pick up an official badge.

When the elevator doors open after the ride up Times Tower, I won’t be walking into the newsroom. I’ll be a few floors up with a new title: Developer Advocate for The New York Times.

It’s a public-facing role and an extension of the work I’ve been doing to bring people together at the various intersections of code, design and journalism. Among other things, you’ll see me at Times Developer events, like the one this Thursday. (You should sign up if you’re in town. It’s free.) You might see a few posts from me on the New York Times Open blog. And I may be pushing code to NYT GitHub repos.

Some things won’t change. You’ll still hear about me organizing Ruby Women and Hacks/Hackers NYC. I’ll still be helping other Hacks/Hackers chapters launch around the world and advising GORUCO. And I’ll still take on occasional consulting projects.

You’ll still see me giving talks and working on conferences like Write/Speak/Code, ONA13 and Strata + Hadoop World. With any luck (or maybe to your horror), you might even see me emceeing again at Visualized 2014.

Today is both the start of a new adventure and an extension of what I’ve always done to help others: solve problems, connect people, and create situations that allow for spontaneous awesome.

I’m looking forward to talking with you.

NICAR13 brings together some of the sharpest minds and most experienced hands in investigative journalism. Over four days, people share, discuss and teach techniques for hunting leads, gathering data, and presenting stories. Of all the conferences I go to, this one gets the highest marks from attendees for intensive, immediately applicable learning; networking and fun.

No one could possibly absorb and remember everything presented, so below is your memory card. If you’re looking for highlights from this list, read my NICAR13 roundup for Nieman Lab, “Data science, commoditized backends, and the need to know code.”

Have links from sessions you attended? Post them in comments or ping me on Twitter @MacDiva and I’ll add them to this list.

If you’re looking for a job, IRE keeps a list of open positions. Here’s who’s hiring.

NICAR 2014 will be in Baltimore from Feb. 27 to March 2. You should be there.

For additional tutorials, videos, presentations and tips see the lists from 2012 and 2011.

Jump to
Presentations & Tutorials | Software & Tools | References | Work Samples

Presentations & Tutorials

Dashboards for Reporting (from Aaron Bycoffe, Jacob Harris & Derek Willis)
Data Science for Nerdy Journalists (from Hadley Wickham)
  – Sisi Wei shares her class notes
Data Scraping with Google Docs (from Sean Sposito)
How to create an automatically updating Google spreadsheet (from Sharon Machlis)
Demystifying Web Scraping (from Sean Sposito & Acton Gordon)
Campaign Finance the Data Science Way (from Chase Davis)
Exploratory Data Analysis (from Chase Davis)
Hone your Google Fusion Tables training skills tutorial (from Sreeram Balakrishnan)
Data Mining Machine Learning (from Jeff Larson)
Practical Machine Learning (from Chase Davis & Jeff Larson)
Journalism, Branding & Social Media (from Mandy Jenkins) 
Social media search tips and tools (from Doug Haddix)
How the Los Angeles Times uses DocumentCloud (from Ben Welsh)
Using Excel for Data Analysis (from Krista Kjellman Schmidt)
Excel I: Sorting and filtering (from Linda Johnson)
Excel II: Rates and Ratios (from Denise Malan)
Excel Magic: Advanced functions for data cleaning and more | Excel data (from MaryJo Webster)
Make Your First News App with Django
Data on the Fly (from John Keefe & Mark Wert)
Digging Deep with Data Journalism (from Jill Riepenhoff)
Information Design & Crossing the Digital Divide (from Helene Sears)
Dataviz on a shoestring (from Sharon Machlis)
Introduction to Ruby (from Al Shaw)
The Data Driven Story: Conceiving & Launching (from Jennifer LaFleur & David Donald)
Dataviz, Responsive Web Design + Mobile: Friends or Frenemies? (from Miranda Mulligan & Pete Karl II)
• Quick steps to mastering SQL through SQLite (from Troy Thibodeaux)
  – Emma Carew Grovum shares her notes from the tutorial
Reporting without revealing: Tools for hiding your tracks (from Paula Lavigne)
Covert reporting using technology to cover your tracks (from Mike Tigas)
Learning Python for journalists (from Jeremy Bowers & Serdar Tumgoren)
  – Ask to join the Google group
Fun with data in sports journalism (from Jack Gillum)
After the game: Top data ideas for investigating $port through $pending (from Paula Lavigne)
Is 911 a Joke in Your Town? (from Ben Welsh)
• Sample code for Introduction to JavaScript the Right Way (from Jeff Larson)
Food waste investigations (from Erin Jordan)
Government waste investigations (from Tim Eberly)
Investigating government waste (from Josh Sweigart)
OpenRefine (formerly Google Refine) slides and cheat sheet (from Tom Meagher)
How can we get the widest impact out of software projects? (from Rich Gordon)
How to be ready for your social media Sandy (on discovery, validation and publication) (from Steve Myers)
Github repo and example code from Developing reusable visualization components using D3 and Backbone.js (from Alastair Dant)
Code for drought maps & Data & code .zip file (from Amanda Cox)
Web scraping with Node.js (from Al Shaw)
• Zip file for Python workshops 1 & 3 | Github repo (from Ron Campbell)
• Tip sheet for Python workshop 2, plus dataset for the workshop (from Christopher Schnaars)
• Mike Ball shares his notes from Tasneem Raja’s Smarter interactive Web projects with Google Spreadsheets and Tabletop.js talk
Data Roadmaps: Priming your desktop with certain data slices helps you spot trends, find people and understand your city (from T.L. Langford)
Making Health Data Sexy (from Charles Ornstein)
Infect the CMS (from Heather Billings, Jacob Harris and Al Shaw)
Making interactives fun | List of interactives shown during the talk (from Tasneem Raja and Sisi Wei)
Covering public pensions (from MaryJo Webster)
• Learn to use Git and Github and fork this cheat sheet (from Tom Meagher)
Making Timelines (from Krista Kjellman Schmidt and Lena Groeger)
Inside baseball: What data journalism can learn from sports (from Jeremy Bowers, Ryan Pitts and Matt Waite)
Disasters: Preparing for and digging in after the storm (from Ben Poston)
5 data journalism projects you might not have seen before and why they matter in Europe (from Sebastian Mondial)
The One-Query Story (from Kate Martin)
Mapping Best Practices (from Dave Cole, John Keefe and Matt Stiles)
Web Scraping (and more) with Google Apps Script (from Steven Melendez)
NodeXL for Network Analysis (from Peter Aldhous)
Data-driven Beats (from Chris Amico)
Bringing Excel to the Web with SkyDrive (from Cathy Harley)
Navigating U.S. Census Data (from Erran F. Persley)
How to Serve Mad Traffic, Part I (from Jeremy Bowers)
How to Serve Mad Traffic, Part II (from Jacqui Maher) 

Lightning Talks
5 Algorithms in 5 Minutes | Video (from Chase Davis)
Let’s make games for news | Video (from Sisi Wei)
Big datasets, small streams | Video (from Katie Park)
Z-Scores: How You Can Compare Apples With Oranges (downloads a PowerPoint file) | Video (from Robert Gebeloff)
Casino-Driven Design | Video (from Al Shaw)
Be your own Nate Silver | Video (from Jeff Larson)
ILENE, the polite coding language | Video (from Jennifer LaFleur and Jeff Larson)
Every State is Weird: A selection of election edge cases | Video (from Jacob Harris)
Dude Who Stole My Congressman? (Data in .xls | Visualization) (from Paul Parker)
• Code for the Arduino Baggage Handler | Video (from Matt Waite)
• “Django Retrained: 5 ways coding like a web developer can make you a better investigative reporter” | Slides (from Ben Welsh)

Software & Tools

BatchGeo – monitor website changes
Citizen Quotes – A project to demonstrate maximum entropy models for extracting quotes from news articles in Python.
CometDocs converts PDFs to Word and Excel docs
Tabula for pulling data out of PDFs
• Tried and true XPDF (PDFtoText)
DocHive PDF to XML converter
Python wrapper for the Document Cloud API
DownThemAll Firefox plug-in for downloading website assets (photos, video, etc.)
• Embed Excel Interactive View into your site
Fast Cluster, a command line tool for grouping documents by similarity (from Jeff Larson)
FOIA Machine (automate your Freedom of Information requests)
Geofeedia search and monitor social media by location
iWitness from Adaptive Path – search social media content by time and place
OpenRefine (the open source repo of the data cleaning tool formerly known as Google Refine)
Overview Project | Read the getting started guide
Scrape screen scraper Chrome extension. Journalist Jens Finnäs wrote a tutorial for it on Dataists.
Time Flow by Martin Wattenberg & Fernanda Viegas
Stately – a symbol font to create a map of the U.S. using HTML & CSS
Weka 3: Data mining software in Java
Cascading Tree Sheets
Dataset (part of the Miso Project) – grabs data from Google Spreadsheets and helps visualize the data
Datawrapper (open source)
Google Chart Tools
Tableau Public (Windows only)
Mapbox and Tilemill
Adobe Edge Animate free tool for creating interactive content
Spoofcard caller ID spoofing
Trap Call unblocks private numbers
Burner iPhone app creates disposable phone numbers
• Tools for hiding an IP address:
  – Anonymizer ($80)
  – Privoxy
  – BeHidden
  – Anonymous
  – IxQuick
Orbot provides Tor proxying on Android phones
Silent Circle encrypted communication app for iPhone and Android
Whois (search for domain name owners)
SpiderOak private, secure data stored in the cloud who to follow on social platforms (mobile app)
Hachi social platform search tool
R Project for Statistical Computing
R Studio
• Learn to unlock government data with Sunlight Academy offered by the Sunlight Foundation
JS Console for debugging JavaScript
Programming Ruby 1.9 & 2.0 (4th edition): The Pragmatic Programmers’ Guide
• Production code for Overview Server, which does visual document mining
mitmproxy (“man in the middle” proxy) inspect and edit traffic flows on the fly. SSL compatible.
Python Social Auth social authentication/registration mechanism
XCode iPhone simulator
jQuery Vertical Timeline by MinnPost
Rubular regular expression editor for Ruby
UltraEdit text editor (Windows only)
• Tom MacWright’s Mistakes interactive JS editor
Sphinx open source search engine
• NPR’s App Template project template for client-side apps
ILENE the polite coding language (from Jeff Larson)
Django Bakery helps bake your Django site out as flat files
Invar generates map tiles from a Mapnik configuration
Table Capture Chrome extension grabs table HTML and drops it into a Google doc
TableTools2 Firefox extension allows you to copy and manipulate table data from the Web
Haystax point-and-click data collection
• Sisi Wei’s presentation framework
Bank Tracker contains data on every FDIC bank
Shpescape converts shape files to TopoJSON
Numeric.js JavaScript library for numerical calculations
Pixel Ping pixel tracker
Helium Scraper extracts website data into structured formats such as CSV and XML
Choose Your Own Adventure plug-in from Mother Jones
Timeline JS
• The WNYC interactive Bingo card generator
Proof Finder search email and other unstructured data (designed for lawyers and investigators)
Paper of the Congressional Record (requires a key from Sunlight Labs)
YUI, an open source JavaScript and CSS library for developing interactive applications
Tarbell Google docs-driven CMS from the Chicago Tribune apps team (currently in alpha)
• Chase Davis’s FEC Standardizer code and explainer
• Al Shaw’s Dirtyword Ruby script cleans HTML from Word docs.

References

• Jeff Larson recommends “Eloquent JavaScript” as the best book for learning JS
Mike Bostock’s d3.js tutorials (from Sharon Machlis)
Scott Murray’s d3.js tutorials (from Sharon Machlis)
How to select, create & remove elements in d3.js (from Jerome Cukier and Scott Murray)
Computational Journalism syllabus from Journalism and Media Studies Center at the University of Hong Kong, Spring 2013 (from Jonathan Stray)
Connected China from Fathom & Reuters (background)
  – Notes on Connected China by Chris Amico
How to Bulletproof Your Data (from Jennifer LaFleur, ProPublica)
Federal Reserve Economic Data (includes international data and an API; from Federal Reserve Bank of St. Louis)
Little Sis, a database of relationships between people in business and government
OpenMissouri a collection of state and local government data from Missouri, some of which isn’t ordinarily made available online
Privacy Rights Clearinghouse
• ProPublica’s News Apps Style Guide
TheyRule shows the relationships between people in corporations
• Hadley Wickham’s academic paper on tidy data
• Hadley Wickham’s guide to using regular expressions in R
• ProPublica News Apps Desk Coding Manifesto
• ProPublica’s Principles of News App Design Structure
Pretty Good Privacy (PGP) data encryption
Tor Project
OpenElections Project, certified historical election results for everyone
Open Innovation and open APIs in Digital Journalism (academic paper by Tanja Aitamurto and Seth C. Lewis)
• Chart of the differences between PHP, Python and Ruby
How to build a stepper visualization
How to install MySQL on Mac OS or Windows
R for Journalists
A journalists’ guide to verifying images
Finding the Wisdom in the Crowd (on verifying images found on social platforms)
How to visualize your backlinks with Google Fusion Tables (network visualization tutorial)
Design Patterns: Elements of Reusable Object-Oriented Software
Hospital Compare from
• Winners of Kaggle’s campaign finance interactive reporting contest
Working with Tabletop.js and Handlebars.js
Impact of Responsive Designs
• Drew Conway’s Data Science Venn Diagram (now in d3.js!)
How to Not Screw Up Your Data
• Did you watch Ben Welsh’s lightning talk? Here’s the presentation he credits for changing his life: Writing reusable code by James Bennett, now at Mozilla. Read the revamped slides


Work Samples

The Year in CAR presentation by Mark Horvit and Megan Luther, IRE (7.1 MB PDF)
The Year in CAR wrap by Ryan Graff, Knight Lab
The Evolution of Sandy’s Path (
Parallax Scrolling: James Bond (BBC)
How the Chicago Tribune News Apps team made the Chicago Crime site
Chinese Chemicals Flow Unchecked Onto World Drug Market (The New York Times)
Income Inequality in America (Reuters)
Australians who don’t pay tax: what would Romney say? (Financial Review)
Mid-Year Economic and Fiscal Outlook (Financial Review)
Workout at Work (Washington Post)
Ad Libs (PBS Newshour)
Could you be an Olympic medalist (from The Guardian)
Fake medical providers slip through Medicare loophole (Atlanta Journal-Constitution)
Medicare fraudsters used UPS boxes to fleece millions from taxpayers (Dayton Daily News)
The Killing Roads 10 years of traffic accidents in Norway (

People often ask me for help finding more women to hire for their coding and engineering teams. Their main motivation is to hire great developers for their growing workforce. Their secondary motivation is to increase diversity, not just in gender but also in ideas, perspectives, people skills and problem solving.

I believe this is important, so I offer as much help as I can. The thing is, there are high hurdles to overcome when it comes to hiring great senior devs and engineers, especially great ones who are women, because there just aren’t that many currently working in the industry.

Earlier this week, a video about increasing the number of women engineers at Etsy hit the web. If you care about hiring more women on your technical teams, it’s worth viewing. In about 19 minutes, Etsy CTO Kellan Elliott-McCrea elegantly sums up almost every piece of advice I’ve given, and offers a viable path to achieving the goal.

First Round Capital, which hosted Kellan’s presentation, also posted their own take.

Around the web you’ll find other good advice on how to get more women to apply for technical roles. If you’ve found something that works, please post it in comments. The 2012 report from the Anita Borg Institute for Women and Technology offers additional caveats and 10 high-level solutions that would work especially well for large businesses.

A few years ago, I founded a code and social meetup for female Ruby developers called NYC Ruby Women. It’s been great to see a whole spectrum of Rubyists, from highly experienced pros to novice coders, all of them women, get something from the group.

As Ruby Women learn more, they want to go from learning the language and working on personal projects to working on teams and bigger projects. In other words, they’re looking for apprenticeships.

I know they’re out there somewhere. So friends, readers, do you know of any? NYC is preferred, but it’s good to know about opportunities in other cities too. (Direct hires only, no recruiters, please.)

There’s a Branch, which I’ve embedded below (a perfect excuse to test Branch’s group feature, which is currently in beta), or you can post a comment. If you’d rather contact me privately, write me here.

The data team at WNYC has one for you. The data team is led by John Keefe, who in three short years went from being the public radio website’s code-curious news director to full-on news developer.

If you want to understand someone, my advice is to sit next to them and solve a very hard problem together. You will learn who they are by watching how they think.
— Michael Lopp

PyGotham and the Q&A that followed, I’m finding more reasons than ever to read Michael Lopp’s books and blog, Rands in Repose.

The tension between those who make digital products and those who don’t is a systemic problem that seems to stymie every industry, yet so few people know how to resolve it — and resolve it at scale. There must be a collection of good advice somewhere. If not, it’s probably time to start one. What do you say?

(Photo: Ed Yourdon/Flickr) just published my how-to piece on reading API documentation.

It’s directed at readers with little to no coding experience. I hope the intended audience finds it helpful. The example I used — looking up New York Times “Harry Potter” movie reviews — was a fun one, rather than something more serious, because doing fun things lowers the barrier to getting started.

Reading API documentation takes patience and tenacity. Even the most experienced developers I know will sometimes come across documentation so poor that they spend a lot of time guessing at how the API works. So don’t feel daunted. Practice instead.
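
To make the practice concrete, here is a minimal Python sketch of what most API documentation is really telling you: how to assemble a request URL from a base address, an endpoint and a few parameters. The address, endpoint and parameter names below are placeholders made up for illustration, not the actual New York Times API, so swap in whatever the documentation you’re reading specifies.

```python
from urllib.parse import urlencode

# Hypothetical values. Real documentation lists its own base URL,
# endpoint path and parameter names; these are stand-ins.
BASE_URL = "https://api.example.com"
ENDPOINT = "/reviews/search.json"


def build_request_url(query, api_key):
    """Combine the pieces that API docs usually describe into one URL."""
    params = {
        "query": query,      # what you are searching for
        "api-key": api_key,  # the key the provider issues to you
    }
    # urlencode escapes spaces and special characters for use in a URL.
    return f"{BASE_URL}{ENDPOINT}?{urlencode(params)}"


url = build_request_url("Harry Potter", "YOUR_KEY")
print(url)
```

Pasting the printed URL into a browser is often the quickest way to see whether you’ve read the documentation correctly. No code beyond this is needed to start experimenting.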

I’ll post a couple of follow-up exercises here on Ricochet, but get started now by heading over to the beginner’s guide for journalists who want to understand API documentation.

Thanks for all the retweets, comments and link pass-alongs. Keep them coming, and feel free to ask questions and suggest other tutorial topics in the space below.

Michal Migurski of Stamen sent me some thoughts about writing APIs based on my post, which makes me think there might be hope for the way API documentation will be written in the future.

In the meantime, if you’re responsible for writing API docs — or technical documentation of any sort — Jacob Kaplan-Moss’s “Writing Great Documentation” instructional series is mandatory reading.

Jacob’s name might sound familiar to you: he’s one of the co-founders of Django, a Web development framework created by journalists and developers as a tool for doing data-based journalism.

Photo: Sean Dreilinger/Flickr

Over the weekend, I went to Jer Thorp’s Processing and data visualization workshop to dig deeper into the program.

While I don’t have new code to show yet, today I started looking for additional learning resources. Artist Marius Watz is publishing a free series of Processing primers on Modelab. The examples are fully commented, so even if you’re fairly new, it’s easy to follow along.

Daniel Shiffman, who wrote “Learning Processing: A Beginner’s Guide to Programming Images, Animation, and Interaction,” is planning a new book, due to be published this summer. It’s on Kickstarter.

Daniel’s got tutorials and excerpts from his current book online for those curious about his writing style and looking for additional examples to learn from.

Have some additional sites and sample files you’d like to share? Leave a note and help create a standing resource.

I don’t know about you, but December’s been pretty crazy for me. Between trying to maintain a healthy work-life balance (yeah, right) and trying to learn new things, I was shocked to realize Christmas is next week.


Nevertheless, I’ve a little treat for you: Do-it-yourself polka dotted Christmas wrap and digital wallpaper, made with Processing. A sample’s below.

Take your pick of default sizes: 960 x 600 pixels or 1280 x 800 pixels.
I learned a few things while making this project:

  • What they say about coding is true: You’re more apt to learn something if you’ve got a project in mind.
  • The initial bits of Processing are pretty easy to understand. But then there’s trying to grok random (not so bad) and shuffle (oy).
  • Coffee is good. Sleep is better.
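
If random and shuffle are the sticking points, it can help to see them in a language other than Processing. What follows is a rough Python sketch of the polka dot idea, not the Processing code on Github: the palette, the dot count and the function name are all assumptions, and only the 960 x 600 size comes from the project above. It uses shuffle to reorder the colors and random integers to place and size each dot, then returns the pattern as SVG text.

```python
import random


def polka_dot_svg(width=960, height=600, dots=40, seed=None):
    """Return an SVG string of randomly placed dots (illustrative only)."""
    rng = random.Random(seed)  # a seed makes the "random" pattern repeatable

    # Assumed holiday palette; shuffle reorders the list in place.
    colors = ["#c0392b", "#27ae60", "#f1c40f", "#ffffff"]
    rng.shuffle(colors)

    circles = []
    for i in range(dots):
        x = rng.randint(0, width)    # random horizontal position
        y = rng.randint(0, height)   # random vertical position
        r = rng.randint(10, 40)      # random radius in pixels
        color = colors[i % len(colors)]  # cycle through the shuffled colors
        circles.append(f'<circle cx="{x}" cy="{y}" r="{r}" fill="{color}"/>')

    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        '<rect width="100%" height="100%" fill="#2c3e50"/>'
        + "".join(circles)
        + "</svg>"
    )


svg = polka_dot_svg(seed=12)
```

Saving the returned string to a file ending in .svg and opening it in a browser shows the result. Changing the seed gives a different arrangement.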

To try Processing for yourself, copy my code from Github and paste it into the Processing.js Web IDE, or download Processing and tweak it locally.

The code is released under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Turns out this weekend is going to be hacktastic. If you’ve never been to a hackathon before, there are plenty of choices. Among them:

  • Dec. 4 is Open Data Day, where participants will use open public data to create all sorts of projects all over the world.
  • It’s also the start of Random Hacks of Kindness, a two-day international event for projects related to natural disaster risk and response.
  • Not to be outdone, The New York Times holds its first-ever TimesOpen Hack Day on Saturday too.
  • Leave it to San Francisco to hold Cloudstock, “the Woodstock for Cloud Developers.” That hackathon happens Dec. 6.

If you’re a developer who’s never been to a hack event before, register and go if an event sounds well-organized and interesting. Meet people. Find the ones you like. Build stuff together.

If you’re a journalist who’s never been to a hack event before, you’re probably wondering what the heck a hackathon is.

Wikipedia’s got a whole page about it. Basically, it’s an event for people (usually coders) to get together, hatch an idea, and produce a working model (a “hack”) within a fixed period of time using ingenuity, cooperation and whatever means are at their disposal.

Some hackathons are “open,” meaning you can build what you want. Others have themes and parameters. There are those, like next weekend’s OpenDoor Hackathon, that call for hardware hacks. Others, like Longshot magazine, are about storytelling and are definitely within the comfort zone of any journalist willing to hustle and forgo a little sleep.

There are way more techie hack days than there are journalist-specific hack days. But that does not mean you, Reporter/Editor/Visual Journalist-lacking-coding-skills, should be timid.

Pick the right event, and you’ll find yourself among people who are willing to teach you what you don’t know, or at least explain what they’re doing as they’re doing it.

While you’re putting the project together, you’ll discover opportunities to contribute your own knowledge and skills: looking for information, sharing subject expertise, asking incisive questions, picking through data troves, realizing when an idea needs to evolve (or as the startup people like to say, “pivot”).

You might feel like you’re barely hanging on during the first couple of hack events you go to. But the more you go, the more you’ll learn where your own hacker interests lie. Who knows? You might find yourself learning to program and creating data visualizations and making maps using something other than Google Maps.

Want some inspiration? Two years ago, journalist Jeremy Singer-Vine was not a programmer. But in two years, he learned enough to make a tool that’s been used by Slate and NPR to help the public make sense of financial jargon.

Cool and useful, right?

Additional links: