A few years ago, I founded a code and social meetup for female Ruby developers called NYC Ruby Women. It’s been great to see a whole spectrum of Rubyists (from highly experienced pros to novice coders, all of whom are women) get something from the group.

As Ruby Women learn more, they want to go from learning the language and working on personal projects to working on teams and bigger projects. In other words, they’re looking for apprenticeships.

I know they’re out there somewhere. So friends, readers, do you know of any? NYC is preferred, but it’s good to know about opportunities in other cities too. (Direct hires only, no recruiters, please.)

There’s a Branch, which I’ve embedded below (a perfect excuse to test Branch’s group feature, which is currently in beta), or you can post a comment. If you’d rather contact me privately, write me here.


Want to send your startup to SXSW Interactive (March 11-15, 2013)?

Apply to the 2013 SXSW Accelerator, an opportunity to showcase your emerging technology product or service in front of industry leaders.

The event takes place March 11 and 12 as a part of SXSW Interactive. If chosen for Accelerator, you can improve your product launch, attract venture capitalists, polish your elevator pitch, receive media exposure, build brand awareness and network. And you’ll get two comped registrations to SXSW Interactive.

The deadline to register is Friday, Nov. 9, so get details and enter as soon as you can.

I’ve joined the SXSW Accelerator board this year, so let me know once you’ve applied — and if you have any questions before the deadline, first check the Accelerator FAQ.

If the FAQ doesn’t answer your questions, post a comment or send an email and I’ll do my best to get you an answer or direct you to someone who can.

The data team at WNYC has one for you. The data team is led by John Keefe, who in three short years went from being the public radio website’s code-curious news director to full-on news developer.

Google Politics & Elections is heavily promoting its offerings. Among them is a live stream of tonight’s presidential debate on domestic policy between Democratic incumbent Barack Obama and Republican challenger Mitt Romney.

It’ll start at 9 p.m. Eastern Time (your local time equivalent is below). Print a few bingo cards made by Erica Smith to turn it into a social event.

Convert time zones with worldtimebuddy.com

Earlier today, I spoke at the first-ever White House Safety Datapalooza, an event organized by the White House Office of Public Engagement, Office of Science and Technology Policy and the U.S. Department of Transportation. Invited speakers came from the public and private sector, universities and non-profits.

I was asked to give a “TED-style talk” about data journalism. In case you missed the live stream of it on White House Live, here’s the prepared script:

What is data journalism?

Lots of people — including journalists themselves — have different opinions of what it is right now. There are blogs about it, and websites about it, and lately, new degrees at universities about it. There are even big conferences where “What is data journalism?” is a keynote topic.

Most of us understand what journalism is, what “news” is. But “data journalism”?

Me? I think it’s a new buzz word for a very old process: gathering, examining and finding meaning within collections of information — and letting people know.

So let’s talk about data. You’re probably familiar with that word, “data.” You’re probably thinking data means numbers. Piles of numbers. And you’re right. But there are other kinds of data too: Factoids. Photos. Video. Names and locations. Time logs. Points in space. Things that we collect, electronic files we save that may not seem individually relevant or interesting, but when interconnected and cross-referenced and analyzed, show us patterns, tell us stories, reveal truths.

Why is access to data important? You get up in the morning, you’re getting ready for work and you ask, should I take the car or the Metro? Should I take the highway or drive on surface streets? Luckily, the local transportation agency is giving out data. Imagine how much more frustrating it would be if you didn’t get a traffic report or a subway system alert at all? Imagine how disastrous it would be if public safety officials didn’t know where emergencies were happening?

That’s just one example, but you get the picture. Access to data is important because it keeps us informed when we need the information most. It can also help us in hindsight: we can use data to understand how things happened and why. And it can help us for the future: give us something to work from so we can improve outcomes, create safer conditions, propose new ideas.

But if the data isn’t available in an easy-to-examine electronic format, that makes the work harder. And I’m not just talking about getting a stack of paper documents that you have to scan. In journalism, we sometimes talk about bad data or messy data. It’s data that can’t be easily imported into the software we use to organize it, for example, PDFs. PDFs make it hard to do data journalism.

There are other examples too:

  • Spreadsheets that have missing information.
  • Text files that have multiple ways of spelling one thing. Or typos.
  • Files with meaningless garble in them.

Messy data.

It’s important to have clean, structured, easy-to-find data — because journalism is about getting things right while beating the clock. Even if you’re not a journalist, I’m sure you’re familiar with this kind of need for speed. The quicker we can get to the examination and analysis stage, the faster we can see the patterns, unearth the stories, and explain what happened.

With data, journalists are producing all kinds of things that help people understand complex issues: Maps explaining the seriousness of national drought. Charts and interactive graphs showing the relationship between money and politics. Games that really bring home what terms like “distracted driving” mean.

Data is at the heart of what journalism is — and the more substantive it is, the more organized it is, the more easily accessible it is, the better we all can understand the events that affect our world, our nation, our communities and ourselves.

The graphics desk at The New York Times gets high praise amongst journalists and visual information specialists for their clear, clean and often creative graphics that explain and enhance news.

Among the team is the group’s resident statistician, Amanda Cox, who’s been hailed as the “queen of infographics” and has been responsible for some of the high-concept pieces published by NYT.

Any time you can hear Amanda speak or learn from her, you should. At the Eyeo Festival in June, she looked at the evolution of data graphics, particularly within the history of the Times graphics department.

Want to learn more? Kevin Quealy, Amanda’s coworker, posts fantastic explanations of how she and other members of the graphics desk do some of its work. Follow along on Charts’n’Things.

The Livingston Awards for Young Journalists announced its nominees yesterday.

The annual prizes recognize outstanding reporting by journalists under 35. Winners of the $10,000 prizes for local, national and international reporting will be announced June 6.

Sadly, the official announcement doesn’t include links to the entries so I’m collecting them here. I’ve started digging, but this seems like a relatively “quiet” award (unlike The Pulitzer Prizes, which get a ton of coverage).

You can help me by sending a link to the entry plus some verification (a press release or story from the outlet, for example). This year’s goal is to improve upon the list I made for the 2009 awards. Thanks for your help.

Finalists for the 2011 Livingston Awards prize

Ben Welsh of the Los Angeles Times Data Desk spoke at the International Symposium on Online Journalism in Austin yesterday, around the same time that I was speaking on a panel about data journalism with Erik Hinton (@erikhinton), Al Shaw (@A_L) and Andrei Scheinkman (@acheink) at NYU Local Young Media Weekend.

Ben gave this talk at NICAR in St. Louis earlier this year. Lucky for us, ISOJ streamed it, and La Nacion’s data team captured it.

Watch, learn, and dig deeper in Ben’s Delicious stack. Ben also writes terrific material on his site, Palewire, and tweets at @palewire.

One of the most popular posts on Ricochet was the collection of dataviz tools, slides and links from last year’s NICAR conference.

It was so popular, in fact, that people have asked me to make a similar collection again. So from Feb. 23–26, I’ll be updating this post with all the great things NICARians have to share this year.

Follow #NICAR12 on Twitter for the buzz; come to this page for the goods. And if you’re attending the conference, be sure to buy a T-shirt to support IRE, the organization that puts this fantastic event together. Ben Welsh of The Los Angeles Times is taking candid photos and posting them on Flickr.

Have links from sessions you attended? Post them in comments or ping me on Twitter @MacDiva and I’ll add them to this list.

Jump to Presentations & Tutorials | Software & Tools | References | Work Samples
 

Presentations & Tutorials


Bringing Maps to Fruition (from Michelle Minkoff)
Free tools for scraping data without programming (from Chris Keller and Michelle Minkoff)
Instructions for Hands-on Web Scraping Without Programming (from Chris Keller and Michelle Minkoff)
Locating the Story: The Latest in Online Maps and mapping links (from Ben Welsh)
Mapping links & presentation (from David Herzog)
Social Media Sleuthing (from Doug Haddix)
freeDive Tips & Tricks (from the Knight Digital Media Center)
CAR on a Shoestring (from Kevin Crowe, Patrick Sweet and Mary Jo Webster)
Regular Expressions: An Introduction (from Kevin Crowe, Patrick Sweet and Mary Jo Webster)
Create a moderation form using Google Forms and Fusion Tables
Scraping with Django (from Kevin Schaul)
How to turn PDFs into a searchable, sortable table (from Kevin Schaul)
Get the Most Out of Fusion Tables (from Rebecca Shapley)
Data viz in 20 minutes: jQuery DataTables (from Christopher Schnaars)
How to set up Python in Windows 7 (from Anthony DeBarros)
Data visualization best practices (from Kat Downs)
NodeXL for Network analysis (from Peter Aldhous)
Network Analysis for News (from Peter Aldhous and Peggy Heinkel-Wolfe)
Network analysis for news (video of Peter Aldhous’s NICAR12 talk)
How to Use Google Refine for Investigative Journalism (from Dan Nguyen)
Mapping is for Everyone – How to make all kinds of maps (from Sharon Machlis)
Advanced Excel techniques tipsheet (from MaryJo Webster)
How do you edit a story made of software? (from Alexander Howard)
Election Night Results & Maps (from John Keefe)
Covering Elections presentation (from Al Shaw)
Making friends with map projections (from Ben Welsh and Michael Corey)
Database validation (from JT Johnson)
Web scraping with Node.js (from Al Shaw)
Who is John Doe — and where to get the paper on him
Practical TastyPie for the Modern Djangonaut (from Jeremy Bowers)
Weathering the Storm: Using data to bolster the traditional weather story (from Stephen Stirling)
Build your first Django news app (from the IRE NICAR12 Django workshop)
GeoCommons walkthrough (from Paul Monies)
QGIS 1 workshop tutorial (from Michael Corey)
Tell Me a Story! – storytelling and data journalism (from Anthony DeBarros)
Human-assisted reporting: How to create robot reporters in your own image (from Ben Welsh)
How I learned to stop worrying and love flat files (from Ben Welsh)
Infect the CMS (from Jacob Harris)
Inspect the Web With Your Browser’s Web Inspector (from Dan Nguyen)
An Intro to R (from Jacob Fenton)
Slides from “Mapping is Hard” (from Brian Boyer)
TileMill hands-on tutorial (from Chris Amico, Brian Boyer and Matt Stiles)
Own Your Map Stack (from Chris Amico, Brian Boyer and Matt Stiles)
Natural Language Toolkit (NLTK) basics (from Jacob Perkins)
Connecting to state data using OpenMissouri.org (from David Herzog)
How to convert PDFs to Excel in Windows (from IRE)
Quantum GIS (QGIS) 2 workshop (from Michael Corey)
How to turn PDFs into text (from Dan Nguyen)
Web scraping in Python workshop tutorial (from Mark Ng)
Infiltrate the Ad Department (from Ryan Pitts)
Map Graphics for Video (from Michael Corey)
What We Can Find Out from Elections (from Aaron Bycoffe)
The Latest in Mapping with Javascript and jQuery (from Timothy Barmann)
How to Make a PANDA (from Brian Boyer)
The Farenthold Surprise (election panel presentation from Derek Willis)
Displaying data geographically: Creating a one-layer map in ArcMap (from Tom Meagher)
An intro to csvKit (from Christopher Groskopf and Anthony DeBarros)
Integrating CAR into a daily Beat (from Kate Martin)
How to use the SIMILE Exhibit timeline framework (from David Karger)
Tableau training handouts (from Tableau)
CAR Training 2012 including mapping data sets, practice data sets and tip sheets (from Jennifer LaFleur)



Software & Tools


Twazzup – find breaking news, popular hashtags, influential users
Reporters’ Lab Reviews – a link list of tools, techniques and research for public affairs reporting
Twellow – a yellow pages for Twitter
Twiangulate – find sources and groups of people on Twitter
Crowdbooster – monitor and analyze buzz on social media sites
KnowEm Username Search – finds the social networks a person or organization/brand is using
Muckrack Pro – add yourself to the list of journalists or find journalists covering a particular topic
The Archivist – save tweets and export to Excel to analyze later
PowerPivot for Excel – “Load massive amounts of data from virtually any source, process in seconds and model with powerful analytical capabilities”
Pandoc – a universal document converter
HTML-to-PDF – converts HTML to PDF docs for free
Mr. Data Converter – converts Excel data into one of several Web-friendly formats, including HTML, JSON and XML.
Natural Language Toolkit – for natural language text analysis
Voyant Tools – Web-based document analysis
ClearForest Gnosis – Firefox plugin that uses OpenCalais for data extraction
Exhibit – a publishing framework for data-rich interactive web pages
DocumentCloud – store, analyze and annotate PDFs
DataTables – jQuery plugin to create sortable datasets
Ben Welsh’s triumvirate of tools that allow you to copy Google Maps’ functionality:
   – a data source, like OpenStreetMap
   – a tile set, like what you can make with TileMill
   – a JavaScript interface, like Leaflet
OpenOffice – open source office suite software (word processor, spreadsheet, presentation/slide deck, database)
QGIS – Open source geographic information system
Shape to Fusion (a.k.a. Shpescape) – Import shapefiles to Fusion Tables
MySQL – Database software
Google Refine – data cleaner
Junar – Discover and track data
The Overview Project
Visicheck – ensures your graphics are visible to the colorblind
Colorbrewer – in case you need help with color schemes for your design
Color Oracle – colorblindness simulator for Mac OS, Windows and Linux
0 to 255 – find variations of any color
Beautiful Soup – useful for many things, including parsing HTML
Weave – Web-based analysis and visualization environment. Made by a partnership between the University of Massachusetts Lowell and Open Indicators Consortium
Highcharts – create interactive JavaScript charts (free for non-commercial use)
Indiemapper – Upload shapefiles and convert them to create static, thematic maps
CSV-to-JSON converter
Sinatra – a lightweight Ruby web framework for creating apps
Use Google Docs, XPath and the =importxml() function to put data in a spreadsheet
PANDA Project
Timemap syncs a SIMILE timeline to a web-based map
Tabletop – allows you to use Google spreadsheets as your app backend
Js2Coffee – converts Javascript to CoffeeScript and back
CoffeeScript sandbox
iPL2 – ask a librarian, search through the Internet Public Library (IPL) and the Librarians’ Internet Index (LII) websites.
“Lesson of the night: Want to put census geos in fusion tables? Keep it stupid simple: convert US Census data from TIGER into shape files with shpescape” (tip from Matt Kiefer)
Rubular – a Ruby regular expression editor
Timeline Setter – makes timelines from spreadsheets
Spoofcard changes your voice and gives you a temporary phone number
Tablechart turns HTML tables into charts
Spam Mimic – hide a message in spam
FEC scraper/FEC parser – Chris Schnaars’ script on Github
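
The =importxml() tip in the list above pairs a page address with an XPath query, roughly `=IMPORTXML(url, xpath_query)` in a Google spreadsheet. For anyone who would rather try the same idea in a script, here is a minimal sketch in Python using only the standard library. The sample markup, the class names and the `extract` helper are made up for illustration, and the standard library supports only a limited subset of XPath, so treat this as a starting point rather than a full equivalent.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup standing in for a page you have already fetched.
SAMPLE = """
<html>
  <body>
    <table>
      <tr><td class="name">Alpha</td><td class="total">10</td></tr>
      <tr><td class="name">Beta</td><td class="total">25</td></tr>
    </table>
  </body>
</html>
"""


def extract(markup, xpath):
    """Return the text of every element matching a (limited) XPath query."""
    root = ET.fromstring(markup)
    return [element.text for element in root.findall(xpath)]


# One query per column, much like one =importxml() formula per column.
names = extract(SAMPLE, ".//td[@class='name']")
totals = [int(value) for value in extract(SAMPLE, ".//td[@class='total']")]

for name, total in zip(names, totals):
    print(name, total)
```

Real-world pages are often not well-formed XML, so a parser such as Beautiful Soup (listed above) is usually a safer choice for live sites.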


References


The American Library Association’s wiki of government databases (from Dan Nguyen)
Penn Treebank Project reference – Use it in conjunction with the Natural Language Toolkit (NLTK)
Geomedia Google Group
NICAR-L mailing list
Google Public Data Explorer
InfoVis Wiki – a catchall list of papers, conferences, patterns and jobs in information visualization
Spatial Reference – an IMDB-like catalog of spatial reference systems
22 free visualization tools collected by ComputerWorld
Free Data Visualization tools – a collection from Sharon Machlis
8 cool tools for data analysis, visualization and presentation (from Sharon Machlis)
Chart and image gallery: 30 free tools for data visualization and analysis (from Sharon Machlis)
LocalHealthData.org – find health data from more than 70 sources and 300+ datasets
Analytic Journalism “It’s not ‘all about story’ if you don’t have anything to say.”
How to install MySQL and Navicat on Windows
Freebase – an entity graph/Wikipedia-like collection of data
Save the Post Office – records U.S. post office consolidations and closures
Los Angeles Times datadesk Github repository with code for you to use
USASpending.gov – Official record of Federal Funding Accountability and Transparency Act (Transparency Act)
Data for the Public Good by Alexander Howard (free eBook)
CongressionalPrimaries.org shows what Illinois congressional candidates are tweeting about
Civic Commons Marketplace collects open government efforts in the U.S.
OpenCorporates is in the process of collecting information on every corporate entity in the world
USA Today’s Developer Network


Work Samples


Bailed out banks profit from tax liens (Arizona Star heat maps showed property locations, making the story very clear)
Race gap found in traffic stops (Milwaukee Journal Sentinel showed the racial disparity in pullovers and, on further examination, municipal maintenance requests)
Texas redistricting map and slider code (Texas Tribune)
The Poverty Gap shows a clear correlation between poverty and access to education (ProPublica)
2012 Election Results big board, one approach to visual presentation of election info that tells you the story of the election immediately (The New York Times)
Little Loving County grabs a bit of Texas’ growth – a census story unlike the usual census stories (The Dallas Morning News)
Riot rumours: how misinformation spread on Twitter during a time of crisis – uses data analysis to watch the spread and suppression of rumors about the London riots (The Guardian)
Discover Boston Public Schools (Code for America)
SchoolBook makes teacher data reports for New York City schools
Redistricting: New lines leave some voters without a senator (The [Riverside, Calif.] Press-Enterprise)


And finally, no journalism nerdfest would be complete without a demonstration of the latest hotness: Drone journalism by Matt Waite.

Drone Journalism Demo – Matt Waite from John Keefe on Vimeo.

If you want to understand someone, my advice is to sit next to them and solve a very hard problem together. You will learn who they are by watching how they think.
— Michael Lopp

After PyGotham and the Q&A that followed, I’m finding more reasons than ever to read Michael Lopp’s books and blog, Rands in Repose.

The tension between those who make digital products and those who don’t is a systemic problem that seems to stymie every industry, yet so few people know how to resolve it — and resolve it at scale. There must be a collection of good advice somewhere. If not, it’s probably time to start one. What do you say?

(Photo: Ed Yourdon/Flickr)