Thursday, September 13, 2012

Answer: Northern vs. Southern California [Search Research]

Daniel Russell knows how to find the answers to questions you can't get to with a simple Google query. In his weekly Search Research column, Russell issues a search challenge, then follows up later in the week with his solution, using whatever search technology and methodology fits the bill. This week's challenge: Does Northern or Southern California have more Superfund sites and brownfields as defined by the EPA?

There are many different ways to approach this problem, but today I want to show you a relatively new method using Google Fusion Tables. (If you don't know what a Fusion Table is, it's basically a spreadsheet that has a lot of smarts built into it: you can merge multiple tables together intelligently, and visualize the contents of the sheet, which is what we'll be doing here today.)

First, let's get a sense for where our dividing line is. I went to Google Maps and snapped the following image, then drew in a red line dividing north from south, a Mason-Dixon line of California, if you will. (You can also see the little yellow box indicating the lat/long of Visalia.)

For us, everything NORTH of that line is northern California, while everything SOUTH of that line is in southern California. It's not quite even in terms of geographic area, but it's a logical line in terms of political outlook and dividing the population somewhat evenly.

So now, how do we find a list of the EPA's Superfund and brownfield sites?

The approach I took was to use the (relatively) new Google Table Search. Once there, I did the obvious search:

You can see what the result was: a list of Cleanup Sites tables that the search engine found. The first one is the EPA table for "Region 9" (the region that includes California).

And if you visit the web page, you'll see it's the master list of the Superfund and Brownfield sites, exactly what we're looking for.

If you click on the "Show more" link, you'll see what kind of content the Table Search engine has extracted (which is pretty much exactly what's on the web page). This means we're 90% of the way there.

Clicking on the link takes you to that data as imported into a Fusion Table. You'll see that the table has the three columns just as in the EPA's original data set: location, site name, type (brownfields or Superfund).

Now that you have the table, you can GEOCODE the contents of the table by selecting "geocode" under the File menu on the table.

That step converts each of the place-name references into a discrete location (in terms of lat/longs) that can then be automatically placed onto the map of California by using the Visualize>Map option.

Once you do this step, you've got a little data cleanup to do because not all of the rows can be neatly geocoded (that's what the yellow highlighting in the table below means: "not geocodable"). That's what you'd expect for the "A | B | C ..." row (which appeared in the original table to provide jump links to sections in the data, but isn't useful here). Still, a few places in the image below (e.g., Alpine County) should have been converted; you'll just have to bear that in mind when we do our count.

First I cleaned up some of the data (mostly by just adding ", CA" after each of the un-geocoded entries, such as changing "Alpine County" into "Alpine County, CA" in the table above).
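
For readers who would rather script this cleanup before importing the table, here is a minimal Python sketch. It is not what was done in Fusion Tables above: the function name `clean_location`, the sample rows, and the assumption that jump-link rows are single letters separated by "|" are all hypothetical.

```python
import re

# Assumption: jump-link rows consist only of single capital letters
# separated by "|", e.g. "A | B | C". Real rows may look different.
NAV_ROW = re.compile(r"^\s*[A-Z](\s*\|\s*[A-Z])+\s*$")

def clean_location(location):
    """Return a geocoder-friendly location, or None for jump-link rows."""
    text = location.strip()
    if not text or NAV_ROW.match(text):
        return None
    # Append the state only when it is missing.
    if not re.search(r",\s*(CA|California)$", text, flags=re.IGNORECASE):
        text += ", CA"
    return text

# Hypothetical sample rows, not taken from the EPA table.
rows = ["A | B | C", "Alpine County", "Visalia, CA"]
cleaned = [c for c in (clean_location(r) for r in rows) if c is not None]
print(cleaned)  # ['Alpine County, CA', 'Visalia, CA']
```

Returning `None` for the jump-link rows keeps them out of the geocoder entirely, rather than leaving them highlighted as "not geocodable".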

Then, switching to a map visualization of the data and just doing a quick count of the dots on the visualized map, I found 108 sites at or north of the Visalia latitude and 83 south of our dividing line.

Of course, there should be a better way to do this (but I'll leave that for a future post).
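
As a rough illustration of what a better way might look like, the dot counting could be replaced by comparing each geocoded latitude against the dividing line. This is only a sketch under assumptions: 36.33 is an approximate latitude for Visalia (the exact value isn't stated here), and the site names and coordinates are made-up placeholders, not EPA data.

```python
# Assumption: approximate latitude of Visalia, CA, in decimal degrees.
VISALIA_LAT = 36.33

def count_north_south(sites, dividing_lat=VISALIA_LAT):
    """Count sites at or north of the dividing latitude vs. south of it."""
    north = sum(1 for _, lat in sites if lat >= dividing_lat)
    south = len(sites) - north
    return north, south

# Hypothetical (name, latitude) pairs; not real EPA rows.
sample_sites = [
    ("Site A", 38.58),
    ("Site B", 37.77),
    ("Site C", 34.05),
    ("Site D", 32.72),
    ("Site E", 36.33),  # exactly on the line counts as north
]

north, south = count_north_south(sample_sites)
print(f"north: {north}, south: {south}")  # north: 3, south: 2
```

Sites exactly on the line are counted as north, matching the "Visalia latitude or north" rule used in the manual count.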

Then it's an easy step to count the brownfields and Superfund sites row-by-row. I found 124 Superfund sites and 66 brownfield sites.
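
The row-by-row tally can also be automated. Here is a minimal Python sketch, assuming the rows have been exported in the table's three-column layout (location, site name, type); the rows themselves and the helper name `count_by_type` are hypothetical.

```python
from collections import Counter

# Hypothetical rows in the three-column layout: (location, site name, type).
rows = [
    ("Alpine County, CA", "Example Site 1", "Superfund"),
    ("Visalia, CA", "Example Site 2", "Brownfields"),
    ("Fresno, CA", "Example Site 3", "superfund "),
]

def count_by_type(rows):
    """Tally rows by their type column, ignoring case and stray whitespace."""
    return Counter(site_type.strip().lower() for _, _, site_type in rows)

counts = count_by_type(rows)
print(counts["superfund"], counts["brownfields"])  # 2 1
```

Normalizing case and whitespace first guards against the same type being counted under two slightly different spellings.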

But unfortunately, this is a contest that I, as a northern Californian, didn't want to win. The north has significantly more sites: 108 in the north, 83 in the south.

Note: If you read through the reader comments, you'll find several other ways to solve this problem. Several people pointed out some very nice maps that the EPA has already drawn. And, unfortunately, not all of the data is mutually consistent. That's partly because different data sets draw on different EPA resources at different times (the list changes). But if you look through it all, the conclusion is more-or-less the same.

Answer: Northern vs. Southern California | SearchReSearch


Daniel M. Russell studies the way people search and research: an anthropologist of search, if you will. You can read more from Russell on his SearchReSearch blog, and stay tuned for his weekly challenges (and answers) here on Lifehacker.

Image via Jesse Kunerth (Shutterstock).

Source: http://feeds.gawker.com/~r/lifehacker/full/~3/hK7bF6L-qwM/answer-northern-vs-southern-california
