	Earthquake data via the USGS website with Common Lisp

Recently we had a bunch of small earthquakes in the Oakland, CA area.
I thought it would be fun to use Allegro Common Lisp to retrieve and
display information about recent earthquakes.

I found that the USGS website has hourly, daily and weekly quake data
available here:

   http://earthquake.usgs.gov/eqcenter/recenteqsww/catalogs/

Possible formats of interest to me: XML (RSS) and CSV.  I decided to
use the CSV format, comma separated files, since XML seemed like
overkill.

What I wanted to do was to find recent earthquakes near where I lived.
However, the data from USGS is of the form:

  Src,Eqid,Version,Datetime,Lat,Lon,Magnitude,Depth,NST
  ak,00074326,5,"January 09, 2007 23:59:55 GMT",64.1478,-150.1976,1.8,1.00,07

Time, location (latitude and longitude in decimal degrees) and
magnitude, among other bits of data.  The data is nice, but I needed
two things to make it useful:

  1. filter out the coordinates I didn't want, and
  2. convert latitude/longitude coordinates into city names.

An implication of (1) is that I needed an easy way to convert a place
name to coordinates, because if I want the facility to work for more
than just my location, others will need an easy way to look up their
own location.  Fortunately, Google Maps
[http://www.google.com/apis/maps/documentation/] provides this
information.  Their API requires that you supply a key assigned to
you.  (You can obtain a key from Google here
[http://www.google.com/apis/maps/signup.html].)  NOTE: see the
installation instructions below for where to put the key.

That makes it fairly easy to write a function place-to-location:

  cl-user(8): (place-to-location "Berkeley, CA")
  #<location 37.871666,-122.27167>
  cl-user(9): (location-to-place (place-to-location "Berkeley, CA"))
  "Berkeley, California"
  cl-user(10): (location-to-place
		(place-to-location
		 "2629 College Ave, Berkeley, CA"))
  "Berkeley, California"
  cl-user(11): (place-to-location "2629 College Ave, Berkeley, CA")
  #<location 37.86319,-122.25365>
  cl-user(12): 

To help me get a feel for how to filter the data I wanted to know the
distance between two locations.  I used the "Great Circle Distance"
Formula.  Some examples (distances are in miles):

  cl-user(17): (distance-between
		(place-to-location "Boulder, CO")
		(place-to-location "Piedmont, CA"))
  923.7414773838934d0
  cl-user(18): (distance-between
		(place-to-location "Lake Wylie, SC")
		(place-to-location "Piedmont, CA"))
  2283.922367943316d0
  cl-user(19): 

Converting location coordinates into city names turned out to be
tricky.  There is no readily available web resource that will do this
(like Google Maps). I did find The Zip Code Database Project
[http://sourceforge.net/project/showfiles.php?group_id=111073] which
contains location coordinates for each zip code, and the data includes
the names of the cities for each zip code.  It's a fairly simple matter
to find the nearest match for a set of coordinates.

Speaking of coordinates, how should they be represented in Lisp?  At
first I was using a cons of latitude and longitude.  However, using
`car' and `cdr' to access the `slots' got old and I created a
`measures' module to define the `location' object.  The `measures' API
also contains the `distance-between' and `location-near-p' functions.

The `zipcodes' module provides the `location-to-zip' function, and the
`google-maps' module uses this to provide `location-to-place' and
`place-to-location'.

Putting it all together, we can do things like this:

  cl-user(8): (get-quake-info
	       :period :week
	       :larger-than nil
	       :within 1.0
	       :reference-location 
	       (place-to-location
		"555 12th St, Oakland, CA, 94607, US"))
  ;; Downloading data...done.
  January 09, 2007 22:33:42 GMT: Hollister, California: magnitude=3.1
  January 09, 2007 22:16:21 GMT: Los Altos, California: magnitude=2.0
  January 09, 2007 21:39:52 GMT: Hollister, California: magnitude=1.3
  January 09, 2007 16:56:54 GMT: Aromas, California: magnitude=1.4
  January 09, 2007 13:20:02 GMT: Cobb, California: magnitude=1.4
  January 09, 2007 11:52:11 GMT: Hayward, California: magnitude=1.2
  January 09, 2007 03:01:03 GMT: El Cerrito, California: magnitude=1.9
  January 08, 2007 11:30:55 GMT: San Jose, California: magnitude=2.0
  January 07, 2007 14:22:56 GMT: Walnut Creek, California: magnitude=1.2
  January 07, 2007 09:20:38 GMT: Kenwood, California: magnitude=3.1
  January 06, 2007 12:58:01 GMT: Yountville, California: magnitude=1.8
  January 06, 2007 07:04:01 GMT: Oakland, California: magnitude=2.4
  January 05, 2007 22:55:53 GMT: Walnut Creek, California: magnitude=1.3
  January 05, 2007 20:18:51 GMT: Livermore, California: magnitude=2.4
  January 05, 2007 17:54:17 GMT: Ladera, California: magnitude=1.1
  January 05, 2007 11:09:43 GMT: Cobb, California: magnitude=2.4
  January 05, 2007 10:40:26 GMT: Concord, California: magnitude=1.3
  January 04, 2007 20:38:03 GMT: Los Altos, California: magnitude=1.7
  January 04, 2007 20:21:41 GMT: Aromas, California: magnitude=2.6
  January 04, 2007 13:42:36 GMT: Palo Alto, California: magnitude=1.2
  January 04, 2007 03:42:10 GMT: Oakland, California: magnitude=1.9
  nil
  cl-user(9): 

The :within keyword value is in decimal degrees.  A degree of latitude
is approximately 69 miles.  So 3.0 degrees is a little more than 200
miles in latitude.  I use this for longitude, as well, but a degree of
longitude changes with latitude, from 69 at the equator to 0 at the
north and south poles.

The :period keyword value can be :hour, :day or :week.

The :larger-than keyword is a filter on magnitude.  `nil' means all
quakes.  `2.0' means those larger than 2.0.

*******************************************************************************
IMPLEMENTATION DETAILS
*******************************************************************************

For those that want to know more about how this example was
implemented, this section is for you.

Zip codes

The zip code data, from the The Zip Code Database Project, comes in a
large CSV file and has the following form:

  "zip code", "state abbreviation", "latitude", "longitude", "city", "state"
  "35004", "AL", " 33.606379", " -86.50249", "Moody", "Alabama"
  ...

There are 33,178 entries in the file.  The first questions that come
to mind are:

  * How to efficiently represent the data in the fasl file (because I
    don't want to be dependent at fasl load time on the CSV file)?

  * How to efficiently represent the data in memory?

For representation in the fasl file, I first did the obvious and
straightforward simple vector of zip structs written out to the fasl
file.  It turns out that because the compiler must do circularity
detection and using make-load-form is fairly expensive, having the
compiler write out a simple vector of 33000+ struct objects is a very
slow process, one that creates a lot of garbage and uses lots of
memory.

So, to avoid the make-load-form on the structure objects, I decided to
spread the slots of all the zip code structures across a bunch of
vectors--strings and numbers should easily be handled by the compiler
in an efficient manner.  In other words, the zip code data in the fasl
file would consist of one vector per structure slot, each element of
the vector being the slot value from one zip code structure instance.
And, this would all be created at compile time.  Like this:

  (defvar *zipcode-data*
      #.(let ((zips
	       (prog2
		   (progn (format t "reading zips.cvs...")
			  (force-output))
		   (read-zips-csv 
		    (merge-pathnames "zips.csv" *compile-file-pathname*))
		 (format t "done~%"))))
	  (list 'list
		(vector-of zips #'zip-code)
		(vector-of zips #'zip-state-abbrev)
		(vector-of zips (lambda (zip)
				  (let ((loc (zip-location zip)))
				    (cons (location-latitude loc)
					  (location-longitude loc)))))
		(vector-of zips #'zip-city)
		(vector-of zips #'zip-state))))

For those not familiar with #., it is used to evaluate Lisp code at
`read' time.  In this case, since the expression above is being read
by the compiler, the code in the #.(let ...) is being evaluated at
compiler read time.  The result of reading that form is a list, like
this:

  (list #<...vector of zip codes...>
	...
	#<...vector of states...>)

That list is then compiled into the fasl file as the value of the
*zipcode-data* special variable.  When the fasl file is loaded into
Lisp, *zipcode-data* will be a list of simple vectors.  *zipcode-data*
is then used to construct the data structure we actually want to use,
which leads us to...

How to efficiently represent the data in memory?  I know, because of
the nature of the types of searches of the data I will do, that
sequential searches on the data will be necessary.  To me, that means
simple vector.  (If, for example, only a single zip to location mapping
was needed, then a more appropriate data structure would have been a
hash table.)  So, at load time, we reassemble the list of simple
vectors into our single simple vector of zip structs:

  (defvar *zipcodes*
      (destructuring-bind (codes abbrevs locations cities states)
	  *zipcode-data*
	(do* ((max (length codes))
	      (zips (make-array max))
	      (i 0 (1+ i)))
	    ((= i max) zips)
	  (setf (svref zips i)
	    (make-zip :code (svref codes i)
		      :state-abbrev (svref abbrevs i)
		      :location (let ((loc (svref locations i)))
				  (make-location
				   :latitude (car loc)
				   :longitude (cdr loc)))
		      :city (svref cities i)
		      :state (svref states i))))))

For those that have not used `destructuring-bind', it merely takes a
list and `destructures' it, or breaks it down into component parts.
In this case, I'm taking a flat list and assigning each element to a
variable.  The first being `codes' and the fifth being `states',
corresponding to the first and fifth elements of the list held by
*zipcode-data*.

Next, we nil out *zipcode-data* so the list of simple vectors can be
garbage collected, since we no longer need them.

  (setq *zipcode-data* nil)

This same technique can be used on other structure types.  Nothing in
the solution is specific to the `zip' struct object.

*******************************************************************************
INSTALLATION
*******************************************************************************

Unpack the .tgz file.  You will now have a usgs/ directory where you
unpacked.  Into this directory create a file called
"google-maps-key.cl" which contains your Google Maps API key, like
this:

  (setq util.google.maps:*default-key*
    "...key obtained from google...")

Where ``...key obtained from google...'' is the key you received from
Google.  The formatting of this file is important: the key must be a
string and bound to the variable util.google.maps:*default-key*.

Then, you are ready to start up Allegro CL in the directory where you
unpacked the source files, type 

  :ld load.cl

That should print something like this:

  cl-user(3): :ld load.cl
  ; Loading /home/layer/src/usgs/load.cl
  ;   Fast loading /backup/acl/acl80/code/asdf.001
  ;;; Installing asdf patch, version 1
  ; loading system definition from /home/layer/src/usgs/usgs.asd into
  ; #<The asdf0 package>
  ;   Loading /home/layer/src/usgs/usgs.asd
  ; registering #<system usgs @ #x71d13832> as usgs
  ; loading system definition from ../google/google.asd into #<The asdf0 package>
  ;   Loading /home/layer/src/google/google.asd
  ; registering #<system google @ #x71d1e01a> as google
  ; loading system definition from ../zipcodes/zipcodes.asd into
  ; #<The asdf0 package>
  ;   Loading /home/layer/src/zipcodes/zipcodes.asd
  ; registering #<system zipcodes @ #x71d26a92> as zipcodes
  ; loading system definition from ../measures/measures.asd into
  ; #<The asdf0 package>
  ;   Loading /home/layer/src/measures/measures.asd
  ; registering #<system measures @ #x71d2f99a> as measures
  ;;; Compiling file /home/layer/src/measures/measures.cl
  ;;; Writing fasl file /home/layer/src/measures/measures.fasl
  ;;; Fasl write complete
  ;   Fast loading /home/layer/src/measures/measures.fasl
  ;;; Compiling file /home/layer/src/zipcodes/zip-package.cl
  ;;; Writing fasl file /home/layer/src/zipcodes/zip-package.fasl
  ;;; Fasl write complete
  ;;; Compiling file /home/layer/src/zipcodes/zip-util.cl
  ;   Fast loading /home/layer/src/zipcodes/zip-package.fasl
  ;;; Writing fasl file /home/layer/src/zipcodes/zip-util.fasl
  ;;; Fasl write complete
  ;;; Compiling file /home/layer/src/zipcodes/zip-api.cl
  ;   Fast loading /home/layer/src/zipcodes/zip-package.fasl
  ;   Fast loading /home/layer/src/zipcodes/zip-util.fasl
  reading zips.cvs...done
  ;;; Writing fasl file /home/layer/src/zipcodes/zip-api.fasl
  9094928 bytes have been tenured, next gc will be global.
  See the documentation for variable *global-gc-behavior* for more information.
  ;;; Fasl write complete
  ;   Fast loading /home/layer/src/zipcodes/zip-package.fasl
  ;   Fast loading /home/layer/src/zipcodes/zip-util.fasl
  ;   Fast loading /home/layer/src/zipcodes/zip-api.fasl
  ;;; Compiling file /home/layer/src/google/google-maps.cl
  ;   Fast loading /backup/acl/acl80/code/aserve.004
  ;;; Installing aserve patch, version 4
  ;     Fast loading from bundle code/acldns.fasl.
  ;;; Installing acldns patch, version 1
  ;;; Writing fasl file /home/layer/src/google/google-maps.fasl
  ;;; Fasl write complete
  ;   Fast loading /home/layer/src/google/google-maps.fasl
  ;;; Compiling file /home/layer/src/usgs/usgs.cl
  ;   Fast loading /backup/acl/acl80/code/regexp2.001
  ;;; Installing regexp2 patch, version 1
  ;     Fast loading /backup/acl/acl80/code/yacc.001
  ;;; Installing yacc patch, version 1
  ;   Fast loading /backup/acl/acl80/code/update.fasl
  ;;; Installing update patch, version 6
  ;     Fast loading /backup/acl/acl80/code/crc.fasl
  ;     Fast loading /backup/acl/acl80/code/inflate.fasl
  ;       Fast loading from bundle code/iodefs.fasl.
  ;         Fast loading from bundle code/iordefs.fasl.
  ;           Fast loading /backup/acl/acl80/code/efmacs.002
  ;;; Installing efmacs patch, version 1
  ;;; Writing fasl file /home/layer/src/usgs/usgs.fasl
  ;;; Fasl write complete
  ;   Fast loading /home/layer/src/usgs/usgs.fasl
  ;     Loading /home/layer/src/usgs/google-maps-key.cl
  ;; Try this:

  (get-quake-info :period :week
    :larger-than nil
    :within 1.0
    :reference-location (place-to-location
			   "555 12th St, Oakland, CA, 94607, US"))
  cl-user(4): 

Now, you can evaluate the form printed just before the prompt to see
if it works for you.  Make sure you are in the `cl-user' package when
you do evaluate the above form.  NOTE: the IDE starts up in the
`cg-user' package.
