Create A Geocoding Service On The Cheap

April 25, 2015

Gecoding, because everyone needs to know the lat long for their address - Jay-Z

I have been doing some work with spatial data lately and one of the things I have needed to do is geocode addresses. If you need to do some simple geocoding then the services available from Google, Bing, Baidu, Yahoo etc. are great. However, if need to start geocoding at scale then you run into rate limiting issues.

Unfortunately these services are on the pricey side and an open source solution is always better than a paid one. Postgresql and PostGIS is know for their awesome spatial abilities which make it super simple to do spatial queries. One thing you still need are shapefiles with the necessary data. The census.gov site has all the necessary shapefiles needed to make this happen.

The PostGIS site has a tutorial on how to get the postgis_tiger_geocoder installed but I struggled for more than a day following their instructions. So after hours and hours of struggling I finally got everything working and now it’s time for me to share the steps that you don’t have to struggle like I did.

Lets get started.

These steps assume you are using Ubuntu.

Step 0.

sudo apt-get update
sudo apt-get upgrade
sudo apt-get install postgresql
sudo apt-get install postgis-2.1 wget unzip

Step 1.

sudo adduser postgres sudo
sudo passwd postgres 
touch /var/lib/postgresql/.psql_history

Step 2.

su postgres
psql
\password postgres
CREATE DATABASE geocoder;

Step 3.

CREATE EXTENSION postgis;
CREATE EXTENSION postgis_topology;
CREATE EXTENSION fuzzystrmatch;
CREATE EXTENSION postgis_tiger_geocoder;

Step 4.

GRANT USAGE ON SCHEMA tiger TO PUBLIC;
GRANT USAGE ON SCHEMA tiger_data TO PUBLIC;
GRANT SELECT, REFERENCES, TRIGGER ON ALL TABLES IN SCHEMA tiger TO PUBLIC;
GRANT SELECT, REFERENCES, TRIGGER ON ALL TABLES IN SCHEMA tiger_data TO PUBLIC;
GRANT EXECUTE ON ALL FUNCTIONS IN SCHEMA tiger TO PUBLIC;
ALTER DEFAULT PRIVILEGES IN SCHEMA tiger_data GRANT SELECT, REFERENCES ON TABLES TO PUBLIC;

Step 5.

sudo mkdir /gisdata
sudo mkdir /gisdata/temp
cd /gisdata
sudo chmod 777 .
cd temp
sudo chmod 777 .

Step 6.

psql
\c geocoder 
Generate nationscript
\t
\a
\o /gisdata/nationscript.sh
SELECT loader_generate_nation_script('sh');
\o

Step 7.

Generate statescript
\t
\a
\o /gisdata/statescript.sh
SELECT loader_generate_script(ARRAY['CT'],'sh');
\o

Step 8.

sudo nano  nationscript.sh
TMPDIR="/gisdata/temp/"
UNZIPTOOL=unzip
WGETTOOL="/usr/bin/wget"
export PGBIN=/usr/bin
export PGPORT=5432
export PGHOST=localhost
export PGUSER=postgres
export PGPASSWORD=postgres
export PGDATABASE=geocoder
PSQL=${PGBIN}/psql
SHP2PGSQL=${PGBIN}/shp2pgsql
cd /gisdata

Step 9.

sudo nano statescript.sh
TMPDIR="/gisdata/temp/"
UNZIPTOOL=unzip
WGETTOOL="/usr/bin/wget"
export PGBIN=/usr/bin
export PGPORT=5432
export PGHOST=localhost
export PGUSER=postgres
export PGPASSWORD=postgres
export PGDATABASE=geocoder
PSQL=${PGBIN}/psql
SHP2PGSQL=${PGBIN}/shp2pgsql
cd /gisdata

Step 10.

chmod a+x nationscript.sh
chmod a+x statescript.sh

sudo ./nationscript.sh
sudo ./statescript.sh

Step 11.

psql
\c geocoder
SELECT install_missing_indexes();

Step 12.

edit:
/etc/postgresql/<version>/main/postgresql.conf
listen_addresses='localhost' => listen_addresses='*'

Step 13.

edit:
/etc/postgresql/<version>/main/pg_hba.conf
IPv4 local connections:
host all all 127.0.0.1/32 md5
# IPv6 local connections:

to:
IPv4 local connections:
host all all 127.0.0.1/32 md5
host all all 0.0.0.0/0 md5
# IPv6 local connections:

Step 14.

sudo service postgresql restart

Step 15.

psql
\c geocoder

SELECT
g.rating As rating,
ST_X(g.geomout)::numeric(10,5) As lon,
ST_Y(g.geomout)::numeric(10,5) As lat,
(g.addy).address As snum,
(g.addy).streetname || ' '
|| (g.addy).streettypeabbrev As street,
(g.addy).zip,
geomout as geom
FROM geocode('1241 East Main Street Stamford, CT 06902',1) As g;

If everything worked correctly and I really hope it did (I was able to consistently get up and running following these steps) you will have an address returned which includes the latitude and longitude.


Discussion, links, and tweets

My name is Deon Heyns and I am a developer learning things and documenting them in realtime. Python, Ruby, Scala, .NET, and Groovy are all languages I have written code in. I appeared in the New York Post once. I host my code up at GitHub and Bitbucket so have a look at my code, fork it and send those pull requests.