j40-cejst-2/docs/decisions/0008-nominatim.md

Use OSM/Nominatim for Geolocation

Context and Problem Statement

We need a short-term provider of a geolocation service that is free, open source, and usable for basic searches in our product.

Decision Drivers

  • Free
  • Open Source
  • Usable for basic general-purpose searches
  • Capable of servicing a small number of searches

Considered Options

  • OSM/Nominatim
  • Various paid alternatives: Google/Bing/Others
  • Data Science Toolkit
  • Census REST API

Decision Outcome

Chosen option: OSM/Nominatim, because it is a free and open-source solution that allows low-volume searches, provided we follow a few straightforward restrictions.

We will monitor our traffic stats and re-evaluate if we see a large surge of users or otherwise need to account for higher traffic.

Positive Consequences

  • Free
  • Open Source
  • Suitable for general-purpose searches
  • Able to give polygon boundaries with polygon_geojson=1 option
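As a rough illustration of the polygon_geojson=1 option mentioned above, the sketch below builds a Nominatim search URL that asks for boundary polygons. It assumes the public endpoint at nominatim.openstreetmap.org and the jsonv2 output format; the function name build_search_url is hypothetical and not taken from our codebase. It only builds the URL and sends no request.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumption: the public Nominatim instance; a self-hosted one would differ.
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(query: str, limit: int = 1) -> str:
    """Build a free-form Nominatim search URL that requests polygon output."""
    params = {
        "q": query,               # free-form search string
        "format": "jsonv2",       # JSON output
        "polygon_geojson": 1,     # include the boundary as GeoJSON
        "limit": limit,           # cap the number of returned places
    }
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


url = build_search_url("Wise County, VA")
# Show the query parameters that would be sent.
print(parse_qs(urlparse(url).query))
```

When a place has a boundary, each returned result should carry it under a geojson key, which the map could then draw.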

Negative Consequences

  • Results are not as high quality as with some other services
  • Various usage restrictions apply:
    • a. We must stay within "an absolute maximum of 1 request per second"
    • b. We must provide a valid HTTP Referer or User-Agent identifying the application
    • c. We must clearly display attribution (using the AttributionControl)
    • d. We are advised to set up a proxy to enable caching of search requests.
    • e. They also have this note: "Note: periodic requests from apps are considered bulk geocoding and as such are strongly discouraged. It may be okay if your app has very few users and applies appropriate caching of results. Make sure you stay well below the API usage limits."
  • Relevant result summary: below are the results of various kinds of relevant searches using Nominatim. Accuracy is somewhat under 50% (8 of the 20 queries worked and 1 partially worked):
    • St. Paul, Virginia - kind of works
    • St. Paul, VA - did not work
    • 24283 - works
    • Appalachia - did not work (not surprising)
    • St. Paul - did not work
    • Wise County - works
    • Wise County, VA - works
    • St Paul - does not work
    • Clinch river - works (surprisingly)
    • [a more rural address] 3025 4th Ave St. Paul, VA - does not work
    • 3025 4th Ave St. Paul, VA 24283 - does not work
    • 3025 4th Ave - does not work
    • Western Front Hotel - did not work
    • 3025 Fourth Avenue - does not work
    • 4th Ave and Broad St - does not work
    • Carytown - works
    • Carytown, Richmond - works
    • [a more urban address] 3109 W Cary St, Richmond, VA 23221 - works
    • Southwest Virginia - does not work
    • Richmond, VA - works
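The usage restrictions listed above (at most one request per second, an identifying User-Agent, and caching of repeated searches) could be combined in a small client-side wrapper. The following is a minimal sketch under stated assumptions, not the project's implementation: the class ThrottledCachedGeocoder, the function nominatim_fetch, and the User-Agent string are all hypothetical, and the in-memory dictionary stands in for the caching proxy that Nominatim recommends.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical identifier; a real deployment would name its own application.
USER_AGENT = "example-geocoding-client/0.1 (contact@example.org)"


def nominatim_fetch(query):
    """Send one search to the public Nominatim endpoint (assumed URL)."""
    url = "https://nominatim.openstreetmap.org/search?" + urlencode(
        {"q": query, "format": "jsonv2"}
    )
    request = Request(url, headers={"User-Agent": USER_AGENT})
    with urlopen(request, timeout=10) as response:
        return json.load(response)


class ThrottledCachedGeocoder:
    """Cache results and keep upstream calls at least min_interval apart."""

    def __init__(self, fetch, min_interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._cache = {}
        self._last_request = None

    def search(self, query):
        # Normalize case and whitespace so trivially different queries share a cache entry.
        key = " ".join(query.lower().split())
        if key in self._cache:
            return self._cache[key]
        # Wait out the remainder of the interval since the last upstream call.
        if self._last_request is not None:
            wait = self._min_interval - (self._clock() - self._last_request)
            if wait > 0:
                self._sleep(wait)
        self._last_request = self._clock()
        result = self._fetch(query)
        self._cache[key] = result
        return result
```

Constructing it as ThrottledCachedGeocoder(nominatim_fetch) would send real requests; passing a stub fetch function keeps the throttle and cache testable without network access. This throttle only covers a single process, so a shared proxy would still be needed if several clients query Nominatim.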

Pros and Cons of the Options

Paid Geocoders

There are various paid solutions which provide their own geocoding services.

You can find a general comparison of paid providers here (includes Geocodio, SmartyStreets, Google, Bing, Here, Mapbox, and TomTom).

Other providers:

Paid geocoder Trade-offs

  • Pros:
    • Data quality is often much higher than with free options (see the detailed comparison here, in which OSM has about a 40% accuracy rate compared to other providers, though on a small sample size).
    • Many come with built-in, easy-to-use UI components such as Mapbox-gl-geocoder
    • Allow for good flexibility of both input and output
  • Cons:
    • Can be expensive
    • Require a key and associated account
    • Many are not open source

Data Science Toolkit

Data-science-oriented search framework that combines OSM with GeoIP data.

An overall good option, but if we went down this path we would want to host it ourselves. Notably, it does offer this option.

  • Home here

    • Pricing: Free
    • Limitations: "You can get started using the "http://www.datasciencetoolkit.org/" server, but for intensive use or to run behind a firewall, you'll probably want to create your own machine." (source)
    • Sources: This API uses data from the US Census and OpenStreetMap, along with code from GeoIQ and Schuyler Erle.

DSTK Trade-offs

  • Pros:
    • Free
    • Has a hosted version
    • Augments OSM data using GeoIP
  • Cons:
    • Recommended that you host your own given these limitations
    • Not recommended for production use

Census Geocoding Services

Geocoding service provided directly by the U.S. Census Bureau. More info here.

Trade-offs

  • Pros:

    • Quite good results if you give a whole address
    • Ties results back directly to census block group boundaries
    • Allows for querying different layers of census data directly via layers property
  • Cons:

    • Limited in input: only two lookup types are available, either street + city + state + zip ("address" lookup) or a full address on one line ("onelineaddress" lookup). There are no direct options for flexible input.
    • Limited in output: no direct polygon boundaries
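To make the two Census lookup types above concrete, the sketch below builds request URLs for the "onelineaddress" and "address" lookups. It is a sketch under assumptions rather than a verified integration: the base URL, the Public_AR_Current benchmark, the Current_Current vintage, and the layers value "all" are taken from our reading of the public Census Geocoder API and should be checked against its documentation, and the helper names are hypothetical. It only builds URLs and sends no request.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Assumptions: public Census Geocoder base URL, benchmark, and vintage names.
CENSUS_GEOCODER = "https://geocoding.geo.census.gov/geocoder"
BENCHMARK = "Public_AR_Current"
VINTAGE = "Current_Current"


def _build(returntype, searchtype, params, layers=None):
    """Assemble a geocoder URL; 'geographies' additionally needs a vintage."""
    query = dict(params, benchmark=BENCHMARK, format="json")
    if returntype == "geographies":
        query["vintage"] = VINTAGE
        if layers:
            query["layers"] = layers  # which census layers to return
    return f"{CENSUS_GEOCODER}/{returntype}/{searchtype}?{urlencode(query)}"


def census_oneline_url(address, returntype="locations"):
    """'onelineaddress' lookup: the whole address in a single string."""
    return _build(returntype, "onelineaddress", {"address": address})


def census_address_url(street, city, state, zip_code,
                       returntype="geographies", layers=None):
    """'address' lookup: street, city, state, and zip as separate fields."""
    fields = {"street": street, "city": city, "state": state, "zip": zip_code}
    return _build(returntype, "address", fields, layers)


one_line = census_oneline_url("3109 W Cary St, Richmond, VA 23221")
by_parts = census_address_url("3109 W Cary St", "Richmond", "VA", "23221",
                              layers="all")
print(parse_qs(urlparse(one_line).query))
print(parse_qs(urlparse(by_parts).query))
```

Neither form accepts a bare place name such as "Wise County", which is the input limitation noted above.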