Route Optimization: Getting Google Cloud and OSRM set up

Looking for hands-on experience using data analysis to solve logistics problems? Look no further: we’re excited to introduce our Route Optimization blog series. In this series, we’ll explore the data science behind one of the industry’s largest supply chains. In Part 1, we show you how to set up an OSRM server using GCP. Stay tuned for Part 2, where we build out an API and use OSRM through Python to analyze delivery routes.

American Tire Distributors’ delivery drivers accumulate more than 80 million miles per year, making nearly 7 million deliveries to customers annually. We’re always looking at ways to enhance the customer experience throughout the delivery process, from the moment we receive the tires to the moment we deliver the right tire at the right time. With a national footprint, even the slightest improvement can make a major impact: reducing the distance we drive (and therefore our carbon footprint) benefits both our customers and ATD.

As part of our last-mile delivery service, we orchestrate all deliveries through third-party software that helps streamline our delivery schedule. However, there are certain instances where we may want to modify the suggested route to further increase our delivery efficiency. When reviewing these routes, we often find ourselves analyzing them from a geospatial and routing perspective.

Driving vs “straight-line” distances

It’s very important to account accurately for the distance between stops. If we use the “straight-line” (aka “great circle”, “as the crow flies”, “haversine”, etc.) distance between two points (customer locations), it’s easy to vastly underestimate the actual distance that a delivery truck would have to drive between them.

A study from the NY State Cancer Registry and U. of Albany (interesting source for a study on driving distances) found that on average, the actual driving distance between two points is roughly 40% longer than what the “straight-line” distance estimates, which can greatly influence your analysis.
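To make the gap concrete, here is a minimal Python sketch of the haversine (great-circle) distance, plus a rough driving estimate that scales it by the roughly 40% figure cited above. The function names, the mean Earth radius constant, and the use of a flat 1.4 multiplier are illustrative assumptions; the coordinates are the same pair used in the route request later in this post.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (an approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle ("straight-line") distance between two points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rough_driving_km(lat1, lon1, lat2, lon2, detour_factor=1.4):
    """Crude driving estimate: straight-line distance times an assumed factor."""
    return detour_factor * haversine_km(lat1, lon1, lat2, lon2)


# Two points given as (latitude, longitude)
a = (35.388501, -80.868670)
b = (35.236367, -80.974886)
print(f"straight-line: {haversine_km(*a, *b):.1f} km")
print(f"rough driving: {rough_driving_km(*a, *b):.1f} km")
```

A flat multiplier is only an average; the real ratio varies with the road network, which is why a routing engine is worth the setup effort.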

How can you calculate this information?

Google Maps is a helpful resource, but using it by hand would be incredibly tedious when reviewing the distances between thousands of points. Fortunately, Google Maps has an API you can use to programmatically request this data in your language of choice (saving you a lot of time). This integration makes it very easy to capture the data you need, but according to Google’s pricing information, it comes at a pretty hefty cost. If there are 500 locations you want to find the distance (or time) matrix for, the cost is (500² requests) * ($5/1000 requests) = $1,250.
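The arithmetic above is easy to reproduce for your own dataset. This is a small sketch that assumes the $5 per 1,000 requests figure quoted in this post (actual pricing may differ) and a full n × n matrix; the function name is hypothetical.

```python
def matrix_cost_usd(n_locations, usd_per_1000_requests=5.0):
    """Cost of a full n x n distance matrix at a flat per-request rate."""
    n_requests = n_locations ** 2  # one request per origin/destination pair
    return n_requests * usd_per_1000_requests / 1000


print(matrix_cost_usd(500))   # 1250.0
print(matrix_cost_usd(1000))  # 5000.0
```

Because the number of pairs grows with the square of the number of locations, doubling the locations quadruples the cost.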

This is a one-time expense as long as the locations and/or roads don’t change, but what do you do if you have to scale up or don’t have the budget? ATD has 80,000 customers, which means 80,000 different points; we don’t want to spend tens of thousands of dollars (or, for a full matrix at the rate above, far more) acquiring this data. What if you’re a student who can’t afford to spend hundreds of dollars to analyze a small dataset for fun? If only there were an open-source version of the Google Maps API that you could run for free (or at least significantly cheaper).

Introducing Open Source Routing Machine “OSRM”

You may have heard of OpenStreetMap, an open-source version of Google Maps whose maps are maintained and updated by an army of dedicated volunteers. The platform has some basic routing capabilities you can use manually, as well as an API, but it’s mostly focused on maintaining the maps rather than extracting and analyzing data from them.

OSRM is open-source software built on top of OpenStreetMap’s geographical data that can calculate routes between locations on the fly, and very quickly too. Think of OSRM as your own personal version of the Google Maps API. You set up a server that accepts HTTP requests containing the location information of the stops, and it returns data depending on the kind of answer you’re looking for (there are many services available, such as distance matrix, fastest route, nearest road, etc.).
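As a rough illustration of what such a request looks like, the sketch below assembles a request URL in the shape used by the curl example later in this post: service, version, profile, then coordinates as longitude,latitude pairs separated by semicolons. The helper name `osrm_url` and the host value are assumptions for illustration; the sketch only builds the string and does not contact a server.

```python
def osrm_url(host, service, coords, profile="driving", **options):
    """Build an OSRM request URL.

    coords is a sequence of (longitude, latitude) pairs; note that
    longitude comes first in the URL.
    """
    coord_str = ";".join(f"{lon:.6f},{lat:.6f}" for lon, lat in coords)
    url = f"http://{host}/{service}/v1/{profile}/{coord_str}"
    if options:
        url += "?" + "&".join(f"{key}={value}" for key, value in options.items())
    return url


stops = [(-80.868670, 35.388501), (-80.974886, 35.236367)]
print(osrm_url("localhost:80", "route", stops, steps="true"))
```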

Now let’s get OSRM installed and set up

You’ll need to set up the server, which is a somewhat involved process, but if you’re willing to spend an afternoon downloading map files and setting up Docker containers in exchange for saving a bunch of money, it’s a good tradeoff.

As a helpful guide, we’ve outlined the requirements for installation. Depending on the region(s) you want to analyze, you’ll need roughly 30 MB (for Antarctica) to 50 GB (the entire planet) of disk space for the map files, another 7–8 times that for all of the pre-processing files (so 8–9x the map size in total), and RAM for installation of roughly 10 times the map size.

After installation, the RAM needed to run the software is only 3–5 times the file size of the map. So, if the map is 9 GB, it’s best to have at least 80 GB of disk space, 90 GB of RAM for installation, and then only about 30–40 GB of RAM when running the server after installation. This may be too much for a single machine to handle, especially if you’re just a data enthusiast working on a laptop.
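The rules of thumb above can be wrapped in a small helper for sizing a machine. This is only a sketch of the multipliers quoted in this post (8–9x disk, 10x RAM for pre-processing, 3–5x RAM for serving); the function name and dictionary keys are hypothetical, and real requirements will vary by map and OSRM version.

```python
def osrm_resources_gb(map_size_gb):
    """Estimate resources (in GB) from the map file size, using this post's rules of thumb."""
    return {
        "disk_total": (8 * map_size_gb, 9 * map_size_gb),     # map + pre-processing files
        "ram_preprocessing": 10 * map_size_gb,                # needed during installation
        "ram_serving": (3 * map_size_gb, 5 * map_size_gb),    # needed to run the server
    }


for name, value in osrm_resources_gb(9).items():
    print(name, value)
```

For a 9 GB map this gives 72–81 GB of disk, 90 GB of RAM for pre-processing, and 27–45 GB of RAM for serving, in line with the figures above.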

But don’t stop reading yet…

There are cheap and abundant cloud computing resources that were made for just this purpose. For this scenario, we’re going to use the Google Cloud Platform (GCP). If you’re signing up for a new account, you’ll receive $300 in credit, which is more than enough to cover the compute costs. If you don’t already have a GCP account, below is a guide to help get you set up.

Go ahead and skip this part if you already have a cloud provider or want to use your own machine.

Click here for the Docker install tutorial.

Now that your server is set up, you’ll want to download the map files. Geofabrik hosts up-to-date versions of OpenStreetMap data for free (you can view a repository of map files here). You can either download an entire continent at once or just an individual country/state, depending on what map data you need. Feeling adventurous? You can download the entire planet here. ATD does business in both the U.S. and Canada (NTD), so we downloaded the entire North America map.

$ mkdir -p ~/osrm/maps && cd ~/osrm/maps
$ wget maps/

The download time will depend on which map data you’ve decided to use and your internet connection.

Next, we have to get the OSRM backend, which is conveniently packaged as a Docker image that’s easy to download and run. You’ll find it on Docker Hub under the project name osrm/osrm-backend. Simply pull the image, but don’t start the installation just yet.

$ sudo docker pull osrm/osrm-backend

OSRM needs to pre-process the map files so that it can make fast lookups at runtime and calculate routes. Running this pre-processing step is very computationally expensive, but once it’s done, you’ll only have to do it again when you want to update your map files. OSRM has two algorithms that it uses to calculate routes, called multi-level Dijkstra and contraction hierarchies. We’ll be using the latter because it’s faster (but less flexible in some cases). Because this pre-processing is so memory-intensive, we’re going to need more RAM. But that’s the beauty of cloud computing; if you need more resources, all you need to do is ask for them. Thank you, Google Cloud.

To prepare for this, you’ll need to shut down your VM by returning to your local machine and running:

$ gcloud compute instances stop <instance name> --zone <zone>

or simply go to the GCP console, select your instance and click “Stop”.

In the console, click into your instance’s details, hit “Edit”, then change your machine configuration to one with more memory. We’re going to use a custom machine type that provides 100 GB of RAM, which is probably just enough to handle preprocessing for the North America map (we tried it with 60 GB in a previous run and ran out of memory). Unfortunately, GCP limits new accounts to a maximum of 8 vCPUs unless you send them a request for an increase. For this example, we’ll just stick with 8, but if you can get more, it’ll make the process go faster.

If you’re processing a larger map, use more RAM (recall the rule of thumb is 10 times the disk size of the map file). This will be the most “expensive” step and should cost between $5 and $10, depending on the machine you’re using and how long it runs. If you’re using the $300 credit for a new GCP signup, this won’t even make a dent in it. You can estimate the cost by using GCP’s pricing calculator. Simply change the “Average hours per day each server is running” to “per month” and input the number of hours you think it’ll take to complete the install. We recommend budgeting at least 10 hours for the North America map.

Once that has been completed, you can start up your machine with:

$ gcloud compute instances start <instance name> --zone <zone>

Then start the extraction process, which takes the raw data from the map file and formats it so that OSRM can easily access it to calculate routes. You won’t want to keep the terminal open the whole time this runs, so you can use a terminal multiplexer like tmux, which lets the session detach and keep running in the background.

$ tmux new -s osrm
$ cd ~/osrm/maps
$ sudo docker run -t -v "${PWD}:/data" osrm/osrm-backend osrm-extract --threads 8 -p /opt/car.lua /data/north-america-latest.osm.pbf

This step will take roughly an hour to run. During the process, a lot of output will be produced in the terminal. Most of it is just general information about what’s happening, prefixed with [info], but some of it will be “warnings” prefixed with [warn]. You may get several warnings regarding u-turns, which is normal. If you run out of memory (like we did before increasing RAM to 100 GB), you’ll get an error like this:

[error] [exception] std::bad_alloc
[error] Please provide more memory or consider using a larger swapfile

If you ran this in tmux, just press Ctrl+B followed by D to “detach” the terminal. You can then go do whatever else you want, and when you want to check back in on the extraction process, simply re-attach the terminal with:

$ tmux a -t osrm

After that’s complete, you’ll see a bunch of new files in the ~/osrm/maps directory that were produced during the extraction process. The next step is contraction, which builds the contraction hierarchy from the extracted data:

$ sudo docker run -t -v "${PWD}:/data" osrm/osrm-backend osrm-contract --threads 8 /data/north-america-latest.osrm

You’re almost there. This final step will take several hours to run, so it may be best to let it run overnight and check back in the morning. If you get any errors, it’s most likely because you don’t have enough RAM, so you’ll need to resize your VM and try again. You might also get several [warn] messages during the process; most of these are likely harmless and can be ignored.

Using OSRM from the terminal/browser

Now that the pre-processing step is done, we can start the OSRM server and interact with it. Run the following command to boot up the server:

$ sudo docker run -d -p 80:5000 --rm --name osrm -v "${PWD}:/data" osrm/osrm-backend osrm-routed --max-table-size 1000 --algorithm ch /data/north-america-latest.osrm

Here is a simple breakdown of the above code:

sudo docker run                   # run a docker image
-d                                # run in "detached" mode so the terminal output doesn't clutter the screen
-p 80:5000                        # publish the container port (5000) to the host port (80) so the container can communicate with the "outside world". Remember when we were saying to check the "allow HTTP traffic" box earlier?
--rm                              # delete the container after exit
--name osrm                       # name the container "osrm"
-v "${PWD}:/data"                 # create a filesystem internal to the container with the directory structure "/data" which maps to $PWD
osrm/osrm-backend                 # the name of the docker image
osrm-routed                       # start the OSRM server
--max-table-size 1000             # set the max allowed table size to 1000x1000 locations
--algorithm ch                    # use the contraction hierarchies algorithm
/data/north-america-latest.osrm   # use the preprocessed map file we put in the /data directory

Once it starts, run sudo docker logs -f osrm to view the output; it’ll print out some stuff like “starting engines… thread… IP address/port…”. When it says [info] running and waiting for requests, go ahead and press Ctrl+C to stop following the logs. We can then make requests against the server like so:

$ curl "http://localhost:80/route/v1/driving/-80.868670,35.388501;-80.974886,35.236367?steps=true"

This documentation is a helpful resource for the specific details on how to structure your request; there are also examples on the right-hand side of the page. Running the above command in the terminal on your VM should return a JSON payload containing all the turn-by-turn details of how to get from the first set of coordinates to the second.

The route geometry in the JSON is polyline-encoded, so it looks like a garbled mess of characters; you can decode it by pasting it into Google Maps’ interactive polyline utility.
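If you would rather decode the geometry in code, the encoded polyline format is simple enough to handle in a few lines. Below is a minimal Python sketch of a decoder, assuming the geometry uses the standard polyline encoding with 5 decimal places of precision (to our understanding, OSRM’s default); the function name is hypothetical, and the sample string is the well-known example from Google’s description of the algorithm, not OSRM output.

```python
def decode_polyline(encoded, precision=5):
    """Decode an encoded polyline string into a list of (lat, lon) tuples."""
    coords = []
    index = 0
    lat = lon = 0          # running totals, stored as scaled integers
    factor = 10 ** precision
    while index < len(encoded):
        for is_lon in (False, True):
            shift = result = 0
            while True:
                chunk = ord(encoded[index]) - 63   # undo the ASCII offset
                index += 1
                result |= (chunk & 0x1F) << shift  # low 5 bits carry data
                shift += 5
                if chunk < 0x20:                   # no continuation bit set
                    break
            # Lowest bit is the sign flag; the rest is the magnitude
            delta = ~(result >> 1) if result & 1 else result >> 1
            if is_lon:
                lon += delta
            else:
                lat += delta
        coords.append((lat / factor, lon / factor))
    return coords


print(decode_polyline("_p~iF~ps|U_ulLnnqC_mqNvxq`@"))
# [(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]
```

Note that the decoded pairs come out as (latitude, longitude), the reverse of the longitude-first order used in OSRM request URLs.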

To use the official OSRM examples, all you need to do is replace the server address in them with the address of your own server, which is localhost:80 if you’re executing the request on the VM. If you want to send requests to the VM from your local machine, you’ll need to look up the “external IP” for the VM in the cloud console. If you replace localhost with that external IP, you can paste the URL directly into the browser window on your local machine and see the same output (probably formatted a bit nicer too).

Port 80 is special: it’s the default HTTP port, so you don’t need to include it when making requests. We’ll keep including it for completeness, and in case you want to use a different port for your own purposes; if you do, you’ll have to adjust your GCP firewall rules to allow traffic through that port. As a heads-up, the external IP changes every time you stop and start your machine. You can either look it up manually from the GCP console every time, or script it using the following command, which stores it in an environment variable you can read with echo $EXTERNAL_IP.

$ export EXTERNAL_IP=$(gcloud compute instances describe <instance name> --zone=<zone> --format='get(networkInterfaces[0].accessConfigs[0].natIP)')

If you want to shut the server down, all you need to do is run sudo docker kill osrm. Since we set the --rm flag, the container will automatically be removed once it stops, and you can start it back up again by re-running the docker run command from earlier.

Understanding OSRM’s services

OSRM provides a number of different services. The route service will give you the driving route between a sequence of points in the order supplied. It won’t optimize the order for you, but the trip service can. The nearest service will find the nearest road to the supplied (longitude, latitude) coordinate. The table service will supply you with the distance/time matrices between a set of locations. The match service is similar to the nearest service in that it finds the nearest road to a point, but it’s more general because it works on a whole sequence of points (such as a GPS trace) and can generate directions along the path they define. The tile service returns a section of the map in a “vector tile” format, which contains metadata about the road network.
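To give a feel for what the table service returns, here is a small Python sketch that converts the travel-time matrix in a table response from seconds to minutes. It assumes the response carries a "code" field and a "durations" matrix in seconds, as described in the OSRM API documentation; the function name is hypothetical and the sample payload is made up for illustration, not real server output.

```python
import json


def durations_minutes(payload):
    """Return the table service's duration matrix converted to minutes."""
    data = json.loads(payload)
    if data.get("code") != "Ok":
        raise ValueError(f"OSRM returned code {data.get('code')!r}")
    # Entries can be null (None) when no route exists between two points
    return [[None if secs is None else secs / 60 for secs in row]
            for row in data["durations"]]


# Made-up two-location response, for illustration only
sample = '{"code": "Ok", "durations": [[0, 600.0], [660.0, 0]]}'
print(durations_minutes(sample))  # [[0.0, 10.0], [11.0, 0.0]]
```

Row i, column j holds the travel time from location i to location j, so the matrix is not necessarily symmetric.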

We’ll be showing you how to use the route and table services in the next edition of our Route Optimization series, where we will analyze some of ATD’s delivery routes with Python, so stay tuned.

Who we are as people is who we are as a company. We share the same values in the way we work with each other, our partners and our customers. We are ATD.