Historical Maps Research < Jerod Weinman < CompSci < Grinnell


Example map detail
Figure 1. As in this 1896 bicycle trail map of California's central valley, historical maps feature irregular typefaces, overlapping text and graphical elements, and text with curved baselines and arbitrary orientation. (Credit: Image 1592006 © Cartography Associates)

Maps tell tales of politics, people, and progress. Historical and print map collections are important library and museum holdings frequently available only to scholars with physical access. Fortunately, many are now being digitized for more widespread access via the web.

Unfortunately, most maps are only coarsely indexed with metadata (increasingly, even georectified), while the contents themselves remain largely unsearchable. The goal of this project is to further increase the accessibility of information locked in geographical archives by automatically recognizing place names (toponyms).


Bayesian network
Figure 2. Graphical probability model (Bayesian network) for toponym recognition. Greyed ovals are observed values, and values inside the plate are replicated N times, once for each word to be recognized in the map. (Copyright © 2017 IEEE. Used by permission.)

For a set of word images from the map and their image coordinates, we seek the strings, projection, and alignment that maximize the joint probability described by the Bayesian model in Figure 2. To support this, the model associates the underlying category and specific feature represented by each word with the typographical style in which it is rendered.
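As a rough sketch (the factorization, field names, and distribution terms below are our illustration, not the paper's exact model), the joint probability over the N plate replications can be accumulated in log space: a global alignment prior plus per-word category, style, string, and location terms.

```python
from dataclasses import dataclass

@dataclass
class Word:          # observed word image summary (hypothetical fields)
    xy: tuple        # image coordinates of the word
    style: int       # typographic style cluster id

@dataclass
class Hypothesis:    # latent assignment for one word
    text: str        # candidate string
    feature: str     # gazetteer feature id
    category: str    # e.g. 'city', 'river', 'county'

def log_joint(words, hyps, model, alignment):
    """Sum of log-probability terms mirroring the plate model:
    an alignment prior plus, per word, category, style, string,
    and location terms (the factorization is our sketch)."""
    lp = model['log_p_alignment'](alignment)
    for w, h in zip(words, hyps):
        lp += model['log_p_category'](h.category)
        lp += model['log_p_style'](w.style, h.category)
        lp += model['log_p_string'](h.text, h.feature)
        lp += model['log_p_location'](w.xy, h.feature, alignment)
    return lp
```

Maximizing this quantity jointly over strings, projection, and alignment is the inference problem the rest of the pipeline approximates.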

The global georectification alignment relates image coordinates to a gazetteer of known feature locations projected into the same Cartesian plane. We search for an initial maximum likelihood alignment with RANSAC and refine the result with Expectation Maximization.
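A minimal RANSAC sketch under simplifying assumptions (similarity transforms represented as complex numbers, a hypothetical inlier threshold, and given candidate word-to-gazetteer correspondences) might look like:

```python
import random

def fit_similarity(p1, p2, q1, q2):
    """Similarity transform z -> a*z + b (complex form) mapping
    image points p1, p2 onto gazetteer points q1, q2."""
    a = (q2 - q1) / (p2 - p1)
    b = q1 - a * p1
    return a, b

def ransac_align(pairs, iters=500, tol=1.0, seed=0):
    """pairs: list of (image_pt, gazetteer_pt) as complex numbers.
    Repeatedly fits a transform from a minimal sample and keeps the
    one with the most inliers -- a toy stand-in for the maximum
    likelihood alignment search described above."""
    rng = random.Random(seed)
    best = (0, None)
    for _ in range(iters):
        (p1, q1), (p2, q2) = rng.sample(pairs, 2)
        if p1 == p2:
            continue
        a, b = fit_similarity(p1, p2, q1, q2)
        inliers = sum(1 for p, q in pairs if abs(a * p + b - q) < tol)
        if inliers > best[0]:
            best = (inliers, (a, b))
    return best
```

The actual system refines the RANSAC initialization with Expectation Maximization and supports richer projection families than this two-point similarity fit.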

Processing algorithm
Figure 3. Learning process for the Bayesian model. Outputs are cumulative and displayed only where updated. (Copyright © 2017 IEEE. Used by permission.)

A general robust word recognition system (a deep convolutional network with a flexible parser) produces a ranked list of strings and associated prior probabilities. We then use the most probable cartographic projection and alignment to update posterior probabilities for the recognized strings.
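Conceptually, the update is Bayes' rule: each candidate string's OCR prior is multiplied by a location likelihood under the current projection and alignment, then renormalized. A toy sketch (function and variable names are hypothetical):

```python
def rerank(candidates, loc_likelihood):
    """candidates: list of (string, prior_prob) from the OCR parser.
    loc_likelihood: maps a string to the likelihood of the word's
    observed position under the current projection/alignment.
    Returns candidates reranked by normalized posterior."""
    scored = [(s, p * loc_likelihood(s)) for s, p in candidates]
    z = sum(v for _, v in scored) or 1.0  # guard against all-zero scores
    return sorted(((s, v / z) for s, v in scored), key=lambda t: -t[1])
```

A string that the OCR ranked second can overtake the top hypothesis when its gazetteer location agrees far better with the word's position on the map.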

Taking the initial alignment from multiple cartographic projection families as a starting point, we refine the model parameters by expectation maximization. Later stages consider entities from additional geographic categories with appropriate cartographic models, allowing label words to be positioned anywhere along river flowlines or within regional boundaries (more below).

Finally, using an a priori clustering of words into text styles, we correlate style with geographic category to model the map's bias among geographic features and rerank recognitions with Bayesian posteriors.

Text Separation

To detect words in maps, we have retrained a convolutional neural network (CNN), the Faster R-CNN (Ren et al., 2016). Each of the following images (click to see the full map) demonstrates performance.

Example Map Text Detections 1 Example Map Text Detections 2 Example Map Text Detections 3

Alignment and Recognition

Predicted Location Prior and Posterior probabilities
Figure 4. Relevant probabilities for the word "Pensacola". Left: Given the alignment, the predicted location of the best toponym (2404503: City of Pensacola) is shown at the mean of the likelihood, surrounded by three isotropic standard deviations. Right: prior and posterior probabilities for top candidate strings. (Copyright © 2013 IEEE. Used by permission.)

Table 1: Example results.

                Error (%)
              Word    Char
  Prior      22.29   10.19
  Posterior  14.23    6.75

In these results, geographical information enables us to infer coarse georeferencing of historical map images from noisy OCR output. Table 1 lists error percentages and the harmonic mean of correct words' ranks using twenty maps with 6,949 words where the correct string was ranked by the original OCR and present in the gazetteer. Jointly inferring toponym hypotheses and gazetteer alignment eliminates 37% of OCR errors.
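The quoted reduction can be sanity-checked from Table 1's word error rates (the paper's 37% presumably reflects unrounded error counts):

```python
# Word error rates (%) from Table 1
prior, posterior = 22.29, 14.23

# Relative fraction of OCR errors eliminated by the joint inference
reduction = (prior - posterior) / prior
print(f"{reduction:.1%}")  # → 36.2%
```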

Alignment Search
Figure 5. Corresponding state shape outline as a RANSAC search progresses to higher-scoring alignments. (Credit: Map Image 1070001 © Cartography Associates)

The animation in Figure 5 shows alignments as the RANSAC-based search progresses to higher scores, a process which takes about 20 seconds and fewer than 500 random samples. Note that only the extracted words and their locations are used to determine the alignment; no other image information, such as boundary contours, is used. Reranking OCR scores using the final alignment reduces word recognition error in this image from 44% to 24%.

The figure below shows word recognitions and their posterior probabilities overlaid on a region of a map with complex artifacts. One recognition is marked incorrect and has a lower score because the 1860s-era map uses an unconventional spelling, and the spatial alignment pushes the system to produce the modern name.

Final text recognition and map
Figure 6. Overlaid posterior word recognitions and scores. (Credit: Map Image 1070006 © Cartography Associates)

Style and Category Modeling

Maps may pictorially and textually represent entities from a wide variety of categories, from cities to rivers or administrative regions. To support these additional categories as well as the tendency of a map, atlas, or geographic region to exhibit bias among them, we automatically link the category to a learned typographical representation of the text labels' styles.

The figure below demonstrates how the model clusters words by style according to their category, representing primarily water features (italics), city names (standard font), and counties or states (all capital letters).

Style Clusters
Figure 7. Text style clusters. Word images are ranked by probability of containing an LDA-based [Blei03] style (k=4) using LBP features [Nicolau15]. (Credit: Map Image 5242001 © Cartography Associates)
Style-Category Prior
Figure 8. Comparison of example ground truth and learned style-conditional category probabilities from two of the clusters above.

Figure 8 illustrates how the style clusters can induce appropriate bias for particular geographical categories. Without any style information, the geographical entities for the map are determined to be mostly city or town placenames or counties (by a 2:1 margin), with a few rivers, lakes, and other outliers. With style information, the uncertainty is drastically reduced among several style clusters, which here separate placenames and counties into individual clusters, giving each an appropriate bias.
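A toy version of this correlation step might accumulate each word's soft category posterior into its style cluster and normalize, yielding a style-conditional category distribution (all names below are hypothetical):

```python
from collections import defaultdict

def style_category_prior(words):
    """words: list of (style_cluster, {category: posterior_prob}).
    Accumulates a normalized category distribution per style
    cluster -- a sketch of learning P(category | style)."""
    counts = defaultdict(lambda: defaultdict(float))
    for style, cat_probs in words:
        for cat, p in cat_probs.items():
            counts[style][cat] += p
    priors = {}
    for style, c in counts.items():
        z = sum(c.values())
        priors[style] = {cat: v / z for cat, v in c.items()}
    return priors
```

If one cluster's mass concentrates on counties (e.g., the all-capitals style) and another's on cities, the learned prior sharpens the posterior for every word rendered in those styles.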

Non-point Feature Labels

Given a georeferenced map, predicting the location of a label for a small-scale point feature may be done with a straightforward Gaussian model, which efficiently penalizes the squared distance between the geographic prediction and the cartographic representation (see Figure 4). Because rivers and large lakes or counties do not fit this formulation, we instead penalize the minimum distance between the cartographic representation and the geographic prediction, which is now a polyline or polygon. Binarizing the known shape on the image grid allows us to use a linear-time distance transform algorithm for efficiency. The figure below shows an example likelihood function, predicting where a label should appear for a particular geographic feature in a georeferenced map, along with the corresponding prior (OCR alone) and posterior (OCR plus georeferenced likelihood) probability distributions.
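To make the idea concrete, the sketch below substitutes a brute-force minimum distance for the linear-time distance transform, and applies a Gaussian penalty to that distance (the function names and the isotropic sigma are our assumptions):

```python
import math

def min_distance_map(shape_pixels, h, w):
    """Distance from each grid cell to the nearest pixel of the
    binarized feature shape (polyline/polygon rasterized as (row, col)
    pixels). The paper uses a linear-time distance transform; this
    O(h*w*|shape|) brute-force version is for clarity only."""
    return [[min(math.hypot(x - sx, y - sy) for sy, sx in shape_pixels)
             for x in range(w)] for y in range(h)]

def label_log_likelihood(dist, xy, sigma=2.0):
    """Gaussian penalty on the *minimum* distance between a candidate
    label position and the feature geometry, as described above."""
    x, y = xy
    d = dist[y][x]
    return -0.5 * (d / sigma) ** 2
```

A label anywhere along the rasterized flowline scores the maximum (zero log-penalty), while positions farther from the geometry fall off quadratically.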

Predicted Location Prior and Posterior probabilities
Figure 9. River name recognition: contours of polyline likelihood (1579990: Namekagon River) and comparison of prior (using OCR alone) and posterior (adding automatic georeferencing) probability distributions. (Copyright © 2017 IEEE. Used by permission.)

Data and Support Code


The map images, text box annotations, and gazetteer regions used for recognition in these papers are available. If this data or code is used in a publication, please cite the 2017 ICDAR paper above.

We gratefully acknowledge the David Rumsey Map Collection as the source of the map images, which come with the following notice:

Images copyright © 2000 by Cartography Associates. Images may be reproduced or transmitted, but not for commercial use. ... This work is licensed under a Creative Commons [Attribution-NonCommercial-ShareAlike 3.0 Unported] license. By downloading any images from this site, you agree to the terms of that license.

Toponym Annotations

These annotations, used in the ICDAR 2017 paper, are designed for toponym recognition rather than text/graphics separation: several non-toponym words are not marked.

An archived version of this data set is permanently available at http://hdl.handle.net/11084/19349

The 12 map annotations used in the 2013 ICDAR paper are available at http://hdl.handle.net/11084/3246

Complete Annotations

These annotations mark all words and characters in the map images.

An archived version of this data set is permanently available at http://hdl.handle.net/11084/23294


Ground Truth Processing

The following Java code processes and stores data from the 2017 XML annotations. The Matlab code uses the Java objects to produce cropped, normalized word images.

  • map.tar (Java and Matlab source)
  • map.m (README Matlab source)

Map Text Detection

The following Caffe fork contains a preliminary trained convolutional neural network for detecting map image text, based on the Faster R-CNN (Ren et al., 2016).

Word Recognition

This package contains our TensorFlow implementation of a deep convolutional stacked bidirectional LSTM trained with CTC loss for word recognition, inspired by the CRNN (Shi, Bai, & Yao, 2015).
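For intuition, the standard best-path decoding implied by a CTC-trained output layer reduces to collapsing repeated per-frame labels and dropping blanks; a minimal sketch:

```python
def ctc_greedy_decode(frame_labels, blank=0):
    """Best-path CTC decoding: collapse consecutive repeats of the
    same label, then drop blank symbols."""
    out, prev = [], None
    for label in frame_labels:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out
```

Note that a repeated character in the true word must be separated by a blank frame in the label sequence, which is why the collapse happens before blanks are removed.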

This companion package contains a customized fork of the TensorFlow system with classes for doing trie-based, lexicon-restricted output of the word recognition model above (which uses only standard TensorFlow).

Map Text Synthesizer

This package contains our synthesizer for images of cropped map words, designed for training a map text recognition system or other highly cluttered text scenarios.


Principal Investigators:
Erik Learned-Miller (UMass), Jerod Weinman (Grinnell)
Graduate Students (UMass):
Pia Bideau, Francisco Garcia, Huaizu Jiang, Archan Ray, Terri Yu
Undergraduate Students (Grinnell):
Toby Baratta, Larry Boateng Asante, David Cambronero Sanchez, Ravi Chande, Ziwen Chen, Ben Gafford, Nathan Gifford, Dylan Gumm, Abyaya Lamsal, Matt Murphy, Liam Niehus-Staab, Kitt Nika, Bo Wang, Shen Zhang

Acknowledgments and Disclaimer

This material is based upon work supported by the National Science Foundation under Grant Numbers 1526350 and 1526431 in collaboration with Erik Learned-Miller. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Last modified: Tuesday, 02-Oct-2018 10:31:05 CDT