Linking Individuals Across Historical Sources: a Fully Automated Approach

Oct 2018
Working Paper
Ran Abramitzky, Roy Mill, Santiago Perez

Linking individuals across historical datasets relies on information such as name and age that is both non-unique and prone to enumeration and transcription errors. These errors make it impossible to find the correct match with certainty. We suggest a fully automated method for linking historical datasets that enables researchers to create samples that minimize type I (false positive) and type II (false negative) errors. The first step of the method uses the Expectation-Maximization (EM) algorithm, a standard tool in statistics, to compute the probability that each pair of observations corresponds to the same individual. The second step uses these estimated probabilities to determine which records to use in the analysis. We provide code to implement this method.
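
To make the two steps concrete, the following is a minimal sketch and not the code distributed with the paper. It assumes each candidate pair is summarized by binary agreement indicators (for example, on name, age, and birthplace) that are conditionally independent given match status, and fits a two-class mixture by EM. The function names, starting values, and the selection thresholds `p_keep` and `p_second` are hypothetical choices made for illustration.

```python
def em_match_probabilities(patterns, n_iter=500, tol=1e-10, eps=1e-6):
    """Step 1 (sketch): EM for a two-class mixture over agreement patterns.

    patterns: list of equal-length tuples of 0/1 agreement indicators,
              one tuple per candidate pair (hypothetical input format).
    Returns (posterior match probabilities, p, m, u).
    """
    n, k = len(patterns), len(patterns[0])
    p = 0.1            # assumed starting share of true matches
    m = [0.9] * k      # P(field agrees | match), assumed start
    u = [0.1] * k      # P(field agrees | non-match), assumed start
    post = [0.0] * n

    def clamp(x):
        # Keep parameters strictly inside (0, 1) to avoid zero likelihoods.
        return min(max(x, eps), 1.0 - eps)

    for _ in range(n_iter):
        # E-step: posterior probability that each pair is a true match.
        for i, g in enumerate(patterns):
            like_m, like_u = p, 1.0 - p
            for j, agree in enumerate(g):
                like_m *= m[j] if agree else 1.0 - m[j]
                like_u *= u[j] if agree else 1.0 - u[j]
            post[i] = like_m / (like_m + like_u)

        # M-step: re-estimate parameters as posterior-weighted frequencies.
        w = sum(post)
        new_p = clamp(w / n)
        new_m = [clamp(sum(q for q, g in zip(post, patterns) if g[j]) / w)
                 for j in range(k)]
        new_u = [clamp(sum(1.0 - q for q, g in zip(post, patterns) if g[j]) / (n - w))
                 for j in range(k)]

        change = max(abs(new_p - p),
                     *(abs(a - b) for a, b in zip(new_m + new_u, m + u)))
        p, m, u = new_p, new_m, new_u
        if change < tol:
            break
    return post, p, m, u


def select_links(candidates, p_keep=0.9, p_second=0.5):
    """Step 2 (sketch): keep a record's best candidate only if it is both
    likely and clearly better than the runner-up. Thresholds are assumptions.

    candidates: dict mapping record id -> list of (candidate id, probability).
    """
    links = {}
    for rec, scored in candidates.items():
        ranked = sorted(scored, key=lambda t: t[1], reverse=True)
        if not ranked:
            continue
        best_id, best_p = ranked[0]
        second_p = ranked[1][1] if len(ranked) > 1 else 0.0
        if best_p >= p_keep and second_p < p_second:
            links[rec] = best_id
    return links
```

In this sketch, raising `p_keep` or lowering `p_second` trades a smaller linked sample for fewer false positives, while loosening them does the opposite; this is one simple way to express the type I versus type II trade-off described above.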

Geographic Regions: 
General