Gateway to Think Tanks
Source type | Working Paper |
Standardized type | Report |
DOI | 10.3386/w24019 |
Source ID | Working Paper 24019 |
How Well Do Automated Linking Methods Perform? Lessons from U.S. Historical Data | |
Martha Bailey; Connor Cole; Morgan Henderson; Catherine Massey | |
Publication date | 2017-11-20 |
Publication year | 2017 |
Language | English |
Abstract | This paper reviews the literature in historical record linkage in the U.S. and examines the performance of widely-used automated record linking algorithms in two high-quality historical datasets and one synthetic ground truth. Focusing on algorithms in current practice, our findings highlight the important effects of linking methods on data quality. We find that (1) no method (including hand-linking) consistently produces representative samples; (2) 15 to 37 percent of links chosen by prominent machine linking algorithms are identified as false links by human reviewers; and (3) these false links are systematically related to baseline sample characteristics, suggesting that machine algorithms may introduce complicated forms of bias into analyses. We find that prominent linking algorithms attenuate estimates of the intergenerational income elasticity by up to 20 percent and common variations in algorithm choices result in greater attenuation. These results recommend that current practice could be improved by placing more emphasis on reducing false links and less emphasis on increasing match rates. We conclude with constructive suggestions for reducing linking errors and directions for future research. |
Subject | Labor Economics; Unemployment and Immigration; History |
URL | https://www.nber.org/papers/w24019 |
Source think tank | National Bureau of Economic Research (United States) |
Resource type | Think tank publication |
Item identifier | http://119.78.100.153/handle/2XGU8XDN/581693 |
Recommended citation (GB/T 7714) | Martha Bailey, Connor Cole, Morgan Henderson, et al. How Well Do Automated Linking Methods Perform? Lessons from U.S. Historical Data. 2017. |
Files in this item |
File name/size | Resource type | Version type | Access type | License |
w24019.pdf (4464KB) | Think tank publication | | Restricted access | CC BY-NC-SA |