[Data-Science-Seminar] Data Science Seminar on Nov 17
lkang2 at iit.edu
Mon Nov 16 20:02:20 CST 2015
This is a reminder that there will be a data science seminar talk tomorrow, Nov 17. Please pass along the news!
Speaker: Dr. Sou-Cheng Choi <http://mypages.iit.edu/~schoi32/>, Senior Statistician in NORC at the University of Chicago, and Research Assistant Professor in the Department of Applied Math at IIT.
Time: Nov 17, 11:25 am to 12:40 pm.
Title: Probabilistic Record Linkage and Address Standardization
Abstract: Probabilistic record linkage (PRL) refers to the process of matching records from different data sources, such as database tables with missing values in the primary key. It can be applied to join or de-duplicate records, or to impute missing data, resulting in better overall data quality. An important subproblem in PRL is to parse or standardize a text field such as an address into its component fields, e.g., street number, street name, city, state, zip code, and country. Modern data analysis techniques such as natural language processing and machine learning are often employed in both PRL and address standardization to achieve higher linking or prediction accuracy. In a recent study, we compare the performance of a few widely used open-source PRL packages freely available in the public domain, namely FRIL, Link Plus, R RecordLinkage, and SERF. In addition, we evaluate the baseline performance and sensitivity of a number of address-parsing web services, including the U.S. address parser, Google Maps APIs, Geocoder.us, and Data Science Toolkit. We will present the strengths and limitations of the software and services we have evaluated. This is joint work with Yongheng Lin and Edward Mulrow, NORC at the University of Chicago.
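For readers new to the topic, the matching step described in the abstract can be illustrated with a minimal sketch of classical agreement-weight scoring (in the style of Fellegi and Sunter). This is not the method or software evaluated in the talk. The field names, the m and u probabilities, the threshold, and the function names are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical per-field probabilities (illustrative values only):
#   m = P(field agrees | the two records refer to the same entity)
#   u = P(field agrees | the two records refer to different entities)
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "zip": (0.90, 0.05),
    "birth_year": (0.98, 0.02),
}


def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            # A missing value gives no evidence either way, so skip the field.
            continue
        if str(a).strip().lower() == str(b).strip().lower():
            total += math.log2(m / u)              # agreement raises the score
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


def is_link(rec_a, rec_b, threshold=5.0):
    """Declare a link when the total weight reaches an assumed threshold."""
    return match_weight(rec_a, rec_b) >= threshold
```

In this sketch, two records that agree on last name and birth year but have a missing zip code can still be linked, because the missing field is skipped rather than counted as a disagreement. Practical packages typically add approximate string comparison and estimate m and u from the data.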
Assistant Professor, Applied Mathematics
Illinois Institute of Technology
Email: lkang2 at iit.edu