[Data-Science-Seminar] Data Science Seminar on Nov 17

Lulu Kang lkang2 at iit.edu
Mon Nov 16 20:02:20 CST 2015


Dear All,

This is a reminder that there will be a data science seminar talk tomorrow, Nov 17. Please pass along the news!

Speaker: Dr. Sou-Cheng Choi <http://mypages.iit.edu/~schoi32/>, Senior Statistician at NORC at the University of Chicago and Research Assistant Professor in the Department of Applied Mathematics at IIT.

Time: Nov 17, 11:25 am to 12:40 pm.
Location: SB-220. 

Title: Probabilistic Record Linkage and Address Standardization 

Abstract: Probabilistic record linkage (PRL) refers to the process of matching records from different data sources, such as database tables with missing values in the primary key. It can be applied to join or de-duplicate records, or to impute missing data, resulting in better overall data quality. An important subproblem in PRL is to parse or standardize a text field such as an address into its component fields, e.g., street number, street name, city, state, zip code, and country. Modern data analysis techniques such as natural language processing and machine learning are often employed in both PRL and address standardization to achieve higher linkage or prediction accuracy. In a recent study, we compare the performance of a few widely used, freely available open-source PRL packages, namely FRIL, Link Plus, R RecordLinkage, and SERF. In addition, we evaluate the baseline performance and sensitivity of a number of address-parsing web services, including the U.S. address parser, Google Maps APIs, Geocoder.us, and Data Science Toolkit. We will present the strengths and limitations of the software and services we have evaluated. This is joint work with Yongheng Lin and Edward Mulrow, NORC at the University of Chicago.
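
For readers new to PRL, the matching idea in the abstract can be illustrated with a minimal sketch of field-wise match weights in the Fellegi-Sunter style. This is not the method or code of any package named above. The field names, the m/u probabilities, and the sample records are all made up for illustration, and a real system would estimate the probabilities from data and use approximate string comparison rather than exact equality.

```python
import math

# Hypothetical per-field probabilities (illustrative values, not estimated):
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
M_U = {
    "name": (0.95, 0.01),
    "street": (0.90, 0.05),
    "zip": (0.95, 0.10),
}


def normalize(value):
    """Lowercase and collapse whitespace; None (missing) stays None."""
    if value is None:
        return None
    return " ".join(value.lower().split())


def match_weight(rec_a, rec_b, m_u=M_U):
    """Sum log2 likelihood ratios over fields; missing fields add nothing."""
    total = 0.0
    for field, (m, u) in m_u.items():
        a = normalize(rec_a.get(field))
        b = normalize(rec_b.get(field))
        if a is None or b is None:
            continue  # a missing value is treated as uninformative
        if a == b:
            total += math.log2(m / u)  # agreement raises the weight
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement lowers it
    return total


# Made-up sample records.
a = {"name": "Jane Doe", "street": "10 W 35th St", "zip": "60616"}
b = {"name": "jane  doe", "street": "10 W 35th St", "zip": None}
c = {"name": "John Roe", "street": "5 Main St", "zip": "60616"}

weight_ab = match_weight(a, b)  # agrees on name and street, zip missing
weight_ac = match_weight(a, c)  # agrees only on zip
```

Under these assumed probabilities, the pair (a, b) receives a positive weight despite the missing zip code, while (a, c) receives a negative one. A linkage decision would then compare the weight against thresholds chosen for the application.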


Lulu Kang
Assistant Professor, Applied Mathematics
Illinois Institute of Technology
Tel: 312-567-5322
Email: lkang2 at iit.edu
URL: math.iit.edu/~lkang2



