ELTE Eötvös Loránd University
ANNALES Universitatis Scientiarum Budapestinensis de Rolando Eötvös Nominatae
Sectio Computatorica

Volume 43 (2014)

https://doi.org/10.71352/ac.43.039

Theoretical foundations of entity resolution models

Csaba István Sidló, András József Molnár, Gábor Lukács and András A. Benczúr

Abstract. Data quality is crucial in all information systems. As a key step in obtaining clean data, record linkage, also called entity resolution (ER), groups database records by the real-world entities they describe. In this paper we give practical motivating examples and review the available formal models of ER. The formal model chosen for matching and merging records determines not only the power and quality of the resolution process, but also its algorithmic cost. We start from a naive definition that may lead to unbounded entities or infinite loops, discuss the shortcomings of the standard axioms, and then give algebraic properties that lead to efficient record partitioning. Finally, we describe algorithms suitable for complex entity resolution problems, which may include fuzzy clustering to split a partition of records into potentially overlapping entities.
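To make the match-and-merge idea concrete, the following is a minimal illustrative sketch and not the formal model or the algorithms of the paper. It assumes records are sets of (field, value) pairs, that two records match when they share a value in a hypothetical identifying field (KEY_FIELDS), and that merging is plain set union. Under these assumptions merging is idempotent, commutative and associative, and the loop terminates because every merge reduces the number of records by one. All names (KEY_FIELDS, match, merge, resolve) and the sample data are invented for illustration.

```python
from itertools import combinations

# Hypothetical identifying fields; sharing a value in one of them counts as a match.
KEY_FIELDS = {"email", "phone"}


def match(a, b):
    """Assumed match rule: the records share at least one identifying (field, value) pair."""
    return any(field in KEY_FIELDS for field, _ in a & b)


def merge(a, b):
    """Assumed merge rule: set union of the (field, value) pairs of both records."""
    return a | b


def resolve(records):
    """Merge matching records repeatedly until no two remaining records match."""
    entities = [frozenset(r) for r in records]
    changed = True
    while changed:
        changed = False
        for i, j in combinations(range(len(entities)), 2):
            if match(entities[i], entities[j]):
                merged = merge(entities[i], entities[j])
                # Drop the two matched records and keep their merged form.
                entities = [e for k, e in enumerate(entities) if k not in (i, j)]
                entities.append(merged)
                changed = True
                break  # restart the scan, since the merged record may match others
    return entities


# Invented sample records: r1 and r3 do not match directly,
# but both match r2, so all three end up in one entity.
r1 = {("name", "J. Smith"), ("email", "js@example.org")}
r2 = {("name", "John Smith"), ("email", "js@example.org"), ("phone", "555-0100")}
r3 = {("name", "John S."), ("phone", "555-0100")}
r4 = {("name", "Ann Lee"), ("email", "al@example.org")}
```

Calling resolve([r1, r2, r3, r4]) groups the first three records into one entity and leaves the fourth on its own. With a merge function that can keep generating new values, or a match function that is not preserved by merging, the same loop may fail to terminate or give order-dependent results, which is the kind of behaviour the naive definition in the abstract refers to.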
