Adaptive Windows for Duplicate Detection

Author : Uwe Draisbach
Publisher : Universitätsverlag Potsdam
Total Pages : 46
Release : 2012
ISBN-10 : 3869561432
ISBN-13 : 9783869561431

Book Synopsis: Adaptive Windows for Duplicate Detection by Uwe Draisbach

Adaptive Windows for Duplicate Detection was written by Uwe Draisbach and published by Universitätsverlag Potsdam in 2012. It has 46 pages.

Book excerpt: Duplicate detection is the task of identifying all groups of records within a data set that represent the same real-world entity. This task is difficult because (i) representations might differ slightly, so some similarity measure must be defined to compare pairs of records, and (ii) data sets might have a high volume, making a pair-wise comparison of all records infeasible. To tackle the second problem, many algorithms have been suggested that partition the data set and compare all record pairs only within each partition. One well-known approach of this kind is the Sorted Neighborhood Method (SNM), which sorts the data according to some key and then advances a window over the data, comparing only records that appear within the same window. We propose several variations of SNM that have in common a varying window size and advancement. The general intuition of such adaptive windows is that there might be regions of high similarity, suggesting a larger window size, and regions of lower similarity, suggesting a smaller window size. We propose and thoroughly evaluate several adaptation strategies, some of which are provably better than the original SNM in terms of efficiency (same results with fewer comparisons).
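The fixed-window Sorted Neighborhood Method described in the excerpt can be sketched as follows. This is only an illustrative sketch, not the book's implementation: the function names, the sample records, the use of `difflib.SequenceMatcher` as the similarity measure, and the threshold of 0.85 are all assumptions made for the example. The adaptive strategies proposed in the book are not reproduced here.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in similarity measure in [0, 1] (an assumption; the excerpt
    does not prescribe a particular measure)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def sorted_neighborhood(records, key, window=3, threshold=0.85):
    """Fixed-window SNM: sort by key, then compare each record only with
    the (window - 1) records that follow it in sort order."""
    ordered = sorted(records, key=key)
    matches, comparisons = [], 0
    for i, rec in enumerate(ordered):
        for other in ordered[i + 1 : i + window]:
            comparisons += 1
            if similarity(rec, other) >= threshold:
                matches.append((rec, other))
    return matches, comparisons


# Hypothetical sample data, invented for illustration.
records = [
    "Jon Smith, Berlin",
    "Anna Meyer, Potsdam",
    "John Smith, Berlin",
    "Ana Meyer, Potsdam",
    "Peter Lang, Hamburg",
    "Petra Kurz, Munich",
]

matches, comparisons = sorted_neighborhood(records, key=str.lower, window=2)
all_pairs = len(list(combinations(records, 2)))

print(f"SNM comparisons: {comparisons}, exhaustive comparisons: {all_pairs}")
for a, b in matches:
    print(f"possible duplicate: {a!r} / {b!r}")
```

With a window of 2, each record is compared only with its immediate successor in sort order, so the number of comparisons grows linearly with the number of records rather than quadratically. The trade-off is that duplicates whose sort keys land far apart are missed, which is the kind of situation a varying window size is meant to address.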


Books Related to Adaptive Windows for Duplicate Detection

Adaptive Windows for Duplicate Detection
Language: en
Pages: 46
Authors: Uwe Draisbach
Categories: Computers
Type: BOOK - Published: 2012 - Publisher: Universitätsverlag Potsdam

Duplicate detection is the task of identifying all groups of records within a data set that represent the same real-world entity, respectively. This task is dif
International Symposium on Fuzzy Systems, Knowledge Discovery and Natural Computation (FSKD 2014)
Language: en
Pages: 657
Authors: Defu Zhang, Xiamen University, China
Categories: Language Arts & Disciplines
Type: BOOK - Published: 2014-09-02 - Publisher: DEStech Publications, Inc

ICNC-FSKD is a premier international forum for scientists and researchers to present the state of the art of data mining and intelligent methods inspired from n
Covering Or Complete?
Language: en
Pages: 40
Authors: Jana Bauckmann
Categories: Computers
Type: BOOK - Published: 2012 - Publisher: Universitätsverlag Potsdam

Data dependencies, or integrity constraints, are used to improve the quality of a database schema, to optimize queries, and to ensure consistency in a database.
Cyber-physical Systems with Dynamic Structure
Language: en
Pages: 40
Authors: Basil Becker
Categories: Computers
Type: BOOK - Published: 2012 - Publisher: Universitätsverlag Potsdam

Cyber-physical systems achieve sophisticated system behavior exploring the tight interconnection of physical coupling present in classical engineering systems a
An Abstraction for Version Control Systems
Language: en
Pages: 88
Authors: Matthias Kleine
Categories: Computers
Type: BOOK - Published: 2012 - Publisher: Universitätsverlag Potsdam

Version Control Systems (VCS) allow developers to manage changes to software artifacts. Developers interact with VCSs through a variety of client programs, such