Similarity Search in High-Dimensional Vector Spaces
- R. Weber
- Dissertationen zu Datenbanken und Informationssystemen
- Informationstechnologie und Informationssysteme
- Dissertationen zu Datenbanken und Informationssytemen
- Gesamtverzeichnis AKA Verlag
incl. 7% MwSt.
With the example of an image database and the notion of relevance feedback, the dissertation addresses the important and challenging problem of identifying the most similar objects in a database given a set of reference objects and a set of features. Similarity search is typically implemented as "Nearest Neighbour Search" (NN- Search) in high-dimensional vector spaces. Although a large number of existing work have provided solutions for the NN-Search problem, most of them have not taken into account the specific problems of high dimensionality explicitly. This work carefully investigates the so-called "Curse of Dimensionality", and presents a novel, flat organization for NN-Search optimized for high-dimensional spaces: the so-called "Vector Approximation File" (VA-File). The superiority of the VA- File over conventional indexing structures is shown theoretically and with extensive experiments. Further refinements of the VA-File discussed in the current work include approximate search and parallel search in a cluster of workstations. From a practical perspective, the dissertation provides an indexing technique that allows for interactive-time similarity search even in huge databases with gigabytes of vector data. As such, it is the most favourable indexing technique for future multimedia retrieval systems.