https://doi.org/10.71352/ac.40.201
Fisher kernels for image descriptors: a theoretical overview and experimental results
Abstract.
Visual words have recently proved to be a key tool in image classification. The best performing Pascal VOC and ImageCLEF systems use
Gaussian mixtures or k-means clustering to define visual words based on the content-based features of points of interest. In most cases,
Gaussian mixture modeling (GMM) with a Fisher information-based distance over the mixtures yields the most accurate classification results.
In this paper we review the theoretical foundations of the Fisher kernel method. We show that it yields a natural metric over images
characterized by low-level content descriptors generated from a Gaussian mixture. We support the theoretical observations by
reproducing standard measurements on the Pascal VOC 2007 data. Our accuracy is comparable to that of the best performing recent image
classification systems.
