Fisher kernels for image descriptors: a theoretical overview and experimental results

Bálint Daróczy, András A. Benczúr and Lajos Rónyai

Abstract. Visual words have recently proved to be a key tool in image classification. The best-performing Pascal VOC and ImageCLEF systems use Gaussian mixtures or k-means clustering to define visual words from the content-based features of interest points. In most cases, Gaussian Mixture Modeling (GMM) combined with a Fisher-information-based distance over the mixtures yields the most accurate classification results.
In this paper we give an overview of the theoretical foundations of the Fisher kernel method. We indicate that it yields a natural metric over images characterized by low-level content descriptors generated from a Gaussian mixture. We support the theoretical observations by reproducing standard measurements on the Pascal VOC 2007 data. Our accuracy is comparable to that of the most recent best-performing image classification systems.
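To make the Fisher kernel over a Gaussian mixture concrete, here is a minimal sketch. It assumes a diagonal-covariance GMM that has already been fitted (its weights, means and variances are given as plain lists), and it follows the commonly used Fisher vector formulation in which the Fisher information matrix is approximated as diagonal, so the kernel reduces to a dot product of normalized gradients of the average log-likelihood. The function names are hypothetical, and this is not necessarily the exact variant evaluated in the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def log_gaussian_diag(x: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def posteriors(x: Vector, weights, means, variances) -> List[float]:
    """Soft assignment gamma_k(x) of one descriptor to each mixture component."""
    logs = [math.log(w) + log_gaussian_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)  # subtract the max (log-sum-exp trick) for numerical stability
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]


def fisher_vector(descriptors, weights, means, variances) -> List[float]:
    """Gradient of the average log-likelihood w.r.t. means and variances,
    scaled by a diagonal approximation of the Fisher information.

    Layout: for each component k, D mean entries followed by D variance entries.
    """
    K, D, T = len(weights), len(means[0]), len(descriptors)
    g_mu = [[0.0] * D for _ in range(K)]
    g_sigma = [[0.0] * D for _ in range(K)]
    for x in descriptors:
        gamma = posteriors(x, weights, means, variances)
        for k in range(K):
            for d in range(D):
                z = (x[d] - means[k][d]) / math.sqrt(variances[k][d])
                g_mu[k][d] += gamma[k] * z
                g_sigma[k][d] += gamma[k] * (z * z - 1.0)
    fv: List[float] = []
    for k in range(K):
        fv.extend(g / (T * math.sqrt(weights[k])) for g in g_mu[k])
        fv.extend(g / (T * math.sqrt(2.0 * weights[k])) for g in g_sigma[k])
    return fv


def fisher_kernel(fv_a: Vector, fv_b: Vector) -> float:
    """Fisher kernel as a plain dot product of the normalized Fisher vectors."""
    return sum(a * b for a, b in zip(fv_a, fv_b))


if __name__ == "__main__":
    # Toy 2-component GMM over 2-D descriptors (illustrative values only).
    weights = [0.5, 0.5]
    means = [[0.0, 0.0], [3.0, 3.0]]
    variances = [[1.0, 1.0], [1.0, 1.0]]
    image_a = [[0.1, -0.2], [2.9, 3.1]]
    image_b = [[0.0, 0.1], [0.2, -0.1]]
    fa = fisher_vector(image_a, weights, means, variances)
    fb = fisher_vector(image_b, weights, means, variances)
    print(len(fa))  # 8 = 2 components * 2 dims * (mean + variance gradients)
    print(fisher_kernel(fa, fb))
```

Because the Fisher information is folded into the per-component scaling factors, each image becomes a fixed-length vector of size 2KD regardless of how many descriptors it has, and a linear classifier can then operate directly on these vectors.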
