Abstract
We propose a novel Weakly Supervised Learning (WSL) framework dedicated to learning discriminative part detectors from images annotated with a global label. Our WSL method encompasses three main contributions. Firstly, we introduce a new structured output latent variable model, Minimum mAximum lateNt sTRucturAl SVM (MANTRA), whose prediction relies on a pair of latent variables: h+ (resp. h-) provides positive (resp. negative) evidence for a given output y. Secondly, we instantiate MANTRA for two different visual recognition tasks: multiclass classification and ranking. For ranking, we propose efficient solutions to exactly solve the inference and the loss-augmented problems. Finally, extensive experiments highlight the relevance of the proposed method: MANTRA outperforms state-of-the-art results on five different datasets.
Paper
This paper was published at the IEEE International Conference on Computer Vision (ICCV) 2015.
Bibtex
@inproceedings{Durand_MANTRA_ICCV_2015,
Author = {Thibaut Durand and Nicolas Thome and Matthieu Cord},
Title = {MANTRA: Minimum Maximum Latent Structural SVM for Image Classification and Ranking},
booktitle = {IEEE International Conference on Computer Vision (ICCV)},
Year = {2015}
}
Visual Results
We show the response maps and the predicted regions for different images from the UIUC Sports dataset. The red (resp. blue) box is the region with the maximum (resp. minimum) score. The first column shows the results for the ground-truth class model, whereas the other columns show the results for incorrect classes.
sailing | badminton | bocce | rowing |
croquet | badminton | bocce | snowboard |
polo | badminton | croquet | rowing |
rowing | badminton | croquet | sailing |
snowboard | bocce | croquet | polo |
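The red and blue boxes described above can be illustrated with a small sketch. It assumes that each class model produces a 2D response map of region scores, and that the class score is the sum of the maximum response (region h+) and the minimum response (region h-). The function names, the toy response maps, and their values are hypothetical and only serve the illustration; this is not the authors' implementation.

```python
import numpy as np

def max_min_score(response_map):
    """Score one class from its response map (illustrative assumption:
    score = maximum region response + minimum region response).

    Returns the score and the grid positions of the max (h+) and
    min (h-) regions.
    """
    flat = response_map.ravel()
    i_max = int(flat.argmax())  # region with the strongest positive evidence
    i_min = int(flat.argmin())  # region with the strongest negative evidence
    h_plus = tuple(int(i) for i in np.unravel_index(i_max, response_map.shape))
    h_minus = tuple(int(i) for i in np.unravel_index(i_min, response_map.shape))
    score = float(flat[i_max] + flat[i_min])
    return score, h_plus, h_minus

def predict(response_maps):
    """Pick the class whose max + min score is highest."""
    scores = {label: max_min_score(m)[0] for label, m in response_maps.items()}
    return max(scores, key=scores.get), scores

# Toy 2x2 response maps for two class models (made-up values).
maps = {
    "sailing": np.array([[0.2, 1.5], [0.1, -0.1]]),
    "rowing": np.array([[0.3, 1.8], [-1.2, 0.0]]),
}
label, scores = predict(maps)
print(label, scores)
```

In this toy case the "rowing" map has the larger maximum response, but its strongly negative region lowers its total score, so "sailing" is predicted. This is the role the negative evidence h- is meant to play under the assumption stated above.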
Acknowledgements
This research was supported by a DGA-MRIS scholarship.