Review: Domain Adaptation in the Age of Deep Models
Abstract
We briefly introduce domain adaptation and discuss some highlights from relevant CVPR ’17 papers.
What is domain adaptation?
Modern supervised machine learning models usually need to be trained on extraordinarily large labeled datasets in order to achieve state-of-the-art accuracy. For example, ImageNet, the benchmark in object detection and localization, consists of more than fourteen million images from twenty-seven high-level categories (e.g., fish, flowers), with more than twenty thousand subcategories, or synsets (e.g., begonia, orchid). Collecting datasets of this magnitude is usually not a problem, but annotating them surely is, and it is more difficult than one might expect.
Besides the scarcity of large labeled datasets, some models are hard to train: a deep neural network, for example, usually takes hours, if not days, to converge on an average machine (that is to say, one with no GPU support). Naturally, people want their models to be robust, in the sense that they generalize well to novel, unseen scenarios with little or even no labeled data available for retraining or fine-tuning.
Hence, we may ask ourselves: how can trained models transfer their knowledge from the domain they were trained on (the source domain) to a novel but somewhat related domain? The related domain, or target domain, may have some labeled data (the semi-supervised setting) or no labeled data at all (the unsupervised setting). This is the core question that domain adaptation (DA) seeks to answer.
What are common applications?
Domain adaptation applies to more than one kind of problem. Transferring knowledge between related domains comes in different flavors; a few common use cases are listed below:
Face Recognition
Faces in the wild differ from those taken in controlled scenes in terms of pose, illumination, background variation, etc. Traditional methods often learn a projection or transformation to augment the data. A widely studied approach is to project a set of faces captured under different conditions (provided they are faces of the same subject, of course) onto a lower-dimensional subspace and to discriminate between subjects in that space. Why might this work? At the end of the last century, it was shown that images of the same face under the aforementioned variations tend to lie on the same lower-dimensional subspace (or manifold; this is usually referred to as “the manifold hypothesis”) and can readily be separated from the manifolds formed by other faces using an off-the-shelf classifier such as a support vector machine.
Recent work, in addition to further exploring these traditional approaches, tends to leverage the capability of deep neural networks as well, as we shall see later.
Object Detection
As the earlier example suggests, it is very important that a well-trained model also performs well in practice. We have seen models such as Faster R-CNN do this job pretty well. Nonetheless, there is almost surely more to achieve, especially on benchmarks such as the Office+Caltech dataset.
What’s the progress?
At this year’s CVPR, we see at least nine papers diving into the area of domain adaptation. They address different issues at different levels, and we noticed a few interesting trends:
Leveraging Deep Models
Looking back only a few years to 2014, when DNNs had already been a hot topic for quite a while, a well-cited, thorough survey [Patel et al. 2015] said very little about DA using DNNs. This year, by contrast, we see several deep models, including models that:
- achieve compactness, using as few as 59% of the parameters compared to GoogLeNet while achieving similar DA task accuracy [Wu et al. 2017];
- introduce a novel hashing layer and hash loss [Venkateswara et al. 2017];
- train on the target domain by jointly fine-tuning low-level features from the source domain [Ge and Yu 2017]; and, most excitingly:
- borrow ideas from generative models [Bousmalis et al. 2017; Tzeng et al. 2017].
We highlight some of the key features of these generative models:
Model | Highlight | Pointer |
---|---|---|
PixelDA | Learns a pixel-space transformation between domains whose results look as if drawn from the target domain | [Bousmalis et al. 2017] |
ADDA | Exploits adversarial loss; extends well to cross-modality tasks | [Tzeng et al. 2017] |
The two models draw on GANs in different ways. PixelDA maps both the source data and noise through a GAN such that the generated data seems to be sampled from the target domain, as far as the classifier is concerned. Their model is also modular rather than unified, in that the classifier may be changed according to the specific task. ADDA mainly incorporates an adversarial loss (which the authors refer to as the GAN loss).
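To make the notion of an adversarial loss more concrete, here is a minimal sketch of the standard GAN objective with inverted labels, applied to discriminator outputs. The function names (`discriminator_loss`, `target_mapping_loss`) and the convention that the discriminator outputs the probability of "source" are assumptions made for illustration; this is not the ADDA authors' code, and the networks that would produce these probabilities are omitted.

```python
import math

def discriminator_loss(d_source, d_target):
    """Standard GAN discriminator loss.

    d_source, d_target: discriminator outputs in (0, 1), read as the
    probability that a feature came from the source domain (assumed
    convention). Source features are labelled 1, target features 0.
    """
    src_term = -sum(math.log(p) for p in d_source) / len(d_source)
    tgt_term = -sum(math.log(1.0 - p) for p in d_target) / len(d_target)
    return src_term + tgt_term

def target_mapping_loss(d_target):
    """Inverted-label loss for the target feature mapping.

    The target mapping is rewarded when the discriminator mistakes
    target features for source features.
    """
    return -sum(math.log(p) for p in d_target) / len(d_target)

# Hypothetical discriminator outputs for two small batches.
d_src = [0.9, 0.8]
d_tgt = [0.2, 0.3]
print(discriminator_loss(d_src, d_tgt))
print(target_mapping_loss(d_tgt))
```

The two losses pull in opposite directions: the discriminator loss falls as target features are scored closer to 0, while the mapping loss falls as they are scored closer to 1.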
Exploring Shallow Methods
Besides deep models, we see several non-DNN-based models (henceforth referred to as “Shallow Models”) that:
- enhance the classic Maximum Mean Discrepancy (MMD) paradigm [Yan et al. 2017];
- explore traditional subspace learning methods further [Herath et al. 2017; Zhang et al. 2017; Koniusz et al. 2017].
In particular, [Yan et al. 2017] propose to weight source classes differently, in the hope of imposing class priors when the cross-domain data are not well balanced (e.g., some classes from the source domain may be missing in the target domain). They test a weighted domain adaptation network based on weighted MMD and a CNN. Here we again see a deep model; nevertheless, we consider the key feature, i.e., the notion of weighted MMD, to be more closely related to the canonical DA approaches.
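The effect of class weighting on MMD can be illustrated with a small sketch. It computes a biased estimate of the squared MMD with a Gaussian kernel, where each source sample is weighted by its class. The function names, the kernel choice, the `gamma` parameter, and the assumption that class weights are simply handed in by the caller are all illustrative; how such weights are estimated when the target domain is unlabeled is not covered here, and this is not the authors' implementation.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def weighted_mmd2(source, source_labels, target, class_weights, gamma=1.0):
    """Biased estimate of squared MMD with per-class source weights.

    class_weights maps a class label to a non-negative weight (assumed
    given); per-sample weights are normalised to sum to one.
    """
    raw = [class_weights[y] for y in source_labels]
    total = sum(raw)
    w = [r / total for r in raw]
    n_t = len(target)

    # Source-source, target-target and cross terms of the MMD estimate.
    ss = sum(w[i] * w[j] * rbf_kernel(source[i], source[j], gamma)
             for i in range(len(source)) for j in range(len(source)))
    tt = sum(rbf_kernel(a, b, gamma) for a in target for b in target) / n_t ** 2
    st = sum(w[i] * rbf_kernel(source[i], t, gamma)
             for i in range(len(source)) for t in target) / n_t
    return ss + tt - 2.0 * st

# Toy data: class 1 is present in the source but missing in the target.
source = [[0.0], [0.1], [5.0], [5.1]]
source_labels = [0, 0, 1, 1]
target = [[0.0], [0.1]]

print(weighted_mmd2(source, source_labels, target, {0: 1.0, 1: 1.0}))
print(weighted_mmd2(source, source_labels, target, {0: 1.0, 1: 0.0}))
```

In this toy setting, down-weighting the class that is absent from the target removes the part of the discrepancy that stems purely from the class imbalance, which is the intuition behind weighting source classes.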
[Herath et al. 2017; Zhang et al. 2017; Koniusz et al. 2017] all dig further into data-augmentation-related approaches. [Herath et al. 2017] is motivated by several state-of-the-art geodesic flow kernel methods and directly learns to construct a latent Hilbert space (that is, a vector space equipped with an inner product) onto which both the source and target data are projected, in the hope of reducing the discrepancy between domains. Also worth highlighting is the notion of discriminatory power that they propose. As far as we can tell, this notion is analogous to classical discriminant analysis, in that it considers both between-class dissimilarity and within-class similarity.
In line with the idea of projection, [Zhang et al. 2017] learn two coupled projections to reduce geometrical and distribution shift. [Koniusz et al. 2017] deal with second- or higher-order scatter statistics.
In a nutshell, these approaches exploit the notion of subspace learning (that is, projection followed by optimization of certain discrepancy measures in order to align similar data from both domains and misalign distinct data); the difference lies in how they achieve this.
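The analogy with classical discriminant analysis can be made concrete with the Fisher criterion, the ratio of between-class scatter to within-class scatter. The sketch below computes it for scalar (already projected) features. It illustrates the classical criterion only and is not the discriminatory-power term of [Herath et al. 2017]; the function name `fisher_ratio` and the toy values are assumptions.

```python
def fisher_ratio(values, labels):
    """Ratio of between-class to within-class scatter for 1-D features.

    Assumes the within-class scatter is non-zero, i.e. at least one
    class contains two distinct values.
    """
    overall_mean = sum(values) / len(values)
    between = 0.0
    within = 0.0
    for c in sorted(set(labels)):
        members = [v for v, y in zip(values, labels) if y == c]
        class_mean = sum(members) / len(members)
        # Between-class: how far each class mean lies from the overall mean.
        between += len(members) * (class_mean - overall_mean) ** 2
        # Within-class: how tightly samples cluster around their class mean.
        within += sum((v - class_mean) ** 2 for v in members)
    return between / within

values = [0.0, 0.2, 5.0, 5.2]
well_separated = fisher_ratio(values, [0, 0, 1, 1])  # classes form tight clusters
mixed = fisher_ratio(values, [0, 1, 0, 1])           # classes overlap heavily
print(well_separated > mixed)
```

A projection with a high ratio keeps samples of the same class close together and pushes different classes apart, which matches the two ingredients named above.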
Remark
As we have seen, incorporating deep models into DA-related tasks seems to be quite trendy. This is not surprising, as deep models generally achieve better performance than classical data-augmentation approaches in supervised tasks. As for unsupervised DA tasks, both can be improved further.
We have also noted the introduction of generative networks. The ability of GANs to facilitate domain adaptation is starting to be harnessed.
Although deep models play the leading roles, traditional methods (that is, non-DNN-based methods) remain attractive: a handful of papers addressed subspace-learning-related topics. Interestingly, though, most of them focused on subspace clustering.
In the future, we expect to see more work that leverages deep models and integrates them with traditional approaches as well.
References
- Bousmalis, K., Silberman, N., Dohan, D., Erhan, D., and Krishnan, D. 2017. Unsupervised Pixel-Level Domain Adaptation With Generative Adversarial Networks. CVPR ’17.
- Ge, W. and Yu, Y. 2017. Borrowing Treasures From the Wealthy: Deep Transfer Learning Through Selective Joint Fine-Tuning. CVPR ’17.
- Herath, S., Harandi, M., and Porikli, F. 2017. Learning an Invariant Hilbert Space for Domain Adaptation. CVPR ’17.
- Koniusz, P., Tas, Y., and Porikli, F. 2017. Domain Adaptation by Mixture of Alignments of Second- or Higher-Order Scatter Tensors. CVPR ’17.
- Patel, V.M., Gopalan, R., Li, R., and Chellappa, R. 2015. Visual Domain Adaptation: A Survey of Recent Advances. IEEE Signal Processing Magazine 32, 3, 53–69.
- Tzeng, E., Hoffman, J., Saenko, K., and Darrell, T. 2017. Adversarial Discriminative Domain Adaptation. CVPR ’17.
- Venkateswara, H., Eusebio, J., Chakraborty, S., and Panchanathan, S. 2017. Deep Hashing Network for Unsupervised Domain Adaptation. CVPR ’17.
- Wu, C., Wen, W., Afzal, T., Zhang, Y., Chen, Y., and Li, H. (Helen) 2017. A Compact DNN: Approaching GoogLeNet-Level Accuracy of Classification and Domain Adaptation. CVPR ’17.
- Yan, H., Ding, Y., Li, P., Wang, Q., Xu, Y., and Zuo, W. 2017. Mind the Class Weight Bias: Weighted Maximum Mean Discrepancy for Unsupervised Domain Adaptation. CVPR ’17.
- Zhang, J., Li, W., and Ogunbona, P. 2017. Joint Geometrical and Statistical Alignment for Visual Domain Adaptation. CVPR ’17.