Be More with Less: Scaling Deep-learning with Minimal Supervision
Large-scale deep learning models have reached previously unattainable performance for various tasks. However, the ever-growing resource consumption of neural networks generates large carbon footprint, brings difficulty for academics to engage in research and stops emerging economies from enjoying growing Artificial Intelligence (AI) benefits. To further scale AI to bring more benefits, two major challenges need to be solved. Firstly, even though large-scale deep learning models achieved remarkable success, their performance is still not satisfactory when fine-tuning with only a handful of examples, thereby hindering widespread adoption in real-world applications where a large scale of labeled data is difficult to obtain. Secondly, current machine learning models are still mainly designed for tasks in closed environments where testing datasets are highly similar to training datasets. When the deployed datasets have distribution shift relative to collected training data, we generally observe degraded performance of developed models. How to build adaptable models becomes another critical challenge. To address those challenges, in this dissertation, we focus on two topics: few-shot learning and domain adaptation, where few-shot learning aims to learn tasks with limited labeled data and domain adaption address the discrepancy between training data and testing data. In Part 1, we show our few-shot learning studies. The proposed few-shot solutions are built upon large-scale language models with evolutionary explorations from improving supervision signals, incorporating unlabeled data and improving few-shot learning abilities with lightweight fine-tuning design to reduce deployment costs. In Part 2, domain adaptation studies are introduced. We develop a progressive series of domain adaption approaches to transfer knowledge across domains efficiently to handle distribution shifts, including capturing common patterns across domains, adaptation with weak supervision and adaption to thousands of domains with limited labeled data and unlabeled data.