AdaMix: Mixture-of-Adapter for Parameter-efficient Tuning of Large Language Models
Yaqing Wang, Subhabrata Mukherjee, Xiaodong Liu, Jing Gao, Ahmed Hassan Awadallah, Jianfeng Gao
https://arxiv.org/abs/2205.12410
Contribution
Training
Analysis of consistency loss.
Analysis of adapter weight sharing.