Jan. 27, 2022, 2:10 a.m. | Xiaonan Li, Yeyun Gong, Yelong Shen, Xipeng Qiu, Hang Zhang, Bolun Yao, Weizhen Qi, Daxin Jiang, Weizhu Chen, Nan Duan

In this paper, we propose the CodeRetriever model, which combines
unimodal and bimodal contrastive learning to train function-level code semantic
representations, specifically for the code search task. For unimodal
contrastive learning, we design a semantic-guided method to build positive code
pairs based on the documentation and function name. For bimodal contrastive
learning, we leverage the documentation and in-line comments of code to build
text-code pairs. Both contrastive objectives can fully leverage the large-scale
code corpus for pre-training. Experimental results …
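The abstract names the two contrastive objectives but does not give the loss function. The sketch below assumes an in-batch InfoNCE loss with cosine similarity and a temperature, a common choice in contrastive representation learning. It is an illustration, not CodeRetriever's published implementation. Under this assumption, the unimodal objective pairs code with code and the bimodal objective pairs text with code. Other examples in the batch serve as negatives. The names `info_nce_loss` and `combined_loss`, the temperature of 0.05, and the `weight` parameter are all hypothetical. The embeddings are assumed to come from an encoder that is not shown.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce_loss(queries: List[Vector], keys: List[Vector],
                  temperature: float = 0.05) -> float:
    """Hypothetical in-batch InfoNCE loss.

    queries[i] is the positive partner of keys[i]; every other key in the
    batch is treated as a negative for queries[i].
    """
    if len(queries) != len(keys):
        raise ValueError("queries and keys must have the same batch size")
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, k) / temperature for k in keys]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the positive key.
        total += log_denominator - logits[i]
    return total / len(queries)


def combined_loss(code_a: List[Vector], code_b: List[Vector],
                  texts: List[Vector], codes_for_texts: List[Vector],
                  temperature: float = 0.05, weight: float = 1.0) -> float:
    """Hypothetical sum of a unimodal (code-code) and bimodal (text-code) term."""
    unimodal = info_nce_loss(code_a, code_b, temperature)
    bimodal = info_nce_loss(texts, codes_for_texts, temperature)
    return unimodal + weight * bimodal
```

For example, `info_nce_loss([[1, 0], [0, 1]], [[1, 0], [0, 1]])` is close to zero because each query matches its own key. Swapping the keys to `[[0, 1], [1, 0]]` raises the loss to about 20 (that is, 1 / 0.05). Because the negatives come from the same batch, every unlabeled function or documentation string in a large corpus can contribute training signal without explicit negative mining.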

