Understanding BERT Composition


I am currently reading this paper:
M. Hägglund, F. J. Peña, S. Pashami, A. Al-Shishtawy and A. H. Payberah, "COCLUBERT: Clustering Machine Learning Source Code," 2021 20th IEEE International Conference on Machine Learning and Applications (ICMLA), Pasadena, CA, USA, 2021, pp. 151-158, doi: 10.1109/ICMLA52953.2021.00031.

In part A, Language Modeling (page 2), it describes how the BERT model was trained for Masked Language Modeling (MLM) and Next Sentence Prediction (NSP).
I was wondering how this works when you train a model on two different objectives. How do you combine both objectives into a single model in the end?

I was thinking maybe you create a pipeline of two separate models, each trained on one of the two objectives. But even then, I have a hard time understanding how you would choose the order of the pipeline, and whether the appropriate order changes depending on the input.
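To make my confusion concrete, here is a minimal PyTorch sketch of the only alternative I can picture: one shared encoder with two small task-specific heads, trained on the sum of both losses. All names, sizes, and the fake data here are my own toy guesses, not the actual BERT code or anything from the paper:

```python
import torch
import torch.nn as nn

# Toy sketch (my own naming): one shared encoder, two heads, one summed loss.
class TinyBertLike(nn.Module):
    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2,
        )
        self.mlm_head = nn.Linear(hidden, vocab_size)  # predicts masked tokens
        self.nsp_head = nn.Linear(hidden, 2)           # is-next / not-next

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        # NSP head reads only the first token, [CLS]-style
        return self.mlm_head(h), self.nsp_head(h[:, 0])

model = TinyBertLike()
loss_fn = nn.CrossEntropyLoss()
opt = torch.optim.Adam(model.parameters())

tokens = torch.randint(0, 1000, (8, 16))      # fake batch of token ids
mlm_labels = torch.randint(0, 1000, (8, 16))  # fake masked-token targets
nsp_labels = torch.randint(0, 2, (8,))        # fake is-next labels

mlm_logits, nsp_logits = model(tokens)
# Both objectives contribute to one loss, so one backward pass
# updates the shared encoder for both tasks at once.
loss = (loss_fn(mlm_logits.reshape(-1, 1000), mlm_labels.reshape(-1))
        + loss_fn(nsp_logits, nsp_labels))
loss.backward()
opt.step()
```

Is something like this summed-loss setup what "training on two objectives" actually means, or is my pipeline picture closer to the truth?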

I am a CS Master's student, so I should be able to figure this out on my own. But it's only recently that I've started taking my studies really seriously, and I want to actually gain knowledge, not just pass exams…

I'd appreciate any take on this matter :).
