/u/DarthVader9396

Natural Language Processing

Hello Everyone,

I am a newbie in NLP research. My question is: how should we benchmark a new language dataset/corpus (e.g., a dialogue dataset or a Q/A dataset) when there is no publicly available dataset for that particular language? Also, what are the possible directions for evaluating the newly prepared dataset? Any suggestions would be appreciated.

