Web: https://www.reddit.com/r/LanguageTechnology/comments/sbhz36/how_to_test_statistical_significance_on_text_data/

Jan. 24, 2022, 9:09 a.m. | /u/gtguide

So, I was in an interview and was asked a lot of questions about the statistical details of text data. For example:

1. How would you sample a million sentences from billions of sentences? What strategies would you use for sampling?

2. Having sampled, how would you determine that the sampled data follows the actual data distribution? (In a nutshell, how would you determine whether two text data distributions are similar or not?)
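One possible way to approach both questions, as a minimal sketch rather than a definitive answer: reservoir sampling (Algorithm R) draws a uniform sample from a stream too large to hold in memory, and the Jensen-Shannon divergence between unigram frequency distributions gives a rough similarity score between the sample and the full data. The function names, the whitespace tokenizer, and the choice of unigrams are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def reservoir_sample(stream, k, seed=0):
    """Uniformly sample k items from an iterable of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def unigram_dist(sentences):
    """Relative token frequencies, using naive lowercase whitespace tokenization (an assumption)."""
    counts = Counter(tok for s in sentences for tok in s.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions, 1 for disjoint ones."""
    jsd = 0.0
    for tok in set(p) | set(q):
        pi, qi = p.get(tok, 0.0), q.get(tok, 0.0)
        mi = 0.5 * (pi + qi)
        if pi > 0:
            jsd += 0.5 * pi * math.log2(pi / mi)
        if qi > 0:
            jsd += 0.5 * qi * math.log2(qi / mi)
    return jsd


if __name__ == "__main__":
    # Toy corpus standing in for the "billions of sentences".
    corpus = [f"sentence number {i % 50} about topic {i % 7}" for i in range(10_000)]
    sample = reservoir_sample(iter(corpus), k=500)
    print(js_divergence(unigram_dist(corpus), unigram_dist(sample)))
```

A value close to 0 suggests the sample's word distribution is close to that of the full data, but it is a descriptive score, not a significance test by itself. For an actual test, one option would be a chi-square test on token counts or a permutation test on the divergence.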

The follow-up questions to these were: When will you decide to re-train your …
