Feb. 12, 2024, 5:43 a.m. | YunDa Tsai, Cayon Liow, Yin Sheng Siang, Shou-De Lin

cs.LG updates on arXiv.org

This paper reveals a data bias issue that can severely degrade the performance of machine learning models for malicious URL detection. We describe how such bias can be identified using interpretable machine learning techniques, and further argue that such biases naturally exist in the real-world security data used to train classification models. We then propose a debiased training strategy that can be applied to most deep-learning-based models to alleviate the negative effects of the biased features. …
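
To make the bias-identification step concrete, here is a minimal sketch of how interpretable techniques can surface a spurious feature in a URL classifier. This is not the paper's implementation: the toy data, tokens, and linear model are all illustrative assumptions. The idea is that when a label-correlated artifact (here, the TLD) leaks into the training set, inspecting per-token model weights exposes it.

```python
# A minimal sketch (not the authors' method) of spotting a spurious feature
# in a URL classifier: the toy data leaks the label through the TLD, and
# inspecting per-token coefficients exposes that data bias.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: every malicious URL happens to use ".xyz" and
# every benign URL ".com" -- a dataset artifact, not a real threat signal.
urls = [
    "http://login-update.xyz/verify", "http://free-prize.xyz/claim",
    "http://secure-pay.xyz/account",  "http://news.example.com/story",
    "http://shop.example.com/cart",   "http://docs.example.com/help",
]
labels = [1, 1, 1, 0, 0, 0]  # 1 = malicious, 0 = benign

# Tokenize URLs into alphanumeric word pieces.
vec = TfidfVectorizer(token_pattern=r"[A-Za-z0-9]+")
X = vec.fit_transform(urls)

clf = LogisticRegression().fit(X, labels)

# Rank tokens by learned weight; a content-free token such as "xyz" or
# "com" dominating the ranking is a red flag for data bias.
weights = sorted(zip(vec.get_feature_names_out(), clf.coef_[0]),
                 key=lambda t: abs(t[1]), reverse=True)
for token, w in weights[:5]:
    print(f"{token:>10s}  weight={w:+.3f}")
```

Running the sketch shows the TLD tokens carrying most of the decision weight, which is the kind of biased feature the proposed debiased training strategy is meant to suppress.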
