April 24, 2024, 5:14 p.m. | /u/dr_greg_mouse

Machine Learning www.reddit.com

Hello, I am trying to model nitrate concentrations in streams in Bavaria, Germany, using a Random Forest model. I am using Python, primarily sklearn. I have data from 490 water quality stations. I am following the methodology in the paper by Longzhu Q. Shen et al., which can be found here: [https://www.nature.com/articles/s41597-020-0478-7](https://www.nature.com/articles/s41597-020-0478-7)

I want to split my dataset into training and testing sets such that the spatial distribution of data in both sets is identical. The idea …
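One possible way to approach a split like this, offered only as a rough sketch and not as the method from the linked paper, is to group the stations into spatial clusters and then stratify the train/test split on the cluster label, so each region contributes to both sets in roughly the same proportion. The function name `spatially_stratified_split`, the number of clusters, the 80/20 ratio, and the synthetic coordinates below are all assumptions for illustration; real station coordinates would ideally be projected to a metric coordinate system before clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split


def spatially_stratified_split(coords, test_size=0.2, n_clusters=10, seed=0):
    """Hypothetical helper: split station indices, stratified by spatial cluster."""
    coords = np.asarray(coords, dtype=float)
    # Group stations into spatial clusters from their (x, y) coordinates.
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(coords)
    # Stratify on the cluster label so every cluster is represented
    # in both sets in roughly the same proportion.
    indices = np.arange(len(coords))
    train_idx, test_idx = train_test_split(
        indices, test_size=test_size, stratify=labels, random_state=seed
    )
    return train_idx, test_idx, labels


# Synthetic stand-in for 490 station coordinates (assumed, not real data).
rng = np.random.default_rng(0)
coords = rng.uniform(low=[9.0, 47.3], high=[13.8, 50.5], size=(490, 2))

train_idx, test_idx, labels = spatially_stratified_split(coords)
print(len(train_idx), len(test_idx))
```

The resulting index arrays can be used to slice the feature matrix and the nitrate targets before fitting `RandomForestRegressor`. Note that this only balances the spatial coverage of the two sets; nearby stations can still end up on opposite sides of the split, so it does not address spatial autocorrelation between training and testing data.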

