[D] Preserving spatial distribution of data during data splitting | allainews.com

April 24, 2024, 5:14 p.m. | /u/dr_greg_mouse

Machine Learning www.reddit.com

Hello, I am trying to model nitrate concentrations in streams in Bavaria, Germany, using a Random Forest model. I am using Python, primarily sklearn. I have data from 490 water quality stations. I am following the methodology in the paper by Longzhu Q. Shen et al., which can be found here: [https://www.nature.com/articles/s41597-020-0478-7](https://www.nature.com/articles/s41597-020-0478-7)

I want to split my dataset into training and testing sets such that the spatial distribution of the data is the same in both sets. The idea …
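One possible way to approach a split like this, sketched under assumptions: group the stations into spatial blocks by clustering their coordinates, then do a stratified split on the block labels so each block contributes proportionally to both sets. This is not taken from the cited paper, and the rest of the post is truncated, so it may not match what the poster had in mind. The function name `spatially_stratified_split`, the number of blocks, and the synthetic coordinates are all hypothetical; real station coordinates should ideally be in a projected system (metres), since Euclidean clustering on raw longitude/latitude is only approximate.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split


def spatially_stratified_split(coords, n_blocks=10, test_size=0.2, seed=0):
    """Split station indices so each spatial block appears in both sets.

    coords: array of shape (n_stations, 2) with x/y coordinates.
    Returns (train_idx, test_idx, block_labels).
    """
    coords = np.asarray(coords, dtype=float)
    # Assumption: KMeans clusters stand in for "spatial blocks".
    block_labels = KMeans(
        n_clusters=n_blocks, n_init=10, random_state=seed
    ).fit_predict(coords)
    idx = np.arange(len(coords))
    # Stratifying on the block label keeps each block's share roughly
    # equal in the training and testing sets.
    train_idx, test_idx = train_test_split(
        idx, test_size=test_size, stratify=block_labels, random_state=seed
    )
    return train_idx, test_idx, block_labels


# Synthetic placeholder coordinates (NOT real station data): 490 random
# points in a rough longitude/latitude box.
rng = np.random.default_rng(0)
coords = np.column_stack(
    [rng.uniform(9.0, 13.8, 490), rng.uniform(47.3, 50.6, 490)]
)
train_idx, test_idx, block_labels = spatially_stratified_split(coords)
```

The returned index arrays can be used to slice the feature matrix and target, for example `X[train_idx]` and `y[train_idx]`. Note that a split like this matches the spatial distribution of the two sets; it does not make them spatially independent, so test scores may still be optimistic if nearby stations are strongly correlated.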
