Benchmarking for DuckDB and Polars
June 2, 2023, 1:38 p.m. | /u/100GB-CSV
Data Science www.reddit.com
Testing Functions: Read Parquet File => Filter => Group By => Write CSV File
DuckDB: 3.9s Polars: 16.2s
Testing machine: 8 cores and 32 GB memory
====================================================
import duckdb
import time

s = time.time()
con = duckdb.connect()
# Read Parquet, filter on Ledger, aggregate, and write the result straight to CSV
con.execute("""copy (SELECT Ledger, Account, DC, Currency, SUM(Base_Amount) as Total_Base_Amount
FROM read_parquet('input/300-MillionRows.parquet')
WHERE Ledger >= 'L30' AND Ledger <= 'L70'
GROUP BY Ledger, Account, DC, Currency)
to 'output/DuckFilterGroupByParquet.csv' (format csv, header true);""")
e = time.time()
print("DuckDB FilterGroupBy Parquet Time:", e - s)