Task Graphs and Custom Workloads
Custom workload patterns
Choose the right API for custom Python at scale.
High-level collections cover many workflows, but teams often need custom logic that does not fit them. Dask provides several lower-level APIs as escape hatches.
Choosing an API
- Use map_partitions for independent DataFrame partition transforms.
- Use map_blocks for independent array block transforms.
- Use delayed for custom dependency graphs.
- Use futures for real-time, interactive task submission.
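As a rough illustration of two of these options, the sketch below applies map_blocks to a small Dask array and builds a three-branch graph with delayed. The function names (center_block, load, summarize, combine) and the toy data are made up for this example. Futures are not shown because they are submitted through a dask.distributed Client, which needs a running scheduler.

```python
import numpy as np
import dask
import dask.array as da

# --- map_blocks: run a NumPy function on each block independently ---
x = da.arange(8, chunks=4)  # two blocks: [0, 1, 2, 3] and [4, 5, 6, 7]

def center_block(block):
    # `block` is a plain NumPy array; the mean here is per block, not global.
    return block - block.mean()

# dtype tells Dask the output dtype so it does not have to infer it.
centered = x.map_blocks(center_block, dtype=float)
centered_values = centered.compute()

# --- delayed: wire ordinary functions into a custom dependency graph ---
@dask.delayed
def load(n):
    # Hypothetical stand-in for reading one chunk of data.
    return list(range(n))

@dask.delayed
def summarize(chunk):
    return sum(chunk)

@dask.delayed
def combine(totals):
    return sum(totals)

# Nothing runs until compute(); each summarize depends on its own load,
# and combine depends on all three summarize tasks.
totals = [summarize(load(n)) for n in (2, 3, 4)]
grand_total = combine(totals).compute()
```

In both cases the user-written functions only ever see ordinary Python or NumPy objects; Dask handles the scheduling.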
Partition transform
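def normalize(pdf):
    # pdf is a single pandas partition, so mean and std are per-partition, not global.
    score = pdf["score"]
    # assign returns a new DataFrame rather than mutating the input partition.
    return pdf.assign(score=(score - score.mean()) / score.std())

# meta=ddf assumes the output schema matches the input ("score" already float).
clean = ddf.map_partitions(normalize, meta=ddf)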
Design habit
Write the custom function as if it receives one normal pandas DataFrame or NumPy block. Then let Dask distribute it.