Task Graphs and Custom Workloads

Custom workload patterns

Choose the right API for custom Python at scale.

High-level collections cover many workflows, but teams often need custom logic that the built-in DataFrame and array methods do not express. Dask provides several lower-level APIs for these cases.

Choosing an API

  • Use map_partitions for independent DataFrame partition transforms.
  • Use map_blocks for independent array block transforms.
  • Use delayed for custom dependency graphs.
  • Use futures for real-time, interactive task submission.
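As a rough illustration of two of these choices, the sketch below applies map_blocks to a small array and builds a tiny dependency graph with delayed. The function names (center, load, total) and the data are hypothetical, chosen only for this example, and the synchronous scheduler is used just to keep the run deterministic. Futures are left out because they need a running dask.distributed Client.

```python
import dask
import dask.array as da
import numpy as np

# map_blocks: apply a NumPy function to each block independently.
x = da.from_array(np.arange(8, dtype="float64"), chunks=4)  # two blocks of four


def center(block):
    # block is a plain NumPy array; the mean here is per block, not global.
    return block - block.mean()


centered = x.map_blocks(center, dtype=x.dtype)


# delayed: describe a small custom dependency graph lazily.
@dask.delayed
def load(i):
    # Hypothetical loader: stands in for reading one input.
    return list(range(i))


@dask.delayed
def total(parts):
    # Depends on every load() task passed in the list.
    return sum(len(p) for p in parts)


graph = total([load(i) for i in (1, 2, 3)])  # nothing has run yet

centered_result = centered.compute(scheduler="synchronous")
total_result = graph.compute(scheduler="synchronous")
```

In both cases nothing executes until compute is called; map_blocks assumes the blocks do not depend on each other, while delayed lets one task consume the outputs of others.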

Partition transform

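def normalize(pdf):
    # pdf is one pandas partition; mean and std are per partition, not global.
    score = pdf["score"]
    # assign returns a new DataFrame, so the input partition is not mutated.
    return pdf.assign(score=(score - score.mean()) / score.std())

# meta describes the output schema; this assumes "score" is already a float column.
clean = ddf.map_partitions(normalize, meta=ddf)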

Design habit

Write the custom function as if it receives one normal pandas DataFrame or NumPy block. Then let Dask distribute it.
