From pandas to Dask DataFrames
Back to modules
Course progress50%
article
Partitioned DataFrame mental model
Build the right intuition before scaling a pandas workflow.
Partitioned DataFrame Mental Model
Dask DataFrame feels familiar because it mirrors much of pandas, but the execution model is different. A Dask DataFrame is many pandas DataFrames plus a lazy task graph.
Core ideas
- Partitions are the unit of parallel work.
- Transformations build a graph.
- Actions such as
compute()request results. - The dashboard helps explain what workers are doing.
Starter example
import dask.dataframe as dd
ddf = dd.read_parquet("s3://analytics/events/")
daily = ddf.groupby("event_date").revenue.sum()
result = daily.compute()
Mental model
If pandas is a single table in memory, Dask DataFrame is a plan for many table fragments that can be processed where the data lives.
1
Partitioned DataFrame mental model
DataFrame foundations