Thursday, August 24, 2017

Python dask dataframe for large data

The data I am using here fits in memory, but dask also works when the data is larger than memory, because it reads the file in chunks (partitions) and evaluates operations lazily, only materializing results when you call compute().
import dask.dataframe as dd

# Read the tab-separated file lazily; blocksize=100**4 (~100 MB) sets the size of each partition
df = dd.read_csv(r'F:\Novus\Decision Tree\Data\iris_data_copy.txt',
                 sep='\t', header=0, encoding='latin-1', blocksize=100**4)
print(df.npartitions)  # number of partitions the file was split into

# groupby().mean() only builds a task graph; compute() runs it and returns a pandas DataFrame
df_summary = df.groupby(['Species']).mean()
print(df_summary.compute())
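
One detail worth noting is that each call to compute() re-runs its part of the task graph from the CSV. When several summaries are needed, dask.compute() can evaluate them together so the file is only scanned once. The snippet below is a minimal sketch of that pattern, reusing the df defined above; the per-species count is just an illustrative extra aggregation, not something from the original example.

import dask

# Two lazy results built from the same dask dataframe
species_means = df.groupby(['Species']).mean()
species_counts = df.groupby(['Species']).size()

# Evaluate both in a single pass over the data instead of two separate compute() calls
means, counts = dask.compute(species_means, species_counts)
print(means)
print(counts)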
