The data I am using here fits in memory, but Dask will work even when the data is larger than memory.
import dask.dataframe as dd

# Read the file lazily; blocksize controls how many bytes go into each partition.
df = dd.read_csv(r'F:\Novus\Decision Tree\Data\iris_data_copy.txt',
                 sep='\t', header=0, encoding='latin-1', blocksize=100**4)

# Number of partitions the file was split into.
print(df.npartitions)

# The groupby is lazy; nothing is read or computed until compute() is called.
df_summary = df.groupby(['Species']).mean()
print(df_summary.compute())
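If the result itself were too large to collect with compute(), the same lazy pattern still applies: you can write the output back to disk partition by partition instead of materialising it. A minimal sketch, assuming the same tab-separated input file; the output directory name 'iris_parquet' is a hypothetical example (writing Parquet also requires pyarrow or fastparquet to be installed):

import dask.dataframe as dd

# Read lazily in blocks, exactly as above.
df = dd.read_csv(r'F:\Novus\Decision Tree\Data\iris_data_copy.txt',
                 sep='\t', header=0, encoding='latin-1', blocksize=100**4)

# Writing to Parquet triggers the computation one partition at a time,
# so the full dataset never has to sit in memory at once.
df.to_parquet('iris_parquet')  # hypothetical output directory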