Running out of runtime memory is a common challenge when processing larger-than-memory datasets with pandas. To solve the problem, one can either move to a system with more memory or switch to an alternative library that supports distributed data processing (Dask, PySpark, etc.). However, when data is stored in formats such as CSV or Parquet and only part of it needs to be processed, manual optimization is possible e...| fireducks-dev.github.io
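One such manual optimization is loading only the columns you actually need. A minimal sketch (the column names and in-memory CSV are illustrative, standing in for a large on-disk file):

```python
import io
import pandas as pd

# A small in-memory CSV standing in for a much larger file on disk.
csv_data = "date,amount,notes\n2024-01-01,10,a\n2024-01-02,20,b\n"

# `usecols` tells pandas to parse only the listed columns, so memory
# usage scales with the selected columns rather than the whole file.
df = pd.read_csv(io.StringIO(csv_data), usecols=["date", "amount"])
print(list(df.columns))  # ['date', 'amount']
```

For Parquet, which is column-oriented on disk, `pd.read_parquet(path, columns=[...])` achieves the same pruning even more efficiently, since unselected columns are never read from disk at all.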
Thank you for your interest in FireDucks. This article describes possible causes of, and remedies for, slow programs using FireDucks. When a pandas program run with FireDucks is slow, the reason may be one of the following: 1. using 'apply' or a loop; 2. using a pandas API not implemented in FireDucks. In case 1, rewriting the pandas program may make it faster. For example:

sum_val = 0
for i in range(len(df)):
    if df["A"][i] > 2:
        sum_val += df["B"][i]

A program using a loop...
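The loop above can be replaced by a vectorized expression: a boolean mask selects the rows where `A > 2`, and a single `sum` over column `B` replaces the element-by-element accumulation. A sketch with illustrative data:

```python
import pandas as pd

# Illustrative data; any DataFrame with numeric columns A and B works.
df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 30, 40]})

# Loop version from the article: sums B where A > 2.
sum_val = 0
for i in range(len(df)):
    if df["A"][i] > 2:
        sum_val += df["B"][i]

# Vectorized equivalent: boolean mask plus a single sum.
vectorized = df.loc[df["A"] > 2, "B"].sum()

assert sum_val == vectorized  # both select B where A > 2
```

Vectorized operations like this run as whole-column kernels instead of per-element Python bytecode, which is what lets FireDucks (and plain pandas) execute them quickly.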