As described here, FireDucks uses lazy execution model with define-by-run IR generation. Since FireDucks uses MLIR compiler framework to optimize and execute IR, first step of the execution is creating MLIR function which holds operations to be evaluated. This article describes how important this function creation step is for optimization, thus performance. In the simple example below, execution of IR is kicked by the print statement which calls df2.__repr__(). df0 = pd.| fireducks-dev.github.io
The availability of runtime memory is often a challenge faced at processing larger-than-memory-dataset while working with pandas. To solve the problem, one can either shift to a system with larger memory capacity or consider switching to alternative libraries supporting distributed data processing like (Dask, PySpark etc.). Well, do you know when working with data stored in columnar formats like csv, parquet etc. and only some part of data is to be processed, manual optimization is possible e...| fireducks-dev.github.io