In the previous article, we discussed how FireDucks lazy execution caches intermediate results to avoid recomputing an expensive operation. In today’s article, we focus on the data flow optimization performed by its JIT compiler. We first look at some best practices for large-scale data analysis in pandas and then discuss how the FireDucks lazy execution model can take care of them automatically.| fireducks-dev.github.io
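The excerpt does not say which best practices the article covers. As a minimal sketch, assuming one commonly cited practice (filter rows and select columns before an expensive operation such as a sort), the manual reordering in plain pandas might look like the following. The column names and data are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: four columns, of which only "a" and "b" are needed.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "a": rng.integers(0, 10, n),
    "b": rng.random(n),
    "c": rng.random(n),
    "d": rng.random(n),
})

# Naive order: sort the whole frame first, then filter rows and select columns.
naive = df.sort_values("b", kind="stable")
naive = naive[naive["a"] > 5][["a", "b"]]

# Reordered by hand: reduce rows and columns first, then sort the smaller frame.
optimized = df.loc[df["a"] > 5, ["a", "b"]].sort_values("b", kind="stable")

# Both orders produce the same result; the second sorts far less data.
print(naive.equals(optimized))
```

Both versions give the same frame, but the second sorts only the rows and columns that survive the filter. This is the kind of reordering a lazy execution model is in a position to apply without changes to user code.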
FireDucks has a trace function that records how long each operation, such as read_csv, groupby, or sort, takes. This article introduces how to use the trace function. How to output and display trace files To use the trace function, you do not need to modify the program. Simply set the environment variable as shown below and execute the program. $ FIREDUCKS_FLAGS="--trace=3" python -mfireducks.pandas your_program.py After setting the environment variables and executin...| FireDucks – Posts