CedarDB is a database system that delivers unmatched performance for transactions and analytics, from small writes to handling billions of rows. Built on cutting-edge research to power today’s tools and tomorrow’s challenges.| cedardb.com
In the previous article, we have talked about how FireDucks lazy-execution can take care of the caching for the intermediate results in order to avoid recomputation of an expensive operation. In today’s article, we will focus on the efficient data flow optimization by its JIT compiler. We will first try to understand some best practices when performing large-scale data analysis in pandas and then discuss how those can be automatically taken care by FireDucks lazy execution model.| fireducks-dev.github.io
Research says that Data scientists spend about 45% of their time on data preparation tasks, including loading (19%) and cleaning (26%) the data. Pandas is one of the most popular python libraries for tabular data processing because of its diverse utilities and large community support. However, due to its performance issue with the large-scale data processing, there is a strong need for high-performance data frame libraries for the community. Although there are many alternatives available at t...| fireducks-dev.github.io