There have been a number of engineering improvements to Dask Array, such as consistent chunksizes in Xarray rolling-constructs and improved efficiency in map_overlap. Notably, as of Dask version 2024.11.2, calculating quantiles is much faster and more reliable. Calculating Quantiles with Xarray: Calculating quantiles is a common operation for geospatial …| Patrick Hoefler
Running large-scale GroupBy-Map patterns with Xarray objects that are backed by Dask arrays is an essential part of many typical geospatial workloads. Detrending is a very common operation that needs this pattern. In this post, we will explore how and why this caused so many pitfalls for Xarray …| Patrick Hoefler
Intro: Dask DataFrame scales out pandas DataFrames to operate at the 100GB-100TB scale. Historically, Dask was pretty slow compared to other tools in this space (like Spark). Due to a number of improvements focused on performance, it's now pretty fast (about 20x faster than before). The new implementation moved Dask …| Patrick Hoefler
The most interesting things about the new release: pandas 2.2 was released on January 22nd, 2024. Let’s take a look at what this release introduces and how it will help us improve our pandas workloads. It includes a bunch of improvements that will improve the user …| Patrick Hoefler
Get rid of annoying SettingWithCopyWarning messages| phofl.github.io
Explaining the migration path for Copy-on-Write| phofl.github.io
The most interesting things about the new release| phofl.github.io
You can use Coiled Run| phofl.github.io
Explaining how Copy-on-Write optimizes performance| phofl.github.io
We recently pushed out two new and experimental features Coiled Jobs| phofl.github.io
Or: How writing efficient pandas code matters| phofl.github.io
Get the most out of PyArrow support in pandas and Dask right now| phofl.github.io
Explaining how Copy-on-Write works internally| phofl.github.io
Introduction| phofl.github.io
We recently pushed out two new and experimental features coiled jobs| phofl.github.io
Explaining the pandas data model and its advantages| phofl.github.io
Getting notified of a significant performance regression the day before release sucks, but quickly identifying and resolving it feels great!| phofl.github.io
How the API is changing and how to leverage new functionalities| phofl.github.io
Improve performance when selecting data from a pandas object| phofl.github.io