This is not an official Microsoft benchmark, just my personal experience. Last week, I came across a new TPCH generator written in Rust. Luckily, someone ported it to Python, which makes generating large datasets possible even with a small amount of RAM. For example, it took 2 hours and 30 minutes to generate a 1 … Continue reading "Some Observations on Running TPCH 1 TB on Microsoft Fabric"| Small Data And self service
This is more or less the industry consensus on how a Lakehouse architecture should look in 2025. By now, it’s become clear that Parquet is the de facto standard for storing data, and using an object store to separate storage from compute makes a lot of sense. Another interesting development is how vendors want to … Continue reading "An Excel User’s Perspective on Lakehouse Architecture"| Small Data And self service
When attempting to read a Delta table in Python with the deltalake library (delta-rs, not Spark), you may encounter the following error: import deltalake DeltaTable('/lakehouse/default/Tab…| Small Data And self service
How curiosity about unmet user needs can help dashboard designers navigate requests for spreadsheets as data sources and requests for the ability to export dashboards to Excel| Do Mo(o)re with Data