2 posts published by mim during August 2025| Small Data And self service
TL;DR: As a quick first impression, I tested generating SQL queries based on a SQL-like semantic model; all the files are stored here. Considering I have only 4 GB of VRAM, it is not bad at all! To be clear, this is not a very rigorous benchmark, I just used the … Continue reading "Using gpt-oss 20B for Text to SQL"
There are moments in life when you know things will never be the same. I remember distinctly when Gary showed me PowerPivot 10 years ago, and I knew that working with data would become as easy as p…
1 post published by mim during July 2025
I was giving a presentation about Microsoft Fabric Python notebooks and someone asked if they scale. The short answer is yes. You can download the notebook and try it for yourself. For the long ans…
This is not an official Microsoft benchmark, just my personal experience. Last week, I came across a new TPCH generator written in Rust. Luckily, someone ported it to Python, which makes generating large datasets possible even with a small amount of RAM. For example, it took 2 hours and 30 minutes to generate a 1 … Continue reading "Some Observations on Running TPCH 1 TB on Microsoft Fabric"
TL;DR: I shared a notebook showing the results of Iceberg metadata conversion to Delta in OneLake. I’ve been following the evolution of Iceberg shortcuts to OneLake and I’m genuinely impressed with how much energy the engineering team has invested in making it more robust. It is a good idea to read the documentation. Essentially, XTable … Continue reading "Stress Testing Iceberg shortcut in Onelake"
TL;DR: This post shares a quick experiment I ran to test how effective (or ineffective) small language models are at generating SQL from natural language questions when provided with a well-defined semantic model. It is purely an intellectual curiosity; I don’t think we are there yet. Cloud-hosted LLMs are simply too good, efficient, … Continue reading "A Non-scientific Benchmark of Text-to-SQL using Small Language Models"
Note: The blog and especially the code were written with the assistance of an LLM. TL;DR: I built a simple Fabric Python notebook to orchestrate sequential SQL transformation tasks in OneLake using …
This is more or less the industry consensus on how a Lakehouse architecture should look in 2025. By now, it’s become clear that Parquet is the de facto standard for storing data, and using an object store to separate storage from compute makes a lot of sense. Another interesting development is how vendors want to … Continue reading "An Excel User’s Perspective on Lakehouse Architecture"
🌟 Introduction: While testing the DuckDB ODBC driver, which is getting better and better (not production ready but less broken compared to two years ago), I noticed something unexpected. Running que…
When attempting to read a Delta table using Python with the deltalake library (delta-rs, not Spark), you may encounter the following error: import deltalake DeltaTable(‘/lakehouse/default/Tab…
I had a simple data ingestion use case: Notebook A inserts data into a Delta table every 5 minutes, and Notebook B backfills the same table with new fields, but only at 4 am. Initially I just scheduled…