Discover how to identify and resolve column mismatches between Delta Lake tables and the SQL Endpoint in Microsoft Fabric| Sandeep Pawar | Microsoft Fabric
Oxbow is a project that turns an existing storage location containing Apache Parquet files into a Delta Lake table. It is intended to run either as an AWS Lambda or as a command-line application. We are excited to introduce terraform-oxbow, an open-source Terraform module that simplifies the deployment and management of AWS Lambda and its supporting components. Whether you’re working with AWS Glue, Kinesis Data Firehose, SQS, or DynamoDB, this module provides a streamlined approach to infras...| Scribd Technology
One of the major themes for Infrastructure Engineering over the past couple of years has been higher reliability and better operational efficiency. In a recent session with the Delta Lake project I was able to share the work led by Kuntal Basu and a number of other people to dramatically improve the efficiency and reliability of our online data ingestion pipeline.| Scribd Technology
We brought a whole team to San Francisco to present and attend this year’s Data and AI Summit, and it was a blast! I would consider the event a success both in the attendance at the Scribd-hosted talks and in the number of talks that discussed patterns we have adopted in our own data and ML platform. The three talks I wrote about previously were well received and have since been posted to YouTube along with hundreds of other talks.| Scribd Technology
We are very excited to be presenting and attending this year’s Data and AI Summit which will be hosted virtually and physically in San Francisco from June 27th-30th. Throughout the course of 2021 we completed a number of really interesting projects built around delta-rs and the Databricks platform which we are thrilled to share with a broader audience. In addition to the presentations listed below, a number of Scribd engineers who are responsible for data and ML platform, machine learning s...| Scribd Technology
Delta Lake is integral to our data platform, which is why we have invested heavily in delta-rs to support our non-JVM Delta Lake needs. This year I had the opportunity to share the progress of delta-rs at Data and AI Summit. Delta-rs was originally started by my colleague QP just over a year ago, and it has now grown into a multi-company project with numerous contributors and downstream projects such as kafka-delta-ingest.| Scribd Technology
Streaming data from Apache Kafka into Delta Lake is an integral part of Scribd’s data platform, but has been challenging to manage and scale. We use Spark Structured Streaming jobs to read data from Kafka topics and write that data into Delta Lake tables. This approach gets the job done, but our experience in production has convinced us that a different approach is necessary to efficiently bring data from Kafka to Delta Lake. To serve this need, we created kafka-delta-ingest.| Scribd Technology
When your core business is selling tyre 1. Build A Data Custodian Network and a Data Catalogue 🕸️ The first step in becoming Data Driven is to identify the experts in the data within your company. Those would generally be people within your IT organisation that have a good understanding of| Michelin IT Engineering Blog