Designing schemas for large-scale analytical (OLAP) data (e.g. in warehouses such as BigQuery and Snowflake, or in file formats such as Avro and JSON Lines) is different from designing data structures in code or schemas for relational databases. This post offers advice for creating such schemas. I use X.509 certificates as a concrete example of a dataset in need of a schema because I’ve worked with them a lot over the last 10 years or so. When describing schemas, I represent types in protobuf format, since it’s a ...