I've been working on an OCaml library to read XLSX files, and one thing that struck me as odd is that all strings in an Excel workbook are listed in a separate "shared strings" file and then referenced by index. Since an XLSX file is a ZIP archive, I would have expected the compression algorithm to deduplicate repeated strings for you. Thinking it through helped me understand why the shared strings file is necessary, and also what the advantages and disadvantages of the ZIP and tar + compression formats are.