We believe the soul of BigCode to be clear and transparent communication striving towards open collaboration. The project, therefore, runs under the following set of open and permissive licenses.

Datasets

We value openness and transparency about the training data of LLMs and intend to release datasets whenever we have the rights to do so. We will also provide data cards for all datasets we release. Please see the Dataset Card for The Stack.
BigCode is an open scientific collaboration working on the responsible development and use of large language models for code (Code LLMs), empowering the machine learning and open source communities through open governance. One of the challenges typically faced by researchers working on Code LLMs is the lack of transparency around the development of these systems. While a handful of papers on Code LLMs have been published, they do not always give full insight into the development process, whic...