We believe the soul of BigCode is clear and transparent communication in pursuit of open collaboration. The project therefore operates under a set of open and permissive licenses.

Datasets

We value openness and transparency about the training data of LLMs and intend to release datasets whenever we have the rights to do so. We will also provide data cards for all datasets we release. Please see the Dataset Card for The Stack.
| BigCode
StarCoder

Paper: A technical report about StarCoder.
GitHub: All you need to know about using or fine-tuning StarCoder.
StarCoder: StarCoderBase further trained on Python.
StarCoderBase: Trained on 80+ languages from The Stack.
StarCoder+: StarCoderBase further trained on English web data.
StarEncoder: Encoder model trained on The Stack.
StarPii: StarEncoder-based PII detector.

StarCoder Tools & Demos

StarCoder Playground: Write with StarCoder models!
VSCode Extension: Code with StarCoder!...