As part of the BigCode project, we released and will maintain The Stack, a 6.4 TB dataset of permissively licensed source code in 358 programming languages, along with a collection of datasets created through the course of research during the project. Release v1.0: Initial release of The Stack. Included 30 programming languages and 18 permissive licenses. Note: Three included licenses (MPL/EPL/LGPL) are considered weak copyleft licenses. The resulting near-deduplicated dataset ...| BigCode
We are excited to invite AI practitioners from diverse backgrounds to join the BigCode project! Note that BigCode is a research collaboration and is open to participants who have a professional research background and are able to commit time to the project. In general, we expect applicants to be affiliated with a research organization (either in academia or industry) and work on the technical/ethical/legal aspects of LLMs for coding applications.| BigCode
We’re on a journey to advance and democratize artificial intelligence through open source and open science.| huggingface.co
September 26, 2022: Announcement of the BigCode project. October 6, 2022: Webinar with the BigCode Community to provide strategic direction. October 27, 2022: Introduction of “The Stack” dataset and paper publication. November 15, 2022: Introduction of “Am I in The Stack” tool and BigCode Opt-Out process. November 23, 2022: Details shared on the approach to de-identification of personally identifiable information (PII). November 29, 2022: Sharing of Weights and Biases dashboards ...| BigCode
In this blog post, we outline a new naming convention for RAIL licenses that we hope the community will find useful when conceptualizing and/or selecting their own Use Restrictions.| Responsible AI Licenses (RAIL)
Home page of The Apache Software Foundation| www.apache.org