Gallery examples: Categorical Feature Support in Gradient Boosting Combine predictors using stacking Partial Dependence and Individual Conditional Expectation Plots Permutation Importance vs Random...| scikit-learn
Gallery examples: FeatureHasher and DictVectorizer Comparison| scikit-learn
The Array API specification defines a standard API for all array manipulation libraries with a NumPy-like API. Scikit-learn vendors pinned copies of array-api-compat and array-api-extra. Scikit-lea...| scikit-learn
This example will demonstrate the set_output API to configure transformers to output pandas DataFrames. set_output can be configured per estimator by calling the set_output method or globally by se...| scikit-learn
Gallery examples: Scalable learning with polynomial kernel approximation Compare the effect of different scalers on data with outliers Clustering text documents using k-means| scikit-learn
Gallery examples: Out-of-core classification of text documents Clustering text documents using k-means FeatureHasher and DictVectorizer Comparison| scikit-learn
Gallery examples: Hashing feature transformation using Totally Random Trees Manifold learning on handwritten digits: Locally Linear Embedding, Isomap… Clustering text documents using k-means| scikit-learn
Gallery examples: Biclustering documents with the Spectral Co-clustering algorithm Compare BIRCH and MiniBatchKMeans Comparing different clustering algorithms on toy datasets Online learning of a d...| scikit-learn
In this example we illustrate text vectorization, which is the process of representing non-numerical input data (such as dictionaries or text documents) as vectors of real numbers. We first compare...| scikit-learn
This is an example showing how scikit-learn can be used to classify documents by topics using a Bag of Words approach. This example uses a Tf-idf-weighted document-term sparse matrix to encode the ...| scikit-learn
This example compares decision boundaries of multinomial and one-vs-rest logistic regression on a 2D dataset with three classes. We make a comparison of the decision boundaries of both methods that...| scikit-learn
M : {array-like, sparse matrix} of shape (n_samples, n_features). Matrix to decompose.| scikit-learn
Gallery examples: Classifier comparison Multi-class AdaBoosted Decision Trees Two-class AdaBoost Plot the decision surfaces of ensembles of trees on the iris dataset Demonstration of multi-metric e...| scikit-learn
Gallery examples: Faces recognition example using eigenfaces and SVMs Classifier comparison Recognizing hand-written digits Concatenating multiple feature extraction methods Scalable learning with ...| scikit-learn
Compare randomized search and grid search for optimizing hyperparameters of a linear SVM with SGD training. All parameters that influence the learning are searched simultaneously (except for the nu...| scikit-learn
Example of Precision-Recall metric to evaluate classifier output quality. Precision-Recall is a useful measure of success of prediction when the classes are very imbalanced. In information retrieva...| scikit-learn
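To make the point about imbalanced classes concrete, here is a minimal pure-Python sketch of precision and recall for binary labels. It is not scikit-learn's implementation. The function name, the toy labels, and the choice to return 0.0 when a denominator is zero are assumptions made for illustration.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    # Assumption: report 0.0 when a ratio is undefined (no predicted or no true positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Imbalanced toy data: 6 negatives, 2 positives.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 1, 0, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
```

On this toy data accuracy is 0.75 while precision and recall are both 0.5, which shows how accuracy alone can look better than the classifier's behaviour on the rare class.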
This example demonstrates how to balance model complexity and cross-validated score by finding a decent accuracy within 1 standard deviation of the best accuracy score while minimising the number o...| scikit-learn
Gallery examples: Feature agglomeration vs. univariate selection Column Transformer with Mixed Types Selecting dimensionality reduction with Pipeline and GridSearchCV Pipelining: chaining a PCA and...| scikit-learn
This is the gallery of examples that showcase how scikit-learn can be used. Some examples demonstrate the use of the API in general and some demonstrate specific applications in tutorial form. Also...| scikit-learn
Gallery examples: Statistical comparison of models using grid search Post-hoc tuning the cut-off point of decision function Overview of multiclass training meta-estimators| scikit-learn
Gallery examples: Common pitfalls in the interpretation of coefficients of linear models| scikit-learn
Gallery examples: Column Transformer with Heterogeneous Data Sources FeatureHasher and DictVectorizer Comparison| scikit-learn
Gallery examples: Species distribution modeling Principal Component Analysis (PCA) on Iris Dataset| scikit-learn
Gallery examples: Metadata Routing Displaying Pipelines Introducing the set_output API Post-tuning the decision threshold for cost-sensitive learning Target Encoder’s Internal Cross fitting Release...| scikit-learn
Gallery examples: Time-related feature engineering Plot classification probability Classifier comparison A demo of K-Means clustering on the handwritten digits data Principal Component Regression v...| scikit-learn
The PCA does an unsupervised dimensionality reduction, while the logistic regression does the prediction. We use a GridSearchCV to set the dimensionality of the PCA.| scikit-learn
Gallery examples: Topic extraction with Non-negative Matrix Factorization and Latent Dirichlet Allocation Biclustering documents with the Spectral Co-clustering algorithm Column Transformer with He...| scikit-learn
Gallery examples: Bisecting K-Means and Regular K-Means Performance Comparison Demonstration of k-means assumptions A demo of K-Means clustering on the handwritten digits data Selecting the number ...| scikit-learn
This is an example showing how the scikit-learn API can be used to cluster documents by topics using a Bag of Words approach. Two algorithms are demonstrated, namely KMeans and its more scalable va...| scikit-learn
Evaluate the ability of k-means initialization strategies to make the algorithm's convergence robust, as measured by the relative standard deviation of the inertia of the clustering (i.e. the sum of...| scikit-learn
Silhouette analysis can be used to study the separation distance between the resulting clusters. The silhouette plot displays a measure of how close each point in one cluster is to points in the ne...| scikit-learn
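As a rough illustration of the quantity behind a silhouette plot, the sketch below computes the standard silhouette value s = (b - a) / max(a, b) for one-dimensional points. Here a is the mean distance from a point to the other points in its own cluster, and b is the smallest mean distance to the points of any other cluster. The function names and data are hypothetical, it assumes at least two clusters, and it assigns 0.0 to singleton clusters by convention. It is not the library's implementation.

```python
def mean(values):
    return sum(values) / len(values)


def silhouette_values(points, labels):
    """Per-point silhouette values for 1-D points (assumes at least two clusters)."""
    values = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances to the other members of the point's own cluster.
        same = [abs(p - q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            values.append(0.0)  # convention for a cluster with a single point
            continue
        a = mean(same)
        # Smallest mean distance to any other cluster.
        b = min(
            mean([abs(p - q) for q, lab in zip(points, labels) if lab == other])
            for other in set(labels) if other != own
        )
        denom = max(a, b)
        values.append((b - a) / denom if denom else 0.0)
    return values


points = [0.0, 1.0, 10.0, 11.0]
labels = [0, 0, 1, 1]
scores = silhouette_values(points, labels)
```

Values near 1 indicate a point far from other clusters, values near 0 a point close to a boundary, and negative values a point that may sit in the wrong cluster.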
In this example we compare the various initialization strategies for K-means in terms of runtime and quality of the results. As the ground truth is known here, we also apply different cluster quali...| scikit-learn
Gallery examples: Image denoising using kernel PCA Faces recognition example using eigenfaces and SVMs A demo of K-Means clustering on the handwritten digits data Column Transformer with Heterogene...| scikit-learn
This example compares the parameter search performed by HalvingGridSearchCV and GridSearchCV. We first define the parameter space for an SVC estimator, and compute the time required to train a Halv...| scikit-learn
The dataset used in this example is The 20 newsgroups text dataset which will be automatically downloaded, cached and reused for the document classification example. In this example, we tune the hy...| scikit-learn
This example shows how to use cross_val_predict together with PredictionErrorDisplay to visualize prediction errors. We will load the diabetes dataset and create an instance of a linear regression ...| scikit-learn
A Recursive Feature Elimination (RFE) example with automatic tuning of the number of features selected with cross-validation. Data generation: We build a classification task using 3 informative fea...| scikit-learn
This example shows how a classifier is optimized by cross-validation, which is done using the GridSearchCV object on a development set that comprises only half of the available labeled data. The p...| scikit-learn
Gallery examples: Recursive feature elimination with cross-validation GMM covariances Visualizing cross-validation behavior in scikit-learn Test with permutations the significance of a classificati...| scikit-learn
Gallery examples: Feature agglomeration vs. univariate selection Comparing Random Forests and Histogram Gradient Boosting models Gradient Boosting Out-of-Bag estimates Visualizing cross-validation ...| scikit-learn
Gallery examples: Release Highlights for scikit-learn 1.4 Visualizing cross-validation behavior in scikit-learn| scikit-learn
Gallery examples: Plot classification probability Decision Boundaries of Multinomial and One-vs-Rest Logistic Regression Multiclass sparse logistic regression on 20newgroups Multilabel classificati...| scikit-learn
Methods for scaling, centering, normalization, binarization, and more. User guide. See the Preprocessing data section for further details.| scikit-learn
Clustering of unlabeled data can be performed with the module sklearn.cluster. Each clustering algorithm comes in two variants: a class, that implements the fit method to learn the clusters on trai...| scikit-learn
Contains the metadata request info of a consumer.| scikit-learn
This guide demonstrates how metadata can be routed and passed between objects in scikit-learn. If you are developing a scikit-learn compatible estimator or meta-estimator, you can check our related...| scikit-learn
Gallery examples: Release Highlights for scikit-learn 1.5 Release Highlights for scikit-learn 1.3 Release Highlights for scikit-learn 1.1 Release Highlights for scikit-learn 1.0 Release Highlights ...| scikit-learn
This glossary hopes to definitively represent the tacit and explicit conventions applied in Scikit-learn and its API, while providing a reference for users and contributors. It aims to describe the...| scikit-learn
Gallery examples: Release Highlights for scikit-learn 1.5 Release Highlights for scikit-learn 1.4 Release Highlights for scikit-learn 1.2 Release Highlights for scikit-learn 1.1 Release Highlights ...| scikit-learn
Encode target labels with value between 0 and n_classes-1.| scikit-learn
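The encoding described above can be sketched in a few lines of pure Python. This is an illustration of the idea, not the library's code. The helper names and the city labels are hypothetical, and codes are assigned in sorted order of the distinct labels.

```python
def fit_label_encoder(labels):
    """Return the sorted distinct classes; a class's position is its integer code."""
    return sorted(set(labels))


def transform_labels(classes, labels):
    """Map each label to its code in the range 0 .. n_classes - 1."""
    index = {cls: code for code, cls in enumerate(classes)}
    return [index[label] for label in labels]


classes = fit_label_encoder(["paris", "tokyo", "paris", "amsterdam"])
codes = transform_labels(classes, ["tokyo", "paris", "amsterdam"])
```

In this sketch a label that was not seen when fitting raises a KeyError, so unseen classes have to be handled before transforming.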
Gallery examples: Combine predictors using stacking L1-based models for Sparse Signals Lasso model selection: AIC-BIC / cross-validation Common pitfalls in the interpretation of coefficients of lin...| scikit-learn
The following are a set of methods intended for regression in which the target value is expected to be a linear combination of the features. In mathematical notation, if \hat{y} is the predicted val...| scikit-learn
Classification| scikit-learn.org
The sklearn.preprocessing package provides several common utility functions and transformer classes to change raw feature vectors into a representation that is more suitable for the downstream esti...| scikit-learn
Feature 0 (median income in a block) and feature 5 (average house occupancy) of the California Housing dataset have very different scales and contain some very large outliers. These two characteris...| scikit-learn
Hyper-parameters are parameters that are not directly learnt within estimators. In scikit-learn they are passed as arguments to the constructor of the estimator classes. Typical examples include C,...| scikit-learn
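Because hyper-parameters are constructor arguments rather than learned values, searching over them amounts to enumerating candidate argument combinations. The sketch below expands a grid of candidates using only the standard library. The function name and the specific values are assumptions for illustration, and "kernel" is used here only as an example parameter name next to C.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every combination of the listed values as a list of keyword dicts."""
    names = sorted(param_grid)
    return [
        dict(zip(names, combo))
        for combo in product(*(param_grid[name] for name in names))
    ]


candidates = expand_grid({"C": [1, 10], "kernel": ["linear", "rbf"]})
# Each dict could be passed to an estimator constructor as keyword arguments.
```

The number of candidates is the product of the list lengths, so the cost of an exhaustive search grows quickly as more parameters are added.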
Gallery examples: Release Highlights for scikit-learn 1.5 Release Highlights for scikit-learn 1.4 Release Highlights for scikit-learn 1.1 Release Highlights for scikit-learn 1.0 Release Highlights ...| scikit-learn
Ensemble methods combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability / robustness over a single estimator. Two very famous ...| scikit-learn
Gallery examples: Release Highlights for scikit-learn 1.4 Release Highlights for scikit-learn 0.24 Feature agglomeration vs. univariate selection Shrinkage covariance estimation: LedoitWolf vs OAS ...| scikit-learn
Gallery examples: Release Highlights for scikit-learn 1.3 Model selection with Probabilistic PCA and Factor Analysis (FA) Lagged features for time series forecasting Imputing missing values before ...| scikit-learn
Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples that it has just seen would ha...| scikit-learn
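One common way to avoid testing on the data used for fitting is to split the sample indices into folds, holding out each fold in turn. The following is a minimal sketch of such a split without shuffling. The function name is hypothetical and this is not the library's splitter.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train, test) index lists; each sample is in the test set exactly once."""
    indices = list(range(n_samples))
    # The first n_samples % n_splits folds get one extra sample.
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


splits = list(k_fold_indices(6, 3))
```

A model would be fitted on each train list and scored on the matching test list, so no score is computed on samples the model has already seen.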
The sklearn.feature_extraction module can be used to extract features in a format supported by machine learning algorithms from datasets consisting of formats such as text and image. Loading featur...| scikit-learn
Gallery examples: L1-based models for Sparse Signals Linear Regression Example Non-negative least squares Failure of Machine Learning to infer causal effects Effect of transforming the targets in r...| scikit-learn
There are 3 different APIs for evaluating the quality of a model’s predictions: Estimator score method: Estimators have a score method providing a default evaluation criterion for the problem they ...| scikit-learn
Gallery examples: Feature agglomeration vs. univariate selection Pipeline ANOVA SVM Recursive feature elimination Poisson regression and non-normal loss Permutation Importance vs Random Forest Feat...| scikit-learn