18501
12269
sklearn.pipeline.Pipeline(step_0=sklearn.impute._base.MissingIndicator,step_1=sklearn.decomposition._kernel_pca.KernelPCA,step_2=sklearn.ensemble._hist_gradient_boosting.gradient_boosting.HistGradientBoostingClassifier)
sklearn.Pipeline(MissingIndicator,KernelPCA,HistGradientBoostingClassifier)
sklearn.pipeline.Pipeline
1
openml==0.10.2,sklearn==0.22.1
Pipeline of transforms with a final estimator.
Sequentially apply a list of transforms and a final estimator.
Intermediate steps of the pipeline must be 'transforms', that is, they
must implement fit and transform methods.
The final estimator only needs to implement fit.
The transformers in the pipeline can be cached using ``memory`` argument.
The purpose of the pipeline is to assemble several steps that can be
cross-validated together while setting different parameters.
For this, it enables setting parameters of the various steps using their
names and the parameter name separated by a '__', as in the example below.
A step's estimator may be replaced entirely by setting the parameter
with its name to another estimator, or a transformer removed by setting
it to 'passthrough' or ``None``.
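The step-naming and `__` parameter conventions described above can be sketched against this flow's own structure. A minimal illustration (the step names `step_0`/`step_1`/`step_2` are taken from this flow; the parameter values passed to `set_params` are arbitrary examples):

```python
try:
    # Needed on scikit-learn < 1.0 (e.g. the 0.22.1 recorded here);
    # a harmless no-op (warning) on newer releases.
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
except ImportError:
    pass
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.impute import MissingIndicator
from sklearn.pipeline import Pipeline

# Rebuild the flow's structure with its step names.
pipe = Pipeline([
    ("step_0", MissingIndicator(features="all")),
    ("step_1", KernelPCA(kernel="cosine", n_components=4)),
    ("step_2", HistGradientBoostingClassifier()),
])

# Nested parameters are addressed as '<step name>__<parameter name>':
pipe.set_params(step_2__learning_rate=0.01)
# A transformer step can be disabled by replacing it with 'passthrough':
pipe.set_params(step_1="passthrough")
```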
2020-05-21T08:46:35
English
sklearn==0.22.1
numpy>=1.6.1
scipy>=0.9
memory
None
null
Used to cache the fitted transformers of the pipeline. By default,
no caching is performed. If a string is given, it is the path to
the caching directory. Enabling caching triggers a clone of
the transformers before fitting. Therefore, the transformer
instance given to the pipeline cannot be inspected
directly. Use the attribute ``named_steps`` or ``steps`` to
inspect estimators within the pipeline. Caching the
transformers is advantageous when fitting is time consuming
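A small sketch of the caching behaviour described above. The PCA/LogisticRegression steps and the iris data are illustrative stand-ins, not part of this flow:

```python
import tempfile

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# A directory path (or a joblib.Memory object) enables caching of the
# fitted transformers.
cache_dir = tempfile.mkdtemp()
pipe = Pipeline(
    [("pca", PCA(n_components=2)), ("clf", LogisticRegression(max_iter=1000))],
    memory=cache_dir,
)
X, y = load_iris(return_X_y=True)
pipe.fit(X, y)
# Because caching clones the transformers before fitting, the original
# PCA instance is not updated; inspect the fitted clone via named_steps:
fitted_pca = pipe.named_steps["pca"]
```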
steps
list
[{"oml-python:serialized_object": "component_reference", "value": {"key": "step_0", "step_name": "step_0"}}, {"oml-python:serialized_object": "component_reference", "value": {"key": "step_1", "step_name": "step_1"}}, {"oml-python:serialized_object": "component_reference", "value": {"key": "step_2", "step_name": "step_2"}}]
List of (name, transform) tuples (implementing fit/transform) that are
chained, in the order in which they are chained, with the last object
an estimator
verbose
bool
false
If True, the time elapsed while fitting each step will be printed as it
is completed.
step_2
17709
12269
sklearn.ensemble._hist_gradient_boosting.gradient_boosting.HistGradientBoostingClassifier
sklearn.HistGradientBoostingClassifier
sklearn.ensemble._hist_gradient_boosting.gradient_boosting.HistGradientBoostingClassifier
7
openml==0.10.2,sklearn==0.22.1
Histogram-based Gradient Boosting Classification Tree.
This estimator is much faster than
:class:`GradientBoostingClassifier<sklearn.ensemble.GradientBoostingClassifier>`
for big datasets (n_samples >= 10 000).
This estimator has native support for missing values (NaNs). During
training, the tree grower learns at each split point whether samples
with missing values should go to the left or right child, based on the
potential gain. When predicting, samples with missing values are
assigned to the left or right child accordingly. If no missing values
were encountered for a given feature during training, then samples with
missing values are mapped to whichever child has the most samples.
This implementation is inspired by
`LightGBM <https://github.com/Microsoft/LightGBM>`_.
.. note::
This estimator is still **experimental** for now: the predictions
and the API might change without any deprecation cycle. To use it,
you need to explicitly import ``enable_hist_gradient_boosting``::
>>> # explicit...
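The truncated doctest above refers to the explicit experimental import. A runnable sketch, which also demonstrates the native NaN support described in the docstring (the toy data is illustrative):

```python
import numpy as np

try:
    # Required on scikit-learn < 1.0, including the 0.22.1 recorded in
    # this flow; on newer versions the estimator is stable and this
    # import only emits a warning.
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
except ImportError:
    pass
from sklearn.ensemble import HistGradientBoostingClassifier

# Native missing-value support: rows containing NaN need no imputation.
X = np.array([[0.0], [1.0], [np.nan], [2.0], [np.nan], [3.0]] * 10)
y = np.array([0, 0, 0, 1, 1, 1] * 10)
clf = HistGradientBoostingClassifier(max_iter=10).fit(X, y)
preds = clf.predict(X)
```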
2020-05-18T19:47:40
English
sklearn==0.22.1
numpy>=1.6.1
scipy>=0.9
l2_regularization
float
0.029157851614848844
The L2 regularization parameter. Use 0 for no regularization
learning_rate
float
0.0002615635618827854
The learning rate, also known as *shrinkage*. This is used as a
multiplicative factor for the leaves values. Use ``1`` for no
shrinkage
loss
{"auto", "binary_crossentropy", "categorical_crossentropy"}
"auto"
The loss function to use in the boosting process. 'auto' will
automatically choose the loss depending on the nature of the problem
max_bins
int
219
The maximum number of bins to use for non-missing values. Before
training, each feature of the input array `X` is binned into
integer-valued bins, which allows for a much faster training stage.
Features with a small number of unique values may use less than
``max_bins`` bins. In addition to the ``max_bins`` bins, one more bin
is always reserved for missing values. Must be no larger than 255
max_depth
int or None
9
The maximum depth of each tree. The depth of a tree is the number of
nodes to go from the root to the deepest leaf. Must be strictly greater
than 1. Depth isn't constrained by default
max_iter
int
938
The maximum number of iterations of the boosting process, i.e. the
maximum number of trees for binary classification. For multiclass
classification, `n_classes` trees per iteration are built
max_leaf_nodes
int or None
107
The maximum number of leaves for each tree. Must be strictly greater
than 1. If None, there is no maximum limit
min_samples_leaf
int
266
The minimum number of samples per leaf. For small datasets with less
than a few hundred samples, it is recommended to lower this value
since only very shallow trees would be built
n_iter_no_change
int or None
65
Used to determine when to "early stop". The fitting process is
stopped when none of the last ``n_iter_no_change`` scores are better
than the ``n_iter_no_change - 1`` -th-to-last one, up to some
tolerance. If None or 0, no early-stopping is done
random_state
int
42
Pseudo-random number generator to control the subsampling in the
binning process, and the train/validation data split if early stopping
is enabled. See :term:`random_state`.
scoring
str or callable or None
"neg_log_loss"
Scoring parameter to use for early stopping. It can be a single
string (see :ref:`scoring_parameter`) or a callable (see
:ref:`scoring`). If None, the estimator's default scorer
is used. If ``scoring='loss'``, early stopping is checked
w.r.t the loss value. Only used if ``n_iter_no_change`` is not None
tol
float or None
0.09169788469283188
The absolute tolerance to use when comparing scores. The higher the
tolerance, the more likely we are to early stop: higher tolerance
means that it will be harder for subsequent iterations to be
considered an improvement upon the reference score
validation_fraction
int or float or None
0.19574305926541347
Proportion (or absolute size) of training data to set aside as
validation data for early stopping. If None, early stopping is done on
the training data
verbose
int
0
The verbosity level. If not zero, print some information about the
fitting process
warm_start
bool
false
When set to ``True``, reuse the solution of the previous call to fit
and add more estimators to the ensemble. For results to be valid, the
estimator should be re-trained on the same data only
See :term:`the Glossary <warm_start>`
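The hyperparameter values recorded above can be reassembled into a constructor call. A sketch with the values copied from this flow; note that `loss="auto"` is the sklearn 0.22.1 spelling and the loss names were changed in later releases:

```python
try:
    # Needed on scikit-learn < 1.0; a no-op (warning) afterwards.
    from sklearn.experimental import enable_hist_gradient_boosting  # noqa: F401
except ImportError:
    pass
from sklearn.ensemble import HistGradientBoostingClassifier

# Values as recorded in this flow (sklearn 0.22.1).
clf = HistGradientBoostingClassifier(
    l2_regularization=0.029157851614848844,
    learning_rate=0.0002615635618827854,
    loss="auto",  # 0.22.1 spelling; renamed in later sklearn releases
    max_bins=219,
    max_depth=9,
    max_iter=938,
    max_leaf_nodes=107,
    min_samples_leaf=266,
    n_iter_no_change=65,
    random_state=42,
    scoring="neg_log_loss",
    tol=0.09169788469283188,
    validation_fraction=0.19574305926541347,
    verbose=0,
    warm_start=False,
)
params = clf.get_params()
```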
openml-python
python
scikit-learn
sklearn
sklearn_0.22.1
step_1
17740
12269
sklearn.decomposition._kernel_pca.KernelPCA
sklearn.KernelPCA
sklearn.decomposition._kernel_pca.KernelPCA
1
openml==0.10.2,sklearn==0.22.1
Kernel Principal component analysis (KPCA)
Non-linear dimensionality reduction through the use of kernels (see
:ref:`metrics`).
2020-05-18T23:48:33
English
sklearn==0.22.1
numpy>=1.6.1
scipy>=0.9
alpha
int
4
Hyperparameter of the ridge regression that learns the
inverse transform (when fit_inverse_transform=True)
coef0
float
1
Independent term in poly and sigmoid kernels
Ignored by other kernels
copy_X
boolean
false
If True, input X is copied and stored by the model in the `X_fit_`
attribute. If no further changes will be done to X, setting
`copy_X=False` saves memory by storing a reference
.. versionadded:: 0.18
degree
int
3
Degree for poly kernels. Ignored by other kernels
eigen_solver
string
"arpack"
Select eigensolver to use. If n_components is much less than
the number of training samples, arpack may be more efficient
than the dense eigensolver
fit_inverse_transform
bool
true
Learn the inverse transform for non-precomputed kernels
(i.e. learn to find the pre-image of a point)
gamma
float
null
Kernel coefficient for rbf, poly and sigmoid kernels. Ignored by other
kernels
kernel
"linear" | "poly" | "rbf" | "sigmoid" | "cosine" | "precomputed"
"cosine"
Kernel. Default="linear"
kernel_params
mapping of string to any
null
Parameters (keyword arguments) and values for kernel passed as
callable object. Ignored by other kernels
max_iter
int
524
Maximum number of iterations for arpack
If None, optimal value will be chosen by arpack
n_components
int
4
Number of components. If None, all non-zero components are kept
n_jobs
int or None
1
The number of parallel jobs to run
``None`` means 1 unless in a :obj:`joblib.parallel_backend` context
``-1`` means using all processors. See :term:`Glossary <n_jobs>`
for more details
.. versionadded:: 0.18
random_state
int
42
If int, random_state is the seed used by the random number generator;
If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`. Used when ``eigen_solver`` == 'arpack'
.. versionadded:: 0.18
remove_zero_eig
boolean
false
If True, then all components with zero eigenvalues are removed, so
that the number of components in the output may be < n_components
(and sometimes even zero due to numerical instability)
When n_components is None, this parameter is ignored and components
with zero eigenvalues are removed regardless
tol
float
0.6476998368285369
Convergence tolerance for arpack
If 0, optimal value will be chosen by arpack
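The KernelPCA hyperparameters recorded above, applied to toy data. A sketch; the random input matrix is illustrative, not part of the flow:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Values as recorded in this component of the flow.
kpca = KernelPCA(
    n_components=4,
    kernel="cosine",
    alpha=4,
    coef0=1,
    copy_X=False,
    degree=3,
    eigen_solver="arpack",
    fit_inverse_transform=True,
    max_iter=524,
    random_state=42,
    remove_zero_eig=False,
    tol=0.6476998368285369,
)
rng = np.random.RandomState(0)
X = rng.rand(30, 8)
X_low = kpca.fit_transform(X)           # 4 components per sample
X_back = kpca.inverse_transform(X_low)  # pre-image via the learned ridge
```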
openml-python
python
scikit-learn
sklearn
sklearn_0.22.1
step_0
18449
12269
sklearn.impute._base.MissingIndicator
sklearn.MissingIndicator
sklearn.impute._base.MissingIndicator
1
openml==0.10.2,sklearn==0.22.1
Binary indicators for missing values.
Note that this component typically should not be used in a vanilla
:class:`Pipeline` consisting of transformers and a classifier, but rather
could be added using a :class:`FeatureUnion` or :class:`ColumnTransformer`.
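A sketch of the combination the note suggests: appending the binary indicators alongside imputed values via a `FeatureUnion` rather than chaining `MissingIndicator` directly in a Pipeline. The `SimpleImputer` step and toy data are illustrative, not part of this flow:

```python
import numpy as np
from sklearn.impute import MissingIndicator, SimpleImputer
from sklearn.pipeline import FeatureUnion

# Impute values and, in parallel, append missingness indicators.
union = FeatureUnion([
    ("imputed", SimpleImputer(strategy="mean")),
    ("indicators", MissingIndicator(features="all")),
])
X = np.array([[1.0, np.nan],
              [np.nan, 3.0],
              [4.0, 5.0],
              [6.0, np.nan]])
Xt = union.fit_transform(X)  # 2 imputed columns + 2 indicator columns
```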
2020-05-21T07:48:00
English
sklearn==0.22.1
numpy>=1.6.1
scipy>=0.9
error_on_new
boolean
true
If True (default), transform will raise an error when there are
features with missing values in transform that have no missing values
in fit. This is applicable only when ``features="missing-only"``.
features
str
"all"
Whether the imputer mask should represent all or a subset of
features
- If "missing-only" (default), the imputer mask will only represent
features containing missing values during fit time
- If "all", the imputer mask will represent all features
missing_values
number
NaN
The placeholder for the missing values. All occurrences of
`missing_values` will be indicated (True in the output array), the
other values will be marked as False
sparse
boolean or "auto"
"auto"
Whether the imputer mask format should be sparse or dense
- If "auto" (default), the imputer mask will be of same type as
input
- If True, the imputer mask will be a sparse matrix
- If False, the imputer mask will be a numpy array
openml-python
python
scikit-learn
sklearn
sklearn_0.22.1
openml-python
python
scikit-learn
sklearn
sklearn_0.22.1