logitboost.LogitBoost

class logitboost.LogitBoost(base_estimator=None, n_estimators=50, weight_trim_quantile=0.05, max_response=4.0, learning_rate=1.0, bootstrap=False, random_state=None)

Bases: sklearn.base.ClassifierMixin, sklearn.ensemble._base.BaseEnsemble

A LogitBoost classifier.

A LogitBoost [1] classifier is a meta-estimator that fits an additive model minimizing a logistic loss function.

Parameters:
  • base_estimator (object, optional (default=None)) – The base estimator from which the LogitBoost classifier is built. This should be a regressor. If no base_estimator is specified, a decision stump is used.
  • n_estimators (int, optional (default=50)) – The number of estimators per class in the ensemble.
  • weight_trim_quantile (float, optional (default=0.05)) – Threshold for weight trimming (see Section 9 in [1]). The distribution of the weights tends to become very skewed in later boosting iterations, and the observations with low weights contribute little to the base estimator being fitted at that iteration. At each boosting iteration, observations with weight smaller than this quantile of the sample weight distribution are removed from the data for fitting the base estimator (for that iteration only) to speed up computation.
  • max_response (float, optional (default=4.0)) – Maximum response value to allow when fitting the base estimators (for numerical stability). Values will be clipped to the interval [-max_response, max_response]. See the bottom of p. 352 in [1].
  • learning_rate (float, optional (default=1.0)) – The learning rate shrinks the contribution of each base estimator by learning_rate during fitting.
  • bootstrap (bool, optional (default=False)) – If True, each boosting iteration trains the base estimator using a weighted bootstrap sample of the training data. If False, each boosting iteration trains the base estimator using the full (weighted) training sample; in that case, the base estimator must support sample weighting by means of a sample_weight parameter in its fit() method.
  • random_state (int, RandomState instance or None, optional (default=None)) – If int, random_state is the seed used by the random number generator. If RandomState instance, random_state is the random number generator. If None, the random number generator is the RandomState instance used by numpy.random.
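The roles of max_response and weight_trim_quantile can be illustrated with a small pure-Python sketch of one binary boosting step. It uses the working response z = (y - p) / (p(1 - p)) and weights w = p(1 - p) from [1]. The helper names and the simple order-statistic quantile are assumptions made for illustration; they are not the library's internals.

```python
def working_response_and_weights(y, p, max_response=4.0):
    """Binary LogitBoost working response and weights, as described in [1].

    y: 0/1 labels; p: current estimates of P(y = 1 | x), strictly in (0, 1).
    """
    weights = [pi * (1.0 - pi) for pi in p]
    response = []
    for yi, pi, wi in zip(y, p, weights):
        z = (yi - pi) / wi
        # Clip to [-max_response, max_response] for numerical stability.
        response.append(max(-max_response, min(max_response, z)))
    return response, weights


def trim_mask(weights, weight_trim_quantile=0.05):
    """Return True for observations kept when fitting the base estimator.

    Assumption: the threshold is a simple order statistic of the weights;
    the library may compute the quantile differently.
    """
    ordered = sorted(weights)
    threshold = ordered[int(weight_trim_quantile * (len(ordered) - 1))]
    # Observations with weight below the threshold are dropped.
    return [w >= threshold for w in weights]


y = [1, 0, 1, 0]
p = [0.5, 0.5, 0.01, 0.9]
z, w = working_response_and_weights(y, p)
keep = trim_mask(w, weight_trim_quantile=0.5)
```

In this toy input the third observation is already fit well (its weight is tiny), so its raw response would be about 100 before clipping, and it is the one removed by trimming.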
classes_

One-dimensional array of unique class labels extracted from the training data target vector during fitting.

Type: numpy.ndarray
estimators_

All the estimators in the ensemble after fitting. If the task is binary classification, this is a list of n_estimators fitted base estimators. If the task is multiclass classification, this is a list of n_estimators lists, each containing one base estimator for each class label.

Type: list
n_classes_

Number of classes (length of the classes_ array). If n_classes_ is 2, then the task is binary classification. Otherwise, the task is multiclass classification.

Type: int
n_features_

Number of features, inferred during fitting.

Type: int

See also

sklearn.tree.DecisionTreeRegressor
The default base estimator (with max_depth = 1).

References

[1] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. “Additive Logistic Regression: A Statistical View of Boosting”. The Annals of Statistics. Volume 28, Number 2 (2000), pp. 337–374.
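A minimal usage sketch, assuming the logitboost package and a compatible scikit-learn version are installed. The dataset, split, and helper name fit_and_evaluate are arbitrary choices made for illustration; only the constructor parameters come from the signature above.

```python
def fit_and_evaluate(n_estimators=50, random_state=0):
    """Fit LogitBoost on a toy binary dataset and return the test accuracy."""
    # Imports are local so the sketch can be loaded without the packages.
    from logitboost import LogitBoost
    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=random_state
    )

    # Pass the documented default base estimator explicitly: a decision stump.
    model = LogitBoost(
        base_estimator=DecisionTreeRegressor(max_depth=1),
        n_estimators=n_estimators,
        random_state=random_state,
    )
    model.fit(X_train, y_train)

    # score() is inherited from sklearn.base.ClassifierMixin (mean accuracy).
    return model.score(X_test, y_test)
```

Calling fit_and_evaluate() returns a single float. Because the base estimator is a regressor, any regressor that accepts sample_weight in fit() could be substituted when bootstrap=False.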
contributions(X, sample_weight=None)

Average absolute contribution of each estimator in the ensemble.

This can be used to compare how much influence different estimators in the ensemble have on the final predictions made by the model.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input samples to average over.
  • sample_weight (array-like of shape (n_samples,) (default=None)) – Weights for the samples, for averaging.
Returns:

contrib – Average absolute contribution of each estimator in the ensemble.

Return type:

numpy.ndarray of shape (n_estimators,)
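One plausible reading of "average absolute contribution" is the (optionally weighted) mean of |f_m(x)| over the samples, computed separately for each estimator f_m. The sketch below illustrates that reading in plain Python. The function name and the use of callables as stand-ins for fitted base estimators are assumptions, not the library's code.

```python
def average_absolute_contributions(estimators, X, sample_weight=None):
    """Weighted mean of |f_m(x)| over the samples, one value per estimator.

    estimators: hypothetical list of callables mapping one sample to a float.
    """
    if sample_weight is None:
        sample_weight = [1.0] * len(X)
    total = sum(sample_weight)
    return [
        sum(w * abs(f(x)) for x, w in zip(X, sample_weight)) / total
        for f in estimators
    ]


# Two toy "estimators": one responds to feature 0, the other is constant.
estimators = [lambda x: 2.0 * x[0], lambda x: -0.5]
X = [[1.0], [-1.0], [3.0]]
contrib = average_absolute_contributions(estimators, X)
```

A larger value means that estimator moves the decision scores more, on average, across the given samples.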

decision_function(X)

Compute the decision function of X.

Parameters: X (array-like of shape (n_samples, n_features)) – The input data.
Returns: scores – The decision function of the input samples. The order of outputs is the same as that of the classes_ attribute. Binary classification is a special case with k = 1; otherwise k = n_classes. For binary classification, positive values indicate class 1 and negative values indicate class 0.
Return type: numpy.ndarray of shape (n_samples, k)
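The relationship between decision scores and predicted labels can be sketched as follows. This is an illustration in plain Python, not the library's implementation. It assumes that "class 1" and "class 0" refer to the second and first entries of classes_, and that a score of exactly zero maps to the first class.

```python
def labels_from_scores(scores, classes):
    """Map decision scores to class labels.

    scores: list of rows; each row has 1 entry (binary) or n_classes entries.
    classes: class labels in the same order as the classes_ attribute.
    """
    labels = []
    for row in scores:
        if len(row) == 1:
            # Binary case: positive score -> second class, else first class.
            index = 1 if row[0] > 0 else 0
        else:
            # Multiclass case: pick the class with the largest score.
            index = max(range(len(row)), key=row.__getitem__)
        labels.append(classes[index])
    return labels


binary = labels_from_scores([[1.7], [-0.3]], classes=["neg", "pos"])
multi = labels_from_scores(
    [[0.1, 2.0, -1.0], [3.0, 0.0, 0.5]], classes=["a", "b", "c"]
)
```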
feature_importances_

Return the feature importances (the higher, the more important the feature).

Returns:

feature_importances_ – The feature importances. Each feature’s importance is computed as the average feature importance taken over each estimator in the trained ensemble. This requires the base estimator to support a feature_importances_ attribute.

Return type:

numpy.ndarray of shape (n_features,)

Raises:
  • AttributeError – Raised if the base estimator doesn’t support a feature_importances_ attribute.
  • NotImplementedError – Raised if the task is multiclass classification: feature importance is currently only supported for binary classification.
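The averaging described above can be sketched in plain Python for the binary case. The function name is hypothetical, and the nested list stands in for the feature_importances_ arrays of the fitted base estimators.

```python
def mean_feature_importances(per_estimator_importances):
    """Average feature importances over the estimators of a binary ensemble.

    per_estimator_importances: one list of length n_features per estimator,
    standing in for each fitted base estimator's feature_importances_.
    """
    n_estimators = len(per_estimator_importances)
    n_features = len(per_estimator_importances[0])
    return [
        sum(imp[j] for imp in per_estimator_importances) / n_estimators
        for j in range(n_features)
    ]


# Toy input: three stumps, each splitting on a single feature (one-hot rows).
stump_importances = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
importances = mean_feature_importances(stump_importances)
```

With decision stumps as base estimators, the result can be read as the fraction of boosting iterations that split on each feature.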
fit(X, y, **fit_params)

Build a LogitBoost classifier from the training data (X, y).

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The training feature data.
  • y (array-like of shape (n_samples,)) – The target values (class labels).
  • **fit_params (keyword arguments) – Additional keyword arguments to pass to the base estimator’s fit() method.
Returns:

self – Returns this LogitBoost estimator.

Return type:

LogitBoost

predict(X)

Predict class labels for X.

Parameters: X (array-like of shape (n_samples, n_features)) – The input data.
Returns: labels – Array of predicted class labels, one for each input.
Return type: numpy.ndarray of shape (n_samples,)
predict_log_proba(X)

Predict class log-probabilities for X.

Parameters: X (array-like of shape (n_samples, n_features)) – The input data.
Returns: log_prob – Array of class log-probabilities, one log-probability for each (input, class) pair.
Return type: numpy.ndarray of shape (n_samples, n_classes)
predict_proba(X)

Predict class probabilities for X.

Parameters: X (array-like of shape (n_samples, n_features)) – The input data.
Returns: prob – Array of class probabilities, one probability for each (input, class) pair.
Return type: numpy.ndarray of shape (n_samples, n_classes)
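How decision scores might be turned into class probabilities and log-probabilities is sketched below. The link functions are assumptions: a logistic sigmoid for binary scores and a softmax for multiclass scores. The library may scale the scores differently, so treat this as an illustration of the output shapes and of the relation log_prob = log(prob), not as the library's formula.

```python
import math


def proba_from_scores(scores):
    """Turn decision scores into class probabilities (illustrative links).

    Assumption: a logistic sigmoid for binary scores (one column) and a
    softmax for multiclass scores; the library may scale scores differently.
    """
    probs = []
    for row in scores:
        if len(row) == 1:
            p1 = 1.0 / (1.0 + math.exp(-row[0]))
            probs.append([1.0 - p1, p1])
        else:
            shift = max(row)  # subtract the max for numerical stability
            exps = [math.exp(s - shift) for s in row]
            total = sum(exps)
            probs.append([e / total for e in exps])
    return probs


def log_proba_from_scores(scores):
    """Element-wise logarithm of the class probabilities."""
    return [[math.log(p) for p in row] for row in proba_from_scores(scores)]


prob = proba_from_scores([[0.0], [2.0]])
multi = proba_from_scores([[1.0, 1.0, 1.0]])
```

Each row has one entry per class and sums to one, including in the binary case where the decision function has a single column.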
staged_decision_function(X)

Compute decision function of X for each boosting iteration.

This method allows monitoring (e.g., determining the error on a test set) after each boosting iteration.

Parameters: X (array-like of shape (n_samples, n_features)) – The input data.
Yields: scores (numpy.ndarray of shape (n_samples, k)) – The decision function of the input samples. The order of outputs is the same as that of the classes_ attribute. Binary classification is a special case with k = 1; otherwise k = n_classes. For binary classification, positive values indicate class 1 and negative values indicate class 0.
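Conceptually, the staged decision function is a running sum over boosting iterations. The generator below sketches this for the binary case, under the assumption that each stage adds learning_rate times the base estimator's prediction to the current scores. Any constant factors used in [1] or by the library are omitted, and the names are hypothetical.

```python
def staged_scores(stage_outputs, learning_rate=1.0):
    """Yield cumulative binary decision scores after each boosting stage.

    stage_outputs: hypothetical per-stage base estimator predictions,
    one list of length n_samples per boosting iteration.
    """
    n_samples = len(stage_outputs[0])
    scores = [0.0] * n_samples
    for output in stage_outputs:
        scores = [s + learning_rate * f for s, f in zip(scores, output)]
        # Yield a copy so callers can store each stage safely.
        yield list(scores)


stages = list(staged_scores([[1.0, -1.0], [0.5, -2.0]], learning_rate=0.5))
```

The last yielded array corresponds to the scores of the full ensemble.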
staged_predict(X)

Return predictions for X at each boosting iteration.

This generator method yields the ensemble prediction after each iteration of boosting and therefore allows monitoring, such as to determine the prediction on a test set after each boost.

Parameters: X (array-like of shape (n_samples, n_features)) – The input data.
Yields: labels (numpy.ndarray of shape (n_samples,)) – Array of predicted class labels, one for each input, at each boosting iteration.
staged_predict_proba(X)

Predict class probabilities for X at each boosting iteration.

This generator method yields the ensemble predicted class probabilities after each iteration of boosting and therefore allows monitoring, such as to determine the predicted class probabilities on a test set after each boost.

Parameters: X (array-like of shape (n_samples, n_features)) – The input data.
Yields: prob (numpy.ndarray of shape (n_samples, n_classes)) – Array of class probabilities, one probability for each (input, class) pair, at each boosting iteration.
staged_score(X, y, sample_weight=None)

Return staged accuracy scores on the given test data and labels.

This generator method yields the ensemble accuracy score after each iteration of boosting and therefore allows monitoring, such as determining the score on a test set after each boosting iteration.

Parameters:
  • X (array-like of shape (n_samples, n_features)) – The input samples.
  • y (array-like of shape (n_samples,)) – The target values (class labels).
  • sample_weight (array-like of shape (n_samples,) (default=None)) – Weights for the samples.
Yields:

accuracy (float) – Accuracy at each stage of boosting.
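A common use of the staged accuracies is to pick the number of boosting iterations that performs best on held-out data. The helper below is a hypothetical sketch that works on any iterable of accuracies. With a fitted model, one would pass model.staged_score(X_test, y_test) instead of the toy list.

```python
def best_iteration(staged_accuracies):
    """Return (n_iterations, accuracy) for the best stage.

    Ties are resolved in favor of the earliest (smallest) ensemble.
    """
    best_n, best_acc = 0, float("-inf")
    for n, acc in enumerate(staged_accuracies, start=1):
        if acc > best_acc:
            best_n, best_acc = n, acc
    return best_n, best_acc


# Toy accuracies standing in for the values yielded by staged_score().
n_best, acc_best = best_iteration([0.80, 0.90, 0.85, 0.90])
```

Because the input is consumed lazily, the generator from staged_score can be passed directly without materializing all stages first.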