sklearn.naive_bayes.BernoulliNB

class sklearn.naive_bayes.BernoulliNB(*, alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None)

Naive Bayes classifier for multivariate Bernoulli models.

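Like MultinomialNB, this classifier is suitable for discrete data. The difference is that while MultinomialNB works with occurrence counts, BernoulliNB is designed for binary/boolean features.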
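The following is a minimal sketch that illustrates the binary/boolean point. It uses a small invented word-count matrix. With the default `binarize=0.0`, the classifier only sees whether each count is non-zero. Fitting on raw counts should therefore give the same feature probabilities as fitting on a hand-binarized copy with `binarize=None`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Invented word counts: rows are documents, columns are vocabulary terms.
X_counts = np.array([
    [3, 0, 1, 0],
    [2, 1, 0, 0],
    [0, 0, 4, 2],
    [0, 1, 2, 5],
])
y = np.array([0, 0, 1, 1])

# Default binarize=0.0: every count > 0 becomes 1 before fitting.
clf_counts = BernoulliNB().fit(X_counts, y)

# Same data binarized by hand; binarize=None means the input is already binary.
X_binary = (X_counts > 0).astype(int)
clf_binary = BernoulliNB(binarize=None).fit(X_binary, y)

print(np.allclose(clf_counts.feature_log_prob_, clf_binary.feature_log_prob_))  # True
```

The counts 3 and 1 become indistinguishable here. If the magnitude of the counts matters for a task, MultinomialNB is the model that uses it.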

Read more in the User Guide.

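Parameter	Description
alpha	float, default=1.0 Additive (Laplace/Lidstone) smoothing parameter (0 for no smoothing).
binarize	float or None, default=0.0 Threshold for binarizing (mapping to booleans) sample features. If None, the input is presumed to already consist of binary vectors.
fit_prior	bool, default=True Whether to learn class prior probabilities. If False, a uniform prior is used.
class_prior	array-like of shape (n_classes,), default=None Prior probabilities of the classes. If specified, the priors are not adjusted according to the data.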
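The sketch below shows how `fit_prior` and `class_prior` determine the fitted `class_log_prior_`. The toy data is invented and has three samples of class 0 and two of class 1.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[1, 0], [1, 1], [0, 1], [1, 0], [0, 0]])
y = np.array([0, 0, 0, 1, 1])  # 3 samples of class 0, 2 of class 1

learned = BernoulliNB().fit(X, y)                      # prior from class frequencies
uniform = BernoulliNB(fit_prior=False).fit(X, y)       # uniform prior
fixed = BernoulliNB(class_prior=[0.9, 0.1]).fit(X, y)  # user-supplied prior, kept as-is

print(np.exp(learned.class_log_prior_))  # [0.6 0.4]
print(np.exp(uniform.class_log_prior_))  # [0.5 0.5]
print(np.exp(fixed.class_log_prior_))    # [0.9 0.1]
```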

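Attribute	Description
class_count_	ndarray of shape (n_classes,) Number of samples encountered for each class during fitting. This value is weighted by the sample weight when provided.
class_log_prior_	ndarray of shape (n_classes,) Log probability of each class (smoothed).
classes_	ndarray of shape (n_classes,) Class labels known to the classifier.
feature_count_	ndarray of shape (n_classes, n_features) Number of samples encountered for each (class, feature) during fitting. This value is weighted by the sample weight when provided.
feature_log_prob_	ndarray of shape (n_classes, n_features) Empirical log probability of features given a class, P(x_i|y).
n_features_	int Number of features of each sample.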
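The attributes above are related. For class c and feature i, the smoothed estimate is P(x_i = 1 | y = c) = (feature_count_[c, i] + alpha) / (class_count_[c] + 2·alpha). This matches the smoothing in scikit-learn's BernoulliNB implementation. The sketch below recomputes `feature_log_prob_` from the count attributes on invented data, so you can check the relation against your installed version.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])  # invented binary data
y = np.array([0, 0, 1, 1])
alpha = 1.0
clf = BernoulliNB(alpha=alpha).fit(X, y)

# feature_count_[c, i]: number of class-c samples whose feature i is 1.
# class_count_[c]: number of class-c samples.
manual = np.log(
    (clf.feature_count_ + alpha) / (clf.class_count_[:, None] + 2 * alpha)
)
print(np.allclose(manual, clf.feature_log_prob_))  # True
```

The `2 * alpha` in the denominator reflects the two possible outcomes (0 or 1) of each Bernoulli feature.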

References

C.D. Manning, P. Raghavan and H. Schuetze (2008). Introduction to Information Retrieval. Cambridge University Press, pp. 234-265. https://nlp.stanford.edu/IR-book/html/htmledition/the-bernoulli-model-1.html

A. McCallum and K. Nigam (1998). A comparison of event models for naive Bayes text classification. Proc. AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48.

V. Metsis, I. Androutsopoulos and G. Paliouras (2006). Spam filtering with naive Bayes – Which naive Bayes? 3rd Conf. on Email and Anti-Spam (CEAS).

Examples

>>> import numpy as np
>>> rng = np.random.RandomState(1)
>>> X = rng.randint(5, size=(6, 100))
>>> Y = np.array([1, 2, 3, 4, 4, 5])
>>> from sklearn.naive_bayes import BernoulliNB
>>> clf = BernoulliNB()
>>> clf.fit(X, Y)
BernoulliNB()
>>> print(clf.predict(X[2:3]))
[3]

Methods

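Method	Description
`fit`(X, y[, sample_weight])	Fit the Naive Bayes classifier according to X, y.
`get_params`([deep])	Get parameters for this estimator.
`partial_fit`(X, y[, classes, sample_weight])	Incremental fit on a batch of samples.
`predict`(X)	Perform classification on an array of test vectors X.
`predict_log_proba`(X)	Return log-probability estimates for the test vector X.
`predict_proba`(X)	Return probability estimates for the test vector X.
`score`(X, y[, sample_weight])	Return the mean accuracy on the given test data and labels.
`set_params`(**params)	Set the parameters of this estimator.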

__init__(*, alpha=1.0, binarize=0.0, fit_prior=True, class_prior=None)

Initialize self. See help(type(self)) for the accurate signature.

fit(X, y, sample_weight=None)

Fit the Naive Bayes classifier according to X, y.

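Parameter	Description
X	{array-like, sparse matrix} of shape (n_samples, n_features) Training vectors, where n_samples is the number of samples and n_features is the number of features.
y	array-like of shape (n_samples,) Target values.
sample_weight	array-like of shape (n_samples,), default=None Weights applied to individual samples (1.0 for unweighted).

Returns	Description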
self	object
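The following is a small sketch with invented data and weights. It shows how `sample_weight` flows into the count attributes. Each sample contributes its weight, rather than 1, to `class_count_` and `feature_count_`.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[1, 0], [1, 1], [0, 1], [0, 0]])  # invented binary data
y = np.array([0, 0, 1, 1])
w = np.array([1.0, 3.0, 0.5, 0.5])             # invented per-sample weights

clf = BernoulliNB().fit(X, y, sample_weight=w)
print(clf.class_count_)  # [4. 1.]  (1 + 3 for class 0, 0.5 + 0.5 for class 1)

# Class 0: feature 0 is set in both rows (1 + 3), feature 1 only in the second (3).
# Class 1: feature 0 is never set (0), feature 1 is set in one row of weight 0.5.
print(clf.feature_count_)
```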

get_params(deep=True)

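Get parameters for this estimator.

Parameter	Description
deep	bool, default=True If True, return the parameters for this estimator and for contained subobjects that are estimators.

Returns	Description
params	mapping of string to any Parameter names mapped to their values.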
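Below is a brief sketch of reading the constructor settings back out. Only constructor arguments appear in the mapping, not fitted attributes. Newer scikit-learn releases may report extra keys (for example `force_alpha`), so the code looks up keys by name instead of comparing against a fixed dict.

```python
from sklearn.naive_bayes import BernoulliNB

clf = BernoulliNB(alpha=0.5, binarize=None)
params = clf.get_params()
print(params["alpha"], params["binarize"], params["fit_prior"])  # 0.5 None True

# The mapping holds constructor arguments, so it can build an unfitted copy.
copy = BernoulliNB(**params)
print(copy.get_params() == params)  # True
```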

partial_fit(X, y, classes=None, sample_weight=None)

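Incremental fit on a batch of samples.

This method is expected to be called several times consecutively on different chunks of a dataset, so as to implement out-of-core or online learning.

Parameter	Description
X	{array-like, sparse matrix} of shape (n_samples, n_features) Training vectors, where n_samples is the number of samples and n_features is the number of features.
y	array-like of shape (n_samples,) Target values.
classes	array-like of shape (n_classes,), default=None List of all the classes that can possibly appear in the y vector. Must be provided at the first call to partial_fit; can be omitted in subsequent calls.
sample_weight	array-like of shape (n_samples,), default=None Weights applied to individual samples (1.0 for unweighted).

Returns	Description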
self	object
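As a minimal sketch, the code below streams random binary data in four invented batches of 25 rows. `classes` is passed only on the first call, because one batch might not contain every label. The model accumulates counts across calls, so the result should match a single `fit` on all the data.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.RandomState(0)
X = rng.randint(2, size=(100, 10))  # invented binary features
y = rng.randint(3, size=100)        # invented labels 0, 1, 2

full = BernoulliNB().fit(X, y)

online = BernoulliNB()
classes = np.unique(y)  # every label that may ever appear
for start in range(0, len(X), 25):
    X_batch, y_batch = X[start:start + 25], y[start:start + 25]
    # classes is required on the first call only; later calls may omit it.
    online.partial_fit(X_batch, y_batch, classes=classes if start == 0 else None)

print(np.allclose(full.feature_log_prob_, online.feature_log_prob_))  # True
print(np.allclose(full.class_log_prior_, online.class_log_prior_))    # True
```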

predict(X)

Perform classification on an array of test vectors X.

Parameter	Description
X	array-like of shape (n_samples, n_features)

Returns	Description
C	ndarray of shape (n_samples,) Predicted target values for X.
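The sketch below relates `predict` to the probability methods on invented random data. The predicted label is the class with the highest posterior, looked up through `classes_`, so the two routes should agree.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.RandomState(42)
X = rng.randint(2, size=(20, 6))          # invented binary features
y = rng.choice(["a", "b", "c"], size=20)  # invented string labels

clf = BernoulliNB().fit(X, y)
pred = clf.predict(X[:5])

# Column index of the largest posterior, mapped back to a label via classes_.
via_proba = clf.classes_[np.argmax(clf.predict_proba(X[:5]), axis=1)]
print(np.array_equal(pred, via_proba))
```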

predict_log_proba(X)

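Return log-probability estimates for the test vector X.

Parameter	Description
X	array-like of shape (n_samples, n_features)

Returns	Description
C	array-like of shape (n_samples, n_classes) Log-probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.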
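To show where these log-probabilities come from, here is a sketch that recomputes them by hand on invented data. A Bernoulli model scores absent features (x_i = 0) through log(1 − p) as well as present ones through log p. It then adds the class log prior and normalizes each row with `scipy.special.logsumexp`. The manual formula is our reconstruction, checked only against `predict_log_proba`.

```python
import numpy as np
from scipy.special import logsumexp
from sklearn.naive_bayes import BernoulliNB

X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])  # invented binary data
y = np.array([0, 0, 1, 1])
clf = BernoulliNB().fit(X, y)

log_p = clf.feature_log_prob_         # log P(x_i = 1 | y)
log_not_p = np.log1p(-np.exp(log_p))  # log P(x_i = 0 | y)

# Joint log-likelihood: present features use log p, absent ones log(1 - p).
jll = X @ log_p.T + (1 - X) @ log_not_p.T + clf.class_log_prior_
# Normalize each row so the posteriors sum to 1 in probability space.
manual = jll - logsumexp(jll, axis=1, keepdims=True)

print(np.allclose(manual, clf.predict_log_proba(X)))  # True
```

Working in log space avoids numerical underflow when many small per-feature probabilities are multiplied together.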

predict_proba(X)

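Return probability estimates for the test vector X.

Parameter	Description
X	array-like of shape (n_samples, n_features)

Returns	Description
C	array-like of shape (n_samples, n_classes) Probability of the samples for each class in the model. The columns correspond to the classes in sorted order, as they appear in the attribute classes_.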
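The following is a small sketch with invented string labels. It shows that the columns follow the sorted `classes_` rather than the order in which labels first appear in `y`, and that each row is a probability distribution.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[1, 0], [1, 1], [0, 1], [0, 0], [1, 0]])  # invented binary data
y = np.array(["spam", "ham", "ham", "ham", "spam"])     # invented labels

clf = BernoulliNB().fit(X, y)
print(clf.classes_)  # ['ham' 'spam']  (sorted, not order of first appearance)

proba = clf.predict_proba(X)
# Column j holds P(y = classes_[j] | x) for each row.
for label, p in zip(clf.classes_, proba[0]):
    print(label, round(float(p), 3))
print(np.allclose(proba.sum(axis=1), 1.0))  # True
```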

score(X, y, sample_weight=None)

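Return the mean accuracy on the given test data and labels.

In multi-label classification, this is the subset accuracy. It is a harsh metric, since the entire label set of each sample must be predicted correctly.

Parameter	Description
X	array-like of shape (n_samples, n_features) Test samples.
y	array-like of shape (n_samples,) or (n_samples, n_outputs) True labels for X.
sample_weight	array-like of shape (n_samples,), default=None Sample weights.

Returns	Description
score	float Mean accuracy of self.predict(X) with respect to y.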
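The sketch below uses invented random data and weights. It checks that `score` is the plain fraction of correct predictions, and that with `sample_weight` it becomes the weighted average of per-sample correctness.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.RandomState(0)
X = rng.randint(2, size=(30, 8))  # invented binary features
y = rng.randint(2, size=30)       # invented labels
clf = BernoulliNB().fit(X, y)

correct = clf.predict(X) == y     # per-sample hit/miss

acc = clf.score(X, y)
print(np.isclose(acc, np.mean(correct)))  # True

w = rng.rand(30)                  # invented sample weights
w_acc = clf.score(X, y, sample_weight=w)
print(np.isclose(w_acc, np.average(correct, weights=w)))  # True
```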

set_params(**params)

Set the parameters of this estimator.

Parameter	Description
**params	dict Estimator parameters.

Returns	Description
self	object Estimator instance.
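Here is a brief sketch using invented data. Because `set_params` returns the estimator itself, it can be chained with `fit`. The new settings only affect the model once it is fitted again.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])  # invented binary data
y = np.array([0, 0, 1, 1])

clf = BernoulliNB()
same = clf.set_params(alpha=0.1, fit_prior=False)
print(same is clf)  # True: the estimator itself is returned
print(clf.alpha)    # 0.1

# Chain with fit; fit_prior=False from the previous call is still in effect.
clf.set_params(alpha=0.5).fit(X, y)
print(np.exp(clf.class_log_prior_))  # [0.5 0.5]
```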

Examples using sklearn.naive_bayes.BernoulliNB

Hashing feature transformation using Totally Random Trees

Classification of text documents using sparse features