sklearn.decomposition.non_negative_factorization

sklearn.decomposition.non_negative_factorization(X, W=None, H=None, n_components=None, *, init=None, update_H=True, solver='cd', beta_loss='frobenius', tol=0.0001, max_iter=200, alpha=0.0, l1_ratio=0.0, regularization=None, random_state=None, verbose=0, shuffle=False)

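Compute Non-negative Matrix Factorization (NMF).

Find two non-negative matrices (W, H) whose product approximates the non-negative matrix X. This factorization can be used, for example, for dimensionality reduction, source separation, or topic extraction.

The objective function is: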

0.5 * ||X - WH||_Fro^2
+ alpha * l1_ratio * ||vec(W)||_1
+ alpha * l1_ratio * ||vec(H)||_1
+ 0.5 * alpha * (1 - l1_ratio) * ||W||_Fro^2
+ 0.5 * alpha * (1 - l1_ratio) * ||H||_Fro^2

Where:

||A||_Fro^2 = \sum_{i,j} A_{ij}^2 (Frobenius norm)
||vec(A)||_1 = \sum_{i,j} abs(A_{ij}) (Elementwise L1 norm)
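
As a rough illustration, the objective above can be evaluated directly with NumPy. This is a minimal sketch, not scikit-learn's internal code: the helper name `nmf_objective` and the small matrices are hypothetical and chosen only so the terms are easy to check by hand.

```python
import numpy as np


def nmf_objective(X, W, H, alpha=0.0, l1_ratio=0.0):
    """Evaluate the regularized Frobenius objective written above.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    residual = X - W @ H
    # 0.5 * ||X - WH||_Fro^2
    data_fit = 0.5 * np.sum(residual ** 2)
    # alpha * l1_ratio * (||vec(W)||_1 + ||vec(H)||_1)
    l1_penalty = alpha * l1_ratio * (np.abs(W).sum() + np.abs(H).sum())
    # 0.5 * alpha * (1 - l1_ratio) * (||W||_Fro^2 + ||H||_Fro^2)
    l2_penalty = 0.5 * alpha * (1.0 - l1_ratio) * (np.sum(W ** 2) + np.sum(H ** 2))
    return data_fit + l1_penalty + l2_penalty


X = np.array([[1.0, 2.0], [3.0, 4.0]])
W = np.array([[1.0], [2.0]])
H = np.array([[1.0, 2.0]])

unregularized = nmf_objective(X, W, H)                          # 0.5
regularized = nmf_objective(X, W, H, alpha=1.0, l1_ratio=0.5)   # 0.5 + 3.0 + 2.5 = 6.0
```

With alpha=0 only the data-fit term remains, which matches the default behaviour described in the parameter table.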

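For the multiplicative update ('mu') solver, the Frobenius norm (0.5 * ||X - WH||_Fro^2) can be replaced by another beta-divergence loss by changing the beta_loss parameter.

The objective function is minimized by alternating minimization over W and H. If H is given and update_H=False, only W is solved for.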
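
To make the alternating scheme concrete, here is a minimal NumPy sketch based on the classic Lee and Seung multiplicative updates for the unregularized Frobenius loss. It is an assumption-laden illustration, not scikit-learn's 'cd' or 'mu' implementation: the function names, the random initialization, and the `eps` safeguard are all hypothetical choices.

```python
import numpy as np


def frobenius_loss(X, W, H):
    """0.5 * ||X - WH||_Fro^2 (no regularization)."""
    return 0.5 * np.sum((X - W @ H) ** 2)


def multiplicative_update_sketch(X, n_components, n_iter=200, H=None,
                                 update_H=True, seed=0, eps=1e-10):
    """Alternate multiplicative updates of W and H (illustrative only)."""
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    W = rng.random((n_samples, n_components)) + eps
    if H is None:
        H = rng.random((n_components, n_features)) + eps
    else:
        H = np.array(H, dtype=float)  # copy so the caller's H is untouched
    for _ in range(n_iter):
        # Update W with H held fixed.
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        if update_H:
            # Update H with W held fixed.
            H *= (W.T @ X) / (W.T @ W @ H + eps)
    return W, H


X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])

W1, H1 = multiplicative_update_sketch(X, 2, n_iter=1)
W200, H200 = multiplicative_update_sketch(X, 2, n_iter=200)
loss_1 = frobenius_loss(X, W1, H1)
loss_200 = frobenius_loss(X, W200, H200)

# With update_H=False the given H acts as a constant and only W changes.
H_fixed = np.array([[1.0, 0.2], [0.1, 1.0]])
W_only, H_out = multiplicative_update_sketch(X, 2, H=H_fixed, update_H=False)
```

Because each update is multiplicative and starts from positive entries, W and H stay non-negative without an explicit projection step.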

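Parameter	Description
X	array-like, shape (n_samples, n_features) Constant matrix.
W	array-like, shape (n_samples, n_components) If init='custom', it is used as the initial guess for the solution.
H	array-like, shape (n_components, n_features) If init='custom', it is used as the initial guess for the solution. If update_H=False, it is used as a constant, to solve for W only.
n_components	integer Number of components. If n_components is not set, all features are kept.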
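init	None, 'random', 'nndsvd', 'nndsvda', 'nndsvdar', 'custom' Method used to initialize the procedure. Default: None. Options: None: 'nndsvd' if n_components < n_features, otherwise 'random'. 'random': non-negative random matrices, scaled with sqrt(X.mean() / n_components). 'nndsvd': Nonnegative Double Singular Value Decomposition (NNDSVD) initialization (better for sparseness). 'nndsvda': NNDSVD with zeros filled with the average of X (better when sparsity is not desired). 'nndsvdar': NNDSVD with zeros filled with small random values (generally a faster, less accurate alternative to NNDSVDa when sparsity is not desired). 'custom': use the custom matrices W and H. Changed in version 0.23: the default value of init changed from 'random' to None.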
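update_H	boolean, default: True If True, both W and H are estimated from the initial guesses. If False, only W is estimated.
solver	'cd' / 'mu' Numerical solver to use: 'cd' is a Coordinate Descent solver that uses Fast Hierarchical Alternating Least Squares (Fast HALS). 'mu' is a Multiplicative Update solver. New in version 0.17: Coordinate Descent solver. New in version 0.19: Multiplicative Update solver.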
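beta_loss	float or string, default 'frobenius' String must be one of {'frobenius', 'kullback-leibler', 'itakura-saito'}. Beta divergence to be minimized, measuring the distance between X and the dot product WH. Note that values other than 'frobenius' (or 2) and 'kullback-leibler' (or 1) lead to significantly slower fits. Note that for beta_loss <= 0 (or 'itakura-saito'), the input matrix X cannot contain zeros. Used only in the 'mu' solver. New in version 0.19.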
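tol	float, default: 1e-4 Tolerance of the stopping condition.
max_iter	integer, default: 200 Maximum number of iterations before timing out.
alpha	double, default: 0. Constant that multiplies the regularization terms.
l1_ratio	double, default: 0. The regularization mixing parameter, with 0 <= l1_ratio <= 1. For l1_ratio = 0 the penalty is an elementwise L2 penalty (aka Frobenius norm). For l1_ratio = 1 it is an elementwise L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
regularization	'both' / 'components' / 'transformation' / None Select whether the regularization affects the components (H), the transformation (W), both, or neither.
random_state	int, RandomState instance, default=None Used for NMF initialization (when init == 'nndsvdar' or 'random') and in Coordinate Descent. Pass an int for reproducible results across multiple function calls. See the Glossary.
verbose	integer, default: 0 The verbosity level.
shuffle	boolean, default: False If True, randomize the order of coordinates in the CD solver.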
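
The three named beta_loss options correspond to beta = 2, 1, and 0 of the beta-divergence. The following NumPy sketch evaluates the textbook definitions; the helper `beta_divergence` is hypothetical, is not scikit-learn's internal routine, and assumes strictly positive entries in both X and WH so that every logarithm and ratio is defined.

```python
import numpy as np


def beta_divergence(X, WH, beta):
    """Sum of elementwise beta-divergences d_beta(X_ij | WH_ij).

    Hypothetical helper for illustration; assumes X > 0 and WH > 0.
    """
    X = np.asarray(X, dtype=float)
    WH = np.asarray(WH, dtype=float)
    if beta == 2:  # 'frobenius'
        return 0.5 * np.sum((X - WH) ** 2)
    if beta == 1:  # 'kullback-leibler' (generalized form)
        return np.sum(X * np.log(X / WH) - X + WH)
    if beta == 0:  # 'itakura-saito'
        ratio = X / WH
        return np.sum(ratio - np.log(ratio) - 1.0)
    # General case for any other float beta.
    return np.sum(
        X ** beta + (beta - 1) * WH ** beta - beta * X * WH ** (beta - 1)
    ) / (beta * (beta - 1))


X = np.array([[1.0, 2.0], [3.0, 4.0]])
WH = X + 1.0

frobenius = beta_divergence(X, WH, 2)        # 0.5 * 4 = 2.0
kullback_leibler = beta_divergence(X, WH, 1)
itakura_saito = beta_divergence(X, WH, 0)
```

Each divergence is zero when WH equals X and positive otherwise, which is why any of them can serve as the data-fit term.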

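Returns	Description
W	array-like, shape (n_samples, n_components) Solution to the non-negative least squares problem.
H	array-like, shape (n_components, n_features) Solution to the non-negative least squares problem.
n_iter	int Actual number of iterations.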

References

Cichocki, Andrzej, and Anh-Huy Phan. “Fast local algorithms for large scale nonnegative matrix and tensor factorizations.” IEICE transactions on fundamentals of electronics, communications and computer sciences 92.3: 708-721, 2009.

Fevotte, C., & Idier, J. (2011). Algorithms for nonnegative matrix factorization with the beta-divergence. Neural Computation, 23(9).

Examples

>>> import numpy as np
>>> X = np.array([[1,1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])
>>> from sklearn.decomposition import non_negative_factorization
>>> W, H, n_iter = non_negative_factorization(X, n_components=2,
... init='random', random_state=0)
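
As a possible follow-up to the example above, the sketch below checks the returned factors and then reuses H as a constant with update_H=False to solve for W on new rows. The matrix `X_new` and the variable names are made up for illustration, and the relative error is not quoted because it depends on the library version and solver settings.

```python
import numpy as np
from sklearn.decomposition import non_negative_factorization

X = np.array([[1, 1], [2, 1], [3, 1.2], [4, 1], [5, 0.8], [6, 1]])

# Factorize X into W (6 x 2) and H (2 x 2).
W, H, n_iter = non_negative_factorization(
    X, n_components=2, init='random', random_state=0)

# Relative reconstruction error of the approximation X ~ W @ H.
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)

# Hypothetical new data: keep H constant and solve for W only.
X_new = np.array([[1, 0], [1, 6.1], [1, 0], [1, 4], [3.2, 1], [0, 4]])
W_new, H_same, _ = non_negative_factorization(
    X_new, H=H, n_components=2, update_H=False, random_state=0)

print(W.shape, H.shape, W_new.shape)
```

Holding H fixed this way mirrors projecting new samples onto components that were learned earlier.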