sklearn.neighbors.BallTree

class sklearn.neighbors.BallTree(X, leaf_size=40, metric='minkowski', **kwargs)

BallTree for fast generalized N-point problems.

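Parameter	Description
X	array-like of shape (n_samples, n_features) n_samples is the number of points in the data set, and n_features is the dimension of the parameter space. Note: if X is a C-contiguous array of doubles, the data will not be copied. Otherwise, an internal copy will be made.
leaf_size	positive int, default=40 Number of points at which to switch to brute force. Changing leaf_size will not affect the results of a query, but can significantly impact the speed of a query and the memory required to store the constructed tree. The amount of memory needed to store the tree scales as approximately n_samples / leaf_size. For a specified leaf_size, a leaf node is guaranteed to satisfy leaf_size <= n_points <= 2 * leaf_size, except in the case that n_samples < leaf_size.
metric	str or DistanceMetric object The distance metric to use for the tree. Default='minkowski' with p=2 (that is, the Euclidean metric). See the documentation of the DistanceMetric class for a list of available metrics. ball_tree.valid_metrics gives a list of the metrics that are valid for BallTree.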
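
The effect of leaf_size described above can be checked with a small experiment. The following is a minimal sketch, not part of the original reference; the variable names are illustrative, and it assumes that the third array returned by get_arrays() holds one entry per tree node.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.random_sample((200, 3))  # 200 points in 3 dimensions

# Two trees over the same data, differing only in leaf_size.
fine_tree = BallTree(X, leaf_size=2)
coarse_tree = BallTree(X, leaf_size=40)

dist_fine, ind_fine = fine_tree.query(X[:5], k=3)
dist_coarse, ind_coarse = coarse_tree.query(X[:5], k=3)

# The query results should not depend on leaf_size.
same_neighbors = np.array_equal(ind_fine, ind_coarse)
same_distances = np.allclose(dist_fine, dist_coarse)

# Assumption: element 2 of get_arrays() is the per-node data array,
# so its length reflects how many nodes the tree stores.
n_nodes_fine = len(fine_tree.get_arrays()[2])
n_nodes_coarse = len(coarse_tree.get_arrays()[2])
print(same_neighbors, same_distances, n_nodes_fine, n_nodes_coarse)
```

A smaller leaf_size produces more nodes (more memory), while the neighbors found stay the same.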

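Additional keywords are passed to the distance metric class.
Note: callable functions in the metric parameter are NOT supported for KDTree and BallTree. Function call overhead will result in very poor performance.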
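
As an illustration of passing extra keywords to the metric, the sketch below builds a tree with metric='minkowski' and p=1 (Manhattan distance) and compares the result with a brute-force NumPy computation. The data and variable names are made up for the example.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.random_sample((50, 3))

# p=1 is forwarded to the Minkowski metric, which gives Manhattan distance.
tree = BallTree(X, leaf_size=10, metric="minkowski", p=1)
dist, ind = tree.query(X[:4], k=3)

# Brute-force Manhattan distances from the 4 query points to all points.
brute = np.abs(X[:4, None, :] - X[None, :, :]).sum(axis=-1)
brute_nearest = np.sort(brute, axis=1)[:, :3]

print(np.allclose(dist, brute_nearest))
```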

Attribute	Description
data	memory view The training data

Example: query for k-nearest neighbors

>>> import numpy as np
>>> from sklearn.neighbors import BallTree
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = BallTree(X, leaf_size=2)              # doctest: +SKIP
>>> dist, ind = tree.query(X[:1], k=3)                # doctest: +SKIP
>>> print(ind)  # indices of 3 closest neighbors
[0 3 1]
>>> print(dist)  # distances to 3 closest neighbors
[ 0.          0.19662693  0.29473397]

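Note that the state of the tree is saved in the pickle operation: the tree does not need to be rebuilt upon unpickling.

>>> import numpy as np
>>> import pickle
>>> from sklearn.neighbors import BallTree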
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = BallTree(X, leaf_size=2)        # doctest: +SKIP
>>> s = pickle.dumps(tree)                     # doctest: +SKIP
>>> tree_copy = pickle.loads(s)                # doctest: +SKIP
>>> dist, ind = tree_copy.query(X[:1], k=3)     # doctest: +SKIP
>>> print(ind)  # indices of 3 closest neighbors
[0 3 1]
>>> print(dist)  # distances to 3 closest neighbors
[ 0.          0.19662693  0.29473397]

Query for neighbors within a given radius

>>> import numpy as np
>>> from sklearn.neighbors import BallTree
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((10, 3))  # 10 points in 3 dimensions
>>> tree = BallTree(X, leaf_size=2)     # doctest: +SKIP
>>> print(tree.query_radius(X[:1], r=0.3, count_only=True))
3
>>> ind = tree.query_radius(X[:1], r=0.3)  # doctest: +SKIP
>>> print(ind)  # indices of neighbors within distance 0.3
[3 0 1]

Compute a Gaussian kernel density estimate:

>>> import numpy as np
>>> from sklearn.neighbors import BallTree
>>> rng = np.random.RandomState(42)
>>> X = rng.random_sample((100, 3))
>>> tree = BallTree(X)                # doctest: +SKIP
>>> tree.kernel_density(X[:3], h=0.1, kernel='gaussian')
array([ 6.94114649,  7.83281226,  7.2071716 ])

Compute a two-point auto-correlation function

>>> import numpy as np
>>> from sklearn.neighbors import BallTree
>>> rng = np.random.RandomState(0)
>>> X = rng.random_sample((30, 3))
>>> r = np.linspace(0, 1, 5)
>>> tree = BallTree(X)                # doctest: +SKIP
>>> tree.two_point_correlation(X, r)
array([ 30,  62, 278, 580, 820])

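Methods

Method	Description
`get_arrays`()	Get data and node arrays.
`get_n_calls`()	Get the number of calls.
`get_tree_stats`()	Get tree status.
`kernel_density`(X, h[, kernel, atol, …])	Compute the kernel density estimate at points X with the given kernel, using the distance metric specified at tree creation.
`query`(X[, k, return_distance, dualtree, …])	Query the tree for the k nearest neighbors.
`query_radius`(X, r[, return_distance, …])	Query the tree for neighbors within a radius r.
`reset_n_calls`()	Reset the number of calls to 0.
`two_point_correlation`(X, r[, dualtree])	Compute the two-point correlation function.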

__init__(self, /, *args, **kwargs)

Initialize self. See help(type(self)) for the accurate signature.

get_arrays()

Get data and node arrays.

Returns	Description
arrays	tuple of array Arrays for storing tree data, index, node data and node bounds.
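
A minimal sketch of unpacking the returned tuple follows. The four variable names are illustrative and simply follow the order given in the description above (tree data, index, node data, node bounds).

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.random_sample((10, 3))
tree = BallTree(X, leaf_size=2)

# Names chosen for this example; order follows the description above.
data, idx_array, node_data, node_bounds = tree.get_arrays()

print(np.asarray(data).shape)       # same shape as X
print(np.sort(np.asarray(idx_array)))  # a permutation of the point indices
```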

get_tree_stats()

Get tree status.

Returns	Description
tree_stats	tuple of int (number of trims, number of leaves, number of splits)
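
For illustration, the three values can be unpacked as shown below. This is only a sketch; the names n_trims, n_leaves and n_splits are taken from the description above, and no particular values are assumed.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.random_sample((10, 3))
tree = BallTree(X, leaf_size=2)

# Run a query first so the statistics reflect some work on the tree.
tree.query(X[:3], k=2)

stats = tree.get_tree_stats()
n_trims, n_leaves, n_splits = stats
print(n_trims, n_leaves, n_splits)
```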

kernel_density(X, h, kernel='gaussian', atol=0, rtol=1E-8, breadth_first=True, return_log=False)

Compute the kernel density estimate at points X with the given kernel, using the distance metric specified at tree creation.

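Parameter	Description
X	array-like of shape (n_samples, n_features) An array of points to query. The last dimension should match the dimension of the training data.
h	float The bandwidth of the kernel.
kernel	str, default='gaussian' Specify the kernel to use. Options are 'gaussian', 'tophat', 'epanechnikov', 'exponential', 'linear' and 'cosine'.
atol, rtol	float, default=0, 1e-8 Specify the desired absolute and relative tolerance of the result. If the true result is K_true, then the returned result K_ret satisfies abs(K_true - K_ret) < atol + rtol * K_ret. The defaults are atol=0 and rtol=1e-8.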
breadth_first	bool, default=True If True (default), use a breadth-first search. If False, use a depth-first search. Breadth-first is generally faster for compact kernels and/or high tolerances.
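return_log	bool, default=False Return the logarithm of the result. This can be more accurate than returning the result itself for narrow kernels.

Returns	Description
density	ndarray of shape X.shape[:-1] The array of (log-)density evaluations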
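
The relationship between the plain and logarithmic results can be sketched as follows. The example makes no assumption about how the density is normalized; it only checks that the two return modes agree with each other.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(42)
X = rng.random_sample((100, 3))
tree = BallTree(X)

# Density and log-density at the first three training points.
density = tree.kernel_density(X[:3], h=0.1, kernel="gaussian")
log_density = tree.kernel_density(X[:3], h=0.1, kernel="gaussian",
                                  return_log=True)

# The logarithm of the plain result should match the return_log=True result.
print(np.allclose(np.log(density), log_density))
```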

query(X, k=1, return_distance=True, dualtree=False, breadth_first=False, sort_results=True)

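Query the tree for the k nearest neighbors.

Parameter	Description
X	array-like of shape (n_samples, n_features) An array of points to query
k	int, default=1 The number of nearest neighbors to return
return_distance	bool, default=True If True, return a tuple (d, i) of distances and indices; if False, return the array i
dualtree	bool, default=False If True, use the dual-tree formalism for the query: a tree is built for the query points, and the pair of trees is used to search the space efficiently. This can lead to better performance as the number of points grows large.
breadth_first	bool, default=False If True, query the nodes in a breadth-first manner. Otherwise, query the nodes in a depth-first manner.
sort_results	bool, default=True If True, the distances and indices of each point are sorted on return, so that the first column contains the closest points. Otherwise, neighbors are returned in an arbitrary order.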

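Returns	Description
i	if return_distance == False
(d, i)	if return_distance == True
d	ndarray of shape X.shape[:-1] + (k,), dtype=double Each entry gives the list of distances to the neighbors of the corresponding point.
i	ndarray of shape X.shape[:-1] + (k,), dtype=int Each entry gives the list of indices of the neighbors of the corresponding point.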
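
The return shapes and the default sorting can be checked against a brute-force NumPy search. This is a sketch with made-up data; the variable names are illustrative.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.random_sample((100, 3))        # training points
queries = rng.random_sample((5, 3))    # points to look up
tree = BallTree(X, leaf_size=10)

# (d, i) tuple by default; indices only with return_distance=False.
dist, ind = tree.query(queries, k=4)
ind_only = tree.query(queries, k=4, return_distance=False)

# Brute-force Euclidean distances for comparison.
brute = np.linalg.norm(queries[:, None, :] - X[None, :, :], axis=-1)
brute_ind = np.argsort(brute, axis=1)[:, :4]
brute_dist = np.take_along_axis(brute, brute_ind, axis=1)

print(dist.shape, ind.shape)  # one row per query point, one column per neighbor
```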

query_radius(X, r, return_distance=False, count_only=False, sort_results=False)

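Query the tree for neighbors within a radius r.

Parameter	Description
X	array-like of shape (n_samples, n_features) An array of points to query
r	distance within which neighbors are returned r can be a single value, or an array of values of shape x.shape[:-1] if different radii are desired for each point.
return_distance	bool, default=False If True, return the distances to the neighbors of each point; if False, return only the neighbors. Note that unlike the query() method, setting return_distance=True here adds to the computation time. Not all distances need to be calculated explicitly for return_distance=False. Results are not sorted by default: see the sort_results keyword.
count_only	bool, default=False If True, return only the count of points within distance r; if False, return the indices of all points within distance r. If return_distance == True, setting count_only=True will result in an error.
sort_results	bool, default=False If True, the distances and indices will be sorted before being returned. If False, the results will not be sorted. If return_distance == False, setting sort_results=True will result in an error.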

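Returns	Description
count	if count_only == True
ind	if count_only == False and return_distance == False
(ind, dist)	if count_only == False and return_distance == True
count	ndarray of shape X.shape[:-1], dtype=int Each entry gives the number of neighbors within a distance r of the corresponding point.
ind	ndarray of shape X.shape[:-1], dtype=object Each element is a numpy integer array listing the indices of neighbors of the corresponding point. Note that unlike the results of a k-neighbors query, the returned neighbors are not sorted by distance by default.
dist	ndarray of shape X.shape[:-1], dtype=object Each element is a numpy double array listing the distances corresponding to the indices in ind.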
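
The three return modes and a per-point radius can be sketched as follows, with a brute-force NumPy comparison. The data and variable names are illustrative.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.random_sample((100, 3))
queries = X[:3]
tree = BallTree(X, leaf_size=10)

# Counts only.
counts = tree.query_radius(queries, r=0.3, count_only=True)

# Indices only (unsorted by default).
ind = tree.query_radius(queries, r=0.3)

# Indices and distances, sorted by distance. Note the (ind, dist) order.
ind_sorted, dist_sorted = tree.query_radius(
    queries, r=0.3, return_distance=True, sort_results=True
)

# A different radius for each query point.
radii = np.array([0.1, 0.2, 0.3])
counts_varied = tree.query_radius(queries, r=radii, count_only=True)

# Brute-force Euclidean distances for comparison.
brute = np.linalg.norm(queries[:, None, :] - X[None, :, :], axis=-1)
print(counts, counts_varied)
```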

reset_n_calls()

Reset the number of calls to 0.
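
A small sketch of how the call counter might be used to compare the work done by queries is shown below. It assumes the counter tracks distance computations performed by the tree, which the text above does not state explicitly.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.random_sample((200, 3))
tree = BallTree(X, leaf_size=10)

# Start from a clean counter, then run a query.
tree.reset_n_calls()
tree.query(X[:10], k=3)
calls_after_query = tree.get_n_calls()

# Reset again; the counter should go back to zero.
tree.reset_n_calls()
calls_after_reset = tree.get_n_calls()

print(calls_after_query, calls_after_reset)
```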

two_point_correlation(X, r, dualtree=False)

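Compute the two-point correlation function.

Parameter	Description
X	array-like of shape (n_samples, n_features) An array of points to query. The last dimension should match the dimension of the training data.
r	array-like A one-dimensional array of distances
dualtree	bool, default=False If True, use a dual-tree algorithm. Otherwise, use a single-tree algorithm. Dual-tree algorithms can have better scaling for large N.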

Returns	Description
counts	ndarray counts[i] contains the number of pairs of points with distance less than or equal to r[i]
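
The meaning of counts can be illustrated by a brute-force comparison. In this sketch the query points are the training points themselves, so each point paired with itself (distance 0) is included in every count, and each unordered pair is counted twice.

```python
import numpy as np
from sklearn.neighbors import BallTree

rng = np.random.RandomState(0)
X = rng.random_sample((30, 3))
r = np.array([0.25, 0.5, 0.75])

tree = BallTree(X)
counts = tree.two_point_correlation(X, r)

# Brute force: count (query, training) pairs with distance <= r[i].
brute = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
brute_counts = np.array([(brute <= ri).sum() for ri in r])

print(np.array_equal(counts, brute_counts))
```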