dl.helpers.cluster module
Data Lab helpers for clustering.
- dl.helpers.cluster.constructOutlines(x, y, clusterlabels)
Construct the convex hull (outline) of points in (x,y) feature space.
- Parameters:
x (seq (e.g. tuple, list, 1-d array)) – Location of points in (x,y) feature space (e.g. RA & Dec).
y (seq (e.g. tuple, list, 1-d array)) – Location of points in (x,y) feature space (e.g. RA & Dec).
- Returns:
hull – The convex hull of points (x,y), an instance of scipy.spatial.qhull.ConvexHull.
- Return type:
instance
Example
Given x & y coordinates as 1d sequences:
import numpy as np
import matplotlib.pyplot as plt
points = np.vstack((x,y)).T  # make 2-d array of correct shape
hull = constructOutlines(x,y)
plt.plot(points[hull.vertices,0], points[hull.vertices,1], 'r-', lw=2)  # plot the hull
plt.plot(points[hull.vertices[[-1,0]],0], points[hull.vertices[[-1,0]],1], 'r-', lw=2)  # close the hull: join the last vertex back to the first
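As a rough, self-contained illustration of the example above, the following sketch calls scipy.spatial.ConvexHull (the public import location of the class named under Returns) directly instead of constructOutlines, so it runs without Data Lab installed. The helper name closed_outline and the sample points are hypothetical and not part of dl.helpers.cluster. The sketch shows one way to turn hull.vertices into a closed outline by repeating the first vertex at the end.

```python
import numpy as np
from scipy.spatial import ConvexHull


def closed_outline(x, y):
    """Return (xs, ys) tracing the convex hull of the points as a closed loop.

    Hypothetical helper for illustration only; not part of dl.helpers.cluster.
    """
    points = np.column_stack((x, y))          # shape (n_points, 2)
    hull = ConvexHull(points)                 # hull.vertices indexes into points
    loop = np.append(hull.vertices, hull.vertices[0])  # repeat first vertex to close
    return points[loop, 0], points[loop, 1]


# Four corners of the unit square plus one interior point (made-up data).
x = [0.0, 1.0, 1.0, 0.0, 0.5]
y = [0.0, 0.0, 1.0, 1.0, 0.5]
xs, ys = closed_outline(x, y)
```

The returned arrays can be passed to a single plt.plot(xs, ys, 'r-', lw=2) call, which draws the whole outline including the closing segment.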
- dl.helpers.cluster.findClusters(x, y, method='MiniBatchKMeans', **kwargs)
Find 2D clusters from x & y data.
- Parameters:
x (seq (e.g. tuple, list, 1-d array)) – Location of points in (x,y) feature space, e.g. RA & Dec, but x & y need not be spatial in nature.
y (seq (e.g. tuple, list, 1-d array)) – Location of points in (x,y) feature space, e.g. RA & Dec, but x & y need not be spatial in nature.
method (str) – Cluster finding method from sklearn.cluster to use. Default: 'MiniBatchKMeans' (a streaming implementation of KMeans), which is very fast, but not the most robust. 'DBSCAN' is much more robust, but MUCH slower. For other methods, consult sklearn.cluster.
**kwargs – Any other keyword arguments will be passed to the cluster finding method. If method='MiniBatchKMeans' or 'KMeans', n_clusters (integer number of clusters to find) must be passed, e.g.
clusters = findClusters(x,y,method='MiniBatchKMeans',n_clusters=3)
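To make the difference between the two methods named above more concrete, here is a minimal sketch that calls sklearn.cluster directly on synthetic data. It is not the implementation of findClusters, and how findClusters wraps these classes is not shown in this section. The blob data, the variable names, and the DBSCAN settings (eps, min_samples) are assumptions chosen for this toy case.

```python
import numpy as np
from sklearn.cluster import DBSCAN, MiniBatchKMeans

# Synthetic data (hypothetical): three well-separated 2-D blobs of 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
xy = np.vstack([c + 0.3 * rng.standard_normal((50, 2)) for c in centers])
x, y = xy[:, 0], xy[:, 1]

# sklearn expects one row per point and one column per feature.
X = np.column_stack((x, y))

# MiniBatchKMeans: the number of clusters must be given up front.
km = MiniBatchKMeans(n_clusters=3, n_init=3, random_state=0)
labels_km = km.fit_predict(X)

# DBSCAN: no n_clusters argument; the density scale is set by eps and
# min_samples (values assumed for this toy data). Label -1 marks noise points.
db = DBSCAN(eps=1.0, min_samples=5)
labels_db = db.fit_predict(X)

# Number of clusters DBSCAN found, not counting noise.
n_db = len(set(labels_db.tolist()) - {-1})
```

Both calls return one integer label per input point. DBSCAN infers the number of clusters from the data, which is why it needs no n_clusters keyword, whereas the KMeans variants require it.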