dl.helpers.cluster module

Data Lab helpers for clustering.

dl.helpers.cluster.constructOutlines(x, y, clusterlabels)

Construct the convex hull (outline) of points in (x,y) feature space.

Parameters:
  • x (seq (e.g. tuple,list,1-d array)) – x-coordinates of the points in (x,y) feature space (e.g. RA).

  • y (seq (e.g. tuple,list,1-d array)) – y-coordinates of the points in (x,y) feature space (e.g. Dec).

Returns:

hull – The convex hull of points (x,y), an instance of scipy.spatial.ConvexHull (exposed as scipy.spatial.qhull.ConvexHull in older SciPy releases).

Return type:

ConvexHull instance

Example

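Given x & y coordinates as 1d sequences, and the clusterlabels argument that the signature requires:

import numpy as np
import matplotlib.pyplot as plt
from dl.helpers.cluster import constructOutlines

points = np.vstack((x,y)).T  # make 2-d array of shape (npoints, 2)
hull = constructOutlines(x,y,clusterlabels)
plt.plot(points[hull.vertices,0], points[hull.vertices,1], 'r-', lw=2) # plot the hull
plt.plot(points[hull.vertices[[-1,0]],0], points[hull.vertices[[-1,0]],1], 'r-', lw=2) # close the hull: join the last vertex to the first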
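
For readers without the Data Lab package at hand, the same kind of outline can be sketched directly with scipy.spatial.ConvexHull. The helper name outline_sketch and the toy square below are illustrative assumptions, not part of dl.helpers.cluster, and the sketch is not claimed to match how constructOutlines is implemented.

```python
import numpy as np
from scipy.spatial import ConvexHull


def outline_sketch(x, y):
    """Hypothetical stand-in: convex hull of points given as two 1-d sequences."""
    points = np.vstack((x, y)).T  # 2-d array of shape (n_points, 2)
    return ConvexHull(points)


# Toy data (an assumption): four corners of the unit square plus one interior point.
x = [0.0, 1.0, 1.0, 0.0, 0.5]
y = [0.0, 0.0, 1.0, 1.0, 0.5]
hull = outline_sketch(x, y)

# hull.vertices holds indices into the input points that lie on the hull;
# the interior point (index 4) is not among them.
# Repeat the first vertex at the end so the outline closes on itself.
closed = np.append(hull.vertices, hull.vertices[0])
outline_x = np.asarray(x)[closed]
outline_y = np.asarray(y)[closed]
```

For 2-d input, SciPy lists hull.vertices in counterclockwise order, so plotting outline_x against outline_y draws the closed outline in a single call.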

dl.helpers.cluster.findClusters(x, y, method='MiniBatchKMeans', **kwargs)

Find 2D clusters from x & y data.

Parameters:
  • x (seq (e.g. tuple,list,1-d array)) – x-coordinates of the points in (x,y) feature space, e.g. RA; x & y need not be spatial in nature.

  • y (seq (e.g. tuple,list,1-d array)) – y-coordinates of the points in (x,y) feature space, e.g. Dec; x & y need not be spatial in nature.

  • method (str) – Name of the cluster-finding method from sklearn.cluster to use. Default: ‘MiniBatchKMeans’ (a streaming implementation of KMeans), which is very fast, but not the most robust. ‘DBSCAN’ is much more robust, but MUCH slower. For other methods, consult sklearn.cluster.

  • **kwargs

    Any other keyword arguments will be passed to the cluster finding method. If method=‘MiniBatchKMeans’ or ‘KMeans’, n_clusters (integer number of clusters to find) must be passed, e.g.

    clusters = findClusters(x,y,method='MiniBatchKMeans',n_clusters=3)
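
The idea of selecting a method from sklearn.cluster by name and forwarding **kwargs to it can be sketched as below. This is a minimal illustration under stated assumptions, not the Data Lab implementation: the function name find_clusters_sketch, the choice to return the fitted estimator, and the synthetic two-blob data are all hypothetical.

```python
import numpy as np
import sklearn.cluster


def find_clusters_sketch(x, y, method="MiniBatchKMeans", **kwargs):
    """Hypothetical stand-in: fit a sklearn.cluster estimator chosen by name."""
    X = np.vstack((x, y)).T  # 2-d array of shape (n_points, 2)
    estimator_class = getattr(sklearn.cluster, method)  # e.g. sklearn.cluster.DBSCAN
    estimator = estimator_class(**kwargs)  # extra keywords go to the estimator
    return estimator.fit(X)  # fit() returns the fitted estimator


# Synthetic data (an assumption): two tight, well-separated blobs of 50 points each.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 0.1, 50), rng.normal(5.0, 0.1, 50)])
y = np.concatenate([rng.normal(0.0, 0.1, 50), rng.normal(5.0, 0.1, 50)])

# KMeans-type methods take the number of clusters as a keyword.
km = find_clusters_sketch(x, y, method="MiniBatchKMeans",
                          n_clusters=2, n_init=3, random_state=0)

# DBSCAN takes a neighbourhood radius and a minimum neighbour count instead.
db = find_clusters_sketch(x, y, method="DBSCAN", eps=0.5, min_samples=5)

km_labels = km.labels_  # one integer cluster label per input point
db_labels = db.labels_  # same, with -1 marking points DBSCAN treats as noise
```

In this sketch the per-point labels are read from the labels_ attribute of the fitted scikit-learn estimator; what findClusters itself returns is not stated in this section.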