Saliya's Blogs

Mostly technical stuff with some interesting moments of life

K-Means Clustering with Chapel

K-Means clustering is simple to comprehend, yet it is a good candidate to benefit from parallelization. The idea is to partition N data points into K clusters by assigning each point to the nearest cluster center. The program starts with K randomly chosen centers, which are then refined iteratively by taking the mean of the points nearest to each center. The refinement continues until the change in centers falls below a predefined threshold or a fixed number of iterations has completed.

Full working code is at https://www.dropbox.com/s/ikfktb3sswmc8vy/kmeans.chpl
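
Before walking through the distributed version, here is a minimal serial sketch of the same algorithm in Chapel to make the iteration structure concrete. The names used here (data, centers, sums, counts, eps, and so on) are illustrative and do not match the full program.

     // Minimal serial sketch of K-Means (illustrative names, not the full program)
     config const n = 2000, k = 4, dim = 2, maxIter = 50, eps = 0.0001;

     var data: [0..#n, 0..#dim] real;              // assume this gets filled with input points
     const CSpace = {0..#k, 0..#dim};
     var centers: [CSpace] real = data[CSpace];    // seed with the first k points

     for 1..maxIter {
       var sums: [CSpace] real;
       var counts: [0..#k] int;

       // assignment step: attach every point to its nearest center
       for p in 0..#n {
         var best = 0;
         var bestDist = max(real);
         for c in 0..#k {
           var dist = 0.0;
           for d in 0..#dim do dist += (data[p,d] - centers[c,d])**2;
           if dist < bestDist {
             bestDist = dist;
             best = c;
           }
         }
         for d in 0..#dim do sums[best,d] += data[p,d];
         counts[best] += 1;
       }

       // update step: move each center to the mean of its assigned points
       var maxMove = 0.0;
       for (c,d) in CSpace {
         if counts[c] > 0 {
           const m = sums[c,d] / counts[c];
           maxMove = max(maxMove, abs(m - centers[c,d]));
           centers[c,d] = m;
         }
       }
       if maxMove <= eps then break;               // converged
     }
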
  • Declaration of the number of points, the dimension of a point, the number of iterations, the number of clusters, and the convergence threshold. Note that these are configurable at launch time by passing them as command-line arguments; for example, passing --numPoints 50000 sets the number of points to 50000 instead of the default 2000. A sample compile-and-run line follows the declarations.
     config var numDim: int = 2,
                numClusters: int = 4,
                numPoints: int  = 2000,
                numIter: int = 50,
                threshold: real = 0.0001;
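    For example, assuming the file is saved as kmeans.chpl (as in the link above) and Chapel was built with multi-locale support, the program could be compiled and run on four locales with overridden settings roughly as follows; the exact launcher flags depend on the local Chapel configuration.
     chpl kmeans.chpl -o kmeans
     ./kmeans -nl 4 --numPoints=50000 --numClusters=8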
  • Definitions of domains. The {0..#N} notation denotes the integer range that starts at zero and counts up N values, i.e. 0, 1, 2, ..., N-1; a tiny example follows the definitions.
     const PointsSpace = {0..#numPoints,0..#numDim};
     const ClusterSpace = {0..#numClusters,0..#numDim};
     const ClusterNumSpace = {0..#numClusters};
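    As a small illustration of the count operator (a hypothetical snippet, not part of the program):
     for i in 0..#4 do write(i, " ");   // prints: 0 1 2 3
     writeln();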
  • Block-distribute the points along the zeroth dimension across locales. The Locales array and numLocales are made available to programs by the Chapel runtime; here the locales are reshaped into a two-dimensional array of size numLocales x 1. This ensures the components of each point stay on the same locale when the data is blocked. A note on populating the points array follows the code.
     var blockedLocaleView = {0..#numLocales,1..1};
     var blockedLocales: [blockedLocaleView] locale = reshape(Locales, blockedLocaleView);
     const BlockedPointsSpace = PointsSpace 
       dmapped  Block(boundingBox=PointsSpace, targetLocales=blockedLocales);
     var points: [BlockedPointsSpace] real;
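    The walkthrough does not show how points gets its data; the full program presumably initializes it with input or random values. One possibility (an assumption, not necessarily what the linked code does) is Chapel's Random module, which can fill the distributed array in place:
     use Random;
     fillRandom(points);   // fill the block-distributed points array with pseudo-random reals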
  • An array to hold the current centers. Chapel's rich domain operations make it possible to assign a subset of the points as the initial centers by slicing with a domain, as shown in the second line.
     var currCenters: [ClusterSpace] real;
     currCenters = points[ClusterSpace];  
  • Replicated arrays to keep local centers and updates. Chapel allows transparent access to array elements on remote locales; however, performance may suffer when data is transferred over the network. It is therefore beneficial to use local arrays when operating on the points of a particular locale. Chapel provides a convenient distribution for this, ReplicatedDist, which creates a copy of the array in each locale; an array reference resolves to the local copy on whichever locale the referring code runs. Also note the use of atomic real and integer variables, which is a workaround for Chapel's not-yet-implemented atomic blocks.
     const ReplClusterSpace = ClusterSpace dmapped ReplicatedDist();
     var localCurrCenters: [ReplClusterSpace] real;
     // using atomic primitives as a work around to not implemented atomic blocks
     var localCenterUpdates: [ReplClusterSpace] atomic real;
     const ReplClusterNumSpace = ClusterNumSpace dmapped ReplicatedDist();
     var localCenterPCounts: [ReplClusterNumSpace] atomic int;
  • The next steps happen iteratively while the centers are refined; the surrounding loop is sketched after these steps.
    • We start by resetting the local arrays as follows. The first line copies the current centers to the localCurrCenters array in each locale; the ReplicatedDist distribution guarantees that this assignment happens on every locale. The next two lines initialize the two local arrays, localCenterUpdates and localCenterPCounts, to zero, and again the distribution guarantees that the two forall loops run over each local copy of the arrays. The three statements are run in parallel by wrapping them inside a cobegin clause.
          cobegin {
            localCurrCenters = currCenters;
            forall lcu in localCenterUpdates do lcu.write(0.0);
            forall lcpc in localCenterPCounts do lcpc.write(0);
          }
    • Next, we compute each point's distance to all cluster centers and assign the point to the nearest one. Note the shift of locales using the on clause on the second line of the code, which guarantees that accesses to the current centers, center updates, and center point counts arrays stay local to that locale. The atomic construct marks where the updates should happen atomically; it is not implemented in Chapel yet, but the atomic variables overcome this and guarantee proper atomic updates. Also note that if the number of clusters were large, the for loop over clusters could be changed to a parallel forall with a slight modification, and conversely, if parallel task creation were unnecessarily expensive, the code could easily be changed to serial execution, as sketched after this code block.
          forall p in {0..#numPoints} {
            on points[p,0] {
              var closestC: int = -1;
              var closestDist: real = MaxDist;
              for c in {0..#numClusters} { // possibility to parallelize
                var dist: atomic real;
                dist.write(0.0);
                forall d in {0..#numDim} {
                  var tmp = points[p,d] - localCurrCenters[c,d];
                  dist.add(tmp*tmp);
                }
                if (dist.read() < closestDist) {
                  closestDist = dist.read();
                  closestC = c;
                }
              }
              forall d in {0..#numDim} {
                atomic { // here's where we need atomic
                  localCenterUpdates[closestC,d].add(points[p,d]);
                }
              }
              localCenterPCounts[closestC].add(1);
            }
          }
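      For instance, the serial variant mentioned above could compute the nearest center with plain accumulators, dropping the per-dimension tasks and the atomic dist variable entirely. This is only a sketch of that part of the on block; MaxDist is the same large constant used in the original code.
          var closestC: int = -1;
          var closestDist: real = MaxDist;
          for c in {0..#numClusters} {
            var dist = 0.0;                  // plain accumulator; no atomic needed serially
            for d in {0..#numDim} {
              const tmp = points[p,d] - localCurrCenters[c,d];
              dist += tmp*tmp;
            }
            if (dist < closestDist) {
              closestDist = dist;
              closestC = c;
            }
          }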
    • Then we collate the center updates from each locale. Note the use of nested forall and cobegin constructs. Also, the tmpLCU and tmpLCPC arrays are not distributed, yet Chapel gives seamless access to their elements even when the referring code runs on a remote locale.
          var tmpLCU: [ClusterSpace] atomic real;
          forall tlcu in tmpLCU do tlcu.write(0.0);
          var tmpLCPC: [0..#numClusters] atomic int;
          forall tlcpc in tmpLCPC do tlcpc.write(0);

          forall loc in Locales {
            on loc do {
              cobegin {
                forall (i,j) in ClusterSpace {
                  tmpLCU[i,j].add(localCenterUpdates[i,j].read());
                }
                forall i in {0..#numClusters} {
                  tmpLCPC[i].add(localCenterPCounts[i].read());
                }
              }
            }
          }
    • Finally, we test for convergence. If the centers have converged, or the number of iterations has exceeded the given limit, we stop and print the results.
          var b: atomic bool;
          b.write(true);
          forall (i,j) in ClusterSpace {
            var center: real = tmpLCU[i,j].read()/tmpLCPC[i].read();
            if (abs(center - currCenters[i,j]) > threshold){
              b.write(false);
            }
            currCenters[i,j] = center;
          }
          converged = b.read();
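    • Putting the iteration together, the steps above sit inside a loop along the following lines. This is only a sketch of the overall control flow (the iter counter and the writeln calls are illustrative); the full program in the linked file is the authoritative version.
          var converged = false;
          var iter = 0;
          while (!converged && iter < numIter) {
            // 1. reset the replicated local arrays (the cobegin block above)
            // 2. assign each point to its nearest center (the forall over points)
            // 3. collate the per-locale updates into tmpLCU and tmpLCPC
            // 4. recompute the centers and set 'converged' (the convergence test above)
            iter += 1;
          }
          writeln("Centers after ", iter, " iterations:");
          writeln(currCenters);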

P.S. I should mention that this code may not be optimal, because it performs a locale shift for every point. Ideally, I would like to implement it as "for each locale, for each point on that locale", but that requires knowing which points are assigned to a particular locale. Currently, Chapel doesn't have a readily available method to retrieve this information; however, the Chapel developers suggested some workarounds in reply to my mail on Identifying Local Indices of Distributed Arrays.
