The system is implemented on the Smartdust platform, in which PCA-based feature extraction and K-means-based clustering are realized by software. For these software approaches, real-time spike sorting operations may be difficult when the processor operates at low clock rates. A number of FPGA-based hardware architectures have been proposed to expedite the spike sorting. The architectures in [11,12] are able to perform online PCA/GHA feature extraction. The architectures for discrete wavelet transform (DWT)-based feature extraction are proposed in [13,14]. The circuit for the extraction of zero-crossing features (ZCF) of spike trains is presented in [15]. The architecture in [16] is capable of performing self-organizing map (SOM)-based clustering.
A common drawback of these architectures is that they are not able to perform both feature extraction and clustering. Because all of these operations are required for spike sorting, hardware implementation of any single operation may not be able to provide sufficient throughput at the front end.Studies in [17�C19] have proposed GHA architectures for texture classification and face recognition. Although these architecture may be directly used for spike sorting, some architectures are not suited, because of their high area costs and/or long latency. The architecture presented in [17] provides high throughput for GHA training, since it processes all elements of input vectors concurrently. However, the area cost of the architecture grows linearly with the dimension of input vectors.
On the contrary, only one element of input vectors is delivered to architecture proposed in [18] at a time. The architecture therefore has low area cost. Nevertheless, its latency grows linearly with the vector dimension. Hence, the architecture may not be suitable for the training of long spikes. The architecture in [19] separates input vectors into a number of smaller blocks. It then processes one block at a time. The architecture has both the advantages of low area costs and high computational speed [19].In addition to GHA architectures, many FCM architectures [20�C22] have been proposed for image processing. However, some of these AV-951 architecture are difficult to be extended for spike sorting. The architecture in [20] is designed for clustering with only two classes. For spike sorting applications, the number of classes may be larger than two.
The architecture in [21] allows all the membership coefficients associated with a training vector to be computed in parallel. The architecture has both high throughput and high area costs. The high hardware utilization increases the difficulty of integrating FCM with GHA on a single chip. An effective alternative to [20,21] is based on [22], which produces membership coefficients sequentially to reduce the area costs.