CUDA-ACCELERATED FEATURE SELECTION

Sterling Ramroach1*, Jonathan Herbert2 and Ajay Joshi3

1,3Department of Electrical and Computer Engineering,

2Department of Computing and Information Technology,

The University of the West Indies at Saint Augustine.

1Email: sterling.ramroach@sta.uwi.edu *(Corresponding author)

2Email: jonathan.herbert@my.uwi.edu

3Email: ajay.joshi@sta.uwi.edu

Abstract:

Identifying important features in high-dimensional data is usually done using one-dimensional filtering techniques. These techniques discard noisy attributes and those that are constant throughout the data. This is a time-consuming task that has scope for acceleration via high-performance computing techniques involving the graphics processing unit (GPU). The proposed algorithm is accelerated using the Compute Unified Device Architecture (CUDA) framework developed by Nvidia. This framework facilitates the seamless scaling of computation on any CUDA-enabled GPU. Thus, the Pearson Correlation Coefficient can be computed in parallel for each feature with respect to the response variable. The ranks obtained for each feature can be used to determine the most relevant features to select. Using data from the UCI Machine Learning Repository, our results show an increase in efficiency for multi-dimensional analysis with a more reliable feature importance ranking. When tested on a high-dimensional dataset of 1,000 samples and 10,000 features, we achieved a 1,230-fold speedup using CUDA. This acceleration grows exponentially, as with any embarrassingly parallel task.
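
The per-feature computation described in the abstract can be illustrated with a minimal sequential sketch in Python. This is not the authors' CUDA implementation. It only shows that each feature's Pearson score depends on that feature's column and the response alone, which is what makes the task suitable for one-thread-per-feature parallelism. The function names `pearson` and `rank_features`, the use of the absolute coefficient for ranking, and the choice to score constant features as 0.0 are all assumptions made for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Assumption: a constant sequence (zero variance) is scored 0.0,
    since the coefficient is undefined in that case.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(columns, response):
    """Score every feature column against the response and rank them.

    Each iteration is independent of the others; on a GPU this loop
    body is the work a single thread (or block) could take on.
    Assumption: features are ranked by absolute correlation.
    """
    scores = [abs(pearson(col, response)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return order, scores


# Hypothetical toy data: three features, four samples.
columns = [
    [1.0, 2.0, 3.0, 4.0],  # moves exactly with the response
    [5.0, 5.0, 5.0, 5.0],  # constant throughout the data
    [1.0, 3.0, 2.0, 4.0],  # partially related to the response
]
response = [2.0, 4.0, 6.0, 8.0]

order, scores = rank_features(columns, response)
print(order)   # feature indices, most relevant first
print(scores)  # absolute Pearson score per feature
```

In this toy example the constant feature is ranked last, matching the filtering behaviour described above, where attributes that are constant throughout the data are discarded.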

 

Keywords: CUDA, Feature Selection, High Performance Computing, Pearson Correlation.

https://doi.org/10.47412/JUQG5057

 

 
