GPU Asisted Parallelization of Nearest Neighbour Calculation

This program executes sequential and parallel versions of Nearest-Neighbour calculation using both GPU parallelizm and CPU parallelizm, outputs the execution times of sequential and parallel versions

Implemented by:

M.Mucahid Benlioglu

Getting Started

Data

There are two sets of 16-dimensional points considered in this application, which will be referred as train set and test set. This application finds points from training set having the shortest Euclidian distances to the points given in test set.

To represent this two sets, 2 CSV files that have 16 columns of numbers, without any header rows, are considered. One of which will represent the train set, and the other will represent the test set. There are 2 files under ./data/ directory provided, which are used for the results shown here.

Compiling

The provided Makefile, ./Makefile is to compile codes into executable in Linux environment. Executing following command in the project root will create an executable named "nearestNeighbour" which takes zero, one or three arguments.

$ make

For Windows use Visual Studio with provided *.sln & *.vxproj files to compile the codes.

Execution

Compiled executable will run either with zero, one or three parameters. If zero parameters are provided, program will use device 0 as the GPU device and will try to input train and test sets from ./data/train.txt and ./data/test.txt respectively. (which are provided) If 1 parameter is provided that will be used as the GPU device id and data files will be assumed in the same place as before. If 3 parameters are provided, first parameter will be used for the path of train set second will be used for test set and third will be used for the GPU device id.

The program will run sequential and parallel versions of the algorithm and output the execution times.

Algorithm & Implementation

To find closest point in train set to a point in test set, Euclidian distances of the point in test set to all points in train set must be checked, but since we are not interested in actual values of distances, we can omit square root part in the calculation and just compare the sums of squares, to reduce the computational complexity.

In both CPU and GPU parallelizm, closest neighbours of multiple points from test set is calculated concurrently, but since number of cores in GPU is significantly larger than CPU, we can find more neighbours in parallel, compared to CPU. Results can be seen in the following sections.

Observations and performance improvements

A quick analysis of the provided data revealed that none of the dimensions in neither train nor test set exceeds 15, which means we can represent a dimension with at most 4 bits of information, or an entire point with 64 bits of information. Therefore, the first realization was using regular 32-bit integers for every dimension would waste 8 times more space than we would need, even 8-bit unsigned chars would cause a waste of space. Therefore, bit fields are used to reduce memory consumption and increase the chances of fitting most of the data into fast memories to imporve performance.

Results

In this section, result tables containing various information about execution results is given.

Table 1. Execution time of the algortihm on GPU and CPU with different number of threads

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
data		data
res		res
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
Makefile		Makefile
README.md		README.md
parallel-nearest-neighbour.sln		parallel-nearest-neighbour.sln

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GPU Asisted Parallelization of Nearest Neighbour Calculation

Getting Started

Data

Compiling

Execution

Algorithm & Implementation

Observations and performance improvements

Results

About

Releases

Packages

Languages

mbenlioglu/parallel-nearest-neighbour

Folders and files

Latest commit

History

Repository files navigation

GPU Asisted Parallelization of Nearest Neighbour Calculation

Getting Started

Data

Compiling

Execution

Algorithm & Implementation

Observations and performance improvements

Results

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages