Parallel programming on GPU using CUDA Christopher Umahaeyo; Supervisor: Öykü Akaydın

Author:
Contributor(s):
Language: English
Publication details: Nicosia: Cyprus International University, 2015
Description: XI, 42 p. : tables, color figures ; 30.5 cm + CD
Content type:
  • text
Media type:
  • unmediated
Carrier type:
  • volume
Subject(s):
Contents (incomplete):
1 CHAPTER 1
1 INTRODUCTION
3 CHAPTER 2
3 Introduction
3 LITERATURE REVIEW
3 Implementation of GPU
5 CHAPTER 3
5 GRAPHICAL PROCESSING UNIT
5 Introduction
5 Parallelism
7 Mainstream Computer with GPU
8 Architecture of Graphical Processing Unit
8 General Architecture
9 Today's GPU architecture
11 GeForce GT 440 and its "Fermi Architecture"
11 Streaming Multiprocessor
12 Memory
12 Memory types
13 Memory interactions
14 Kernel Scheduler
14 Multithreaded Instruction Unit
14 Streaming Processor
14 Load/Store Units
14 Special Function Units
15 Warp Scheduler
15 General Purpose Programming with the GPU (GPGPU)
16 CHAPTER 4
16 IMPLEMENTATION USING CUDA
16 Introduction
16 Parallelism with CUDA
17 Single Instruction Multiple Threads (SIMT)
17 Single Instruction, Multiple Registers
17 Single Instruction, Multiple Addresses
18 Single Instruction, Multiple Flow Path
18 CUDA Program Structure
18 CUDA Threads, Blocks, Grid and Kernel
20 Kernel Function Call and Dimension
21 CUDA Memory
22 Heterogeneous Data Transfer
23 CUDA Software Stack
23 Matrix Multiplication Algorithm
24 Sequential Implementation
24 Sequential Pseudocode
24 Sequential Algorithm
25 Parallel Implementation
25 Parallel Pseudocode
25 Parallel Algorithm
26 CHAPTER 5
26 SIMULATION STUDY
26 Introduction
26 Simulation Environment
27 Simulation Results
27 Simulation Results for Sequential Algorithm
28 Simulation Results of Parallel Algorithm in 2D
31 Simulation Results of Parallel Algorithm with Dimension and Topology
31 Simulation Results for Dimension
33 Simulation Results for Topology
34 Performance Analysis
34 Speedup
36 CHAPTER 6
36 CONCLUSION
37 BIBLIOGRAPHY
Abstract: The true internal working of a parallel algorithm depends on the method of exploitation, as well as the hardware capability and the environment in which it is exploited, whether for data-intensive or scientific purposes. In this thesis, we perform parallel matrix multiplication using CUDA on a GPU and compare the results with those of sequential execution on the CPU. Tests are also carried out on varied topologies, with the parallel algorithm executed in its best dimension, to determine suitability. It has been observed that, for large computational domains, the parallel implementation of matrix multiplication provides a significant reduction in execution time. Keywords: GPGPU, CUDA, Parallel Computing, Topology, Dimension, Speedup.
Material type: Thesis

Includes CD

Includes references (p. 37-39)

