Parallel programming on GPU using CUDA / Christopher Umahaeyo; Supervisor: Öykü Akaydın
Language: English
Publication details: Nicosia: Cyprus International University, 2015
Description: XI, 42 p.; table, color figure, figure; 30.5 cm; CD
Content type:
- text
- unmediated
- volume
Item type | Current library | Collection | Call number | Status | Notes | Date due | Barcode | Item holds |
---|---|---|---|---|---|---|---|---|
Thesis | CIU LIBRARY Thesis Collection | Thesis Collection | YL 532 U43 2015 | Available | Computer Engineering Department | | T588 | |
Includes CD
Includes references (37-39 p.)
ABSTRACT: The true internal working of a parallel algorithm depends on the method of exploitation, as well as on the hardware capability and the environment in which it is exploited, whether for data-intensive or scientific purposes. In this thesis, we perform parallel programming on matrix multiplication using CUDA on a GPU and compare the results obtained with the sequential execution results on the CPU. Tests are also carried out on varied topologies, with the parallel algorithm executed in its best dimension, to determine suitability. It has been observed that, for large computational domains, the parallel implementation of matrix multiplication provides a significant reduction in execution time. Keywords: GPGPU, CUDA, Parallel Computing, Topology, Dimension, Speedup.
1 CHAPTER 1
1 INTRODUCTION
3 CHAPTER 2
3 Introduction
3 LITERATURE REVIEW
3 Implementation of GPU
5 CHAPTER 3
5 GRAPHICAL PROCESSING UNIT
5 Introduction
5 Parallelism
7 Mainstream Computer with GPU
8 Architecture of Graphical Processing Unit
8 General Architecture
9 Today's GPU architecture
11 GeForce GT 440 and its "Fermi Architecture"
11 Streaming Multiprocessor
12 Memory
12 Memory types
13 Memory interactions
14 Kernel Scheduler
14 Multithreaded Instruction Unit
14 Streaming Processor
14 Load/Store Units
14 Special Function Units
15 Warp Scheduler
15 General Purpose Programming with the GPU (GPGPU)
16 CHAPTER 4
16 IMPLEMENTATION USING CUDA
16 Introduction
16 Parallelism with CUDA
17 Single Instruction, Multiple Threads (SIMT)
17 Single Instruction, Multiple Registers
17 Single Instruction, Multiple Addresses
18 Single Instruction, Multiple Flow Path
18 CUDA Program Structure
18 CUDA Threads, Blocks, Grid and Kernel
20 Kernel Function Call and Dimension
21 CUDA Memory
22 Heterogeneous Data Transfer
23 CUDA Software Stack
23 Matrix Multiplication Algorithm
24 Sequential Implementation
24 Sequential Pseudocode
24 Sequential Algorithm
25 Parallel Implementation
25 Parallel Pseudocode
25 Parallel Algorithm
26 CHAPTER 5
26 SIMULATION STUDY
26 Introduction
26 Simulation Environment
27 Simulation Results
27 Simulation Result for Sequential algorithm
28 Simulation Results of Parallel Algorithm in 2D
31 Simulation Results of Parallel Algorithm with Dimension and Topology
31 Simulation Results for Dimension
33 Simulation Results for Topology
34 Performance Analysis
34 Speedup
36 CHAPTER 6
36 CONCLUSION
37 BIBLIOGRAPHY