GPGPU Computing in C++ with AMP

Some algorithms are massively parallel: applying filters to images, convolutions, matrix operations, particle-based physics simulations, and evaluating neural networks, for example. Such algorithms can be dramatically accelerated by executing them on the GPU instead of the CPU. In this course we use Microsoft's C++ AMP to write programs that perform their computations on the GPU.

This course goes close to the metal, so we spend a fair amount of time on GPU hardware architecture. Since the main goal of GPU programming is better performance, we need to know which patterns work best and how to debug and optimize GPU code.

Microsoft's C++ AMP is a C++ extension that adds a small amount of syntax, most notably the restrict keyword, for marking functions that should execute on the GPU. This lets us write code that still looks like ordinary C++ and hits a sweet spot between development cost and performance.
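As a taste of what that looks like, here is a minimal C++ AMP sketch (it requires the Microsoft toolchain, since <amp.h> compiles only with Visual C++). The lambda passed to parallel_for_each is ordinary C++ plus a restrict(amp) annotation marking it as GPU code:

```cpp
#include <amp.h>     // Microsoft-specific; requires Visual C++
#include <vector>
using namespace concurrency;

void square(std::vector<float>& v)
{
    // array_view wraps CPU memory so the accelerator can access it.
    array_view<float, 1> av(static_cast<int>(v.size()), v);

    // The lambda body runs on the GPU, one invocation per index.
    parallel_for_each(av.extent, [=](index<1> i) restrict(amp) {
        av[i] = av[i] * av[i];
    });

    av.synchronize();  // copy the results back to the CPU-side vector
}
```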

Target audience

Experienced C++ Developers


Deep knowledge of multi-threaded programming with native threads, and familiarity with task-based concurrency, for instance with Microsoft PPL or Intel TBB.

What you will learn

  • How to avoid common optimization pitfalls. 
  • When an algorithm can benefit from parallelism. 
  • How the underlying hardware contributes to parallelism. 
  • How to take advantage of the GPU across multiple manufacturers with C++ and Microsoft AMP. 
  • How to avoid common parallel and heterogeneous computing pitfalls. 


Day 1

  • Introduction
  • C++11 Lambdas
  • Measuring Performance
  • Introduction to CPU and GPU Hardware
    • Memory Types and Caching
    • Vector programming
    • Cores, Threads, Tiles and Warps
  • Methods of writing code for the GPU
    • OpenCL
    • CUDA
    • DirectCompute
    • Microsoft C++ AMP
  • Introduction to AMP
    • AMP Syntax and Data Types
    • array, array_view
    • index
    • extent
    • grid
    • restrict
  • parallel_for_each
    • How to use
    • Optimizing Memory Move and Copy
  • Synchronizing memory with accelerators
    • Implicit synchronization
    • synchronize*()
    • data()
    • Lost Exceptions
  • The fast_math and precise_math namespaces
    • Comparison to “standard” math.
    • Accelerator requirements
    • Example
  • Debugging with Warp
    • Visual Studio Tools
    • GPU Threads
    • Parallel Stacks
    • Parallel Watch
  • Floating Point Numbers
    • How they are handled
    • Why they are different from CPU
    • Performance of float/double operations

Day 2

  • Tiling
    • Syntax
    • Determining tile size
    • Memory Coalescence
    • Memory Collisions
    • Tile Synchronization
  • AMP Atomic Operations
    • atomic_exchange()
    • atomic_fetch*()
  • Parallel patterns with AMP
    • Map
    • Reduce
    • Scan
    • Pack
  • AMP Accelerators
    • Accelerator properties
    • Shared memory
    • Using multiple accelerators
  • The concurrency::graphics namespace
    • Exploiting the texture cache
  • AMP Error Handling
    • Exceptions
    • Detecting and Recovering from TDR

Course info

Course code: T393
Duration: 2 days
Price: 24 500 SEK
Language: English

Course schedule

There are no set dates for this course at the moment, but contact us and we'll make arrangements!



Related courses

  • C++ for Experienced Developers

    Category: C++
    Duration: 3 days
    Price: 25 900 SEK
  • Advanced C++

    Category: C++
    Duration: 2 days
    Price: 21 500 SEK
  • GPGPU Computing in C++ with CUDA

    Category: C++
    Duration: 2 days
    Price: 24 500 SEK

Contact us for details

+46 40 61 70 720

All prices excluding VAT