GPGPU Computing in C++ with AMP

Some algorithms are massively parallel: applying filters to images, computing convolutions, performing matrix operations, running particle-based physics simulations, and evaluating neural networks, for instance. Such algorithms can be dramatically accelerated by running them on the GPU instead of the CPU. In this course we use Microsoft C++ AMP to write programs that perform their computations on the GPU.

This course goes close to the metal, so we spend a fair amount of time on GPU hardware architecture: since the main goal of GPU programming is better performance, we need to know which patterns work best and how to debug and optimize GPU code.

Microsoft C++ AMP is a C++ extension that adds a small set of keywords for marking functions that should execute on the GPU. This lets us write code that still looks like ordinary C++ and hits a sweet spot between development cost and performance.
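To give a flavour of what this looks like, here is a minimal sketch of a GPU element-wise addition, assuming Visual Studio's C++ AMP headers (`<amp.h>`, Windows/MSVC only); the function and variable names are illustrative:

```cpp
// Sketch only: requires MSVC with C++ AMP support (<amp.h>).
#include <amp.h>
#include <vector>

void add_arrays(std::vector<float>& a, const std::vector<float>& b) {
    using namespace concurrency;
    // array_view wraps host data for use on an accelerator.
    array_view<float, 1> av(static_cast<int>(a.size()), a);
    array_view<const float, 1> bv(static_cast<int>(b.size()), b);
    // restrict(amp) marks the lambda for GPU execution;
    // one logical GPU thread runs per index in av.extent.
    parallel_for_each(av.extent, [=](index<1> i) restrict(amp) {
        av[i] += bv[i];
    });
    av.synchronize(); // copy results back to the host vector
}
```

The `restrict(amp)` keyword, `array_view`, `index`, `extent`, and `parallel_for_each` shown here are exactly the building blocks covered on Day 1.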

    Target audience

    Experienced C++ Developers


    Deep knowledge of multi-threaded programming, and familiarity with task-based concurrency, for instance with Microsoft PPL or Intel TBB.

    What you will learn

    • How to avoid common optimization pitfalls. 
    • When an algorithm benefits from parallelism. 
    • How the underlying hardware enables parallelism. 
    • How to target GPUs from multiple manufacturers with C++ and Microsoft AMP. 
    • How to avoid common pitfalls in parallel and heterogeneous computing. 


    Day 1

    • Introduction
    • C++11 Lambdas
    • Measuring Performance
    • Introduction to CPU and GPU Hardware
      • Memory Types and Caching
      • Vector programming
      • Cores, Threads, Tiles and Warps
    • Methods of writing code for the GPU
      • OpenCL
      • CUDA
      • DirectCompute
      • Microsoft C++ AMP
    • Introduction to AMP
      • AMP Syntax and Data Types
      • array, array_view
      • index
      • extent
      • grid
      • restrict
    • parallel_for_each
      • How to use
      • Optimizing Memory Move and Copy
    • Synchronizing memory with accelerators
      • Implicit synchronization
      • synchronize*()
      • data()
      • Lost Exceptions
    • The fast_math and precise_math namespaces
      • Comparison to “standard” math.
      • Accelerator requirements
      • Example
    • Debugging with WARP
      • Visual Studio Tools
      • GPU Threads
      • Parallel Stacks
      • Parallel Watch
    • Floating Point Numbers
      • How they are handled
      • Why they are different from CPU
      • Performance of float/double operations

    Day 2

    • Tiling
      • Syntax
      • Determining tile size
      • Memory Coalescence
      • Memory Collisions
      • Tile Synchronization
    • AMP Atomic Operations
      • atomic_exchange()
      • atomic_fetch*()
    • Parallel patterns with AMP
      • Map
      • Reduce
      • Scan
      • Pack
    • AMP Accelerators
      • Accelerator properties
      • Shared memory
      • Using multiple accelerators
    • The concurrency::graphics namespace
      • Exploiting the texture cache
    • AMP Error Handling
      • Exceptions
      • Detecting and Recovering from TDR

    Course info

    Course code: T393
    Duration: 2 days
    Price: 24 500 SEK
    Language: English

    Course schedule

    27 May
    21 Nov
    20 May
    14 Nov
    13 May



    Related courses

    • C++ for Experienced Developers

      Category: C++
      Duration: 3 days
      Price: 25 900 SEK
    • Advanced C++

      Category: C++
      Duration: 2 days
      Price: 21 500 SEK
    • GPGPU Computing in C++ with CUDA

      Category: C++
      Duration: 2 days
      Price: 24 500 SEK

    Contact us for details

    +46 40 61 70 720

    All prices excluding VAT