GPU / GPGPU / TPU programming links / Supercomputing at your fingertips

( What is GPU Computing? )

OpenCL is an open, general-purpose GPU computing language. It is an open standard defined by the Khronos Group. OpenCL provides a cross-platform GPGPU platform that additionally supports data-parallel compute on CPUs, and it is supported on Intel, AMD, Nvidia, and ARM platforms. The Khronos Group is currently involved in the development of SYCL, which has implementations and libraries such as ComputeCpp and SYCL Parallel STL.

Nvidia CUDA is a proprietary framework. Nvidia launched CUDA in 2006 as a software development kit (SDK) and application programming interface (API) that allows algorithms to be coded in the C programming language for execution on GeForce 8 series and later GPUs.

Close to Metal (CTM), later called Stream, is AMD's GPGPU technology for ATI Radeon-based GPUs. The AMD Stream SDK was released under the AMD EULA in December 2007 after the software stack was rewritten. The Stream SDK provides both high-level and low-level tools for general-purpose access to AMD graphics hardware. Using GPUs to perform computations holds a lot of potential for some applications because GPU microarchitectures differ fundamentally from those of CPUs. GPUs achieve much greater throughput (calculations per second) by executing many programs in parallel and restricting flow control (the ability of one program to execute instructions independently of another). Modern GPUs also have addressable on-die memory and extremely high-performance multi-channel external memory. AMD subsequently switched from CTM to OpenCL.
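The trade of throughput against flow control described above can be illustrated with a small CPU-side sketch. It is not GPU code and does not use any vendor API; the names `launch`, `saxpy_kernel` and `clamp_kernel` are made up for this illustration. It only emulates the programming model: the same kernel body runs once per work-item, and a branch is replaced by a select so that every work-item follows the same instruction path.

```python
def launch(kernel, global_size, *args):
    """Emulate a kernel launch: run the same body once per work-item id.

    On a GPU these iterations would run concurrently; here they run in sequence.
    """
    for gid in range(global_size):
        kernel(gid, *args)


def saxpy_kernel(gid, a, x, y, out):
    """Body executed by one work-item; gid selects the element it owns."""
    out[gid] = a * x[gid] + y[gid]


def clamp_kernel(gid, x, out):
    """Clamp negative values to zero without an if/else branch.

    Both alternatives are computed and a predicate picks one, which mirrors
    how hardware with restricted flow control tends to handle divergence.
    """
    negative = x[gid] < 0.0
    out[gid] = negative * 0.0 + (not negative) * x[gid]


if __name__ == "__main__":
    x = [1.0, -2.0, 3.0]
    y = [10.0, 20.0, 30.0]
    out = [0.0] * len(x)
    launch(saxpy_kernel, len(x), 2.0, x, y, out)
    print(out)
```

In a real CUDA or OpenCL program the loop in `launch` disappears: the runtime supplies the work-item id and the hardware schedules the bodies in parallel.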

Programming standards for parallel computing include OpenCL (vendor-independent), OpenACC, and OpenHMPP.

The Xcelerit SDK, created by Xcelerit, is designed to accelerate large existing C++ or C# code-bases on GPUs with minimal effort. It provides a simplified programming model, automates parallelisation, manages devices and memory, and compiles to CUDA binaries. Additionally, multi-core CPUs and other accelerators can be targeted from the same source code.

OpenVIDIA was developed at the University of Toronto between 2003 and 2005, in collaboration with Nvidia.

MATLAB supports GPGPU acceleration using the Parallel Computing Toolbox and MATLAB Distributed Computing Server, and third-party packages like Jacket.

GPGPU processing is also used by physics engines to simulate Newtonian physics; commercial implementations include Havok Physics FX and PhysX, both of which are typically used for computer and video games.

C++ Accelerated Massive Parallelism (C++ AMP) is a library that accelerates execution of C++ code by exploiting the data-parallel hardware on GPUs.

Altimesh Hybridizer by Altimesh compiles Common Intermediate Language to CUDA binaries. It supports generics and virtual functions. Debugging and profiling are integrated with Visual Studio and Nsight. It is available as a Visual Studio extension on the Visual Studio Marketplace.

Microsoft introduced the DirectCompute GPU computing API, released with the DirectX 11 API.

Alea GPU by QuantAlea introduces native GPU computing capabilities for the Microsoft .NET languages F# and C#. Alea GPU also provides a simplified GPU programming model based on GPU parallel-for and parallel aggregate using delegates and automatic memory management.
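The parallel-for and parallel-aggregate pattern mentioned above can be sketched generically as follows. This is not the Alea GPU API (which targets .NET); `parallel_for` and `parallel_aggregate` are hypothetical helper names, and a CPU thread pool stands in for the GPU. The point is only the shape of the model: the caller passes a delegate (here a Python callable) and the library decides how to distribute the work.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def parallel_for(start, stop, body, workers=4):
    """Call body(i) for every i in [start, stop); iterations must be independent."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces all iterations to finish and surfaces any exception.
        list(pool.map(body, range(start, stop)))


def parallel_aggregate(values, combine, workers=4):
    """Reduce values with an associative combine(a, b) via per-chunk partial results."""
    values = list(values)
    if not values:
        raise ValueError("cannot aggregate an empty sequence")
    chunk = max(1, (len(values) + workers - 1) // workers)
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: reduce(combine, c), chunks))
    return reduce(combine, partials)


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    y = [0.0] * len(x)

    def scale(i):
        y[i] = 2.0 * x[i]

    parallel_for(0, len(x), scale)
    print(y, parallel_aggregate(y, lambda a, b: a + b))
```

The aggregate only gives a well-defined result when `combine` is associative, because the partial results are formed per chunk before being merged.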


GPU-Gems Part I


Pharr, M.: GPU Gems 2: Programming Techniques for High-Performance Graphics and General-Purpose Computation. Addison-Wesley, Boston, MA, 2005



Nguyen, H.: GPU Gems 3: Programming Techniques for High-Performance Graphics and General-Purpose Computation. Addison-Wesley, Boston, MA, 2007





Some newer PDF articles stored locally:


GPU Simulation and Rendering of Volumetric Effects for Computer Games and Virtual Environments

Higher order FEM numerical integration on GPUs with OpenCL

Implicit FEM and Fluid Coupling on GPU for Interactive Multiphysics Simulation

Fluid–solid coupling on a cluster of GPU graphics cards for seismic wave propagation

Fast seismic modeling and reverse time migration on a GPU cluster

GPU Cluster Computing For Multigrid FEM-Solvers...  (abstract)

Assembly of Finite Element Methods on Graphics Processors

Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers see also:

Towards a complete FEM-based simulation toolkit on GPUs: Geometric Multigrid solvers(2)

Finite Element Multigrid Solvers for PDE Problems on GPUs and GPU Clusters Part 2: Applications on GPU Clusters

Exploring weak scalability for FEM calculations on a GPU-enhanced cluster

Accelerating Double Precision FEM Simulations with GPUs

Analyzing CUDA Workloads Using a Detailed GPU Simulator

Automated Finite Element Computations in the FEniCS Framework using GPUs

GPU Cluster Computing for Finite Element Applications

Finite Element Integration on GPUs

Efficient Implementation of Finite Element Operators on GPUs

Massively Parallel Micromagnetic FEM Calculations with Graphical Processing Units (GPUs)

Making Faster FEM Solvers, Faster


General GPU:

General Purpose Computation On Graphics Processing Units



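Texts in German:

GPU-basierte Verfahren zur interaktiven Simulation und Darstellung von Fluid-Effekten (GPU-based methods for interactive simulation and rendering of fluid effects)

Implementierung von FEM-Methoden auf programmierbaren Grafikkarten (Implementation of FEM methods on programmable graphics cards)

FFT auf der GPU (FFT on the GPU), by Alexander Kubias

A little bit older (SOFA, see below):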

Efficient nonlinear FEM for soft tissue modelling and its GPU implementation within the open source framework SOFA








NVIDIA Parallel Nsight

NVIDIA Parallel Nsight brings GPU Computing into Microsoft Visual Studio. Debug, profile and analyze GPGPU or graphics applications using CUDA C, OpenCL, DirectCompute, Direct3D, and OpenGL.


NVIDIA PhysX  (2.X)

NVIDIA PhysX SDK 2.X provides game physics solutions for a variety of platforms, including PC (in both software and GPU hardware-accelerated configurations), OS X, Linux, all current major game consoles (PS3, Xbox 360, and Wii), and key mobile computing platforms.


NVIDIA CUDA  (Compute Unified Device Architecture), Nvidia's GPGPU technology for Nvidia GeForce-, Quadro- and Tesla-based GPUs (NVIDIA CUDA german)

Nvidia CUDA Programming Guide for CUDA Toolkit 3.2

Nvidia Developer Web Site

Nvidia Development Whitepapers and Presentations

Nvidia developer resources page

NVIDIA GPU Computing Developer Home Page

Nvidia Free GPU Computing Online Seminars

Nvidia GPU Programming Guide

Nvidia Tesla C1060/C2050/M2090 (512 CUDA cores, up to 665 GFLOPS) (Overview, Specifications, Drivers & Downloads, ...)

M2090: The Next Generation CUDA Architecture, Code Named Fermi (up to 512 CUDA cores) (pdf)

Nvidia GTX 590 /580 / 570 




Stream, AMD/ATI's GPGPU technology for ATI Radeon-based GPUs

AMD Accelerated Parallel Processing (APP) SDK (formerly ATI Stream)

AMD Accelerated Parallel Processing (APP) SDK OpenCL Programming Guide

AMD HD 6990, FireStream 9270 up to 1.2 TFLOPS (single precision), AMD 5970 up to 928 GFLOPS in double precision

AMD ATI FirePro V7800 (Overview, Technical Data, ...)

AMD APP SDK with OpenCL 1.1 Support


Which graphics card to choose: a "best" card does not exist. You have to choose between lists of all cards, high-end cards, or price/performance (G3D Mark / price in $).
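The price/performance figure above is simply the benchmark score divided by the price. A minimal sketch of ranking cards by that ratio follows; the card names, scores and prices are placeholders invented for the example, not real benchmark results, and `rank_by_value` is a hypothetical helper name.

```python
def rank_by_value(cards):
    """Sort (name, g3d_mark, price_usd) tuples by score per dollar, best first."""
    return sorted(cards, key=lambda card: card[1] / card[2], reverse=True)


# Placeholder entries only: not real G3D Mark scores or prices.
cards = [
    ("card_a", 4000, 500.0),
    ("card_b", 3000, 250.0),
    ("card_c", 1500, 100.0),
]

if __name__ == "__main__":
    for name, score, price in rank_by_value(cards):
        print(f"{name}: {score / price:.1f} G3D Mark per $")
```

With these placeholder numbers the card with the highest absolute score ranks last, which is the point of the note: the ranking depends on which criterion you pick.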

There are descriptions on the net of how to flash a 465 to a 470 (not for the faint of heart: do a backup first, and note that not all cards can be flashed to a 470!). German description.

Test your GPU:

    GPU-Z

    FurMark 1.9.0

    GPU Caps Viewer

A supercomputer at your fingertips? GPU supercomputer with 30 TFLOPS (German)

SuperComputer with the same performance as a supercomputer cluster consisting of hundreds of PCs
    Chinese supercomputer




DirectCompute Microsoft's GPU Computing API - Initially released with the DirectX 11 API

Microsoft Accelerator

Microsoft DirectX / DirectCompute

Microsoft Parallel Computing Developer Center




Intel OpenCL SDK (Windows 7 32/64)

Intel C/C++ Compiler


Open source:

OpenCL (Open Computing Language) cross platform GPGPU language for GPUs (AMD/ATI/Nvidia) and general purpose CPUs
Apple's GPU utilization introduced in Mac OS X v10.6 ‘Snow Leopard’

Adventures in OpenCL: Part 1, Getting Started

Adventures in OpenCL: Part 1.5, C++ Bindings

Adventures in OpenCL Part 2: Particles with OpenGL

Brown Deer Technology: OpenCL Tutorial: N-Body Simulation.

Nvidia  OpenCL


OpenCL Programming Guide

OpenCL Quick Reference Card

OpenCL Specification

OpenCV / GpuCV

OpenCV / GpuCV links and downloads

OpenGL and OpenCL Debugger

Open MPI: Open Source High Performance Computing.

OpenMP Application Program Interface. Version 3.0, May 2008 (pdf)

Sh, a GPGPU library for C++
BrookGPU is the Stanford University Graphics group's compiler and runtime implementation of the Brook stream programming language. See also here.
GLSL Shader Programming Resources

CBC Seminar on GPU Programming and Computing

General-Purpose Computation on Graphics Hardware

GNU Scientific Library (GSL)

Research and development community

TPU: Why the GPGPU is Less Efficient than the TPU for DNNs

GPU Resources

GPUSort: High Performance Sorting using Graphics Processors

Mathematica GPU Computing

MATLAB GPU Computing

MIT Open Courseware: Applied Parallel Computing.

MPI standard: The Message Passing Interface Standard.

Intel Xeon E7, Xeon E5000-series processors
AMD Llano, Bulldozer, FX8000 series, FX6000 series, FX4000 series (announcement)

FEA/FEM packages here

Some more GPU/FEM links:

GPU Floating-Point Paranoia

ATILA GPU simulator source code released - Beyond3D Forum

ATILA GPU simulator source code released - 3D Technology & Algorithms

gpuprogramming-project3-final - monkology

SOFA download
SOFA documentation
SOFA alternative link
ForceField

Tag: Finite Element Methods
IEEE Xplore - GPU accelerated fast FEM deformation simulation
GPU acceleration of an unmodified parallel finite element Navier-Stokes solver
GPGPU - Wikipedia, the free encyclopedia
GPU accelerated FEM for simulation and segmentation - NAMIC
Graphics Processor Unit (GPU) acceleration of Time-Domain Finite Element Method (TD-FEM) algorithm (IEEE) | GPU Computing Simulation-Based Engineering Laboratory
GPU Computing
Automated Finite Element Discretization and Solution of Nonlinear FEM Systems /Magma Dynamics

GMH: A Message Passing Toolkit for GPU Clusters

A Memory Efficient and Fast Sparse Matrix Vector Product on a GPU
Aspect: Advanced Solver for Problems in Earth's ConvecTion (download here, svn here)

FEM Utils:

calc4fem Spreadsheet for Structural Engineering (FEM Analysis for beams, trusses, 2D-frames).
Meshgen Meshgen is designed to interactively generate 2D FEM meshes composed of triangular and quadrilateral elements.
fem_converter Conversion of data elements from one format to another (no files released)
Grid3D is a preprocessing tool for FEAST and its predecessor FEATFLOW. It provides a convenient graphical interface to create geometries and coarse grids, define boundary conditions, etc. Grid3D is implemented purely in JAVA. (Downloads and documentation...)

ALBERTA - An adaptive hierarchical finite element toolbox

The ALUGrid Library provides both hexahedral and tetrahedral grids which can be locally adapted and when used for parallel computations the decomposition of the domain can be recomputed.
The PARTY partitioning library provides a variety of different partitioning methods in a simple and easy way. Instead of implementing the methods directly, the user may take advantage of the ready-made methods of the library (on demand).
PreView is a finite element (FE) preprocessor that has been designed specifically to set up FE problems for FEBio.
PostView is a finite element post-processor that is designed to post-process the results from FEBio.
WinFiber3D is a program that allows you to visualize MicroVisu3D files.
WarpLAB is a finite element (FE) post-processing application that is specially designed to post-process warping problems.
OpenDX is a full-featured software package for the visualization of scientific, engineering and analytical data. Its open system design is built on standard interface environments, and its data model provides users with great flexibility in creating visualizations.

Salome pre- & postprocessor

GMV GMV is no longer available for free and is being commercialized.

Tecplot not free, site licence

VTK The Visualization Toolkit (VTK) is an open-source, freely available software system for 3D computer graphics, image processing and visualization.

VTKEdge library of advanced visualization and data processing techniques that complement the Visualization Toolkit.

ParaView is an open-source, multi-platform data analysis and visualization application.

PovRay raytracer

Visit VisIt is a free interactive parallel visualization and graphical analysis tool for viewing scientific data on Unix and PC platforms

GeoMesh (131 KB). simple mesh generator
GenMesh (190 KB) more general mesh generator.
Casca mesh generator (no longer available? manual here). The Casca program can be used to make a general finite element mesh. This can then be read into Geocrack2D.

Netgen is a multi-platform automatic mesh generation tool written in C++ capable of generating meshes in two and three dimensions. The program is open source

Tetgen Open source code for generating tetrahedral meshes. Volume mesh created from surface meshes.

Gmsh: a three-dimensional finite element mesh generator with built-in pre- and post-processing facilities

LaGriT is a library of user callable tools that provide mesh generation, mesh optimization and dynamic mesh maintenance.

List of mesh generators (public domain and commercial). Another one.

CUBIT (free for governmental use, otherwise commercial)

OpenCTM (last updated 2010-01-15). OpenCTM is a file format, a software library and a tool set for compression of 3D triangle meshes. The geometry is compressed to a fraction of comparable file formats (3DS, STL, COLLADA...), and the format is accessible through a simple, portable API.

Some converters on the old ASME/Mecheng website may still be useful: README, FTP, short description of files

General GPU links and code snippets:

Mesh-based Monte Carlo (MMC)
Collins Brain Atlas FEM Mesh Version 2

Monte Carlo eXtreme (MCX)

GPU-based Interactive Simulation of Liver Resection (using SOFA)
Digimouse is a popular mouse atlas (Dogdas2007). FEM mesh Version 1 was created by Qianqian Fang using iso2mesh (Fang2009) version 1.0 and CGAL (CGAL2009).
What is GPU Computing?
CUDA, Supercomputing for the Masses: Part 1-20
Use your GPU for scientific computing
Using Graphics Processors for High Performance IR Query Processing

GPU-based Fast Analysis of Networks
Computational Intelligence Research Lab Graphics Processor Unit (GPU) Site
FLAGON is a library for programming NVIDIA CUDA from Fortran 95
GPU programming concepts
Installing the CUDA SDK
General Purpose GPU Programming
CUDA Game of Life
Mandelbulb stereo anaglyph
CFD CUDA paper from the Rodinia benchmark
Radix sort for doubles: Sort doubles with two 32-bit radix sorts using similar tricks. Here are some performance results
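One common way to realise the idea in the entry above is sketched here on the CPU. It is an assumption about the general technique, not a reproduction of the linked article's code: each double is mapped to a 64-bit unsigned key whose integer order matches numeric order, and the keys are then sorted with two stable 32-bit radix sorts, low word first. All function names are made up for this sketch, and NaN values are not handled.

```python
import struct

SIGN = 0x8000000000000000
ALL = 0xFFFFFFFFFFFFFFFF


def double_to_key(x):
    """Map a double to a 64-bit unsigned key that sorts in numeric order."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    # Negative values: flip every bit. Non-negative values: set the sign bit.
    return bits ^ ALL if bits & SIGN else bits | SIGN


def key_to_double(key):
    """Invert double_to_key."""
    bits = key ^ SIGN if key & SIGN else key ^ ALL
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def radix_sort_u32(items, key, digit_bits=8):
    """Stable LSD radix sort of items by a 32-bit unsigned key(item)."""
    mask = (1 << digit_bits) - 1
    for shift in range(0, 32, digit_bits):
        buckets = [[] for _ in range(mask + 1)]
        for item in items:
            buckets[(key(item) >> shift) & mask].append(item)
        items = [item for bucket in buckets for item in bucket]
    return items


def sort_doubles(values):
    """Sort doubles with two 32-bit radix sorts over the transformed keys."""
    keys = [double_to_key(v) for v in values]
    keys = radix_sort_u32(keys, lambda k: k & 0xFFFFFFFF)  # low 32 bits first
    keys = radix_sort_u32(keys, lambda k: k >> 32)         # then high 32 bits
    return [key_to_double(k) for k in keys]


if __name__ == "__main__":
    print(sort_doubles([3.5, -1.25, 0.0, -100.0, 1e300, -7.0]))
```

The second pass must be stable, otherwise the ordering established by the low-word pass is lost for keys that share the same high word.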
Performance of 3D Deconvolution Algorithms on Multi-Core and Many-Core Architectures

Swarm-NG focuses on the integration of an ensemble of N-body systems evolving under Newtonian gravity.
Code comparing 8 CPUs vs. GPU: Clarity Deconvolution Library 1.0 (with manual)
Fuzzy Logic on the GPU in CUDA

C# Backpropagation library written for GPU
Slideshow for ATI GPGPU physics demonstration by Stanford grad student Mike Houston See p. 13 for overview of mapping of conventional program tasks to GPU hardware.
Tech Report article: "ATI stakes claims on physics, GPGPU ground" by Scott Wasson
GPU economics
Monte Carlo GPUs

Using 3D arrays in CUDA, with explanation
New Open Standard for Many-Core GPUs
SIGGRAPH 2005 GPGPU Course Notes
IEEE VIS 2005 GPGPU Course Notes
Jacket: GPU Engine for MATLAB
Ascalaph Liquid GPU (see also molecular dynamics)
GPGPU Publications, Videos and Software
GP-You Project
GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model - porting a standard model to GPU hardware
GPGPU software catalog
GPGPU Computing @ Duke Statistical Science
Brahma - open-source library written for the .NET 3.5 framework (in C# 3.0). Focus on GPGPU.
Penumbra - open-source library for Clojure. Penumbra is a Clojure wrapper for LWJGL that includes s-expression representation of GLSL and GPGPU.
OpenCL Studio Integrated development environment for OpenCL.
Monte Carlo of diffuse light propagation (photon migration) CUDA-based codes for Monte Carlo simulation of light transport
GPGPU Programming in F# using the Microsoft Research Accelerator system.
ViennaCL scientific computing library compatible with uBLAS for GPUs and multi-core CPUs written in C++ and based on OpenCL.
GPGPU Image Post-Processing GPU accelerated examples of Paint.NET's blur effects with performance comparison.
VizExperts provide HPC solutions and training.
Intro to GPGPU featuring CUDA and OpenCL
GPGPU Review, European Physical Journal Special Topics 194, 87-119 (2011)
CUDAfy.NET Open source library for the .NET framework for programming CUDA GPUs. Supports device code in native .NET; and CURAND, CUBLAS and CUFFT.
See also: Bandwidth intensive 3-D FFT kernel for GPUs using CUDA (pdf)
Introduction to GPU Programming with GLSL: SIBGRAPI 2009, Ricardo Marroquim, André Maximo. Motivation, architecture, language, examples, wrap-up. Tutorial project:

Introduction to GPU Programming with GLSL: Ricardo Marroquim, Istituto di Scienza e Tecnologie dell'Informazione, CNR, Pisa, Italy; Andr ...

GPU Programming and GLSL: 15-466 Computer Game Programming, Carnegie Mellon University, Spring 2007 (James Kuffner)

GPU Programming using GLSL and VTK: vizNET Conference 2009. Graphics shaders: procedural graphics shaders have been around since the early days of computing. Developed by scientists ...

Release Notes for NVIDIA OpenGL Shading Language Support November 9, 2006 These release notes explain the implementation status of the OpenGL Shading Language GLSL ...

GPU Christmas Tree Rendering: January 2007 1 Beta Release This is the beta version of the Christmas tree rendering whitepaper. A final version will be released in a later SDK

AMD - Introduction to OpenGL 3.0 Introduction OpenGL continues to evolve, growing alongside the hardware that supports it. With the release of the latest version of OpenGL

SiftGPU Manual Changchang Wu University of North Carolina at Chapel Hill Introduction SiftGPU is a GPU implementation of David Lowe's Scale Invariant Feature

The OpenGL Shading Language: Introduction This document specifies only version 1.30 of the OpenGL Shading Language. It requires __VERSION__ to substitute 130, and requires #version to …

Speed-up of Algorithms With Graphics Processing Units (GPU): Part I of IV Derek Anderson and Robert Luke * Electrical and Computer Engineering Department

Intro to OpenGL Shading Language (GLSL) Why should we care? •We can do lots of really cool stuff in real-time, without overworking the CPU •Some Examples

Step-Through Debugging of GLSL Shaders Hilgart, Mark School of Computer Science, DePaul University, Chicago, USA

Intro to GLSL (OpenGL Shading Language): Worcester Polytechnic Institute. Q: What is a Programmable GPU & Why do we need it?

RTSL: a Ray Tracing Shading Language. Steven G. Parker, Solomon Boulos, James Bigler, Austin Robison; SCI Institute, University of Utah, School of Computing

PyStream: Python Shaders on the GPU (PyStream vs. GLSL)

Ray Tracing on GPU: University of Applied Sciences Basel (FHBB), Diploma Thesis DA070405
OpenGL "Hello, world!" by Ian Romanick. This work is licensed under the Creative Commons Attribution Non-commercial Share Alike (by-nc-sa)

GLC_lib GLC_lib is a C++ library for high performance 3D application based on OpenGL and QT4 GUI. Some GLC_lib features : Supported file format : 3DS, OBJ, COLLADA, 3DXML, OFF, STL. Easy view manipulation, Level of detail, shaders

Some Wikipedia pages (select your preferred language):






FEA/FEM packages: Wikipedia article


GPU Computing Gems Emerald Edition (Applications of GPU Computing Series) by Wen-mei W. Hwu (hardcover)
CUDA by Example: An Introduction to General-Purpose GPU Programming by Jason Sanders (paperback)

Programming Massively Parallel Processors: A Hands-on Approach (Applications of GPU Computing Series) by David B. Kirk (paperback)

Programming utilities here