OpenCL C++ Bindings

Introduction

For many large applications C++ is the language of choice and so it seems reasonable to define C++ bindings for OpenCL.

The interface is contained within a single C++ header file, opencl.hpp, and all definitions are contained within the namespace cl. There is no additional requirement to include cl.h in order to use either the C++ or the original C bindings; it is enough to simply include opencl.hpp.

The bindings themselves are lightweight and correspond closely to the underlying C API. Using the C++ bindings introduces no additional execution overhead.

There are numerous compatibility, portability and memory management fixes in the new header as well as additional OpenCL 2.0 features. As a result the header is not directly backward compatible and for this reason we release it as opencl.hpp rather than a new version of cl.hpp.

Compatibility

Due to the evolution of the underlying OpenCL API the 2.0 C++ bindings include an updated approach to defining supported feature versions and the range of valid underlying OpenCL runtime versions supported.

The combination of the preprocessor macros CL_HPP_TARGET_OPENCL_VERSION and CL_HPP_MINIMUM_OPENCL_VERSION controls this range. These are three-digit decimal values representing OpenCL runtime versions. The default for the target is 200, representing OpenCL 2.0, and the minimum is also defined as 200. These settings would use 2.0 API calls only. If backward compatibility with a 1.2 runtime is required, the minimum version may be set to 120.

Note that this is a compile-time setting, and so affects linking against a particular SDK version rather than the versioning of the loaded runtime.
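
As an illustration of the version macros described above, the following is a minimal sketch that targets OpenCL 2.0 while allowing a 1.2 runtime. The include of opencl.hpp is commented out so that the sketch compiles without an OpenCL SDK. The ClVersion struct and the decodeVersion helper are hypothetical and not part of the bindings, and the major*100 + minor*10 encoding is an assumption inferred from the values 120 and 200 mentioned in the text.

```cpp
// Both macros must be defined before the header is included.
#define CL_HPP_TARGET_OPENCL_VERSION 200   // build against the OpenCL 2.0 API
#define CL_HPP_MINIMUM_OPENCL_VERSION 120  // still permit an OpenCL 1.2 runtime
// #include <CL/opencl.hpp>                // omitted here; needs an OpenCL SDK

// Hypothetical helper type, for illustration only.
struct ClVersion {
    int major;
    int minor;
};

// Hypothetical helper: split a three-digit value such as 120 into {1, 2}.
// Assumes the encoding major*100 + minor*10.
constexpr ClVersion decodeVersion(int value)
{
    return ClVersion{value / 100, (value % 100) / 10};
}

// The minimum version should never exceed the target version.
static_assert(CL_HPP_MINIMUM_OPENCL_VERSION <= CL_HPP_TARGET_OPENCL_VERSION,
              "minimum OpenCL version must not exceed the target version");
```

With these settings, decodeVersion(CL_HPP_MINIMUM_OPENCL_VERSION) yields major 1 and minor 2.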

The earlier versions of the header included basic vector and string classes based loosely on STL versions. These were difficult to maintain and very rarely used. For the 2.0 header we now assume the presence of the standard library unless requested otherwise. We use std::array, std::vector, std::shared_ptr and std::string throughout to safely manage memory and reduce the chance of a recurrence of earlier memory management bugs.

These classes are used through typedefs in the cl namespace: cl::array, cl::vector, cl::pointer and cl::string. In addition cl::allocate_pointer forwards to std::allocate_shared by default. In all cases these standard library classes can be replaced with custom interface-compatible versions using the CL_HPP_NO_STD_ARRAY, CL_HPP_NO_STD_VECTOR, CL_HPP_NO_STD_UNIQUE_PTR and CL_HPP_NO_STD_STRING macros.
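
The replacement mechanism might look roughly like the sketch below. The macros are the ones named above. The aliases simply point back at the standard library types to keep the sketch compilable, whereas a real replacement would be a project-specific, interface-compatible class. The template parameter list of cl::vector is an assumption, totalSourceLength is a hypothetical helper, and the include of opencl.hpp is commented out so that no OpenCL SDK is needed.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Ask the header not to define cl::vector and cl::string itself.
#define CL_HPP_NO_STD_VECTOR
#define CL_HPP_NO_STD_STRING

namespace cl {
    // Stand-ins for user-supplied, interface-compatible types.
    // Assumption: the header expects this template parameter list.
    template <class T, class Alloc = std::allocator<T>>
    using vector = std::vector<T, Alloc>;
    using string = std::string;
}

// #include <CL/opencl.hpp>  // would follow the definitions above

// Hypothetical helper showing the aliases in use: total number of
// characters across a list of program sources.
inline std::size_t totalSourceLength(const cl::vector<cl::string>& sources)
{
    std::size_t total = 0;
    for (const auto& source : sources) {
        total += source.size();
    }
    return total;
}
```

The important detail is ordering: the macro and the user-provided type both have to be visible before the header is included.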

The OpenCL 1.x versions of the C++ bindings included a size_t wrapper class to interface with kernel enqueue. This caused unpleasant interactions with the standard size_t declaration and led to namespacing bugs. In the 2.0 version we have replaced this with a std::array-based interface. However, the old behaviour can be regained for backward compatibility using the CL_HPP_ENABLE_SIZE_T_COMPATIBILITY macro.
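
To give a feel for the std::array-based style, here is a small sketch that does not depend on the header. Size3, makeRegion2D and elementCount are hypothetical names used only for illustration; the assumption is that three-element sizes such as image origins and regions are passed to the bindings as a fixed-size array of size values.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a three-element size value.
using Size3 = std::array<std::size_t, 3>;

// Hypothetical helper: describe a 2D region as a 3D one with depth 1.
inline Size3 makeRegion2D(std::size_t width, std::size_t height)
{
    return Size3{{width, height, 1}};
}

// Hypothetical helper: number of elements covered by a region.
inline std::size_t elementCount(const Size3& region)
{
    return region[0] * region[1] * region[2];
}
```

Because the type is a plain std::array, it no longer collides with the standard size_t declaration and works with ordinary standard library facilities.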

Finally, the program construction interface used a clumsy vector-of-pairs design in the earlier versions. We have replaced that with a cleaner vector-of-vectors and vector-of-strings design. However, for backward compatibility the old behaviour can be regained with the CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY macro.

In OpenCL 2.0, OpenCL C is not entirely backward compatible with earlier versions. As a result, a flag must be passed to the OpenCL C compiler to request OpenCL 2.0 compilation of kernels, with 1.2 as the default in the absence of the flag. In some cases the C++ bindings automatically compile code for ease of use. For those cases the compilation defaults to OpenCL C 2.0. If this is not wanted, the CL_HPP_CL_1_2_DEFAULT_BUILD macro may be specified to assume 1.2 compilation. If more fine-grained decisions on a per-kernel basis are required, then explicit build operations that take the flag should be used.
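
The effect of the macro on the default build flag can be pictured with the sketch below. defaultBuildOptions is a hypothetical helper, not a function of the bindings; it only mirrors the behaviour described above, using the -cl-std=CL2.0 compiler option and an empty option string for the 1.2 default.

```cpp
#include <string>

// Hypothetical helper mirroring the described default: request OpenCL C 2.0
// unless CL_HPP_CL_1_2_DEFAULT_BUILD has been defined by the user.
inline std::string defaultBuildOptions()
{
#if defined(CL_HPP_CL_1_2_DEFAULT_BUILD)
    return "";                // no flag: the compiler falls back to OpenCL C 1.2
#else
    return "-cl-std=CL2.0";   // explicit request for OpenCL C 2.0
#endif
}
```

For per-kernel control, the same option string can instead be passed to an explicit build call, as the example later in this page does with vectorAddProgram.build("-cl-std=CL2.0").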

Parameters

This header may be parameterized by a set of preprocessor macros.

  • CL_HPP_TARGET_OPENCL_VERSION

    Defines the target OpenCL runtime version to build the header against. Defaults to 200, representing OpenCL 2.0.

  • CL_HPP_NO_STD_STRING

    Do not use the standard library string class. cl::string is not defined and may be defined by the user before opencl.hpp is included.

  • CL_HPP_NO_STD_VECTOR

    Do not use the standard library vector class. cl::vector is not defined and may be defined by the user before opencl.hpp is included.

  • CL_HPP_NO_STD_ARRAY

    Do not use the standard library array class. cl::array is not defined and may be defined by the user before opencl.hpp is included.

  • CL_HPP_NO_STD_UNIQUE_PTR

    Do not use the standard library unique_ptr class. cl::pointer and the cl::allocate_pointer functions are not defined and may be defined by the user before opencl.hpp is included.

  • CL_HPP_ENABLE_EXCEPTIONS

    Enable exceptions for use in the C++ bindings header. This is the preferred error handling mechanism but is not required.

  • CL_HPP_ENABLE_SIZE_T_COMPATIBILITY

    Backward compatibility option to support the cl.hpp-style size_t class. This replaces the updated std::array-derived version and restores the size_t class that was removed from the cl namespace. Note that in this case the size_t class is placed in the cl::compatibility namespace and thus requires an additional using declaration for direct backward compatibility.

  • CL_HPP_ENABLE_PROGRAM_CONSTRUCTION_FROM_ARRAY_COMPATIBILITY

    Enable the older vector-of-pairs interface for construction of programs.

  • CL_HPP_CL_1_2_DEFAULT_BUILD

    Default to OpenCL C 1.2 compilation rather than OpenCL C 2.0. Applies to use of cl::Program construction and other program build variants.

  • CL_HPP_USE_CL_DEVICE_FISSION

    Enable the cl_ext_device_fission extension.

  • CL_HPP_USE_CL_IMAGE2D_FROM_BUFFER_KHR

    Enable the cl_khr_image2d_from_buffer extension.

  • CL_HPP_USE_CL_SUB_GROUPS_KHR

    Enable the cl_khr_subgroups extension.

  • CL_HPP_USE_DX_INTEROP

    Enable the cl_khr_d3d10_sharing extension.

  • CL_HPP_USE_IL_KHR

    Enable the cl_khr_il_program extension.

Example

The following example shows a general use case for the C++ bindings, including support for the optional exception feature and also the supplied vector and string classes. See the following sections for descriptions of these features.

Note: the C++ bindings use std::call_once and therefore may need to be compiled using special command-line options (such as "-pthread") on some platforms!

#define CL_HPP_ENABLE_EXCEPTIONS
#define CL_HPP_TARGET_OPENCL_VERSION 200

#include <CL/opencl.hpp>
#include <iostream>
#include <vector>
#include <memory>
#include <algorithm>

const int numElements = 32;

int main(void)
{
    // Filter for a 2.0 or newer platform and set it as the default
    std::vector<cl::Platform> platforms;
    cl::Platform::get(&platforms);
    cl::Platform plat;
    for (auto &p : platforms) {
        std::string platver = p.getInfo<CL_PLATFORM_VERSION>();
        if (platver.find("OpenCL 2.") != std::string::npos ||
            platver.find("OpenCL 3.") != std::string::npos) {
            // Note: an OpenCL 3.x platform may not support all required features!
            plat = p;
        }
    }
    if (plat() == 0) {
        std::cout << "No OpenCL 2.0 or newer platform found.\n";
        return -1;
    }

    // setDefault returns the platform that is actually in effect
    cl::Platform newP = cl::Platform::setDefault(plat);
    if (newP != plat) {
        std::cout << "Error setting default platform.\n";
        return -1;
    }

    // C++11 raw string literal for the first kernel
    std::string kernel1{R"CLC(
        global int globalA;
        kernel void updateGlobal()
        {
            globalA = 75;
        }
    )CLC"};

    // Raw string literal for the second kernel
    std::string kernel2{R"CLC(
        typedef struct { global int *bar; } Foo;
        kernel void vectorAdd(global const Foo* aNum, global const int *inputA, global const int *inputB,
                              global int *output, int val, write_only pipe int outPipe, queue_t childQueue)
        {
            output[get_global_id(0)] = inputA[get_global_id(0)] + inputB[get_global_id(0)] + val + *(aNum->bar);
            write_pipe(outPipe, &val);
            queue_t default_queue = get_default_queue();
            ndrange_t ndrange = ndrange_1D(get_global_size(0)/2, get_global_size(0)/2);

            // Have a child kernel write into third quarter of output
            enqueue_kernel(default_queue, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange,
                ^{
                    output[get_global_size(0)*2 + get_global_id(0)] =
                        inputA[get_global_size(0)*2 + get_global_id(0)] + inputB[get_global_size(0)*2 + get_global_id(0)] + globalA;
                });

            // Have a child kernel write into last quarter of output
            enqueue_kernel(childQueue, CLK_ENQUEUE_FLAGS_WAIT_KERNEL, ndrange,
                ^{
                    output[get_global_size(0)*3 + get_global_id(0)] =
                        inputA[get_global_size(0)*3 + get_global_id(0)] + inputB[get_global_size(0)*3 + get_global_id(0)] + globalA + 2;
                });
        }
    )CLC"};

    std::vector<std::string> programStrings;
    programStrings.push_back(kernel1);
    programStrings.push_back(kernel2);

    cl::Program vectorAddProgram(programStrings);
    try {
        vectorAddProgram.build("-cl-std=CL2.0");
    }
    catch (...) {
        // Print build info for all devices
        cl_int buildErr = CL_SUCCESS;
        auto buildInfo = vectorAddProgram.getBuildInfo<CL_PROGRAM_BUILD_LOG>(&buildErr);
        for (auto &pair : buildInfo) {
            std::cerr << pair.second << std::endl << std::endl;
        }
        return 1;
    }

    typedef struct { int *bar; } Foo;

    // Get and run kernel that initializes the program-scope global
    // A test for kernels that take no arguments
    auto program2Kernel =
        cl::KernelFunctor<>(vectorAddProgram, "updateGlobal");
    program2Kernel(
        cl::EnqueueArgs(
            cl::NDRange(1)));

    // SVM allocations
    auto anSVMInt = cl::allocate_svm<int, cl::SVMTraitCoarse<>>();
    *anSVMInt = 5;
    cl::SVMAllocator<Foo, cl::SVMTraitCoarse<cl::SVMTraitReadOnly<>>> svmAllocReadOnly;
    auto fooPointer = cl::allocate_pointer<Foo>(svmAllocReadOnly);
    fooPointer->bar = anSVMInt.get();
    cl::SVMAllocator<int, cl::SVMTraitCoarse<>> svmAlloc;
    std::vector<int, cl::SVMAllocator<int, cl::SVMTraitCoarse<>>> inputA(numElements, 1, svmAlloc);
    cl::coarse_svm_vector<int> inputB(numElements, 2, svmAlloc);

    // Traditional cl_mem allocations
    std::vector<int> output(numElements, 0xdeadbeef);
    cl::Buffer outputBuffer(begin(output), end(output), false);
    cl::Pipe aPipe(sizeof(cl_int), numElements / 2);

    // Default command queue, also passed in as a parameter
    cl::DeviceCommandQueue defaultDeviceQueue = cl::DeviceCommandQueue::makeDefault(
        cl::Context::getDefault(), cl::Device::getDefault());

    // One template argument per kernel parameter, in declaration order
    auto vectorAddKernel =
        cl::KernelFunctor<
            decltype(fooPointer)&,
            int*,
            cl::coarse_svm_vector<int>&,
            cl::Buffer,
            int,
            cl::Pipe&,
            cl::DeviceCommandQueue
            >(vectorAddProgram, "vectorAdd");

    // Ensure that the additional SVM pointer is available to the kernel
    // This one was not passed as a parameter
    vectorAddKernel.setSVMPointers(anSVMInt);

    cl_int error;
    vectorAddKernel(
        cl::EnqueueArgs(
            cl::NDRange(numElements/2),
            cl::NDRange(numElements/2)),
        fooPointer,
        inputA.data(),
        inputB,
        outputBuffer,
        3,
        aPipe,
        defaultDeviceQueue,
        error
        );

    cl::copy(outputBuffer, begin(output), end(output));

    std::cout << "Output:\n";
    for (int i = 1; i < numElements; ++i) {
        std::cout << "\t" << output[i] << "\n";
    }
    std::cout << "\n\n";

    return 0;
}