A code generator that emits GPU code from a given Halide stmt.
#include <CodeGen_GPU_Dev.h>
| Return type | Member | Description |
|---|---|---|
| virtual | ~CodeGen_GPU_Dev () | |
| virtual void | add_kernel (Stmt stmt, const std::string &name, const std::vector< DeviceArgument > &args)=0 | Compile a GPU kernel into the module. |
| virtual void | init_module ()=0 | (Re)initialize the GPU kernel module. |
| virtual std::vector< char > | compile_to_src ()=0 | |
| virtual std::string | get_current_kernel_name ()=0 | |
| virtual void | dump ()=0 | |
| virtual std::string | api_unique_name ()=0 | This routine returns the GPU API name that is combined into runtime routine names to ensure each GPU API has a unique name. |
| virtual std::string | print_gpu_name (const std::string &name)=0 | Returns the specified name transformed by the variable naming rules for the GPU language backend. |
| virtual bool | kernel_run_takes_types () const | Allows the GPU device specific code to request halide_type_t values to be passed to the kernel_run routine rather than just argument type sizes. |
Definition at line 18 of file CodeGen_GPU_Dev.h.
◆ MemoryFenceType
A mask describing which type of memory fence to use for the gpu_thread_barrier() intrinsic.
Not all GPU APIs support all types.
Enumerator |
---|
None | |
Device | |
Shared | |
Definition at line 75 of file CodeGen_GPU_Dev.h.
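Because the enumeration is described as a mask, the enumerators are presumably meant to be combined with bitwise OR. The sketch below illustrates that usage with a stand-in enum; the numeric values and the helper `wants_fence` are assumptions made for illustration and are not taken from CodeGen_GPU_Dev.h.

```cpp
#include <cassert>

// Stand-in for the enumeration documented above. The numeric values are an
// assumption, chosen so that the enumerators can be OR-ed as bit flags.
enum MemoryFenceType {
    None = 0,
    Device = 1,
    Shared = 2,
};

// Hypothetical helper: true if `mask` requests a fence of type `f`.
inline bool wants_fence(int mask, MemoryFenceType f) {
    return (mask & f) != 0;
}

// Example: a barrier that fences both device and shared memory.
inline void fence_mask_example() {
    int mask = Device | Shared;
    assert(wants_fence(mask, Device));
    assert(wants_fence(mask, Shared));
    assert(!wants_fence(None, Device));
}
```

A backend that cannot honor one of the requested fence types would need to reject or widen the mask; how each real backend handles that is not covered here.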
◆ ~CodeGen_GPU_Dev()
virtual Halide::Internal::CodeGen_GPU_Dev::~CodeGen_GPU_Dev () [virtual]
◆ add_kernel()
virtual void Halide::Internal::CodeGen_GPU_Dev::add_kernel (Stmt stmt, const std::string &name, const std::vector< DeviceArgument > &args) [pure virtual]
Compile a GPU kernel into the module.
This may be called many times with different kernels, which will all be accumulated into a single source module shared by a given Halide pipeline.
◆ init_module()
virtual void Halide::Internal::CodeGen_GPU_Dev::init_module () [pure virtual]
(Re)initialize the GPU kernel module.
This is separate from compile, since a GPU device module will often have many kernels compiled into it for a single pipeline.
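The split between init_module(), add_kernel(), and compile_to_src() can be pictured as "reset once, accumulate many kernels, emit one source blob". The following is a minimal sketch of that lifecycle using simplified stand-in types: `ToyGPUDev`, `ToyTextBackend`, and `build_two_kernel_module` are hypothetical names, kernel bodies are plain strings rather than `Stmt`, and the `DeviceArgument` list is omitted.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for the interface documented here (not the real class).
struct ToyGPUDev {
    virtual ~ToyGPUDev() = default;
    virtual void init_module() = 0;
    virtual void add_kernel(const std::string &body, const std::string &name) = 0;
    virtual std::vector<char> compile_to_src() = 0;
};

// Hypothetical backend that accumulates kernels as text in one module.
struct ToyTextBackend : ToyGPUDev {
    std::string src;

    // Start a fresh module, discarding anything accumulated so far.
    void init_module() override {
        src.clear();
    }

    // Append one kernel to the module; may be called many times.
    void add_kernel(const std::string &body, const std::string &name) override {
        src += "kernel " + name + " { " + body + " }\n";
    }

    // Return the whole accumulated module as a byte buffer.
    std::vector<char> compile_to_src() override {
        return std::vector<char>(src.begin(), src.end());
    }
};

// Example: one module holding two kernels for a single pipeline.
inline std::string build_two_kernel_module() {
    ToyTextBackend dev;
    dev.init_module();
    dev.add_kernel("a();", "k0");
    dev.add_kernel("b();", "k1");
    std::vector<char> bytes = dev.compile_to_src();
    return std::string(bytes.begin(), bytes.end());
}
```

Keeping initialization separate means the caller decides when a module starts, so every kernel added between two init_module() calls ends up in the same compiled source.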
◆ compile_to_src()
virtual std::vector< char > Halide::Internal::CodeGen_GPU_Dev::compile_to_src () [pure virtual]
◆ get_current_kernel_name()
virtual std::string Halide::Internal::CodeGen_GPU_Dev::get_current_kernel_name () [pure virtual]
◆ dump()
virtual void Halide::Internal::CodeGen_GPU_Dev::dump () [pure virtual]
◆ api_unique_name()
virtual std::string Halide::Internal::CodeGen_GPU_Dev::api_unique_name () [pure virtual]
This routine returns the GPU API name that is combined into runtime routine names to ensure each GPU API has a unique name.
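As a rough illustration of how an API name might be combined into a runtime routine name, the sketch below concatenates a prefix, the API name, and a routine suffix. The `halide_<api>_<routine>` pattern, the helper `runtime_routine_name`, and the example strings are assumptions for illustration; the page itself does not specify the exact naming scheme.

```cpp
#include <string>

// Hypothetical helper: builds a per-API runtime routine name.
// The "halide_<api>_<routine>" pattern is an assumption, not taken from this page.
inline std::string runtime_routine_name(const std::string &api_unique_name,
                                        const std::string &routine) {
    return "halide_" + api_unique_name + "_" + routine;
}
```

With this scheme, two backends that return different API names can never produce the same runtime symbol for the same routine, which is the uniqueness property the description asks for.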
◆ print_gpu_name()
virtual std::string Halide::Internal::CodeGen_GPU_Dev::print_gpu_name (const std::string &name) [pure virtual]
Returns the specified name transformed by the variable naming rules for the GPU language backend.
Used to determine the name of a parameter during host codegen.
◆ kernel_run_takes_types()
virtual bool Halide::Internal::CodeGen_GPU_Dev::kernel_run_takes_types () const [inline, virtual]
Allows the GPU device specific code to request halide_type_t values to be passed to the kernel_run routine rather than just argument type sizes.
Definition at line 54 of file CodeGen_GPU_Dev.h.
◆ is_block_uniform()
static bool Halide::Internal::CodeGen_GPU_Dev::is_block_uniform (const Expr &expr) [static]
Checks if expr is block uniform, i.e. does not depend on a thread var.
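One way to picture this check is a recursive walk over an expression tree that fails as soon as it meets a thread variable. The sketch below uses a toy tree rather than Halide's `Expr`; `ToyExpr`, `var`, `op`, `is_thread_var`, and `toy_is_block_uniform` are hypothetical names, and recognizing thread variables by the substring `.__thread_id_` is an assumption made for the example.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node: a variable leaf (non-empty `name`) or an operator
// with child operands. This is a stand-in, not Halide's Expr.
struct ToyExpr {
    std::string name;
    std::vector<std::shared_ptr<ToyExpr>> operands;
};
using ToyExprPtr = std::shared_ptr<ToyExpr>;

// Build a variable leaf.
inline ToyExprPtr var(const std::string &name) {
    auto e = std::make_shared<ToyExpr>();
    e->name = name;
    return e;
}

// Build an operator node over the given operands.
inline ToyExprPtr op(std::vector<ToyExprPtr> operands) {
    auto e = std::make_shared<ToyExpr>();
    e->operands = std::move(operands);
    return e;
}

// Assumption: thread variables are recognized by this marker in their name.
inline bool is_thread_var(const std::string &name) {
    return name.find(".__thread_id_") != std::string::npos;
}

// True if no variable anywhere in the tree is a thread variable.
inline bool toy_is_block_uniform(const ToyExprPtr &e) {
    if (!e->name.empty() && is_thread_var(e->name)) {
        return false;
    }
    for (const auto &child : e->operands) {
        if (!toy_is_block_uniform(child)) {
            return false;
        }
    }
    return true;
}
```

An expression built only from block indices and loop-invariant parameters passes the check, while any subexpression that mentions a thread index makes the whole expression non-uniform.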
◆ is_buffer_constant()
static bool Halide::Internal::CodeGen_GPU_Dev::is_buffer_constant (const Stmt &kernel, const std::string &buffer) [static]
Checks if the buffer is a candidate for constant storage.
Most GPUs (APIs) support a constant memory storage class that cannot be written to and performs well for block uniform accesses. A buffer is a candidate for constant storage if it is never written to, and loads are uniform within the workgroup.
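The two conditions (never written, all loads uniform within the workgroup) can be sketched as a scan over the kernel's memory accesses. The example below models a kernel as a flat list of access records instead of a Halide `Stmt`; `ToyAccess` and `toy_is_buffer_constant` are hypothetical names, and the per-access uniformity flag is assumed to have been computed beforehand.

```cpp
#include <string>
#include <vector>

// Toy record of one memory access inside a kernel (stand-in for IR nodes).
struct ToyAccess {
    std::string buffer;           // buffer being accessed
    bool is_store;                // true for a store, false for a load
    bool index_is_block_uniform;  // assumed precomputed for the access index
};

// A buffer qualifies if the kernel never stores to it and every load from
// it uses a block-uniform index. Accesses to other buffers are ignored.
inline bool toy_is_buffer_constant(const std::vector<ToyAccess> &kernel,
                                   const std::string &buffer) {
    for (const auto &a : kernel) {
        if (a.buffer != buffer) {
            continue;
        }
        if (a.is_store || !a.index_is_block_uniform) {
            return false;
        }
    }
    return true;
}
```

Note that under this sketch a buffer the kernel never touches also qualifies; whether the real check treats that case the same way is not stated on this page.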
◆ scalarize_predicated_loads_stores()
static Stmt Halide::Internal::CodeGen_GPU_Dev::scalarize_predicated_loads_stores (Stmt &s) [static]
Modifies predicated loads and stores to be non-predicated, since most GPU backends do not support predication.
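To make the effect concrete, the sketch below shows what a scalarized predicated load computes at run time: each lane is handled by its own conditional scalar load instead of one masked vector load. It illustrates the semantics only and does not perform the IR rewrite; the function name and the `fallback` value used for masked-off lanes are assumptions for the example.

```cpp
#include <cstddef>
#include <vector>

// Toy model of a predicated vector load after scalarization: lane i reads
// src[base + i] only when pred[i] is true. Masked-off lanes keep `fallback`
// (an assumption; this page does not say what such lanes contain).
inline std::vector<int> scalarized_predicated_load(const std::vector<int> &src,
                                                   std::size_t base,
                                                   const std::vector<bool> &pred,
                                                   int fallback) {
    std::vector<int> lanes(pred.size(), fallback);
    for (std::size_t i = 0; i < pred.size(); i++) {
        if (pred[i]) {
            lanes[i] = src[base + i];  // plain scalar load guarded by a branch
        }
    }
    return lanes;
}
```

Because masked-off lanes never touch memory, the guarded scalar form stays safe even when those lanes would index past the end of the buffer.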