A code generator that emits GPU code from a given Halide stmt. More...

#include <CodeGen_GPU_Dev.h>

Public Types
enum	MemoryFenceType { None = 0, Device = 1, Shared = 2 }
	A mask describing which type of memory fence to use for the gpu_thread_barrier() intrinsic. More...

Public Member Functions
virtual	~CodeGen_GPU_Dev ()

virtual void	add_kernel (Stmt stmt, const std::string &name, const std::vector< DeviceArgument > &args)=0
	Compile a GPU kernel into the module. More...

virtual void	init_module ()=0
	(Re)initialize the GPU kernel module. More...

virtual std::vector< char >	compile_to_src ()=0

virtual std::string	get_current_kernel_name ()=0

virtual void	dump ()=0

virtual std::string	api_unique_name ()=0
	This routine returns the GPU API name that is combined into runtime routine names to ensure each GPU API has a unique name. More...

virtual std::string	print_gpu_name (const std::string &name)=0
	Returns the specified name transformed by the variable naming rules for the GPU language backend. More...

virtual bool	kernel_run_takes_types () const
	Allows the GPU device specific code to request halide_type_t values to be passed to the kernel_run routine rather than just argument type sizes. More...

Static Public Member Functions
static bool	is_gpu_var (const std::string &name)

static bool	is_gpu_block_var (const std::string &name)

static bool	is_gpu_thread_var (const std::string &name)

static bool	is_block_uniform (const Expr &expr)
	Checks if expr is block uniform, i.e. does not depend on a thread var. More...

static bool	is_buffer_constant (const Stmt &kernel, const std::string &buffer)
	Checks if the buffer is a candidate for constant storage. More...

static Stmt	scalarize_predicated_loads_stores (Stmt &s)
	Modifies predicated loads and stores to be non-predicated, since most GPU backends do not support predication. More...

Detailed Description

A code generator that emits GPU code from a given Halide stmt.

Definition at line 18 of file CodeGen_GPU_Dev.h.

Member Enumeration Documentation

◆ MemoryFenceType

enum Halide::Internal::CodeGen_GPU_Dev::MemoryFenceType

A mask describing which type of memory fence to use for the gpu_thread_barrier() intrinsic.

Not all GPU APIs support all types.
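Because the enumerators are described as a mask, the values 1 and 2 can be read as independent bits. The following is a minimal sketch of that reading, not Halide code: the enum is a local stand-in that copies the values listed on this page, and `combine_fences` and `has_fence` are hypothetical helpers introduced only for illustration.

```cpp
#include <cassert>

// Local stand-in that mirrors the enumerator values documented here.
// It is not the declaration from CodeGen_GPU_Dev.h.
enum MemoryFenceType {
    None = 0,
    Device = 1,
    Shared = 2,
};

// Hypothetical helper: OR two fence kinds together into a single mask.
inline int combine_fences(MemoryFenceType a, MemoryFenceType b) {
    return static_cast<int>(a) | static_cast<int>(b);
}

// Hypothetical helper: test whether `mask` has the given fence bit set.
inline bool has_fence(int mask, MemoryFenceType bit) {
    return (mask & static_cast<int>(bit)) != 0;
}

// Small usage example: request both fences, then query each bit.
inline void fence_mask_example() {
    int mask = combine_fences(Device, Shared);
    assert(has_fence(mask, Device));
    assert(has_fence(mask, Shared));
    assert(!has_fence(None, Shared));
}
```

Whether a given backend honours a combined mask is up to that backend, since not every API supports every fence type.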

Enumerator
None
Device
Shared

Definition at line 79 of file CodeGen_GPU_Dev.h.

Constructor & Destructor Documentation

◆ ~CodeGen_GPU_Dev()

virtual Halide::Internal::CodeGen_GPU_Dev::~CodeGen_GPU_Dev ( )

virtual

Member Function Documentation

◆ add_kernel()

virtual void Halide::Internal::CodeGen_GPU_Dev::add_kernel	(	Stmt	stmt,
		const std::string &	name,
		const std::vector< DeviceArgument > &	args
	)

pure virtual

Compile a GPU kernel into the module.

This may be called many times with different kernels, which will all be accumulated into a single source module shared by a given Halide pipeline.

◆ init_module()

virtual void Halide::Internal::CodeGen_GPU_Dev::init_module ( )

pure virtual

(Re)initialize the GPU kernel module.

This is separate from compile, since a GPU device module will often have many kernels compiled into it for a single pipeline.
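The intended call order (initialize once, add several kernels, then retrieve the source) can be illustrated with a toy stand-in. This is only a sketch under loud assumptions: `ToyGpuModule` is not a Halide class, kernels are plain strings rather than `Stmt`, the `DeviceArgument` list is omitted, and the emitted text format is invented.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for the module lifecycle described above. NOT Halide's
// class: kernel bodies are strings and the output format is made up.
class ToyGpuModule {
public:
    // (Re)initialize: discard any kernels accumulated so far.
    void init_module() {
        src_.clear();
    }

    // Append one kernel to the single shared source module.
    void add_kernel(const std::string &body, const std::string &name) {
        src_ += "kernel " + name + " { " + body + " }\n";
    }

    // Return the accumulated source as a byte buffer.
    std::vector<char> compile_to_src() const {
        return std::vector<char>(src_.begin(), src_.end());
    }

private:
    std::string src_;
};

// Usage: one module, two kernels, one source blob for the pipeline.
inline std::string build_two_kernels() {
    ToyGpuModule m;
    m.init_module();
    m.add_kernel("a", "k0");
    m.add_kernel("b", "k1");
    std::vector<char> src = m.compile_to_src();
    return std::string(src.begin(), src.end());
}
```

Keeping initialization separate from kernel addition is what lets both kernels land in the same module; calling `init_module()` again would start a fresh one.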

◆ compile_to_src()

virtual std::vector<char> Halide::Internal::CodeGen_GPU_Dev::compile_to_src ( )

pure virtual

◆ get_current_kernel_name()

virtual std::string Halide::Internal::CodeGen_GPU_Dev::get_current_kernel_name ( )

pure virtual

◆ dump()

virtual void Halide::Internal::CodeGen_GPU_Dev::dump ( )

pure virtual

◆ api_unique_name()

virtual std::string Halide::Internal::CodeGen_GPU_Dev::api_unique_name ( )

pure virtual

This routine returns the GPU API name that is combined into runtime routine names to ensure each GPU API has a unique name.

◆ print_gpu_name()

virtual std::string Halide::Internal::CodeGen_GPU_Dev::print_gpu_name ( const std::string & name )

pure virtual

Returns the specified name transformed by the variable naming rules for the GPU language backend.

Used to determine the name of a parameter during host codegen.

◆ kernel_run_takes_types()

virtual bool Halide::Internal::CodeGen_GPU_Dev::kernel_run_takes_types ( ) const

inline virtual

Allows the GPU device specific code to request halide_type_t values to be passed to the kernel_run routine rather than just argument type sizes.

Definition at line 54 of file CodeGen_GPU_Dev.h.

◆ is_gpu_var()

static bool Halide::Internal::CodeGen_GPU_Dev::is_gpu_var ( const std::string & name )

static

◆ is_gpu_block_var()

static bool Halide::Internal::CodeGen_GPU_Dev::is_gpu_block_var ( const std::string & name )

static

◆ is_gpu_thread_var()

static bool Halide::Internal::CodeGen_GPU_Dev::is_gpu_thread_var ( const std::string & name )

static

◆ is_block_uniform()

static bool Halide::Internal::CodeGen_GPU_Dev::is_block_uniform ( const Expr & expr )

static

Checks if expr is block uniform, i.e. does not depend on a thread var.

◆ is_buffer_constant()

static bool Halide::Internal::CodeGen_GPU_Dev::is_buffer_constant	(	const Stmt &	kernel,
		const std::string &	buffer
	)

static

Checks if the buffer is a candidate for constant storage.

Most GPU APIs support a constant memory storage class that cannot be written to and performs well for block-uniform accesses. A buffer is a candidate for constant storage if it is never written to and all loads from it are uniform within the workgroup.
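The two conditions above (no stores, and only workgroup-uniform loads) can be sketched over a simplified access list. This is an illustrative toy, not the real implementation, which operates on a Halide `Stmt`: `ToyAccess` and `toy_is_buffer_constant` are hypothetical names, and the `index_uses_thread_var` flag stands in for an `is_block_uniform()`-style check on the index.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, flattened view of one memory access in a kernel.
struct ToyAccess {
    std::string buffer;          // name of the buffer touched
    bool is_store;               // true for a store, false for a load
    bool index_uses_thread_var;  // true if the index varies per thread
};

// Toy version of the candidate test: the buffer must never be stored
// to, and every load index must be uniform across the workgroup.
inline bool toy_is_buffer_constant(const std::vector<ToyAccess> &kernel,
                                   const std::string &buffer) {
    for (const ToyAccess &a : kernel) {
        if (a.buffer != buffer) {
            continue;  // access to some other buffer; irrelevant here
        }
        if (a.is_store || a.index_uses_thread_var) {
            return false;
        }
    }
    return true;
}

// Usage: "lut" is only read uniformly; "out" is written per thread.
inline void buffer_constant_example() {
    std::vector<ToyAccess> kernel = {
        {"lut", false, false},
        {"out", true, true},
    };
    assert(toy_is_buffer_constant(kernel, "lut"));
    assert(!toy_is_buffer_constant(kernel, "out"));
}
```

Note that this toy also returns true for a buffer the kernel never touches; how the real routine treats that case is not stated here.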

◆ scalarize_predicated_loads_stores()

static Stmt Halide::Internal::CodeGen_GPU_Dev::scalarize_predicated_loads_stores ( Stmt & s )

static

Modifies predicated loads and stores to be non-predicated, since most GPU backends do not support predication.

The documentation for this struct was generated from the following file:

src/CodeGen_GPU_Dev.h
