Halide
Halide::Func Class Reference

A halide function. More...

#include <Func.h>

Public Member Functions

 Func (const std::string &name)
 Declare a new undefined function with the given name. More...
 
 Func ()
 Declare a new undefined function with an automatically-generated unique name. More...
 
 Func (const Expr &e)
 Declare a new function with an automatically-generated unique name, and define it to return the given expression (which may not contain free variables). More...
 
 Func (Internal::Function f)
 Construct a new Func to wrap an existing, already-defined Function object. More...
 
template<typename T >
HALIDE_NO_USER_CODE_INLINE Func (Buffer< T > &im)
 Construct a new Func to wrap a Buffer. More...
 
Realization realize (std::vector< int32_t > sizes, const Target &target=Target(), const ParamMap &param_map=ParamMap::empty_map())
 Evaluate this function over some rectangular domain and return the resulting buffer or buffers. More...
 
Realization realize (int x_size, int y_size, int z_size, int w_size, const Target &target=Target(), const ParamMap &param_map=ParamMap::empty_map())
 
Realization realize (int x_size, int y_size, int z_size, const Target &target=Target(), const ParamMap &param_map=ParamMap::empty_map())
 
Realization realize (int x_size, int y_size, const Target &target=Target(), const ParamMap &param_map=ParamMap::empty_map())
 
Realization realize (int x_size, const Target &target=Target(), const ParamMap &param_map=ParamMap::empty_map())
 
Realization realize (const Target &target=Target(), const ParamMap &param_map=ParamMap::empty_map())
 
void realize (Pipeline::RealizationArg outputs, const Target &target=Target(), const ParamMap &param_map=ParamMap::empty_map())
 Evaluate this function into an existing allocated buffer or buffers. More...
 
void infer_input_bounds (const std::vector< int32_t > &sizes, const Target &target=get_jit_target_from_environment(), const ParamMap &param_map=ParamMap::empty_map())
 For a given size of output, or a given output buffer, determine the bounds required of all unbound ImageParams referenced. More...
 
void infer_input_bounds (int x_size=0, int y_size=0, int z_size=0, int w_size=0, const Target &target=get_jit_target_from_environment(), const ParamMap &param_map=ParamMap::empty_map())
 
void infer_input_bounds (const std::initializer_list< int > &sizes, const Target &target=get_jit_target_from_environment(), const ParamMap &param_map=ParamMap::empty_map())
 
void infer_input_bounds (Pipeline::RealizationArg outputs, const Target &target=get_jit_target_from_environment(), const ParamMap &param_map=ParamMap::empty_map())
 
void compile_to_bitcode (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name, const Target &target=get_target_from_environment())
 Statically compile this function to llvm bitcode, with the given filename (which should probably end in .bc), type signature, and C function name (which defaults to the same name as this halide function). More...
 
void compile_to_bitcode (const std::string &filename, const std::vector< Argument > &, const Target &target=get_target_from_environment())
 
void compile_to_llvm_assembly (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name, const Target &target=get_target_from_environment())
 Statically compile this function to llvm assembly, with the given filename (which should probably end in .ll), type signature, and C function name (which defaults to the same name as this halide function). More...
 
void compile_to_llvm_assembly (const std::string &filename, const std::vector< Argument > &, const Target &target=get_target_from_environment())
 
void compile_to_object (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name, const Target &target=get_target_from_environment())
 Statically compile this function to an object file, with the given filename (which should probably end in .o or .obj), type signature, and C function name (which defaults to the same name as this halide function). More...
 
void compile_to_object (const std::string &filename, const std::vector< Argument > &, const Target &target=get_target_from_environment())
 
void compile_to_header (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name="", const Target &target=get_target_from_environment())
 Emit a header file with the given filename for this function. More...
 
void compile_to_assembly (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name, const Target &target=get_target_from_environment())
 Statically compile this function to text assembly equivalent to the object file generated by compile_to_object. More...
 
void compile_to_assembly (const std::string &filename, const std::vector< Argument > &, const Target &target=get_target_from_environment())
 
void compile_to_c (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name="", const Target &target=get_target_from_environment())
 Statically compile this function to C source code. More...
 
void compile_to_lowered_stmt (const std::string &filename, const std::vector< Argument > &args, StmtOutputFormat fmt=Text, const Target &target=get_target_from_environment())
 Write out an internal representation of lowered code. More...
 
void print_loop_nest ()
 Write out the loop nests specified by the schedule for this Function. More...
 
void compile_to_file (const std::string &filename_prefix, const std::vector< Argument > &args, const std::string &fn_name="", const Target &target=get_target_from_environment())
 Compile to object file and header pair, with the given arguments. More...
 
void compile_to_static_library (const std::string &filename_prefix, const std::vector< Argument > &args, const std::string &fn_name="", const Target &target=get_target_from_environment())
 Compile to static-library file and header pair, with the given arguments. More...
 
void compile_to_multitarget_static_library (const std::string &filename_prefix, const std::vector< Argument > &args, const std::vector< Target > &targets)
 Compile to static-library file and header pair once for each target; each resulting function will be considered (in order) via halide_can_use_target_features() at runtime, with the first appropriate match being selected for subsequent use. More...
 
void compile_to_multitarget_object_files (const std::string &filename_prefix, const std::vector< Argument > &args, const std::vector< Target > &targets, const std::vector< std::string > &suffixes)
 Like compile_to_multitarget_static_library(), except that the object files are all output as object files (rather than bundled into a static library). More...
 
Module compile_to_module (const std::vector< Argument > &args, const std::string &fn_name="", const Target &target=get_target_from_environment())
 Store an internal representation of lowered code as a self-contained Module suitable for further compilation. More...
 
void compile_to (const std::map< Output, std::string > &output_files, const std::vector< Argument > &args, const std::string &fn_name, const Target &target=get_target_from_environment())
 Compile and generate multiple target files with a single call. More...
 
void compile_jit (const Target &target=get_jit_target_from_environment())
 Eagerly jit compile the function to machine code. More...
 
void set_error_handler (void(*handler)(void *, const char *))
 Set the error handler function to be called in the case of runtime errors during halide pipelines. More...
 
void set_custom_allocator (void *(*malloc)(void *, size_t), void(*free)(void *, void *))
 Set a custom malloc and free for halide to use. More...
 
void set_custom_do_task (int(*custom_do_task)(void *, int(*)(void *, int, uint8_t *), int, uint8_t *))
 Set a custom task handler to be called by the parallel for loop. More...
 
void set_custom_do_par_for (int(*custom_do_par_for)(void *, int(*)(void *, int, uint8_t *), int, int, uint8_t *))
 Set a custom parallel for loop launcher. More...
 
void set_custom_trace (int(*trace_fn)(void *, const halide_trace_event_t *))
 Set custom routines to call when tracing is enabled. More...
 
void set_custom_print (void(*handler)(void *, const char *))
 Set the function called to print messages from the runtime. More...
 
const Internal::JITHandlers & jit_handlers ()
 Get a struct containing the currently set custom functions used by JIT. More...
 
template<typename T >
void add_custom_lowering_pass (T *pass)
 Add a custom pass to be used during lowering. More...
 
void add_custom_lowering_pass (Internal::IRMutator *pass, std::function< void()> deleter)
 Add a custom pass to be used during lowering, with the function that will be called to delete it also passed in. More...
 
void clear_custom_lowering_passes ()
 Remove all previously-set custom lowering passes. More...
 
const std::vector< CustomLoweringPass > & custom_lowering_passes ()
 Get the custom lowering passes. More...
 
void debug_to_file (const std::string &filename)
 When this function is compiled, include code that dumps its values to a file after it is realized, for the purpose of debugging. More...
 
const std::string & name () const
 The name of this function, either given during construction, or automatically generated. More...
 
std::vector< Var > args () const
 Get the pure arguments. More...
 
Expr value () const
 The right-hand-side value of the pure definition of this function. More...
 
Tuple values () const
 The values returned by this function. More...
 
bool defined () const
 Does this function have at least a pure definition. More...
 
const std::vector< Expr > & update_args (int idx=0) const
 Get the left-hand-side of the update definition. More...
 
Expr update_value (int idx=0) const
 Get the right-hand-side of an update definition. More...
 
Tuple update_values (int idx=0) const
 Get the right-hand-side of an update definition for functions that return multiple values. More...
 
std::vector< RVar > rvars (int idx=0) const
 Get the RVars of the reduction domain for an update definition, if there is one. More...
 
bool has_update_definition () const
 Does this function have at least one update definition? More...
 
int num_update_definitions () const
 How many update definitions does this function have? More...
 
bool is_extern () const
 Is this function an external stage? That is, was it defined using define_extern? More...
 
void define_extern (const std::string &function_name, const std::vector< ExternFuncArgument > &params, Type t, int dimensionality, NameMangling mangling=NameMangling::Default, DeviceAPI device_api=DeviceAPI::Host)
 Add an extern definition for this Func. More...
 
void define_extern (const std::string &function_name, const std::vector< ExternFuncArgument > &params, const std::vector< Type > &types, int dimensionality, NameMangling mangling)
 
void define_extern (const std::string &function_name, const std::vector< ExternFuncArgument > &params, const std::vector< Type > &types, int dimensionality, NameMangling mangling=NameMangling::Default, DeviceAPI device_api=DeviceAPI::Host)
 
void define_extern (const std::string &function_name, const std::vector< ExternFuncArgument > &params, Type t, const std::vector< Var > &arguments, NameMangling mangling=NameMangling::Default, DeviceAPI device_api=DeviceAPI::Host)
 
void define_extern (const std::string &function_name, const std::vector< ExternFuncArgument > &params, const std::vector< Type > &types, const std::vector< Var > &arguments, NameMangling mangling=NameMangling::Default, DeviceAPI device_api=DeviceAPI::Host)
 
const std::vector< Type > & output_types () const
 Get the types of the outputs of this Func. More...
 
int outputs () const
 Get the number of outputs of this Func. More...
 
const std::string & extern_function_name () const
 Get the name of the extern function called for an extern definition. More...
 
int dimensions () const
 The dimensionality (number of arguments) of this function. More...
 
FuncRef operator() (std::vector< Var >) const
 Construct either the left-hand-side of a definition, or a call to a function that happens to only contain Vars as arguments. More...
 
template<typename... Args>
HALIDE_NO_USER_CODE_INLINE std::enable_if< Internal::all_are_convertible< Var, Args... >::value, FuncRef >::type operator() (Args &&... args) const
 
FuncRef operator() (std::vector< Expr >) const
 Either calls to the function, or the left-hand-side of an update definition (see RDom). More...
 
template<typename... Args>
HALIDE_NO_USER_CODE_INLINE std::enable_if< Internal::all_are_convertible< Expr, Args... >::value, FuncRef >::type operator() (const Expr &x, Args &&... args) const
 
Func in (const Func &f)
 Creates and returns a new identity Func that wraps this Func. More...
 
Func in (const std::vector< Func > &fs)
 Create and return an identity wrapper shared by all the Funcs in 'fs'. More...
 
Func in ()
 Create and return a global identity wrapper, which wraps all calls to this Func by any other Func. More...
 
Func clone_in (const Func &f)
 Similar to Func::in; however, instead of replacing the call to this Func with an identity Func that refers to it, this replaces the call with a clone of this Func. More...
 
Func clone_in (const std::vector< Func > &fs)
 
Func copy_to_device (DeviceAPI d=DeviceAPI::Default_GPU)
 Declare that this function should be implemented by a call to halide_buffer_copy with the given target device API. More...
 
Func copy_to_host ()
 Declare that this function should be implemented by a call to halide_buffer_copy with a NULL target device API. More...
 
Func & split (const VarOrRVar &old, const VarOrRVar &outer, const VarOrRVar &inner, const Expr &factor, TailStrategy tail=TailStrategy::Auto)
 Split a dimension into inner and outer subdimensions with the given names, where the inner dimension iterates from 0 to factor-1. More...
 
Func & fuse (const VarOrRVar &inner, const VarOrRVar &outer, const VarOrRVar &fused)
 Join two dimensions into a single fused dimension. More...
 
Func & serial (const VarOrRVar &var)
 Mark a dimension to be traversed serially. More...
 
Func & parallel (const VarOrRVar &var)
 Mark a dimension to be traversed in parallel. More...
 
Func & parallel (const VarOrRVar &var, const Expr &task_size, TailStrategy tail=TailStrategy::Auto)
 Split a dimension by the given task_size, then parallelize the outer dimension. More...
 
Func & vectorize (const VarOrRVar &var)
 Mark a dimension to be computed all-at-once as a single vector. More...
 
Func & unroll (const VarOrRVar &var)
 Mark a dimension to be completely unrolled. More...
 
Func & vectorize (const VarOrRVar &var, const Expr &factor, TailStrategy tail=TailStrategy::Auto)
 Split a dimension by the given factor, then vectorize the inner dimension. More...
 
Func & unroll (const VarOrRVar &var, const Expr &factor, TailStrategy tail=TailStrategy::Auto)
 Split a dimension by the given factor, then unroll the inner dimension. More...
 
Func & bound (const Var &var, Expr min, Expr extent)
 Statically declare that the range over which a function should be evaluated is given by the second and third arguments. More...
 
Func & set_estimate (const Var &var, const Expr &min, const Expr &extent)
 Statically declare the range over which the function will be evaluated in the general case. More...
 
Func & estimate (const Var &var, const Expr &min, const Expr &extent)
 
Func & set_estimates (const Region &estimates)
 Set (min, extent) estimates for all dimensions in the Func at once; this is equivalent to calling set_estimate(args()[n], min, extent) repeatedly, but slightly terser. More...
 
Func & align_bounds (const Var &var, Expr modulus, Expr remainder=0)
 Expand the region computed so that the min coordinate is congruent to 'remainder' modulo 'modulus', and the extent is a multiple of 'modulus'. More...
 
Func & bound_extent (const Var &var, Expr extent)
 Bound the extent of a Func's realization, but not its min. More...
 
Func & tile (const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &xo, const VarOrRVar &yo, const VarOrRVar &xi, const VarOrRVar &yi, const Expr &xfactor, const Expr &yfactor, TailStrategy tail=TailStrategy::Auto)
 Split two dimensions at once by the given factors, and then reorder the resulting dimensions to be xi, yi, xo, yo from innermost outwards. More...
 
Func & tile (const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &xi, const VarOrRVar &yi, const Expr &xfactor, const Expr &yfactor, TailStrategy tail=TailStrategy::Auto)
 A shorter form of tile, which reuses the old variable names as the new outer dimensions. More...
 
Func & tile (const std::vector< VarOrRVar > &previous, const std::vector< VarOrRVar > &outers, const std::vector< VarOrRVar > &inners, const std::vector< Expr > &factors, const std::vector< TailStrategy > &tails)
 A more general form of tile, which defines tiles of any dimensionality. More...
 
Func & tile (const std::vector< VarOrRVar > &previous, const std::vector< VarOrRVar > &outers, const std::vector< VarOrRVar > &inners, const std::vector< Expr > &factors, TailStrategy tail=TailStrategy::Auto)
 The generalized tile, with a single tail strategy to apply to all vars. More...
 
Func & tile (const std::vector< VarOrRVar > &previous, const std::vector< VarOrRVar > &inners, const std::vector< Expr > &factors, TailStrategy tail=TailStrategy::Auto)
 Generalized tiling, reusing the previous names as the outer names. More...
 
Func & reorder (const std::vector< VarOrRVar > &vars)
 Reorder variables to have the given nesting order, from innermost out. More...
 
template<typename... Args>
HALIDE_NO_USER_CODE_INLINE std::enable_if< Internal::all_are_convertible< VarOrRVar, Args... >::value, Func & >::type reorder (const VarOrRVar &x, const VarOrRVar &y, Args &&... args)
 
Func & rename (const VarOrRVar &old_name, const VarOrRVar &new_name)
 Rename a dimension. More...
 
Func & allow_race_conditions ()
 Specify that race conditions are permitted for this Func, which enables parallelizing over RVars even when Halide cannot prove that it is safe to do so. More...
 
Func & atomic (bool override_associativity_test=false)
 Issue atomic updates for this Func. More...
 
Stage specialize (const Expr &condition)
 Specialize a Func. More...
 
void specialize_fail (const std::string &message)
 Add a specialization to a Func that always terminates execution with a call to halide_error(). More...
 
Func & gpu_threads (const VarOrRVar &thread_x, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Tell Halide that the following dimensions correspond to GPU thread indices. More...
 
Func & gpu_threads (const VarOrRVar &thread_x, const VarOrRVar &thread_y, DeviceAPI device_api=DeviceAPI::Default_GPU)
 
Func & gpu_threads (const VarOrRVar &thread_x, const VarOrRVar &thread_y, const VarOrRVar &thread_z, DeviceAPI device_api=DeviceAPI::Default_GPU)
 
Func & gpu_lanes (const VarOrRVar &thread_x, DeviceAPI device_api=DeviceAPI::Default_GPU)
 The given dimension corresponds to the lanes in a GPU warp. More...
 
Func & gpu_single_thread (DeviceAPI device_api=DeviceAPI::Default_GPU)
 Tell Halide to run this stage using a single gpu thread and block. More...
 
Func & gpu_blocks (const VarOrRVar &block_x, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Tell Halide that the following dimensions correspond to GPU block indices. More...
 
Func & gpu_blocks (const VarOrRVar &block_x, const VarOrRVar &block_y, DeviceAPI device_api=DeviceAPI::Default_GPU)
 
Func & gpu_blocks (const VarOrRVar &block_x, const VarOrRVar &block_y, const VarOrRVar &block_z, DeviceAPI device_api=DeviceAPI::Default_GPU)
 
Func & gpu (const VarOrRVar &block_x, const VarOrRVar &thread_x, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Tell Halide that the following dimensions correspond to GPU block indices and thread indices. More...
 
Func & gpu (const VarOrRVar &block_x, const VarOrRVar &block_y, const VarOrRVar &thread_x, const VarOrRVar &thread_y, DeviceAPI device_api=DeviceAPI::Default_GPU)
 
Func & gpu (const VarOrRVar &block_x, const VarOrRVar &block_y, const VarOrRVar &block_z, const VarOrRVar &thread_x, const VarOrRVar &thread_y, const VarOrRVar &thread_z, DeviceAPI device_api=DeviceAPI::Default_GPU)
 
Func & gpu_tile (const VarOrRVar &x, const VarOrRVar &bx, const VarOrRVar &tx, const Expr &x_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices. More...
 
Func & gpu_tile (const VarOrRVar &x, const VarOrRVar &tx, const Expr &x_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 
Func & gpu_tile (const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &bx, const VarOrRVar &by, const VarOrRVar &tx, const VarOrRVar &ty, const Expr &x_size, const Expr &y_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 
Func & gpu_tile (const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &tx, const VarOrRVar &ty, const Expr &x_size, const Expr &y_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 
Func & gpu_tile (const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &z, const VarOrRVar &bx, const VarOrRVar &by, const VarOrRVar &bz, const VarOrRVar &tx, const VarOrRVar &ty, const VarOrRVar &tz, const Expr &x_size, const Expr &y_size, const Expr &z_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 
Func & gpu_tile (const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &z, const VarOrRVar &tx, const VarOrRVar &ty, const VarOrRVar &tz, const Expr &x_size, const Expr &y_size, const Expr &z_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 
Func & shader (const Var &x, const Var &y, const Var &c, DeviceAPI device_api)
 Schedule for execution using a coordinate-based hardware API. More...
 
Func & glsl (const Var &x, const Var &y, const Var &c)
 Schedule for execution as GLSL kernel. More...
 
Func & hexagon (const VarOrRVar &x=Var::outermost())
 Schedule for execution on Hexagon. More...
 
Func & prefetch (const Func &f, const VarOrRVar &var, Expr offset=1, PrefetchBoundStrategy strategy=PrefetchBoundStrategy::GuardWithIf)
 Prefetch data written to or read from a Func or an ImageParam by a subsequent loop iteration, at an optionally specified iteration offset. More...
 
Func & prefetch (const Internal::Parameter &param, const VarOrRVar &var, Expr offset=1, PrefetchBoundStrategy strategy=PrefetchBoundStrategy::GuardWithIf)
 
template<typename T >
Func & prefetch (const T &image, VarOrRVar var, Expr offset=1, PrefetchBoundStrategy strategy=PrefetchBoundStrategy::GuardWithIf)
 
Func & reorder_storage (const std::vector< Var > &dims)
 Specify how the storage for the function is laid out. More...
 
Func & reorder_storage (const Var &x, const Var &y)
 
template<typename... Args>
HALIDE_NO_USER_CODE_INLINE std::enable_if< Internal::all_are_convertible< Var, Args... >::value, Func & >::type reorder_storage (const Var &x, const Var &y, Args &&... args)
 
Func & align_storage (const Var &dim, const Expr &alignment)
 Pad the storage extent of a particular dimension of realizations of this function up to be a multiple of the specified alignment. More...
 
Func & fold_storage (const Var &dim, const Expr &extent, bool fold_forward=true)
 Store realizations of this function in a circular buffer of a given extent. More...
 
Func & compute_at (const Func &f, const Var &var)
 Compute this function as needed for each unique value of the given var for the given calling function f. More...
 
Func & compute_at (const Func &f, const RVar &var)
 Schedule a function to be computed within the iteration over some dimension of an update domain. More...
 
Func & compute_at (LoopLevel loop_level)
 Schedule a function to be computed within the iteration over a given LoopLevel. More...
 
Func & compute_with (const Stage &s, const VarOrRVar &var, const std::vector< std::pair< VarOrRVar, LoopAlignStrategy >> &align)
 Schedule the iteration over the initial definition of this function to be fused with another stage 's' from outermost loop to a given LoopLevel. More...
 
Func & compute_with (const Stage &s, const VarOrRVar &var, LoopAlignStrategy align=LoopAlignStrategy::Auto)
 
Func & compute_with (LoopLevel loop_level, const std::vector< std::pair< VarOrRVar, LoopAlignStrategy >> &align)
 
Func & compute_with (LoopLevel loop_level, LoopAlignStrategy align=LoopAlignStrategy::Auto)
 
Func & compute_root ()
 Compute all of this function once ahead of time. More...
 
Func & memoize ()
 Use the halide_memoization_cache_... More...
 
Func & async ()
 Produce this Func asynchronously in a separate thread. More...
 
Func & store_at (const Func &f, const Var &var)
 Allocate storage for this function within f's loop over var. More...
 
Func & store_at (const Func &f, const RVar &var)
 Equivalent to the version of store_at that takes a Var, but schedules storage within the loop over a dimension of a reduction domain. More...
 
Func & store_at (LoopLevel loop_level)
 Equivalent to the version of store_at that takes a Var, but schedules storage at a given LoopLevel. More...
 
Func & store_root ()
 Equivalent to Func::store_at, but schedules storage outside the outermost loop. More...
 
Func & compute_inline ()
 Aggressively inline all uses of this function. More...
 
Stage update (int idx=0)
 Get a handle on an update step for the purposes of scheduling it. More...
 
Func & store_in (MemoryType memory_type)
 Set the type of memory this Func should be stored in. More...
 
Func & trace_loads ()
 Trace all loads from this Func by emitting calls to halide_trace. More...
 
Func & trace_stores ()
 Trace all stores to the buffer backing this Func by emitting calls to halide_trace. More...
 
Func & trace_realizations ()
 Trace all realizations of this Func by emitting calls to halide_trace. More...
 
Func & add_trace_tag (const std::string &trace_tag)
 Add a string of arbitrary text that will be passed through to trace inspection code if the Func is realized in trace mode. More...
 
Internal::Function function () const
 Get a handle on the internal halide function that this Func represents. More...
 
 operator Stage () const
 You can cast a Func to its pure stage for the purposes of scheduling it. More...
 
OutputImageParam output_buffer () const
 Get a handle on the output buffer for this Func. More...
 
std::vector< OutputImageParam > output_buffers () const
 
 operator ExternFuncArgument () const
 Use a Func as an argument to an external stage. More...
 
std::vector< Argument > infer_arguments () const
 Infer the arguments to the Func, sorted into a canonical order: all buffers (sorted alphabetically by name), followed by all non-buffers (sorted alphabetically by name). More...
 
std::string source_location () const
 Get the source location of the pure definition of this Func. More...
 
const Internal::StageSchedule & get_schedule () const
 Return the current StageSchedule associated with this initial Stage of this Func. More...
 

Detailed Description

Constructor & Destructor Documentation

◆ Func() [1/5]

Halide::Func::Func ( const std::string &  name)
explicit

Declare a new undefined function with the given name.

◆ Func() [2/5]

Halide::Func::Func ( )

Declare a new undefined function with an automatically-generated unique name.

◆ Func() [3/5]

Halide::Func::Func ( const Expr &  e)
explicit

Declare a new function with an automatically-generated unique name, and define it to return the given expression (which may not contain free variables).

◆ Func() [4/5]

Halide::Func::Func ( Internal::Function  f)
explicit

Construct a new Func to wrap an existing, already-defined Function object.

◆ Func() [5/5]

template<typename T >
HALIDE_NO_USER_CODE_INLINE Halide::Func::Func ( Buffer< T > &  im)
inlineexplicit

Construct a new Func to wrap a Buffer.

Definition at line 713 of file Func.h.

Member Function Documentation

◆ realize() [1/7]

Realization Halide::Func::realize ( std::vector< int32_t >  sizes,
const Target &  target = Target(),
const ParamMap &  param_map = ParamMap::empty_map() 
)

Evaluate this function over some rectangular domain and return the resulting buffer or buffers.

Performs compilation if the Func has not previously been realized and compile_jit has not been called. If the final stage of the pipeline is on the GPU, data is copied back to the host before being returned. The returned Realization should probably be instantly converted to a Buffer class of the appropriate type. That is, do this:

f(x) = sin(x);
Buffer<float> im = f.realize(...);

If your Func has multiple values, because you defined it using a Tuple, then casting the result of a realize call to a buffer or image will produce a run-time error. Instead you should do the following:

f(x) = Tuple(x, sin(x));
Realization r = f.realize(...);
Buffer<int> im0 = r[0];
Buffer<float> im1 = r[1];

In Halide formal arguments of a computation are specified using Param<T> and ImageParam objects in the expressions defining the computation. The param_map argument to realize allows specifying a set of per-call parameters to be used for a specific computation. This method is thread-safe where the globals used by Param<T> and ImageParam are not. Any parameters that are not in the param_map are taken from the global values, so those can continue to be used if they are not changing per-thread.

One can explicitly construct a ParamMap and use its set method to insert mappings from a Parameter to a scalar or Buffer value:

Param<int32_t> p(42);
ImageParam img(Int(32), 2);
f(x, y) = img(x, y) + p;
Buffer<int32_t> arg_img(10, 10);
// <fill in arg_img...>
Target t = get_jit_target_from_environment();
ParamMap params;
params.set(p, 17);
params.set(img, arg_img);
Buffer<int32_t> result = f.realize(10, 10, t, params);

Alternatively, an initializer list can be used directly in the realize call to pass this information:

Param<int32_t> p(42);
ImageParam img(Int(32), 2);
f(x, y) = img(x, y) + p;
Buffer<int32_t> arg_img(10, 10);
// <fill in arg_img...>
Target t = get_jit_target_from_environment();
Buffer<int32_t> result = f.realize(10, 10, t, { { p, 17 }, { img, arg_img } });

If the Func cannot be realized into a buffer of the given size due to scheduling constraints on scattering update definitions, it will be realized into a larger buffer of the minimum size possible, and a cropped view at the requested size will be returned. It is thus not safe to assume the returned buffers are contiguous in memory. This behavior can be disabled with the NoBoundsQuery target flag, in which case an error about writing out of bounds on the output buffer will trigger instead.

Examples
tutorial/lesson_01_basics.cpp, tutorial/lesson_02_input_image.cpp, tutorial/lesson_03_debugging_1.cpp, tutorial/lesson_04_debugging_2.cpp, tutorial/lesson_05_scheduling_1.cpp, tutorial/lesson_06_realizing_over_shifted_domains.cpp, tutorial/lesson_07_multi_stage_pipelines.cpp, tutorial/lesson_08_scheduling_2.cpp, tutorial/lesson_09_update_definitions.cpp, tutorial/lesson_12_using_the_gpu.cpp, and tutorial/lesson_13_tuples.cpp.

Referenced by Halide::SimdOpCheckTest::check_one(), Halide::evaluate(), Halide::evaluate_may_gpu(), and Halide::Internal::StubOutputBufferBase::realize().

◆ realize() [2/7]

Realization Halide::Func::realize ( int  x_size,
int  y_size,
int  z_size,
int  w_size,
const Target &  target = Target(),
const ParamMap &  param_map = ParamMap::empty_map() 
)

◆ realize() [3/7]

Realization Halide::Func::realize ( int  x_size,
int  y_size,
int  z_size,
const Target &  target = Target(),
const ParamMap &  param_map = ParamMap::empty_map() 
)

◆ realize() [4/7]

Realization Halide::Func::realize ( int  x_size,
int  y_size,
const Target &  target = Target(),
const ParamMap &  param_map = ParamMap::empty_map() 
)

◆ realize() [5/7]

Realization Halide::Func::realize ( int  x_size,
const Target &  target = Target(),
const ParamMap &  param_map = ParamMap::empty_map() 
)

◆ realize() [6/7]

Realization Halide::Func::realize ( const Target &  target = Target(),
const ParamMap &  param_map = ParamMap::empty_map() 
)

◆ realize() [7/7]

void Halide::Func::realize ( Pipeline::RealizationArg  outputs,
const Target target = Target(),
const ParamMap param_map = ParamMap::empty_map() 
)

Evaluate this function into an existing allocated buffer or buffers.

If the buffer is also one of the arguments to the function, strange things may happen, as the pipeline isn't necessarily safe to run in-place. If you pass multiple buffers, they must have matching sizes. This form of realize does not automatically copy data back from the GPU.

◆ infer_input_bounds() [1/4]

void Halide::Func::infer_input_bounds ( const std::vector< int32_t > &  sizes,
const Target target = get_jit_target_from_environment(),
const ParamMap param_map = ParamMap::empty_map() 
)

For a given size of output, or a given output buffer, determine the bounds required of all unbound ImageParams referenced.

Communicates the result by allocating new buffers of the appropriate size and binding them to the unbound ImageParams.

See the documentation for Func::realize regarding the ParamMap. One difference is that input Buffer<> arguments that are being inferred are specified as a pointer to the Buffer<> in the ParamMap. E.g.

Param<int32_t> p(42);
ImageParam img(Int(32), 1);
f(x) = img(x) + p;

Buffer<int32_t> in;
f.infer_input_bounds({10, 10}, t, { { img, &in } });

On return, 'in' will be an allocated buffer of the correct size to evaluate f over a 10x10 region.

Referenced by Halide::SimdOpCheckTest::check_one(), and infer_input_bounds().

◆ infer_input_bounds() [2/4]

void Halide::Func::infer_input_bounds ( int  x_size = 0,
int  y_size = 0,
int  z_size = 0,
int  w_size = 0,
const Target target = get_jit_target_from_environment(),
const ParamMap param_map = ParamMap::empty_map() 
)

◆ infer_input_bounds() [3/4]

void Halide::Func::infer_input_bounds ( const std::initializer_list< int > &  sizes,
const Target target = get_jit_target_from_environment(),
const ParamMap param_map = ParamMap::empty_map() 
)
inline

Definition at line 857 of file Func.h.

References infer_input_bounds().

◆ infer_input_bounds() [4/4]

void Halide::Func::infer_input_bounds ( Pipeline::RealizationArg  outputs,
const Target target = get_jit_target_from_environment(),
const ParamMap param_map = ParamMap::empty_map() 
)

◆ compile_to_bitcode() [1/2]

void Halide::Func::compile_to_bitcode ( const std::string &  filename,
const std::vector< Argument > &  ,
const std::string &  fn_name,
const Target target = get_target_from_environment() 
)

Statically compile this function to llvm bitcode, with the given filename (which should probably end in .bc), type signature, and C function name (which defaults to the same name as this halide function).

◆ compile_to_bitcode() [2/2]

void Halide::Func::compile_to_bitcode ( const std::string &  filename,
const std::vector< Argument > &  ,
const Target target = get_target_from_environment() 
)

◆ compile_to_llvm_assembly() [1/2]

void Halide::Func::compile_to_llvm_assembly ( const std::string &  filename,
const std::vector< Argument > &  ,
const std::string &  fn_name,
const Target target = get_target_from_environment() 
)

Statically compile this function to llvm assembly, with the given filename (which should probably end in .ll), type signature, and C function name (which defaults to the same name as this halide function).

◆ compile_to_llvm_assembly() [2/2]

void Halide::Func::compile_to_llvm_assembly ( const std::string &  filename,
const std::vector< Argument > &  ,
const Target target = get_target_from_environment() 
)

◆ compile_to_object() [1/2]

void Halide::Func::compile_to_object ( const std::string &  filename,
const std::vector< Argument > &  ,
const std::string &  fn_name,
const Target target = get_target_from_environment() 
)

Statically compile this function to an object file, with the given filename (which should probably end in .o or .obj), type signature, and C function name (which defaults to the same name as this halide function).

You probably don't want to use this directly; call compile_to_static_library or compile_to_file instead.

◆ compile_to_object() [2/2]

void Halide::Func::compile_to_object ( const std::string &  filename,
const std::vector< Argument > &  ,
const Target target = get_target_from_environment() 
)

◆ compile_to_header()

void Halide::Func::compile_to_header ( const std::string &  filename,
const std::vector< Argument > &  ,
const std::string &  fn_name = "",
const Target target = get_target_from_environment() 
)

Emit a header file with the given filename for this function.

The header will define a function with the type signature given by the second argument, and a name given by the third. The name defaults to the same name as this halide function. You don't actually have to have defined this function yet to call this. You probably don't want to use this directly; call compile_to_static_library or compile_to_file instead.

◆ compile_to_assembly() [1/2]

void Halide::Func::compile_to_assembly ( const std::string &  filename,
const std::vector< Argument > &  ,
const std::string &  fn_name,
const Target target = get_target_from_environment() 
)

Statically compile this function to text assembly equivalent to the object file generated by compile_to_object.

This is useful for checking what Halide is producing without having to disassemble anything, or if you need to feed the assembly into some custom toolchain to produce an object file (e.g. iOS)

Referenced by Halide::SimdOpCheckTest::check_one().

◆ compile_to_assembly() [2/2]

void Halide::Func::compile_to_assembly ( const std::string &  filename,
const std::vector< Argument > &  ,
const Target target = get_target_from_environment() 
)

◆ compile_to_c()

void Halide::Func::compile_to_c ( const std::string &  filename,
const std::vector< Argument > &  ,
const std::string &  fn_name = "",
const Target target = get_target_from_environment() 
)

Statically compile this function to C source code.

This is useful for providing fallback code paths that will compile on many platforms. Vectorization will fail, and parallelization will produce serial code.

◆ compile_to_lowered_stmt()

void Halide::Func::compile_to_lowered_stmt ( const std::string &  filename,
const std::vector< Argument > &  args,
StmtOutputFormat  fmt = Text,
const Target target = get_target_from_environment() 
)

Write out an internal representation of lowered code.

Useful for analyzing and debugging scheduling. Can emit html or plain text.

Examples
tutorial/lesson_03_debugging_1.cpp.

◆ print_loop_nest()

void Halide::Func::print_loop_nest ( )

Write out the loop nests specified by the schedule for this Function.

Helpful for understanding what a schedule is doing.

Examples
tutorial/lesson_05_scheduling_1.cpp, and tutorial/lesson_08_scheduling_2.cpp.

◆ compile_to_file()

void Halide::Func::compile_to_file ( const std::string &  filename_prefix,
const std::vector< Argument > &  args,
const std::string &  fn_name = "",
const Target target = get_target_from_environment() 
)

Compile to object file and header pair, with the given arguments.

The name defaults to the same name as this halide function.

Examples
tutorial/lesson_11_cross_compilation.cpp.

Referenced by Halide::SimdOpCheckTest::check_one().

◆ compile_to_static_library()

void Halide::Func::compile_to_static_library ( const std::string &  filename_prefix,
const std::vector< Argument > &  args,
const std::string &  fn_name = "",
const Target target = get_target_from_environment() 
)

Compile to static-library file and header pair, with the given arguments.

The name defaults to the same name as this halide function.

Examples
tutorial/lesson_10_aot_compilation_generate.cpp.

◆ compile_to_multitarget_static_library()

void Halide::Func::compile_to_multitarget_static_library ( const std::string &  filename_prefix,
const std::vector< Argument > &  args,
const std::vector< Target > &  targets 
)

Compile to static-library file and header pair once for each target; each resulting function will be considered (in order) via halide_can_use_target_features() at runtime, with the first appropriate match being selected for subsequent use.

This is typically useful for specializations that may vary unpredictably by machine (e.g., SSE4.1/AVX/AVX2 on x86 desktop machines). All targets must have identical arch-os-bits.

◆ compile_to_multitarget_object_files()

void Halide::Func::compile_to_multitarget_object_files ( const std::string &  filename_prefix,
const std::vector< Argument > &  args,
const std::vector< Target > &  targets,
const std::vector< std::string > &  suffixes 
)

Like compile_to_multitarget_static_library(), except that the object files are all output as object files (rather than bundled into a static library).

suffixes is an optional list of strings to use as the suffix for each object file. If nonempty, it must be the same length as targets. (If empty, Target::to_string() will be used for each suffix.)

Note that if targets.size() > 1, the wrapper code (to select the subtarget) will be generated with the filename ${filename_prefix}_wrapper.o

Note that if targets.size() > 1 and no_runtime is not specified, the runtime will be generated with the filename ${filename_prefix}_runtime.o

◆ compile_to_module()

Module Halide::Func::compile_to_module ( const std::vector< Argument > &  args,
const std::string &  fn_name = "",
const Target target = get_target_from_environment() 
)

Store an internal representation of lowered code as a self contained Module suitable for further compilation.

◆ compile_to()

void Halide::Func::compile_to ( const std::map< Output, std::string > &  output_files,
const std::vector< Argument > &  args,
const std::string &  fn_name,
const Target target = get_target_from_environment() 
)

Compile and generate multiple target files with single call.

Deduces target files based on filenames specified in output_files map.

◆ compile_jit()

void Halide::Func::compile_jit ( const Target target = get_jit_target_from_environment())

Eagerly jit compile the function to machine code.

This normally happens on the first call to realize. If you're running your halide pipeline inside time-sensitive code and wish to avoid including the time taken to compile a pipeline, then you can call this ahead of time. Default is to use the Target returned from Halide::get_jit_target_from_environment()

Examples
tutorial/lesson_12_using_the_gpu.cpp.

◆ set_error_handler()

void Halide::Func::set_error_handler ( void(*)(void *, const char *)  handler)

Set the error handler function that will be called in the case of runtime errors during halide pipelines.

If you are compiling statically, you can also just define your own function with signature

extern "C" void halide_error(void *user_context, const char *);

This will clobber Halide's version.

◆ set_custom_allocator()

void Halide::Func::set_custom_allocator ( void *(*)(void *, size_t)  malloc,
void(*)(void *, void *)  free 
)

Set a custom malloc and free for halide to use.

Malloc should return 32-byte aligned chunks of memory, and it should be safe for Halide to read slightly out of bounds (up to 8 bytes before the start or beyond the end). If compiling statically, routines with appropriate signatures can be provided directly:

extern "C" void *halide_malloc(void *, size_t)
extern "C" void halide_free(void *, void *)

These will clobber Halide's versions. See HalideRuntime.h for declarations.

◆ set_custom_do_task()

void Halide::Func::set_custom_do_task ( int(*)(void *, int(*)(void *, int, uint8_t *), int, uint8_t *)  custom_do_task)

Set a custom task handler to be called by the parallel for loop.

It is useful to set this if you want to do some additional bookkeeping at the granularity of parallel tasks. The default implementation does this:

extern "C" int halide_do_task(void *user_context,
int (*f)(void *, int, uint8_t *),
int idx, uint8_t *state) {
return f(user_context, idx, state);
}

If you are statically compiling, you can also just define your own version of the above function, and it will clobber Halide's version.

If you're trying to use a custom parallel runtime, you probably don't want to call this. See instead Func::set_custom_do_par_for .

◆ set_custom_do_par_for()

void Halide::Func::set_custom_do_par_for ( int(*)(void *, int(*)(void *, int, uint8_t *), int, int, uint8_t *)  custom_do_par_for)

Set a custom parallel for loop launcher.

Useful if your app already manages a thread pool. The default implementation is equivalent to this:

extern "C" int halide_do_par_for(void *user_context,
int (*f)(void *, int, uint8_t *),
int min, int extent, uint8_t *state) {
int exit_status = 0;
parallel for (int idx = min; idx < min+extent; idx++) {
int job_status = halide_do_task(user_context, f, idx, state);
if (job_status) exit_status = job_status;
}
return exit_status;
}

However, notwithstanding the above example code, if one task fails, we may skip over other tasks, and if two tasks return different error codes, we may select one arbitrarily to return.

If you are statically compiling, you can also just define your own version of the above function, and it will clobber Halide's version.

◆ set_custom_trace()

void Halide::Func::set_custom_trace ( int(*)(void *, const halide_trace_event_t *)  trace_fn)

Set custom routines to call when tracing is enabled.

Call this on the output Func of your pipeline. This then sets custom routines for the entire pipeline, not just calls to this Func.

If you are statically compiling, you can also just define your own versions of the tracing functions (see HalideRuntime.h), and they will clobber Halide's versions.

◆ set_custom_print()

void Halide::Func::set_custom_print ( void(*)(void *, const char *)  handler)

Set the function called to print messages from the runtime.

If you are compiling statically, you can also just define your own function with signature

extern "C" void halide_print(void *user_context, const char *);

This will clobber Halide's version.

◆ jit_handlers()

const Internal::JITHandlers& Halide::Func::jit_handlers ( )

Get a struct containing the currently set custom functions used by JIT.

◆ add_custom_lowering_pass() [1/2]

template<typename T >
void Halide::Func::add_custom_lowering_pass ( T *  pass)
inline

Add a custom pass to be used during lowering.

It is run after all other lowering passes. Can be used to verify properties of the lowered Stmt, instrument it with extra code, or otherwise modify it. The Func takes ownership of the pass, and will call delete on it when the Func goes out of scope. So don't pass a stack object, or share pass instances between multiple Funcs.

Definition at line 1121 of file Func.h.

◆ add_custom_lowering_pass() [2/2]

void Halide::Func::add_custom_lowering_pass ( Internal::IRMutator pass,
std::function< void()>  deleter 
)

Add a custom pass to be used during lowering, with the function that will be called to delete it also passed in.

Set it to nullptr if you wish to retain ownership of the object.

◆ clear_custom_lowering_passes()

void Halide::Func::clear_custom_lowering_passes ( )

Remove all previously-set custom lowering passes.

◆ custom_lowering_passes()

const std::vector<CustomLoweringPass>& Halide::Func::custom_lowering_passes ( )

Get the custom lowering passes.

◆ debug_to_file()

void Halide::Func::debug_to_file ( const std::string &  filename)

When this function is compiled, include code that dumps its values to a file after it is realized, for the purpose of debugging.

If filename ends in ".tif" or ".tiff" (case insensitive) the file is in TIFF format and can be read by standard tools. Otherwise, the file format is as follows:

All data is in the byte-order of the target platform. First, a 20-byte header containing four 32-bit ints, giving the extents of the first four dimensions. Dimensions beyond four are folded into the fourth. Then, a fifth 32-bit int giving the data type of the function. The typecodes are given by: float = 0, double = 1, uint8_t = 2, int8_t = 3, uint16_t = 4, int16_t = 5, uint32_t = 6, int32_t = 7, uint64_t = 8, int64_t = 9. The data follows the header, as a densely packed array of the given size and the given type. If given the extension .tmp, this file format can be natively read by the program ImageStack.

◆ name()

const std::string& Halide::Func::name ( ) const

The name of this function, either given during construction, or automatically generated.

◆ args()

std::vector<Var> Halide::Func::args ( ) const

Get the pure arguments.

Referenced by operator()(), reorder(), and reorder_storage().

◆ value()

Expr Halide::Func::value ( ) const

The right-hand-side value of the pure definition of this function.

Causes an error if there's no pure definition, or if the function is defined to return multiple values.

◆ values()

Tuple Halide::Func::values ( ) const

The values returned by this function.

An error if the function has not been defined. Returns a Tuple with one element for functions defined to return a single value.

◆ defined()

bool Halide::Func::defined ( ) const

Does this function have at least a pure definition.

◆ update_args()

const std::vector<Expr>& Halide::Func::update_args ( int  idx = 0) const

Get the left-hand-side of the update definition.

An empty vector if there's no update definition. If there are multiple update definitions for this function, use the argument to select which one you want.

◆ update_value()

Expr Halide::Func::update_value ( int  idx = 0) const

Get the right-hand-side of an update definition.

An error if there's no update definition. If there are multiple update definitions for this function, use the argument to select which one you want.

◆ update_values()

Tuple Halide::Func::update_values ( int  idx = 0) const

Get the right-hand-side of an update definition for functions that return multiple values.

An error if there's no update definition. Returns a Tuple with one element for functions that return a single value.

◆ rvars()

std::vector<RVar> Halide::Func::rvars ( int  idx = 0) const

Get the RVars of the reduction domain for an update definition, if there is one.

◆ has_update_definition()

bool Halide::Func::has_update_definition ( ) const

Does this function have at least one update definition?

◆ num_update_definitions()

int Halide::Func::num_update_definitions ( ) const

How many update definitions does this function have?

◆ is_extern()

bool Halide::Func::is_extern ( ) const

Is this function an external stage? That is, was it defined using define_extern?

◆ define_extern() [1/5]

void Halide::Func::define_extern ( const std::string &  function_name,
const std::vector< ExternFuncArgument > &  params,
Type  t,
int  dimensionality,
NameMangling  mangling = NameMangling::Default,
DeviceAPI  device_api = DeviceAPI::Host 
)
inline

Add an extern definition for this Func.

This lets you define a Func that represents an external pipeline stage. You can, for example, use it to wrap a call to an extern library such as fftw.

Definition at line 1216 of file Func.h.

References Halide::Internal::make_argument_list().

Referenced by define_extern().

◆ define_extern() [2/5]

void Halide::Func::define_extern ( const std::string &  function_name,
const std::vector< ExternFuncArgument > &  params,
const std::vector< Type > &  types,
int  dimensionality,
NameMangling  mangling 
)
inline

Definition at line 1226 of file Func.h.

References define_extern(), and Halide::Internal::make_argument_list().

◆ define_extern() [3/5]

void Halide::Func::define_extern ( const std::string &  function_name,
const std::vector< ExternFuncArgument > &  params,
const std::vector< Type > &  types,
int  dimensionality,
NameMangling  mangling = NameMangling::Default,
DeviceAPI  device_api = DeviceAPI::Host 
)
inline

Definition at line 1234 of file Func.h.

References define_extern(), and Halide::Internal::make_argument_list().

◆ define_extern() [4/5]

void Halide::Func::define_extern ( const std::string &  function_name,
const std::vector< ExternFuncArgument > &  params,
Type  t,
const std::vector< Var > &  arguments,
NameMangling  mangling = NameMangling::Default,
DeviceAPI  device_api = DeviceAPI::Host 
)
inline

Definition at line 1244 of file Func.h.

References define_extern().

◆ define_extern() [5/5]

void Halide::Func::define_extern ( const std::string &  function_name,
const std::vector< ExternFuncArgument > &  params,
const std::vector< Type > &  types,
const std::vector< Var > &  arguments,
NameMangling  mangling = NameMangling::Default,
DeviceAPI  device_api = DeviceAPI::Host 
)

◆ output_types()

const std::vector<Type>& Halide::Func::output_types ( ) const

Get the types of the outputs of this Func.

Examples
tutorial/lesson_14_types.cpp.

◆ outputs()

int Halide::Func::outputs ( ) const

Get the number of outputs of this Func.

Corresponds to the size of the Tuple this Func was defined to return.

◆ extern_function_name()

const std::string& Halide::Func::extern_function_name ( ) const

Get the name of the extern function called for an extern definition.

◆ dimensions()

int Halide::Func::dimensions ( ) const

The dimensionality (number of arguments) of this function.

Zero if the function is not yet defined.

◆ operator()() [1/4]

FuncRef Halide::Func::operator() ( std::vector< Var ) const

Construct either the left-hand-side of a definition, or a call to a function that happens to only contain vars as arguments.

If the function has already been defined, and fewer arguments are given than the function has dimensions, then enough implicit vars are added to the end of the argument list to make up the difference (see Var::implicit)

Referenced by operator()().

◆ operator()() [2/4]

template<typename... Args>
HALIDE_NO_USER_CODE_INLINE std::enable_if<Internal::all_are_convertible<Var, Args...>::value, FuncRef>::type Halide::Func::operator() ( Args &&...  args) const
inline

Definition at line 1287 of file Func.h.

References args(), and operator()().

◆ operator()() [3/4]

FuncRef Halide::Func::operator() ( std::vector< Expr ) const

Either calls to the function, or the left-hand-side of an update definition (see RDom).

If the function has already been defined, and fewer arguments are given than the function has dimensions, then enough implicit vars are added to the end of the argument list to make up the difference. (see Var::implicit)

◆ operator()() [4/4]

template<typename... Args>
HALIDE_NO_USER_CODE_INLINE std::enable_if<Internal::all_are_convertible<Expr, Args...>::value, FuncRef>::type Halide::Func::operator() ( const Expr x,
Args &&...  args 
) const
inline

Definition at line 1304 of file Func.h.

References args().

◆ in() [1/3]

Func Halide::Func::in ( const Func f)

Creates and returns a new identity Func that wraps this Func.

During compilation, Halide replaces all calls to this Func done by 'f' with calls to the wrapper. If this Func is already wrapped for use in 'f', the existing wrapper is returned.

For example, g.in(f) would rewrite a pipeline like this:

g(x, y) = ...
f(x, y) = ... g(x, y) ...

into a pipeline like this:

g(x, y) = ...
g_wrap(x, y) = g(x, y)
f(x, y) = ... g_wrap(x, y) ...

This has a variety of uses. You can use it to schedule this Func differently in the different places it is used:

g(x, y) = ...
f1(x, y) = ... g(x, y) ...
f2(x, y) = ... g(x, y) ...
g.in(f1).compute_at(f1, y).vectorize(x, 8);
g.in(f2).compute_at(f2, x).unroll(x);

You can also use it to stage loads from this Func via some intermediate buffer (perhaps on the stack as in test/performance/block_transpose.cpp, or in shared GPU memory as in test/performance/wrap.cpp). In this case we compute the wrapper at tiles of the consuming Funcs like so:

g.compute_root()...
g.in(f).compute_at(f, tiles)...

Func::in() can also be used to compute pieces of a Func into a smaller scratch buffer (perhaps on the GPU) and then copy them into a larger output buffer one tile at a time. See apps/interpolate/interpolate.cpp for an example of this. In this case we compute the Func at tiles of its own wrapper:

f.in(g).compute_root().gpu_tile(...)...
f.compute_at(f.in(g), tiles)...

A similar use of Func::in() is to wrap Funcs with multiple update stages in a pure wrapper. The following code:

f(x, y) = x + y;
f(x, y) += 5;
g(x, y) = f(x, y);
f.compute_root();

Is equivalent to:

for y:
for x:
f(x, y) = x + y;
for y:
for x:
f(x, y) += 5
for y:
for x:
g(x, y) = f(x, y)

Using Func::in(), we can write:

f(x, y) = x + y;
f(x, y) += 5;
g(x, y) = f(x, y);
f.in(g).compute_root();

which instead produces:

for y:
for x:
f(x, y) = x + y;
f(x, y) += 5
f_wrap(x, y) = f(x, y)
for y:
for x:
g(x, y) = f_wrap(x, y)

Referenced by Halide::Internal::GeneratorInput_Buffer< T >::in(), and Halide::Internal::GeneratorInput_Func< T >::in().

◆ in() [2/3]

Func Halide::Func::in ( const std::vector< Func > &  fs)

Create and return an identity wrapper shared by all the Funcs in 'fs'.

If any of the Funcs in 'fs' already have a custom wrapper, this will throw an error.

◆ in() [3/3]

Func Halide::Func::in ( )

Create and return a global identity wrapper, which wraps all calls to this Func by any other Func.

If a global wrapper already exists, returns it. The global identity wrapper is only used by callers for which no custom wrapper has been specified.

◆ clone_in() [1/2]

Func Halide::Func::clone_in ( const Func f)

Similar to Func::in; however, instead of replacing the call to this Func with an identity Func that refers to it, this replaces the call with a clone of this Func.

For example, f.clone_in(g) would rewrite a pipeline like this:

f(x, y) = x + y;
g(x, y) = f(x, y) + 2;
h(x, y) = f(x, y) - 3;

into a pipeline like this:

f(x, y) = x + y;
f_clone(x, y) = x + y;
g(x, y) = f_clone(x, y) + 2;
h(x, y) = f(x, y) - 3;

Referenced by Halide::SimdOpCheckTest::check_one().

◆ clone_in() [2/2]

Func Halide::Func::clone_in ( const std::vector< Func > &  fs)

◆ copy_to_device()

Func Halide::Func::copy_to_device ( DeviceAPI  d = DeviceAPI::Default_GPU)

Declare that this function should be implemented by a call to halide_buffer_copy with the given target device API.

Asserts that the Func has a pure definition which is a simple call to a single input, and no update definitions. The wrapper Funcs returned by in() are suitable candidates. Consumes all pure variables, and rewrites the Func to have an extern definition that calls halide_buffer_copy.

◆ copy_to_host()

Func Halide::Func::copy_to_host ( )

Declare that this function should be implemented by a call to halide_buffer_copy with a NULL target device API.

Equivalent to copy_to_device(DeviceAPI::Host). Asserts that the Func has a pure definition which is a simple call to a single input, and no update definitions. The wrapper Funcs returned by in() are suitable candidates. Consumes all pure variables, and rewrites the Func to have an extern definition that calls halide_buffer_copy.

Note that if the source Func is already valid in host memory, this compiles to code that does the minimum number of calls to memcpy.

◆ split()

Func& Halide::Func::split ( const VarOrRVar old,
const VarOrRVar outer,
const VarOrRVar inner,
const Expr factor,
TailStrategy  tail = TailStrategy::Auto 
)

Split a dimension into inner and outer subdimensions with the given names, where the inner dimension iterates from 0 to factor-1.

The inner and outer subdimensions can then be dealt with using the other scheduling calls. It's ok to reuse the old variable name as either the inner or outer variable. The final argument specifies how the tail should be handled if the split factor does not provably divide the extent.

Examples
tutorial/lesson_05_scheduling_1.cpp, tutorial/lesson_08_scheduling_2.cpp, tutorial/lesson_09_update_definitions.cpp, and tutorial/lesson_12_using_the_gpu.cpp.

Referenced by do_cost_model_schedule().

◆ fuse()

Func& Halide::Func::fuse ( const VarOrRVar inner,
const VarOrRVar outer,
const VarOrRVar fused 
)

Join two dimensions into a single fused dimension.

The fused dimension covers the product of the extents of the inner and outer dimensions given.

Examples
tutorial/lesson_05_scheduling_1.cpp.

Referenced by do_cost_model_schedule().

◆ serial()

Func& Halide::Func::serial ( const VarOrRVar var)

Mark a dimension to be traversed serially.

This is the default.

◆ parallel() [1/2]

Func& Halide::Func::parallel ( const VarOrRVar var)

Mark a dimension to be traversed in parallel.

◆ parallel() [2/2]

Func& Halide::Func::parallel ( const VarOrRVar var,
const Expr task_size,
TailStrategy  tail = TailStrategy::Auto 
)

Split a dimension by the given task_size, and then parallelize the outer dimension.

This creates parallel tasks that have size task_size. After this call, var refers to the outer dimension of the split. The inner dimension has a new anonymous name. If you wish to mutate it, or schedule with respect to it, do the split manually.

◆ vectorize() [1/2]

Func& Halide::Func::vectorize ( const VarOrRVar var)

Mark a dimension to be computed all-at-once as a single vector.

The dimension should have constant extent - e.g. because it is the inner dimension following a split by a constant factor. For most uses of vectorize you want the two-argument form. The variable to be vectorized should be the innermost one.

Examples
tutorial/lesson_05_scheduling_1.cpp, tutorial/lesson_08_scheduling_2.cpp, tutorial/lesson_09_update_definitions.cpp, tutorial/lesson_10_aot_compilation_generate.cpp, tutorial/lesson_11_cross_compilation.cpp, and tutorial/lesson_12_using_the_gpu.cpp.

Referenced by Halide::SimdOpCheckTest::check_one(), and do_cost_model_schedule().

◆ unroll() [1/2]

Func& Halide::Func::unroll ( const VarOrRVar var)

Mark a dimension to be completely unrolled.

The dimension should have constant extent - e.g. because it is the inner dimension following a split by a constant factor. For most uses of unroll you want the two-argument form.

Examples
tutorial/lesson_05_scheduling_1.cpp, and tutorial/lesson_12_using_the_gpu.cpp.

◆ vectorize() [2/2]

Func& Halide::Func::vectorize ( const VarOrRVar var,
const Expr factor,
TailStrategy  tail = TailStrategy::Auto 
)

Split a dimension by the given factor, then vectorize the inner dimension.

This is how you vectorize a loop of unknown size. The variable to be vectorized should be the innermost one. After this call, var refers to the outer dimension of the split. 'factor' must be an integer.

◆ unroll() [2/2]

Func& Halide::Func::unroll ( const VarOrRVar var,
const Expr factor,
TailStrategy  tail = TailStrategy::Auto 
)

Split a dimension by the given factor, then unroll the inner dimension.

This is how you unroll a loop of unknown size by some constant factor. After this call, var refers to the outer dimension of the split. 'factor' must be an integer.

◆ bound()

Func& Halide::Func::bound ( const Var var,
Expr  min,
Expr  extent 
)

Statically declare that the range over which a function should be evaluated is given by the second and third arguments.

This can let Halide perform some optimizations. E.g. if you know there are going to be 4 color channels, you can completely vectorize the color channel dimension without the overhead of splitting it up. If bounds inference decides that it requires more of this function than the bounds you have stated, a runtime error will occur when you try to run your pipeline.

Examples
tutorial/lesson_12_using_the_gpu.cpp.

Referenced by Halide::SimdOpCheckTest::check_one().

◆ set_estimate()

Func& Halide::Func::set_estimate ( const Var var,
const Expr min,
const Expr extent 
)

Statically declare the range over which the function will be evaluated in the general case.

This provides a basis for the auto scheduler to make trade-offs and scheduling decisions. The auto generated schedules might break when the sizes of the dimensions are very different from the estimates specified. These estimates are used only by the auto scheduler if the function is a pipeline output.

Referenced by estimate(), and Halide::Internal::GeneratorOutput_Func< T >::set_estimate().

◆ estimate()

Func& Halide::Func::estimate ( const Var var,
const Expr min,
const Expr extent 
)
inline

Definition at line 1533 of file Func.h.

References Halide::min(), and set_estimate().

◆ set_estimates()

Func& Halide::Func::set_estimates ( const Region estimates)

Set (min, extent) estimates for all dimensions in the Func at once; this is equivalent to calling set_estimate(args()[n], min, extent) repeatedly, but slightly terser.

The size of the estimates vector must match the dimensionality of the Func.

Referenced by Halide::Internal::GeneratorOutput_Func< T >::set_estimates().

◆ align_bounds()

Func& Halide::Func::align_bounds ( const Var var,
Expr  modulus,
Expr  remainder = 0 
)

Expand the region computed so that the min coordinate is congruent to 'remainder' modulo 'modulus', and the extent is a multiple of 'modulus'.

For example, f.align_bounds(x, 2) forces the min and extent realized to be even, and calling f.align_bounds(x, 2, 1) forces the min to be odd and the extent to be even. The region computed always contains the region that would have been computed without this directive, so no assertions are injected.
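The arithmetic behind this expansion can be sketched in plain C++ (hypothetical helper, not Halide's implementation):

```cpp
#include <utility>

// Sketch of align_bounds(min, extent, modulus, remainder): expand the
// region so that the new min is congruent to remainder (mod modulus) and
// the new extent is a multiple of modulus, while still containing the
// original region [min, min + extent).
std::pair<int, int> align_bounds_sketch(int min, int extent,
                                        int modulus, int remainder) {
    // Largest value <= min that is congruent to remainder (mod modulus).
    int delta = ((min - remainder) % modulus + modulus) % modulus;
    int new_min = min - delta;
    int end = min + extent;
    // Round the extent up so new_min + new_extent covers the original end.
    int new_extent = ((end - new_min + modulus - 1) / modulus) * modulus;
    return {new_min, new_extent};
}
```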

◆ bound_extent()

Func& Halide::Func::bound_extent ( const Var var,
Expr  extent 
)

Bound the extent of a Func's realization, but not its min.

This means the dimension can be unrolled or vectorized even when its min is not fixed (for example because it is compute_at tiles of another Func). This can also be useful for forcing a function's allocation to be a fixed size, which often means it can go on the stack.

◆ tile() [1/5]

Func& Halide::Func::tile ( const VarOrRVar x,
const VarOrRVar y,
const VarOrRVar xo,
const VarOrRVar yo,
const VarOrRVar xi,
const VarOrRVar yi,
const Expr xfactor,
const Expr yfactor,
TailStrategy  tail = TailStrategy::Auto 
)

Split two dimensions at once by the given factors, and then reorder the resulting dimensions to be xi, yi, xo, yo from innermost outwards.

This gives a tiled traversal.

Examples
tutorial/lesson_05_scheduling_1.cpp, and tutorial/lesson_08_scheduling_2.cpp.
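The resulting loop nest can be sketched in plain C++ (illustrative only; this sketch assumes the extents divide the tile sizes evenly, so no tail handling is shown):

```cpp
#include <utility>
#include <vector>

// Sketch of f.tile(x, y, xo, yo, xi, yi, tx, ty): tile indices (xo, yo)
// are outermost and the coordinates within each tile (xi, yi) are
// innermost, giving a tiled traversal of the domain.
std::vector<std::pair<int, int>> tiled_order(int width, int height, int tx, int ty) {
    std::vector<std::pair<int, int>> order;
    for (int yo = 0; yo < height / ty; yo++)
        for (int xo = 0; xo < width / tx; xo++)
            for (int yi = 0; yi < ty; yi++)
                for (int xi = 0; xi < tx; xi++)
                    order.push_back({xo * tx + xi, yo * ty + yi});
    return order;
}
```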

◆ tile() [2/5]

Func& Halide::Func::tile ( const VarOrRVar x,
const VarOrRVar y,
const VarOrRVar xi,
const VarOrRVar yi,
const Expr xfactor,
const Expr yfactor,
TailStrategy  tail = TailStrategy::Auto 
)

A shorter form of tile, which reuses the old variable names as the new outer dimensions.

◆ tile() [3/5]

Func& Halide::Func::tile ( const std::vector< VarOrRVar > &  previous,
const std::vector< VarOrRVar > &  outers,
const std::vector< VarOrRVar > &  inners,
const std::vector< Expr > &  factors,
const std::vector< TailStrategy > &  tails 
)

A more general form of tile, which defines tiles of any dimensionality.

◆ tile() [4/5]

Func& Halide::Func::tile ( const std::vector< VarOrRVar > &  previous,
const std::vector< VarOrRVar > &  outers,
const std::vector< VarOrRVar > &  inners,
const std::vector< Expr > &  factors,
TailStrategy  tail = TailStrategy::Auto 
)

The generalized tile, with a single tail strategy to apply to all vars.

◆ tile() [5/5]

Func& Halide::Func::tile ( const std::vector< VarOrRVar > &  previous,
const std::vector< VarOrRVar > &  inners,
const std::vector< Expr > &  factors,
TailStrategy  tail = TailStrategy::Auto 
)

Generalized tiling, reusing the previous names as the outer names.

◆ reorder() [1/2]

Func& Halide::Func::reorder ( const std::vector< VarOrRVar > &  vars)

Reorder variables to have the given nesting order, from innermost out.

Examples
tutorial/lesson_05_scheduling_1.cpp, and tutorial/lesson_12_using_the_gpu.cpp.

Referenced by do_cost_model_schedule(), and reorder().

◆ reorder() [2/2]

template<typename... Args>
HALIDE_NO_USER_CODE_INLINE std::enable_if<Internal::all_are_convertible<VarOrRVar, Args...>::value, Func &>::type Halide::Func::reorder ( const VarOrRVar x,
const VarOrRVar y,
Args &&...  args 
)
inline

Definition at line 1603 of file Func.h.

References args(), and reorder().

◆ rename()

Func& Halide::Func::rename ( const VarOrRVar old_name,
const VarOrRVar new_name 
)

Rename a dimension.

Equivalent to a split with an inner size of one.

◆ allow_race_conditions()

Func& Halide::Func::allow_race_conditions ( )

Specify that race conditions are permitted for this Func, which enables parallelizing over RVars even when Halide cannot prove that it is safe to do so.

Use this with great caution, and only if you can prove to yourself that this is safe, as it may result in a non-deterministic routine that returns different values at different times or on different machines.

◆ atomic()

Func& Halide::Func::atomic ( bool  override_associativity_test = false)

Issue atomic updates for this Func.

This allows parallelization on associative RVars. The function throws a compile error when Halide fails to prove associativity. Use override_associativity_test to disable the associativity test if you believe the function is associative or the order of reduction variable execution does not matter. Halide compiles this into hardware atomic operations whenever possible, and falls back to a mutex lock per storage element if it is impossible to atomically update. There are three possible outcomes of the compiled code: atomic add, compare-and-swap loop, and mutex lock. For example:

hist(x) = 0;
hist(im(r)) += 1;
hist.compute_root();
hist.update().atomic().parallel(r);

will be compiled to atomic add operations.

hist(x) = 0;
hist(im(r)) = min(hist(im(r)) + 1, 100);
hist.compute_root();
hist.update().atomic().parallel(r);

will be compiled to compare-and-swap loops.

arg_max() = {0, im(0)};
Expr old_index = arg_max()[0];
Expr old_max = arg_max()[1];
Expr new_index = select(old_max < im(r), r, old_index);
Expr new_max = max(im(r), old_max);
arg_max() = {new_index, new_max};
arg_max.compute_root();
arg_max.update().atomic().parallel(r);

will be compiled to updates guarded by a mutex lock, since it is impossible to atomically update two different locations.

Currently the atomic operation is supported by the x86, CUDA, and OpenCL backends. Compiling to other backends results in a compile error. If an operation is compiled into a mutex lock, and is vectorized or is compiled to CUDA or OpenCL, it also results in a compile error, since a per-element mutex lock on a vectorized operation leads to a deadlock. Vectorization of predicated RVars (through rdom.where()) on the CPU is also not yet supported (see https://github.com/halide/Halide/issues/4298). 8-bit and 16-bit atomics on the GPU are also not supported.
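On the CPU, the atomic-add outcome corresponds to lock-free increments, which can be sketched with std::atomic in plain C++ (illustrative sketch, not Halide-generated code; the helper name is hypothetical):

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Sketch of the atomic-add lowering of hist(im(r)) += 1: each parallel
// worker increments shared histogram bins without a lock.
std::vector<int> parallel_histogram(const std::vector<int> &im, int bins, int workers) {
    std::vector<std::atomic<int>> counts(bins);
    for (auto &c : counts) c.store(0);
    std::vector<std::thread> pool;
    std::size_t chunk = (im.size() + workers - 1) / workers;
    for (int w = 0; w < workers; w++) {
        pool.emplace_back([&, w] {
            std::size_t lo = w * chunk;
            std::size_t hi = std::min(im.size(), lo + chunk);
            for (std::size_t i = lo; i < hi; i++)
                counts[im[i]].fetch_add(1, std::memory_order_relaxed);  // atomic add
        });
    }
    for (auto &t : pool) t.join();
    return std::vector<int>(counts.begin(), counts.end());
}
```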

◆ specialize()

Stage Halide::Func::specialize ( const Expr condition)

Specialize a Func.

This creates a special-case version of the Func where the given condition is true. The most effective conditions are those of the form param == value, and boolean Params. Consider a simple example:

f(x) = x + select(cond, 0, 1);
f.compute_root();

This is equivalent to:

for (int x = 0; x < width; x++) {
    f[x] = x + (cond ? 0 : 1);
}

Adding the scheduling directive:

f.specialize(cond)

makes it equivalent to:

if (cond) {
    for (int x = 0; x < width; x++) {
        f[x] = x;
    }
} else {
    for (int x = 0; x < width; x++) {
        f[x] = x + 1;
    }
}

Note that the inner loops have been simplified. In the first path Halide knows that cond is true, and in the second path Halide knows that it is false.

The specialized version gets its own schedule, which inherits every directive made about the parent Func's schedule so far except for its specializations. This method returns a handle to the new schedule. If you wish to retrieve the specialized sub-schedule again later, you can call this method with the same condition. Consider the following example of scheduling the specialized version:

f(x) = x;
f.compute_root();
f.specialize(width > 1).unroll(x, 2);

Assuming for simplicity that width is even, this is equivalent to:

if (width > 1) {
    for (int x = 0; x < width/2; x++) {
        f[2*x] = 2*x;
        f[2*x + 1] = 2*x + 1;
    }
} else {
    for (int x = 0; x < width; x++) {
        f[x] = x;
    }
}

For this case, it may be better to schedule the un-specialized case instead:

f(x) = x;
f.compute_root();
f.specialize(width == 1); // Creates a copy of the schedule so far.
f.unroll(x, 2); // Only applies to the unspecialized case.

This is equivalent to:

if (width == 1) {
    f[0] = 0;
} else {
    for (int x = 0; x < width/2; x++) {
        f[2*x] = 2*x;
        f[2*x + 1] = 2*x + 1;
    }
}

This can be a good way to write a pipeline that splits, vectorizes, or tiles, but can still handle small inputs.

If a Func has several specializations, the first matching one will be used, so the order in which you define specializations is significant. For example:

f(x) = x + select(cond1, a, b) - select(cond2, c, d);
f.specialize(cond1);
f.specialize(cond2);

is equivalent to:

if (cond1) {
    for (int x = 0; x < width; x++) {
        f[x] = x + a - (cond2 ? c : d);
    }
} else if (cond2) {
    for (int x = 0; x < width; x++) {
        f[x] = x + b - c;
    }
} else {
    for (int x = 0; x < width; x++) {
        f[x] = x + b - d;
    }
}

Specializations may in turn be specialized, which creates a nested if statement in the generated code.

f(x) = x + select(cond1, a, b) - select(cond2, c, d);
f.specialize(cond1).specialize(cond2);

This is equivalent to:

if (cond1) {
    if (cond2) {
        for (int x = 0; x < width; x++) {
            f[x] = x + a - c;
        }
    } else {
        for (int x = 0; x < width; x++) {
            f[x] = x + a - d;
        }
    }
} else {
    for (int x = 0; x < width; x++) {
        f[x] = x + b - (cond2 ? c : d);
    }
}

To create a 4-way if statement that simplifies away all of the ternary operators above, you could say:

f.specialize(cond1).specialize(cond2);
f.specialize(cond2);

or

f.specialize(cond1 && cond2);
f.specialize(cond1);
f.specialize(cond2);

Any prior Func which is compute_at some variable of this Func gets separately included in all paths of the generated if statement. The Var in the compute_at call must exist in all paths, but it may have been generated via a different path of splits, fuses, and renames. This can be used somewhat creatively. Consider the following code:

g(x, y) = 8*x;
f(x, y) = g(x, y) + 1;
f.compute_root().specialize(cond);
Var g_loop;
f.specialize(cond).rename(y, g_loop);
f.rename(x, g_loop);
g.compute_at(f, g_loop);

When cond is true, this is equivalent to g.compute_at(f,y). When it is false, this is equivalent to g.compute_at(f,x).

◆ specialize_fail()

void Halide::Func::specialize_fail ( const std::string &  message)

Add a specialization to a Func that always terminates execution with a call to halide_error().

By itself, this is of limited use, but can be useful to terminate chains of specialize() calls where no "default" case is expected (thus avoiding unnecessary code generation).

For instance, say we want to optimize a pipeline to process images in planar and interleaved format; we might typically do something like:

ImageParam im(UInt(8), 3);
Func f = do_something_with(im);
f.specialize(im.dim(0).stride() == 1).vectorize(x, 8); // planar
f.specialize(im.dim(2).stride() == 1).reorder(c, x, y).vectorize(c); // interleaved

This code will vectorize along rows for the planar case, and across pixel components for the interleaved case... but there is an implicit "else" for the unhandled cases, which generates unoptimized code. If we never anticipate passing any other sort of image to this, we could streamline our code by adding specialize_fail():

ImageParam im(UInt(8), 3);
Func f = do_something_with(im);
f.specialize(im.dim(0).stride() == 1).vectorize(x, 8); // planar
f.specialize(im.dim(2).stride() == 1).reorder(c, x, y).vectorize(c); // interleaved
f.specialize_fail("Unhandled image format");

Conceptually, this produces code like:

if (im.dim(0).stride() == 1) {
    do_something_planar();
} else if (im.dim(2).stride() == 1) {
    do_something_interleaved();
} else {
    halide_error("Unhandled image format");
}

Note that calling specialize_fail() terminates the specialization chain for a given Func; you cannot create new specializations for the Func afterwards (though you can retrieve handles to previous specializations).

◆ gpu_threads() [1/3]

Func& Halide::Func::gpu_threads ( const VarOrRVar thread_x,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Tell Halide that the following dimensions correspond to GPU thread indices.

This is useful if you compute a producer function within the block indices of a consumer function, and want to control how that function's dimensions map to GPU threads. If the selected target is not an appropriate GPU, this just marks those dimensions as parallel.

Examples
tutorial/lesson_12_using_the_gpu.cpp.

◆ gpu_threads() [2/3]

Func& Halide::Func::gpu_threads ( const VarOrRVar thread_x,
const VarOrRVar thread_y,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

◆ gpu_threads() [3/3]

Func& Halide::Func::gpu_threads ( const VarOrRVar thread_x,
const VarOrRVar thread_y,
const VarOrRVar thread_z,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

◆ gpu_lanes()

Func& Halide::Func::gpu_lanes ( const VarOrRVar thread_x,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

The given dimension corresponds to the lanes in a GPU warp.

GPU warp lanes are distinguished from GPU threads by the fact that all warp lanes run together in lockstep, which permits lightweight communication of data from one lane to another.

◆ gpu_single_thread()

Func& Halide::Func::gpu_single_thread ( DeviceAPI  device_api = DeviceAPI::Default_GPU)

Tell Halide to run this stage using a single gpu thread and block.

This is not an efficient use of your GPU, but it can be useful to avoid copy-back for intermediate update stages that touch a very small part of your Func.

Referenced by Halide::Internal::schedule_scalar().

◆ gpu_blocks() [1/3]

Func& Halide::Func::gpu_blocks ( const VarOrRVar block_x,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Tell Halide that the following dimensions correspond to GPU block indices.

This is useful for scheduling stages that will run serially within each GPU block. If the selected target is not ptx, this just marks those dimensions as parallel.

Examples
tutorial/lesson_12_using_the_gpu.cpp.

◆ gpu_blocks() [2/3]

Func& Halide::Func::gpu_blocks ( const VarOrRVar block_x,
const VarOrRVar block_y,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

◆ gpu_blocks() [3/3]

Func& Halide::Func::gpu_blocks ( const VarOrRVar block_x,
const VarOrRVar block_y,
const VarOrRVar block_z,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

◆ gpu() [1/3]

Func& Halide::Func::gpu ( const VarOrRVar block_x,
const VarOrRVar thread_x,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Tell Halide that the following dimensions correspond to GPU block indices and thread indices.

If the selected target is not ptx, these just mark the given dimensions as parallel. The dimensions are consumed by this call, so do all other unrolling, reordering, etc first.

◆ gpu() [2/3]

Func& Halide::Func::gpu ( const VarOrRVar block_x,
const VarOrRVar block_y,
const VarOrRVar thread_x,
const VarOrRVar thread_y,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

◆ gpu() [3/3]

Func& Halide::Func::gpu ( const VarOrRVar block_x,
const VarOrRVar block_y,
const VarOrRVar block_z,
const VarOrRVar thread_x,
const VarOrRVar thread_y,
const VarOrRVar thread_z,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

◆ gpu_tile() [1/6]

Func& Halide::Func::gpu_tile ( const VarOrRVar x,
const VarOrRVar bx,
const VarOrRVar tx,
const Expr x_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices.

Consumes the variables given, so do all other scheduling first.

Examples
tutorial/lesson_12_using_the_gpu.cpp.

◆ gpu_tile() [2/6]

Func& Halide::Func::gpu_tile ( const VarOrRVar x,
const VarOrRVar tx,
const Expr x_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

◆ gpu_tile() [3/6]

Func& Halide::Func::gpu_tile ( const VarOrRVar x,
const VarOrRVar y,
const VarOrRVar bx,
const VarOrRVar by,
const VarOrRVar tx,
const VarOrRVar ty,
const Expr x_size,
const Expr y_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

◆ gpu_tile() [4/6]

Func& Halide::Func::gpu_tile ( const VarOrRVar x,
const VarOrRVar y,
const VarOrRVar tx,
const VarOrRVar ty,
const Expr x_size,
const Expr y_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

◆ gpu_tile() [5/6]

Func& Halide::Func::gpu_tile ( const VarOrRVar x,
const VarOrRVar y,
const VarOrRVar z,
const VarOrRVar bx,
const VarOrRVar by,
const VarOrRVar bz,
const VarOrRVar tx,
const VarOrRVar ty,
const VarOrRVar tz,
const Expr x_size,
const Expr y_size,
const Expr z_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

◆ gpu_tile() [6/6]

Func& Halide::Func::gpu_tile ( const VarOrRVar x,
const VarOrRVar y,
const VarOrRVar z,
const VarOrRVar tx,
const VarOrRVar ty,
const VarOrRVar tz,
const Expr x_size,
const Expr y_size,
const Expr z_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

◆ shader()

Func& Halide::Func::shader ( const Var x,
const Var y,
const Var c,
DeviceAPI  device_api 
)

Schedule for execution using a coordinate-based hardware API.

GLSL is an example of this. Conceptually, this is similar to parallelization over 'x' and 'y' (since GLSL shaders compute individual output pixels in parallel) and vectorization over 'c' (since GLSL/RS implicitly vectorizes the color channel).

◆ glsl()

Func& Halide::Func::glsl ( const Var x,
const Var y,
const Var c 
)

Schedule for execution as GLSL kernel.

◆ hexagon()

Func& Halide::Func::hexagon ( const VarOrRVar x = Var::outermost())

Schedule for execution on Hexagon.

When a loop is marked with Hexagon, that loop is executed on a Hexagon DSP.

Referenced by Halide::Internal::schedule_scalar().

◆ prefetch() [1/3]

Func& Halide::Func::prefetch ( const Func f,
const VarOrRVar var,
Expr  offset = 1,
PrefetchBoundStrategy  strategy = PrefetchBoundStrategy::GuardWithIf 
)

Prefetch data written to or read from a Func or an ImageParam by a subsequent loop iteration, at an optionally specified iteration offset.

'var' specifies at which loop level the prefetch calls should be inserted. The final argument specifies how prefetches of regions outside the bounds should be handled.

For example, consider this pipeline:

Func f, g;
Var x, y;
f(x, y) = x + y;
g(x, y) = 2 * f(x, y);

The following schedule:

f.compute_root();
g.prefetch(f, x, 2);

will inject a prefetch call at the innermost loop of 'g' and generate the following loop nest:

for y = ...
  for x = ...
    f(x, y) = x + y
for y = ...
  for x = ...
    prefetch(&f[x + 2, y], 1, 16);
    g(x, y) = 2 * f(x, y)
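On CPUs the injected call typically lowers to a hardware prefetch hint. Here is a plain C++ sketch using the GCC/Clang __builtin_prefetch intrinsic (the hint is semantically a no-op, so results are unchanged; the helper name is hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Sketch of prefetching at iteration offset 2: before using f[x], issue a
// prefetch hint for f[x + 2] so it is likely in cache when it is read.
long long consume_with_prefetch(const std::vector<int> &f, std::size_t offset) {
    long long total = 0;
    for (std::size_t x = 0; x < f.size(); x++) {
        if (x + offset < f.size())
            __builtin_prefetch(&f[x + offset]);  // hint only, no side effects
        total += 2LL * f[x];                     // g(x) = 2 * f(x)
    }
    return total;
}
```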

Referenced by prefetch().

◆ prefetch() [2/3]

Func& Halide::Func::prefetch ( const Internal::Parameter param,
const VarOrRVar var,
Expr  offset = 1,
PrefetchBoundStrategy  strategy = PrefetchBoundStrategy::GuardWithIf 
)

◆ prefetch() [3/3]

template<typename T >
Func& Halide::Func::prefetch ( const T &  image,
VarOrRVar  var,
Expr  offset = 1,
PrefetchBoundStrategy  strategy = PrefetchBoundStrategy::GuardWithIf 
)
inline

Definition at line 2013 of file Func.h.

References prefetch().

◆ reorder_storage() [1/3]

Func& Halide::Func::reorder_storage ( const std::vector< Var > &  dims)

Specify how the storage for the function is laid out.

These calls let you specify the nesting order of the dimensions. For example, foo.reorder_storage(y, x) tells Halide to use column-major storage for any realizations of foo, without changing how you refer to foo in the code. You may want to do this if you intend to vectorize across y. When representing color images, foo.reorder_storage(c, x, y) specifies packed storage (red, green, and blue values adjacent in memory), and foo.reorder_storage(x, y, c) specifies planar storage (entire red, green, and blue images one after the other in memory).

If you leave out some dimensions, those remain in the same positions in the nesting order while the specified variables are reordered around them.
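The relationship between storage order and strides can be sketched in plain C++ (hypothetical helper; dense storage with no padding assumed):

```cpp
#include <vector>

// Sketch of how storage order determines strides: the first storage
// dimension has stride 1, and each subsequent dimension's stride is the
// previous stride times the previous extent. 'order' lists dimension
// indices innermost-first, as in reorder_storage.
std::vector<int> strides_for_order(const std::vector<int> &extents,
                                   const std::vector<int> &order) {
    std::vector<int> strides(extents.size());
    int stride = 1;
    for (int dim : order) {
        strides[dim] = stride;
        stride *= extents[dim];
    }
    return strides;
}
```

For a 640x480 RGB image with dimensions (x, y, c), planar order (x, y, c) gives x a stride of 1, while packed order (c, x, y) gives c a stride of 1 and x a stride of 3.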

◆ reorder_storage() [2/3]

Func& Halide::Func::reorder_storage ( const Var x,
const Var y 
)

◆ reorder_storage() [3/3]

template<typename... Args>
HALIDE_NO_USER_CODE_INLINE std::enable_if<Internal::all_are_convertible<Var, Args...>::value, Func &>::type Halide::Func::reorder_storage ( const Var x,
const Var y,
Args &&...  args 
)
inline

Definition at line 2039 of file Func.h.

References args().

◆ align_storage()

Func& Halide::Func::align_storage ( const Var dim,
const Expr alignment 
)

Pad the storage extent of a particular dimension of realizations of this function up to be a multiple of the specified alignment.

This guarantees that the strides for the dimensions stored outside of dim will be multiples of the specified alignment, where the strides and alignment are measured in numbers of elements.

For example, to guarantee that a function foo(x, y, c) representing an image has scanlines starting on offsets aligned to multiples of 16, use foo.align_storage(x, 16).
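The padding arithmetic is simply rounding the storage extent up to a multiple of the alignment (hypothetical helper name, not the Halide implementation):

```cpp
// Sketch of align_storage's effect on one dimension: the storage extent of
// the aligned dimension is rounded up to the next multiple of 'alignment',
// measured in elements, so the strides of dimensions stored outside it
// become multiples of that alignment.
int aligned_storage_extent(int extent, int alignment) {
    return ((extent + alignment - 1) / alignment) * alignment;
}
```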

◆ fold_storage()

Func& Halide::Func::fold_storage ( const Var dim,
const Expr extent,
bool  fold_forward = true 
)

Store realizations of this function in a circular buffer of a given extent.

This is more efficient when the extent of the circular buffer is a power of 2. If the fold factor is too small, or the dimension is not accessed monotonically, the pipeline will generate an error at runtime.

The fold_forward option indicates that the new values of the producer are accessed by the consumer in a monotonically increasing order. Folding storage of producers is also supported if the new values are accessed in a monotonically decreasing order by setting fold_forward to false.

For example, consider the pipeline:

Func f, g;
Var x, y;
g(x, y) = x*y;
f(x, y) = g(x, y) + g(x, y+1);

If we schedule f like so:

g.compute_at(f, y).store_root().fold_storage(y, 2);

Then g will be computed at each row of f and stored in a buffer with an extent in y of 2, alternately storing each computed row of g in row y=0 or y=1.
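The folded schedule above behaves like the following plain C++ (illustrative sketch, not Halide-generated code): rows of g live in a two-row circular buffer indexed modulo 2, while f consumes rows y and y+1.

```cpp
#include <vector>

// Sketch of g.compute_at(f, y).store_root().fold_storage(y, 2) for
// g(x, y) = x*y and f(x, y) = g(x, y) + g(x, y+1): g's storage has an
// extent of only 2 in y, addressed modulo 2.
std::vector<std::vector<int>> folded_pipeline(int width, int height) {
    std::vector<std::vector<int>> f(height, std::vector<int>(width));
    std::vector<std::vector<int>> g(2, std::vector<int>(width));  // folded to extent 2
    for (int x = 0; x < width; x++) g[0][x] = 0;  // row y == 0 of g: x*0
    for (int y = 0; y < height; y++) {
        // Produce row y+1 of g into the circular buffer.
        for (int x = 0; x < width; x++) g[(y + 1) % 2][x] = x * (y + 1);
        for (int x = 0; x < width; x++)
            f[y][x] = g[y % 2][x] + g[(y + 1) % 2][x];  // g(x,y) + g(x,y+1)
    }
    return f;
}
```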

◆ compute_at() [1/3]

Func& Halide::Func::compute_at ( const Func f,
const Var var 
)

Compute this function as needed for each unique value of the given var for the given calling function f.

For example, consider the simple pipeline:

Func f, g;
Var x, y;
g(x, y) = x*y;
f(x, y) = g(x, y) + g(x, y+1) + g(x+1, y) + g(x+1, y+1);

If we schedule f like so:

g.compute_at(f, x);

Then the C code equivalent to this pipeline will look like this

int f[height][width];
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        int g[2][2];
        g[0][0] = x*y;
        g[0][1] = (x+1)*y;
        g[1][0] = x*(y+1);
        g[1][1] = (x+1)*(y+1);
        f[y][x] = g[0][0] + g[1][0] + g[0][1] + g[1][1];
    }
}

The allocation and computation of g is within f's loop over x, and enough of g is computed to satisfy all that f will need for that iteration. This has excellent locality - values of g are used as soon as they are computed, but it does redundant work. Each value of g ends up getting computed four times. If we instead schedule f like so:

g.compute_at(f, y);

The equivalent C code is:

int f[height][width];
for (int y = 0; y < height; y++) {
    int g[2][width+1];
    for (int x = 0; x < width+1; x++) {
        g[0][x] = x*y;
        g[1][x] = x*(y+1);
    }
    for (int x = 0; x < width; x++) {
        f[y][x] = g[0][x] + g[1][x] + g[0][x+1] + g[1][x+1];
    }
}

The allocation and computation of g is within f's loop over y, and enough of g is computed to satisfy all that f will need for that iteration. This does less redundant work (each point in g ends up being evaluated twice), but the locality is not quite as good, and we have to allocate more temporary memory to store g.

Examples
tutorial/lesson_08_scheduling_2.cpp, tutorial/lesson_09_update_definitions.cpp, and tutorial/lesson_12_using_the_gpu.cpp.

Referenced by Halide::SimdOpCheckTest::check_one(), and do_cost_model_schedule().

◆ compute_at() [2/3]

Func& Halide::Func::compute_at ( const Func f,
const RVar var 
)

Schedule a function to be computed within the iteration over some dimension of an update domain.

Produces equivalent code to the version of compute_at that takes a Var.

◆ compute_at() [3/3]

Func& Halide::Func::compute_at ( LoopLevel  loop_level)

Schedule a function to be computed within the iteration over a given LoopLevel.

◆ compute_with() [1/4]

Func& Halide::Func::compute_with ( const Stage s,
const VarOrRVar var,
const std::vector< std::pair< VarOrRVar, LoopAlignStrategy >> &  align 
)

Schedule the iteration over the initial definition of this function to be fused with another stage 's' from outermost loop to a given LoopLevel.

◆ compute_with() [2/4]

Func& Halide::Func::compute_with ( const Stage s,
const VarOrRVar var,
LoopAlignStrategy  align = LoopAlignStrategy::Auto 
)

◆ compute_with() [3/4]

Func& Halide::Func::compute_with ( LoopLevel  loop_level,
const std::vector< std::pair< VarOrRVar, LoopAlignStrategy >> &  align 
)

◆ compute_with() [4/4]

Func& Halide::Func::compute_with ( LoopLevel  loop_level,
LoopAlignStrategy  align = LoopAlignStrategy::Auto 
)

◆ compute_root()

Func& Halide::Func::compute_root ( )

Compute all of this function once ahead of time.

Reusing the example in Func::compute_at :

Func f, g;
Var x, y;
g(x, y) = x*y;
f(x, y) = g(x, y) + g(x, y+1) + g(x+1, y) + g(x+1, y+1);
g.compute_root();

is equivalent to

int f[height][width];
int g[height+1][width+1];
for (int y = 0; y < height+1; y++) {
    for (int x = 0; x < width+1; x++) {
        g[y][x] = x*y;
    }
}
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        f[y][x] = g[y][x] + g[y+1][x] + g[y][x+1] + g[y+1][x+1];
    }
}

g is computed once ahead of time, and enough is computed to satisfy all uses of it. This does no redundant work (each point in g is evaluated once), but has poor locality (values of g are probably not still in cache when they are used by f), and allocates lots of temporary memory to store g.

Examples
tutorial/lesson_08_scheduling_2.cpp, and tutorial/lesson_12_using_the_gpu.cpp.

Referenced by Halide::SimdOpCheckTest::check_one(), and do_cost_model_schedule().

◆ memoize()

Func& Halide::Func::memoize ( )

Use the halide_memoization_cache_... interface to store a computed version of this function across invocations of the Func.

◆ async()

Func& Halide::Func::async ( )

Produce this Func asynchronously in a separate thread.

Consumers will be run by the task system when the production is complete. If this Func's store level is different to its compute level, consumers will be run concurrently, blocking as necessary to prevent reading ahead of what the producer has computed. If storage is folded, then the producer will additionally not be permitted to run too far ahead of the consumer, to avoid clobbering data that has not yet been used.

Take special care when combining this with custom thread pool implementations, as avoiding deadlock with producer-consumer parallelism requires a much more sophisticated parallel runtime than with data parallelism alone. It is strongly recommended you just use Halide's default thread pool, which guarantees no deadlock and a bound on the number of threads launched.
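The producer/consumer decoupling can be sketched with std::async in plain C++ (illustrative only; Halide's runtime uses its own task system, not std::future):

```cpp
#include <future>
#include <numeric>
#include <vector>

// Sketch of an async producer: the producer fills a buffer in a separate
// thread, and the consumer blocks until the production is complete.
int sum_of_produced(int n) {
    std::future<std::vector<int>> producer = std::async(std::launch::async, [n] {
        std::vector<int> buf(n);
        std::iota(buf.begin(), buf.end(), 0);  // produce 0, 1, ..., n-1
        return buf;
    });
    std::vector<int> buf = producer.get();     // consumer blocks here
    return std::accumulate(buf.begin(), buf.end(), 0);
}
```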

◆ store_at() [1/3]

Func& Halide::Func::store_at ( const Func f,
const Var var 
)

Allocate storage for this function within f's loop over var.

Scheduling storage is optional, and can be used to separate the loop level at which storage occurs from the loop level at which computation occurs to trade off between locality and redundant work. This can open the door for two types of optimization.

Consider again the pipeline from Func::compute_at :

Func f, g;
Var x, y;
g(x, y) = x*y;
f(x, y) = g(x, y) + g(x+1, y) + g(x, y+1) + g(x+1, y+1);

If we schedule it like so:

g.compute_at(f, x).store_at(f, y);

Then the computation of g takes place within the loop over x, but the storage takes place within the loop over y:

int f[height][width];
for (int y = 0; y < height; y++) {
    int g[2][width+1];
    for (int x = 0; x < width; x++) {
        g[0][x] = x*y;
        g[0][x+1] = (x+1)*y;
        g[1][x] = x*(y+1);
        g[1][x+1] = (x+1)*(y+1);
        f[y][x] = g[0][x] + g[1][x] + g[0][x+1] + g[1][x+1];
    }
}

Provided the for loop over x is serial, Halide then automatically performs the following sliding window optimization:

int f[height][width];
for (int y = 0; y < height; y++) {
    int g[2][width+1];
    for (int x = 0; x < width; x++) {
        if (x == 0) {
            g[0][x] = x*y;
            g[1][x] = x*(y+1);
        }
        g[0][x+1] = (x+1)*y;
        g[1][x+1] = (x+1)*(y+1);
        f[y][x] = g[0][x] + g[1][x] + g[0][x+1] + g[1][x+1];
    }
}

Two of the assignments to g only need to be done when x is zero. The rest of the time, those sites have already been filled in by a previous iteration. This version has the locality of compute_at(f, x), but allocates more memory and does much less redundant work.

Halide then further optimizes this pipeline like so:

int f[height][width];
for (int y = 0; y < height; y++) {
    int g[2][2];
    for (int x = 0; x < width; x++) {
        if (x == 0) {
            g[0][0] = x*y;
            g[1][0] = x*(y+1);
        }
        g[0][(x+1)%2] = (x+1)*y;
        g[1][(x+1)%2] = (x+1)*(y+1);
        f[y][x] = g[0][x%2] + g[1][x%2] + g[0][(x+1)%2] + g[1][(x+1)%2];
    }
}

Halide has detected that it's possible to use a circular buffer to represent g, and has reduced all accesses to g modulo 2 in the x dimension. This optimization only triggers if the for loop over x is serial, and if Halide can statically determine some power of two large enough to cover the range needed. For powers of two, the modulo operator compiles to more efficient bit-masking. This optimization reduces memory usage, and also improves locality by reusing recently-accessed memory instead of pulling new memory into cache.

Examples
tutorial/lesson_08_scheduling_2.cpp, tutorial/lesson_09_update_definitions.cpp, and tutorial/lesson_12_using_the_gpu.cpp.

Referenced by do_cost_model_schedule().

◆ store_at() [2/3]

Func& Halide::Func::store_at ( const Func f,
const RVar var 
)

Equivalent to the version of store_at that takes a Var, but schedules storage within the loop over a dimension of a reduction domain.

◆ store_at() [3/3]

Func& Halide::Func::store_at ( LoopLevel  loop_level)

Equivalent to the version of store_at that takes a Var, but schedules storage at a given LoopLevel.

◆ store_root()

Func& Halide::Func::store_root ( )

Equivalent to Func::store_at, but schedules storage outside the outermost loop.

Examples
tutorial/lesson_08_scheduling_2.cpp.

◆ compute_inline()

Func& Halide::Func::compute_inline ( )

Aggressively inline all uses of this function.

This is the default schedule, so you're unlikely to need to call this. For a Func with an update definition, that means it gets computed as close to the innermost loop as possible.

Consider once more the pipeline from Func::compute_at :

Func f, g;
Var x, y;
g(x, y) = x*y;
f(x, y) = g(x, y) + g(x+1, y) + g(x, y+1) + g(x+1, y+1);

Leaving g as inline, this compiles to code equivalent to the following C:

int f[height][width];
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        f[y][x] = x*y + x*(y+1) + (x+1)*y + (x+1)*(y+1);
    }
}

◆ update()

Stage Halide::Func::update ( int  idx = 0)

Get a handle on an update step for the purposes of scheduling it.

Examples
tutorial/lesson_09_update_definitions.cpp.

Referenced by Halide::SimdOpCheckTest::check_one(), and do_cost_model_schedule().

◆ store_in()

Func& Halide::Func::store_in ( MemoryType  memory_type)

Set the type of memory this Func should be stored in.

Controls whether allocations go on the stack or the heap on the CPU, and in global vs shared vs local on the GPU. See the documentation on MemoryType for more detail.

Referenced by do_cost_model_schedule().

◆ trace_loads()

Func& Halide::Func::trace_loads ( )

Trace all loads from this Func by emitting calls to halide_trace.

If the Func is inlined, this has no effect.

Examples
tutorial/lesson_09_update_definitions.cpp.

◆ trace_stores()

Func& Halide::Func::trace_stores ( )

Trace all stores to the buffer backing this Func by emitting calls to halide_trace.

If the Func is inlined, this call has no effect.

Examples
tutorial/lesson_04_debugging_2.cpp, tutorial/lesson_05_scheduling_1.cpp, tutorial/lesson_06_realizing_over_shifted_domains.cpp, tutorial/lesson_08_scheduling_2.cpp, and tutorial/lesson_09_update_definitions.cpp.

◆ trace_realizations()

Func& Halide::Func::trace_realizations ( )

Trace all realizations of this Func by emitting calls to halide_trace.

◆ add_trace_tag()

Func& Halide::Func::add_trace_tag ( const std::string &  trace_tag)

Add a string of arbitrary text that will be passed through to trace inspection code if the Func is realized in trace mode.

(Funcs that are inlined won't have their tags emitted.) Ignored entirely if tracing is not enabled for the Func (or globally).

◆ function()

Internal::Function Halide::Func::function ( ) const
inline

Get a handle on the internal halide function that this Func represents.

Useful if you want to do introspection on Halide functions.

Definition at line 2409 of file Func.h.

◆ operator Stage()

Halide::Func::operator Stage ( ) const

You can cast a Func to its pure stage for the purposes of scheduling it.

◆ output_buffer()

OutputImageParam Halide::Func::output_buffer ( ) const

Get a handle on the output buffer for this Func.

Only relevant if this is the output Func in a pipeline. Useful for making static promises about strides, mins, and extents.

◆ output_buffers()

std::vector<OutputImageParam> Halide::Func::output_buffers ( ) const

Get handles on the output buffers for this Func (one per element of a Tuple-valued Func). Only relevant if this is the output Func in a pipeline.

◆ operator ExternFuncArgument()

Halide::Func::operator ExternFuncArgument ( ) const

Use a Func as an argument to an external stage.

◆ infer_arguments()

std::vector<Argument> Halide::Func::infer_arguments ( ) const

Infer the arguments to the Func, sorted into a canonical order: all buffers (sorted alphabetically by name), followed by all non-buffers (sorted alphabetically by name).

This lets you write things like:

func.compile_to_assembly("/dev/stdout", func.infer_arguments());
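The canonical order can be modeled directly (a sketch in plain C++, not the Halide implementation): all buffers first, then all non-buffers, each group sorted alphabetically by name.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Minimal stand-in for an argument record; Halide's real
// Argument type carries more information.
struct Arg {
    std::string name;
    bool is_buffer;
};

// Sort into infer_arguments' documented canonical order:
// buffers (alphabetical by name), then non-buffers (alphabetical).
void canonical_sort(std::vector<Arg> &args) {
    std::stable_sort(args.begin(), args.end(), [](const Arg &a, const Arg &b) {
        if (a.is_buffer != b.is_buffer) return a.is_buffer;  // buffers first
        return a.name < b.name;                              // then by name
    });
}
```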

◆ source_location()

std::string Halide::Func::source_location ( ) const

Get the source location of the pure definition of this Func.

See Stage::source_location()

◆ get_schedule()

const Internal::StageSchedule& Halide::Func::get_schedule ( ) const
inline

Return the current StageSchedule associated with the initial Stage of this Func.

For introspection only: to modify the schedule, use the Func interface.

Definition at line 2445 of file Func.h.

References Halide::Stage::get_schedule().

Referenced by do_cost_model_schedule().


The documentation for this class was generated from the following file:
Func.h