Halide
Halide::Func Class Reference

A halide function. More...

#include <Func.h>

Public Member Functions

EXPORT Func (const std::string &name)
 Declare a new undefined function with the given name. More...
 
EXPORT Func ()
 Declare a new undefined function with an automatically-generated unique name. More...
 
EXPORT Func (Expr e)
 Declare a new function with an automatically-generated unique name, and define it to return the given expression (which may not contain free variables). More...
 
EXPORT Func (Internal::Function f)
 Construct a new Func to wrap an existing, already-defined Function object. More...
 
template<typename T >
NO_INLINE Func (Buffer< T > &im)
 Construct a new Func to wrap a Buffer. More...
 
EXPORT void realize (Realization dst, const Target &target=Target())
 Evaluate this function into an existing allocated buffer or buffers. More...
 
EXPORT void compile_to_header (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name="", const Target &target=get_target_from_environment())
 Emit a header file with the given filename for this function. More...
 
EXPORT void compile_to_c (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name="", const Target &target=get_target_from_environment())
 Statically compile this function to C source code. More...
 
EXPORT void compile_to_lowered_stmt (const std::string &filename, const std::vector< Argument > &args, StmtOutputFormat fmt=Text, const Target &target=get_target_from_environment())
 Write out an internal representation of lowered code. More...
 
EXPORT void print_loop_nest ()
 Write out the loop nests specified by the schedule for this Function. More...
 
EXPORT void compile_to_file (const std::string &filename_prefix, const std::vector< Argument > &args, const std::string &fn_name="", const Target &target=get_target_from_environment())
 Compile to object file and header pair, with the given arguments. More...
 
EXPORT void compile_to_static_library (const std::string &filename_prefix, const std::vector< Argument > &args, const std::string &fn_name="", const Target &target=get_target_from_environment())
 Compile to static-library file and header pair, with the given arguments. More...
 
EXPORT void compile_to_multitarget_static_library (const std::string &filename_prefix, const std::vector< Argument > &args, const std::vector< Target > &targets)
 Compile to static-library file and header pair once for each target; each resulting function will be considered (in order) via halide_can_use_target_features() at runtime, with the first appropriate match being selected for subsequent use. More...
 
EXPORT Module compile_to_module (const std::vector< Argument > &args, const std::string &fn_name="", const Target &target=get_target_from_environment())
 Store an internal representation of lowered code as a self contained Module suitable for further compilation. More...
 
EXPORT void compile_to (const Outputs &output_files, const std::vector< Argument > &args, const std::string &fn_name, const Target &target=get_target_from_environment())
 Compile and generate multiple target files with a single call. More...
 
EXPORT void * compile_jit (const Target &target=get_jit_target_from_environment())
 Eagerly jit compile the function to machine code. More...
 
EXPORT void set_error_handler (void(*handler)(void *, const char *))
 Set the error handler function that will be called in the case of runtime errors during halide pipelines. More...
 
EXPORT void set_custom_allocator (void *(*malloc)(void *, size_t), void(*free)(void *, void *))
 
EXPORT void set_custom_do_task (int(*custom_do_task)(void *, int(*)(void *, int, uint8_t *), int, uint8_t *))
 Set a custom task handler to be called by the parallel for loop. More...
 
EXPORT void set_custom_do_par_for (int(*custom_do_par_for)(void *, int(*)(void *, int, uint8_t *), int, int, uint8_t *))
 Set a custom parallel for loop launcher. More...
 
EXPORT void set_custom_trace (int(*trace_fn)(void *, const halide_trace_event_t *))
 Set custom routines to call when tracing is enabled. More...
 
EXPORT void set_custom_print (void(*handler)(void *, const char *))
 Set the function called to print messages from the runtime. More...
 
EXPORT const Internal::JITHandlers & jit_handlers ()
 Get a struct containing the currently set custom functions used by JIT. More...
 
template<typename T >
void add_custom_lowering_pass (T *pass)
 Add a custom pass to be used during lowering. More...
 
EXPORT void add_custom_lowering_pass (Internal::IRMutator *pass, void(*deleter)(Internal::IRMutator *))
 Add a custom pass to be used during lowering, with the function that will be called to delete it also passed in. More...
 
EXPORT void clear_custom_lowering_passes ()
 Remove all previously-set custom lowering passes. More...
 
EXPORT const std::vector< CustomLoweringPass > & custom_lowering_passes ()
 Get the custom lowering passes. More...
 
EXPORT void debug_to_file (const std::string &filename)
 When this function is compiled, include code that dumps its values to a file after it is realized, for the purpose of debugging. More...
 
EXPORT const std::string & name () const
 The name of this function, either given during construction, or automatically generated. More...
 
EXPORT std::vector< Var > args () const
 Get the pure arguments. More...
 
EXPORT Expr value () const
 The right-hand-side value of the pure definition of this function. More...
 
EXPORT Tuple values () const
 The values returned by this function. More...
 
EXPORT bool defined () const
 Does this function have at least a pure definition. More...
 
EXPORT const std::vector< Expr > & update_args (int idx=0) const
 Get the left-hand-side of the update definition. More...
 
EXPORT Expr update_value (int idx=0) const
 Get the right-hand-side of an update definition. More...
 
EXPORT Tuple update_values (int idx=0) const
 Get the right-hand-side of an update definition for functions that return multiple values. More...
 
EXPORT std::vector< RVar > rvars (int idx=0) const
 Get the RVars of the reduction domain for an update definition, if there is one. More...
 
EXPORT bool has_update_definition () const
 Does this function have at least one update definition? More...
 
EXPORT int num_update_definitions () const
 How many update definitions does this function have? More...
 
EXPORT bool is_extern () const
 Is this function an external stage? That is, was it defined using define_extern? More...
 
EXPORT const std::vector< Type > & output_types () const
 Get the types of the outputs of this Func. More...
 
EXPORT int outputs () const
 Get the number of outputs of this Func. More...
 
EXPORT const std::string & extern_function_name () const
 Get the name of the extern function called for an extern definition. More...
 
EXPORT int dimensions () const
 The dimensionality (number of arguments) of this function. More...
 
EXPORT Func in (const Func &f)
 Creates and returns a new Func that wraps this Func. More...
 
EXPORT Func in (const std::vector< Func > &fs)
 Create and return a wrapper shared by all the Funcs in 'fs'. More...
 
EXPORT Func in ()
 Create and return a global wrapper, which wraps all calls to this Func by any other Func. More...
 
EXPORT Func & split (VarOrRVar old, VarOrRVar outer, VarOrRVar inner, Expr factor, TailStrategy tail=TailStrategy::Auto)
 Split a dimension into inner and outer subdimensions with the given names, where the inner dimension iterates from 0 to factor-1. More...
 
EXPORT Func & fuse (VarOrRVar inner, VarOrRVar outer, VarOrRVar fused)
 Join two dimensions into a single fused dimension. More...
 
EXPORT Func & serial (VarOrRVar var)
 Mark a dimension to be traversed serially. More...
 
EXPORT Func & parallel (VarOrRVar var)
 Mark a dimension to be traversed in parallel. More...
 
EXPORT Func & parallel (VarOrRVar var, Expr task_size, TailStrategy tail=TailStrategy::Auto)
 Split a dimension by the given task_size, and then parallelize the outer dimension. More...
 
EXPORT Func & vectorize (VarOrRVar var)
 Mark a dimension to be computed all-at-once as a single vector. More...
 
EXPORT Func & unroll (VarOrRVar var)
 Mark a dimension to be completely unrolled. More...
 
EXPORT Func & vectorize (VarOrRVar var, Expr factor, TailStrategy tail=TailStrategy::Auto)
 Split a dimension by the given factor, then vectorize the inner dimension. More...
 
EXPORT Func & unroll (VarOrRVar var, Expr factor, TailStrategy tail=TailStrategy::Auto)
 Split a dimension by the given factor, then unroll the inner dimension. More...
 
EXPORT Func & bound (Var var, Expr min, Expr extent)
 Statically declare that the range over which a function should be evaluated is given by the second and third arguments. More...
 
EXPORT Func & align_bounds (Var var, Expr modulus, Expr remainder=0)
 Expand the region computed so that the min coordinate is congruent to 'remainder' modulo 'modulus', and the extent is a multiple of 'modulus'. More...
 
EXPORT Func & bound_extent (Var var, Expr extent)
 Bound the extent of a Func's realization, but not its min. More...
 
EXPORT Func & tile (VarOrRVar x, VarOrRVar y, VarOrRVar xo, VarOrRVar yo, VarOrRVar xi, VarOrRVar yi, Expr xfactor, Expr yfactor, TailStrategy tail=TailStrategy::Auto)
 Split two dimensions at once by the given factors, and then reorder the resulting dimensions to be xi, yi, xo, yo from innermost outwards. More...
 
EXPORT Func & tile (VarOrRVar x, VarOrRVar y, VarOrRVar xi, VarOrRVar yi, Expr xfactor, Expr yfactor, TailStrategy tail=TailStrategy::Auto)
 A shorter form of tile, which reuses the old variable names as the new outer dimensions. More...
 
EXPORT Func & reorder (const std::vector< VarOrRVar > &vars)
 Reorder variables to have the given nesting order, from innermost out. More...
 
template<typename... Args>
NO_INLINE std::enable_if< Internal::all_are_convertible< VarOrRVar, Args...>::value, Func & >::type reorder (VarOrRVar x, VarOrRVar y, Args &&...args)
 
EXPORT Func & rename (VarOrRVar old_name, VarOrRVar new_name)
 Rename a dimension. More...
 
EXPORT Func & allow_race_conditions ()
 Specify that race conditions are permitted for this Func, which enables parallelizing over RVars even when Halide cannot prove that it is safe to do so. More...
 
EXPORT Stage specialize (Expr condition)
 Specialize a Func. More...
 
EXPORT void specialize_fail (const std::string &message)
 Add a specialization to a Func that always terminates execution with a call to halide_error(). More...
 
EXPORT Func & gpu_single_thread (DeviceAPI device_api=DeviceAPI::Default_GPU)
 Tell Halide to run this stage using a single gpu thread and block. More...
 
EXPORT Func & shader (Var x, Var y, Var c, DeviceAPI device_api)
 Schedule for execution using coordinate-based hardware api. More...
 
EXPORT Func & glsl (Var x, Var y, Var c)
 Schedule for execution as GLSL kernel. More...
 
EXPORT Func & hexagon (VarOrRVar x=Var::outermost())
 Schedule for execution on Hexagon. More...
 
EXPORT Func & align_storage (Var dim, Expr alignment)
 Pad the storage extent of a particular dimension of realizations of this function up to be a multiple of the specified alignment. More...
 
EXPORT Func & fold_storage (Var dim, Expr extent, bool fold_forward=true)
 Store realizations of this function in a circular buffer of a given extent. More...
 
EXPORT Func & compute_at (Func f, Var var)
 Compute this function as needed for each unique value of the given var for the given calling function f. More...
 
EXPORT Func & compute_at (Func f, RVar var)
 Schedule a function to be computed within the iteration over some dimension of an update domain. More...
 
EXPORT Func & compute_at (LoopLevel loop_level)
 Schedule a function to be computed within the iteration over a given LoopLevel. More...
 
EXPORT Func & compute_root ()
 Compute all of this function once ahead of time. More...
 
EXPORT Func & memoize ()
 Use the halide_memoization_cache_... More...
 
EXPORT Func & store_at (Func f, Var var)
 Allocate storage for this function within f's loop over var. More...
 
EXPORT Func & store_at (Func f, RVar var)
 Equivalent to the version of store_at that takes a Var, but schedules storage within the loop over a dimension of a reduction domain. More...
 
EXPORT Func & store_at (LoopLevel loop_level)
 Equivalent to the version of store_at that takes a Var, but schedules storage at a given LoopLevel. More...
 
EXPORT Func & store_root ()
 Equivalent to Func::store_at, but schedules storage outside the outermost loop. More...
 
EXPORT Func & compute_inline ()
 Aggressively inline all uses of this function. More...
 
EXPORT Stage update (int idx=0)
 Get a handle on an update step for the purposes of scheduling it. More...
 
EXPORT Func & trace_loads ()
 Trace all loads from this Func by emitting calls to halide_trace. More...
 
EXPORT Func & trace_stores ()
 Trace all stores to the buffer backing this Func by emitting calls to halide_trace. More...
 
EXPORT Func & trace_realizations ()
 Trace all realizations of this Func by emitting calls to halide_trace. More...
 
Internal::Function function () const
 Get a handle on the internal halide function that this Func represents. More...
 
EXPORT operator Stage () const
 You can cast a Func to its pure stage for the purposes of scheduling it. More...
 
 operator ExternFuncArgument () const
 Use a Func as an argument to an external stage. More...
 
EXPORT std::vector< Argument > infer_arguments () const
 Infer the arguments to the Func, sorted into a canonical order: all buffers (sorted alphabetically by name), followed by all non-buffers (sorted alphabetically by name). More...
 
EXPORT Realization realize (std::vector< int32_t > sizes, const Target &target=Target())
 Evaluate this function over some rectangular domain and return the resulting buffer or buffers. More...
 
EXPORT Realization realize (int x_size, int y_size, int z_size, int w_size, const Target &target=Target())
 Evaluate this function over some rectangular domain and return the resulting buffer or buffers. More...
 
EXPORT Realization realize (int x_size, int y_size, int z_size, const Target &target=Target())
 Evaluate this function over some rectangular domain and return the resulting buffer or buffers. More...
 
EXPORT Realization realize (int x_size, int y_size, const Target &target=Target())
 Evaluate this function over some rectangular domain and return the resulting buffer or buffers. More...
 
EXPORT Realization realize (int x_size, const Target &target=Target())
 Evaluate this function over some rectangular domain and return the resulting buffer or buffers. More...
 
EXPORT Realization realize (const Target &target=Target())
 Evaluate this function over some rectangular domain and return the resulting buffer or buffers. More...
 
EXPORT void infer_input_bounds (int x_size=0, int y_size=0, int z_size=0, int w_size=0)
 For a given size of output, or a given output buffer, determine the bounds required of all unbound ImageParams referenced. More...
 
EXPORT void infer_input_bounds (Realization dst)
 For a given size of output, or a given output buffer, determine the bounds required of all unbound ImageParams referenced. More...
 
EXPORT void compile_to_bitcode (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name, const Target &target=get_target_from_environment())
 Statically compile this function to llvm bitcode, with the given filename (which should probably end in .bc), type signature, and C function name (which defaults to the same name as this halide function). More...
 
EXPORT void compile_to_bitcode (const std::string &filename, const std::vector< Argument > &, const Target &target=get_target_from_environment())
 Statically compile this function to llvm bitcode, with the given filename (which should probably end in .bc), type signature, and C function name (which defaults to the same name as this halide function). More...
 
EXPORT void compile_to_llvm_assembly (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name, const Target &target=get_target_from_environment())
 Statically compile this function to llvm assembly, with the given filename (which should probably end in .ll), type signature, and C function name (which defaults to the same name as this halide function). More...
 
EXPORT void compile_to_llvm_assembly (const std::string &filename, const std::vector< Argument > &, const Target &target=get_target_from_environment())
 Statically compile this function to llvm assembly, with the given filename (which should probably end in .ll), type signature, and C function name (which defaults to the same name as this halide function). More...
 
EXPORT void compile_to_object (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name, const Target &target=get_target_from_environment())
 Statically compile this function to an object file, with the given filename (which should probably end in .o or .obj), type signature, and C function name (which defaults to the same name as this halide function). More...
 
EXPORT void compile_to_object (const std::string &filename, const std::vector< Argument > &, const Target &target=get_target_from_environment())
 Statically compile this function to an object file, with the given filename (which should probably end in .o or .obj), type signature, and C function name (which defaults to the same name as this halide function). More...
 
EXPORT void compile_to_assembly (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name, const Target &target=get_target_from_environment())
 Statically compile this function to text assembly equivalent to the object file generated by compile_to_object. More...
 
EXPORT void compile_to_assembly (const std::string &filename, const std::vector< Argument > &, const Target &target=get_target_from_environment())
 Statically compile this function to text assembly equivalent to the object file generated by compile_to_object. More...
 
EXPORT void define_extern (const std::string &function_name, const std::vector< ExternFuncArgument > &params, Type t, int dimensionality, NameMangling mangling, bool uses_old_buffer_t)
 Add an extern definition for this Func. More...
 
EXPORT void define_extern (const std::string &function_name, const std::vector< ExternFuncArgument > &params, Type t, int dimensionality, NameMangling mangling=NameMangling::Default, DeviceAPI device_api=DeviceAPI::Host, bool uses_old_buffer_t=false)
 Add an extern definition for this Func. More...
 
EXPORT void define_extern (const std::string &function_name, const std::vector< ExternFuncArgument > &params, const std::vector< Type > &types, int dimensionality, NameMangling mangling, bool uses_old_buffer_t)
 Add an extern definition for this Func. More...
 
EXPORT void define_extern (const std::string &function_name, const std::vector< ExternFuncArgument > &params, const std::vector< Type > &types, int dimensionality, NameMangling mangling=NameMangling::Default, DeviceAPI device_api=DeviceAPI::Host, bool uses_old_buffer_t=false)
 Add an extern definition for this Func. More...
 
EXPORT FuncRef operator() (std::vector< Var >) const
 Construct either the left-hand-side of a definition, or a call to a function that happens to only contain vars as arguments. More...
 
template<typename... Args>
NO_INLINE std::enable_if< Internal::all_are_convertible< Var, Args...>::value, FuncRef >::type operator() (Args &&...args) const
 Construct either the left-hand-side of a definition, or a call to a function that happens to only contain vars as arguments. More...
 
EXPORT FuncRef operator() (std::vector< Expr >) const
 Either calls to the function, or the left-hand-side of an update definition (see RDom). More...
 
template<typename... Args>
NO_INLINE std::enable_if< Internal::all_are_convertible< Expr, Args...>::value, FuncRef >::type operator() (Expr x, Args &&...args) const
 Either calls to the function, or the left-hand-side of an update definition (see RDom). More...
 
EXPORT Func & gpu_threads (VarOrRVar thread_x, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Tell Halide that the following dimensions correspond to GPU thread indices. More...
 
EXPORT Func & gpu_threads (VarOrRVar thread_x, VarOrRVar thread_y, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Tell Halide that the following dimensions correspond to GPU thread indices. More...
 
EXPORT Func & gpu_threads (VarOrRVar thread_x, VarOrRVar thread_y, VarOrRVar thread_z, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Tell Halide that the following dimensions correspond to GPU thread indices. More...
 
EXPORT Func & gpu_blocks (VarOrRVar block_x, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Tell Halide that the following dimensions correspond to GPU block indices. More...
 
EXPORT Func & gpu_blocks (VarOrRVar block_x, VarOrRVar block_y, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Tell Halide that the following dimensions correspond to GPU block indices. More...
 
EXPORT Func & gpu_blocks (VarOrRVar block_x, VarOrRVar block_y, VarOrRVar block_z, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Tell Halide that the following dimensions correspond to GPU block indices. More...
 
EXPORT Func & gpu (VarOrRVar block_x, VarOrRVar thread_x, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Tell Halide that the following dimensions correspond to GPU block indices and thread indices. More...
 
EXPORT Func & gpu (VarOrRVar block_x, VarOrRVar block_y, VarOrRVar thread_x, VarOrRVar thread_y, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Tell Halide that the following dimensions correspond to GPU block indices and thread indices. More...
 
EXPORT Func & gpu (VarOrRVar block_x, VarOrRVar block_y, VarOrRVar block_z, VarOrRVar thread_x, VarOrRVar thread_y, VarOrRVar thread_z, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Tell Halide that the following dimensions correspond to GPU block indices and thread indices. More...
 
EXPORT Func & gpu_tile (VarOrRVar x, VarOrRVar bx, Var tx, Expr x_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices. More...
 
EXPORT Func & gpu_tile (VarOrRVar x, VarOrRVar bx, RVar tx, Expr x_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices. More...
 
EXPORT Func & gpu_tile (VarOrRVar x, VarOrRVar tx, Expr x_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices. More...
 
EXPORT Func & gpu_tile (VarOrRVar x, VarOrRVar y, VarOrRVar bx, VarOrRVar by, VarOrRVar tx, VarOrRVar ty, Expr x_size, Expr y_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices. More...
 
EXPORT Func & gpu_tile (VarOrRVar x, VarOrRVar y, VarOrRVar tx, Var ty, Expr x_size, Expr y_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices. More...
 
EXPORT Func & gpu_tile (VarOrRVar x, VarOrRVar y, VarOrRVar tx, RVar ty, Expr x_size, Expr y_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices. More...
 
EXPORT Func & gpu_tile (VarOrRVar x, VarOrRVar y, VarOrRVar z, VarOrRVar bx, VarOrRVar by, VarOrRVar bz, VarOrRVar tx, VarOrRVar ty, VarOrRVar tz, Expr x_size, Expr y_size, Expr z_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices. More...
 
EXPORT Func & gpu_tile (VarOrRVar x, VarOrRVar y, VarOrRVar z, VarOrRVar tx, VarOrRVar ty, VarOrRVar tz, Expr x_size, Expr y_size, Expr z_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices. More...
 
EXPORT Func & gpu_tile (VarOrRVar x, Expr x_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices. More...
 
EXPORT Func & gpu_tile (VarOrRVar x, VarOrRVar y, Expr x_size, Expr y_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices. More...
 
EXPORT Func & gpu_tile (VarOrRVar x, VarOrRVar y, VarOrRVar z, Expr x_size, Expr y_size, Expr z_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU)
 Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices. More...
 
EXPORT Func & prefetch (const Func &f, VarOrRVar var, Expr offset=1, PrefetchBoundStrategy strategy=PrefetchBoundStrategy::GuardWithIf)
 Prefetch data written to or read from a Func or an ImageParam by a subsequent loop iteration, at an optionally specified iteration offset. More...
 
EXPORT Func & prefetch (const Internal::Parameter &param, VarOrRVar var, Expr offset=1, PrefetchBoundStrategy strategy=PrefetchBoundStrategy::GuardWithIf)
 Prefetch data written to or read from a Func or an ImageParam by a subsequent loop iteration, at an optionally specified iteration offset. More...
 
template<typename T >
Func & prefetch (const T &image, VarOrRVar var, Expr offset=1, PrefetchBoundStrategy strategy=PrefetchBoundStrategy::GuardWithIf)
 Prefetch data written to or read from a Func or an ImageParam by a subsequent loop iteration, at an optionally specified iteration offset. More...
 
EXPORT Func & reorder_storage (const std::vector< Var > &dims)
 Specify how the storage for the function is laid out. More...
 
EXPORT Func & reorder_storage (Var x, Var y)
 Specify how the storage for the function is laid out. More...
 
template<typename... Args>
NO_INLINE std::enable_if< Internal::all_are_convertible< Var, Args...>::value, Func & >::type reorder_storage (Var x, Var y, Args &&...args)
 Specify how the storage for the function is laid out. More...
 
EXPORT OutputImageParam output_buffer () const
 Get a handle on the output buffer for this Func. More...
 
EXPORT std::vector< OutputImageParam > output_buffers () const
 Get a handle on the output buffer for this Func. More...
 

Detailed Description

Constructor & Destructor Documentation

EXPORT Halide::Func::Func ( const std::string &  name)
explicit

Declare a new undefined function with the given name.

EXPORT Halide::Func::Func ( )

Declare a new undefined function with an automatically-generated unique name.

EXPORT Halide::Func::Func ( Expr  e)
explicit

Declare a new function with an automatically-generated unique name, and define it to return the given expression (which may not contain free variables).

EXPORT Halide::Func::Func ( Internal::Function  f)
explicit

Construct a new Func to wrap an existing, already-defined Function object.

template<typename T >
NO_INLINE Halide::Func::Func ( Buffer< T > &  im)
inlineexplicit

Construct a new Func to wrap a Buffer.

Definition at line 538 of file Func.h.

References Halide::_.

Member Function Documentation

EXPORT Realization Halide::Func::realize ( std::vector< int32_t > sizes,
const Target target = Target() 
)

Evaluate this function over some rectangular domain and return the resulting buffer or buffers.

Performs compilation if the Func has not previously been realized and jit_compile has not been called. If the final stage of the pipeline is on the GPU, data is copied back to the host before being returned. The returned Realization should probably be instantly converted to a Buffer class of the appropriate type. That is, do this:

f(x) = sin(x);
Buffer<float> im = f.realize(...);

If your Func has multiple values, because you defined it using a Tuple, then casting the result of a realize call to a buffer or image will produce a run-time error. Instead you should do the following:

f(x) = Tuple(x, sin(x));
Realization r = f.realize(...);
Buffer<int> im0 = r[0];
Buffer<float> im1 = r[1];
Examples:
tutorial/lesson_01_basics.cpp, tutorial/lesson_02_input_image.cpp, tutorial/lesson_03_debugging_1.cpp, tutorial/lesson_04_debugging_2.cpp, tutorial/lesson_05_scheduling_1.cpp, tutorial/lesson_06_realizing_over_shifted_domains.cpp, tutorial/lesson_07_multi_stage_pipelines.cpp, tutorial/lesson_08_scheduling_2.cpp, tutorial/lesson_09_update_definitions.cpp, and tutorial/lesson_13_tuples.cpp.

Referenced by Halide::evaluate(), Halide::evaluate_may_gpu(), and Halide::Internal::StubOutputBufferBase::realize().

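For instance, a minimal JIT round-trip over a 1-D domain might look like this (a sketch; it assumes Halide.h is included, using namespace Halide, and the code sits inside a function):

Func f;
Var x;
f(x) = cast<float>(x) * 2.0f;

// Realize f over x = 0..99. The Realization converts to a Buffer<float>
// because the pipeline produces a single float per point.
Buffer<float> out = f.realize(100);
// out(10) == 20.0f
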
EXPORT Realization Halide::Func::realize ( int  x_size,
int  y_size,
int  z_size,
int  w_size,
const Target target = Target() 
)

Evaluate this function over some rectangular domain and return the resulting buffer or buffers.

Performs compilation if the Func has not previously been realized and jit_compile has not been called. If the final stage of the pipeline is on the GPU, data is copied back to the host before being returned. The returned Realization should probably be instantly converted to a Buffer class of the appropriate type. That is, do this:

f(x) = sin(x);
Buffer<float> im = f.realize(...);

If your Func has multiple values, because you defined it using a Tuple, then casting the result of a realize call to a buffer or image will produce a run-time error. Instead you should do the following:

f(x) = Tuple(x, sin(x));
Realization r = f.realize(...);
Buffer<int> im0 = r[0];
Buffer<float> im1 = r[1];
EXPORT Realization Halide::Func::realize ( int  x_size,
int  y_size,
int  z_size,
const Target target = Target() 
)

Evaluate this function over some rectangular domain and return the resulting buffer or buffers.

Performs compilation if the Func has not previously been realized and jit_compile has not been called. If the final stage of the pipeline is on the GPU, data is copied back to the host before being returned. The returned Realization should probably be instantly converted to a Buffer class of the appropriate type. That is, do this:

f(x) = sin(x);
Buffer<float> im = f.realize(...);

If your Func has multiple values, because you defined it using a Tuple, then casting the result of a realize call to a buffer or image will produce a run-time error. Instead you should do the following:

f(x) = Tuple(x, sin(x));
Realization r = f.realize(...);
Buffer<int> im0 = r[0];
Buffer<float> im1 = r[1];
EXPORT Realization Halide::Func::realize ( int  x_size,
int  y_size,
const Target target = Target() 
)

Evaluate this function over some rectangular domain and return the resulting buffer or buffers.

Performs compilation if the Func has not previously been realized and jit_compile has not been called. If the final stage of the pipeline is on the GPU, data is copied back to the host before being returned. The returned Realization should probably be instantly converted to a Buffer class of the appropriate type. That is, do this:

f(x) = sin(x);
Buffer<float> im = f.realize(...);

If your Func has multiple values, because you defined it using a Tuple, then casting the result of a realize call to a buffer or image will produce a run-time error. Instead you should do the following:

f(x) = Tuple(x, sin(x));
Realization r = f.realize(...);
Buffer<int> im0 = r[0];
Buffer<float> im1 = r[1];
EXPORT Realization Halide::Func::realize ( int  x_size,
const Target target = Target() 
)

Evaluate this function over some rectangular domain and return the resulting buffer or buffers.

Performs compilation if the Func has not previously been realized and jit_compile has not been called. If the final stage of the pipeline is on the GPU, data is copied back to the host before being returned. The returned Realization should probably be instantly converted to a Buffer class of the appropriate type. That is, do this:

f(x) = sin(x);
Buffer<float> im = f.realize(...);

If your Func has multiple values, because you defined it using a Tuple, then casting the result of a realize call to a buffer or image will produce a run-time error. Instead you should do the following:

f(x) = Tuple(x, sin(x));
Realization r = f.realize(...);
Buffer<int> im0 = r[0];
Buffer<float> im1 = r[1];
EXPORT Realization Halide::Func::realize ( const Target target = Target())

Evaluate this function over some rectangular domain and return the resulting buffer or buffers.

Performs compilation if the Func has not previously been realized and jit_compile has not been called. If the final stage of the pipeline is on the GPU, data is copied back to the host before being returned. The returned Realization should probably be instantly converted to a Buffer class of the appropriate type. That is, do this:

f(x) = sin(x);
Buffer<float> im = f.realize(...);

If your Func has multiple values, because you defined it using a Tuple, then casting the result of a realize call to a buffer or image will produce a run-time error. Instead you should do the following:

f(x) = Tuple(x, sin(x));
Realization r = f.realize(...);
Buffer<int> im0 = r[0];
Buffer<float> im1 = r[1];
EXPORT void Halide::Func::realize ( Realization  dst,
const Target target = Target() 
)

Evaluate this function into an existing allocated buffer or buffers.

If the buffer is also one of the arguments to the function, strange things may happen, as the pipeline isn't necessarily safe to run in-place. If you pass multiple buffers, they must have matching sizes. This form of realize does not automatically copy data back from the GPU.

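A sketch of realizing into caller-provided storage (assuming a single Buffer can be passed where a Realization is expected in this version of Halide):

Func f;
Var x, y;
f(x, y) = x + y;

Buffer<int> out(640, 480);   // storage allocated by the caller
f.realize(out);              // fills the existing allocation in place
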
EXPORT void Halide::Func::infer_input_bounds ( int  x_size = 0,
int  y_size = 0,
int  z_size = 0,
int  w_size = 0 
)

For a given size of output, or a given output buffer, determine the bounds required of all unbound ImageParams referenced.

Communicates the result by allocating new buffers of the appropriate size and binding them to the unbound ImageParams.

EXPORT void Halide::Func::infer_input_bounds ( Realization  dst)

For a given size of output, or a given output buffer, determine the bounds required of all unbound ImageParams referenced.

Communicates the result by allocating new buffers of the appropriate size and binding them to the unbound ImageParams.

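A sketch of the typical use (the ImageParam and output sizes are illustrative):

ImageParam input(Float(32), 2);
Func blur_x;
Var x, y;
blur_x(x, y) = (input(x, y) + input(x + 1, y)) / 2.0f;

// Allocate and bind a buffer to 'input' that is just large enough to
// realize blur_x over a 640x480 region (one extra column in x here).
blur_x.infer_input_bounds(640, 480);
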
EXPORT void Halide::Func::compile_to_bitcode ( const std::string &  filename,
const std::vector< Argument > &  ,
const std::string &  fn_name,
const Target target = get_target_from_environment() 
)

Statically compile this function to llvm bitcode, with the given filename (which should probably end in .bc), type signature, and C function name (which defaults to the same name as this halide function).

EXPORT void Halide::Func::compile_to_bitcode ( const std::string &  filename,
const std::vector< Argument > &  ,
const Target target = get_target_from_environment() 
)

Statically compile this function to llvm bitcode, with the given filename (which should probably end in .bc), type signature, and C function name (which defaults to the same name as this halide function).

EXPORT void Halide::Func::compile_to_llvm_assembly ( const std::string &  filename,
const std::vector< Argument > &  ,
const std::string &  fn_name,
const Target target = get_target_from_environment() 
)

Statically compile this function to llvm assembly, with the given filename (which should probably end in .ll), type signature, and C function name (which defaults to the same name as this halide function).

EXPORT void Halide::Func::compile_to_llvm_assembly ( const std::string &  filename,
const std::vector< Argument > &  ,
const Target target = get_target_from_environment() 
)

Statically compile this function to llvm assembly, with the given filename (which should probably end in .ll), type signature, and C function name (which defaults to the same name as this halide function).

EXPORT void Halide::Func::compile_to_object ( const std::string &  filename,
const std::vector< Argument > &  ,
const std::string &  fn_name,
const Target target = get_target_from_environment() 
)

Statically compile this function to an object file, with the given filename (which should probably end in .o or .obj), type signature, and C function name (which defaults to the same name as this halide function).

You probably don't want to use this directly; call compile_to_static_library or compile_to_file instead.

EXPORT void Halide::Func::compile_to_object ( const std::string &  filename,
const std::vector< Argument > &  ,
const Target target = get_target_from_environment() 
)

Statically compile this function to an object file, with the given filename (which should probably end in .o or .obj), type signature, and C function name (which defaults to the same name as this halide function).

You probably don't want to use this directly; call compile_to_static_library or compile_to_file instead.

EXPORT void Halide::Func::compile_to_header ( const std::string &  filename,
const std::vector< Argument > &  ,
const std::string &  fn_name = "",
const Target target = get_target_from_environment() 
)

Emit a header file with the given filename for this function.

The header will define a function with the type signature given by the second argument, and a name given by the third. The name defaults to the same name as this halide function. You don't actually have to have defined this function yet to call this. You probably don't want to use this directly; call compile_to_static_library or compile_to_file instead.

EXPORT void Halide::Func::compile_to_assembly ( const std::string &  filename,
const std::vector< Argument > &  ,
const std::string &  fn_name,
const Target target = get_target_from_environment() 
)

Statically compile this function to text assembly equivalent to the object file generated by compile_to_object.

This is useful for checking what Halide is producing without having to disassemble anything, or if you need to feed the assembly into some custom toolchain to produce an object file (e.g. iOS)

EXPORT void Halide::Func::compile_to_assembly ( const std::string &  filename,
const std::vector< Argument > &  ,
const Target target = get_target_from_environment() 
)

Statically compile this function to text assembly equivalent to the object file generated by compile_to_object.

This is useful for checking what Halide is producing without having to disassemble anything, or if you need to feed the assembly into some custom toolchain to produce an object file (e.g. iOS)

EXPORT void Halide::Func::compile_to_c ( const std::string &  filename,
const std::vector< Argument > &  ,
const std::string &  fn_name = "",
const Target target = get_target_from_environment() 
)

Statically compile this function to C source code.

This is useful for providing fallback code paths that will compile on many platforms. Vectorization will fail, and parallelization will produce serial code.

EXPORT void Halide::Func::compile_to_lowered_stmt ( const std::string &  filename,
const std::vector< Argument > &  args,
StmtOutputFormat  fmt = Text,
const Target target = get_target_from_environment() 
)

Write out an internal representation of lowered code.

Useful for analyzing and debugging scheduling. Can emit html or plain text.

Examples:
tutorial/lesson_03_debugging_1.cpp.
EXPORT void Halide::Func::print_loop_nest ( )

Write out the loop nests specified by the schedule for this Function.

Helpful for understanding what a schedule is doing.

Examples:
tutorial/lesson_05_scheduling_1.cpp, and tutorial/lesson_08_scheduling_2.cpp.
EXPORT void Halide::Func::compile_to_file ( const std::string &  filename_prefix,
const std::vector< Argument > &  args,
const std::string &  fn_name = "",
const Target target = get_target_from_environment() 
)

Compile to object file and header pair, with the given arguments.

The name defaults to the same name as this halide function.

Examples:
tutorial/lesson_11_cross_compilation.cpp.
EXPORT void Halide::Func::compile_to_static_library ( const std::string &  filename_prefix,
const std::vector< Argument > &  args,
const std::string &  fn_name = "",
const Target target = get_target_from_environment() 
)

Compile to static-library file and header pair, with the given arguments.

The name defaults to the same name as this halide function.

Examples:
tutorial/lesson_10_aot_compilation_generate.cpp.
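
A typical ahead-of-time flow looks something like this (a sketch; the filename prefix and function name are illustrative):

ImageParam input(UInt(8), 2);
Func brighter;
Var x, y;
brighter(x, y) = input(x, y) + 1;

// Emits a static library (e.g. brighter.a) and brighter.h for the host target.
brighter.compile_to_static_library("brighter", {input}, "brighter");
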
EXPORT void Halide::Func::compile_to_multitarget_static_library ( const std::string &  filename_prefix,
const std::vector< Argument > &  args,
const std::vector< Target > &  targets 
)

Compile to static-library file and header pair once for each target; each resulting function will be considered (in order) via halide_can_use_target_features() at runtime, with the first appropriate match being selected for subsequent use.

This is typically useful for specializations that may vary unpredictably by machine (e.g., SSE4.1/AVX/AVX2 on x86 desktop machines). All targets must have identical arch-os-bits.

EXPORT Module Halide::Func::compile_to_module ( const std::vector< Argument > &  args,
const std::string &  fn_name = "",
const Target target = get_target_from_environment() 
)

Store an internal representation of lowered code as a self contained Module suitable for further compilation.

EXPORT void Halide::Func::compile_to ( const Outputs output_files,
const std::vector< Argument > &  args,
const std::string &  fn_name,
const Target target = get_target_from_environment() 
)

Compile and generate multiple target files with a single call.

Deduces target files based on the filenames specified in the output_files struct.

EXPORT void* Halide::Func::compile_jit ( const Target target = get_jit_target_from_environment())

Eagerly jit compile the function to machine code.

This normally happens on the first call to realize. If you're running your halide pipeline inside time-sensitive code and wish to avoid including the time taken to compile a pipeline, then you can call this ahead of time. Returns the raw function pointer to the compiled pipeline. Default is to use the Target returned from Halide::get_jit_target_from_environment()

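A sketch of paying the JIT cost up front so that later timing only measures execution:

Func f;
Var x;
f(x) = x * x;

f.compile_jit();                     // compile now, for the environment's JIT target
Buffer<int> out = f.realize(1024);   // no compilation happens here
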
EXPORT void Halide::Func::set_error_handler ( void(*)(void *, const char *)  handler)

Set the error handler function that will be called in the case of runtime errors during halide pipelines.

If you are compiling statically, you can also just define your own function with signature

extern "C" void halide_error(void *user_context, const char *);

This will clobber Halide's version.

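A minimal JIT-side sketch (the handler name is illustrative):

void my_error_handler(void *user_context, const char *msg) {
    fprintf(stderr, "Halide error: %s\n", msg);   // needs <cstdio>
}

Func f;
Var x;
f(x) = x;
f.set_error_handler(my_error_handler);
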
EXPORT void Halide::Func::set_custom_allocator ( void *(*)(void *, size_t)  malloc,
void(*)(void *, void *)  free 
)

EXPORT void Halide::Func::set_custom_do_task ( int(*)(void *, int(*)(void *, int, uint8_t *), int, uint8_t *)  custom_do_task)

Set a custom task handler to be called by the parallel for loop.

It is useful to set this if you want to do some additional bookkeeping at the granularity of parallel tasks. The default implementation does this:

extern "C" int halide_do_task(void *user_context,
                              int (*f)(void *, int, uint8_t *),
                              int idx, uint8_t *state) {
    return f(user_context, idx, state);
}

If you are statically compiling, you can also just define your own version of the above function, and it will clobber Halide's version.

If you're trying to use a custom parallel runtime, you probably don't want to call this. See instead Func::set_custom_do_par_for .

EXPORT void Halide::Func::set_custom_do_par_for ( int(*)(void *, int(*)(void *, int, uint8_t *), int, int, uint8_t *)  custom_do_par_for)

Set a custom parallel for loop launcher.

Useful if your app already manages a thread pool. The default implementation is equivalent to this:

extern "C" int halide_do_par_for(void *user_context,
                                 int (*f)(void *, int, uint8_t *),
                                 int min, int extent, uint8_t *state) {
    int exit_status = 0;
    parallel for (int idx = min; idx < min+extent; idx++) {
        int job_status = halide_do_task(user_context, f, idx, state);
        if (job_status) exit_status = job_status;
    }
    return exit_status;
}

However, notwithstanding the above example code, if one task fails, we may skip over other tasks, and if two tasks return different error codes, we may select one arbitrarily to return.

If you are statically compiling, you can also just define your own version of the above function, and it will clobber Halide's version.

EXPORT void Halide::Func::set_custom_trace ( int(*)(void *, const halide_trace_event_t *)  trace_fn)

Set custom routines to call when tracing is enabled.

Call this on the output Func of your pipeline. This then sets custom routines for the entire pipeline, not just calls to this Func.

If you are statically compiling, you can also just define your own versions of the tracing functions (see HalideRuntime.h), and they will clobber Halide's versions.

EXPORT void Halide::Func::set_custom_print ( void(*)(void *, const char *)  handler)

Set the function called to print messages from the runtime.

If you are compiling statically, you can also just define your own function with signature

extern "C" void halide_print(void *user_context, const char *);

This will clobber Halide's version.

EXPORT const Internal::JITHandlers& Halide::Func::jit_handlers ( )

Get a struct containing the currently set custom functions used by JIT.

template<typename T >
void Halide::Func::add_custom_lowering_pass ( T *  pass)
inline

Add a custom pass to be used during lowering.

It is run after all other lowering passes. Can be used to verify properties of the lowered Stmt, instrument it with extra code, or otherwise modify it. The Func takes ownership of the pass, and will call delete on it when the Func goes out of scope. So don't pass a stack object, or share pass instances between multiple Funcs.

Definition at line 836 of file Func.h.

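A minimal sketch of registering a pass (the class is hypothetical and deliberately does nothing; a real pass would override visit() methods of Internal::IRMutator to inspect or rewrite the lowered Stmt):

class NoOpLoweringPass : public Internal::IRMutator {
    // Override visit() overloads here to analyze or mutate IR nodes.
};

f.add_custom_lowering_pass(new NoOpLoweringPass);  // f takes ownership and will delete it
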
EXPORT void Halide::Func::add_custom_lowering_pass ( Internal::IRMutator pass,
void(*)(Internal::IRMutator *)  deleter 
)

Add a custom pass to be used during lowering, with the function that will be called to delete it also passed in.

Set it to nullptr if you wish to retain ownership of the object.

EXPORT void Halide::Func::clear_custom_lowering_passes ( )

Remove all previously-set custom lowering passes.

EXPORT const std::vector<CustomLoweringPass>& Halide::Func::custom_lowering_passes ( )

Get the custom lowering passes.

EXPORT void Halide::Func::debug_to_file ( const std::string &  filename)

When this function is compiled, include code that dumps its values to a file after it is realized, for the purpose of debugging.

If filename ends in ".tif" or ".tiff" (case insensitive) the file is in TIFF format and can be read by standard tools. Otherwise, the file format is as follows:

All data is in the byte-order of the target platform. First, a 20 byte-header containing four 32-bit ints, giving the extents of the first four dimensions. Dimensions beyond four are folded into the fourth. Then, a fifth 32-bit int giving the data type of the function. The typecodes are given by: float = 0, double = 1, uint8_t = 2, int8_t = 3, uint16_t = 4, int16_t = 5, uint32_t = 6, int32_t = 7, uint64_t = 8, int64_t = 9. The data follows the header, as a densely packed array of the given size and the given type. If given the extension .tmp, this file format can be natively read by the program ImageStack.

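For example, to dump a stage's output as a TIFF while debugging (a sketch; the filename is illustrative, and only Funcs that are actually realized, e.g. compute_root'd Funcs or pipeline outputs, produce a file):

Func g;
Var x, y;
g(x, y) = x * y;
g.compute_root();
g.debug_to_file("g_debug.tiff");
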
EXPORT const std::string& Halide::Func::name ( ) const

The name of this function, either given during construction, or automatically generated.

EXPORT std::vector<Var> Halide::Func::args ( ) const

Get the pure arguments.

EXPORT Expr Halide::Func::value ( ) const

The right-hand-side value of the pure definition of this function.

Causes an error if there's no pure definition, or if the function is defined to return multiple values.

EXPORT Tuple Halide::Func::values ( ) const

The values returned by this function.

An error if the function has not been defined. Returns a Tuple with one element for functions defined to return a single value.

EXPORT bool Halide::Func::defined ( ) const

Does this function have at least a pure definition.

EXPORT const std::vector<Expr>& Halide::Func::update_args ( int  idx = 0) const

Get the left-hand-side of the update definition.

An empty vector if there's no update definition. If there are multiple update definitions for this function, use the argument to select which one you want.

EXPORT Expr Halide::Func::update_value ( int  idx = 0) const

Get the right-hand-side of an update definition.

An error if there's no update definition. If there are multiple update definitions for this function, use the argument to select which one you want.

EXPORT Tuple Halide::Func::update_values ( int  idx = 0) const

Get the right-hand-side of an update definition for functions that return multiple values.

An error if there's no update definition. Returns a Tuple with one element for functions that return a single value.

EXPORT std::vector<RVar> Halide::Func::rvars ( int  idx = 0) const

Get the RVars of the reduction domain for an update definition, if there is one.

EXPORT bool Halide::Func::has_update_definition ( ) const

Does this function have at least one update definition?

EXPORT int Halide::Func::num_update_definitions ( ) const

How many update definitions does this function have?

EXPORT bool Halide::Func::is_extern ( ) const

Is this function an external stage? That is, was it defined using define_extern?

EXPORT void Halide::Func::define_extern ( const std::string &  function_name,
const std::vector< ExternFuncArgument > &  params,
Type  t,
int  dimensionality,
NameMangling  mangling,
bool  uses_old_buffer_t 
)
inline

Add an extern definition for this Func.

This lets you define a Func that represents an external pipeline stage. You can, for example, use it to wrap a call to an extern library such as fftw.

Definition at line 934 of file Func.h.

References Halide::Host.

EXPORT void Halide::Func::define_extern ( const std::string &  function_name,
const std::vector< ExternFuncArgument > &  params,
Type  t,
int  dimensionality,
NameMangling  mangling = NameMangling::Default,
DeviceAPI  device_api = DeviceAPI::Host,
bool  uses_old_buffer_t = false 
)
inline

Add an extern definition for this Func.

This lets you define a Func that represents an external pipeline stage. You can, for example, use it to wrap a call to an extern library such as fftw.

Definition at line 944 of file Func.h.

EXPORT void Halide::Func::define_extern ( const std::string &  function_name,
const std::vector< ExternFuncArgument > &  params,
const std::vector< Type > &  types,
int  dimensionality,
NameMangling  mangling,
bool  uses_old_buffer_t 
)
inline

Add an extern definition for this Func.

This lets you define a Func that represents an external pipeline stage. You can, for example, use it to wrap a call to an extern library such as fftw.

Definition at line 955 of file Func.h.

References Halide::Host.

EXPORT void Halide::Func::define_extern ( const std::string &  function_name,
const std::vector< ExternFuncArgument > &  params,
const std::vector< Type > &  types,
int  dimensionality,
NameMangling  mangling = NameMangling::Default,
DeviceAPI  device_api = DeviceAPI::Host,
bool  uses_old_buffer_t = false 
)

Add an extern definition for this Func.

This lets you define a Func that represents an external pipeline stage. You can, for example, use it to wrap a call to an extern library such as fftw.

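A sketch of wrapping a hypothetical extern C function, say int my_extern_stage(halide_buffer_t *in, halide_buffer_t *out), as a 2-D float stage:

Func input, stage;
Var x, y;
input(x, y) = cast<float>(x + y);
input.compute_root();   // the extern stage consumes 'input' as a whole buffer

std::vector<ExternFuncArgument> params = {input};   // uses Func's ExternFuncArgument conversion
stage.define_extern("my_extern_stage", params, Float(32), 2);
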
EXPORT const std::vector<Type>& Halide::Func::output_types ( ) const

Get the types of the outputs of this Func.

Examples:
tutorial/lesson_14_types.cpp.
EXPORT int Halide::Func::outputs ( ) const

Get the number of outputs of this Func.

Corresponds to the size of the Tuple this Func was defined to return.

EXPORT const std::string& Halide::Func::extern_function_name ( ) const

Get the name of the extern function called for an extern definition.

EXPORT int Halide::Func::dimensions ( ) const

The dimensionality (number of arguments) of this function.

Zero if the function is not yet defined.

EXPORT FuncRef Halide::Func::operator() ( std::vector< Var > ) const

Construct either the left-hand-side of a definition, or a call to a function that happens to only contain vars as arguments.

If the function has already been defined, and fewer arguments are given than the function has dimensions, then enough implicit vars are added to the end of the argument list to make up the difference (see Var::implicit)

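For example (both uses in one sketch):

Func f, g;
Var x, y;
f(x, y) = x + y;        // left-hand side: defines f
g(x, y) = f(x, y) * 2;  // right-hand side: a call to f with Var arguments
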
template<typename... Args>
NO_INLINE std::enable_if<Internal::all_are_convertible<Var, Args...>::value, FuncRef>::type Halide::Func::operator() ( Args &&...  args) const
inline

Construct either the left-hand-side of a definition, or a call to a function that happens to only contain vars as arguments.

If the function has already been defined, and fewer arguments are given than the function has dimensions, then enough implicit vars are added to the end of the argument list to make up the difference (see Var::implicit)

Definition at line 1000 of file Func.h.

References EXPORT, and NO_INLINE.

EXPORT FuncRef Halide::Func::operator() ( std::vector< Expr > ) const

Either calls to the function, or the left-hand-side of an update definition (see RDom).

If the function has already been defined, and fewer arguments are given than the function has dimensions, then enough implicit vars are added to the end of the argument list to make up the difference. (see Var::implicit)

template<typename... Args>
NO_INLINE std::enable_if<Internal::all_are_convertible<Expr, Args...>::value, FuncRef>::type Halide::Func::operator() ( Expr  x,
Args &&...  args 
) const
inline

Either calls to the function, or the left-hand-side of an update definition (see RDom).

If the function has already been defined, and fewer arguments are given than the function has dimensions, then enough implicit vars are added to the end of the argument list to make up the difference. (see Var::implicit)

Definition at line 1017 of file Func.h.

References Halide::Auto, EXPORT, Halide::Stage::fuse(), Halide::min(), NO_INLINE, Halide::Stage::parallel(), Halide::Stage::reorder(), Halide::Stage::serial(), Halide::Stage::tile(), Halide::Stage::unroll(), and Halide::Stage::vectorize().

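For example, a running sum built with an update definition over an RDom (a sketch):

Func f;
Var x;
f(x) = x;                   // pure definition: only Vars on the left-hand side
RDom r(1, 99);
f(r) = f(r - 1) + f(r);     // update definition: Expr/RVar arguments are allowed
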
EXPORT Func Halide::Func::in ( const Func f)

Creates and returns a new Func that wraps this Func.

During compilation, Halide replaces all calls to this Func done by 'f' with calls to the wrapper. If this Func is already wrapped for use in 'f', will return the existing wrapper.

For example, g.in(f) would rewrite a pipeline like this:

g(x, y) = ...
f(x, y) = ... g(x, y) ...

into a pipeline like this:

g(x, y) = ...
g_wrap(x, y) = g(x, y)
f(x, y) = ... g_wrap(x, y)

This has a variety of uses. You can use it to schedule this Func differently in the different places it is used:

g(x, y) = ...
f1(x, y) = ... g(x, y) ...
f2(x, y) = ... g(x, y) ...
g.in(f1).compute_at(f1, y).vectorize(x, 8);
g.in(f2).compute_at(f2, x).unroll(x);

You can also use it to stage loads from this Func via some intermediate buffer (perhaps on the stack as in test/performance/block_transpose.cpp, or in shared GPU memory as in test/performance/wrap.cpp). In this case we compute the wrapper at tiles of the consuming Funcs like so:

g.compute_root()...
g.in(f).compute_at(f, tiles)...

Func::in() can also be used to compute pieces of a Func into a smaller scratch buffer (perhaps on the GPU) and then copy them into a larger output buffer one tile at a time. See apps/interpolate/interpolate.cpp for an example of this. In this case we compute the Func at tiles of its own wrapper:

f.in(g).compute_root().gpu_tile(...)...
f.compute_at(f.in(g), tiles)...

A similar use of Func::in() is wrapping Funcs with multiple update stages in a pure wrapper. The following code:

f(x, y) = x + y;
f(x, y) += 5;
g(x, y) = f(x, y);
f.compute_root();

Is equivalent to:

for y:
    for x:
        f(x, y) = x + y;
for y:
    for x:
        f(x, y) += 5
for y:
    for x:
        g(x, y) = f(x, y)

Using Func::in(), we can write:

f(x, y) = x + y;
f(x, y) += 5;
g(x, y) = f(x, y);
f.in(g).compute_root();

which instead produces:

for y:
    for x:
        f(x, y) = x + y;
        f(x, y) += 5
        f_wrap(x, y) = f(x, y)
for y:
    for x:
        g(x, y) = f_wrap(x, y)

EXPORT Func Halide::Func::in ( const std::vector< Func > &  fs)

Create and return a wrapper shared by all the Funcs in 'fs'.

If any of the Funcs in 'fs' already have a custom wrapper, this will throw an error.

EXPORT Func Halide::Func::in ( )

Create and return a global wrapper, which wraps all calls to this Func by any other Func.

If a global wrapper already exists, returns it. The global wrapper is only used by callers for which no custom wrapper has been specified.

EXPORT Func& Halide::Func::split ( VarOrRVar  old,
VarOrRVar  outer,
VarOrRVar  inner,
Expr  factor,
TailStrategy  tail = TailStrategy::Auto 
)

Split a dimension into inner and outer subdimensions with the given names, where the inner dimension iterates from 0 to factor-1.

The inner and outer subdimensions can then be dealt with using the other scheduling calls. It's ok to reuse the old variable name as either the inner or outer variable. The final argument specifies how the tail should be handled if the split factor does not provably divide the extent.
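
As an illustration, a minimal sketch (the Func and Var names here are invented for this example, not taken from the Halide sources):

Func f("f");
Var x("x"), xo("xo"), xi("xi");
f(x) = x * 2;
// xo iterates over strips, xi iterates from 0 to 7 within each strip.
// GuardWithIf keeps the tail exact when 8 does not divide the extent.
f.split(x, xo, xi, 8, TailStrategy::GuardWithIf);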

Examples:
tutorial/lesson_05_scheduling_1.cpp, tutorial/lesson_08_scheduling_2.cpp, and tutorial/lesson_09_update_definitions.cpp.
EXPORT Func& Halide::Func::fuse ( VarOrRVar  inner,
VarOrRVar  outer,
VarOrRVar  fused 
)

Join two dimensions into a single fused dimension.

The fused dimension covers the product of the extents of the inner and outer dimensions given.
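
A minimal sketch (names invented for illustration):

Func f("f");
Var x("x"), y("y"), xy("xy");
f(x, y) = x + y;
// xy ranges over width*height iterations; fusing first makes it possible
// to parallelize the whole traversal as a single loop.
f.fuse(x, y, xy).parallel(xy);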

Examples:
tutorial/lesson_05_scheduling_1.cpp.
EXPORT Func& Halide::Func::serial ( VarOrRVar  var)

Mark a dimension to be traversed serially.

This is the default.

EXPORT Func& Halide::Func::parallel ( VarOrRVar  var,
Expr  task_size,
TailStrategy  tail = TailStrategy::Auto 
)

Split a dimension by the given task_size, and then parallelize the outer dimension.

This creates parallel tasks that have size task_size. After this call, var refers to the outer dimension of the split. The inner dimension has a new anonymous name. If you wish to mutate it, or schedule with respect to it, do the split manually.
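
For example, the following sketch (invented names) distributes scanlines to worker threads in chunks of 16:

Func f("f");
Var x("x"), y("y");
f(x, y) = x + y;
// Equivalent to splitting y by 16 and parallelizing the resulting outer loop.
f.parallel(y, 16);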

EXPORT Func& Halide::Func::vectorize ( VarOrRVar  var)

Mark a dimension to be computed all-at-once as a single vector.

The dimension should have constant extent - e.g. because it is the inner dimension following a split by a constant factor. For most uses of vectorize you want the two-argument form. The variable to be vectorized should be the innermost one.

Examples:
tutorial/lesson_05_scheduling_1.cpp, tutorial/lesson_08_scheduling_2.cpp, tutorial/lesson_09_update_definitions.cpp, tutorial/lesson_10_aot_compilation_generate.cpp, tutorial/lesson_11_cross_compilation.cpp, and tutorial/lesson_15_generators.cpp.
EXPORT Func& Halide::Func::unroll ( VarOrRVar  var)

Mark a dimension to be completely unrolled.

The dimension should have constant extent - e.g. because it is the inner dimension following a split by a constant factor. For most uses of unroll you want the two-argument form.

Examples:
tutorial/lesson_05_scheduling_1.cpp.
EXPORT Func& Halide::Func::vectorize ( VarOrRVar  var,
Expr  factor,
TailStrategy  tail = TailStrategy::Auto 
)

Split a dimension by the given factor, then vectorize the inner dimension.

This is how you vectorize a loop of unknown size. The variable to be vectorized should be the innermost one. After this call, var refers to the outer dimension of the split. 'factor' must be an integer.
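
A minimal sketch (invented names), vectorizing a loop whose extent is not known at compile time:

Func f("f");
Var x("x"), y("y");
f(x, y) = x + y;
// After this, x refers to the outer loop; the new inner loop of extent 8
// is computed as a single vector.
f.vectorize(x, 8);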

EXPORT Func& Halide::Func::unroll ( VarOrRVar  var,
Expr  factor,
TailStrategy  tail = TailStrategy::Auto 
)

Split a dimension by the given factor, then unroll the inner dimension.

This is how you unroll a loop of unknown size by some constant factor. After this call, var refers to the outer dimension of the split. 'factor' must be an integer.

EXPORT Func& Halide::Func::bound ( Var  var,
Expr  min,
Expr  extent 
)

Statically declare that the range over which a function should be evaluated is given by the second and third arguments.

This can let Halide perform some optimizations. E.g. if you know there are going to be 4 color channels, you can completely vectorize the color channel dimension without the overhead of splitting it up. If bounds inference decides that it requires more of this function than the bounds you have stated, a runtime error will occur when you try to run your pipeline.
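
For example, assuming a three-dimensional Func whose third dimension is a color channel (a sketch, not taken from the Halide sources):

Func f("f");
Var x("x"), y("y"), c("c");
f(x, y, c) = x + y + c;
// Promise that c covers exactly [0, 4), then make it the innermost loop
// and vectorize it; no split or tail handling is needed because the
// extent is now statically known.
f.bound(c, 0, 4);
f.reorder(c, x, y).vectorize(c);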

EXPORT Func& Halide::Func::align_bounds ( Var  var,
Expr  modulus,
Expr  remainder = 0 
)

Expand the region computed so that the min coordinate is congruent to 'remainder' modulo 'modulus', and the extent is a multiple of 'modulus'.

For example, f.align_bounds(x, 2) forces the min and extent realized to be even, and calling f.align_bounds(x, 2, 1) forces the min to be odd and the extent to be even. The region computed always contains the region that would have been computed without this directive, so no assertions are injected.

EXPORT Func& Halide::Func::bound_extent ( Var  var,
Expr  extent 
)

Bound the extent of a Func's realization, but not its min.

This means the dimension can be unrolled or vectorized even when its min is not fixed (for example because it is compute_at tiles of another Func). This can also be useful for forcing a function's allocation to be a fixed size, which often means it can go on the stack.
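
A minimal sketch (invented names):

Func f("f");
Var x("x"), y("y"), c("c");
f(x, y, c) = x + y + c;
// The extent of c is fixed at 3 even though its min may vary with where
// f is computed, which is enough to let the c loop be unrolled.
f.bound_extent(c, 3).unroll(c);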

EXPORT Func& Halide::Func::tile ( VarOrRVar  x,
VarOrRVar  y,
VarOrRVar  xo,
VarOrRVar  yo,
VarOrRVar  xi,
VarOrRVar  yi,
Expr  xfactor,
Expr  yfactor,
TailStrategy  tail = TailStrategy::Auto 
)

Split two dimensions at once by the given factors, and then reorder the resulting dimensions to be xi, yi, xo, yo from innermost outwards.

This gives a tiled traversal.
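
For example (a sketch with invented names):

Func f("f");
Var x("x"), y("y"), xo("xo"), yo("yo"), xi("xi"), yi("yi");
f(x, y) = x + y;
// Traverse f in 64x64 tiles, vectorizing along rows within each tile and
// distributing rows of tiles across threads.
f.tile(x, y, xo, yo, xi, yi, 64, 64)
 .vectorize(xi, 8)
 .parallel(yo);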

Examples:
tutorial/lesson_05_scheduling_1.cpp, and tutorial/lesson_08_scheduling_2.cpp.
EXPORT Func& Halide::Func::tile ( VarOrRVar  x,
VarOrRVar  y,
VarOrRVar  xi,
VarOrRVar  yi,
Expr  xfactor,
Expr  yfactor,
TailStrategy  tail = TailStrategy::Auto 
)

A shorter form of tile, which reuses the old variable names as the new outer dimensions.

EXPORT Func& Halide::Func::reorder ( const std::vector< VarOrRVar > &  vars)

Reorder variables to have the given nesting order, from innermost out.
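
For example, to traverse a two-dimensional Func in column-major order (a sketch with invented names):

Func f("f");
Var x("x"), y("y");
f(x, y) = x + y;
// y becomes the innermost loop, x the outermost.
f.reorder(y, x);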

Examples:
tutorial/lesson_05_scheduling_1.cpp.
EXPORT Func& Halide::Func::rename ( VarOrRVar  old_name,
VarOrRVar  new_name 
)

Rename a dimension.

Equivalent to a split with an inner size of one.

EXPORT Func& Halide::Func::allow_race_conditions ( )

Specify that race conditions are permitted for this Func, which enables parallelizing over RVars even when Halide cannot prove that it is safe to do so.

Use this with great caution, and only if you can prove to yourself that this is safe, as it may result in a non-deterministic routine that returns different values at different times or on different machines.

EXPORT Stage Halide::Func::specialize ( Expr  condition)

Specialize a Func.

This creates a special-case version of the Func where the given condition is true. The most effective conditions are those of the form param == value, and boolean Params. Consider a simple example:

f(x) = x + select(cond, 0, 1);
f.compute_root();

This is equivalent to:

for (int x = 0; x < width; x++) {
    f[x] = x + (cond ? 0 : 1);
}

Adding the scheduling directive:

f.specialize(cond)

makes it equivalent to:

if (cond) {
    for (int x = 0; x < width; x++) {
        f[x] = x;
    }
} else {
    for (int x = 0; x < width; x++) {
        f[x] = x + 1;
    }
}

Note that the inner loops have been simplified. In the first path Halide knows that cond is true, and in the second path Halide knows that it is false.

The specialized version gets its own schedule, which inherits every directive made about the parent Func's schedule so far except for its specializations. This method returns a handle to the new schedule. If you wish to retrieve the specialized sub-schedule again later, you can call this method with the same condition. Consider the following example of scheduling the specialized version:

f(x) = x;
f.compute_root();
f.specialize(width > 1).unroll(x, 2);

Assuming for simplicity that width is even, this is equivalent to:

if (width > 1) {
    for (int x = 0; x < width/2; x++) {
        f[2*x] = 2*x;
        f[2*x + 1] = 2*x + 1;
    }
} else {
    for (int x = 0; x < width; x++) {
        f[x] = x;
    }
}

For this case, it may be better to schedule the un-specialized case instead:

f(x) = x;
f.compute_root();
f.specialize(width == 1); // Creates a copy of the schedule so far.
f.unroll(x, 2); // Only applies to the unspecialized case.

This is equivalent to:

if (width == 1) {
    f[0] = 0;
} else {
    for (int x = 0; x < width/2; x++) {
        f[2*x] = 2*x;
        f[2*x + 1] = 2*x + 1;
    }
}

This can be a good way to write a pipeline that splits, vectorizes, or tiles, but can still handle small inputs.

If a Func has several specializations, the first matching one will be used, so the order in which you define specializations is significant. For example:

f(x) = x + select(cond1, a, b) - select(cond2, c, d);
f.specialize(cond1);
f.specialize(cond2);

is equivalent to:

if (cond1) {
    for (int x = 0; x < width; x++) {
        f[x] = x + a - (cond2 ? c : d);
    }
} else if (cond2) {
    for (int x = 0; x < width; x++) {
        f[x] = x + b - c;
    }
} else {
    for (int x = 0; x < width; x++) {
        f[x] = x + b - d;
    }
}

Specializations may in turn be specialized, which creates a nested if statement in the generated code.

f(x) = x + select(cond1, a, b) - select(cond2, c, d);
f.specialize(cond1).specialize(cond2);

This is equivalent to:

if (cond1) {
    if (cond2) {
        for (int x = 0; x < width; x++) {
            f[x] = x + a - c;
        }
    } else {
        for (int x = 0; x < width; x++) {
            f[x] = x + a - d;
        }
    }
} else {
    for (int x = 0; x < width; x++) {
        f[x] = x + b - (cond2 ? c : d);
    }
}

To create a 4-way if statement that simplifies away all of the ternary operators above, you could say:

f.specialize(cond1).specialize(cond2);
f.specialize(cond2);

or

f.specialize(cond1 && cond2);
f.specialize(cond1);
f.specialize(cond2);

Any prior Func which is compute_at some variable of this Func gets separately included in all paths of the generated if statement. The Var in the compute_at call must exist in all paths, but it may have been generated via a different path of splits, fuses, and renames. This can be used somewhat creatively. Consider the following code:

g(x, y) = 8*x;
f(x, y) = g(x, y) + 1;
f.compute_root().specialize(cond);
Var g_loop;
f.specialize(cond).rename(y, g_loop);
f.rename(x, g_loop);
g.compute_at(f, g_loop);

When cond is true, this is equivalent to g.compute_at(f,y). When it is false, this is equivalent to g.compute_at(f,x).

EXPORT void Halide::Func::specialize_fail ( const std::string &  message)

Add a specialization to a Func that always terminates execution with a call to halide_error().

By itself, this is of limited use, but can be useful to terminate chains of specialize() calls where no "default" case is expected (thus avoiding unnecessary code generation).

For instance, say we want to optimize a pipeline to process images in planar and interleaved format; we might typically do something like:

ImageParam im(UInt(8), 3);
Func f = do_something_with(im);
f.specialize(im.dim(0).stride() == 1).vectorize(x, 8); // planar
f.specialize(im.dim(2).stride() == 1).reorder(c, x, y).vectorize(c); // interleaved

This code will vectorize along rows for the planar case, and across pixel components for the interleaved case... but there is an implicit "else" for the unhandled cases, which generates unoptimized code. If we never anticipate passing any other sort of images to this, we could streamline our code by adding specialize_fail():

ImageParam im(UInt(8), 3);
Func f = do_something_with(im);
f.specialize(im.dim(0).stride() == 1).vectorize(x, 8); // planar
f.specialize(im.dim(2).stride() == 1).reorder(c, x, y).vectorize(c); // interleaved
f.specialize_fail("Unhandled image format");

Conceptually, this produces code like:

if (im.dim(0).stride() == 1) {
    do_something_planar();
} else if (im.dim(2).stride() == 1) {
    do_something_interleaved();
} else {
    halide_error("Unhandled image format");
}

Note that calling specialize_fail() terminates the specialization chain for a given Func; you cannot create new specializations for the Func afterwards (though you can retrieve handles to previous specializations).

EXPORT Func& Halide::Func::gpu_threads ( VarOrRVar  thread_x,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Tell Halide that the following dimensions correspond to GPU thread indices.

This is useful if you compute a producer function within the block indices of a consumer function, and want to control how that function's dimensions map to GPU threads. If the selected target is not an appropriate GPU, this just marks those dimensions as parallel.
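
A sketch of the producer/consumer pattern described above (invented names; it assumes a GPU feature is enabled in the Target):

Func f("f"), g("g");
Var x("x"), y("y"), bx("bx"), by("by"), tx("tx"), ty("ty");
g(x, y) = x + y;
f(x, y) = g(x, y) * 2;
// f is tiled over GPU blocks and threads; g is computed once per block,
// with its own mapping of x and y onto the GPU thread indices.
f.gpu_tile(x, y, bx, by, tx, ty, 16, 16);
g.compute_at(f, bx).gpu_threads(x, y);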

EXPORT Func& Halide::Func::gpu_threads ( VarOrRVar  thread_x,
VarOrRVar  thread_y,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Tell Halide that the following dimensions correspond to GPU thread indices.

This is useful if you compute a producer function within the block indices of a consumer function, and want to control how that function's dimensions map to GPU threads. If the selected target is not an appropriate GPU, this just marks those dimensions as parallel.

EXPORT Func& Halide::Func::gpu_threads ( VarOrRVar  thread_x,
VarOrRVar  thread_y,
VarOrRVar  thread_z,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Tell Halide that the following dimensions correspond to GPU thread indices.

This is useful if you compute a producer function within the block indices of a consumer function, and want to control how that function's dimensions map to GPU threads. If the selected target is not an appropriate GPU, this just marks those dimensions as parallel.

EXPORT Func& Halide::Func::gpu_single_thread ( DeviceAPI  device_api = DeviceAPI::Default_GPU)

Tell Halide to run this stage using a single gpu thread and block.

This is not an efficient use of your GPU, but it can be useful to avoid copy-back for intermediate update stages that touch a very small part of your Func.

Referenced by Halide::Internal::schedule_scalar().

EXPORT Func& Halide::Func::gpu_blocks ( VarOrRVar  block_x,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Tell Halide that the following dimensions correspond to GPU block indices.

This is useful for scheduling stages that will run serially within each GPU block. If the selected target is not ptx, this just marks those dimensions as parallel.

EXPORT Func& Halide::Func::gpu_blocks ( VarOrRVar  block_x,
VarOrRVar  block_y,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Tell Halide that the following dimensions correspond to GPU block indices.

This is useful for scheduling stages that will run serially within each GPU block. If the selected target is not ptx, this just marks those dimensions as parallel.

EXPORT Func& Halide::Func::gpu_blocks ( VarOrRVar  block_x,
VarOrRVar  block_y,
VarOrRVar  block_z,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Tell Halide that the following dimensions correspond to GPU block indices.

This is useful for scheduling stages that will run serially within each GPU block. If the selected target is not ptx, this just marks those dimensions as parallel.

EXPORT Func& Halide::Func::gpu ( VarOrRVar  block_x,
VarOrRVar  thread_x,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Tell Halide that the following dimensions correspond to GPU block indices and thread indices.

If the selected target is not ptx, these just mark the given dimensions as parallel. The dimensions are consumed by this call, so do all other unrolling, reordering, etc first.

EXPORT Func& Halide::Func::gpu ( VarOrRVar  block_x,
VarOrRVar  block_y,
VarOrRVar  thread_x,
VarOrRVar  thread_y,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Tell Halide that the following dimensions correspond to GPU block indices and thread indices.

If the selected target is not ptx, these just mark the given dimensions as parallel. The dimensions are consumed by this call, so do all other unrolling, reordering, etc first.

EXPORT Func& Halide::Func::gpu ( VarOrRVar  block_x,
VarOrRVar  block_y,
VarOrRVar  block_z,
VarOrRVar  thread_x,
VarOrRVar  thread_y,
VarOrRVar  thread_z,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Tell Halide that the following dimensions correspond to GPU block indices and thread indices.

If the selected target is not ptx, these just mark the given dimensions as parallel. The dimensions are consumed by this call, so do all other unrolling, reordering, etc first.

EXPORT Func& Halide::Func::gpu_tile ( VarOrRVar  x,
VarOrRVar  bx,
Var  tx,
Expr  x_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices.

Consumes the variables given, so do all other scheduling first.
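
For example (a sketch with invented names):

Func f("f");
Var x("x"), bx("bx"), tx("tx");
f(x) = x * 2;
// Blocks of 32: bx becomes the GPU block index and tx the thread index
// within each block.
f.gpu_tile(x, bx, tx, 32);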

EXPORT Func& Halide::Func::gpu_tile ( VarOrRVar  x,
VarOrRVar  bx,
RVar  tx,
Expr  x_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices.

Consumes the variables given, so do all other scheduling first.

EXPORT Func& Halide::Func::gpu_tile ( VarOrRVar  x,
VarOrRVar  tx,
Expr  x_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices.

Consumes the variables given, so do all other scheduling first.

EXPORT Func& Halide::Func::gpu_tile ( VarOrRVar  x,
VarOrRVar  y,
VarOrRVar  bx,
VarOrRVar  by,
VarOrRVar  tx,
VarOrRVar  ty,
Expr  x_size,
Expr  y_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices.

Consumes the variables given, so do all other scheduling first.

EXPORT Func& Halide::Func::gpu_tile ( VarOrRVar  x,
VarOrRVar  y,
VarOrRVar  tx,
Var  ty,
Expr  x_size,
Expr  y_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices.

Consumes the variables given, so do all other scheduling first.

EXPORT Func& Halide::Func::gpu_tile ( VarOrRVar  x,
VarOrRVar  y,
VarOrRVar  tx,
RVar  ty,
Expr  x_size,
Expr  y_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices.

Consumes the variables given, so do all other scheduling first.

EXPORT Func& Halide::Func::gpu_tile ( VarOrRVar  x,
VarOrRVar  y,
VarOrRVar  z,
VarOrRVar  bx,
VarOrRVar  by,
VarOrRVar  bz,
VarOrRVar  tx,
VarOrRVar  ty,
VarOrRVar  tz,
Expr  x_size,
Expr  y_size,
Expr  z_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices.

Consumes the variables given, so do all other scheduling first.

EXPORT Func& Halide::Func::gpu_tile ( VarOrRVar  x,
VarOrRVar  y,
VarOrRVar  z,
VarOrRVar  tx,
VarOrRVar  ty,
VarOrRVar  tz,
Expr  x_size,
Expr  y_size,
Expr  z_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices.

Consumes the variables given, so do all other scheduling first.

EXPORT Func& Halide::Func::gpu_tile ( VarOrRVar  x,
Expr  x_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices.

Consumes the variables given, so do all other scheduling first.

EXPORT Func& Halide::Func::gpu_tile ( VarOrRVar  x,
VarOrRVar  y,
Expr  x_size,
Expr  y_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices.

Consumes the variables given, so do all other scheduling first.

EXPORT Func& Halide::Func::gpu_tile ( VarOrRVar  x,
VarOrRVar  y,
VarOrRVar  z,
Expr  x_size,
Expr  y_size,
Expr  z_size,
TailStrategy  tail = TailStrategy::Auto,
DeviceAPI  device_api = DeviceAPI::Default_GPU 
)

Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices.

Consumes the variables given, so do all other scheduling first.

EXPORT Func& Halide::Func::shader ( Var  x,
Var  y,
Var  c,
DeviceAPI  device_api 
)

Schedule for execution using coordinate-based hardware api.

GLSL is an example of this. Conceptually, this is similar to parallelization over 'x' and 'y' (since GLSL shaders compute individual output pixels in parallel) and vectorization over 'c' (since GLSL/RS implicitly vectorizes the color channel).

EXPORT Func& Halide::Func::glsl ( Var  x,
Var  y,
Var  c 
)

Schedule for execution as GLSL kernel.

EXPORT Func& Halide::Func::hexagon ( VarOrRVar  x = Var::outermost())

Schedule for execution on Hexagon.

When a loop is marked with Hexagon, that loop is executed on a Hexagon DSP.

Referenced by Halide::Internal::schedule_scalar().

EXPORT Func& Halide::Func::prefetch ( const Func f,
VarOrRVar  var,
Expr  offset = 1,
PrefetchBoundStrategy  strategy = PrefetchBoundStrategy::GuardWithIf 
)

Prefetch data written to or read from a Func or an ImageParam by a subsequent loop iteration, at an optionally specified iteration offset.

'var' specifies at which loop level the prefetch calls should be inserted. The final argument specifies how prefetch of region outside bounds should be handled.

For example, consider this pipeline:

Func f, g;
Var x, y;
f(x, y) = x + y;
g(x, y) = 2 * f(x, y);

The following schedule:

f.compute_root();
g.prefetch(f, x, 2);

will inject a prefetch call at the innermost loop of 'g' and generate the following loop nest:

for y = ...
  for x = ...
    f(x, y) = x + y
for y = ...
  for x = ...
    prefetch(&f[x + 2, y], 1, 16);
    g(x, y) = 2 * f(x, y)

EXPORT Func& Halide::Func::prefetch ( const Internal::Parameter param,
VarOrRVar  var,
Expr  offset = 1,
PrefetchBoundStrategy  strategy = PrefetchBoundStrategy::GuardWithIf 
)

Prefetch data written to or read from a Func or an ImageParam by a subsequent loop iteration, at an optionally specified iteration offset.

'var' specifies at which loop level the prefetch calls should be inserted. The final argument specifies how prefetch of region outside bounds should be handled.

For example, consider this pipeline:

Func f, g;
Var x, y;
f(x, y) = x + y;
g(x, y) = 2 * f(x, y);

The following schedule:

f.compute_root();
g.prefetch(f, x, 2);

will inject a prefetch call at the innermost loop of 'g' and generate the following loop nest:

for y = ...
  for x = ...
    f(x, y) = x + y
for y = ...
  for x = ...
    prefetch(&f[x + 2, y], 1, 16);
    g(x, y) = 2 * f(x, y)

template<typename T >
Func& Halide::Func::prefetch ( const T &  image,
VarOrRVar  var,
Expr  offset = 1,
PrefetchBoundStrategy  strategy = PrefetchBoundStrategy::GuardWithIf 
)
inline

Prefetch data written to or read from a Func or an ImageParam by a subsequent loop iteration, at an optionally specified iteration offset.

'var' specifies at which loop level the prefetch calls should be inserted. The final argument specifies how prefetch of region outside bounds should be handled.

For example, consider this pipeline:

Func f, g;
Var x, y;
f(x, y) = x + y;
g(x, y) = 2 * f(x, y);

The following schedule:

f.compute_root();
g.prefetch(f, x, 2);

will inject a prefetch call at the innermost loop of 'g' and generate the following loop nest:

for y = ...
  for x = ...
    f(x, y) = x + y
for y = ...
  for x = ...
    prefetch(&f[x + 2, y], 1, 16);
    g(x, y) = 2 * f(x, y)

Definition at line 1605 of file Func.h.

References Halide::Stage::prefetch().

EXPORT Func& Halide::Func::reorder_storage ( const std::vector< Var > &  dims)

Specify how the storage for the function is laid out.

These calls let you specify the nesting order of the dimensions. For example, foo.reorder_storage(y, x) tells Halide to use column-major storage for any realizations of foo, without changing how you refer to foo in the code. You may want to do this if you intend to vectorize across y. When representing color images, foo.reorder_storage(c, x, y) specifies packed storage (red, green, and blue values adjacent in memory), and foo.reorder_storage(x, y, c) specifies planar storage (entire red, green, and blue images one after the other in memory).

If you leave out some dimensions, those remain in the same positions in the nesting order while the specified variables are reordered around them.

EXPORT Func& Halide::Func::reorder_storage ( Var  x,
Var  y 
)

Specify how the storage for the function is laid out.

These calls let you specify the nesting order of the dimensions. For example, foo.reorder_storage(y, x) tells Halide to use column-major storage for any realizations of foo, without changing how you refer to foo in the code. You may want to do this if you intend to vectorize across y. When representing color images, foo.reorder_storage(c, x, y) specifies packed storage (red, green, and blue values adjacent in memory), and foo.reorder_storage(x, y, c) specifies planar storage (entire red, green, and blue images one after the other in memory).

If you leave out some dimensions, those remain in the same positions in the nesting order while the specified variables are reordered around them.

template<typename... Args>
NO_INLINE std::enable_if<Internal::all_are_convertible<Var, Args...>::value, Func &>::type Halide::Func::reorder_storage ( Var  x,
Var  y,
Args &&...  args 
)
inline

Specify how the storage for the function is laid out.

These calls let you specify the nesting order of the dimensions. For example, foo.reorder_storage(y, x) tells Halide to use column-major storage for any realizations of foo, without changing how you refer to foo in the code. You may want to do this if you intend to vectorize across y. When representing color images, foo.reorder_storage(c, x, y) specifies packed storage (red, green, and blue values adjacent in memory), and foo.reorder_storage(x, y, c) specifies planar storage (entire red, green, and blue images one after the other in memory).

If you leave out some dimensions, those remain in the same positions in the nesting order while the specified variables are reordered around them.

Definition at line 1631 of file Func.h.

References EXPORT.

EXPORT Func& Halide::Func::align_storage ( Var  dim,
Expr  alignment 
)

Pad the storage extent of a particular dimension of realizations of this function up to be a multiple of the specified alignment.

This guarantees that the strides for the dimensions stored outside of dim will be multiples of the specified alignment, where the strides and alignment are measured in numbers of elements.

For example, to guarantee that a function foo(x, y, c) representing an image has scanlines starting on offsets aligned to multiples of 16, use foo.align_storage(x, 16).

EXPORT Func& Halide::Func::fold_storage ( Var  dim,
Expr  extent,
bool  fold_forward = true 
)

Store realizations of this function in a circular buffer of a given extent.

This is more efficient when the extent of the circular buffer is a power of 2. If the fold factor is too small, or the dimension is not accessed monotonically, the pipeline will generate an error at runtime.

The fold_forward option indicates that the new values of the producer are accessed by the consumer in a monotonically increasing order. Folding storage of producers is also supported if the new values are accessed in a monotonically decreasing order by setting fold_forward to false.

For example, consider the pipeline:

Func f, g;
Var x, y;
g(x, y) = x*y;
f(x, y) = g(x, y) + g(x, y+1);

If we schedule f like so:

g.compute_at(f, y).store_root().fold_storage(y, 2);

Then g will be computed at each row of f and stored in a buffer with an extent in y of 2, alternately storing each computed row of g in row y=0 or y=1.

EXPORT Func& Halide::Func::compute_at ( Func  f,
Var  var 
)

Compute this function as needed for each unique value of the given var for the given calling function f.

For example, consider the simple pipeline:

Func f, g;
Var x, y;
g(x, y) = x*y;
f(x, y) = g(x, y) + g(x, y+1) + g(x+1, y) + g(x+1, y+1);

If we schedule f like so:

g.compute_at(f, x);

Then the C code equivalent to this pipeline will look like this

int f[height][width];
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        int g[2][2];
        g[0][0] = x*y;
        g[0][1] = (x+1)*y;
        g[1][0] = x*(y+1);
        g[1][1] = (x+1)*(y+1);
        f[y][x] = g[0][0] + g[1][0] + g[0][1] + g[1][1];
    }
}

The allocation and computation of g is within f's loop over x, and enough of g is computed to satisfy all that f will need for that iteration. This has excellent locality - values of g are used as soon as they are computed, but it does redundant work. Each value of g ends up getting computed four times. If we instead schedule f like so:

g.compute_at(f, y);

The equivalent C code is:

int f[height][width];
for (int y = 0; y < height; y++) {
    int g[2][width+1];
    for (int x = 0; x < width+1; x++) {
        g[0][x] = x*y;
        g[1][x] = x*(y+1);
    }
    for (int x = 0; x < width; x++) {
        f[y][x] = g[0][x] + g[1][x] + g[0][x+1] + g[1][x+1];
    }
}

The allocation and computation of g is within f's loop over y, and enough of g is computed to satisfy all that f will need for that iteration. This does less redundant work (each point in g ends up being evaluated twice), but the locality is not quite as good, and we have to allocate more temporary memory to store g.

Examples:
tutorial/lesson_08_scheduling_2.cpp, tutorial/lesson_09_update_definitions.cpp, and tutorial/lesson_15_generators.cpp.
EXPORT Func& Halide::Func::compute_at ( Func  f,
RVar  var 
)

Schedule a function to be computed within the iteration over some dimension of an update domain.

Produces equivalent code to the version of compute_at that takes a Var.

EXPORT Func& Halide::Func::compute_at ( LoopLevel  loop_level)

Schedule a function to be computed within the iteration over a given LoopLevel.
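
A minimal sketch, assuming the LoopLevel(Func, Var) constructor (names invented for illustration):

Func f("f"), g("g");
Var x("x"), y("y");
g(x, y) = x * y;
f(x, y) = g(x, y) + g(x, y + 1);
// Equivalent to g.compute_at(f, y), but the placement is expressed as a
// LoopLevel value, which can be chosen or passed around separately.
LoopLevel level(f, y);
g.compute_at(level);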

EXPORT Func& Halide::Func::compute_root ( )

Compute all of this function once ahead of time.

Reusing the example in Func::compute_at :

Func f, g;
Var x, y;
g(x, y) = x*y;
f(x, y) = g(x, y) + g(x, y+1) + g(x+1, y) + g(x+1, y+1);
g.compute_root();

is equivalent to

int f[height][width];
int g[height+1][width+1];
for (int y = 0; y < height+1; y++) {
    for (int x = 0; x < width+1; x++) {
        g[y][x] = x*y;
    }
}
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        f[y][x] = g[y][x] + g[y+1][x] + g[y][x+1] + g[y+1][x+1];
    }
}

g is computed once ahead of time, and enough is computed to satisfy all uses of it. This does no redundant work (each point in g is evaluated once), but has poor locality (values of g are probably not still in cache when they are used by f), and allocates lots of temporary memory to store g.

Examples:
tutorial/lesson_08_scheduling_2.cpp.
EXPORT Func& Halide::Func::memoize ( )

Use the halide_memoization_cache_... interface to store a computed version of this function across invocations of the Func.
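
For instance (a sketch; the Func name is invented):

Func lut("lut");
Var i("i");
lut(i) = sqrt(cast<float>(i));
// The realization of lut is stored in the memoization cache, keyed on the
// parameters it depends on, so repeated invocations of the pipeline can
// reuse it instead of recomputing it.
lut.compute_root().memoize();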

EXPORT Func& Halide::Func::store_at ( Func  f,
Var  var 
)

Allocate storage for this function within f's loop over var.

Scheduling storage is optional, and can be used to separate the loop level at which storage occurs from the loop level at which computation occurs to trade off between locality and redundant work. This can open the door for two types of optimization.

Consider again the pipeline from Func::compute_at :

Func f, g;
Var x, y;
g(x, y) = x*y;
f(x, y) = g(x, y) + g(x+1, y) + g(x, y+1) + g(x+1, y+1);

If we schedule it like so:

g.compute_at(f, x).store_at(f, y);

Then the computation of g takes place within the loop over x, but the storage takes place within the loop over y:

int f[height][width];
for (int y = 0; y < height; y++) {
    int g[2][width+1];
    for (int x = 0; x < width; x++) {
        g[0][x] = x*y;
        g[0][x+1] = (x+1)*y;
        g[1][x] = x*(y+1);
        g[1][x+1] = (x+1)*(y+1);
        f[y][x] = g[0][x] + g[1][x] + g[0][x+1] + g[1][x+1];
    }
}

Provided the for loop over x is serial, halide then automatically performs the following sliding window optimization:

int f[height][width];
for (int y = 0; y < height; y++) {
    int g[2][width+1];
    for (int x = 0; x < width; x++) {
        if (x == 0) {
            g[0][x] = x*y;
            g[1][x] = x*(y+1);
        }
        g[0][x+1] = (x+1)*y;
        g[1][x+1] = (x+1)*(y+1);
        f[y][x] = g[0][x] + g[1][x] + g[0][x+1] + g[1][x+1];
    }
}

Two of the assignments to g only need to be done when x is zero. The rest of the time, those sites have already been filled in by a previous iteration. This version has the locality of compute_at(f, x), but allocates more memory and does much less redundant work.

Halide then further optimizes this pipeline like so:

int f[height][width];
for (int y = 0; y < height; y++) {
    int g[2][2];
    for (int x = 0; x < width; x++) {
        if (x == 0) {
            g[0][0] = x*y;
            g[1][0] = x*(y+1);
        }
        g[0][(x+1)%2] = (x+1)*y;
        g[1][(x+1)%2] = (x+1)*(y+1);
        f[y][x] = g[0][x%2] + g[1][x%2] + g[0][(x+1)%2] + g[1][(x+1)%2];
    }
}

Halide has detected that it's possible to use a circular buffer to represent g, and has reduced all accesses to g modulo 2 in the x dimension. This optimization only triggers if the for loop over x is serial, and if halide can statically determine some power of two large enough to cover the range needed. For powers of two, the modulo operator compiles to more efficient bit-masking. This optimization reduces memory usage, and also improves locality by reusing recently-accessed memory instead of pulling new memory into cache.

Examples:
tutorial/lesson_08_scheduling_2.cpp, and tutorial/lesson_09_update_definitions.cpp.
EXPORT Func& Halide::Func::store_at ( Func  f,
RVar  var 
)

Equivalent to the version of store_at that takes a Var, but schedules storage within the loop over a dimension of a reduction domain.

EXPORT Func& Halide::Func::store_at ( LoopLevel  loop_level)

Equivalent to the version of store_at that takes a Var, but schedules storage at a given LoopLevel.

EXPORT Func& Halide::Func::store_root ( )

Equivalent to Func::store_at, but schedules storage outside the outermost loop.

Examples:
tutorial/lesson_08_scheduling_2.cpp.
EXPORT Func& Halide::Func::compute_inline ( )

Aggressively inline all uses of this function.

This is the default schedule, so you're unlikely to need to call this. For a Func with an update definition, that means it gets computed as close to the innermost loop as possible.

Consider once more the pipeline from Func::compute_at :

Func f, g;
Var x, y;
g(x, y) = x*y;
f(x, y) = g(x, y) + g(x+1, y) + g(x, y+1) + g(x+1, y+1);

Leaving g as inline, this compiles to code equivalent to the following C:

int f[height][width];
for (int y = 0; y < height; y++) {
    for (int x = 0; x < width; x++) {
        f[y][x] = x*y + x*(y+1) + (x+1)*y + (x+1)*(y+1);
    }
}
EXPORT Stage Halide::Func::update ( int  idx = 0)

Get a handle on an update step for the purposes of scheduling it.

Examples:
tutorial/lesson_09_update_definitions.cpp.
EXPORT Func& Halide::Func::trace_loads ( )

Trace all loads from this Func by emitting calls to halide_trace.

If the Func is inlined, this has no effect.

Examples:
tutorial/lesson_09_update_definitions.cpp.
EXPORT Func& Halide::Func::trace_stores ( )

Trace all stores to the buffer backing this Func by emitting calls to halide_trace.

If the Func is inlined, this call has no effect.

Examples:
tutorial/lesson_04_debugging_2.cpp, tutorial/lesson_05_scheduling_1.cpp, tutorial/lesson_06_realizing_over_shifted_domains.cpp, tutorial/lesson_08_scheduling_2.cpp, and tutorial/lesson_09_update_definitions.cpp.
EXPORT Func& Halide::Func::trace_realizations ( )

Trace all realizations of this Func by emitting calls to halide_trace.

Internal::Function Halide::Func::function ( ) const
inline

Get a handle on the internal halide function that this Func represents.

Useful if you want to do introspection on Halide functions.

Definition at line 1962 of file Func.h.

EXPORT Halide::Func::operator Stage ( ) const

You can cast a Func to its pure stage for the purposes of scheduling it.

EXPORT OutputImageParam Halide::Func::output_buffer ( ) const

Get a handle on the output buffer for this Func.

Only relevant if this is the output Func in a pipeline. Useful for making static promises about strides, mins, and extents.
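
For example, to promise that the output is densely packed along x (a sketch; it assumes the Dimension setters exposed via dim()):

Func f("f");
Var x("x"), y("y");
f(x, y) = x + y;
// The generated code may now assume a unit stride in the innermost
// dimension, and will reject buffers that violate the promise at runtime.
f.output_buffer().dim(0).set_stride(1);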

EXPORT std::vector<OutputImageParam> Halide::Func::output_buffers ( ) const

Get a handle on the output buffer for this Func.

Only relevant if this is the output Func in a pipeline. Useful for making static promises about strides, mins, and extents.

Halide::Func::operator ExternFuncArgument ( ) const
inline

Use a Func as an argument to an external stage.

Definition at line 1979 of file Func.h.

EXPORT std::vector<Argument> Halide::Func::infer_arguments ( ) const

Infer the arguments to the Func, sorted into a canonical order: all buffers (sorted alphabetically by name), followed by all non-buffers (sorted alphabetically by name).

This lets you write things like:

func.compile_to_assembly("/dev/stdout", func.infer_arguments());

The documentation for this class was generated from the following file: