Halide 19.0.0
Halide compiler and libraries
A halide function. More...
#include <Func.h>
Public Member Functions | |
Func (const std::string &name) | |
Declare a new undefined function with the given name. | |
Func (const Type &required_type, int required_dims, const std::string &name) | |
Declare a new undefined function with the given name. | |
Func (const std::vector< Type > &required_types, int required_dims, const std::string &name) | |
Declare a new undefined function with the given name. | |
Func () | |
Declare a new undefined function with an automatically-generated unique name. | |
Func (const Expr &e) | |
Declare a new function with an automatically-generated unique name, and define it to return the given expression (which may not contain free variables). | |
Func (Internal::Function f) | |
Construct a new Func to wrap an existing, already-defined Function object. | |
template<typename T , int Dims> | |
HALIDE_NO_USER_CODE_INLINE | Func (Buffer< T, Dims > &im) |
Construct a new Func to wrap a Buffer. | |
Realization | realize (std::vector< int32_t > sizes={}, const Target &target=Target()) |
Evaluate this function over some rectangular domain and return the resulting buffer or buffers. | |
Realization | realize (JITUserContext *context, std::vector< int32_t > sizes={}, const Target &target=Target()) |
Same as above, but takes a custom user-provided context to be passed to runtime functions. | |
void | realize (Pipeline::RealizationArg outputs, const Target &target=Target()) |
Evaluate this function into an existing allocated buffer or buffers. | |
void | realize (JITUserContext *context, Pipeline::RealizationArg outputs, const Target &target=Target()) |
Same as above, but takes a custom user-provided context to be passed to runtime functions. | |
void | infer_input_bounds (const std::vector< int32_t > &sizes, const Target &target=get_jit_target_from_environment()) |
For a given size of output, or a given output buffer, determine the bounds required of all unbound ImageParams referenced. | |
void | infer_input_bounds (Pipeline::RealizationArg outputs, const Target &target=get_jit_target_from_environment()) |
void | infer_input_bounds (JITUserContext *context, const std::vector< int32_t > &sizes, const Target &target=get_jit_target_from_environment()) |
Versions of infer_input_bounds that take a custom user context to pass to runtime functions. | |
void | infer_input_bounds (JITUserContext *context, Pipeline::RealizationArg outputs, const Target &target=get_jit_target_from_environment()) |
void | compile_to_bitcode (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name, const Target &target=get_target_from_environment()) |
Statically compile this function to llvm bitcode, with the given filename (which should probably end in .bc), type signature, and C function name (which defaults to the same name as this halide function). | |
void | compile_to_bitcode (const std::string &filename, const std::vector< Argument > &, const Target &target=get_target_from_environment()) |
void | compile_to_llvm_assembly (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name, const Target &target=get_target_from_environment()) |
Statically compile this function to llvm assembly, with the given filename (which should probably end in .ll), type signature, and C function name (which defaults to the same name as this halide function). | |
void | compile_to_llvm_assembly (const std::string &filename, const std::vector< Argument > &, const Target &target=get_target_from_environment()) |
void | compile_to_object (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name, const Target &target=get_target_from_environment()) |
Statically compile this function to an object file, with the given filename (which should probably end in .o or .obj), type signature, and C function name (which defaults to the same name as this halide function). | |
void | compile_to_object (const std::string &filename, const std::vector< Argument > &, const Target &target=get_target_from_environment()) |
void | compile_to_header (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name="", const Target &target=get_target_from_environment()) |
Emit a header file with the given filename for this function. | |
void | compile_to_assembly (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name, const Target &target=get_target_from_environment()) |
Statically compile this function to text assembly equivalent to the object file generated by compile_to_object. | |
void | compile_to_assembly (const std::string &filename, const std::vector< Argument > &, const Target &target=get_target_from_environment()) |
void | compile_to_c (const std::string &filename, const std::vector< Argument > &, const std::string &fn_name="", const Target &target=get_target_from_environment()) |
Statically compile this function to C source code. | |
void | compile_to_lowered_stmt (const std::string &filename, const std::vector< Argument > &args, StmtOutputFormat fmt=Text, const Target &target=get_target_from_environment()) |
Write out an internal representation of lowered code. | |
void | compile_to_conceptual_stmt (const std::string &filename, const std::vector< Argument > &args, StmtOutputFormat fmt=Text, const Target &target=get_target_from_environment()) |
Write out a conceptual representation of lowered code, before any parallel loops get factored out into separate functions or GPU loops are offloaded to kernel code. Useful for analyzing and debugging scheduling. | |
void | print_loop_nest () |
Write out the loop nests specified by the schedule for this Function. | |
void | compile_to_file (const std::string &filename_prefix, const std::vector< Argument > &args, const std::string &fn_name="", const Target &target=get_target_from_environment()) |
Compile to object file and header pair, with the given arguments. | |
void | compile_to_static_library (const std::string &filename_prefix, const std::vector< Argument > &args, const std::string &fn_name="", const Target &target=get_target_from_environment()) |
Compile to static-library file and header pair, with the given arguments. | |
void | compile_to_multitarget_static_library (const std::string &filename_prefix, const std::vector< Argument > &args, const std::vector< Target > &targets) |
Compile to static-library file and header pair once for each target; each resulting function will be considered (in order) via halide_can_use_target_features() at runtime, with the first appropriate match being selected for subsequent use. | |
void | compile_to_multitarget_object_files (const std::string &filename_prefix, const std::vector< Argument > &args, const std::vector< Target > &targets, const std::vector< std::string > &suffixes) |
Like compile_to_multitarget_static_library(), except that the object files are all output as object files (rather than bundled into a static library). | |
Module | compile_to_module (const std::vector< Argument > &args, const std::string &fn_name="", const Target &target=get_target_from_environment()) |
Store an internal representation of lowered code as a self contained Module suitable for further compilation. | |
void | compile_to (const std::map< OutputFileType, std::string > &output_files, const std::vector< Argument > &args, const std::string &fn_name, const Target &target=get_target_from_environment()) |
Compile and generate multiple target files with single call. | |
void | compile_jit (const Target &target=get_jit_target_from_environment()) |
Eagerly jit compile the function to machine code. | |
JITHandlers & | jit_handlers () |
Get a struct containing the currently set custom functions used by JIT. | |
Callable | compile_to_callable (const std::vector< Argument > &args, const Target &target=get_jit_target_from_environment()) |
Eagerly jit compile the function to machine code and return a callable struct that behaves like a function pointer. | |
template<typename T > | |
void | add_custom_lowering_pass (T *pass) |
Add a custom pass to be used during lowering. | |
void | add_custom_lowering_pass (Internal::IRMutator *pass, std::function< void()> deleter) |
Add a custom pass to be used during lowering, with the function that will be called to delete it also passed in. | |
void | clear_custom_lowering_passes () |
Remove all previously-set custom lowering passes. | |
const std::vector< CustomLoweringPass > & | custom_lowering_passes () |
Get the custom lowering passes. | |
void | debug_to_file (const std::string &filename) |
When this function is compiled, include code that dumps its values to a file after it is realized, for the purpose of debugging. | |
const std::string & | name () const |
The name of this function, either given during construction, or automatically generated. | |
std::vector< Var > | args () const |
Get the pure arguments. | |
Expr | value () const |
The right-hand-side value of the pure definition of this function. | |
Tuple | values () const |
The values returned by this function. | |
bool | defined () const |
Does this function have at least a pure definition? | |
const std::vector< Expr > & | update_args (int idx=0) const |
Get the left-hand-side of the update definition. | |
Expr | update_value (int idx=0) const |
Get the right-hand-side of an update definition. | |
Tuple | update_values (int idx=0) const |
Get the right-hand-side of an update definition for functions that return multiple values. | |
std::vector< RVar > | rvars (int idx=0) const |
Get the RVars of the reduction domain for an update definition, if there is one. | |
bool | has_update_definition () const |
Does this function have at least one update definition? | |
int | num_update_definitions () const |
How many update definitions does this function have? | |
bool | is_extern () const |
Is this function an external stage? That is, was it defined using define_extern? | |
void | define_extern (const std::string &function_name, const std::vector< ExternFuncArgument > &params, Type t, int dimensionality, NameMangling mangling=NameMangling::Default, DeviceAPI device_api=DeviceAPI::Host) |
Add an extern definition for this Func. | |
void | define_extern (const std::string &function_name, const std::vector< ExternFuncArgument > &params, const std::vector< Type > &types, int dimensionality, NameMangling mangling) |
void | define_extern (const std::string &function_name, const std::vector< ExternFuncArgument > &params, const std::vector< Type > &types, int dimensionality, NameMangling mangling=NameMangling::Default, DeviceAPI device_api=DeviceAPI::Host) |
void | define_extern (const std::string &function_name, const std::vector< ExternFuncArgument > &params, Type t, const std::vector< Var > &arguments, NameMangling mangling=NameMangling::Default, DeviceAPI device_api=DeviceAPI::Host) |
void | define_extern (const std::string &function_name, const std::vector< ExternFuncArgument > &params, const std::vector< Type > &types, const std::vector< Var > &arguments, NameMangling mangling=NameMangling::Default, DeviceAPI device_api=DeviceAPI::Host) |
const Type & | type () const |
Get the type(s) of the outputs of this Func. | |
const std::vector< Type > & | types () const |
int | outputs () const |
Get the number of outputs of this Func. | |
const std::string & | extern_function_name () const |
Get the name of the extern function called for an extern definition. | |
int | dimensions () const |
The dimensionality (number of arguments) of this function. | |
FuncRef | operator() (std::vector< Var >) const |
Construct either the left-hand-side of a definition, or a call to a function that happens to only contain vars as arguments. | |
template<typename... Args> | |
HALIDE_NO_USER_CODE_INLINE std::enable_if< Internal::all_are_convertible< Var, Args... >::value, FuncRef >::type | operator() (Args &&...args) const |
FuncRef | operator() (std::vector< Expr >) const |
Either calls to the function, or the left-hand-side of an update definition (see RDom). | |
template<typename... Args> | |
HALIDE_NO_USER_CODE_INLINE std::enable_if< Internal::all_are_convertible< Expr, Args... >::value, FuncRef >::type | operator() (const Expr &x, Args &&...args) const |
Func | in (const Func &f) |
Creates and returns a new identity Func that wraps this Func. | |
Func | in (const std::vector< Func > &fs) |
Create and return an identity wrapper shared by all the Funcs in 'fs'. | |
Func | in () |
Create and return a global identity wrapper, which wraps all calls to this Func by any other Func. | |
Func | clone_in (const Func &f) |
Similar to Func::in; however, instead of replacing the call to this Func with an identity Func that refers to it, this replaces the call with a clone of this Func. | |
Func | clone_in (const std::vector< Func > &fs) |
Func | copy_to_device (DeviceAPI d=DeviceAPI::Default_GPU) |
Declare that this function should be implemented by a call to halide_buffer_copy with the given target device API. | |
Func | copy_to_host () |
Declare that this function should be implemented by a call to halide_buffer_copy with a NULL target device API. | |
Func & | split (const VarOrRVar &old, const VarOrRVar &outer, const VarOrRVar &inner, const Expr &factor, TailStrategy tail=TailStrategy::Auto) |
Split a dimension into inner and outer subdimensions with the given names, where the inner dimension iterates from 0 to factor-1. | |
Func & | fuse (const VarOrRVar &inner, const VarOrRVar &outer, const VarOrRVar &fused) |
Join two dimensions into a single fused dimension. | |
Func & | serial (const VarOrRVar &var) |
Mark a dimension to be traversed serially. | |
Func & | parallel (const VarOrRVar &var) |
Mark a dimension to be traversed in parallel. | |
Func & | parallel (const VarOrRVar &var, const Expr &task_size, TailStrategy tail=TailStrategy::Auto) |
Split a dimension by the given task_size, and then parallelize the outer dimension. | |
Func & | vectorize (const VarOrRVar &var) |
Mark a dimension to be computed all-at-once as a single vector. | |
Func & | unroll (const VarOrRVar &var) |
Mark a dimension to be completely unrolled. | |
Func & | vectorize (const VarOrRVar &var, const Expr &factor, TailStrategy tail=TailStrategy::Auto) |
Split a dimension by the given factor, then vectorize the inner dimension. | |
Func & | unroll (const VarOrRVar &var, const Expr &factor, TailStrategy tail=TailStrategy::Auto) |
Split a dimension by the given factor, then unroll the inner dimension. | |
Func & | partition (const VarOrRVar &var, Partition partition_policy) |
Set the loop partition policy. | |
Func & | never_partition (const std::vector< VarOrRVar > &vars) |
Set the loop partition policy to Never for a vector of Vars and RVars. | |
template<typename... Args> | |
HALIDE_NO_USER_CODE_INLINE std::enable_if< Internal::all_are_convertible< VarOrRVar, Args... >::value, Func & >::type | never_partition (const VarOrRVar &x, Args &&...args) |
Set the loop partition policy to Never for some number of Vars and RVars. | |
Func & | never_partition_all () |
Set the loop partition policy to Never for all Vars and RVars of the initial definition of the Func. | |
Func & | always_partition (const std::vector< VarOrRVar > &vars) |
Set the loop partition policy to Always for a vector of Vars and RVars. | |
template<typename... Args> | |
HALIDE_NO_USER_CODE_INLINE std::enable_if< Internal::all_are_convertible< VarOrRVar, Args... >::value, Func & >::type | always_partition (const VarOrRVar &x, Args &&...args) |
Set the loop partition policy to Always for some number of Vars and RVars. | |
Func & | always_partition_all () |
Set the loop partition policy to Always for all Vars and RVars of the initial definition of the Func. | |
Func & | bound (const Var &var, Expr min, Expr extent) |
Statically declare that the range over which a function should be evaluated is given by the second and third arguments. | |
Func & | set_estimate (const Var &var, const Expr &min, const Expr &extent) |
Statically declare the range over which the function will be evaluated in the general case. | |
Func & | set_estimates (const Region &estimates) |
Set (min, extent) estimates for all dimensions in the Func at once; this is equivalent to calling set_estimate(args()[n], min, extent) repeatedly, but slightly terser. | |
Func & | align_bounds (const Var &var, Expr modulus, Expr remainder=0) |
Expand the region computed so that the min coordinate is congruent to 'remainder' modulo 'modulus', and the extent is a multiple of 'modulus'. | |
Func & | align_extent (const Var &var, Expr modulus) |
Expand the region computed so that the extent is a multiple of 'modulus'. | |
Func & | bound_extent (const Var &var, Expr extent) |
Bound the extent of a Func's realization, but not its min. | |
Func & | tile (const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &xo, const VarOrRVar &yo, const VarOrRVar &xi, const VarOrRVar &yi, const Expr &xfactor, const Expr &yfactor, TailStrategy tail=TailStrategy::Auto) |
Split two dimensions at once by the given factors, and then reorder the resulting dimensions to be xi, yi, xo, yo from innermost outwards. | |
Func & | tile (const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &xi, const VarOrRVar &yi, const Expr &xfactor, const Expr &yfactor, TailStrategy tail=TailStrategy::Auto) |
A shorter form of tile, which reuses the old variable names as the new outer dimensions. | |
Func & | tile (const std::vector< VarOrRVar > &previous, const std::vector< VarOrRVar > &outers, const std::vector< VarOrRVar > &inners, const std::vector< Expr > &factors, const std::vector< TailStrategy > &tails) |
A more general form of tile, which defines tiles of any dimensionality. | |
Func & | tile (const std::vector< VarOrRVar > &previous, const std::vector< VarOrRVar > &outers, const std::vector< VarOrRVar > &inners, const std::vector< Expr > &factors, TailStrategy tail=TailStrategy::Auto) |
The generalized tile, with a single tail strategy to apply to all vars. | |
Func & | tile (const std::vector< VarOrRVar > &previous, const std::vector< VarOrRVar > &inners, const std::vector< Expr > &factors, TailStrategy tail=TailStrategy::Auto) |
Generalized tiling, reusing the previous names as the outer names. | |
Func & | reorder (const std::vector< VarOrRVar > &vars) |
Reorder variables to have the given nesting order, from innermost out. | |
template<typename... Args> | |
HALIDE_NO_USER_CODE_INLINE std::enable_if< Internal::all_are_convertible< VarOrRVar, Args... >::value, Func & >::type | reorder (const VarOrRVar &x, const VarOrRVar &y, Args &&...args) |
Func & | rename (const VarOrRVar &old_name, const VarOrRVar &new_name) |
Rename a dimension. | |
Func & | allow_race_conditions () |
Specify that race conditions are permitted for this Func, which enables parallelizing over RVars even when Halide cannot prove that it is safe to do so. | |
Func & | atomic (bool override_associativity_test=false) |
Issue atomic updates for this Func. | |
Stage | specialize (const Expr &condition) |
Specialize a Func. | |
void | specialize_fail (const std::string &message) |
Add a specialization to a Func that always terminates execution with a call to halide_error(). | |
Func & | gpu_threads (const VarOrRVar &thread_x, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Tell Halide that the following dimensions correspond to GPU thread indices. | |
Func & | gpu_threads (const VarOrRVar &thread_x, const VarOrRVar &thread_y, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Func & | gpu_threads (const VarOrRVar &thread_x, const VarOrRVar &thread_y, const VarOrRVar &thread_z, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Func & | gpu_lanes (const VarOrRVar &thread_x, DeviceAPI device_api=DeviceAPI::Default_GPU) |
The given dimension corresponds to the lanes in a GPU warp. | |
Func & | gpu_single_thread (DeviceAPI device_api=DeviceAPI::Default_GPU) |
Tell Halide to run this stage using a single gpu thread and block. | |
Func & | gpu_blocks (const VarOrRVar &block_x, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Tell Halide that the following dimensions correspond to GPU block indices. | |
Func & | gpu_blocks (const VarOrRVar &block_x, const VarOrRVar &block_y, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Func & | gpu_blocks (const VarOrRVar &block_x, const VarOrRVar &block_y, const VarOrRVar &block_z, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Func & | gpu (const VarOrRVar &block_x, const VarOrRVar &thread_x, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Tell Halide that the following dimensions correspond to GPU block indices and thread indices. | |
Func & | gpu (const VarOrRVar &block_x, const VarOrRVar &block_y, const VarOrRVar &thread_x, const VarOrRVar &thread_y, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Func & | gpu (const VarOrRVar &block_x, const VarOrRVar &block_y, const VarOrRVar &block_z, const VarOrRVar &thread_x, const VarOrRVar &thread_y, const VarOrRVar &thread_z, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Func & | gpu_tile (const VarOrRVar &x, const VarOrRVar &bx, const VarOrRVar &tx, const Expr &x_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices. | |
Func & | gpu_tile (const VarOrRVar &x, const VarOrRVar &tx, const Expr &x_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Func & | gpu_tile (const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &bx, const VarOrRVar &by, const VarOrRVar &tx, const VarOrRVar &ty, const Expr &x_size, const Expr &y_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Func & | gpu_tile (const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &tx, const VarOrRVar &ty, const Expr &x_size, const Expr &y_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Func & | gpu_tile (const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &z, const VarOrRVar &bx, const VarOrRVar &by, const VarOrRVar &bz, const VarOrRVar &tx, const VarOrRVar &ty, const VarOrRVar &tz, const Expr &x_size, const Expr &y_size, const Expr &z_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Func & | gpu_tile (const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &z, const VarOrRVar &tx, const VarOrRVar &ty, const VarOrRVar &tz, const Expr &x_size, const Expr &y_size, const Expr &z_size, TailStrategy tail=TailStrategy::Auto, DeviceAPI device_api=DeviceAPI::Default_GPU) |
Func & | hexagon (const VarOrRVar &x=Var::outermost()) |
Schedule for execution on Hexagon. | |
Func & | prefetch (const Func &f, const VarOrRVar &at, const VarOrRVar &from, Expr offset=1, PrefetchBoundStrategy strategy=PrefetchBoundStrategy::GuardWithIf) |
Prefetch data written to or read from a Func or an ImageParam by a subsequent loop iteration, at an optionally specified iteration offset. | |
Func & | prefetch (const Parameter ¶m, const VarOrRVar &at, const VarOrRVar &from, Expr offset=1, PrefetchBoundStrategy strategy=PrefetchBoundStrategy::GuardWithIf) |
template<typename T > | |
Func & | prefetch (const T &image, const VarOrRVar &at, const VarOrRVar &from, Expr offset=1, PrefetchBoundStrategy strategy=PrefetchBoundStrategy::GuardWithIf) |
Func & | reorder_storage (const std::vector< Var > &dims) |
Specify how the storage for the function is laid out. | |
Func & | reorder_storage (const Var &x, const Var &y) |
template<typename... Args> | |
HALIDE_NO_USER_CODE_INLINE std::enable_if< Internal::all_are_convertible< Var, Args... >::value, Func & >::type | reorder_storage (const Var &x, const Var &y, Args &&...args) |
Func & | align_storage (const Var &dim, const Expr &alignment) |
Pad the storage extent of a particular dimension of realizations of this function up to be a multiple of the specified alignment. | |
Func & | fold_storage (const Var &dim, const Expr &extent, bool fold_forward=true) |
Store realizations of this function in a circular buffer of a given extent. | |
Func & | compute_at (const Func &f, const Var &var) |
Compute this function as needed for each unique value of the given var for the given calling function f. | |
Func & | compute_at (const Func &f, const RVar &var) |
Schedule a function to be computed within the iteration over some dimension of an update domain. | |
Func & | compute_at (LoopLevel loop_level) |
Schedule a function to be computed within the iteration over a given LoopLevel. | |
Func & | compute_with (const Stage &s, const VarOrRVar &var, const std::vector< std::pair< VarOrRVar, LoopAlignStrategy > > &align) |
Schedule the iteration over the initial definition of this function to be fused with another stage 's' from outermost loop to a given LoopLevel. | |
Func & | compute_with (const Stage &s, const VarOrRVar &var, LoopAlignStrategy align=LoopAlignStrategy::Auto) |
Func & | compute_with (LoopLevel loop_level, const std::vector< std::pair< VarOrRVar, LoopAlignStrategy > > &align) |
Func & | compute_with (LoopLevel loop_level, LoopAlignStrategy align=LoopAlignStrategy::Auto) |
Func & | compute_root () |
Compute all of this function once ahead of time. | |
Func & | memoize (const EvictionKey &eviction_key=EvictionKey()) |
Use the halide_memoization_cache_... interface to store a computed version of this function across invocations of the Func. | |
Func & | async () |
Produce this Func asynchronously in a separate thread. | |
Func & | ring_buffer (Expr extent) |
Expands the storage of the function by an extra dimension to enable ring buffering. | |
Func & | bound_storage (const Var &dim, const Expr &bound) |
Bound the extent of a Func's storage, but not the extent of its compute. | |
Func & | store_at (const Func &f, const Var &var) |
Allocate storage for this function within f's loop over var. | |
Func & | store_at (const Func &f, const RVar &var) |
Equivalent to the version of store_at that takes a Var, but schedules storage within the loop over a dimension of a reduction domain. | |
Func & | store_at (LoopLevel loop_level) |
Equivalent to the version of store_at that takes a Var, but schedules storage at a given LoopLevel. | |
Func & | store_root () |
Equivalent to Func::store_at, but schedules storage outside the outermost loop. | |
Func & | hoist_storage (const Func &f, const Var &var) |
Hoist storage for this function within f's loop over var. | |
Func & | hoist_storage (const Func &f, const RVar &var) |
Equivalent to the version of hoist_storage that takes a Var, but schedules storage within the loop over a dimension of a reduction domain. | |
Func & | hoist_storage (LoopLevel loop_level) |
Equivalent to the version of hoist_storage that takes a Var, but schedules storage at a given LoopLevel. | |
Func & | hoist_storage_root () |
Equivalent to Func::hoist_storage, but schedules storage outside the outermost loop. | |
Func & | compute_inline () |
Aggressively inline all uses of this function. | |
Stage | update (int idx=0) |
Get a handle on an update step for the purposes of scheduling it. | |
Func & | store_in (MemoryType memory_type) |
Set the type of memory this Func should be stored in. | |
Func & | trace_loads () |
Trace all loads from this Func by emitting calls to halide_trace. | |
Func & | trace_stores () |
Trace all stores to the buffer backing this Func by emitting calls to halide_trace. | |
Func & | trace_realizations () |
Trace all realizations of this Func by emitting calls to halide_trace. | |
Func & | add_trace_tag (const std::string &trace_tag) |
Add a string of arbitrary text that will be passed through to trace inspection code if the Func is realized in trace mode. | |
Func & | no_profiling () |
Marks this function as a function that should not be profiled when using the target feature Profile or ProfileByTimer. | |
Internal::Function | function () const |
Get a handle on the internal halide function that this Func represents. | |
operator Stage () const | |
You can cast a Func to its pure stage for the purposes of scheduling it. | |
OutputImageParam | output_buffer () const |
Get a handle on the output buffer for this Func. | |
std::vector< OutputImageParam > | output_buffers () const |
operator ExternFuncArgument () const | |
Use a Func as an argument to an external stage. | |
std::vector< Argument > | infer_arguments () const |
Infer the arguments to the Func, sorted into a canonical order: all buffers (sorted alphabetically by name), followed by all non-buffers (sorted alphabetically by name). | |
const Internal::StageSchedule & | get_schedule () const |
Return the current StageSchedule associated with this initial Stage of this Func. | |
A halide function.
This class represents one stage in a Halide pipeline, and is the unit by which we schedule things. By default they are aggressively inlined, so you are encouraged to make lots of little functions, rather than storing things in Exprs.
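For example, a pipeline built from several little Funcs might look like this (a minimal sketch; all names are illustrative, and it assumes the Halide headers and library are available):

```cpp
#include "Halide.h"
using namespace Halide;

int main() {
    Var x("x"), y("y");

    // Each stage of the pipeline is its own Func.
    Func input("input"), blur_x("blur_x"), blur_y("blur_y");
    input(x, y) = x + y;
    blur_x(x, y) = (input(x - 1, y) + input(x, y) + input(x + 1, y)) / 3;
    blur_y(x, y) = (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3;

    // By default, input and blur_x are inlined into blur_y.
    // A scheduling directive changes that without touching the algorithm:
    blur_x.compute_root();

    Buffer<int> result = blur_y.realize({16, 16});
    return 0;
}
```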
explicit Halide::Func::Func | ( | const std::string & | name ) |
Declare a new undefined function with the given name.
explicit Halide::Func::Func | ( | const Type & | required_type,
int | required_dims,
const std::string & | name ) |
Declare a new undefined function with the given name.
The function will be constrained to represent Exprs of required_type. If required_dims is not AnyDims, the function will be constrained to exactly that many dimensions.
explicit Halide::Func::Func | ( | const std::vector< Type > & | required_types,
int | required_dims,
const std::string & | name ) |
Declare a new undefined function with the given name.
If required_types is not empty, the function will be constrained to represent Tuples of the same arity and types. (If required_types is empty, there is no constraint.) If required_dims is not AnyDims, the function will be constrained to exactly that many dimensions.
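A sketch of the constrained constructors described above (the types, dimensionalities, and names are illustrative):

```cpp
#include "Halide.h"
using namespace Halide;

void define() {
    // Constrained to a single Int(32) output with exactly 2 dimensions.
    Func f(Int(32), 2, "f");

    // Constrained to a 2-element Tuple of (Int(32), Float(32)), 2 dimensions.
    Func g({Int(32), Float(32)}, 2, "g");

    Var x, y;
    f(x, y) = x + y;                         // OK: matches the constraint
    g(x, y) = Tuple(x + y, cast<float>(x));  // OK: matches arity and types
    // f(x, y) = cast<float>(x);             // would be an error: wrong type
}
```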
Halide::Func::Func | ( | ) |
Declare a new undefined function with an automatically-generated unique name.
explicit Halide::Func::Func | ( | const Expr & | e ) |
Declare a new function with an automatically-generated unique name, and define it to return the given expression (which may not contain free variables).
explicit Halide::Func::Func | ( | Internal::Function | f ) |
Construct a new Func to wrap an existing, already-defined Function object.
template<typename T , int Dims>
inline explicit Halide::Func::Func | ( | Buffer< T, Dims > & | im ) |
Construct a new Func to wrap a Buffer.
Realization Halide::Func::realize | ( | std::vector< int32_t > | sizes = {}, |
const Target & | target = Target() ) |
Evaluate this function over some rectangular domain and return the resulting buffer or buffers.
Performs compilation if the Func has not previously been realized and compile_jit has not been called. If the final stage of the pipeline is on the GPU, data is copied back to the host before being returned. The returned Realization should probably be instantly converted to a Buffer class of the appropriate type. That is, do this:
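A sketch of that conversion (the Func and its definition are illustrative):

```cpp
#include "Halide.h"
using namespace Halide;

void example() {
    Func f;
    Var x;
    f(x) = x * 2.0f;

    // Convert the returned Realization to a typed Buffer immediately.
    Buffer<float> im = f.realize({10});
}
```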
If your Func has multiple values, because you defined it using a Tuple, then casting the result of a realize call to a buffer or image will produce a run-time error. Instead you should do the following:
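A sketch of the Tuple case (illustrative names and types):

```cpp
#include "Halide.h"
using namespace Halide;

void example() {
    Func multi;
    Var x;
    multi(x) = Tuple(x + 1, x * 0.5f);

    // Index into the Realization to get one Buffer per Tuple element,
    // rather than casting the whole Realization to a single Buffer.
    Realization r = multi.realize({10});
    Buffer<int> im0 = r[0];
    Buffer<float> im1 = r[1];
}
```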
In Halide formal arguments of a computation are specified using Param<T> and ImageParam objects in the expressions defining the computation. Note that this method is not thread-safe, in that Param<T> and ImageParam are globals shared by all threads; to call jitted code in a thread-safe manner, use compile_to_callable() instead.
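A thread-safe alternative via compile_to_callable(), sketched with illustrative names:

```cpp
#include "Halide.h"
using namespace Halide;

void example() {
    ImageParam in(Int(32), 1);
    Func f;
    Var x;
    f(x) = in(x) + 42;

    // Bind the formal arguments once; the resulting Callable does not
    // rely on shared global Param state, so it can be invoked from
    // multiple threads.
    Callable c = f.compile_to_callable({in});

    Buffer<int> input(10), output(10);
    c(input, output);
}
```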
Alternatively, an initializer list can be used directly in the realize call to pass this information:
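For instance, the sizes can be written as an initializer list directly in the call (a sketch; the exact form depends on which overload is used):

```cpp
#include "Halide.h"
using namespace Halide;

void example() {
    Func f;
    Var x, y;
    f(x, y) = x + y;

    // The initializer list supplies the output extents inline.
    Buffer<int> im = f.realize({800, 600});
}
```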
If the Func cannot be realized into a buffer of the given size due to scheduling constraints on scattering update definitions, it will be realized into a larger buffer of the minimum size possible, and a cropped view at the requested size will be returned. It is thus not safe to assume the returned buffers are contiguous in memory. This behavior can be disabled with the NoBoundsQuery target flag, in which case an error about writing out of bounds on the output buffer will trigger instead.
Referenced by Halide::evaluate(), Halide::evaluate(), Halide::evaluate_may_gpu(), Halide::evaluate_may_gpu(), Halide::Internal::StubOutputBufferBase::realize(), and Halide::Internal::StubOutputBufferBase::realize().
Realization Halide::Func::realize | ( | JITUserContext * | context, |
std::vector< int32_t > | sizes = {}, | ||
const Target & | target = Target() ) |
Same as above, but takes a custom user-provided context to be passed to runtime functions.
This can be used to pass state to runtime overrides in a thread-safe manner. A nullptr context is legal, and is equivalent to calling the variant of realize that does not take a context.
void Halide::Func::realize | ( | Pipeline::RealizationArg | outputs, |
const Target & | target = Target() ) |
Evaluate this function into an existing allocated buffer or buffers.
If the buffer is also one of the arguments to the function, strange things may happen, as the pipeline isn't necessarily safe to run in-place. If you pass multiple buffers, they must have matching sizes. This form of realize does not automatically copy data back from the GPU.
void Halide::Func::realize | ( | JITUserContext * | context, |
Pipeline::RealizationArg | outputs, | ||
const Target & | target = Target() ) |
Same as above, but takes a custom user-provided context to be passed to runtime functions.
This can be used to pass state to runtime overrides in a thread-safe manner. A nullptr context is legal, and is equivalent to calling the variant of realize that does not take a context.
void Halide::Func::infer_input_bounds | ( | const std::vector< int32_t > & | sizes, |
const Target & | target = get_jit_target_from_environment() ) |
For a given size of output, or a given output buffer, determine the bounds required of all unbound ImageParams referenced.
Communicates the result by allocating new buffers of the appropriate size and binding them to the unbound ImageParams.
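A sketch of how this is typically used (names are illustrative): after the call, the previously unbound ImageParam holds a freshly allocated buffer large enough to produce the requested output.

```cpp
#include "Halide.h"
using namespace Halide;

ImageParam input(Float(32), 2);
Func blur;
Var x, y;
blur(x, y) = (input(x, y) + input(x + 1, y)) / 2.0f;
// Allocate a buffer sized for a 10x10 output and bind it to 'input'.
blur.infer_input_bounds({10, 10});
Buffer<float> bound = input.get();  // now holds an appropriately sized allocation
```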
void Halide::Func::infer_input_bounds | ( | Pipeline::RealizationArg | outputs, |
const Target & | target = get_jit_target_from_environment() ) |
void Halide::Func::infer_input_bounds | ( | JITUserContext * | context, |
const std::vector< int32_t > & | sizes, | ||
const Target & | target = get_jit_target_from_environment() ) |
Versions of infer_input_bounds that take a custom user context to pass to runtime functions.
void Halide::Func::infer_input_bounds | ( | JITUserContext * | context, |
Pipeline::RealizationArg | outputs, | ||
const Target & | target = get_jit_target_from_environment() ) |
void Halide::Func::compile_to_bitcode | ( | const std::string & | filename, |
const std::vector< Argument > & | , | ||
const std::string & | fn_name, | ||
const Target & | target = get_target_from_environment() ) |
Statically compile this function to llvm bitcode, with the given filename (which should probably end in .bc), type signature, and C function name (which defaults to the same name as this halide function).
void Halide::Func::compile_to_bitcode | ( | const std::string & | filename, |
const std::vector< Argument > & | , | ||
const Target & | target = get_target_from_environment() ) |
void Halide::Func::compile_to_llvm_assembly | ( | const std::string & | filename, |
const std::vector< Argument > & | , | ||
const std::string & | fn_name, | ||
const Target & | target = get_target_from_environment() ) |
Statically compile this function to llvm assembly, with the given filename (which should probably end in .ll), type signature, and C function name (which defaults to the same name as this halide function).
void Halide::Func::compile_to_llvm_assembly | ( | const std::string & | filename, |
const std::vector< Argument > & | , | ||
const Target & | target = get_target_from_environment() ) |
void Halide::Func::compile_to_object | ( | const std::string & | filename, |
const std::vector< Argument > & | , | ||
const std::string & | fn_name, | ||
const Target & | target = get_target_from_environment() ) |
Statically compile this function to an object file, with the given filename (which should probably end in .o or .obj), type signature, and C function name (which defaults to the same name as this halide function).
You probably don't want to use this directly; call compile_to_static_library or compile_to_file instead.
void Halide::Func::compile_to_object | ( | const std::string & | filename, |
const std::vector< Argument > & | , | ||
const Target & | target = get_target_from_environment() ) |
void Halide::Func::compile_to_header | ( | const std::string & | filename, |
const std::vector< Argument > & | , | ||
const std::string & | fn_name = "", | ||
const Target & | target = get_target_from_environment() ) |
Emit a header file with the given filename for this function.
The header will define a function with the type signature given by the second argument, and a name given by the third. The name defaults to the same name as this halide function. You don't actually have to have defined this function yet to call this. You probably don't want to use this directly; call compile_to_static_library or compile_to_file instead.
void Halide::Func::compile_to_assembly | ( | const std::string & | filename, |
const std::vector< Argument > & | , | ||
const std::string & | fn_name, | ||
const Target & | target = get_target_from_environment() ) |
Statically compile this function to text assembly equivalent to the object file generated by compile_to_object.
This is useful for checking what Halide is producing without having to disassemble anything, or if you need to feed the assembly into some custom toolchain to produce an object file (e.g. iOS).
Referenced by Halide::SimdOpCheckTest::check_one().
void Halide::Func::compile_to_assembly | ( | const std::string & | filename, |
const std::vector< Argument > & | , | ||
const Target & | target = get_target_from_environment() ) |
void Halide::Func::compile_to_c | ( | const std::string & | filename, |
const std::vector< Argument > & | , | ||
const std::string & | fn_name = "", | ||
const Target & | target = get_target_from_environment() ) |
Statically compile this function to C source code.
This is useful for providing fallback code paths that will compile on many platforms. Vectorization will fail, and parallelization will produce serial code.
void Halide::Func::compile_to_lowered_stmt | ( | const std::string & | filename, |
const std::vector< Argument > & | args, | ||
StmtOutputFormat | fmt = Text, | ||
const Target & | target = get_target_from_environment() ) |
Write out an internal representation of lowered code.
Useful for analyzing and debugging scheduling. Can emit html or plain text.
void Halide::Func::compile_to_conceptual_stmt | ( | const std::string & | filename, |
const std::vector< Argument > & | args, | ||
StmtOutputFormat | fmt = Text, | ||
const Target & | target = get_target_from_environment() ) |
Write out a conceptual representation of lowered code, before any parallel loops get factored out into separate functions, or GPU loops are offloaded to kernel code. Useful for analyzing and debugging scheduling.
Can emit html or plain text.
void Halide::Func::print_loop_nest | ( | ) |
Write out the loop nests specified by the schedule for this Function.
Helpful for understanding what a schedule is doing.
void Halide::Func::compile_to_file | ( | const std::string & | filename_prefix, |
const std::vector< Argument > & | args, | ||
const std::string & | fn_name = "", | ||
const Target & | target = get_target_from_environment() ) |
Compile to object file and header pair, with the given arguments.
The name defaults to the same name as this halide function.
void Halide::Func::compile_to_static_library | ( | const std::string & | filename_prefix, |
const std::vector< Argument > & | args, | ||
const std::string & | fn_name = "", | ||
const Target & | target = get_target_from_environment() ) |
Compile to static-library file and header pair, with the given arguments.
The name defaults to the same name as this halide function.
void Halide::Func::compile_to_multitarget_static_library | ( | const std::string & | filename_prefix, |
const std::vector< Argument > & | args, | ||
const std::vector< Target > & | targets ) |
Compile to static-library file and header pair once for each target; each resulting function will be considered (in order) via halide_can_use_target_features() at runtime, with the first appropriate match being selected for subsequent use.
This is typically useful for specializations that may vary unpredictably by machine (e.g., SSE4.1/AVX/AVX2 on x86 desktop machines). All targets must have identical arch-os-bits.
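A sketch of selecting among x86 feature levels at runtime (the target strings and library name are illustrative):

```cpp
#include "Halide.h"
using namespace Halide;

Func f;
Var x;
f(x) = x * 2;
// All targets share the same arch-bits-os; features vary.
// Listed best-first: the first match at runtime wins.
std::vector<Target> targets = {
    Target("x86-64-linux-sse41-avx-avx2"),
    Target("x86-64-linux-sse41"),
    Target("x86-64-linux"),
};
f.compile_to_multitarget_static_library("mylib", {}, targets);
```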
void Halide::Func::compile_to_multitarget_object_files | ( | const std::string & | filename_prefix, |
const std::vector< Argument > & | args, | ||
const std::vector< Target > & | targets, | ||
const std::vector< std::string > & | suffixes ) |
Like compile_to_multitarget_static_library(), except that the object files are all output as object files (rather than bundled into a static library).
suffixes is an optional list of strings to use as the suffix for each object file. If nonempty, it must be the same length as targets. (If empty, Target::to_string() will be used for each suffix.)
Note that if targets.size() > 1, the wrapper code (to select the subtarget) will be generated with the filename ${filename_prefix}_wrapper.o
Note that if targets.size() > 1 and no_runtime is not specified, the runtime will be generated with the filename ${filename_prefix}_runtime.o
Module Halide::Func::compile_to_module | ( | const std::vector< Argument > & | args, |
const std::string & | fn_name = "", | ||
const Target & | target = get_target_from_environment() ) |
Store an internal representation of lowered code as a self-contained Module suitable for further compilation.
void Halide::Func::compile_to | ( | const std::map< OutputFileType, std::string > & | output_files, |
const std::vector< Argument > & | args, | ||
const std::string & | fn_name, | ||
const Target & | target = get_target_from_environment() ) |
Compile and generate multiple target files with a single call.
Deduces target files based on filenames specified in output_files map.
Referenced by Halide::SimdOpCheckTest::compile_and_check().
void Halide::Func::compile_jit | ( | const Target & | target = get_jit_target_from_environment() | ) |
Eagerly jit compile the function to machine code.
This normally happens on the first call to realize. If you're running your halide pipeline inside time-sensitive code and wish to avoid including the time taken to compile a pipeline, then you can call this ahead of time. Default is to use the Target returned from Halide::get_jit_target_from_environment().
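A sketch of the intended usage (names are illustrative):

```cpp
#include "Halide.h"
using namespace Halide;

Func f;
Var x;
f(x) = x * 2;
f.compile_jit();  // pay the compilation cost up front
// ... later, in the time-sensitive section:
Buffer<int> out = f.realize({1024});  // no compilation happens here
```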
JITHandlers & Halide::Func::jit_handlers | ( | ) |
Get a struct containing the currently set custom functions used by JIT.
This can be mutated. Changes will take effect the next time this Func is realized.
Callable Halide::Func::compile_to_callable | ( | const std::vector< Argument > & | args, |
const Target & | target = get_jit_target_from_environment() ) |
Eagerly jit compile the function to machine code and return a callable struct that behaves like a function pointer.
The calling convention will exactly match that of an AOT-compiled version of this Func with the same Argument list.
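A sketch of the pattern (the Param name is illustrative): scalar arguments are passed first, in the order given in the Argument list, followed by output buffers.

```cpp
#include "Halide.h"
using namespace Halide;

Param<int> offset;
Func f;
Var x;
f(x) = x + offset;
// The Argument list fixes the calling convention, as it would for AOT.
Callable c = f.compile_to_callable({offset});
Buffer<int> out(32);
c(5, out);  // call like a function pointer; thread-safe, no global Param state
```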
Referenced by Halide::SimdOpCheckTest::check_one().
|
inline |
Add a custom pass to be used during lowering.
It is run after all other lowering passes. Can be used to verify properties of the lowered Stmt, instrument it with extra code, or otherwise modify it. The Func takes ownership of the pass, and will call delete on it when the Func goes out of scope. So don't pass a stack object, or share pass instances between multiple Funcs.
Definition at line 1059 of file Func.h.
References add_custom_lowering_pass().
Referenced by add_custom_lowering_pass().
void Halide::Func::add_custom_lowering_pass | ( | Internal::IRMutator * | pass, |
std::function< void()> | deleter ) |
Add a custom pass to be used during lowering, with the function that will be called to delete it also passed in.
Set it to nullptr if you wish to retain ownership of the object.
void Halide::Func::clear_custom_lowering_passes | ( | ) |
Remove all previously-set custom lowering passes.
const std::vector< CustomLoweringPass > & Halide::Func::custom_lowering_passes | ( | ) |
Get the custom lowering passes.
void Halide::Func::debug_to_file | ( | const std::string & | filename | ) |
When this function is compiled, include code that dumps its values to a file after it is realized, for the purpose of debugging.
If filename ends in ".tif" or ".tiff" (case insensitive) the file is in TIFF format and can be read by standard tools. Otherwise, the file format is as follows:
All data is in the byte-order of the target platform. First, a 20-byte header containing four 32-bit ints, giving the extents of the first four dimensions. Dimensions beyond four are folded into the fourth. Then, a fifth 32-bit int giving the data type of the function. The typecodes are given by: float = 0, double = 1, uint8_t = 2, int8_t = 3, uint16_t = 4, int16_t = 5, uint32_t = 6, int32_t = 7, uint64_t = 8, int64_t = 9. The data follows the header, as a densely packed array of the given size and the given type. If given the extension .tmp, this file format can be natively read by the program ImageStack.
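A sketch of the intended usage (names are illustrative):

```cpp
#include "Halide.h"
using namespace Halide;

Func g;
Var x, y;
g(x, y) = x * y;
// Must be called before compilation; each realization of g will
// also write its values to the named file for offline inspection.
g.debug_to_file("g.tiff");
g.realize({8, 8});
```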
const std::string & Halide::Func::name | ( | ) | const |
The name of this function, either given during construction, or automatically generated.
std::vector< Var > Halide::Func::args | ( | ) | const |
Get the pure arguments.
Referenced by always_partition(), never_partition(), operator()(), operator()(), reorder(), and reorder_storage().
Expr Halide::Func::value | ( | ) | const |
The right-hand-side value of the pure definition of this function.
Causes an error if there's no pure definition, or if the function is defined to return multiple values.
Tuple Halide::Func::values | ( | ) | const |
The values returned by this function.
An error if the function has not been defined. Returns a Tuple with one element for functions defined to return a single value.
bool Halide::Func::defined | ( | ) | const |
Does this function have at least a pure definition.
const std::vector< Expr > & Halide::Func::update_args | ( | int | idx = 0 | ) | const |
Get the left-hand-side of the update definition.
An empty vector if there's no update definition. If there are multiple update definitions for this function, use the argument to select which one you want.
Expr Halide::Func::update_value | ( | int | idx = 0 | ) | const |
Get the right-hand-side of an update definition.
An error if there's no update definition. If there are multiple update definitions for this function, use the argument to select which one you want.
Tuple Halide::Func::update_values | ( | int | idx = 0 | ) | const |
Get the right-hand-side of an update definition for functions that return multiple values.
An error if there's no update definition. Returns a Tuple with one element for functions that return a single value.
std::vector< RVar > Halide::Func::rvars | ( | int | idx = 0 | ) | const |
Get the RVars of the reduction domain for an update definition, if there is one.
bool Halide::Func::has_update_definition | ( | ) | const |
Does this function have at least one update definition?
int Halide::Func::num_update_definitions | ( | ) | const |
How many update definitions does this function have?
bool Halide::Func::is_extern | ( | ) | const |
Is this function an external stage? That is, was it defined using define_extern?
|
inline |
Add an extern definition for this Func.
This lets you define a Func that represents an external pipeline stage. You can, for example, use it to wrap a call to an extern library such as fftw.
Definition at line 1154 of file Func.h.
References define_extern(), and Halide::Internal::make_argument_list().
Referenced by define_extern(), define_extern(), define_extern(), and define_extern().
|
inline |
Definition at line 1164 of file Func.h.
References define_extern(), Halide::Internal::make_argument_list(), and types().
|
inline |
Definition at line 1172 of file Func.h.
References define_extern(), Halide::Internal::make_argument_list(), and types().
|
inline |
Definition at line 1182 of file Func.h.
References define_extern().
void Halide::Func::define_extern | ( | const std::string & | function_name, |
const std::vector< ExternFuncArgument > & | params, | ||
const std::vector< Type > & | types, | ||
const std::vector< Var > & | arguments, | ||
NameMangling | mangling = NameMangling::Default, | ||
DeviceAPI | device_api = DeviceAPI::Host ) |
const Type & Halide::Func::type | ( | ) | const |
Get the type(s) of the outputs of this Func.
It is not legal to call type() unless the Func has non-Tuple elements.
If the Func isn't yet defined, and was not specified with required types, a runtime error will occur.
If the Func isn't yet defined, but was specified with required types, the requirements will be returned.
const std::vector< Type > & Halide::Func::types | ( | ) | const |
Referenced by define_extern(), and define_extern().
int Halide::Func::outputs | ( | ) | const |
Get the number of outputs of this Func.
const std::string & Halide::Func::extern_function_name | ( | ) | const |
Get the name of the extern function called for an extern definition.
int Halide::Func::dimensions | ( | ) | const |
The dimensionality (number of arguments) of this function.
If the Func isn't yet defined, but was specified with required dimensionality, the dimensionality specified in the requirements will be returned.
Construct either the left-hand-side of a definition, or a call to a function that happens to contain only vars as arguments.
If the function has already been defined, and fewer arguments are given than the function has dimensions, then enough implicit vars are added to the end of the argument list to make up the difference (see Var::implicit)
Referenced by operator()().
|
inline |
Definition at line 1239 of file Func.h.
References args(), and operator()().
Either calls to the function, or the left-hand-side of an update definition (see RDom).
If the function has already been defined, and fewer arguments are given than the function has dimensions, then enough implicit vars are added to the end of the argument list to make up the difference. (see Var::implicit)
|
inline |
Creates and returns a new identity Func that wraps this Func.
During compilation, Halide replaces all calls to this Func done by 'f' with calls to the wrapper. If this Func is already wrapped for use in 'f', will return the existing wrapper.
For example, g.in(f) would rewrite a pipeline like this:
into a pipeline like this:
This has a variety of uses. You can use it to schedule this Func differently in the different places it is used:
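The original snippet was lost in extraction; a sketch of scheduling a Func differently per consumer via wrappers (names are illustrative):

```cpp
#include "Halide.h"
using namespace Halide;

Func g, f1, f2;
Var x;
g(x) = x * x;
f1(x) = g(x) + 1;
f2(x) = g(x) + 2;
// Each call site gets its own identity wrapper, scheduled independently.
g.in(f1).compute_at(f1, x);
g.in(f2).compute_root();
```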
You can also use it to stage loads from this Func via some intermediate buffer (perhaps on the stack as in test/performance/block_transpose.cpp, or in shared GPU memory as in test/performance/wrap.cpp). In this case we compute the wrapper at tiles of the consuming Funcs like so:
Func::in() can also be used to compute pieces of a Func into a smaller scratch buffer (perhaps on the GPU) and then copy them into a larger output buffer one tile at a time. See apps/interpolate/interpolate.cpp for an example of this. In this case we compute the Func at tiles of its own wrapper:
A similar use of Func::in() is to wrap Funcs that have multiple update stages in a pure wrapper. The following code:
Is equivalent to:
Using Func::in(), we can write:
which instead produces:
Referenced by do_cost_model_schedule(), Halide::Internal::GeneratorInput_Buffer< T2 >::in(), Halide::Internal::GeneratorInput_Buffer< T2 >::in(), Halide::Internal::GeneratorInput_Buffer< T2 >::in(), Halide::Internal::GeneratorInput_Func< T >::in(), Halide::Internal::GeneratorInput_Func< T >::in(), and Halide::Internal::GeneratorInput_Func< T >::in().
Create and return an identity wrapper shared by all the Funcs in 'fs'.
If any of the Funcs in 'fs' already have a custom wrapper, this will throw an error.
Func Halide::Func::in | ( | ) |
Create and return a global identity wrapper, which wraps all calls to this Func by any other Func.
If a global wrapper already exists, returns it. The global identity wrapper is only used by callers for which no custom wrapper has been specified.
Func Halide::Func::clone_in | ( | const Func & | f | ) |
Similar to Func::in; however, instead of replacing the call to this Func with an identity Func that refers to it, this replaces the call with a clone of this Func.
For example, f.clone_in(g) would rewrite a pipeline like this:
into a pipeline like this:
Referenced by Halide::SimdOpCheckTest::check_one().
Func Halide::Func::copy_to_device | ( | DeviceAPI | d = DeviceAPI::Default_GPU | ) |
Declare that this function should be implemented by a call to halide_buffer_copy with the given target device API.
Asserts that the Func has a pure definition which is a simple call to a single input, and no update definitions. The wrapper Funcs returned by in() are suitable candidates. Consumes all pure variables, and rewrites the Func to have an extern definition that calls halide_buffer_copy.
Func Halide::Func::copy_to_host | ( | ) |
Declare that this function should be implemented by a call to halide_buffer_copy with a NULL target device API.
Equivalent to copy_to_device(DeviceAPI::Host). Asserts that the Func has a pure definition which is a simple call to a single input, and no update definitions. The wrapper Funcs returned by in() are suitable candidates. Consumes all pure variables, and rewrites the Func to have an extern definition that calls halide_buffer_copy.
Note that if the source Func is already valid in host memory, this compiles to code that does the minimum number of calls to memcpy.
Func & Halide::Func::split | ( | const VarOrRVar & | old, |
const VarOrRVar & | outer, | ||
const VarOrRVar & | inner, | ||
const Expr & | factor, | ||
TailStrategy | tail = TailStrategy::Auto ) |
Split a dimension into inner and outer subdimensions with the given names, where the inner dimension iterates from 0 to factor-1.
The inner and outer subdimensions can then be dealt with using the other scheduling calls. It's ok to reuse the old variable name as either the inner or outer variable. The final argument specifies how the tail should be handled if the split factor does not provably divide the extent.
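A sketch of a split (variable names are illustrative):

```cpp
#include "Halide.h"
using namespace Halide;

Func f;
Var x, xo, xi;
f(x) = x;
// x becomes an outer loop xo with an inner loop xi in [0, 8);
// GuardWithIf handles extents not divisible by 8.
f.split(x, xo, xi, 8, TailStrategy::GuardWithIf);
```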
Referenced by do_cost_model_schedule().
Func & Halide::Func::fuse | ( | const VarOrRVar & | inner, |
const VarOrRVar & | outer, | ||
const VarOrRVar & | fused ) |
Join two dimensions into a single fused dimension.
The fused dimension covers the product of the extents of the inner and outer dimensions given. The loop type (e.g. parallel, vectorized) of the resulting fused dimension is inherited from the first argument.
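A sketch of fusing two loops and parallelizing the result (variable names are illustrative):

```cpp
#include "Halide.h"
using namespace Halide;

Func f;
Var x, y, xy;
f(x, y) = x + y;
// One loop over xy, of extent width*height, replaces the x and y loops.
f.fuse(x, y, xy).parallel(xy);
```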
Referenced by do_cost_model_schedule().
Mark a dimension to be traversed serially.
This is the default.
Referenced by do_cost_model_schedule().
Mark a dimension to be traversed in parallel.
Referenced by do_cost_model_schedule().
Func & Halide::Func::parallel | ( | const VarOrRVar & | var, |
const Expr & | task_size, | ||
TailStrategy | tail = TailStrategy::Auto ) |
Split a dimension by the given task_size, and then parallelize the outer dimension.
This creates parallel tasks that have size task_size. After this call, var refers to the outer dimension of the split. The inner dimension has a new anonymous name. If you wish to mutate it, or schedule with respect to it, do the split manually.
Mark a dimension to be computed all-at-once as a single vector.
The dimension should have constant extent - e.g. because it is the inner dimension following a split by a constant factor. For most uses of vectorize you want the two argument form. The variable to be vectorized should be the innermost one.
Referenced by Halide::SimdOpCheckTest::check_one(), and do_cost_model_schedule().
Mark a dimension to be completely unrolled.
The dimension should have constant extent - e.g. because it is the inner dimension following a split by a constant factor. For most uses of unroll you want the two-argument form.
Referenced by do_cost_model_schedule().
Func & Halide::Func::vectorize | ( | const VarOrRVar & | var, |
const Expr & | factor, | ||
TailStrategy | tail = TailStrategy::Auto ) |
Split a dimension by the given factor, then vectorize the inner dimension.
This is how you vectorize a loop of unknown size. The variable to be vectorized should be the innermost one. After this call, var refers to the outer dimension of the split. 'factor' must be an integer.
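A sketch of the two-argument form (names are illustrative):

```cpp
#include "Halide.h"
using namespace Halide;

Func f;
Var x;
f(x) = x * 3;
// Split x by 8 and vectorize the new inner loop of extent 8;
// after this call, x names the outer loop of the split.
f.vectorize(x, 8);
```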
Func & Halide::Func::unroll | ( | const VarOrRVar & | var, |
const Expr & | factor, | ||
TailStrategy | tail = TailStrategy::Auto ) |
Split a dimension by the given factor, then unroll the inner dimension.
This is how you unroll a loop of unknown size by some constant factor. After this call, var refers to the outer dimension of the split. 'factor' must be an integer.
Set the loop partition policy.
Loop partitioning can be useful to optimize boundary conditions (such as clamp_edge). Loop partitioning splits a for loop into three for loops: a prologue, a steady-state, and an epilogue. The default policy is Auto.
Set the loop partition policy to Never for a vector of Vars and RVars.
Referenced by never_partition().
|
inline |
Set the loop partition policy to Never for some number of Vars and RVars.
Definition at line 1481 of file Func.h.
References args(), and never_partition().
Func & Halide::Func::never_partition_all | ( | ) |
Set the loop partition policy to Always for a vector of Vars and RVars.
Referenced by always_partition().
|
inline |
Set the loop partition policy to Always for some number of Vars and RVars.
Definition at line 1498 of file Func.h.
References always_partition(), and args().
Func & Halide::Func::always_partition_all | ( | ) |
Statically declare that the range over which a function should be evaluated is given by the second and third arguments.
This can let Halide perform some optimizations. E.g. if you know there are going to be 4 color channels, you can completely vectorize the color channel dimension without the overhead of splitting it up. If bounds inference decides that it requires more of this function than the bounds you have stated, a runtime error will occur when you try to run your pipeline.
Referenced by Halide::SimdOpCheckTest::check_one().
Statically declare the range over which the function will be evaluated in the general case.
This provides a basis for the auto scheduler to make trade-offs and scheduling decisions. The auto generated schedules might break when the sizes of the dimensions are very different from the estimates specified. These estimates are used only by the auto scheduler if the function is a pipeline output.
Expand the region computed so that the min coordinate is congruent to 'remainder' modulo 'modulus', and the extent is a multiple of 'modulus'.
For example, f.align_bounds(x, 2) forces the min and extent realized to be even, and calling f.align_bounds(x, 2, 1) forces the min to be odd and the extent to be even. The region computed always contains the region that would have been computed without this directive, so no assertions are injected.
Expand the region computed so that the extent is a multiple of 'modulus'.
For example, f.align_extent(x, 2) forces the extent realized to be even. The region computed always contains the region that would have been computed without this directive, so no assertions are injected. (This is essentially equivalent to align_bounds(), but always leaving the min untouched.)
Bound the extent of a Func's realization, but not its min.
This means the dimension can be unrolled or vectorized even when its min is not fixed (for example because it is compute_at tiles of another Func). This can also be useful for forcing a function's allocation to be a fixed size, which often means it can go on the stack.
Func & Halide::Func::tile | ( | const VarOrRVar & | x, |
const VarOrRVar & | y, | ||
const VarOrRVar & | xo, | ||
const VarOrRVar & | yo, | ||
const VarOrRVar & | xi, | ||
const VarOrRVar & | yi, | ||
const Expr & | xfactor, | ||
const Expr & | yfactor, | ||
TailStrategy | tail = TailStrategy::Auto ) |
Split two dimensions at once by the given factors, and then reorder the resulting dimensions to be xi, yi, xo, yo from innermost outwards.
This gives a tiled traversal.
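A sketch of a 2D tiling (variable names are illustrative):

```cpp
#include "Halide.h"
using namespace Halide;

Func f;
Var x, y, xo, yo, xi, yi;
f(x, y) = x + y;
// Traverse in 64x64 tiles: the loop order becomes yo, xo, yi, xi
// from outermost in.
f.tile(x, y, xo, yo, xi, yi, 64, 64);
```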
Func & Halide::Func::tile | ( | const VarOrRVar & | x, |
const VarOrRVar & | y, | ||
const VarOrRVar & | xi, | ||
const VarOrRVar & | yi, | ||
const Expr & | xfactor, | ||
const Expr & | yfactor, | ||
TailStrategy | tail = TailStrategy::Auto ) |
A shorter form of tile, which reuses the old variable names as the new outer dimensions.
Func & Halide::Func::tile | ( | const std::vector< VarOrRVar > & | previous, |
const std::vector< VarOrRVar > & | outers, | ||
const std::vector< VarOrRVar > & | inners, | ||
const std::vector< Expr > & | factors, | ||
const std::vector< TailStrategy > & | tails ) |
A more general form of tile, which defines tiles of any dimensionality.
Func & Halide::Func::tile | ( | const std::vector< VarOrRVar > & | previous, |
const std::vector< VarOrRVar > & | outers, | ||
const std::vector< VarOrRVar > & | inners, | ||
const std::vector< Expr > & | factors, | ||
TailStrategy | tail = TailStrategy::Auto ) |
The generalized tile, with a single tail strategy to apply to all vars.
Func & Halide::Func::tile | ( | const std::vector< VarOrRVar > & | previous, |
const std::vector< VarOrRVar > & | inners, | ||
const std::vector< Expr > & | factors, | ||
TailStrategy | tail = TailStrategy::Auto ) |
Generalized tiling, reusing the previous names as the outer names.
Reorder variables to have the given nesting order, from innermost out.
Referenced by do_cost_model_schedule(), and reorder().
|
inline |
Rename a dimension.
Equivalent to split with an inner size of one.
Func & Halide::Func::allow_race_conditions | ( | ) |
Specify that race conditions are permitted for this Func, which enables parallelizing over RVars even when Halide cannot prove that it is safe to do so.
Use this with great caution, and only if you can prove to yourself that this is safe, as it may result in a non-deterministic routine that returns different values at different times or on different machines.
Func & Halide::Func::atomic | ( | bool | override_associativity_test = false | ) |
Issue atomic updates for this Func.
This allows parallelization on associative RVars. The function throws a compile error when Halide fails to prove associativity. Use override_associativity_test to disable the associativity test if you believe the function is associative or the order of reduction variable execution does not matter. Halide compiles this into hardware atomic operations whenever possible, and falls back to a mutex lock per storage element if it is impossible to atomically update. There are three possible outcomes of the compiled code: atomic add, compare-and-swap loop, and mutex lock. For example:
hist(x) = 0;
hist(im(r)) += 1;
hist.compute_root();
hist.update().atomic().parallel(r);
will be compiled to atomic add operations.
hist(x) = 0;
hist(im(r)) = min(hist(im(r)) + 1, 100);
hist.compute_root();
hist.update().atomic().parallel(r);
will be compiled to compare-and-swap loops.
arg_max() = {0, im(0)};
Expr old_index = arg_max()[0];
Expr old_max = arg_max()[1];
Expr new_index = select(old_max < im(r), r, old_index);
Expr new_max = max(im(r), old_max);
arg_max() = {new_index, new_max};
arg_max.compute_root();
arg_max.update().atomic().parallel(r);
will be compiled to updates guarded by a mutex lock, since it is impossible to atomically update two different locations.
Currently the atomic operation is supported by the x86, CUDA, and OpenCL backends. Compiling to other backends results in a compile error. If an operation is compiled into a mutex lock, and is vectorized or is compiled to CUDA or OpenCL, it also results in a compile error, since a per-element mutex lock on a vectorized operation leads to a deadlock. Vectorization of predicated RVars (through rdom.where()) on the CPU is also not yet supported (see https://github.com/halide/Halide/issues/4298). 8-bit and 16-bit atomics on GPU are also not supported.
Specialize a Func.
This creates a special-case version of the Func where the given condition is true. The most effective conditions are those of the form param == value, and boolean Params. Consider a simple example:
This is equivalent to:
Adding the scheduling directive:
makes it equivalent to:
Note that the inner loops have been simplified. In the first path Halide knows that cond is true, and in the second path Halide knows that it is false.
The specialized version gets its own schedule, which inherits every directive made about the parent Func's schedule so far except for its specializations. This method returns a handle to the new schedule. If you wish to retrieve the specialized sub-schedule again later, you can call this method with the same condition. Consider the following example of scheduling the specialized version:
Assuming for simplicity that width is even, this is equivalent to:
For this case, it may be better to schedule the un-specialized case instead:
This is equivalent to:
This can be a good way to write a pipeline that splits, vectorizes, or tiles, but can still handle small inputs.
If a Func has several specializations, the first matching one will be used, so the order in which you define specializations is significant. For example:
is equivalent to:
Specializations may in turn be specialized, which creates a nested if statement in the generated code.
This is equivalent to:
To create a 4-way if statement that simplifies away all of the ternary operators above, you could say:
or
Any prior Func which is compute_at some variable of this Func gets separately included in all paths of the generated if statement. The Var in the compute_at call must exist in all paths, but it may have been generated via a different path of splits, fuses, and renames. This can be used somewhat creatively. Consider the following code:
When cond is true, this is equivalent to g.compute_at(f,y). When it is false, this is equivalent to g.compute_at(f,x).
void Halide::Func::specialize_fail(const std::string &message)
Add a specialization to a Func that always terminates execution with a call to halide_error().
By itself, this is of limited use, but can be useful to terminate chains of specialize() calls where no "default" case is expected (thus avoiding unnecessary code generation).
For instance, say we want to optimize a pipeline to process images in planar and interleaved format; we might typically do something like:
This code will vectorize along rows for the planar case, and across pixel components for the interleaved case, but there is an implicit "else" for the unhandled cases, which generates unoptimized code. If we never anticipate passing any other sort of image to this pipeline, we could streamline our code by adding specialize_fail():
Conceptually, this produces code like:
Note that calling specialize_fail() terminates the specialization chain for a given Func; you cannot create new specializations for the Func afterwards (though you can retrieve handles to previous specializations).
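A plain-C++ sketch of what a specialize()/specialize_fail() chain lowers to conceptually (the planar/interleaved dispatch is a hypothetical example, and the strings merely stand in for the two generated code paths; this is not the Halide API itself):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Each specialize() call becomes one branch of an if/else chain, checked in
// definition order. specialize_fail() becomes the final "else", which traps
// instead of generating an unoptimized default path.
std::string process(bool is_planar, bool is_interleaved) {
    if (is_planar) {
        return "vectorized along x";   // img.specialize(is_planar) path
    } else if (is_interleaved) {
        return "vectorized along c";   // img.specialize(is_interleaved) path
    } else {
        // img.specialize_fail("..."): execution terminates via halide_error();
        // a C++ exception stands in for that here.
        throw std::runtime_error("unhandled image format");
    }
}
```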
Func &Halide::Func::gpu_threads(const VarOrRVar &thread_x, DeviceAPI device_api = DeviceAPI::Default_GPU)
Tell Halide that the following dimensions correspond to GPU thread indices.
This is useful if you compute a producer function within the block indices of a consumer function, and want to control how that function's dimensions map to GPU threads. If the selected target is not an appropriate GPU, this just marks those dimensions as parallel.
Func &Halide::Func::gpu_threads(const VarOrRVar &thread_x, const VarOrRVar &thread_y, DeviceAPI device_api = DeviceAPI::Default_GPU)
Func &Halide::Func::gpu_threads(const VarOrRVar &thread_x, const VarOrRVar &thread_y, const VarOrRVar &thread_z, DeviceAPI device_api = DeviceAPI::Default_GPU)
Func &Halide::Func::gpu_lanes(const VarOrRVar &thread_x, DeviceAPI device_api = DeviceAPI::Default_GPU)
The given dimension corresponds to the lanes in a GPU warp.
GPU warp lanes are distinguished from GPU threads by the fact that all warp lanes run together in lockstep, which permits lightweight communication of data from one lane to another.
Func &Halide::Func::gpu_single_thread(DeviceAPI device_api = DeviceAPI::Default_GPU)
Tell Halide to run this stage using a single gpu thread and block.
This is not an efficient use of your GPU, but it can be useful to avoid copy-back for intermediate update stages that touch a very small part of your Func.
Referenced by Halide::Internal::schedule_scalar().
Func &Halide::Func::gpu_blocks(const VarOrRVar &block_x, DeviceAPI device_api = DeviceAPI::Default_GPU)
Tell Halide that the following dimensions correspond to GPU block indices.
This is useful for scheduling stages that will run serially within each GPU block. If the selected target is not ptx, this just marks those dimensions as parallel.
Func &Halide::Func::gpu_blocks(const VarOrRVar &block_x, const VarOrRVar &block_y, DeviceAPI device_api = DeviceAPI::Default_GPU)
Func &Halide::Func::gpu_blocks(const VarOrRVar &block_x, const VarOrRVar &block_y, const VarOrRVar &block_z, DeviceAPI device_api = DeviceAPI::Default_GPU)
Func &Halide::Func::gpu(const VarOrRVar &block_x, const VarOrRVar &thread_x, DeviceAPI device_api = DeviceAPI::Default_GPU)
Tell Halide that the following dimensions correspond to GPU block indices and thread indices.
If the selected target is not ptx, these just mark the given dimensions as parallel. The dimensions are consumed by this call, so do all other unrolling, reordering, etc first.
Func &Halide::Func::gpu(const VarOrRVar &block_x, const VarOrRVar &block_y, const VarOrRVar &thread_x, const VarOrRVar &thread_y, DeviceAPI device_api = DeviceAPI::Default_GPU)
Func &Halide::Func::gpu(const VarOrRVar &block_x, const VarOrRVar &block_y, const VarOrRVar &block_z, const VarOrRVar &thread_x, const VarOrRVar &thread_y, const VarOrRVar &thread_z, DeviceAPI device_api = DeviceAPI::Default_GPU)
Func &Halide::Func::gpu_tile(const VarOrRVar &x, const VarOrRVar &bx, const VarOrRVar &tx, const Expr &x_size, TailStrategy tail = TailStrategy::Auto, DeviceAPI device_api = DeviceAPI::Default_GPU)
Short-hand for tiling a domain and mapping the tile indices to GPU block indices and the coordinates within each tile to GPU thread indices.
Consumes the variables given, so do all other scheduling first.
Func &Halide::Func::gpu_tile(const VarOrRVar &x, const VarOrRVar &tx, const Expr &x_size, TailStrategy tail = TailStrategy::Auto, DeviceAPI device_api = DeviceAPI::Default_GPU)
Func &Halide::Func::gpu_tile(const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &bx, const VarOrRVar &by, const VarOrRVar &tx, const VarOrRVar &ty, const Expr &x_size, const Expr &y_size, TailStrategy tail = TailStrategy::Auto, DeviceAPI device_api = DeviceAPI::Default_GPU)
Func &Halide::Func::gpu_tile(const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &tx, const VarOrRVar &ty, const Expr &x_size, const Expr &y_size, TailStrategy tail = TailStrategy::Auto, DeviceAPI device_api = DeviceAPI::Default_GPU)
Func &Halide::Func::gpu_tile(const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &z, const VarOrRVar &bx, const VarOrRVar &by, const VarOrRVar &bz, const VarOrRVar &tx, const VarOrRVar &ty, const VarOrRVar &tz, const Expr &x_size, const Expr &y_size, const Expr &z_size, TailStrategy tail = TailStrategy::Auto, DeviceAPI device_api = DeviceAPI::Default_GPU)
Func &Halide::Func::gpu_tile(const VarOrRVar &x, const VarOrRVar &y, const VarOrRVar &z, const VarOrRVar &tx, const VarOrRVar &ty, const VarOrRVar &tz, const Expr &x_size, const Expr &y_size, const Expr &z_size, TailStrategy tail = TailStrategy::Auto, DeviceAPI device_api = DeviceAPI::Default_GPU)
Func &Halide::Func::hexagon(const VarOrRVar &x = Var::outermost())
Schedule for execution on Hexagon.
When a loop is marked with Hexagon, that loop is executed on a Hexagon DSP.
Referenced by Halide::Internal::schedule_scalar().
Func &Halide::Func::prefetch(const Func &f, const VarOrRVar &at, const VarOrRVar &from, Expr offset = 1, PrefetchBoundStrategy strategy = PrefetchBoundStrategy::GuardWithIf)
Prefetch data written to or read from a Func or an ImageParam by a subsequent loop iteration, at an optionally specified iteration offset.
You may specify different vars for the location of the prefetch() instruction vs. the location that is being prefetched:
If 'at' and 'from' are distinct vars, then 'from' must be at a nesting level outside 'at.' Note that the value for 'offset' applies only to 'from', not 'at'.
The final argument specifies how prefetch of region outside bounds should be handled.
For example, consider this pipeline:
The following schedule:
will inject prefetch call at the innermost loop of 'g' and 'h' and generate the following loop nest:
Note that the 'from' nesting level need not be adjacent to 'at':
The following schedule:
will produce code that prefetches a tile of data:
Note that calling prefetch() with the same var for both 'at' and 'from' is equivalent to calling prefetch() with that var.
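The prefetching loop nests referenced above are not shown here, but the shape of the generated code can be sketched in plain C++ (a hypothetical row-sum pipeline with prefetch at the y loop and an offset of 2; __builtin_prefetch, a GCC/Clang intrinsic, stands in for Halide's prefetch operation):

```cpp
#include <cassert>
#include <vector>

// Sketch of the loop nest that something like f.prefetch(in, y, y, 2) would
// conceptually produce: before processing row y, issue a prefetch for the
// row that iteration y + 2 will read.
int sum_with_prefetch(const std::vector<int> &in, int height, int width) {
    int total = 0;
    for (int y = 0; y < height; y++) {
        // GuardWithIf-style bounds handling: only prefetch rows that exist.
        if (y + 2 < height) {
            __builtin_prefetch(&in[(y + 2) * width]);
        }
        for (int x = 0; x < width; x++) {
            total += in[y * width + x];
        }
    }
    return total;
}
```

The prefetch is a pure hint; the computed result is identical with or without it.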
Referenced by prefetch().
Func &Halide::Func::prefetch(const Parameter &param, const VarOrRVar &at, const VarOrRVar &from, Expr offset = 1, PrefetchBoundStrategy strategy = PrefetchBoundStrategy::GuardWithIf)
Definition at line 2052 of file Func.h.
References prefetch().
Specify how the storage for the function is laid out.
These calls let you specify the nesting order of the dimensions. For example, foo.reorder_storage(y, x) tells Halide to use column-major storage for any realizations of foo, without changing how you refer to foo in the code. You may want to do this if you intend to vectorize across y. When representing color images, foo.reorder_storage(c, x, y) specifies packed storage (red, green, and blue values adjacent in memory), and foo.reorder_storage(x, y, c) specifies planar storage (entire red, green, and blue images one after the other in memory).
If you leave out some dimensions, those remain in the same positions in the nesting order while the specified variables are reordered around them.
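Reordering storage only changes the index arithmetic used for realizations, not how you refer to the Func. A plain-C++ sketch of the two layouts described above (hypothetical index helpers, not Halide-generated code):

```cpp
#include <cassert>

// Default storage for foo(x, y): x is innermost, so x has stride 1 (row-major).
int idx_default(int x, int y, int width) {
    return y * width + x;
}

// After foo.reorder_storage(y, x): y is innermost, so y has stride 1
// (column-major). Useful if you intend to vectorize across y.
int idx_column_major(int x, int y, int height) {
    return x * height + y;
}
```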
Pad the storage extent of a particular dimension of realizations of this function up to be a multiple of the specified alignment.
This guarantees that the strides for the dimensions stored outside of dim will be multiples of the specified alignment, where the strides and alignment are measured in numbers of elements.
For example, to guarantee that a function foo(x, y, c) representing an image has scanlines starting on offsets aligned to multiples of 16, use foo.align_storage(x, 16).
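The padded stride computation can be sketched as follows (a hypothetical helper illustrating the rounding that align_storage guarantees, with strides measured in elements):

```cpp
#include <cassert>

// foo.align_storage(x, 16) pads the storage extent of x up to a multiple of
// 16, so the stride of every dimension stored outside x becomes a multiple
// of 16. The rounding itself:
int aligned_stride(int extent, int alignment) {
    return ((extent + alignment - 1) / alignment) * alignment;
}
```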
Func &Halide::Func::fold_storage(const Var &dim, const Expr &extent, bool fold_forward = true)
Store realizations of this function in a circular buffer of a given extent.
This is more efficient when the extent of the circular buffer is a power of 2. If the fold factor is too small, or the dimension is not accessed monotonically, the pipeline will generate an error at runtime.
The fold_forward option indicates that the new values of the producer are accessed by the consumer in a monotonically increasing order. Folding storage of producers is also supported if the new values are accessed in a monotonically decreasing order by setting fold_forward to false.
For example, consider the pipeline:
If we schedule f like so:
Then g will be computed at each row of f and stored in a buffer with an extent in y of 2, alternately storing each computed row of g in row y=0 or y=1.
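The folded-storage loop nest described above can be sketched in plain C++ (a hypothetical pipeline with g(x, y) = x + y and f(x, y) = g(x, y) + g(x, y + 1), with g's storage folded to an extent of 2 in y; illustrative C++, not Halide-generated code):

```cpp
#include <cassert>
#include <vector>

// g's buffer holds only 2 rows; row y lives at index y % 2. Since 2 is a
// power of two, the modulo compiles to a cheap bit-mask.
std::vector<int> realize_f(int width, int height) {
    std::vector<int> g(width * 2);        // folded: 2 rows, not height + 1
    std::vector<int> f(width * height);
    // Warm-up: compute row 0 of g before the loop.
    for (int x = 0; x < width; x++) g[0 * width + x] = x + 0;
    for (int y = 0; y < height; y++) {
        // Produce only the one new row of g needed this iteration (y + 1),
        // overwriting the row that is no longer needed.
        for (int x = 0; x < width; x++)
            g[((y + 1) % 2) * width + x] = x + (y + 1);
        for (int x = 0; x < width; x++)
            f[y * width + x] = g[(y % 2) * width + x]
                             + g[((y + 1) % 2) * width + x];
    }
    return f;
}
```

Note that the consumer accesses rows of g in monotonically increasing order, which is what fold_forward = true asserts.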
Compute this function as needed for each unique value of the given var for the given calling function f.
For example, consider the simple pipeline:
If we schedule f like so:
Then the C code equivalent to this pipeline will look like this
The allocation and computation of g is within f's loop over x, and enough of g is computed to satisfy all that f will need for that iteration. This has excellent locality - values of g are used as soon as they are computed, but it does redundant work. Each value of g ends up getting computed four times. If we instead schedule f like so:
The equivalent C code is:
The allocation and computation of g is within f's loop over y, and enough of g is computed to satisfy all that f will need for that iteration. This does less redundant work (each point in g ends up being evaluated twice), but the locality is not quite as good, and we have to allocate more temporary memory to store g.
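The two compute_at loop nests described above can be sketched in plain C++ (a hypothetical pipeline with g(x, y) = x * y and f(x, y) = g(x, y) + g(x + 1, y) + g(x, y + 1) + g(x + 1, y + 1); illustrative C++, not Halide-generated code):

```cpp
#include <cassert>
#include <vector>

// g.compute_at(f, x): a 2x2 chunk of g is allocated and filled inside f's
// x loop. Excellent locality, but each interior value of g is computed
// four times.
std::vector<int> f_g_at_x(int width, int height) {
    std::vector<int> f(width * height);
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int g[2][2];
            for (int gy = 0; gy < 2; gy++)
                for (int gx = 0; gx < 2; gx++)
                    g[gy][gx] = (x + gx) * (y + gy);
            f[y * width + x] = g[0][0] + g[0][1] + g[1][0] + g[1][1];
        }
    }
    return f;
}

// g.compute_at(f, y): two full rows of g are computed per iteration of f's
// y loop. Less redundant work (each g value computed twice), but more
// temporary memory and slightly worse locality.
std::vector<int> f_g_at_y(int width, int height) {
    std::vector<int> f(width * height);
    std::vector<int> g(2 * (width + 1));
    for (int y = 0; y < height; y++) {
        for (int gy = 0; gy < 2; gy++)
            for (int gx = 0; gx < width + 1; gx++)
                g[gy * (width + 1) + gx] = gx * (y + gy);
        for (int x = 0; x < width; x++)
            f[y * width + x] = g[x] + g[x + 1]
                             + g[(width + 1) + x] + g[(width + 1) + x + 1];
    }
    return f;
}
```

Both schedules produce identical results; they differ only in redundant work, memory footprint, and locality.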
Referenced by Halide::SimdOpCheckTest::check_one(), and do_cost_model_schedule().
Schedule a function to be computed within the iteration over some dimension of an update domain.
Produces equivalent code to the version of compute_at that takes a Var.
Schedule a function to be computed within the iteration over a given LoopLevel.
Func &Halide::Func::compute_with(const Stage &s, const VarOrRVar &var, const std::vector<std::pair<VarOrRVar, LoopAlignStrategy>> &align)
Schedule the iteration over the initial definition of this function to be fused with another stage 's' from outermost loop to a given LoopLevel.
Func &Halide::Func::compute_with(const Stage &s, const VarOrRVar &var, LoopAlignStrategy align = LoopAlignStrategy::Auto)
Func &Halide::Func::compute_with(LoopLevel loop_level, const std::vector<std::pair<VarOrRVar, LoopAlignStrategy>> &align)
Func &Halide::Func::compute_with(LoopLevel loop_level, LoopAlignStrategy align = LoopAlignStrategy::Auto)
Func &Halide::Func::compute_root()
Compute all of this function once ahead of time.
Reusing the example in Func::compute_at :
is equivalent to
g is computed once ahead of time, and enough is computed to satisfy all uses of it. This does no redundant work (each point in g is evaluated once), but has poor locality (values of g are probably not still in cache when they are used by f), and allocates lots of temporary memory to store g.
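The compute_root loop nest can be sketched in plain C++ (same hypothetical pipeline as in the compute_at discussion: g(x, y) = x * y, and f sums a 2x2 window of g; illustrative C++, not Halide-generated code):

```cpp
#include <cassert>
#include <vector>

// g.compute_root(): g is realized in full before f runs. Each point of g is
// computed exactly once (no redundant work), at the cost of a large
// temporary buffer and poor locality between producing and consuming g.
std::vector<int> f_g_root(int width, int height) {
    std::vector<int> g((width + 1) * (height + 1));
    for (int y = 0; y < height + 1; y++)
        for (int x = 0; x < width + 1; x++)
            g[y * (width + 1) + x] = x * y;
    std::vector<int> f(width * height);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            f[y * width + x] = g[y * (width + 1) + x]
                             + g[y * (width + 1) + x + 1]
                             + g[(y + 1) * (width + 1) + x]
                             + g[(y + 1) * (width + 1) + x + 1];
    return f;
}
```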
Referenced by Halide::SimdOpCheckTest::check_one(), and do_cost_model_schedule().
Func &Halide::Func::memoize(const EvictionKey &eviction_key = EvictionKey())
Use the halide_memoization_cache_... interface to store a computed version of this function across invocations of the Func.
If an eviction_key is provided, it must be constructed with Expr of integer or handle type. The key Expr will be promoted to a uint64_t and can be used with halide_memoization_cache_evict to remove memoized entries using this eviction key from the cache. Memoized computations that do not provide an eviction key will never be evicted by this mechanism.
Func &Halide::Func::async()
Produce this Func asynchronously in a separate thread.
Consumers will be run by the task system when the production is complete. If this Func's store level is different from its compute level, consumers will be run concurrently, blocking as necessary to prevent reading ahead of what the producer has computed. If storage is folded, then the producer will additionally not be permitted to run too far ahead of the consumer, to avoid clobbering data that has not yet been used.
Take special care when combining this with custom thread pool implementations, as avoiding deadlock with producer-consumer parallelism requires a much more sophisticated parallel runtime than with data parallelism alone. It is strongly recommended you just use Halide's default thread pool, which guarantees no deadlock and a bound on the number of threads launched.
Expands the storage of the function by an extra dimension to enable ring buffering.
For this to be useful, the storage of the function has to be hoisted to an outer loop level using Func::hoist_storage. The index for the new ring buffer dimension is calculated implicitly based on a linear combination of all of the loop variables between the hoist_storage and compute_at/store_at loop levels. Scheduling a function with ring_buffer multiplies the amount of memory required for this function by the extent. ring_buffer is especially useful in combination with Func::async, but can be used without it.
The extent is expected to be a positive integer.
Bound the extent of a Func's storage, but not extent of its compute.
This can be useful for forcing a function's allocation to be a fixed size, which often means it can go on the stack. If bounds inference decides that it requires more storage for this function than the allocation size you have stated, a runtime error will occur when you try to run the pipeline.
Allocate storage for this function within f's loop over var.
Scheduling storage is optional, and can be used to separate the loop level at which storage occurs from the loop level at which computation occurs to trade off between locality and redundant work. This can open the door for two types of optimization.
Consider again the pipeline from Func::compute_at :
If we schedule it like so:
Then the computation of g takes place within the loop over x, but the storage takes place within the loop over y:
Provided the for loop over x is serial, Halide then automatically performs the following sliding window optimization:
Two of the assignments to g only need to be done when x is zero. The rest of the time, those sites have already been filled in by a previous iteration. This version has the locality of compute_at(f, x), but allocates more memory and does much less redundant work.
Halide then further optimizes this pipeline like so:
Halide has detected that it's possible to use a circular buffer to represent g, and has reduced all accesses to g modulo 2 in the x dimension. This optimization only triggers if the for loop over x is serial, and if Halide can statically determine some power of two large enough to cover the range needed. For powers of two, the modulo operator compiles to more efficient bit-masking. This optimization reduces memory usage, and also improves locality by reusing recently-accessed memory instead of pulling new memory into cache.
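The sliding-window plus circular-buffer result described above can be sketched in plain C++ (same hypothetical pipeline: g(x, y) = x * y, f summing a 2x2 window of g, with g.store_at(f, y) and g.compute_at(f, x); illustrative C++, not Halide-generated code):

```cpp
#include <cassert>
#include <vector>

// Storage for g lives at the y loop; computation happens at the x loop.
// After the sliding-window optimization, only the one new column of g is
// computed per x iteration, and the fold reduces storage to 2 columns,
// indexed modulo 2 (a bit-mask, since 2 is a power of two).
std::vector<int> f_sliding(int width, int height) {
    std::vector<int> f(width * height);
    int g[2][2];  // g[row 0..1][column index % 2], allocated at the y loop
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            if (x == 0) {  // warm-up: these stores happen only when x is zero
                g[0][0] = 0 * y;
                g[1][0] = 0 * (y + 1);
            }
            // Only the new column (x + 1) is computed each iteration; column
            // x was filled in by the previous iteration.
            g[0][(x + 1) % 2] = (x + 1) * y;
            g[1][(x + 1) % 2] = (x + 1) * (y + 1);
            f[y * width + x] = g[0][x % 2] + g[0][(x + 1) % 2]
                             + g[1][x % 2] + g[1][(x + 1) % 2];
        }
    }
    return f;
}
```

This has the locality of compute_at(f, x) with far less redundant work, at the cost of requiring the x loop to be serial.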
Referenced by do_cost_model_schedule().
Equivalent to the version of store_at that takes a Var, but schedules storage within the loop over a dimension of a reduction domain.
Func &Halide::Func::store_root()
Equivalent to Func::store_at, but schedules storage outside the outermost loop.
Hoist storage for this function within f's loop over var.
This is different from Func::store_at, because hoist_storage simply moves an actual allocation to a given loop level and doesn't trigger any of the optimizations such as sliding window. Hoisting storage is optional and can be used as an optimization to avoid unnecessary allocations by moving it out from an inner loop.
Consider again the pipeline from Func::compute_at :
If we schedule f like so:
Then the C code equivalent to this pipeline will look like this
Note the allocation for g inside the loop over x, which happens on each iteration of the inner loop (height * width times in total). In some cases allocation can be expensive, so it may be better to allocate once and reuse the memory across all iterations of the loop.
This can be done by scheduling g like so:
Then the C code equivalent to this pipeline will look like this
hoist_storage can be used together with Func::store_at and Func::fold_storage (for example, to hoist the storage allocated after sliding window optimization).
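The effect of hoist_storage can be sketched in plain C++ (same hypothetical pipeline as in the compute_at discussion; illustrative C++, not Halide-generated code):

```cpp
#include <cassert>
#include <vector>

// g.hoist_storage(f, Var::outermost()) with g.compute_at(f, x): the buffer
// for g is allocated once, outside all loops, instead of once per iteration
// of x. The computation itself is unchanged (still recomputed per x); only
// the allocation moves.
std::vector<int> f_hoisted(int width, int height) {
    std::vector<int> f(width * height);
    std::vector<int> g(2 * 2);  // single allocation, reused every iteration
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            for (int gy = 0; gy < 2; gy++)
                for (int gx = 0; gx < 2; gx++)
                    g[gy * 2 + gx] = (x + gx) * (y + gy);
            f[y * width + x] = g[0] + g[1] + g[2] + g[3];
        }
    }
    return f;
}
```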
Equivalent to the version of hoist_storage that takes a Var, but schedules storage within the loop over a dimension of a reduction domain.
Func &Halide::Func::hoist_storage_root()
Equivalent to Func::hoist_storage, but schedules storage outside the outermost loop.
Func &Halide::Func::compute_inline()
Aggressively inline all uses of this function.
This is the default schedule, so you're unlikely to need to call this. For a Func with an update definition, that means it gets computed as close to the innermost loop as possible.
Consider once more the pipeline from Func::compute_at :
Leaving g as inline, this compiles to code equivalent to the following C:
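The inlined loop nest can be sketched in plain C++ (same hypothetical pipeline: g(x, y) = x * y, f summing a 2x2 window of g; illustrative C++, not Halide-generated code):

```cpp
#include <cassert>
#include <vector>

// With g inlined (the default schedule), every call to g is substituted
// directly into f's definition; no buffer for g exists at all.
std::vector<int> f_inline(int width, int height) {
    std::vector<int> f(width * height);
    for (int y = 0; y < height; y++)
        for (int x = 0; x < width; x++)
            f[y * width + x] = x * y + (x + 1) * y
                             + x * (y + 1) + (x + 1) * (y + 1);
    return f;
}
```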
Stage Halide::Func::update(int idx = 0)
Get a handle on an update step for the purposes of scheduling it.
Referenced by Halide::SimdOpCheckTest::check_one(), and do_cost_model_schedule().
Func &Halide::Func::store_in(MemoryType memory_type)
Set the type of memory this Func should be stored in.
Controls whether allocations go on the stack or the heap on the CPU, and in global vs shared vs local on the GPU. See the documentation on MemoryType for more detail.
Referenced by do_cost_model_schedule().
Func &Halide::Func::trace_loads()
Func &Halide::Func::trace_stores()
Func &Halide::Func::trace_realizations()
Trace all realizations of this Func by emitting calls to halide_trace.
Func &Halide::Func::add_trace_tag(const std::string &trace_tag)
Func &Halide::Func::no_profiling()
Marks this function as a function that should not be profiled when using the target feature Profile or ProfileByTimer.
This is useful when this function does too little work at a time, such that the overhead of setting the profiling token might become significant, or when the measured time is not representative due to modern processor features (instruction-level parallelism, out-of-order execution).
Halide::Func::operator Stage() const
You can cast a Func to its pure stage for the purposes of scheduling it.
OutputImageParam Halide::Func::output_buffer() const
std::vector<OutputImageParam> Halide::Func::output_buffers() const
Halide::Func::operator ExternFuncArgument() const
Use a Func as an argument to an external stage.
std::vector<Argument> Halide::Func::infer_arguments() const
Infer the arguments to the Func, sorted into a canonical order: all buffers (sorted alphabetically by name), followed by all non-buffers (sorted alphabetically by name).
This lets you write things like:
Return the current StageSchedule associated with this initial Stage of this Func.
For introspection only: to modify schedule, use the Func interface.
Definition at line 2608 of file Func.h.
References Halide::Stage::get_schedule().
Referenced by do_cost_model_schedule().