Halide 21.0.0
Halide compiler and libraries
Schedule.h
1#ifndef HALIDE_SCHEDULE_H
2#define HALIDE_SCHEDULE_H
3
4/** \file
5 * Defines the internal representation of the schedule for a function
6 */
7
8#include <map>
9#include <string>
10#include <utility>
11#include <vector>
12
13#include "DeviceAPI.h"
14#include "Expr.h"
15#include "FunctionPtr.h"
16#include "Partition.h"
17#include "Parameter.h"
18#include "PrefetchDirective.h"
19
20namespace Halide {
21
22class Func;
23struct VarOrRVar;
24
25namespace Internal {
26class Function;
27struct FunctionContents;
28struct LoopLevelContents;
29} // namespace Internal
30
31/** Different ways to handle a tail case in a split when the
32 * factor does not provably divide the extent. */
33enum class TailStrategy {
34 /** Round up the extent to be a multiple of the split
35 * factor. Not legal for RVars, as it would change the meaning
36 * of the algorithm. Pros: generates the simplest, fastest
37 * code. Cons: if used on a stage that reads from the input or
38 * writes to the output, constrains the input or output size
39 * to be a multiple of the split factor. */
40 RoundUp,
41
42 /** Guard the inner loop with an if statement that prevents
43 * evaluation beyond the original extent. Always legal. The if
44 * statement is treated like a boundary condition, and
45 * factored out into a loop epilogue if possible. Pros: no
46 * redundant re-evaluation; does not constrain input or
47 * output sizes. Cons: increases code size due to separate
48 * tail-case handling; vectorization will scalarize in the tail
49 * case to handle the if statement. */
50 GuardWithIf,
51
52 /** Guard the loads and stores in the loop with an if statement
53 * that prevents evaluation beyond the original extent. Always
54 * legal. The if statement is treated like a boundary condition,
55 * and factored out into a loop epilogue if possible.
56 * Pros: no redundant re-evaluation; does not constrain input or
57 * output sizes. Cons: increases code size due to separate
58 * tail-case handling. */
59 Predicate,
60
61 /** Guard the loads in the loop with an if statement that
62 * prevents evaluation beyond the original extent. Only legal
63 * for innermost splits. Not legal for RVars, as it would change
64 * the meaning of the algorithm. The if statement is treated like
65 * a boundary condition, and factored out into a loop epilogue if
66 * possible.
67 * Pros: does not constrain input sizes, output size constraints
68 * are simpler than full predication. Cons: increases code size
69 * due to separate tail-case handling, constrains the output size
70 * to be a multiple of the split factor. */
71 PredicateLoads,
72
73 /** Guard the stores in the loop with an if statement that
74 * prevents evaluation beyond the original extent. Only legal
75 * for innermost splits. Not legal for RVars, as it would change
76 * the meaning of the algorithm. The if statement is treated like
77 * a boundary condition, and factored out into a loop epilogue if
78 * possible.
79 * Pros: does not constrain output sizes, input size constraints
80 * are simpler than full predication. Cons: increases code size
81 * due to separate tail-case handling, constrains the input size
82 * to be a multiple of the split factor. */
83 PredicateStores,
84
85 /** Prevent evaluation beyond the original extent by shifting
86 * the tail case inwards, re-evaluating some points near the
87 * end. Only legal for pure variables in pure definitions. If
88 * the inner loop is very simple, the tail case is treated
89 * like a boundary condition and factored out into an
90 * epilogue.
91 *
92 * This is a good trade-off between several factors. Like
93 * RoundUp, it supports vectorization well, because the inner
94 * loop is always a fixed size with no data-dependent
95 * branching. It increases code size slightly for inner loops
96 * due to the epilogue handling, but not for outer loops
97 * (e.g. loops over tiles). If used on a stage that reads from
98 * an input or writes to an output, this strategy only requires
99 * that the input/output extent be at least the split factor,
100 * instead of a multiple of the split factor as with RoundUp. */
101 ShiftInwards,
102
103 /** Equivalent to ShiftInwards, but protects values that would be
104 * re-evaluated by loading the memory location that would be stored to,
105 * modifying only the elements not contained within the overlap, and then
106 * storing the blended result.
107 *
108 * This tail strategy is useful when you want to use ShiftInwards to
109 * vectorize without a scalar tail, but are scheduling a stage where that
110 * isn't legal (e.g. an update definition).
111 *
112 * Because this is a read-modify-write, this tail strategy cannot be
113 * used on any dimension the stage is parallelized over as it would cause a
114 * race condition.
115 */
116 ShiftInwardsAndBlend,
117
118 /** Equivalent to RoundUp, but protects values that would be written beyond
119 * the end by loading the memory location that would be stored to,
120 * modifying only the elements within the region being computed, and then
121 * storing the blended result.
122 *
123 * This tail strategy is useful when vectorizing an update to some sub-region
124 * of a larger Func. As with ShiftInwardsAndBlend, it can't be combined with
125 * parallelism.
126 */
127 RoundUpAndBlend,
128
129 /** For pure definitions use ShiftInwards. For pure vars in
130 * update definitions use RoundUp. For RVars in update
131 * definitions use GuardWithIf. */
132 Auto
133};
134
135/** Different ways to handle the case when the start/end of the loops of stages
136 * computed with (fused) are not aligned. */
137enum class LoopAlignStrategy {
138 /** Shift the start of the fused loops to align. */
139 AlignStart,
140
141 /** Shift the end of the fused loops to align. */
142 AlignEnd,
143
144 /** compute_with will make no attempt to align the start/end of the
145 * fused loops. */
146 NoAlign,
147
148 /** By default, LoopAlignStrategy is set to NoAlign. */
149 Auto
150};
151
152/** A reference to a site in a Halide statement at the top of the
153 * body of a particular for loop. Evaluating a region of a Halide
154 * function is done by generating a loop nest that spans its
155 * dimensions. We schedule the inputs to that function by
156 * recursively injecting realizations for them at particular sites
157 * in this loop nest. A LoopLevel identifies such a site. The site
158 * can either be a loop nest within all stages of a function
159 * or it can refer to a loop nest within a particular function's
160 * stage (initial definition or updates).
161 *
162 * Note that a LoopLevel is essentially a pointer to an underlying value;
163 * all copies of a LoopLevel refer to the same site, so mutating one copy
164 * (via the set() method) will effectively mutate all copies:
165 \code
166 Func f;
167 Var x;
168 LoopLevel a(f, x);
169 // Both a and b refer to LoopLevel(f, x)
170 LoopLevel b = a;
171 // Now both a and b refer to LoopLevel::root()
172 a.set(LoopLevel::root());
173 \endcode
174 * This is quite useful when splitting Halide code into utility libraries, as it allows
175 * a library to schedule code according to a caller's specifications, even if the caller
176 * hasn't fully defined its pipeline yet:
177 \code
178 Func demosaic(Func input,
179 LoopLevel intermed_compute_at,
180 LoopLevel intermed_store_at,
181 LoopLevel output_compute_at) {
182 Func intermed = ...;
183 Func output = ...;
184 intermed.compute_at(intermed_compute_at).store_at(intermed_store_at);
185 output.compute_at(output_compute_at);
186 return output;
187 }
188
189 void process() {
190 // Note that these LoopLevels are all undefined when we pass them to demosaic()
191 LoopLevel intermed_compute_at, intermed_store_at, output_compute_at;
192 Func input = ...;
193 Func demosaiced = demosaic(input, intermed_compute_at, intermed_store_at, output_compute_at);
194 Func output = ...;
195
196 // We need to ensure all LoopLevels have a well-defined value prior to lowering:
197 intermed_compute_at.set(LoopLevel(output, y));
198 intermed_store_at.set(LoopLevel(output, y));
199 output_compute_at.set(LoopLevel(output, x));
200 }
201 \endcode
202 */
203class LoopLevel {
204 Internal::IntrusivePtr<Internal::LoopLevelContents> contents;
205
206 explicit LoopLevel(Internal::IntrusivePtr<Internal::LoopLevelContents> c)
207 : contents(std::move(c)) {
208 }
209
210public:
211 /** Return the index of the function stage associated with this loop level.
212 * Asserts if undefined */
213 int stage_index() const;
214
215 /** Identify the loop nest corresponding to some dimension of some function */
216 // @{
217 LoopLevel(const Internal::Function &f, const VarOrRVar &v, int stage_index = -1);
218 LoopLevel(const Func &f, const VarOrRVar &v, int stage_index = -1);
219 // @}
220
221 /** Construct an undefined LoopLevel. Calling any method on an undefined
222 * LoopLevel (other than set()) will assert. */
223 LoopLevel();
224
225 /** For deserialization only. */
226 LoopLevel(const std::string &func_name, const std::string &var_name,
227 bool is_rvar, int stage_index, bool locked = false);
228
229 /** Construct a special LoopLevel value that implies
230 * that a function should be inlined away. */
231 static LoopLevel inlined();
232
233 /** Construct a special LoopLevel value which represents the
234 * location outside of all for loops. */
235 static LoopLevel root();
236
237 /** Mutate our contents to match the contents of 'other'. */
238 void set(const LoopLevel &other);
239
240 // All the public methods below this point are meant only for internal
241 // use by Halide, rather than user code; hence, they are deliberately
242 // documented with plain comments (rather than Doxygen) to avoid being
243 // present in user documentation.
244
245 // Lock this LoopLevel.
246 LoopLevel &lock();
247
248 // Return the Func name. Asserts if the LoopLevel is_root() or is_inlined() or !defined().
249 std::string func() const;
250
251 // Return the VarOrRVar. Asserts if the LoopLevel is_root() or is_inlined() or !defined().
252 VarOrRVar var() const;
253
254 // Return true iff the LoopLevel is defined. (Only LoopLevels created
255 // with the default ctor are undefined.)
256 bool defined() const;
257
258 // Test if a loop level corresponds to inlining the function.
259 bool is_inlined() const;
260
261 // Test if a loop level is 'root', which describes the site
262 // outside of all for loops.
263 bool is_root() const;
264
265 // For serialization only. Do not use in other cases.
266 int get_stage_index() const;
267
268 // For serialization only. Do not use in other cases.
269 std::string func_name() const;
270
271 // For serialization only. Do not use in other cases.
272 std::string var_name() const;
273
274 // For serialization only. Do not use in other cases.
275 bool is_rvar() const;
276
277 // For serialization only. Do not use in other cases.
278 bool locked() const;
279
280 // Return a string of the form func.var -- note that this is safe
281 // to call for root or inline LoopLevels, but asserts if !defined().
282 std::string to_string() const;
283
284 // Compare this loop level against the variable name of a for
285 // loop, to see if this loop level refers to the site
286 // immediately inside this loop. Asserts if !defined().
287 bool match(const std::string &loop) const;
288
289 bool match(const LoopLevel &other) const;
290
291 // Check if two loop levels are exactly the same.
292 bool operator==(const LoopLevel &other) const;
293
294 bool operator!=(const LoopLevel &other) const {
295 return !(*this == other);
296 }
297
298private:
299 void check_defined() const;
300 void check_locked() const;
301 void check_defined_and_locked() const;
302};
303
304struct FuseLoopLevel {
305 LoopLevel level;
306 /** Contains alignment strategies for the fused dimensions (indexed by the
307 * dimension name). If not in the map, use the default alignment strategy
308 * to align the fused dimension (see \ref LoopAlignStrategy::Auto).
309 */
310 std::map<std::string, LoopAlignStrategy> align;
311
312 FuseLoopLevel()
313 : level(LoopLevel::inlined().lock()) {
314 }
315 FuseLoopLevel(const LoopLevel &level, const std::map<std::string, LoopAlignStrategy> &align)
316 : level(level), align(align) {
317 }
318};
319
320namespace Internal {
321
322class IRMutator;
323struct ReductionVariable;
324
325struct Split {
326 std::string old_var, outer, inner;
327 Expr factor;
328 bool exact; // Is it required that the factor divides the extent
329 // of the old var. True for splits of RVars. Forces
330 // tail strategy to be GuardWithIf.
331 TailStrategy tail;
332
333 enum SplitType { SplitVar = 0,
334 RenameVar,
335 FuseVars };
336
337 // If split_type is Rename, then this is just a renaming of the
338 // old_var to the outer and not a split. The inner var should
339 // be ignored, and factor should be one. Renames are kept in
340 // the same list as splits so that ordering between them is
341 // respected.
342
343 // If split_type is Fuse, then this does the opposite of a
344 // split, it joins the outer and inner into the old_var.
345 SplitType split_type;
346};
347
348/** Each Dim below has a dim_type, which tells you what
349 * transformations are legal on it. When you combine two Dims of
350 * distinct DimTypes (e.g. with Stage::fuse), the combined result has
351 * the greater enum value of the two types. */
352enum class DimType {
353 /** This dim originated from a Var. You can evaluate a Func at
354 * distinct values of this Var in any order over an interval
355 * that's at least as large as the interval required. In pure
356 * definitions you can even redundantly re-evaluate points. */
357 PureVar = 0,
358
359 /** The dim originated from an RVar. You can evaluate a Func at
360 * distinct values of this RVar in any order (including in
361 * parallel) over exactly the interval specified in the
362 * RDom. PureRVars can also be reordered arbitrarily in the dims
363 * list, as there are no data hazards between the evaluation of
364 * the Func at distinct values of the RVar.
365 *
366 * The most common case where an RVar is considered pure is RVars
367 * that are used in a way which obeys all the syntactic
368 * constraints that a Var does, e.g:
369 *
370 \code
371 RDom r(0, 100);
372 f(r.x) = f(r.x) + 5;
373 \endcode
374 *
375 * Other cases where RVars are pure are where the sites being
376 * written to by the Func evaluated at one value of the RVar
377 * couldn't possibly collide with the sites being written or read
378 * by the Func at a distinct value of the RVar. For example, r.x
379 * is pure in the following three definitions:
380 *
381 \code
382
383 // This definition writes to even coordinates and reads from the
384 // same site (which no other value of r.x is writing to) and odd
385 // sites (which no other value of r.x is writing to):
386 f(2*r.x) = max(f(2*r.x), f(2*r.x + 7));
387
388 // This definition writes to scanline zero and reads from the
389 // same site and scanline one:
390 f(r.x, 0) += f(r.x, 1);
391
392 // This definition reads and writes over non-overlapping ranges:
393 f(r.x + 100) += f(r.x);
394 \endcode
395 *
396 * To give two counterexamples, r.x is not pure in the following
397 * definitions:
398 *
399 \code
400 // The same site is written by distinct values of the RVar
401 // (write-after-write hazard):
402 f(r.x / 2) += f(r.x);
403
404 // One value of r.x reads from a site that another value of r.x
405 // is writing to (read-after-write hazard):
406 f(r.x) += f(r.x + 1);
407 \endcode
408 */
409 PureRVar,
410
411 /** The dim originated from an RVar. You must evaluate a Func at
412 * distinct values of this RVar in increasing order over precisely
413 * the interval specified in the RDom. ImpureRVars may not be
414 * reordered with respect to other ImpureRVars.
415 *
416 * All RVars are impure by default. Those for which we can prove
417 * no data hazards exist get promoted to PureRVar. There are two
418 * instances in which ImpureRVars may be parallelized or reordered
419 * even in the presence of hazards:
420 *
421 * 1) In the case of an update definition that has been proven to be
422 * an associative and commutative reduction, reordering of
423 * ImpureRVars is allowed, and parallelizing them is allowed if
424 * the update has been made atomic.
425 *
426 * 2) ImpureRVars can also be reordered and parallelized if
427 * Func::allow_race_conditions() has been set. This is the escape
428 * hatch for when there are no hazards but the checks above failed
429 * to prove that (RDom::where can encode arbitrary facts about
430 * non-linear integer arithmetic, which is undecidable), or for
431 * when you don't actually care about the non-determinism
432 * introduced by data hazards (e.g. in the algorithm HOGWILD!).
433 */
434 ImpureRVar,
435};
436
437/** The Dim struct represents one loop in the schedule's
438 * representation of a loop nest. */
439struct Dim {
440 /** Name of the loop variable */
441 std::string var;
442
443 /** How are the loop values traversed (e.g. unrolled, vectorized, parallel) */
444 ForType for_type;
445
446 /** On what device does the body of the loop execute (e.g. Host, GPU, Hexagon) */
447 DeviceAPI device_api;
448
449 /** The DimType tells us what transformations are legal on this
450 * loop (see the DimType enum above). */
451 DimType dim_type;
452
453 /** The strategy for loop partitioning. */
454 Partition partition_policy;
455
456 /** Can this loop be evaluated in any order (including in
457 * parallel)? Equivalently, are there no data hazards between
458 * evaluations of the Func at distinct values of this var? */
459 bool is_pure() const {
460 return (dim_type == DimType::PureVar) || (dim_type == DimType::PureRVar);
461 }
462
463 /** Did this loop originate from an RVar (in which case the bounds
464 * of the loops are algorithmically meaningful)? */
465 bool is_rvar() const {
466 return (dim_type == DimType::PureRVar) || (dim_type == DimType::ImpureRVar);
467 }
468
469 /** Could multiple iterations of this loop happen at the same
470 * time, with reads and writes interleaved in arbitrary ways
471 * according to the memory model of the underlying compiler and
472 * machine? */
473 bool is_unordered_parallel() const {
474 return Halide::Internal::is_unordered_parallel(for_type);
475 }
476
477 /** Could multiple iterations of this loop happen at the same
478 * time? Vectorized and GPULanes loop types are parallel but not
479 * unordered, because the loop iterations proceed together in
480 * lockstep with some well-defined outcome if there are hazards. */
481 bool is_parallel() const {
482 return Halide::Internal::is_parallel(for_type);
483 }
484};
485
486/** A bound on a loop, typically from Func::bound */
487struct Bound {
488 /** The loop var being bounded */
489 std::string var;
490
491 /** Declared min and extent of the loop. min may be undefined if
492 * Func::bound_extent was used. */
493 Expr min, extent;
494
495 /** If defined, the number of iterations will be a multiple of
496 * "modulus", and the first iteration will be at a value congruent
497 * to "remainder" modulo "modulus". Set by Func::align_bounds and
498 * Func::align_extent. */
499 Expr modulus, remainder;
500};
501
502/** Properties of one axis of the storage of a Func */
503struct StorageDim {
504 /** The var in the pure definition corresponding to this axis */
505 std::string var;
506
507 /** The bounds allocated (not computed) must be a multiple of
508 * "alignment". Set by Func::align_storage. */
509 Expr alignment;
510
511 /** The bounds allocated (not computed). Set by Func::bound_storage. */
512 Expr bound;
513
514 /** If the Func is explicitly folded along this axis (with
515 * Func::fold_storage) this gives the extent of the circular
516 * buffer used, and whether it is used in increasing order
517 * (fold_forward = true) or decreasing order (fold_forward =
518 * false). */
519 Expr fold_factor;
520 bool fold_forward;
521};
522
523/** This represents two stages with fused loop nests from outermost to
524 * a specific loop level. The loops to compute func_1(stage_1) are
525 * fused with the loops to compute func_2(stage_2) from outermost to
526 * loop level var_name and the computation from stage_1 of func_1
527 * occurs first.
528 */
529struct FusedPair {
530 std::string func_1;
531 std::string func_2;
532 size_t stage_1;
533 size_t stage_2;
534 std::string var_name;
535
536 FusedPair() = default;
537 FusedPair(const std::string &f1, size_t s1, const std::string &f2,
538 size_t s2, const std::string &var)
539 : func_1(f1), func_2(f2), stage_1(s1), stage_2(s2), var_name(var) {
540 }
541
542 bool operator==(const FusedPair &other) const {
543 return (func_1 == other.func_1) && (func_2 == other.func_2) &&
544 (stage_1 == other.stage_1) && (stage_2 == other.stage_2) &&
545 (var_name == other.var_name);
546 }
547 bool operator<(const FusedPair &other) const {
548 if (func_1 != other.func_1) {
549 return func_1 < other.func_1;
550 }
551 if (func_2 != other.func_2) {
552 return func_2 < other.func_2;
553 }
554 if (var_name != other.var_name) {
555 return var_name < other.var_name;
556 }
557 if (stage_1 != other.stage_1) {
558 return stage_1 < other.stage_1;
559 }
560 return stage_2 < other.stage_2;
561 }
562};
563
564struct FuncScheduleContents;
565struct StageScheduleContents;
566struct FunctionContents;
567
568/** A schedule for a Function of a Halide pipeline. This schedule is
569 * applied to all stages of the Function. Right now this interface is
570 * basically a struct, offering mutable access to its innards.
571 * In the future it may become more encapsulated. */
572class FuncSchedule {
573 IntrusivePtr<FuncScheduleContents> contents;
574
575public:
576 FuncSchedule(IntrusivePtr<FuncScheduleContents> c)
577 : contents(std::move(c)) {
578 }
579 FuncSchedule(const FuncSchedule &other) = default;
580 FuncSchedule();
581
582 /** Return a deep copy of this FuncSchedule. It recursively deep copies all
583 * called functions, schedules, specializations, and reduction domains. This
584 * method takes a map of <old FunctionContents, deep-copied version> as input
585 * and uses the deep-copied FunctionContents from the map if one exists,
586 * instead of creating a new deep copy, to avoid deep-copying the
587 * same FunctionContents multiple times.
588 */
589 FuncSchedule deep_copy(
590 std::map<FunctionPtr, FunctionPtr> &copied_map) const;
591
592 /** This flag is set to true if the schedule is memoized. */
593 // @{
594 bool &memoized();
595 bool memoized() const;
596 // @}
597
598 /** This flag is set to true if the schedule is memoized and has an attached
599 * eviction key. */
600 // @{
601 Expr &memoize_eviction_key();
602 Expr memoize_eviction_key() const;
603 // @}
604
605 /** Is the production of this Function done asynchronously */
606 bool &async();
607 bool async() const;
608
611
612 /** The list and order of dimensions used to store this
613 * function. The first dimension in the vector corresponds to the
614 * innermost dimension for storage (i.e. which dimension is
615 * tightly packed in memory) */
616 // @{
617 const std::vector<StorageDim> &storage_dims() const;
618 std::vector<StorageDim> &storage_dims();
619 // @}
620
621 /** The memory type (heap/stack/shared/etc) used to back this Func. */
622 // @{
623 MemoryType memory_type() const;
624 MemoryType &memory_type();
625 // @}
626
627 /** You may explicitly bound some of the dimensions of a function,
628 * or constrain them to lie on multiples of a given factor. See
629 * \ref Func::bound and \ref Func::align_bounds and \ref Func::align_extent. */
630 // @{
631 const std::vector<Bound> &bounds() const;
632 std::vector<Bound> &bounds();
633 // @}
634
635 /** You may explicitly specify an estimate of some of the function
636 * dimensions. See \ref Func::set_estimate */
637 // @{
638 const std::vector<Bound> &estimates() const;
639 std::vector<Bound> &estimates();
640 // @}
641
642 /** Mark calls of a function by 'f' to be replaced with its identity
643 * wrapper or clone during the lowering stage. If the string 'f' is empty,
644 * it means replace all calls to the function by all other functions
645 * (excluding itself) in the pipeline with the global identity wrapper.
646 * See \ref Func::in and \ref Func::clone_in for more details. */
647 // @{
648 const std::map<std::string, Internal::FunctionPtr> &wrappers() const;
649 std::map<std::string, Internal::FunctionPtr> &wrappers();
650 void add_wrapper(const std::string &f,
651 const Internal::FunctionPtr &wrapper);
652 // @}
653
654 /** At what sites should we inject the allocation and the
655 * computation of this function? The store_level must be outside
656 * of or equal to the compute_level. If the compute_level is
657 * inline, the store_level is meaningless. See \ref Func::store_at
658 * and \ref Func::compute_at */
659 // @{
660 const LoopLevel &store_level() const;
661 const LoopLevel &compute_level() const;
662 const LoopLevel &hoist_storage_level() const;
663 LoopLevel &store_level();
664 LoopLevel &compute_level();
665 LoopLevel &hoist_storage_level();
666 // @}
667
668 /** Pass an IRVisitor through to all Exprs referenced in the
669 * Schedule. */
670 void accept(IRVisitor *) const;
671
672 /** Pass an IRMutator through to all Exprs referenced in the
673 * Schedule. */
674 void mutate(IRMutator *);
675};
676
677/** A schedule for a single stage of a Halide pipeline. Right now this
678 * interface is basically a struct, offering mutable access to its
679 * innards. In the future it may become more encapsulated. */
680class StageSchedule {
681 IntrusivePtr<StageScheduleContents> contents;
682
683public:
684 StageSchedule(IntrusivePtr<StageScheduleContents> c)
685 : contents(std::move(c)) {
686 }
687 StageSchedule(const StageSchedule &other) = default;
688 StageSchedule();
689 StageSchedule(const std::vector<ReductionVariable> &rvars, const std::vector<Split> &splits,
690 const std::vector<Dim> &dims, const std::vector<PrefetchDirective> &prefetches,
691 const FuseLoopLevel &fuse_level, const std::vector<FusedPair> &fused_pairs,
692 bool touched, bool allow_race_conditions, bool atomic, bool override_atomic_associativity_test);
693
694 /** Return a copy of this StageSchedule. */
695 StageSchedule get_copy() const;
696
697 /** This flag is set to true if the dims list has been manipulated
698 * by the user (or if a ScheduleHandle was created that could have
699 * been used to manipulate it). It controls the warning that
700 * occurs if you schedule the vars of the pure step but not the
701 * update steps. */
702 // @{
703 bool &touched();
704 bool touched() const;
705 // @}
706
707 /** RVars of reduction domain associated with this schedule if there is any. */
708 // @{
709 const std::vector<ReductionVariable> &rvars() const;
710 std::vector<ReductionVariable> &rvars();
711 // @}
712
713 /** The traversal of the domain of a function can have some of its
714 * dimensions split into sub-dimensions. See \ref Func::split */
715 // @{
716 const std::vector<Split> &splits() const;
717 std::vector<Split> &splits();
718 // @}
719
720 /** The list and ordering of dimensions used to evaluate this
721 * function, after all splits have taken place. The first
722 * dimension in the vector corresponds to the innermost for loop,
723 * and the last is the outermost. Also specifies what type of for
724 * loop to use for each dimension. Does not specify the bounds on
725 * each dimension. These get inferred from how the function is
726 * used, what the splits are, and any optional bounds in the list below. */
727 // @{
728 const std::vector<Dim> &dims() const;
729 std::vector<Dim> &dims();
730 // @}
731
732 /** You may perform prefetching in some of the dimensions of a
733 * function. See \ref Func::prefetch */
734 // @{
735 const std::vector<PrefetchDirective> &prefetches() const;
736 std::vector<PrefetchDirective> &prefetches();
737 // @}
738
739 /** Innermost loop level of fused loop nest for this function stage.
740 * Fusion runs from outermost to this loop level. The stages being fused
741 * should not have a producer/consumer relationship.
742 * See \ref Func::compute_with */
743 // @{
744 const FuseLoopLevel &fuse_level() const;
745 FuseLoopLevel &fuse_level();
746 // @}
747
748 /** List of function stages that are to be fused with this function stage
749 * from the outermost loop to a certain loop level. Those function stages
750 * are to be computed AFTER this function stage at the last fused loop level.
751 * This list is populated when realization_order() is called. See
752 * \ref Func::compute_with */
753 // @{
754 const std::vector<FusedPair> &fused_pairs() const;
755 std::vector<FusedPair> &fused_pairs();
756 // @}
757 /** Are race conditions permitted? */
758 // @{
759 bool allow_race_conditions() const;
760 bool &allow_race_conditions();
761 // @}
762
763 /** Use atomic update? */
764 // @{
765 bool atomic() const;
766 bool &atomic();
767 // @}
768
769 /** Atomic updates are only allowed on associative reductions.
770 * We try to prove the associativity, but the user can override
771 * the associativity test and suppress compiler error if the prover
772 * fails to recognize the associativity or the user does not care. */
773 // @{
774 bool override_atomic_associativity_test() const;
775 bool &override_atomic_associativity_test();
776 // @}
777
778 /** Pass an IRVisitor through to all Exprs referenced in the
779 * Schedule. */
780 void accept(IRVisitor *) const;
781
782 /** Pass an IRMutator through to all Exprs referenced in the
783 * Schedule. */
784 void mutate(IRMutator *);
785};
786
787} // namespace Internal
788} // namespace Halide
789
790#endif
Defines DeviceAPI.
Base classes for Halide expressions (Halide::Expr) and statements (Halide::Internal::Stmt)
Defines the Partition enum.
Defines the internal representation of parameters to halide piplines.
Defines the PrefetchDirective struct.
A halide function.
Definition Func.h:706
Expr & memoize_eviction_key()
This flag is set to true if the schedule is memoized and has an attached eviction key.
void add_wrapper(const std::string &f, const Internal::FunctionPtr &wrapper)
const std::vector< Bound > & estimates() const
You may explicitly specify an estimate of some of the function dimensions.
const std::vector< Bound > & bounds() const
You may explicitly bound some of the dimensions of a function, or constrain them to lie on multiples ...
void mutate(IRMutator *)
Pass an IRMutator through to all Exprs referenced in the Schedule.
bool & async()
Is the production of this Function done asynchronously.
const LoopLevel & hoist_storage_level() const
void accept(IRVisitor *) const
Pass an IRVisitor through to all Exprs referenced in the Schedule.
const LoopLevel & store_level() const
At what sites should we inject the allocation and the computation of this function?
const std::map< std::string, Internal::FunctionPtr > & wrappers() const
Mark calls of a function by 'f' to be replaced with its identity wrapper or clone during the lowering...
MemoryType memory_type() const
The memory type (heap/stack/shared/etc) used to back this Func.
std::vector< StorageDim > & storage_dims()
FuncSchedule(const FuncSchedule &other)=default
FuncSchedule(IntrusivePtr< FuncScheduleContents > c)
Definition Schedule.h:576
std::map< std::string, Internal::FunctionPtr > & wrappers()
FuncSchedule deep_copy(std::map< FunctionPtr, FunctionPtr > &copied_map) const
Return a deep copy of this FuncSchedule.
bool & memoized()
This flag is set to true if the schedule is memoized.
const LoopLevel & compute_level() const
std::vector< Bound > & estimates()
const std::vector< StorageDim > & storage_dims() const
The list and order of dimensions used to store this function.
std::vector< Bound > & bounds()
A reference-counted handle to Halide's internal representation of a function.
Definition Function.h:39
A base class for passes over the IR which modify it (e.g.
Definition IRMutator.h:26
A base class for algorithms that need to recursively walk over the IR.
Definition IRVisitor.h:19
StageSchedule(IntrusivePtr< StageScheduleContents > c)
Definition Schedule.h:684
std::vector< PrefetchDirective > & prefetches()
std::vector< ReductionVariable > & rvars()
const std::vector< ReductionVariable > & rvars() const
RVars of reduction domain associated with this schedule if there is any.
StageSchedule get_copy() const
Return a copy of this StageSchedule.
bool & touched()
This flag is set to true if the dims list has been manipulated by the user (or if a ScheduleHandle wa...
std::vector< FusedPair > & fused_pairs()
const std::vector< FusedPair > & fused_pairs() const
List of function stages that are to be fused with this function stage from the outermost loop to a ce...
std::vector< Split > & splits()
bool allow_race_conditions() const
Are race conditions permitted?
StageSchedule(const std::vector< ReductionVariable > &rvars, const std::vector< Split > &splits, const std::vector< Dim > &dims, const std::vector< PrefetchDirective > &prefetches, const FuseLoopLevel &fuse_level, const std::vector< FusedPair > &fused_pairs, bool touched, bool allow_race_conditions, bool atomic, bool override_atomic_associativity_test)
const FuseLoopLevel & fuse_level() const
Innermost loop level of fused loop nest for this function stage.
bool atomic() const
Use atomic update?
const std::vector< Split > & splits() const
The traversal of the domain of a function can have some of its dimensions split into sub-dimensions.
std::vector< Dim > & dims()
const std::vector< Dim > & dims() const
The list and ordering of dimensions used to evaluate this function, after all splits have taken place...
void mutate(IRMutator *)
Pass an IRMutator through to all Exprs referenced in the Schedule.
void accept(IRVisitor *) const
Pass an IRVisitor through to all Exprs referenced in the Schedule.
const std::vector< PrefetchDirective > & prefetches() const
You may perform prefetching in some of the dimensions of a function.
bool override_atomic_associativity_test() const
Atomic updates are only allowed on associative reductions.
StageSchedule(const StageSchedule &other)=default
A reference to a site in a Halide statement at the top of the body of a particular for loop.
Definition Schedule.h:203
VarOrRVar var() const
std::string to_string() const
static LoopLevel root()
Construct a special LoopLevel value which represents the location outside of all for loops.
LoopLevel(const Internal::Function &f, const VarOrRVar &v, int stage_index=-1)
Identify the loop nest corresponding to some dimension of some function.
static LoopLevel inlined()
Construct a special LoopLevel value that implies that a function should be inlined away.
int get_stage_index() const
LoopLevel(const Func &f, const VarOrRVar &v, int stage_index=-1)
bool operator==(const LoopLevel &other) const
std::string func() const
int stage_index() const
Return the index of the function stage associated with this loop level.
LoopLevel()
Construct an undefined LoopLevel.
std::string var_name() const
void set(const LoopLevel &other)
Mutate our contents to match the contents of 'other'.
bool match(const LoopLevel &other) const
bool locked() const
bool is_root() const
LoopLevel(const std::string &func_name, const std::string &var_name, bool is_rvar, int stage_index, bool locked=false)
For deserialization only.
bool is_inlined() const
bool operator!=(const LoopLevel &other) const
Definition Schedule.h:294
std::string func_name() const
bool defined() const
bool match(const std::string &loop) const
bool is_rvar() const
LoopLevel & lock()
DimType
Each Dim below has a dim_type, which tells you what transformations are legal on it.
Definition Schedule.h:352
@ ImpureRVar
The dim originated from an RVar.
Definition Schedule.h:434
@ PureRVar
The dim originated from an RVar.
Definition Schedule.h:409
@ PureVar
This dim originated from a Var.
Definition Schedule.h:357
ForType
An enum describing a type of loop traversal.
Definition Expr.h:406
bool is_unordered_parallel(ForType for_type)
Check if for_type executes for loop iterations in parallel and unordered.
bool is_parallel(ForType for_type)
Returns true if for_type executes for loop iterations in parallel.
@ GuardWithIf
Guard the prefetch with if-guards that ignore the prefetch if any of the prefetched region ever goes beyond the valid region (i.e. all or nothing).
TailStrategy
Different ways to handle a tail case in a split when the factor does not provably divide the extent.
Definition Schedule.h:33
@ RoundUp
Round up the extent to be a multiple of the split factor.
Definition Schedule.h:40
@ RoundUpAndBlend
Equivalent to RoundUp, but protects values that would be written beyond the end by loading the memor...
Definition Schedule.h:127
@ Predicate
Guard the loads and stores in the loop with an if statement that prevents evaluation beyond the original extent.
Definition Schedule.h:59
@ PredicateStores
Guard the stores in the loop with an if statement that prevents evaluation beyond the original extent.
Definition Schedule.h:83
@ ShiftInwardsAndBlend
Equivalent to ShiftInwards, but protects values that would be re-evaluated by loading the memory loca...
Definition Schedule.h:116
@ ShiftInwards
Prevent evaluation beyond the original extent by shifting the tail case inwards, re-evaluating some points near the end.
Definition Schedule.h:101
@ PredicateLoads
Guard the loads in the loop with an if statement that prevents evaluation beyond the original extent.
Definition Schedule.h:71
LoopAlignStrategy
Different ways to handle the case when the start/end of the loops of stages computed with (fused) are not aligned.
Definition Schedule.h:137
@ NoAlign
compute_with will make no attempt to align the start/end of the fused loops.
Definition Schedule.h:146
@ AlignEnd
Shift the end of the fused loops to align.
Definition Schedule.h:142
@ AlignStart
Shift the start of the fused loops to align.
Definition Schedule.h:139
DeviceAPI
An enum describing a type of device API.
Definition DeviceAPI.h:15
MemoryType
An enum describing different address spaces to be used with Func::store_in.
Definition Expr.h:353
@ Auto
Let Halide select a storage type automatically.
Definition Expr.h:355
Partition
Different ways to handle loops with potentially optimizable boundary conditions.
A fragment of Halide syntax.
Definition Expr.h:258
std::map< std::string, LoopAlignStrategy > align
Contains alignment strategies for the fused dimensions (indexed by the dimension name).
Definition Schedule.h:310
FuseLoopLevel(const LoopLevel &level, const std::map< std::string, LoopAlignStrategy > &align)
Definition Schedule.h:315
A bound on a loop, typically from Func::bound.
Definition Schedule.h:487
Expr min
Declared min and extent of the loop.
Definition Schedule.h:493
std::string var
The loop var being bounded.
Definition Schedule.h:489
Expr modulus
If defined, the number of iterations will be a multiple of "modulus", and the first iteration will be...
Definition Schedule.h:499
The Dim struct represents one loop in the schedule's representation of a loop nest.
Definition Schedule.h:439
std::string var
Name of the loop variable.
Definition Schedule.h:441
DimType dim_type
The DimType tells us what transformations are legal on this loop (see the DimType enum above).
Definition Schedule.h:451
Partition partition_policy
The strategy for loop partitioning.
Definition Schedule.h:454
bool is_rvar() const
Did this loop originate from an RVar (in which case the bounds of the loops are algorithmically meaningful)?
Definition Schedule.h:465
ForType for_type
How are the loop values traversed (e.g. unrolled, vectorized, parallel).
Definition Schedule.h:444
DeviceAPI device_api
On what device does the body of the loop execute (e.g. Host, GPU, Hexagon).
Definition Schedule.h:447
bool is_parallel() const
Could multiple iterations of this loop happen at the same time?
Definition Schedule.h:481
bool is_unordered_parallel() const
Could multiple iterations of this loop happen at the same time, with reads and writes interleaved in arbitrary ways?
Definition Schedule.h:473
bool is_pure() const
Can this loop be evaluated in any order (including in parallel)?
Definition Schedule.h:459
A possibly-weak pointer to a Halide function.
Definition FunctionPtr.h:27
bool operator==(const FusedPair &other) const
Definition Schedule.h:542
FusedPair(const std::string &f1, size_t s1, const std::string &f2, size_t s2, const std::string &var)
Definition Schedule.h:537
bool operator<(const FusedPair &other) const
Definition Schedule.h:547
Intrusive shared pointers have a reference count (a RefCount object) stored in the class itself.
Properties of one axis of the storage of a Func.
Definition Schedule.h:503
std::string var
The var in the pure definition corresponding to this axis.
Definition Schedule.h:505
Expr alignment
The bounds allocated (not computed) must be a multiple of "alignment".
Definition Schedule.h:509
Expr bound
The bounds allocated (not computed).
Definition Schedule.h:512
Expr fold_factor
If the Func is explicitly folded along this axis (with Func::fold_storage) this gives the extent of the fold.
Definition Schedule.h:519
A class that can represent Vars or RVars.
Definition Func.h:29