Halide
StageStridedLoads.h
#ifndef HALIDE_INTERNAL_STAGE_STRIDED_LOADS_H
#define HALIDE_INTERNAL_STAGE_STRIDED_LOADS_H

/** \file
 *
 * Defines the compiler pass that converts strided loads into dense loads
 * followed by shuffles.
 */

#include "Expr.h"

namespace Halide {
namespace Internal {

/** Convert all unpredicated strided loads in a Stmt into dense loads followed
 * by shuffles.
 *
 * For a stride of two, the trick is to do a dense load of twice the size, and
 * then extract either the even or odd lanes. This was previously done in
 * codegen, where it was challenging, because it's not easy to know there if
 * it's safe to do the double-sized load, as it either loads one element beyond
 * or before the original load. We used the alignment of the ramp base to try to
 * tell if it was safe to shift backwards, and we added padding to internal
 * allocations so that for those at least it was safe to shift
 * forwards. Unfortunately the alignment of the ramp base is usually unknown if
 * you don't know anything about the strides of the input, and adding padding to
 * allocations was a serious wart in our memory allocators.
 *
 * This pass instead actively looks for evidence elsewhere in the Stmt (at some
 * location which definitely executes whenever the load being transformed
 * executes) that it's safe to read further forwards or backwards in memory. The
 * evidence is in the form of a load at the same base address with a different
 * constant offset. It also clusters groups of these loads so that they do the
 * same dense load and extract the appropriate slice of lanes. If it fails to
 * find any evidence, for loads from external buffers it does two overlapping
 * half-sized dense loads and shuffles out the desired lanes, and for loads from
 * internal allocations it adds padding to the allocation explicitly, by setting
 * the padding field on Allocate nodes.
 */
Stmt stage_strided_loads(const Stmt &s);

}  // namespace Internal
}  // namespace Halide

#endif
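
The stride-two trick described in the comment above can be sketched on plain arrays. This is only an illustration in standard C++, not Halide's implementation: the helper names (strided_load, dense_load, via_double_load_forwards, via_double_load_backwards, via_two_overlapping_loads) are hypothetical, and the exact placement of the two half-sized loads in the fallback is an assumption chosen so that no element outside the original strided footprint is read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference behaviour: gather buf[base + 2*i] for i in [0, lanes).
std::vector<int> strided_load(const std::vector<int> &buf, std::size_t base, std::size_t lanes) {
    std::vector<int> out(lanes);
    for (std::size_t i = 0; i < lanes; i++) {
        out[i] = buf.at(base + 2 * i);
    }
    return out;
}

// A contiguous load of `lanes` elements starting at `base`.
// at() throws if the load would leave the buffer.
std::vector<int> dense_load(const std::vector<int> &buf, std::size_t base, std::size_t lanes) {
    std::vector<int> out(lanes);
    for (std::size_t i = 0; i < lanes; i++) {
        out[i] = buf.at(base + i);
    }
    return out;
}

// Double-sized dense load at `base`, keeping the even lanes.
// Reads one element beyond the last element the strided load touches.
std::vector<int> via_double_load_forwards(const std::vector<int> &buf, std::size_t base, std::size_t lanes) {
    std::vector<int> wide = dense_load(buf, base, 2 * lanes);
    std::vector<int> out(lanes);
    for (std::size_t i = 0; i < lanes; i++) {
        out[i] = wide[2 * i];
    }
    return out;
}

// Double-sized dense load shifted back by one, keeping the odd lanes.
// Reads one element before the first element the strided load touches.
std::vector<int> via_double_load_backwards(const std::vector<int> &buf, std::size_t base, std::size_t lanes) {
    assert(base >= 1);
    std::vector<int> wide = dense_load(buf, base - 1, 2 * lanes);
    std::vector<int> out(lanes);
    for (std::size_t i = 0; i < lanes; i++) {
        out[i] = wide[2 * i + 1];
    }
    return out;
}

// Fallback sketch (assumed layout): two dense loads of `lanes` elements each,
// overlapping by one element, that together cover exactly
// [base, base + 2*lanes - 2]. Requires lanes >= 1.
std::vector<int> via_two_overlapping_loads(const std::vector<int> &buf, std::size_t base, std::size_t lanes) {
    assert(lanes >= 1);
    std::vector<int> lo = dense_load(buf, base, lanes);
    std::vector<int> hi = dense_load(buf, base + lanes - 1, lanes);
    std::vector<int> out(lanes);
    for (std::size_t i = 0; i < lanes; i++) {
        std::size_t idx = 2 * i;  // offset of lane i from `base`
        out[i] = (idx < lanes) ? lo[idx] : hi[idx - (lanes - 1)];
    }
    return out;
}
```

All three variants return the same lanes as strided_load when their reads stay in bounds. The difference is the memory footprint: the forwards and backwards variants each touch one extra element, which is why the pass looks for evidence that the extra read is safe, while the overlapping variant touches nothing extra at the cost of a second load.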