Halide
StageStridedLoads.h
#ifndef HALIDE_INTERNAL_STAGE_STRIDED_LOADS_H
#define HALIDE_INTERNAL_STAGE_STRIDED_LOADS_H

/** \file
 *
 * Defines the compiler pass that converts strided loads into dense loads
 * followed by shuffles.
 */

#include "Expr.h"

namespace Halide {
namespace Internal {

/** Convert all unpredicated strided loads in a Stmt into dense loads followed
 * by shuffles.
 *
 * For a stride of two, the trick is to do a dense load of twice the size, and
 * then extract either the even or odd lanes. This was previously done in
 * codegen, where it was challenging, because it's not easy to know there if
 * it's safe to do the double-sized load, as it either loads one element beyond
 * or before the original load. We used the alignment of the ramp base to try to
 * tell if it was safe to shift backwards, and we added padding to internal
 * allocations so that for those at least it was safe to shift
 * forwards. Unfortunately the alignment of the ramp base is usually unknown if
 * you don't know anything about the strides of the input, and adding padding to
 * allocations was a serious wart in our memory allocators.
 *
 * This pass instead actively looks for evidence elsewhere in the Stmt (at some
 * location which definitely executes whenever the load being transformed
 * executes) that it's safe to read further forwards or backwards in memory. The
 * evidence is in the form of a load at the same base address with a different
 * constant offset. It also clusters groups of these loads so that they do the
 * same dense load and extract the appropriate slice of lanes. If it fails to
 * find any evidence, for loads from external buffers it does two overlapping
 * half-sized dense loads and shuffles out the desired lanes, and for loads from
 * internal allocations it adds padding to the allocation explicitly, by setting
 * the padding field on Allocate nodes.
 */
Stmt stage_strided_loads(const Stmt &s);

}  // namespace Internal
}  // namespace Halide

#endif
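
The stride-two trick described in the comment above can be illustrated outside of Halide. The following is a minimal sketch on plain `std::vector` data, not the pass itself: `dense_load`, `slice_lanes`, and the three `strided_load_*` functions are hypothetical names introduced only for this illustration, and the layout of the two overlapping loads in the last function is one plausible reading of the comment (it assumes an even lane count), not a transcription of Halide's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not a Halide API): emulates a dense vector load of
// `lanes` consecutive elements starting at index `base`.
std::vector<int> dense_load(const std::vector<int> &buf, std::size_t base, std::size_t lanes) {
    assert(base + lanes <= buf.size());  // the whole dense load must be in bounds
    return std::vector<int>(buf.begin() + base, buf.begin() + base + lanes);
}

// Hypothetical helper: a shuffle keeping every `stride`-th lane from `start`.
std::vector<int> slice_lanes(const std::vector<int> &v, std::size_t start, std::size_t stride) {
    std::vector<int> out;
    for (std::size_t i = start; i < v.size(); i += stride) {
        out.push_back(v[i]);
    }
    return out;
}

// Stride-2 load of buf[base], buf[base + 2], ... done as one double-sized
// dense load plus extraction of the even lanes. Reads one element beyond the
// last element the strided load needs.
std::vector<int> strided_load_even(const std::vector<int> &buf, std::size_t base, std::size_t lanes) {
    return slice_lanes(dense_load(buf, base, 2 * lanes), 0, 2);
}

// Same result, but the dense load is shifted one element backwards and the
// odd lanes are extracted. Reads one element before `base` instead.
std::vector<int> strided_load_odd_shifted_back(const std::vector<int> &buf, std::size_t base, std::size_t lanes) {
    assert(base >= 1);
    return slice_lanes(dense_load(buf, base - 1, 2 * lanes), 1, 2);
}

// Assumed fallback layout: two overlapping dense loads of `lanes` elements
// each, which together touch nothing outside [base, base + 2 * lanes - 2].
std::vector<int> strided_load_overlapping(const std::vector<int> &buf, std::size_t base, std::size_t lanes) {
    assert(lanes >= 2 && lanes % 2 == 0);  // sketch only handles even lane counts
    std::vector<int> lo = dense_load(buf, base, lanes);              // offsets [0, lanes)
    std::vector<int> hi = dense_load(buf, base + lanes - 1, lanes);  // offsets [lanes - 1, 2 * lanes - 1)
    std::vector<int> out = slice_lanes(lo, 0, 2);                    // even offsets in the first half
    std::vector<int> tail = slice_lanes(hi, 1, 2);                   // even offsets in the second half
    out.insert(out.end(), tail.begin(), tail.end());
    return out;
}
```

With a ten-element buffer, a four-lane stride-2 load at base 3 needs indices 3, 5, 7 and 9. The even-lane variant would have to read index 10, which is out of bounds, while the shifted-back and overlapping variants stay inside the buffer. Knowing which direction is safe is exactly the information the pass looks for elsewhere in the Stmt.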