summaryrefslogtreecommitdiff
path: root/docs
diff options
context:
space:
mode:
authorAyal Zaks <ayal.zaks@intel.com>2017-05-29 15:36:23 +0000
committerAyal Zaks <ayal.zaks@intel.com>2017-05-29 15:36:23 +0000
commit98be03e2e2a261628e067a4f89fb26fd799f7f86 (patch)
tree92afa7e1a839f0b334687f5b0178da5e059d7770 /docs
parent107c8c925ef28a5b3a572a6898f8833241014faa (diff)
[Docs] Add VectorizationPlan to docs/Proposals.
Following the request made in https://reviews.llvm.org/D32871, the general documentation of the Vectorization Plan is hereby placed under docs/Proposals. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@304161 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs')
-rw-r--r--docs/Proposals/VectorizationPlan.rst182
-rw-r--r--docs/Vectorizers.rst11
-rw-r--r--docs/index.rst3
3 files changed, 196 insertions, 0 deletions
diff --git a/docs/Proposals/VectorizationPlan.rst b/docs/Proposals/VectorizationPlan.rst
new file mode 100644
index 00000000000..82ce4b2de17
--- /dev/null
+++ b/docs/Proposals/VectorizationPlan.rst
@@ -0,0 +1,182 @@
+==================
+Vectorization Plan
+==================
+
+.. contents::
+ :local:
+
+Abstract
+========
+The vectorization transformation can be rather complicated, involving several
+potential alternatives, especially for outer-loops [1]_ but also possibly for
+innermost loops. These alternatives may have significant performance impact,
+both positive and negative. A cost model is therefore employed to identify the
+best alternative, including the alternative of avoiding any transformation
+altogether.
+
+The Vectorization Plan is an explicit model for describing vectorization
+candidates. It serves for both optimizing candidates including estimating their
+cost reliably, and for performing their final translation into IR. This
+facilitates dealing with multiple vectorization candidates.
+
+High-level Design
+=================
+
+Vectorization Workflow
+----------------------
+VPlan-based vectorization involves three major steps, taking a "scenario-based
+approach" to vectorization planning:
+
+1. Legal Step: check if a loop can be legally vectorized; encode contraints and
+ artifacts if so.
+2. Plan Step:
+
+ a. Build initial VPlans following the constraints and decisions taken by
+ Legal Step 1, and compute their cost.
+ b. Apply optimizations to the VPlans, possibly forking additional VPlans.
+ Prune sub-optimal VPlans having relatively high cost.
+3. Execute Step: materialize the best VPlan. Note that this is the only step
+ that modifies the IR.
+
+Design Guidelines
+-----------------
+In what follows, the term "input IR" refers to code that is fed into the
+vectorizer whereas the term "output IR" refers to code that is generated by the
+vectorizer. The output IR contains code that has been vectorized or "widened"
+according to a loop Vectorization Factor (VF), and/or loop unroll-and-jammed
+according to an Unroll Factor (UF).
+The design of VPlan follows several high-level guidelines:
+
+1. Analysis-like: building and manipulating VPlans must not modify the input IR.
+ In particular, if the best option is not to vectorize at all, the
+ vectorization process terminates before reaching Step 3, and compilation
+ should proceed as if VPlans had not been built.
+
+2. Align Cost & Execute: each VPlan must support both estimating the cost and
+ generating the output IR code, such that the cost estimation evaluates the
+ to-be-generated code reliably.
+
+3. Support vectorizing additional constructs:
+
+ a. Outer-loop vectorization. In particular, VPlan must be able to model the
+ control-flow of the output IR which may include multiple basic-blocks and
+ nested loops.
+ b. SLP vectorization.
+ c. Combinations of the above, including nested vectorization: vectorizing
+ both an inner loop and an outer-loop at the same time (each with its own
+ VF and UF), mixed vectorization: vectorizing a loop with SLP patterns
+ inside [4]_, (re)vectorizing input IR containing vector code.
+ d. Function vectorization [2]_.
+
+4. Support multiple candidates efficiently. In particular, similar candidates
+ related to a range of possible VF's and UF's must be represented efficiently.
+ Potential versioning needs to be supported efficiently.
+
+5. Support vectorizing idioms, such as interleaved groups of strided loads or
+ stores. This is achieved by modeling a sequence of output instructions using
+ a "Recipe", which is responsible for computing its cost and generating its
+ code.
+
+6. Encapsulate Single-Entry Single-Exit regions (SESE). During vectorization
+ such regions may need to be, for example, predicated and linearized, or
+ replicated VF*UF times to handle scalarized and predicated instructions.
+ Innerloops are also modelled as SESE regions.
+
+Low-level Design
+================
+The low-level design of VPlan comprises of the following classes.
+
+:LoopVectorizationPlanner:
+ A LoopVectorizationPlanner is designed to handle the vectorization of a loop
+ or a loop nest. It can construct, optimize and discard one or more VPlans,
+ each VPlan modelling a distinct way to vectorize the loop or the loop nest.
+ Once the best VPlan is determined, including the best VF and UF, this VPlan
+ drives the generation of output IR.
+
+:VPlan:
+ A model of a vectorized candidate for a given input IR loop or loop nest. This
+ candidate is represented using a Hierarchical CFG. VPlan supports estimating
+ the cost and driving the generation of the output IR code it represents.
+
+:Hierarchical CFG:
+ A control-flow graph whose nodes are basic-blocks or Hierarchical CFG's. The
+ Hierarchical CFG data structure is similar to the Tile Tree [5]_, where
+ cross-Tile edges are lifted to connect Tiles instead of the original
+ basic-blocks as in Sharir [6]_, promoting the Tile encapsulation. The terms
+ Region and Block are used rather than Tile [5]_ to avoid confusion with loop
+ tiling.
+
+:VPBlockBase:
+ The building block of the Hierarchical CFG. A pure-virtual base-class of
+ VPBasicBlock and VPRegionBlock, see below. VPBlockBase models the hierarchical
+ control-flow relations with other VPBlocks. Note that in contrast to the IR
+ BasicBlock, a VPBlockBase models its control-flow successors and predecessors
+ directly, rather than through a Terminator branch or through predecessor
+ branches that "use" the VPBlockBase.
+
+:VPBasicBlock:
+ VPBasicBlock is a subclass of VPBlockBase, and serves as the leaves of the
+ Hierarchical CFG. It represents a sequence of output IR instructions that will
+ appear consecutively in an output IR basic-block. The instructions of this
+ basic-block originate from one or more VPBasicBlocks. VPBasicBlock holds a
+ sequence of zero or more VPRecipes that model the cost and generation of the
+ output IR instructions.
+
+:VPRegionBlock:
+ VPRegionBlock is a subclass of VPBlockBase. It models a collection of
+ VPBasicBlocks and VPRegionBlocks which form a SESE subgraph of the output IR
+ CFG. A VPRegionBlock may indicate that its contents are to be replicated a
+ constant number of times when output IR is generated, effectively representing
+ a loop with constant trip-count that will be completely unrolled. This is used
+ to support scalarized and predicated instructions with a single model for
+ multiple candidate VF's and UF's.
+
+:VPRecipeBase:
+ A pure-virtual base class modeling a sequence of one or more output IR
+ instructions, possibly based on one or more input IR instructions. These
+ input IR instructions are referred to as "Ingredients" of the Recipe. A Recipe
+ may specify how its ingredients are to be transformed to produce the output IR
+ instructions; e.g., cloned once, replicated multiple times or widened
+ according to selected VF.
+
+:VPTransformState:
+ Stores information used for generating output IR, passed from
+ LoopVectorizationPlanner to its selected VPlan for execution, and used to pass
+ additional information down to VPBlocks and VPRecipes.
+
+Related LLVM components
+-----------------------
+1. SLP Vectorizer: one can compare the VPlan model with LLVM's existing SLP
+ tree, where TSLP [3]_ adds Plan Step 2.b.
+
+2. RegionInfo: one can compare VPlan's H-CFG with the Region Analysis as used by
+ Polly [7]_.
+
+References
+----------
+.. [1] "Outer-loop vectorization: revisited for short SIMD architectures", Dorit
+ Nuzman and Ayal Zaks, PACT 2008.
+
+.. [2] "Proposal for function vectorization and loop vectorization with function
+ calls", Xinmin Tian, [`cfe-dev
+ <http://lists.llvm.org/pipermail/cfe-dev/2016-March/047732.html>`_].,
+ March 2, 2016.
+ See also `review <https://reviews.llvm.org/D22792>`_.
+
+.. [3] "Throttling Automatic Vectorization: When Less is More", Vasileios
+ Porpodas and Tim Jones, PACT 2015 and LLVM Developers' Meeting 2015.
+
+.. [4] "Exploiting mixed SIMD parallelism by reducing data reorganization
+ overhead", Hao Zhou and Jingling Xue, CGO 2016.
+
+.. [5] "Register Allocation via Hierarchical Graph Coloring", David Callahan and
+ Brian Koblenz, PLDI 1991
+
+.. [6] "Structural analysis: A new approach to flow analysis in optimizing
+ compilers", M. Sharir, Journal of Computer Languages, Jan. 1980
+
+.. [7] "Enabling Polyhedral Optimizations in LLVM", Tobias Grosser, Diploma
+ thesis, 2011.
+
+.. [8] "Introducing VPlan to the Loop Vectorizer", Gil Rapaport and Ayal Zaks,
+ European LLVM Developers' Meeting 2017.
diff --git a/docs/Vectorizers.rst b/docs/Vectorizers.rst
index a909d458c31..317271af403 100644
--- a/docs/Vectorizers.rst
+++ b/docs/Vectorizers.rst
@@ -382,6 +382,17 @@ And Linpack-pc with the same configuration. Result is Mflops, higher is better.
.. image:: linpack-pc.png
+Ongoing Development Directions
+------------------------------
+
+.. toctree::
+ :hidden:
+
+ Proposals/VectorizationPlan
+
+:doc:`Proposals/VectorizationPlan`
+ Modeling the process and upgrading the infrastructure of LLVM's Loop Vectorizer.
+
.. _slp-vectorizer:
The SLP Vectorizer
diff --git a/docs/index.rst b/docs/index.rst
index becbe48e7ec..220df1566bd 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -528,6 +528,7 @@ can be better.
CodeOfConduct
Proposals/GitHubMove
+ Proposals/VectorizationPlan
:doc:`CodeOfConduct`
Proposal to adopt a code of conduct on the LLVM social spaces (lists, events,
@@ -536,6 +537,8 @@ can be better.
:doc:`Proposals/GitHubMove`
Proposal to move from SVN/Git to GitHub.
+:doc:`Proposals/VectorizationPlan`
+ Proposal to model the process and upgrade the infrastructure of LLVM's Loop Vectorizer.
Indices and tables
==================