[AMDGPU] Update code object description

- Use ELF header flags to identify processor.
- Remove isa note record.
- Add target feature section.
- Make metadata for NumVGPRs, NumSGPRs and MaxFlatWorkGroupSize required.
- Add FixedWorkGroupSize to CodeProps metadata.
- Add ReqdWorkGroupSize* to kernel descriptor and move MaxFlatWorkGroupSize to be adjacent.
- Move IsXNACKEnabled in the kernel descriptor to be at the end of the unused flags.
- Remove IsDynamicCallStack from the metadata and kernel descriptor.
- Remove legacy debugger metadata.
- Remove old XNACK enabled processor names.

Differential Revision: https://reviews.llvm.org/D39828

git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@317855 91177308-0d34-0410-b5e6-96231b3b80d8
author: Tony Tye <Tony.Tye@amd.com> 2017-11-10 01:00:54 +0000
committer: Tony Tye <Tony.Tye@amd.com> 2017-11-10 01:00:54 +0000
commit: 9000e8c60d07871233823d8a76a67121cdbee9dd
tree: 9196616e8320b7aa0a570eb0aca5d46a1bd851f4 /docs
parent: 8887214dfa91e035edf1b1f884651c52b3bd54d4
1 files changed, 368 insertions, 295 deletions
diff --git a/docs/AMDGPUUsage.rst b/docs/AMDGPUUsage.rst
index 7062d75d92e..9ff266e2092 100644
--- a/docs/AMDGPUUsage.rst
+++ b/docs/AMDGPUUsage.rst
@@ -23,50 +23,55 @@ Target Triples
 Use the ``clang -target <Architecture>-<Vendor>-<OS>-<Environment>`` option to
 specify the target triple:
 
-  .. table:: AMDGPU Target Triples
-     :name: amdgpu-target-triples-table
-
-     ============ ======== ========= ===========
-     Architecture Vendor   OS        Environment
-     ============ ======== ========= ===========
-     r600         amd      <empty>   <empty>
-     amdgcn       amd      <empty>   <empty>
-     amdgcn       amd      amdhsa    <empty>
-     amdgcn       amd      amdhsa    opencl
-     amdgcn       amd      amdhsa    amdgizcl
-     amdgcn       amd      amdhsa    amdgiz
-     amdgcn       amd      amdhsa    hcc
-     ============ ======== ========= ===========
-
-``r600-amd--``
-  Supports AMD GPUs HD2XXX-HD6XXX for graphics and compute shaders executed on
-  the MESA runtime.
-
-``amdgcn-amd--``
-  Supports AMD GPUs GCN GFX6 onwards for graphics and compute shaders executed on
-  the MESA runtime.
-
-``amdgcn-amd-amdhsa-``
-  Supports AMD GCN GPUs GFX6 onwards for compute kernels executed on HSA [HSA]_
-  compatible runtimes such as AMD's ROCm [AMD-ROCm]_.
-
-``amdgcn-amd-amdhsa-opencl``
-  Supports AMD GCN GPUs GFX6 onwards for OpenCL compute kernels executed on HSA
-  [HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. See
-  :ref:`amdgpu-opencl`.
-
-``amdgcn-amd-amdhsa-amdgizcl``
-  Same as ``amdgcn-amd-amdhsa-opencl`` except a different address space mapping
-  is used (see :ref:`amdgpu-address-spaces`).
-
-``amdgcn-amd-amdhsa-amdgiz``
-  Same as ``amdgcn-amd-amdhsa-`` except a different address space mapping is
-  used (see :ref:`amdgpu-address-spaces`).
-
-``amdgcn-amd-amdhsa-hcc``
-  Supports AMD GCN GPUs GFX6 onwards for AMD HC language compute kernels
-  executed on HSA [HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. See
-  :ref:`amdgpu-hcc`.
+  .. table:: AMDGPU Architectures
+     :name: amdgpu-architecture-table
+
+     ============ ==============================================================
+     Architecture Description
+     ============ ==============================================================
+     ``r600``     AMD GPUs HD2XXX-HD6XXX for graphics and compute shaders.
+     ``amdgcn``   AMD GPUs GCN GFX6 onwards for graphics and compute shaders.
+     ============ ==============================================================
+
+  .. table:: AMDGPU Vendors
+     :name: amdgpu-vendor-table
+
+     ============ ==============================================================
+     Vendor       Description
+     ============ ==============================================================
+     ``amd``      Can be used for all AMD GPU usage.
+     ``mesa3d``   Can be used if the OS is ``mesa3d``.
+     ============ ==============================================================
+
+  .. table:: AMDGPU Operating Systems
+     :name: amdgpu-os-table
+
+     ============== ============================================================
+     OS             Description
+     ============== ============================================================
+     *<empty>*      Defaults to the *unknown* OS.
+     ``amdhsa``     Compute kernels executed on HSA [HSA]_ compatible runtimes
+                    such as AMD's ROCm [AMD-ROCm]_.
+     ``amdpal``     Graphics shaders and compute kernels executed on AMD PAL
+                    runtime.
+     ``mesa3d``     Graphics shaders and compute kernels executed on Mesa 3D
+                    runtime.
+     ============== ============================================================
+
+  .. table:: AMDGPU Environments
+     :name: amdgpu-environment-table
+
+     ============ ==============================================================
+     Environment  Description
+     ============ ==============================================================
+     *<empty>*    Defaults to ``opencl``.
+     ``opencl``   OpenCL compute kernel (see :ref:`amdgpu-opencl`).
+     ``amdgizcl`` Same as ``opencl`` except a different address space mapping is
+                  used (see :ref:`amdgpu-address-spaces`).
+     ``amdgiz``   Same as ``opencl`` except a different address space mapping is
+                  used (see :ref:`amdgpu-address-spaces`).
+     ``hcc``      AMD HC language compute kernel (see :ref:`amdgpu-hcc`).
+     ============ ==============================================================
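
The four tables above can be read as a small validation routine. The following is a minimal sketch, not how clang parses triples: ``parse_triple`` and ``Triple`` are hypothetical names, the accepted values are only those listed in the tables, and real triple handling is more lenient.

```python
from typing import NamedTuple

# Values copied from the architecture, vendor, OS and environment tables.
ARCHITECTURES = {"r600", "amdgcn"}
VENDORS = {"amd", "mesa3d"}
OSES = {"", "amdhsa", "amdpal", "mesa3d"}
ENVIRONMENTS = {"", "opencl", "amdgizcl", "amdgiz", "hcc"}


class Triple(NamedTuple):
    architecture: str
    vendor: str
    os: str
    environment: str


def parse_triple(text: str) -> Triple:
    """Split <Architecture>-<Vendor>-<OS>-<Environment> and apply defaults."""
    parts = text.split("-")
    if len(parts) > 4:
        raise ValueError(f"too many components in {text!r}")
    parts += [""] * (4 - len(parts))  # missing trailing components are empty
    arch, vendor, os_name, env = parts
    if arch not in ARCHITECTURES:
        raise ValueError(f"unknown architecture {arch!r}")
    if vendor not in VENDORS:
        raise ValueError(f"unknown vendor {vendor!r}")
    if os_name not in OSES:
        raise ValueError(f"unknown OS {os_name!r}")
    if env not in ENVIRONMENTS:
        raise ValueError(f"unknown environment {env!r}")
    # The mesa3d vendor is only listed for use with the mesa3d OS.
    if vendor == "mesa3d" and os_name != "mesa3d":
        raise ValueError("vendor mesa3d requires OS mesa3d")
    # Empty OS defaults to the unknown OS, empty environment to opencl.
    return Triple(arch, vendor, os_name or "unknown", env or "opencl")
```

For instance, ``parse_triple("amdgcn-amd-amdhsa")`` yields the ``opencl`` environment by default, and ``parse_triple("r600-amd--")`` yields the ``unknown`` OS.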
 
 .. _amdgpu-processors:
 
@@ -77,139 +82,179 @@ Use the ``clang -mcpu <Processor>`` option to specify the AMD GPU processor. The
 names from both the *Processor* and *Alternative Processor* can be used.
 
   .. table:: AMDGPU Processors
-     :name: amdgpu-processors-table
+     :name: amdgpu-processor-table
 
-     ========== =========== ============ ===== ======= ==================
-     Processor  Alternative Target       dGPU/ Runtime Example
-                Processor   Triple       APU   Support Products
-                            Architecture
-     ========== =========== ============ ===== ======= ==================
+     =========== =============== ============ ===== ======= ==================
+     Processor   Alternative     Target       dGPU/ ROCm    Example
+                 Processor       Triple       APU   Support Products
+                                 Architecture
+     =========== =============== ============ ===== ======= ==================
      **Radeon HD 2000/3000 Series (R600)** [AMD-RADEON-HD-2000-3000]_
-     --------------------------------------------------------------------
-     r600                   r600         dGPU
-     r630                   r600         dGPU
-     rs880                  r600         dGPU
-     rv670                  r600         dGPU
+     -------------------------------------------------------------------------
+     ``r600``                    ``r600``     dGPU
+     ``r630``                    ``r600``     dGPU
+     ``rs880``                   ``r600``     dGPU
+     ``rv670``                   ``r600``     dGPU
      **Radeon HD 4000 Series (R700)** [AMD-RADEON-HD-4000]_
-     --------------------------------------------------------------------
-     rv710                  r600         dGPU
-     rv730                  r600         dGPU
-     rv770                  r600         dGPU
+     -------------------------------------------------------------------------
+     ``rv710``                   ``r600``     dGPU
+     ``rv730``                   ``r600``     dGPU
+     ``rv770``                   ``r600``     dGPU
      **Radeon HD 5000 Series (Evergreen)** [AMD-RADEON-HD-5000]_
-     --------------------------------------------------------------------
-     cedar                  r600         dGPU
-     redwood                r600         dGPU
-     sumo                   r600         dGPU
-     juniper                r600         dGPU
-     cypress                r600         dGPU
+     -------------------------------------------------------------------------
+     ``cedar``                   ``r600``     dGPU
+     ``redwood``                 ``r600``     dGPU
+     ``sumo``                    ``r600``     dGPU
+     ``juniper``                 ``r600``     dGPU
+     ``cypress``                 ``r600``     dGPU
      **Radeon HD 6000 Series (Northern Islands)** [AMD-RADEON-HD-6000]_
-     --------------------------------------------------------------------
-     barts                  r600         dGPU
-     turks                  r600         dGPU
-     caicos                 r600         dGPU
-     cayman                 r600         dGPU
+     -------------------------------------------------------------------------
+     ``barts``                   ``r600``     dGPU
+     ``turks``                   ``r600``     dGPU
+     ``caicos``                  ``r600``     dGPU
+     ``cayman``                  ``r600``     dGPU
      **GCN GFX6 (Southern Islands (SI))** [AMD-GCN-GFX6]_
-     --------------------------------------------------------------------
-     gfx600     - tahiti    amdgcn       dGPU
-     gfx601     - pitcairn  amdgcn       dGPU
-                - verde
-                - oland
-                - hainan
+     -------------------------------------------------------------------------
+     ``gfx600``  - ``tahiti``    ``amdgcn``   dGPU
+     ``gfx601``  - ``pitcairn``  ``amdgcn``   dGPU
+                 - ``verde``
+                 - ``oland``
+                 - ``hainan``
      **GCN GFX7 (Sea Islands (CI))** [AMD-GCN-GFX7]_
-     --------------------------------------------------------------------
-     gfx700     - bonaire   amdgcn       dGPU          - Radeon HD 7790
-                                                       - Radeon HD 8770
-                                                       - R7 260
-                                                       - R7 260X
-     \          - kaveri    amdgcn       APU           - A6-7000
-                                                       - A6 Pro-7050B
-                                                       - A8-7100
-                                                       - A8 Pro-7150B
-                                                       - A10-7300
-                                                       - A10 Pro-7350B
-                                                       - FX-7500
-                                                       - A8-7200P
-                                                       - A10-7400P
-                                                       - FX-7600P
-     gfx701     - hawaii    amdgcn       dGPU  ROCm    - FirePro W8100
-                                                       - FirePro W9100
-                                                       - FirePro S9150
-                                                       - FirePro S9170
-     gfx702                              dGPU  ROCm    - Radeon R9 290
-                                                       - Radeon R9 290x
-                                                       - Radeon R390
-                                                       - Radeon R390x
-     gfx703     - kabini    amdgcn       APU           - E1-2100
-                - mullins                              - E1-2200
-                                                       - E1-2500
-                                                       - E2-3000
-                                                       - E2-3800
-                                                       - A4-5000
-                                                       - A4-5100
-                                                       - A6-5200
-                                                       - A4 Pro-3340B
+     -------------------------------------------------------------------------
+     ``gfx700``  - ``bonaire``   ``amdgcn``   dGPU          - Radeon HD 7790
+                                                            - Radeon HD 8770
+                                                            - R7 260
+                                                            - R7 260X
+     \           - ``kaveri``    ``amdgcn``   APU           - A6-7000
+                                                            - A6 Pro-7050B
+                                                            - A8-7100
+                                                            - A8 Pro-7150B
+                                                            - A10-7300
+                                                            - A10 Pro-7350B
+                                                            - FX-7500
+                                                            - A8-7200P
+                                                            - A10-7400P
+                                                            - FX-7600P
+     ``gfx701``  - ``hawaii``    ``amdgcn``   dGPU  ROCm    - FirePro W8100
+                                                            - FirePro W9100
+                                                            - FirePro S9150
+                                                            - FirePro S9170
+     ``gfx702``                  ``amdgcn``   dGPU  ROCm    - Radeon R9 290
+                                                            - Radeon R9 290x
+                                                            - Radeon R390
+                                                            - Radeon R390x
+     ``gfx703``  - ``kabini``    ``amdgcn``   APU           - E1-2100
+                 - ``mullins``                              - E1-2200
+                                                            - E1-2500
+                                                            - E2-3000
+                                                            - E2-3800
+                                                            - A4-5000
+                                                            - A4-5100
+                                                            - A6-5200
+                                                            - A4 Pro-3340B
      **GCN GFX8 (Volcanic Islands (VI))** [AMD-GCN-GFX8]_
-     --------------------------------------------------------------------
-     gfx800     - iceland   amdgcn       dGPU          - FirePro S7150
-                                                       - FirePro S7100
-                                                       - FirePro W7100
-                                                       - Radeon R285
-                                                       - Radeon R9 380
-                                                       - Radeon R9 385
-                                                       - Mobile FirePro
-                                                         M7170
-     gfx801     - carrizo   amdgcn       APU           - A6-8500P
-                                                       - Pro A6-8500B
-                                                       - A8-8600P
-                                                       - Pro A8-8600B
-                                                       - FX-8800P
-                                                       - Pro A12-8800B
-     \                      amdgcn       APU   ROCm    - A10-8700P
-                                                       - Pro A10-8700B
-                                                       - A10-8780P
-     \                      amdgcn       APU           - A10-9600P
-                                                       - A10-9630P
-                                                       - A12-9700P
-                                                       - A12-9730P
-                                                       - FX-9800P
-                                                       - FX-9830P
-     \                      amdgcn       APU           - E2-9010
-                                                       - A6-9210
-                                                       - A9-9410
-     gfx802     - tonga     amdgcn       dGPU  ROCm    Same as gfx800
-     gfx803     - fiji      amdgcn       dGPU  ROCm    - Radeon R9 Nano
-                                                       - Radeon R9 Fury
-                                                       - Radeon R9 FuryX
-                                                       - Radeon Pro Duo
-                                                       - FirePro S9300x2
-                                                       - Radeon Instinct MI8
-     \          - polaris10 amdgcn       dGPU  ROCm    - Radeon RX 470
-                                                       - Radeon RX 480
-                                                       - Radeon Instinct MI6
-     \          - polaris11 amdgcn       dGPU  ROCm    - Radeon RX 460
-     gfx804                 amdgcn       dGPU          Same as gfx803
-     gfx810     - stoney    amdgcn       APU
+     -------------------------------------------------------------------------
+     ``gfx800``  - ``iceland``   ``amdgcn``   dGPU          - FirePro S7150
+                                                            - FirePro S7100
+                                                            - FirePro W7100
+                                                            - Radeon R285
+                                                            - Radeon R9 380
+                                                            - Radeon R9 385
+                                                            - Mobile FirePro
+                                                              M7170
+     ``gfx801``  - ``carrizo``   ``amdgcn``   APU           - A6-8500P
+                                                            - Pro A6-8500B
+                                                            - A8-8600P
+                                                            - Pro A8-8600B
+                                                            - FX-8800P
+                                                            - Pro A12-8800B
+     \                           ``amdgcn``   APU   ROCm    - A10-8700P
+                                                            - Pro A10-8700B
+                                                            - A10-8780P
+     \                           ``amdgcn``   APU           - A10-9600P
+                                                            - A10-9630P
+                                                            - A12-9700P
+                                                            - A12-9730P
+                                                            - FX-9800P
+                                                            - FX-9830P
+     \                           ``amdgcn``   APU           - E2-9010
+                                                            - A6-9210
+                                                            - A9-9410
+     ``gfx802``  - ``tonga``     ``amdgcn``   dGPU  ROCm    Same as gfx800
+     ``gfx803``  - ``fiji``      ``amdgcn``   dGPU  ROCm    - Radeon R9 Nano
+                                                            - Radeon R9 Fury
+                                                            - Radeon R9 FuryX
+                                                            - Radeon Pro Duo
+                                                            - FirePro S9300x2
+                                                            - Radeon Instinct MI8
+     \           - ``polaris10`` ``amdgcn``   dGPU  ROCm    - Radeon RX 470
+                                                            - Radeon RX 480
+                                                            - Radeon Instinct MI6
+     \           - ``polaris11`` ``amdgcn``   dGPU  ROCm    - Radeon RX 460
+     ``gfx810``  - ``stoney``    ``amdgcn``   APU
      **GCN GFX9** [AMD-GCN-GFX9]_
-     --------------------------------------------------------------------
-     gfx900                 amdgcn       dGPU          - Radeon Vega
-                                                         Frontier Edition
-                                                       - Radeon RX Vega 56
-                                                       - Radeon RX Vega 64
-                                                       - Radeon RX Vega 64
-                                                         Liquid
-                                                       - Radeon Instinct MI25
-     gfx901                 amdgcn       dGPU  ROCm    Same as gfx900
-                                                       except XNACK is
-                                                       enabled
-     gfx902                 amdgcn       APU           *TBA*
-
-                                                       .. TODO
-                                                          Add product
-                                                          names.
-     gfx903                 amdgcn       APU           Same as gfx902
-                                                       except XNACK is
-                                                       enabled
-     ========== =========== ============ ===== ======= ==================
+     -------------------------------------------------------------------------
+     ``gfx900``                  ``amdgcn``   dGPU  ROCm    - Radeon Vega
+                                                              Frontier Edition
+                                                            - Radeon RX Vega 56
+                                                            - Radeon RX Vega 64
+                                                            - Radeon RX Vega 64
+                                                              Liquid
+                                                            - Radeon Instinct MI25
+     ``gfx902``                  ``amdgcn``   APU           *TBA*
+
+                                                            .. TODO
+                                                               Add product
+                                                               names.
+     =========== =============== ============ ===== ======= ==================
+
+.. _amdgpu-target-features:
+
+Target Features
+---------------
+
+Target features control how code is generated to support certain
+features. Not all target features are supported by all processors. The
+runtime must ensure that the features supported by the device used to
+execute the code match the features enabled when generating the
+code. A mismatch of features may result in incorrect execution, or a
+reduction in performance.
+
+Use the ``clang -m[no-]<TargetFeature>`` option to specify the AMD GPU
+target features.
+
+For example:
+
+``-mxnack``
+  Enable the *XNACK* feature.
+``-mno-xnack``
+  Disable the *XNACK* feature.
+
+  .. table:: AMDGPU Target Features
+     :name: amdgpu-target-feature-table
+
+     ============== ======== ==================================================
+     Target Feature Default  Description
+     ============== ======== ==================================================
+     -m[no-]xnack   disabled Enable/disable generating code that has
+                             memory clauses that are compatible with
+                             having XNACK replay enabled.
+
+                             This is used for demand paging and page
+                             migration. If XNACK replay is enabled in
+                             the device, code generated without the
+                             XNACK feature may execute incorrectly
+                             when a page fault occurs. Code generated
+                             with the feature enabled executes
+                             correctly on a device that does not have
+                             XNACK replay enabled, but may be less
+                             performant than code with the feature
+                             disabled.
+
+                             This feature is supported by the
+                             ``amdgcn`` architecture for GFX7-GFX9.
+     ============== ======== ==================================================
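
As an illustration of how the ``-m[no-]xnack`` option combines with the processor option, here is a minimal sketch that builds a clang argument list. ``clang_args`` is a hypothetical helper, the default triple is an assumption, and for simplicity it only recognizes ``gfxNNN`` processor names, not the alternative names.

```python
import re
from typing import List, Optional


def clang_args(processor: str, xnack: Optional[bool] = None,
               triple: str = "amdgcn-amd-amdhsa") -> List[str]:
    """Build a clang argument list; xnack=None leaves the default (disabled)."""
    args = ["clang", "-target", triple, f"-mcpu={processor}"]
    if xnack is not None:
        # The feature table lists XNACK for amdgcn GFX7-GFX9 only. This
        # sketch approximates that by matching gfx7xx, gfx8xx and gfx9xx.
        if not re.fullmatch(r"gfx[789]\d\d", processor):
            raise ValueError(f"xnack is not listed for {processor!r}")
        args.append("-mxnack" if xnack else "-mno-xnack")
    return args
```

Calling ``clang_args("gfx900", xnack=True)`` produces an argument list ending in ``-mxnack``, while omitting ``xnack`` emits neither flag.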
 
 .. _amdgpu-address-spaces:
 
@@ -268,14 +313,14 @@ The memory model supported is based on the HSA memory model [HSA]_ which is
 based in turn on HRF-indirect with scope inclusion [HRF]_. The happens-before
 relation is transitive over the synchronizes-with relation independent of scope,
 and synchronizes-with allows the memory scope instances to be inclusive (see
-table :ref:`amdgpu-amdhsa-llvm-sync-scopes-amdhsa-table`).
+table :ref:`amdgpu-amdhsa-llvm-sync-scopes-table`).
 
 This is different to the OpenCL [OpenCL]_ memory model which does not have scope
 inclusion and requires the memory scopes to exactly match. However, this
 is conservatively correct for OpenCL.
 
-  .. table:: AMDHSA LLVM Sync Scopes for AMDHSA
-     :name: amdgpu-amdhsa-llvm-sync-scopes-amdhsa-table
+  .. table:: AMDHSA LLVM Sync Scopes
+     :name: amdgpu-amdhsa-llvm-sync-scopes-table
 
      ================ ==========================================================
      LLVM Sync Scope  Description
@@ -364,16 +409,18 @@ The AMDGPU backend uses the following ELF header:
      ========================== ===============================
      ``e_ident[EI_CLASS]``      ``ELFCLASS64``
      ``e_ident[EI_DATA]``       ``ELFDATA2LSB``
-     ``e_ident[EI_OSABI]``      ``ELFOSABI_AMDGPU_HSA``,
-                                ``ELFOSABI_AMDGPU_PAL`` or
-                                ``ELFOSABI_AMDGPU_MESA3D``
-     ``e_ident[EI_ABIVERSION]`` ``ELFABIVERSION_AMDGPU_HSA``,
-                                ``ELFABIVERSION_AMDGPU_PAL`` or
-                                ``ELFABIVERSION_AMDGPU_MESA3D``
-     ``e_type``                 ``ET_REL`` or ``ET_DYN``
+     ``e_ident[EI_OSABI]``      - ``ELFOSABI_NONE``
+                                - ``ELFOSABI_AMDGPU_HSA``
+                                - ``ELFOSABI_AMDGPU_PAL``
+                                - ``ELFOSABI_AMDGPU_MESA3D``
+     ``e_ident[EI_ABIVERSION]`` - ``ELFABIVERSION_AMDGPU_HSA``
+                                - ``ELFABIVERSION_AMDGPU_PAL``
+                                - ``ELFABIVERSION_AMDGPU_MESA3D``
+     ``e_type``                 - ``ET_REL``
+                                - ``ET_DYN``
      ``e_machine``              ``EM_AMDGPU``
      ``e_entry``                0
-     ``e_flags``                0
+     ``e_flags``                See :ref:`amdgpu-elf-header-e_flags-table`
      ========================== ===============================
 
 ..
@@ -385,6 +432,7 @@ The AMDGPU backend uses the following ELF header:
      Name                            Value
      =============================== =====
      ``EM_AMDGPU``                   224
+     ``ELFOSABI_NONE``               0
      ``ELFOSABI_AMDGPU_HSA``         64
      ``ELFOSABI_AMDGPU_PAL``         65
      ``ELFOSABI_AMDGPU_MESA3D``      66
@@ -394,23 +442,27 @@ The AMDGPU backend uses the following ELF header:
      =============================== =====
 
 ``e_ident[EI_CLASS]``
-  The ELF class is always ``ELFCLASS64``. The AMDGPU backend only supports 64
-  bit applications.
+  The ELF class is:
+
+  * ``ELFCLASS32`` for the ``r600`` architecture.
+
+  * ``ELFCLASS64`` for the ``amdgcn`` architecture, which only supports
+    64-bit applications.
 
 ``e_ident[EI_DATA]``
-  All AMDGPU targets use ELFDATA2LSB for little-endian byte ordering.
+  All AMDGPU targets use ``ELFDATA2LSB`` for little-endian byte ordering.
 
 ``e_ident[EI_OSABI]``
-  One of the following AMD GPU architecture specific OS ABIs:
+  One of the following AMD GPU architecture specific OS ABIs
+  (see :ref:`amdgpu-os-table`):
 
-  * ``ELFOSABI_AMDGPU_HSA`` is used to specify that the code object conforms to
-    the AMD HSA runtime ABI [HSA]_.
+  * ``ELFOSABI_NONE`` for *unknown* OS.
 
-  * ``ELFOSABI_AMDGPU_PAL`` is used to specify that the code object conforms to
-    the AMD PAL runtime ABI.
+  * ``ELFOSABI_AMDGPU_HSA`` for ``amdhsa`` OS.
 
-  * ``ELFOSABI_AMDGPU_MESA3D`` is used to specify that the code object conforms
-    to the AMD MESA runtime ABI.
+  * ``ELFOSABI_AMDGPU_PAL`` for ``amdpal`` OS.
+
+  * ``ELFOSABI_AMDGPU_MESA3D`` for ``mesa3d`` OS.
 
 ``e_ident[EI_ABIVERSION]``
   The ABI version of the AMD GPU architecture specific OS ABI to which the code
@@ -423,7 +475,7 @@ The AMDGPU backend uses the following ELF header:
     runtime ABI.
 
   * ``ELFABIVERSION_AMDGPU_MESA3D`` is used to specify the version of AMD MESA
-    runtime ABI.
+    3D runtime ABI.
 
 ``e_type``
   Can be one of the following values:
@@ -439,17 +491,74 @@ The AMDGPU backend uses the following ELF header:
   The AMD HSA runtime loader requires an ``ET_DYN`` code object.
 
 ``e_machine``
-  The value ``EM_AMDGPU`` is used for the machine for all members of the AMD GPU
-  architecture family. The specific member is specified in the
-  ``NT_AMD_AMDGPU_ISA`` entry in the ``.note`` section (see
-  :ref:`amdgpu-note-records`).
+  The value ``EM_AMDGPU`` is used for the machine for all processors supported
+  by the ``r600`` and ``amdgcn`` architectures (see
+  :ref:`amdgpu-processor-table`). The specific processor is specified in the
+  ``EF_AMDGPU_MACH`` bit field of the ``e_flags`` (see
+  :ref:`amdgpu-elf-header-e_flags-table`).
 
 ``e_entry``
   The entry point is 0 as the entry points for individual kernels must be
   selected in order to invoke them through AQL packets.
 
 ``e_flags``
-  The value is 0 as no flags are used.
+  The AMDGPU backend uses the following ELF header flags:
+
+  .. table:: AMDGPU ELF Header ``e_flags``
+     :name: amdgpu-elf-header-e_flags-table
+
+     ================================= ========== =============================
+     Name                              Value      Description
+     ================================= ========== =============================
+     **AMDGPU Processor Flag**                    See :ref:`amdgpu-processor-table`.
+     -------------------------------------------- -----------------------------
+     ``EF_AMDGPU_MACH``                0x000000ff AMDGPU processor selection
+                                                  mask for
+                                                  ``EF_AMDGPU_MACH_xxx`` values
+                                                  defined in
+                                                  :ref:`amdgpu-ef-amdgpu-mach-table`.
+     ================================= ========== =============================
+
+  .. table:: AMDGPU ``EF_AMDGPU_MACH`` Values
+     :name: amdgpu-ef-amdgpu-mach-table
+
+     ================================= ========== =============================
+     Name                              Value      Description (see
+                                                  :ref:`amdgpu-processor-table`)
+     ================================= ========== =============================
+     ``EF_AMDGPU_MACH_NONE``           0          *not specified*
+     ``EF_AMDGPU_MACH_R600_R600``      1          ``r600``
+     ``EF_AMDGPU_MACH_R600_R630``      2          ``r630``
+     ``EF_AMDGPU_MACH_R600_RS880``     3          ``rs880``
+     ``EF_AMDGPU_MACH_R600_RV670``     4          ``rv670``
+     ``EF_AMDGPU_MACH_R600_RV710``     5          ``rv710``
+     ``EF_AMDGPU_MACH_R600_RV730``     6          ``rv730``
+     ``EF_AMDGPU_MACH_R600_RV770``     7          ``rv770``
+     ``EF_AMDGPU_MACH_R600_CEDAR``     8          ``cedar``
+     ``EF_AMDGPU_MACH_R600_REDWOOD``   9          ``redwood``
+     ``EF_AMDGPU_MACH_R600_SUMO``      10         ``sumo``
+     ``EF_AMDGPU_MACH_R600_JUNIPER``   11         ``juniper``
+     ``EF_AMDGPU_MACH_R600_CYPRESS``   12         ``cypress``
+     ``EF_AMDGPU_MACH_R600_BARTS``     13         ``barts``
+     ``EF_AMDGPU_MACH_R600_TURKS``     14         ``turks``
+     ``EF_AMDGPU_MACH_R600_CAICOS``    15         ``caicos``
+     ``EF_AMDGPU_MACH_R600_CAYMAN``    16         ``cayman``
+     *reserved*                        17-31      Reserved for ``r600``
+                                                  architecture processors.
+     ``EF_AMDGPU_MACH_AMDGCN_GFX600``  32         ``gfx600``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX601``  33         ``gfx601``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX700``  34         ``gfx700``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX701``  35         ``gfx701``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX702``  36         ``gfx702``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX703``  37         ``gfx703``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX800``  38         ``gfx800``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX801``  39         ``gfx801``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX802``  40         ``gfx802``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX803``  41         ``gfx803``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX810``  42         ``gfx810``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX900``  43         ``gfx900``
+     ``EF_AMDGPU_MACH_AMDGCN_GFX902``  44         ``gfx902``
+     ================================= ========== =============================
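
Reviewer sketch (not part of the patch): the table above implies that a consumer can recover the processor by masking the ELF header ``e_flags`` with ``EF_AMDGPU_MACH``. The snippet below is a minimal illustration of that lookup. The helper name ``processor_from_e_flags`` is hypothetical, and rejecting reserved values with an exception is an assumption rather than something the document specifies.

```python
# Mask and values copied from the EF_AMDGPU_MACH tables in this patch.
EF_AMDGPU_MACH = 0x000000FF

R600_NAMES = ["r600", "r630", "rs880", "rv670", "rv710", "rv730", "rv770",
              "cedar", "redwood", "sumo", "juniper", "cypress", "barts",
              "turks", "caicos", "cayman"]            # values 1..16
AMDGCN_NAMES = ["gfx600", "gfx601", "gfx700", "gfx701", "gfx702", "gfx703",
                "gfx800", "gfx801", "gfx802", "gfx803", "gfx810", "gfx900",
                "gfx902"]                             # values 32..44

MACH_NAMES = {0: None}  # EF_AMDGPU_MACH_NONE: processor not specified
MACH_NAMES.update({1 + i: name for i, name in enumerate(R600_NAMES)})
MACH_NAMES.update({32 + i: name for i, name in enumerate(AMDGCN_NAMES)})


def processor_from_e_flags(e_flags):
    """Return the processor name selected by e_flags, or None if unspecified.

    Hypothetical helper: raising on unknown values is an illustrative choice.
    """
    mach = e_flags & EF_AMDGPU_MACH
    if mach not in MACH_NAMES:
        raise ValueError(f"reserved or unknown EF_AMDGPU_MACH value {mach}")
    return MACH_NAMES[mach]
```

Bits outside the mask are ignored by this lookup, so other flags can share ``e_flags`` without affecting processor selection.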
 
 Sections
 --------
@@ -515,11 +624,11 @@ if needed.
 Note Records
 ------------
 
-As required by ``ELFCLASS64``, minimal zero byte padding must be generated after
-the ``name`` field to ensure the ``desc`` field is 4 byte aligned. In addition,
-minimal zero byte padding must be generated to ensure the ``desc`` field size is
-a multiple of 4 bytes. The ``sh_addralign`` field of the ``.note`` section must
-be at least 4 to indicate at least 8 byte alignment.
+As required by ``ELFCLASS32`` and ``ELFCLASS64``, minimal zero byte padding must
+be generated after the ``name`` field to ensure the ``desc`` field is 4 byte
+aligned. In addition, minimal zero byte padding must be generated to ensure the
+``desc`` field size is a multiple of 4 bytes. The ``sh_addralign`` field of the
+``.note`` section must be at least 4 to indicate at least 4 byte alignment.
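
Reviewer sketch (not part of the patch): the padding rules in the paragraph above can be illustrated by packing a single note record. This assumes the conventional ELF note header of three 32-bit words (``namesz``, ``descsz``, ``type``) and little-endian byte order, neither of which is stated in this section. ``build_note`` and ``pad4`` are hypothetical helper names.

```python
import struct

NT_AMD_AMDGPU_HSA_METADATA = 10  # value from the note record type table


def pad4(data: bytes) -> bytes:
    """Append the minimal number of zero bytes to reach a multiple of 4."""
    return data + b"\0" * (-len(data) % 4)


def build_note(name: str, note_type: int, desc: bytes) -> bytes:
    """Pack one note record (assumed layout: three little-endian u32 words)."""
    name_bytes = name.encode("ascii") + b"\0"  # namesz counts the terminator
    header = struct.pack("<III", len(name_bytes), len(desc), note_type)
    # Padding after name keeps desc 4 byte aligned; padding after desc keeps
    # the following record 4 byte aligned.
    return header + pad4(name_bytes) + pad4(desc)


# Hypothetical metadata payload, null terminated as the note table requires.
note = build_note("AMD", NT_AMD_AMDGPU_HSA_METADATA, b"---\n...\n\0")
```

The size fields in the header hold the unpadded lengths, and the padding bytes are only present in the stored record.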
 
 The AMDGPU backend code object uses the following ELF note records in the
 ``.note`` section. The *Description* column specifies the layout of the note
@@ -537,7 +646,6 @@ Additional note records can be present.
      Name  Type                           Description
      ===== ============================== ======================================
      "AMD" ``NT_AMD_AMDGPU_HSA_METADATA`` <metadata null terminated string>
-     "AMD" ``NT_AMD_AMDGPU_ISA``          <isa name null terminated string>
      ===== ============================== ======================================
 
 ..
@@ -550,49 +658,9 @@ Additional note records can be present.
      ============================== =====
      *reserved*                       0-9
      ``NT_AMD_AMDGPU_HSA_METADATA``    10
-     ``NT_AMD_AMDGPU_ISA``             11
+     *reserved*                        11
      ============================== =====
 
-``NT_AMD_AMDGPU_ISA``
-  Specifies the instruction set architecture used by the machine code contained
-  in the code object.
-
-  This note record is required for code objects containing machine code for
-  processors matching the ``amdgcn`` architecture in table
-  :ref:`amdgpu-processors`.
-
-  The null terminated string has the following syntax:
-
-    *architecture*\ ``-``\ *vendor*\ ``-``\ *os*\ ``-``\ *environment*\ ``-``\ *processor*
-
-  where:
-
-    *architecture*
-      The architecture from table :ref:`amdgpu-target-triples-table`.
-
-      This is always ``amdgcn`` when the target triple OS is ``amdhsa`` (see
-      :ref:`amdgpu-target-triples`).
-
-    *vendor*
-      The vendor from table :ref:`amdgpu-target-triples-table`.
-
-      For the AMDGPU backend this is always ``amd``.
-
-    *os*
-      The OS from table :ref:`amdgpu-target-triples-table`.
-
-    *environment*
-      An environment from table :ref:`amdgpu-target-triples-table`, or blank if
-      the environment has no affect on the execution of the code object.
-
-      For the AMDGPU backend this is currently always blank.
-    *processor*
-      The processor from table :ref:`amdgpu-processors-table`.
-
-  For example:
-
-    ``amdgcn-amd-amdhsa--gfx901``
-
 ``NT_AMD_AMDGPU_HSA_METADATA``
   Specifies extensible metadata associated with the code objects executed on HSA
   [HSA]_ compatible runtimes such as AMD's ROCm [AMD-ROCm]_. It is required when
@@ -904,10 +972,6 @@ non-AMD key names should be prefixed by "*vendor-name*.".
                                                 the kernel code. See
                                                 :ref:`amdgpu-amdhsa-code-object-kernel-code-properties-metadata-mapping-table`
                                                 for the mapping definition.
-     "DebugProps"      mapping                  Mapping of properties related to
-                                                the kernel debugging. See
-                                                :ref:`amdgpu-amdhsa-code-object-kernel-debug-properties-metadata-mapping-table`
-                                                for the mapping definition.
      ================= ============== ========= ================================
 
 ..
@@ -1175,9 +1239,9 @@ non-AMD key names should be prefixed by "*vendor-name*.".
                                                            private address space
                                                            memory required for a
                                                            work-item in
-                                                           bytes. If
-                                                           IsDynamicCallstack
-                                                           is 1 then additional
+                                                           bytes. If the kernel
+                                                           uses a dynamic call
+                                                           stack then additional
                                                            space must be added
                                                            to this value for the
                                                            call stack.
@@ -1188,7 +1252,7 @@ non-AMD key names should be prefixed by "*vendor-name*.".
                                                            be a power of 2.
      "WavefrontSize"              integer        Required  Wavefront size. Must
                                                            be a power of 2.
-     "NumSGPRs"                   integer                  Number of scalar
+     "NumSGPRs"                   integer        Required  Number of scalar
                                                            registers used by a
                                                            wavefront for
                                                            GFX6-GFX9. This
@@ -1204,44 +1268,37 @@ non-AMD key names should be prefixed by "*vendor-name*.".
                                                            rounded up to the
                                                            allocation
                                                            granularity.
-     "NumVGPRs"                   integer                  Number of vector
+     "NumVGPRs"                   integer        Required  Number of vector
                                                            registers used by
                                                            each work-item for
                                                            GFX6-GFX9
-     "MaxFlatWorkGroupSize"       integer                  Maximum flat
+     "MaxFlatWorkGroupSize"       integer        Required  Maximum flat
                                                            work-group size
                                                            supported by the
                                                            kernel in work-items.
-     "IsDynamicCallStack"         boolean                  Indicates if the
-                                                           generated machine
-                                                           code is using a
-                                                           dynamically sized
-                                                           call stack.
+                                                           Must be >=1 and
+                                                           consistent with any
+                                                           non-0 values in
+                                                           FixedWorkGroupSize.
+     "FixedWorkGroupSize"         sequence of              Corresponds to the
+                                  3 integers               dispatch work-group
+                                                           size X, Y, Z. If
+                                                           omitted, defaults to
+                                                           0, 0, 0. If an
+                                                           element is non-0 then
+                                                           the kernel must only
+                                                           be launched with a
+                                                           matching work-group
+                                                           size in that dimension.
      "IsXNACKEnabled"             boolean                  Indicates if the
                                                            generated machine
                                                            code is capable of
-                                                           supporting XNACK.
+                                                           supporting XNACK. See
+                                                           :ref:`amdgpu-target-features`.
      ============================ ============== ========= =====================
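
Reviewer sketch (not part of the patch): a minimal check of the work-group constraints described for "MaxFlatWorkGroupSize" and "FixedWorkGroupSize". The table does not define "consistent"; the check below assumes one plausible reading, namely that the product of the non-0 fixed dimensions must not exceed the maximum flat size. Both function names are hypothetical.

```python
def check_work_group_props(code_props):
    """Raise ValueError if the work-group metadata is inconsistent."""
    max_flat = code_props["MaxFlatWorkGroupSize"]            # required key
    fixed = code_props.get("FixedWorkGroupSize", [0, 0, 0])  # default 0, 0, 0
    if max_flat < 1:
        raise ValueError("MaxFlatWorkGroupSize must be >= 1")
    if len(fixed) != 3:
        raise ValueError("FixedWorkGroupSize must have 3 elements")
    product = 1
    for size in fixed:
        if size != 0:
            product *= size
    # Assumed meaning of "consistent": the fixed dimensions must fit within
    # the maximum flat work-group size.
    if product > max_flat:
        raise ValueError("FixedWorkGroupSize exceeds MaxFlatWorkGroupSize")


def launch_matches(fixed, work_group_size):
    """True if every non-0 fixed element equals the launch size element."""
    return all(f == 0 or f == s for f, s in zip(fixed, work_group_size))
```

An element of 0 leaves that dimension unconstrained, so ``[64, 0, 1]`` fixes X and Z but accepts any Y.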
 
 ..
 
-  .. table:: AMDHSA Code Object Kernel Debug Properties Metadata Mapping
-     :name: amdgpu-amdhsa-code-object-kernel-debug-properties-metadata-mapping-table
-
-     =================================== ============== ========= ==============
-     String Key                          Value Type     Required? Description
-     =================================== ============== ========= ==============
-     "DebuggerABIVersion"                sequence of
-                                         2 integers
-     "ReservedNumVGPRs"                  integer
-     "ReservedFirstVGPR"                 integer
-     "PrivateSegmentBufferSGPR"          integer
-     "WavefrontPrivateSegmentOffsetSGPR" integer
-     =================================== ============== ========= ==============
-
-.. TODO
-   Plan to remove the debug properties metadata.
-
 Kernel Dispatch
 ~~~~~~~~~~~~~~~
 
@@ -1451,17 +1508,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                      then additional space must
                                                      be added to this value for
                                                      the call stack.
-     95:64   4 bytes MaxFlatWorkGroupSize            Maximum flat work-group
-                                                     size supported by the
-                                                     kernel in work-items.
-     96      1 bit   IsDynamicCallStack              Indicates if the generated
-                                                     machine code is using a
-                                                     dynamically sized call
-                                                     stack.
-     97      1 bit   IsXNACKEnabled                  Indicates if the generated
-                                                     machine code is capable of
-                                                     suppoting XNACK.
-     127:98  30 bits                                 Reserved, must be 0.
+     127:64  8 bytes                                 Reserved, must be 0.
      191:128 8 bytes KernelCodeEntryByteOffset       Byte offset (possibly
                                                      negative) from base
                                                      address of kernel
@@ -1469,7 +1516,30 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                      entry point instruction
                                                      which must be 256 byte
                                                      aligned.
-     383:192 24                                      Reserved, must be 0.
+     223:192 4 bytes MaxFlatWorkGroupSize            Maximum flat work-group
+                                                     size supported by the
+                                                     kernel in work-items. If
+                                                     an exact work-group size
+                                                     is required then it must
+                                                     be omitted or 0 and
+                                                     ReqdWorkGroupSize* must
+                                                     be set to non-0.
+     239:224 2 bytes ReqdWorkGroupSizeX              If present and non-0 then
+                                                     the kernel
+                                                     must be executed with the
+                                                     specified work-group size
+                                                     for X.
+     255:240 2 bytes ReqdWorkGroupSizeY              If present and non-0 then
+                                                     the kernel
+                                                     must be executed with the
+                                                     specified work-group size
+                                                     for Y.
+     271:256 2 bytes ReqdWorkGroupSizeZ              If present and non-0 then
+                                                     the kernel
+                                                     must be executed with the
+                                                     specified work-group size
+                                                     for Z.
+     383:272 14                                      Reserved, must be 0.
              bytes
      415:384 4 bytes ComputePgmRsrc1                 Compute Shader (CS)
                                                      program settings used by
@@ -1509,7 +1579,10 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                      should always be 0.
      457     1 bit   EnableSGPRGridWorkgroupCountZ   Not implemented in CP and
                                                      should always be 0.
-     463:458 6 bits                                  Reserved, must be 0.
+     462:458 5 bits                                  Reserved, must be 0.
+     463     1 bit   IsXNACKEnabled                  Indicates if the generated
+                                                     machine code is capable of
+                                                     supporting XNACK.
      511:464 6                                       Reserved, must be 0.
              bytes
      512     **Total size 64 bytes.**
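
Reviewer sketch (not part of the patch): reading the fields this hunk moves or adds from a 64 byte kernel descriptor by bit range. It assumes bit 0 is the least significant bit of byte 0 (little-endian numbering), which this excerpt does not state. ``FIELDS`` and ``read_field`` are hypothetical names.

```python
import struct

# (high bit, low bit) pairs copied from the kernel descriptor table.
FIELDS = {
    "MaxFlatWorkGroupSize": (223, 192),
    "ReqdWorkGroupSizeX": (239, 224),
    "ReqdWorkGroupSizeY": (255, 240),
    "ReqdWorkGroupSizeZ": (271, 256),
    "IsXNACKEnabled": (463, 463),
}


def read_field(descriptor, name):
    """Extract a named bit field from a 64 byte kernel descriptor."""
    if len(descriptor) != 64:
        raise ValueError("kernel descriptor must be 64 bytes")
    hi, lo = FIELDS[name]
    # Assumption: bit 0 is the least significant bit of byte 0.
    value = int.from_bytes(descriptor, "little")
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


# Hypothetical descriptor: MaxFlatWorkGroupSize = 256, XNACK enabled.
example = bytearray(64)
struct.pack_into("<I", example, 192 // 8, 256)
example[463 // 8] |= 1 << (463 % 8)
```

Treating the whole descriptor as one integer keeps the bit ranges identical to those printed in the table, at the cost of some efficiency.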
@@ -1528,7 +1601,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                      granularity is device
                                                      specific:
 
-                                                     GFX6-9
+                                                     GFX6-GFX9
                                                        - max_vgpr 1..256
                                                        - roundup((max_vgpg + 1)
                                                          / 4) - 1
@@ -1540,7 +1613,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                      granularity is device
                                                      specific:
 
-                                                     GFX6-8
+                                                     GFX6-GFX8
                                                        - max_sgpr 1..112
                                                        - roundup((max_sgpg + 1)
                                                          / 8) - 1
@@ -1679,7 +1752,7 @@ CP microcode requires the Kernel descritor to be allocated on 64 byte alignment.
                                                      CP is responsible for
                                                      filling in
                                                      ``COMPUTE_PGM_RSRC1.CDBG_USER``.
-     26      1 bit   FP16_OVFL                       GFX6-8
+     26      1 bit   FP16_OVFL                       GFX6-GFX8
                                                        Reserved, must be 0.
                                                      GFX9
                                                        Wavefront starts execution
@@ -2166,7 +2239,7 @@ Flat Scratch register pair are adjacent SGRRs so they can be moved as a 64 bit
 value to the hardware required SGPRn-3 and SGPRn-4 respectively.
 
 The global segment can be accessed either using buffer instructions (GFX6 which
-has V# 64 bit address support), flat instructions (GFX7-9), or global
+has V# 64 bit address support), flat instructions (GFX7-GFX9), or global
 instructions (GFX9).
 
 If buffer operations are used then the compiler can generate a V# with the
@@ -2212,7 +2285,7 @@ Offset SGPR registers (see :ref:`amdgpu-amdhsa-initial-kernel-execution-state`):
 GFX6
   Flat scratch is not supported.
 
-GFX7-8
+GFX7-GFX8
   1. The low word of Flat Scratch Init is 32 bit byte offset from
      ``SH_HIDDEN_PRIVATE_BASE_VIMID`` to the base of scratch backing memory
      being managed by SPI for the queue executing the kernel dispatch. This is
@@ -2305,7 +2378,7 @@ For GFX6-GFX9:
   same wavefront.
 * The vector memory operations are performed as wavefront wide operations and
   completion is reported to a wavefront in execution order. The exception is
-  that for GFX7-9 ``flat_load/store/atomic`` instructions can report out of
+  that for GFX7-GFX9 ``flat_load/store/atomic`` instructions can report out of
   vector memory order if they access LDS memory, and out of LDS operation order
   if they access global memory.
 * The vector memory operations access a single vector L1 cache shared by all
@@ -2332,7 +2405,7 @@ For GFX6-GFX9:
 * The L2 cache can be kept coherent with other agents on some targets, or ranges
   of virtual addresses can be set up to bypass it to ensure system coherence.
 
-Private address space uses ``buffer_load/store`` using the scratch V# (GFX6-8),
+Private address space uses ``buffer_load/store`` using the scratch V# (GFX6-GFX8),
 or ``scratch_load/store`` (GFX9). Since only a single thread is accessing the
 memory, atomic memory orderings are not meaningful and all accesses are treated
 as non-atomic.