author Zachary Turner <zturner@google.com> 2016-11-14 17:59:28 +0000
committer Zachary Turner <zturner@google.com> 2016-11-14 17:59:28 +0000
commit 576eea861d3acf2909e1122797624347836da49f
tree a0aa4817ca7024bf245118f497aeb52730d1b6ca /docs/PDB/DbiStream.rst
parent 3b7c9b9cdb4f15e7cacf17a99fd97a1400ebc329
[PDB] Add documentation for the DBI Stream.
Differential Revision: https://reviews.llvm.org/D26552
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@286853 91177308-0d34-0410-b5e6-96231b3b80d8
Diffstat (limited to 'docs/PDB/DbiStream.rst')
-rw-r--r-- docs/PDB/DbiStream.rst 448
1 files changed, 445 insertions, 3 deletions
diff --git a/docs/PDB/DbiStream.rst b/docs/PDB/DbiStream.rst
index 0a247a13c05..fec0e29ae53 100644
--- a/docs/PDB/DbiStream.rst
+++ b/docs/PDB/DbiStream.rst
@@ -1,3 +1,445 @@
-=====================================
-The PDB DBI (Debug Info) Stream
-=====================================
+=====================================
+The PDB DBI (Debug Info) Stream
+=====================================
+
+.. contents::
+ :local:
+
+.. _dbi_intro:
+
+Introduction
+============
+
+The PDB DBI Stream (Index 3) is one of the largest and most important streams
+in a PDB file. It contains information about how the program was compiled
+(e.g. compilation flags), the compilands (e.g. object files) that were linked
+together to form the program, and the source files that were used to build the
+program. It also contains references to other streams that hold more detailed
+information about each compiland, such as the CodeView symbol records it
+contains and the source and line information for its functions and other
+symbols.
+
+
+.. _dbi_header:
+
+Stream Header
+=============
+At offset 0 of the DBI Stream is a header with the following layout:
+
+
+.. code-block:: c++
+
+  struct DbiStreamHeader {
+    int32_t VersionSignature;
+    uint32_t VersionHeader;
+    uint32_t Age;
+    uint16_t GlobalStreamIndex;
+    uint16_t BuildNumber;
+    uint16_t PublicStreamIndex;
+    uint16_t PdbDllVersion;
+    uint16_t SymRecordStream;
+    uint16_t PdbDllRbld;
+    int32_t ModInfoSize;
+    int32_t SectionContributionSize;
+    int32_t SectionMapSize;
+    int32_t SourceInfoSize;
+    int32_t TypeServerSize;
+    uint32_t MFCTypeServerIndex;
+    int32_t OptionalDbgHeaderSize;
+    int32_t ECSubstreamSize;
+    uint16_t Flags;
+    uint16_t Machine;
+    uint32_t Padding;
+  };
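+
+As a minimal sketch, assuming a little-endian host (PDB files store integers in
+little-endian order) and that the bytes of the DBI Stream have already been
+read into memory, the header can be copied out with ``memcpy``. Every field
+sits at its natural alignment, so the structure is exactly ``64`` bytes without
+any packing directives. The name ``readDbiStreamHeader`` is hypothetical and is
+not part of LLVM's API.
+
+.. code-block:: c++
+
+  #include <cstdint>
+  #include <cstring>
+  #include <optional>
+  #include <vector>
+
+  struct DbiStreamHeader {
+    int32_t VersionSignature;
+    uint32_t VersionHeader;
+    uint32_t Age;
+    uint16_t GlobalStreamIndex;
+    uint16_t BuildNumber;
+    uint16_t PublicStreamIndex;
+    uint16_t PdbDllVersion;
+    uint16_t SymRecordStream;
+    uint16_t PdbDllRbld;
+    int32_t ModInfoSize;
+    int32_t SectionContributionSize;
+    int32_t SectionMapSize;
+    int32_t SourceInfoSize;
+    int32_t TypeServerSize;
+    uint32_t MFCTypeServerIndex;
+    int32_t OptionalDbgHeaderSize;
+    int32_t ECSubstreamSize;
+    uint16_t Flags;
+    uint16_t Machine;
+    uint32_t Padding;
+  };
+
+  // Every field is naturally aligned, so the compiler inserts no padding.
+  static_assert(sizeof(DbiStreamHeader) == 64, "DBI header must be 64 bytes");
+
+  // Hypothetical helper: copies the header out of the raw DBI Stream bytes.
+  // Assumes a little-endian host, matching the on-disk byte order.
+  std::optional<DbiStreamHeader>
+  readDbiStreamHeader(const std::vector<uint8_t> &Stream) {
+    if (Stream.size() < sizeof(DbiStreamHeader))
+      return std::nullopt; // Too short to contain a header.
+    DbiStreamHeader H;
+    std::memcpy(&H, Stream.data(), sizeof(H));
+    return H;
+  }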
+
+- **VersionSignature** - Unknown meaning. Appears to always be ``-1``.
+
+- **VersionHeader** - A value from the following enum.
+
+.. code-block:: c++
+
+  enum class DbiStreamVersion : uint32_t {
+    VC41 = 930803,
+    V50 = 19960307,
+    V60 = 19970606,
+    V70 = 19990903,
+    V110 = 20091201
+  };
+
+Similar to the :doc:`PDB Stream <PdbStream>`, this value always appears to be
+``V70``, and it is not clear what the other values are for.
+
+- **Age** - The number of times the PDB has been written. Equal to the same
+ field from the :ref:`PDB Stream header <pdb_stream_header>`.
+
+- **GlobalStreamIndex** - The index of the :doc:`Global Symbol Stream <GlobalStream>`,
+ which contains CodeView symbol records for all global symbols. Actual records
+ are stored in the symbol record stream, and are referenced from this stream.
+
+- **BuildNumber** - A bitfield containing values representing the major and minor
+ version number of the toolchain (e.g. 12.0 for MSVC 2013) used to build the
+ program, with the following layout:
+
+.. code-block:: c++
+
+ uint16_t MinorVersion : 8;
+ uint16_t MajorVersion : 7;
+ uint16_t NewVersionFormat : 1;
+
+For the purposes of LLVM, we assume ``NewVersionFormat`` to be always ``true``.
+If it is ``false``, the layout above does not apply and the reader should consult
+the `Microsoft Source Code <https://github.com/Microsoft/microsoft-pdb>`__ for
+further guidance.
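+
+Because the order in which a compiler allocates bitfields is
+implementation-defined, a reader may prefer to decode ``BuildNumber`` with
+explicit masks. The sketch below assumes the layout above, with
+``MinorVersion`` in the low ``8`` bits, ``MajorVersion`` in the next ``7`` bits
+and ``NewVersionFormat`` in the top bit. ``BuildNumberInfo`` and
+``decodeBuildNumber`` are hypothetical names.
+
+.. code-block:: c++
+
+  #include <cstdint>
+
+  struct BuildNumberInfo {
+    unsigned Major;
+    unsigned Minor;
+    bool NewVersionFormat;
+  };
+
+  // Decodes BuildNumber with masks instead of relying on the
+  // compiler-specific layout of C++ bitfields.
+  BuildNumberInfo decodeBuildNumber(uint16_t BuildNumber) {
+    BuildNumberInfo Info;
+    Info.Minor = BuildNumber & 0x00FF;                   // bits 0-7
+    Info.Major = (BuildNumber & 0x7F00) >> 8;            // bits 8-14
+    Info.NewVersionFormat = (BuildNumber & 0x8000) != 0; // bit 15
+    return Info;
+  }
+
+For example, a ``BuildNumber`` of ``0x8C00`` decodes to version 12.0 with
+``NewVersionFormat`` set.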
+
+- **PublicStreamIndex** - The index of the :doc:`Public Symbol Stream <PublicStream>`,
+ which contains CodeView symbol records for all public symbols. Actual records
+ are stored in the symbol record stream, and are referenced from this stream.
+
+- **PdbDllVersion** - The version number of ``mspdbXXXX.dll`` used to produce this
+  PDB. This does not apply to LLVM, which does not use ``mspdb.dll``.
+
+- **SymRecordStream** - The stream containing all CodeView symbol records used
+ by the program. This is used for deduplication, so that many different
+ compilands can refer to the same symbols without having to include the full record
+ content inside of each module stream.
+
+- **PdbDllRbld** - Unknown.
+
+- **MFCTypeServerIndex** - The index of the MFC type server in the
+  :ref:`dbi_type_server_substream`.
+
+- **Flags** - A bitfield with the following layout, containing various
+ information about how the program was built:
+
+.. code-block:: c++
+
+ uint16_t WasIncrementallyLinked : 1;
+ uint16_t ArePrivateSymbolsStripped : 1;
+ uint16_t HasConflictingTypes : 1;
+ uint16_t Reserved : 13;
+
+The only one of these that is not self-explanatory is ``HasConflictingTypes``.
+Although undocumented, ``link.exe`` contains a hidden flag ``/DEBUG:CTYPES``.
+If it is passed to ``link.exe``, this field will be set. Otherwise it will
+not be set. It is unclear what this flag does, although it seems to have
+subtle implications for the algorithm used to look up type records.
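+
+The same mask-based approach works for ``Flags``. This sketch assumes that the
+first declared bitfield occupies the least significant bit, so
+``WasIncrementallyLinked``, ``ArePrivateSymbolsStripped`` and
+``HasConflictingTypes`` correspond to the masks ``0x1``, ``0x2`` and ``0x4``.
+The constant and function names are hypothetical.
+
+.. code-block:: c++
+
+  #include <cstdint>
+
+  // Masks derived from the bitfield layout above (first field = bit 0).
+  constexpr uint16_t FlagIncrementallyLinked = 1u << 0;
+  constexpr uint16_t FlagPrivateSymbolsStripped = 1u << 1;
+  constexpr uint16_t FlagConflictingTypes = 1u << 2;
+
+  struct DbiBuildFlags {
+    bool WasIncrementallyLinked;
+    bool ArePrivateSymbolsStripped;
+    bool HasConflictingTypes;
+  };
+
+  // Splits the Flags field into named booleans; reserved bits are ignored.
+  DbiBuildFlags decodeDbiFlags(uint16_t Flags) {
+    return {(Flags & FlagIncrementallyLinked) != 0,
+            (Flags & FlagPrivateSymbolsStripped) != 0,
+            (Flags & FlagConflictingTypes) != 0};
+  }
+
+For example, ``decodeDbiFlags(0x5)`` describes an incrementally linked program
+whose PDB was produced with ``/DEBUG:CTYPES``.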
+
+- **Machine** - A value from the `CV_CPU_TYPE_e <https://msdn.microsoft.com/en-us/library/b2fc64ek.aspx>`__
+ enumeration. Common values are ``0x8664`` (x86-64) and ``0x14C`` (x86).
+
+Immediately after the fixed-size DBI Stream header are ``7`` variable-length
+*substreams*. The following ``7`` fields of the DBI Stream header specify the
+length in bytes of the corresponding substream. Each substream's contents are
+described in detail :ref:`below <dbi_substreams>`. The length of the entire
+DBI Stream should therefore equal ``64`` (the length of the header above) plus
+the sum of the following ``7`` fields.
+
+- **ModInfoSize** - The length of the :ref:`dbi_mod_info_substream`.
+
+- **SectionContributionSize** - The length of the :ref:`dbi_sec_contr_substream`.
+
+- **SectionMapSize** - The length of the :ref:`dbi_section_map_substream`.
+
+- **SourceInfoSize** - The length of the :ref:`dbi_file_info_substream`.
+
+- **TypeServerSize** - The length of the :ref:`dbi_type_server_substream`.
+
+- **OptionalDbgHeaderSize** - The length of the :ref:`dbi_optional_dbg_stream`.
+
+- **ECSubstreamSize** - The length of the :ref:`dbi_ec_substream`.
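+
+The length check described above can be sketched as follows. The helper name
+is hypothetical, and the sketch additionally rejects negative sizes: the fields
+are declared as signed integers, but a negative length cannot describe a valid
+substream.
+
+.. code-block:: c++
+
+  #include <array>
+  #include <cstdint>
+
+  // Size of the fixed DBI Stream header, in bytes.
+  constexpr uint64_t DbiHeaderSize = 64;
+
+  // Returns true if StreamLength equals the header size plus the seven
+  // substream sizes. Only the sum matters, so their order is irrelevant.
+  bool dbiStreamLengthIsConsistent(uint64_t StreamLength,
+                                   const std::array<int32_t, 7> &SubstreamSizes) {
+    uint64_t Expected = DbiHeaderSize;
+    for (int32_t Size : SubstreamSizes) {
+      if (Size < 0)
+        return false; // A negative size indicates a corrupt header.
+      Expected += static_cast<uint64_t>(Size);
+    }
+    return Expected == StreamLength;
+  }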
+
+.. _dbi_substreams:
+
+Substreams
+==========
+
+.. _dbi_mod_info_substream:
+
+Module Info Substream
+^^^^^^^^^^^^^^^^^^^^^
+
+Begins immediately after the :ref:`header <dbi_header>` (i.e. at offset ``64``
+of the DBI Stream) and consumes ``Header->ModInfoSize`` bytes. The module info
+substream is an array of variable-length ``ModInfo`` records, each one
+describing a single module (e.g. object file) linked into the program. Each
+record embeds a ``SectionContribEntry``, which has the following format:
+
+.. code-block:: c++
+
+  struct SectionContribEntry {
+    uint16_t Section;
+    char Padding1[2];
+    int32_t Offset;
+    int32_t Size;
+    uint32_t Characteristics;
+    uint16_t ModuleIndex;
+    char Padding2[2];
+    uint32_t DataCrc;
+    uint32_t RelocCrc;
+  };
+
+While most of these fields are self-explanatory, the ``Characteristics`` field
+warrants some elaboration. It corresponds to the ``Characteristics``
+field of the `IMAGE_SECTION_HEADER <https://msdn.microsoft.com/en-us/library/windows/desktop/ms680341(v=vs.85).aspx>`__
+structure.
+
+Each record in the module info substream has the following layout:
+
+.. code-block:: c++
+
+  struct ModInfo {
+    uint32_t Unused1;
+    SectionContribEntry SectionContr;
+    uint16_t Flags;
+    uint16_t ModuleSymStream;
+    uint32_t SymByteSize;
+    uint32_t C11ByteSize;
+    uint32_t C13ByteSize;
+    uint16_t SourceFileCount;
+    char Padding[2];
+    uint32_t Unused2;
+    uint32_t SourceFileNameIndex;
+    uint32_t PdbFilePathNameIndex;
+    char ModuleName[];
+    char ObjFileName[];
+  };
+
+- **SectionContr** - Describes the properties of the section in the final binary
+  which contains the code and data from this module.
+
+- **Flags** - A bitfield with the following format:
+
+.. code-block:: c++
+
+  uint16_t Dirty : 1;  // true if this ModInfo has been written since reading the PDB.
+  uint16_t EC : 1;     // true if EC information is present for this module. It is unknown what EC actually is.
+  uint16_t Unused : 6;
+  uint16_t TSM : 8;    // Type Server Index for this module. It is unknown what this is used for, but it is not used by LLVM.
+
+
+- **ModuleSymStream** - The index of the stream that contains symbol information
+ for this module. This includes CodeView symbol information as well as source
+ and line information.
+
+- **SymByteSize** - The number of bytes of data from the stream identified by
+ ``ModuleSymStream`` that represent CodeView symbol records.
+
+- **C11ByteSize** - The number of bytes of data from the stream identified by
+ ``ModuleSymStream`` that represent C11-style CodeView line information.
+
+- **C13ByteSize** - The number of bytes of data from the stream identified by
+ ``ModuleSymStream`` that represent C13-style CodeView line information. At
+ most one of ``C11ByteSize`` and ``C13ByteSize`` will be non-zero.
+
+- **SourceFileCount** - The number of source files that contributed to this
+ module during compilation.
+
+- **SourceFileNameIndex** - The offset in the names buffer of the name of the
+  primary translation unit used to build this module. In all PDB files observed
+  to date, this value is ``0``.
+
+- **PdbFilePathNameIndex** - The offset in the names buffer of the PDB file
+ containing this module's symbol information. This has only been observed
+ to be non-zero for the special ``* Linker *`` module.
+
+- **ModuleName** - The module name. This is usually either a full path to an
+ object file (either directly passed to ``link.exe`` or from an archive) or
+ a string of the form ``Import:<dll name>``.
+
+- **ObjFileName** - The object file name. In the case of a module that is
+  passed directly to ``link.exe``, this is the same as **ModuleName**.
+  In the case of a module that comes from an archive, this is usually the full
+  path to the archive.
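+
+Because ``ModuleName`` and ``ObjFileName`` are variable-length, a reader has to
+scan for their null terminators to find where each record ends. The sketch
+below computes the length of a single record. The ``64``-byte fixed size is the
+sum of the fields that precede ``ModuleName``, and the sketch assumes, as LLVM's
+reader does, that each record is padded to a ``4``-byte boundary.
+``readModInfoNames`` and ``ModInfoNames`` are hypothetical names.
+
+.. code-block:: c++
+
+  #include <cstddef>
+  #include <cstdint>
+  #include <optional>
+  #include <string>
+  #include <vector>
+
+  // Bytes occupied by the fixed-size ModInfo fields before ModuleName.
+  constexpr size_t ModInfoFixedSize = 64;
+
+  struct ModInfoNames {
+    std::string ModuleName;
+    std::string ObjFileName;
+    size_t RecordLength; // Includes the trailing alignment padding.
+  };
+
+  // Reads both names of the record that starts at Offset in the module info
+  // substream, and reports the record's total (4-byte aligned) length.
+  std::optional<ModInfoNames>
+  readModInfoNames(const std::vector<uint8_t> &Substream, size_t Offset) {
+    size_t Pos = Offset + ModInfoFixedSize;
+    std::string Names[2];
+    for (std::string &Name : Names) {
+      size_t End = Pos;
+      while (End < Substream.size() && Substream[End] != 0)
+        ++End; // Scan for this name's null terminator.
+      if (End >= Substream.size())
+        return std::nullopt; // Truncated record: no terminator found.
+      Name.assign(reinterpret_cast<const char *>(Substream.data()) + Pos,
+                  End - Pos);
+      Pos = End + 1; // Skip the terminator.
+    }
+    // Round the record length up to the next multiple of 4.
+    size_t Length = (Pos - Offset + 3) & ~static_cast<size_t>(3);
+    return ModInfoNames{Names[0], Names[1], Length};
+  }
+
+A reader would call this repeatedly, advancing ``Offset`` by ``RecordLength``
+until it reaches ``Header->ModInfoSize``.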
+
+.. _dbi_sec_contr_substream:
+
+Section Contribution Substream
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Begins at offset ``0`` immediately after the :ref:`dbi_mod_info_substream` ends,
+and consumes ``Header->SectionContributionSize`` bytes. This substream begins
+with a single ``uint32_t`` which will be one of the following values:
+
+.. code-block:: c++
+
+  enum class SectionContrSubstreamVersion : uint32_t {
+    Ver60 = 0xeffe0000 + 19970605,
+    V2 = 0xeffe0000 + 20140516
+  };
+
+``Ver60`` is the only value which has been observed in a PDB so far. Following
+this ``4`` byte field is an array of fixed-length structures. If the version
+is ``Ver60``, it is an array of ``SectionContribEntry`` structures. If the
+version is ``V2``, it is an array of ``SectionContribEntry2`` structures,
+defined as follows:
+
+.. code-block:: c++
+
+  struct SectionContribEntry2 {
+    SectionContribEntry SC;
+    uint32_t ISectCoff;
+  };
+
+The purpose of the second field is not well understood.
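+
+Since both entry types are fixed-length, and the ``4``-byte version field is
+part of the ``Header->SectionContributionSize`` bytes, the number of entries
+follows directly from the substream size. A minimal sketch;
+``countSectionContribs`` is a hypothetical helper:
+
+.. code-block:: c++
+
+  #include <cstdint>
+  #include <optional>
+
+  enum class SectionContrSubstreamVersion : uint32_t {
+    Ver60 = 0xeffe0000 + 19970605,
+    V2 = 0xeffe0000 + 20140516
+  };
+
+  struct SectionContribEntry {
+    uint16_t Section;
+    char Padding1[2];
+    int32_t Offset;
+    int32_t Size;
+    uint32_t Characteristics;
+    uint16_t ModuleIndex;
+    char Padding2[2];
+    uint32_t DataCrc;
+    uint32_t RelocCrc;
+  };
+
+  struct SectionContribEntry2 {
+    SectionContribEntry SC;
+    uint32_t ISectCoff;
+  };
+
+  static_assert(sizeof(SectionContribEntry) == 28, "unexpected padding");
+  static_assert(sizeof(SectionContribEntry2) == 32, "unexpected padding");
+
+  // Returns the entry count, or nullopt for an unknown version or a size
+  // that is not a whole number of entries after the version field.
+  std::optional<uint32_t> countSectionContribs(uint32_t Version,
+                                               uint32_t SubstreamSize) {
+    uint32_t EntrySize;
+    if (Version == static_cast<uint32_t>(SectionContrSubstreamVersion::Ver60))
+      EntrySize = sizeof(SectionContribEntry);
+    else if (Version == static_cast<uint32_t>(SectionContrSubstreamVersion::V2))
+      EntrySize = sizeof(SectionContribEntry2);
+    else
+      return std::nullopt;
+    if (SubstreamSize < 4 || (SubstreamSize - 4) % EntrySize != 0)
+      return std::nullopt;
+    return (SubstreamSize - 4) / EntrySize;
+  }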
+
+
+.. _dbi_section_map_substream:
+
+Section Map Substream
+^^^^^^^^^^^^^^^^^^^^^
+Begins at offset ``0`` immediately after the :ref:`dbi_sec_contr_substream` ends,
+and consumes ``Header->SectionMapSize`` bytes. This substream begins with a
+``4``-byte header followed by an array of fixed-length records. The header and
+records have the following layout:
+
+.. code-block:: c++
+
+  struct SectionMapHeader {
+    uint16_t Count;    // Number of segment descriptors
+    uint16_t LogCount; // Number of logical segment descriptors
+  };
+
+  struct SectionMapEntry {
+    uint16_t Flags;         // See the SectionMapEntryFlags enum below.
+    uint16_t Ovl;           // Logical overlay number
+    uint16_t Group;         // Group index into descriptor array.
+    uint16_t Frame;
+    uint16_t SectionName;   // Byte index of segment / group name in string table, or 0xFFFF.
+    uint16_t ClassName;     // Byte index of class in string table, or 0xFFFF.
+    uint32_t Offset;        // Byte offset of the logical segment within physical segment. If group is set in flags, this is the offset of the group.
+    uint32_t SectionLength; // Byte count of the segment or group.
+  };
+
+  enum class SectionMapEntryFlags : uint16_t {
+    Read = 1 << 0,              // Segment is readable.
+    Write = 1 << 1,             // Segment is writable.
+    Execute = 1 << 2,           // Segment is executable.
+    AddressIs32Bit = 1 << 3,    // Descriptor describes a 32-bit linear address.
+    IsSelector = 1 << 8,        // Frame represents a selector.
+    IsAbsoluteAddress = 1 << 9, // Frame represents an absolute address.
+    IsGroup = 1 << 10           // If set, descriptor represents a group.
+  };
+
+Many of these fields are not well understood, so they will not be discussed further.
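+
+Assuming ``Count`` gives the number of ``SectionMapEntry`` records that follow
+the header (this is how LLVM reads the substream, but the role of ``LogCount``
+is not documented here), the substream can be read as below. The function names
+are hypothetical, and a little-endian host is assumed.
+
+.. code-block:: c++
+
+  #include <cstdint>
+  #include <cstring>
+  #include <optional>
+  #include <vector>
+
+  struct SectionMapHeader {
+    uint16_t Count;
+    uint16_t LogCount;
+  };
+
+  struct SectionMapEntry {
+    uint16_t Flags, Ovl, Group, Frame, SectionName, ClassName;
+    uint32_t Offset;
+    uint32_t SectionLength;
+  };
+
+  static_assert(sizeof(SectionMapHeader) == 4, "unexpected padding");
+  static_assert(sizeof(SectionMapEntry) == 20, "unexpected padding");
+
+  // Reads the header and then Header.Count fixed-length entries.
+  std::optional<std::vector<SectionMapEntry>>
+  readSectionMap(const std::vector<uint8_t> &Substream) {
+    if (Substream.size() < sizeof(SectionMapHeader))
+      return std::nullopt;
+    SectionMapHeader Header;
+    std::memcpy(&Header, Substream.data(), sizeof(Header));
+    size_t Bytes = size_t(Header.Count) * sizeof(SectionMapEntry);
+    if (Substream.size() < sizeof(Header) + Bytes)
+      return std::nullopt; // Not enough data for Count entries.
+    std::vector<SectionMapEntry> Entries(Header.Count);
+    if (Bytes != 0)
+      std::memcpy(Entries.data(), Substream.data() + sizeof(Header), Bytes);
+    return Entries;
+  }
+
+  // Tests the Execute bit (1 << 2) from SectionMapEntryFlags.
+  bool isExecutable(const SectionMapEntry &E) { return (E.Flags & (1u << 2)) != 0; }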
+
+.. _dbi_file_info_substream:
+
+File Info Substream
+^^^^^^^^^^^^^^^^^^^
+Begins at offset ``0`` immediately after the :ref:`dbi_section_map_substream` ends,
+and consumes ``Header->SourceInfoSize`` bytes. This substream defines the mapping
+from module to the source files that contribute to that module. Since multiple
+modules can use the same source file (for example, a header file), this substream
+uses a string table to store each unique file name only once, and has each
+module refer to offsets into the string table rather than embedding the
+strings directly. The format of this substream is as follows:
+
+.. code-block:: c++
+
+  struct FileInfoSubstream {
+    uint16_t NumModules;
+    uint16_t NumSourceFiles;
+
+    uint16_t ModIndices[NumModules];
+    uint16_t ModFileCounts[NumModules];
+    uint32_t FileNameOffsets[NumSourceFiles]; // NumSourceFiles = sum of ModFileCounts
+    char NamesBuffer[];                       // Null-terminated file names.
+  };
+
+**NumModules** - The number of modules for which source file information is
+contained within this substream. Should match the number of records in the
+:ref:`dbi_mod_info_substream`.
+
+**NumSourceFiles** - In theory, this contains the number of source files for
+which this substream contains information. However, because this field is only
+``16`` bits wide, it cannot represent more than 64K source files, and early
+versions of the file format appear to have had exactly that limit. To support
+more files, this field is simply ignored, and the count is instead computed
+dynamically by summing the values of the ``ModFileCounts`` array (discussed
+below). In short, this value should be ignored.
+
+**ModIndices** - This array is present, but does not appear to be useful.
+
+**ModFileCounts** - An array of ``NumModules`` integers, each one containing
+the number of source files which contribute to the module at the specified index.
+While each individual module is limited to 64K contributing source files, the
+union of all modules' source files may be greater than 64K. The real number of
+source files is thus computed by summing this array. Note that summing this array
+does not give the number of `unique` source files, only the total number of source
+file contributions to modules.
+
+**FileNameOffsets** - An array of **NumSourceFiles** integers (where **NumSourceFiles**
+here refers to the 32-bit value obtained from summing **ModFileCounts**), where
+each integer is an offset into **NamesBuffer** pointing to a null-terminated string.
+
+**NamesBuffer** - An array of null-terminated strings containing the actual source
+file names.
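+
+Putting these pieces together, the following sketch maps each module to the
+names of its source files. It assumes that the names for module ``i`` are the
+``ModFileCounts[i]`` consecutive entries of ``FileNameOffsets`` that follow
+those of modules ``0`` through ``i - 1``, ignores ``NumSourceFiles`` and
+``ModIndices`` as recommended above, and assumes a little-endian host.
+``parseFileInfo`` and ``readLE`` are hypothetical names.
+
+.. code-block:: c++
+
+  #include <cstdint>
+  #include <cstring>
+  #include <optional>
+  #include <string>
+  #include <vector>
+
+  // Reads a little-endian T at Pos and advances Pos; false if out of range.
+  template <typename T>
+  bool readLE(const std::vector<uint8_t> &Buf, size_t &Pos, T &Out) {
+    if (Pos + sizeof(T) > Buf.size())
+      return false;
+    std::memcpy(&Out, Buf.data() + Pos, sizeof(T)); // Little-endian host assumed.
+    Pos += sizeof(T);
+    return true;
+  }
+
+  // Returns, for each module, the names of the source files contributing to it.
+  std::optional<std::vector<std::vector<std::string>>>
+  parseFileInfo(const std::vector<uint8_t> &Buf) {
+    size_t Pos = 0;
+    uint16_t NumModules, IgnoredNumSourceFiles;
+    if (!readLE(Buf, Pos, NumModules) || !readLE(Buf, Pos, IgnoredNumSourceFiles))
+      return std::nullopt;
+    Pos += 2 * size_t(NumModules); // Skip ModIndices.
+    std::vector<uint16_t> Counts(NumModules);
+    uint32_t TotalFiles = 0; // Computed instead of trusting NumSourceFiles.
+    for (uint16_t &C : Counts) {
+      if (!readLE(Buf, Pos, C))
+        return std::nullopt;
+      TotalFiles += C;
+    }
+    std::vector<uint32_t> Offsets(TotalFiles);
+    for (uint32_t &O : Offsets)
+      if (!readLE(Buf, Pos, O))
+        return std::nullopt;
+    const size_t NamesStart = Pos; // NamesBuffer begins here.
+    std::vector<std::vector<std::string>> Result(NumModules);
+    size_t Next = 0;
+    for (uint16_t M = 0; M < NumModules; ++M) {
+      for (uint16_t I = 0; I < Counts[M]; ++I, ++Next) {
+        size_t Start = NamesStart + Offsets[Next];
+        size_t End = Start;
+        while (End < Buf.size() && Buf[End] != 0)
+          ++End;
+        if (End >= Buf.size())
+          return std::nullopt; // Name not terminated inside the substream.
+        Result[M].emplace_back(reinterpret_cast<const char *>(Buf.data()) + Start,
+                               End - Start);
+      }
+    }
+    return Result;
+  }
+
+Because the offsets point into a shared buffer, two modules that include the
+same header simply store the same offset.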
+
+.. _dbi_type_server_substream:
+
+Type Server Substream
+^^^^^^^^^^^^^^^^^^^^^
+Begins at offset ``0`` immediately after the :ref:`dbi_file_info_substream` ends,
+and consumes ``Header->TypeServerSize`` bytes. Neither the purpose nor the layout
+of this substream is understood, although it is assumed to be related somehow to
+the usage of ``/Zi`` and ``mspdbsrv.exe``. This substream will not be discussed further.
+
+.. _dbi_ec_substream:
+
+EC Substream
+^^^^^^^^^^^^
+Begins at offset ``0`` immediately after the :ref:`dbi_type_server_substream` ends,
+and consumes ``Header->ECSubstreamSize`` bytes. Neither the purpose nor the layout
+of this substream is understood, and it will not be discussed further.
+
+.. _dbi_optional_dbg_stream:
+
+Optional Debug Header Stream
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Begins at offset ``0`` immediately after the :ref:`dbi_ec_substream` ends, and
+consumes ``Header->OptionalDbgHeaderSize`` bytes. This substream is an array of
+stream indices (i.e. ``uint16_t`` values), each of which identifies a stream
+in the larger MSF file that contains some additional debug information.
+Each position of this array has a special meaning, allowing one to determine
+what kind of debug information is at the referenced stream. ``11`` indices
+are currently understood, although it's possible there may be more. The
+layout of each stream generally corresponds exactly to a particular type
+of debug data directory from the PE/COFF file. The format of these fields
+can be found in the `Microsoft PE/COFF Specification <https://www.microsoft.com/en-us/download/details.aspx?id=19509>`__.
+
+**FPO Data** - ``DbgStreamArray[0]``. The data in the referenced stream is a
+debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``.
+
+**Exception Data** - ``DbgStreamArray[1]``. The data in the referenced stream
+is a debug data directory of type ``IMAGE_DEBUG_TYPE_EXCEPTION``.
+
+**Fixup Data** - ``DbgStreamArray[2]``. The data in the referenced stream is a
+debug data directory of type ``IMAGE_DEBUG_TYPE_FIXUP``.
+
+**Omap To Src Data** - ``DbgStreamArray[3]``. The data in the referenced stream
+is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_TO_SRC``. This
+is used for mapping addresses between instrumented and uninstrumented code.
+
+**Omap From Src Data** - ``DbgStreamArray[4]``. The data in the referenced stream
+is a debug data directory of type ``IMAGE_DEBUG_TYPE_OMAP_FROM_SRC``. This
+is used for mapping addresses between instrumented and uninstrumented code.
+
+**Section Header Data** - ``DbgStreamArray[5]``. A dump of all section headers from
+the original executable.
+
+**Token / RID Map** - ``DbgStreamArray[6]``. The layout of this stream is not
+understood, but it is assumed to be a mapping from ``CLR Token`` to
+``CLR Record ID``. Refer to `ECMA 335 <http://www.ecma-international.org/publications/standards/Ecma-335.htm>`__
+for more information.
+
+**Xdata** - ``DbgStreamArray[7]``. A copy of the ``.xdata`` section from the
+executable.
+
+**Pdata** - ``DbgStreamArray[8]``. This is assumed to be a copy of the ``.pdata``
+section from the executable, but that would make it identical to
+``DbgStreamArray[1]``. The difference between these two indices is not well
+understood.
+
+**New FPO Data** - ``DbgStreamArray[9]``. The data in the referenced stream is a
+debug data directory of type ``IMAGE_DEBUG_TYPE_FPO``. It is not clear how this
+differs from ``DbgStreamArray[0]``, but in practice all observed PDB files have
+used the "new" format rather than the "old" format.
+
+**Original Section Header Data** - ``DbgStreamArray[10]``. Assumed to be similar
+to ``DbgStreamArray[5]``, but has not been observed in practice.
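+
+A reader can give the array positions above symbolic names. In the sketch
+below, the enumerator names are hypothetical labels for the indices described in
+this section, the array is assumed to hold ``Header->OptionalDbgHeaderSize / 2``
+entries, and the value ``0xFFFF`` is assumed to mark a slot with no associated
+stream (LLVM uses this value as its invalid stream index).
+
+.. code-block:: c++
+
+  #include <cstddef>
+  #include <cstdint>
+  #include <optional>
+  #include <vector>
+
+  // Positions within the optional debug header array, as listed above.
+  enum class DbgHeaderType : uint16_t {
+    FPO = 0,
+    Exception = 1,
+    Fixup = 2,
+    OmapToSrc = 3,
+    OmapFromSrc = 4,
+    SectionHdr = 5,
+    TokenRidMap = 6,
+    Xdata = 7,
+    Pdata = 8,
+    NewFPO = 9,
+    SectionHdrOrig = 10
+  };
+
+  // Assumed marker for "no stream" in a slot.
+  constexpr uint16_t InvalidStreamIndex = 0xFFFF;
+
+  // Returns the MSF stream index for Type, or nullopt if the array is too
+  // short to contain that slot or the slot is empty.
+  std::optional<uint16_t> getDbgStream(const std::vector<uint16_t> &DbgStreams,
+                                       DbgHeaderType Type) {
+    size_t Slot = static_cast<size_t>(Type);
+    if (Slot >= DbgStreams.size() || DbgStreams[Slot] == InvalidStreamIndex)
+      return std::nullopt;
+    return DbgStreams[Slot];
+  }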