path: root/gcc/doc/cppinternals.texi
author: Neil Booth <neil@daikokuya.demon.co.uk> 2001-10-06 11:29:51 +0000
committer: Neil Booth <neil@gcc.gnu.org> 2001-10-06 11:29:51 +0000
commit: 5b810d3c839d7b5b208bd036a7bfc947830e611b
tree: 75faf39e4eedd33b096e8fbef9e57ab4111e9e87 /gcc/doc/cppinternals.texi
parent: d644be7b4c9aba239a4ce9b29375cbff27705746
* doc/cppinternals.texi: Update.
From-SVN: r46050
Diffstat (limited to 'gcc/doc/cppinternals.texi')
-rw-r--r-- gcc/doc/cppinternals.texi 112
1 files changed, 63 insertions, 49 deletions
diff --git a/gcc/doc/cppinternals.texi b/gcc/doc/cppinternals.texi
index dee2dea5133..95c4ceba9fa 100644
--- a/gcc/doc/cppinternals.texi
+++ b/gcc/doc/cppinternals.texi
@@ -41,7 +41,7 @@ into another language, under the above conditions for modified versions.
@titlepage
@c @finalout
@title Cpplib Internals
-@subtitle Last revised September 2001
+@subtitle Last revised October 2001
@subtitle for GCC version 3.1
@author Neil Booth
@page
@@ -71,7 +71,7 @@ into another language, under the above conditions for modified versions.
@chapter Cpplib---the core of the GNU C Preprocessor
The GNU C preprocessor in GCC 3.x has been completely rewritten. It is
-now implemented as a library, cpplib, so it can be easily shared between
+now implemented as a library, @dfn{cpplib}, so it can be easily shared between
a stand-alone preprocessor, and a preprocessor integrated with the C,
C++ and Objective-C front ends. It is also available for use by other
programs, though this is not recommended as its exposed interface has
@@ -498,12 +498,13 @@ both for aesthetic reasons and because it causes problems for people who
still try to abuse the preprocessor for things like Fortran source and
Makefiles.
-For now, just notice that the only places we need to be careful about
-@dfn{paste avoidance} are when tokens are added (or removed) from the
-original token stream. This only occurs because of macro expansion, but
-care is needed in many places: before @strong{and} after each macro
-replacement, each argument replacement, and additionally each token
-created by the @samp{#} and @samp{##} operators.
+For now, just notice that when tokens are added (or removed, as shown by
+the @code{EMPTY} example) from the original lexed token stream, we need
+to check for accidental token pasting. We call this @dfn{paste
+avoidance}. Token addition and removal can only occur because of macro
+expansion, but accidental pasting can occur in many places: both before
+and after each macro replacement, each argument replacement, and
+additionally each token created by the @samp{#} and @samp{##} operators.
Let's look at how the preprocessor gets whitespace output correct
normally. The @code{cpp_token} structure contains a flags byte, and one
@@ -512,7 +513,7 @@ indicates that the token was preceded by whitespace of some form other
than a new line. The stand-alone preprocessor can use this flag to
decide whether to insert a space between tokens in the output.
-Now consider the following:
+Now consider the result of the following macro expansion:
@smallexample
#define add(x, y, z) x + y +z;
@@ -524,20 +525,21 @@ The interesting thing here is that the tokens @samp{1} and @samp{2} are
output with a preceding space, and @samp{3} is output without a
preceding space, but when lexed none of these tokens had that property.
Careful consideration reveals that @samp{1} gets its preceding
-whitespace from the space preceding @samp{add} in the macro
-@emph{invocation}, @samp{2} gets its whitespace from the space preceding
-the parameter @samp{y} in the macro @emph{replacement list}, and
-@samp{3} has no preceding space because parameter @samp{z} has none in
-the replacement list.
+whitespace from the space preceding @samp{add} in the macro invocation,
+@emph{not} replacement list. @samp{2} gets its whitespace from the
+space preceding the parameter @samp{y} in the macro replacement list,
+and @samp{3} has no preceding space because parameter @samp{z} has none
+in the replacement list.
Once lexed, tokens are effectively fixed and cannot be altered, since
pointers to them might be held in many places, in particular by
in-progress macro expansions. So instead of modifying the two tokens
above, the preprocessor inserts a special token, which I call a
-@dfn{padding token}, into the token stream in front of every macro
-expansion and expanded macro argument, to indicate that the subsequent
-token should assume its @code{PREV_WHITE} flag from a different
-@dfn{source token}. In the above example, the source tokens are
+@dfn{padding token}, into the token stream to indicate that spacing of
+the subsequent token is special. The preprocessor inserts padding
+tokens in front of every macro expansion and expanded macro argument.
+These point to a @dfn{source token} from which the subsequent real token
+should inherit its spacing. In the above example, the source tokens are
@samp{add} in the macro invocation, and @samp{y} and @samp{z} in the
macro replacement list, respectively.
@@ -551,10 +553,14 @@ a macro's first replacement token expands straight into another macro.
@expansion{} [baz]
@end smallexample
-Here, two padding tokens with sources @samp{foo} between the brackets,
-and @samp{bar} from foo's replacement list, are generated. Clearly the
-first padding token is the one that matters. But what if we happen to
-leave a macro expansion? Adjusting the above example slightly:
+Here, two padding tokens are generated with sources the @samp{foo} token
+between the brackets, and the @samp{bar} token from foo's replacement
+list, respectively. Clearly the first padding token is the one we
+should use, so our output code should contain a rule that the first
+padding token in a sequence is the one that matters.
+
+But what if we happen to leave a macro expansion? Adjusting the above
+example slightly:
@smallexample
#define foo bar
@@ -564,33 +570,41 @@ leave a macro expansion? Adjusting the above example slightly:
@expansion{} [ baz] ;
@end smallexample
-As shown, now there should be a space before baz and the semicolon. Our
-initial algorithm fails for the former, because we would see three
-padding tokens, one per macro invocation, followed by @samp{baz}, which
-would have inherit its spacing from the original source, @samp{foo},
-which has no leading space. Note that it is vital that cpplib get
-spacing correct in these examples, since any of these macro expansions
-could be stringified, where spacing matters.
-
-So, I have demonstrated that not just entering macro and argument
-expansions, but leaving them requires special handling too. So cpplib
-inserts a padding token with a @code{NULL} source token when leaving
-macro expansions and after each replaced argument in a macro's
-replacement list. It also inserts appropriate padding tokens on either
-side of tokens created by the @samp{#} and @samp{##} operators.
-
-Now we can see the relationship with paste avoidance: we have to be
-careful about paste avoidance in exactly the same locations we take care
-to get white space correct. This makes implementation of paste
-avoidance easy: wherever the stand-alone preprocessor is fixing up
-spacing because of padding tokens, and it turns out that no space is
-needed, it has to take the extra step to check that a space is not
-needed after all to avoid an accidental paste. The function
-@code{cpp_avoid_paste} advises whether a space is required between two
-consecutive tokens. To avoid excessive spacing, it tries hard to only
-require a space if one is likely to be necessary, but for reasons of
-efficiency it is slightly conservative and might recommend a space where
-one is not strictly needed.
+As shown, now there should be a space before @samp{baz} and the
+semicolon in the output.
+
+The rules we decided above fail for @samp{baz}: we generate three
+padding tokens, one per macro invocation, before the token @samp{baz}.
+We would then have it take its spacing from the first of these, which
+carries source token @samp{foo} with no leading space.
+
+It is vital that cpplib get spacing correct in these examples since any
+of these macro expansions could be stringified, where spacing matters.
+
+So, this demonstrates that not just entering macro and argument
+expansions, but leaving them requires special handling too. I made
+cpplib insert a padding token with a @code{NULL} source token when
+leaving macro expansions, as well as after each replaced argument in a
+macro's replacement list. It also inserts appropriate padding tokens on
+either side of tokens created by the @samp{#} and @samp{##} operators.
+I expanded the rule so that, if we see a padding token with a
+@code{NULL} source token, @emph{and} that source token has no leading
+space, then we behave as if we have seen no padding tokens at all. A
+quick check shows this rule will then get the above example correct as
+well.
+
+Now a relationship with paste avoidance is apparent: we have to be
+careful about paste avoidance in exactly the same locations we have
+padding tokens in order to get white space correct. This makes
+implementation of paste avoidance easy: wherever the stand-alone
+preprocessor is fixing up spacing because of padding tokens, and it
+turns out that no space is needed, it has to take the extra step to
+check that a space is not needed after all to avoid an accidental paste.
+The function @code{cpp_avoid_paste} advises whether a space is required
+between two consecutive tokens. To avoid excessive spacing, it tries
+hard to only require a space if one is likely to be necessary, but for
+reasons of efficiency it is slightly conservative and might recommend a
+space where one is not strictly needed.
@node Line Numbering
@unnumbered Line numbering