Copyright (C) 2000 Free Software Foundation, Inc.

This file is intended to contain a few notes about writing C code
within GCC so that it compiles without error on the full range of
compilers GCC needs to be able to compile on.

The problem is that many ISO-standard constructs are not accepted by
either old or buggy compilers, and we keep getting bitten by them.
This knowledge has until now been sparsely spread around, so I
thought I'd collect it in one useful place.  Please add and correct
any problems as you come across them.

I'm going to start from a base of the ISO C89 standard, since that is
probably what most people code to naturally.  Obviously using
constructs introduced after that is not a good idea.

The first section of this file deals strictly with portability issues,
the second with common coding pitfalls.


			Portability Issues
			==================

Unary +
-------

K+R C compilers and preprocessors have no notion of unary '+'.  Thus
the following code snippet contains 2 portability problems.

int x = +2;  /* int x = 2;  */
#if +1       /* #if 1  */
#endif


Pointers to void
----------------

K+R C compilers did not have a void pointer, and used char * as the
pointer to anything.  The macro PTR is defined as either void * or
char * depending on whether you have a standards-compliant compiler
or a K+R one.  Thus

  free ((void *) h->value.expansion);

should be written

  free ((PTR) h->value.expansion);
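
A minimal sketch of how PTR can be chosen and used.  It assumes that
testing __STDC__ is enough to tell the two kinds of compiler apart;
the real definition in ansidecl.h checks more conditions.  The names
struct macro_value and release_value are hypothetical, and the
definition uses ISO syntax for brevity.

```c
#include <stdlib.h>

/* Simplified: the real selection in ansidecl.h tests more than
   __STDC__.  */
#ifdef __STDC__
#define PTR void *
#else
#define PTR char *
#endif

/* Hypothetical record owning a heap-allocated string.  */
struct macro_value
{
  char *expansion;
};

/* Release the string.  Casting to PTR rather than void * keeps the
   call acceptable to compilers that have no void pointer.  */
void
release_value (struct macro_value *v)
{
  if (v->expansion)   /* some free()s crash on a null pointer */
    free ((PTR) v->expansion);
  v->expansion = 0;
}
```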


String literals
---------------

K+R C did not allow concatenation of string literals like

  "This is a " "single string literal".

Moreover, some compilers like MSVC++ have fairly low limits on the
maximum length of a string literal; 509 characters is the lowest we've
come across, and is also the minimum that ISO C89 requires compilers
to accept.  You may need to break up a long printf statement into
several smaller ones.
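
A sketch of that advice for a hypothetical usage message: each line is
its own short literal passed to a separate output call, so no literal
comes near the limit and no K+R-incompatible concatenation is needed.
print_usage and the option text are invented for illustration.

```c
#include <stdio.h>

/* Hypothetical usage message: one short literal per call instead of
   one long (or concatenated) literal.  */
void
print_usage (FILE *stream, const char *progname)
{
  fprintf (stream, "Usage: %s [options] file...\n", progname);
  fputs ("Options:\n", stream);
  fputs ("  -o <file>   Place the output into <file>\n", stream);
  fputs ("  -v          Display the programs invoked\n", stream);
}
```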


Empty macro arguments
---------------------

ISO C (6.8.3 in the 1990 standard) specifies the following:

If (before argument substitution) any argument consists of no
preprocessing tokens, the behavior is undefined.

This was relaxed by ISO C99, but some older compilers emit an error,
so code like

#define foo(x, y) x y
foo (bar, )

needs to be coded in some other way.
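
One possible workaround, sketched with hypothetical names, is to pass
a macro that expands to nothing.  The argument then contains a token
when the call is collected, and only becomes empty when it is
macro-expanded before substitution, which C89 permits.  This assumes
the preprocessor follows the standard expansion order; defining a
separate macro for the shorter form is the more conservative choice.

```c
/* Hypothetical placeholder that expands to no tokens at all.  */
#define NOTHING

/* The example from the text: foo (bar, ) becomes foo (bar, NOTHING).  */
#define foo(x, y) x y
int foo_result = foo (42, NOTHING);      /* int foo_result = 42 ; */

/* Hypothetical macro with an optional storage-class argument.  */
#define DECLARE_COUNTER(storage, name) storage int name = 0

DECLARE_COUNTER (static, hidden_count);  /* static int hidden_count = 0; */
DECLARE_COUNTER (NOTHING, public_count); /* int public_count = 0; */
```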


signed keyword
--------------

The signed keyword did not exist in K+R compilers; it was introduced
in ISO C89, so you cannot use it.  In both K+R and standard C,
unqualified char and bitfields may be signed or unsigned.  There is no
way to portably declare signed chars or signed bitfields.

All other integer types are signed unless you use the 'unsigned'
specifier.  For instance, it is safe to write

  short paramc;

instead of

  signed short paramc;

If you have an algorithm that depends on signed char or signed
bitfields, you must find another way to write it before it can be
integrated into GCC.
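
If what you need is an 8-bit signed value, or a signed field, one way
around the missing keyword is to keep the bits in an unsigned type and
sign-extend them by hand.  The following is a minimal sketch assuming
two's-complement values; sign_extend is a hypothetical helper, not an
existing GCC function.  It uses a cast rather than a 'u' suffix for
the same portability reasons.

```c
/* Interpret the low BITS bits of VALUE as a two's-complement number,
   without relying on 'signed char' or signed bitfields.  BITS must be
   at least 1 and less than the width of long.  */
long
sign_extend (unsigned long value, int bits)
{
  unsigned long sign_bit = (unsigned long) 1 << (bits - 1);
  unsigned long mask = (sign_bit << 1) - 1;

  value &= mask;                /* discard bits above the field */
  if (value & sign_bit)
    return (long) (value - sign_bit) - (long) sign_bit;
  return (long) value;
}
```

For example, a byte read into an unsigned char can be turned into the
value a signed char would have held with sign_extend (c, 8).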


Function prototypes
-------------------

You need to provide a function prototype for every function before you
use it, and functions must be defined K+R style.  The function
prototype should use the PARAMS macro, which takes a single argument.
Therefore the parameter list must be enclosed in parentheses.  For
example,

int myfunc PARAMS ((double, int *));

int
myfunc (var1, var2)
	double var1;
	int *var2;
{
  ...
}

You also need to use PARAMS when referring to function prototypes in
other circumstances, for example see "Calling functions through
pointers to functions" below.

Variable-argument functions are best described by example:

void cpp_ice PARAMS ((cpp_reader *, const char *msgid, ...));

void
cpp_ice VPARAMS ((cpp_reader *pfile, const char *msgid, ...))
{  
#ifndef ANSI_PROTOTYPES
  cpp_reader *pfile;
  const char *msgid;
#endif
  va_list ap;
  
  VA_START (ap, msgid);
  
#ifndef ANSI_PROTOTYPES
  pfile = va_arg (ap, cpp_reader *);
  msgid = va_arg (ap, const char *);
#endif

  ...
  va_end (ap);
}

For the curious, here are the definitions of the above macros, first
for ISO C and then for K+R C.  See ansidecl.h for the full definitions
and more.

#define PARAMS(paramlist)  paramlist  /* ISO C.  */
#define VPARAMS(args)   args

#define PARAMS(paramlist)  ()         /* K+R C.  */
#define VPARAMS(args)   (va_alist) va_dcl

One consequence of defining functions K+R style is that you cannot
have parameters whose types are char, short, or float: without
prototypes (i.e., under K+R rules), these types are promoted to int,
int, and double respectively, so declare such parameters with the
promoted type instead.
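
A small sketch combining these points, with a simplified PARAMS
selection (the real logic in ansidecl.h tests more than __STDC__).
The parameter that carries a character is declared int rather than
char, so the prototype matches what a K+R definition receives after
promotion.  count_alpha is hypothetical, and its definition uses ISO
syntax so that the sketch compiles on a current host compiler; GCC
code of this period would define it K+R style as in myfunc above.

```c
#include <ctype.h>

/* Simplified stand-in for the ansidecl.h definitions.  */
#ifdef __STDC__
#define PARAMS(paramlist) paramlist
#else
#define PARAMS(paramlist) ()
#endif

/* STOP holds a character, but is declared int: a caller without a
   prototype passes it promoted to int anyway.  */
int count_alpha PARAMS ((const char *, int));

/* Count alphabetic characters in S up to (not including) STOP or the
   end of the string.  */
int
count_alpha (const char *s, int stop)
{
  int n = 0;

  for (; *s && *s != stop; s++)
    if (isalpha ((unsigned char) *s))
      n++;
  return n;
}
```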

Calling functions through pointers to functions
-----------------------------------------------

K+R C compilers require parentheses around the dereferenced pointer
variable, whereas ISO C relaxes the syntax and also lets you call
through the pointer directly.  For example

typedef void (* cl_directive_handler) PARAMS ((cpp_reader *, const char *));
      p->handler (pfile, p->arg);

needs to become

      (*p->handler) (pfile, p->arg);
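
A self-contained sketch of the portable call form.  cpp_reader,
struct pending_option, count_directive and run_option are hypothetical
stand-ins for the cpplib types in the example, and the PARAMS wrapper
is omitted so the block compiles on its own.

```c
/* Hypothetical stand-in for cpplib's reader state.  */
typedef struct cpp_reader
{
  int directives_seen;
} cpp_reader;

typedef void (*cl_directive_handler) (cpp_reader *, const char *);

struct pending_option
{
  cl_directive_handler handler;
  const char *arg;
};

/* Example handler: just counts how often it has been called.  */
static void
count_directive (cpp_reader *pfile, const char *arg)
{
  (void) arg;
  pfile->directives_seen++;
}

/* Explicit parentheses around the dereference: accepted by both K+R
   and ISO compilers.  */
void
run_option (cpp_reader *pfile, struct pending_option *p)
{
  (*p->handler) (pfile, p->arg);
}
```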


Macros
------

The rules under K+R C and ISO C for achieving stringification and
token pasting are quite different.  Therefore some macros have been
defined which will get it right depending upon the compiler.

  CONCAT2(a,b) CONCAT3(a,b,c) and CONCAT4(a,b,c,d)

will paste the tokens passed as arguments.  You must not leave any
space around the commas.  Also,

  STRINGX(x)

will stringify an argument; to get the same result on K+R and ISO
compilers x should not have spaces around it.
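
For illustration, a simplified sketch of how such macros can be
defined, modeled loosely on ansidecl.h (whose real definitions may
differ in detail).  The K+R branch relies on the traditional
preprocessor deleting comments without leaving a space and
substituting parameters inside string literals, which is why stray
spaces in the arguments end up in the result.  reg_count,
make_insn_move and opt_name are hypothetical.

```c
#ifdef __STDC__
#define CONCAT2(a,b)     a##b
#define CONCAT3(a,b,c)   a##b##c
#define STRINGX(s)       #s
#else
/* Traditional preprocessors remove the comment without a space and
   replace parameter names inside string literals.  */
#define CONCAT2(a,b)     a/**/b
#define CONCAT3(a,b,c)   a/**/b/**/c
#define STRINGX(s)       "s"
#endif

/* Pastes to: static int reg_count = 3;  */
static int CONCAT2(reg_,count) = 3;

/* Defines make_insn_move; note there are no spaces after the commas.  */
static int
CONCAT3(make_,insn,_move) ()
{
  return CONCAT2(reg_,count) + 1;
}

/* Becomes the string "fsyntax-only".  */
static const char opt_name[] = STRINGX(fsyntax-only);
```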


Enums
-----

In K+R C, you have to cast enum types to use them as integers, and
some compilers give lots of warnings when an enum is used as an array
index.
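
A sketch of the explicit-cast style, using a hypothetical enum that
indexes a table of names.

```c
/* Hypothetical enum used both as a count and as an index.  */
enum reg_class { NO_REGS, GENERAL_REGS, FLOAT_REGS, ALL_REGS, N_REG_CLASSES };

/* The array size is an integer constant expression with the cast.  */
static const char *const reg_class_names[(int) N_REG_CLASSES] =
{ "NO_REGS", "GENERAL_REGS", "FLOAT_REGS", "ALL_REGS" };

/* Cast the enum explicitly wherever it is used as an integer.  */
const char *
reg_class_name (enum reg_class c)
{
  return reg_class_names[(int) c];
}
```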


Bitfields
---------

See also "signed keyword" above.  In K+R C only unsigned int bitfields
were defined; other types (unsigned char, unsigned short, unsigned
long, or plain int/short/long) were not allowed.
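
A sketch of a flag word that sticks to unsigned int fields, with
hypothetical names.  A field that must hold a negative value can be
stored this way and sign-extended by hand after it is read.

```c
/* Hypothetical flag word: every field is declared unsigned int.  */
struct insn_flags
{
  unsigned int is_volatile : 1;
  unsigned int in_struct : 1;
  unsigned int mode : 6;        /* small code in the range 0..63 */
};

/* Store MODE, keeping only the 6 bits the field can hold.  */
void
set_mode (struct insn_flags *f, unsigned int mode)
{
  f->mode = mode & 0x3f;
}
```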


free and realloc
----------------

Some implementations crash upon attempts to free or realloc the null
pointer.  Thus if mem might be null, you need to write

  if (mem)
    free (mem);
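
Hypothetical wrappers showing the same precaution for both calls.
realloc with a null pointer is supposed to act like malloc, but since
that is exactly the case some implementations get wrong, the sketch
calls malloc directly.

```c
#include <stdlib.h>

/* Never pass a null pointer to free.  */
void
free_if_nonnull (void *mem)
{
  if (mem)
    free (mem);
}

/* Never pass a null pointer to realloc: allocate fresh instead.  */
void *
resize_block (void *mem, size_t size)
{
  if (mem == NULL)
    return malloc (size);
  return realloc (mem, size);
}
```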


Reserved Keywords
-----------------

K+R C has "entry" as a reserved keyword, so you should not use it for
your variable names.


Type promotions
---------------

K+R used unsigned-preserving rules for arithmetic expressions, while
ISO uses value-preserving ones.  This means an unsigned char compared
to an int is done as an unsigned comparison in K+R (since unsigned
char promotes to unsigned) while it is signed in ISO (since all of the
values in unsigned char fit in an int, it promotes to int).
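
A minimal sketch of one way to make such a comparison mean the same
thing under both rules: convert the unsigned char operand to int
explicitly.  Without the cast, a K+R compiler would compare 0 with -1
as unsigned values and report 0 < -1 as true.  byte_less_than is a
hypothetical name.

```c
/* Signed comparison under both K+R and ISO promotion rules.  */
int
byte_less_than (unsigned char byte, int limit)
{
  return (int) byte < limit;
}
```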

Trigraphs
---------

You weren't going to use them anyway, but trigraphs were not defined
in K+R C, and some otherwise ISO C compliant compilers do not accept
them.


Suffixes on Integer Constants
-----------------------------

K+R C did not accept a 'u' suffix on integer constants.  If you want
to declare a constant to be unsigned, you must use an explicit cast.

You should never use an 'l' suffix on integer constants ('L' is fine),
since it can easily be confused with the digit '1'.
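
A sketch of the portable spellings, with hypothetical macro and
function names.

```c
/* A cast instead of a 'u' suffix, and 'L' instead of 'l'.  */
#define LOW_HALF_MASK ((unsigned long) 0xffff)  /* not 0xfffful */
#define BIG_OFFSET    100000L                   /* not 100000l */

unsigned long
low_half (unsigned long x)
{
  return x & LOW_HALF_MASK;
}
```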


			Common Coding Pitfalls
			======================

errno
-----

errno might be declared as a macro, so never declare it yourself (for
example with "extern int errno;"); always get it from <errno.h>.
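
A sketch of using errno portably, with a hypothetical parse_long
helper.  It relies only on the declaration from <errno.h>, and resets
errno before the call it wants to check, since no library function
sets errno back to zero.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse S as a decimal long.  Return 1 and store the value in *RESULT
   on success, 0 if S is not a complete number or is out of range.  */
int
parse_long (const char *s, long *result)
{
  char *end;

  errno = 0;
  *result = strtol (s, &end, 10);
  if (end == s || *end != '\0')
    return 0;                   /* not a complete number */
  if (errno == ERANGE)
    return 0;                   /* does not fit in a long */
  return 1;
}
```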


Implicit int
------------

In C, the 'int' keyword can often be omitted from type declarations.
For instance, you can write

  unsigned variable;

as shorthand for

  unsigned int variable;

There are several places where this can cause trouble.  First, suppose
'variable' is a long; then you might think

  (unsigned) variable

would convert it to unsigned long.  It does not.  It converts to
unsigned int.  This mostly causes problems on 64-bit platforms, where
long and int are not the same size.
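
A hypothetical pair of functions that makes the difference visible.
On a host where long is 64 bits and int is 32, the second returns
0xffffffff for an argument of -1 instead of the full-width value.

```c
/* Correct: spell out the full type in the cast.  */
unsigned long
as_unsigned_long (long v)
{
  return (unsigned long) v;
}

/* Wrong where long is wider than int: "(unsigned)" means unsigned
   int, so the high bits of V are lost before the return converts
   the result back to unsigned long.  */
unsigned long
as_unsigned_long_wrong (long v)
{
  return (unsigned) v;
}
```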

Second, if you write a function definition with no return type at
all:

  operate(a, b)
      int a, b;
  {
    ...
  }

that function is expected to return int, *not* void.  GCC will warn
about this.  K+R C has no problem with 'void' as a return type, so you
need not worry about that.

Implicit function declarations always have return type int.  So if you
correct the above definition to

  void
  operate(a, b)
      int a, b;
  ...

but operate() is called above its definition, you will get an error
about a "type mismatch with previous implicit declaration".  The cure
is to prototype all functions at the top of the file, or in an
appropriate header.
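
A sketch of the cure, with a hypothetical accumulate() that calls
operate() above its definition.  In GCC code of this era the
prototype would also wrap its parameter list in PARAMS, and the
definitions would be K+R style; both are left out so the block
compiles on its own.

```c
/* Prototype at the top of the file, so the call below does not
   create an implicit "int operate ()" declaration.  */
static void operate (int, int);

static int total;

/* Calls operate() before its definition.  */
void
accumulate (int a, int b)
{
  operate (a, b);
}

static void
operate (int a, int b)
{
  total += a + b;
}
```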

Char vs unsigned char vs int
----------------------------

In C, unqualified 'char' may be either signed or unsigned; it is the
implementation's choice.  When you are processing 7-bit ASCII, it does
not matter.  But when your program must handle arbitrary binary data,
or fully 8-bit character sets, you have a problem.  The most obvious
issue is if you have a look-up table indexed by characters.

For instance, the character '\341' in ISO Latin 1 is SMALL LETTER A
WITH ACUTE ACCENT.  In the proper locale, isalpha('\341') will be
true.  But if you read '\341' from a file and store it in a plain
char, isalpha(c) may look up character 225, or it may look up
character -31.  And the ctype table has no entry at offset -31, so
your program will crash.  (If you're lucky.)

It is wise to use unsigned char everywhere you possibly can.  This
avoids all these problems.  Unfortunately, the routines in <string.h>
take plain char arguments, so you have to remember to cast them back
and forth - or avoid the use of strxxx() functions, which is probably
a good idea anyway.
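
A sketch of the cast that the lookup needs, using a hypothetical
count_letters.  Plain char is kept for the buffer, so it can still be
passed to <string.h> routines; only the value handed to isalpha is
converted.

```c
#include <ctype.h>
#include <stddef.h>

/* Count alphabetic bytes in BUF.  Each byte is converted to unsigned
   char first, so '\341' is looked up as 225, never as -31.  */
size_t
count_letters (const char *buf, size_t len)
{
  size_t i, n = 0;

  for (i = 0; i < len; i++)
    if (isalpha ((unsigned char) buf[i]))
      n++;
  return n;
}
```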

Another common mistake is to use either char or unsigned char to
receive the result of getc() or related stdio functions.  They return
either a character value (an unsigned char converted to int) or EOF, a
negative value distinct from all of them.  If you use unsigned char
(or a plain char that is unsigned), EOF can never be detected; if
plain char is signed, some legal character value may be confused with
EOF, such as '\377' (SMALL LETTER Y WITH UMLAUT, in Latin-1).  The
correct choice is int.

A more subtle version of the same mistake might look like this:

  unsigned char pushback[NPUSHBACK];
  int pbidx;
  #define unget(c) (assert(pbidx < NPUSHBACK), pushback[pbidx++] = (c))
  #define get(c) (pbidx ? pushback[--pbidx] : getchar())
  ...
  unget(EOF);

which will mysteriously turn a pushed-back EOF into a SMALL LETTER Y
WITH UMLAUT.
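
A corrected version of the pushback buffer, sketched with hypothetical
names: the buffer and the return value are int, so a pushed-back EOF
comes out as EOF.  It also reads from an explicit stream instead of
calling getchar().

```c
#include <assert.h>
#include <stdio.h>

#define NPUSHBACK 4

/* int, not unsigned char: EOF must survive the round trip.  */
static int pushback[NPUSHBACK];
static int pbidx;

void
unget_char (int c)
{
  assert (pbidx < NPUSHBACK);
  pushback[pbidx++] = c;
}

/* Return a pushed-back value if there is one, otherwise the next
   character from STREAM or EOF.  */
int
get_char (FILE *stream)
{
  if (pbidx)
    return pushback[--pbidx];
  return getc (stream);
}
```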


Other common pitfalls
---------------------

o Expecting 'plain' char to be either signed or unsigned (that is, to
  be sign-extended or zero-extended when promoted).

o Shifting an item by a negative amount or by greater than or equal to
  the number of bits in a type (expecting shifts by 32 to be sensible
  has caused quite a number of bugs at least in the early days).

o Expecting ints shifted right to be sign extended.

o Modifying the same object twice between two sequence points.

o Host vs. target floating point representation, including emitting NaNs
  and Infinities in a form that the assembler handles.

o qsort being an unstable sort function (unstable in the sense that
  multiple items that sort the same may be sorted in different orders
  by different qsort functions).

o Passing incorrect types to fprintf and friends.

o Adding a declaration for a function defined in another file to a
  .c file instead of to a .h file.
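
Two of the shift pitfalls above can be avoided with small helpers like
these.  They are sketches under stated assumptions: the right-shift
helper assumes a two's-complement long and a COUNT smaller than the
width of long, and both names are hypothetical.

```c
#include <limits.h>

/* Left shift that is defined for every COUNT, including counts equal
   to or larger than the width of the type, where a plain << is
   undefined and hardware often uses COUNT modulo the width.  */
unsigned long
shift_left (unsigned long value, unsigned int count)
{
  if (count >= sizeof value * CHAR_BIT)
    return 0;
  return value << count;
}

/* Arithmetic right shift that does not depend on whether >> copies
   the sign bit: for negative VALUE, shift the non-negative complement
   and complement the result back.  Rounds toward minus infinity, as a
   sign-copying shift does.  */
long
shift_right_arith (long value, unsigned int count)
{
  if (value < 0)
    return ~(~value >> count);
  return value >> count;
}
```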