Enlarging integer type in C


All references to the type int in C code should be changed to INT_BIG. This is a reference to a macro which can be defined in a system-dependent header file or on the compiler command line (and hence typically in the mk environment variable CFLAGS). For a 32-bit build it would normally be defined at build time as int and for a 64-bit build as long. As well as declarations of variables and functions, this applies to almost any other syntactically significant occurrence of the identifier int, for instance casts and arguments of sizeof. References to the types short int and long int, which are just synonyms for short and long respectively, should not be changed. Type unsigned INT_BIG (or, redundantly, signed INT_BIG) may be used. There are some places where an int declaration is implicit, and here an INT_BIG must be inserted.
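As a minimal sketch of what converted code might look like, the fragment below uses INT_BIG in a declaration, a cast and a sizeof argument. The fallback definition of INT_BIG and the function name count_bytes are assumptions made for illustration only; in a real build the macro would come from the system-dependent header or from CFLAGS.

```c
/* Normally supplied by a system-dependent header or on the compiler
 * command line (e.g. -DINT_BIG=long).  This fallback is an assumption
 * made only so that the sketch compiles on its own. */
#ifndef INT_BIG
#define INT_BIG int
#endif

/* Hypothetical function: the return type, the argument, the cast and
 * the sizeof argument all use INT_BIG where the original had int. */
INT_BIG count_bytes( INT_BIG nelem )
{
   unsigned INT_BIG width = (unsigned INT_BIG) sizeof( INT_BIG );
   return nelem * (INT_BIG) width;
}
```

Compiling the same source with -DINT_BIG=int or -DINT_BIG=long then selects the 32-bit or 64-bit variant without further edits.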

As regards calling functions from external libraries, the approach described in section 3.2 of recoding all the functions in the library will of course work in C as well as in Fortran. However, this is not always necessary. If the function's ANSI C-style declaration (prototype) is in scope when the function is called, the compiler will normally arrange for conversion of each actual argument to the declared type of the corresponding formal argument in the declaration before passing it by value. Hence the following code

int add( int a, int b );
INT_BIG i, j;
add( i, j );

is correct whatever integral type INT_BIG is defined as, since i and j are converted to type int before add() sees them. If the prototype of add() were not in scope, however, the conversion would not take place and the code would be in error. The lesson is to make sure that header files are included. Of course this relies on having ANSI C-style function prototypes; if old-style function declarations are used then no argument type conversions are made. For code which uses old-style function declarations the best thing is to convert it, or at least a corresponding header file, to ANSI C style.

However this does not solve all problems. Where a function has a variable argument list (as declared in the prototype using ellipsis `...' and handled in the function using the stdarg.h macros), the function prototype is not able to specify the types of all arguments, and so the type of the actual argument must match the type of the formal argument for correctness. If it is impractical to recode the function (as in the case of printf), the best solution is to cast the variable arguments to the type which is expected where the function is called.
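The following sketch illustrates the variable argument list case. The functions sum_ints and sum_pair are hypothetical, and INT_BIG is assumed to be long: sum_ints stands in for an unconverted library function which reads each variable argument as an int, so the converted caller casts explicitly.

```c
#include <stdarg.h>

#ifndef INT_BIG
#define INT_BIG long   /* assumption: a 64-bit build */
#endif

/* Hypothetical unmodified library function: it expects `count'
 * further arguments, each of type int. */
int sum_ints( int count, ... )
{
   va_list ap;
   int total = 0;
   int k;

   va_start( ap, count );
   for ( k = 0; k < count; k++ ) {
      total += va_arg( ap, int );
   }
   va_end( ap );
   return total;
}

/* Hypothetical converted caller.  The prototype converts the fixed
 * argument (the count) automatically, but the variable arguments
 * must be cast to int by hand. */
INT_BIG sum_pair( INT_BIG i, INT_BIG j )
{
   return sum_ints( 2, (int) i, (int) j );
}
```

Without the casts, sum_ints would be handed two long values while reading two ints, and the result would be undefined.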

A more difficult problem is when the address of an argument is passed so that the contents of that address can be changed by the function. In this case if the called function has a different idea of the length of the object being pointed to it will write to the wrong amount of memory, possibly overwriting other data. Consider this function:

   void zero( int *a ) { 
      *a = 0; 
   }

and this code:

   INT_BIG x;
   zero( &x );

If INT_BIG is a 64-bit type and int is 32 bits then only half the bits in the variable x will be zeroed by this call. If the function declaration is in scope when the call is made, the compiler will issue a pointer mismatch warning about this sort of thing, so again it is important to make sure that the appropriate header files are included. As before, in the case of variable argument lists the compiler cannot spot the problem.
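One way round this, sketched below under the assumption that INT_BIG is long, is to pass the address of a temporary int and copy the result back afterwards. The function name cleared is hypothetical.

```c
#ifndef INT_BIG
#define INT_BIG long   /* assumption: a 64-bit build */
#endif

/* Unmodified function which writes through a pointer to int. */
void zero( int *a )
{
   *a = 0;
}

/* Hypothetical converted caller: the unmodified function is given a
 * genuine int to write to, and the value is then copied into the
 * INT_BIG variable, so that every bit of x is set correctly. */
INT_BIG cleared( void )
{
   INT_BIG x = -1;     /* all bits set beforehand */
   int tmp;

   zero( &tmp );
   x = tmp;
   return x;
}
```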

To summarise, external functions should be declared before use by including the appropriate header files. If this is done, then the only problems associated with calling functions which have not been converted to use INT_BIG instead of int should be:

INT_BIG in variable part of argument list:: A modified caller of an unmodified function should explicitly cast an INT_BIG argument in the variable (...) part of the argument list. Normally the cast should be to int, but it may be possible, as with printf, to cast to long and indicate to the called function that this has been done. See the printf example below. A modified function which may get called by unmodified code should expect arguments of type int (i.e. should call the va_arg macro with a second argument of int instead of INT_BIG).
Pointer to INT_BIG variable passed:: A modified caller of an unmodified function will have to declare a local variable of type int and exchange values between it and the INT_BIG before, and possibly after, the call. See the scanf example below. A modified function which may get called by unmodified code will have to declare pointer arguments as pointers to a given fixed type (presumably int), not to INT_BIG.
Overflow:: If an INT_BIG value which is too large to be an int is passed to a variable which is an int, arithmetic overflow will occur when C tries to do the type conversion according to the function declaration. No warning is issued by the Solaris or Tru64 C runtime systems about such overflows.
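Since no warning is given at run time, code which cannot rule out large values may want to test the range explicitly before handing an INT_BIG to an unconverted function. The helper fits_in_int below is a hypothetical sketch of such a test, assuming INT_BIG is long and using the limits from limits.h.

```c
#include <limits.h>

#ifndef INT_BIG
#define INT_BIG long   /* assumption: a 64-bit build */
#endif

/* Hypothetical helper: returns 1 if value can be converted to int
 * without loss, and 0 if the conversion would overflow. */
int fits_in_int( INT_BIG value )
{
   return value >= INT_MIN && value <= INT_MAX;
}
```

A caller could then report an error, rather than silently passing a wrong value, whenever fits_in_int returns 0.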

External libraries which code may have to link against can be split into a few categories:

Starlink libraries

As with Fortran, the plan is for Starlink libraries to get converted to use INT_BIG before code which uses them (although for C code calling C libraries this is not so necessary as with Fortran when functions are pre-declared using header files).

Blocks of source code not to be converted

It may not be a good idea to do INT_BIG conversion to large bodies of non-Starlink code used by the USSC; Perl and Tcl spring to mind. If there is function-level access to these packages, some recoding may be required as above.

The C standard library

The functions of the C standard library are no different from any other unconverted external library, but since their use is common, it is discussed in detail here. Most of the functions in the standard library will be handled adequately as described above by including the appropriate header files, since they do not have int * arguments or variable argument lists. The only exceptions in the standard library are as follows:

frexp

The second argument of the mathematical function frexp is declared int *, so a pointer to int and not to INT_BIG must be passed.
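A minimal sketch of a converted call to frexp, assuming INT_BIG is long; the helper name binary_exponent is hypothetical.

```c
#include <math.h>

#ifndef INT_BIG
#define INT_BIG long   /* assumption: a 64-bit build */
#endif

/* Hypothetical helper: returns the binary exponent of value as an
 * INT_BIG.  The variable whose address is passed to frexp stays an
 * int, and is only widened once the call has returned. */
INT_BIG binary_exponent( double value )
{
   int exponent;

   (void) frexp( value, &exponent );
   return exponent;
}
```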

bsearch, qsort

Both these functions take as an argument a comparison function declared int (*)(const void *, const void *), i.e. a function returning int. Therefore, although the calls to bsearch and qsort do not need to be modified, the function passed to them must be of the right type, and must not be modified to be of type INT_BIG (*)(const void *, const void *).
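The sketch below shows a comparison function for an array of INT_BIG values, assuming INT_BIG is long. The names compare_big and median_of_three are hypothetical; the point is that the data being sorted are INT_BIG while the comparison function still returns int.

```c
#include <stdlib.h>

#ifndef INT_BIG
#define INT_BIG long   /* assumption: a 64-bit build */
#endif

/* Comparison function: the return type stays int because that is
 * what qsort and bsearch expect.  Comparing rather than subtracting
 * avoids overflow when the difference does not fit in an int. */
static int compare_big( const void *pa, const void *pb )
{
   INT_BIG a = *(const INT_BIG *) pa;
   INT_BIG b = *(const INT_BIG *) pb;

   return ( a > b ) - ( a < b );
}

/* Hypothetical caller: sorts three INT_BIG values and returns the
 * middle one. */
INT_BIG median_of_three( INT_BIG a, INT_BIG b, INT_BIG c )
{
   INT_BIG values[ 3 ];

   values[ 0 ] = a;
   values[ 1 ] = b;
   values[ 2 ] = c;
   qsort( values, 3, sizeof( INT_BIG ), compare_big );
   return values[ 1 ];
}
```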

signal

The second argument of signal is declared as void (*)(int). Calls to signal do not need to be modified, but functions passed to it ought to be declared functions of int and not of INT_BIG.
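As an illustration, assuming INT_BIG is long, a handler in converted code might look like the sketch below. The names on_signal and catch_and_raise are hypothetical.

```c
#include <signal.h>

#ifndef INT_BIG
#define INT_BIG long   /* assumption: a 64-bit build */
#endif

static volatile sig_atomic_t last_signal = 0;

/* Signal handler: the argument stays int, matching the
 * void (*)(int) type which signal expects. */
static void on_signal( int signum )
{
   last_signal = signum;
}

/* Hypothetical converted caller: installs the handler, raises the
 * signal, and returns the signal number seen by the handler (or -1
 * if the handler could not be installed). */
INT_BIG catch_and_raise( void )
{
   if ( signal( SIGINT, on_signal ) == SIG_ERR ) {
      return -1;
   }
   raise( SIGINT );
   return last_signal;
}
```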

printf, scanf

The functions which use format strings and variable argument lists, printf, fprintf, sprintf and scanf, fscanf, sscanf, require careful scrutiny. For printf and friends any of the format specifiers cdiouxX* indicate that the corresponding argument should be an int, and the n specifier indicates a pointer to int. For scanf and friends, any of the specifiers diouxXn indicate pointer to int. If any of the actual arguments in the call is of type INT_BIG (or INT_BIG *) when it should be of type int (or int *), then the calling code needs to be changed.

In the case of int arguments, if the actual argument might be too large to be represented in an int, then an l should be inserted after the `%' sign to indicate that a long argument is being supplied and the corresponding argument should be cast to long. If it will definitely be possible to store the value in an int then the format specifier may be left alone and the argument cast to int. Arguments passed using the c or * specifiers should be cast to int, and not modified with an l character. In the case of int * arguments, intermediate variables have to be used.

Here is an example. If after simple substitution of INT_BIG for int a piece of code reads:

   extern INT_BIG triple( INT_BIG x );
   INT_BIG i, j;
   char c;
   scanf( "%i %c", &i, &c );
   j = triple( i );
   printf( "Integer tripled is %i; Character is '%c'\n", j, c );

then the scanf call must be replaced by something like this:

   {  
      long tmp; 
      scanf( "%li %c", &tmp, &c ); 
      i = tmp; 
   }

and the printf call by something like this:

   printf( "Integer tripled is %li; Character is '%c'\n", (long) j, c );
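The rule for the * specifier given earlier can be sketched in the same way. In the hypothetical helper padded below, which assumes INT_BIG is long, the field width is cast to int and takes no l modifier, while the value itself is printed with an l modifier and cast to long.

```c
#include <stdio.h>

#ifndef INT_BIG
#define INT_BIG long   /* assumption: a 64-bit build */
#endif

/* Hypothetical helper: formats value right-justified in a field of
 * the given width, returning a pointer to a static buffer.  The
 * argument consumed by `*' must be an int; the value is passed as a
 * long to match the `ld' specifier. */
const char *padded( INT_BIG width, INT_BIG value )
{
   static char buf[ 64 ];

   snprintf( buf, sizeof( buf ), "%*ld", (int) width, (long) value );
   return buf;
}
```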

Other external libraries

If your code links to any other external libraries which cannot, or will not, be converted to use INT_BIGs, some recoding of the calls may be required as above. The most common case of this is use of the various system libraries whose functions are declared in header files in or under /usr/include. waitpid(), signal() and pipe() are a few of the culprits.

There are a few other issues which arise from replacing int type with INT_BIG:

Integer constants from limits.h

Where an int is compared against one of the values INT_MAX, INT_MIN and UINT_MAX defined in the system header file limits.h, an INT_BIG should be compared against one of the corresponding macros INT_BIG_MAX etc. These macros are defined in the header file extreme.h.

Implicit int declarations

There are several places in C (macros and typedefs apart) in which an identifier can be declared as an int without the int reserved word appearing. For example, in the code:

sub( x ) {
   static y;
   signed z;
}

the symbols sub, x, y and z all have type int so that the converted code should read

INT_BIG sub( INT_BIG x ) {
   static INT_BIG y;
   signed INT_BIG z;
}

int used for Fortran LOGICAL

Where a C int is used to represent a LOGICAL variable in Fortran, it should not be replaced by INT_BIG. This is only likely to arise in certain low-level code (e.g. HDS and CNF) which does direct interfacing with Fortran. Under normal circumstances code should use the macros F77_LOGICAL_TYPE and F77_INTEGER_TYPE defined in the CNF header file cnf.h.

The program crepint, together with its driver script do-crepint, is provided for making some of these changes. It replaces all references to int type, with a few exceptions, by INT_BIG type, modifies explicit declarations which are implicitly of type int, and warns about constructs which might need further attention. Full details of which of the above constructs it fixes, and which it warns about, are given in the appendix.

One construction which crepint misses altogether is a function declaration or definition whose return type or argument types are implicitly int. These can be spotted by suitable use of the C compilers. Given a file sub.c which reads:

sub( x ) {
   return x;
}

then running gcc with the -Wimplicit and -Wstrict-prototypes flags generates warnings for such declarations:

% gcc -fsyntax-only -Wimplicit -Wstrict-prototypes sub.c
sub.c:1: warning: return-type defaults to `int'
sub.c:1: warning: function declaration isn't a prototype

The -proto flag of Tru64 Unix's C compiler is a little more effort to use, but produces more concise output. Running

% cc -protois -noobject sub.c

will produce a file called sub.H which reads:

extern int sub(int x);

Wherever int occurs in the output file sub.H, the corresponding declaration or definition of the function in the source code (typically in a source file and maybe a header file) should be changed to read INT_BIG, so that sub.c should end up reading:

INT_BIG sub( INT_BIG x ) {
   return x;
}

By running

% cc -protois -noobject -DINT_BIG=long *.c

and attending to any int declarations in the resulting .H files, it should be possible to find any offending implicit declarations. Note however that this only writes function prototypes from function definitions; it does not normalise existing prototypes, so it cannot usefully be applied to header files.
