16-bit Assembly Language Considerations

This chapter deals with the following topics:

The data representation of the basic types supported by Watcom C/C++.
The method for passing arguments and returning values.
The two methods for passing floating-point arguments and returning floating-point values:
- One method is used when one of the Watcom C/C++ fpi or fpi87 options is specified for the generation of in-line 80x87 instructions. When the fpi option is specified, an 80x87 emulator is included from a math library if the application includes floating-point operations. When the fpi87 option is used exclusively, the 80x87 emulator isn't included. See ``Calling Conventions for 80x87-based Applications''.
- The other method is used when the Watcom C/C++ fpc option is specified. In this case, the compiler generates calls to floating-point support routines in the alternate math libraries. See ``Calling Conventions for non-80x87 Applications''.

An understanding of the Intel 80x86 architecture is assumed.

Data Representation

This section describes the internal or machine representation of the basic types supported by Watcom C/C++:

char
short int
long int
int
float
double

char

An item of type char occupies 1 byte of storage. Its value is in the following range:

    0 <= n <= 255

Note that char is, by default, unsigned. The Watcom C/C++ compiler option j can be used to change the default from unsigned to signed. If char is signed, an item of type char is in the following range:

    -128 <= n <= 127

You can force an item of type char to be unsigned or signed regardless of the default by defining it to be of type unsigned char or signed char, respectively.
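The two ranges are two readings of the same 8 bits. As a minimal sketch (hypothetical function names, with `uint8_t` standing in for the stored byte), the following C computes both readings arithmetically, so the result doesn't depend on whether the host compiler's own `char` is signed:

```c
#include <stdint.h>

/* Value of a 1-byte item when char is unsigned (the Watcom default). */
int char_as_unsigned(uint8_t raw)
{
    return raw;                          /* 0 <= n <= 255 */
}

/* Value of the same byte when char is signed (j option, or signed char). */
int char_as_signed(uint8_t raw)
{
    return raw < 128 ? raw : raw - 256;  /* -128 <= n <= 127 */
}
```

For example, the byte 0xFF reads as 255 by default and as -1 when char is signed.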

short int

An item of type short int occupies 2 bytes of storage. Its value is in the following range:

    -32768 <= n <= 32767

Note that short int is signed, and hence short int and signed short int are equivalent. If an item of type short int is to be unsigned, it must be defined as unsigned short int. In this case, its value is in the following range:

    0 <= n <= 65535

long int

An item of type long int occupies 4 bytes of storage. Its value is in the following range:

    -2147483648 <= n <= 2147483647

Note that long int is signed, and hence long int and signed long int are equivalent. If an item of type long int is to be unsigned, it must be defined as unsigned long int. In this case, its value is in the following range:

    0 <= n <= 4294967295

int

An item of type int occupies 2 bytes of storage. Its value is in the following range:

    -32768 <= n <= 32767

Note that int is signed, and hence int and signed int are equivalent. If an item of type int is to be unsigned, it must be defined as unsigned int. In this case its value is in the following range:

    0 <= n <= 65535

If you are generating code that executes in 16-bit mode, short int and int are equivalent, unsigned short int and unsigned int are equivalent, and signed short int and signed int are equivalent. This might not be the case in other environments where int and long int are 4 bytes.
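The ranges above can be checked with a short C sketch. It uses the fixed-width limits from `<stdint.h>` so it gives the 16-bit answers even when compiled on a host where `int` is 4 bytes; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* int, short int (2 bytes, signed) */
bool fits_int(long long v)           { return v >= INT16_MIN && v <= INT16_MAX; }
/* unsigned int, unsigned short int (2 bytes) */
bool fits_unsigned_int(long long v)  { return v >= 0 && v <= UINT16_MAX; }
/* long int (4 bytes, signed) */
bool fits_long(long long v)          { return v >= INT32_MIN && v <= INT32_MAX; }
/* unsigned long int (4 bytes) */
bool fits_unsigned_long(long long v) { return v >= 0 && v <= UINT32_MAX; }
```

For example, 40000 fits in an unsigned int but not in an int, so a signed quantity of that size needs a long int.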

float

A datum of type float is an approximate representation of a real number. Each datum of type float occupies 4 bytes. If m is the magnitude of x (an item of type float), then x can be approximated if

    2**-126 <= m < 2**128

or, in more approximate terms, if

    1.175494e-38 <= m <= 3.402823e38

Data of type float are represented internally as follows. Note that bytes are stored in memory with the least significant byte first and the most significant byte last.

fig: ./images/float.gif

Note the following:

S: S = Sign bit (0=positive, 1=negative)
Exponent: The exponent bias is 127 (that is, exponent value 1 represents 2**-126, exponent value 127 represents 2**0, exponent value 254 represents 2**127, and so on). The exponent field is 8 bits long.
Significand: For a normalized number, the leading bit of the significand is always 1, hence it isn't stored in the significand field. The significand field is 23 bits long.
Zero: A real zero quantity occurs when the sign bit, exponent, and significand are all zero.
Infinity: When the exponent field is all 1 bits, and the significand field is all zero bits, then the quantity represents positive or negative infinity, depending on the sign bit.
Not Numbers: When the exponent field is all 1 bits, and the significand field is non-zero, then the quantity is a special value called a NAN (Not-A-Number).
Denormals: When the exponent field is all 0 bits, and the significand field is non-zero, then the quantity is a special value called a denormal or nonnormal number.
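A minimal C sketch of this field layout, assuming the host `float` uses the same IEEE 754 single-precision format as the 80x87 (true of common modern hosts, but an assumption here). The type and function names are hypothetical; `memcpy` copies the bits without violating aliasing rules:

```c
#include <stdint.h>
#include <string.h>

typedef enum { F_ZERO, F_DENORMAL, F_NORMAL, F_INFINITY, F_NAN } float_class;

typedef struct {
    unsigned sign;         /* 1 bit: 0 = positive, 1 = negative */
    unsigned exponent;     /* 8-bit biased exponent field (bias 127) */
    uint32_t significand;  /* 23-bit field; a normal number's leading 1 isn't stored */
} float_fields;

/* Splits a float into its three fields (assumes IEEE 754 single precision). */
float_fields split_float(float x)
{
    uint32_t bits;
    float_fields f;

    memcpy(&bits, &x, sizeof bits);   /* reinterpret the 4 bytes */
    f.sign = (unsigned)(bits >> 31);
    f.exponent = (unsigned)((bits >> 23) & 0xFFu);
    f.significand = bits & 0x7FFFFFu;
    return f;
}

/* Applies the special-value rules listed above.
   A negative zero (sign bit set) is also reported as zero. */
float_class classify_fields(float_fields f)
{
    if (f.exponent == 0xFFu)
        return f.significand == 0 ? F_INFINITY : F_NAN;
    if (f.exponent == 0)
        return f.significand == 0 ? F_ZERO : F_DENORMAL;
    return F_NORMAL;
}
```

For 1.0f the exponent field is 127, which the bias maps to 2**0, and the significand field is 0 because the leading 1 is implicit.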

double

A datum of type double is an approximate representation of a real number. The precision of a datum of type double is greater than or equal to one of type float. Each datum of type double occupies 8 bytes. If m is the magnitude of x (an item of type double), then x can be approximated if

    2**-1022 <= m < 2**1024

or, in more approximate terms, if

    2.2250738585072e-308 <= m <= 1.79769313486232e308

Data of type double are represented internally as follows. Note that bytes are stored in memory with the least significant byte first and the most significant byte last.

fig: ./images/double.gif

Note the following:

S: S = Sign bit (0=positive, 1=negative)
Exponent: The exponent bias is 1023 (that is, exponent value 1 represents 2**-1022, exponent value 1023 represents 2**0, exponent value 2046 represents 2**1023, and so on). The exponent field is 11 bits long.
Significand: For a normalized number, the leading bit of the significand is always 1, hence it isn't stored in the significand field. The significand field is 52 bits long.
Zero: A double-precision zero quantity occurs when the sign bit, exponent, and significand are all zero.
Infinity: When the exponent field is all 1 bits, and the significand field is all zero bits, then the quantity represents positive or negative infinity, depending on the sign bit.
Not Numbers: When the exponent field is all 1 bits, and the significand field is non-zero, then the quantity is a special value called a NAN (Not-A-Number).
Denormals: When the exponent field is all 0 bits, and the significand field is non-zero, then the quantity is a special value called a denormal or nonnormal number.
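The byte order noted above applies to both float and double. As a sketch, assuming an IEEE 754 double on the host (function names hypothetical), the following lists a double's bytes in 80x86 memory order by shifting a 64-bit integer, which gives the same answer whatever the host's own byte order is:

```c
#include <stdint.h>
#include <string.h>

/* Writes the 8 bytes of x least significant byte first, as the 80x86 stores them. */
void double_to_le_bytes(double x, unsigned char out[8])
{
    uint64_t bits;

    memcpy(&bits, &x, sizeof bits);
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (8 * i));   /* byte i holds bits 8i..8i+7 */
}

/* Returns the 11-bit biased exponent field (bias 1023). */
unsigned double_exponent(double x)
{
    uint64_t bits;

    memcpy(&bits, &x, sizeof bits);
    return (unsigned)((bits >> 52) & 0x7FFu);
}
```

For 1.0 (bit pattern 0x3FF0000000000000) the bytes are 00 00 00 00 00 00 F0 3F, and the exponent field is 1023, the bias for 2**0.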

Calling Conventions for non-80x87 Applications

The following sections describe the calling conventions used when compiling with the fpc compiler option:

Passing Arguments Using Register-based Calling Conventions
Sizes of Predefined Types
Size of Enumerated Types
Effect of Function Prototypes on Arguments
Interfacing to Assembly Language Functions
Functions with Variable Number of Arguments
Returning Values from Functions

Passing Arguments Using Register-based Calling Conventions

How arguments are passed to a function with register-based calling conventions is determined by the size (in bytes) of the argument, and where in the argument list the argument appears.

Depending on the size, arguments are either passed in registers or on the stack. Arguments such as structures are almost always passed on the stack since they're generally too large to fit in registers.
Since arguments are processed from left to right, the first few arguments are likely to be passed in registers (if they can fit) and, if the argument list contains many arguments, the last few arguments are likely to be passed on the stack.

The registers used to pass arguments to a function are AX, BX, CX and DX. The following algorithm describes how arguments are passed to functions.

Initially, we have the following registers available for passing arguments: AX, DX, BX and CX. Note that registers are selected from this list in the order in which they appear. That is, the first register selected is AX, and the last is CX. For each argument Ai, starting with the leftmost argument, perform the following steps:

If the size of Ai is 1 byte, convert it to 2 bytes and proceed to the next step. If Ai is of type unsigned char, it is converted to an unsigned int. If Ai is of type signed char, it is converted to a signed int. If Ai is a 1-byte structure, the padding is determined by the compiler.
If an argument has already been assigned a position on the stack, Ai is also assigned a position on the stack. Otherwise, proceed to the next step.
If the size of Ai is 2 bytes, select a register from the list of available registers. If a register is available, Ai is assigned that register. The register is then removed from the list of available registers. If no registers are available, Ai is assigned a position on the stack.
If the size of Ai is 4 bytes, select a register pair from [DX AX] or [CX BX]. The first available register pair is assigned to Ai, and removed from the list of available pairs. The high-order 16 bits of the argument are assigned to the first register in the pair; the low-order 16 bits are assigned to the second register in the pair. If none of the above register pairs is available, Ai is assigned a position on the stack.
If the type of Ai is double or float (in the absence of a function prototype), select [AX BX CX DX] from the list of available registers. All four registers are removed from the list of available registers. The high-order 16 bits of the argument are assigned to the first register and the low-order 16 bits are assigned to the fourth register. If any of the four registers isn't available, Ai is assigned a position on the stack.
All other arguments are assigned a position on the stack.
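The steps above can be simulated in a few lines of C. This is a sketch for checking hand-worked examples, not the compiler's code: the argument classes and names are hypothetical, a prototyped float counts as a 4-byte argument, and a double or unprototyped float counts as ARG_DOUBLE.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical argument classes, after any prototype or default conversion. */
typedef enum { ARG_1BYTE, ARG_2BYTE, ARG_4BYTE, ARG_DOUBLE, ARG_OTHER } arg_kind;

/* Registers in the order they are selected: AX, DX, BX, CX. */
enum { AX, DX, BX, CX, NREGS };
static const char *const reg_name[NREGS] = { "AX", "DX", "BX", "CX" };

/* Stores "[AX]", "[DX AX]", "[AX BX CX DX]" or "stack" in out[i] for each argument. */
void assign_args(const arg_kind *args, int n, char out[][16])
{
    static const int pairs[2][2] = { { DX, AX }, { CX, BX } }; /* {high, low} */
    bool avail[NREGS] = { true, true, true, true };
    bool stacked = false;   /* true once any argument has a stack position */

    for (int i = 0; i < n; i++) {
        /* Step 1: a 1-byte argument is widened to 2 bytes. */
        arg_kind k = (args[i] == ARG_1BYTE) ? ARG_2BYTE : args[i];
        bool placed = false;

        if (stacked) {
            /* Step 2: everything after a stacked argument is stacked too. */
        } else if (k == ARG_2BYTE) {
            for (int r = 0; r < NREGS && !placed; r++) {
                if (avail[r]) {
                    avail[r] = false;
                    snprintf(out[i], 16, "[%s]", reg_name[r]);
                    placed = true;
                }
            }
        } else if (k == ARG_4BYTE) {
            for (int p = 0; p < 2 && !placed; p++) {
                int hi = pairs[p][0], lo = pairs[p][1];
                if (avail[hi] && avail[lo]) {
                    avail[hi] = avail[lo] = false;
                    snprintf(out[i], 16, "[%s %s]", reg_name[hi], reg_name[lo]);
                    placed = true;
                }
            }
        } else if (k == ARG_DOUBLE) {
            if (avail[AX] && avail[BX] && avail[CX] && avail[DX]) {
                avail[AX] = avail[BX] = avail[CX] = avail[DX] = false;
                strcpy(out[i], "[AX BX CX DX]");
                placed = true;
            }
        }
        if (!placed) {   /* no register fits, or ARG_OTHER (e.g. a structure) */
            strcpy(out[i], "stack");
            stacked = true;
        }
    }
}
```

Run on (long, int, long) it reports [DX AX], [BX], stack; on (double, int) it reports [AX BX CX DX], stack; and on (int, long) the long skips the partly used [DX AX] pair and lands in [CX BX].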

Note the following:

Arguments that are assigned a position on the stack are padded to a multiple of 2 bytes. That is, if a 3-byte structure is assigned a position on the stack, 4 bytes are pushed on the stack.
Arguments that are assigned a position on the stack are pushed onto the stack starting with the rightmost argument.
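The padding rule also fixes how many bytes the called routine must remove with its `ret n` instruction. A minimal sketch with hypothetical names; the sizes passed in are those of the stacked arguments after any conversion:

```c
#include <stddef.h>

/* Bytes an argument occupies on the stack: its size rounded up to a multiple of 2. */
size_t stack_slot_size(size_t arg_size)
{
    return (arg_size + 1) & ~(size_t)1;
}

/* Total bytes of stacked arguments, i.e. the n the callee uses in "ret n". */
size_t stacked_bytes(const size_t *sizes, int count)
{
    size_t total = 0;

    for (int i = 0; i < count; i++)
        total += stack_slot_size(sizes[i]);
    return total;
}
```

A 3-byte structure occupies 4 bytes; stacked arguments of 4, 8 and 4 bytes give 16.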

Sizes of Predefined Types

The following table lists the predefined types, their size as returned by the sizeof operator, the size of an argument of that type, and the registers used to pass that argument if it is the only argument in the argument list.

Basic Type	sizeof()	Argument size	Registers used
`char`	1	2	[AX]
`short int`	2	2	[AX]
`int`	2	2	[AX]
`long int`	4	4	[DX AX]
`float`	4	8	[AX BX CX DX]
`double`	8	8	[AX BX CX DX]
`near pointer`	2	2	[AX]
`far pointer`	4	4	[DX AX]
`huge pointer`	4	4	[DX AX]

The size of the argument listed in the table assumes that no function prototypes are specified. Function prototypes affect the way arguments are passed. This is discussed in the section entitled ``Effect of Function Prototypes on Arguments''.

Provided no function prototypes exist, an argument is converted to a default type as described in the following table.

Argument Type	Passed As
`char`	`unsigned int`
`signed char`	`signed int`
`unsigned char`	`unsigned int`
`float`	`double`

Size of Enumerated Types

The integral type of an enumerated type is determined by the values of the enumeration constants. In strict ANSI C mode, all enumerated constants are of type int. In the extensions mode, the compiler uses the smallest integral type possible (excluding long ints) that can represent all values of the enumerated type. For instance, if the minimum and maximum values of the enumeration constants are in the range -128 through 127, the enumerated type is equivalent to a signed char (size = 1 byte). All references to enumerated constants in this instance have type signed char. An enumerated constant is always promoted to an int when passed as an argument.
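A sketch of the extensions-mode rule. The order in which the candidate types are tried is an assumption consistent with "smallest type first"; the names are hypothetical, and the sketch doesn't cover ranges that need more than 2 bytes because the text doesn't say what happens then:

```c
/* Hypothetical result names for this sketch. */
typedef enum {
    ENUM_SIGNED_CHAR, ENUM_UNSIGNED_CHAR, ENUM_INT, ENUM_UNSIGNED_INT, ENUM_NO_FIT
} enum_base;

/* Smallest candidate type (long int excluded) whose range covers [min, max]. */
enum_base enum_base_type(long min, long max)
{
    if (min >= -128 && max <= 127)     return ENUM_SIGNED_CHAR;    /* 1 byte */
    if (min >= 0 && max <= 255)        return ENUM_UNSIGNED_CHAR;  /* 1 byte */
    if (min >= -32768 && max <= 32767) return ENUM_INT;            /* 2 bytes */
    if (min >= 0 && max <= 65535)      return ENUM_UNSIGNED_INT;   /* 2 bytes */
    return ENUM_NO_FIT;   /* not described by the text */
}
```

Under this sketch's ordering, a range of 0 through 200 fits in unsigned char, while -1 through 200 needs an int.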

Effect of Function Prototypes on Arguments

Function prototypes define the types of the formal parameters of a function. Their appearance affects the way in which arguments are passed. An argument is converted to the type of the corresponding formal parameter in the function prototype. Consider the following example:

void prototype( float x, int i );

void main()
{
  float x;
  int    i;

  x = 3.14;
  i = 314;
  prototype( x, i );
  rtn( x, i );
}

The function prototype for prototype() specifies that the first argument is to be passed as a float and the second argument is to be passed as an int. This means that the first argument is passed in registers DX and AX, and the second argument in register BX.

If no function prototype is given, as is the case for the function rtn(), the first argument is passed as a double, and the second argument is passed as an int. This means that the first argument is passed in registers AX, BX, CX, and DX, and the second argument is passed on the stack, since no registers remain.

Note that even though both prototype() and rtn() were called with identical argument lists, the way in which the arguments were passed was completely different, simply because a function prototype for prototype() was specified. Function prototyping is an excellent way to guarantee that arguments are passed as expected to your assembly language function.

Interfacing to Assembly Language Functions

Consider the following example.

void main()
{
    long int x;
    int      i;
    long int y;

    x = 7;
    i = 77;
    y = 777;
    myrtn( x, i, y );
}

myrtn() is an assembly language function that requires three arguments:

the first argument is of type long int
the second argument is of type int
the third argument is again of type long int

Using the rules for register-based calling conventions, these arguments are passed to myrtn() in the following way:

The first argument is passed in registers DX and AX, leaving BX and CX as available registers for other arguments.
The second argument is passed in register BX, leaving CX as an available register for other arguments.
The third argument won't fit in register CX (its size is 4 bytes), and hence is pushed on the stack.

Let's look at the stack upon entry to myrtn():

fig: ./images/stk16_1.gif

The return address is the top element on the stack. In a small code model, the return address is 1 word (16 bits); in a big code model, it's 2 words (32 bits).

Register SP can't be used as a base register to address the third argument on the stack. Register BP is normally used to address arguments on the stack. Upon entry to the function, register BP is set to point to the stack, but before doing so we must save its contents. The following two instructions achieve this:

push    BP         ; save current value of BP
mov     BP,SP      ; get access to arguments

After executing these instructions, the stack for the small code model looks like this:

fig: ./images/stk16_2.gif

As the above diagrams show, the third argument is at offset 4 from register BP in a small code model, and offset 6 in a big code model.
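The offsets 4 and 6 follow from what lies between BP and the first stacked argument. A minimal sketch (hypothetical names) that computes BP-relative offsets for every stacked argument, given their padded sizes from leftmost to rightmost:

```c
#include <stddef.h>

/* Offset from BP of each stacked argument after "push BP / mov BP,SP".
   The saved BP takes 2 bytes; the return address takes 2 bytes (small code
   model, near call) or 4 bytes (big code model, far call). Arguments are
   pushed right to left, so the leftmost stacked one sits at the lowest offset. */
void stack_arg_offsets(const size_t *slot_sizes, int count, int big_code,
                       size_t *offsets)
{
    size_t off = 2 + (big_code ? 4 : 2);   /* saved BP + return address */

    for (int i = 0; i < count; i++) {
        offsets[i] = off;
        off += slot_sizes[i];              /* sizes already padded to 2 bytes */
    }
}
```

For myrtn() the single stacked long int is at BP+4 (small code) or BP+6 (big code). With stacked arguments of 4, 8 and 4 bytes in a small code model, the offsets are 4, 8 and 16.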

Upon exit from myrtn(), we must restore the value of BP. The following two instructions achieve this:

mov    SP,BP     ; restore stack pointer
pop    BP        ; restore BP

The following is a sample assembly language function that implements myrtn() for a small memory model (small code, small data).

DGROUP   group   _DATA, _BSS
_TEXT    segment byte public 'CODE'
         assume  CS:_TEXT
         assume  DS:DGROUP
         public  myrtn_
myrtn_   proc    near
         push    BP          ; save BP
         mov     BP,SP       ; get access to arguments
;
; body of function
;
         mov     SP,BP       ; restore SP
         pop     BP          ; restore BP
         ret     4           ; return and pop last arg
myrtn_   endp
_TEXT    ends
         end

The same function for a large memory model (big code, big data) is shown below:

DGROUP   group   _DATA, _BSS
MYRTN_TEXT segment byte public 'CODE'
         assume  CS:MYRTN_TEXT
         public  myrtn_
myrtn_   proc    far
         push    BP          ; save BP
         mov     BP,SP       ; get access to arguments
;
; body of function
;
         mov     SP,BP       ; restore SP
         pop     BP          ; restore BP
         ret     4           ; return and pop last arg
myrtn_   endp
MYRTN_TEXT ends
         end

Note the following:

Global function names must be followed by an underscore. Global variable names must be preceded by an underscore.
All used 80x86 registers must be saved on entry and restored on exit, except those used to pass arguments and return values. Note that segment registers only have to be saved and restored if you are compiling your application with the r option.
The direction flag must be clear before returning to the caller.
In a small code model, any segment containing executable code must belong to the segment _TEXT and the class CODE. The segment _TEXT must have a combine type of PUBLIC. On entry, CS contains the segment address of the segment _TEXT. In a big code model there is no restriction on the naming of segments that contain executable code.
In a small data model, segment register DS contains the segment address of the group DGROUP. This isn't the case in a big data model.
When writing assembly language functions for the small code model, you must declare them as near. If you wish to write assembly language functions for the big code model, you must declare them as far.
In general, when naming segments for your code or data, you should follow the conventions described in the section entitled ``Memory Layout'' in the chapter 16-bit Memory Models.
If any of the arguments are pushed onto the stack, the called routine must pop them off the stack in the ret instruction.
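The naming rule in the first note is mechanical. A sketch with a hypothetical helper that derives the assembler symbol from a C name:

```c
#include <stdbool.h>
#include <string.h>

/* Writes the assembler symbol for a global C name into out:
   functions get a trailing underscore, variables a leading one.
   Returns false if out is too small. */
bool asm_symbol(const char *c_name, bool is_function, char *out, size_t out_size)
{
    size_t len = strlen(c_name);

    if (len + 2 > out_size)        /* name + underscore + terminating NUL */
        return false;
    if (is_function) {
        memcpy(out, c_name, len);
        out[len] = '_';
    } else {
        out[0] = '_';
        memcpy(out + 1, c_name, len);
    }
    out[len + 1] = '\0';
    return true;
}
```

The C function myrtn() becomes myrtn_, as in the listings above, and a hypothetical global variable counter becomes _counter.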

Functions with Variable Number of Arguments

A function prototype with a parameter list that ends with ``,...'' has a variable number of arguments. In this case, all arguments are passed on the stack. Since no prototyping information exists for arguments represented by ``,...'', those arguments are converted to their default types (for example, a float is passed as a double), as described in the section ``Sizes of Predefined Types''.
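Because the variable part of the list has no prototype, a float argument arrives as a double. A portable C sketch (hypothetical function name) showing the consequence for the callee: it must read each such argument with `va_arg(ap, double)`. It illustrates the conversion only, not the Watcom stack layout.

```c
#include <stdarg.h>

/* Sums count floating-point arguments passed through ",...". */
double sum_fp(int count, ...)
{
    va_list ap;
    double total = 0.0;

    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, double);   /* a float argument was converted to double */
    va_end(ap);
    return total;
}
```

sum_fp(2, 1.5f, 2.5) returns 4.0; the float 1.5f is converted to double at the call.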

Returning Values from Functions

The way in which function values are returned depends on the size of the return value. The following examples describe how function values are to be returned. They are coded for a small code model.

1-byte values are to be returned in register AL.

_TEXT   segment byte public 'CODE'
        assume  CS:_TEXT
        public  Ret1_
Ret1_   proc    near   ; char Ret1()
        mov     AL,'G'
        ret
Ret1_   endp
_TEXT   ends
        end

2-byte values are to be returned in register AX.

_TEXT   segment byte public 'CODE'
        assume  CS:_TEXT
        public  Ret2_
Ret2_   proc    near   ; short int Ret2()
        mov     AX,77
        ret
Ret2_   endp
_TEXT   ends
        end

4-byte values are to be returned in registers DX and AX, with the most significant word in register DX.

_TEXT   segment byte public 'CODE'
        assume  CS:_TEXT
        public  Ret4_
Ret4_   proc    near   ; long int Ret4()
        mov     AX,word ptr CS:Val4+0
        mov     DX,word ptr CS:Val4+2
        ret
Val4    dd      7777777
Ret4_   endp
_TEXT   ends
        end
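Ret4_ loads the low word of Val4 (offset 0) into AX and the high word (offset 2) into DX. A C sketch of that split and of how the caller reassembles the value; the type and function names are hypothetical, and a two's-complement host is assumed:

```c
#include <stdint.h>

/* Register halves of a 4-byte return value: the most significant word goes in DX. */
typedef struct { uint16_t dx, ax; } dx_ax;

dx_ax split_long(int32_t value)
{
    uint32_t bits = (uint32_t)value;   /* work on the two's-complement bits */
    dx_ax r = { (uint16_t)(bits >> 16), (uint16_t)(bits & 0xFFFFu) };
    return r;
}

/* Rebuilds the value the caller sees from DX and AX. */
int32_t join_long(dx_ax regs)
{
    uint32_t bits = ((uint32_t)regs.dx << 16) | regs.ax;
    return (int32_t)bits;              /* assumes a two's-complement host */
}
```

7777777 is 0x0076ADF1, so DX is 0x0076 and AX is 0xADF1.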

8-byte values, except structures, are to be returned in registers AX, BX, CX and DX with the most significant word in register AX.
```
        .8087
_TEXT   segment byte public 'CODE'
        assume  CS:_TEXT
        public  Ret8_
Ret8_   proc    near   ; double Ret8()
        mov     DX,word ptr CS:Val8+0
        mov     CX,word ptr CS:Val8+2
        mov     BX,word ptr CS:Val8+4
        mov     AX,word ptr CS:Val8+6
        ret
Val8:   dq      7.7
Ret8_   endp
_TEXT   ends
        end
```
The .8087 pseudo-op must be specified so that all floating-point constants are generated in 8087 format. When using the fpc (floating-point calls) option, float and double are returned in registers. See the section ``Returning Values in 80x87-based Applications'' when using the fpi or fpi87 options.
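Ret8_ above loads the word at offset 0 into DX and the word at offset 6 into AX. A sketch of the same split in C, assuming an IEEE 754 double on the host; the names are hypothetical:

```c
#include <stdint.h>
#include <string.h>

/* The four words of an 8-byte return value; AX holds the most significant word. */
typedef struct { uint16_t ax, bx, cx, dx; } ret8_regs;

ret8_regs split_double(double value)
{
    uint64_t bits;
    ret8_regs r;

    memcpy(&bits, &value, sizeof bits);
    r.ax = (uint16_t)(bits >> 48);   /* offset 6: sign, exponent, top of significand */
    r.bx = (uint16_t)(bits >> 32);   /* offset 4 */
    r.cx = (uint16_t)(bits >> 16);   /* offset 2 */
    r.dx = (uint16_t)bits;           /* offset 0: least significant word */
    return r;
}
```

For 1.0 (0x3FF0000000000000), AX is 0x3FF0 and BX, CX and DX are 0.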

For any other return value, such as a structure, the caller allocates space on the stack for the return value and sets register SI to point to this area. In a big data model, register SI contains an offset relative to the segment value in segment register SS.

_TEXT   segment byte public 'CODE'
        assume  CS:_TEXT
        public  RetX_
;
; struct int_values {
;     int value1, value2, value3, value4, value5;
;     };
;
RetX_   proc    near ; struct int_values RetX()
        mov     word ptr SS:0[SI],71
        mov     word ptr SS:2[SI],72
        mov     word ptr SS:4[SI],73
        mov     word ptr SS:6[SI],74
        mov     word ptr SS:8[SI],75
        ret
RetX_   endp
_TEXT   ends
        end

When returning values on the stack, remember to use a segment override to the stack segment (SS).
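Since int is 2 bytes in 16-bit mode, the five members of struct int_values occupy 10 bytes, at offsets 0, 2, 4, 6 and 8 from SI. A portable sketch (hypothetical names) that models the caller's return area as a byte array and fills it the way a routine returning 71 through 75 would, low byte first:

```c
#include <stdint.h>

/* Five 16-bit ints: 10 bytes, addressed as SS:0[SI] .. SS:8[SI]. */
enum { INT_VALUES_SIZE = 5 * 2 };

/* Stores one 16-bit word, low byte first, at area[offset]. */
static void store_word(unsigned char *area, unsigned offset, uint16_t value)
{
    area[offset] = (unsigned char)(value & 0xFFu);
    area[offset + 1] = (unsigned char)(value >> 8);
}

/* Fills the caller-allocated area: value1..value5 = 71..75. */
void retx_fill(unsigned char area[INT_VALUES_SIZE])
{
    for (unsigned i = 0; i < 5; i++)
        store_word(area, 2 * i, (uint16_t)(71 + i));   /* offsets 0, 2, 4, 6, 8 */
}
```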

The following is an example of a Watcom C/C++ program that calls the above assembly language subprograms:

#include <stdio.h>

struct int_values {
    int value1;
    int value2;
    int value3;
    int value4;
    int value5;
};


extern    char              Ret1(void);
extern    short int         Ret2(void);
extern    long int          Ret4(void);
extern    double            Ret8(void);
extern    struct int_values RetX(void);


void main()
{
    struct int_values x;

    printf( "Ret1 = %c\n", Ret1() );
    printf( "Ret2 = %d\n", Ret2() );
    printf( "Ret4 = %ld\n", Ret4() );
    printf( "Ret8 = %f\n", Ret8() );
    x = RetX();
    printf( "RetX1 = %d\n", x.value1 );
    printf( "RetX2 = %d\n", x.value2 );
    printf( "RetX3 = %d\n", x.value3 );
    printf( "RetX4 = %d\n", x.value4 );
    printf( "RetX5 = %d\n", x.value5 );
}

The above program should be compiled for a small code model (use the ms or mc compiler option).

Calling Conventions for 80x87-based Applications

When a source file is compiled by Watcom C/C++ with one of the fpi or fpi87 options, all floating-point arguments are passed on the 80x86 stack. The rules for passing arguments are as follows:

If the argument isn't floating-point, use the procedure described earlier in this chapter in the section ``Passing Arguments Using Register-based Calling Conventions''.
If the argument is floating-point, it's assigned a position on the 80x86 stack.

Passing Values in 80x87-based Applications

Consider the following example:

extern void myrtn(int,float,double,long int);

void main()
{
    float    x;
    double   y;
    int      i;
    long int j;

    x = 7.7;
    i = 7;
    y = 77.77;
    j = 77;
    myrtn( i, x, y, j );
}

myrtn() is an assembly language function that requires four arguments:

the first argument is of type int (2 bytes)
the second argument is of type float (4 bytes)
the third argument is of type double (8 bytes)
the fourth argument is of type long int (4 bytes)

These arguments are passed to myrtn() in the following way:

The first argument is passed in register AX, leaving BX, CX, and DX as available registers for other arguments.
The second argument is passed on the 80x86 stack, since it's a floating-point argument.
The third argument is also passed on the 80x86 stack, since it's a floating-point argument.
The fourth argument is passed on the 80x86 stack, since a previous argument has been assigned a position on the 80x86 stack.
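The 80x87 rule changes only one thing in the register algorithm: a floating-point argument never takes registers, and it forces every later argument onto the stack. A compact sketch that handles only int, long int and floating-point arguments (the classes and names are hypothetical):

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical argument classes for this sketch. */
typedef enum { FPI_INT16, FPI_INT32, FPI_FLOAT } fpi_arg;

/* Stores "[AX]", "[DX]", "[BX]", "[CX]", "[DX AX]", "[CX BX]" or "stack" in out[i]. */
void assign_fpi_args(const fpi_arg *args, int n, const char *out[])
{
    static const char *const regs[4] = { "[AX]", "[DX]", "[BX]", "[CX]" };
    bool avail[4] = { true, true, true, true };   /* AX, DX, BX, CX */
    bool stacked = false;

    for (int i = 0; i < n; i++) {
        out[i] = "stack";
        if (stacked || args[i] == FPI_FLOAT) {    /* floating point: always stacked */
            stacked = true;
            continue;
        }
        if (args[i] == FPI_INT16) {
            for (int r = 0; r < 4; r++)
                if (avail[r]) { avail[r] = false; out[i] = regs[r]; break; }
        } else if (avail[1] && avail[0]) {        /* pair DX AX */
            avail[1] = avail[0] = false;
            out[i] = "[DX AX]";
        } else if (avail[3] && avail[2]) {        /* pair CX BX */
            avail[3] = avail[2] = false;
            out[i] = "[CX BX]";
        }
        if (strcmp(out[i], "stack") == 0)         /* no register was free */
            stacked = true;
    }
}
```

For myrtn( i, x, y, j ) it reports [AX] for i and stack for x, y and j, matching the list above.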

Remember, arguments are pushed on the stack from right to left. That is, the rightmost argument is pushed first.

Any assembly language function must obey the rule:

All arguments passed on the stack must be removed by the called function.

The following is a sample assembly language function that implements myrtn():

        .8087
_TEXT   segment byte public 'CODE'
        assume  CS:_TEXT
        public  myrtn_
myrtn_  proc    near
;
; body of function
;
        ret 16         ; return and pop arguments
myrtn_  endp
_TEXT   ends
        end

Note the following:

Function names must be followed by an underscore.
All used 80x86 registers must be saved on entry and restored on exit, except those used to pass arguments and return values. Note that segment registers only have to be saved and restored if you're compiling your application with the r option. In this example, AX doesn't have to be saved, as it was used to pass the first argument. Floating-point registers can be modified without saving their contents.
The direction flag must be clear before returning to the caller.
This function has been written for a small code model. Any segment containing executable code must belong to the class CODE and the segment _TEXT. On entry, CS contains the segment address of the segment _TEXT. The above restrictions don't apply in a big code memory model.
When writing assembly language functions for a small code model, you must declare them as near. If you wish to write assembly language functions for a big code model, you must declare them as far.

Returning Values in 80x87-based Applications

Floating-point values are returned in ST(0) when using the fpi or fpi87 options. All other values are returned in the manner described earlier in this chapter in the section ``Returning Values from Functions''.