INTEL 80386 PROGRAMMER'S REFERENCE MANUAL 1986

After a signal on the RESET pin, certain registers of the 80386 are set to
predefined values. These values are adequate to enable execution of a
bootstrap program, but additional initialization must be performed by
software before all the features of the processor can be utilized.

10.1 Processor State After Reset

The contents of EAX depend upon the results of the power-up self test. The
self-test may be requested externally by assertion of BUSY# at the end of
RESET. The EAX register holds zero if the 80386 passed the test. A nonzero
value in EAX after self-test indicates that the particular 80386 unit is
faulty. If the self-test is not requested, the contents of EAX after RESET
is undefined.

DX holds a component identifier and revision number after RESET as Figure
10-1 illustrates. DH contains 3, which indicates an 80386 component. DL
contains a unique identifier of the revision level.

Control register zero (CR0) contains the values shown in Figure 10-2. The
ET bit of CR0 is set if an 80387 is present in the configuration (according
to the state of the ERROR# pin after RESET). If ET is reset, the
configuration either contains an 80287 or does not contain a coprocessor. A
software test is required to distinguish between these latter two
possibilities.

The remaining registers and flags are set as follows:

EFLAGS =00000002H
IP =0000FFF0H
CS selector =000H
DS selector =0000H
ES selector =0000H
SS selector =0000H
FS selector =0000H
GS selector =0000H
IDTR:
base =0
limit =03FFH

All registers not mentioned above are undefined.

These settings imply that the processor begins in real-address mode with
interrupts disabled.

Figure 10-1. Contents of EDX after RESET

EDX REGISTER

31 23 15 7 0
ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
º±±±±±±±±±±±±±±±±±±±±±±±±±±±±±±±±±³ DH ³ DL º
º±±±±±±±±±±±±UNDEFINED±±±±±±±±±±±±³ DEVICE ID ³ STEPPING ID º
º±±±±±±±±±±±±±±±±±±±±±±±±±±±±±±±±±³ 3 ³ (UNIQUE) º
ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¼

Figure 10-2. Initial Contents of CR0

CONTROL REGISTER ZERO

31 23 15 7 4 3 1 0
ÉÍÑÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÑÍÑÍÑÍÑÍÑÍ»
ºP³ ³E³T³E³M³Pº
º ³ UNDEFINED ³ ³ ³ ³ ³ º
ºG³ ³T³S³M³P³Eº
ÈÑÏÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÏÑÏÑÏÑÏÑÏÑ¼
³ ³ ³ ³ ³ ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄ0 - PAGING DISABLED ³ ³ ³ ³ ³
* - INDICATES PRESENCE OF 80387ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ ³ ³ ³ ³
0 - NO TASK SWITCHÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ ³ ³ ³
0 - DO NOT MONITOR COPROCESSORÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ ³ ³
0 - COPROCESSOR NOT PRESENTÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ ³
0 - PROTECTION NOT ENABLED (REAL ADDRESS MODE)ÄÄÄÄÄÄÄÄÄÄÙ

10.2 Software Initialization for Real-Address Mode

In real-address mode a few structures must be initialized before a program
can take advantage of all the features available in this mode.

10.2.1 Stack

No instructions that use the stack can be used until the stack-segment
register (SS) has been loaded. SS must point to an area in RAM.

10.2.2 Interrupt Table

The initial state of the 80386 leaves interrupts disabled; however, the
processor will still attempt to access the interrupt table if an exception
or nonmaskable interrupt (NMI) occurs. Initialization software should take
one of the following actions:

þ Change the limit value in the IDTR to zero. This will cause a shutdown
if an exception or nonmaskable interrupt occurs. (Refer to the 80386
Hardware Reference Manual to see how shutdown is signalled externally.)

þ Put pointers to valid interrupt handlers in all positions of the
interrupt table that might be used by exceptions or interrupts.

þ Change the IDTR to point to a valid interrupt table.

10.2.3 First Instructions

After RESET, address lines A{31-20} are automatically asserted for
instruction fetches. This fact, together with the initial values of CS:IP,
causes instruction execution to begin at physical address FFFFFFF0H. Near
(intrasegment) forms of control transfer instructions may be used to pass
control to other addresses in the upper 64K bytes of the address space. The
first far (intersegment) JMP or CALL instruction causes A{31-20} to drop
low, and the 80386 continues executing instructions in the lower one
megabyte of physical memory. This automatic assertion of address lines
A{31-20} allows systems designers to use a ROM at the high end of
the address space to initialize the system.

10.3 Switching to Protected Mode

Setting the PE bit of the MSW in CR0 causes the 80386 to begin executing in
protected mode. The current privilege level (CPL) starts at zero. The
segment registers continue to point to the same linear addresses as in real
address mode (in real address mode, linear addresses are the same physical
addresses).

Immediately after setting the PE flag, the initialization code must flush
the processor's instruction prefetch queue by executing a JMP instruction.
The 80386 fetches and decodes instructions and addresses before they are
used; however, after a change into protected mode, the prefetched
instruction information (which pertains to real-address mode) is no longer
valid. A JMP forces the processor to discard the invalid information.

10.4 Software Initialization for Protected Mode

Most of the initialization needed for protected mode can be done either
before or after switching to protected mode. If done in protected mode,
however, the initialization procedures must not use protected-mode features
that are not yet initialized.

10.4.1 Interrupt Descriptor Table

The IDTR may be loaded in either real-address or protected mode. However,
the format of the interrupt table for protected mode is different than that
for real-address mode. It is not possible to change to protected mode and
change interrupt table formats at the same time; therefore, it is inevitable
that, if IDTR selects an interrupt table, it will have the wrong format at
some time. An interrupt or exception that occurs at this time will have
unpredictable results. To avoid this unpredictability, interrupts should
remain disabled until interrupt handlers are in place and a valid IDT has
been created in protected mode.

10.4.2 Stack

The SS register may be loaded in either real-address mode or protected
mode. If loaded in real-address mode, SS continues to point to the same
linear base-address after the switch to protected mode.

10.4.3 Global Descriptor Table

Before any segment register is changed in protected mode, the GDT register
must point to a valid GDT. Initialization of the GDT and GDTR may be done in
real-address mode. The GDT (as well as LDTs) should reside in RAM, because
the processor modifies the accessed bit of descriptors.

10.4.4 Page Tables

Page tables and the PDBR in CR3 can be initialized in either real-address
mode or in protected mode; however, the paging enabled (PG) bit of CR0
cannot be set until the processor is in protected mode. PG may be set
simultaneously with PE, or later. When PG is set, the PDBR in CR3 should
already be initialized with a physical address that points to a valid page
directory. The initialization procedure should adopt one of the following
strategies to ensure consistent addressing before and after paging is
enabled:

þ The page that is currently being executed should map to the same
physical addresses both before and after PG is set.

þ A JMP instruction should immediately follow the setting of PG.

10.4.5 First Task

The initialization procedure can run awhile in protected mode without
initializing the task register; however, before the first task switch, the
following conditions must prevail:

þ There must be a valid task state segment (TSS) for the new task. The
stack pointers in the TSS for privilege levels numerically less than or
equal to the initial CPL must point to valid stack segments.

þ The task register must point to an area in which to save the current
task state. After the first task switch, the information dumped in this
area is not needed, and the area can be used for other purposes.

10.5 Initialization Example

$TITLE ('Initial Task')

NAME INIT

init_stack SEGMENT RW
DW 20 DUP(?)
tos LABEL WORD
init_stack ENDS

init_data SEGMENT RW PUBLIC
DW 20 DUP(?)
init_data ENDS

init_code SEGMENT ER PUBLIC

ASSUME DS:init_data

nop
nop
nop
init_start:
; set up stack
mov ax, init_stack
mov ss, ax
mov esp, offset tos

mov a1,1
blink:
xor a1,1
out 0e4h,a1
mov cx,3FFFh
here:
dec cx
jnz here

jmp SHORT blink

hlt
init_code ends

END init_start, SS:init_stack, DS:init_data

$TITLE('Protected Mode Transition -- 386 initialization')
NAME RESET

;*****************************************************************
; Upon reset the 386 starts executing at address 0FFFFFFF0H. The
; upper 12 address bits remain high until a FAR call or jump is
; executed.
;
; Assume the following:
;
;
; - a short jump at address 0FFFFFFF0H (placed there by the
; system builder) causes execution to begin at START in segment
; RESET_CODE.
;
;
; - segment RESET_CODE is based at physical address 0FFFF0000H,
; i.e. at the start of the last 64K in the 4G address space.
; Note that this is the base of the CS register at reset. If
; you locate ROMcode above this address, you will need to
; figure out an adjustment factor to address things within this
; segment.
;
;*****************************************************************
$EJECT ;

; Define addresses to locate GDT and IDT in RAM.
; These addresses are also used in the BLD386 file that defines
; the GDT and IDT. If you change these addresses, make sure you
; change the base addresses specified in the build file.

GDTbase EQU 00001000H ; physical address for GDT base
IDTbase EQU 00000400H ; physical address for IDT base

PUBLIC GDT_EPROM
PUBLIC IDT_EPROM
PUBLIC START

DUMMY segment rw ; ONLY for ASM386 main module stack init
DW 0
DUMMY ends

;*****************************************************************
;
; Note: RESET CODE must be USEl6 because the 386 initally executes
; in real mode.
;

RESET_CODE segment er PUBLIC USE16

ASSUME DS:nothing, ES:nothing

;
; 386 Descriptor template

DESC STRUC
lim_0_15 DW 0 ; limit bits (0..15)
bas_0_15 DW 0 ; base bits (0..15)
bas_16_23 DB 0 ; base bits (16..23)
access DB 0 ; access byte
gran DB 0 ; granularity byte
bas_24_31 DB 0 ; base bits (24..31)
DESC ENDS

; The following is the layout of the real GDT created by BLD386.
; It is located in EPROM and will be copied to RAM.
;
; GDT[O] ... NULL
; GDT[1] ... Alias for RAM GDT
; GDT[2] ... Alias for RAM IDT
; GDT[2] ... initial task TSS
; GDT[3] ... initial task TSS alias
; GDT[4] ... initial task LDT
; GDT[5] ... initial task LDT alias

;
; define entries in GDT and IDT.

GDT_ENTRIES EQU 8
IDT_ENTRIES EQU 32

; define some constants to index into the real GDT

GDT_ALIAS EQU 1*SIZE DESC
IDT_ALIAS EQU 2*SIZE DESC
INIT_TSS EQU 3*SIZE DESC
INIT_TSS_A EQU 4*SIZE DESC
INIT_LDT EQU 5*SIZE DESC
INIT_LDT_A EQU 6*SIZE DESC

;
; location of alias in INIT_LDT

INIT_LDT_ALIAS EQU 1*SIZE DESC

;
; access rights byte for DATA and TSS descriptors

DS_ACCESS EQU 010010010B
TSS_ACCESS EQU 010001001B

;
; This temporary GDT will be used to set up the real GDT in RAM.

Temp_GDT LABEL BYTE ; tag for begin of scratch GDT

NULL_DES DESC <> ; NULL descriptor

; 32-Gigabyte data segment based at 0
FLAT_DES DESC <0FFFFH,0,0,92h,0CFh,0>

GDT_eprom DP ? ; Builder places GDT address and limit
; in this 6 byte area.

IDT_eprom DP ? ; Builder places IDT address and limit
; in this 6 byte area.

;
; Prepare operand for loadings GDTR and LDTR.

TGDT_pword LABEL PWORD ; for temp GDT
DW end_Temp_GDT_Temp_GDT -1
DD 0

GDT_pword LABEL PWORD ; for GDT in RAM
DW GDT_ENTRIES * SIZE DESC -1
DD GDTbase

IDT_pword LABEL PWORD ; for IDT in RAM
DW IDT_ENTRIES * SIZE DESC -1
DD IDTbase

end_Temp_GDT LABEL BYTE

;
; Define equates for addressing convenience.

GDT_DES_FLAT EQU DS:GDT_ALIAS +GDTbase
IDT_DES_FLAT EQU DS:IDT_ALIAS +GDTbase

INIT_TSS_A_OFFSET EQU DS:INIT_TSS_A
INIT_TSS_OFFSET EQU DS:INIT_TSS

INIT_LDT_A_OFFSET EQU DS:INIT_LDT_A
INIT_LDT_OFFSET EQU DS:INIT_LDT

; define pointer for first task switch

ENTRY POINTER LABEL DWORD
DW 0, INIT_TSS

;******************************************************************
;
; Jump from reset vector to here.

START:

CLI ;disable interrupts
CLD ;clear direction flag

LIDT NULL_des ;force shutdown on errors

;
; move scratch GDT to RAM at physical 0

XOR DI,DI
MOV ES,DI ;point ES:DI to physical location 0

MOV SI,OFFSET Temp_GDT
MOV CX,end_Temp_GDT-Temp_GDT ;set byte count
INC CX
;
; move table

REP MOVS BYTE PTR ES:[DI],BYTE PTR CS:[SI]

LGDT tGDT_pword ;load GDTR for Temp. GDT
;(located at 0)

; switch to protected mode

MOV EAX,CR0 ;get current CRO
MOV EAX,1 ;set PE bit
MOV CRO,EAX ;begin protected mode
;
; clear prefetch queue

JMP SHORT flush
flush:

; set DS,ES,SS to address flat linear space (0 ... 4GB)

MOV BX,FLAT_DES-Temp_GDT
MOV US,BX
MOV ES,BX
MOV SS,BX
;
; initialize stack pointer to some (arbitrary) RAM location

MOV ESP, OFFSET end_Temp_GDT

;
; copy eprom GDT to RAM

MOV ESI,DWORD PTR GDT_eprom +2 ; get base of eprom GDT
; (put here by builder).

MOV EDI,GDTbase ; point ES:EDI to GDT base in RAM.

MOV CX,WORD PTR gdt_eprom +0 ; limit of eprom GDT
INC CX
SHR CX,1 ; easier to move words
CLD
REP MOVS WORD PTR ES:[EDI],WORD PTR DS:[ESI]

;
; copy eprom IDT to RAM
;
MOV ESI,DWORD PTR IDT_eprom +2 ; get base of eprom IDT
; (put here by builder)

MOV EDI,IDTbase ; point ES:EDI to IDT base in RAM.

MOV CX,WORD PTR idt_eprom +0 ; limit of eprom IDT
INC CX
SHR CX,1
CLD
REP MOVS WORD PTR ES:[EDI],WORD PTR DS:[ESI]

; switch to RAM GDT and IDT
;
LIDT IDT_pword
LGDT GDT_pword

;
MOV BX,GDT_ALIAS ; point DS to GDT alias
MOV DS,BX
;
; copy eprom TSS to RAM
;
MOV BX,INIT_TSS_A ; INIT TSS A descriptor base
; has RAM location of INIT TSS.

MOV ES,BX ; ES points to TSS in RAM

MOV BX,INIT_TSS ; get inital task selector
LAR DX,BX ; save access byte
MOV [BX].access,DS_ACCESS ; set access as data segment
MOV FS,BX ; FS points to eprom TSS

XOR si,si ; FS:si points to eprom TSS
XOR di,di ; ES:di points to RAM TSS

MOV CX,[BX].lim_0_15 ; get count to move
INC CX

;
; move INIT_TSS to RAM.

REP MOVS BYTE PTR ES:[di],BYTE PTR FS:[si]

MOV [BX].access,DH ; restore access byte

;
; change base of INIT TSS descriptor to point to RAM.

MOV AX,INIT_TSS_A_OFFSET.bas_0_15
MOV INIT_TSS_OFFSET.bas_0_15,AX
MOV AL,INIT_TSS_A_OFFSET.bas_16_23
MOV INIT_TSS_OFFSET.bas_16_23,AL
MOV AL,INIT_TSS_A_OFFSET.bas_24_31
MOV INIT_TSS_OFFSET.bas_24_31,AL

;
; change INIT TSS A to form a save area for TSS on first task
; switch. Use RAM at location 0.

MOV BX,INIT_TSS_A
MOV WORD PTR [BX].bas_0_15,0
MOV [BX].bas_16_23,0
MOV [BX].bas_24_31,0
MOV [BX].access,TSS_ACCESS
MOV [BX].gran,O
LTR BX ; defines save area for TSS

;
; copy eprom LDT to RAM

MOV BX,INIT_LDT_A ; INIT_LDT_A descriptor has
; base address in RAM for INIT_LDT.

MOV ES,BX ; ES points LDT location in RAM.

MOV AH,[BX].bas_24_31
MOV AL,[BX].bas_16_23
SHL EAX,16
MOV AX,[BX].bas_0_15 ; save INIT_LDT base (ram) in EAX

MOV BX,INIT_LDT ; get inital LDT selector
LAR DX,BX ; save access rights
MOV [BX].access,DS_ACCESS ; set access as data segment
MOV FS,BX ; FS points to eprom LDT

XOR si,si ; FS:SI points to eprom LDT
XOR di,di ; ES:DI points to RAM LDT

MOV CX,[BX].lim_0_15 ; get count to move
INC CX
;
; move initial LDT to RAM

REP MOVS BYTE PTR ES:[di],BYTE PTR FS:[si]

MOV [BX].access,DH ; restore access rights in
; INIT_LDT descriptor

;
; change base of alias (of INIT_LDT) to point to location in RAM.

MOV ES:[INIT_LDT_ALIAS].bas_0_15,AX
SHR EAX,16
MOV ES:[INIT_LDT_ALIAS].bas_16_23,AL
MOV ES:[INIT_LDT_ALIAS].bas_24_31,AH

;
; now set the base value in INIT_LDT descriptor

MOV AX,INIT_LDT_A_OFFSET.bas_0_15
MOV INIT_LDT_OFFSET.bas_0_15,AX
MOV AL,INIT_LDT_A_OFFSET.bas_16_23
MOV INIT_LDT_OFFSET.bas_16_23,AL
MOV AL,INIT_LDT_A_OFFSET.bas_24_31
MOV INIT_LDT_OFFSET.bas_24_31,AL

;
; Now GDT, IDT, initial TSS and initial LDT are all set up.
;
; Start the first task!
'
JMP ENTRY_POINTER

RESET_CODE ends
END START, SS:DUMMY,DS:DUMMY

10.6 TLB Testing

The 80386 provides a mechanism for testing the Translation Lookaside Buffer
(TLB), the cache used for translating linear addresses to physical
addresses. Although failure of the TLB hardware is extremely unlikely, users
may wish to include TLB confidence tests among other power-up confidence
tests for the 80386.

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
NOTE
This TLB testing mechanism is unique to the 80386 and may not be
continued in the same way in future processors. Sortware that uses
this mechanism may be incompatible with future processors.
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

When testing the TLB it is recommended that paging be turned off (PG=0 in
CR0) to avoid interference with the test data being written to the TLB.

10.6.1 Structure of the TLB

The TLB is a four-way set-associative memory. Figure 10-3 illustrates the
structure of the TLB. There are four sets of eight entries each. Each entry
consists of a tag and data. Tags are 24-bits wide. They contain the
high-order 20 bits of the linear address, the valid bit, and three attribute
bits. The data portion of each entry contains the high-order 20 bits of the
physical address.

10.6.2 Test Registers

Two test registers, shown in Figure 10-4, are provided for the purpose of
testing. TR6 is the test command register, and TR7 is the test data
register. These registers are accessed by variants of the MOV
instruction. A test register may be either the source operand or destination
operand. The MOV instructions are defined in both real-address mode and
protected mode. The test registers are privileged resources; in protected
mode, the MOV instructions that access them can only be executed at
privilege level 0. An attempt to read or write the test registers when
executing at any other privilege level causes a general
protection exception.

The test command register (TR6) contains a command and an address tag to
use in performing the command:

C This is the command bit. There are two TLB testing commands:
write entries into the TLB, and perform TLB lookups. To cause an
immediate write into the TLB entry, move a doubleword into TR6
that contains a 0 in this bit. To cause an immediate TLB lookup,
move a doubleword into TR6 that contains a 1 in this bit.

Linear On a TLB write, a TLB entry is allocated to this linear address;
Address the rest of that TLB entry is set per the value of TR7 and the
value just written into TR6. On a TLB lookup, the TLB is
interrogated per this value; if one and only one TLB entry
matches, the rest of the fields of TR6 and TR7 are set from the
matching TLB entry.

V The valid bit for this TLB entry. The TLB uses the valid bit to
identify entries that contain valid data. Entries of the TLB
that have not been assigned values have zero in the valid bit.
All valid bits can be cleared by writing to CR3.

D, D# The dirty bit (and its complement) for/from the TLB entry.

U, U# The U/S bit (and its complement) for/from the TLB entry.

W, W# The R/W bit (and its complement) for/from the TLB entry.

The meaning of these pairs of bits is given by Table 10-1,
where X represents D, U, or W.

The test data register (TR7) holds data read from or data to be written to
the TLB.

Physical This is the data field of the TLB. On a write to the TLB, the
Address TLB entry allocated to the linear address in TR6 is set to this
value. On a TLB lookup, if HT is set, the data field (physical
address) from the TLB is read out to this field. If HT is not
set, this field is undefined.

HT For a TLB lookup, the HT bit indicates whether the lookup was a
hit (HT 1) or a miss (HT 0). For a TLB write, HT must be set
to 1.

REP For a TLB write, selects which of four associative blocks of the
TLB is to be written. For a TLB read, if HT is set, REP reports
in which of the four associative blocks the tag was found; if HT
is not set, REP is undefined.

Table 10-1. Meaning of D, U, and W Bit Pairs

X X# Effect during Value of bit X
TLB Lookup after TLB Write

0 0 (undefined) (undefined)
0 1 Match if X=0 Bit X becomes 0
1 0 Match if X=1 Bit X becomes 1
1 1 (undefined) (undefined)

Figure 10-3. TLB Structure

ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍËÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
7º TAG º DATA º
ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÎÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹

ÚÄÄÄÄÄÄÄ
³ SET 11
³ ÚÄÄ ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÎÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹
³ ³ 1º TAG º DATA º
³ ³ ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÎÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹
³ ³ 0º TAG º DATA º
³ ³ ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¼
³ ³
³ ³ ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍËÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
³ ³ 7º TAG º DATA º
³ ³ ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÎÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹
³ ³
³ ÀÄÄ
³ SET 10
³ ÚÄÄ ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÎÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹
³ ³ 1º TAG º DATA º
³ D ³ ³ ³ ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÎÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹
³ A ³ ³ ³ 0º TAG º DATA º
³ T ÀÄÄÄÄÄÄÙ ³ ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¼
³ A ³
³ ÚÄÄÄÄÄÄ¿ ³ ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍËÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
³ B ³ ³ ³ 7º TAG º DATA º
³ U ³ ³ ³ ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÎÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹
³ S ³ ³ ³
³ ÀÄÄ
³ SET 01
³ ÚÄÄ ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÎÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹
³ ³ 1º TAG º DATA º
³ ³ ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÎÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹
³ ³ 0º TAG º DATA º
³ ³ ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¼
³ ³
³ ³ ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍËÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
³ ³ 7º TAG º DATA º
³ ³ ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÎÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹
³ ³
³ ÀÄÄ
³ SET 00
ÀÄÄÄÄÄÄÄ ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÎÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹
1º TAG º DATA º
ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÎÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹
0º TAG º DATA º
ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¼

Figure 10-4. Test Registers

31 23 15 11 7 0
ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍØÍÍÍÍÍÍÍØÍÍÍÍÍÑÍÑÍÍÍÑÍÍÍ»
º ³ ³H³ ³ º
º PHYSICAL ADDRESS ³0 0 0 0 0 0 0³ ³REP³0 0º TR7
º ³ ³T³ ³ º
ÇÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÂÄÂÄÂÄÂÄÂÄÂÄÅÄÁÄÄÄÁÄÂÄ¶
º ³ ³ ³D³ ³U³ ³W³ ³ º
º LINEAR ADDRESS ³V³D³ ³U³ ³ ³ ³0 0 0 0³Cº TR8
º ³ ³ ³#³ ³#³ ³#³ ³ º
ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍØÍÏÍÏÍÏÍØÍÏÍÏÍÏÍÍÍÍÍÍÍÏÍ¼

NOTE: 0 INDICATES INTEL RESERVED. NO NOT DEFINE

10.6.3 Test Operations

To write a TLB entry:

1. Move a doubleword to TR7 that contains the desired physical address,
HT, and REP values. HT must contain 1. REP must point to the
associative block in which to place the entry.

2. Move a doubleword to TR6 that contains the appropriate linear
address, and values for V, D, U, and W. Be sure C=0 for "write"
command.

Be careful not to write duplicate tags; the results of doing so are
undefined.

To look up (read) a TLB entry:

1. Move a doubleword to TR6 that contains the appropriate linear address
and attributes. Be sure C=1 for "lookup" command.

2. Store TR7. If the HT bit in TR7 indicates a hit, then the other
values reveal the TLB contents. If HT indicates a miss, then the other
values in TR7 are indeterminate.

For the purposes of testing, the V bit functions as another bit of
addresss. The V bit for a lookup request should usually be set, so that
uninitialized tags do not match. Lookups with V=0 are unpredictable if any
tags are uninitialized.

Chapter 11 Coprocessing and Multiprocessing

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

The 80386 has two levels of support for multiple parallel processing units:

þ A highly specialized interface for very closely coupled processors of
a type known as coprocessors.

þ A more general interface for more loosely coupled processors of
unspecified type.

11.1 Coprocessing

The components of the coprocessor interface include:

þ ET bit of control register zero (CR0)
þ The EM, and MP bits of CR0
þ The ESC instructions
þ The WAIT instruction
þ The TS bit of CR0
þ Exceptions

11.1.1 Coprocessor Identification

The 80386 is designed to operate with either an 80287 or 80387 math
coprocessor. The ET bit of CR0 indicates which type of coprocessor is
present. ET is set automatically by the 80386 after RESET according to the
level detected on the ERROR# input. If desired, ET may also be set or reset
by loading CR0 with a MOV instruction. If ET is set, the 80386 uses the
32-bit protocol of the 80387; if reset, the 80386 uses the 16-bit protocol
of the 80287.

11.1.2 ESC and WAIT Instructions

The 80386 interprets the pattern 11011B in the first five bits of an
instruction as an opcode intended for a coprocessor. Instructions thus
marked are called ESCAPE or ESC instructions. The CPU performs the following
functions upon encountering an ESC instruction before sending the
instruction to the coprocessor:

þ Tests the emulation mode (EM) flag to determine whether coprocessor
functions are being emulated by software.

þ Tests the TS flag to determine whether there has been a context change
since the last ESC instruction.

þ For some ESC instructions, tests the ERROR# pin to determine whether
the coprocessor detected an error in the previous ESC instruction.

The WAIT instruction is not an ESC instruction, but WAIT causes the CPU to
perform some of the same tests that it performs upon encountering an ESC
instruction. The processor performs the following actions for a WAIT
instruction:

þ Waits until the coprocessor no longer asserts the BUSY# pin.

þ Tests the ERROR# pin (after BUSY# goes inactive). If ERROR# is active,
the 80386 signals exception 16, which indicates that the coprocessor
encountered an error in the previous ESC instruction.

þ WAIT can therefore be used to cause exception 16 if an error is
pending from a previous ESC instruction. Note that, if no coprocessor
is present, the ERROR# and BUSY# pins should be tied inactive to
prevent WAIT from waiting forever or causing spurious exceptions.

11.1.3 EM and MP Flags

The EM and MP flags of CR0 control how the processor reacts to coprocessor
instructions.

The EM bit indicates whether coprocessor functions are to be emulated. If
the processor finds EM set when executing an ESC instruction, it signals
exception 7, giving the exception handler an opportunity to emulate the ESC
instruction.

The MP (monitor coprocessor) bit indicates whether a coprocessor is
actually attached. The MP flag controls the function of the WAIT
instruction. If, when executing a WAIT instruction, the CPU finds MP set,
then it tests the TS flag; it does not otherwise test TS during a WAIT
instruction. If it finds TS set under these conditions, the CPU signals
exception 7.

The EM and MP flags can be changed with the aid of a MOV instruction using
CR0 as the destination operand and read with the aid of a MOV instruction
with CR0 as the source operand. These forms of the MOV instruction can be
executed only at privilege level zero.

11.1.4 The Task-Switched Flag

The TS bit of CR0 helps to determine when the context of the coprocessor
does not match that of the task being executed by the 80386 CPU. The 80386
sets TS each time it performs a task switch (whether triggered by software
or by hardware interrupt). If, when interpreting one of the ESC
instructions, the CPU finds TS already set, it causes exception 7. The WAIT
instruction also causes exception 7 if both TS and MP are set. Operating
systems can use this exception to switch the context of the coprocessor to
correspond to the current task. Refer to the 80386 System Software Writer's
Guide for an example.

The CLTS instruction (legal only at privilege level zero) resets the TS
flag.

11.1.5 Coprocessor Exceptions

Three exceptions aid in interfacing to a coprocessor: interrupt 7
(coprocessor not available), interrupt 9 (coprocessor segment overrun), and
interrupt 16 (coprocessor error).

11.1.5.1 Interrupt 7 ÄÄ Coprocessor Not Available

This exception occurs in either of two conditions:

1. The CPU encounters an ESC instruction and EM is set. In this case,
the exception handler should emulate the instruction that caused the
exception. TS may also be set.

2. The CPU encounters either the WAIT instruction or an ESC instruction
when both MP and TS are set. In this case, the exception handler
should update the state of the coprocessor, if necessary.

11.1.5.2 Interrupt 9 ÄÄ Coprocessor Segment Overrun

This exception occurs in protected mode under the following conditions:

þ An operand of a coprocessor instruction wraps around an addressing
limit (0FFFFH for small segments, 0FFFFFFFFH for big segments, zero for
expand-down segments). An operand may wrap around an addressing limit
when the segment limit is near an addressing limit and the operand is
near the largest valid address in the segment. Because of the
wrap-around, the beginning and ending addresses of such an operand
will be near opposite ends of the segment.

þ Both the first byte and the last byte of the operand (considering
wrap-around) are at addresses located in the segment and in present and
accessible pages.

þ The operand spans inaccessible addresses. There are two ways that such
an operand may also span inaccessible addresses:

1. The segment limit is not equal to the addressing limit (e.g.,
addressing limit is FFFFH and segment limit is FFFDH); therefore,
the operand will span addresses that are not within the segment
(e.g., an 8-byte operand that starts at valid offset FFFC will span
addresses FFFC-FFFF and 0000-0003; however, addresses FFFE and FFFF
are not valid, because they exceed the limit);

2. The operand begins and ends in present and accessible pages but
intermediate bytes of the operand fall either in a not-present page
or in a page to which the current procedure does not have access
rights.

The address of the failing numerics instruction and data operand may be
lost; an FSTENV does not return reliable addresses. As with the 80286/80287,
the segment overrun exception should be handled by executing an FNINIT
instruction (i.e., an FINIT without a preceding WAIT). The return address on
the stack does not necessarily point to the failing instruction nor to the
following instruction. The failing numerics instruction is not restartable.

Case 2 can be avoided by either aligning all segments on page boundaries or
by not starting them within 108 bytes of the start or end of a page. (The
maximum size of a coprocessor operand is 108 bytes.) Case 1 can be avoided
by making sure that the gap between the last valid offset and the first
valid offset of a segment is either no less than 108 bytes or is zero (i.e.,
the segment is of full size). If neither software system design constraint
is acceptable, the exception handler should execute FNINIT and should
probably terminate the task.

11.1.5.3 Interrupt 16 ÄÄ Coprocessor Error

The numerics coprocessors can detect six different exception conditions
during instruction execution. If the detected exception is not masked by a
bit in the control word, the coprocessor communicates the fact that an error
occurred to the CPU by a signal at the ERROR# pin. The CPU causes interrupt
16 the next time it checks the ERROR# pin, which is only at the beginning of
a subsequent WAIT or certain ESC instructions. If the exception is masked,
the numerics coprocessor handles the exception according to on-board logic;
it does not assert the ERROR# pin in this case.

11.2 General Multiprocessing

The components of the general multiprocessing interface include:

þ The LOCK# signal

þ The LOCK instruction prefix, which gives programmed control of the
LOCK# signal.

þ Automatic assertion of the LOCK# signal with implicit memory updates
by the processor

11.2.1 LOCK and the LOCK# Signal

The LOCK instruction prefix and its corresponding output signal LOCK# can
be used to prevent other bus masters from interrupting a data movement
operation. LOCK may only be used with the following 80386 instructions when
they modify memory. An undefined-opcode exception results from using LOCK
before any instruction other than:

þ Bit test and change: BTS, BTR, BTC.
þ Exchange: XCHG.
þ Two-operand arithmetic and logical: ADD, ADC, SUB, SBB, AND, OR, XOR.
þ One-operand arithmetic and logical: INC, DEC, NOT, and NEG.

A locked instruction is only guaranteed to lock the area of memory defined
by the destination operand, but it may lock a larger memory area. For
example, typical 8086 and 80286 configurations lock the entire physical
memory space. The area of memory defined by the destination operand is
guaranteed to be locked against access by a processor executing a locked
instruction on exactly the same memory area, i.e., an operand with
identical starting address and identical length.

The integrity of the lock is not affected by the alignment of the memory
field. The LOCK signal is asserted for as many bus cycles as necessary to
update the entire operand.

11.2.2 Automatic Locking

In several instances, the processor itself initiates activity on the data
bus. To help ensure that such activities function correctly in
multiprocessor configurations, the processor automatically asserts the LOCK#
signal. These instances include:

þ Acknowledging interrupts.

After an interrupt request, the interrupt controller uses the data bus
to send the interrupt ID of the interrupt source to the CPU. The CPU
asserts LOCK# to ensure that no other data appears on the data bus
during this time.

þ Setting busy bit of TSS descriptor.

The processor tests and sets the busy-bit in the type field of the TSS
descriptor when switching to a task. To ensure that two different
processors cannot simultaneously switch to the same task, the processor
asserts LOCK# while testing and setting this bit.

þ Loading of descriptors.

While copying the contents of a descriptor from a descriptor table into
a segment register, the processor asserts LOCK# so that the descriptor
cannot be modified by another processor while it is being loaded. For
this action to be effective, operating-system procedures that update
descriptors should adhere to the following steps:

ÄÄ Use a locked update to the access-rights byte to mark the
descriptor not-present.

ÄÄ Update the fields of the descriptor. (This may require several
memory accesses; therefore, LOCK cannot be used.)

ÄÄ Use a locked update to the access-rights byte to mark the
descriptor present again.

þ Updating page-table A and D bits.

The processor exerts LOCK# while updating the A (accessed) and D
(dirty) bits of page-table entries. Also the processor bypasses the
page-table cache and directly updates these bits in memory.

þ Executing XCHG instruction.

The 80386 always asserts LOCK during an XCHG instruction that
references memory (even if the LOCK prefix is not used).

11.2.3 Cache Considerations

Systems programmers must take care when updating shared data that may also
be stored in on-chip registers and caches. With the 80386, such shared
data includes:

þ Descriptors, which may be held in segment registers.

A change to a descriptor that is shared among processors should be
broadcast to all processors. Segment registers are effectively
"descriptor caches". A change to a descriptor will not be utilized by
another processor if that processor already has a copy of the old
version of the descriptor in a segment register.

þ Page tables, which may be held in the page-table cache.

A change to a page table that is shared among processors should be
broadcast to all processors, so that others can flush their page-table
caches and reload them with up-to-date page tables from memory.

Systems designers can employ an interprocessor interrupt to handle the
above cases. When one processor changes data that may be cached by other
processors, it can send an interrupt signal to all other processors that may
be affected by the change. If the interrupt is serviced by an interrupt
task, the task switch automatically flushes the segment registers. The task
switch also flushes the page-table cache if the PDBR (the contents of CR3)
of the interrupt task is different from the PDBR of every other task.

In multiprocessor systems that need a cacheability signal from the CPU, it
is recommended that physical address pin A31 be used to indicate
cacheability. Such a system can then possess up to 2 Gbytes of physical
memory. The virtual address range available to the programmer is not
affected by this convention.

Chapter 12 Debugging

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

The 80386 brings to Intel's line of microprocessors significant advances in
debugging power. The single-step exception and breakpoint exception of
previous processors are still available in the 80386, but the principal
debugging support takes the form of debug registers. The debug registers
support both instruction breakpoints and data breakpoints. Data breakpoints
are an important innovation that can save hours of debugging time by
pinpointing, for example, exactly when a data structure is being
overwritten. The breakpoint registers also eliminate the complexities
associated with writing a breakpoint instruction into a code segment
(requires a data-segment alias for the code segment) or a code segment
shared by multiple tasks (the breakpoint exception can occur in the context
of any of the tasks). Breakpoints can even be set in code contained in ROM.

12.1 Debugging Features of the Architecture

The features of the 80386 architecture that support debugging include:

Reserved debug interrupt vector

Permits processor to automatically invoke a debugger task or procedure when
an event occurs that is of interest to the debugger.

Four debug address registers

Permit programmers to specify up to four addresses that the CPU will
automatically monitor.

Debug control register

Allows programmers to selectively enable various debug conditions
associated with the four debug addresses.

Debug status register

Helps debugger identify condition that caused debug exception.

Trap bit of TSS (T-bit)

Permits monitoring of task switches.

Resume flag (RF) of flags register

Allows an instruction to be restarted after a debug exception without
immediately causing another debug exception due to the same condition.

Single-step flag (TF)

Allows complete monitoring of program flow by specifying whether the CPU
should cause a debug exception with the execution of every instruction.

Breakpoint instruction

Permits debugger intervention at any point in program execution and aids
debugging of debugger programs.

Reserved interrupt vector for breakpoint exception

Permits processor to automatically invoke a handler task or procedure upon
encountering a breakpoint instruction.

These features make it possible to invoke a debugger that is either a
separate task or a procedure in the context of the current task. The
debugger can be invoked under any of the following kinds of conditions:

þ Task switch to a specific task.
þ Execution of the breakpoint instruction.
þ Execution of every instruction.
þ Execution of any instruction at a given address.
þ Read or write of a byte, word, or doubleword at any specified address.
þ Write to a byte, word, or doubleword at any specified address.
þ Attempt to change a debug register.

12.2 Debug Registers

Six 80386 registers are used to control debug features. These registers are
accessed by variants of the MOV instruction. A debug register may be either
the source operand or destination operand. The debug registers are
privileged resources; the MOV instructions that access them can only be
executed at privilege level zero. An attempt to read or write the debug
registers when executing at any other privilege level causes a general
protection exception. Figure 12-1 shows the format of the debug registers.

Figure 12-1. Debug Registers

31 23 15 7 0
ÉÍÍÍÑÍÍÍÑÍÍÍÑÍÍÍØÍÍÍÑÍÍÍÑÍÍÍÑÍÍÍØÍÍÍÑÍÑÍÍÍÍÍÑÍÑÍØÍÑÍÑÍÑÍÑÍÑÍÑÍÑÍ»
ºLEN³R/W³LEN³R/W³LEN³R/W³LEN³R/W³ ³ ³ ³G³L³G³L³G³L³G³L³G³Lº
º ³ ³ ³ ³ ³ ³ ³ ³0 0³0³0 0 0³ ³ ³ ³ ³ ³ ³ ³ ³ ³ º DR7
º 3 ³ 3 ³ 2 ³ 2 ³ 1 ³ 1 ³ 0 ³ 0 ³ ³ ³ ³E³E³3³3³2³2³1³1³0³0º
ÇÄÄÄÁÄÄÄÁÄÄÄÁÄÄÄÁÄÄÄÁÄÄÄÁÄÄÄÁÄÄÄÅÄÂÄÅÄÅÄÄÄÄÄÁÄÁÄÁÄÁÄÁÄÁÄÅÄÅÄÅÄÅÄ¶
º ³B³B³B³ ³B³B³B³Bº
º0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0³ ³ ³ ³0 0 0 0 0 0 0 0 0³ ³ ³ ³ º DR6
º ³T³S³D³ ³3³2³1³0º
ÇÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÁÄÁÄÁÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÁÄÁÄÁÄÁÄ¶
º º
º RESERVED º DR5
º º
ÇÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¶
º º
º RESERVED º DR4
º º
ÇÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¶
º º
º BREAKPOINT 3 LINEAR ADDRESS º DR3
º º
ÇÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¶
º º
º BREAKPOINT 2 LINEAR ADDRESS º DR2
º º
ÇÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¶
º º
º BREAKPOINT 1 LINEAR ADDRESS º DR1
º º
ÇÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÅÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¶
º º
º BREAKPOINT 0 LINEAR ADDRESS º DR0
º º
ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¼

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
NOTE
0 MEANS INTEL RESERVED. DO NOT DEFINE.
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

12.2.1 Debug Address Registers (DR0-DR3)

Each of these registers contains the linear address associated with one of
four breakpoint conditions. Each breakpoint condition is further defined by
bits in DR7.

The debug address registers are effective whether or not paging is enabled.
The addresses in these registers are linear addresses. If paging is enabled,
the linear addresses are translated into physical addresses by the
processor's paging mechanism (as explained in Chapter 5). If paging is not
enabled, these linear addresses are the same as physical addresses.

Note that when paging is enabled, different tasks may have different
linear-to-physical address mappings. When this is the case, an address in a
debug address register may be relevant to one task but not to another. For
this reason the 80386 has both global and local enable bits in DR7. These
bits indicate whether a given debug address has a global (all tasks) or
local (current task only) relevance.

12.2.2 Debug Control Register (DR7)

The debug control register shown in Figure 12-1 both helps to define the
debug conditions and selectively enables and disables those conditions.

For each address in registers DR0-DR3, the corresponding fields R/W0
through R/W3 specify the type of action that should cause a breakpoint. The
processor interprets these bits as follows:

00 ÄÄ Break on instruction execution only
01 ÄÄ Break on data writes only
10 ÄÄ undefined
11 ÄÄ Break on data reads or writes but not instruction fetches

Fields LEN0 through LEN3 specify the length of data item to be monitored. A
length of 1, 2, or 4 bytes may be specified. The values of the length fields
are interpreted as follows:

00 ÄÄ one-byte length
01 ÄÄ two-byte length
10 ÄÄ undefined
11 ÄÄ four-byte length

If RWn is 00 (instruction execution), then LENn should also be 00. Any other
length is undefined.

The low-order eight bits of DR7 (L0 through L3 and G0 through G3)
selectively enable the four address breakpoint conditions. There are two
levels of enabling: the local (L0 through L3) and global (G0 through G3)
levels. The local enable bits are automatically reset by the processor at
every task switch to avoid unwanted breakpoint conditions in the new task.
The global enable bits are not reset by a task switch; therefore, they can
be used for conditions that are global to all tasks.

The LE and GE bits control the "exact data breakpoint match" feature of the
processor. If either LE or GE is set, the processor slows execution so that
data breakpoints are reported on the instruction that causes them. It is
recommended that one of these bits be set whenever data breakpoints are
armed. The processor clears LE at a task switch but does not clear GE.

12.2.3 Debug Status Register (DR6)

The debug status register shown in Figure 12-1 permits the debugger to
determine which debug conditions have occurred.

When the processor detects an enabled debug exception, it sets the
low-order bits of this register (B0 thru B3) before entering the debug
exception handler. Bn is set if the condition described by DRn, LENn, and
R/Wn occurs. (Note that the processor sets Bn regardless of whether Gn or
Ln is set. If more than one breakpoint condition occurs at one time and if
the breakpoint trap occurs due to an enabled condition other than n, Bn may
be set, even though neither Gn nor Ln is set.)

The BT bit is associated with the T-bit (debug trap bit) of the TSS (refer
to 7 for the location of the T-bit). The processor sets the BT bit before
entering the debug handler if a task switch has occurred and the T-bit of
the new TSS is set. There is no corresponding bit in DR7 that enables and
disables this trap; the T-bit of the TSS is the sole enabling bit.

The BS bit is associated with the TF (trap flag) bit of the EFLAGS
register. The BS bit is set if the debug handler is entered due to the
occurrence of a single-step exception. The single-step trap is the
highest-priority debug exception; therefore, when BS is set, any of the
other debug status bits may also be set.

The BD bit is set if the next instruction will read or write one of the
eight debug registers and ICE-386 is also using the debug registers at the
same time.

Note that the bits of DR6 are never cleared by the processor. To avoid any
confusion in identifying the next debug exception, the debug handler should
move zeros to DR6 immediately before returning.

12.2.4 Breakpoint Field Recognition

The linear address and LEN field for each of the four breakpoint conditions
define a range of sequential byte addresses for a data breakpoint. The LEN
field permits specification of a one-, two-, or four-byte field. Two-byte
fields must be aligned on word boundaries (addresses that are multiples of
two) and four-byte fields must be aligned on doubleword boundaries
(addresses that are multiples of four). These requirements are enforced by
the processor; it uses the LEN bits to mask the low-order bits of the
addresses in the debug address registers. Improperly aligned code or data
breakpoint addresses will not yield the expected results.

A data read or write breakpoint is triggered if any of the bytes
participating in a memory access is within the field defined by a breakpoint
address register and the corresponding LEN field. Table 12-1 gives some
examples of breakpoint fields with memory references that both do and do not
cause traps.

To set a data breakpoint for a misaligned field longer than one byte, it
may be desirable to put two sets of entries in the breakpoint register such
that each entry is properly aligned and the two entries together span the
length of the field.

Instruction breakpoint addresses must have a length specification of one
byte (LEN = 00); other values are undefined. The processor recognizes an
instruction breakpoint address only when it points to the first byte of an
instruction. If the instruction has any prefixes, the breakpoint address
must point to the first prefix.

Table 12-1. Breakpoint Field Recognition Examples

Address (hex) Length

DR0 0A0001 1 (LEN0 = 00)
Register Contents DR1 0A0002 1 (LEN1 = 00)
DR2 0B0002 2 (LEN2 = 01)
DR3 0C0000 4 (LEN3 = 11)

Some Examples of Memory 0A0001 1
References That Cause Traps 0A0002 1
0A0001 2
0A0002 2
0B0002 2
0B0001 4
0C0000 4
0C0001 2
0C0003 1

Some Examples of Memory 0A0000 1
References That Don't Cause Traps 0A0003 4
0B0000 2
0C0004 4

12.3 Debug Exceptions

Two of the interrupt vectors of the 80386 are reserved for exceptions that
relate to debugging. Interrupt 1 is the primary means of invoking debuggers
designed expressly for the 80386; interrupt 3 is intended for debugging
debuggers and for compatibility with prior processors in Intel's 8086
processor family.

12.3.1 Interrupt 1 ÄÄ Debug Exceptions

The handler for this exception is usually a debugger or part of a debugging
system. The processor causes interrupt 1 for any of several conditions. The
debugger can check flags in DR6 and DR7 to determine what condition caused
the exception and what other conditions might be in effect at the same time.
Table 12-2 associates with each breakpoint condition the combination of
bits that indicate when that condition has caused the debug exception.

Instruction address breakpoint conditions are faults, while other debug
conditions are traps. The debug exception may report either or both at one
time. The following paragraphs present details for each class of debug
exception.

Table 12-2. Debug Exception Conditions

Flags to Test Condition

BS=1 Single-step trap
B0=1 AND (GE0=1 OR LE0=1) Breakpoint DR0, LEN0, R/W0
B1=1 AND (GE1=1 OR LE1=1) Breakpoint DR1, LEN1, R/W1
B2=1 AND (GE2=1 OR LE2=1) Breakpoint DR2, LEN2, R/W2
B3=1 AND (GE3=1 OR LE3=1) Breakpoint DR3, LEN3, R/W3
BD=1 Debug registers not available; in use by ICE-386.
BT=1 Task switch

12.3.1.1 Instruction Addrees Breakpoint

The processor reports an instruction-address breakpoint before it executes
the instruction that begins at the given address; i.e., an instruction-
address breakpoint exception is a fault.

The RF (restart flag) permits the debug handler to retry instructions that
cause other kinds of faults in addition to debug faults. When it detects a
fault, the processor automatically sets RF in the flags image that it pushes
onto the stack. (It does not, however, set RF for traps and aborts.)

When RF is set, it causes any debug fault to be ignored during the next
instruction. (Note, however, that RF does not cause breakpoint traps to be
ignored, nor other kinds of faults.)

The processor automatically clears RF at the successful completion of every
instruction except after the IRET instruction, after the POPF instruction,
and after a JMP, CALL, or INT instruction that causes a task switch. These
instructions set RF to the value specified by the memory image of the EFLAGS
register.

The processor automatically sets RF in the EFLAGS image on the stack before
entry into any fault handler. Upon entry into the fault handler for
instruction address breakpoints, for example, RF is set in the EFLAGS image
on the stack; therefore, the IRET instruction at the end of the handler will
set RF in the EFLAGS register, and execution will resume at the breakpoint
address without generating another breakpoint fault at the same address.

If, after a debug fault, RF is set and the debug handler retries the
faulting instruction, it is possible that retrying the instruction will
raise other faults. The retry of the instruction after these faults will
also be done with RF=1, with the result that debug faults continue to be
ignored. The processor clears RF only after successful completion of the
instruction.

Real-mode debuggers can control the RF flag by using a 32-bit IRET. A
16-bit IRET instruction does not affect the RF bit (which is in the
high-order 16 bits of EFLAGS). To use a 32-bit IRET, the debugger must
rearrange the stack so that it holds appropriate values for the 32-bit EIP,
CS, and EFLAGS (with RF set in the EFLAGS image). Then executing an IRET
with an operand size prefix causes a 32-bit return, popping the RF flag
into EFLAGS.

12.3.1.2 Data Address Breakpoint

A data-address breakpoint exception is a trap; i.e., the processor reports
a data-address breakpoint after executing the instruction that accesses the
given memory item.

When using data breakpoints it is recommended that either the LE or GE bit
of DR7 be set also. If either LE or GE is set, any data breakpoint trap is
reported exactly after completion of the instruction that accessed the
specified memory item. This exact reporting is accomplished by forcing the
80386 execution unit to wait for completion of data operand transfers before
beginning execution of the next instruction. If neither GE nor LE is set,
data breakpoints may not be reported until one instruction after the data is
accessed or may not be reported at all. This is due to the fact that,
normally, instruction execution is overlapped with memory transfers to such
a degree that execution of the next instruction may begin before memory
transfers for the prior instruction are completed.

If a debugger needs to preserve the contents of a write breakpoint
location, it should save the original contents before setting a write
breakpoint. Because data breakpoints are traps, a write into a breakpoint
location will complete before the trap condition is reported. The handler
can report the saved value after the breakpoint is triggered. The data in
the debug registers can be used to address the new value stored by the
instruction that triggered the breakpoint.

12.3.1.3 General Detect Fault

This exception occurs when an attempt is made to use the debug registers at
the same time that ICE-386 is using them. This additional protection feature
is provided to guarantee that ICE-386 can have full control over the
debug-register resources when required. ICE-386 uses the debug-registers;
therefore, a software debugger that also uses these registers cannot run
while ICE-386 is in use. The exception handler can detect this condition by
examining the BD bit of DR6.

12.3.1.4 Single-Step Trap

This debug condition occurs at the end of an instruction if the trap flag
(TF) of the flags register held the value one at the beginning of that
instruction. Note that the exception does not occur at the end of an
instruction that sets TF. For example, if POPF is used to set TF, a
single-step trap does not occur until after the instruction that follows
POPF.

The processor clears the TF bit before invoking the handler. If TF=1 in
the flags image of a TSS at the time of a task switch, the exception occurs
after the first instruction is executed in the new task.

The single-step flag is normally not cleared by privilege changes inside a
task. INT instructions, however, do clear TF. Therefore, software
debuggers that single-step code must recognize and emulate INT n or INTO
rather than executing them directly.

To maintain protection, system software should check the current execution
privilege level after any single step interrupt to see whether single
stepping should continue at the current privilege level.

The interrupt priorities in hardware guarantee that if an external
interrupt occurs, single stepping stops. When both an external interrupt and
a single step interrupt occur together, the single step interrupt is
processed first. This clears the TF bit. After saving the return address or
switching tasks, the external interrupt input is examined before the first
instruction of the single step handler executes. If the external interrupt
is still pending, it is then serviced. The external interrupt handler is not
single-stepped. To single step an interrupt handler, just single step an INT
n instruction that refers to the interrupt handler.

12.3.1.5 Task Switch Breakpoint

The debug exception also occurs after a switch to an 80386 task if the
T-bit of the new TSS is set. The exception occurs after control has passed
to the new task, but before the first instruction of that task is executed.
The exception handler can detect this condition by examining the BT bit of
the debug status register DR6.

Note that if the debug exception handler is a task, the T-bit of its TSS
should not be set. Failure to observe this rule will cause the processor to
enter an infinite loop.

12.3.2 Interrupt 3 ÄÄ Breakpoint Exception

This exception is caused by execution of the breakpoint instruction INT 3.
Typically, a debugger prepares a breakpoint by substituting the opcode of
the one-byte breakpoint instruction in place of the first opcode byte of the
instruction to be trapped. When execution of the INT 3 instruction causes
the exception handler to be invoked, the saved value of ES:EIP points to the
byte following the INT 3 instruction.

With prior generations of processors, this feature is used extensively for
trapping execution of specific instructions. With the 80386, the needs
formerly filled by this feature are more conveniently solved via the debug
registers and interrupt 1. However, the breakpoint exception is still
useful for debugging debuggers, because the breakpoint exception can vector
to a different exception handler than that used by the debugger. The
breakpoint exception can also be useful when it is necessary to set a
greater number of breakpoints than permitted by the debug registers.

PART III COMPATIBILITY

Chapter 13 Executing 80286 Protected-Mode Code

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

13.1 80286 Code Executes as a Subset of the 80386

In general, programs designed for execution in protected mode on an 80286
execute without modification on the 80386, because the features of the 80286
are a subset of those of the 80386.

All the descriptors used by the 80286 are supported by the 80386 as long as
the Intel-reserved word (last word) of the 80286 descriptor is zero.

The descriptors for data segments, executable segments, local descriptor
tables, and task gates are common to both the 80286 and the 80386. Other
80286 descriptorsÄÄTSS segment, call gate, interrupt gate, and trap
gateÄÄare supported by the 80386. The 80386 also has new versions of
descriptors for TSS segment, call gate, interrupt gate, and trap gate that
support the 32-bit nature of the 80386. Both sets of descriptors can be
used simultaneously in the same system.

For those descriptors that are common to both the 80286 and the 80386, the
presence of zeros in the final word causes the 80386 to interpret these
descriptors exactly as 80286 does; for example:

Base Address The high-order eight bits of the 32-bit base address are
zero, limiting base addresses to 24 bits.

Limit The high-order four bits of the limit field are zero,
restricting the value of the limit field to 64K.

Granularity bit The granularity bit is zero, which implies that the value
of the 16-bit limit is interpreted in units of one byte.

B-bit In a data-segment descriptor, the B-bit is zero, implying
that the segment is no larger than 64 Kbytes.

D-bit In an executable-segment descriptor, the D-bit is zero,
implying that 16-bit addressing and operands are the
default.

For formats of these descriptors and documentation of their use refer to
the iAPX 286 Programmer's Reference Manual.

13.2 Two ways to Execute 80286 Tasks

When porting 80286 programs to the 80386, there are two cases to consider:

1. Porting an entire 80286 system to the 80386, complete with 80286
operating system, loader, and system builder.

In this case, all tasks will have 80286 TSSs. The 80386 is being used
as a faster 286.

2. Porting selected 80286 applications to run in an 80386 environment
with an 80386 operating system, loader, and system builder.

In this case, the TSSs used to represent 80286 tasks should be
changed to 80386 TSSs. It is theoretically possible to mix 80286 and
80386 TSSs, but the benefits are slight and the problems are great. It
is recommended that all tasks in a 80386 software system have 80386
TSSs. It is not necessary to change the 80286 object modules
themselves; TSSs are usually constructed by the operating system, by
the loader, or by the system builder. Refer to Chapter 16 for further
discussion of the interface between 16-bit and 32-bit code.

13.3 Differences From 80286

The few differences that do exist primarily affect operating system code.

13.3.1 Wraparound of 80286 24-Bit Physical Address Space

With the 80286, any base and offset combination that addresses beyond 16M
bytes wraps around to the first megabyte of the 80286 address space. With
the 80386, since it has a greater physical address space, any such address
falls into the 17th megabyte. In the unlikely event that any software
depends on this anomaly, the same effect can be simulated on the 80386 by
using paging to map the first 64K bytes of the 17th megabyte of logical
addresses to physical addresses in the first megabyte.

13.3.2 Reserved Word of Descriptor

Because the 80386 uses the contents of the reserved word (last word) of
every descriptor, 80286 programs that place values in this word may not
execute correctly on the 80386.

13.3.3 New Descriptor Type Codes

Operating-system code that manages space in descriptor tables often uses an
invalid value in the access-rights field of descriptor-table entries to
identify unused entries. Access rights values of 80H and 00H remain invalid
for both the 80286 and 80386. Other values that were invalid on for the
80286 may be valid for the 80386 because of the additional descriptor types
defined by the 80386.

13.3.4 Restricted Semantics of LOCK

The 80286 processor implements the bus lock function differently than the
80386. Programs that use forms of memory locking specific to the 80286 may
not execute properly when transported to a specific application of the
80386.

The LOCK prefix and its corresponding output signal should only be used to
prevent other bus masters from interrupting a data movement operation. LOCK
may only be used with the following 80386 instructions when they modify
memory. An undefined-opcode exception results from using LOCK before any
other instruction.

þ Bit test and change: BTS, BTR, BTC.
þ Exchange: XCHG.
þ One-operand arithmetic and logical: INC, DEC, NOT, and NEG.
þ Two-operand arithmetic and logical: ADD, ADC, SUB, SBB, AND, OR, XOR.

A locked instruction is guaranteed to lock only the area of memory defined
by the destination operand, but may lock a larger memory area. For example,
typical 8086 and 80286 configurations lock the entire physical memory space.
With the 80386, the defined area of memory is guaranteed to be locked
against access by a processor executing a locked instruction on exactly the
same memory area, i.e., an operand with identical starting address and
identical length.

13.3.5 Additional Exceptions

The 80386 defines new exceptions that can occur even in systems designed
for the 80286.

þ Exception #6 ÄÄ invalid opcode

This exception can result from improper use of the LOCK instruction.

þ Exception #14 ÄÄ page fault

This exception may occur in an 80286 program if the operating system
enables paging. Paging can be used in a system with 80286 tasks as long
as all tasks use the same page directory. Because there is no place in
an 80286 TSS to store the PDBR, switching to an 80286 task does not
change the value of PDBR. Tasks ported from the 80286 should be given
80386 TSSs so they can take full advantage of paging.

Chapter 14 80386 Real-Address Mode

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

The real-address mode of the 80386 executes object code designed for
execution on 8086, 8088, 80186, or 80188 processors, or for execution in the
real-address mode of an 80286:

In effect, the architecture of the 80386 in this mode is almost identical
to that of the 8086, 8088, 80186, and 80188. To a programmer, an 80386 in
real-address mode appears as a high-speed 8086 with extensions to the
instruction set and registers. The principal features of this architecture
are defined in Chapters 2 and 3.

This chapter discusses certain additional topics that complete the system
programmer's view of the 80386 in real-address mode:

þ Address formation.
þ Extensions to registers and instructions.
þ Interrupt and exception handling.
þ Entering and leaving real-address mode.
þ Real-address-mode exceptions.
þ Differences from 8086.
þ Differences from 80286 real-address mode.

14.1 Physical Address Formation

The 80386 provides a one Mbyte + 64 Kbyte memory space for an 8086 program.
Segment relocation is performed as in the 8086: the 16-bit value in a
segment selector is shifted left by four bits to form the base address of a
segment. The effective address is extended with four high order zeros and
added to the base to form a linear address as Figure 14-1 illustrates. (The
linear address is equivalent to the physical address, because paging is not
used in real-address mode.) Unlike the 8086, the resulting linear address
may have up to 21 significant bits. There is a possibility of a carry when
the base address is added to the effective address. On the 8086, the carried
bit is truncated, whereas on the 80386 the carried bit is stored in bit
position 20 of the linear address.

Unlike the 8086 and 80286, 32-bit effective addresses can be generated (via
the address-size prefix); however, the value of a 32-bit address may not
exceed 65535 without causing an exception. For full compatibility with 80286
real-address mode, pseudo-protection faults (interrupt 12 or 13 with no
error code) occur if an effective address is generated outside the range 0
through 65535.

Figure 14-1. Real-Address Mode Address Formation

19 3 0
ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍ»
BASE º 16-BIT SEGMENT SELECTOR ³ 0 0 0 0 º
ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍ¼

+
19 15 0
ÉÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
OFFSET º 0 0 0 0 ³ 16-BIT EFFECTIVE ADDRESS º
ÈÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¼

=
20 0
LINEAR ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
ADDRESS º X X X X X X X X X X X X X X X X X X X X X X º
ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¼

14.2 Registers and Instructions

The register set available in real-address mode includes all the registers
defined for the 8086 plus the new registers introduced by the 80386: FS, GS,
debug registers, control registers, and test registers. New instructions
that explicitly operate on the segment registers FS and GS are available,
and the new segment-override prefixes can be used to cause instructions to
utilize FS and GS for address calculations. Instructions can utilize 32-bit
operands through the use of the operand size prefix.

The instruction codes that cause undefined opcode traps (interrupt 6)
include instructions of the protected mode that manipulate or interrogate
80386 selectors and descriptors; namely, VERR, VERW, LAR, LSL, LTR, STR,
LLDT, and SLDT. Programs executing in real-address mode are able to take
advantage of the new applications-oriented instructions added to the
architecture by the introduction of the 80186/80188, 80286 and 80386:

þ New instructions introduced by 80186/80188 and 80286.

ÄÄ PUSH immediate data
ÄÄ Push all and pop all (PUSHA and POPA)
ÄÄ Multiply immediate data
ÄÄ Shift and rotate by immediate count
ÄÄ String I/O
ÄÄ ENTER and LEAVE
ÄÄ BOUND

þ New instructions introduced by 80386.

ÄÄ LSS, LFS, LGS instructions
ÄÄ Long-displacement conditional jumps
ÄÄ Single-bit instructions
ÄÄ Bit scan
ÄÄ Double-shift instructions
ÄÄ Byte set on condition
ÄÄ Move with sign/zero extension
ÄÄ Generalized multiply
ÄÄ MOV to and from control registers
ÄÄ MOV to and from test registers
ÄÄ MOV to and from debug registers

14.3 Interrupt and Exception Handling

Interrupts and exceptions in 80386 real-address mode work as much as they
do on an 8086. Interrupts and exceptions vector to interrupt procedures via
an interrupt table. The processor multiplies the interrupt or exception
identifier by four to obtain an index into the interrupt table. The entries
of the interrupt table are far pointers to the entry points of interrupt or
exception handler procedures. When an interrupt occurs, the processor
pushes the current values of CS:IP onto the stack, disables interrupts,
clears TF (the single-step flag), then transfers control to the location
specified in the interrupt table. An IRET instruction at the end of the
handler procedure reverses these steps before returning control to the
interrupted procedure.

The primary difference in the interrupt handling of the 80386 compared to
the 8086 is that the location and size of the interrupt table depend on the
contents of the IDTR (IDT register). Ordinarily, this fact is not apparent
to programmers, because, after RESET, the IDTR contains a base address of 0
and a limit of 3FFH, which is compatible with the 8086. However, the LIDT
instruction can be used in real-address mode to change the base and limit
values in the IDTR. Refer to Chapter 9 for details on the IDTR, and the
LIDT and SIDT instructions. If an interrupt occurs and the corresponding
entry of the interrupt table is beyond the limit stored in the IDTR, the
processor raises exception 8.

14.4 Entering and Leaving Real-Address Mode

Real-address mode is in effect after a signal on the RESET pin. Even if the
system is going to be used in protected mode, the start-up program will
execute in real-address mode temporarily while initializing for protected
mode.

14.4.1 Switching to Protected Mode

The only way to leave real-address mode is to switch to protected mode. The
processor enters protected mode when a MOV to CR0 instruction sets the PE
(protection enable) bit in CR0. (For compatibility with the 80286, the LMSW
instruction may also be used to set the PE bit.)

Refer to Chapter 10 "Initialization" for other aspects of switching to
protected mode.

14.5 Switching Back to Real-Address Mode

The processor reenters real-address mode if software clears the PE bit in
CR0 with a MOV to CR0 instruction. A procedure that attempts to do this,
however, should proceed as follows:

1. If paging is enabled, perform the following sequence:

þ Transfer control to linear addresses that have an identity mapping;
i.e., linear addresses equal physical addresses.

þ Clear the PG bit in CR0.

þ Move zeros to CR3 to clear out the paging cache.

2. Transfer control to a segment that has a limit of 64K (FFFFH). This
loads the CS register with the limit it needs to have in real mode.

3. Load segment registers SS, DS, ES, FS, and GS with a selector that
points to a descriptor containing the following values, which are
appropriate to real mode:

þ Limit = 64K (FFFFH)
þ Byte granular (G = 0)
þ Expand up (E = 0)
þ Writable (W = 1)
þ Present (P = 1)
þ Base = any value

4. Disable interrupts. A CLI instruction disables INTR interrupts. NMIs
can be disabled with external circuitry.

5. Clear the PE bit.

6. Jump to the real mode code to be executed using a far JMP. This
action flushes the instruction queue and puts appropriate values in
the access rights of the CS register.

7. Use the LIDT instruction to load the base and limit of the real-mode
interrupt vector table.

8. Enable interrupts.

9. Load the segment registers as needed by the real-mode code.

14.6 Real-Address Mode Exceptions

The 80386 reports some exceptions differently when executing in
real-address mode than when executing in protected mode. Table 14-1 details
the real-address-mode exceptions.

14.7 Differences From 8086

In general, the 80386 in real-address mode will correctly execute ROM-based
software designed for the 8086, 8088, 80186, and 80188. Following is a list
of the minor differences between 8086 execution on the 80386 and on an 8086.

1. Instruction clock counts.

The 80386 takes fewer clocks for most instructions than the 8086/8088.
The areas most likely to be affected are:

þ Delays required by I/O devices between I/O operations.

þ Assumed delays with 8086/8088 operating in parallel with an 8087.

2. Divide Exceptions Point to the DIV instruction.

Divide exceptions on the 80386 always leave the saved CS:IP value
pointing to the instruction that failed. On the 8086/8088, the CS:IP
value points to the next instruction.

3. Undefined 8086/8088 opcodes.

Opcodes that were not defined for the 8086/8088 will cause exception
6 or will execute one of the new instructions defined for the 80386.

4. Value written by PUSH SP.

The 80386 pushes a different value on the stack for PUSH SP than the
8086/8088. The 80386 pushes the value of SP before SP is incremented
as part of the push operation; the 8086/8088 pushes the value of SP
after it is incremented. If the value pushed is important, replace
PUSH SP instructions with the following three instructions:

PUSH BP
MOV BP, SP
XCHG BP, [BP]

This code functions as the 8086/8088 PUSH SP instruction on the 80386.

5. Shift or rotate by more than 31 bits.

The 80386 masks all shift and rotate counts to the low-order five
bits. This MOD 32 operation limits the count to a maximum of 31 bits,
thereby limiting the time that interrupt response is delayed while
the instruction is executing.

6. Redundant prefixes.

The 80386 sets a limit of 15 bytes on instruction length. The only
way to violate this limit is by putting redundant prefixes before an
instruction. Exception 13 occurs if the limit on instruction length
is violated. The 8086/8088 has no instruction length limit.

7. Operand crossing offset 0 or 65,535.

On the 8086, an attempt to access a memory operand that crosses
offset 65,535 (e.g., MOV a word to offset 65,535) or offset 0 (e.g.,
PUSH a word when SP = 1) causes the offset to wrap around modulo
65,536. The 80386 raises an exception in these casesÄÄexception 13 if
the segment is a data segment (i.e., if CS, DS, ES, FS, or GS is being
used to address the segment), exception 12 if the segment is a stack
segment (i.e., if SS is being used).

8. Sequential execution across offset 65,535.

On the 8086, if sequential execution of instructions proceeds past
offset 65,535, the processor fetches the next instruction byte from
offset 0 of the same segment. On the 80386, the processor raises
exception 13 in such a case.

9. LOCK is restricted to certain instructions.

The LOCK prefix and its corresponding output signal should only be
used to prevent other bus masters from interrupting a data movement
operation. The 80386 always asserts the LOCK signal during an XCHG
instruction with memory (even if the LOCK prefix is not used). LOCK
may only be used with the following 80386 instructions when they
update memory: BTS, BTR, BTC, XCHG, ADD, ADC, SUB, SBB, INC, DEC,
AND, OR, XOR, NOT, and NEG. An undefined-opcode exception
(interrupt 6) results from using LOCK before any other instruction.

10. Single-stepping external interrupt handlers.

The priority of the 80386 single-step exception is different from that
of the 8086/8088. The change prevents an external interrupt handler
from being single-stepped if the interrupt occurs while a program is
being single-stepped. The 80386 single-step exception has higher
priority that any external interrupt. The 80386 will still single-step
through an interrupt handler invoked by the INT instructions or by an
exception.

11. IDIV exceptions for quotients of 80H or 8000H.

The 80386 can generate the largest negative number as a quotient for
the IDIV instruction. The 8086/8088 causes exception zero instead.

12. Flags in stack.

The setting of the flags stored by PUSHF, by interrupts, and by
exceptions is different from that stored by the 8086 in bit positions
12 through 15. On the 8086 these bits are stored as ones, but in
80386 real-address mode bit 15 is always zero, and bits 14 through 12
reflect the last value loaded into them.

13. NMI interrupting NMI handlers.

After an NMI is recognized on the 80386, the NMI interrupt is masked
until an IRET instruction is executed.

14. Coprocessor errors vector to interrupt 16.

Any 80386 system with a coprocessor must use interrupt vector 16 for
the coprocessor error exception. If an 8086/8088 system uses another
vector for the 8087 interrupt, both vectors should point to the
coprocessor-error exception handler.

15. Numeric exception handlers should allow prefixes.

On the 80386, the value of CS:IP saved for coprocessor exceptions
points at any prefixes before an ESC instruction. On 8086/8088
systems, the saved CS:IP points to the ESC instruction.

16. Coprocessor does not use interrupt controller.

The coprocessor error signal to the 80386 does not pass through an
interrupt controller (an 8087 INT signal does). Some instructions in
a coprocessor error handler may need to be deleted if they deal with
the interrupt controller.

17. Six new interrupt vectors.

The 80386 adds six exceptions that arise only if the 8086 program has
a hidden bug. It is recommended that exception handlers be added that
treat these exceptions as invalid operations. This additional
software does not significantly affect the existing 8086 software
because the interrupts do not normally occur. These interrupt
identifiers should not already have been used by the 8086 software,
because they are in the range reserved by Intel. Table 14-2 describes
the new 80386 exceptions.

18. One megabyte wraparound.

The 80386 does not wrap addresses at 1 megabyte in real-address mode.
On members of the 8086 family, it possible to specify addresses
greater than one megabyte. For example, with a selector value 0FFFFH
and an offset of 0FFFFH, the effective address would be 10FFEFH (1
Mbyte + 65519). The 8086, which can form adresses only up to 20 bits
long, truncates the high-order bit, thereby "wrapping" this address
to 0FFEFH. However, the 80386, which can form addresses up to 32
bits long does not truncate such an address.

Table 14-1. 80386 Real-Address Mode Exceptions

Description Interrupt Function that Can Return Address
Number Generate the Exception Points to Faulting
Instruction
Divide error 0 DIV, IDIV YES
Debug exceptions 1 All
Some debug exceptions point to the faulting instruction, others to the
next instruction. The exception handler can determine which has occurred by
examining DR6.

Breakpoint 3 INT NO
Overflow 4 INTO NO
Bounds check 5 BOUND YES
Invalid opcode 6 Any undefined opcode or LOCK YES
used with wrong instruction
Coprocessor not available 7 ESC or WAIT YES
Interrupt table limit too small 8 INT vector is not within IDTR YES
limit
Reserved 9-12
Stack fault 12 Memory operand crosses offset YES
0 or 0FFFFH
Pseudo-protection exception 13 Memory operand crosses offset YES
0FFFFH or attempt to execute
past offset 0FFFFH or
instruction longer than 15
bytes
Reserved 14,15
Coprocessor error 16 ESC or WAIT YES
Coprocessor errors are reported on the first ESC or WAIT instruction
after the ESC instruction that caused the error.

Two-byte SW interrupt 0-255 INT n NO

Table 14-2. New 80386 Exceptions

Interrupt Function
Identifier

5 A BOUND instruction was executed with a register value outside
the limit values.

6 An undefined opcode was encountered or LOCK was used improperly
before an instruction to which it does not apply.

7 The EM bit in the MSW is set when an ESC instruction was
encountered. This exception also occurs on a WAIT instruction
if TS is set.

8 An exception or interrupt has vectored to an interrupt table
entry beyond the interrupt table limit in IDTR. This can occur
only if the LIDT instruction has changed the limit from the
default value of 3FFH, which is enough for all 256 interrupt
IDs.

12 Operand crosses extremes of stack segment, e.g., MOV operation
at offset 0FFFFH or push with SP=1 during PUSH, CALL, or INT.

13 Operand crosses extremes of a segment other than a stack
segment; or sequential instruction execution attempts to
proceed beyond offset 0FFFFH; or an instruction is longer than
15 bytes (including prefixes).

14.8 Differences From 80286 Real-Address Mode

The few differences that exist between 80386 real-address mode and 80286
real-address mode are not likely to affect any existing 80286 programs
except possibly the system initialization procedures.

14.8.1 Bus Lock

The 80286 processor implements the bus lock function differently than the
80386. Programs that use forms of memory locking specific to the 80286 may
not execute properly if transported to a specific application of the 80386.

þ Bit test and change: BTS, BTR, BTC.
þ Exchange: XCHG.
þ One-operand arithmetic and logical: INC, DEC, NOT, and NEG.
þ Two-operand arithmetic and logical: ADD, ADC, SUB, SBB, AND, OR, XOR.

A locked instruction is guaranteed to lock only the area of memory defined
by the destination operand, but may lock a larger memory area. For example,
typical 8086 and 80286 configurations lock the entire physical memory space.
With the 80386, the defined area of memory is guranteed to be locked against
access by a processor executing a locked instruction on exactly the same
memory area, i.e., an operand with identical starting address and identical
length.

14.8.2 Location of First Instruction

The starting location is 0FFFFFFF0H (sixteen bytes from end of 32-bit
address space) on the 80386 rather than 0FFFFF0H (sixteen bytes from end of
24-bit address space) as on the 80286. Many 80286 ROM initialization
programs will work correctly in this new environment. Others can be made to
work correctly with external hardware that redefines the signals on
A{31-20}.

14.8.3 Initial Values of General Registers

On the 80386, certain general registers may contain different values after
RESET than on the 80286. This should not cause compatibility problems,
because the content of 8086 registers after RESET is undefined. If
self-test is requested during the reset sequence and errors are detected in
the 80386 unit, EAX will contain a nonzero value. EDX contains the component
and revision identifier. Refer to Chapter 10 for more information.

14.8.4 MSW Initialization

The 80286 initializes the MSW register to FFF0H, but the 80386 initializes
this register to 0000H. This difference should have no effect, because the
bits that are different are undefined on the 80286. Programs that read the
value of the MSW will behave differently on the 80386 only if they depend on
the setting of the undefined, high-order bits.

Chapter 15 Virtual 8086 Mode

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

The 80386 supports execution of one or more 8086, 8088, 80186, or 80188
programs in an 80386 protected-mode environment. An 8086 program runs in
this environment as part of a V86 (virtual 8086) task. V86 tasks take
advantage of the hardware support of multitasking offered by the protected
mode. Not only can there be multiple V86 tasks, each one executing an 8086
program, but V86 tasks can be multiprogrammed with other 80386 tasks.

The purpose of a V86 task is to form a "virtual machine" with which to
execute an 8086 program. A complete virtual machine consists not only of
80386 hardware but also of systems software. Thus, the emulation of an 8086
is the result of cooperation between hardware and software:

þ The hardware provides a virtual set of registers (via the TSS), a
virtual memory space (the first megabyte of the linear address space of
the task), and directly executes all instructions that deal with these
registers and with this address space.

þ The software controls the external interfaces of the virtual machine
(I/O, interrupts, and exceptions) in a manner consistent with the
larger environment in which it executes. In the case of I/O, software
can choose either to emulate I/O instructions or to let the hardware
execute them directly without software intervention.

Software that helps implement virtual 8086 machines is called a V86
monitor.

15.1 Executing 8086 Code

The processor executes in V86 mode when the VM (virtual machine) bit in the
EFLAGS register is set. The processor tests this flag under two general
conditions:

1. When loading segment registers to know whether to use 8086-style
address formation.

2. When decoding instructions to determine which instructions are
sensitive to IOPL.

Except for these two modifications to its normal operations, the 80386 in
V86 mode operated much as in protected mode.

15.1.1 Registers and Instructions

The register set available in V86 mode includes all the registers defined
for the 8086 plus the new registers introduced by the 80386: FS, GS, debug
registers, control registers, and test registers. New instructions that
explicitly operate on the segment registers FS and GS are available, and the
new segment-override prefixes can be used to cause instructions to utilize
FS and GS for address calculations. Instructions can utilize 32-bit
operands through the use of the operand size prefix.

8086 programs running as V86 tasks are able to take advantage of the new
applications-oriented instructions added to the architecture by the
introduction of the 80186/80188, 80286 and 80386:

þ New instructions introduced by 80186/80188 and 80286.
ÄÄ PUSH immediate data
ÄÄ Push all and pop all (PUSHA and POPA)
ÄÄ Multiply immediate data
ÄÄ Shift and rotate by immediate count
ÄÄ String I/O
ÄÄ ENTER and LEAVE
ÄÄ BOUND

þ New instructions introduced by 80386.
ÄÄ LSS, LFS, LGS instructions
ÄÄ Long-displacement conditional jumps
ÄÄ Single-bit instructions
ÄÄ Bit scan
ÄÄ Double-shift instructions
ÄÄ Byte set on condition
ÄÄ Move with sign/zero extension
ÄÄ Generalized multiply

15.1.2 Linear Address Formation

In V86 mode, the 80386 processor does not interpret 8086 selectors by
referring to descriptors; instead, it forms linear addresses as an 8086
would. It shifts the selector left by four bits to form a 20-bit base
address. The effective address is extended with four high-order zeros and
added to the base address to create a linear address as Figure 15-1
illustrates.

Because of the possibility of a carry, the resulting linear address may
contain up to 21 significant bits. An 8086 program may generate linear
addresses anywhere in the range 0 to 10FFEFH (one megabyte plus
approximately 64 Kbytes) of the task's linear address space.

V86 tasks generate 32-bit linear addresses. While an 8086 program can only
utilize the low-order 21 bits of a linear address, the linear address can be
mapped via page tables to any 32-bit physical address.

Unlike the 8086 and 80286, 32-bit effective addresses can be generated (via
the address-size prefix); however, the value of a 32-bit address may not
exceed 65,535 without causing an exception. For full compatibility with
80286 real-address mode, pseudo-protection faults (interrupt 12 or 13 with
no error code) occur if an address is generated outside the range 0 through
65,535.

Figure 15-1. V86 Mode Address Formation

15.2 Structure of a V86 Task

A V86 task consists partly of the 8086 program to be executed and partly of
80386 "native mode" code that serves as the virtual-machine monitor. The
task must be represented by an 80386 TSS (not an 80286 TSS). The processor
enters V86 mode to execute the 8086 program and returns to protected mode to
execute the monitor or other 80386 tasks.

To run successfully in V86 mode, an existing 8086 program needs the
following:

þ A V86 monitor.
þ Operating-system services.

The V86 monitor is 80386 protected-mode code that executes at
privilege-level zero. The monitor consists primarily of initialization and
exception-handling procedures. As for any other 80386 program,
executable-segment descriptors for the monitor must exist in the GDT or in
the task's LDT. The linear addresses above 10FFEFH are available for the
V86 monitor, the operating system, and other systems software. The monitor
may also need data-segment descriptors so that it can examine the interrupt
vector table or other parts of the 8086 program in the first megabyte of the
address space.

In general, there are two options for implementing the 8086 operating
system:

1. The 8086 operating system may run as part of the 8086 code. This
approach is desirable for any of the following reasons:

þ The 8086 applications code modifies the operating system.

þ There is not sufficient development time to reimplement the 8086
operating system as 80386 code.

2. The 8086 operating system may be implemented or emulated in the V86
monitor. This approach is desirable for any of the following reasons:

þ Operating system functions can be more easily coordinated among
several V86 tasks.

þ The functions of the 8086 operating system can be easily emulated
by calls to the 80386 operating system.

Note that, regardless of the approach chosen for implementing the 8086
operating system, different V86 tasks may use different 8086 operating
systems.

15.2.1 Using Paging for V86 Tasks

Paging is not necessary for a single V86 task, but paging is useful or
necessary for any of the following reasons:

þ To create multiple V86 tasks. Each task must map the lower megabyte of
linear addresses to different physical locations.

þ To emulate the megabyte wrap. On members of the 8086 family, it is
possible to specify addresses larger than one megabyte. For example,
with a selector value of 0FFFFH and an offset of 0FFFFH, the effective
address would be 10FFEFH (one megabyte + 65519). The 8086, which can
form addresses only up to 20 bits long, truncates the high-order bit,
thereby "wrapping" this address to 0FFEFH. The 80386, however, which
can form addresses up to 32 bits long does not truncate such an
address. If any 8086 programs depend on this addressing anomaly, the
same effect can be achieved in a V86 task by mapping linear addresses
between 100000H and 110000H and linear addresses between 0 and 10000H
to the same physical addresses.

þ To create a virtual address space larger than the physical address
space.

þ To share 8086 OS code or ROM code that is common to several 8086
programs that are executing simultaneously.

þ To redirect or trap references to memory-mapped I/O devices.

15.2.2 Protection within a V86 Task

Because it does not refer to descriptors while executing 8086 programs, the
processor also does not utilize the protection mechanisms offered by
descriptors. To protect the systems software that runs in a V86 task from
the 8086 program, software designers may follow either of these approaches:

þ Reserve the first megabyte (plus 64 kilobytes) of each task's linear
address space for the 8086 program. An 8086 task cannot generate
addresses outside this range.

þ Use the U/S bit of page-table entries to protect the virtual-machine
monitor and other systems software in each virtual 8086 task's space.
When the processor is in V86 mode, CPL is 3. Therefore, an 8086 program
has only user privileges. If the pages of the virtual-machine monitor
have supervisor privilege, they cannot be accessed by the 8086 program.

15.3 Entering and Leaving V86 Mode

Figure 15-2 summarizes the ways that the processor can enter and leave an
8086 program. The processor can enter V86 by either of two means:

1. A task switch to an 80386 task loads the image of EFLAGS from the new
TSS. The TSS of the new task must be an 80386 TSS, not an 80286 TSS,
because the 80286 TSS does not store the high-order word of EFLAGS,
which contains the VM flag. A value of one in the VM bit of the new
EFLAGS indicates that the new task is executing 8086 instructions;
therefore, while loading the segment registers from the TSS, the
processor forms base addresses as the 8086 would.

2. An IRET from a procedure of an 80386 task loads the image of EFLAGS
from the stack. A value of one in VM in this case indicates that the
procedure to which control is being returned is an 8086 procedure. The
CPL at the time the IRET is executed must be zero, else the processor
does not change VM.

The processor leaves V86 mode when an interrupt or exception occurs. There
are two cases:

1. The interrupt or exception causes a task switch. A task switch from a
V86 task to any other task loads EFLAGS from the TSS of the new task.
If the new TSS is an 80386 TSS and the VM bit in the EFLAGS image is
zero or if the new TSS is an 80286 TSS, then the processor clears the
VM bit of EFLAGS, loads the segment registers from the new TSS using
80386-style address formation, and begins executing the instructions
of the new task according to 80386 protected-mode semantics.

2. The interrupt or exception vectors to a privilege-level zero
procedure. The processor stores the current setting of EFLAGS on the
stack, then clears the VM bit. The interrupt or exception handler,
therefore, executes as "native" 80386 protected-mode code. If an
interrupt or exception vectors to a conforming segment or to a
privilege level other than three, the processor causes a
general-protection exception; the error code is the selector of the
executable segment to which transfer was attempted.

Systems software does not manipulate the VM flag directly, but rather
manipulates the image of the EFLAGS register that is stored on the stack or
in the TSS. The V86 monitor sets the VM flag in the EFLAGS image on the
stack or in the TSS when first creating a V86 task. Exception and interrupt
handlers can examine the VM flag on the stack. If the interrupted procedure
was executing in V86 mode, the handler may need to invoke the V86 monitor.

Figure 15-2. Entering and Leaving the 8086 Program

MODE TRANSITION DIAGRAM

ÉÍÍÍÍÍÍÍÍÍÍÍ»
TASK SWITCH º INITIAL º
ÚÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¶ ENTRY º
³ OR IRET ÈÍÍÍÍÍÍÍÍÍÍÍ¼
³

ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍ» INTERRUPT, EXCEPTION ÉÍÍÍÍÍÍÍÍÍÍÍÍÍ»
º 8086 PROGRAM ÇÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄº V86 MONITOR º
º (V86 MODE) ºÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¶ (PROTECTED º
ÈÍÍÍÍÍÍÍÑÍÍÍÍÍÍ¼ IRET º MODE) º
³ ÈÍÍÍÍÍÑÍÍÍÍÍÍÍ¼
³ ³ ³
³ ³ ³ ³
³ ³ ³ ³
³ ³TASK SWITCH ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ» TASK SWITCH ³
³ ÀÄÄÄÄÄÄÄÄÄÄÄº OTHER 80386 TASKS ºÄÄÄÄÄÄÄÄÄÙ ³
ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄ¶ (PROTECTED MODE) ÇÄÄÄÄÄÄÄÄÄÄÄÄÄÙ
TASK SWITCH ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¼ TASK SWITCH

15.3.1 Transitions Through Task Switches

A task switch to or from a V86 task may be due to any of three causes:

1. An interrupt that vectors to a task gate.
2. An action of the scheduler of the 80386 operating system.
3. An IRET when the NT flag is set.

In any of these cases, the processor changes the VM bit in EFLAGS according
to the image of EFLAGS in the new TSS. If the new TSS is an 80286 TSS, the
high-order word of EFLAGS is not in the TSS; the processor clears VM in this
case. The processor updates VM prior to loading the segment registers from
the images in the new TSS. The new setting of VM determines whether the
processor interprets the new segment-register images as 8086 selectors or
80386/80286 selectors.

15.3.2 Transitions Through Trap Gates and Interrupt Gates

The processor leaves V86 mode as the result of an exception or interrupt
that vectors via a trap or interrupt gate to a privilege-level zero
procedure. The exception or interrupt handler returns to the 8086 code by
executing an IRET.

Because it was designed for execution by an 8086 processor, an 8086 program
in a V86 task will have an 8086-style interrupt table starting at linear
address zero. However, the 80386 does not use this table directly. For all
exceptions and interrupts that occur in V86 mode, the processor vectors
through the IDT. The IDT entry for an interrupt or exception that occurs in
a V86 task must contain either:

þ A task gate.

þ An 80386 trap gate (type 14) or an 80386 interrupt gate (type 15),
which must point to a nonconforming, privilege-level zero, code
segment.

Interrupts and exceptions that have 80386 trap or interrupt gates in the
IDT vector to the appropriate handler procedure at privilege-level zero. The
contents of all the 8086 segment registers are stored on the PL 0 stack.
Figure 15-3 shows the format of the PL 0 stack after an exception or
interrupt that occurs while a V86 task is executing an 8086 program.

After the processor stores all the 8086 segment registers on the PL 0
stack, it loads all the segment registers with zeros before starting to
execute the handler procedure. This permits the interrupt handler to safely
save and restore the DS, ES, FS, and GS registers as 80386 selectors.
Interrupt handlers that may be invoked in the context of either a regular
task or a V86 task, can use the same prolog and epilog code for register
saving regardless of the kind of task. Restoring zeros to these registers
before execution of the IRET does not cause a trap in the interrupt handler.
Interrupt procedures that expect values in the segment registers or that
return values via segment registers have to use the register images stored
on the PL 0 stack. Interrupt handlers that need to know whether the
interrupt occurred in V86 mode can examine the VM bit in the stored EFLAGS
image.

An interrupt handler passes control to the V86 monitor if the VM bit is set
in the EFLAGS image stored on the stack and the interrupt or exception is
one that the monitor needs to handle. The V86 monitor may either:

þ Handle the interrupt completely within the V86 monitor.
þ Invoke the 8086 program's interrupt handler.

Reflecting an interrupt or exception back to the 8086 code involves the
following steps:

1. Refer to the 8086 interrupt vector to locate the appropriate handler
procedure.

2. Store the state of the 8086 program on the privilege-level three
stack.

3. Change the return link on the privilege-level zero stack to point to
the privilege-level three handler procedure.

4. Execute an IRET so as to pass control to the handler.

5. When the IRET by the privilege-level three handler again traps to the
V86 monitor, restore the return link on the privilege-level zero stack
to point to the originally interrupted, privilege-level three
procedure.

6. Execute an IRET so as to pass control back to the interrupted
procedure.

Figure 15-3. PL 0 Stack after Interrupt in V86 Task

WITHOUT ERROR CODE WITH ERROR CODE
31 0 31 0
ÉÍÍÍÍÍÍËÍÍÍÍÍÍÍ»ÄÄÄÄ¿ ÉÍÍÍÍÍÍËÍÍÍÍÍÍÍ»ÄÄÄÄ¿
º±±±±±±ºOLD GS º ³ º±±±±±±ºOLD GS º ³
ÌÍÍÍÍÍÍÎÍÍÍÍÍÍÍ¹ SS:ESP ÌÍÍÍÍÍÍÎÍÍÍÍÍÍÍ¹ SS:ESP
D O º±±±±±±ºOLD FS º FROM TSS º±±±±±±ºOLD FS º FROM TSS
I F ÌÍÍÍÍÍÍÎÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÎÍÍÍÍÍÍÍ¹
R º±±±±±±ºOLD DS º º±±±±±±ºOLD DS º
E E ÌÍÍÍÍÍÍÎÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÎÍÍÍÍÍÍÍ¹
C X º±±±±±±ºOLD ES º º±±±±±±ºOLD ES º
T P ÌÍÍÍÍÍÍÎÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÎÍÍÍÍÍÍÍ¹
I A º±±±±±±ºOLD SS º º±±±±±±ºOLD SS º
O N ÌÍÍÍÍÍÍÊÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÊÍÍÍÍÍÍÍ¹
N S º OLD ESP º º OLD ESP º
I ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹
³ O º OLD EFLAGS º º OLD EFLAGS º
³ N ÌÍÍÍÍÍÍËÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍËÍÍÍÍÍÍÍ¹
³ º±±±±±±ºOLD CS º NEW º±±±±±±ºOLD CS º
ÌÍÍÍÍÍÍÊÍÍÍÍÍÍÍ¹ SS:EIP ÌÍÍÍÍÍÍÊÍÍÍÍÍÍÍ¹
º OLD EIP º ³ º OLD EIP º NEW
ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹ÄÄÄÙ ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹ SS:EIP
º º º ERROR CODE º ³
ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹ÄÄÄÙ
º º

15.4 Additional Sensitive Instructions

When the 80386 is executing in V86 mode, the instructions PUSHF, POPF,
INT n, and IRET are sensitive to IOPL. The instructions IN, INS, OUT, and
OUTS, which are ordinarily sensitive in protected mode, are not sensitive
in V86 mode. Following is a complete list of instructions that are sensitive
in V86 mode:

CLI ÄÄ Clear Interrupt-Enable Flag
STI ÄÄ Set Interrupt-Enable Flag
LOCK ÄÄ Assert Bus-Lock Signal
PUSHF ÄÄ Push Flags
POPF ÄÄ Pop Flags
INT n ÄÄ Software Interrupt
RET ÄÄ Interrupt Return

CPL is always three in V86 mode; therefore, if IOPL < 3, these instructions
will trigger a general-protection exceptions. These instructions are made
sensitive so that their functions can be simulated by the V86 monitor.

15.4.1 Emulating 8086 Operating System Calls

INT n is sensitive so that the V86 monitor can intercept calls to the
8086 OS. Many 8086 operating systems are called by pushing parameters onto
the stack, then executing an INT n instruction. If IOPL < 3, INT n
instructions will be intercepted by the V86 monitor. The V86 monitor can
then emulate the function of the 8086 operating system or reflect the
interrupt back to the 8086 operating system in V86 mode.

15.4.2 Virtualizing the Interrupt-Enable Flag

When the processor is executing 8086 code in a V86 task, the instructions
PUSHF, POPF, and IRET are sensitive to IOPL so that the V86 monitor can
control changes to the interrupt-enable flag (IF). Other instructions that
affect IF (STI and CLI) are IOPL sensitive both in 8086 code and in
80386/80386 code.

Many 8086 programs that were designed to execute on single-task systems set
and clear IF to control interrupts. However, when these same programs are
executed in a multitasking environment, such control of IF can be
disruptive. If IOPL is less than three, all instructions that change or
interrogate IF will trap to the V86 monitor. The V86 monitor can then
control IF in a manner that both suits the needs of the larger environment
and is transparent to the 8086 program.

15.5 Virtual I/O

Many 8086 programs that were designed to execute on single-task systems use
I/O devices directly. However, when these same programs are executed in a
multitasking environment, such use of devices can be disruptive. The 80386
provides sufficient flexibility to control I/O in a manner that both suits
the needs of the new environment and is transparent to the 8086 program.
Designers may take any of several possible approaches to controlling I/O:

þ Implement or emulate the 8086 operating system as an 80386 program and
require the 8086 application to do I/O via software interrupts to the
operating system, trapping all attempts to do I/O directly.

þ Let the 8086 program take complete control of all I/O.

þ Selectively trap and emulate references that a task makes to specific
I/O ports.

þ Trap or redirect references to memory-mapped I/O addresses.

The method of controlling I/O depends upon whether I/O ports are I/O mapped
or memory mapped.

15.5.1 I/O-Mapped I/O

I/O-mapped I/O in V86 mode differs from protected mode only in that the
protection mechanism does not consult IOPL when executing the I/O
instructions IN, INS, OUT, OUTS. Only the I/O permission bit map controls
the right for V86 tasks to execute these I/O instructions.

The I/O permission map traps I/O instructions selectively depending on the
I/O addresses to which they refer. The I/O permission bit map of each V86
task determines which I/O addresses are trapped for that task. Because each
task may have a different I/O permission bit map, the addresses trapped for
one task may be different from those trapped for others. Refer to Chapter 8
for more information about the I/O permission map.

15.5.2 Memory-Mapped I/O

In hardware designs that utilize memory-mapped I/O, the paging facilities
of the 80386 can be used to trap or redirect I/O operations. Each task that
executes memory-mapped I/O must have a page (or pages) for the memory-mapped
address space. The V86 monitor may control memory-mapped I/O by any of
these means:

þ Assign the memory-mapped page to appropriate physical addresses.
Different tasks may have different physical addresses, thereby
preventing the tasks from interfering with each other.

þ Cause a trap to the monitor by forcing a page fault on the
memory-mapped page. Read-only pages trap writes. Not-present pages trap
both reads and writes.

Intervention for every I/O might be excessive for some kinds of I/O
devices. A page fault can still be used in this case to cause intervention
on the first I/O operation. The monitor can then at least make sure that the
task has exclusive access to the device. Then the monitor can change the
page status to present and read/write, allowing subsequent I/O to proceed at
full speed.

15.5.3 Special I/O Buffers

Buffers of intelligent controllers (for example, a bit-mapped graphics
buffer) can also be virtualized via page mapping. The linear space for the
buffer can be mapped to a different physical space for each virtual 8086
task. The V86 monitor can then assume responsibility for spooling the data
or assigning the virtual buffer to the real buffer at appropriate times.

15.6 Differences From 8086

In general, V86 mode will correctly execute software designed for the 8086,
8088, 80186, and 80188. Following is a list of the minor differences between
8086 execution on the 80386 and on an 8086.

1. Instruction clock counts.

The 80386 takes fewer clocks for most instructions than the
8086/8088. The areas most likely to be affected are:

þ Delays required by I/O devices between I/O operations.

þ Assumed delays with 8086/8088 operating in parallel with an 8087.

2. Divide exceptions point to the DIV instruction.

Divide exceptions on the 80386 always leave the saved CS:IP value
pointing to the instruction that failed. On the 8086/8088, the CS:IP
value points to the next instruction.

3. Undefined 8086/8088 opcodes.

Opcodes that were not defined for the 8086/8088 will cause exception
6 or will execute one of the new instructions defined for the 80386.

4. Value written by PUSH SP.

PUSH BP
MOV BP, SP
XCHG BP, [BP]

This code functions as the 8086/8088 PUSH SP instruction on the
80386.

5. Shift or rotate by more than 31 bits.

6. Redundant prefixes.

7. Operand crossing offset 0 or 65,535.

On the 8086, an attempt to access a memory operand that crosses
offset 65,535 (e.g., MOV a word to offset 65,535) or offset 0 (e.g.,
PUSH a word when SP = 1) causes the offset to wrap around modulo
65,536. The 80386 raises an exception in these casesÄÄexception 13 if
the segment is a data segment (i.e., if CS, DS, ES, FS, or GS is
being used to address the segment), exception 12 if the segment is a
stack segment (i.e., if SS is being used).

8. Sequential execution across offset 65,535.

9. LOCK is restricted to certain instructions.

The LOCK prefix and its corresponding output signal should only be
used to prevent other bus masters from interrupting a data movement
operation. The 80386 always asserts the LOCK signal during an XCHG
instruction with memory (even if the LOCK prefix is not used). LOCK
may only be used with the following 80386 instructions when they
update memory: BTS, BTR, BTC, XCHG, ADD, ADC, SUB, SBB, INC, DEC,
AND, OR, XOR, NOT, and NEG. An undefined-opcode exception (interrupt
6) results from using LOCK before any other instruction.

10. Single-stepping external interrupt handlers.

The priority of the 80386 single-step exception is different from
that of the 8086/8088. The change prevents an external interrupt
handler from being single-stepped if the interrupt occurs while a
program is being single-stepped. The 80386 single-step exception has
higher priority that any external interrupt. The 80386 will still
single-step through an interrupt handler invoked by the INT
instructions or by an exception.

11. IDIV exceptions for quotients of 80H or 8000H.

The 80386 can generate the largest negative number as a quotient for
the IDIV instruction. The 8086/8088 causes exception zero instead.

12. Flags in stack.

The setting of the flags stored by PUSHF, by interrupts, and by
exceptions is different from that stored by the 8086 in bit positions
12 through 15. On the 8086 these bits are stored as ones, but in V86
mode bit 15 is always zero, and bits 14 through 12 reflect the last
value loaded into them.

13. NMI interrupting NMI handlers.

After an NMI is recognized on the 80386, the NMI interrupt is masked
until an IRET instruction is executed.

14. Coprocessor errors vector to interrupt 16.

15. Numeric exception handlers should allow prefixes.

On the 80386, the value of CS:IP saved for coprocessor exceptions
points at any prefixes before an ESC instruction. On 8086/8088
systems, the saved CS:IP points to the ESC instruction itself.

16. Coprocessor does not use interrupt controller.

15.7 Differences From 80286 Real-Address Mode

The 80286 processor implements the bus lock function differently than the
80386. This fact may or may not be apparent to 8086 programs, depending on
how the V86 monitor handles the LOCK prefix. LOCKed instructions are
sensitive to IOPL; therefore, software designers can choose to emulate its
function. If, however, 8086 programs are allowed to execute LOCK directly,
programs that use forms of memory locking specific to the 8086 may not
execute properly when transported to a specific application of the 80386.

þ Bit test and change: BTS, BTR, BTC.
þ Exchange: XCHG.
þ One-operand arithmetic and logical: INC, DEC, NOT, and NEG.
þ Two-operand arithmetic and logical: ADD, ADC, SUB, SBB, AND, OR, XOR.

Chapter 16 Mixing 16-Bit and 32 Bit Code

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

The 80386 running in protected mode is a 32-bit microprocessor, but it is
designed to support 16-bit processing at three levels:

1. Executing 8086/80286 16-bit programs efficiently with complete
compatibility.

2. Mixing 16-bit modules with 32-bit modules.

3. Mixing 16-bit and 32-bit addresses and operands within one module.

The first level of support for 16-bit programs has already been discussed
in Chapter 13, Chapter 14, and Chapter 15. This chapter shows how 16-bit
and 32-bit modules can cooperate with one another, and how one module can
utilize both 16-bit and 32-bit operands and addressing.

The 80386 functions most efficiently when it is possible to distinguish
between pure 16-bit modules and pure 32-bit modules. A pure 16-bit module
has these characteristics:

þ All segments occupy 64 Kilobytes or less.
þ Data items are either 8 bits or 16 bits wide.
þ Pointers to code and data have 16-bit offsets.
þ Control is transferred only among 16-bit segments.

A pure 32-bit module has these characteristics:

þ Segments may occupy more than 64 Kilobytes (zero bytes to 4
gigabytes).

þ Data items are either 8 bits or 32 bits wide.

þ Pointers to code and data have 32-bit offsets.

þ Control is transferred only among 32-bit segments.

Pure 16-bit modules do exist; they are the modules designed for 16-bit
microprocessors. Pure 32-bit modules may exist in new programs designed
explicitly for the 80386. However, as systems designers move applications
from 16-bit processors to the 32-bit 80386, it will not always be possible
to maintain these ideals of pure 16-bit or 32-bit modules. It may be
expedient to execute old 16-bit modules in a new 32-bit environment without
making source-code changes to the old modules if any of the following
conditions is true:

þ Modules will be converted one-by-one from 16-bit environments to
32-bit environments.

þ Older, 16-bit compilers and software-development tools will be
utilized in the new32-bit operating environment until new 32-bit
versions can be created.

þ The source code of 16-bit modules is not available for modification.

þ The specific data structures used by a given module inherently utilize
16-bit words.

þ The native word size of the source language is 16 bits.

On the 80386, 16-bit modules can be mixed with 32-bit modules. To design a
system that mixes 16- and 32-bit code requires an understanding of the
mechanisms that the 80386 uses to invoke and control its 32-bit and 16-bit
features.

16.1 How the 80386 Implements 16-Bit and 32-Bit Features

The features of the architecture that permit the 80386 to work equally well
with 32-bit and 16-bit address and operand sizes include:

þ The D-bit (default bit) of code-segment descriptors, which determines
the default choice of operand-size and address-size for the
instructions of a code segment. (In real-address mode and V86 mode,
which do not use descriptors, the default is 16 bits.) A code segment
whose D-bit is set is known as a USE32 segment; a code segment whose
D-bit is zero is a USE16 segment. The D-bit eliminates the need to
encode the operand size and address size in instructions when all
instructions use operands and effective addresses of the same size.

þ Instruction prefixes that explicitly override the default choice of
operand size and address size (available in protected mode as well as
in real-address mode and V86 mode).

þ Separate 32-bit and 16-bit gates for intersegment control transfers
(including call gates, interrupt gates, and trap gates). The operand
size for the control transfer is determined by the type of gate, not by
the D-bit or prefix of the transfer instruction.

þ Registers that can be used both for 32-bit and 16-bit operands and
effective-address calculations.

þ The B-bit (big bit) of data-segment descriptors, which determines the
size of stack pointer (32-bit ESP or 16-bit SP) used by the CPU for
implicit stack references.

16.2 Mixing 32-Bit and 16-Bit Operations

The 80386 has two instruction prefixes that allow mixing of 32-bit and
16-bit operations within one segment:

þ The operand-size prefix (66H)
þ The address-size prefix (67H)

These prefixes reverse the default size selected by the D-bit. For example,
the processor can interpret the word-move instruction MOV mem, reg in any of
four ways:

þ In a USE32 segment:

1. Normally moves 32 bits from a 32-bit register to a 32-bit
effective address in memory.

2. If preceded by an operand-size prefix, moves 16 bits from a 16-bit
register to 32-bit effective address in memory.

3. If preceded by an address-size prefix, moves 32 bits from a 32-bit
register to a16-bit effective address in memory.

4. If preceded by both an address-size prefix and an operand-size
prefix, moves 16 bits from a 16-bit register to a 16-bit effective
address in memory.

þ In a USE16 segment:

1. Normally moves 16 bits from a 16-bit register to a 16-bit
effective address in memory.

2. If preceded by an operand-size prefix, moves 32 bits from a 32-bit
register to 16-bit effective address in memory.

3. If preceded by an address-size prefix, moves 16 bits from a 16-bit
register to a32-bit effective address in memory.

4. If preceded by both an address-size prefix and an operand-size
prefix, moves 32 bits from a 32-bit register to a 32-bit effective
address in memory.

These examples illustrate that any instruction can generate any combination
of operand size and address size regardless of whether the instruction is in
a USE16 or USE32 segment. The choice of the USE16 or USE32 attribute for a
code segment is based upon these criteria:

1. The need to address instructions or data in segments that are larger
than 64 Kilobytes.

2. The predominant size of operands.

3. The addressing modes desired. (Refer to Chapter 17 for an explanation
of the additional addressing modes that are available when 32-bit
addressing is used.)

Choosing a setting of the D-bit that is contrary to the predominant size of
operands requires the generation of an excessive number of operand-size
prefixes.

16.3 Sharing Data Segments Among Mixed Code Segments

Because the choice of operand size and address size is defined in code
segments and their descriptors, data segments can be shared freely among
both USE16 and USE32 code segments. The only limitation is the one imposed
by pointers with 16-bit offsets, which can only point to the first 64
Kilobytes of a segment. When a data segment that contains more than 64
Kilobytes is to be shared among USE32 and USE16 segments, the data that is
to be accessed by the USE16 segments must be located within the first 64
Kilobytes.

A stack that spans addresses less than 64K can be shared by both USE16 and
USE32 code segments. This class of stacks includes:

þ Stacks in expand-up segments with G=0 and B=0.

þ Stacks in expand-down segments with G=0 and B=0.

þ Stacks in expand-up segments with G=1 and B=0, in which the stack is
contained completely within the lower 64 Kilobytes. (Offsets greater
than 64K can be used for data, other than the stack, that is not
shared.)

The B-bit of a stack segment cannot, in general, be used to change the size
of stack used by a USE16 code segment. The size of stack pointer used by the
processor for implicit stack references is controlled by the B-bit of the
data-segment descriptor for the stack. Implicit references are those caused
by interrupts, exceptions, and instructions such as PUSH, POP, CALL, and
RET. One might be tempted, therefore, to try to increase beyond 64K the
size of the stack used by 16-bit code simply by supplying a larger stack
segment with the B-bit set. However, the B-bit does not control explicit
stack references, such as accesses to parameters or local variables. A USE16
code segment can utilize a "big" stack only if the code is modified so that
all explicit references to the stack are preceded by the address-size
prefix, causing those references to use 32-bit addressing.

In big, expand-down segments (B=1, G=1, and E=1), all offsets are greater
than 64K, therefore USE16 code cannot utilize such a stack segment unless
the code segment is modified to employ 32-bit addressing. (Refer to Chapter
6 for a review of the B, G, and E bits.)

16.4 Transferring Control Among Mixed Code Segments

When transferring control among procedures in USE16 and USE32 code
segments, programmers must be aware of three points:

þ Addressing limitations imposed by pointers with 16-bit offsets.

þ Matching of operand-size attribute in effect for the CALL/RET pair and
theInterrupt/IRET pair so as to manage the stack correctly.

þ Translation of parameters, especially pointer parameters.

Clearly, 16-bit effective addresses cannot be used to address data or code
located beyond 64K in a 32-bit segment, nor can large 32-bit parameters be
squeezed into a 16-bit word; however, except for these obvious limits, most
interfacing problems between 16-bit and 32-bit modules can be solved. Some
solutions involve inserting interface procedures between the procedures in
question.

16.4.1 Size of Code-Segment Pointer

For control-transfer instructions that use a pointer to identify the next
instruction (i.e., those that do not use gates), the size of the offset
portion of the pointer is determined by the operand-size attribute. The
implications of the use of two different sizes of code-segment pointer are:

þ JMP, CALL, or RET from 32-bit segment to 16-bit segment is always
possible using a 32-bit operand size.

þ JMP, CALL, or RET from 16-bit segment using a 16-bit operand size
cannot address the target in a 32-bit segment if the address of the
target is greater than 64K.

An interface procedure can enable transfers from USE16 segments to 32-bit
addresses beyond 64K without requiring modifications any more extensive than
relinking or rebinding the old programs. The requirements for such an
interface procedure are discussed later in this chapter.

16.4.2 Stack Management for Control Transfers

Because stack management is different for 16-bit CALL/RET than for 32-bit
CALL/RET, the operand size of RET must match that of CALL. (Refer to Figure
16-1.) A 16-bit CALL pushes the 16-bit IP and (for calls between privilege
levels) the 16-bit SP register. The corresponding RET must also use a 16-bit
operand size to POP these 16-bit values from the stack into the 16-bit
registers. A 32-bit CALL pushes the 32-bit EIP and (for interlevel calls)
the 32-bit ESP register. The corresponding RET must also use a 32-bit
operand size to POP these 32-bit values from the stack into the 32-bit
registers. If the two halves of a CALL/RET pair do not have matching operand
sizes, the stack will not be managed correctly and the values of the
instruction pointer and stack pointer will not be restored to correct
values.

When the CALL and its corresponding RET are in segments that have D-bits
with the same values (i.e., both have 32-bit defaults or both have 16-bit
defaults), there is no problem. When the CALL and its corresponding RET are
in segments that have different D-bit values, however, programmers (or
program development software) must ensure that the CALL and RET match.

There are three ways to cause a 16-bit procedure to execute a 32-bit call:

1. Use a 16-bit call to a 32-bit interface procedure that then uses a
32-bit call to invoke the intended target.

2. Bind the 16-bit call to a 32-bit call gate.

3. Modify the 16-bit procedure, inserting an operand-size prefix before
the call, thereby changing it to a 32-bit call.

Likewise, there are three ways to cause a 32-bit procedure to execute a
16-bit call:

1. Use a 32-bit call to a 32-bit interface procedure that then uses a
16-bit call to invoke the intended target.

2. Bind the 32-bit call to a 16-bit call gate.

3. Modify the 32-bit procedure, inserting an operand-size prefix before
the call, thereby changing it to a 16-bit call. (Be certain that the
return offset does not exceed 64K.)

Programmers can utilize any of the preceding methods to make a CALL in a
USE16 segment match the corresponding RET in a USE32 segment, or to make a
CALL in a USE32 segment match the corresponding RET in a USE16 segment.

Figure 16-1. Stack after Far 16-Bit and 32-Bit Calls

WITHOUT PRIVILEGE TRANSITION

AFTER 16-BIT CALL AFTER 32-BIT CALL

31 0 31 0
D O º º º º
I F ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹
R º±±±±±±±±±±±±±±±º º±±±±±±±±±±±±±±±º
E E ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹
C X º PARM2 ³ PARM1 º º PARM2 º
T P ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹
I A º CS ³ IP ºÄÄSP º PARM1 º
O N ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹
N S º º º±±±±±±±³ CS º
I ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹
³ O º º º EIP ºÄÄESP
³ N ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹
³ º º º º

WITH PRIVILEGE TRANSITION

AFTER 16-BIT CALL AFTER 32-BIT CALL

D O 31 0 31 0
I F ÉÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ» ÉÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ»
R º SS ³ SP º º±±±±±±±³ SS º
E E ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹
C X º PARM2 ³ PARM1 º º ESP º
T P ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹
I A º CS ³ IP ºÄÄSP º PARM2 º
O N ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹
N S º º º PARM1 º
I ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹
³ O º º º±±±±±±±³ CS º
³ N ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹
³ º º º EIP ºÄÄESP
ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹ ÌÍÍÍÍÍÍÍØÍÍÍÍÍÍÍ¹
º º º º

16.4.2.1 Controlling the Operand-Size for a Call

When the selector of the pointer referenced by a CALL instruction selects a
segment descriptor, the operand-size attribute in effect for the CALL
instruction is determined by the D-bit in the segment descriptor and by any
operand-size instruction prefix.

When the selector of the pointer referenced by a CALL instruction selects a
gate descriptor, the type of call is determined by the type of call gate. A
call via an 80286 call gate (descriptor type 4) always has a 16-bit
operand-size attribute; a call via an 80386 call gate (descriptor type 12)
always has a 32-bit operand-size attribute. The offset of the target
procedure is taken from the gate descriptor; therefore, even a 16-bit
procedure can call a procedure that is located more than 64 kilobytes from
the base of a 32-bit segment, because a 32-bit call gate contains a 32-bit
target offset.

An unmodified 16-bit code segment that has run successfully on an 8086 or
real-mode 80286 will always have a D-bit of zero and will not use
operand-size override prefixes; therefore, it will always execute 16-bit
versions of CALL. The only modification needed to make a16-bit procedure
effect a 32-bit call is to relink the call to an 80386 call gate.

16.4.2.2 Changing Size of Call

When adding 32-bit gates to 16-bit procedures, it is important to consider
the number of parameters. The count field of the gate descriptor specifies
the size of the parameter string to copy from the current stack to the stack
of the more privileged procedure. The count field of a 16-bit gate specifies
the number of words to be copied, whereas the count field of a 32-bit gate
specifies the number of doublewords to be copied; therefore, the 16-bit
procedure must use an even number of words as parameters.

16.4.3 Interrupt Control Transfers

With a control transfer due to an interrupt or exception, a gate is always
involved. The operand-size attribute for the interrupt is determined by the
type of IDT gate.

A 386 interrupt or trap gate (descriptor type 14 or 15) to a 32-bit
interrupt procedure can be used to interrupt either 32-bit or 16-bit
procedures. However, it is not generally feasible to permit an interrupt or
exception to invoke a 16-bit handler procedure when 32-bit code is
executing, because a 16-bit interrupt procedure has a return offset of only
16-bits on its stack. If the 32-bit procedure is executing at an address
greater than 64K, the 16-bit interrupt procedure cannot return correctly.

16.4.4 Parameter Translation

When segment offsets or pointers (which contain segment offsets) are passed
as parameters between 16-bit and 32-bit procedures, some translation is
required. Clearly, if a 32-bit procedure passes a pointer to data located
beyond 64K to a 16-bit procedure, the 16-bit procedure cannot utilize it.
Beyond this natural limitation, an interface procedure can perform any
format conversion between 32-bit and 16-bit pointers that may be needed.

Parameters passed by value between 32-bit and 16-bit code may also require
translation between 32-bit and 16-bit formats. Such translation requirements
are application dependent. Systems designers should take care to limit the
range of values passed so that such translations are possible.

16.4.5 The Interface Procedure

Interposing an interface procedure between 32-bit and 16-bit procedures can
be the solution to any of several interface requirements:

þ Allowing procedures in 16-bit segments to transfer control to
instructions located beyond 64K in 32-bit segments.

þ Matching of operand size for CALL/RET.

þ Parameter translation.

Interface procedures between USE32 and USE16 segments can be constructed
with these properties:

þ The procedures reside in a code segment whose D-bit is set, indicating
a default operand size of 32-bits.

þ All entry points that may be called by 16-bit procedures have offsets
that are actually less than 64K.

þ All points to which called 16-bit procedures may return also lie
within 64K.

The interface procedures do little more than call corresponding procedures
in other segments. There may be two kinds of procedures:

þ Those that are called by 16-bit procedures and call 32-bit procedures.
These interface procedures are called by 16-bit CALLs and use the
operand-size prefix before RET instructions to cause a 16-bit RET.
CALLs to 32-bit segments are 32-bit calls (by default, because the
D-bit is set), and the 32-bit code returns with 32-bit RET
instructions.

þ Those that are called by 32-bit procedures and call 16-bit procedures.
These interface procedures are called by 32-bit CALL instructions, and
return with 32-bit RET instructions (by default, because the D-bit is
set). CALLs to 16-bit procedures use the operand-size prefix;
procedures in the 16-bit code return with 16-bit RET instructions.

PART IV INSTRUCTION SET

Chapter 17 80386 Instruction Set

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

This chapter presents instructions for the 80386 in alphabetical order. For
each instruction, the forms are given for each operand combination,
including object code produced, operands required, execution time, and a
description. For each instruction, there is an operational description and a
summary of exceptions generated.

17.1 Operand-Size and Address-Size Attributes

When executing an instruction, the 80386 can address memory using either 16
or 32-bit addresses. Consequently, each instruction that uses memory
addresses has associated with it an address-size attribute of either 16 or
32 bits. 16-bit addresses imply both the use of a 16-bit displacement in
the instruction and the generation of a 16-bit address offset (segment
relative address) as the result of the effective address calculation.
32-bit addresses imply the use of a 32-bit displacement and the generation
of a 32-bit address offset. Similarly, an instruction that accesses words
(16 bits) or doublewords (32 bits) has an operand-size attribute of either
16 or 32 bits.

The attributes are determined by a combination of defaults, instruction
prefixes, and (for programs executing in protected mode) size-specification
bits in segment descriptors.

17.1.1 Default Segment Attribute

For programs executed in protected mode, the D-bit in executable-segment
descriptors determines the default attribute for both address size and
operand size. These default attributes apply to the execution of all
instructions in the segment. A value of zero in the D-bit sets the default
address size and operand size to 16 bits; a value of one, to 32 bits.

Programs that execute in real mode or virtual-8086 mode have 16-bit
addresses and operands by default.

17.1.2 Operand-Size and Address-Size Instruction Prefixes

The internal encoding of an instruction can include two byte-long prefixes:
the address-size prefix, 67H, and the operand-size prefix, 66H. (A later
section, "Instruction Format," shows the position of the prefixes in an
instruction's encoding.) These prefixes override the default segment
attributes for the instruction that follows. Table 17-1 shows the effect of
each possible combination of defaults and overrides.

17.1.3 Address-Size Attribute for Stack

Instructions that use the stack implicitly (for example: POP EAX also have
a stack address-size attribute of either 16 or 32 bits. Instructions with a
stack address-size attribute of 16 use the 16-bit SP stack pointer register;
instructions with a stack address-size attribute of 32 bits use the 32-bit
ESP register to form the address of the top of the stack.

The stack address-size attribute is controlled by the B-bit of the
data-segment descriptor in the SS register. A value of zero in the B-bit
selects a stack address-size attribute of 16; a value of one selects a stack
address-size attribute of 32.

Table 17-1. Effective Size Attributes

Segment Default D = ... 0 0 0 0 1 1 1 1
Operand-Size Prefix 66H N N Y Y N N Y Y
Address-Size Prefix 67H N Y N Y N Y N Y

Effective Operand Size 16 16 32 32 32 32 16 16
Effective Address Size 16 32 16 32 32 16 32 16

Y = Yes, this instruction prefix is present
N = No, this instruction prefix is not present

17.2 Instruction Format

All instruction encodings are subsets of the general instruction format
shown in Figure 17-1. Instructions consist of optional instruction
prefixes, one or two primary opcode bytes, possibly an address specifier
consisting of the ModR/M byte and the SIB (Scale Index Base) byte, a
displacement, if required, and an immediate data field, if required.

Smaller encoding fields can be defined within the primary opcode or
opcodes. These fields define the direction of the operation, the size of the
displacements, the register encoding, or sign extension; encoding fields
vary depending on the class of operation.

Most instructions that can refer to an operand in memory have an addressing
form byte following the primary opcode byte(s). This byte, called the ModR/M
byte, specifies the address form to be used. Certain encodings of the ModR/M
byte indicate a second addressing byte, the SIB (Scale Index Base) byte,
which follows the ModR/M byte and is required to fully specify the
addressing form.

Addressing forms can include a displacement immediately following either
the ModR/M or SIB byte. If a displacement is present, it can be 8-, 16- or
32-bits.

If the instruction specifies an immediate operand, the immediate operand
always follows any displacement bytes. The immediate operand, if specified,
is always the last field of the instruction.

The following are the allowable instruction prefix codes:

F3H REP prefix (used only with string instructions)
F3H REPE/REPZ prefix (used only with string instructions
F2H REPNE/REPNZ prefix (used only with string instructions)
F0H LOCK prefix

The following are the segment override prefixes:

2EH CS segment override prefix
36H SS segment override prefix
3EH DS segment override prefix
26H ES segment override prefix
64H FS segment override prefix
65H GS segment override prefix
66H Operand-size override
67H Address-size override

Figure 17-1. 80386 Instruction Format

ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍËÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍËÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍËÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
º INSTRUCTION º ADDRESS- º OPERAND- º SEGMENT º
º PREFIX º SIZE PREFIX º SIZE PREFIX º OVERRIDE º
ÌÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¹
º 0 OR 1 0 OR 1 0 OR 1 0 OR 1 º
ÇÄ Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä¶
º NUMBER OF BYTES º
ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¼

ÉÍÍÍÍÍÍÍÍÍÍËÍÍÍÍÍÍÍÍÍÍÍËÍÍÍÍÍÍÍËÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍËÍÍÍÍÍÍÍÍÍÍÍÍÍ»
º OPCODE º MODR/M º SIB º DISPLACEMENT º IMMEDIATE º
º º º º º º
ÌÍÍÍÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÍÍÍÍÍÍ¹
º 1 OR 2 0 OR 1 0 OR 1 0,1,2 OR 4 0,1,2 OR 4 º
ÇÄ Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä Ä¶
º NUMBER OF BYTES º
ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¼

17.2.1 ModR/M and SIB Bytes

The ModR/M and SIB bytes follow the opcode byte(s) in many of the 80386
instructions. They contain the following information:

þ The indexing type or register number to be used in the instruction
þ The register to be used, or more information to select the instruction
þ The base, index, and scale information

The ModR/M byte contains three fields of information:

þ The mod field, which occupies the two most significant bits of the
byte, combines with the r/m field to form 32 possible values: eight
registers and 24 indexing modes

þ The reg field, which occupies the next three bits following the mod
field, specifies either a register number or three more bits of opcode
information. The meaning of the reg field is determined by the first
(opcode) byte of the instruction.

þ The r/m field, which occupies the three least significant bits of the
byte, can specify a register as the location of an operand, or can form
part of the addressing-mode encoding in combination with the field as
described above

The based indexed and scaled indexed forms of 32-bit addressing require the
SIB byte. The presence of the SIB byte is indicated by certain encodings of
the ModR/M byte. The SIB byte then includes the following fields:

þ The ss field, which occupies the two most significant bits of the
byte, specifies the scale factor

þ The index field, which occupies the next three bits following the ss
field and specifies the register number of the index register

þ The base field, which occupies the three least significant bits of the
byte, specifies the register number of the base register

Figure 17-2 shows the formats of the ModR/M and SIB bytes.

The values and the corresponding addressing forms of the ModR/M and SIB
bytes are shown in Tables 17-2, 17-3, and 17-4. The 16-bit addressing
forms specified by the ModR/M byte are in Table 17-2. The 32-bit addressing
forms specified by ModR/M are in Table 17-3. Table 17-4 shows the 32-bit
addressing forms specified by the SIB byte

Figure 17-2. ModR/M and SIB Byte Formats

MODR/M BYTE

7 6 5 4 3 2 1 0
ÉÍÍÍÍÍÍÍÍËÍÍÍÍÍÍÍÍÍÍÍÍÍËÍÍÍÍÍÍÍÍÍÍÍÍÍ»
º MOD º REG/OPCODE º R/M º
ÈÍÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÍÍÍÍÍÍ¼

SIB (SCALE INDEX BASE) BYTE

7 6 5 4 3 2 1 0
ÉÍÍÍÍÍÍÍÍËÍÍÍÍÍÍÍÍÍÍÍÍÍËÍÍÍÍÍÍÍÍÍÍÍÍÍ»
º SS º INDEX º BASE º
ÈÍÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÍÍÍÍÍÍÊÍÍÍÍÍÍÍÍÍÍÍÍÍ¼

Table 17-2. 16-Bit Addressing Forms with the ModR/M Byte

r8(/r) AL CL DL BL AH CH DH BH
r16(/r) AX CX DX BX SP BP SI DI
r32(/r) EAX ECX EDX EBX ESP EBP ESI EDI
/digit (Opcode) 0 1 2 3 4 5 6 7
REG = 000 001 010 011 100 101 110 111

Effective
ÚÄÄÄAddress
disp8 denotes an 8-bit displacement following the ModR/M byte, to be
sign-extended and added to the index. disp16 denotes a 16-bit displacement
following the ModR/M byte, to be added to the index. Default segment
register is SS for the effective addresses containing a BP index, DS for
other effective addresses.ÄÄ¿ ÚMod R/M¿ ÚÄÄÄÄÄÄÄÄModR/M Values in HexadecimalÄÄÄÄÄÄÄÄ¿

[BX + SI] 000 00 08 10 18 20 28 30 38
[BX + DI] 001 01 09 11 19 21 29 31 39
[BP + SI] 010 02 0A 12 1A 22 2A 32 3A
[BP + DI] 011 03 0B 13 1B 23 2B 33 3B
[SI] 00 100 04 0C 14 1C 24 2C 34 3C
[DI] 101 05 0D 15 1D 25 2D 35 3D
disp16 110 06 0E 16 1E 26 2E 36 3E
[BX] 111 07 0F 17 1F 27 2F 37 3F

[BX+SI]+disp8 000 40 48 50 58 60 68 70 78
[BX+DI]+disp8 001 41 49 51 59 61 69 71 79
[BP+SI]+disp8 010 42 4A 52 5A 62 6A 72 7A
[BP+DI]+disp8 011 43 4B 53 5B 63 6B 73 7B
[SI]+disp8 01 100 44 4C 54 5C 64 6C 74 7C
[DI]+disp8 101 45 4D 55 5D 65 6D 75 7D
[BP]+disp8 110 46 4E 56 5E 66 6E 76 7E
[BX]+disp8 111 47 4F 57 5F 67 6F 77 7F

[BX+SI]+disp16 000 80 88 90 98 A0 A8 B0 B8
[BX+DI]+disp16 001 81 89 91 99 A1 A9 B1 B9
[BX+SI]+disp16 010 82 8A 92 9A A2 AA B2 BA
[BX+DI]+disp16 011 83 8B 93 9B A3 AB B3 BB
[SI]+disp16 10 100 84 8C 94 9C A4 AC B4 BC
[DI]+disp16 101 85 8D 95 9D A5 AD B5 BD
[BP]+disp16 110 86 8E 96 9E A6 AE B6 BE
[BX]+disp16 111 87 8F 97 9F A7 AF B7 BF

EAX/AX/AL 000 C0 C8 D0 D8 E0 E8 F0 F8
ECX/CX/CL 001 C1 C9 D1 D9 E1 E9 F1 F9
EDX/DX/DL 010 C2 CA D2 DA E2 EA F2 FA
EBX/BX/BL 011 C3 CB D3 DB E3 EB F3 FB
ESP/SP/AH 11 100 C4 CC D4 DC E4 EC F4 FC
EBP/BP/CH 101 C5 CD D5 DD E5 ED F5 FD
ESI/SI/DH 110 C6 CE D6 DE E6 EE F6 FE
EDI/DI/BH 111 C7 CF D7 DF E7 EF F7 FF

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
NOTES:
disp8 denotes an 8-bit displacement following the ModR/M byte, to be
sign-extended and added to the index. disp16 denotes a 16-bit displacement
following the ModR/M byte, to be added to the index. Default segment
register is SS for the effective addresses containing a BP index, DS for
other effective addresses.
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

Table 17-3. 32-Bit Addressing Forms with the ModR/M Byte

r8(/r) AL CL DL BL AH CH DH BH
r16(/r) AX CX DX BX SP BP SI DI
r32(/r) EAX ECX EDX EBX ESP EBP ESI EDI
/digit (Opcode) 0 1 2 3 4 5 6 7
REG = 000 001 010 011 100 101 110 111

Effective
ÚÄÄÄAddress
[--] [--] means a SIB follows the ModR/M byte. disp8 denotes an 8-bit
displacement following the SIB byte, to be sign-extended and added to the
index. disp32 denotes a 32-bit displacement following the ModR/M byte, to
be added to the index.ÄÄ¿ ÚMod R/M¿ ÚÄÄÄÄÄÄÄÄÄModR/M Values in HexadecimalÄÄÄÄÄÄÄ¿

[EAX] 000 00 08 10 18 20 28 30 38
[ECX] 001 01 09 11 19 21 29 31 39
[EDX] 010 02 0A 12 1A 22 2A 32 3A
[EBX] 011 03 0B 13 1B 23 2B 33 3B
[--] [--] 00 100 04 0C 14 1C 24 2C 34 3C
disp32 101 05 0D 15 1D 25 2D 35 3D
[ESI] 110 06 0E 16 1E 26 2E 36 3E
[EDI] 111 07 0F 17 1F 27 2F 37 3F

disp8[EAX] 000 40 48 50 58 60 68 70 78
disp8[ECX] 001 41 49 51 59 61 69 71 79
disp8[EDX] 010 42 4A 52 5A 62 6A 72 7A
disp8[EPX]; 011 43 4B 53 5B 63 6B 73 7B
disp8[--] [--] 01 100 44 4C 54 5C 64 6C 74 7C
disp8[ebp] 101 45 4D 55 5D 65 6D 75 7D
disp8[ESI] 110 46 4E 56 5E 66 6E 76 7E
disp8[EDI] 111 47 4F 57 5F 67 6F 77 7F

disp32[EAX] 000 80 88 90 98 A0 A8 B0 B8
disp32[ECX] 001 81 89 91 99 A1 A9 B1 B9
disp32[EDX] 010 82 8A 92 9A A2 AA B2 BA
disp32[EBX] 011 83 8B 93 9B A3 AB B3 BB
disp32[--] [--] 10 100 84 8C 94 9C A4 AC B4 BC
disp32[EBP] 101 85 8D 95 9D A5 AD B5 BD
disp32[ESI] 110 86 8E 96 9E A6 AE B6 BE
disp32[EDI] 111 87 8F 97 9F A7 AF B7 BF

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
NOTES:
[--] [--] means a SIB follows the ModR/M byte. disp8 denotes an 8-bit
displacement following the SIB byte, to be sign-extended and added to the
index. disp32 denotes a 32-bit displacement following the ModR/M byte, to
be added to the index.
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

Table 17-4. 32-Bit Addressing Forms with the SIB Byte

r32 EAX ECX EDX EBX ESP [*]
[*] means a disp32 with no base if MOD is 00, [ESP] otherwise. This provides
the following addressing modes:
disp32[index] (MOD=00)
disp8[EBP][index] (MOD=01)
disp32[EBP][index] (MOD=10) ESI EDI
Base = 0 1 2 3 4 5 6 7
Base = 000 001 010 011 100 101 110 111

ÚScaled Index
[*] means a disp32 with no base if MOD is 00, [ESP] otherwise. This provides
the following addressing modes:
disp32[index] (MOD=00)
disp8[EBP][index] (MOD=01)
disp32[EBP][index] (MOD=10)¿ÚSS Index¿ ÚÄÄÄÄÄÄÄÄModR/M Values in HexadecimalÄÄÄÄÄÄÄÄ¿

[EAX] 000 00 01 02 03 04 05 06 07
[ECX] 001 08 09 0A 0B 0C 0D 0E 0F
[EDX] 010 10 11 12 13 14 15 16 17
[EBX] 011 18 19 1A 1B 1C 1D 1E 1F
none 00 100 20 21 22 23 24 25 26 27
[EBP] 101 28 29 2A 2B 2C 2D 2E 2F
[ESI] 110 30 31 32 33 34 35 36 37
[EDI] 111 38 39 3A 3B 3C 3D 3E 3F

[EAX*2] 000 40 41 42 43 44 45 46 47
[ECX*2] 001 48 49 4A 4B 4C 4D 4E 4F
[ECX*2] 010 50 51 52 53 54 55 56 57
[EBX*2] 011 58 59 5A 5B 5C 5D 5E 5F
none 01 100 60 61 62 63 64 65 66 67
[EBP*2] 101 68 69 6A 6B 6C 6D 6E 6F
[ESI*2] 110 70 71 72 73 74 75 76 77
[EDI*2] 111 78 79 7A 7B 7C 7D 7E 7F

[EAX*4] 000 80 81 82 83 84 85 86 87
[ECX*4] 001 88 89 8A 8B 8C 8D 8E 8F
[EDX*4] 010 90 91 92 93 94 95 96 97
[EBX*4] 011 98 89 9A 9B 9C 9D 9E 9F
none 10 100 A0 A1 A2 A3 A4 A5 A6 A7
[EBP*4] 101 A8 A9 AA AB AC AD AE AF
[ESI*4] 110 B0 B1 B2 B3 B4 B5 B6 B7
[EDI*4] 111 B8 B9 BA BB BC BD BE BF

[EAX*8] 000 C0 C1 C2 C3 C4 C5 C6 C7
[ECX*8] 001 C8 C9 CA CB CC CD CE CF
[EDX*8] 010 D0 D1 D2 D3 D4 D5 D6 D7
[EBX*8] 011 D8 D9 DA DB DC DD DE DF
none 11 100 E0 E1 E2 E3 E4 E5 E6 E7
[EBP*8] 101 E8 E9 EA EB EC ED EE EF
[ESI*8] 110 F0 F1 F2 F3 F4 F5 F6 F7
[EDI*8] 111 F8 F9 FA FB FC FD FE FF

ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ
NOTES:
[*] means a disp32 with no base if MOD is 00, [ESP] otherwise. This
provides the following addressing modes:
disp32[index] (MOD=00)
disp8[EBP][index] (MOD=01)
disp32[EBP][index] (MOD=10)
ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄ

17.2.2 How to Read the Instruction Set Pages

The following is an example of the format used for each 80386 instruction
description in this chapter:

CMC ÄÄ Complement Carry Flag

Opcode Instruction Clocks Description

F5 CMC 2 Complement carry flag

The above table is followed by paragraphs labelled "Operation,"
"Description," "Flags Affected," "Protected Mode Exceptions," "Real
Address Mode Exceptions," and, optionally, "Notes." The following sections
explain the notational conventions and abbreviations used in these
paragraphs of the instruction descriptions.

17.2.2.1 Opcode

The "Opcode" column gives the complete object code produced for each form
of the instruction. When possible, the codes are given as hexadecimal bytes,
in the same order in which they appear in memory. Definitions of entries
other than hexadecimal bytes are as follows:

/digit: (digit is between 0 and 7) indicates that the ModR/M byte of the
instruction uses only the r/m (register or memory) operand. The reg field
contains the digit that provides an extension to the instruction's opcode.

/r: indicates that the ModR/M byte of the instruction contains both a
register operand and an r/m operand.

cb, cw, cd, cp: a 1-byte (cb), 2-byte (cw), 4-byte (cd) or 6-byte (cp)
value following the opcode that is used to specify a code offset and
possibly a new value for the code segment register.

ib, iw, id: a 1-byte (ib), 2-byte (iw), or 4-byte (id) immediate operand to
the instruction that follows the opcode, ModR/M bytes or scale-indexing
bytes. The opcode determines if the operand is a signed value. All words and
doublewords are given with the low-order byte first.

+rb, +rw, +rd: a register code, from 0 through 7, added to the hexadecimal
byte given at the left of the plus sign to form a single opcode byte. The
codes areÄÄ

rb rw rd
AL = 0 AX = 0 EAX = 0
CL = 1 CX = 1 ECX = 1
DL = 2 DX = 2 EDX = 2
BL = 3 BX = 3 EBX = 3
AH = 4 SP = 4 ESP = 4
CH = 5 BP = 5 EBP = 5
DH = 6 SI = 6 ESI = 6
BH = 7 DI = 7 EDI = 7

17.2.2.2 Instruction

The "Instruction" column gives the syntax of the instruction statement as
it would appear in an ASM386 program. The following is a list of the symbols
used to represent operands in the instruction statements:

rel8: a relative address in the range from 128 bytes before the end of the
instruction to 127 bytes after the end of the instruction.

rel16, rel32: a relative address within the same code segment as the
instruction assembled. rel16 applies to instructions with an operand-size
attribute of 16 bits; rel32 applies to instructions with an operand-size
attribute of 32 bits.

ptr16:16, ptr16:32: a FAR pointer, typically in a code segment different
from that of the instruction. The notation 16:16 indicates that the value of
the pointer has two parts. The value to the right of the colon is a 16-bit
selector or value destined for the code segment register. The value to the
left corresponds to the offset within the destination segment. ptr16:16 is
used when the instruction's operand-size attribute is 16 bits; ptr16:32 is
used with the 32-bit attribute.

r8: one of the byte registers AL, CL, DL, BL, AH, CH, DH, or BH.

r16: one of the word registers AX, CX, DX, BX, SP, BP, SI, or DI.

r32: one of the doubleword registers EAX, ECX, EDX, EBX, ESP, EBP, ESI, or
EDI.

imm8: an immediate byte value. imm8 is a signed number between -128 and
+127 inclusive. For instructions in which imm8 is combined with a word or
doubleword operand, the immediate value is sign-extended to form a word or
doubleword. The upper byte of the word is filled with the topmost bit of the
immediate value.

imm16: an immediate word value used for instructions whose operand-size
attribute is 16 bits. This is a number between -32768 and +32767 inclusive.

imm32: an immediate doubleword value used for instructions whose
operand-size attribute is 32-bits. It allows the use of a number between
+2147483647 and -2147483648.

r/m8: a one-byte operand that is either the contents of a byte register
(AL, BL, CL, DL, AH, BH, CH, DH), or a byte from memory.

r/m16: a word register or memory operand used for instructions whose
operand-size attribute is 16 bits. The word registers are: AX, BX, CX, DX,
SP, BP, SI, DI. The contents of memory are found at the address provided by
the effective address computation.

r/m32: a doubleword register or memory operand used for instructions whose
operand-size attribute is 32-bits. The doubleword registers are: EAX, EBX,
ECX, EDX, ESP, EBP, ESI, EDI. The contents of memory are found at the
address provided by the effective address computation.

m8: a memory byte addressed by DS:SI or ES:DI (used only by string
instructions).

m16: a memory word addressed by DS:SI or ES:DI (used only by string
instructions).

m32: a memory doubleword addressed by DS:SI or ES:DI (used only by string
instructions).

m16:16, M16:32: a memory operand containing a far pointer composed of two
numbers. The number to the left of the colon corresponds to the pointer's
segment selector. The number to the right corresponds to its offset.

m16 & 32, m16 & 16, m32 & 32: a memory operand consisting of data item pairs
whose sizes are indicated on the left and the right side of the ampersand.
All memory addressing modes are allowed. m16 & 16 and m32 & 32 operands are
used by the BOUND instruction to provide an operand containing an upper and
lower bounds for array indices. m16 & 32 is used by LIDT and LGDT to
provide a word with which to load the limit field, and a doubleword with
which to load the base field of the corresponding Global and Interrupt
Descriptor Table Registers.

moffs8, moffs16, moffs32: (memory offset) a simple memory variable of type
BYTE, WORD, or DWORD used by some variants of the MOV instruction. The
actual address is given by a simple offset relative to the segment base. No
ModR/M byte is used in the instruction. The number shown with moffs
indicates its size, which is determined by the address-size attribute of the
instruction.

Sreg: a segment register. The segment register bit assignments are ES=0,
CS=1, SS=2, DS=3, FS=4, and GS=5.

17.2.2.3 Clocks

The "Clocks" column gives the number of clock cycles the instruction takes
to execute. The clock count calculations makes the following assumptions:

þ The instruction has been prefetched and decoded and is ready for
execution.

þ Bus cycles do not require wait states.

þ There are no local bus HOLD requests delaying processor access to the
bus.

þ No exceptions are detected during instruction execution.

þ Memory operands are aligned.

Clock counts for instructions that have an r/m (register or memory) operand
are separated by a slash. The count to the left is used for a register
operand; the count to the right is used for a memory operand.

The following symbols are used in the clock count specifications:

þ n, which represents a number of repetitions.

þ m, which represents the number of components in the next instruction
executed, where the entire displacement (if any) counts as one
component, the entire immediate data (if any) counts as one component,
and every other byte of the instruction and prefix(es) each counts as
one component.

þ pm=, a clock count that applies when the instruction executes in
Protected Mode. pm= is not given when the clock counts are the same for
Protected and Real Address Modes.

When an exception occurs during the execution of an instruction and the
exception handler is in another task, the instruction execution time is
increased by the number of clocks to effect a task switch. This parameter
depends on several factors:

þ The type of TSS used to represent the current task (386 TSS or 286
TSS).

þ The type of TSS used to represent the new task.

þ Whether the current task is in V86 mode.

þ Whether the new task is in V86 mode.

Table 17-5 summarizes the task switch times for exceptions.

Table 17-5. Task Switch Times for Exceptions

New Task

Old 386 TSS 286 TSS
Task VM = 0

386 VM = 0 309 282
TSS

386 VM = 1 314 231
TSS

286 307 282
TSS

17.2.2.4 Description

The "Description" column following the "Clocks" column briefly explains the
various forms of the instruction. The "Operation" and "Description" sections
contain more details of the instruction's operation.

17.2.2.5 Operation

The "Operation" section contains an algorithmic description of the
instruction which uses a notation similar to the Algol or Pascal language.
The algorithms are composed of the following elements:

Comments are enclosed within the symbol pairs "(*" and "*)".

Compound statements are enclosed between the keywords of the "if" statement
(IF, THEN, ELSE, FI) or of the "do" statement (DO, OD), or of the "case"
statement (CASE ... OF, ESAC).

A register name implies the contents of the register. A register name
enclosed in brackets implies the contents of the location whose address is
contained in that register. For example, ES:[DI] indicates the contents of
the location whose ES segment relative address is in register DI. [SI]
indicates the contents of the address contained in register SI relative to
SI's default segment (DS) or overridden segment.

Brackets also used for memory operands, where they mean that the contents
of the memory location is a segment-relative offset. For example, [SRC]
indicates that the contents of the source operand is a segment-relative
offset.

A B; indicates that the value of B is assigned to A.

The symbols =, <>, ò, and ó are relational operators used to compare two
values, meaning equal, not equal, greater or equal, less or equal,
respectively. A relational expression such as A = B is TRUE if the value of
A is equal to B; otherwise it is FALSE.

The following identifiers are used in the algorithmic descriptions:

þ OperandSize represents the operand-size attribute of the instruction,
which is either 16 or 32 bits. AddressSize represents the address-size
attribute, which is either 16 or 32 bits. For example,

IF instruction = CMPSW
THEN OperandSize 16;
ELSE
IF instruction = CMPSD
THEN OperandSize 32;
FI;
FI;

indicates that the operand-size attribute depends on the form of the CMPS
instruction used. Refer to the explanation of address-size and operand-size
attributes at the beginning of this chapter for general guidelines on how
these attributes are determined.

þ StackAddrSize represents the stack address-size attribute associated
with the instruction, which has a value of 16 or 32 bits, as explained
earlier in the chapter.

þ SRC represents the source operand. When there are two operands, SRC is
the one on the right.

þ DEST represents the destination operand. When there are two operands,
DEST is the one on the left.

þ LeftSRC, RightSRC distinguishes between two operands when both are
source operands.

þ eSP represents either the SP register or the ESP register depending on
the setting of the B-bit for the current stack segment.

The following functions are used in the algorithmic descriptions:

þ Truncate to 16 bits(value) reduces the size of the value to fit in 16
bits by discarding the uppermost bits as needed.

þ Addr(operand) returns the effective address of the operand (the result
of the effective address calculation prior to adding the segment base).

þ ZeroExtend(value) returns a value zero-extended to the operand-size
attribute of the instruction. For example, if OperandSize = 32,
ZeroExtend of a byte value of -10 converts the byte from F6H to
doubleword with hexadecimal value 000000F6H. If the value passed to
ZeroExtend and the operand-size attribute are the same size,
ZeroExtend returns the value unaltered.

þ SignExtend(value) returns a value sign-extended to the operand-size
attribute of the instruction. For example, if OperandSize = 32,
SignExtend of a byte containing the value -10 converts the byte from
F6H to a doubleword with hexadecimal value FFFFFFF6H. If the value
passed to SignExtend and the operand-size attribute are the same size,
SignExtend returns the value unaltered.

þ Push(value) pushes a value onto the stack. The number of bytes pushed
is determined by the operand-size attribute of the instruction. The
action of Push is as follows:

IF StackAddrSize = 16
THEN
IF OperandSize = 16
THEN
SP SP - 2;
SS:[SP] value; (* 2 bytes assigned starting at
byte address in SP *)
ELSE (* OperandSize = 32 *)
SP SP - 4;
SS:[SP] value; (* 4 bytes assigned starting at
byte address in SP *)
FI;
ELSE (* StackAddrSize = 32 *)
IF OperandSize = 16
THEN
ESP ESP - 2;
SS:[ESP] value; (* 2 bytes assigned starting at
byte address in ESP*)
ELSE (* OperandSize = 32 *)
ESP ESP - 4;
SS:[ESP] value; (* 4 bytes assigned starting at
byte address in ESP*)
FI;
FI;

þ Pop(value) removes the value from the top of the stack and returns it.
The statement EAX Pop( ); assigns to EAX the 32-bit value that Pop
took from the top of the stack. Pop will return either a word or a
doubleword depending on the operand-size attribute. The action of Pop
is as follows:

IF StackAddrSize = 16
THEN
IF OperandSize = 16
THEN
ret val SS:[SP]; (* 2-byte value *)
SP SP + 2;
ELSE (* OperandSize = 32 *)
ret val SS:[SP]; (* 4-byte value *)
SP SP + 4;
FI;
ELSE (* StackAddrSize = 32 *)
IF OperandSize = 16
THEN
ret val SS:[ESP]; (* 2 bytes value *)
ESP ESP + 2;
ELSE (* OperandSize = 32 *)
ret val SS:[ESP]; (* 4 bytes value *)
ESP ESP + 4;
FI;
FI;
RETURN(ret val); (*returns a word or doubleword*)

þ Bit[BitBase, BitOffset] returns the address of a bit within a bit
string, which is a sequence of bits in memory or a register. Bits are
numbered from low-order to high-order within registers and within
memory bytes. In memory, the two bytes of a word are stored with the
low-order byte at the lower address.

If the base operand is a register, the offset can be in the range 0..31.
This offset addresses a bit within the indicated register. An example,
"BIT[EAX, 21]," is illustrated in Figure 17-3.

If BitBase is a memory address, BitOffset can range from -2 gigabits to 2
gigabits. The addressed bit is numbered (Offset MOD 8) within the byte at
address (BitBase + (BitOffset DIV 8)), where DIV is signed division with
rounding towards negative infinity, and MOD returns a positive number.
This is illustrated in Figure 17-4.

þ I-O-Permission(I-O-Address, width) returns TRUE or FALSE depending on
the I/O permission bitmap and other factors. This function is defined as
follows:

IF TSS type is 286 THEN RETURN FALSE; FI;
Ptr [TSS + 66]; (* fetch bitmap pointer *)
BitStringAddr SHR (I-O-Address, 3) + Ptr;
MaskShift I-O-Address AND 7;
CASE width OF:
BYTE: nBitMask 1;
WORD: nBitMask 3;
DWORD: nBitMask 15;
ESAC;
mask SHL (nBitMask, MaskShift);
CheckString [BitStringAddr] AND mask;
IF CheckString = 0
THEN RETURN (TRUE);
ELSE RETURN (FALSE);
FI;

þ Switch-Tasks is the task switching function described in Chapter 7.

17.2.2.6 Description

The "Description" section contains further explanation of the instruction's
operation.

Figure 17-3. Bit Offset for BIT[EAX, 21]

31 21 0
ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍËÍËÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
º º º º
ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÊÍÊÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¼

ÀÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄÄBITOFFSET = 21ÄÄÄÄÄÄÄÄÄÄÄÄÄÄÙ

Figure 17-4. Memory Bit Indexing

BIT INDEXING (POSITIVE OFFSET)

7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
ÉÍÍÍÍÑÍÑÍÍÍÍÍÍÍÍÍÑÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÑÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ»
º ³ ³ ³ ³ º
ÈÍÍÍÍÏÍÏÍÍÍÍÍÍÍÍÍÏÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍ¼
³ BITBASE + 1 ³ BITBASE ³ BITBASE - 1 ³
³
ÀÄÄÄÄÄÄÄÄOFFSET = 13ÄÄÄÄÄÄÄÙ

BIT INDEXING (NEGATIVE OFFSET)

7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
ÉÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÑÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÑÍÍÍÑÍÑÍÍÍÍÍÍÍÍÍÍ»
º ³ ³ ³ ³ º
ÈÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍØÍÍÍÍÍÍÍÍÍÍÍÍÍÍÍÏÍÍÍÏÍÏÍÍÍÍÍÍÍÍÍÍ¼
³ BITBASE ³ BITBASE - 1 ³ BITBASE - 2 ³
³
ÀÄÄÄÄÄOFFSET = -11ÄÄÄÙ

17.2.2.7 Flags Affected

The "Flags Affected" section lists the flags that are affected by the
instruction, as follows:

þ If a flag is always cleared or always set by the instruction, the
value is given (0 or 1) after the flag name. Arithmetic and logical
instructions usually assign values to the status flags in the uniform
manner described in Appendix C. Nonconventional assignments are
described in the "Operation" section.

þ The values of flags listed as "undefined" may be changed by the
instruction in an indeterminate manner.

All flags not listed are unchanged by the instruction.

17.2.2.8 Protected Mode Exceptions

This section lists the exceptions that can occur when the instruction is
executed in 80386 Protected Mode. The exception names are a pound sign (#)
followed by two letters and an optional error code in parentheses. For
example, #GP(0) denotes a general protection exception with an error code of
0. Table 17-6 associates each two-letter name with the corresponding
interrupt number.

Chapter 9 describes the exceptions and the 80386 state upon entry to the
exception.

Application programmers should consult the documentation provided with
their operating systems to determine the actions taken when exceptions
occur.

Table 17-6. 80386 Exceptions

Mnemonic Interrupt Description

17.2.2.9 Real Address Mode Exceptions

Because less error checking is performed by the 80386 in Real Address Mode,
this mode has fewer exception conditions. Refer to Chapter 14 for further
information on these exceptions.

continue

Created by: Astalalista - Last modification: 2004/Jan/26 at 02:57
Labelled with ICRA	This page is powered by Copyright Button(TM). Click here to read how this page is protected by copyright laws.	Please send any comments to Webmaster