INTEL 80386 PROGRAMMER'S REFERENCE MANUAL 1986

Chapter 10 Initialization
--------------------------------------------------------------------------

After a signal on the RESET pin, certain registers of the 80386 are set to
predefined values. These values are adequate to enable execution of a
bootstrap program, but additional initialization must be performed by
software before all the features of the processor can be utilized.

10.1 Processor State After Reset

The contents of EAX depend upon the results of the power-up self-test. The
self-test may be requested externally by assertion of BUSY# at the end of
RESET. The EAX register holds zero if the 80386 passed the test. A nonzero
value in EAX after self-test indicates that the particular 80386 unit is
faulty. If the self-test is not requested, the contents of EAX after RESET
are undefined.

DX holds a component identifier and revision number after RESET, as Figure
10-1 illustrates. DH contains 3, which indicates an 80386 component. DL
contains a unique identifier of the revision level.

Control register zero (CR0) contains the values shown in Figure 10-2. The
ET bit of CR0 is set if an 80387 is present in the configuration (according
to the state of the ERROR# pin after RESET). If ET is reset, the
configuration either contains an 80287 or does not contain a coprocessor.
A software test is required to distinguish between these latter two
possibilities.

The remaining registers and flags are set as follows:

   EFLAGS             =00000002H
   IP                 =0000FFF0H
   CS selector        =F000H
   DS selector        =0000H
   ES selector        =0000H
   SS selector        =0000H
   FS selector        =0000H
   GS selector        =0000H
   IDTR:
              base    =0
              limit   =03FFH

All registers not mentioned above are undefined.

These settings imply that the processor begins in real-address mode with
interrupts disabled.

Figure 10-1. Contents of EDX after RESET

                        EDX REGISTER
 31            23            15             7             0
+-------------+-------------+--------------+--------------+
|                           |      DH      |      DL      |
|         UNDEFINED         |  DEVICE ID   | STEPPING ID  |
|                           |      3       |   (UNIQUE)   |
+-------------+-------------+--------------+--------------+

Figure 10-2. Initial Contents of CR0

                      CONTROL REGISTER ZERO
 31                                          4    3    2    1    0
+----+--------------------------------------+----+----+----+----+----+
| PG |              UNDEFINED               | ET | TS | EM | MP | PE |
+----+--------------------------------------+----+----+----+----+----+

   PG   0 - PAGING DISABLED
   ET   * - INDICATES PRESENCE OF 80387
   TS   0 - NO TASK SWITCH
   EM   0 - DO NOT MONITOR COPROCESSOR
   MP   0 - COPROCESSOR NOT PRESENT
   PE   0 - PROTECTION NOT ENABLED (REAL ADDRESS MODE)

10.2 Software Initialization for Real-Address Mode

In real-address mode a few structures must be initialized before a program
can take advantage of all the features available in this mode.

10.2.1 Stack

No instructions that use the stack can be used until the stack-segment
register (SS) has been loaded. SS must point to an area in RAM.

10.2.2 Interrupt Table

The initial state of the 80386 leaves interrupts disabled; however, the
processor will still attempt to access the interrupt table if an exception
or nonmaskable interrupt (NMI) occurs. Initialization software should take
one of the following actions:

 - Change the limit value in the IDTR to zero. This will cause a shutdown
   if an exception or nonmaskable interrupt occurs. (Refer to the 80386
   Hardware Reference Manual to see how shutdown is signalled externally.)

 - Put pointers to valid interrupt handlers in all positions of the
   interrupt table that might be used by exceptions or interrupts.

 - Change the IDTR to point to a valid interrupt table.
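
The reset value of the IDTR limit (03FFH) corresponds to a real-address-mode
interrupt table of 256 four-byte vectors. The following C sketch works
through that arithmetic on a host machine. The function names are
illustrative and not part of the manual, and the sketch assumes the
real-address-mode layout of one 4-byte entry (offset word, then segment
word) per vector.

```c
#include <stdint.h>
#include <stdbool.h>

/* Size in bytes of one real-address-mode interrupt vector
   (16-bit offset followed by 16-bit segment). Assumed layout. */
#define RM_VECTOR_SIZE 4u

/* Address of the table entry for a vector number, given the IDTR base. */
uint32_t rm_vector_address(uint32_t idtr_base, uint8_t vector)
{
    return idtr_base + (uint32_t)vector * RM_VECTOR_SIZE;
}

/* True if the whole entry lies within the IDTR limit. The limit is the
   offset of the last valid byte, so 03FFH covers bytes 0 through 3FFH. */
bool rm_vector_in_limit(uint16_t idtr_limit, uint8_t vector)
{
    uint32_t last_byte = (uint32_t)vector * RM_VECTOR_SIZE
                         + (RM_VECTOR_SIZE - 1u);
    return last_byte <= idtr_limit;
}
```

With the reset limit of 03FFH every vector from 0 through 255 is in range.
With a limit of zero no entry fits, which is consistent with the first
action listed above.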

10.2.3 First Instructions

After RESET, address lines A31-A20 are automatically asserted for
instruction fetches. This fact, together with the initial values of CS:IP,
causes instruction execution to begin at physical address FFFFFFF0H. Near
(intrasegment) forms of control transfer instructions may be used to pass
control to other addresses in the upper 64K bytes of the address space. The
first far (intersegment) JMP or CALL instruction causes A31-A20 to drop
low, and the 80386 continues executing instructions in the lower one
megabyte of physical memory. This automatic assertion of address lines
A31-A20 allows systems designers to use a ROM at the high end of the
address space to initialize the system.

10.3 Switching to Protected Mode

Setting the PE bit of the MSW in CR0 causes the 80386 to begin executing in
protected mode. The current privilege level (CPL) starts at zero. The
segment registers continue to point to the same linear addresses as in
real-address mode (in real-address mode, linear addresses are the same as
physical addresses).

Immediately after setting the PE flag, the initialization code must flush
the processor's instruction prefetch queue by executing a JMP instruction.
The 80386 fetches and decodes instructions and addresses before they are
used; however, after a change into protected mode, the prefetched
instruction information (which pertains to real-address mode) is no longer
valid. A JMP forces the processor to discard the invalid information.

10.4 Software Initialization for Protected Mode

Most of the initialization needed for protected mode can be done either
before or after switching to protected mode. If done in protected mode,
however, the initialization procedures must not use protected-mode features
that are not yet initialized.

10.4.1 Interrupt Descriptor Table

The IDTR may be loaded in either real-address or protected mode. However,
the format of the interrupt table for protected mode is different from
that for real-address mode.
It is not possible to change to protected mode and change interrupt table
formats at the same time; therefore, it is inevitable that, if IDTR selects
an interrupt table, it will have the wrong format at some time. An
interrupt or exception that occurs at this time will have unpredictable
results. To avoid this unpredictability, interrupts should remain disabled
until interrupt handlers are in place and a valid IDT has been created in
protected mode.

10.4.2 Stack

The SS register may be loaded in either real-address mode or protected
mode. If loaded in real-address mode, SS continues to point to the same
linear base-address after the switch to protected mode.

10.4.3 Global Descriptor Table

Before any segment register is changed in protected mode, the GDT register
must point to a valid GDT. Initialization of the GDT and GDTR may be done
in real-address mode. The GDT (as well as LDTs) should reside in RAM,
because the processor modifies the accessed bit of descriptors.

10.4.4 Page Tables

Page tables and the PDBR in CR3 can be initialized in either real-address
mode or in protected mode; however, the paging enabled (PG) bit of CR0
cannot be set until the processor is in protected mode. PG may be set
simultaneously with PE, or later. When PG is set, the PDBR in CR3 should
already be initialized with a physical address that points to a valid page
directory. The initialization procedure should adopt one of the following
strategies to ensure consistent addressing before and after paging is
enabled:

 - The page that is currently being executed should map to the same
   physical addresses both before and after PG is set.

 - A JMP instruction should immediately follow the setting of PG.

10.4.5 First Task

The initialization procedure can run awhile in protected mode without
initializing the task register; however, before the first task switch, the
following conditions must prevail:

 - There must be a valid task state segment (TSS) for the new task.
   The stack pointers in the TSS for privilege levels numerically less
   than or equal to the initial CPL must point to valid stack segments.

 - The task register must point to an area in which to save the current
   task state. After the first task switch, the information dumped in this
   area is not needed, and the area can be used for other purposes.

10.5 Initialization Example

$TITLE ('Initial Task')

            NAME    INIT

init_stack  SEGMENT RW
            DW      20 DUP(?)
tos         LABEL   WORD
init_stack  ENDS

init_data   SEGMENT RW PUBLIC
            DW      20 DUP(?)
init_data   ENDS

init_code   SEGMENT ER PUBLIC

ASSUME      DS:init_data

            nop
            nop
            nop
init_start:
            ; set up stack
            mov     ax, init_stack
            mov     ss, ax
            mov     esp, offset tos

            mov     al,1
blink:
            xor     al,1            ; toggle bit 0
            out     0e4h,al         ; write it to port 0E4H
            mov     cx,3FFFh        ; delay count
here:
            dec     cx
            jnz     here

            jmp     SHORT blink

            hlt
init_code   ends

            END     init_start, SS:init_stack, DS:init_data


$TITLE('Protected Mode Transition -- 386 initialization')
NAME  RESET

;*****************************************************************
; Upon reset the 386 starts executing at address 0FFFFFFF0H. The
; upper 12 address bits remain high until a FAR call or jump is
; executed.
;
; Assume the following:
;
;
; -  a short jump at address 0FFFFFFF0H (placed there by the
;    system builder) causes execution to begin at START in segment
;    RESET_CODE.
;
;
; -  segment RESET_CODE is based at physical address 0FFFF0000H,
;    i.e. at the start of the last 64K in the 4G address space.
;    Note that this is the base of the CS register at reset. If
;    you locate ROMcode above this address, you will need to
;    figure out an adjustment factor to address things within this
;    segment.
;
;*****************************************************************
$EJECT
;
; Define addresses to locate GDT and IDT in RAM.
; These addresses are also used in the BLD386 file that defines
; the GDT and IDT. If you change these addresses, make sure you
; change the base addresses specified in the build file.

GDTbase         EQU     00001000H   ; physical address for GDT base
IDTbase         EQU     00000400H   ; physical address for IDT base

PUBLIC          GDT_EPROM
PUBLIC          IDT_EPROM
PUBLIC          START

DUMMY           segment rw      ; ONLY for ASM386 main module stack init
                DW      0
DUMMY           ends

;*****************************************************************
;
; Note: RESET CODE must be USE16 because the 386 initially executes
;       in real mode.
;

RESET_CODE segment er PUBLIC    USE16

ASSUME DS:nothing, ES:nothing

;
; 386 Descriptor template

DESC            STRUC
    lim_0_15    DW  0           ; limit bits (0..15)
    bas_0_15    DW  0           ; base bits (0..15)
    bas_16_23   DB  0           ; base bits (16..23)
    access      DB  0           ; access byte
    gran        DB  0           ; granularity byte
    bas_24_31   DB  0           ; base bits (24..31)
DESC            ENDS

; The following is the layout of the real GDT created by BLD386.
; It is located in EPROM and will be copied to RAM.
;
; GDT[0]      ...  NULL
; GDT[1]      ...  Alias for RAM GDT
; GDT[2]      ...  Alias for RAM IDT
; GDT[3]      ...  initial task TSS
; GDT[4]      ...  initial task TSS alias
; GDT[5]      ...  initial task LDT
; GDT[6]      ...  initial task LDT alias

;
; define entries in GDT and IDT.

GDT_ENTRIES     EQU     8
IDT_ENTRIES     EQU     32

; define some constants to index into the real GDT

GDT_ALIAS       EQU     1*SIZE DESC
IDT_ALIAS       EQU     2*SIZE DESC
INIT_TSS        EQU     3*SIZE DESC
INIT_TSS_A      EQU     4*SIZE DESC
INIT_LDT        EQU     5*SIZE DESC
INIT_LDT_A      EQU     6*SIZE DESC

;
; location of alias in INIT_LDT

INIT_LDT_ALIAS  EQU     1*SIZE DESC

;
; access rights byte for DATA and TSS descriptors

DS_ACCESS       EQU     010010010B
TSS_ACCESS      EQU     010001001B

;
; This temporary GDT will be used to set up the real GDT in RAM.

Temp_GDT        LABEL   BYTE        ; tag for begin of scratch GDT

NULL_DES        DESC <>             ; NULL descriptor

                                    ; 4-Gigabyte data segment based at 0
FLAT_DES        DESC <0FFFFH,0,0,92h,0CFh,0>

GDT_eprom       DP      ?           ; Builder places GDT address and limit
                                    ; in this 6 byte area.

IDT_eprom       DP      ?           ; Builder places IDT address and limit
                                    ; in this 6 byte area.

;
; Prepare operands for loading GDTR and IDTR.

TGDT_pword      LABEL   PWORD               ; for temp GDT
                DW      end_Temp_GDT-Temp_GDT -1
                DD      0

GDT_pword       LABEL   PWORD               ; for GDT in RAM
                DW      GDT_ENTRIES * SIZE DESC -1
                DD      GDTbase

IDT_pword       LABEL   PWORD               ; for IDT in RAM
                DW      IDT_ENTRIES * SIZE DESC -1
                DD      IDTbase

end_Temp_GDT    LABEL   BYTE

;
; Define equates for addressing convenience.

GDT_DES_FLAT        EQU DS:GDT_ALIAS +GDTbase
IDT_DES_FLAT        EQU DS:IDT_ALIAS +GDTbase

INIT_TSS_A_OFFSET   EQU DS:INIT_TSS_A
INIT_TSS_OFFSET     EQU DS:INIT_TSS

INIT_LDT_A_OFFSET   EQU DS:INIT_LDT_A
INIT_LDT_OFFSET     EQU DS:INIT_LDT

; define pointer for first task switch

ENTRY_POINTER   LABEL   DWORD
                DW      0, INIT_TSS

;******************************************************************
;
;   Jump from reset vector to here.

START:

        CLI                     ;disable interrupts
        CLD                     ;clear direction flag

        LIDT    NULL_des        ;force shutdown on errors

;
;   move scratch GDT to RAM at physical 0

        XOR     DI,DI
        MOV     ES,DI           ;point ES:DI to physical location 0

        MOV     SI,OFFSET Temp_GDT
        MOV     CX,end_Temp_GDT-Temp_GDT        ;set byte count
        INC     CX
;
;   move table

        REP MOVS BYTE PTR ES:[DI],BYTE PTR CS:[SI]

        LGDT    tGDT_pword      ;load GDTR for Temp. GDT
                                ;(located at 0)

;   switch to protected mode

        MOV     EAX,CR0         ;get current CR0
        OR      EAX,1           ;set PE bit, keep the other CR0 bits
        MOV     CR0,EAX         ;begin protected mode
;
;   clear prefetch queue

        JMP     SHORT flush
flush:

; set DS,ES,SS to address flat linear space (0 ... 4GB)

        MOV     BX,FLAT_DES-Temp_GDT
        MOV     DS,BX
        MOV     ES,BX
        MOV     SS,BX
;
; initialize stack pointer to some (arbitrary) RAM location

        MOV     ESP, OFFSET end_Temp_GDT

;
; copy eprom GDT to RAM

        MOV     ESI,DWORD PTR GDT_eprom +2  ; get base of eprom GDT
                                            ; (put here by builder).

        MOV     EDI,GDTbase                 ; point ES:EDI to GDT base in RAM.

        MOV     CX,WORD PTR gdt_eprom +0    ; limit of eprom GDT
        INC     CX
        SHR     CX,1                        ; easier to move words
        CLD
        REP MOVS WORD PTR ES:[EDI],WORD PTR DS:[ESI]

;
; copy eprom IDT to RAM
;
        MOV     ESI,DWORD PTR IDT_eprom +2  ; get base of eprom IDT
                                            ; (put here by builder)

        MOV     EDI,IDTbase                 ; point ES:EDI to IDT base in RAM.

        MOV     CX,WORD PTR idt_eprom +0    ; limit of eprom IDT
        INC     CX
        SHR     CX,1
        CLD
        REP MOVS WORD PTR ES:[EDI],WORD PTR DS:[ESI]

; switch to RAM GDT and IDT
;
        LIDT    IDT_pword
        LGDT    GDT_pword

;
        MOV     BX,GDT_ALIAS                ; point DS to GDT alias
        MOV     DS,BX
;
; copy eprom TSS to RAM
;
        MOV     BX,INIT_TSS_A               ; INIT TSS A descriptor base
                                            ; has RAM location of INIT TSS.

        MOV     ES,BX                       ; ES points to TSS in RAM

        MOV     BX,INIT_TSS                 ; get initial task selector
        LAR     DX,BX                       ; save access byte
        MOV     [BX].access,DS_ACCESS       ; set access as data segment
        MOV     FS,BX                       ; FS points to eprom TSS

        XOR     si,si                       ; FS:si points to eprom TSS
        XOR     di,di                       ; ES:di points to RAM TSS

        MOV     CX,[BX].lim_0_15            ; get count to move
        INC     CX

;
; move INIT_TSS to RAM.

        REP MOVS BYTE PTR ES:[di],BYTE PTR FS:[si]

        MOV     [BX].access,DH              ; restore access byte

;
; change base of INIT TSS descriptor to point to RAM.

        MOV     AX,INIT_TSS_A_OFFSET.bas_0_15
        MOV     INIT_TSS_OFFSET.bas_0_15,AX
        MOV     AL,INIT_TSS_A_OFFSET.bas_16_23
        MOV     INIT_TSS_OFFSET.bas_16_23,AL
        MOV     AL,INIT_TSS_A_OFFSET.bas_24_31
        MOV     INIT_TSS_OFFSET.bas_24_31,AL

;
; change INIT TSS A to form a save area for TSS on first task
; switch. Use RAM at location 0.

        MOV     BX,INIT_TSS_A
        MOV     WORD PTR [BX].bas_0_15,0
        MOV     [BX].bas_16_23,0
        MOV     [BX].bas_24_31,0
        MOV     [BX].access,TSS_ACCESS
        MOV     [BX].gran,0
        LTR     BX                          ; defines save area for TSS

;
; copy eprom LDT to RAM

        MOV     BX,INIT_LDT_A               ; INIT_LDT_A descriptor has
                                            ; base address in RAM for INIT_LDT.

        MOV     ES,BX                       ; ES points LDT location in RAM.

        MOV     AH,[BX].bas_24_31
        MOV     AL,[BX].bas_16_23
        SHL     EAX,16
        MOV     AX,[BX].bas_0_15            ; save INIT_LDT base (ram) in EAX

        MOV     BX,INIT_LDT                 ; get initial LDT selector
        LAR     DX,BX                       ; save access rights
        MOV     [BX].access,DS_ACCESS       ; set access as data segment
        MOV     FS,BX                       ; FS points to eprom LDT

        XOR     si,si                       ; FS:SI points to eprom LDT
        XOR     di,di                       ; ES:DI points to RAM LDT

        MOV     CX,[BX].lim_0_15            ; get count to move
        INC     CX
;
; move initial LDT to RAM

        REP MOVS BYTE PTR ES:[di],BYTE PTR FS:[si]

        MOV     [BX].access,DH              ; restore access rights in
                                            ; INIT_LDT descriptor

;
; change base of alias (of INIT_LDT) to point to location in RAM.

        MOV     ES:[INIT_LDT_ALIAS].bas_0_15,AX
        SHR     EAX,16
        MOV     ES:[INIT_LDT_ALIAS].bas_16_23,AL
        MOV     ES:[INIT_LDT_ALIAS].bas_24_31,AH

;
; now set the base value in INIT_LDT descriptor

        MOV     AX,INIT_LDT_A_OFFSET.bas_0_15
        MOV     INIT_LDT_OFFSET.bas_0_15,AX
        MOV     AL,INIT_LDT_A_OFFSET.bas_16_23
        MOV     INIT_LDT_OFFSET.bas_16_23,AL
        MOV     AL,INIT_LDT_A_OFFSET.bas_24_31
        MOV     INIT_LDT_OFFSET.bas_24_31,AL

;
; Now GDT, IDT, initial TSS and initial LDT are all set up.
;
; Start the first task!

        JMP     ENTRY_POINTER

RESET_CODE ends
        END     START, SS:DUMMY,DS:DUMMY

10.6 TLB Testing

The 80386 provides a mechanism for testing the Translation Lookaside Buffer
(TLB), the cache used for translating linear addresses to physical
addresses. Although failure of the TLB hardware is extremely unlikely,
users may wish to include TLB confidence tests among other power-up
confidence tests for the 80386.

--------------------------------------------------------------------------
NOTE
   This TLB testing mechanism is unique to the 80386 and may not be
   continued in the same way in future processors. Software that uses
   this mechanism may be incompatible with future processors.
--------------------------------------------------------------------------

When testing the TLB it is recommended that paging be turned off (PG=0 in
CR0) to avoid interference with the test data being written to the TLB.

10.6.1 Structure of the TLB

The TLB is a four-way set-associative memory. Figure 10-3 illustrates the
structure of the TLB. There are four sets of eight entries each. Each entry
consists of a tag and data. Tags are 24 bits wide. They contain the
high-order 20 bits of the linear address, the valid bit, and three
attribute bits. The data portion of each entry contains the high-order 20
bits of the physical address.

10.6.2 Test Registers

Two test registers, shown in Figure 10-4, are provided for the purpose of
testing. TR6 is the test command register, and TR7 is the test data
register. These registers are accessed by variants of the MOV instruction.
A test register may be either the source operand or destination operand.
The MOV instructions are defined in both real-address mode and protected
mode. The test registers are privileged resources; in protected mode, the
MOV instructions that access them can only be executed at privilege level
0. An attempt to read or write the test registers when executing at any
other privilege level causes a general protection exception.

The test command register (TR6) contains a command and an address tag to
use in performing the command:

C                This is the command bit. There are two TLB testing
                 commands: write entries into the TLB, and perform TLB
                 lookups. To cause an immediate write into the TLB entry,
                 move a doubleword into TR6 that contains a 0 in this bit.
                 To cause an immediate TLB lookup, move a doubleword into
                 TR6 that contains a 1 in this bit.

Linear Address   On a TLB write, a TLB entry is allocated to this linear
                 address; the rest of that TLB entry is set per the value
                 of TR7 and the value just written into TR6. On a TLB
                 lookup, the TLB is interrogated per this value; if one
                 and only one TLB entry matches, the rest of the fields of
                 TR6 and TR7 are set from the matching TLB entry.

V                The valid bit for this TLB entry. The TLB uses the valid
                 bit to identify entries that contain valid data. Entries
                 of the TLB that have not been assigned values have zero
                 in the valid bit. All valid bits can be cleared by
                 writing to CR3.

D, D#            The dirty bit (and its complement) for/from the TLB
                 entry.

U, U#            The U/S bit (and its complement) for/from the TLB entry.

W, W#            The R/W bit (and its complement) for/from the TLB entry.

                 The meaning of these pairs of bits is given by Table
                 10-1, where X represents D, U, or W.

The test data register (TR7) holds data read from or data to be written to
the TLB.

Physical Address This is the data field of the TLB. On a write to the TLB,
                 the TLB entry allocated to the linear address in TR6 is
                 set to this value. On a TLB lookup, if HT is set, the
                 data field (physical address) from the TLB is read out to
                 this field. If HT is not set, this field is undefined.

HT               For a TLB lookup, the HT bit indicates whether the lookup
                 was a hit (HT=1) or a miss (HT=0). For a TLB write, HT
                 must be set to 1.

REP              For a TLB write, selects which of four associative blocks
                 of the TLB is to be written. For a TLB read, if HT is
                 set, REP reports in which of the four associative blocks
                 the tag was found; if HT is not set, REP is undefined.

Table 10-1. Meaning of D, U, and W Bit Pairs

X     X#      Effect during        Value of bit X
              TLB Lookup           after TLB Write

0     0       (undefined)          (undefined)
0     1       Match if X=0         Bit X becomes 0
1     0       Match if X=1         Bit X becomes 1
1     1       (undefined)          (undefined)

Figure 10-3. TLB Structure

                  +-----------------+-----------------+
                7 |       TAG       |      DATA       |
                  +-----------------+-----------------+
      SET 11    : |        :        |        :        |
                  +-----------------+-----------------+
                1 |       TAG       |      DATA       |
                  +-----------------+-----------------+
                0 |       TAG       |      DATA       |
                  +-----------------+-----------------+

                  +-----------------+-----------------+
                7 |       TAG       |      DATA       |
                  +-----------------+-----------------+
      SET 10    : |        :        |        :        |
                  +-----------------+-----------------+
                1 |       TAG       |      DATA       |
                  +-----------------+-----------------+
                0 |       TAG       |      DATA       |
                  +-----------------+-----------------+

                  +-----------------+-----------------+
                7 |       TAG       |      DATA       |
                  +-----------------+-----------------+
      SET 01    : |        :        |        :        |
                  +-----------------+-----------------+
                1 |       TAG       |      DATA       |
                  +-----------------+-----------------+
                0 |       TAG       |      DATA       |
                  +-----------------+-----------------+

                  +-----------------+-----------------+
                7 |       TAG       |      DATA       |
                  +-----------------+-----------------+
      SET 00    : |        :        |        :        |
                  +-----------------+-----------------+
                1 |       TAG       |      DATA       |
                  +-----------------+-----------------+
                0 |       TAG       |      DATA       |
                  +-----------------+-----------------+

      (All four sets connect to the DATA BUS.)

Figure 10-4. Test Registers

 31                          12 11          7        4           0
+------------------------------+--------------------+--+-----+-----+
|      PHYSICAL ADDRESS        |0  0  0  0  0  0  0 |HT| REP |0  0 |  TR7
+------------------------------+--+--+--+--+--+--+--+--+--+--+--+--+
|       LINEAR ADDRESS         |V |D |D#|U |U#|W |W#|0 |0 |0 |0 |C |  TR6
+------------------------------+--+--+--+--+--+--+--+--+--+--+--+--+

NOTE: 0 INDICATES INTEL RESERVED. DO NOT DEFINE.

10.6.3 Test Operations

To write a TLB entry:

  1. Move a doubleword to TR7 that contains the desired physical address,
     HT, and REP values. HT must contain 1. REP must point to the
     associative block in which to place the entry.

  2. Move a doubleword to TR6 that contains the appropriate linear
     address, and values for V, D, U, and W. Be sure C=0 for "write"
     command.
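
The two doublewords used in the write procedure can be modelled on a host
machine by packing the fields at the positions shown in Figure 10-4. The
following C sketch is illustrative only: the function and macro names are
hypothetical, it builds the values but does not execute the privileged MOV
instructions that load TR6 and TR7, and it encodes each of D, U, and W as
a complementary pair (X, X#) so that the undefined combinations of Table
10-1 are never produced.

```c
#include <stdint.h>

/* Field positions taken from Figure 10-4. */
#define TR7_HT         (1u << 4)    /* must be 1 for a TLB write */
#define TR7_REP_SHIFT  2            /* two-bit associative block number */
#define TR6_V          (1u << 11)
#define TR6_D          (1u << 10)
#define TR6_D_NOT      (1u << 9)
#define TR6_U          (1u << 8)
#define TR6_U_NOT      (1u << 7)
#define TR6_W          (1u << 6)
#define TR6_W_NOT      (1u << 5)
#define TR6_C_LOOKUP   (1u << 0)    /* C=1: lookup, C=0: write */
#define ADDR_MASK      0xFFFFF000u  /* high-order 20 address bits */

/* Step 1 of a write: TR7 value with HT=1 and REP = block (0..3). */
uint32_t tr7_for_write(uint32_t physical_address, unsigned block)
{
    return (physical_address & ADDR_MASK)
         | TR7_HT
         | ((block & 3u) << TR7_REP_SHIFT);
}

/* Encode one attribute as the pair (X, X#) = (1, 0) or (0, 1). */
uint32_t attr_pair(int value, uint32_t bit, uint32_t bit_not)
{
    return value ? bit : bit_not;
}

/* TR6 value for a write (lookup == 0) or a lookup (lookup != 0). */
uint32_t tr6_command(uint32_t linear_address, int valid,
                     int dirty, int user, int writable, int lookup)
{
    uint32_t value = linear_address & ADDR_MASK;
    if (valid)
        value |= TR6_V;
    value |= attr_pair(dirty,    TR6_D, TR6_D_NOT);
    value |= attr_pair(user,     TR6_U, TR6_U_NOT);
    value |= attr_pair(writable, TR6_W, TR6_W_NOT);
    if (lookup)
        value |= TR6_C_LOOKUP;
    return value;
}
```

A test routine would load the result of tr7_for_write into TR7 first and
then the result of tr6_command into TR6, in the order given by the steps
above.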

Be careful not to write duplicate tags; the results of doing so are
undefined.

To look up (read) a TLB entry:

  1. Move a doubleword to TR6 that contains the appropriate linear address
     and attributes. Be sure C=1 for "lookup" command.

  2. Store TR7. If the HT bit in TR7 indicates a hit, then the other
     values reveal the TLB contents. If HT indicates a miss, then the
     other values in TR7 are indeterminate.

For the purposes of testing, the V bit functions as another bit of
address. The V bit for a lookup request should usually be set, so that
uninitialized tags do not match. Lookups with V=0 are unpredictable if any
tags are uninitialized.


Chapter 11 Coprocessing and Multiprocessing
--------------------------------------------------------------------------

The 80386 has two levels of support for multiple parallel processing units:

 - A highly specialized interface for very closely coupled processors of
   a type known as coprocessors.

 - A more general interface for more loosely coupled processors of
   unspecified type.

11.1 Coprocessing

The components of the coprocessor interface include:

 - The ET bit of control register zero (CR0)
 - The EM and MP bits of CR0
 - The ESC instructions
 - The WAIT instruction
 - The TS bit of CR0
 - Exceptions

11.1.1 Coprocessor Identification

The 80386 is designed to operate with either an 80287 or 80387 math
coprocessor. The ET bit of CR0 indicates which type of coprocessor is
present. ET is set automatically by the 80386 after RESET according to the
level detected on the ERROR# input. If desired, ET may also be set or reset
by loading CR0 with a MOV instruction. If ET is set, the 80386 uses the
32-bit protocol of the 80387; if reset, the 80386 uses the 16-bit protocol
of the 80287.

11.1.2 ESC and WAIT Instructions

The 80386 interprets the pattern 11011B in the first five bits of an
instruction as an opcode intended for a coprocessor. Instructions thus
marked are called ESCAPE or ESC instructions.
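
As a small illustration of the opcode test described above, the following
C sketch checks whether the first opcode byte of an instruction carries
the pattern 11011B in its five high-order bits, that is, whether it lies
in the range D8H through DFH. The function name is hypothetical, and the
sketch assumes that any instruction prefixes have already been skipped.

```c
#include <stdint.h>
#include <stdbool.h>

/* True if the five high-order bits of the opcode byte are 11011B.
   Masking with F8H keeps those five bits; D8H is 11011000B. */
bool is_esc_opcode(uint8_t opcode)
{
    return (opcode & 0xF8u) == 0xD8u;
}
```

The WAIT opcode (9BH) does not match this pattern, so the function returns
false for it.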

The CPU performs the following functions upon encountering an ESC
instruction before sending the instruction to the coprocessor:

 - Tests the emulation mode (EM) flag to determine whether coprocessor
   functions are being emulated by software.

 - Tests the TS flag to determine whether there has been a context change
   since the last ESC instruction.

 - For some ESC instructions, tests the ERROR# pin to determine whether
   the coprocessor detected an error in the previous ESC instruction.

The WAIT instruction is not an ESC instruction, but WAIT causes the CPU to
perform some of the same tests that it performs upon encountering an ESC
instruction. The processor performs the following actions for a WAIT
instruction:

 - Waits until the coprocessor no longer asserts the BUSY# pin.

 - Tests the ERROR# pin (after BUSY# goes inactive). If ERROR# is active,
   the 80386 signals exception 16, which indicates that the coprocessor
   encountered an error in the previous ESC instruction.

 - WAIT can therefore be used to cause exception 16 if an error is pending
   from a previous ESC instruction.

Note that, if no coprocessor is present, the ERROR# and BUSY# pins should
be tied inactive to prevent WAIT from waiting forever or causing spurious
exceptions.

11.1.3 EM and MP Flags

The EM and MP flags of CR0 control how the processor reacts to coprocessor
instructions.

The EM bit indicates whether coprocessor functions are to be emulated. If
the processor finds EM set when executing an ESC instruction, it signals
exception 7, giving the exception handler an opportunity to emulate the ESC
instruction.

The MP (monitor coprocessor) bit indicates whether a coprocessor is
actually attached. The MP flag controls the function of the WAIT
instruction. If, when executing a WAIT instruction, the CPU finds MP set,
then it tests the TS flag; it does not otherwise test TS during a WAIT
instruction. If it finds TS set under these conditions, the CPU signals
exception 7.
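
The flag tests described in this section can be summarized as a small
decision function. The C sketch below is a host-side model, not processor
code: the type and function names are hypothetical, and it folds in the
rule from section 11.1.4 that an ESC instruction also causes exception 7
when TS is set.

```c
#include <stdbool.h>

/* The two instruction classes that consult the CR0 coprocessor flags. */
typedef enum { INSN_ESC, INSN_WAIT } coproc_insn;

/* True if the instruction would signal exception 7
   (coprocessor not available) for the given CR0 flag values. */
bool raises_exception_7(coproc_insn insn, bool em, bool mp, bool ts)
{
    if (insn == INSN_ESC)
        return em || ts;    /* ESC: emulation requested, or task switched */
    return mp && ts;        /* WAIT: TS is tested only when MP is set */
}
```

For example, a WAIT with TS set but MP clear does not raise exception 7,
whereas an ESC instruction with TS set does, regardless of MP.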

The EM and MP flags can be changed with the aid of a MOV instruction using
CR0 as the destination operand and read with the aid of a MOV instruction
with CR0 as the source operand. These forms of the MOV instruction can be
executed only at privilege level zero.

11.1.4 The Task-Switched Flag

The TS bit of CR0 helps to determine when the context of the coprocessor
does not match that of the task being executed by the 80386 CPU. The 80386
sets TS each time it performs a task switch (whether triggered by software
or by hardware interrupt). If, when interpreting one of the ESC
instructions, the CPU finds TS already set, it causes exception 7. The WAIT
instruction also causes exception 7 if both TS and MP are set. Operating
systems can use this exception to switch the context of the coprocessor to
correspond to the current task. Refer to the 80386 System Software Writer's
Guide for an example.

The CLTS instruction (legal only at privilege level zero) resets the TS
flag.

11.1.5 Coprocessor Exceptions

Three exceptions aid in interfacing to a coprocessor: interrupt 7
(coprocessor not available), interrupt 9 (coprocessor segment overrun), and
interrupt 16 (coprocessor error).

11.1.5.1 Interrupt 7 -- Coprocessor Not Available

This exception occurs in either of two conditions:

  1. The CPU encounters an ESC instruction and EM is set. In this case,
     the exception handler should emulate the instruction that caused the
     exception. TS may also be set.

  2. The CPU encounters either the WAIT instruction or an ESC instruction
     when both MP and TS are set. In this case, the exception handler
     should update the state of the coprocessor, if necessary.

11.1.5.2 Interrupt 9 -- Coprocessor Segment Overrun

This exception occurs in protected mode under the following conditions:

 - An operand of a coprocessor instruction wraps around an addressing
   limit (0FFFFH for small segments, 0FFFFFFFFH for big segments, zero for
   expand-down segments).
   An operand may wrap around an addressing limit when the segment limit
   is near an addressing limit and the operand is near the largest valid
   address in the segment. Because of the wrap-around, the beginning and
   ending addresses of such an operand will be near opposite ends of the
   segment.

 - Both the first byte and the last byte of the operand (considering
   wrap-around) are at addresses located in the segment and in present and
   accessible pages.

 - The operand spans inaccessible addresses. There are two ways that such
   an operand may also span inaccessible addresses:

   1. The segment limit is not equal to the addressing limit (e.g.,
      addressing limit is FFFFH and segment limit is FFFDH); therefore,
      the operand will span addresses that are not within the segment
      (e.g., an 8-byte operand that starts at valid offset FFFC will span
      addresses FFFC-FFFF and 0000-0003; however, addresses FFFE and FFFF
      are not valid, because they exceed the limit);

   2. The operand begins and ends in present and accessible pages but
      intermediate bytes of the operand fall either in a not-present page
      or in a page to which the current procedure does not have access
      rights.

The address of the failing numerics instruction and data operand may be
lost; an FSTENV does not return reliable addresses. As with the
80286/80287, the segment overrun exception should be handled by executing
an FNINIT instruction (i.e., an FINIT without a preceding WAIT). The return
address on the stack does not necessarily point to the failing instruction
nor to the following instruction. The failing numerics instruction is not
restartable.

Case 2 can be avoided by either aligning all segments on page boundaries or
by not starting them within 108 bytes of the start or end of a page. (The
maximum size of a coprocessor operand is 108 bytes.)
Case 1 can be avoided by making sure that the gap between the last valid
offset and the first valid offset of a segment is either no less than 108
bytes or is zero (i.e., the segment is of full size). If neither software
system design constraint is acceptable, the exception handler should
execute FNINIT and should probably terminate the task.

11.1.5.3 Interrupt 16 -- Coprocessor Error

The numerics coprocessors can detect six different exception conditions
during instruction execution. If the detected exception is not masked by a
bit in the control word, the coprocessor communicates the fact that an
error occurred to the CPU by a signal at the ERROR# pin. The CPU causes
interrupt 16 the next time it checks the ERROR# pin, which is only at the
beginning of a subsequent WAIT or certain ESC instructions. If the
exception is masked, the numerics coprocessor handles the exception
according to on-board logic; it does not assert the ERROR# pin in this
case.

11.2 General Multiprocessing

The components of the general multiprocessing interface include:

 - The LOCK# signal

 - The LOCK instruction prefix, which gives programmed control of the
   LOCK# signal.

 - Automatic assertion of the LOCK# signal with implicit memory updates by
   the processor

11.2.1 LOCK and the LOCK# Signal

The LOCK instruction prefix and its corresponding output signal LOCK# can
be used to prevent other bus masters from interrupting a data movement
operation. LOCK may only be used with the following 80386 instructions when
they modify memory. An undefined-opcode exception results from using LOCK
before any instruction other than:

 - Bit test and change: BTS, BTR, BTC.
 - Exchange: XCHG.
 - Two-operand arithmetic and logical: ADD, ADC, SUB, SBB, AND, OR, XOR.
 - One-operand arithmetic and logical: INC, DEC, NOT, and NEG.

A locked instruction is only guaranteed to lock the area of memory defined
by the destination operand, but it may lock a larger memory area.
For example, typical 8086 and 80286 configurations lock the entire physical
memory space. The area of memory defined by the destination operand is
guaranteed to be locked against access by a processor executing a locked
instruction on exactly the same memory area, i.e., an operand with
identical starting address and identical length.

The integrity of the lock is not affected by the alignment of the memory
field. The LOCK signal is asserted for as many bus cycles as necessary to
update the entire operand.

11.2.2 Automatic Locking

In several instances, the processor itself initiates activity on the data
bus. To help ensure that such activities function correctly in
multiprocessor configurations, the processor automatically asserts the
LOCK# signal. These instances include:

 - Acknowledging interrupts.

   After an interrupt request, the interrupt controller uses the data bus
   to send the interrupt ID of the interrupt source to the CPU. The CPU
   asserts LOCK# to ensure that no other data appears on the data bus
   during this time.

 - Setting busy bit of TSS descriptor.

   The processor tests and sets the busy-bit in the type field of the TSS
   descriptor when switching to a task. To ensure that two different
   processors cannot simultaneously switch to the same task, the processor
   asserts LOCK# while testing and setting this bit.

 - Loading of descriptors.

   While copying the contents of a descriptor from a descriptor table into
   a segment register, the processor asserts LOCK# so that the descriptor
   cannot be modified by another processor while it is being loaded. For
   this action to be effective, operating-system procedures that update
   descriptors should adhere to the following steps:

   -- Use a locked update to the access-rights byte to mark the
      descriptor not-present.

   -- Update the fields of the descriptor. (This may require several
      memory accesses; therefore, LOCK cannot be used.)

   -- Use a locked update to the access-rights byte to mark the
      descriptor present again.
- Updating page-table A and D bits. The processor asserts LOCK# while updating the A (accessed) and D (dirty) bits of page-table entries. Also the processor bypasses the page-table cache and directly updates these bits in memory.
- Executing XCHG instruction. The 80386 always asserts LOCK# during an XCHG instruction that references memory (even if the LOCK prefix is not used).

11.2.3 Cache Considerations

Systems programmers must take care when updating shared data that may also be stored in on-chip registers and caches. With the 80386, such shared data includes:

- Descriptors, which may be held in segment registers. A change to a descriptor that is shared among processors should be broadcast to all processors. Segment registers are effectively "descriptor caches". A change to a descriptor will not be utilized by another processor if that processor already has a copy of the old version of the descriptor in a segment register.
- Page tables, which may be held in the page-table cache. A change to a page table that is shared among processors should be broadcast to all processors, so that others can flush their page-table caches and reload them with up-to-date page tables from memory.

Systems designers can employ an interprocessor interrupt to handle the above cases. When one processor changes data that may be cached by other processors, it can send an interrupt signal to all other processors that may be affected by the change. If the interrupt is serviced by an interrupt task, the task switch automatically flushes the segment registers. The task switch also flushes the page-table cache if the PDBR (the contents of CR3) of the interrupt task is different from the PDBR of every other task.

In multiprocessor systems that need a cacheability signal from the CPU, it is recommended that physical address pin A31 be used to indicate cacheability. Such a system can then possess up to 2 Gbytes of physical memory.
The virtual address range available to the programmer is not affected by this convention.

Chapter 12 Debugging

The 80386 brings to Intel's line of microprocessors significant advances in debugging power. The single-step exception and breakpoint exception of previous processors are still available in the 80386, but the principal debugging support takes the form of debug registers. The debug registers support both instruction breakpoints and data breakpoints. Data breakpoints are an important innovation that can save hours of debugging time by pinpointing, for example, exactly when a data structure is being overwritten. The breakpoint registers also eliminate the complexities associated with writing a breakpoint instruction into a code segment (requires a data-segment alias for the code segment) or a code segment shared by multiple tasks (the breakpoint exception can occur in the context of any of the tasks). Breakpoints can even be set in code contained in ROM.

12.1 Debugging Features of the Architecture

The features of the 80386 architecture that support debugging include:

Reserved debug interrupt vector
    Permits processor to automatically invoke a debugger task or procedure when an event occurs that is of interest to the debugger.

Four debug address registers
    Permit programmers to specify up to four addresses that the CPU will automatically monitor.

Debug control register
    Allows programmers to selectively enable various debug conditions associated with the four debug addresses.

Debug status register
    Helps debugger identify condition that caused debug exception.

Trap bit of TSS (T-bit)
    Permits monitoring of task switches.

Resume flag (RF) of flags register
    Allows an instruction to be restarted after a debug exception without immediately causing another debug exception due to the same condition.
Single-step flag (TF)
    Allows complete monitoring of program flow by specifying whether the CPU should cause a debug exception with the execution of every instruction.

Breakpoint instruction
    Permits debugger intervention at any point in program execution and aids debugging of debugger programs.

Reserved interrupt vector for breakpoint exception
    Permits processor to automatically invoke a handler task or procedure upon encountering a breakpoint instruction.

These features make it possible to invoke a debugger that is either a separate task or a procedure in the context of the current task. The debugger can be invoked under any of the following kinds of conditions:

- Task switch to a specific task.
- Execution of the breakpoint instruction.
- Execution of every instruction.
- Execution of any instruction at a given address.
- Read or write of a byte, word, or doubleword at any specified address.
- Write to a byte, word, or doubleword at any specified address.
- Attempt to change a debug register.

12.2 Debug Registers

Six 80386 registers are used to control debug features. These registers are accessed by variants of the MOV instruction. A debug register may be either the source operand or destination operand. The debug registers are privileged resources; the MOV instructions that access them can only be executed at privilege level zero. An attempt to read or write the debug registers when executing at any other privilege level causes a general protection exception. Figure 12-1 shows the format of the debug registers.

Figure 12-1. Debug Registers

    Register  Bits    Contents
    DR7       31-30   LEN3
              29-28   R/W3
              27-26   LEN2
              25-24   R/W2
              23-22   LEN1
              21-20   R/W1
              19-18   LEN0
              17-16   R/W0
              15-10   0
              9       GE
              8       LE
              7-0     G3 L3 G2 L2 G1 L1 G0 L0
    DR6       31-16   0
              15      BT
              14      BS
              13      BD
              12-4    0
              3-0     B3 B2 B1 B0
    DR5       31-0    RESERVED
    DR4       31-0    RESERVED
    DR3       31-0    BREAKPOINT 3 LINEAR ADDRESS
    DR2       31-0    BREAKPOINT 2 LINEAR ADDRESS
    DR1       31-0    BREAKPOINT 1 LINEAR ADDRESS
    DR0       31-0    BREAKPOINT 0 LINEAR ADDRESS

NOTE: 0 MEANS INTEL RESERVED. DO NOT DEFINE.

12.2.1 Debug Address Registers (DR0-DR3)

Each of these registers contains the linear address associated with one of four breakpoint conditions. Each breakpoint condition is further defined by bits in DR7.

The debug address registers are effective whether or not paging is enabled. The addresses in these registers are linear addresses. If paging is enabled, the linear addresses are translated into physical addresses by the processor's paging mechanism (as explained in Chapter 5). If paging is not enabled, these linear addresses are the same as physical addresses.

Note that when paging is enabled, different tasks may have different linear-to-physical address mappings.
When this is the case, an address in a debug address register may be relevant to one task but not to another. For this reason the 80386 has both global and local enable bits in DR7. These bits indicate whether a given debug address has a global (all tasks) or local (current task only) relevance.

12.2.2 Debug Control Register (DR7)

The debug control register shown in Figure 12-1 both helps to define the debug conditions and selectively enables and disables those conditions.

For each address in registers DR0-DR3, the corresponding fields R/W0 through R/W3 specify the type of action that should cause a breakpoint. The processor interprets these bits as follows:

    00 - Break on instruction execution only
    01 - Break on data writes only
    10 - undefined
    11 - Break on data reads or writes but not instruction fetches

Fields LEN0 through LEN3 specify the length of data item to be monitored. A length of 1, 2, or 4 bytes may be specified. The values of the length fields are interpreted as follows:

    00 - one-byte length
    01 - two-byte length
    10 - undefined
    11 - four-byte length

If R/Wn is 00 (instruction execution), then LENn should also be 00. Any other length is undefined.

The low-order eight bits of DR7 (L0 through L3 and G0 through G3) selectively enable the four address breakpoint conditions. There are two levels of enabling: the local (L0 through L3) and global (G0 through G3) levels. The local enable bits are automatically reset by the processor at every task switch to avoid unwanted breakpoint conditions in the new task. The global enable bits are not reset by a task switch; therefore, they can be used for conditions that are global to all tasks.

The LE and GE bits control the "exact data breakpoint match" feature of the processor. If either LE or GE is set, the processor slows execution so that data breakpoints are reported on the instruction that causes them. It is recommended that one of these bits be set whenever data breakpoints are armed.
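The DR7 field layout described above can be sketched in a few lines of Python. This is only an illustration of the bit arithmetic, not code that touches the real register (which requires a privileged MOV); the function and constant names are hypothetical, and the bit positions are taken from Figure 12-1.

```python
# Sketch of the DR7 layout from Figure 12-1; all names here are illustrative.
RW_EXECUTE = 0b00      # break on instruction execution only
RW_WRITE = 0b01        # break on data writes only
RW_READ_WRITE = 0b11   # break on data reads or writes, not instruction fetches

LEN_CODES = {1: 0b00, 2: 0b01, 4: 0b11}  # byte length -> LENn encoding

LE_BIT = 1 << 8  # local exact-match enable
GE_BIT = 1 << 9  # global exact-match enable


def dr7_breakpoint_bits(n, rw, length, local=False, global_=False):
    """Return the DR7 bits that describe and enable breakpoint n (0-3)."""
    if n not in range(4):
        raise ValueError("breakpoint number must be 0-3")
    if rw not in (RW_EXECUTE, RW_WRITE, RW_READ_WRITE):
        raise ValueError("R/W encoding 10 is undefined")
    if length not in LEN_CODES:
        raise ValueError("length must be 1, 2, or 4 bytes")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("instruction breakpoints require LEN = 00")
    bits = 0
    if local:
        bits |= 1 << (2 * n)                     # Ln
    if global_:
        bits |= 1 << (2 * n + 1)                 # Gn
    bits |= rw << (16 + 4 * n)                   # R/Wn
    bits |= LEN_CODES[length] << (18 + 4 * n)    # LENn
    return bits


# Breakpoint 0: global, four-byte, data writes only, with exact matching (GE).
dr7 = dr7_breakpoint_bits(0, RW_WRITE, 4, global_=True) | GE_BIT
print(hex(dr7))  # 0xd0202
```

Several breakpoints can be combined by OR-ing the values returned for each n, since the fields for different breakpoints do not overlap.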
The processor clears LE at a task switch but does not clear GE.

12.2.3 Debug Status Register (DR6)

The debug status register shown in Figure 12-1 permits the debugger to determine which debug conditions have occurred.

When the processor detects an enabled debug exception, it sets the low-order bits of this register (B0 through B3) before entering the debug exception handler. Bn is set if the condition described by DRn, LENn, and R/Wn occurs. (Note that the processor sets Bn regardless of whether Gn or Ln is set. If more than one breakpoint condition occurs at one time and if the breakpoint trap occurs due to an enabled condition other than n, Bn may be set, even though neither Gn nor Ln is set.)

The BT bit is associated with the T-bit (debug trap bit) of the TSS (refer to Chapter 7 for the location of the T-bit). The processor sets the BT bit before entering the debug handler if a task switch has occurred and the T-bit of the new TSS is set. There is no corresponding bit in DR7 that enables and disables this trap; the T-bit of the TSS is the sole enabling bit.

The BS bit is associated with the TF (trap flag) bit of the EFLAGS register. The BS bit is set if the debug handler is entered due to the occurrence of a single-step exception. The single-step trap is the highest-priority debug exception; therefore, when BS is set, any of the other debug status bits may also be set.

The BD bit is set if the next instruction will read or write one of the eight debug registers and ICE-386 is also using the debug registers at the same time.

Note that the bits of DR6 are never cleared by the processor. To avoid any confusion in identifying the next debug exception, the debug handler should move zeros to DR6 immediately before returning.

12.2.4 Breakpoint Field Recognition

The linear address and LEN field for each of the four breakpoint conditions define a range of sequential byte addresses for a data breakpoint. The LEN field permits specification of a one-, two-, or four-byte field.
Two-byte fields must be aligned on word boundaries (addresses that are multiples of two) and four-byte fields must be aligned on doubleword boundaries (addresses that are multiples of four). These requirements are enforced by the processor; it uses the LEN bits to mask the low-order bits of the addresses in the debug address registers. Improperly aligned code or data breakpoint addresses will not yield the expected results.

A data read or write breakpoint is triggered if any of the bytes participating in a memory access is within the field defined by a breakpoint address register and the corresponding LEN field. Table 12-1 gives some examples of breakpoint fields with memory references that both do and do not cause traps.

To set a data breakpoint for a misaligned field longer than one byte, it may be desirable to put two sets of entries in the breakpoint register such that each entry is properly aligned and the two entries together span the length of the field.

Instruction breakpoint addresses must have a length specification of one byte (LEN = 00); other values are undefined. The processor recognizes an instruction breakpoint address only when it points to the first byte of an instruction. If the instruction has any prefixes, the breakpoint address must point to the first prefix.

Table 12-1. Breakpoint Field Recognition Examples

                                          Address (hex)   Length
    Register Contents              DR0    0A0001          1 (LEN0 = 00)
                                   DR1    0A0002          1 (LEN1 = 00)
                                   DR2    0B0002          2 (LEN2 = 01)
                                   DR3    0C0000          4 (LEN3 = 11)

    Some Examples of Memory               0A0001          1
    References That Cause Traps           0A0002          1
                                          0A0001          2
                                          0A0002          2
                                          0B0002          2
                                          0B0001          4
                                          0C0000          4
                                          0C0001          2
                                          0C0003          1

    Some Examples of Memory               0A0000          1
    References That Don't Cause Traps     0A0003          4
                                          0B0000          2
                                          0C0004          4

12.3 Debug Exceptions

Two of the interrupt vectors of the 80386 are reserved for exceptions that relate to debugging.
Interrupt 1 is the primary means of invoking debuggers designed expressly for the 80386; interrupt 3 is intended for debugging debuggers and for compatibility with prior processors in Intel's 8086 processor family.

12.3.1 Interrupt 1 - Debug Exceptions

The handler for this exception is usually a debugger or part of a debugging system. The processor causes interrupt 1 for any of several conditions. The debugger can check flags in DR6 and DR7 to determine what condition caused the exception and what other conditions might be in effect at the same time. Table 12-2 associates with each breakpoint condition the combination of bits that indicate when that condition has caused the debug exception.

Instruction address breakpoint conditions are faults, while other debug conditions are traps. The debug exception may report either or both at one time. The following paragraphs present details for each class of debug exception.

Table 12-2. Debug Exception Conditions

    Flags to Test                Condition
    BS=1                         Single-step trap
    B0=1 AND (G0=1 OR L0=1)      Breakpoint DR0, LEN0, R/W0
    B1=1 AND (G1=1 OR L1=1)      Breakpoint DR1, LEN1, R/W1
    B2=1 AND (G2=1 OR L2=1)      Breakpoint DR2, LEN2, R/W2
    B3=1 AND (G3=1 OR L3=1)      Breakpoint DR3, LEN3, R/W3
    BD=1                         Debug registers not available; in use by ICE-386.
    BT=1                         Task switch

12.3.1.1 Instruction Address Breakpoint

The processor reports an instruction-address breakpoint before it executes the instruction that begins at the given address; i.e., an instruction-address breakpoint exception is a fault.

The RF (resume flag) permits the debug handler to retry instructions that cause other kinds of faults in addition to debug faults. When it detects a fault, the processor automatically sets RF in the flags image that it pushes onto the stack. (It does not, however, set RF for traps and aborts.)

When RF is set, it causes any debug fault to be ignored during the next instruction.
(Note, however, that RF does not cause breakpoint traps to be ignored, nor other kinds of faults.)

The processor automatically clears RF at the successful completion of every instruction except after the IRET instruction, after the POPF instruction, and after a JMP, CALL, or INT instruction that causes a task switch. These instructions set RF to the value specified by the memory image of the EFLAGS register.

The processor automatically sets RF in the EFLAGS image on the stack before entry into any fault handler. Upon entry into the fault handler for instruction address breakpoints, for example, RF is set in the EFLAGS image on the stack; therefore, the IRET instruction at the end of the handler will set RF in the EFLAGS register, and execution will resume at the breakpoint address without generating another breakpoint fault at the same address.

If, after a debug fault, RF is set and the debug handler retries the faulting instruction, it is possible that retrying the instruction will raise other faults. The retry of the instruction after these faults will also be done with RF=1, with the result that debug faults continue to be ignored. The processor clears RF only after successful completion of the instruction.

Real-mode debuggers can control the RF flag by using a 32-bit IRET. A 16-bit IRET instruction does not affect the RF bit (which is in the high-order 16 bits of EFLAGS). To use a 32-bit IRET, the debugger must rearrange the stack so that it holds appropriate values for the 32-bit EIP, CS, and EFLAGS (with RF set in the EFLAGS image). Then executing an IRET with an operand size prefix causes a 32-bit return, popping the RF flag into EFLAGS.

12.3.1.2 Data Address Breakpoint

A data-address breakpoint exception is a trap; i.e., the processor reports a data-address breakpoint after executing the instruction that accesses the given memory item.

When using data breakpoints it is recommended that either the LE or GE bit of DR7 be set also.
If either LE or GE is set, any data breakpoint trap is reported exactly after completion of the instruction that accessed the specified memory item. This exact reporting is accomplished by forcing the 80386 execution unit to wait for completion of data operand transfers before beginning execution of the next instruction. If neither GE nor LE is set, data breakpoints may not be reported until one instruction after the data is accessed or may not be reported at all. This is due to the fact that, normally, instruction execution is overlapped with memory transfers to such a degree that execution of the next instruction may begin before memory transfers for the prior instruction are completed.

If a debugger needs to preserve the contents of a write breakpoint location, it should save the original contents before setting a write breakpoint. Because data breakpoints are traps, a write into a breakpoint location will complete before the trap condition is reported. The handler can report the saved value after the breakpoint is triggered. The data in the debug registers can be used to address the new value stored by the instruction that triggered the breakpoint.

12.3.1.3 General Detect Fault

This exception occurs when an attempt is made to use the debug registers at the same time that ICE-386 is using them. This additional protection feature is provided to guarantee that ICE-386 can have full control over the debug-register resources when required. ICE-386 uses the debug registers; therefore, a software debugger that also uses these registers cannot run while ICE-386 is in use. The exception handler can detect this condition by examining the BD bit of DR6.

12.3.1.4 Single-Step Trap

This debug condition occurs at the end of an instruction if the trap flag (TF) of the flags register held the value one at the beginning of that instruction. Note that the exception does not occur at the end of an instruction that sets TF.
For example, if POPF is used to set TF, a single-step trap does not occur until after the instruction that follows POPF.

The processor clears the TF bit before invoking the handler. If TF=1 in the flags image of a TSS at the time of a task switch, the exception occurs after the first instruction is executed in the new task.

The single-step flag is normally not cleared by privilege changes inside a task. INT instructions, however, do clear TF. Therefore, software debuggers that single-step code must recognize and emulate INT n or INTO rather than executing them directly.

To maintain protection, system software should check the current execution privilege level after any single step interrupt to see whether single stepping should continue at the current privilege level.

The interrupt priorities in hardware guarantee that if an external interrupt occurs, single stepping stops. When both an external interrupt and a single step interrupt occur together, the single step interrupt is processed first. This clears the TF bit. After saving the return address or switching tasks, the external interrupt input is examined before the first instruction of the single step handler executes. If the external interrupt is still pending, it is then serviced. The external interrupt handler is not single-stepped. To single step an interrupt handler, just single step an INT n instruction that refers to the interrupt handler.

12.3.1.5 Task Switch Breakpoint

The debug exception also occurs after a switch to an 80386 task if the T-bit of the new TSS is set. The exception occurs after control has passed to the new task, but before the first instruction of that task is executed. The exception handler can detect this condition by examining the BT bit of the debug status register DR6.

Note that if the debug exception handler is a task, the T-bit of its TSS should not be set. Failure to observe this rule will cause the processor to enter an infinite loop.
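The interrupt 1 conditions described in this section can be told apart by testing the status bits of DR6 against the per-breakpoint enable bits (Ln and Gn) of DR7. The following Python sketch shows that decoding on plain integers; it assumes the handler has already copied DR6 and DR7 into variables, and the function name and condition strings are hypothetical. Bit positions follow Figure 12-1.

```python
# DR6 status bits as drawn in Figure 12-1.
BD = 1 << 13  # debug registers in use by ICE-386
BS = 1 << 14  # single-step trap
BT = 1 << 15  # task-switch trap


def debug_conditions(dr6, dr7):
    """List the conditions reported by one debug exception."""
    conditions = []
    if dr6 & BS:
        conditions.append("single-step trap")
    for n in range(4):
        hit = dr6 & (1 << n)                 # Bn: condition n was detected
        enabled = dr7 & (0b11 << (2 * n))    # Ln or Gn: condition n is enabled
        if hit and enabled:
            conditions.append(f"breakpoint {n}")
    if dr6 & BD:
        conditions.append("debug registers in use by ICE-386")
    if dr6 & BT:
        conditions.append("task switch")
    return conditions


# B0 and B2 are both set, but only breakpoint 0 is enabled (G0), so B2 is ignored.
print(debug_conditions(BS | 0b0101, 0b0010))
```

Because the processor never clears DR6, a real handler would also write zeros to DR6 before returning, as recommended in section 12.2.3.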
12.3.2 Interrupt 3 - Breakpoint Exception

This exception is caused by execution of the breakpoint instruction INT 3. Typically, a debugger prepares a breakpoint by substituting the opcode of the one-byte breakpoint instruction in place of the first opcode byte of the instruction to be trapped. When execution of the INT 3 instruction causes the exception handler to be invoked, the saved value of CS:EIP points to the byte following the INT 3 instruction.

With prior generations of processors, this feature is used extensively for trapping execution of specific instructions. With the 80386, the needs formerly filled by this feature are more conveniently solved via the debug registers and interrupt 1. However, the breakpoint exception is still useful for debugging debuggers, because the breakpoint exception can vector to a different exception handler than that used by the debugger. The breakpoint exception can also be useful when it is necessary to set a greater number of breakpoints than permitted by the debug registers.

PART III COMPATIBILITY

Chapter 13 Executing 80286 Protected-Mode Code

13.1 80286 Code Executes as a Subset of the 80386

In general, programs designed for execution in protected mode on an 80286 execute without modification on the 80386, because the features of the 80286 are a subset of those of the 80386.

All the descriptors used by the 80286 are supported by the 80386 as long as the Intel-reserved word (last word) of the 80286 descriptor is zero.

The descriptors for data segments, executable segments, local descriptor tables, and task gates are common to both the 80286 and the 80386. Other 80286 descriptors (TSS segment, call gate, interrupt gate, and trap gate) are supported by the 80386. The 80386 also has new versions of descriptors for TSS segment, call gate, interrupt gate, and trap gate that support the 32-bit nature of the 80386.
Both sets of descriptors can be used simultaneously in the same system.

For those descriptors that are common to both the 80286 and the 80386, the presence of zeros in the final word causes the 80386 to interpret these descriptors exactly as the 80286 does; for example:

Base Address
    The high-order eight bits of the 32-bit base address are zero, limiting base addresses to 24 bits.

Limit
    The high-order four bits of the limit field are zero, restricting the value of the limit field to 64K.

Granularity bit
    The granularity bit is zero, which implies that the value of the 16-bit limit is interpreted in units of one byte.

B-bit
    In a data-segment descriptor, the B-bit is zero, implying that the segment is no larger than 64 Kbytes.

D-bit
    In an executable-segment descriptor, the D-bit is zero, implying that 16-bit addressing and operands are the default.

For formats of these descriptors and documentation of their use refer to the iAPX 286 Programmer's Reference Manual.

13.2 Two Ways to Execute 80286 Tasks

When porting 80286 programs to the 80386, there are two cases to consider:

1. Porting an entire 80286 system to the 80386, complete with 80286 operating system, loader, and system builder. In this case, all tasks will have 80286 TSSs. The 80386 is being used as a faster 286.

2. Porting selected 80286 applications to run in an 80386 environment with an 80386 operating system, loader, and system builder. In this case, the TSSs used to represent 80286 tasks should be changed to 80386 TSSs. It is theoretically possible to mix 80286 and 80386 TSSs, but the benefits are slight and the problems are great. It is recommended that all tasks in an 80386 software system have 80386 TSSs. It is not necessary to change the 80286 object modules themselves; TSSs are usually constructed by the operating system, by the loader, or by the system builder. Refer to Chapter 16 for further discussion of the interface between 16-bit and 32-bit code.
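As a sketch of the descriptor interpretation described in section 13.1, the following Python fragment splits an 8-byte segment descriptor into the 80386 fields and shows what a zero final word implies. It models only the byte layout of data-segment and executable-segment descriptors (not gates); the function name and the sample values (base 123456H, access rights 92H) are hypothetical choices made for illustration.

```python
import struct


def decode_segment_descriptor(raw):
    """Split an 8-byte segment descriptor into its 80386 fields."""
    # Layout: limit 15..0, base 15..0, base 23..16, access rights,
    # then the final word: G, D/B, AVL, limit 19..16 and base 31..24.
    limit_lo, base_lo, base_mid, access, flags_limit, base_hi = struct.unpack(
        "<HHBBBB", raw
    )
    return {
        "base": base_lo | (base_mid << 16) | (base_hi << 24),
        "limit": limit_lo | ((flags_limit & 0x0F) << 16),
        "granularity": (flags_limit >> 7) & 1,   # G bit
        "default_big": (flags_limit >> 6) & 1,   # D bit (code) or B bit (data)
        "access_rights": access,
    }


# An 80286-style data-segment descriptor with the Intel-reserved last word zero.
raw = struct.pack("<HHBBH", 0xFFFF, 0x3456, 0x12, 0x92, 0x0000)
fields = decode_segment_descriptor(raw)
print(fields)
```

With the final word zero, the decoded base cannot exceed 24 bits, the limit cannot exceed FFFFH, and both the granularity bit and the B/D bit come out as zero, which matches the list in section 13.1.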
13.3 Differences From 80286

The few differences that do exist primarily affect operating system code.

13.3.1 Wraparound of 80286 24-Bit Physical Address Space

With the 80286, any base and offset combination that addresses beyond 16M bytes wraps around to the first megabyte of the 80286 address space. With the 80386, since it has a greater physical address space, any such address falls into the 17th megabyte. In the unlikely event that any software depends on this anomaly, the same effect can be simulated on the 80386 by using paging to map the first 64K bytes of the 17th megabyte of logical addresses to physical addresses in the first megabyte.

13.3.2 Reserved Word of Descriptor

Because the 80386 uses the contents of the reserved word (last word) of every descriptor, 80286 programs that place values in this word may not execute correctly on the 80386.

13.3.3 New Descriptor Type Codes

Operating-system code that manages space in descriptor tables often uses an invalid value in the access-rights field of descriptor-table entries to identify unused entries. Access rights values of 80H and 00H remain invalid for both the 80286 and 80386. Other values that were invalid for the 80286 may be valid for the 80386 because of the additional descriptor types defined by the 80386.

13.3.4 Restricted Semantics of LOCK

The 80286 processor implements the bus lock function differently from the 80386. Programs that use forms of memory locking specific to the 80286 may not execute properly when transported to a specific application of the 80386.

The LOCK prefix and its corresponding output signal should only be used to prevent other bus masters from interrupting a data movement operation. LOCK may only be used with the following 80386 instructions when they modify memory. An undefined-opcode exception results from using LOCK before any other instruction.

- Bit test and change: BTS, BTR, BTC.
- Exchange: XCHG.
- One-operand arithmetic and logical: INC, DEC, NOT, and NEG.
- Two-operand arithmetic and logical: ADD, ADC, SUB, SBB, AND, OR, XOR.

A locked instruction is guaranteed to lock only the area of memory defined by the destination operand, but may lock a larger memory area. For example, typical 8086 and 80286 configurations lock the entire physical memory space. With the 80386, the defined area of memory is guaranteed to be locked against access by a processor executing a locked instruction on exactly the same memory area, i.e., an operand with identical starting address and identical length.

13.3.5 Additional Exceptions

The 80386 defines new exceptions that can occur even in systems designed for the 80286.

- Exception #6 - invalid opcode
  This exception can result from improper use of the LOCK prefix.

- Exception #14 - page fault
  This exception may occur in an 80286 program if the operating system enables paging. Paging can be used in a system with 80286 tasks as long as all tasks use the same page directory. Because there is no place in an 80286 TSS to store the PDBR, switching to an 80286 task does not change the value of PDBR. Tasks ported from the 80286 should be given 80386 TSSs so they can take full advantage of paging.

Chapter 14 80386 Real-Address Mode

The real-address mode of the 80386 executes object code designed for execution on 8086, 8088, 80186, or 80188 processors, or for execution in the real-address mode of an 80286.

In effect, the architecture of the 80386 in this mode is almost identical to that of the 8086, 8088, 80186, and 80188. To a programmer, an 80386 in real-address mode appears as a high-speed 8086 with extensions to the instruction set and registers. The principal features of this architecture are defined in Chapters 2 and 3.

This chapter discusses certain additional topics that complete the system programmer's view of the 80386 in real-address mode:

- Address formation.
- Extensions to registers and instructions.
- Interrupt and exception handling.
- Entering and leaving real-address mode.
- Real-address-mode exceptions.
- Differences from 8086.
- Differences from 80286 real-address mode.

14.1 Physical Address Formation

The 80386 provides a one Mbyte + 64 Kbyte memory space for an 8086 program. Segment relocation is performed as in the 8086: the 16-bit value in a segment selector is shifted left by four bits to form the base address of a segment. The effective address is extended with four high order zeros and added to the base to form a linear address as Figure 14-1 illustrates. (The linear address is equivalent to the physical address, because paging is not used in real-address mode.) Unlike the 8086, the resulting linear address may have up to 21 significant bits. There is a possibility of a carry when the base address is added to the effective address. On the 8086, the carried bit is truncated, whereas on the 80386 the carried bit is stored in bit position 20 of the linear address.

Unlike the 8086 and 80286, 32-bit effective addresses can be generated (via the address-size prefix); however, the value of a 32-bit address may not exceed 65535 without causing an exception. For full compatibility with 80286 real-address mode, pseudo-protection faults (interrupt 12 or 13 with no error code) occur if an effective address is generated outside the range 0 through 65535.

Figure 14-1. Real-Address Mode Address Formation

               19                                 3     0
    BASE      |     16-BIT SEGMENT SELECTOR     | 0 0 0 0 |
       +
               19         15                            0
    OFFSET    | 0 0 0 0 |    16-BIT EFFECTIVE ADDRESS     |
       =
               20                                       0
    LINEAR    | X X X X X X X X X X X X X X X X X X X X X |
    ADDRESS

14.2 Registers and Instructions

The register set available in real-address mode includes all the registers defined for the 8086 plus the new registers introduced by the 80386: FS, GS, debug registers, control registers, and test registers. New instructions that explicitly operate on the segment registers FS and GS are available, and the new segment-override prefixes can be used to cause instructions to utilize FS and GS for address calculations. Instructions can utilize 32-bit operands through the use of the operand size prefix.

The instruction codes that cause undefined opcode traps (interrupt 6) include instructions of the protected mode that manipulate or interrogate 80386 selectors and descriptors; namely, VERR, VERW, LAR, LSL, LTR, STR, LLDT, and SLDT.

Programs executing in real-address mode are able to take advantage of the new applications-oriented instructions added to the architecture by the introduction of the 80186/80188, 80286 and 80386:

- New instructions introduced by 80186/80188 and 80286.
    - PUSH immediate data
    - Push all and pop all (PUSHA and POPA)
    - Multiply immediate data
    - Shift and rotate by immediate count
    - String I/O
    - ENTER and LEAVE
    - BOUND
- New instructions introduced by 80386.
    - LSS, LFS, LGS instructions
    - Long-displacement conditional jumps
    - Single-bit instructions
    - Bit scan
    - Double-shift instructions
    - Byte set on condition
    - Move with sign/zero extension
    - Generalized multiply
    - MOV to and from control registers
    - MOV to and from test registers
    - MOV to and from debug registers

14.3 Interrupt and Exception Handling

Interrupts and exceptions in 80386 real-address mode work much as they do on an 8086. Interrupts and exceptions vector to interrupt procedures via an interrupt table. The processor multiplies the interrupt or exception identifier by four to obtain an index into the interrupt table. The entries of the interrupt table are far pointers to the entry points of interrupt or exception handler procedures. When an interrupt occurs, the processor pushes the current values of the flags register and CS:IP onto the stack, disables interrupts, clears TF (the single-step flag), then transfers control to the location specified in the interrupt table. An IRET instruction at the end of the handler procedure reverses these steps before returning control to the interrupted procedure.

The primary difference in the interrupt handling of the 80386 compared to the 8086 is that the location and size of the interrupt table depend on the contents of the IDTR (IDT register). Ordinarily, this fact is not apparent to programmers, because, after RESET, the IDTR contains a base address of 0 and a limit of 3FFH, which is compatible with the 8086. However, the LIDT instruction can be used in real-address mode to change the base and limit values in the IDTR. Refer to Chapter 9 for details on the IDTR, and the LIDT and SIDT instructions. If an interrupt occurs and the corresponding entry of the interrupt table is beyond the limit stored in the IDTR, the processor raises exception 8.

14.4 Entering and Leaving Real-Address Mode

Real-address mode is in effect after a signal on the RESET pin.
Even if the system is going to be used in protected mode, the start-up program will execute in real-address mode temporarily while initializing for protected mode.

14.4.1 Switching to Protected Mode

The only way to leave real-address mode is to switch to protected mode. The processor enters protected mode when a MOV to CR0 instruction sets the PE (protection enable) bit in CR0. (For compatibility with the 80286, the LMSW instruction may also be used to set the PE bit.)

Refer to Chapter 10 "Initialization" for other aspects of switching to protected mode.

14.5 Switching Back to Real-Address Mode

The processor reenters real-address mode if software clears the PE bit in CR0 with a MOV to CR0 instruction. A procedure that attempts to do this, however, should proceed as follows:

1. If paging is enabled, perform the following sequence:
   - Transfer control to linear addresses that have an identity mapping; i.e., linear addresses equal physical addresses.
   - Clear the PG bit in CR0.
   - Move zeros to CR3 to clear out the paging cache.
2. Transfer control to a segment that has a limit of 64K (FFFFH). This loads the CS register with the limit it needs to have in real mode.
3. Load segment registers SS, DS, ES, FS, and GS with a selector that points to a descriptor containing the following values, which are appropriate to real mode:
   - Limit = 64K (FFFFH)
   - Byte granular (G = 0)
   - Expand up (E = 0)
   - Writable (W = 1)
   - Present (P = 1)
   - Base = any value
4. Disable interrupts. A CLI instruction disables INTR interrupts. NMIs can be disabled with external circuitry.
5. Clear the PE bit.
6. Jump to the real mode code to be executed using a far JMP. This action flushes the instruction queue and puts appropriate values in the access rights of the CS register.
7. Use the LIDT instruction to load the base and limit of the real-mode interrupt vector table.
8. Enable interrupts.
9. Load the segment registers as needed by the real-mode code.
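The descriptor values listed in step 3 can be made concrete with a small model. The following is a minimal sketch in Python, not taken from the manual: it packs those values into the standard 8-byte 80386 segment-descriptor layout. The helper name `real_mode_data_descriptor` is hypothetical, and the choices DPL = 0 and B = 0 are assumptions, since the procedure above does not specify them.

```python
import struct

def real_mode_data_descriptor(base: int = 0) -> bytes:
    """Pack an 8-byte data-segment descriptor with real-mode-compatible values.

    Hypothetical helper for illustration only. DPL = 0 and B = 0 are
    assumptions; step 3 only fixes limit, G, E, W and P.
    """
    limit = 0xFFFF                      # Limit = 64K (FFFFH)
    # Access byte: P=1, DPL=00 (assumed), S=1 (code/data segment),
    # type bits: data segment, E=0 (expand up), W=1 (writable), A=0
    access = 0b1001_0010                # 92H
    # High nibble: G=0 (byte granular), B=0 (assumed), reserved, AVL=0;
    # low nibble: limit bits 19..16
    flags_and_limit_hi = (limit >> 16) & 0x0F
    return struct.pack(
        "<HHBBBB",
        limit & 0xFFFF,                 # limit bits 15..0
        base & 0xFFFF,                  # base bits 15..0
        (base >> 16) & 0xFF,            # base bits 23..16
        access,
        flags_and_limit_hi,
        (base >> 24) & 0xFF,            # base bits 31..24
    )

if __name__ == "__main__":
    print(real_mode_data_descriptor().hex())
```

The base is left as a parameter because step 3 allows any value; a start-up program would place such a descriptor in its GDT and load its selector into SS, DS, ES, FS, and GS before clearing PE.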
14.6 Real-Address Mode Exceptions

The 80386 reports some exceptions differently when executing in real-address mode than when executing in protected mode. Table 14-1 details the real-address-mode exceptions.

14.7 Differences From 8086

In general, the 80386 in real-address mode will correctly execute ROM-based software designed for the 8086, 8088, 80186, and 80188. Following is a list of the minor differences between 8086 execution on the 80386 and on an 8086.

1. Instruction clock counts. The 80386 takes fewer clocks for most instructions than the 8086/8088. The areas most likely to be affected are:
   - Delays required by I/O devices between I/O operations.
   - Assumed delays with 8086/8088 operating in parallel with an 8087.
2. Divide exceptions point to the DIV instruction. Divide exceptions on the 80386 always leave the saved CS:IP value pointing to the instruction that failed. On the 8086/8088, the CS:IP value points to the next instruction.
3. Undefined 8086/8088 opcodes. Opcodes that were not defined for the 8086/8088 will cause exception 6 or will execute one of the new instructions defined for the 80386.
4. Value written by PUSH SP. The 80386 pushes a different value on the stack for PUSH SP than the 8086/8088. The 80386 pushes the value of SP before SP is decremented as part of the push operation; the 8086/8088 pushes the value of SP after it is decremented. If the value pushed is important, replace PUSH SP instructions with the following three instructions:

      PUSH BP
      MOV  BP, SP
      XCHG BP, [BP]

   This code functions as the 8086/8088 PUSH SP instruction on the 80386.
5. Shift or rotate by more than 31 bits. The 80386 masks all shift and rotate counts to the low-order five bits. This MOD 32 operation limits the count to a maximum of 31 bits, thereby limiting the time that interrupt response is delayed while the instruction is executing.
6. Redundant prefixes. The 80386 sets a limit of 15 bytes on instruction length.
   The only way to violate this limit is by putting redundant prefixes before an instruction. Exception 13 occurs if the limit on instruction length is violated. The 8086/8088 has no instruction length limit.
7. Operand crossing offset 0 or 65,535. On the 8086, an attempt to access a memory operand that crosses offset 65,535 (e.g., MOV a word to offset 65,535) or offset 0 (e.g., PUSH a word when SP = 1) causes the offset to wrap around modulo 65,536. The 80386 raises an exception in these cases: exception 13 if the segment is a data segment (i.e., if CS, DS, ES, FS, or GS is being used to address the segment), exception 12 if the segment is a stack segment (i.e., if SS is being used).
8. Sequential execution across offset 65,535. On the 8086, if sequential execution of instructions proceeds past offset 65,535, the processor fetches the next instruction byte from offset 0 of the same segment. On the 80386, the processor raises exception 13 in such a case.
9. LOCK is restricted to certain instructions. The LOCK prefix and its corresponding output signal should only be used to prevent other bus masters from interrupting a data movement operation. The 80386 always asserts the LOCK signal during an XCHG instruction with memory (even if the LOCK prefix is not used). LOCK may only be used with the following 80386 instructions when they update memory: BTS, BTR, BTC, XCHG, ADD, ADC, SUB, SBB, INC, DEC, AND, OR, XOR, NOT, and NEG. An undefined-opcode exception (interrupt 6) results from using LOCK before any other instruction.
10. Single-stepping external interrupt handlers. The priority of the 80386 single-step exception is different from that of the 8086/8088. The change prevents an external interrupt handler from being single-stepped if the interrupt occurs while a program is being single-stepped. The 80386 single-step exception has higher priority than any external interrupt.
The 80386 will still single-step through an interrupt handler invoked by the INT instructions or by an exception. 11. IDIV exceptions for quotients of 80H or 8000H. The 80386 can generate the largest negative number as a quotient for the IDIV instruction. The 8086/8088 causes exception zero instead. 12. Flags in stack. The setting of the flags stored by PUSHF, by interrupts, and by exceptions is different from that stored by the 8086 in bit positions 12 through 15. On the 8086 these bits are stored as ones, but in 80386 real-address mode bit 15 is always zero, and bits 14 through 12 reflect the last value loaded into them. 13. NMI interrupting NMI handlers. After an NMI is recognized on the 80386, the NMI interrupt is masked until an IRET instruction is executed. 14. Coprocessor errors vector to interrupt 16. Any 80386 system with a coprocessor must use interrupt vector 16 for the coprocessor error exception. If an 8086/8088 system uses another vector for the 8087 interrupt, both vectors should point to the coprocessor-error exception handler. 15. Numeric exception handlers should allow prefixes. On the 80386, the value of CS:IP saved for coprocessor exceptions points at any prefixes before an ESC instruction. On 8086/8088 systems, the saved CS:IP points to the ESC instruction. 16. Coprocessor does not use interrupt controller. The coprocessor error signal to the 80386 does not pass through an interrupt controller (an 8087 INT signal does). Some instructions in a coprocessor error handler may need to be deleted if they deal with the interrupt controller. 17. Six new interrupt vectors. The 80386 adds six exceptions that arise only if the 8086 program has a hidden bug. It is recommended that exception handlers be added that treat these exceptions as invalid operations. This additional software does not significantly affect the existing 8086 software because the interrupts do not normally occur. 
    These interrupt identifiers should not already have been used by the 8086 software, because they are in the range reserved by Intel. Table 14-2 describes the new 80386 exceptions.
18. One megabyte wraparound. The 80386 does not wrap addresses at 1 megabyte in real-address mode. On members of the 8086 family, it is possible to specify addresses greater than one megabyte. For example, with a selector value of 0FFFFH and an offset of 0FFFFH, the effective address would be 10FFEFH (1 Mbyte + 65519). The 8086, which can form addresses only up to 20 bits long, truncates the high-order bit, thereby "wrapping" this address to 0FFEFH. However, the 80386, which can form addresses up to 32 bits long, does not truncate such an address.

Table 14-1. 80386 Real-Address Mode Exceptions

Description                      Interrupt  Function that Can              Return Address
                                 Number     Generate the Exception         Points to Faulting
                                                                           Instruction

Divide error                     0          DIV, IDIV                      YES
Debug exceptions                 1          All                            (see note 1)
Breakpoint                       3          INT                            NO
Overflow                         4          INTO                           NO
Bounds check                     5          BOUND                          YES
Invalid opcode                   6          Any undefined opcode or LOCK   YES
                                            used with wrong instruction
Coprocessor not available        7          ESC or WAIT                    YES
Interrupt table limit too small  8          INT vector is not within IDTR  YES
                                            limit
Reserved                         9-11
Stack fault                      12         Memory operand crosses offset  YES
                                            0 or 0FFFFH
Pseudo-protection exception      13         Memory operand crosses offset  YES
                                            0FFFFH or attempt to execute
                                            past offset 0FFFFH or
                                            instruction longer than 15
                                            bytes
Reserved                         14,15
Coprocessor error                16         ESC or WAIT                    YES (see note 2)
Two-byte SW interrupt            0-255      INT n                          NO

Note 1: Some debug exceptions point to the faulting instruction, others to the next instruction. The exception handler can determine which has occurred by examining DR6.
Note 2: Coprocessor errors are reported on the first ESC or WAIT instruction after the ESC instruction that caused the error.

Table 14-2. New 80386 Exceptions

Interrupt    Function
Identifier

    5        A BOUND instruction was executed with a register value outside the limit values.
    6        An undefined opcode was encountered or LOCK was used improperly before an instruction to which it does not apply.
    7        The EM bit in the MSW was set when an ESC instruction was encountered. This exception also occurs on a WAIT instruction if TS is set.
    8        An exception or interrupt has vectored to an interrupt table entry beyond the interrupt table limit in IDTR. This can occur only if the LIDT instruction has changed the limit from the default value of 3FFH, which is enough for all 256 interrupt IDs.
   12        Operand crosses extremes of stack segment, e.g., MOV operation at offset 0FFFFH or push with SP=1 during PUSH, CALL, or INT.
   13        Operand crosses extremes of a segment other than a stack segment; or sequential instruction execution attempts to proceed beyond offset 0FFFFH; or an instruction is longer than 15 bytes (including prefixes).

14.8 Differences From 80286 Real-Address Mode

The few differences that exist between 80386 real-address mode and 80286 real-address mode are not likely to affect any existing 80286 programs except possibly the system initialization procedures.

14.8.1 Bus Lock

The 80286 processor implements the bus lock function differently than the 80386. Programs that use forms of memory locking specific to the 80286 may not execute properly if transported to a specific application of the 80386.

The LOCK prefix and its corresponding output signal should only be used to prevent other bus masters from interrupting a data movement operation. LOCK may only be used with the following 80386 instructions when they modify memory. An undefined-opcode exception results from using LOCK before any other instruction.

- Bit test and change: BTS, BTR, BTC.
- Exchange: XCHG.
- One-operand arithmetic and logical: INC, DEC, NOT, and NEG.
- Two-operand arithmetic and logical: ADD, ADC, SUB, SBB, AND, OR, XOR.

A locked instruction is guaranteed to lock only the area of memory defined by the destination operand, but may lock a larger memory area.
For example, typical 8086 and 80286 configurations lock the entire physical memory space. With the 80386, the defined area of memory is guaranteed to be locked against access by a processor executing a locked instruction on exactly the same memory area, i.e., an operand with identical starting address and identical length.

14.8.2 Location of First Instruction

The starting location is 0FFFFFFF0H (sixteen bytes from end of 32-bit address space) on the 80386 rather than 0FFFFF0H (sixteen bytes from end of 24-bit address space) as on the 80286. Many 80286 ROM initialization programs will work correctly in this new environment. Others can be made to work correctly with external hardware that redefines the signals on A31-A20.

14.8.3 Initial Values of General Registers

On the 80386, certain general registers may contain different values after RESET than on the 80286. This should not cause compatibility problems, because the content of 8086 registers after RESET is undefined. If self-test is requested during the reset sequence and errors are detected in the 80386 unit, EAX will contain a nonzero value. EDX contains the component and revision identifier. Refer to Chapter 10 for more information.

14.8.4 MSW Initialization

The 80286 initializes the MSW register to FFF0H, but the 80386 initializes this register to 0000H. This difference should have no effect, because the bits that are different are undefined on the 80286. Programs that read the value of the MSW will behave differently on the 80386 only if they depend on the setting of the undefined, high-order bits.


Chapter 15  Virtual 8086 Mode
----------------------------------------------------------------------------

The 80386 supports execution of one or more 8086, 8088, 80186, or 80188 programs in an 80386 protected-mode environment. An 8086 program runs in this environment as part of a V86 (virtual 8086) task. V86 tasks take advantage of the hardware support of multitasking offered by the protected mode.
Not only can there be multiple V86 tasks, each one executing an 8086 program, but V86 tasks can be multiprogrammed with other 80386 tasks.

The purpose of a V86 task is to form a "virtual machine" with which to execute an 8086 program. A complete virtual machine consists not only of 80386 hardware but also of systems software. Thus, the emulation of an 8086 is the result of cooperation between hardware and software:

- The hardware provides a virtual set of registers (via the TSS), a virtual memory space (the first megabyte of the linear address space of the task), and directly executes all instructions that deal with these registers and with this address space.
- The software controls the external interfaces of the virtual machine (I/O, interrupts, and exceptions) in a manner consistent with the larger environment in which it executes. In the case of I/O, software can choose either to emulate I/O instructions or to let the hardware execute them directly without software intervention.

Software that helps implement virtual 8086 machines is called a V86 monitor.

15.1 Executing 8086 Code

The processor executes in V86 mode when the VM (virtual machine) bit in the EFLAGS register is set. The processor tests this flag under two general conditions:

1. When loading segment registers to know whether to use 8086-style address formation.
2. When decoding instructions to determine which instructions are sensitive to IOPL.

Except for these two modifications to its normal operations, the 80386 in V86 mode operates much as in protected mode.

15.1.1 Registers and Instructions

The register set available in V86 mode includes all the registers defined for the 8086 plus the new registers introduced by the 80386: FS, GS, debug registers, control registers, and test registers. New instructions that explicitly operate on the segment registers FS and GS are available, and the new segment-override prefixes can be used to cause instructions to utilize FS and GS for address calculations.
Instructions can utilize 32-bit operands through the use of the operand size prefix.

8086 programs running as V86 tasks are able to take advantage of the new applications-oriented instructions added to the architecture by the introduction of the 80186/80188, 80286 and 80386:

- New instructions introduced by 80186/80188 and 80286.
   -- PUSH immediate data
   -- Push all and pop all (PUSHA and POPA)
   -- Multiply immediate data
   -- Shift and rotate by immediate count
   -- String I/O
   -- ENTER and LEAVE
   -- BOUND
- New instructions introduced by 80386.
   -- LSS, LFS, LGS instructions
   -- Long-displacement conditional jumps
   -- Single-bit instructions
   -- Bit scan
   -- Double-shift instructions
   -- Byte set on condition
   -- Move with sign/zero extension
   -- Generalized multiply

15.1.2 Linear Address Formation

In V86 mode, the 80386 processor does not interpret 8086 selectors by referring to descriptors; instead, it forms linear addresses as an 8086 would. It shifts the selector left by four bits to form a 20-bit base address. The effective address is extended with four high-order zeros and added to the base address to create a linear address as Figure 15-1 illustrates.

Because of the possibility of a carry, the resulting linear address may contain up to 21 significant bits. An 8086 program may generate linear addresses anywhere in the range 0 to 10FFEFH (one megabyte plus approximately 64 Kbytes) of the task's linear address space.

V86 tasks generate 32-bit linear addresses. While an 8086 program can only utilize the low-order 21 bits of a linear address, the linear address can be mapped via page tables to any 32-bit physical address.

Unlike on the 8086 and 80286, 32-bit effective addresses can be generated (via the address-size prefix); however, the value of a 32-bit address may not exceed 65,535 without causing an exception.
For full compatibility with 80286 real-address mode, pseudo-protection faults (interrupt 12 or 13 with no error code) occur if an address is generated outside the range 0 through 65,535.

Figure 15-1. V86 Mode Address Formation

          19                                  3       0
         +------------------------------------+-------+
  BASE   |      16-BIT SEGMENT SELECTOR       |0 0 0 0|
         +------------------------------------+-------+
    +
          19      15                                  0
         +-------+------------------------------------+
  OFFSET |0 0 0 0|      16-BIT EFFECTIVE ADDRESS      |
         +-------+------------------------------------+
    =
         20                                           0
        +---------------------------------------------+
 LINEAR |  X X X X X X X X X X X X X X X X X X X X X  |
ADDRESS +---------------------------------------------+

15.2 Structure of a V86 Task

A V86 task consists partly of the 8086 program to be executed and partly of 80386 "native mode" code that serves as the virtual-machine monitor. The task must be represented by an 80386 TSS (not an 80286 TSS). The processor enters V86 mode to execute the 8086 program and returns to protected mode to execute the monitor or other 80386 tasks.

To run successfully in V86 mode, an existing 8086 program needs the following:

- A V86 monitor.
- Operating-system services.

The V86 monitor is 80386 protected-mode code that executes at privilege-level zero. The monitor consists primarily of initialization and exception-handling procedures. As for any other 80386 program, executable-segment descriptors for the monitor must exist in the GDT or in the task's LDT. The linear addresses above 10FFEFH are available for the V86 monitor, the operating system, and other systems software. The monitor may also need data-segment descriptors so that it can examine the interrupt vector table or other parts of the 8086 program in the first megabyte of the address space.

In general, there are two options for implementing the 8086 operating system:

1. The 8086 operating system may run as part of the 8086 code.
   This approach is desirable for any of the following reasons:
   - The 8086 applications code modifies the operating system.
   - There is not sufficient development time to reimplement the 8086 operating system as 80386 code.
2. The 8086 operating system may be implemented or emulated in the V86 monitor. This approach is desirable for any of the following reasons:
   - Operating system functions can be more easily coordinated among several V86 tasks.
   - The functions of the 8086 operating system can be easily emulated by calls to the 80386 operating system.

Note that, regardless of the approach chosen for implementing the 8086 operating system, different V86 tasks may use different 8086 operating systems.

15.2.1 Using Paging for V86 Tasks

Paging is not necessary for a single V86 task, but paging is useful or necessary for any of the following reasons:

- To create multiple V86 tasks. Each task must map the lower megabyte of linear addresses to different physical locations.
- To emulate the megabyte wrap. On members of the 8086 family, it is possible to specify addresses larger than one megabyte. For example, with a selector value of 0FFFFH and an offset of 0FFFFH, the effective address would be 10FFEFH (one megabyte + 65519). The 8086, which can form addresses only up to 20 bits long, truncates the high-order bit, thereby "wrapping" this address to 0FFEFH. The 80386, however, which can form addresses up to 32 bits long, does not truncate such an address. If any 8086 programs depend on this addressing anomaly, the same effect can be achieved in a V86 task by mapping linear addresses between 100000H and 110000H and linear addresses between 0 and 10000H to the same physical addresses.
- To create a virtual address space larger than the physical address space.
- To share 8086 OS code or ROM code that is common to several 8086 programs that are executing simultaneously.
- To redirect or trap references to memory-mapped I/O devices.
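The megabyte-wrap case in the list above can be checked with a few lines of arithmetic. The following Python sketch is an illustration under stated assumptions, not a description of any particular monitor: the function names and the dictionary used as a page map are hypothetical, and the first megabyte is identity-mapped only to keep the example small (a real V86 monitor would map it to whatever physical pages the task owns).

```python
PAGE_SIZE = 0x1000  # the 80386 uses 4-Kbyte pages

def linear_address(selector: int, offset: int) -> int:
    """8086-style address formation: selector shifted left 4 bits, plus offset."""
    return ((selector & 0xFFFF) << 4) + (offset & 0xFFFF)

def wrap_8086(linear: int) -> int:
    """What an 8086 does with the result: keep only 20 address bits."""
    return linear & 0xFFFFF

def wrap_page_map() -> dict:
    """Hypothetical page map (linear page number -> physical page number).

    Pages covering 100000H..10FFFFH are aliased onto the pages covering
    0..0FFFFH, which reproduces the 8086 wrap for a V86 task.
    """
    mapping = {page: page for page in range(0x100)}   # first megabyte (assumed identity)
    for page in range(0x100, 0x110):                  # the 64 Kbytes above 1 megabyte
        mapping[page] = page - 0x100
    return mapping

def translate(mapping: dict, linear: int) -> int:
    """Translate a linear address through the page map."""
    return mapping[linear // PAGE_SIZE] * PAGE_SIZE + linear % PAGE_SIZE

if __name__ == "__main__":
    lin = linear_address(0xFFFF, 0xFFFF)
    print(hex(lin), hex(wrap_8086(lin)), hex(translate(wrap_page_map(), lin)))
```

With this map, the address formed from selector 0FFFFH and offset 0FFFFH reaches the same physical location that a 20-bit 8086 would have addressed.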
15.2.2 Protection within a V86 Task

Because it does not refer to descriptors while executing 8086 programs, the processor also does not utilize the protection mechanisms offered by descriptors. To protect the systems software that runs in a V86 task from the 8086 program, software designers may follow either of these approaches:

- Reserve the first megabyte (plus 64 kilobytes) of each task's linear address space for the 8086 program. An 8086 task cannot generate addresses outside this range.
- Use the U/S bit of page-table entries to protect the virtual-machine monitor and other systems software in each virtual 8086 task's space. When the processor is in V86 mode, CPL is 3. Therefore, an 8086 program has only user privileges. If the pages of the virtual-machine monitor have supervisor privilege, they cannot be accessed by the 8086 program.

15.3 Entering and Leaving V86 Mode

Figure 15-2 summarizes the ways that the processor can enter and leave an 8086 program. The processor can enter V86 by either of two means:

1. A task switch to an 80386 task loads the image of EFLAGS from the new TSS. The TSS of the new task must be an 80386 TSS, not an 80286 TSS, because the 80286 TSS does not store the high-order word of EFLAGS, which contains the VM flag. A value of one in the VM bit of the new EFLAGS indicates that the new task is executing 8086 instructions; therefore, while loading the segment registers from the TSS, the processor forms base addresses as the 8086 would.
2. An IRET from a procedure of an 80386 task loads the image of EFLAGS from the stack. A value of one in VM in this case indicates that the procedure to which control is being returned is an 8086 procedure. The CPL at the time the IRET is executed must be zero, else the processor does not change VM.

The processor leaves V86 mode when an interrupt or exception occurs. There are two cases:

1. The interrupt or exception causes a task switch.
   A task switch from a V86 task to any other task loads EFLAGS from the TSS of the new task. If the new TSS is an 80386 TSS and the VM bit in the EFLAGS image is zero or if the new TSS is an 80286 TSS, then the processor clears the VM bit of EFLAGS, loads the segment registers from the new TSS using 80386-style address formation, and begins executing the instructions of the new task according to 80386 protected-mode semantics.
2. The interrupt or exception vectors to a privilege-level zero procedure. The processor stores the current setting of EFLAGS on the stack, then clears the VM bit. The interrupt or exception handler, therefore, executes as "native" 80386 protected-mode code. If an interrupt or exception vectors to a conforming segment or to a privilege level other than zero, the processor causes a general-protection exception; the error code is the selector of the executable segment to which transfer was attempted.

Systems software does not manipulate the VM flag directly, but rather manipulates the image of the EFLAGS register that is stored on the stack or in the TSS. The V86 monitor sets the VM flag in the EFLAGS image on the stack or in the TSS when first creating a V86 task. Exception and interrupt handlers can examine the VM flag on the stack. If the interrupted procedure was executing in V86 mode, the handler may need to invoke the V86 monitor.

Figure 15-2.
Entering and Leaving the 8086 Program

                        MODE TRANSITION DIAGRAM

                                              +-----------+
                 TASK SWITCH OR IRET          |  INITIAL  |
           +----------------------------------|   ENTRY   |
           |                                  +-----------+
           v
   +--------------+  INTERRUPT, EXCEPTION  +-------------+
   | 8086 PROGRAM |----------------------->| V86 MONITOR |
   |  (V86 MODE)  |<-----------------------| (PROTECTED  |
   +--------------+          IRET          |    MODE)    |
          ^                                +-------------+
          |                                       ^
          | TASK SWITCH               TASK SWITCH |
          v                                       v
       +---------------------------------------------+
       |     OTHER 80386 TASKS (PROTECTED MODE)      |
       +---------------------------------------------+

15.3.1 Transitions Through Task Switches

A task switch to or from a V86 task may be due to any of three causes:

1. An interrupt that vectors to a task gate.
2. An action of the scheduler of the 80386 operating system.
3. An IRET when the NT flag is set.

In any of these cases, the processor changes the VM bit in EFLAGS according to the image of EFLAGS in the new TSS. If the new TSS is an 80286 TSS, the high-order word of EFLAGS is not in the TSS; the processor clears VM in this case. The processor updates VM prior to loading the segment registers from the images in the new TSS. The new setting of VM determines whether the processor interprets the new segment-register images as 8086 selectors or 80386/80286 selectors.

15.3.2 Transitions Through Trap Gates and Interrupt Gates

The processor leaves V86 mode as the result of an exception or interrupt that vectors via a trap or interrupt gate to a privilege-level zero procedure. The exception or interrupt handler returns to the 8086 code by executing an IRET.

Because it was designed for execution by an 8086 processor, an 8086 program in a V86 task will have an 8086-style interrupt table starting at linear address zero. However, the 80386 does not use this table directly. For all exceptions and interrupts that occur in V86 mode, the processor vectors through the IDT.
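Although the processor itself vectors through the IDT, a V86 monitor that wants to hand an interrupt to the 8086 program still has to consult the 8086-style table at linear address zero. The following Python sketch shows one way such a lookup could be modelled. It assumes the first kilobyte of the task's linear memory is available as a bytes-like object; the names `read_ivt_entry` and `handler_linear_address` are hypothetical.

```python
def read_ivt_entry(memory, n: int):
    """Return (segment, offset) of the 8086 handler for interrupt n.

    Each entry of the 8086-style table is a 4-byte far pointer located at
    4 * n: a 16-bit offset followed by a 16-bit segment, both little-endian.
    """
    entry = 4 * n
    offset = int.from_bytes(memory[entry:entry + 2], "little")
    segment = int.from_bytes(memory[entry + 2:entry + 4], "little")
    return segment, offset

def handler_linear_address(segment: int, offset: int) -> int:
    """8086-style address formation for the handler entry point."""
    return (segment << 4) + offset

if __name__ == "__main__":
    # Hypothetical table contents: vector 21H points at F000:1234.
    table = bytearray(1024)
    table[0x21 * 4:0x21 * 4 + 4] = bytes([0x34, 0x12, 0x00, 0xF0])
    seg, off = read_ivt_entry(table, 0x21)
    print(hex(seg), hex(off), hex(handler_linear_address(seg, off)))
```

The segment and offset returned here are the values a monitor would place in the CS and EIP images of the return link on the privilege-level zero stack before executing IRET.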
The IDT entry for an interrupt or exception that occurs in a V86 task must contain either:

- A task gate.
- An 80386 trap gate (type 15) or an 80386 interrupt gate (type 14), which must point to a nonconforming, privilege-level zero, code segment.

Interrupts and exceptions that have 80386 trap or interrupt gates in the IDT vector to the appropriate handler procedure at privilege-level zero. The contents of all the 8086 segment registers are stored on the PL 0 stack. Figure 15-3 shows the format of the PL 0 stack after an exception or interrupt that occurs while a V86 task is executing an 8086 program.

After the processor stores all the 8086 segment registers on the PL 0 stack, it loads all the segment registers with zeros before starting to execute the handler procedure. This permits the interrupt handler to safely save and restore the DS, ES, FS, and GS registers as 80386 selectors. Interrupt handlers that may be invoked in the context of either a regular task or a V86 task can use the same prolog and epilog code for register saving regardless of the kind of task. Restoring zeros to these registers before execution of the IRET does not cause a trap in the interrupt handler. Interrupt procedures that expect values in the segment registers or that return values via segment registers have to use the register images stored on the PL 0 stack. Interrupt handlers that need to know whether the interrupt occurred in V86 mode can examine the VM bit in the stored EFLAGS image.

An interrupt handler passes control to the V86 monitor if the VM bit is set in the EFLAGS image stored on the stack and the interrupt or exception is one that the monitor needs to handle. The V86 monitor may either:

- Handle the interrupt completely within the V86 monitor.
- Invoke the 8086 program's interrupt handler.

Reflecting an interrupt or exception back to the 8086 code involves the following steps:

1. Refer to the 8086 interrupt vector to locate the appropriate handler procedure.
2.
   Store the state of the 8086 program on the privilege-level three stack.
3. Change the return link on the privilege-level zero stack to point to the privilege-level three handler procedure.
4. Execute an IRET so as to pass control to the handler.
5. When the IRET by the privilege-level three handler again traps to the V86 monitor, restore the return link on the privilege-level zero stack to point to the originally interrupted, privilege-level three procedure.
6. Execute an IRET so as to pass control back to the interrupted procedure.

Figure 15-3. PL 0 Stack after Interrupt in V86 Task

         WITHOUT ERROR CODE                        WITH ERROR CODE

       31        15         0                   31        15         0
       +---------+----------+ <-- SS:ESP        +---------+----------+ <-- SS:ESP
       |:::::::::|  OLD GS  |     FROM TSS      |:::::::::|  OLD GS  |     FROM TSS
       +---------+----------+                   +---------+----------+
       |:::::::::|  OLD FS  |                   |:::::::::|  OLD FS  |
       +---------+----------+                   +---------+----------+
       |:::::::::|  OLD DS  |                   |:::::::::|  OLD DS  |
       +---------+----------+                   +---------+----------+
       |:::::::::|  OLD ES  |                   |:::::::::|  OLD ES  |
       +---------+----------+                   +---------+----------+
       |:::::::::|  OLD SS  |                   |:::::::::|  OLD SS  |
       +---------+----------+                   +---------+----------+
       |      OLD ESP       |                   |      OLD ESP       |
       +--------------------+                   +--------------------+
       |     OLD EFLAGS     |                   |     OLD EFLAGS     |
       +---------+----------+                   +---------+----------+
       |:::::::::|  OLD CS  |                   |:::::::::|  OLD CS  |
       +---------+----------+                   +---------+----------+
       |      OLD EIP       |                   |      OLD EIP       |
       +--------------------+ <-- NEW SS:ESP    +--------------------+
       |                    |                   |     ERROR CODE     |
                                                +--------------------+ <-- NEW SS:ESP
                                                |                    |

       DIRECTION OF EXPANSION: downward (toward lower addresses)

15.4 Additional Sensitive Instructions

When the 80386 is executing in V86 mode, the instructions PUSHF, POPF, INT n, and IRET are sensitive to IOPL. The instructions IN, INS, OUT, and OUTS, which are ordinarily sensitive in protected mode, are not sensitive in V86 mode.
Following is a complete list of instructions that are sensitive in V86 mode:

   CLI    -- Clear Interrupt-Enable Flag
   STI    -- Set Interrupt-Enable Flag
   LOCK   -- Assert Bus-Lock Signal
   PUSHF  -- Push Flags
   POPF   -- Pop Flags
   INT n  -- Software Interrupt
   IRET   -- Interrupt Return

CPL is always three in V86 mode; therefore, if IOPL < 3, these instructions will trigger a general-protection exception. These instructions are made sensitive so that their functions can be simulated by the V86 monitor.

15.4.1 Emulating 8086 Operating System Calls

INT n is sensitive so that the V86 monitor can intercept calls to the 8086 OS. Many 8086 operating systems are called by pushing parameters onto the stack, then executing an INT n instruction. If IOPL < 3, INT n instructions will be intercepted by the V86 monitor. The V86 monitor can then emulate the function of the 8086 operating system or reflect the interrupt back to the 8086 operating system in V86 mode.

15.4.2 Virtualizing the Interrupt-Enable Flag

When the processor is executing 8086 code in a V86 task, the instructions PUSHF, POPF, and IRET are sensitive to IOPL so that the V86 monitor can control changes to the interrupt-enable flag (IF). Other instructions that affect IF (STI and CLI) are IOPL sensitive both in 8086 code and in 80286/80386 code.

Many 8086 programs that were designed to execute on single-task systems set and clear IF to control interrupts. However, when these same programs are executed in a multitasking environment, such control of IF can be disruptive. If IOPL is less than three, all instructions that change or interrogate IF will trap to the V86 monitor. The V86 monitor can then control IF in a manner that both suits the needs of the larger environment and is transparent to the 8086 program.

15.5 Virtual I/O

Many 8086 programs that were designed to execute on single-task systems use I/O devices directly.
However, when these same programs are executed in a multitasking environment, such use of devices can be disruptive. The 80386 provides sufficient flexibility to control I/O in a manner that both suits the needs of the new environment and is transparent to the 8086 program. Designers may take any of several possible approaches to controlling I/O:

- Implement or emulate the 8086 operating system as an 80386 program and require the 8086 application to do I/O via software interrupts to the operating system, trapping all attempts to do I/O directly.
- Let the 8086 program take complete control of all I/O.
- Selectively trap and emulate references that a task makes to specific I/O ports.
- Trap or redirect references to memory-mapped I/O addresses.

The method of controlling I/O depends upon whether I/O ports are I/O mapped or memory mapped.

15.5.1 I/O-Mapped I/O

I/O-mapped I/O in V86 mode differs from protected mode only in that the protection mechanism does not consult IOPL when executing the I/O instructions IN, INS, OUT, OUTS. Only the I/O permission bit map controls the right for V86 tasks to execute these I/O instructions.

The I/O permission map traps I/O instructions selectively depending on the I/O addresses to which they refer. The I/O permission bit map of each V86 task determines which I/O addresses are trapped for that task. Because each task may have a different I/O permission bit map, the addresses trapped for one task may be different from those trapped for others. Refer to Chapter 8 for more information about the I/O permission map.

15.5.2 Memory-Mapped I/O

In hardware designs that utilize memory-mapped I/O, the paging facilities of the 80386 can be used to trap or redirect I/O operations. Each task that executes memory-mapped I/O must have a page (or pages) for the memory-mapped address space. The V86 monitor may control memory-mapped I/O by any of these means:

- Assign the memory-mapped page to appropriate physical addresses.
Different tasks may have different physical addresses, thereby preventing the tasks from interfering with each other.
- Cause a trap to the monitor by forcing a page fault on the memory-mapped page. Read-only pages trap writes. Not-present pages trap both reads and writes.

Intervention for every I/O might be excessive for some kinds of I/O devices. A page fault can still be used in this case to cause intervention on the first I/O operation. The monitor can then at least make sure that the task has exclusive access to the device. Then the monitor can change the page status to present and read/write, allowing subsequent I/O to proceed at full speed.

15.5.3 Special I/O Buffers

Buffers of intelligent controllers (for example, a bit-mapped graphics buffer) can also be virtualized via page mapping. The linear space for the buffer can be mapped to a different physical space for each virtual 8086 task. The V86 monitor can then assume responsibility for spooling the data or assigning the virtual buffer to the real buffer at appropriate times.

15.6 Differences From 8086

In general, V86 mode will correctly execute software designed for the 8086, 8088, 80186, and 80188. Following is a list of the minor differences between 8086 execution on the 80386 and on an 8086.

1. Instruction clock counts. The 80386 takes fewer clocks for most instructions than the 8086/8088. The areas most likely to be affected are:

   - Delays required by I/O devices between I/O operations.
   - Assumed delays with 8086/8088 operating in parallel with an 8087.

2. Divide exceptions point to the DIV instruction. Divide exceptions on the 80386 always leave the saved CS:IP value pointing to the instruction that failed. On the 8086/8088, the CS:IP value points to the next instruction.

3. Undefined 8086/8088 opcodes. Opcodes that were not defined for the 8086/8088 will cause exception 6 or will execute one of the new instructions defined for the 80386.

4. Value written by PUSH SP.
The 80386 pushes a different value on the stack for PUSH SP than the 8086/8088. The 80386 pushes the value of SP before SP is decremented as part of the push operation; the 8086/8088 pushes the value of SP after it is decremented. If the value pushed is important, replace PUSH SP instructions with the following three instructions:

       PUSH  BP
       MOV   BP, SP
       XCHG  BP, [BP]

   This code functions as the 8086/8088 PUSH SP instruction on the 80386.

5. Shift or rotate by more than 31 bits. The 80386 masks all shift and rotate counts to the low-order five bits. This MOD 32 operation limits the count to a maximum of 31 bits, thereby limiting the time that interrupt response is delayed while the instruction is executing.

6. Redundant prefixes. The 80386 sets a limit of 15 bytes on instruction length. The only way to violate this limit is by putting redundant prefixes before an instruction. Exception 13 occurs if the limit on instruction length is violated. The 8086/8088 has no instruction length limit.

7. Operand crossing offset 0 or 65,535. On the 8086, an attempt to access a memory operand that crosses offset 65,535 (e.g., MOV a word to offset 65,535) or offset 0 (e.g., PUSH a word when SP = 1) causes the offset to wrap around modulo 65,536. The 80386 raises an exception in these cases: exception 13 if the segment is a data segment (i.e., if CS, DS, ES, FS, or GS is being used to address the segment), exception 12 if the segment is a stack segment (i.e., if SS is being used).

8. Sequential execution across offset 65,535. On the 8086, if sequential execution of instructions proceeds past offset 65,535, the processor fetches the next instruction byte from offset 0 of the same segment. On the 80386, the processor raises exception 13 in such a case.

9. LOCK is restricted to certain instructions. The LOCK prefix and its corresponding output signal should only be used to prevent other bus masters from interrupting a data movement operation.
The 80386 always asserts the LOCK signal during an XCHG instruction with memory (even if the LOCK prefix is not used). LOCK may only be used with the following 80386 instructions when they update memory: BTS, BTR, BTC, XCHG, ADD, ADC, SUB, SBB, INC, DEC, AND, OR, XOR, NOT, and NEG. An undefined-opcode exception (interrupt 6) results from using LOCK before any other instruction.

10. Single-stepping external interrupt handlers. The priority of the 80386 single-step exception is different from that of the 8086/8088. The change prevents an external interrupt handler from being single-stepped if the interrupt occurs while a program is being single-stepped. The 80386 single-step exception has higher priority than any external interrupt. The 80386 will still single-step through an interrupt handler invoked by the INT instructions or by an exception.

11. IDIV exceptions for quotients of 80H or 8000H. The 80386 can generate the largest negative number as a quotient for the IDIV instruction. The 8086/8088 causes exception zero instead.

12. Flags in stack. The setting of the flags stored by PUSHF, by interrupts, and by exceptions is different from that stored by the 8086 in bit positions 12 through 15. On the 8086 these bits are stored as ones, but in V86 mode bit 15 is always zero, and bits 14 through 12 reflect the last value loaded into them.

13. NMI interrupting NMI handlers. After an NMI is recognized on the 80386, the NMI interrupt is masked until an IRET instruction is executed.

14. Coprocessor errors vector to interrupt 16. Any 80386 system with a coprocessor must use interrupt vector 16 for the coprocessor error exception. If an 8086/8088 system uses another vector for the 8087 interrupt, both vectors should point to the coprocessor-error exception handler.

15. Numeric exception handlers should allow prefixes. On the 80386, the value of CS:IP saved for coprocessor exceptions points at any prefixes before an ESC instruction.
On 8086/8088 systems, the saved CS:IP points to the ESC instruction itself.

16. Coprocessor does not use interrupt controller. The coprocessor error signal to the 80386 does not pass through an interrupt controller (an 8087 INT signal does). Some instructions in a coprocessor error handler may need to be deleted if they deal with the interrupt controller.

15.7 Differences From 80286 Real-Address Mode

The 80286 processor implements the bus lock function differently than the 80386. This fact may or may not be apparent to 8086 programs, depending on how the V86 monitor handles the LOCK prefix. LOCKed instructions are sensitive to IOPL; therefore, software designers can choose to emulate the function of LOCK. If, however, 8086 programs are allowed to execute LOCK directly, programs that use forms of memory locking specific to the 8086 may not execute properly when transported to a specific application of the 80386.

The LOCK prefix and its corresponding output signal should only be used to prevent other bus masters from interrupting a data movement operation. LOCK may only be used with the following 80386 instructions when they modify memory. An undefined-opcode exception results from using LOCK before any other instruction.

- Bit test and change: BTS, BTR, BTC.
- Exchange: XCHG.
- One-operand arithmetic and logical: INC, DEC, NOT, and NEG.
- Two-operand arithmetic and logical: ADD, ADC, SUB, SBB, AND, OR, XOR.

A locked instruction is guaranteed to lock only the area of memory defined by the destination operand, but may lock a larger memory area. For example, typical 8086 and 80286 configurations lock the entire physical memory space. With the 80386, the defined area of memory is guaranteed to be locked against access by a processor executing a locked instruction on exactly the same memory area, i.e., an operand with identical starting address and identical length.
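The LOCK restriction described in section 15.7 can be restated as a small lookup. The following Python sketch is illustrative only: the names `LOCKABLE` and `lock_prefix_outcome` and the returned strings are hypothetical, and the function models only the rule as stated in the text (a listed instruction with a memory destination), not the processor's actual decoding logic.

```python
# Instructions the text lists as accepting the LOCK prefix,
# and only when they modify a memory operand.
LOCKABLE = frozenset({
    "BTS", "BTR", "BTC",                              # bit test and change
    "XCHG",                                           # exchange
    "INC", "DEC", "NOT", "NEG",                       # one-operand forms
    "ADD", "ADC", "SUB", "SBB", "AND", "OR", "XOR",   # two-operand forms
})


def lock_prefix_outcome(mnemonic: str, destination_is_memory: bool) -> str:
    """Classify a LOCK-prefixed instruction per the rule in section 15.7.

    Returns "locked" when the prefix is permitted, otherwise
    "exception 6" (the undefined-opcode exception). Both labels are
    assumptions made for this sketch.
    """
    if mnemonic.upper() in LOCKABLE and destination_is_memory:
        return "locked"
    return "exception 6"
```

A V86 monitor that chooses to emulate LOCK could use a table of this kind to decide whether to honor the prefix or reflect an undefined-opcode exception to the 8086 program.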
Chapter 16  Mixing 16-Bit and 32-Bit Code

----------------------------------------------------------------------------

The 80386 running in protected mode is a 32-bit microprocessor, but it is designed to support 16-bit processing at three levels:

1. Executing 8086/80286 16-bit programs efficiently with complete compatibility.
2. Mixing 16-bit modules with 32-bit modules.
3. Mixing 16-bit and 32-bit addresses and operands within one module.

The first level of support for 16-bit programs has already been discussed in Chapter 13, Chapter 14, and Chapter 15. This chapter shows how 16-bit and 32-bit modules can cooperate with one another, and how one module can utilize both 16-bit and 32-bit operands and addressing.

The 80386 functions most efficiently when it is possible to distinguish between pure 16-bit modules and pure 32-bit modules. A pure 16-bit module has these characteristics:

- All segments occupy 64 Kilobytes or less.
- Data items are either 8 bits or 16 bits wide.
- Pointers to code and data have 16-bit offsets.
- Control is transferred only among 16-bit segments.

A pure 32-bit module has these characteristics:

- Segments may occupy more than 64 Kilobytes (zero bytes to 4 gigabytes).
- Data items are either 8 bits or 32 bits wide.
- Pointers to code and data have 32-bit offsets.
- Control is transferred only among 32-bit segments.

Pure 16-bit modules do exist; they are the modules designed for 16-bit microprocessors. Pure 32-bit modules may exist in new programs designed explicitly for the 80386. However, as systems designers move applications from 16-bit processors to the 32-bit 80386, it will not always be possible to maintain these ideals of pure 16-bit or 32-bit modules. It may be expedient to execute old 16-bit modules in a new 32-bit environment without making source-code changes to the old modules if any of the following conditions is true:

- Modules will be converted one-by-one from 16-bit environments to 32-bit environments.
- Older, 16-bit compilers and software-development tools will be utilized in the new 32-bit operating environment until new 32-bit versions can be created.
- The source code of 16-bit modules is not available for modification.
- The specific data structures used by a given module inherently utilize 16-bit words.
- The native word size of the source language is 16 bits.

On the 80386, 16-bit modules can be mixed with 32-bit modules. To design a system that mixes 16- and 32-bit code requires an understanding of the mechanisms that the 80386 uses to invoke and control its 32-bit and 16-bit features.

16.1 How the 80386 Implements 16-Bit and 32-Bit Features

The features of the architecture that permit the 80386 to work equally well with 32-bit and 16-bit address and operand sizes include:

- The D-bit (default bit) of code-segment descriptors, which determines the default choice of operand-size and address-size for the instructions of a code segment. (In real-address mode and V86 mode, which do not use descriptors, the default is 16 bits.) A code segment whose D-bit is set is known as a USE32 segment; a code segment whose D-bit is zero is a USE16 segment. The D-bit eliminates the need to encode the operand size and address size in instructions when all instructions use operands and effective addresses of the same size.
- Instruction prefixes that explicitly override the default choice of operand size and address size (available in protected mode as well as in real-address mode and V86 mode).
- Separate 32-bit and 16-bit gates for intersegment control transfers (including call gates, interrupt gates, and trap gates). The operand size for the control transfer is determined by the type of gate, not by the D-bit or prefix of the transfer instruction.
- Registers that can be used both for 32-bit and 16-bit operands and effective-address calculations.
- The B-bit (big bit) of data-segment descriptors, which determines the size of stack pointer (32-bit ESP or 16-bit SP) used by the CPU for implicit stack references.

16.2 Mixing 32-Bit and 16-Bit Operations

The 80386 has two instruction prefixes that allow mixing of 32-bit and 16-bit operations within one segment:

- The operand-size prefix (66H)
- The address-size prefix (67H)

These prefixes reverse the default size selected by the D-bit. For example, the processor can interpret the word-move instruction MOV mem, reg in any of four ways:

- In a USE32 segment:

  1. Normally moves 32 bits from a 32-bit register to a 32-bit effective address in memory.
  2. If preceded by an operand-size prefix, moves 16 bits from a 16-bit register to a 32-bit effective address in memory.
  3. If preceded by an address-size prefix, moves 32 bits from a 32-bit register to a 16-bit effective address in memory.
  4. If preceded by both an address-size prefix and an operand-size prefix, moves 16 bits from a 16-bit register to a 16-bit effective address in memory.

- In a USE16 segment:

  1. Normally moves 16 bits from a 16-bit register to a 16-bit effective address in memory.
  2. If preceded by an operand-size prefix, moves 32 bits from a 32-bit register to a 16-bit effective address in memory.
  3. If preceded by an address-size prefix, moves 16 bits from a 16-bit register to a 32-bit effective address in memory.
  4. If preceded by both an address-size prefix and an operand-size prefix, moves 32 bits from a 32-bit register to a 32-bit effective address in memory.

These examples illustrate that any instruction can generate any combination of operand size and address size regardless of whether the instruction is in a USE16 or USE32 segment. The choice of the USE16 or USE32 attribute for a code segment is based upon these criteria:

1. The need to address instructions or data in segments that are larger than 64 Kilobytes.
2. The predominant size of operands.
3. The addressing modes desired.
(Refer to Chapter 17 for an explanation of the additional addressing modes that are available when 32-bit addressing is used.)

Choosing a setting of the D-bit that is contrary to the predominant size of operands requires the generation of an excessive number of operand-size prefixes.

16.3 Sharing Data Segments Among Mixed Code Segments

Because the choice of operand size and address size is defined in code segments and their descriptors, data segments can be shared freely among both USE16 and USE32 code segments. The only limitation is the one imposed by pointers with 16-bit offsets, which can only point to the first 64 Kilobytes of a segment. When a data segment that contains more than 64 Kilobytes is to be shared among USE32 and USE16 segments, the data that is to be accessed by the USE16 segments must be located within the first 64 Kilobytes.

A stack that spans addresses less than 64K can be shared by both USE16 and USE32 code segments. This class of stacks includes:

- Stacks in expand-up segments with G=0 and B=0.
- Stacks in expand-down segments with G=0 and B=0.
- Stacks in expand-up segments with G=1 and B=0, in which the stack is contained completely within the lower 64 Kilobytes. (Offsets greater than 64K can be used for data, other than the stack, that is not shared.)

The B-bit of a stack segment cannot, in general, be used to change the size of stack used by a USE16 code segment. The size of stack pointer used by the processor for implicit stack references is controlled by the B-bit of the data-segment descriptor for the stack. Implicit references are those caused by interrupts, exceptions, and instructions such as PUSH, POP, CALL, and RET. One might be tempted, therefore, to try to increase beyond 64K the size of the stack used by 16-bit code simply by supplying a larger stack segment with the B-bit set. However, the B-bit does not control explicit stack references, such as accesses to parameters or local variables.
A USE16 code segment can utilize a "big" stack only if the code is modified so that all explicit references to the stack are preceded by the address-size prefix, causing those references to use 32-bit addressing.

In big, expand-down segments (B=1, G=1, and E=1), all offsets are greater than 64K; therefore, USE16 code cannot utilize such a stack segment unless the code segment is modified to employ 32-bit addressing. (Refer to Chapter 6 for a review of the B, G, and E bits.)

16.4 Transferring Control Among Mixed Code Segments

When transferring control among procedures in USE16 and USE32 code segments, programmers must be aware of three points:

- Addressing limitations imposed by pointers with 16-bit offsets.
- Matching of operand-size attribute in effect for the CALL/RET pair and the interrupt/IRET pair so as to manage the stack correctly.
- Translation of parameters, especially pointer parameters.

Clearly, 16-bit effective addresses cannot be used to address data or code located beyond 64K in a 32-bit segment, nor can large 32-bit parameters be squeezed into a 16-bit word; however, except for these obvious limits, most interfacing problems between 16-bit and 32-bit modules can be solved. Some solutions involve inserting interface procedures between the procedures in question.

16.4.1 Size of Code-Segment Pointer

For control-transfer instructions that use a pointer to identify the next instruction (i.e., those that do not use gates), the size of the offset portion of the pointer is determined by the operand-size attribute. The implications of the use of two different sizes of code-segment pointer are:

- JMP, CALL, or RET from 32-bit segment to 16-bit segment is always possible using a 32-bit operand size.
- JMP, CALL, or RET from 16-bit segment using a 16-bit operand size cannot address the target in a 32-bit segment if the address of the target is greater than 64K.
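The two implications listed above follow from the width of the offset field in the pointer. As a minimal sketch (the function name `can_reach` is hypothetical and chosen only for illustration), the check reduces to whether the target offset fits in the number of bits given by the operand-size attribute:

```python
def can_reach(target_offset: int, operand_size: int) -> bool:
    """Return True if a direct (non-gate) JMP, CALL, or RET whose
    operand-size attribute is `operand_size` bits can hold
    `target_offset` in the offset portion of its pointer."""
    if operand_size not in (16, 32):
        raise ValueError("operand size must be 16 or 32")
    return 0 <= target_offset < (1 << operand_size)
```

With a 16-bit operand size, `can_reach(0xFFFF, 16)` holds but `can_reach(0x10000, 16)` does not, which is the case that calls for a gate or an interface procedure.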
An interface procedure can enable transfers from USE16 segments to 32-bit addresses beyond 64K without requiring modifications any more extensive than relinking or rebinding the old programs. The requirements for such an interface procedure are discussed later in this chapter. 16.4.2 Stack Management for Control Transfers Because stack management is different for 16-bit CALL/RET than for 32-bit CALL/RET, the operand size of RET must match that of CALL. (Refer to Figure 16-1.) A 16-bit CALL pushes the 16-bit IP and (for calls between privilege levels) the 16-bit SP register. The corresponding RET must also use a 16-bit operand size to POP these 16-bit values from the stack into the 16-bit registers. A 32-bit CALL pushes the 32-bit EIP and (for interlevel calls) the 32-bit ESP register. The corresponding RET must also use a 32-bit operand size to POP these 32-bit values from the stack into the 32-bit registers. If the two halves of a CALL/RET pair do not have matching operand sizes, the stack will not be managed correctly and the values of the instruction pointer and stack pointer will not be restored to correct values. When the CALL and its corresponding RET are in segments that have D-bits with the same values (i.e., both have 32-bit defaults or both have 16-bit defaults), there is no problem. When the CALL and its corresponding RET are in segments that have different D-bit values, however, programmers (or program development software) must ensure that the CALL and RET match. There are three ways to cause a 16-bit procedure to execute a 32-bit call: 1. Use a 16-bit call to a 32-bit interface procedure that then uses a 32-bit call to invoke the intended target. 2. Bind the 16-bit call to a 32-bit call gate. 3. Modify the 16-bit procedure, inserting an operand-size prefix before the call, thereby changing it to a 32-bit call. Likewise, there are three ways to cause a 32-bit procedure to execute a 16-bit call: 1. 
Use a 32-bit call to a 32-bit interface procedure that then uses a 16-bit call to invoke the intended target.

2. Bind the 32-bit call to a 16-bit call gate.

3. Modify the 32-bit procedure, inserting an operand-size prefix before the call, thereby changing it to a 16-bit call. (Be certain that the return offset does not exceed 64K.)

Programmers can utilize any of the preceding methods to make a CALL in a USE16 segment match the corresponding RET in a USE32 segment, or to make a CALL in a USE32 segment match the corresponding RET in a USE16 segment.

Figure 16-1. Stack after Far 16-Bit and 32-Bit Calls

                   WITHOUT PRIVILEGE TRANSITION

    AFTER 16-BIT CALL                      AFTER 32-BIT CALL

    31                0                    31                0
    |                 |                    |                 |
    +--------+--------+                    +--------+--------+
    |/////////////////|                    |/////////////////|
    +--------+--------+                    +--------+--------+
    | PARM2  | PARM1  |                    |      PARM2      |
    +--------+--------+                    +--------+--------+
    |   CS   |   IP   | <-- SP             |      PARM1      |
    +--------+--------+                    +--------+--------+
    |                 |                    |////////|   CS   |
    +--------+--------+                    +--------+--------+
    |                 |                    |       EIP       | <-- ESP
    +--------+--------+                    +--------+--------+
    |                 |                    |                 |

                    WITH PRIVILEGE TRANSITION

    AFTER 16-BIT CALL                      AFTER 32-BIT CALL

    31                0                    31                0
    +--------+--------+                    +--------+--------+
    |   SS   |   SP   |                    |////////|   SS   |
    +--------+--------+                    +--------+--------+
    | PARM2  | PARM1  |                    |       ESP       |
    +--------+--------+                    +--------+--------+
    |   CS   |   IP   | <-- SP             |      PARM2      |
    +--------+--------+                    +--------+--------+
    |                 |                    |      PARM1      |
    +--------+--------+                    +--------+--------+
    |                 |                    |////////|   CS   |
    +--------+--------+                    +--------+--------+
    |                 |                    |       EIP       | <-- ESP
    +--------+--------+                    +--------+--------+
    |                 |                    |                 |

    //// = unused. The stack expands downward, toward lower addresses.

16.4.2.1 Controlling the Operand-Size for a Call

When the selector of the pointer referenced by a CALL instruction selects a segment descriptor, the operand-size attribute in effect for the CALL instruction is determined by the D-bit in the segment descriptor and by any operand-size instruction prefix.
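To see why the operand size of RET must match that of CALL, the return links in Figure 16-1 (without privilege transition) can be modeled in a few lines of Python. This is a sketch under stated assumptions: `far_call_push` and `far_ret_pop` are hypothetical helper names, the stack is a `bytearray` whose index 0 is the top of stack, and the unused upper half of the doubleword holding CS is written as zero here although the figure shows it only as unused.

```python
import struct


def far_call_push(stack: bytearray, cs: int, eip: int, operand_size: int) -> None:
    """Push the return link of a far CALL (no privilege transition).
    Index 0 of `stack` models the top of stack (lowest address)."""
    if operand_size == 16:
        frame = struct.pack("<HH", eip & 0xFFFF, cs)   # IP, then CS above it
    else:
        frame = struct.pack("<II", eip, cs)            # EIP, then CS in a doubleword
    stack[0:0] = frame


def far_ret_pop(stack: bytearray, operand_size: int) -> tuple[int, int]:
    """Pop a return link with the given operand size; return (cs, offset)."""
    if operand_size == 16:
        offset, cs = struct.unpack_from("<HH", stack, 0)
        del stack[:4]
    else:
        offset, cs_dword = struct.unpack_from("<II", stack, 0)
        cs = cs_dword & 0xFFFF                         # upper half is unused
        del stack[:8]
    return cs, offset


# Matched pair: the 32-bit RET recovers what the 32-bit CALL pushed.
matched = bytearray()
far_call_push(matched, cs=0x001B, eip=0x00012345, operand_size=32)
matched_result = far_ret_pop(matched, 32)

# Mismatched pair: a 16-bit RET misreads the 32-bit return link
# and leaves four bytes behind on the stack.
mismatched = bytearray()
far_call_push(mismatched, cs=0x001B, eip=0x00012345, operand_size=32)
mismatched_result = far_ret_pop(mismatched, 16)
```

In the mismatched case the 16-bit RET takes the low word of EIP as IP and the high word of EIP as CS, so neither the instruction pointer nor the stack pointer is restored correctly.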
When the selector of the pointer referenced by a CALL instruction selects a gate descriptor, the type of call is determined by the type of call gate. A call via an 80286 call gate (descriptor type 4) always has a 16-bit operand-size attribute; a call via an 80386 call gate (descriptor type 12) always has a 32-bit operand-size attribute. The offset of the target procedure is taken from the gate descriptor; therefore, even a 16-bit procedure can call a procedure that is located more than 64 kilobytes from the base of a 32-bit segment, because a 32-bit call gate contains a 32-bit target offset.

An unmodified 16-bit code segment that has run successfully on an 8086 or real-mode 80286 will always have a D-bit of zero and will not use operand-size override prefixes; therefore, it will always execute 16-bit versions of CALL. The only modification needed to make a 16-bit procedure effect a 32-bit call is to relink the call to an 80386 call gate.

16.4.2.2 Changing Size of Call

When adding 32-bit gates to 16-bit procedures, it is important to consider the number of parameters. The count field of the gate descriptor specifies the size of the parameter string to copy from the current stack to the stack of the more privileged procedure. The count field of a 16-bit gate specifies the number of words to be copied, whereas the count field of a 32-bit gate specifies the number of doublewords to be copied; therefore, the 16-bit procedure must use an even number of words as parameters.

16.4.3 Interrupt Control Transfers

With a control transfer due to an interrupt or exception, a gate is always involved. The operand-size attribute for the interrupt is determined by the type of IDT gate.

An 80386 interrupt or trap gate (descriptor type 14 or 15) to a 32-bit interrupt procedure can be used to interrupt either 32-bit or 16-bit procedures.
However, it is not generally feasible to permit an interrupt or exception to invoke a 16-bit handler procedure when 32-bit code is executing, because a 16-bit interrupt procedure has a return offset of only 16 bits on its stack. If the 32-bit procedure is executing at an address greater than 64K, the 16-bit interrupt procedure cannot return correctly.

16.4.4 Parameter Translation

When segment offsets or pointers (which contain segment offsets) are passed as parameters between 16-bit and 32-bit procedures, some translation is required. Clearly, if a 32-bit procedure passes a pointer to data located beyond 64K to a 16-bit procedure, the 16-bit procedure cannot utilize it. Beyond this natural limitation, an interface procedure can perform any format conversion between 32-bit and 16-bit pointers that may be needed.

Parameters passed by value between 32-bit and 16-bit code may also require translation between 32-bit and 16-bit formats. Such translation requirements are application dependent. Systems designers should take care to limit the range of values passed so that such translations are possible.

16.4.5 The Interface Procedure

Interposing an interface procedure between 32-bit and 16-bit procedures can be the solution to any of several interface requirements:

- Allowing procedures in 16-bit segments to transfer control to instructions located beyond 64K in 32-bit segments.
- Matching of operand size for CALL/RET.
- Parameter translation.

Interface procedures between USE32 and USE16 segments can be constructed with these properties:

- The procedures reside in a code segment whose D-bit is set, indicating a default operand size of 32 bits.
- All entry points that may be called by 16-bit procedures have offsets that are actually less than 64K.
- All points to which called 16-bit procedures may return also lie within 64K.

The interface procedures do little more than call corresponding procedures in other segments.
There may be two kinds of procedures:

- Those that are called by 16-bit procedures and call 32-bit procedures. These interface procedures are called by 16-bit CALLs and use the operand-size prefix before RET instructions to cause a 16-bit RET. CALLs to 32-bit segments are 32-bit calls (by default, because the D-bit is set), and the 32-bit code returns with 32-bit RET instructions.
- Those that are called by 32-bit procedures and call 16-bit procedures. These interface procedures are called by 32-bit CALL instructions, and return with 32-bit RET instructions (by default, because the D-bit is set). CALLs to 16-bit procedures use the operand-size prefix; procedures in the 16-bit code return with 16-bit RET instructions.

PART IV  INSTRUCTION SET

Chapter 17  80386 Instruction Set

----------------------------------------------------------------------------

This chapter presents instructions for the 80386 in alphabetical order. For each instruction, the forms are given for each operand combination, including object code produced, operands required, execution time, and a description. For each instruction, there is an operational description and a summary of exceptions generated.

17.1 Operand-Size and Address-Size Attributes

When executing an instruction, the 80386 can address memory using either 16 or 32-bit addresses. Consequently, each instruction that uses memory addresses has associated with it an address-size attribute of either 16 or 32 bits. 16-bit addresses imply both the use of a 16-bit displacement in the instruction and the generation of a 16-bit address offset (segment relative address) as the result of the effective address calculation. 32-bit addresses imply the use of a 32-bit displacement and the generation of a 32-bit address offset. Similarly, an instruction that accesses words (16 bits) or doublewords (32 bits) has an operand-size attribute of either 16 or 32 bits.
The attributes are determined by a combination of defaults, instruction prefixes, and (for programs executing in protected mode) size-specification bits in segment descriptors.

17.1.1 Default Segment Attribute

For programs executed in protected mode, the D-bit in executable-segment descriptors determines the default attribute for both address size and operand size. These default attributes apply to the execution of all instructions in the segment. A value of zero in the D-bit sets the default address size and operand size to 16 bits; a value of one, to 32 bits.

Programs that execute in real mode or virtual-8086 mode have 16-bit addresses and operands by default.

17.1.2 Operand-Size and Address-Size Instruction Prefixes

The internal encoding of an instruction can include two byte-long prefixes: the address-size prefix, 67H, and the operand-size prefix, 66H. (A later section, "Instruction Format," shows the position of the prefixes in an instruction's encoding.) These prefixes override the default segment attributes for the instruction that follows. Table 17-1 shows the effect of each possible combination of defaults and overrides.

17.1.3 Address-Size Attribute for Stack

Instructions that use the stack implicitly (for example, POP EAX) also have a stack address-size attribute of either 16 or 32 bits. Instructions with a stack address-size attribute of 16 use the 16-bit SP stack pointer register; instructions with a stack address-size attribute of 32 bits use the 32-bit ESP register to form the address of the top of the stack.

The stack address-size attribute is controlled by the B-bit of the data-segment descriptor in the SS register. A value of zero in the B-bit selects a stack address-size attribute of 16; a value of one selects a stack address-size attribute of 32.

Table 17-1. Effective Size Attributes

    Segment Default D = ...     0    0    0    0    1    1    1    1
    Operand-Size Prefix 66H     N    N    Y    Y    N    N    Y    Y
    Address-Size Prefix 67H     N    Y    N    Y    N    Y    N    Y

    Effective Operand Size      16   16   32   32   32   32   16   16
    Effective Address Size      16   32   16   32   32   16   32   16

    Y = Yes, this instruction prefix is present
    N = No, this instruction prefix is not present

17.2 Instruction Format

All instruction encodings are subsets of the general instruction format shown in Figure 17-1. Instructions consist of optional instruction prefixes, one or two primary opcode bytes, possibly an address specifier consisting of the ModR/M byte and the SIB (Scale Index Base) byte, a displacement, if required, and an immediate data field, if required.

Smaller encoding fields can be defined within the primary opcode or opcodes. These fields define the direction of the operation, the size of the displacements, the register encoding, or sign extension; encoding fields vary depending on the class of operation.

Most instructions that can refer to an operand in memory have an addressing form byte following the primary opcode byte(s). This byte, called the ModR/M byte, specifies the address form to be used. Certain encodings of the ModR/M byte indicate a second addressing byte, the SIB (Scale Index Base) byte, which follows the ModR/M byte and is required to fully specify the addressing form.

Addressing forms can include a displacement immediately following either the ModR/M or SIB byte. If a displacement is present, it can be 8, 16, or 32 bits.

If the instruction specifies an immediate operand, the immediate operand always follows any displacement bytes. The immediate operand, if specified, is always the last field of the instruction.
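Table 17-1 in section 17.1 can be restated as a single rule: each prefix, when present, switches its attribute to the size that is not the segment default. The Python sketch below encodes that rule; the function name `effective_sizes` and its parameter names are hypothetical and are used here only for illustration.

```python
def effective_sizes(d_bit: int, operand_prefix: bool, address_prefix: bool) -> tuple[int, int]:
    """Return (effective operand size, effective address size) in bits.

    d_bit          -- D-bit of the code-segment descriptor (0 or 1);
                      use 0 for real-address mode and V86 mode.
    operand_prefix -- True if the 66H prefix is present.
    address_prefix -- True if the 67H prefix is present.
    """
    default = 32 if d_bit else 16
    other = 16 if d_bit else 32
    operand_size = other if operand_prefix else default
    address_size = other if address_prefix else default
    return operand_size, address_size
```

For example, `effective_sizes(0, True, False)` yields `(32, 16)`, which matches the third column of the table.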
The following are the allowable instruction prefix codes:

   F3H    REP prefix (used only with string instructions)
   F3H    REPE/REPZ prefix (used only with string instructions)
   F2H    REPNE/REPNZ prefix (used only with string instructions)
   F0H    LOCK prefix

The following are the segment override prefixes:

   2EH    CS segment override prefix
   36H    SS segment override prefix
   3EH    DS segment override prefix
   26H    ES segment override prefix
   64H    FS segment override prefix
   65H    GS segment override prefix

The following are the size override prefixes:

   66H    Operand-size override
   67H    Address-size override

Figure 17-1. 80386 Instruction Format

╔═══════════════╦═══════════════╦═══════════════╦═══════════════╗
║  INSTRUCTION  ║   ADDRESS-    ║    OPERAND-   ║   SEGMENT     ║
║    PREFIX     ║  SIZE PREFIX  ║  SIZE PREFIX  ║   OVERRIDE    ║
╠═══════════════╩═══════════════╩═══════════════╩═══════════════╣
║     0 OR 1         0 OR 1           0 OR 1         0 OR 1     ║
╟─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─╢
║                        NUMBER OF BYTES                        ║
╚═══════════════════════════════════════════════════════════════╝

╔══════════╦═══════════╦═══════╦══════════════════╦═════════════╗
║  OPCODE  ║  MODR/M   ║  SIB  ║   DISPLACEMENT   ║  IMMEDIATE  ║
║          ║           ║       ║                  ║             ║
╠══════════╩═══════════╩═══════╩══════════════════╩═════════════╣
║  1 OR 2     0 OR 1    0 OR 1     0,1,2 OR 4      0,1,2 OR 4   ║
╟─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─╢
║                        NUMBER OF BYTES                        ║
╚═══════════════════════════════════════════════════════════════╝

17.2.1 ModR/M and SIB Bytes

The ModR/M and SIB bytes follow the opcode byte(s) in many of the 80386 instructions.
They contain the following information:

   ■  The indexing type or register number to be used in the instruction
   ■  The register to be used, or more information to select the instruction
   ■  The base, index, and scale information

The ModR/M byte contains three fields of information:

   ■  The mod field, which occupies the two most significant bits of the byte, combines with the r/m field to form 32 possible values: eight registers and 24 indexing modes.
   ■  The reg field, which occupies the next three bits following the mod field, specifies either a register number or three more bits of opcode information. The meaning of the reg field is determined by the first (opcode) byte of the instruction.
   ■  The r/m field, which occupies the three least significant bits of the byte, can specify a register as the location of an operand, or can form part of the addressing-mode encoding in combination with the mod field as described above.

The based indexed and scaled indexed forms of 32-bit addressing require the SIB byte. The presence of the SIB byte is indicated by certain encodings of the ModR/M byte. The SIB byte then includes the following fields:

   ■  The ss field, which occupies the two most significant bits of the byte, specifies the scale factor.
   ■  The index field, which occupies the next three bits following the ss field, specifies the register number of the index register.
   ■  The base field, which occupies the three least significant bits of the byte, specifies the register number of the base register.

Figure 17-2 shows the formats of the ModR/M and SIB bytes.

The values and the corresponding addressing forms of the ModR/M and SIB bytes are shown in Tables 17-2, 17-3, and 17-4. The 16-bit addressing forms specified by the ModR/M byte are in Table 17-2. The 32-bit addressing forms specified by ModR/M are in Table 17-3. Table 17-4 shows the 32-bit addressing forms specified by the SIB byte.

Figure 17-2. ModR/M and SIB Byte Formats

             MODR/M BYTE

   7  6     5   4   3     2   1   0
╔════════╦═════════════╦═════════════╗
║  MOD   ║ REG/OPCODE  ║     R/M     ║
╚════════╩═════════════╩═════════════╝

     SIB (SCALE INDEX BASE) BYTE

   7  6     5   4   3     2   1   0
╔════════╦═════════════╦═════════════╗
║   SS   ║    INDEX    ║    BASE     ║
╚════════╩═════════════╩═════════════╝

Table 17-2. 16-Bit Addressing Forms with the ModR/M Byte

r8(/r)                        AL    CL    DL    BL    AH    CH    DH    BH
r16(/r)                       AX    CX    DX    BX    SP    BP    SI    DI
r32(/r)                       EAX   ECX   EDX   EBX   ESP   EBP   ESI   EDI
/digit (Opcode)               0     1     2     3     4     5     6     7
REG =                         000   001   010   011   100   101   110   111

Effective Address  Mod  R/M   ModR/M Values in Hexadecimal

[BX + SI]          00   000   00    08    10    18    20    28    30    38
[BX + DI]          00   001   01    09    11    19    21    29    31    39
[BP + SI]          00   010   02    0A    12    1A    22    2A    32    3A
[BP + DI]          00   011   03    0B    13    1B    23    2B    33    3B
[SI]               00   100   04    0C    14    1C    24    2C    34    3C
[DI]               00   101   05    0D    15    1D    25    2D    35    3D
disp16             00   110   06    0E    16    1E    26    2E    36    3E
[BX]               00   111   07    0F    17    1F    27    2F    37    3F

[BX+SI]+disp8      01   000   40    48    50    58    60    68    70    78
[BX+DI]+disp8      01   001   41    49    51    59    61    69    71    79
[BP+SI]+disp8      01   010   42    4A    52    5A    62    6A    72    7A
[BP+DI]+disp8      01   011   43    4B    53    5B    63    6B    73    7B
[SI]+disp8         01   100   44    4C    54    5C    64    6C    74    7C
[DI]+disp8         01   101   45    4D    55    5D    65    6D    75    7D
[BP]+disp8         01   110   46    4E    56    5E    66    6E    76    7E
[BX]+disp8         01   111   47    4F    57    5F    67    6F    77    7F

[BX+SI]+disp16     10   000   80    88    90    98    A0    A8    B0    B8
[BX+DI]+disp16     10   001   81    89    91    99    A1    A9    B1    B9
[BP+SI]+disp16     10   010   82    8A    92    9A    A2    AA    B2    BA
[BP+DI]+disp16     10   011   83    8B    93    9B    A3    AB    B3    BB
[SI]+disp16        10   100   84    8C    94    9C    A4    AC    B4    BC
[DI]+disp16        10   101   85    8D    95    9D    A5    AD    B5    BD
[BP]+disp16        10   110   86    8E    96    9E    A6    AE    B6    BE
[BX]+disp16        10   111   87    8F    97    9F    A7    AF    B7    BF

EAX/AX/AL          11   000   C0    C8    D0    D8    E0    E8    F0    F8
ECX/CX/CL          11   001   C1    C9    D1    D9    E1    E9    F1    F9
EDX/DX/DL          11   010   C2    CA    D2    DA    E2    EA    F2    FA
EBX/BX/BL          11   011   C3    CB    D3    DB    E3    EB    F3    FB
ESP/SP/AH          11   100   C4    CC    D4    DC    E4    EC    F4    FC
EBP/BP/CH          11   101   C5    CD    D5    DD    E5    ED    F5    FD
ESI/SI/DH          11   110   C6    CE    D6    DE    E6    EE    F6    FE
EDI/DI/BH          11   111   C7    CF    D7    DF    E7    EF    F7    FF

NOTES:
disp8 denotes an 8-bit displacement following the ModR/M byte, to be sign-extended and added to the index.
disp16 denotes a 16-bit displacement following the ModR/M byte, to be added to the index.
Default segment register is SS for the effective addresses containing a BP index, DS for other effective addresses.

Table 17-3. 32-Bit Addressing Forms with the ModR/M Byte

r8(/r)                        AL    CL    DL    BL    AH    CH    DH    BH
r16(/r)                       AX    CX    DX    BX    SP    BP    SI    DI
r32(/r)                       EAX   ECX   EDX   EBX   ESP   EBP   ESI   EDI
/digit (Opcode)               0     1     2     3     4     5     6     7
REG =                         000   001   010   011   100   101   110   111

Effective Address  Mod  R/M   ModR/M Values in Hexadecimal

[EAX]              00   000   00    08    10    18    20    28    30    38
[ECX]              00   001   01    09    11    19    21    29    31    39
[EDX]              00   010   02    0A    12    1A    22    2A    32    3A
[EBX]              00   011   03    0B    13    1B    23    2B    33    3B
[--] [--]          00   100   04    0C    14    1C    24    2C    34    3C
disp32             00   101   05    0D    15    1D    25    2D    35    3D
[ESI]              00   110   06    0E    16    1E    26    2E    36    3E
[EDI]              00   111   07    0F    17    1F    27    2F    37    3F

disp8[EAX]         01   000   40    48    50    58    60    68    70    78
disp8[ECX]         01   001   41    49    51    59    61    69    71    79
disp8[EDX]         01   010   42    4A    52    5A    62    6A    72    7A
disp8[EBX]         01   011   43    4B    53    5B    63    6B    73    7B
disp8[--] [--]     01   100   44    4C    54    5C    64    6C    74    7C
disp8[EBP]         01   101   45    4D    55    5D    65    6D    75    7D
disp8[ESI]         01   110   46    4E    56    5E    66    6E    76    7E
disp8[EDI]         01   111   47    4F    57    5F    67    6F    77    7F

disp32[EAX]        10   000   80    88    90    98    A0    A8    B0    B8
disp32[ECX]        10   001   81    89    91    99    A1    A9    B1    B9
disp32[EDX]        10   010   82    8A    92    9A    A2    AA    B2    BA
disp32[EBX]        10   011   83    8B    93    9B    A3    AB    B3    BB
disp32[--] [--]    10   100   84    8C    94    9C    A4    AC    B4    BC
disp32[EBP]        10   101   85    8D    95    9D    A5    AD    B5    BD
disp32[ESI]        10   110   86    8E    96    9E    A6    AE    B6    BE
disp32[EDI]        10   111   87    8F    97    9F    A7    AF    B7    BF

EAX/AX/AL          11   000   C0    C8    D0    D8    E0    E8    F0    F8
ECX/CX/CL          11   001   C1    C9    D1    D9    E1    E9    F1    F9
EDX/DX/DL          11   010   C2    CA    D2    DA    E2    EA    F2    FA
EBX/BX/BL          11   011   C3    CB    D3    DB    E3    EB    F3    FB
ESP/SP/AH          11   100   C4    CC    D4    DC    E4    EC    F4    FC
EBP/BP/CH          11   101   C5    CD    D5    DD    E5    ED    F5    FD
ESI/SI/DH          11   110   C6    CE    D6    DE    E6    EE    F6    FE
EDI/DI/BH          11   111   C7    CF    D7    DF    E7    EF    F7    FF

NOTES:
[--] [--] means a SIB follows the ModR/M byte.
disp8 denotes an 8-bit displacement following the SIB byte, to be sign-extended and added to the index.
disp32 denotes a 32-bit displacement following the ModR/M byte, to be added to the index.

Table 17-4. 32-Bit Addressing Forms with the SIB Byte

r32                           EAX   ECX   EDX   EBX   ESP   [*]   ESI   EDI
Base =                        0     1     2     3     4     5     6     7
Base =                        000   001   010   011   100   101   110   111

Scaled Index       SS   Index SIB Values in Hexadecimal

[EAX]              00   000   00    01    02    03    04    05    06    07
[ECX]              00   001   08    09    0A    0B    0C    0D    0E    0F
[EDX]              00   010   10    11    12    13    14    15    16    17
[EBX]              00   011   18    19    1A    1B    1C    1D    1E    1F
none               00   100   20    21    22    23    24    25    26    27
[EBP]              00   101   28    29    2A    2B    2C    2D    2E    2F
[ESI]              00   110   30    31    32    33    34    35    36    37
[EDI]              00   111   38    39    3A    3B    3C    3D    3E    3F

[EAX*2]            01   000   40    41    42    43    44    45    46    47
[ECX*2]            01   001   48    49    4A    4B    4C    4D    4E    4F
[EDX*2]            01   010   50    51    52    53    54    55    56    57
[EBX*2]            01   011   58    59    5A    5B    5C    5D    5E    5F
none               01   100   60    61    62    63    64    65    66    67
[EBP*2]            01   101   68    69    6A    6B    6C    6D    6E    6F
[ESI*2]            01   110   70    71    72    73    74    75    76    77
[EDI*2]            01   111   78    79    7A    7B    7C    7D    7E    7F

[EAX*4]            10   000   80    81    82    83    84    85    86    87
[ECX*4]            10   001   88    89    8A    8B    8C    8D    8E    8F
[EDX*4]            10   010   90    91    92    93    94    95    96    97
[EBX*4]            10   011   98    99    9A    9B    9C    9D    9E    9F
none               10   100   A0    A1    A2    A3    A4    A5    A6    A7
[EBP*4]            10   101   A8    A9    AA    AB    AC    AD    AE    AF
[ESI*4]            10   110   B0    B1    B2    B3    B4    B5    B6    B7
[EDI*4]            10   111   B8    B9    BA    BB    BC    BD    BE    BF

[EAX*8]            11   000   C0    C1    C2    C3    C4    C5    C6    C7
[ECX*8]            11   001   C8    C9    CA    CB    CC    CD    CE    CF
[EDX*8]            11   010   D0    D1    D2    D3    D4    D5    D6    D7
[EBX*8]            11   011   D8    D9    DA    DB    DC    DD    DE    DF
none               11   100   E0    E1    E2    E3    E4    E5    E6    E7
[EBP*8]            11   101   E8    E9    EA    EB    EC    ED    EE    EF
[ESI*8]            11   110   F0    F1    F2    F3    F4    F5    F6    F7
[EDI*8]            11   111   F8    F9    FA    FB    FC    FD    FE    FF

NOTES:
[*] means a disp32 with no base if MOD is 00, [EBP] otherwise. This provides the following addressing modes:

   disp32[index]        (MOD=00)
   disp8[EBP][index]    (MOD=01)
   disp32[EBP][index]   (MOD=10)

17.2.2 How to Read the Instruction Set Pages

The following is an example of the format used for each 80386 instruction description in this chapter:

CMC -- Complement Carry Flag

Opcode   Instruction   Clocks   Description

F5       CMC           2        Complement carry flag

The above table is followed by paragraphs labelled "Operation," "Description," "Flags Affected," "Protected Mode Exceptions," "Real Address Mode Exceptions," and, optionally, "Notes." The following sections explain the notational conventions and abbreviations used in these paragraphs of the instruction descriptions.

17.2.2.1 Opcode

The "Opcode" column gives the complete object code produced for each form of the instruction. When possible, the codes are given as hexadecimal bytes, in the same order in which they appear in memory. Definitions of entries other than hexadecimal bytes are as follows:

/digit: (digit is between 0 and 7) indicates that the ModR/M byte of the instruction uses only the r/m (register or memory) operand. The reg field contains the digit that provides an extension to the instruction's opcode.

/r: indicates that the ModR/M byte of the instruction contains both a register operand and an r/m operand.

cb, cw, cd, cp: a 1-byte (cb), 2-byte (cw), 4-byte (cd) or 6-byte (cp) value following the opcode that is used to specify a code offset and possibly a new value for the code segment register.

ib, iw, id: a 1-byte (ib), 2-byte (iw), or 4-byte (id) immediate operand to the instruction that follows the opcode, ModR/M bytes or scale-indexing bytes. The opcode determines if the operand is a signed value. All words and doublewords are given with the low-order byte first.

+rb, +rw, +rd: a register code, from 0 through 7, added to the hexadecimal byte given at the left of the plus sign to form a single opcode byte.
The codes are:

   rb         rw         rd
   AL = 0     AX = 0     EAX = 0
   CL = 1     CX = 1     ECX = 1
   DL = 2     DX = 2     EDX = 2
   BL = 3     BX = 3     EBX = 3
   AH = 4     SP = 4     ESP = 4
   CH = 5     BP = 5     EBP = 5
   DH = 6     SI = 6     ESI = 6
   BH = 7     DI = 7     EDI = 7

17.2.2.2 Instruction

The "Instruction" column gives the syntax of the instruction statement as it would appear in an ASM386 program. The following is a list of the symbols used to represent operands in the instruction statements:

rel8: a relative address in the range from 128 bytes before the end of the instruction to 127 bytes after the end of the instruction.

rel16, rel32: a relative address within the same code segment as the instruction assembled. rel16 applies to instructions with an operand-size attribute of 16 bits; rel32 applies to instructions with an operand-size attribute of 32 bits.

ptr16:16, ptr16:32: a FAR pointer, typically in a code segment different from that of the instruction. The notation 16:16 indicates that the value of the pointer has two parts. The value to the left of the colon is a 16-bit selector or value destined for the code segment register. The value to the right corresponds to the offset within the destination segment. ptr16:16 is used when the instruction's operand-size attribute is 16 bits; ptr16:32 is used with the 32-bit attribute.

r8: one of the byte registers AL, CL, DL, BL, AH, CH, DH, or BH.

r16: one of the word registers AX, CX, DX, BX, SP, BP, SI, or DI.

r32: one of the doubleword registers EAX, ECX, EDX, EBX, ESP, EBP, ESI, or EDI.

imm8: an immediate byte value. imm8 is a signed number between -128 and +127 inclusive. For instructions in which imm8 is combined with a word or doubleword operand, the immediate value is sign-extended to form a word or doubleword. The upper byte of the word is filled with the topmost bit of the immediate value.

imm16: an immediate word value used for instructions whose operand-size attribute is 16 bits. This is a number between -32768 and +32767 inclusive.

imm32: an immediate doubleword value used for instructions whose operand-size attribute is 32 bits. This is a number between -2147483648 and +2147483647 inclusive.

r/m8: a one-byte operand that is either the contents of a byte register (AL, BL, CL, DL, AH, BH, CH, DH), or a byte from memory.

r/m16: a word register or memory operand used for instructions whose operand-size attribute is 16 bits. The word registers are: AX, BX, CX, DX, SP, BP, SI, DI. The contents of memory are found at the address provided by the effective address computation.

r/m32: a doubleword register or memory operand used for instructions whose operand-size attribute is 32 bits. The doubleword registers are: EAX, EBX, ECX, EDX, ESP, EBP, ESI, EDI. The contents of memory are found at the address provided by the effective address computation.

m8: a memory byte addressed by DS:SI or ES:DI (used only by string instructions).

m16: a memory word addressed by DS:SI or ES:DI (used only by string instructions).

m32: a memory doubleword addressed by DS:SI or ES:DI (used only by string instructions).

m16:16, m16:32: a memory operand containing a far pointer composed of two numbers. The number to the left of the colon corresponds to the pointer's segment selector. The number to the right corresponds to its offset.

m16 & 32, m16 & 16, m32 & 32: a memory operand consisting of data item pairs whose sizes are indicated on the left and the right side of the ampersand. All memory addressing modes are allowed. m16 & 16 and m32 & 32 operands are used by the BOUND instruction to provide an operand containing an upper and lower bounds for array indices. m16 & 32 is used by LIDT and LGDT to provide a word with which to load the limit field, and a doubleword with which to load the base field of the corresponding Global and Interrupt Descriptor Table Registers.

moffs8, moffs16, moffs32: (memory offset) a simple memory variable of type BYTE, WORD, or DWORD used by some variants of the MOV instruction. The actual address is given by a simple offset relative to the segment base. No ModR/M byte is used in the instruction. The number shown with moffs indicates its size, which is determined by the address-size attribute of the instruction.

Sreg: a segment register. The segment register bit assignments are ES=0, CS=1, SS=2, DS=3, FS=4, and GS=5.

17.2.2.3 Clocks

The "Clocks" column gives the number of clock cycles the instruction takes to execute. The clock count calculations make the following assumptions:

   ■  The instruction has been prefetched and decoded and is ready for execution.
   ■  Bus cycles do not require wait states.
   ■  There are no local bus HOLD requests delaying processor access to the bus.
   ■  No exceptions are detected during instruction execution.
   ■  Memory operands are aligned.

Clock counts for instructions that have an r/m (register or memory) operand are separated by a slash. The count to the left is used for a register operand; the count to the right is used for a memory operand.

The following symbols are used in the clock count specifications:

   ■  n, which represents a number of repetitions.
   ■  m, which represents the number of components in the next instruction executed, where the entire displacement (if any) counts as one component, the entire immediate data (if any) counts as one component, and every other byte of the instruction and prefix(es) each counts as one component.
   ■  pm=, a clock count that applies when the instruction executes in Protected Mode. pm= is not given when the clock counts are the same for Protected and Real Address Modes.

When an exception occurs during the execution of an instruction and the exception handler is in another task, the instruction execution time is increased by the number of clocks to effect a task switch. This parameter depends on several factors:

   ■  The type of TSS used to represent the current task (386 TSS or 286 TSS).
   ■  The type of TSS used to represent the new task.
   ■  Whether the current task is in V86 mode.
   ■  Whether the new task is in V86 mode.

Table 17-5 summarizes the task switch times for exceptions.

Table 17-5. Task Switch Times for Exceptions

                               New Task
Old Task             386 TSS (VM = 0)    286 TSS

386 TSS (VM = 0)           309             282
386 TSS (VM = 1)           314             231
286 TSS                    307             282

17.2.2.4 Description

The "Description" column following the "Clocks" column briefly explains the various forms of the instruction. The "Operation" and "Description" sections contain more details of the instruction's operation.

17.2.2.5 Operation

The "Operation" section contains an algorithmic description of the instruction which uses a notation similar to the Algol or Pascal language. The algorithms are composed of the following elements:

Comments are enclosed within the symbol pairs "(*" and "*)".

Compound statements are enclosed between the keywords of the "if" statement (IF, THEN, ELSE, FI) or of the "do" statement (DO, OD), or of the "case" statement (CASE ... OF, ESAC).

A register name implies the contents of the register. A register name enclosed in brackets implies the contents of the location whose address is contained in that register. For example, ES:[DI] indicates the contents of the location whose ES segment relative address is in register DI. [SI] indicates the contents of the address contained in register SI relative to SI's default segment (DS) or overridden segment.

Brackets are also used for memory operands, where they mean that the contents of the memory location is a segment-relative offset. For example, [SRC] indicates that the contents of the source operand is a segment-relative offset.

A ← B; indicates that the value of B is assigned to A.

The symbols =, <>, ≥, and ≤ are relational operators used to compare two values, meaning equal, not equal, greater or equal, less or equal, respectively. A relational expression such as A = B is TRUE if the value of A is equal to B; otherwise it is FALSE.
The following identifiers are used in the algorithmic descriptions:

   ■  OperandSize represents the operand-size attribute of the instruction, which is either 16 or 32 bits. AddressSize represents the address-size attribute, which is either 16 or 32 bits. For example,

         IF instruction = CMPSW
         THEN OperandSize ← 16;
         ELSE
            IF instruction = CMPSD
            THEN OperandSize ← 32;
            FI;
         FI;

      indicates that the operand-size attribute depends on the form of the CMPS instruction used. Refer to the explanation of address-size and operand-size attributes at the beginning of this chapter for general guidelines on how these attributes are determined.
   ■  StackAddrSize represents the stack address-size attribute associated with the instruction, which has a value of 16 or 32 bits, as explained earlier in the chapter.
   ■  SRC represents the source operand. When there are two operands, SRC is the one on the right.
   ■  DEST represents the destination operand. When there are two operands, DEST is the one on the left.
   ■  LeftSRC, RightSRC distinguish between two operands when both are source operands.
   ■  eSP represents either the SP register or the ESP register depending on the setting of the B-bit for the current stack segment.

The following functions are used in the algorithmic descriptions:

   ■  Truncate to 16 bits(value) reduces the size of the value to fit in 16 bits by discarding the uppermost bits as needed.

   ■  Addr(operand) returns the effective address of the operand (the result of the effective address calculation prior to adding the segment base).

   ■  ZeroExtend(value) returns a value zero-extended to the operand-size attribute of the instruction. For example, if OperandSize = 32, ZeroExtend of a byte value of -10 converts the byte from F6H to a doubleword with hexadecimal value 000000F6H. If the value passed to ZeroExtend and the operand-size attribute are the same size, ZeroExtend returns the value unaltered.

   ■  SignExtend(value) returns a value sign-extended to the operand-size attribute of the instruction. For example, if OperandSize = 32, SignExtend of a byte containing the value -10 converts the byte from F6H to a doubleword with hexadecimal value FFFFFFF6H. If the value passed to SignExtend and the operand-size attribute are the same size, SignExtend returns the value unaltered.

   ■  Push(value) pushes a value onto the stack. The number of bytes pushed is determined by the operand-size attribute of the instruction. The action of Push is as follows:

         IF StackAddrSize = 16
         THEN
            IF OperandSize = 16
            THEN
               SP ← SP - 2;
               SS:[SP] ← value; (* 2 bytes assigned starting at byte address in SP *)
            ELSE (* OperandSize = 32 *)
               SP ← SP - 4;
               SS:[SP] ← value; (* 4 bytes assigned starting at byte address in SP *)
            FI;
         ELSE (* StackAddrSize = 32 *)
            IF OperandSize = 16
            THEN
               ESP ← ESP - 2;
               SS:[ESP] ← value; (* 2 bytes assigned starting at byte address in ESP *)
            ELSE (* OperandSize = 32 *)
               ESP ← ESP - 4;
               SS:[ESP] ← value; (* 4 bytes assigned starting at byte address in ESP *)
            FI;
         FI;

   ■  Pop(value) removes the value from the top of the stack and returns it. The statement EAX ← Pop( ); assigns to EAX the 32-bit value that Pop took from the top of the stack. Pop will return either a word or a doubleword depending on the operand-size attribute. The action of Pop is as follows:

         IF StackAddrSize = 16
         THEN
            IF OperandSize = 16
            THEN
               ret val ← SS:[SP]; (* 2-byte value *)
               SP ← SP + 2;
            ELSE (* OperandSize = 32 *)
               ret val ← SS:[SP]; (* 4-byte value *)
               SP ← SP + 4;
            FI;
         ELSE (* StackAddrSize = 32 *)
            IF OperandSize = 16
            THEN
               ret val ← SS:[ESP]; (* 2-byte value *)
               ESP ← ESP + 2;
            ELSE (* OperandSize = 32 *)
               ret val ← SS:[ESP]; (* 4-byte value *)
               ESP ← ESP + 4;
            FI;
         FI;
         RETURN(ret val); (* returns a word or doubleword *)

   ■  Bit[BitBase, BitOffset] returns the address of a bit within a bit string, which is a sequence of bits in memory or a register. Bits are numbered from low-order to high-order within registers and within memory bytes. In memory, the two bytes of a word are stored with the low-order byte at the lower address.

      If the base operand is a register, the offset can be in the range 0..31. This offset addresses a bit within the indicated register. An example, "BIT[EAX, 21]," is illustrated in Figure 17-3.

      If BitBase is a memory address, BitOffset can range from -2 gigabits to 2 gigabits. The addressed bit is numbered (BitOffset MOD 8) within the byte at address (BitBase + (BitOffset DIV 8)), where DIV is signed division with rounding towards negative infinity, and MOD returns a positive number. This is illustrated in Figure 17-4.

   ■  I-O-Permission(I-O-Address, width) returns TRUE or FALSE depending on the I/O permission bitmap and other factors. This function is defined as follows:

         IF TSS type is 286 THEN RETURN FALSE; FI;
         Ptr ← [TSS + 66]; (* fetch bitmap pointer *)
         BitStringAddr ← SHR (I-O-Address, 3) + Ptr;
         MaskShift ← I-O-Address AND 7;
         CASE width OF:
            BYTE: nBitMask ← 1;
            WORD: nBitMask ← 3;
            DWORD: nBitMask ← 15;
         ESAC;
         mask ← SHL (nBitMask, MaskShift);
         CheckString ← [BitStringAddr] AND mask;
         IF CheckString = 0
         THEN RETURN (TRUE);
         ELSE RETURN (FALSE);
         FI;

   ■  Switch-Tasks is the task switching function described in Chapter 7.

17.2.2.6 Description

The "Description" section contains further explanation of the instruction's operation.

Figure 17-3. Bit Offset for BIT[EAX, 21]

 31                   21                                               0
╔═════════════════════╦═╦═══════════════════════════════════════════════╗
║                     ║ ║                                               ║
╚═════════════════════╩═╩═══════════════════════════════════════════════╝
                       └─────────────────BITOFFSET = 21────────────────┘

Figure 17-4. Memory Bit Indexing

BIT INDEXING (POSITIVE OFFSET)

  7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
╔════╤═╤═════════╤═══════════════╤════════════════╗
║    │ │         │               │                ║
╚════╧═╧═════════╧═══════════════╪════════════════╝
      │ BITBASE + 1    BITBASE   │  BITBASE - 1
      └────────OFFSET = 13───────┘

BIT INDEXING (NEGATIVE OFFSET)

  7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
╔════════════════╤═══════════════╤═══╤═╤══════════╗
║                │               │   │ │          ║
╚════════════════╪═══════════════╧═══╧═╧══════════╝
     BITBASE     │  BITBASE - 1       │ BITBASE - 2
                 └────OFFSET = -11────┘

17.2.2.7 Flags Affected

The "Flags Affected" section lists the flags that are affected by the instruction, as follows:

   ■  If a flag is always cleared or always set by the instruction, the value is given (0 or 1) after the flag name. Arithmetic and logical instructions usually assign values to the status flags in the uniform manner described in Appendix C. Nonconventional assignments are described in the "Operation" section.
   ■  The values of flags listed as "undefined" may be changed by the instruction in an indeterminate manner.

All flags not listed are unchanged by the instruction.

17.2.2.8 Protected Mode Exceptions

This section lists the exceptions that can occur when the instruction is executed in 80386 Protected Mode. The exception names are a pound sign (#) followed by two letters and an optional error code in parentheses. For example, #GP(0) denotes a general protection exception with an error code of 0. Table 17-6 associates each two-letter name with the corresponding interrupt number.

Chapter 9 describes the exceptions and the 80386 state upon entry to the exception.

Application programmers should consult the documentation provided with their operating systems to determine the actions taken when exceptions occur.

Table 17-6. 80386 Exceptions

Mnemonic   Interrupt   Description

17.2.2.9 Real Address Mode Exceptions

Because less error checking is performed by the 80386 in Real Address Mode, this mode has fewer exception conditions. Refer to Chapter 14 for further information on these exceptions.
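As a closing illustration of the notation in section 17.2.2.5, the memory form of Bit[BitBase, BitOffset] can be checked against the two cases drawn in Figure 17-4. The Python sketch below is an assumption-laden example, not part of the 80386 documentation: the function name `locate_bit` and the base address 1000H are hypothetical. It relies on Python's `//` and `%` operators, which round toward negative infinity and give a non-negative remainder for a positive divisor, so they behave like the DIV and MOD defined in the text.

```python
def locate_bit(bit_base, bit_offset):
    """Return (byte_address, bit_number) for a bit in a memory bit string.

    bit_base   -- address of the byte that holds bit offset 0
    bit_offset -- signed bit offset relative to bit 0 of that byte
    """
    byte_address = bit_base + (bit_offset // 8)  # DIV, rounding toward -infinity
    bit_number = bit_offset % 8                  # MOD, always in the range 0..7
    return byte_address, bit_number


BASE = 0x1000  # hypothetical BitBase chosen only for this example

# Positive offset from Figure 17-4: bit 5 of the byte at BITBASE + 1.
print(locate_bit(BASE, 13))

# Negative offset from Figure 17-4: bit 5 of the byte at BITBASE - 2.
print(locate_bit(BASE, -11))
```

Languages whose integer division truncates toward zero would place the negative-offset case in the wrong byte, which is why the text spells out the rounding direction.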