Introduction to Protected Mode Programming
A Brief History of the 80x86
Back in 1971 Intel was approached by a (now defunct) Japanese corporation
to build a custom circuit for a new calculator. Intel designer Ted Hoff
proposed that a programmable, general-purpose computing circuit be built
instead, and the 4004 was born. The 4040 and the 8008 chips soon followed,
but they lacked many characteristics of microprocessors as we know them today.
In 1974 Intel introduced the 8080, which was used in such systems as the
Altair and the IMSAI. Soon after that Motorola introduced the 6800 and
MOS Technology came out with the 6502. Two of the 8080 designers
left Intel for Zilog Corporation, which came out with the Z80 (which
was compatible with the 8080, but was twice as fast and had an expanded
instruction set).
The 8080 was an 8-bit machine. It had a single accumulator (the A register)
and six secondary registers (B, C, D, E, H, and L). These six registers
could be used in 8-bit arithmetic operations or combined as pairs (BC, DE,
or HL) to hold 16-bit memory addresses. A 16-bit address can access
only 64KB of memory (2^16 bytes). In 1978 Intel moved to a
16-bit architecture with the 8086. Unfortunately, 8080 programs wouldn't
run unmodified on the 8086 (we miss the 8080's RC (return on carry) instruction).
However, every new generation of processor since then has been able to run
software written for the previous generation.
The 8086 introduced segmentation to the microprocessor world. A
segment is a block of memory beginning at a fixed address that is determined
by the value in the appropriate segment register. This was probably
the most despised feature of the 8086 because of the restrictions it
imposed: each segment was only 64K in length. However, segmentation let
the chip address 1MB of memory with only 16-bit registers: the segment
value is multiplied by 16 (shifted left four bits) and added to a 16-bit
offset, producing a 20-bit address. The 8086 provides four segment
registers that can point to any 16-byte boundary (paragraph) in the 1MB
address space. Those four are:
- CS - The code segment register. All calls and jumps refer to locations
within the code segment.
- DS - The data segment register. Most memory reference instructions
refer to an offset within the data segment.
- SS - The stack segment register. All PUSH and POP instructions access
data in the stack segment. Additionally, Borland Pascal's
default segment register is SS.
- ES - The extra segment register. This segment specifies the destination
segment in certain string and memory move instructions (e.g., STOSB, MOVSB,
STOSW, MOVSW, SCASB).
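The real-mode address arithmetic behind segmentation can be sketched in C. This is a minimal illustration, assuming the standard 8086 rule (physical address = segment * 16 + offset); the function names `rm_linear`, `rm_linear_8086`, and `rm_normalize` are hypothetical, chosen only for this example. It also shows why many segment:offset pairs name the same byte, and why FFFF:0010 wraps to address 0 on an 8086, which has only 20 address lines.

```c
#include <assert.h>
#include <stdint.h>

/* Physical address of segment:offset in real mode: segment * 16 + offset.
   The result can reach 10FFEFh, just over 1MB. */
uint32_t rm_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

/* An 8086 has only 20 address lines, so addresses wrap at 1MB. */
uint32_t rm_linear_8086(uint16_t segment, uint16_t offset)
{
    return rm_linear(segment, offset) & 0xFFFFFu;
}

/* Rewrite a far pointer so its offset is 0..15 and its segment is as
   large as possible (the "normalized" form); both forms name the same byte. */
void rm_normalize(uint16_t *segment, uint16_t *offset)
{
    uint32_t linear = rm_linear(*segment, *offset);
    assert(linear <= 0xFFFFFu);  /* normalized form needs a 16-bit segment */
    *segment = (uint16_t)(linear >> 4);
    *offset  = (uint16_t)(linear & 0xFu);
}
```

For example, `rm_linear(0xA000, 0x0000)` is 0xA0000, and `rm_linear(0x1234, 0x0010)` equals `rm_linear(0x1235, 0x0000)`: two different segment:offset pairs for one physical byte.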
In 1982 Intel introduced the 80286. The 286 supported two modes:
real mode (RM) and protected mode (PM). Real mode, which emulates the 8086,
is the default mode. In protected mode the 286 placed a new interpretation
on the contents of the segment registers, changing how memory is accessed.
PM made up to 16MB of physical memory addressable (24 address lines),
instead of real mode's 1MB. Due to the lack of support for protected mode
(and because it was a real pain to program on the 286), many programs
didn't take advantage of PM.
With the advent of the 80386 chip, most of the shortcomings of the previous
processors were fixed. It was a true 32-bit processor, with 32-bit addressing.
However, in order to maintain compatibility, the 386+ processors boot up
in real mode, default to 16-bit operands and the 16-bit segmentation scheme,
and are subject to the 1MB memory limitation. But the 386 can also be switched
into protected mode. In PM, each segment descriptor contains a bit (the D bit)
that designates whether the segment holds 16-bit 80286-style code or 32-bit
code.
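On the 386, the 16-bit/32-bit choice lives in each segment's 8-byte descriptor, next to the base, the limit, and the granularity (G) bit. The sketch below decodes a descriptor under the standard 386 layout; the `Descriptor` struct and `decode_descriptor` function are hypothetical names for illustration. A 286 descriptor leaves its last two bytes zero, so it decodes as a 16-bit segment with a 24-bit base.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t base;      /* 32-bit linear base address              */
    uint32_t limit;     /* effective limit in bytes (inclusive)    */
    uint8_t  access;    /* P, DPL, S and type bits                 */
    int      is_32bit;  /* D/B flag: 1 = 32-bit, 0 = 16-bit        */
} Descriptor;

/* Decode an 8-byte 386 segment descriptor given in memory order. */
Descriptor decode_descriptor(const uint8_t d[8])
{
    Descriptor out;
    uint32_t raw_limit;

    assert(d != NULL);
    /* limit bits 15..0 in bytes 0-1, bits 19..16 in the low nibble of byte 6 */
    raw_limit = (uint32_t)d[0] | ((uint32_t)d[1] << 8)
              | ((uint32_t)(d[6] & 0x0F) << 16);
    /* base bits 23..0 in bytes 2-4, bits 31..24 in byte 7 */
    out.base = (uint32_t)d[2] | ((uint32_t)d[3] << 8)
             | ((uint32_t)d[4] << 16) | ((uint32_t)d[7] << 24);
    out.access = d[5];
    out.is_32bit = (d[6] & 0x40) != 0;          /* D/B flag */
    if (d[6] & 0x80)                             /* G flag: limit in 4KB units */
        out.limit = (raw_limit << 12) | 0xFFFu;
    else
        out.limit = raw_limit;
    return out;
}
```

A typical flat 32-bit code descriptor, bytes FF FF 00 00 00 9A CF 00, decodes to base 0, limit 0FFFFFFFFh, and `is_32bit` set.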
Addressing Differences
16-bit Real Mode
16-bit real mode pointers can be either 16 or 32 bits (near or far). When
coding in pure assembly, memory is allocated in paragraphs (16 bytes at a
time). Because segments fall on paragraph boundaries, it is enough to return
a segment value. Offsets simply start at 0 into the segment (e.g., if ES is
the allocated memory segment, ES:[0] is the first byte of the memory). Since
there is no memory protection, the only way to generate a protection fault
is to use a 32-bit offset beyond the 64K segment limit (possible on a 386 or
later). High-level languages use far pointers made of a 16-bit segment and
a 16-bit offset. 1 megabyte is addressable, but all the memory above 640K is
taken up by the system (screen memory, BIOS area, etc).
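Paragraph-based allocation and real-mode far pointers can be illustrated with two small helpers. This is a sketch, assuming far pointers are stored the usual x86 way (offset in the low word, segment in the high word); `bytes_to_paragraphs` and `make_far_ptr` are hypothetical names. The assertion assumes the paragraph count is passed in a 16-bit register, so at most 0FFFFh paragraphs can be requested at once.

```c
#include <assert.h>
#include <stdint.h>

/* Round a byte count up to whole 16-byte paragraphs. The count must fit
   in 16 bits, so at most 0FFFFh paragraphs (0FFFF0h bytes). */
uint16_t bytes_to_paragraphs(uint32_t bytes)
{
    assert(bytes <= 0xFFFF0u);
    return (uint16_t)((bytes + 15u) / 16u);
}

/* Pack a far pointer as it sits in memory: offset in the low word,
   segment in the high word. An allocated block starts at segment:0000. */
uint32_t make_far_ptr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 16) | offset;
}
```

Allocating 64000 bytes (a 320x200 screen buffer at one byte per pixel) needs `bytes_to_paragraphs(64000)`, i.e. 4000 paragraphs; if the allocator returns segment S, the block's first byte is `make_far_ptr(S, 0)`.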
32-bit Real Mode (unreal mode)
32-bit real mode was first introduced to the general public by a demo group
back in early 1992 (I'm not sure which group released it first). Origin's
Ultima VII utilized this mode. This mode requires that the machine be switched
into protected mode, the segment limits set to 4 gigabytes, and the machine
switched back to real mode without a CPU reset. All of the normal
real-mode functions work fine until another program goes into protected mode.
Since EMM386 does this to access extended memory, EMM386 must be disabled.
32-bit addressing is allowed in this mode, so most of the time segment registers
are set to 0 and only the 32-bit offset is used. The only way to generate a
protection fault is to access an address beyond the segment limits (which are
almost always set to 4 gigabytes). Normal BIOS calls can still be executed
and most software will work in this mode. At most 4 gigabytes can be
addressed in this mode.
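Unreal mode works because each segment register has a hidden descriptor cache that keeps the segment limit even after the switch back to real mode. The sketch below models only that limit check. It is a simplification, assuming a normal real-mode limit of 0FFFFh and an unreal-mode limit of 0FFFFFFFFh; the names `unreal_access`, `RM_LIMIT_64K`, and `UNREAL_LIMIT` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RM_LIMIT_64K  0x0000FFFFu  /* limit real mode normally leaves cached */
#define UNREAL_LIMIT  0xFFFFFFFFu  /* limit left cached by the unreal-mode trick */

/* Model one access of `size` bytes at segment:offset, checked against the
   segment's cached limit. Returns 1 and stores the linear address, or 0
   where the CPU would raise a fault instead. */
int unreal_access(uint16_t segment, uint32_t offset, uint32_t size,
                  uint32_t limit, uint32_t *linear)
{
    assert(size > 0 && linear != NULL);
    /* every byte touched (offset .. offset + size - 1) must be within the limit */
    if (offset > limit || size - 1 > limit - offset)
        return 0;
    *linear = ((uint32_t)segment << 4) + offset;  /* real-mode base = segment * 16 */
    return 1;
}
```

With segment 0, `unreal_access(0, 0x100000, 4, UNREAL_LIMIT, &lin)` succeeds with `lin` equal to 0x100000, while the same access checked against `RM_LIMIT_64K` fails, which models the fault a program would get in normal real mode.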
16-bit Protected Mode
16-bit Protected Mode is (AFAIK) exclusive to Borland Pascal 7.0.
The segment limits are set to 64K and the compiler generates only
16-bit addressing. Most BIOS interrupts work, but real-mode-specific
operations need special care. Any BIOS interrupt that accepts values
in segment registers has to be called with a Real-Mode callback. This
is the easiest protected mode to program for, since very few modifications
have to be made to 16-bit RM programs (making for easy ports of
applications). Pointers are usually 32 bits: a 16-bit selector and a
16-bit offset. The most memory that can be allocated is 16 megabytes.
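In 16-bit protected mode a pointer's selector picks a descriptor, and its offset is checked against that descriptor's limit before being added to the base. The sketch below models that translation with a hypothetical, simplified `Desc286` table (24-bit base, 16-bit limit); `pm16_translate` is likewise a made-up name. It ignores the table-indicator and privilege bits and treats index 0 as the null selector. The 24-bit base is where the 16-megabyte (2^24 byte) ceiling comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified 286 descriptor. */
typedef struct {
    uint32_t base;   /* only the low 24 bits are used on a 286 */
    uint16_t limit;  /* highest valid offset in the segment */
} Desc286;

/* Translate a 32-bit selector:offset pointer (selector in the high word).
   Returns 1 and stores the linear address, or 0 where the CPU would fault. */
int pm16_translate(const Desc286 *table, uint16_t entries,
                   uint32_t far_ptr, uint32_t *linear)
{
    uint16_t selector = (uint16_t)(far_ptr >> 16);
    uint16_t offset   = (uint16_t)(far_ptr & 0xFFFFu);
    uint16_t index    = (uint16_t)(selector >> 3);  /* drop the TI and RPL bits */

    assert(table != NULL && linear != NULL);
    if (index == 0 || index >= entries)   /* null or out-of-range selector */
        return 0;
    if (offset > table[index].limit)      /* beyond the segment limit */
        return 0;
    *linear = (table[index].base + offset) & 0xFFFFFFu;  /* 24 address lines */
    return 1;
}
```

With a table whose entry 1 has base 100000h, the pointer 0008:0010 translates to linear address 100010h, while any offset past an entry's limit is rejected, as the CPU would reject it with a fault.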
32-bit Protected Mode
32-bit protected mode is almost a standard now with C and C++ compilers.
Watcom C, GNU C and Borland C are all now 32-bit compilers. For the most
part the segment registers hold selectors whose base is the start of memory
(address 0), and only the 32-bit offset is used. Therefore, pointers are 32
bits. The main difficulty in 32-bit PM programming is that most BIOS calls
don't work directly (a real-mode callback must be used). The most memory
addressable is 4 gigabytes.
48-bit Protected Mode
48-bit protected mode is also (AFAIK) exclusive to Borland Pascal 7.0.
While the compiler does not support 48-bit addressing, it is possible to
use 32-bit offsets with the selectors. A special unit (called NewFrontier)
has to be used in order to allocate the 48-bit pointers (16-bit selector,
32-bit offset). The same BIOS problems that affect 16-bit protected mode
apply to 48-bit protected mode as well. In theory this mode can address 64
terabytes of virtual memory (the maximum segmented address space of the
386 architecture).
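The 64-terabyte figure is the size of the 386's segmented virtual address space: a selector's 13-bit index plus its table-indicator bit can name 2^14 descriptor slots, and each segment can be up to 4GB, so 2^14 * 2^32 = 2^46 bytes. The sketch below shows that arithmetic and how a 48-bit pointer becomes a linear address; `FarPtr48`, `farptr48_linear`, and `max_virtual_bytes` are hypothetical names, and the base and limit are passed in rather than read from a real descriptor table.

```c
#include <assert.h>
#include <stdint.h>

/* A 48-bit far pointer: 16-bit selector plus 32-bit offset. */
typedef struct {
    uint16_t selector;
    uint32_t offset;
} FarPtr48;

/* Linear address of a 48-bit pointer, given the base and limit that the
   CPU would read from the descriptor the selector names. */
uint32_t farptr48_linear(FarPtr48 p, uint32_t seg_base, uint32_t seg_limit)
{
    assert(p.offset <= seg_limit);   /* the CPU would fault otherwise */
    return seg_base + p.offset;      /* uint32_t arithmetic wraps modulo 4GB */
}

/* Total segmented (virtual) address space: 2^14 selectors x 2^32 bytes. */
uint64_t max_virtual_bytes(void)
{
    const uint64_t selectors   = 1ull << 14;  /* 13-bit index + TI bit */
    const uint64_t segment_max = 1ull << 32;  /* 4GB per segment */
    return selectors * segment_max;           /* 2^46 bytes = 64TB */
}
```

Note that this is address space, not memory: a 386 still has only 32 address lines, so physical RAM tops out at 4GB.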
Protected Mode in Borland Pascal 7.0
Borland Pascal 7.0 can produce 16-bit protected mode programs, which
allows programmers to use up to 16 MB of memory. The only drawback is that
it generates 286 code and only allows for 16-bit addressing. However, with
some work, it is possible to do 32-bit addressing, and even 48-bit
addressing.
Most of the differences between coding RM applications and PM applications
lie in the way that one accesses memory. In RM, one could put any value into
a segment register and use that as a base address (or as a temporary
variable). In PM, this isn't possible, since there is no arithmetic
relationship between the value in the segment register and the memory it
accesses. Instead, the value in the segment register is a selector: an index
into a descriptor table (either the LDT or the GDT, the Local Descriptor Table
and Global Descriptor Table) whose entry holds the segment's real base
address, limit, and access rights. If you try to load an invalid value into
a segment register, the program will produce a General Protection Fault
(GPF).
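A selector packs three fields: bits 15..3 are the descriptor index, bit 2 (TI) chooses the GDT or the LDT, and bits 1..0 are the requested privilege level (RPL). The sketch below splits and rebuilds selectors under that standard layout; the `SelectorParts` struct and the function names are hypothetical, for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* The three fields packed into a protected-mode selector. */
typedef struct {
    uint16_t index;     /* descriptor number within the table (13 bits) */
    int      uses_ldt;  /* TI bit: 0 = GDT, 1 = LDT */
    uint8_t  rpl;       /* requested privilege level, 0..3 */
} SelectorParts;

SelectorParts decode_selector(uint16_t selector)
{
    SelectorParts s;
    s.index    = (uint16_t)(selector >> 3);
    s.uses_ldt = (selector >> 2) & 1;
    s.rpl      = (uint8_t)(selector & 3);
    return s;
}

/* Inverse of decode_selector. */
uint16_t make_selector(uint16_t index, int uses_ldt, uint8_t rpl)
{
    assert(index < 8192 && rpl <= 3);   /* 13-bit index, 2-bit RPL */
    return (uint16_t)((index << 3) | ((uses_ldt ? 1 : 0) << 2) | rpl);
}

/* Byte offset of the 8-byte descriptor inside its table (index * 8). */
uint16_t descriptor_offset(uint16_t selector)
{
    return (uint16_t)(selector & 0xFFF8u);
}
```

Decoding the real-mode value 0A000h as a selector gives GDT index 5120 with RPL 0; unless that descriptor happens to exist and be usable, loading 0A000h into a segment register in protected mode raises a GPF.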
Borland Pascal's DPMI server automatically deals with allocating
selectors, which means that you, as a programmer, don't need to worry
about selectors for the memory you allocate. What you do need to worry about
is whether you have any absolute memory addresses hard-wired into your
programs. For example, instead of loading 0A000h (the VGA graphics segment)
into a segment register, use Borland Pascal's predefined variable SegA000.
Self-Modifying Code (SMC)
Self-modifying code is generally frowned upon as a bad coding practice.
Unless you absolutely (and I mean absolutely) need to use it, I
recommend that you do not. In Real Mode, SMC is not difficult, since one
can write to the code segment. In Protected Mode, however, each segment
descriptor has type bits that say whether the segment can be written, and
code segments are never writable (at most they are execute/read). A GPF
occurs if you attempt to write through a code selector. If necessary, you
can write SMC through an alias: a data selector that describes the same
memory as the code selector.
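The read/write distinction lives in the type bits of each descriptor's access byte. The sketch below assumes the standard 386 access-byte layout; the macro names and the functions `is_writable` and `make_data_alias_access` are hypothetical. It shows why a code descriptor is never writable, and what an alias's access byte looks like: same base and limit, but with the executable bit cleared and the writable bit set. DPMI hosts provide a service that builds such an alias (function 000Ah, Create Alias Descriptor); this code only models the bits.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_CODE_DATA  0x10u   /* S bit: code or data, not a system segment */
#define ACC_EXECUTABLE 0x08u   /* type bit 3: 1 = code, 0 = data */
#define ACC_CONF_EXPD  0x04u   /* code: conforming / data: expand-down */
#define ACC_RW         0x02u   /* code: readable / data: writable */

/* Only data segments can be writable; code segments never are. */
int is_writable(uint8_t access)
{
    return (access & ACC_CODE_DATA) && !(access & ACC_EXECUTABLE)
           && (access & ACC_RW);
}

/* Access byte for a data alias of a code segment: keep P, DPL, S and the
   accessed bit, make it a plain (expand-up) writable data segment. */
uint8_t make_data_alias_access(uint8_t code_access)
{
    assert(code_access & ACC_CODE_DATA);
    assert(code_access & ACC_EXECUTABLE);
    return (uint8_t)((code_access & ~(ACC_EXECUTABLE | ACC_CONF_EXPD)) | ACC_RW);
}
```

A common ring-0 execute/read code access byte, 9Ah, is not writable; its alias, 92h, is an ordinary read/write data segment over the same memory.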
Other Problems
Most interrupt calls are understood and dealt with by the DPMI server.
Some, such as VESA information calls, are not. The reason they are not
handled is that the Real Mode segment value of a memory address must be
passed as a parameter in a segment register. Since the RM value is not
necessarily a valid selector, the call will most likely crash. Actually,
just loading the segment register will produce a GPF, and the interrupt
call will never occur. Once the Real Mode callback is written, another
problem arises. The VESA information call returns RM pointers. One must
convert those pointers to PM pointers before dereferencing them. There is
a DPMI interrupt call to do this, which will be discussed later.
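Before a real-mode pointer returned by such a call can be used in PM, it has to be split into its segment and offset. The sketch below shows that step and the linear address the pointer names, assuming the usual x86 storage order (offset in the low word, segment in the high word); `RealPtr`, `read_le32`, `unpack_real_ptr`, and `real_ptr_linear` are hypothetical names. It deliberately stops short of obtaining a selector, which is the DPMI host's job.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A real-mode far pointer split into its two halves. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} RealPtr;

/* Read a little-endian dword from a buffer (for example, a copy of an
   information block), independent of the host's byte order. */
uint32_t read_le32(const uint8_t *buf)
{
    assert(buf != NULL);
    return (uint32_t)buf[0] | ((uint32_t)buf[1] << 8)
         | ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
}

/* Real-mode far pointers are stored offset first, segment second, so the
   segment is the high word of the dword. */
RealPtr unpack_real_ptr(uint32_t dword)
{
    RealPtr p;
    p.segment = (uint16_t)(dword >> 16);
    p.offset  = (uint16_t)(dword & 0xFFFFu);
    return p;
}

/* Linear address the real-mode pointer refers to: segment * 16 + offset. */
uint32_t real_ptr_linear(RealPtr p)
{
    return ((uint32_t)p.segment << 4) + p.offset;
}
```

A pointer stored as the bytes 00 01 00 C0 unpacks to C000:0100, linear address 0C0100h; a protected-mode program still needs a selector whose base covers that address before it can dereference it.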