Detecting DOSBox from within the Box

Created on 2025-12-15 at 19:04

If you're the sort of person who reads blogs, I assume you need no introduction to DOSBox. It's an MS-DOS emulator, which necessitates it being a sort of x86 emulator. But unlike x86 emulators like 86Box or QEMU, the DOS parts are an inextricable part of it. There are BIOS interrupts and a POST, but not a BIOS in the sense of "a ROM chip mapped into memory." There isn't even really a DOS, in the traditional sense. But when you're running inside DOSBox, you wouldn't know it. Almost any DOS API you can expect is available, and effort was put into making sure features like Long File Names don't appear if your reported version is too old to have supported them. So how can you detect that which seeks not to be detected?

Most MS-DOS-likes aren't perfect replicas of MS-DOS, and you can usually use those quirks or extra functions to figure out what you're running on.^[1] And one would imagine DOSBox is the same! "Quirks" are more likely bugs waiting to be resolved, but commands like MOUNT and VER seem to have the ability to poke through to the outside world, so maybe there's an extra function somewhere?

Easy Mode: The Correct Way

Okay, I know you're screaming it at your screen: the simplest way is to just get the string at FE00:0061—which everybody knows is the common Award BIOS version string address^[2]—and check if it starts with DOSBox. But that's so brittle, y'know? I could just modify a non-DOSBox BIOS to have that version string, or modify DOSBox to have the model string be something else. There's even a comment in DOSBox-X (source) that alludes to this being a desirable change in the future:

    /* TODO: *DO* allow dynamic relocation however if the dosbox-x.conf indicates that the user
     *       is not interested in IBM BIOS compatibility. Also, it would be really cool if
     *       dosbox-x.conf could override these strings and the user could enter custom BIOS
     *       version and ID strings. Heh heh heh.. :) */

So of course I can't take this route! There are other easier ways, like checking the serial number of the Z: drive (or if it exists, for that matter). But these can all be faked pretty easily. No, we must find something that's an inherent part of the emulator. Something that proves this is DOSBox.
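For the record, the easy-mode check really is about as simple as it sounds. Here's a rough sketch of the logic in C++, run against a plain byte buffer rather than real memory. The names `linear_address` and `bios_string_says_dosbox` are made up for illustration, and `memory` is assumed to be a dump of the first 1 MiB of the machine's address space.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Real-mode segment:offset to linear address: segment * 16 + offset.
constexpr std::uint32_t linear_address(std::uint16_t segment, std::uint16_t offset) {
    return (static_cast<std::uint32_t>(segment) << 4) + offset;
}

static_assert(linear_address(0xFE00, 0x0061) == 0xFE061, "FE00:0061 is linear 0xFE061");

// Hypothetical helper: true if the bytes at FE00:0061 start with "DOSBox".
// `memory` is assumed to be a dump of the first 1 MiB of address space.
bool bios_string_says_dosbox(const std::vector<std::uint8_t>& memory) {
    static const char prefix[] = "DOSBox";
    const std::size_t length = sizeof(prefix) - 1;  // drop the trailing NUL
    const std::size_t start = linear_address(0xFE00, 0x0061);
    if (memory.size() < start + length) {
        return false;  // dump too short to contain the string
    }
    return std::memcmp(memory.data() + start, prefix, length) == 0;
}
```

Which also shows why it's brittle: six bytes in the right place are all it takes to pass.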

Inventing Instructions

Let's go back to how DOSBox can talk to the outside world with commands like MOUNT.COM. COM files are just machine code, meaning we can run one directly through a disassembler. So let's do that, with a copy of MOUNT.COM from DOSBox:

$ ndisasm MOUNT.COM
00000000  BC0004            mov sp,0x400
00000003  BB4000            mov bx,0x40
00000006  B44A              mov ah,0x4a
00000008  CD21              int byte 0x21
0000000A  FE                db 0xfe
0000000B  3805              cmp [di],al
0000000D  00B8004C          add [bx+si+0x4c00],bh
00000011  CD21              int byte 0x21
00000013  02                db 0x02

The first four lines make sense: after pointing SP at 0x400, INT 21h Function 4Ah resizes the program's memory block to 0x40 paragraphs (1,024 bytes, since a paragraph is 16 bytes). But the next couple of lines are... basically garbage. db 0xfe just means "there's a byte here, 0xfe", and your typical x86 CPU would balk at this and throw an Invalid Opcode exception.

But when you're writing an x86 CPU, you can just invent your own instructions! Lo and behold, in the DOSBox sources:

/* Snippet from src/cpu/core_normal/prefix_none.h */
CASE_B(0xfe)               /* GRP4 Eb */
    {
	    GetRM;Bitu which=(rm>>3)&7;
	    switch (which) {
			case 0x00:     /* INC Eb */
			    RMEb(INCB);
			    break;
			case 0x01:     /* DEC Eb */
			    RMEb(DECB);
			    break;
			case 0x07:     /* CallBack */
			    {
			        Bitu cb=Fetchw();
			        FillFlags();SAVEIP;
			        return cb;
			    }
			default:
				E_Exit("Illegal GRP4 Call %d",(rm>>3) & 7);
				break;
	    }
	    break;
    }

This is the code for decoding the FE group of opcodes. 0x00 is INC and 0x01 is DEC, both real opcodes on x86.

But that last one, 0x07, that is a DOSBox exclusive. The word after the ModR/M byte says which callback should be called...back, and the core returns that number to its caller. So, to fix up the disassembly from earlier, it might look like this:

00000000  BC0004            mov sp,0x400
00000003  BB4000            mov bx,0x40
00000006  B44A              mov ah,0x4a
00000008  CD21              int byte 0x21
0000000A  FE380500          CallBack 0x0005
0000000E  B8004C            mov ax,0x4c00
00000011  CD21              int byte 0x21
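To make that corrected listing a bit more concrete, here's a rough sketch of a decoder that knows only the handful of opcodes in this stub, plus DOSBox's FE /7. It's written in C++ purely for illustration: `Decoded`, `decode_one`, and `decode_all` are hypothetical names, the output is only ndisasm-like, and like the DOSBox snippet it looks at nothing but the middle three bits of the byte after FE.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

struct Decoded {
    std::size_t length;  // bytes consumed by this instruction
    std::string text;    // ndisasm-like rendering
};

// Read a 16-bit little-endian word starting at p.
inline unsigned read_le16(const std::uint8_t* p) {
    return static_cast<unsigned>(p[0]) | (static_cast<unsigned>(p[1]) << 8);
}

// Decode one instruction from the n bytes at p. Only the opcodes that appear
// in the stub are known; anything else comes out as a single "db" byte.
Decoded decode_one(const std::uint8_t* p, std::size_t n) {
    assert(n >= 1);
    char buf[32];
    if (n >= 3 && (p[0] == 0xB8 || p[0] == 0xBB || p[0] == 0xBC)) {
        const char* reg = p[0] == 0xB8 ? "ax" : p[0] == 0xBB ? "bx" : "sp";
        std::snprintf(buf, sizeof buf, "mov %s,0x%x", reg, read_le16(p + 1));
        return Decoded{3, buf};
    }
    if (n >= 2 && p[0] == 0xB4) {
        std::snprintf(buf, sizeof buf, "mov ah,0x%x", static_cast<unsigned>(p[1]));
        return Decoded{2, buf};
    }
    if (n >= 2 && p[0] == 0xCD) {
        std::snprintf(buf, sizeof buf, "int 0x%x", static_cast<unsigned>(p[1]));
        return Decoded{2, buf};
    }
    // FE /7: the DOSBox-only callback, followed by a 16-bit callback number.
    if (n >= 4 && p[0] == 0xFE && ((p[1] >> 3) & 7) == 7) {
        std::snprintf(buf, sizeof buf, "CallBack 0x%04x", read_le16(p + 2));
        return Decoded{4, buf};
    }
    std::snprintf(buf, sizeof buf, "db 0x%02x", static_cast<unsigned>(p[0]));
    return Decoded{1, buf};
}

// Walk a whole buffer, one instruction at a time.
std::vector<Decoded> decode_all(const std::vector<std::uint8_t>& code) {
    std::vector<Decoded> out;
    for (std::size_t i = 0; i < code.size();) {
        const Decoded d = decode_one(code.data() + i, code.size() - i);
        i += d.length;
        out.push_back(d);
    }
    return out;
}
```

Feeding it the first 19 bytes of MOUNT.COM should give seven instructions, with the fifth being the four-byte callback.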

Aside: Tripping and falling into the weeds of x86 Instruction Encoding

In the first draft of this, I wrote:

I'll try not to trip and fall into the weeds of x86 instruction coding [...]

But the way the callback opcode works is directly because of how x86 opcodes work. And I don't feel like it's fair to expect anyone to know how x86 instructions are encoded. I want my ramblings to be at least somewhat accessible, even if I've already thrown assembly code at you in the first half.

If you already know or don't care how this works, feel free to skip this. If you're really curious, my primary source here is Volume 2 of the Intel 64 and IA-32 Architectures Software Developer's Manual, found here. I'll cite chapters in parentheses through the rest of this section.

So. Machine code is split up into quite a few parts, with the opcode itself only being one-and-a-half. (2.1) For the sake of conciseness, we'll ignore everything but the opcode, ModR/M, and Immediate bytes, since that's what we're using here.

Let's take a snippet of that earlier disassembly:

0000000A  FE380500          CallBack 0x0005
0000000E  B8004C            mov ax,0x4c00

and turn it into hex:

FE 38 05 00    B8 00 4C

Without a prefix of 0F, we know the opcode is just FE. (A.3, Table A-2) But this is a group, "INC/DEC Grp 4," which uses the Opcode bits of the next byte, the ModR/M byte, to actually determine the opcode. That byte is split up like this:

Byte:   00 111 000  (0x38)
Mod:    00          (0x00)
Opcode:    111      (0x07)
R/M:           000  (0x00)
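In code, that split is just a couple of shifts and masks. A minimal C++ sketch, where `ModRM` and `split_modrm` are names invented here and not anything from Intel's manual or DOSBox:

```cpp
#include <cassert>
#include <cstdint>

struct ModRM {
    std::uint8_t mod;  // bits 7-6
    std::uint8_t reg;  // bits 5-3, the "Opcode" field for group opcodes
    std::uint8_t rm;   // bits 2-0
};

// Split a ModR/M byte into its three fields.
constexpr ModRM split_modrm(std::uint8_t byte) {
    return ModRM{static_cast<std::uint8_t>(byte >> 6),
                 static_cast<std::uint8_t>((byte >> 3) & 7),
                 static_cast<std::uint8_t>(byte & 7)};
}

static_assert(split_modrm(0x38).mod == 0, "0x38: Mod is 00");
static_assert(split_modrm(0x38).reg == 7, "0x38: Opcode field is 111");
static_assert(split_modrm(0x38).rm == 0, "0x38: R/M is 000");
```

The middle expression, `(byte >> 3) & 7`, is the same thing the DOSBox snippet computes into `which`.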

For our purposes, only the Opcode field matters. So this can be read as FE /7. According to the Opcode Extensions table, (A.4.2, Table A-6) this doesn't actually exist. Only FE /0 and FE /1 exist in this group. But we know DOSBox supports a secret FE /7, so we'll have to rely on its source code to know what to do next. And it does this:

Bitu cb=Fetchw();
FillFlags();SAVEIP;
return cb;

Importantly, Fetchw() fetches the next word and returns it (effectively, telling the machine "call this callback"). Since x86 is little-endian, 05 00 becomes 00 05.

Once the callback is complete, the next instruction is executed. That'll be B8 00 4C. B8 is MOV AX,XXXX. This instruction takes a 16-bit immediate, stored as the bytes 00 4C (which reads as 4C00 once the little-endian byte order is undone). And so on and so forth.
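If the byte-swapping is hard to picture, here's a toy model of the fetch in C++. To be clear, `ToyCpu` and `fetch_word` are hypothetical stand-ins written for this post, not DOSBox's actual `Fetchw()`. All they show is the little-endian read and the instruction pointer moving past the word.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct ToyCpu {
    std::uint16_t ip;  // offset of the next byte to fetch
};

// Read a little-endian word at IP, then advance IP past it.
std::uint16_t fetch_word(const std::vector<std::uint8_t>& code, ToyCpu& cpu) {
    assert(static_cast<std::size_t>(cpu.ip) + 1 < code.size());
    const std::uint16_t low = code[cpu.ip];       // first byte is the low half
    const std::uint16_t high = code[cpu.ip + 1];  // second byte is the high half
    cpu.ip = static_cast<std::uint16_t>(cpu.ip + 2);
    return static_cast<std::uint16_t>(low | (high << 8));
}
```

Starting with IP just past `FE 38`, fetching from `05 00` yields 0x0005 and leaves IP pointing at the `B8` byte. The same read over `00 4C` yields 0x4C00.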

Anyway, here it is in the part of the code that generates virtual programs like MOUNT.COM:

/* Snippet from src/misc/programs.cpp */
static Bit8u exe_block[]={
    0xbc,0x00,0x04,                 //MOV SP,0x400 decrease stack size
    0xbb,0x40,0x00,                 //MOV BX,0x040 for memory resize
    0xb4,0x4a,                      //MOV AH,0x4A   Resize memory block
    0xcd,0x21,                      //INT 0x21
//pos 12 is callback number
    0xFE,0x38,0x00,0x00,            //CALLBack number
    0xb8,0x00,0x4c,                 //Mov ax,4c00
    0xcd,0x21,                      //INT 0x21
};

Conveniently, since callbacks are returned the same way as the general status, FE 38 00 00 is effectively a four-byte NOP! On DOSBox, anyway.

Other x86 CPUs won't have such fortune. Since the 80186, invalid instructions trigger a #UD (Invalid Opcode) exception, which is Interrupt 06h. So we just need to write an exception handler. Something like this:

_catchUD:
	; Current IP is at the top of the stack, so +4 after we push ax/bx
	push bx
	push ax
	
	mov bx, sp
	mov bx, WORD [ss:bx+4]
	mov ax, bx
	
	mov bx, WORD [cs:bx] ; will copy little-endian (i.e. 0x38fe)
	and bh, 38h
	cmp bx, 38feh
	je .notDosbox
	
	; if we end up here, something went really wrong! clean up the IVT
	; and IRET so the actual #UD handler is called.
	; Since we don't modify the IP, it'll re-run the invalid opcode.
	push es
	xor ax, ax
	mov es, ax
	mov bx, [oldUDAddr] ; previous int 06h addr
	mov [es:18h], bx ; 06h*4
	mov bx, [oldUDSeg] ; previous int 06h segment
	mov [es:1Ah], bx ; (06h*4)+2
	pop es
	pop ax
	jmp .catchDone
	
	.notDosbox:
	; Not DOSBox -- increment the IP and zero AX
	; You can of course do whatever here, like setting a global
	add ax, 4
	mov bx, sp
	mov WORD [ss:bx+4], ax
	xor ax, ax
	add sp, 2 ; AX unneeded
	
	.catchDone:
	pop bx
	iret

which, once set up, could be tested like this:

	mov ax, 42
	db 0xfe, 0x38, 0x00, 0x00
	; was the exception handler here?
	cmp ax, 0
	jz .notDosbox ; Not DOSBox!
	; DOSBox-only code starts here!
	jmp .done ; don't fall through into the non-DOSBox path
	.notDosbox:
	; Non-DOSBox code starts here!
	.done:

Add in some extra instructions to reset the interrupt 06h vector once done, and we should have a pretty good check for DOSBox!
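The bookkeeping for that is small. Each vector in the real-mode interrupt table is four bytes, offset word first and segment word second, so vector 06h lives at 0000:0018 with its segment at 0000:001A. Below is a C++ model of saving, replacing, and restoring a vector on a plain byte buffer. `FarPointer`, `ivt_slot`, `get_vector`, and `set_vector` are names invented for this sketch. On real DOS you'd more likely ask INT 21h functions 35h and 25h to do the reading and writing for you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct FarPointer {
    std::uint16_t offset;
    std::uint16_t segment;
};

// Byte position of a vector's entry in the interrupt vector table.
constexpr std::size_t ivt_slot(std::uint8_t vector) {
    return static_cast<std::size_t>(vector) * 4;
}

static_assert(ivt_slot(0x06) == 0x18, "INT 06h offset word is at 0x18");
static_assert(ivt_slot(0x06) + 2 == 0x1A, "INT 06h segment word is at 0x1A");

// Read a vector from `memory`, assumed to model the bytes starting at 0000:0000.
FarPointer get_vector(const std::vector<std::uint8_t>& memory, std::uint8_t vector) {
    const std::size_t at = ivt_slot(vector);
    assert(at + 3 < memory.size());
    FarPointer p;
    p.offset = static_cast<std::uint16_t>(memory[at] | (memory[at + 1] << 8));
    p.segment = static_cast<std::uint16_t>(memory[at + 2] | (memory[at + 3] << 8));
    return p;
}

// Install `handler` and return whatever was there before, so it can be restored.
FarPointer set_vector(std::vector<std::uint8_t>& memory, std::uint8_t vector, FarPointer handler) {
    const FarPointer previous = get_vector(memory, vector);
    const std::size_t at = ivt_slot(vector);
    memory[at] = static_cast<std::uint8_t>(handler.offset & 0xFF);
    memory[at + 1] = static_cast<std::uint8_t>(handler.offset >> 8);
    memory[at + 2] = static_cast<std::uint8_t>(handler.segment & 0xFF);
    memory[at + 3] = static_cast<std::uint8_t>(handler.segment >> 8);
    return previous;
}
```

The pattern is: keep the value `set_vector` hands back when installing the handler, run the probe, then pass that saved value to `set_vector` again.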

DEBUGging x86

At this point in writing, I decided it'd be a good time to test this on hardware. But my Pentium II systems are currently a bit buried, and it'd be hard to get good screenshots of them anyway... so I figured I'd use 86Box.

This did not go according to plan:

Screenshot of a DOS program saying, "Yep, that is a DOSBox!"

Importantly, this is not DOSBox. But that's okay, because we can just step through it with the DEBUG program in MS-DOS and see what's going wrong. It's not the most, er, friendly program, but it's enough to get the job done in a case like this.

There's a command, t, which lets you step through the code one instruction at a time. (Well, mostly.) So we'll step through to the callback instruction, and we can see here DEBUG has no idea what's going on, even if it encodes it correctly enough... but then steps through it as though it's valid!?

Screenshot of DOS DEBUG.COM

At this point, I was entirely confused. Is this some secret undocumented instruction? Does 86Box ignore invalid instructions for some reason? Do invalid instruction exceptions not work how I thought they do? Could a DOS driver somehow mask the interrupt?

I'll save you the days of troubleshooting I spent on this: 86Box inherited a bug from PCem where any ModR/M opcode modifier other than 0 was treated as FE /1.^[3] So FE /2, FE /4, and FE /7 were all executed as DEC. Thankfully the fix was pretty simple, and it's already been merged upstream.

As mentioned in the PR, special thanks to linear for testing this on actual hardware so we can be (at least somewhat) sure this isn't just an Intel documentation issue.
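To see why the detection went wrong, here's a toy model of the two behaviours in C++. This is emphatically not the PCem or 86Box source, just the bug as described above boiled down to two functions with made-up names. With the buggy decode, `FE 38` becomes a two-byte DEC of the byte at [BX+SI] instead of raising #UD, so the handler never runs and AX still holds 42.

```cpp
#include <cassert>
#include <cstdint>

enum class Grp4 { Inc, Dec, Invalid };

// The middle three bits of the ModR/M byte select the operation in a group.
constexpr unsigned reg_field(std::uint8_t modrm) {
    return (modrm >> 3) & 7u;
}

// Buggy behaviour as described: anything other than /0 is treated as /1.
constexpr Grp4 decode_grp4_buggy(std::uint8_t modrm) {
    return reg_field(modrm) == 0 ? Grp4::Inc : Grp4::Dec;
}

// Documented behaviour: only /0 and /1 exist, everything else is invalid.
constexpr Grp4 decode_grp4_documented(std::uint8_t modrm) {
    return reg_field(modrm) == 0 ? Grp4::Inc
         : reg_field(modrm) == 1 ? Grp4::Dec
                                 : Grp4::Invalid;
}

static_assert(decode_grp4_buggy(0x38) == Grp4::Dec, "FE /7 slips through as DEC");
static_assert(decode_grp4_documented(0x38) == Grp4::Invalid, "FE /7 should be invalid");
```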

The Finished Product(?)

If you want to run the sample program I wrote for this, you can get it on my Git forge here. You'll need NASM to compile it. It'll run fine on DOSBox and DOSBox-X, at least.

While this was a fun project on its own, my intent wasn't just to detect DOSBox. It just happened to be the trickiest to figure out. NTVDM and the Win9x MS-DOS Prompt are easier to detect, basically just a single INT 2Fh call. There's another DOS emulator for Linux, aptly named DOSEMU, which has... a surprising number of callback APIs. They're all implemented as COM files (e.g. UNIX.COM lets you run arbitrary commands on the host system), so it's not like they're hidden features. Of course, none of these are quite as hard to spoof as a custom CPU instruction, but they're more liable to cause side effects than changing a BIOS string would.

^[1] Just ask Microsoft!
^[2] Okay but actually, I struggled to find any definitive information on whether this originated with Award BIOS, or even official documentation on it. If you have any authoritative documentation on this that I've missed, please feel free to let me know!
^[3] And to be clear, I don't blame the developers of either project for this bug sticking around this long. It's such a niche use case, I'd be surprised if anybody was doing this sort of thing!


Site by snow flurry, 2022-2025. Writing and code snippets are licensed under CC BY-NC 4.0, except where noted. Best viewed in 1024x768 resolution or better. Tested with Chromium, WebKit, NetSurf, and Gecko (Mozilla 1.6/Netscape 7.0 or later).