Yesterday, we took a break after long hours of intensive coding, and a coworker started drawing parallels between our current frantic coding and the (fortunately) bygone days of college homework. I specifically recalled a project I had to build using MS-DEBUG: a simple calculator in assembly, which also came with the hassle of handling user input neatly and safely. I have no intention of digging up those listings, but I thought about revisiting, for a moment, my dear old friend MS-DEBUG 🙂 I’ll build on a previous post in this blog and try to write a little ‘hello, world’ program in MS-DEBUG. I know this has little value beyond personal sentiment and a touch of nostalgia, maybe.
I remember that a 100 tells DEBUG to accept code in memory starting at CS:0100. OK.
Now comes the data, which here simply consists of the string 'hello, world!'. A neat output, however, would require newlines before and after our intended string. A newline consists of a carriage return (CR = ASCII 13) followed by a line feed (LF = ASCII 10). As DEBUG only understands hexadecimal numbers, we must use 0Dh and 0Ah for CR and LF, respectively. (Except in code, we will represent hexadecimal numbers by following the value with ‘h’.) Fortunately, DEBUG also accepts ASCII characters directly (and the pseudo-instructions DB and DW!), so we can express our complete string as:
db 0d,0a,"hello, world!",0d,0a,"$"
The “$” requires an explanation. To print our string to STDOUT, we are going to use function 09h of INT 21h. This function prints the characters in memory beginning at DS:DX and stops when it encounters the “$” sign (24h), which is not printed itself (i.e., “$” plays the role of the terminating zero in C strings). Finally, we invoke function 00h of INT 21h to terminate the program after the string is echoed. And that’s it.
- a 100
CS:0100 jmp 114 ; Jump over the 18 bytes of the string
CS:0102 db 0d,0a,"hello, world!",0d,0a,"$"
CS:0114 mov ah,9 ; Print function
CS:0116 mov dx,102
CS:0119 int 21
CS:011B mov ah, 0 ; Terminate the program
CS:011D int 21
CS:011F
-g =100
The Go command (g) will run the program starting at the given address (in this case, CS:0100). If everything goes right, the program should output the intended “hello, world!” string and finish with the message “Program terminated normally”.
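As a rough cross-check of the listing above, here is a minimal Python sketch (not DEBUG code) that rebuilds the bytes of the db line, counts them, and models how function 09h stops at the “$”. The names data, DATA_START and dos_print_string are made up for this illustration; the model only mimics the “print until $” rule, not real DOS behaviour.

```python
# Bytes produced by: db 0d,0a,"hello, world!",0d,0a,"$"
data = bytes([0x0D, 0x0A]) + b"hello, world!" + bytes([0x0D, 0x0A]) + b"$"

DATA_START = 0x102  # the string starts right after the 2-byte jmp at CS:0100


def dos_print_string(memory: bytes, offset: int) -> bytes:
    """Toy model of INT 21h / AH=09h: collect bytes from offset up to,
    but not including, the first '$'."""
    end = memory.index(b"$", offset)
    return memory[offset:end]


# The string is 18 bytes long, so the first instruction after it sits at
# 0x102 + 18 = 0x114, which is exactly the target of "jmp 114".
print(len(data))
print(hex(DATA_START + len(data)))

printed = dos_print_string(data, 0)
print(repr(printed.decode("ascii")))
```

This is also how the jump target is chosen: it is simply the address of the first instruction after the data, so a longer or shorter string would move it accordingly.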
You may save this program to a folder on your hard drive (I’ll use c:\tmp). Simply input:
-n c:\tmp\hello.com
-rcx
CX 0000
:20
-w
-rcx lets us change the content of the CX register. We store there the value “20”, the number of bytes to save, in hexadecimal. (Why 20? The listing runs from 0100 through 011E, which is 31 bytes, or 1Fh; rounding up to 20h, that is, 32 in decimal, simply saves one extra, harmless byte at 011F.) Write (w) uses CX to know how many bytes it has to save.
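The size arithmetic can be double-checked with a few lines of Python. This is only a sketch: LOAD_OFFSET, NEXT_FREE and CX_USED are illustrative names, with the values read off the assembly listing and the -rcx session above.

```python
LOAD_OFFSET = 0x100  # where "a 100" started assembling
NEXT_FREE = 0x11F    # the address DEBUG prompted with after the last "int 21"
CX_USED = 0x20       # the value typed at the -rcx prompt

# Bytes that actually belong to the program: 0100 through 011E.
size = NEXT_FREE - LOAD_OFFSET
print(size, format(size, "X"))

# How many bytes beyond the program a count of 20h writes to the file.
extra = CX_USED - size
print(extra)
```

By this count the program proper is 31 bytes (1Fh), so a CX of 20h writes one trailing byte that is never executed, because the program terminates at the second int 21.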
Now, you can dump memory contents (d) and view your program in hexadecimal (or open hello.com with a hexadecimal viewer):
-d
CS:0100 EB 12 0D 0A 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64
CS:0110 21 0D 0A 24 B4 09 BA 02 01 CD 21 B4 00 CD 21 0D
The hexadecimal patterns are very clear. For example, 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64 21 is our “hello, world!” string. 0D and 0A are CR and LF, respectively. CD 21 is INT 21h. B4 09 is MOV AH,9. And so on.
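For readers who want to check these patterns mechanically, here is a small Python sketch that picks the dump apart. It hard-codes the 31 program bytes from the dump (leaving out the final 0D at CS:011F, which lies past the last instruction) and relies on a handful of well-known 8086 encodings: EB is a short JMP with a signed 8-bit displacement, B4 is MOV AH,imm8, BA is MOV DX,imm16 and CD is INT imm8. The variable names are illustrative only.

```python
dump = bytes.fromhex(
    "EB 12 0D 0A 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64 "
    "21 0D 0A 24 B4 09 BA 02 01 CD 21 B4 00 CD 21"
)
BASE = 0x100  # offset of the first byte (CS:0100)

# Short JMP: opcode EB, then a signed displacement counted from the
# address of the NEXT instruction (here 0x102).
assert dump[0] == 0xEB
disp = int.from_bytes(dump[1:2], "little", signed=True)
target = BASE + 2 + disp
print(hex(disp), hex(target))

# The data occupies 0x102..0x113; the last byte is the "$" terminator.
text = dump[0x02:0x14]
print(text)

# MOV AH,9 (B4 09), then MOV DX,0102 (BA 02 01, little-endian word).
assert dump[0x14:0x16] == bytes([0xB4, 0x09])
assert dump[0x16] == 0xBA
dx = int.from_bytes(dump[0x17:0x19], "little")
print(hex(dx))

# INT 21h appears twice: after the print setup and after MOV AH,0.
int21_offsets = [i for i in range(len(dump) - 1) if dump[i:i + 2] == b"\xCD\x21"]
print([hex(BASE + i) for i in int21_offsets])
```

In particular, this shows where EB 12 comes from: the jump sits at 0100 and is two bytes long, so the displacement is 0114 minus 0102, which is 12h.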
Finally, remember that you may use Enter (e) to write your code directly into memory:
-e 100 EB 12 0D 0A 68 65 6C 6C 6F 2C 20 77 6F 72 6C 64
-e 110 21 0D 0A 24 B4 09 BA 02 01 CD 21 B4 00 CD 21 0D
-g =100
Pretty, huh? 🙂
Aaaghh… Why did you have to remind us of this???
I barely passed that course! 🙂
Hehehe… what a nostalgic post!
Man, do you want to go back to those days?
I understand most of the hexadecimal string, except for the reason for translating “jmp 114” into EB 12.
What’s the logic behind that translation?
I think what NaruFan is asking is “how” (not “why”) jmp 114 translates into EB 12…
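The “logic” is that Intel decided on that encoding: EB is the opcode for a short jump, and 12 is the distance in bytes (in hexadecimal) from the instruction that follows the jump, at 0102, to the target at 0114. You may want to read the Intel Architecture Software Developer’s Manual (Vol. 2).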
when are you going to write about assembly translation ???
how do we know when to use jmp 126 or 113 etc..
hahaha, nice explanation… the topic is a bit weird, though 😛
(Y)