Architecture 1001: x86-64 Assembly Notes
This is just a page of notes and important things for me to remember while learning Assembly.
Constants, Registers, Memory
“12” means decimal 12; “0xF0” is hex. “some_function” is the address of the first instruction of the function.
Memory access (use register as pointer): “[rax]”. Same as C “*rax”.
Memory access with offset (use register + offset as pointer): “[rax+4]”. Same as C “*(rax+4)”.
Memory access with scaled index (register + another register * scale): “[rax+rbx*4]”. Same as C “*(rax+rbx*4)”.
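The C equivalents above only line up with the assembly when the pointer is a byte pointer, because C scales pointer arithmetic by the element size while the bracketed forms count in bytes. A minimal sketch, assuming byte-sized loads; the function names are made up for illustration and the parameters are named after the registers they stand in for:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* [rax]: use the register value itself as the pointer. */
static uint8_t load_base(const uint8_t *rax)
{
    assert(rax != NULL);
    return *rax;
}

/* [rax+4]: the pointer plus a fixed offset of 4 bytes. */
static uint8_t load_offset(const uint8_t *rax)
{
    assert(rax != NULL);
    return *(rax + 4);
}

/* [rax+rbx*4]: the pointer plus an index register scaled by 4. */
static uint8_t load_scaled(const uint8_t *rax, size_t rbx)
{
    assert(rax != NULL);
    return *(rax + rbx * 4);
}
```

With a byte array as the base, load_scaled(mem, 2) reads mem[8], the same byte that [rax+rbx*4] would address with rbx = 2.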
Endianness
Little Endian Example:
0x12345678 = 0x78, 0x56, 0x34, 0x12 in memory (lowest address first)
Big Endian Example:
0x12345678 = 0x12, 0x34, 0x56, 0x78 in memory (lowest address first)
Network traffic is sent in Big Endian
- Endianness applies to memory, not registers!
- Endianness applies to bytes, not bits!
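The two byte orders can be sketched in C with shifts, which behave the same on any host. This is only an illustration; store_le32 and store_be32 are hypothetical helper names, not part of any library:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write value into out[0..3] with the least significant byte first
   (little endian, the order x86 uses in memory). */
static void store_le32(uint32_t value, uint8_t out[4])
{
    assert(out != NULL);
    for (size_t i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * i));
}

/* Write value into out[0..3] with the most significant byte first
   (big endian, the order used for network traffic). */
static void store_be32(uint32_t value, uint8_t out[4])
{
    assert(out != NULL);
    for (size_t i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * (3 - i)));
}
```

Note that only the order of whole bytes changes; the bits inside each byte stay as they are.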
Memory Hierarchy
Top of the hierarchy is small, fast, volatile storage.
- Registers, Cache, RAM
- Small capacity
- Fast/very fast to access
Lower part of the hierarchy is large, slow, non-volatile storage.
- Flash/USB memory, hard drives, tape drives
- Slow/very slow to access
- Large/very large capacity
Architecture Registers
- Small memory storage areas built into the processor (still volatile)
- x86-64 has 16 “general purpose” registers + the instruction pointer (x86-32 has 8)
- On x86-32, registers are 32 bits wide
- On x86-64, registers are 64 bits wide
Register Evolution: AL, AH, AX, EAX, RAX.
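Each older name is a view onto part of the same register: AL is the low byte, AH the next byte up, AX the low 16 bits, EAX the low 32 bits and RAX all 64. A rough sketch using masks and shifts on a plain 64-bit integer; the helper names are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Each helper extracts the part of a 64-bit RAX value that the
   narrower register name refers to. */
static uint32_t eax_of(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); }
static uint16_t ax_of(uint64_t rax)  { return (uint16_t)(rax & 0xFFFFu); }
static uint8_t  ah_of(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFFu); }
static uint8_t  al_of(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }

/* Sanity check with an arbitrary example value. */
static void check_views(void)
{
    const uint64_t rax = 0x1122334455667788u;
    assert(eax_of(rax) == 0x55667788u);
    assert(ax_of(rax) == 0x7788u);
    assert(ah_of(rax) == 0x77u);
    assert(al_of(rax) == 0x88u);
}
```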
Register Conventions
- RAX - Stores function return values
- RBX - Base pointer to the data section
- RCX - Counter for string and loop operations
- RDX - I/O pointer
- RSI - Source Index pointer for string operations
- RDI - Destination Index pointer for string operations
- RSP - Stack (top) Pointer
- RBP - Stack frame Base Pointer
- RIP - Pointer to next instruction to execute (Instruction Pointer)
These all start with E instead of R in 32-bit programs (EAX, EBX, ESP, and so on).
Segment Registers
- SS, Stack Segment, Pointer to the stack
- CS, Code Segment, Pointer to the code
- DS, Data Segment, Pointer to the data
- ES, Extra Segment, Pointer to extra data
- FS, F Segment, Pointer to even more extra data
- GS, G Segment, Pointer to EVEN MORE extra extra data
On x86-32 the general purpose registers hold 32 bits: 4 bytes (4 sets of 8 bits). On x86-64 they hold 64 bits: 8 bytes.
The Program Status and Control Register is EFLAGS (RFLAGS on x86-64), which is a collection of 1-bit flags.
Most of the flags aren't important for now, apart from the Trap Flag, which allows debuggers to single-step through instructions.
My First Instruction: NOP
- No-Operation! No registers, no values.
- Just there to pad/align bytes, or to delay time.
- Attackers use it to make simple exploits more reliable.
xchg eax, eax is an alias of NOP, because exchanging a register with itself does nothing.
The Stack
- Last In First Out (LIFO) data structure where data is “pushed” onto the top and “popped” off of the top
- Conceptual area of memory (RAM) which is designated by the OS when a program is started
- The Stack grows toward lower memory addresses
- Adding something to the stack means the top of the stack is now at a lower memory address
- Up is down.
- RSP points to the top of the stack - the lowest address which is being used
What can be found on the stack?
- Return addresses so a called function can return back to the function that called it
- Local variables
- Sometimes used to pass arguments between functions
- Save space for registers so functions can share registers without smashing each other's values
- Save space for registers when the compiler has to juggle too many in a function
- Dynamically allocated memory via alloca()
Push & Pop Instructions
PUSH
- Places (pushes) an operand onto the top of stack
- Automatically decrements the stack pointer RSP by 8 (ESP by 4)
r/mX
- r/mX stands for r/m8, r/m16, r/m32 or r/m64.
- It is a way to specify a register or memory value.
- Square brackets mean "treat the value within as a memory address and fetch the value at that address".
r/mX can take 4 forms:
- Register -> rbx
- Memory, base-only -> [rbx]
- Memory, base + index * scale -> [rbx+rcx*X]
  - For X = 1, 2, 4 or 8
- Memory, base + index * scale + displacement -> [rbx+rcx*X+Y]
  - For Y of 1 byte (2^8 possible values) or 4 bytes (2^32 possible values); the displacement is signed, so it can be negative
Basically a complicated way of addressing memory locations: r/mX could be a single register like rbx, or it could be a complicated memory address calculation like [rbx+rcx*X+Y].
r/mX Examples:
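The most general form, [base + index*X + Y], is plain arithmetic. The sketch below computes the address that such an operand would refer to; effective_address is a hypothetical helper, and treating the displacement as a signed 32-bit value is an assumption that matches the 4-byte case:

```c
#include <assert.h>
#include <stdint.h>

/* Compute base + index*scale + disp, as in [rbx+rcx*X+Y].
   scale must be 1, 2, 4 or 8; disp is sign-extended to 64 bits. */
static uint64_t effective_address(uint64_t base, uint64_t index,
                                  unsigned scale, int32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint64_t)(int64_t)disp;
}
```

For example, with rbx = 0x1000 and rcx = 3, [rbx+rcx*8+0x10] refers to address 0x1028, and [rbx] is the special case of index 0 and displacement 0.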
A scenario: push RAX
What happened?
- RSP was decremented by 8, because PUSH automatically moves the stack pointer down by 8 as part of storing the value.
- RSP WAS at memory address 0x014FE08. AFTERWARDS it is at 0x014FE00.
- Also, the value at memory address 0x014FE00 is now 3, because the RAX value was 3. Previously, it was undefined.
- In the end, 0x014FE00 is the new top of the stack, and the value stored there is 3 (it was always going to be 3, because we pushed the value held in RAX). RAX itself is unchanged.
POP
- Pop a value from the stack
Opposite scenario: pop RAX
What happened?
- RAX had some leftover value before the POP, in this case 5007.
- RSP was pointing at our previous memory address, 0x014FE00, which still holds the value 3.
- After execution, the value 3 is popped off of the stack and into the RAX register, overwriting the 5007.
- RSP is automatically incremented by 8 as a side effect of the pop.
- Memory address 0x014FE00 is now considered undefined.
- RSP is now at memory address 0x014FE08, which holds the value 2.
- The value 3 is now stored in the RAX register.
- Data does still exist past the end of the stack: the 3 is not erased from 0x014FE00, it is just no longer part of the stack.
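The push and pop scenarios can be replayed with a toy model of the stack. This is only a sketch: the Stack struct, STACK_LOW, SLOTS and the helper names are made up for illustration, and memory is modelled as a handful of 8-byte slots around the addresses used above.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_LOW 0x014FDF0u /* lowest modelled address (hypothetical) */
#define SLOTS     8          /* number of 8-byte slots in the model */

typedef struct {
    uint64_t rsp;          /* address of the current top of the stack */
    uint64_t slots[SLOTS]; /* slots[i] models the 8 bytes at STACK_LOW + 8*i */
} Stack;

/* Map a modelled address to its slot; the address must be in range and 8-byte aligned. */
static uint64_t *slot_at(Stack *s, uint64_t addr)
{
    assert(addr >= STACK_LOW && addr < STACK_LOW + 8u * SLOTS);
    assert((addr - STACK_LOW) % 8 == 0);
    return &s->slots[(addr - STACK_LOW) / 8];
}

static void push64(Stack *s, uint64_t value)
{
    s->rsp -= 8;                 /* the stack grows toward lower addresses */
    *slot_at(s, s->rsp) = value; /* the value lands at the new top */
}

static uint64_t pop64(Stack *s)
{
    uint64_t value = *slot_at(s, s->rsp); /* read the current top */
    s->rsp += 8;                          /* the top moves back up */
    return value;                         /* the old slot is not erased */
}
```

Starting with rsp = 0x014FE08 and pushing 3 leaves rsp at 0x014FE00 with 3 stored there; popping returns 3 and puts rsp back at 0x014FE08, while the slot at 0x014FE00 still contains 3.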
32-bit Information
- Executing in 32-bit mode, push/pop will add/remove values 32 bits at a time rather than 64 bits, therefore they decrement/increment the stack pointer (ESP) by 4 instead of 8.
- Likewise in 16-bit mode, push/pop move 16 bits at a time, and decrement/increment by 2.
CALL
- Transfer control to a different function
- First it pushes the address of the next instruction onto the stack (this is the return address, used by RET when the procedure is done)
- Then changes RIP to the address given in the instruction
- Destination address for the target function can be specified in multiple ways
  - Absolute address
  - Relative address
RET - Return from Procedure
- Two forms:
  - Pop the top of the stack into RIP (remember that pop implicitly increments the stack pointer, RSP)
    - In this form, the instruction is just written as ret
  - Pop the top of the stack into RIP and also add a constant number of bytes to RSP
    - In this form, the instruction is written as ret 0x8, or ret 0x20, etc.
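CALL and RET can be pictured as a push and a pop that involve RIP. The model below is a rough sketch under simplifying assumptions: the Cpu struct and function names are invented, rsp is a byte offset into a small array instead of a real address, and the caller passes in the address of the next instruction.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t rip;       /* address of the next instruction to execute */
    uint64_t rsp;       /* byte offset of the stack top within stack[] */
    uint64_t stack[16]; /* 128 bytes of modelled stack memory */
} Cpu;

/* CALL: push the return address, then jump to the target. */
static void call(Cpu *c, uint64_t target, uint64_t next_instruction)
{
    assert(c->rsp >= 8 && c->rsp % 8 == 0);
    c->rsp -= 8;
    c->stack[c->rsp / 8] = next_instruction;
    c->rip = target;
}

/* RET: pop the return address into RIP, then add extra_bytes to RSP.
   extra_bytes is 0 for a plain ret and 8 for ret 0x8. */
static void ret(Cpu *c, uint16_t extra_bytes)
{
    assert(c->rsp + 8 <= sizeof c->stack && c->rsp % 8 == 0);
    c->rip = c->stack[c->rsp / 8];
    c->rsp += 8;
    c->rsp += extra_bytes;
}
```

A call followed by a plain ret leaves RSP where it started and RIP at the instruction after the call; the second form additionally discards bytes (for example stack arguments) that sit above the return address.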
How to read two-operand instructions: Intel vs AT&T Syntax
MOV (aka Move)
Can move:
- Register to register
- Memory to register, register to memory
- Immediate to register, immediate to memory
- NEVER memory to memory!
- Memory addresses are given in r/mX form
See examples below:
ADD and SUB
- Adds or Subtracts, just as expected
- Destination operand can be r/mX or register
- Source operand can be r/mX or register or immediate
- Source and destination cannot both be r/mX memory operands, because that would allow a memory to memory operation, which isn't allowed on x86
Examples:
add rsp, 8 -> (rsp = rsp + 8)
sub rax, [rbx*2] -> (rax = rax - memorypointedtoby(rbx * 2))
Writing in Visual Studio
Writing some simple subroutine call code:
int func() {
    return 0xbeef;
}

int main() {
    func();
    return 0xf00d;
}
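Debugging the code shows the RSP register changing after hitting the main() breakpoint and stepping into the function. Stepping into func() executes the CALL, which pushes the return address, so RSP is 8 lower afterwards.
[Screenshots: register values before and after stepping into func()]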