Read previous part: OS X Anti-Forensics Techniques - How the Leopard Hides His Spots
Proceeding with presenting his research on Mac OS X anti-forensics techniques, the Grugq now delves into HFS+ file system attacks.
Now, specifically just to look at OS X, we’re going to look at HFS+, and then I also have the application file format attacks that kind of come under OS X here, but you can use them on anything.
So, I’m now going to give everyone a lightning-quick introduction to HFS+ file system. Here’s the introduction. HFS+ stands for the Hierarchical File System Plus; it came out with OS X; there’s some backwards compatibility stuff in there for OS-9 and earlier. It was very heavily influenced by HFS, which is just the Hierarchical File System, no Plus. The on-disk structure is actually quite complicated because they used B*trees; and B*trees are specific form of binary tree that, when it gets written out to disk, it’s a little bit annoying to deal with. If anyone is really interested in the technical specifications for this, the Apple Technical Note 1150 is actually very-very good, and I highly recommend it – makes great reading, you can take it on a train, have a nice cup of coffee… Maybe it’s just me, but they really write it quite well.
This is what the file system looks like. It’s mostly grey and white, with some black bits. You can see at the top, there’s the header, then there’re several special files, then there’s allocation space, there’s a copy of the header at the end – and that’s pretty much it.
The components that I mentioned earlier: we have the header – in this case the header is called the ‘volume header’; a block is called a ‘block’; and a node is actually made up of different parts (the block lists are made up of extents which can be stored in a couple of different places, and the metadata is stored in several different catalog file entries). The catalog file basically contains your file system structure, the whole directory structure that you’re familiar with. And then the map itself is the catalog file entries.
So, there’re sort of two core concepts that you need to get your head around (see right-hand image). First of all, there’re data forks; and the other one is B*trees. Data forks are pretty straightforward. Basically, a data fork just is a stream of data that’s stored in a specific location. So, what you have is the location information that says “Start here and reach this number of bytes”, and it’s got a series of those records, and then it’s also got overall size of the user data contained within that. So, what we’re looking at is this (see left-hand image). Your ForkData is basically this header at the top which has a logicalSize that’s the size of the internal data. There’s the totalBlocks, which is the total number of blocks that have been allocated to store all the data. And then there’s an array of these descriptors, which basically tell you the first block and then the number of blocks to read.
One of the areas that you should not store data is between the end of the logicalSize and the end of allocated blocks. That’s what’s usually called ‘slack space’. Every forensics investigators knows how to find data there. Basically, that’s the only place that they know how to look, so if you tell them that you’re using slack space, this is where they’re going to go – they’re going to look at all of the blocks that have been allocated and they’re going to find the end of your user data and that space at the end; they’re going to look at that and be really excited if they find something. So, don’t put your shit there, like the anti-forensics toolkit inside Metasploit – that’s where it puts all of its data. Don’t do that, it’s stupid.
There are a number of special files within the HFS+ file system. There’s the allocationFile, which basically stores bitmap to indicate which blocks within the file system have been allocated or not. There’s the catalogFile, which stores information on all the metadata about a file itself specifically. There’s the extentsFile, which stores fragmented file block information. There’s the startupFile, which is optional; it’s used on very old archaic systems, and it tells you where to find the kernel loader on a file system. And then there’s the attributesFile, which stores extended attribute information. We’ll come back to all of these when we start attacking them.
Okay, B*trees are used very-very heavily within the HFS+ file system. Basically, what a B*tree does is it’s a binary tree, where all of the leaf nodes are connected in a link list. The idea is that a file is divided up into nodes; each node has an address, you can point to it by indicating the address, it’s pretty straightforward stuff.
Nodes are basically one of four different types. You have header nodes which store metadata about the entire B*tree. You have map nodes which tell you about the allocation; they have a bitmap indicating the allocation of nodes within that file. You have index nodes which are basically a pairing of a key and a node pointer so that you can rapidly do you binary tree stuff. Everyone’s very comfortable with binary trees, right? And then finally, you have your leaf nodes which store a key and a data record.
So, once again, I have a very cool illustration which I did not copy directly out of the technical docs. This is what a node actually looks like, this is kind of important. The key thing to look at over there is the part where it says “Free space.” You can cross that out and write in “Reserved for hacker usage only.” The idea with the B*tree node here is that you’ve got a number of records inside of there, and any unallocated space that isn’t being occupied by a record is available as free space that can be used later on.
For everyone who wants to see what a B*tree looks like, it looks like this. It’s got a red bit, and orange bit, and some other colors. At the top, what you have is the header node which has a pointer to the first leaf node and the last leaf node. The ones in the middle are index nodes, so you would follow down through these index nodes to get your actual data in the leaf nodes; and the ones at the very bottom are leaf nodes containing the keys and the actual data.
Okay, so let’s break that. We’re going to look at specific attacks that we can do against HFS+, starting out with just file allocation attacks. As I mentioned, there are special files; you can do cool things with special files. One of them is there’s a special file called the ‘bad blocks file’. The bad blocks file exists to allocate blocks which contain bad sectors and prevent them from being used by user data. So you go to the extentsFile and you mark extents that hold bad blocks as belonging to the file number 5. So any extent that owed by catalog node ID5 is not flagged by the file system checker and it’s basically ignored by forensics tools, because, once again, the old bad blocks trick is the oldest one in the book, and it still works, particularly on the HFS+.
But this is way too lame for us, it’s very-very obvious, we’re not going to do it. So there’s also, as I mentioned, the startupFile – once again, the startupFile is used on very archaic systems. The idea is that if you have a bootloader and it needs to load the kernel, you don’t want to have to go and write the entire HFS+ file system into that bootloader. So you load the kernel, you put it in a specific location, and you write your bootloader to be able to read just this one file and then to be able to go there itself. That makes it a lot easier. No one actually uses this because f**ing ancient. So, what you have instead is a 0-length startup file which is ignored by everything. What can you do? Well, it doesn’t have to be 0-length, it can an arbitrary size. So you simply take the startup file, you allocate space for it, and you put your data in there. Done.
Similarly, you can do this with the allocation file. The allocation file has the space that’s been allocated for the bitmap, and the kernel will not actually look beyond the last element of that bitmap. But there’s nothing that tests that the size of the allocation file is exactly the right size to store that bitmap, so you can increase the size of the allocation file and put your data at the end there.
I mentioned that there was some backwards compatibility. One of the things that happens is older Mac OS 9 systems wouldn’t be able to access an HFS+ drive, so what they did is they put the HFS+ drive inside and HFS file system – that’s called an HFS wrapper. So you can actually have your HFS+ file system embedded inside an HFS file system – you get HFS file system inside HFS+. Everyone looks at the HFS+, but you actually have two file systems that you can do this stuff with. So you can actually use the HFS Wrapper for data storage yourself. So you can allocate additional space within the HFS Wrapper, particularly at the end of the bad blocks file. So the embedded HFS+ volume is marked as bad blocks within HFS. You can extend the size of the bad blocks file and then use the slack space at the end. Very straightforward stuff. This will work probably for the next few years, because even if you tell them what’s wrong with forensics tools, they don’t fix them. It’s not really part of their business to fix problems; it’s only to come out with new features and stuff.
Okay, those are attacks that we can do with allocation files. I haven’t implemented them yet because it’s actually really irritating to do allocation with all of the B*trees. I’ve spent most of my time implementing my B*tree code.
So, attacks that we can do against binary trees themselves – these are much more interesting, because an allocation attack, where you basically take a file that should have 0 length and you make it non-0, is very-very easy to detect. If you have a startup file that’s not 0-length, maybe there’s something in there that’s a bit weird and you can look into it. If you have a large number of bad blocks, maybe you want to look at those. These are not subtle. This stuff is a bit better.
So, one of the things that you can do is a B*tree has a special type of node called a ‘map node’, which, once again, contains a bitmap to indicate which node within that B*tree has been allocated.There’s nothing that says the number of map nodes has to be exactly the right number to hold a bitmap that fits the entire tree. You can extend the number of map; you can add new map nodes and use those maps for data storage. So you can just add map nodes at the end of the linked list, and they’re not going to be accessed by the kernel because they would indicate nodes that go beyond to the end of the file, so they’re outside the scope of that B*tree – nothing to worry about. This is very-very easy to implement, but for some reason my code doesn’t work, so this demo gets skipped.
As I mentioned, there’s free space inside nodes; you can actually do data storage inside that free space. Once again, my code doesn’t work, so we’re going to skip this demo. But what I can show you is how it mostly gets up into the point that it gets written to disk when it stops working. It drove me nuts, I spent about an hour on it. At the very-very lowest layer I have, I can see the buffers that are supposedly getting written to disk and they contain all the data, and then when I read it back in – the data is gone. So I’m going to lodge a complaint with Apple saying that there’s something wrong with their disk driver that erases my data hiding attack before it gets down to the disk. I’m pretty sure that’s the only possible explanation at this point.
Read next part: OS X Anti-Forensics Techniques 3 - Expanding the Attack Space