The Grugq, a well-known anti-forensics researcher, outlines the key issues related to counter-forensics for the OS X platform during the HITBSecConf event.
I was going to do this whole thing on anti-forensics, but none of my code actually works, and so, what I decided to do was to do what I know best. So this is going to be called “Panduan Mencari Gadis Malam”. There’s three people laughing; for everyone else that means “The Guide to Finding Women of the Night” in Malay.
“Gadis malam,” the women of the night… “Mana? Berapa lama? Berapa banyak? Sangat mahal!” Okay, for everyone else – I just said: “Where? How old? How much? Incredibly expensive!”
So, on to the actual presentation; it’s called “How the Leopard Hides His Spots” because I assumed everyone here from Malaysia is familiar with, you know, classical colonial literature by Rudyard Kipling, right? The idea is that this is a talk on anti-forensics specifically for the OS X platform. As it happened, I spent quite a lot of time working on application-level file format attacks, so a lot of this is actually cross-platform. We’ll spend some time looking at HFS-specific attacks, but also things we can do against, for example, SQLite.
Briefly, an introduction of myself. My name is the Grugq. I’ve been an anti-forensics researcher since 1999, which basically makes me the longest-running anti-forensics researcher in a field of about two people – I’m number one! I work as an independent security researcher, which is the nice term for “hacker”. And I live in Thailand, so I’m quite close to here.
A lot of people ask me why I do anti-forensics. Actually, the specific example I have for this is one I gave in an earlier version of this talk – my old one on Unix – to a police conference. I had this very, very big, like, two-meter tall, shaved-head guy with his name on his knuckles come up and say: “I’ve got one question”. I said: “Yes?” And he goes: “Why are you doing this?!” Police don’t like it when you make their job difficult, they really hate it. And the problem that I see is that forensics is actually an integral part of the information security lifecycle – from penetration testing, intrusion prevention, intrusion detection, incident response; forensics is all part of the security lifecycle, and as a result, it should be exposed to the same sorts of research as other parts of the security lifecycle, such as finding bugs in other pieces of software.
So, unless someone is doing this, forensics is never actually going to catch up, it’s going to continue to remain vulnerable and insecure. The other thing is that it’s still very much “green field” research, like, if you’re looking at doing buffer overflows on Windows, you may as well just spend your time reading everyone else’s papers and following their footsteps because it’s going to be really hard to do something new. But if you’re doing anti-forensics you basically just pick a platform, and anything you do is the first stuff that’s ever been done on it. It’s really easy. It’s great. It’s like kicking someone when they’re down.
Okay, now you get a one-slide introduction to digital forensic analysis. The first thing to know is that forensics only exists within the context of an investigation, so anything that’s done to just look around doesn’t count as forensics. Again, forensics only exists within the context of an investigation. You’re responding to an incident: you’re investigating that incident and then drawing some sort of conclusion.
There’re only about three things you need to do: you need to preserve the data in the original state; you need to extract all the evidentiary data possible from that snapshot of the original state; and then you need to present the evidence. We’re going to be looking at all of the things that happen at Step 2, where you extract the evidentiary data from the snapshot. So let’s look at that as anti-forensics.
Anti-forensics gets a two-slide introduction because it’s a lot more complicated. Fact number 1 – data is evidence. Everything that you can possibly do on a system leaves some sort of trace behind. Everything that you do in some way affects or creates data that needs to be removed or hidden. So your goal when doing anti-forensics is to reduce the quantity and quality of evidentiary data. And there’re several different ways of doing that; there’re basically three types of strategies that you can employ.
First off, we have data destruction: they can’t find what isn’t there. So, if you destroy the data, there’s nothing to be recovered and used against you in a court of law. What this comes down to is removing evidentiary data ‘ex post facto’, which means ‘after the fact’ (I looked it up on Google). The idea is that after you’ve done something that you’re not very proud of, you get rid of the evidence completely by destroying absolutely everything. It’s very difficult to do this in a subtle way, like, it’s very difficult to only remove part of a file system.
Part of the reason for that is that modern systems scatter data absolutely everywhere. When you download a file there’s a cache that might update the registry on a Windows machine; it will be written to disk in multiple different locations that will create directory entries – all sorts of things happen all over the place, and finding and tracking down each of these instances and reverting them to a state prior to, you know, the bad thing – that’s actually incredibly difficult to do. So, generally speaking, the scorched earth policy is the way to go. What that means is – just burn it all! Don’t try and get rid of only the dirty pictures you downloaded; just destroy the hard disk. Get a hammer out, go at it – that’s the way to do it.
Proper data destruction is very difficult. Fortunately, in the real world you don’t actually have to do proper data destruction, you need to do just enough to make a forensics investigation difficult. So you don’t need to destroy all evidence, you just need to make the evidence useless in a court of law – that’s generally sufficient. But we’re hackers, we’re far cooler than that, so no data destruction is for us. That’s strategy 1.
Strategy 2 – data hiding: if they can’t see it, they can’t find it. So we’re going to stick all the stuff somewhere they can’t see it, where the sun don’t shine. The idea is to store data outside of the scope of the investigators’ tools. So, if your data is stored beyond their visibility range – it’s secure. The problem is it’s only secure for a short period of time. Basically, you’re dealing with bug death: when you’re exploiting a bug you don’t have an infinite period of time before that bug dies. That’s just like everything else on the Internet.
The good thing is forensics is a very slow-moving field. When I first did a release of a bug back in 2002 or so, it took the Sleuth Kit nine months to fix what was essentially a one-line patch. So bug death is a very, very slow process, you don’t really need to worry about it that much. It’s not like hours on the Internet; it’s possibly years. Because you need to worry about bug death, don’t forget to encrypt.
So, there we go – data destruction, data hiding. Next one – data contraception: they can’t find what was never there in the first place. So the trick with this is you never touch the disk. The idea is – if you don’t create evidentiary data, you don’t have to remove it and you don’t have to hide it. So you stay in RAM; you avoid accessing things that will lead back to you; you avoid using tools which are custom-made; you use tools which are generic so that it’s very hard to create a profile, so it’s very hard to trace you down to a specific individual.
The problem with this is it’s a real pain in the ass to do, particularly if you’re drunk. So you really have to plan and be prepared. You can’t just go out and start hacking whatever you feel like. You have to stick to “Okay, today I’m going to be going after this, here’s the plan.” You script everything out beforehand, you get your tools ready, and then you execute. Most people don’t really do that. I wrote a tool to make it a lot easier by allowing you to script interactions with the command line. That tool is called “Hash”, you can download it from www.tacvoip.com. But generally speaking, doing proper data contraception is very annoying.
So, those are the three strategies that you can employ. Now we’re going to be looking at tactics, ways of how you can actually implement them. We’re only actually going to be looking at data hiding attacks because they are far sexier than the other ones, like data destruction is easy – you just destroy everything; data contraception is hard and annoying, and I did it last year. And so, let’s do something that involves reading the source code and finding bugs and writing code.
When we are looking at a good data hiding attack, there’re only, really, two things we need to worry about. First of all, it has to be hidden. If it’s not hidden, it’s not a data hiding attack. You’d be surprised how much this foils people. I remember working with a guy who told me that his clever technique for hiding systems on Unix was to put a dot in front of the name. It actually worked – that’s the sad thing. When we do pen tests and we find rootkits, we find that they usually don’t even bother with the dot; they just pick a home directory for someone who hadn’t logged in for a year, and stuff all their stuff in there, no one cares. But we’re not that blatant, we want to do something clever. We’re going to go for a hidden attack.
The other thing is we want it to be robust, which means that when we put our data in there and later come back to pull it out, it shouldn’t have been changed, destroyed or lost. So we don’t want our data to vanish. If we wanted our data to vanish, we would use a data destruction attack.
What we’re going to be looking at specifically is ways of exploiting structured storage bugs. Structured storage is pretty fundamental to computers. Everyone’s intimately familiar with all sorts of structured storage approaches, yes? I can just skip the next section. Okay, what the hell!
Okay, the first structured storage we’re going to look at is file systems. File systems are basically a way of pairing user data with names. That’s all they do. The way that they implement it is a bit more complicated. The idea is you take a discrete stream of data and you associate a human-readable name with it. And then you can create a path with a number of these names and you get to that specific piece of data. That’s all it does. The way that it implements it on disk is what we’re going to be spending most of our time looking at.
There’re two types of data within the file system: there’s data which we completely ignore because data belongs to the user and it changes all the time; and there’s metadata which is used to organize a file system, to organize the internal structure of that structured storage medium – and that’s what we’re going to be looking at.
File systems basically provide an operating-system-level version of CRUD, which is “create, read, update, delete,” obviously. File systems have basically four critical components. All sorts of file systems use different names for them, but file systems break down into only four types of objects. There’s a header, which tells you the global layout and properties of that file system – that has to be at a fixed location so you can find it. Anything else can be created dynamically, but you have to be able to find the header. Then you have a block, which is just a discrete chunk of data; it’s going to be the smallest atomic addressable component of that file system. Then you have a node. A node is a collection of metadata for a single file, plus one or more block streams. A block stream is basically an ordered series of blocks containing data. So you’ve got your block list, you’ve got your node, and then you have a map. What a map does is it takes a name and it pairs it with a node.
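The four objects described above can be sketched as a toy in-memory model. This is only an illustration under stated assumptions: the class names, the 16-byte block size and the sequential allocator are all made up for the example and do not correspond to any real file system.

```python
from dataclasses import dataclass, field
from typing import Dict, List

BLOCK_SIZE = 16  # hypothetical block size, tiny so the example is easy to follow


@dataclass
class Header:
    """Global layout and properties; the one object that needs a fixed location."""
    block_size: int
    block_count: int


@dataclass
class Node:
    """Metadata for a single file plus its block stream (ordered block numbers)."""
    size: int
    block_stream: List[int] = field(default_factory=list)


class ToyFileSystem:
    def __init__(self, block_count: int) -> None:
        self.header = Header(BLOCK_SIZE, block_count)
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(block_count)]
        self.nodes: List[Node] = []
        self.name_map: Dict[str, int] = {}  # the "map": name -> node index
        self.next_free = 0  # naive sequential allocator, an assumption of the sketch

    def create(self, name: str, data: bytes) -> None:
        """Split data into blocks, record them in a node, and map the name to it."""
        node = Node(size=len(data))
        for offset in range(0, len(data), BLOCK_SIZE):
            chunk = data[offset:offset + BLOCK_SIZE]
            self.blocks[self.next_free] = chunk.ljust(BLOCK_SIZE, b"\x00")
            node.block_stream.append(self.next_free)
            self.next_free += 1
        self.nodes.append(node)
        self.name_map[name] = len(self.nodes) - 1

    def read(self, name: str) -> bytes:
        """Resolve name -> node -> blocks, then trim to the recorded file size."""
        node = self.nodes[self.name_map[name]]
        raw = b"".join(self.blocks[n] for n in node.block_stream)
        return raw[:node.size]


fs = ToyFileSystem(block_count=8)
fs.create("notes.txt", b"twenty bytes of data")
print(fs.read("notes.txt"))
```

Note that the read path only ever trusts the node's size and block stream; everything outside of that is metadata territory a user never sees.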
To make this concrete, I’ll give everyone examples from NTFS, because you’re all intimately familiar with the internal structure of the NTFS file system. Right? Well, the blank stare of affirmation…Okay, so the header on the NTFS system is called the ‘boot block’, for historical reasons. A block is called a ‘cluster’; there’re usually four letters that come afterwards, but I’ll leave those out because there’re ladies in the audience. Basically, the idea of a cluster is that there’re multiple sectors put together as a cluster of sectors; it makes it easier to read large portions of data off the disk. A node is actually an entry within what’s called a ‘Master File Table’. The Master File Table is itself a file that has entries for itself. And then there’re maps, which will be directory files as you know them, but they’re actually also entries within the Master File Table, which makes it a lot clearer.
Okay, this one might work a bit better – FAT. Everyone’s familiar with the FAT file system, it’s used on mobile phones, it’s everywhere, the file system that will not die. It’s older than I am; it’s from around the early 70s. The idea is that there’s this boot block which describes the entire file system, and that’s the first sector of the disk. A block is, once again, called a ‘cluster’. A node is a directory entry; so inside a directory file there’s actually additional metadata associated with that filename. And then there’s a FAT chain: within the FAT table itself there’s a chain of blocks which creates the block list for user data. And then finally, your map is itself a directory file. So, now everyone is very, very comfortable with the idea of directory, block, node, map, etc. I’ve given you a couple of examples, and you’re all drawing on your vast resources of low-level disk hacking. Okay, never mind…
So, let’s talk about how one goes about doing a data hiding attack against structured storage. It’s very simple. Basically, you need to allocate space where you’re going to put your data. The way to do that is basically to exploit bugs in the code that interprets and uses that structured data. There’re roughly three types of bugs that we’re going after. There’re bugs in parsing that might incorrectly read the structured data. There’re bugs in interpretation that might incorrectly interpret the structured data on the disk. And also there are bugs in presentation – when the tool takes that data and presents it to the user in a forensics tool, it might inaccurately show what they’re looking at.
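The idea of a FAT chain can be sketched as a walk through a table where each entry names the next cluster of the same file. This is a simplified model, not a real FAT parser: the table is a plain Python list, and the FREE and END_OF_CHAIN values are placeholders chosen for the example rather than the exact reserved values of any FAT variant.

```python
FREE = 0               # simplified marker for an unallocated cluster
END_OF_CHAIN = 0xFFFF  # simplified marker for the last cluster of a file


def follow_chain(fat, first_cluster):
    """Return the ordered list of clusters that make up one file."""
    chain = []
    seen = set()
    cluster = first_cluster
    while cluster != END_OF_CHAIN:
        if cluster in seen:
            # A corrupted or malicious table could loop forever otherwise.
            raise ValueError("loop detected in FAT chain")
        seen.add(cluster)
        chain.append(cluster)
        cluster = fat[cluster]  # the table entry points at the next cluster
    return chain


# Index = cluster number, value = next cluster of the same file.
# The directory entry (the node) would supply the starting cluster, here 2.
fat = [FREE, FREE, 5, END_OF_CHAIN, FREE, 3, FREE]
print(follow_chain(fat, 2))  # [2, 5, 3]
```

The block list is not stored with the file at all; it only exists as links inside the table, which is why the table counts as metadata.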
Once you’ve allocated that space, you need to inject data into it. How do you do that? Very simple – FISTing. FISTing is the File System Insertion and Subversion Technique. It’s a very generic technique for exploiting structured data storage. Basically, all you need to do to FIST is you find a hole and you FIST it. Right now you’re probably thinking: “What holes can I FIST?”
Well, first you need to find a FIST sized hole. Generally speaking, these things that you’re looking for are special files within the file system, so they’re files that have implicit meanings, that are dealt with implicitly rather than explicitly. You look for slack space; usually there’re ways to allocate slack space within metadata structures. And the best part is slack space is inaccessible from userland, and it’s actually frequently inaccessible from forensics tools as well. And finally, we have reserved portions of metadata structure. And the thing to remember is reserved means reserved for hacker use only. So, anytime you see ‘reserved’ in a metadata structure, you can put your data there. Go ahead and FIST it, no one is going to notice.
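Slack space can be illustrated with a little arithmetic: a file rarely fills its last block exactly, and the leftover bytes are allocated but never returned by a normal read. The sketch below assumes a 512-byte block and operates on a bytes object standing in for the final block; the function names are made up for the example, and it shows slack in a data block only, not in a metadata structure.

```python
BLOCK_SIZE = 512  # assumed block size for this sketch


def slack_bytes(file_size, block_size=BLOCK_SIZE):
    """Bytes left unused in the last block allocated to a file."""
    remainder = file_size % block_size
    return 0 if remainder == 0 else block_size - remainder


def hide_in_slack(block, used, payload):
    """Return a copy of the final block with payload written past the file's data."""
    if len(payload) > len(block) - used:
        raise ValueError("payload does not fit in the slack space")
    return block[:used] + payload + block[used + len(payload):]


file_size = 1300                      # occupies three 512-byte blocks
used_in_last = file_size % BLOCK_SIZE # bytes of real data in the last block
last_block = b"A" * used_in_last + bytes(BLOCK_SIZE - used_in_last)

patched = hide_in_slack(last_block, used_in_last, b"secret")

print(slack_bytes(file_size))
# A reader that honours the file size sees exactly the same data as before.
print(patched[:used_in_last] == last_block[:used_in_last])
```

Because a normal read stops at the recorded file size, the payload survives until something extends the file or reallocates the block, which is where the robustness requirement comes in.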
Forensics tool bugs that we care about are, generally speaking, related to incomplete or ignorant implementations. So the guys who put together the file system parsing code for forensics tools are, generally speaking, idiots. They don’t read the specs properly and they implement just enough to access the specified data. They don’t go the next step. So there’re usually underused, or atypical usages of structured storage features that they ignore. There’re also logic bugs, so there’re ways that you can access edge cases in certain types of structured storage, and you can exploit those edge cases for data storage. And finally, there’re straightforward security bugs like buffer overflows, integer wraps, that sort of stuff.
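An incomplete implementation of this kind can be sketched with a made-up header layout. Everything here is hypothetical: the "TOYF" magic, the field order and the 8 reserved bytes belong to no real file system, and the two parsers simply contrast a tool that reads only the documented fields with one that also reports the reserved bytes.

```python
import struct

# Hypothetical on-disk header: 4-byte magic, 4-byte block size,
# 4-byte block count, 8 reserved bytes (little-endian, 20 bytes total).
HEADER_FORMAT = "<4sII8s"


def build_header(block_size, block_count, reserved=bytes(8)):
    """Pack the toy header; reserved defaults to eight zero bytes."""
    return struct.pack(HEADER_FORMAT, b"TOYF", block_size, block_count, reserved)


def incomplete_parse(raw):
    """Reads only the documented fields and silently drops the reserved bytes."""
    magic, block_size, block_count, _reserved = struct.unpack(HEADER_FORMAT, raw)
    return {"magic": magic, "block_size": block_size, "block_count": block_count}


def full_parse(raw):
    """Reads every field, so non-zero reserved bytes become visible."""
    magic, block_size, block_count, reserved = struct.unpack(HEADER_FORMAT, raw)
    return {"magic": magic, "block_size": block_size,
            "block_count": block_count, "reserved": reserved}


clean = build_header(512, 1024)
stuffed = build_header(512, 1024, reserved=b"hidden!!")

# The incomplete parser cannot tell the two headers apart.
print(incomplete_parse(clean) == incomplete_parse(stuffed))
# The complete parser exposes the stored bytes.
print(full_parse(stuffed)["reserved"])
```

The two headers differ on disk, but a tool built like incomplete_parse presents them identically, so whatever sits in the reserved field never reaches the examiner.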
I don’t recommend going for a security bug because it’s very difficult to anticipate ahead of time what sort of forensics tool is going to be used against the disk image. So, if you go out of your way to put some really clever EnCase attack in there, and the guy uses Sleuth Kit and FTK – then you’re screwed. So you’re better off just playing it subtle, just avoid making waves and upsetting people.
So, FISTing for all. The great thing is you can basically FIST any structured data storage at all. We are going to look first of all at FISTing file systems, and we are going to look at application file formats.