I wrote my master’s thesis at the Norwegian University of Science and Technology (NTNU) during the spring of 2010. My assignment was “Covert channels in the Domain Name System”, but for the most part it is about detecting an IP-over-DNS client by observing its queries. It is published in Norwegian, and available over email if you’re interested.
To put it in Wikipedia terms, I believe this blog now contains original research. Crazy.
The setup was to look at packet dumps from a recursive DNS cache, or more precisely, the answers from the local DNS cache to the client. The background PCAPs were taken at one of the recursive DNS servers at NTNU. Packet dumps of SSH, web and keepalive traffic were also taken for nstx, iodine and TUNS.
I was able to detect a client running iodine among university background traffic within 10-30 seconds, with no false positives. The detector was implemented in Python with Impacket, and works in real time with negligible resource usage.
First of all, I was not able to get the following detection mechanisms to work very well:
- client’s bandwidth usage per time unit. May work if you whitelist your email servers and use 30-60 seconds detection time, but not a very promising method.
- Kolmogorov complexity. Zip the data parts of the DNS response. The idea was that IP-over-DNS traffic would have higher complexity (that is, it would compress less well) than the usual DNS traffic. Didn’t quite work out, but may be feasible with more work.
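To make the compression idea concrete, here is a minimal sketch and not the code from the thesis. It assumes zlib as a stand-in for "zip" and uses compressed size divided by original size as a rough complexity score. The function name `compression_ratio` and both sample payloads are made up for illustration.

```python
import os
import zlib


def compression_ratio(payload: bytes) -> float:
    """Compressed size divided by original size.

    Values near (or above) 1.0 mean the data barely compresses,
    which is the rough proxy for "high complexity" used here.
    """
    if not payload:
        return 0.0
    return len(zlib.compress(payload, 9)) / len(payload)


# Hypothetical payloads: repetitive text standing in for ordinary
# DNS answer data, and random bytes standing in for tunneled data.
ordinary = b"www.example.com. 300 IN A 192.0.2.10\n" * 20
tunnel_like = os.urandom(len(ordinary))

print("ordinary:    %.2f" % compression_ratio(ordinary))
print("tunnel-like: %.2f" % compression_ratio(tunnel_like))
```

Real DNS answers are short, so the fixed overhead of the compressor matters a lot. That may be part of why this did not separate the traffic classes as cleanly as hoped.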
What I didn’t try, but seemed cool:
- autocorrelation of time between queries. It should be quite different, since the client is polling constantly.
- time series analysis with wavelets. Complicated math.
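Since I never tried the autocorrelation idea, the following is only a sketch of what a first experiment might look like. It computes the gaps between consecutive query timestamps for one client and then the sample autocorrelation of those gaps at a given lag. The helper names and the example timestamps are made up, and whether this actually separates tunnel clients from normal ones is untested.

```python
from statistics import mean


def inter_arrival_times(timestamps):
    """Gaps (in seconds) between consecutive query timestamps."""
    ordered = sorted(timestamps)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def autocorrelation(values, lag=1):
    """Sample autocorrelation of `values` at the given lag.

    Returns None when the series is constant, because the
    autocorrelation is undefined when the variance is zero.
    """
    n = len(values)
    if lag <= 0 or lag >= n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mu = mean(values)
    denom = sum((v - mu) ** 2 for v in values)
    if denom == 0:
        return None
    num = sum((values[i] - mu) * (values[i + lag] - mu)
              for i in range(n - lag))
    return num / denom


# Hypothetical client that alternates short and long gaps.
timestamps = [0.0]
for k in range(20):
    timestamps.append(timestamps[-1] + (0.1 if k % 2 == 0 else 0.5))

gaps = inter_arrival_times(timestamps)
print(autocorrelation(gaps, lag=1))  # strongly negative for this pattern
print(autocorrelation(gaps, lag=2))  # strongly positive for this pattern
```

Note that a client polling at a perfectly fixed interval produces constant gaps, which is the undefined case above, so some handling for that would be needed in practice.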
I also believe that most people interested in machine learning will find this a very simple task. I did not look into it, but probably should have.
Top tip for people (without training, like me) is to never never never attempt to use uniform sampling for this stuff. I probably wasted a month figuring this out. You already have a perfect event-based sample set, use it for what it is worth.
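One way to read "use the events" is to update per-client statistics each time a packet arrives, instead of resampling the traffic onto a fixed time grid first. The sketch below is an illustration under that assumption and not the detector from the thesis. It keeps the bytes seen per client over a trailing window and assumes timestamps arrive in non-decreasing order. The class name `TrailingWindow` and the 30-second window are hypothetical.

```python
from collections import defaultdict, deque


class TrailingWindow:
    """Per-client byte count over the trailing `window` seconds.

    State is updated once per observed event (packet), so nothing
    has to be resampled onto a uniform time grid.
    """

    def __init__(self, window=30.0):
        self.window = window
        self.events = defaultdict(deque)  # client -> (timestamp, size) pairs
        self.totals = defaultdict(int)    # client -> bytes inside the window

    def add(self, client, timestamp, size):
        """Record one event and return the client's current window total."""
        queue = self.events[client]
        queue.append((timestamp, size))
        self.totals[client] += size
        # Drop events that have fallen out of the trailing window.
        while queue and queue[0][0] <= timestamp - self.window:
            _, old_size = queue.popleft()
            self.totals[client] -= old_size
        return self.totals[client]


w = TrailingWindow(window=30.0)
for ts, size in [(0.0, 100), (10.0, 200), (35.0, 50)]:
    print(ts, w.add("10.0.0.1", ts, size))
```

The work per packet is small and memory is bounded by the number of events inside the window, which fits the real-time, low-resource goal mentioned above.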
So, since this blog post is pretty long already, I’m going to save the good stuff for a followup post later this weekend.