As mentioned earlier, I did some work on detecting IP-over-DNS traffic as part of my master's degree in Communication Technology at NTNU, with a focus on information security.
My final method was to, and this may be cheating, look at a pcap or a live dump of packets and group the packets by the domain the client requested DNS answers from. In essence, it exploits the fact that all public IP-over-DNS implementations use a single domain name. I call it cheating because nothing forces an IP-over-DNS implementation to do this. I figure the less ethical IP-over-DNS users can patch this easily.
After collecting n replies from the recursive DNS server to the client, compute the following metrics:
- percentage of replies to the same domain (domain_max_percent)
- bytes per second over the time period it took for n replies to be seen (bps)
- average number of queries seen per second in the time period (qps)
- mean packet size in the time period (mps)
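The four metrics above can be sketched in Python roughly as follows. This is a minimal illustration and not my actual implementation: the input format (a list of `(timestamp, qname, size)` tuples for one client), the names `window_metrics` and `base_domain`, and the naive "last two labels" grouping of query names are all assumptions made for the example.

```python
from collections import Counter


def base_domain(qname, labels=2):
    """Naive grouping key (assumption): the last `labels` labels of the name."""
    parts = qname.rstrip(".").lower().split(".")
    return ".".join(parts[-labels:])


def window_metrics(replies):
    """Compute the four metrics for one client's window of replies.

    replies: list of (timestamp_seconds, qname, size_bytes) tuples.
    """
    n = len(replies)
    if n == 0:
        raise ValueError("empty window")

    times = [t for t, _, _ in replies]
    duration = max(times) - min(times)
    total_bytes = sum(size for _, _, size in replies)

    # Count replies per grouped domain and take the most common one.
    counts = Counter(base_domain(q) for _, q, _ in replies)
    top_count = counts.most_common(1)[0][1]

    # Rates are undefined for a zero-length window; report infinity.
    rate = (lambda x: x / duration) if duration > 0 else (lambda x: float("inf"))

    return {
        "domain_max_percent": 100.0 * top_count / n,
        "bps": rate(total_bytes),
        "qps": rate(n),
        "mps": total_bytes / n,
    }
```

For example, three replies of 100, 200 and 300 bytes spread over two seconds give a bps of 300, a qps of 1.5 and an mps of 200 bytes.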
The values I got the best results with were:
- mps > 140 bytes
- qps > 2.27 queries per second
- bps > 560 bytes per second
- domain_max_percent >= 98%
- n=70 packets
For each sample consisting of n packets from a single client, compute these metrics. If any of the rules above is false, the client is not an IP-over-DNS client (by this definition).
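As a sketch, the rule amounts to a conjunction of the four threshold checks. The function name `is_ip_over_dns` and the dictionary-of-metrics input are assumptions for illustration; the threshold values are the ones listed above.

```python
def is_ip_over_dns(metrics):
    """Return True only if every rule holds for a window of n=70 replies.

    metrics: dict with keys "mps", "qps", "bps" and "domain_max_percent".
    """
    return (
        metrics["mps"] > 140                      # mean packet size, bytes
        and metrics["qps"] > 2.27                 # queries per second
        and metrics["bps"] > 560                  # bytes per second
        and metrics["domain_max_percent"] >= 98   # share of the top domain
    )
```

Note that the first three comparisons are strict, so a window with an mps of exactly 140 bytes is not flagged.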
Early attempts used n=30 packets and mps > 240 bytes, but attempts at avoiding detection with an extremely low fragment size showed that n=70 packets and mps > 140 bytes gave the best results.
Critics may point out that a client can send a lot of fake requests to a different domain alongside the IP-over-DNS traffic, and thereby push domain_max_percent below 98%. Yes, this is possible.
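To get a feel for how little decoy traffic this takes, here is a small back-of-the-envelope calculation. The helper name `decoys_needed` is made up for this example, and it assumes the decoy replies land in the same window of n replies as the tunnel traffic.

```python
def decoys_needed(n=70, threshold_percent=98.0):
    """Smallest number of replies for other domains within a window of n
    that pushes the top domain's share below the threshold."""
    for decoys in range(n + 1):
        if 100.0 * (n - decoys) / n < threshold_percent:
            return decoys
    return None


print(decoys_needed())  # 2
```

With n=70, one decoy still leaves 69/70 (about 98.6%), while two decoys give 68/70 (about 97.1%), so two replies for another domain per window are enough to slip under the rule.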
I have Python implementations of all the methods I attempted. I guess real-time detection is the most interesting one, and I will put it up on my GitHub account sometime in the near future after cleaning up the code a bit.
In the time since I did my prestudies last spring, a paper on IP-over-DNS detection seems to have been published. It's on arXiv, and the authors use the character distribution in the query strings to match IP-over-DNS clients. This sounds cool, and way better than the Kolmogorov complexity attempts that I made.