Introduction
The Normalized Compression Distance (NCD) has been used in
a number of domains to compare objects with varying feature
types. This flexibility comes from the use of general-purpose compression algorithms as the means of computing distances between
byte sequences. Such flexibility makes NCD particularly attractive
for cases where the right features to use are not obvious, such as
malware classification. However, NCD can be computationally demanding, thereby restricting the scale at which it can be applied.
We introduce an alternative metric also inspired by compression,
the Lempel-Ziv Jaccard Distance (LZJD). We show that this new
distance has desirable theoretical properties, as well as comparable
or superior performance for malware classification, while being
easy to implement and orders of magnitude faster in practice.
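
To make the two distances concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes zlib as the general-purpose compressor for NCD, and it assumes that LZJD can be approximated by building a set of substrings with a simple Lempel-Ziv style scan and taking the Jaccard distance between two such sets. The function names `ncd`, `lz_set`, and `lzjd` are hypothetical, and any hashing or sampling that a faster implementation might use is omitted.

```python
import zlib


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance, with zlib standing in for the compressor (an assumption)."""
    cx = len(zlib.compress(x))
    cy = len(zlib.compress(y))
    cxy = len(zlib.compress(x + y))
    # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
    return (cxy - min(cx, cy)) / max(cx, cy)


def lz_set(data: bytes) -> set:
    """Collect the distinct substrings found by a simple Lempel-Ziv style scan (assumed construction)."""
    seen = set()
    start = 0
    for end in range(1, len(data) + 1):
        piece = data[start:end]
        if piece not in seen:
            # New substring: record it and begin the next one right after it.
            seen.add(piece)
            start = end
    return seen


def lzjd(x: bytes, y: bytes) -> float:
    """Jaccard distance between the Lempel-Ziv substring sets of x and y."""
    a, b = lz_set(x), lz_set(y)
    union = a | b
    if not union:
        # Two empty inputs are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)
```

Note that `ncd` runs the compressor three times per pair, including once on the concatenated input, whereas `lz_set` can be computed once per object and reused for every comparison. This is one plausible reason a set-based distance scales better, though the sketch itself makes no performance claims.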
Package: alternative-ncd-lzjd.zip file list
alternative-ncd-lzjd.pdf