Tools used for our small playground.
Why do we care about entropy for malware analysis?
It lets us quickly discover whether an executable file (.exe, .dll, etc.) is packed or encrypted.
Is it important to us?
Yes. The large majority of legitimate programs are not packed or encrypted. Based on this assumption, we can quickly scan with EDR tools to preselect files with high entropy as a first indicator of possible malware. I want to stress at this point that a high entropy level doesn't automatically mean that we are dealing with malware.
Why would someone want to encrypt or pack a program?
For legitimate purposes, to conceal intellectual property; for malicious purposes, to force the malware analyst into a time-consuming and challenging reverse-engineering process just to understand how the malware works. We will see a massive difference between those two approaches later in the article.
We can find a definition of entropy on Wikipedia https://en.wikipedia.org/wiki/Entropy_(computing), but for someone who is not familiar with its practical usage, this explanation could be a little bizarre.
I believe the simplest explanation of entropy is that it measures the randomness of the character (byte) distribution in a particular piece of data: text, a file, etc.
If the entropy level is somewhere around 6.8 or higher (byte entropy ranges from 0 to 8), that is a pretty good indicator of a high level of randomness.
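To make the idea concrete, here is a minimal Python sketch of Shannon entropy computed over the bytes of some data. The function name shannon_entropy is my own choice for illustration; tools such as PESTUDIO and DiE may compute or round their values slightly differently.

```python
import math
import os
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how many times each byte value occurs
    # Sum of p * log2(1 / p) over every byte value that is present.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    print(shannon_entropy(b"A" * 1024))          # one repeated byte: no randomness
    print(shannon_entropy(bytes(range(256))))    # every byte value exactly once
    print(shannon_entropy(os.urandom(1 << 16)))  # random bytes: close to the maximum
```

A file made of a single repeated byte scores 0, while random (or well-encrypted or compressed) data gets close to 8.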
Lyda, R.; Hamrock, J., "Using Entropy Analysis to Find Encrypted and Packed Malware," IEEE Security & Privacy, vol. 5, no. 2, pp. 40-45, March-April 2007
Does it help to solve the hunting problem? Yes! Yes!
If you have at your disposal an enterprise EDR security solution such as Fidelis Endpoint, FireEye HX, or RSA ECAT, you can filter for executable files with an entropy of 6.8 or higher.
Start hunting by quickly narrowing down a large number of executable files and looking for potential malware.
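If no EDR is at hand, the same narrowing step can be approximated with a small script. The following is only a sketch under my own assumptions: the 6.8 threshold comes from this article, while the function names, the suffix list, and the choice to measure whole-file entropy are illustrative and not how any particular EDR product works.

```python
import math
from collections import Counter
from pathlib import Path

ENTROPY_THRESHOLD = 6.8                 # threshold used in this article
EXECUTABLE_SUFFIXES = {".exe", ".dll"}  # assumed selection, extend as needed


def file_entropy(path: Path) -> float:
    """Shannon entropy of the whole file, in bits per byte."""
    data = path.read_bytes()
    if not data:
        return 0.0
    total = len(data)
    return sum((n / total) * math.log2(total / n) for n in Counter(data).values())


def high_entropy_files(root: Path, threshold: float = ENTROPY_THRESHOLD):
    """Return (path, entropy) pairs at or above threshold, highest first."""
    hits = []
    for path in root.rglob("*"):
        if not path.is_file() or path.suffix.lower() not in EXECUTABLE_SUFFIXES:
            continue
        try:
            value = file_entropy(path)
        except OSError:
            continue  # unreadable file: skip it rather than abort the scan
        if value >= threshold:
            hits.append((path, value))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Calling high_entropy_files(Path("some_folder")) returns the candidates worth a closer look. It reads every file fully into memory, which is fine for a playground but not for a full disk.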
It would be nice to have at hand a database of known-good MD5/SHA256 hashes, so we can quickly get rid of the good files and be left with only the potentially bad ones.
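A minimal sketch of that filtering step, assuming the known-good database is simply a set of SHA256 hex strings. Where those hashes come from is left open here, and the function names sha256_of and drop_known_good are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def drop_known_good(paths, known_good_hashes):
    """Return only the paths whose SHA256 is NOT in the known-good set."""
    known = {h.lower() for h in known_good_hashes}  # normalize hex case
    return [p for p in paths if sha256_of(Path(p)) not in known]
```

Whatever survives both the entropy filter and this hash filter is the short list that deserves manual analysis.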
For the purpose of demonstration, I use cmd.exe from c:\windows\system32\
Start with PESTUDIO and DiE to quickly show the entropy level of a standard OS command that is neither packed nor encrypted.
As we can see, both programs give us fairly consistent values, which correspond to the Native Executable value (table above).
The per-section values also look fine, showing no sign of encryption.
Without encrypted/packed data (low entropy value), a reverse engineer, threat hunter, or anyone curious enough can quickly look into the imports section and the strings and figure out the real purpose of the program.
We can easily read the function names, see which library each one comes from, and learn what its purpose is.
For example, by googling "NtOpenFile win32 API" we end up in the Microsoft Win32 API documentation, where we can study the exact purpose of the function: NtOpenFile
Of course, this is just an example, but by googling the other functions, we can understand the functionality behind the EXE file.
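The strings part of this step is easy to reproduce by hand. Below is a rough sketch of a strings-style extractor. The function name and the minimum length of 5 are my own choices, and it only finds plain ASCII runs, so UTF-16 strings (common in Windows binaries) are missed.

```python
import re


def printable_strings(data: bytes, min_length: int = 5):
    """Return runs of at least min_length printable ASCII characters."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


if __name__ == "__main__":
    sample = b"\x00\x01NtOpenFile\x00ab\x00ntdll.dll\xff"
    print(printable_strings(sample))
```

On an unpacked file such a listing is full of readable API and library names. On a packed one it is mostly noise.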
Making the imports and strings entirely unreadable for an analyst is the sole purpose of packing/encrypting malware.
The same file packed with the UPX packer:
upx -1 -o cmd.ent.exe cmd.exe
Running upx.exe without any parameters prints a nice help text, so I will not explain the above command.
It was a little surprising to me that 11/71 AV engines marked the UPX-packed cmd.exe as a malicious file.
I will look into this in another article 😊 although my guess is that some specific functions imported by the executable are triggering this alert.
As expected, the packed program has higher entropy, thus giving us an IOC of possible malicious intentions.
If we look into the function calls, we get only a few of them. Thus we have no idea what the potential functionality of the program is.
Those functions are very characteristic and well known to be responsible for dynamically loading functions and unpacking the executable file.
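This observation can be turned into a crude heuristic: a very short import list built around dynamic-loading functions suggests an unpacking stub. The sketch below rests on my own assumptions. The API names in it (LoadLibraryA/W, GetProcAddress) are real Win32 functions that I picked as typical examples, not necessarily the exact list shown for this sample, and the limit of 10 imports is arbitrary. It expects the import names to have already been extracted by a tool of your choice.

```python
# Win32 APIs commonly used to resolve other functions at run time (assumed list).
LIBRARY_LOADERS = {"loadlibrarya", "loadlibraryw"}
ADDRESS_RESOLVER = "getprocaddress"


def looks_like_unpacking_stub(imported_names, max_imports: int = 10) -> bool:
    """Heuristic: few imports, and those present can load everything else."""
    names = {name.lower() for name in imported_names}
    if not names or len(names) > max_imports:
        return False  # nothing to judge, or too many imports for a bare stub
    return ADDRESS_RESOLVER in names and bool(names & LIBRARY_LOADERS)


if __name__ == "__main__":
    print(looks_like_unpacking_stub(["LoadLibraryA", "GetProcAddress", "ExitProcess"]))
    print(looks_like_unpacking_stub(["NtOpenFile", "CreateFileW"]))
```

Like high entropy, a positive result is only a hint that the file deserves a closer look, not proof of malware.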