Executive Summary
In our TrustDefender Labs report last month, we looked at different samples of the notorious Zeus trojan that are based on the leaked source code. We focused on the changes the authors made to the encryption of the configuration files in order to circumvent automated detection tools and to stay undetected for longer. That analysis was therefore limited to a fairly narrow set of changes.
This time, we look at similar Zeus trojans from a different angle. We consider the entire set of changes and try to estimate the complexity of those changes, which may give us an insight into the level of understanding shown by the respective authors when altering the Zeus source code.
Much to our surprise we were able to conclude that these changes were made either by the original Zeus author or by authors who know the source code very, very well and are specialists in developing malware.
In any case, the release of a series of versions, with quite substantial changes, shows that whoever is working on the code is quite capable of making extensive and sometimes fundamental changes in a short space of time.
We can therefore expect the pace of development to continue into the future, and even if the original author is no longer working on the code, we expect further versions to emerge at short intervals.
Methodology
Rather than looking in minute detail at every routine, which would be painstaking and take too long to be viable, we look at the overall interactions between functions, that is, which functions call which other functions. We then map that call pattern back to the reference copy of Zeus at the time of the source code release. This lets us see which new functions have been introduced, which have been deleted and which have been changed.
This is a much smaller problem, can be automated, and still gives us a good feel for the changes made. There are still some challenges – for instance, the way Zeus is built means that sometimes several functions are folded into one, and sometimes they are not. This is done by the Zeus author to evade detection by signature scanners. Also, some functions are interchangeable, so any of a set may be called. Lastly, compile-time defines can be used to include or remove functionality, possibly so that some functions can be sold at a premium.
However, we can compensate for all these.
Once this is done, we can then take a more detailed look at a smaller set of functions that stand out as a result of this analysis.
This approach does not pick up all changes – for instance, small changes in logic in functions do not affect the flow, and may not therefore be picked up. However, even with this limitation, the results are useful.
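The comparison described above can be sketched roughly as follows. This is a minimal illustration in Python, not the IDA scripts and Perl programs actually used. The call-graph representation (a mapping from each routine to the set of routines it calls) and all routine names are assumptions made for the example, and the sketch assumes routines have already been matched to names in the reference copy.

```python
def diff_call_graphs(reference, sample):
    """Compare two call graphs given as {routine: set of callees}."""
    ref_names = set(reference)
    new_names = set(sample)
    added = new_names - ref_names
    removed = ref_names - new_names
    # A routine present in both is flagged as changed when its callees differ.
    changed = {name for name in ref_names & new_names
               if reference[name] != sample[name]}
    return added, removed, changed


# Hypothetical routine names, for illustration only.
reference = {
    "Core.init": {"Crypt.rc4", "Mem.alloc"},
    "Crypt.rc4": set(),
    "Mem.alloc": set(),
}
sample = {
    "Core.init": {"Crypt.aes", "Mem.alloc"},
    "Crypt.aes": set(),
    "Mem.alloc": set(),
}

added, removed, changed = diff_call_graphs(reference, sample)
print(sorted(added), sorted(removed), sorted(changed))
```

A routine whose internal logic changes while its set of callees stays the same is not flagged by a comparison of this kind.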
IDA Pro was used extensively in this analysis, as it has a great deal of functionality for cross-referencing functions.
Benchmark Copy
We used a benchmark copy of Zeus to compare the other copies to. The copy was chosen to be very similar to the copies of Zeus created by compiling the source code. However, it is an in-the-wild copy – we did not create it by compiling the source code.
It is not, though, an exact match for the source code, and we therefore took care to compensate for this in our analysis.
ICE IX
The copy of ICE IX we used had an MD5 checksum of 62f770d7db6dd6825b793ec5c456d7e2 and a size of 99840 bytes.
When we analysed this copy we found it was generated from 58 source modules which contained 634 routines. All of the source modules were present in the original Zeus source. Of the 634 routines, only one could be said to be new code. All the others were from the original Zeus source code, and although there may have been minor logic changes to these, in essence the code remains the same.
The one routine added was not complex, being a string manipulation routine to convert a 16 byte hash to an ASCII string.
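For a sense of how small such a routine is, a 16-byte hash can be turned into an ASCII string in a few lines. The sketch below is in Python and assumes a hexadecimal representation, which the analysis does not specify; the function name is made up for the example.

```python
def hash_to_ascii(digest: bytes) -> str:
    """Render a 16-byte hash as 32 lowercase hexadecimal characters."""
    if len(digest) != 16:
        raise ValueError("expected a 16-byte hash")
    # Two hex digits per byte, most significant nibble first.
    return "".join(f"{b:02x}" for b in digest)


print(hash_to_ascii(bytes(range(16))))
```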
We would have to class ICE IX as containing only the most basic of changes, which could be done with little understanding of how the Zeus code fits together.
In addition, though we did not verify this ourselves, it has been reported elsewhere that the author has also introduced a string handling bug which causes a memory leak.
Ironically, this sample turned out to be closer to the Zeus source code than the representative sample we had been previously using.
RC4 replaced with AES
The next copy of Zeus considered is the one where the RC4 algorithm was replaced by AES. The sample used had an MD5 hash of d67d38800d6463d3db835f64224654e6 and a size of 200192 bytes.
The sample contained 655 routines in 59 modules. There was one new module introduced, which contained 11 functions used to disguise kernel calls.
In total there were 23 new functions. The other new functions were added to existing modules: three were added to the CoreInject module to handle thread injection, and nine to the Crypt module to replace the RC4 algorithm with AES.
This is a much bigger set of changes than with ICE IX.
However, there are still some interesting points.
As noted in our previous paper, there are some anomalies with the way the AES algorithm is called. AES is only ever used to encrypt 16 bytes at a time, never the entire block to be encrypted. This suggests that the author was unfamiliar with the AES code which is imported. The likelihood is that it was found somewhere on the internet and just copied in. The net effect is that the new encryption is technically weaker than the old.
The new module disguises the way that kernel calls are made. It does this by pushing Kernel module names onto the stack, and then searching the stack to match the names with hash values. This is nothing new – malware has been doing this for a long time; indeed, I remember using this as a technique to detect if a program is malicious several years ago. This therefore adds nothing to the security of the malware, and makes it more likely, not less likely that it will be detected by security programs. It also does not fit in with the ‘ethos’ of the rest of the malware. Everywhere else, if a string needs to be disguised, then the CryptedStrings routine is used. The disguise is also not used consistently. Only a few kernel calls are protected in this way, with no obvious reasons why these should be selected over other system calls. In short, this may be another piece of pre-existing code that the author has imported.
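The general technique of matching names against hash values can be sketched as follows. This is an illustration in Python from the analyst's side, not the sample's code. The hash function used by the sample is not identified here, so CRC32 stands in as an assumption, and the list of names is only an example.

```python
import zlib


def name_hash(name: str) -> int:
    """Stand-in hash (CRC32); the sample's real hash function is not known here."""
    return zlib.crc32(name.encode("ascii")) & 0xFFFFFFFF


def build_lookup(names):
    """Map each hash value back to the name that produces it."""
    return {name_hash(n): n for n in names}


# Example candidate names; a real table would cover every name of interest.
candidates = ["kernel32.dll", "ntdll.dll", "user32.dll"]
lookup = build_lookup(candidates)

# A hash value as it might be found embedded in a sample.
wanted = name_hash("ntdll.dll")
print(lookup.get(wanted))
```

Because the names never appear in clear text, the presence of such a matching loop is itself a recognisable pattern, which is why the technique can make detection more likely.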
Lastly, the three thread handling routines are very similar to the old thread handling, but use the new module to disguise some of the system calls.
Overall, more knowledge of the Zeus architecture was needed to make these changes than for ICE IX, but they are still not substantial.
Registry Storage Version
The sample used had an MD5 hash of 8807fbdc494e946e25bfdad74cd756d9 and a size of 169984 bytes. It was obtained from the wild – the only one of the three samples where this was the case.
This version of Zeus changes the way that data is stored in the registry and, as other researchers have noted, also uses a peer to peer network as its command and control infrastructure.
It contains 79 modules and 788 routines. It is therefore immediately apparent that the changes are on a whole different scale than the other two samples, with 21 new modules and around 241 new routines.
As well as the new functionality, some of the existing modules were almost completely rewritten, most notably the socket handling. Altogether, 110 old routines were retired, and 137 routines were altered.
There has also been some rationalising of code, with the hash routines and random number handling code moved out of the Crypt module into their own modules, and the PESETTINGS code moved from the Core module into its own module.
The new routines incorporate quite substantial changes. A brief summary follows.
The registry code has been altered to change the way registry subkey names are generated. Previously, only three keys were used, their names were randomly generated, and a note of the keys used was stored in encrypted form in the PESETTINGS area. Now the keys used are string representations of base 20 numbers. There is a complex algorithm, described in our previous paper, used to generate these numbers. This allows more data to be stored in the registry, which may have been needed as a result of the changes to the communications architecture.
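As a rough illustration of what a string representation of a base 20 number looks like, the Python sketch below converts an integer to base 20. The digit alphabet (0-9 followed by a-j) and the function name are assumptions, and the algorithm that produces the numbers themselves is not reproduced here.

```python
DIGITS = "0123456789abcdefghij"  # assumed alphabet: 20 symbols


def to_base20(value: int) -> str:
    """Convert a non-negative integer to its base 20 string form."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return DIGITS[0]
    out = []
    while value:
        value, rem = divmod(value, 20)
        out.append(DIGITS[rem])
    # Digits were produced least significant first.
    return "".join(reversed(out))


print(to_base20(399))   # 19*20 + 19
print(to_base20(8000))  # 20**3
```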
The communications architecture has been changed to use a peer to peer network. Other researchers have noted that this appears to be based on the Kademlia protocol. This means that the botnet is more robust, and cannot be taken down so easily. It also requires a corresponding change to the C&C code so that it uses the new peer to peer architecture.
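Kademlia measures the distance between two node IDs as their bitwise XOR, and lookups move towards the known peers closest to a target ID. The Python sketch below shows only that metric. The 8-bit IDs and example values are made up for readability (the original Kademlia design uses 160-bit IDs), and nothing here is taken from the sample's code.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two node IDs."""
    return a ^ b


def closest_peers(target: int, peers, k: int = 2):
    """Return the k known peers closest to the target ID."""
    return sorted(peers, key=lambda p: xor_distance(p, target))[:k]


# Hypothetical 8-bit IDs, chosen for readability.
peers = [0b00010010, 0b11110000, 0b00010111, 0b10000001]
print(closest_peers(0b00010000, peers))
```

Since any peer can route towards any ID in this way, there is no single server whose removal stops the network, which is consistent with the robustness noted above.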
The Mersenne Twister algorithm has been added to the random number code. This may have been copied from code available on the internet – a Google search found the exact same code. It is not clear that the author understands the purpose of the algorithm. It is intended to generate random number sequences. However, in the code it is just used to transform the random number generated by the original code into a different random number.
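The difference between the two uses can be shown with Python's random module, which is itself built on the Mersenne Twister. This is a sketch of the pattern described above, not the sample's code; the function names and the stand-in for the original random number are assumptions.

```python
import random


def intended_use(seed: int, count: int):
    """Seed once, then draw a sequence of numbers from the generator."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(count)]


def observed_use(original_random: int) -> int:
    """Reseed with an existing random number and take a single output.

    The result is a fixed function of the input, so no randomness is added.
    """
    return random.Random(original_random).getrandbits(32)


print(intended_use(1234, 3))
print(observed_use(1234) == observed_use(1234))
```

In the second form, the quality of the result still depends entirely on the number that was fed in.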
The crypt code has been changed to allow for the easier use of different encryption algorithms. Currently this is not actually used, so may be a preparatory architecture change for future versions.
Several new kernel calls are hooked:
- nspr4.dll::PR_Poll
- wininet.dll::InternetSetOptionA
- wininet.dll::InternetSetStatusCallback
- wininet.dll::InternetSetStatusCallbackW
- ws2_32.dll::WSARecv
- ws2_32.dll::recv
The HTTP inject mechanism has support for regular expression matching added. This also means that the way that configuration data controlling this activity is stored is changed. Previously this was stored in the configuration area using an ID of 20007. The new data is stored in a different format using ID 30003. Items are separated using the ‘magic’ data ‘ERCP’. This may stand (when little endian reversed) for Perl Compatible Regular Expression. Items with ID of 30004 and 30005 were also added. These changes will also need a change to the Zeus configuration builder code, so that the new formats of configuration data can be packaged.
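The byte-order reading can be checked quickly: if the four characters 'PCRE' are packed into a 32-bit value and that value is written in little-endian order, the bytes appear as 'ERCP'. The Python sketch below shows this, along with splitting a buffer on the marker. The sample buffer is invented and the real record layout is not reproduced.

```python
import struct

# 'PCRE' read as a big-endian 32-bit constant...
value = int.from_bytes(b"PCRE", "big")
# ...and stored in little-endian byte order, as on x86.
stored = struct.pack("<I", value)
print(stored)

# Invented buffer: items separated by the marker.
buffer = b"item-one" + stored + b"item-two" + stored + b"item-three"
items = buffer.split(stored)
print(items)
```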
Code has been added to rebuild the import table, and also to adjust it after any kernel hooks are removed. Code has also been added to adjust any relocation entries during the routine decryption process; without this, the decryption would fail if the program were loaded anywhere other than 0x400000.
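A relocation adjustment of this kind amounts to adding the difference between the actual load address and the preferred base to every absolute address in the code. The Python sketch below illustrates this on a byte buffer. The offsets, buffer contents and function name are invented, and the PE relocation table format is not parsed.

```python
import struct

PREFERRED_BASE = 0x400000


def apply_relocations(image: bytearray, offsets, actual_base: int) -> None:
    """Add the load delta to each 32-bit absolute address at the given offsets."""
    delta = actual_base - PREFERRED_BASE
    for off in offsets:
        (addr,) = struct.unpack_from("<I", image, off)
        # Mask to 32 bits so a negative delta wraps correctly.
        struct.pack_into("<I", image, off, (addr + delta) & 0xFFFFFFFF)


# Invented example: one absolute address 0x401000 stored at offset 4.
image = bytearray(8)
struct.pack_into("<I", image, 4, 0x401000)
apply_relocations(image, [4], actual_base=0x10000000)
print(hex(struct.unpack_from("<I", image, 4)[0]))
```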
There is some special code to handle two security urls and set a cookie:
- https://*/ebc_ebc1961/ebc1961.asp*=RemoteLogon*
- https://securentrycorp.*/Authentication/zbf/index?*domainId=*
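To show which addresses patterns like these would cover, the Python sketch below matches URLs against them. It assumes that '*' matches any run of characters and that every other character is literal; this is how the patterns read, but the sample's actual matching rules are not confirmed here. The test URLs are invented.

```python
import re

PATTERNS = [
    "https://*/ebc_ebc1961/ebc1961.asp*=RemoteLogon*",
    "https://securentrycorp.*/Authentication/zbf/index?*domainId=*",
]


def wildcard_to_regex(pattern: str):
    """Treat '*' as 'any run of characters' and everything else literally."""
    return re.compile(re.escape(pattern).replace(r"\*", ".*"))


def matches_any(url: str) -> bool:
    """Return True if the whole URL matches at least one pattern."""
    return any(wildcard_to_regex(p).fullmatch(url) for p in PATTERNS)


# Invented URLs for illustration.
print(matches_any("https://bank.example/ebc_ebc1961/ebc1961.asp?action=RemoteLogon"))
print(matches_any("https://bank.example/login"))
```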
The overall impression is that there are a large number of changes that have been made to quite different areas of the program, and that the person doing this was very familiar with the architecture of Zeus. The changes made are sympathetic to the way Zeus ‘does things’.
Other copies of Zeus
During the investigation, we also noticed that several routines in this copy of Zeus were also present in earlier copies we got from the wild. This suggested some other lines of research. It’s possible to get some idea of the evolution of malware by looking at the changes from version to version.
If we see two copies with the same changes, then this will not have occurred by chance. The most likely scenarios are that either the same author has produced the two versions, or that two groups of malware writers are sharing source code with each other.
We were able to find four in-the-wild copies of Zeus which showed a distinct evolution. The first copy was from June 2011. The next copy was the copy under analysis, which was from September 2011. Two further copies were from October 2011.
The June 2011 copy contained 700 routines. It changed the way the HTTP inject mechanism works, and the way that configuration data controlling this activity was stored, introducing ID 30003 and others. The way the botnet C&C was contacted also changed, with alternative URLs being generated and tried on a daily basis if the original C&C became unavailable.
The September 2011 copy contained 788 routines. It added base 20 registry key naming and peer-to-peer networking.
The two October 2011 copies are still awaiting full analysis. However, they added a third routine to the collection of encrypted subroutines. This requires corresponding changes to the builder program that creates Zeus, and is not a trivial change. They contained 833 and 823 routines.
Regular Expression Handling
The fact that the copies all contain regular expression handling is very interesting. The version of source code released was 2.0.8.9. However, regular expression handling was only added to Zeus in version 2.1, which was released around October 2010.
We therefore went back again in our archives and located a copy of the 2.1 Zeus release to compare the regular expression handling. It was effectively identical.
This means that someone out there has the 2.1 source of Zeus, which was never publicly released, and is still actively working on it.
Is the Original Author Still Active?
From the changes we see that there is intense ongoing development with this branch of Zeus.
We now stray into the realms of conjecture. Someone has access to the source code for the 2.1 version of Zeus, which was never publicly released. The person making these changes is obviously very familiar with the Zeus architecture and has made complicated changes. The replacement of the communications architecture with the peer to peer architecture is a non-trivial change and requires back end changes to the C&C as well. The addition of a third encrypted subroutine requires changes to the build process. The addition of a new format for the configuration inject data requires changes to the configuration packaging code. These indicate the author has an understanding of the whole Zeus package, not just the malware client itself.
The way the changes are made is sympathetic to Zeus. The modules added contain small numbers of routines and are logically organised. The CryptedStrings routines are used to hide interesting strings from casual view. Often when a new person takes over maintenance of code, they start doing things in their own preferred way which is different to the original author. However, this does not appear to be happening here.
The copies are in the wild, suggesting a buyer or buyers are readily available.
The Zeus source code was released around the weekend of 7th May. The first version of this new branch of Zeus was detected at the beginning of June 2011 – less than one month later. This is a very short space of time for someone new to Zeus to read and understand the code, and then to create such an extensive derivative work.
We have to wonder therefore, if this is actually still the work of the original author. He announced his retirement, and released the source code. However, this may have just been a smokescreen. Certainly, whoever is continuing to maintain the Zeus code is very familiar with the whole Zeus environment, not just the malware itself. It could also be someone who has worked closely with the original author in the past, or who the author has trained up to take his place.
Are there any signs that this might not be the original author? Well, possibly. In the June 2011 version, the new configuration data handling introduced two memory handling subroutines that were placed in their own module, and not in the standard memory handling module. The fragmenting of routines into different modules is also arguably finer-grained than with the original code.
As with all such guesses, only time will tell. All we can do now is sit back and wait for further information.
Conclusion
The first two copies analysed have fairly trivial changes which are (in my opinion) not the work of the original author.
The third copy is far more interesting. There is a case to be made from the availability of 2.1 source code, the complexity of changes, and the short period between the release of the source code and the emergence of versions of this variant, that the original Zeus author is still at work.
This is still conjecture at this point, and it may also be due to other authors familiar with coding this type of malware.
In any case, the release of a series of versions, with quite substantial changes, shows that whoever is working on the code is quite capable of making extensive and sometimes fundamental changes in a short space of time.
We can therefore expect the pace of development to continue into the future, and even if the original author is no longer working on the code, we expect further versions to emerge at short intervals.
Applications used
The applications used to research this document were:
- IDA Pro 6.2
- ActiveState Perl
- 010 Hex Editor
- EditPlus text editor
A series of IDA scripts and Perl programs was used to auto-analyse each Zeus sample and create a cross-reference listing of routines. A further series of Perl programs was then used to analyse and compare the cross-reference listings.