Decrypting Synology Patchfiles
NOTE: If you're already a seasoned reverser and don't care about the steps taken and would rather just look at the final code, you can teleport to the repository :)
One fateful day not too long ago I decided to update my Synology NAS, an action that was way overdue. For reasons, I always poke around in the patch a little; this time, however, I was greeted by an error message when extracting the TAR archive:
Only one thing left to do at this point, of course: spend my dear time figuring out why TAR is telling me lies. Just kidding, TAR was right, but why...
If not a TAR, what is it?
The first thing I do in these kinds of situations is grab xxd and check that it isn't just some garbage prepended to the file confusing tar:
Okay, apart from that adbeef and what looks to be some bytes which could mean something immediately after it (spoiler alert: they do), I can't make out what I am looking at just by staring at this hexdump...
Apparently I had been living under a rock, as Synology added this to their DSM patch files quite some time ago. I set out to Google some more about these .pat files and quickly found out that more people had run into this too, and they even solved it in quite an elegant manner! One such solution is the Synology_Archive_Extractor. Another one makes use of a Docker container that holds the binary and the shared objects used to extract these kinds of files on the Synology itself. Both worked flawlessly for decrypting the archive and extracting its contents. That's it then, right? This is where we just happily decrypt the file and look at the contents like we normally would?
No, of course not. At this point I was actually a little bit annoyed that I didn't know how the encryption works; besides that, it seemed like good target practice for me.
Dynamic Analysis is lame!
It's not, but I am lazy, and gdb is daunting to me (we will come back to this later).
I kept telling myself that dynamic analysis would be too much of a hassle because I am not running an x86-64 machine myself and would thus have to resort to using a VPS; besides, nowadays we have a free IDA decompiler! How hard can static analysis be for this?
I used the files from the Docker container, as it saved me the time of figuring out which files are the right ones, and we can simply use them for our static analysis as well. By checking which binary is used as the entrypoint of the Docker container, we can start by looking at the scemd file (the syno_extract_system_patch command) in IDA and figure out the steps we would need to take to decrypt this badboy ourselves.
Somewhere in the humongous main function we can see the calls made when we use the syno_extract_system_patch command:
So far it's pretty straightforward; we follow some more calls and look at what they're doing (roughly). At some point I noticed something interesting:
In the above code snippet we can see that it will initialize the archive, but it is calling this type System. I didn't necessarily speculate anything seeing this; I simply wrote it down in my notes and continued following along. Continuing to click along, we can find a function called synoarchive_open_with_keytype, and one of its arguments is actually the 0 that we observe as the third argument in the snippet above. The synoarchive_open_with_keytype function is an export of libsynocodesign-ng-virtual-junior-wins.so.7, so we will continue our hunt in this binary next.
Figuring out wth a keytype is
We were now on the hunt to figure out what this keytype meant in the context of our next binary. Looking at what happens in the synoarchive_open_with_keytype function, we can quickly deduce that the keytype is actually used to specify what kind of patchfile we're dealing with, and depending on the patchfile it will set a keybuffer which is later used to decrypt the contents of the archive. The function responsible for setting these buffers, which I have aptly named set_keybuffers, shows a large switch statement, and depending on the given keytype it will use a value which is hardcoded in this binary:
When I looked at the numbers in the switch statement I immediately thought of the types defined in the Synology_Archive_Extractor project: our keytype of 0 specifies that this is a DSM system patch! With this new knowledge I quickly jotted down the two values of the keybuffer to see if we could do something with them.
The cryptography dance
Immediately I started searching for any crypto functions, and sure enough I found some present in the shared object:
ChaCha20! Easy mode, we just slap those keybuffers we found earlier on that baby and call it done! The libsodium ChaCha20 implementation we see in the above image consists of the following steps, as specified in their documentation:
1. Call crypto_secretstream_xchacha20poly1305_init_pull to initialize a state using a header and a key;
2. Call crypto_secretstream_xchacha20poly1305_pull for every encrypted buffer using the initialized state.
So we fire up a Python shell, import pysodium and call the function. It doesn't even matter which keybuffer we use, because we only have 2, so we just try them both! We add the header using the first 24 bytes that look like encrypted data, as I am assuming this is the start of the encrypted buffer and likely used to initialize the state:
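That first naive attempt looked roughly like this (a sketch assuming pysodium's bindings; patch.pat is a placeholder filename, and keybuffer stands in for one of the two hardcoded values, obviously not the real key):

```python
import pysodium

data = open("patch.pat", "rb").read()

# Placeholder for one of the two hardcoded keybuffers we jotted down earlier
keybuffer = bytes.fromhex("00" * 32)

# Naive assumption: the first 24 bytes that look like encrypted data are the
# secretstream header, and everything after it is the first ciphertext block
hdr_len = pysodium.crypto_secretstream_xchacha20poly1305_HEADERBYTES  # 24
state = pysodium.crypto_secretstream_xchacha20poly1305_init_pull(data[:hdr_len], keybuffer)

# This pull blew up for both keybuffers, as you're about to read
msg, tag = pysodium.crypto_secretstream_xchacha20poly1305_pull(state, data[hdr_len:], None)
```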
I won't take up more of your time by just telling you that it didn't work for the other key either. Well, I guess this was to be expected, as I totally worked based off of assumptions again without looking at how the binary itself does the actual decryption. Lessons learned! But how does it handle its decryption? Let's dig a little deeper into what happens after these keybuffers are set.
Synoarchive open thyself, please?
Right after the keybuffers are set there are some function calls that open the archive and read certain sections of it. The function we will zoom into now is called archive_read_support_format and is actually based on the same function in libarchive. At this point I finally figured out that we're actually looking at a slightly modified version of libarchive itself! This is good news, because we can now annotate certain things in the IDA-decompiled output a lot better by using the libarchive repository to gain an understanding of the process.
This archive_read_support_format function looks very interesting; just by scrolling through the decompiled code we can see some interesting functions being called:
- fopen is used to open the archive;
- fread is called to read some bytes;
- crypto_sign_verify_detached is called...?
- msgpack_unpack_next is called...?
The first couple of functions make sense, but what are these other functions doing here? When I was looking at this I had no knowledge of the crypto_sign_verify_detached function, so I looked it up first:

The crypto_sign_verify_detached() function verifies that sig is a valid signature for the message m, whose length is mlen bytes, using the signer's public key pk.
Alright, easy enough: it is probably checking whether the signature of the bytes we've read with fread matches, given the public key... Public key, hmm? Tracing back the calls, we can see that the argument given to the crypto_sign_verify_detached call is indeed one of the keybuffers that was set during the set_keybuffers routine. So let's see if we can verify a buffer ourselves.
Verifying something with a key you say?
The verification process in the decompiled output looks like this (annotated):
Finally, some recognition:
1. We can see the header magic being read (& 0xFFFFFF is used to discard the first byte);
2. The size of the first message is read (remember the 85 08 00 00 value in the hexdump);
3. The next 64 bytes are read as the signature to check;
4. The bytes of the message are checked using the signature that was read and the public key we found earlier.
So let's prototype this using Python:
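A minimal reconstruction of that prototype (assuming pysodium; publickey stands in for the keybuffer we found, and the field layout follows the annotated decompiled output above):

```python
import struct
import pysodium

# Placeholder for the hardcoded public key from the binary
publickey = bytes.fromhex("00" * 32)

with open("patch.pat", "rb") as fh:
    magic = struct.unpack("<I", fh.read(4))[0] & 0xFFFFFF  # discard a byte, as in the decompiled output
    msg_size = struct.unpack("<I", fh.read(4))[0]          # the 85 08 00 00 (0x885) from the hexdump
    signature = fh.read(64)                                # detached signature
    message = fh.read(msg_size)                            # the signed buffer

# Raises ValueError if the signature does not match
pysodium.crypto_sign_verify_detached(signature, message, publickey)
print("message verified!")
```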
Yes, this works (if the message were unverified we'd be greeted with a ValueError)! We now know we have the means to verify the blocks we're reading, and we can continue on our merry way into discovering that it is actually a msgpack-serialized TAR... I honestly didn't read too much into it; I just haphazardly began trying out some stuff and started to write a small script with the knowledge we had gained so far.
As I could see that msgpack_unpack_next was called before a key derivation using the crypto_kdf_derive_from_key function, I started to get excited. This could be it, maybe this function is where we will finally get our decryption key from!
Maybe dynamic analysis is not *that* lame
This part of the blog uses a combination of static and dynamic analysis (sorry, static purists). I had already played around a little bit with the Docker container and changed the Dockerfile itself to include some tools I like to use:
To assist a little in understanding what is happening during debugging, I installed peda, a small framework that was built to aid in developing exploits; it also offers a much nicer way of interacting with gdb due to its colorised output:
I also installed strace in the hopes of catching what is happening under the hood, and although it showed a little bit of the execution flow, I ended up not doing much with it (or maybe I was just holding it wrong).
Deriving our key
I started with the msgpack stuff, as the decompiled output was showing me that parts of it were used in the next step. Without looking too much into it, I simply put the bytes we had just verified into the unpack function of the Python msgpack package, and got a bunch of msgpack headers back, along with the size of the buffer and some bytes which we will come back to later.
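Roughly what that looked like in Python (message being the verified buffer from the previous sketch):

```python
import io
import msgpack

# Stream-unpack every object from the verified buffer
unpacker = msgpack.Unpacker(io.BytesIO(message), raw=True)
for obj in unpacker:
    # Among the results: some header-like values, the size of a buffer,
    # and some bytes we will come back to later
    print(obj)
```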
Okay, nice, we got some msgpack blocks, what now? This is where my static analysis knowledge failed me, and I knew I couldn't postpone setting up the Docker container to look at the binary using gdb any longer...
It is good to note that part of my struggle was probably due to an error I had made somewhere while defining the archive structure, which I had copied (and modified slightly to make IDA happy) from the libarchive repository... My decompiled output looked something like this:
There will probably be a bunch of people reading this thinking "lol noob, just read the ASM", and to those I say: yes, thanks for that suggestion. I knew at this point that the other buffer we had not used yet was the masterkey argument for the function, but I just could not figure out what the subkey for the function would be...
After tinkering around in gdb and just stupidly printing whatever was in certain memory addresses and registers at any given point, I eventually figured out that the derivation looks something like this:
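Reconstructed in Python, it looked roughly like this (a sketch assuming pysodium's crypto_kdf_derive_from_key binding; the byte-reversal mirrors what I observed in gdb, and the variable names, values, and 8-byte context are placeholders, not the binary's actual ones):

```python
import pysodium

# Placeholders: masterkey is the second hardcoded keybuffer, and
# subkey_bytes are the bytes taken from the msgpack object
masterkey = bytes.fromhex("00" * 32)
subkey_bytes = bytes.fromhex("1122334455667788")

# The bytes appeared reversed in gdb, so reverse them before using them as an id
subkey_id = int.from_bytes(subkey_bytes[::-1], "little")

key = pysodium.crypto_kdf_derive_from_key(
    32,           # length of the subkey to derive
    subkey_id,    # subkey id derived from the msgpack buffer
    b"AAAAAAAA",  # 8-byte KDF context (placeholder, the real one lives in the binary)
    masterkey,
)
```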
I know the above code looks a bit weird with the reversing of the bytes in the msgpack_object buffer, but this is what seemed to happen when looking at gdb, so I simply did the same. In any case, I figured out the decompiled output in IDA and was able to generate a key.
There's just a small problem: this key derivation will always output a key, as there is no verification of the buffer used to derive the key... A little bit annoying, but okay, at least we now know (and could verify dynamically) the inputs used to derive the key itself.
More verification
At this point I was going through all of the functions that were called along the way, and unfortunately I stumbled upon another function that did some kind of verification. This time it would verify all of the msgpack message blocks, so time to see if we can verify our buffers in the same manner. First, it's important to show the format of the msgpack messages after unpacking them with the Python package:
Every entry in the list of msgpack messages consists of a list that contains an integer and 32 bytes whose purpose we currently don't know.
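To make that concrete, the unpacked array has roughly this shape (values illustrative, except 1215, which we will meet again later):

```python
messages = [
    [1215, bytes.fromhex("aa" * 32)],  # [integer, 32 unknown bytes]
    [4096, bytes.fromhex("bb" * 32)],
]
```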
As I saw the messages being read one by one, each reading the amount of bytes specified in the entry and calling the crypto_generichash_init function immediately after, I assumed that the other entry in the msgpack is actually the hash value being checked. Under the hood this is just a blake2b hash initialization, for which Python's hashlib has an implementation:
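In Python terms the check boils down to something like this (a sketch; messages is the unpacked list from before, archive is a file handle already positioned at the data being read, and the digest size is inferred from the 32 bytes per entry):

```python
import hashlib

size, expected = messages[0]  # [integer, 32 unknown bytes]
buf = archive.read(size)      # read the amount of bytes specified in the entry

# The 32 mystery bytes turn out to be the blake2b digest of the buffer
assert hashlib.blake2b(buf, digest_size=32).digest() == expected
```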
Great, now that we have verified this works, let's test it for every message (at this point I was working on a script):
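Extended to every message it becomes a simple loop (same hypothetical names as above):

```python
import hashlib

for size, expected in messages:
    buf = archive.read(size)
    if hashlib.blake2b(buf, digest_size=32).digest() != expected:
        raise ValueError("blake2b mismatch for msgpack entry")
print("all msgpack entries verified")
```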
Awesome, only one final step left!
The final dance
Following the execution flow I stumbled upon the tar_read_header function, which is a modified version of libarchive's. This was also the time to verify whether the key we derived earlier is actually the key used in the decryption process. Exciting times for us nerds!
I am leaving out a lot of the code for this blogpost, but our decompiled output for the important part of the function looks something like this:
At first glance this looks easy enough (don't ask me about that memset): we just read the amount of bytes specified by the msgpack message (remember it has this integer) and then we copy... 0x193 bytes from this to decrypt... Err, that doesn't seem to be right... And it wasn't.
I was quite confused about this whole thing until I started to pay attention to gdb (yes yes, just read the ASM, I hear you say; go touch grass):
Interesting, if we look for the bytes used for arg[1] (variable archive_entry_header in the decompiled output) in the encrypted archive, we can see that this is at offset 2253:
I was kind of curious what happened at which offset in my script as well, so I decided to print our offset in the encrypted archive after we had done some verifications. I found out that the offset of 2253 was actually the exact offset in the archive right after the signature verification with the public key! So let's continue the execution flow and see if there's a pattern to where it reads its bytes every time the crypto_secretstream_xchacha20poly1305_init_pull function is called.
Let's check the offset of arg[1] again and see it in relation to the previous offset:
You have to imagine that I had already looked at the msgpack message array a little bit, so when I saw that the difference in offset was actually the integer in the msgpack message entry, I immediately understood how this was being read! If we look at our msgpack message array we can see that the integer in the first msgpack entry is indeed 1215!
We're simply skipping that amount of bytes relative to the archive's current position. I didn't look further into how the binary actually arrives at this value, as I am a lazy man 🦦.
With this knowledge I started implementing some code that could read every TAR entry header (remember, we're looking at the tar_read_header function, which just reads the header of every TAR entry).
Using dissect.cstruct I copied over the TAR structure type definitions and implemented a small loop that would iterate over the entries using our derived key from earlier, and....
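The loop looked roughly like this (a sketch, not the published script verbatim: the struct definition is truncated here, the names carry over from the earlier sketches, and the per-entry secretstream layout is my reading of the decompiled output):

```python
import pysodium
from dissect.cstruct import cstruct

cs = cstruct()
cs.load("""
struct tar_header {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];
    /* ... remaining classic TAR header fields, 512 bytes in total ... */
};
""")

for size, expected in messages:
    # Each entry starts with a fresh secretstream header, followed by the
    # encrypted 512-byte TAR header block plus 17 bytes of overhead
    header = archive.read(pysodium.crypto_secretstream_xchacha20poly1305_HEADERBYTES)
    state = pysodium.crypto_secretstream_xchacha20poly1305_init_pull(header, key)
    block, _ = pysodium.crypto_secretstream_xchacha20poly1305_pull(state, archive.read(512 + 17), None)
    entry = cs.tar_header(block)
    print(entry.name)
```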
Great success 🥳
The decryption of each entry is not much different. It is a little bit weird to initialize the ChaCha20 state for every entry, as this could easily be done just once; it doesn't bring any more security doing this for each block, but they chose to do it this way for some reason.
Decrypting the entries can be summarized into the following steps (a Python sketch follows after the list):
1. Take note of the TAR entry header's offset and increment it by 512 bytes (the blocksize) to get to the start of the encrypted buffer;
2. Read the size of the entry, unless this size is bigger than 0x400000;
3. Increment the size by... 17 (I still do not know why);
4. Decrypt the entry; repeat from step 2 if the size was bigger than 0x400000.
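Put together, a sketch of those steps (same hypothetical names as before; as an aside, 17 happens to be libsodium's crypto_secretstream_xchacha20poly1305_ABYTES, the 16-byte Poly1305 tag plus the 1-byte tag marker, which would neatly explain step 3):

```python
import pysodium

CHUNK = 0x400000

def decrypt_entry(archive, state, header_offset, entry_size):
    # Step 1: the encrypted data starts one TAR block after the entry header
    archive.seek(header_offset + 512)
    plaintext = b""
    remaining = entry_size
    while remaining:
        # Steps 2 and 3: read at most 0x400000 bytes, plus 17 bytes of overhead
        size = min(remaining, CHUNK)
        ciphertext = archive.read(size + 17)
        # Step 4: decrypt, and loop again if there is more data left
        block, _ = pysodium.crypto_secretstream_xchacha20poly1305_pull(state, ciphertext, None)
        plaintext += block
        remaining -= size
    return plaintext
```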
Conclusion
We could already decrypt the Synology .pat files, but now we also know how the decryption of these archives is done! Next to that, I like having these kinds of scripts, as they're a more portable way of decrypting these files than a Docker container (not that I didn't like that solution). I hope I could shine some light on my thought process and inspire and/or help others who want to get into these kinds of masochistic practices.
I have published the Python script I made along the way; not every format is implemented, as I haven't encountered them in the files I personally wanted to decrypt. If you have suggestions for or issues with the script, I'd appreciate it if you'd create an issue in the repository :)