Decrypting Synology Patchfiles

NOTE: If you're already a seasoned reverser who doesn't care about the steps taken and would rather just look at the final code, you can teleport to the repository :)

One fateful day not too long ago I decided to update my Synology NAS, an action that was way overdue. For some reason I always poke around in the patch files a little; this time, however, I was greeted by an error message when extracting the TAR archive:

tar: Error opening archive: Unrecognized archive format

Only one thing left to do at this point, of course: spend my dear time figuring out why tar is telling me lies. Just kidding, tar was right, but why...

If not a TAR, what is it?

The first thing I do in these kinds of situations is grab xxd and check whether it's not just some garbage prepended to the file that confuses tar:

00000000: 70ad beef 8508 0000 95c4 2027 9e84 12e8  p......... '....
00000010: c9b3 0f4b 0049 5bbe e961 76d5 fc26 c15f  ...K.I[..av..&._
00000020: 5872 fe51 f8a9 0bc6 f519 a3dc 0037 92cd  Xr.Q.........7..
00000030: 04bf c420 f419 ad56 a362 aa85 892b 3902  ... ...V.b...+9.
00000040: 4001 695c f809 cad9 1b6f 3166 382e 6f61  @.i\.....o1f8.oa

Okay, apart from that ADBEEF and what looks to be some bytes that could mean something immediately after (spoiler alert: they do), I can't make out what I am looking at just by staring at this hexdump...
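As a spoiler-flavored aside: those first eight bytes can already be picked apart. A minimal sketch, assuming the layout we'll confirm later in the decompiled output (a big-endian magic whose first byte is discarded, followed by a little-endian size):

```python
import struct

# The first 8 bytes from the hexdump above
header = bytes.fromhex("70adbeef85080000")

# The magic is read big-endian and the first byte is thrown away
magic = struct.unpack(">I", header[:4])[0] & 0xFFFFFF

# The next 4 bytes are a little-endian size
size = struct.unpack("<I", header[4:8])[0]

print(hex(magic), hex(size))  # 0xadbeef 0x885
```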

Apparently I had been living under a rock, because Synology added this to their DSM patch files quite some time ago. I set out to Google some more about these .pat files and quickly found out that more people had run into this, and they even solved it in quite an elegant manner! One such solution is the Synology_Archive_Extractor. Another one makes use of a Docker container that holds the binary and the shared objects used to extract these kinds of files on the Synology itself. Both worked flawlessly for decrypting the archive and extracting its contents. That's it then, right? This is where we just happily decrypt the file and look at the contents like we normally would?

No, of course not. At this point I was actually a little bit annoyed that I didn't know how the encryption works; besides that, it seemed like good target practice for me.

Dynamic Analysis is lame!

It's not, but I am lazy, and gdb is daunting to me (we will come back to this later).

I kept telling myself that dynamic analysis would be too much of a hassle for me because I am not running an x86-64 machine myself and would thus have to resort to using a VPS, and nowadays we have a free IDA decompiler! How hard can static analysis be for this?

I used the files from the Docker container, as it saved me the time of figuring out which files are the right ones, and we can simply use them for our static analysis as well. By checking which binary is used as the entrypoint of the Docker container, we can start by looking at the scemd file (the syno_extract_system_patch command) in IDA and figure out the steps we need to take to decrypt this bad boy ourselves.

Somewhere in the humongous main function we can see the calls made when we use the syno_extract_system_patch command:

if ( !strcmp(cmd, "syno_extract_system_patch") ) {
    if ( argc != 3 ) {
        pat_location = *patchfile;
        sub_45900E();
        __printf_chk(1LL, "%s PATCH_PATH DEST_PATH\n", pat_location);
        return -1;
    }
    status = extract_patch(patchfile[1], patchfile[2]);
    return -(status != 0);
}

So far it's pretty straightforward; we follow some more calls and look at what they're doing (roughly). At some point I noticed something interesting:

if ( patchfile && outdir ) {
    v3 = v2;
    __syslog_chk(
      6LL,
      1LL,
      "%s:%d synoinstall: synoarchive: [%s][OPEN] Type: System",
      "lib/synoarchive.c",
      441LL,
      patchfile);
    return init_archive(patchfile, outdir, 0, v3);
}

In the above code snippet we can see that it will initialize the archive, but it calls this type System. I didn't speculate much on seeing this; I simply wrote it down in my notes and continued following along. Continuing to click along, we find a function called synoarchive_open_with_keytype, and one of its arguments is actually the 0 that we observe as the third argument in the snippet above. The synoarchive_open_with_keytype function is an export of libsynocodesign-ng-virtual-junior-wins.so.7, so we will continue our hunt in this binary next.

Figuring out wth a keytype is

We were now on the hunt to figure out what this keytype meant in the context of our next binary. Looking at what happens in the synoarchive_open_with_keytype function, we can quickly deduce that the keytype specifies what kind of patchfile we're dealing with; depending on the patchfile it sets a key buffer which is later used to decrypt the contents of the archive. The function responsible for setting these buffers, which I have aptly named set_keybuffers, shows a large switch statement; depending on the given keytype it uses a value that is hardcoded in the binary:

switch ( keytype ) {
    case 0:  // DSM System patch
    case 10: // DSM Support patch
      *keybuffer1 = qword_23850;
      *keybuffer2 = qword_23878;
      result = 1LL;
      break;

When I looked at the numbers in the switch statement I immediately thought of the types defined in the Synology_Archive_Extractor project: our keytype of 0 specifies that this is a DSM system patch! With this new knowledge I quickly jotted down the two values of the key buffers to see if we could do something with them.
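The switch effectively boils down to a lookup table. A tiny sketch of what set_keybuffers does (the byte values below are placeholders, not the real hardcoded keys):

```python
# Placeholder buffers; the real 32-byte values are hardcoded in
# libsynocodesign-ng-virtual-junior-wins.so.7 and selected per keytype.
_DSM_KEYS = (b"\x00" * 32, b"\x00" * 32)

# keytype -> (keybuffer1, keybuffer2); cases 0 and 10 share the same
# buffers, just like in the decompiled switch statement.
KEYBUFFERS = {
    0: _DSM_KEYS,   # DSM system patch
    10: _DSM_KEYS,  # DSM support patch
}

def set_keybuffers(keytype):
    if keytype not in KEYBUFFERS:
        raise ValueError(f"unsupported keytype: {keytype}")
    return KEYBUFFERS[keytype]
```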

The cryptography dance

Immediately I started searching for any crypto functions, and sure enough I found some present in the shared object:

ChaCha20! Easy mode: we just slap those key buffers we found earlier on that baby and call it done! The libsodium ChaCha20 implementation consists of the following steps, as specified in their documentation:

  • Call crypto_secretstream_xchacha20poly1305_init_pull to initialize a state using a header and a key;

  • Call crypto_secretstream_xchacha20poly1305_pull for every encrypted buffer using the initialized state;

So we fire up a Python shell, import pysodium and call the function. It doesn't even matter which key buffer we use, because we only have two, so we just try them both! We add the header using the first 24 bytes that look like encrypted data, as I am assuming this is the start of the encrypted buffer and likely used to initialize the state:

>>> fh.seek(8)
>>> header = fh.read(24)
>>> key = bytes.fromhex("078a7529a07a998cffadb87d7378993b7d9ccfa7171f5c47f150838a6a7caf61")
>>> state = pysodium.crypto_secretstream_xchacha20poly1305_init_pull(header, key)
>>> enc_buffer = fh.read(0x200)
>>> pysodium.crypto_secretstream_xchacha20poly1305_pull(state, enc_buffer, b"")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/pysodium/pysodium/__init__.py", line 62, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/pysodium/pysodium/__init__.py", line 1198, in crypto_secretstream_xchacha20poly1305_pull
    __check(sodium.crypto_secretstream_xchacha20poly1305_pull(
  File "/root/pysodium/pysodium/__init__.py", line 309, in __check
    raise ValueError
ValueError

I won't take up more of your time: it didn't work for the other key either. Well, I guess this was to be expected, as I was working entirely off of assumptions again without looking at how the binary itself does the actual decryption. Lessons learned! But how does it handle its decryption? Let's dig a little deeper into what happens after these key buffers are set.

Synoarchive open thyself, please?

Right after the key buffers are set, there are some function calls that open the archive and read certain sections of it. The function we will zoom into now is called archive_read_support_format and is actually based on the same function in libarchive. At this point I finally figured out that we're looking at a slightly modified version of libarchive itself! This is good news, because we can now annotate things in the IDA-decompiled output a lot better by using the libarchive repository to gain an understanding of the process.

This archive_read_support_format function looks very interesting; just by scrolling through the decompiled code we can see some notable functions being called:

  • fopen is used to open the archive;

  • fread is called to read some bytes;

  • crypto_sign_verify_detached is called...?

  • msgpack_unpack_next is called...?

The first couple of functions make sense, but what are these other functions doing here? When I was looking at this I had no knowledge of the crypto_sign_verify_detached function, so I looked it up first:

The crypto_sign_verify_detached() function verifies that sig is a valid signature for the message m, whose length is mlen bytes, using the signer’s public key pk.

Alright, easy enough: it is probably checking whether the signature of the bytes we've read with fread matches a certain signature, given the public key... Public key, hmm? Tracing back the calls, we can see that the argument given to the crypto_sign_verify_detached call is indeed one of the key buffers that was set during the set_keybuffers routine. So let's see if we can verify a buffer ourselves.

Verifying something with a key you say?

The verification process in the decompiled output looks like this (annotated):

if ( fread(&ptr, 4uLL, 1uLL, fh) ) {        // read 4 bytes once
    archive_magic = _byteswap_ulong(ptr) & 0xFFFFFF;
    if ( archive_magic == 0xBFBAAD || archive_magic == 0xADBEEF ) { // Check magic
        if ( fread(size, 1uLL, 4uLL, fh) == 4 ) {
            size_of_m = size[0];                 // 0x885
            if ( fread(m, size_of_m, 1uLL, fh)   // Read size_of_m bytes
              && fread(sig, 0x40uLL, 1uLL, fh) ) // Read the signature
            {
                offset = 8LL;
                for ( pk = *_archive[1].error_string.length; pk; offset += 8LL ) {
                    if ( !crypto_sign_verify_detached(sig, m, size[0], pk) )

Finally, some recognition:

  1. We can see the header magic being read (& 0xFFFFFF is used to discard the first byte);

  2. The size of the first message being read (remember the 85 08 00 00 value in the hexdump);

  3. The next 64 bytes are read as part of the signature to check;

  4. The bytes of the message are checked using the signature that was read and the public key we found earlier;

So let's prototype this using Python:

# Skip the magic
>>> fh.seek(4)
>>> header_length = struct.unpack("<I", fh.read(4))[0]
>>> m = fh.read(header_length)
>>> sig = fh.read(0x40)
>>> pysodium.crypto_sign_verify_detached(sig, m, pk)
>>>

Yes, this works (if the message failed verification we'd be greeted with a ValueError)! We now know we have the means to verify the blocks we're reading, and can continue on our merry way into discovering that it is actually a msgpack-serialized TAR... I honestly didn't read too much into it; I just haphazardly began to try out some stuff and started to write a small script with the knowledge gained so far.

As I could see that msgpack_unpack_next was called before a key derivation using the crypto_kdf_derive_from_key function, I started to get excited. This could be it; maybe this function is where we will finally get the key used for decryption!

Maybe dynamic analysis is not *that* lame

This part of the blog uses a combination of static and dynamic analysis (sorry static purists). I had already played around a little bit with the Docker container and changed the Dockerfile itself to include some tools I like to use:

FROM ubuntu:22.04

RUN mkdir /tmp/synology/
RUN mkdir /tmp/synology/data/

COPY ./extractor/DSM_DS723+_42962.pat /tmp/synology
COPY ./extractor/lib* /lib/
COPY ./extractor/scemd /bin/syno_extract_system_patch

RUN apt-get update && apt-get install strace gdb git -y
RUN git clone https://github.com/longld/peda.git ~/peda
RUN echo "source ~/peda/peda.py" >> ~/.gdbinit

To assist a little bit in understanding what is happening during debugging, I installed peda, a small framework that was built to aid in developing exploits, but which also offers a much nicer way of interacting with gdb due to its colorised output.

I also installed strace in the hopes of catching what is happening under the hood, and although it showed a little bit of the execution flow I ended up not doing much with it (or maybe I was just holding it wrong).

Deriving our key

I started with the msgpack stuff, as the decompiled output was showing me that parts of it were used in the next step. Without looking too much into it, I simply put the bytes we had just verified into the unpack function of the Python msgpack package, and got back a bunch of msgpack messages, each holding a buffer size and some bytes which we will come back to later.
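In the script I just used the msgpack package for this, but to show what's actually on the wire, here's a tiny hand-decoder for the subset of msgpack we care about: a 2-element array holding a uint32 size and a bin8 digest. The bytes below are synthetic stand-ins for the real header, not taken from an actual archive:

```python
import struct

def decode_entry(buf, off=0):
    """Decode one [size, digest] pair from raw msgpack bytes."""
    assert buf[off] == 0x92          # fixarray with 2 elements
    assert buf[off + 1] == 0xCE      # uint32 marker
    size = struct.unpack(">I", buf[off + 2 : off + 6])[0]
    assert buf[off + 6] == 0xC4      # bin8 marker
    n = buf[off + 7]                 # bin8 length byte
    digest = buf[off + 8 : off + 8 + n]
    return size, digest, off + 8 + n

# [1215, b"hash"] encoded by hand
blob = b"\x92\xce" + struct.pack(">I", 1215) + b"\xc4\x04hash"
size, digest, _ = decode_entry(blob)
print(size, digest)  # 1215 b'hash'
```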

Okay, nice, we got some msgpack blocks. What now? This is where my static analysis knowledge failed, and I knew I couldn't postpone setting up the Docker container to look at the binary using gdb any longer...

It is good to note that part of my struggle was probably because I had made an error somewhere while defining the archive structure, which I had copied (and modified slightly to make IDA happy) from the libarchive repository... My decompiled output looked something like this:

buf = calloc(1uLL, 0x20uLL);
msgpack_header = u64->via.array.ptr;
_archive[1].error_string.buffer_length = buf;
masterkey = key;  // Given as the function argument
_header = _mm_loadu_si128(msgpack_header);
v56 = 0;
*buf = _header;
buf[1] = _mm_loadu_si128(msgpack_header + 1);
v32 = u64->via.array.ptr;          // points to first message of msgpack messages
subkey_id = v32->via.array.ptr;    // first message + 0x10
*ctx = v32[1].type;
v55 = *(&v32[1].type + 2);
LOBYTE(v56) = *(&v32[1].type + 6);
LODWORD(result) = crypto_kdf_derive_from_key(
    &_archive[1].archive_format_name,  // subkey
    0x20uLL,
    subkey_id,
    ctx,
    masterkey
);

There will probably be a bunch of people reading this thinking "lol noob, just read the ASM", and to those I say: yes, thanks for that suggestion. I knew at this point that the other buffer we had not used yet was the masterkey argument for the function, but I just could not figure out what the subkey for the function would be...

After tinkering around in gdb and just stupidly printing whatever was in certain memory addresses and registers at any given point, I eventually figured out that the derivation looks something like this:

msgpack_object = self.msgpack_messages[0][::-1]
# Derive the ChaCha20 key that is used for the decryption of the archive
subkey_id = struct.unpack(">Q", msgpack_object[0x8 : 0x8 + 8])[0]
ctx = (msgpack_object[0x1 : 0x1 + 7][::-1] + b"\x00")  # Pad to 8 bytes
chacha20_key = pysodium.crypto_kdf_derive_from_key(
    len(SUBKEY),
    subkey_id,
    ctx,
    SUBKEY
)

I know the above code looks a bit weird with the reversing of the bytes in the msgpack_object buffer, but this seemed to be what was happening in gdb, so I simply did the same. In any case, I had figured out the decompiled output in IDA and was able to generate a key.

There's just a small problem: this key derivation will always output a key, as there is no verification of the buffer used to derive the key... A little bit annoying, but okay, at least we now know (and could verify dynamically) the inputs used to derive the key itself.

More verification

At this point I was going through all of the functions that were called along the way, and unfortunately I stumbled upon another function that did some kind of verification. This time it would verify all of the msgpack message blocks, so time to see if we can verify our buffers in the same manner. It's important to show the format of the msgpack messages after unpacking them with the Python package:

[1215, b"somebuffer"]

Every entry in the list of msgpack messages is itself a list containing an integer and 32 bytes whose purpose we currently don't know.

As I saw the messages being read one by one, then the amount of bytes specified in the entry being read, and the crypto_generichash_init function being called immediately after, I assumed that the other entry in the msgpack is actually the hash value being checked. Under the hood this is just a BLAKE2b hash initialization, for which Python's hashlib has an implementation:

>>> from hashlib import blake2b
>>> verification_bytes = fh.read(message[0])
>>> blake = blake2b(digest_size=32)
>>> blake.update(verification_bytes)
>>> blake.hexdigest() == message[1].hex()
True

Great, now we have verified this works, let's test this for every message (at this point I was working on a script):

2024-04-20 16:49:17 - INFO - Opening archive: DSM_DS723+_42962.pat
2024-04-20 16:49:17 - INFO - Verified magic: 0xadbeef
2024-04-20 16:49:17 - INFO - Verified signature: 9a9f28007b7f307ac80033566ab0e417f1ccf9c97bc3d99d4a8de0026bddffa3512fbaa911ae7d8887619265a326433430ecdd6b7c5e0161a645a94e4f3b8c01
2024-04-20 16:49:17 - INFO - Derived ChaCha20 key: d00817ada1c86fff061130d016dc7aa296ca54b93bb7e8eca86a563a7c627050
2024-04-20 16:49:17 - INFO - Verified message #0 [f419ad56a362aa85892b39024001695cf809cad91b6f3166382e6f610d3e9041]
2024-04-20 16:49:17 - INFO - Verified message #1 [b3d1db9b2375a8cddeafeaa0460976896b97856b9b36935bfe00d5dffbd8520b]
2024-04-20 16:49:17 - INFO - Verified message #2 [8010b3d551225f494aba39e8352dd046ab9bc23d5d1ab999f78ef43831f1e1f4]
2024-04-20 16:49:17 - INFO - Verified message #3 [b394b51a0ccaacb2af7d4b428c677cadb2d9f37176f03545a7c1ff05bc6a6902]
2024-04-20 16:49:17 - INFO - Verified message #4 [bfac05f8036befa3ed4019fa2c5428d8db17609a9b9d5d1b6d4ffb531eba4c0f]
2024-04-20 16:49:17 - INFO - Verified message #5 [0adb7ef6304211ceff6f54c5365487e3a550911797575c8162578b135a899c3d]
...
2024-04-20 16:49:17 - INFO - Verified message #52 [6a50af90914285426ba256bca35f222de8e17035472b2db289be28804f9e87db]
2024-04-20 16:49:17 - INFO - Verified message #53 [9130d1ca6b629ce057c539e49b8ba6b74a0580f4e001aaca03ed56a294403d3c]
2024-04-20 16:49:17 - INFO - Verified message #54 [9604d4b4d10769f5c319b5acb33ddf5a40ffd5902fc9419c977939f454aadf17]
2024-04-20 16:49:17 - INFO - Verified msgpack messageblocks

Awesome, only one final step left!

The final dance

Following the execution flow I stumbled upon the tar_read_header function, which is a modified version of libarchive's. This was also the time to verify whether the key we derived earlier is actually the key used in the decryption process. Exciting times for us nerds!

I am leaving out a lot of the code for this blogpost, but our decompiled output for the important part of the function looks something like this:

has_magic = archive->skip_file_ino == 0xADBEEF;// check the magic
if ( has_magic ) { // initialize the chacha20 if the magic was found
    header_bytes = _mm_loadu_si128(archive_entry_header);
    if ( crypto_secretstream_xchacha20poly1305_init_pull(
             state, &header_bytes, &archive->client
         ) ) {
        archive_set_error(&archive->archive, -3, "Incomplete cipher header");
        goto LABEL_32;
      }
      qmemcpy(c, &archive_entry_header->name[24], 403uLL);
      *archive_entry_header->name = 0LL;
      *&archive_entry_header[1].name[9] = 0LL;
      memset(
          (&archive_entry_header->name[8] & 0xFFFFFFFFFFFFFFF8LL),
          0u,
          8LL * ((archive_entry_header - ((archive_entry_header + 8) & 0xFFFFFFF8) + 512) >> 3));
      v21 = crypto_secretstream_xchacha20poly1305_pull(
              state,
              archive_entry_header,
              &mlen_p,
              &tag_p,
              c,
              0x193uLL,
              0LL,
              0LL
            );
...
}

At first glance this looks easy enough (don't ask me about that memset): we just read the amount of bytes specified by the msgpack message (remember, it has this integer) and then we copy... 0x193 bytes from it to decrypt... Err, that doesn't seem right... And it wasn't.

I was quite confused about this whole thing until I started to pay attention to gdb (yesyes just read the ASM I hear you say, go touch grass):

Guessed arguments:
arg[0]: 0x7ffd589e7e80 --> 0x7fa054da288be69a
arg[1]: 0x7ffd589e7ec0 --> 0x8009ea303ff1920
arg[2]: 0x172e7d8 --> 0xff6fc8a1ad1708d0

Interesting. If we look for the bytes of arg[1] (the archive_entry_header variable in the decompiled output) in the encrypted archive, we can see that they are at offset 2253:

>>> fh.seek(0)
>>> contents = fh.read()
>>> contents.index(bytes.fromhex("08009ea303ff1920")[::-1])
2253

I was kind of curious what happened at which offset in my script as well, so I decided to print our offset in the encrypted archive after doing some verifications. I found out that the offset of 2253 was actually the exact offset in the archive right after the signature we verified with the public key! So let's continue the execution flow and see if there's a pattern to where it reads its bytes every time the crypto_secretstream_xchacha20poly1305_init_pull function is called.

Guessed arguments:
arg[0]: 0x7ffd589e7e80 --> 0x340a899126e00c41
arg[1]: 0x7ffd589e7ec0 --> 0x6eb078761d721f5d
arg[2]: 0x172e7d8 --> 0xff6fc8a1ad1708d0

Let's check the offset of arg[1] again and see this in relation to the previous offset:

>>> contents.index(bytes.fromhex("6eb078761d721f5d")[::-1])
3468
>>> 3468 - 2253
1215

You have to imagine that I had already looked at the msgpack message array a bit, so when I saw that the difference in offsets was actually the integer in the msgpack message entry, I immediately understood how this was being read! If we look at our msgpack message array we can see that the integer in the first entry is indeed 1215!

>>> msgpack_messages
[[1215, b"buf1"], [3547177, b"buf2"], [13348420, b"buf3"], [5171770, b"buf4"], ...

We're simply skipping an amount of bytes relative to the archive's current position. I didn't look further into how the binary actually computes this value, as I am a lazy man 🦦.
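In other words, each encrypted block starts right where the previous one ends. A sketch that reproduces the offsets we saw in gdb, using the sizes from the msgpack entries:

```python
# Start of the encrypted data: the 8-byte magic+size, the 0x885-byte
# header message, and the 0x40-byte signature (0x8cd = 2253).
DATA_START = 8 + 0x885 + 0x40

# Sizes taken from the first msgpack entries shown above
sizes = [1215, 3547177, 13348420, 5171770]

# Each block's offset is the running sum of the sizes before it
offsets, off = [], DATA_START
for size in sizes:
    offsets.append(off)
    off += size

print(offsets[:2])  # [2253, 3468] -- matching the gdb observations
```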

With this knowledge I started implementing some code that could read every TAR entry header (remember, we're looking at the tar_read_header function, which just reads the header for every TAR entry).

Using dissect.cstruct I copied over the TAR structure type definitions and implemented a small loop that iterates over the entries using our derived key from earlier, and...

2024-04-20 17:20:41 - INFO - Opening archive: DSM_DS723+_42962.pat
2024-04-20 17:20:41 - INFO - Verified magic: 0xadbeef
2024-04-20 17:20:41 - INFO - Verified signature: 9a9f28007b7f307ac80033566ab0e417f1ccf9c97bc3d99d4a8de0026bddffa3512fbaa911ae7d8887619265a326433430ecdd6b7c5e0161a645a94e4f3b8c01
2024-04-20 17:20:41 - INFO - Encrypted data offset: 0x8cd
2024-04-20 17:20:41 - INFO - ChaCha20 key: d00817ada1c86fff061130d016dc7aa296ca54b93bb7e8eca86a563a7c627050
2024-04-20 17:20:41 - INFO - Verified msgpack messageblocks
2024-04-20 17:20:41 - INFO - Entry: VERSION [662 bytes] | uname: root gname: root
2024-04-20 17:20:41 - INFO - Entry: zImage [3546624 bytes] | uname: root gname: root
2024-04-20 17:20:41 - INFO - Entry: updater [13347816 bytes] | uname: root gname: root
2024-04-20 17:20:41 - INFO - Entry: DiskCompatibilityDB.tar [5171200 bytes] | uname: root gname: root
...
2024-04-20 17:20:41 - INFO - Entry: texts/tha/strings [23189 bytes] | uname: root gname: root
2024-04-20 17:20:41 - INFO - Entry: texts/trk/strings [12949 bytes] | uname: root gname: root

Great success 🥳

The decryption of each entry is not much different. It is a little bit weird to initialize the ChaCha20 state for every entry, as this could easily be done just once; it doesn't add any security, but they chose to do it this way for some reason.

Decrypting the entries can be summarized in the following steps:

  1. Take note of the TAR entry header's offset and increment it by 512 bytes (the block size) to get to the start of the encrypted buffer;

  2. Read the size of the entry, unless this size is bigger than 0x400000;

  3. Increment the size by... 17 (I still do not know why);

  4. Decrypt the entry; repeat from step 2 if the size was bigger than 0x400000.
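The steps above can be sketched in terms of the sizes being read. As an aside, the mysterious +17 happens to match libsodium's crypto_secretstream_xchacha20poly1305_ABYTES (a 1-byte tag plus a 16-byte MAC per message), which would explain why every plaintext chunk carries 17 extra ciphertext bytes:

```python
CHUNK = 0x400000  # max plaintext bytes per decrypt call
ABYTES = 17       # libsodium secretstream overhead: 1 tag byte + 16-byte MAC

def chunk_sizes(entry_size):
    """Yield the ciphertext sizes read for one TAR entry."""
    remaining = entry_size
    while remaining > 0:
        plain = min(remaining, CHUNK)
        yield plain + ABYTES
        remaining -= plain

# The 662-byte VERSION entry is a single 679-byte ciphertext read;
# the 13347816-byte updater entry takes 4 reads.
print(list(chunk_sizes(662)))            # [679]
print(len(list(chunk_sizes(13347816))))  # 4
```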

Conclusion

We could already decrypt Synology .pat files, but now we also know how the decryption of these archives is done! Besides that, I like having these kinds of scripts, as a script is a more portable way of decrypting these files than a Docker container (not that I didn't like that solution). I hope I could shed some light on my thought process and inspire and/or help others who want to get into these kinds of masochistic practices.

I have published the Python script I made along the way; not every format is implemented, as I haven't personally encountered them in the files I wanted to decrypt. If you have suggestions or issues with the script, I'd appreciate it if you'd create an issue in the repository :)
