Nikon Bluetooth LE Reverse Engineering

An Unexpected Journey

I have reverse engineered a number of camera bluetooth protocols in the pursuit of making furble1 a universal remote.

The typical workflow for an unsupported camera or feature is:

I assumed the typical workflow would apply to my Nikon camera and would take a week or so.

I was wrong, very wrong.

Unbeknownst to me this was the beginning of a journey that would span several months and involve countless hours of vacuous staring.

A Firm Handshake

Cameras typically have two phases over the bluetooth connection lifetime:

The handshake phase is typically used to identify both parties, determine capabilities and configure the session for the actual operation, where shutter control is performed.

After obtaining the HCI log in the normal way, I was able to isolate the following as the handshake protocol:

Source Message
App 010dc194694f54f71e26b994648624b814
Camera 023714aff738a3f8171003e69e47bb38cc
App 030dc194694f54f71e7535ca81770781f5
Camera 043714aff738a3f8173235303036303332

Usually, connecting to a saved connection looks a bit different, so get a ’re-pair’ trace:

Source Message
App 01013c69b055ecd6ca26b994648624b814
Camera 0211f77bca403e25ac0ecedc06f59cb6c8
App 03013c69b055ecd6ca9570ee41b6486092
Camera 0411f77bca403e25ac3235303036303332

Stare at these two traces long enough and several patterns begin to appear:

By applying a bit of formatting to the initial messages, we get:

Message Index Fixed Changing
010dc194694f54f71e26b994648624b814 01 0dc194694f54f71e 26b994648624b814
023714aff738a3f8171003e69e47bb38cc 02 3714aff738a3f817 1003e69e47bb38cc
030dc194694f54f71e7535ca81770781f5 03 0dc194694f54f71e 7535ca81770781f5
043714aff738a3f8173235303036303332 04 3714aff738a3f817 3235303036303332

More staring results in:

So, we can guess the app sends some identifier and the last step of the handshake is the camera’s serial number being transmitted, but what about the rest?

Riddles in the Dark

As furble replaces the app, we can try talking to the camera ourselves, in particular, replaying captures or handcrafted messages. With step 1 replays, a different step 2 response was observed each time. They don’t match the original captures either, thus there is some temporal (or other) input factor at play.

Eventually I just sent zeroes with the following timestamped results:

Timestamp (ms) Sent (index) Sent (fixed) Sent (changing) Response (index) Response (fixed) Response (changing)
17526 01 0000000000000000 0000000000000000 02 6bffc2e145dac17f 1245e2e8fe08f744
17626 01 0000000000000000 0000000000000000 02 6c0324930b3260c8 3c70943158f121b9
17726 01 0000000000000000 0000000000000000 02 6c068644508a0010 20ca08961d23feb4
17826 01 0000000000000000 0000000000000000 02 6c09e7f615e19f59 5464c01f27dbb6f5
17926 01 0000000000000000 0000000000000000 02 6c0d49a85b393ea1 f9fa78b33e202815
18026 01 0000000000000000 0000000000000000 02 6c10ab5a2090ddea 906ea25802e8276a

The timestamp is milliseconds since boot, and the message offsets are deliberately 100ms apart to avoid spamming the target. What’s curious to note is the ‘fixed’ portion of the response appears to be incrementing. Further inspection suggests the most significant 32-bits of the ‘fixed’ portion is changing less frequently than the least significant 32-bits.

Let’s convert into something human-readable by splitting the bytes:

Response (fixed)(MSB)(hex) Reponse (fixed)(LSB)(hex) Decimal (MSB) Difference to Previous (MSB)
6bffc2e1 45dac17f 1811923681
6c032493 0b3260c8 1812145299 221618
6c068644 508a0010 1812366916 221617
6c09e7f6 15e19f59 1812588534 221618
6c0d49a8 5b393ea1 1812810152 221618
6c10ab5a 2090ddea 1813031770 221618

That difference is exceedingly consistent. Chances are good this is a timestamp or counter of some sort from the camera. In comparison the least significant 32-bits appear almost random.

This leaves 24 bytes (8 bytes fixed, 16 bytes changing) of unknown response payload to determine.

Out of the Frying-Pan and into the Fire

After capturing hundreds of additional messages and weeks of random byte crunching (eg. CRCs, checksums) I realised the following:

Previously I’d always used a clean-room reverse engineering method, but this time it was unavoidable, I had to pull the Android app APK apart to progress. APKs are archives of Java bytecode and decompiling into mostly readable Java source is a straightforward exercise. Once I got the sources out, I was able to confirm a few things:

After more weeks of building a general map of the Java (which is huge and sprawling) I eventually found the calls to generate the stage 1 and expected stage 2 reply.

And it was a Java Native Interface (JNI) call to a precompiled binary shared library.
Ugh.

Companies typically ship binary code for optimisation or obfuscation purposes. Perhaps I could find an older version of the APK with non-binary bits that preceded the shared library? I searched for the oldest version of any variant of the app and found one over a decade old. And it was still making JNI calls to a precompiled shared library.
Double ugh.

This particular puzzle box has been locked for over a decade.
Triple ugh.

I had no choice, I had to go further and disassemble the shared library.

Enter the Ghidra

Initially I used objdump --disassemble to get the assembly. Whilst I can read some assembly, I can’t do it well and rapidly got lost in offsets, operands and registers like this:

push   %esi
lea    -0x8(%esp),%esp
mov    0x18(%esp),%ecx
mov    0x1c(%esp),%eax
mov    0x20(%esp),%edi
lea    0x4(%ecx),%edx
lea    0x44(%ecx),%esi
mov    (%eax),%eax
mov    (%edi),%edi
mov    %edx,0x4(%esp)
lea    0x0(%esi,%eiz,1),%esi
xor    (%esi),%eax
sub    $0x4,%esi

There had to be a better way, and that way was Ghidra.

Ghidra3 is an NSA developed tool suite for the explicit purpose of reverse engineering software. The invaluable key features:

After a few clicks and waiting for ‘analysis’ I got a nice dump of semi-readable C source. With Ghidra’s help I was now able to see:

There was some good news, the call stack wasn’t particularly deep, maybe 4 at the most. A few functions were referenced multiple times, some functions were small, others were not.

After more weeks of staring, two functions started to make sense.

void FUN_00012400(undefined *param_1,undefined *param_2,ushort param_3)
{
    ushort uVar1;
    undefined4 *puVar2;
    undefined4 *puVar3;
    undefined uVar4;
    undefined4 uVar5;
    undefined4 uVar6;
    undefined4 uVar7;
    undefined *puVar8;
    undefined *puVar9;
    ushort uVar10;
    ushort uVar11;
    int iVar12;
    if (0 < (short)param_3) {
        if ((param_2 < param_1 + 0x10 && param_1 < param_2 + 0x10) || (param_3 < 0x10)) {
            puVar8 = param_1;
            do {
                uVar4 = *param_2;
                puVar9 = puVar8 + 1;
                param_2 = param_2 + 1;
                *puVar8 = uVar4;
                puVar8 = puVar9;
            } while (puVar9 != param_1 + (ushort)(param_3 - 1) + 1);
            return;
        }
        uVar11 = 0;
        uVar1 = ((ushort)(param_3 - 0x10) >> 4) + 1;
        uVar10 = uVar1 * 0x10;
        iVar12 = 0;
        do {
            puVar2 = (undefined4 *)(param_2 + iVar12);
            uVar5 = puVar2[1];
            uVar6 = puVar2[2];
            uVar7 = puVar2[3];
            uVar11 = uVar11 + 1;
            puVar3 = (undefined4 *)(param_1 + iVar12);
            *puVar3 = *puVar2;
            puVar3[1] = uVar5;
            puVar3[2] = uVar6;
            puVar3[3] = uVar7;
            iVar12 = iVar12 + 0x10;
        } while (uVar11 < uVar1);
        param_1 = param_1 + uVar10;
        param_2 = param_2 + uVar10;
        if (((((param_3 != uVar10) && (*param_1 = *param_2, (short)(uVar10 + 1) < (short)param_3)) &&
              (param_1[1] = param_2[1], (short)(uVar10 + 2) < (short)param_3)) &&
             (((param_1[2] = param_2[2], (short)(uVar10 + 3) < (short)param_3 &&
                (param_1[3] = param_2[3], (short)(uVar10 + 4) < (short)param_3)) &&
               ((param_1[4] = param_2[4], (short)(uVar10 + 5) < (short)param_3 &&
                 ((param_1[5] = param_2[5], (short)(uVar10 + 6) < (short)param_3 &&
                   (param_1[6] = param_2[6], (short)(uVar10 + 7) < (short)param_3)))))))) &&
            ((param_1[7] = param_2[7], (short)(uVar10 + 8) < (short)param_3 &&
              (((((param_1[8] = param_2[8], (short)(uVar10 + 9) < (short)param_3 &&
                   (param_1[9] = param_2[9], (short)(uVar10 + 10) < (short)param_3)) &&
                  (param_1[10] = param_2[10], (short)(uVar10 + 0xb) < (short)param_3)) &&
                 ((param_1[0xb] = param_2[0xb], (short)(uVar10 + 0xc) < (short)param_3 &&
                   (param_1[0xc] = param_2[0xc], (short)(uVar10 + 0xd) < (short)param_3)))) &&
                (param_1[0xd] = param_2[0xd], (short)(uVar10 + 0xe) < (short)param_3)))))) {
            param_1[0xe] = param_2[0xe];
        }
    }
    return;
}

This function returns void and takes 3 parameters:

There’s lots of variable assignments and some loops. I stared long enough to realise this is an obtuse version of memcpy(). The do/while loop copies 4 bytes at a time whilst the crazy if state handles trailing bytes.

int FUN_000125c0(byte *param_1,byte *param_2,short param_3)
{
    byte *pbVar1;
    int iVar2;
    if (param_3 < 1) {
        iVar2 = 0;
    }
    else {
        iVar2 = (uint)*param_1 - (uint)*param_2;
        if (iVar2 == 0) {
            pbVar1 = param_1 + (ushort)(param_3 - 1) + 1;
            while( true ) {
                param_1 = param_1 + 1;
                param_2 = param_2 + 1;
                if (param_1 == pbVar1) break;
                if ((uint)*param_1 - (uint)*param_2 != 0) {
                    return (uint)*param_1 - (uint)*param_2;
                }
            }
        }
    }
    return iVar2;
}

This function returns an integer and takes 3 parameters:

There’s comparisons and, curiously, pointer dereference subtractions. This means the function can easily return positive and negative values. Stare at this one long enough and it looks like memcmp().

Ghidra allows you to rename symbols, so FUN_00012400 became memcpy and FUN_000125c0 became memcmp.

Two additional functions were identified, bytes2uint and uint2bytes for transforming arrays of bytes to and from a uint32_t.

As you keep identifying and labelling functions, you start increasing the signal to noise ratio.

Eventually two particular functions became interesting.

void FUN_00011500(uint *param_1,uint *param_2,uint *param_3)
{
    uint uVar1;
    uint uVar2;
    uint *puVar3;
    uint uVar4;
    uVar2 = *param_2;
    puVar3 = param_1;
    uVar4 = *param_3;
    do {
        uVar1 = uVar2 ^ *puVar3;
        puVar3 = puVar3 + 1;
        uVar2 = (param_1[(uVar1 >> 0x18) + 0x12] + param_1[(uVar1 >> 0x10 & 0xff) + 0x112] ^
                 param_1[(uVar1 >> 8 & 0xff) + 0x212]) + param_1[(uVar1 & 0xff) + 0x312] ^ uVar4;
        uVar4 = uVar1;
    } while (puVar3 != param_1 + 0x10);
    uVar4 = param_1[0x10];
    *param_2 = uVar1 ^ param_1[0x11];
    *param_3 = uVar2 ^ uVar4;
    return;
}

There’s a lot of shifts and XORs there, so let’s (creatively) call this xor().

void FUN_000108c0(undefined *param_1,int param_2,ushort param_3)
{
    undefined uVar1;
    short sVar2;
    int iVar3;
    undefined *puVar4;
    int in_GS_OFFSET;
    uint local_54;
    uint local_50;
    undefined *local_4c;
    uint local_30;
    uint local_2c;
    undefined local_28 [4];
    undefined local_24 [4];
    int local_20;
    undefined4 uStack_14;
    uStack_14 = 0x108c9;
    local_4c = param_1;
    local_20 = *(int *)(in_GS_OFFSET + 0x14);
    if (DAT_00017040 == 0) {
        DAT_00017040 = 1;
    }
    if (param_3 < 8) {
        local_50 = 0x5060708;
        local_54 = 0x1020304;
    }
    else {
        local_50 = 0x5060708;
        local_54 = 0x1020304;
        local_4c = param_1 + (uint)((ushort)(param_3 - 8) >> 3) * 8 + 8;
        do {
            bytes2uint(param_1,&local_30);
            puVar4 = param_1 + 4;
            param_1 = param_1 + 8;
            bytes2uint(puVar4,&local_2c);
            local_30 = local_30 ^ local_54;
            local_2c = local_2c ^ local_50;
            xor(&DAT_00013180,&local_30,&local_2c);
            local_54 = local_30;
            local_50 = local_2c;
        } while (local_4c != param_1);
        param_3 = param_3 & 7;
    }

    // snip a large, crazy if condition

    uint2bytes(param_2,&local_30);
    uint2bytes(param_2 + 4,&local_2c);
    if (local_20 == *(int *)(in_GS_OFFSET + 0x14)) {
        return;
    }
    FUN_00010850();
}

This function calls xor() in a loop, with even more XORs. It looks like it’s scrambling the data, so let’s call it mix().

At this point, it was obvious I would never have made progress from analysing the traffic captures. There was far more occurring than I had anticipated.

On the Doorstep

The most interesting line in xor() is the do/while loop:

        uVar2 = (param_1[(uVar1 >> 0x18) + 0x12] + param_1[(uVar1 >> 0x10 & 0xff) + 0x112] ^
                 param_1[(uVar1 >> 8 & 0xff) + 0x212]) + param_1[(uVar1 & 0xff) + 0x312] ^ uVar4;

By following the function stack, param_1 references a memory location, a massive chunk of seemingly random values. This line is looking up values by an offset derived from the input and smashing them together.

The multitude of XORs and shifts in that function resembled a pseudo random number generator (RNG) implemented via a linear feedback shift register. However, I was unable find any RNGs with a similar algorithm.

After staring at the code for some time, a key piece becomes clear in the array indices of param_1, they all look like param_1[(uVar >> m & 0xff) + n]. The & 0xff is critical, it is masking the lower 8 bits, so the array index is limited to 28 == 256 entries. Furthermore, the n offsets are 0x012, 0x112, 0x212 and 0x312, exactly 0x100 == 256 values apart. That line is indexing four consecutive arrays that contain 256 values.

Furthermore, there’s this line:

    *param_2 = uVar1 ^ param_1[0x11];

Where 0x11 == 17, hardcoded access to the 18th value of the array (using zero based indices).

As the RNG search was fruitless, the next paradigm that uses XORs is cryptography. After a fair amount of searching, I chanced upon the Blowfish cipher which uses a P array of 18 values and four S-boxes of 256 values.

18 and 256. Coincidence? I think not.

With this new found insight, I was able to massage enough struct information into Ghidra to change the xor() function into this:

void xor(mixTable_t *pMixTable,uint *pL,uint *pR)
{
    uint tmp;
    uint xL;
    uint32_t *p;
    uint xR;
    xL = *pL;
    p = pMixTable->p_array;
    xR = *pR;
    do {
        tmp = xL ^ *p;
        p = p + 1;
        xL = (pMixTable->sbox0[tmp >> 0x18] + pMixTable->sbox1[tmp >> 0x10 & 0xff] ^
              pMixTable->sbox2[tmp >> 8 & 0xff]) + pMixTable->sbox3[tmp & 0xff] ^ xR;
        xR = tmp;
    } while (p != pMixTable->p_array + 0x10);
    xR = pMixTable->p_array[0x10];
    *pL = tmp ^ pMixTable->p_array[0x11];
    *pR = xL ^ xR;
    return;
}

For comparison, Schneier’s Blowfish implementation4 looks like:

unsigned long F(unsigned long x)
{
   unsigned short a;
   unsigned short b;
   unsigned short c;
   unsigned short d;
   unsigned long  y;

   d = x & 0x00FF;
   x >>= 8;
   c = x & 0x00FF;
   x >>= 8;
   b = x & 0x00FF;
   x >>= 8;
   a = x & 0x00FF;
   //y = ((S[0][a] + S[1][b]) ^ S[2][c]) + S[3][d];
   y = S[0][a] + S[1][b];
   y = y ^ S[2][c];
   y = y + S[3][d];

   return y;
}

which can be refactored by combining the shifts, additions and XORs into:

unsigned long F(unsigned long x)
{
   return ((S[0][x >> 24 & 0xFF + S[1][x >> 16 & 0xFF) ^ S[2][x >> 8 & 0xFF) + S[3][x & 0xFF];
}

And also from Schneier4:

void Blowfish_encipher(unsigned long *xl, unsigned long *xr)
{
   unsigned long  Xl;
   unsigned long  Xr;
   unsigned long  temp;
   short          i;

   Xl = *xl;
   Xr = *xr;

   for (i = 0; i < N; ++i) {
      Xl = Xl ^ P[i];
      Xr = F(Xl) ^ Xr;

      temp = Xl;
      Xl = Xr;
      Xr = temp;
   }

   temp = Xl;
   Xl = Xr;
   Xr = temp;

   Xr = Xr ^ P[N];
   Xl = Xl ^ P[N + 1];

   *xl = Xl;
   *xr = Xr;
}

When compared to a Blowfish implementation, our xor() is actually Blowfish_encipher() with the F() function embedded in the loop. Additionally, the dimensions of the memory chunk referenced in xor() match Blowfish’s P array and S-box definitions. This confirms the handshake is based on the Blowfish cipher.

At this point, we could copy the P array, S-boxes and various support functions, however, that would be a substantial quantity of appropriated data. The less copying the better.

Ideally we want the ability to regenerate the entire Blowfish key structure. This would require us to obtain the private encryption key, is this even possible? The whole point of modern cryptography is to provide one-way transforms that are computationally impractical to reverse.

Inside Information

Whilst the data structures and transforms match the Blowfish cipher, it was unclear if the implementation was ‘off-the-shelf’ or ‘in-house modified’. In particular, is the key structure properly derived or just randomly generated? To gain the answer requires us to delve deeper into the Blowfish cipher.

The Blowfish key structure is well documented5 and we are interested in the key expansion, roughly summarised as:

Assuming we have precomputed P array and S-boxes, we can check if the key was ‘properly’ expanded. We can unwind the last step of the expansion by replacing the S-boxes with the original values of π. We then encrypt the last P array entries and compare to the first S-box entries (effectively performing the key expansion, but starting part way).

These are the relevant entries of the extracted P array and S-box:

uint32_t p_array[18] = {
    ...
    0x96ea497e, // 17
    0x7c3f81ca  // 18
};

uint32_t s_box[4][256] = {
    {
        0x349a3164, // [0][0]
        0x85caba9f  // [0][1]
        ...
    },
    ...
};

Now we unwind the last step by replacing the S-box with the originally specified entries of π:

uint32_t p_array[18] = {
    ...
    0x96ea497e, // 17
    0x7c3f81ca  // 18
};

uint32_t s_box[4][256] = {
    {
        0xd1310ba6, // [0][0]
        0x98dfb5ac  // [0][1]
        ...
    }
};

And then we perform the first iteration of the S-box expansion:

   ...
   uint32_t xl = p_array[16];
   uint32_t xr = p_array[17];
   Blowfish_encipher(&xl, &xr);
   fprintf(stdout, "xl = %x, xr = %x\n", xl, xr);
   ...

With the following result:

xl = 349a3164, xr = 85caba9f

And we have a match to the extracted S-box entries!

By completing the expansion per the specification, our expanded S-box entries are an exact match for the ones extracted from the shared library.

We now have objective proof the key expansion was performed according to the specification.

Now, about that encryption key …

The Last Stage

After searching everywhere, I eventually found a single reference to how I might obtain the key in an obscure set of slides6 by Rich Schroeppel. By following ‘Undoing Equivalent Expanded Keys’, I was able to slowly work backwards to recover the π XORd P array.

A fellow (and awesome) FujiHack member, tiredboffin7 was a significant help here, publishing some code8 to automate undoing Blowfish expanded keys.

The ‘key’ concept lies in realising the following about the Blowfish cipher:

By carefully replacing P array values in reverse with XOR’d and ciphered versions, we can unwind the P array cipher stage of the key expansion. Once complete, the P array is now key XOR π. And finally, (key XOR π) XOR π = key.

I now had the private key9.

The full transform required finding some hardcoded salt values, realising mix() was actually a hashing function and determining the content and hashing order.

Stage 02

The stage 02 message is constructed by randomly selecting from a known, limited set of salts. This salt is then used in generating a hash from the timestamps and session IDs of both camera and connecting device. This message could be considered the challenge in authentication-speak.

Stage 03

Stage 03 is the response of a challenge-response authentication system. As the salt is randomly selected from a known set, it is necessary to iterate and hash through all the salts until a match is found. Once a salt is identified, it is used to generate the response, thereby “authenticating” ourselves with the camera.

And …

Once all the pieces were assembled, the 4 stage handshake could be completed and decoded as:

Stage 4 bytes 4 bytes 8 (left 4 + right 4) bytes
01 timestamp (tA) SessionID (sA) connectionID
02 timestamp (tB) SessionID (sB) hash(salt + tB + sB + tA + sA)
03 timestamp (tA) SessionID (sA) hash(salt + tA + sA + tB + sB)
04 timestamp (tB) SessionID (sB) Camera Serial Number

This puzzle box is no longer a puzzle.

Epilogue

Ultimately, furble is still unable to fully connect to a Nikon camera in smart device mode. After the initial handshake, the camera changes from Bluetooth LE to Bluetooth Classic to perform the secure bond. As the underlying library for furble is Bluetooth LE only, it cannot be supported without significant effort.

The hashing algorithm, salts and private key are within the source code for furble in the hope that others may achieve interoperability.