Nikon Bluetooth LE Reverse Engineering
An Unexpected Journey
I have reverse engineered a number of camera bluetooth protocols in the pursuit of making furble1 a universal remote.
The typical workflow for an unsupported camera or feature is:
- install the Android app
- enable the HCI snoop log
- use the app
- analyse the HCI traffic capture with Wireshark2
- reimplement the protocol
I assumed the typical workflow would apply to my Nikon camera and would take a week or so.
I was wrong, very wrong.
Unbeknownst to me this was the beginning of a journey that would span several months and involve countless hours of vacuous staring.
A Firm Handshake
Cameras typically have two phases over the bluetooth connection lifetime:
- handshake and/or pairing
- operation
The handshake phase is typically used to identify both parties, determine capabilities and configure the session for the actual operation, where shutter control is performed.
After obtaining the HCI log in the normal way, I was able to isolate the following as the handshake protocol:
| Source | Message |
|---|---|
| App | 010dc194694f54f71e26b994648624b814 |
| Camera | 023714aff738a3f8171003e69e47bb38cc |
| App | 030dc194694f54f71e7535ca81770781f5 |
| Camera | 043714aff738a3f8173235303036303332 |
Usually, connecting to a saved connection looks a bit different, so get a ’re-pair’ trace:
| Source | Message |
|---|---|
| App | 01013c69b055ecd6ca26b994648624b814 |
| Camera | 0211f77bca403e25ac0ecedc06f59cb6c8 |
| App | 03013c69b055ecd6ca9570ee41b6486092 |
| Camera | 0411f77bca403e25ac3235303036303332 |
Stare at these two traces long enough and several patterns begin to appear:
- the app initiates the connection with the first message
- each message is prefixed by a ‘step’ or ‘index’ counter
- app starts with
0x01 - camera follows with
0x02 - app follows with
0x03 - camera finishes with
0x04
- app starts with
- there are per-sender fixed portions of the messages
- there are changing portions of the messages
- the initial handshake and the re-connect handshake appear completely unrelated but consistent and similar
By applying a bit of formatting to the initial messages, we get:
| Message | Index | Fixed | Changing |
|---|---|---|---|
010dc194694f54f71e26b994648624b814 |
01 |
0dc194694f54f71e |
26b994648624b814 |
023714aff738a3f8171003e69e47bb38cc |
02 |
3714aff738a3f817 |
1003e69e47bb38cc |
030dc194694f54f71e7535ca81770781f5 |
03 |
0dc194694f54f71e |
7535ca81770781f5 |
043714aff738a3f8173235303036303332 |
04 |
3714aff738a3f817 |
3235303036303332 |
More staring results in:
- the index identifies the ‘step’ of the handshake
- this is common practice and enables a system to identify missing or repeated steps
- 1 byte (8 bits)
- each device has a fixed portion
- this seems to change for each session, perhaps a session ID?
- 16 bytes (64 bits)
- the changing portion looks fairly random
- except for step 1, which is the same
- perhaps some ID?
- and also except step 4, which is always the same
- furthermore, anytime there are many
0x3?, it’s usually ASCII 3235303036303332== “25006032” == serial number stamped on the camera
- furthermore, anytime there are many
- 16 bytes (64 bits)
- except for step 1, which is the same
So, we can guess the app sends some identifier and the last step of the handshake is the camera’s serial number being transmitted, but what about the rest?
Riddles in the Dark
As furble replaces the app, we can try talking to the camera ourselves, in particular, replaying captures or handcrafted messages. With step 1 replays, a different step 2 response was observed each time. They don’t match the original captures either, thus there is some temporal (or other) input factor at play.
Eventually I just sent zeroes with the following timestamped results:
| Timestamp (ms) | Sent (index) | Sent (fixed) | Sent (changing) | Response (index) | Response (fixed) | Response (changing) |
|---|---|---|---|---|---|---|
| 17526 | 01 |
0000000000000000 |
0000000000000000 |
02 |
6bffc2e145dac17f |
1245e2e8fe08f744 |
| 17626 | 01 |
0000000000000000 |
0000000000000000 |
02 |
6c0324930b3260c8 |
3c70943158f121b9 |
| 17726 | 01 |
0000000000000000 |
0000000000000000 |
02 |
6c068644508a0010 |
20ca08961d23feb4 |
| 17826 | 01 |
0000000000000000 |
0000000000000000 |
02 |
6c09e7f615e19f59 |
5464c01f27dbb6f5 |
| 17926 | 01 |
0000000000000000 |
0000000000000000 |
02 |
6c0d49a85b393ea1 |
f9fa78b33e202815 |
| 18026 | 01 |
0000000000000000 |
0000000000000000 |
02 |
6c10ab5a2090ddea |
906ea25802e8276a |
The timestamp is milliseconds since boot, and the message offsets are deliberately 100ms apart to avoid spamming the target. What’s curious to note is the ‘fixed’ portion of the response appears to be incrementing. Further inspection suggests the most significant 32-bits of the ‘fixed’ portion is changing less frequently than the least significant 32-bits.
Let’s convert into something human-readable by splitting the bytes:
| Response (fixed)(MSB)(hex) | Reponse (fixed)(LSB)(hex) | Decimal (MSB) | Difference to Previous (MSB) |
|---|---|---|---|
6bffc2e1 |
45dac17f |
1811923681 | |
6c032493 |
0b3260c8 |
1812145299 | 221618 |
6c068644 |
508a0010 |
1812366916 | 221617 |
6c09e7f6 |
15e19f59 |
1812588534 | 221618 |
6c0d49a8 |
5b393ea1 |
1812810152 | 221618 |
6c10ab5a |
2090ddea |
1813031770 | 221618 |
That difference is exceedingly consistent. Chances are good this is a timestamp or counter of some sort from the camera. In comparison the least significant 32-bits appear almost random.
This leaves 24 bytes (8 bytes fixed, 16 bytes changing) of unknown response payload to determine.
Out of the Frying-Pan and into the Fire
After capturing hundreds of additional messages and weeks of random byte crunching (eg. CRCs, checksums) I realised the following:
- the unknown payload seemed almost random
- no discernible pattern
- I needed a cheat code
Previously I’d always used a clean-room reverse engineering method, but this time it was unavoidable, I had to pull the Android app APK apart to progress. APKs are archives of Java bytecode and decompiling into mostly readable Java source is a straightforward exercise. Once I got the sources out, I was able to confirm a few things:
- the ‘fixed’ part was indeed a timestamp (of sorts)
- the ‘steps’ are referenced as ‘stages’
- the changing portion isn’t a basic transform
After more weeks of building a general map of the Java (which is huge and sprawling) I eventually found the calls to generate the stage 1 and expected stage 2 reply.
And it was a Java Native Interface (JNI) call to a precompiled binary shared
library.
Ugh.
Companies typically ship binary code for optimisation or obfuscation purposes.
Perhaps I could find an older version of the APK with non-binary bits that
preceded the shared library?
I searched for the oldest version of any variant of the app and found one over a
decade old. And it was still making JNI calls to a precompiled shared library.
Double ugh.
This particular puzzle box has been locked for over a decade.
Triple ugh.
I had no choice, I had to go further and disassemble the shared library.
Enter the Ghidra
Initially I used objdump --disassemble to get the assembly. Whilst I can read
some assembly, I can’t do it well and rapidly got lost in offsets, operands and
registers like this:
push %esi
lea -0x8(%esp),%esp
mov 0x18(%esp),%ecx
mov 0x1c(%esp),%eax
mov 0x20(%esp),%edi
lea 0x4(%ecx),%edx
lea 0x44(%ecx),%esi
mov (%eax),%eax
mov (%edi),%edi
mov %edx,0x4(%esp)
lea 0x0(%esi,%eiz,1),%esi
xor (%esi),%eax
sub $0x4,%esi
There had to be a better way, and that way was Ghidra.
Ghidra3 is an NSA developed tool suite for the explicit purpose of reverse engineering software. The invaluable key features:
- binary decompilation to C
- navigation tools
- memory structure analysis tools
After a few clicks and waiting for ‘analysis’ I got a nice dump of semi-readable C source. With Ghidra’s help I was now able to see:
- shared library entry points
- chunks of memory
- function calls
- function call references
There was some good news, the call stack wasn’t particularly deep, maybe 4 at the most. A few functions were referenced multiple times, some functions were small, others were not.
After more weeks of staring, two functions started to make sense.
void FUN_00012400(undefined *param_1,undefined *param_2,ushort param_3)
{
ushort uVar1;
undefined4 *puVar2;
undefined4 *puVar3;
undefined uVar4;
undefined4 uVar5;
undefined4 uVar6;
undefined4 uVar7;
undefined *puVar8;
undefined *puVar9;
ushort uVar10;
ushort uVar11;
int iVar12;
if (0 < (short)param_3) {
if ((param_2 < param_1 + 0x10 && param_1 < param_2 + 0x10) || (param_3 < 0x10)) {
puVar8 = param_1;
do {
uVar4 = *param_2;
puVar9 = puVar8 + 1;
param_2 = param_2 + 1;
*puVar8 = uVar4;
puVar8 = puVar9;
} while (puVar9 != param_1 + (ushort)(param_3 - 1) + 1);
return;
}
uVar11 = 0;
uVar1 = ((ushort)(param_3 - 0x10) >> 4) + 1;
uVar10 = uVar1 * 0x10;
iVar12 = 0;
do {
puVar2 = (undefined4 *)(param_2 + iVar12);
uVar5 = puVar2[1];
uVar6 = puVar2[2];
uVar7 = puVar2[3];
uVar11 = uVar11 + 1;
puVar3 = (undefined4 *)(param_1 + iVar12);
*puVar3 = *puVar2;
puVar3[1] = uVar5;
puVar3[2] = uVar6;
puVar3[3] = uVar7;
iVar12 = iVar12 + 0x10;
} while (uVar11 < uVar1);
param_1 = param_1 + uVar10;
param_2 = param_2 + uVar10;
if (((((param_3 != uVar10) && (*param_1 = *param_2, (short)(uVar10 + 1) < (short)param_3)) &&
(param_1[1] = param_2[1], (short)(uVar10 + 2) < (short)param_3)) &&
(((param_1[2] = param_2[2], (short)(uVar10 + 3) < (short)param_3 &&
(param_1[3] = param_2[3], (short)(uVar10 + 4) < (short)param_3)) &&
((param_1[4] = param_2[4], (short)(uVar10 + 5) < (short)param_3 &&
((param_1[5] = param_2[5], (short)(uVar10 + 6) < (short)param_3 &&
(param_1[6] = param_2[6], (short)(uVar10 + 7) < (short)param_3)))))))) &&
((param_1[7] = param_2[7], (short)(uVar10 + 8) < (short)param_3 &&
(((((param_1[8] = param_2[8], (short)(uVar10 + 9) < (short)param_3 &&
(param_1[9] = param_2[9], (short)(uVar10 + 10) < (short)param_3)) &&
(param_1[10] = param_2[10], (short)(uVar10 + 0xb) < (short)param_3)) &&
((param_1[0xb] = param_2[0xb], (short)(uVar10 + 0xc) < (short)param_3 &&
(param_1[0xc] = param_2[0xc], (short)(uVar10 + 0xd) < (short)param_3)))) &&
(param_1[0xd] = param_2[0xd], (short)(uVar10 + 0xe) < (short)param_3)))))) {
param_1[0xe] = param_2[0xe];
}
}
return;
}
This function returns void and takes 3 parameters:
- pointer
- pointer
- unsigned value
There’s lots of variable assignments and some loops.
I stared long enough to realise this is an obtuse version of memcpy(). The
do/while loop copies 4 bytes at a time whilst the crazy if state handles
trailing bytes.
int FUN_000125c0(byte *param_1,byte *param_2,short param_3)
{
byte *pbVar1;
int iVar2;
if (param_3 < 1) {
iVar2 = 0;
}
else {
iVar2 = (uint)*param_1 - (uint)*param_2;
if (iVar2 == 0) {
pbVar1 = param_1 + (ushort)(param_3 - 1) + 1;
while( true ) {
param_1 = param_1 + 1;
param_2 = param_2 + 1;
if (param_1 == pbVar1) break;
if ((uint)*param_1 - (uint)*param_2 != 0) {
return (uint)*param_1 - (uint)*param_2;
}
}
}
}
return iVar2;
}
This function returns an integer and takes 3 parameters:
- pointer
- pointer
- value
There’s comparisons and, curiously, pointer dereference subtractions.
This means the function can easily return positive and negative values.
Stare at this one long enough and it looks like memcmp().
Ghidra allows you to rename symbols, so FUN_00012400 became memcpy and
FUN_000125c0 became memcmp.
Two additional functions were identified, bytes2uint and uint2bytes for
transforming arrays of bytes to and from a uint32_t.
As you keep identifying and labelling functions, you start increasing the signal to noise ratio.
Eventually two particular functions became interesting.
void FUN_00011500(uint *param_1,uint *param_2,uint *param_3)
{
uint uVar1;
uint uVar2;
uint *puVar3;
uint uVar4;
uVar2 = *param_2;
puVar3 = param_1;
uVar4 = *param_3;
do {
uVar1 = uVar2 ^ *puVar3;
puVar3 = puVar3 + 1;
uVar2 = (param_1[(uVar1 >> 0x18) + 0x12] + param_1[(uVar1 >> 0x10 & 0xff) + 0x112] ^
param_1[(uVar1 >> 8 & 0xff) + 0x212]) + param_1[(uVar1 & 0xff) + 0x312] ^ uVar4;
uVar4 = uVar1;
} while (puVar3 != param_1 + 0x10);
uVar4 = param_1[0x10];
*param_2 = uVar1 ^ param_1[0x11];
*param_3 = uVar2 ^ uVar4;
return;
}
There’s a lot of shifts and XORs there, so let’s (creatively) call this xor().
void FUN_000108c0(undefined *param_1,int param_2,ushort param_3)
{
undefined uVar1;
short sVar2;
int iVar3;
undefined *puVar4;
int in_GS_OFFSET;
uint local_54;
uint local_50;
undefined *local_4c;
uint local_30;
uint local_2c;
undefined local_28 [4];
undefined local_24 [4];
int local_20;
undefined4 uStack_14;
uStack_14 = 0x108c9;
local_4c = param_1;
local_20 = *(int *)(in_GS_OFFSET + 0x14);
if (DAT_00017040 == 0) {
DAT_00017040 = 1;
}
if (param_3 < 8) {
local_50 = 0x5060708;
local_54 = 0x1020304;
}
else {
local_50 = 0x5060708;
local_54 = 0x1020304;
local_4c = param_1 + (uint)((ushort)(param_3 - 8) >> 3) * 8 + 8;
do {
bytes2uint(param_1,&local_30);
puVar4 = param_1 + 4;
param_1 = param_1 + 8;
bytes2uint(puVar4,&local_2c);
local_30 = local_30 ^ local_54;
local_2c = local_2c ^ local_50;
xor(&DAT_00013180,&local_30,&local_2c);
local_54 = local_30;
local_50 = local_2c;
} while (local_4c != param_1);
param_3 = param_3 & 7;
}
// snip a large, crazy if condition
uint2bytes(param_2,&local_30);
uint2bytes(param_2 + 4,&local_2c);
if (local_20 == *(int *)(in_GS_OFFSET + 0x14)) {
return;
}
FUN_00010850();
}
This function calls xor() in a loop, with even more XORs. It looks like it’s
scrambling the data, so let’s call it mix().
At this point, it was obvious I would never have made progress from analysing the traffic captures. There was far more occurring than I had anticipated.
On the Doorstep
The most interesting line in xor() is the do/while loop:
uVar2 = (param_1[(uVar1 >> 0x18) + 0x12] + param_1[(uVar1 >> 0x10 & 0xff) + 0x112] ^
param_1[(uVar1 >> 8 & 0xff) + 0x212]) + param_1[(uVar1 & 0xff) + 0x312] ^ uVar4;
By following the function stack, param_1 references a memory location, a
massive chunk of seemingly random values. This line is looking up values by
an offset derived from the input and smashing them together.
The multitude of XORs and shifts in that function resembled a pseudo random number generator (RNG) implemented via a linear feedback shift register. However, I was unable find any RNGs with a similar algorithm.
After staring at the code for some time, a key piece becomes clear in the array
indices of param_1, they all look like param_1[(uVar >> m & 0xff) + n]. The
& 0xff is critical, it is masking the lower 8 bits, so the array index is
limited to 28
== 256 entries.
Furthermore, the n offsets are 0x012, 0x112, 0x212 and 0x312, exactly
0x100 == 256 values apart.
That line is indexing four consecutive arrays that contain 256 values.
Furthermore, there’s this line:
*param_2 = uVar1 ^ param_1[0x11];
Where 0x11 == 17, hardcoded access to the 18th
value of the
array (using zero based indices).
As the RNG search was fruitless, the next paradigm that uses XORs is cryptography. After a fair amount of searching, I chanced upon the Blowfish cipher which uses a P array of 18 values and four S-boxes of 256 values.
18 and 256. Coincidence? I think not.
With this new found insight, I was able to massage enough struct information
into Ghidra to change the xor() function into this:
void xor(mixTable_t *pMixTable,uint *pL,uint *pR)
{
uint tmp;
uint xL;
uint32_t *p;
uint xR;
xL = *pL;
p = pMixTable->p_array;
xR = *pR;
do {
tmp = xL ^ *p;
p = p + 1;
xL = (pMixTable->sbox0[tmp >> 0x18] + pMixTable->sbox1[tmp >> 0x10 & 0xff] ^
pMixTable->sbox2[tmp >> 8 & 0xff]) + pMixTable->sbox3[tmp & 0xff] ^ xR;
xR = tmp;
} while (p != pMixTable->p_array + 0x10);
xR = pMixTable->p_array[0x10];
*pL = tmp ^ pMixTable->p_array[0x11];
*pR = xL ^ xR;
return;
}
For comparison, Schneier’s Blowfish implementation4 looks like:
unsigned long F(unsigned long x)
{
unsigned short a;
unsigned short b;
unsigned short c;
unsigned short d;
unsigned long y;
d = x & 0x00FF;
x >>= 8;
c = x & 0x00FF;
x >>= 8;
b = x & 0x00FF;
x >>= 8;
a = x & 0x00FF;
//y = ((S[0][a] + S[1][b]) ^ S[2][c]) + S[3][d];
y = S[0][a] + S[1][b];
y = y ^ S[2][c];
y = y + S[3][d];
return y;
}
which can be refactored by combining the shifts, additions and XORs into:
unsigned long F(unsigned long x)
{
return ((S[0][x >> 24 & 0xFF + S[1][x >> 16 & 0xFF) ^ S[2][x >> 8 & 0xFF) + S[3][x & 0xFF];
}
And also from Schneier4:
void Blowfish_encipher(unsigned long *xl, unsigned long *xr)
{
unsigned long Xl;
unsigned long Xr;
unsigned long temp;
short i;
Xl = *xl;
Xr = *xr;
for (i = 0; i < N; ++i) {
Xl = Xl ^ P[i];
Xr = F(Xl) ^ Xr;
temp = Xl;
Xl = Xr;
Xr = temp;
}
temp = Xl;
Xl = Xr;
Xr = temp;
Xr = Xr ^ P[N];
Xl = Xl ^ P[N + 1];
*xl = Xl;
*xr = Xr;
}
When compared to a Blowfish implementation, our xor() is actually
Blowfish_encipher() with the F() function embedded in the loop.
Additionally, the dimensions of the memory chunk referenced in xor() match
Blowfish’s P array and S-box definitions.
This confirms the handshake is based on the Blowfish cipher.
At this point, we could copy the P array, S-boxes and various support functions, however, that would be a substantial quantity of appropriated data. The less copying the better.
Ideally we want the ability to regenerate the entire Blowfish key structure. This would require us to obtain the private encryption key, is this even possible? The whole point of modern cryptography is to provide one-way transforms that are computationally impractical to reverse.
Inside Information
Whilst the data structures and transforms match the Blowfish cipher, it was unclear if the implementation was ‘off-the-shelf’ or ‘in-house modified’. In particular, is the key structure properly derived or just randomly generated? To gain the answer requires us to delve deeper into the Blowfish cipher.
The Blowfish key structure is well documented5 and we are interested in the key expansion, roughly summarised as:
- fill the P array and S-boxes with the digits of π
- select a private key for encryption
- repeatedly XOR the private key with the P array, replacing the P array with the output
- encrypt the P array entries two by two, replacing the subsequent entries with the encrypted output each iteration
- encrypt the S-box entries two by two, replacing the subsequent entries with the encrypted output each iteration
Assuming we have precomputed P array and S-boxes, we can check if the key was ‘properly’ expanded. We can unwind the last step of the expansion by replacing the S-boxes with the original values of π. We then encrypt the last P array entries and compare to the first S-box entries (effectively performing the key expansion, but starting part way).
These are the relevant entries of the extracted P array and S-box:
uint32_t p_array[18] = {
...
0x96ea497e, // 17
0x7c3f81ca // 18
};
uint32_t s_box[4][256] = {
{
0x349a3164, // [0][0]
0x85caba9f // [0][1]
...
},
...
};
Now we unwind the last step by replacing the S-box with the originally specified entries of π:
uint32_t p_array[18] = {
...
0x96ea497e, // 17
0x7c3f81ca // 18
};
uint32_t s_box[4][256] = {
{
0xd1310ba6, // [0][0]
0x98dfb5ac // [0][1]
...
}
};
And then we perform the first iteration of the S-box expansion:
...
uint32_t xl = p_array[16];
uint32_t xr = p_array[17];
Blowfish_encipher(&xl, &xr);
fprintf(stdout, "xl = %x, xr = %x\n", xl, xr);
...
With the following result:
xl = 349a3164, xr = 85caba9f
And we have a match to the extracted S-box entries!
By completing the expansion per the specification, our expanded S-box entries are an exact match for the ones extracted from the shared library.
We now have objective proof the key expansion was performed according to the specification.
Now, about that encryption key …
The Last Stage
After searching everywhere, I eventually found a single reference to how I might obtain the key in an obscure set of slides6 by Rich Schroeppel. By following ‘Undoing Equivalent Expanded Keys’, I was able to slowly work backwards to recover the π XORd P array.
A fellow (and awesome) FujiHack member, tiredboffin7 was a significant help here, publishing some code8 to automate undoing Blowfish expanded keys.
The ‘key’ concept lies in realising the following about the Blowfish cipher:
- the
F()function does not use the P array - the P array is only used as an XOR input in
Blowfish_encipher() (A XOR B) XOR B = A
By carefully replacing P array values in reverse with XOR’d and ciphered
versions, we can unwind the P array cipher stage of the key expansion. Once
complete, the P array is now key XOR π. And finally, (key XOR π) XOR π = key.
I now had the private key9.
The full transform required finding some hardcoded salt values, realising
mix() was actually a hashing function and determining the content and
hashing order.
Stage 02
The stage 02 message is constructed by randomly selecting from a known, limited set of salts. This salt is then used in generating a hash from the timestamps and session IDs of both camera and connecting device. This message could be considered the challenge in authentication-speak.
Stage 03
Stage 03 is the response of a challenge-response authentication system. As the salt is randomly selected from a known set, it is necessary to iterate and hash through all the salts until a match is found. Once a salt is identified, it is used to generate the response, thereby “authenticating” ourselves with the camera.
And …
Once all the pieces were assembled, the 4 stage handshake could be completed and decoded as:
| Stage | 4 bytes | 4 bytes | 8 (left 4 + right 4) bytes |
|---|---|---|---|
01 |
timestamp (tA) |
SessionID (sA) |
connectionID |
02 |
timestamp (tB) |
SessionID (sB) |
hash(salt + tB + sB + tA + sA) |
03 |
timestamp (tA) |
SessionID (sA) |
hash(salt + tA + sA + tB + sB) |
04 |
timestamp (tB) |
SessionID (sB) |
Camera Serial Number |
This puzzle box is no longer a puzzle.
Epilogue
Ultimately, furble is still unable to fully connect to a Nikon camera in smart
device mode.
After the initial handshake, the camera changes from Bluetooth LE to Bluetooth
Classic to perform the secure bond. As the underlying library for furble is
Bluetooth LE only, it cannot be supported without significant effort.
The hashing algorithm, salts and private key are within the source code for
furble in the hope that others may achieve interoperability.