You’ve probably heard of Pokemon GO by now. It’s become incredibly popular, very quickly. It’s currently making over a million dollars per day from in-app purchases for Nintendo and Niantic, and caused Nintendo’s stock price to rocket upwards in the weeks following the launch. Let’s talk about the internals a bit.
Pokemon GO is built using the Unity game engine, targeting Android and iOS. Most of the game assets are readily accessible, and include interesting things such as alternative pokeball icons (including a master ball!), different items, and even a McDonald’s logo. Some of the script data is also available, but I haven’t looked at that yet. I was more interested in the network protocol.
Communication between the Pokemon GO app and the backend servers is performed via HTTPS, but the app doesn’t do certificate pinning, so it’s trivial to MITM the data if you control the device. Requests and responses are encoded using Google’s Protocol Buffers, and seem to form a sort of RPC system.
There’s an initial handshake performed with pgorelease.nianticlabs.com/plfe,
which seems to assign a channel for that session. This handshake includes your
authentication token, your location, a timestamp, and some binary blobs. The
blobs are very high-entropy, so it’s likely that they’re signatures or some
other kind of encrypted data. Subsequent communications are performed with a
URL contained in the handshake response, which currently takes the form of
pgorelease.nianticlabs.com/plfe/nnn, where nnn is a number. These numbers
were rather small (rarely above two digits) soon after release, and have been
growing since. It’s my conjecture that they map to individual backend servers,
and that the initial handshake serves as a sort of load balancing step.
As of right now, there are some pretty decent schema definitions for most of the protocol, but when I started messing with it there were none. I wrote a tool to help me interpret the protobuf data without a schema. I’ve named it protofudger.
Let’s start with a bit of a crash course in protocol buffers’ encoding rules. The long, detailed version is available here.
Protobuf messages are made up of a series of key/value pairs. Keys are numeric: each key is a varint whose low three bits hold a wire type identifying how the value is encoded, and whose remaining bits hold the field number. Numeric values are mostly “varint” types - this is a variable width integer encoding, where smaller values take fewer bytes. There are two additional numeric types: 32-bit and 64-bit fixed-width. These can hold floating point or integer numbers of any sign. The spec says they’re supposed to be little-endian, but don’t count on it all the time. There’s a variable-length type as well, which can be used to hold strings, byte arrays, or embedded messages. Embedded messages are simply serialized with protobuf and stuffed into variable-length byte arrays.
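To make the key and varint layout a bit more concrete, here’s a minimal Python sketch of the encoding side. The names encode_varint and encode_key are made up for this example, and it only handles non-negative integers.

```python
def encode_varint(value):
    """Encode a non-negative integer as a protobuf varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # seven data bits per byte, lowest group first
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: this is the last byte
            return bytes(out)


def encode_key(field_number, wire_type):
    """Pack a field number and a three-bit wire type into a varint key."""
    return encode_varint((field_number << 3) | wire_type)


# Wire types used by the encoding described above.
WIRE_VARINT, WIRE_FIXED64, WIRE_LENGTH_DELIMITED, WIRE_FIXED32 = 0, 1, 2, 5

print(encode_key(1, WIRE_VARINT).hex())   # key for "field 1, varint"
print(encode_varint(150).hex())           # a two-byte varint
for n in (1, 300, 1468299899994):
    print(n, "takes", len(encode_varint(n)), "byte(s)")
```

Small numbers fit in a single byte, while a millisecond timestamp needs six.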
Usually to correctly interpret a protobuf message, you need to know the
schema. This is because the same protobuf type can represent several possible
application types - e.g. a 32-bit fixed-width field can hold a floating point
number, an unsigned integer, a signed integer, or even a timestamp. You can,
however, correctly decode a protobuf message with no schema and display a
rough approximation of its structure. This is because, by design, all the
fields have known lengths, and enough type information to display a reasonable
representation of them. You can do exactly this using the protobuf compiler,
by doing protoc --decode_raw < message.
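As a rough illustration of why no schema is needed to walk a message, here’s a minimal Python sketch of a schema-less field walker. It is not how protoc is implemented; the function names are invented for this example, and the deprecated group wire types (3 and 4) are simply rejected. The sample bytes are three fields lifted from the example message later in this post.

```python
def read_varint(buf, pos):
    """Decode one varint starting at buf[pos]; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear: last byte of the varint
            return value, pos
        shift += 7
        if shift > 63:
            raise ValueError("varint longer than 10 bytes")


def walk(buf):
    """Return a list of (field_number, wire_type, raw_value) tuples."""
    fields = []
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if field_number == 0:
            raise ValueError("field number 0 is not valid")
        if wire_type == 0:                      # varint
            value, pos = read_varint(buf, pos)
        elif wire_type in (1, 5):               # fixed 64-bit / fixed 32-bit
            size = 8 if wire_type == 1 else 4
            value = buf[pos:pos + size]
            if len(value) != size:
                raise ValueError("truncated fixed-width field")
            pos += size
        elif wire_type == 2:                    # length-delimited
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            if len(value) != length:
                raise ValueError("truncated length-delimited field")
            pos += length
        else:
            raise ValueError("unsupported wire type %d" % wire_type)
        fields.append((field_number, wire_type, value))
    return fields


# Field 1 (varint), one field 4 (length-delimited), field 7 (fixed 64-bit).
message = bytes.fromhex("0802" "2202087e" "39000000c069e642c0")
for field in walk(message):
    print(field)

# The payload of field 4 happens to parse as a message of its own.
print(walk(bytes.fromhex("087e")))
```

Every value comes back raw. Deciding whether eight bytes are a double or an integer, or whether a byte string is text or a nested message, is the part that needs either a schema or some guessing.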
What protofudger does is similar to that process, except that it makes a bit
more of an effort to figure out what the likely types are for numeric and
variable-length values. It has some rough heuristics to identify floats,
integers, signs, timestamps, and embedded messages. The rule to determine the
most likely numeric type is to find the type where the value is closest to
zero. This seems to yield very few incorrect results. Timestamps are detected
as numbers between 1400000000000 and 1500000000000, or 1400000000 and
1500000000. These will need to be adjusted at some point in the future, or
even now if you’re dealing with timestamps in the past. Again though,
currently it seems to yield very few incorrect results. Embedded messages are
detected by trying to parse variable-length fields as protobuf data. If it
parses successfully, it’s displayed as such. Otherwise, it’s either text or a
byte array. Text is just any byte array which is valid UTF-8 text.
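Here’s a minimal Python sketch of heuristics along these lines. It is an illustration, not protofudger’s actual code: the function names are invented, the range bounds are treated as inclusive, and the rule that throws out NaN, infinite, and subnormal doubles is my own assumption (without it, small integers would always look like tiny doubles). The embedded-message check is left out to keep it short; it would amount to attempting a schema-less parse of the bytes before falling back to text.

```python
import math
import struct
import sys
from datetime import datetime, timezone


def plausible(value):
    """Assumption: NaN, infinite and subnormal doubles are never intended."""
    if isinstance(value, float):
        if not math.isfinite(value):
            return False
        if value != 0.0 and abs(value) < sys.float_info.min:
            return False
    return True


def guess_fixed64(raw):
    """Pick the reading of 8 little-endian bytes that lands closest to zero."""
    candidates = {
        "double": struct.unpack("<d", raw)[0],
        "int64": struct.unpack("<q", raw)[0],
        "uint64": struct.unpack("<Q", raw)[0],
    }
    usable = {k: v for k, v in candidates.items() if plausible(v)}
    kind = min(usable, key=lambda k: abs(usable[k]))
    return kind, usable[kind]


def guess_timestamp(value):
    """Return a UTC datetime if value falls in one of the ranges, else None."""
    if 1400000000000 <= value <= 1500000000000:    # milliseconds since epoch
        # Whole seconds only; the millisecond part is dropped for display.
        return datetime.fromtimestamp(value // 1000, tz=timezone.utc)
    if 1400000000 <= value <= 1500000000:          # seconds since epoch
        return datetime.fromtimestamp(value, tz=timezone.utc)
    return None


def classify_bytes(raw):
    """Text is any byte array that is valid UTF-8; everything else is bytes."""
    try:
        return "string", raw.decode("utf-8")
    except UnicodeDecodeError:
        return "bytes", raw.hex()


print(guess_fixed64(struct.pack("<Q", 0xC042E669C0000000)))
print(guess_timestamp(1468299899994))
print(classify_bytes(bytes.fromhex("baf933199d418282ec741ac60e6d6ade")))
```

The inputs are taken from the sample message later in this post. The 64-bit value reads as a double near -37.8, because the two integer readings are both astronomically large.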
protofudger is mostly useful for inspecting protobuf messages of unknown provenance and structure. It’s only meant to help you interpret the messages - it won’t get things perfect every time.
Let’s look at it working!
This is a protobuf message that was generated while I was playing Pokemon GO. It’s encoded with base64:
CAIYhICAgIDH9YMtIgIIAiICCH4iCwgEEgcI2uj5690qIgMIgQEiLggFEioKKDRhMmU5YmMzMz
BkYWU2MGU3Yjc0ZmM4NWI5ODg2OGFiNDcwMDgwMmUyqAQIBhKjBAqgBG7GaKm7Pw0q1Cc2QLtM
GMC5m4LGKR7jNoEgJhkOqyaUiU1vNExYDB7R0BanfBLTYKcTjgBSxpIj5kiDJj95L0+ri5SucL
sl1KRMuf/0N5A38gmJlKp/9KUhJ42J36NRCqtantd9bZw6r0VZq0/dH/GoK1xPx/lYi18NHlHM
BDdwB7sKh1LcQ3VRlPp9Se28SvG18kFrGXMi9W7U1HcWWsACtv7og3gSf1GVXyyA5C70y0BdOq
O7WP0I8cJjZ6i6W2fI+6CfBBxZMB+MNNIPdAW49dDitKk1cts/aHdcMnMjobLGaYye99nT25CC
mGaMyHl9KbRyu6HvwMBUQGu2qvmsZRvDoWFw33QaskRhB987DF5p7KeN12kODt7LAtZmnyHvzh
QT1qbolqKpQCwhZpdPGRkDTMoPVhzWOkoNhnQab1n8gK7GiHA5mWEBU4JKMvkRpi7wzciAiuTk
s1FFKT5ywRChW+TOJRZ0aNVyhJ8yg9yZABoe42rTo30mwL2Q3qG6oShyyrNPg9b1MrXwf+LcBz
QkGnC/RiGOwybWHuPCC5uP5PWqIIoDJP3ArGZMrNJwrEYK/aTlj7eAZEH/PO+VvlZkzdurBvKf
jb6sNO/z0POrzgR08FqqKGjIOcxysVbHlbs7vufiH6rIDKqnQh0Xm/UX/KApQCGQX9qAQ5P+yt
skrLmVEzrCPS42DqJRN7ACsJU5UOVfeGqz0RJMp6w5AAAAwGnmQsBBAAAAwNkeYkBJAAAA4Gg3
VkBaWwpAjak/Co8vA9rnxeP5tKpOApMdlWxWyeYUJyIftyjJ6EdIEotI0ueUrZQkcOCoR10J+B
xoocpdD6o4RxwTT7OcRBDD1+fs3SoaELr5MxmdQYKC7HQaxg5tat5gnzY=
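If you want to follow along, the base64 text has to be turned back into raw bytes first. Here’s a minimal Python sketch that does this for just the first 16 characters of the blob; pasting the whole thing in works the same way. The output file name is only an example.

```python
import base64

# First 16 characters of the base64 text above.
encoded = "CAIYhICAgIDH9YMt"

# The blob is wrapped across several lines, so strip whitespace before decoding.
raw = base64.b64decode("".join(encoded.split()))
print(raw.hex())  # 0802188480808080c7f5832d

# To feed the result to protoc, write the bytes to a file (example name):
# with open("message", "wb") as f:
#     f.write(raw)
```

The leading 08 is the key for field 1 with the varint wire type, and the 02 after it is that field’s value.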
Here’s how it looks if you decode it with protoc --decode_raw. I’ve
truncated some of the fields, since they were very long and provided no useful
context.
1: 2
3: 3244797592550244356
4 {
  1: 2
}
4 {
  1: 126
}
4 {
  1: 4
  2 {
    1: 1468299899994
  }
}
4 {
  1: 129
}
4 {
  1: 5
  2 {
    1: "4a2e9bc330dae60e7b74fc85b98868ab4700802e"
  }
}
6 {
  1: 6
  2 {
    1: "n\306h\251\273?\r*\324\'6@\273L\030\300\271\233\202\306)\036..."
  }
}
7: 0xc042e669c0000000
8: 0x40621ed9c0000000
9: 0x40563768e0000000
11 {
  1: "\215\251?\n\217/\003\332\347\305\343\371\264\252N\002\223\035..."
  2: 1468301700035
  3: "\272\3713\031\235A\202\202\354t\032\306\016mj\336"
}
12: 6943
And here’s how it looks with protofudger. Again, I’ve truncated some of the
fields. I think the protofudger output is far more interesting!
decoded 13 fields
1: (varint) 2
3: (varint) 3244797592550244356
4: {
  1: (varint) 2
}
4: {
  1: (varint) 126
}
4: {
  1: (varint) 4
  2: {
    1: (varint, microseconds) 2016-07-12 15:04:59 +1000 AEST
  }
}
4: {
  1: (varint) 129
}
4: {
  1: (varint) 5
  2: {
    1: (string) "4a2e9bc330dae60e7b74fc85b98868ab4700802e"
  }
}
6: {
  1: (varint) 6
  2: {
    1: (bytes) 6ec668a9bb3f0d2ad4273640bb4c18c0b99b82c6291ee3368120...
  }
}
7: (doublele) -37.800102
8: (doublele) 144.964081
9: (doublele) 88.865776
11: {
  1: (bytes) 8da93f0a8f2f03dae7c5e3f9b4aa4e02931d956c56c9e61427221f...
  2: (varint, microseconds) 2016-07-12 15:35:00 +1000 AEST
  3: (bytes) baf933199d418282ec741ac60e6d6ade
}
12: (varint) 6943
So that’s just one message. Most of the fields are mysteries, but I expect in the next couple of months we’ll start to get some more understanding of what’s going on.
There are already a few third-party projects that interact with this protocol data. My favourite right now is the PokeRev Mapper. It’s set up to passively monitor communications between volunteers’ Pokemon GO instances and the backend, so it shouldn’t result in anyone getting banned. I’ve seen a couple of other projects that actively intercept data, and might result in soft or hard bans as they’re detected.
Given that the communications are so easy to work with right now, I’ll be interested to see what comes out of the Pokemon GO reversing community. I’ll be equally interested to see how Niantic responds to our research and activities.
Now go, catch ‘em all!