January 4, 2018

The Dex File Format

Have you ever wondered what happens to your Android application code when it’s compiled and packaged into an APK? This post takes a deep dive into the Dalvik Executable Format, with a practical example of the structure of a minimal Dex file.

What is a Dex file?

A Dex file contains code which is ultimately executed by the Android Runtime. Every APK has at least one classes.dex file (apps that use Multidex have several), which references any classes or methods used within an app. Essentially, any Activity, Object, or Fragment used within your codebase will be transformed into bytes within a Dex file that can be run as an Android app.

It can be useful to understand the structure of a Dex file because all these references can take up a lot of space in your application. Using many third-party libraries can increase your APK size by megabytes, or worse, lead to the infamous 64k method limit. And of course, there may come a day when knowledge of Dex files helps you track down unexpected behaviour in your app.

Dexing Process

All the Java source files in an Android project are first compiled to .class files, which consist of bytecode instructions. In a traditional Java application, these instructions would be executed on the JVM. However, Android apps are executed on the Android Runtime, which uses incompatible opcodes, and therefore an additional Dexing step is required, where .class files are converted into a single .dex file.

Most mobile devices are severely constrained in the amount of memory, processing power, and battery life available, so ART is designed to perform well under those constraints rather than to behave like a desktop JVM. One key feature that helps achieve this is that ART performs both Ahead-of-Time and Just-in-Time compilation. This avoids some of the runtime overhead of JIT, while still allowing performance improvements over time as an app is profiled.

How to create a Dex file

A practical example of a Dex file makes this a lot easier to understand. Let’s create a minimal APK that only contains one Application class, as this allows us to understand the file format without being overwhelmed by the thousands of methods that are present in a typical app.

We’ll use Hexfiend to view our Dex file in hexadecimal, as Dex uses some unusual data types to save space. We’ve hidden any null bytes, so empty spaces in the above screenshot actually represent 00.

A Dex file’s Structure

The full structure of our 480-byte Dex file is shown in hexadecimal and UTF-8 below. Some sections are instantly recognisable when interpreted as UTF-8, such as the single BugsnagApp class that we have defined within our source code, and others not so much:

6465780A 30333800 7A44CBBB FB4AE841 0286C06A 8DF19000
3C5DE024 D07326A2 E0010000 70000000 78563412 00000000
00000000 64010000 05000000 70000000 03000000 84000000
01000000 90000000 00000000 00000000 02000000 9C000000
01000000 AC000000 14010000 CC000000 E4000000 EC000000
07010000 2C010000 2F010000 01000000 02000000 03000000
03000000 02000000 00000000 00000000 00000000 01000000
00000000 01000000 01000000 00000000 00000000 FFFFFFFF
00000000 57010000 00000000 01000100 01000000 00000000
04000000 70100000 00000E00 063C696E 69743E00 194C616E
64726F69 642F6170 702F4170 706C6963 6174696F 6E3B0023
4C636F6D 2F627567 736E6167 2F646578 6578616D 706C652F
42756773 6E616741 70703B00 01560026 7E7E4438 7B226D69
6E2D6170 69223A32 362C2276 65727369 6F6E223A 2276302E
312E3134 227D0000 00010001 818004CC 01000000 0A000000
00000000 01000000 00000000 01000000 05000000 70000000
02000000 03000000 84000000 03000000 01000000 90000000
05000000 02000000 9C000000 06000000 01000000 AC000000
01200000 01000000 CC000000 02200000 05000000 E4000000
00200000 01000000 57010000 00100000 01000000 64010000
dex
038zDÀª˚JËAÜ¿jçÒê<]‡$–s&¢‡pxv4dpñêú¨ã‰ï, ˇˇˇˇwp<init>Landroid/app/Application;
#Lcom/bugsnag/dexexample/BugsnagApp;
V&~~D8{"min-api":26,"version":"v0.1.14"}ÅÄÃ
pÑêú¨ Ã ‰ Wd

Interpreting a Dex File Header

At a very high level, Dex files can be thought of as two separate parts: a file header which contains metadata, and a body which contains the majority of the data. A diagram of the file header structure is shown below.

Let’s step through each item in the header sequentially.

Dex File Magic

Many file formats begin with a fixed sequence of bytes that identifies the format, and Dex is no exception.

6465780A 30333800
dex
038

We can see that the first 8 bytes must contain ‘dex’ and the version number, which is 038 for our file, built with D8 for a minimum API level of 26 (the min-api value that appears in the file’s strings).

You may have also noticed that the 4th byte encodes a newline character, and the 8th byte is null. These are validated by the Android Framework to check for file corruption — an APK should refuse to install if this exact sequence isn’t present.
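As a minimal sketch, the magic check described above can be written in a few lines of Python. The function name parse_dex_magic is our own and is not part of any Android tooling.

```python
def parse_dex_magic(data: bytes) -> int:
    """Return the Dex version number, or raise ValueError for a malformed magic."""
    magic = data[:8]
    # Expected layout: b"dex\n", three ASCII digits, then a NUL byte.
    if len(magic) != 8 or magic[:4] != b"dex\n" or magic[7] != 0:
        raise ValueError("bad Dex magic")
    digits = magic[4:7]
    if not digits.isdigit():
        raise ValueError("bad Dex version")
    return int(digits.decode("ascii"))


# The first 8 bytes of our example file.
print(parse_dex_magic(bytes.fromhex("6465780A 30333800")))  # prints 38
```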

Checksum

7A44CBBB

The next value is an Adler-32 checksum, calculated over the contents of the entire file except the magic and the checksum field itself (that is, from byte 12 onwards). If a byte within the file was corrupted during download or storage on disk, the calculated checksum won’t match and the Android Framework will refuse to install the APK.

SHA1 signature

FB4AE841 0286C06A 8DF19000 3C5DE024 D07326A2

The header also includes a SHA-1 hash of the rest of the file, covering everything after the signature field itself (from byte 32 onwards). This is used to uniquely identify Dex files, which may be useful in scenarios such as Multidex.
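The two integrity fields can be checked with the Python standard library. This sketch assumes the ranges given in the Dex format documentation: Adler-32 over everything from byte 12 onwards, and SHA-1 over everything from byte 32 onwards. The function names are hypothetical, and reseal is included only to show the order in which the fields have to be recomputed.

```python
import hashlib
import struct
import zlib


def checksum_ok(dex: bytes) -> bool:
    """Compare the stored checksum with Adler-32 of bytes 12 onwards."""
    (stored,) = struct.unpack_from("<I", dex, 8)
    return stored == zlib.adler32(dex[12:]) & 0xFFFFFFFF


def signature_ok(dex: bytes) -> bool:
    """Compare the stored signature with SHA-1 of bytes 32 onwards."""
    return dex[12:32] == hashlib.sha1(dex[32:]).digest()


def reseal(dex: bytes) -> bytes:
    """Return a copy with both fields recomputed.

    The signature is written first, because the checksum covers it.
    """
    buf = bytearray(dex)
    buf[12:32] = hashlib.sha1(bytes(buf[32:])).digest()
    struct.pack_into("<I", buf, 8, zlib.adler32(bytes(buf[12:])) & 0xFFFFFFFF)
    return bytes(buf)
```

Any tool that edits a Dex file by hand has to recompute both values in this order, otherwise the file will be rejected.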

File size

E0010000
480

Multi-byte integers are stored in little-endian order, so E0010000 reads as 0x000001E0, or 480. This matches the file size in bytes, and can also be used for validation when reading the Dex file.

Header size

70000000
112

The header should be 112 (0x70) bytes long.

Therefore we can now highlight all the remaining fields within the header_item.

Endian constant

78563412

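Dex files support both big endian and little endian encoding. The bytes 78 56 34 12 are the little endian encoding of 0x12345678, which equals ENDIAN_CONSTANT, indicating that this particular Dex file is encoded in little endian, which is the default behaviour. A byte-swapped file would instead carry REVERSE_ENDIAN_CONSTANT, 0x78563412.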
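To make the endian check concrete, here is a small sketch that reads the tag at offset 40 and returns the byte-order prefix that Python’s struct module would need for the rest of the file. The two constants come from the Dex format documentation; the function name is our own.

```python
import struct

ENDIAN_CONSTANT = 0x12345678
REVERSE_ENDIAN_CONSTANT = 0x78563412


def dex_byte_order(header: bytes) -> str:
    """Return '<' for a little endian file or '>' for a byte-swapped one."""
    # Read the tag as little endian; a swapped file then shows the reverse constant.
    (tag,) = struct.unpack_from("<I", header, 40)
    if tag == ENDIAN_CONSTANT:
        return "<"
    if tag == REVERSE_ENDIAN_CONSTANT:
        return ">"
    raise ValueError(f"unexpected endian tag: {tag:#010x}")


# 40 placeholder bytes followed by the endian tag from our example file.
print(dex_byte_order(bytes(40) + bytes.fromhex("78563412")))  # prints <
```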

IDs and Offsets

The remaining values in the file header define the size and location of other data structures which hold identifiers for methods, strings, and other items.

00000000 00000000 64010000 05000000
70000000 03000000 84000000 01000000
90000000 00000000 00000000 02000000
9C000000 01000000 AC000000 14010000
CC000000

These values are summarised in the table below, where size equals the array length, and the offset is the number of bytes from the start of file where this information can be found.

Type Size Offset
link_size 0 0
map_off N/A 356
string_ids 5 112
type_ids 3 132
proto_ids 1 144
field_ids 0 0
method_ids 2 156
class_defs 1 172
data 276 204

It’s worth noting that link_size and field_ids are both 0, because our app doesn’t statically link any libraries or contain any fields. The map_list structure, found at map_off in the data section, largely duplicates this information in a format that is easier for Dex file parsing.

As an example, we can see that there are 5 string IDs in our Dex file, encoded in bytes 112 to 131 (five 4-byte values, ending just before type_ids begins at 132). Each ID at this position holds an offset into the data section, where the actual value of the String is encoded.
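The whole 112-byte header can be unpacked in one call. The sketch below uses the header bytes from the hex dump above and the field names from the Dex format documentation; parse_header itself is an illustrative helper, not an existing API.

```python
import struct

# The first 112 bytes of the example file.
HEADER = bytes.fromhex("".join("""
6465780A 30333800 7A44CBBB FB4AE841 0286C06A 8DF19000
3C5DE024 D07326A2 E0010000 70000000 78563412 00000000
00000000 64010000 05000000 70000000 03000000 84000000
01000000 90000000 00000000 00000000 02000000 9C000000
01000000 AC000000 14010000 CC000000
""".split()))

# The 20 unsigned 32-bit fields that follow the magic, checksum and signature.
UINT_FIELDS = (
    "file_size", "header_size", "endian_tag", "link_size", "link_off",
    "map_off", "string_ids_size", "string_ids_off", "type_ids_size",
    "type_ids_off", "proto_ids_size", "proto_ids_off", "field_ids_size",
    "field_ids_off", "method_ids_size", "method_ids_off", "class_defs_size",
    "class_defs_off", "data_size", "data_off",
)


def parse_header(data: bytes) -> dict:
    """Unpack a little endian Dex header into a dict keyed by field name."""
    magic, checksum, signature, *rest = struct.unpack_from("<8sI20s20I", data)
    header = dict(zip(UINT_FIELDS, rest))
    header.update(magic=magic, checksum=checksum, signature=signature.hex())
    return header


header = parse_header(HEADER)
for section in ("string_ids", "type_ids", "proto_ids", "field_ids",
                "method_ids", "class_defs", "data"):
    print(section, header[section + "_size"], header[section + "_off"])
```

The printed sizes and offsets match the table above, and data_off plus data_size equals the file size of 480.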

Map List

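The map_list is a section within the data body, found at the map_off of 356, which contains similar information to the file header. It begins with an entry count, followed by one 12-byte entry per section giving its type code, item count, and offset. Unlike the header, it also lists the items held in the data section, such as string data, class data, and code.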

With this knowledge, we can use the offsets to resolve the actual information, and determine what our Dex file encodes.
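As a rough sketch, the map_list of our example file (bytes 356 to 479 of the hex dump) can be decoded as follows. The type codes are taken from the Dex format documentation, and only the ones present in this file are listed; parse_map_list is a hypothetical helper name.

```python
import struct

# Bytes 356-479 of the example file: a count followed by 12-byte entries.
MAP_LIST = bytes.fromhex("".join("""
0A000000
00000000 01000000 00000000 01000000 05000000 70000000
02000000 03000000 84000000 03000000 01000000 90000000
05000000 02000000 9C000000 06000000 01000000 AC000000
01200000 01000000 CC000000 02200000 05000000 E4000000
00200000 01000000 57010000 00100000 01000000 64010000
""".split()))

# Subset of the documented type codes: just those used by this file.
TYPE_NAMES = {
    0x0000: "header_item", 0x0001: "string_id_item", 0x0002: "type_id_item",
    0x0003: "proto_id_item", 0x0005: "method_id_item", 0x0006: "class_def_item",
    0x1000: "map_list", 0x2000: "class_data_item", 0x2001: "code_item",
    0x2002: "string_data_item",
}


def parse_map_list(data: bytes) -> list:
    """Return (type name, item count, file offset) for every map entry."""
    (count,) = struct.unpack_from("<I", data, 0)
    entries = []
    for i in range(count):
        # Each entry: ushort type, ushort unused, uint size, uint offset.
        type_code, _unused, size, offset = struct.unpack_from("<HHII", data, 4 + 12 * i)
        entries.append((TYPE_NAMES.get(type_code, hex(type_code)), size, offset))
    return entries


for name, size, offset in parse_map_list(MAP_LIST):
    print(f"{name:18} size={size} offset={offset}")
```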

Strings

Enough talk — let’s see something concrete. Let’s find out what the string_ids structure points at.

E4000000 EC000000 07010000 2C010000 2F010000
228,     236,     263,     300,     303

The array encodes 5 integer offsets, which point at the data section.

<init>
Landroid/app/Application;
Lcom/bugsnag/dexexample/BugsnagApp;
V
~~D8{"min-api":26,"version":"v0.1.14"}

If we retrieve these values as UTF-8, we are greeted by a few Java symbols which will look familiar to anyone who has used the JNI before, and also some JSON which indicates D8 created the Dex file. All this business of IDs, offsets, and multiple headers may seem a bit useless at this point. Why not just encode the string value directly in the header?

Some of the reasoning behind this is that these strings are referenced from multiple points within the Dex file. Providing an ID for each one prevents duplication of information and reduces the overall file size, simplifies parsing as an ID will always be a fixed length, and means values are only accessed when required.
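The lookup from string ID to string value can be sketched as below, using bytes 228 to 342 of the hex dump. Each string is stored as a ULEB128 length, the characters, and a terminating null byte. Dex strings are technically MUTF-8; this sketch assumes plain ASCII content, for which UTF-8 decoding gives the same result. The helper names are our own.

```python
import struct

STRING_IDS = bytes.fromhex("E4000000 EC000000 07010000 2C010000 2F010000")
STRING_DATA_BASE = 228  # file offset of the first byte in STRING_DATA
STRING_DATA = bytes.fromhex("".join("""
063C696E 69743E00 194C616E
64726F69 642F6170 702F4170 706C6963 6174696F 6E3B0023
4C636F6D 2F627567 736E6167 2F646578 6578616D 706C652F
42756773 6E616741 70703B00 01560026 7E7E4438 7B226D69
6E2D6170 69223A32 362C2276 65727369 6F6E223A 2276302E
312E3134 227D00
""".split()))


def read_uleb128(data: bytes, pos: int):
    """Decode one ULEB128 value; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def read_string(file_offset: int) -> str:
    """Read the string stored at the given file offset."""
    length, pos = read_uleb128(STRING_DATA, file_offset - STRING_DATA_BASE)
    end = STRING_DATA.index(0, pos)  # strings are null-terminated
    text = STRING_DATA[pos:end].decode("utf-8")
    assert len(text) == length  # holds for ASCII-only strings
    return text


offsets = [off for (off,) in struct.iter_unpack("<I", STRING_IDS)]
strings = [read_string(off) for off in offsets]
for index, value in enumerate(strings):
    print(index, value)
```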

Types

01000000 02000000 03000000
1, 2, 3

Our Dex file defines 3 Java types. Each value here is an index into the previous string_id array — therefore we can determine that the types in our file are as follows:

Landroid/app/Application;
Lcom/bugsnag/dexexample/BugsnagApp;
V

The TypeDescriptor syntax may look somewhat unfamiliar, but L simply introduces a fully qualified class name, written with slashes instead of dots and terminated by a semicolon, and V is the type void. Our types include our custom BugsnagApp class, and the Application class from the Android framework.

Prototypes

03000000 02000000 00000000
3,       2,       0
"V",     V

A method prototype consists of a shorty descriptor (a compact string summarising the return and parameter types), the return type of a method, and a reference to its parameter list. The proto_id section uses indices to retrieve the first two (string 3 and type 2, which are both V), and an offset to the parameter types, which is 0 in this case as our method doesn’t take any parameters.

Methods

The Method section also uses indices. Each method looks up the class ID where it was defined, the method prototype, and the name of the method from the strings table.

00000000 00000000 01000000 00000000
0,  0,   0,       1,  0,   0
Landroid/app/Application; "V" <init>
Lcom/bugsnag/dexexample/BugsnagApp; "V" <init>

The only methods in our Dex file are the constructor for BugsnagApp and the Application constructor that it calls, which is exactly what we’d expect.
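Chaining the type, prototype, and method tables together looks roughly like this. The string table is the one decoded earlier, the three byte strings are copied from the hex dump, and the output uses the smali-style Class;->name()ReturnType notation, which is our choice for display rather than anything stored in the file.

```python
import struct

STRINGS = [
    "<init>",
    "Landroid/app/Application;",
    "Lcom/bugsnag/dexexample/BugsnagApp;",
    "V",
    '~~D8{"min-api":26,"version":"v0.1.14"}',
]
TYPE_IDS = bytes.fromhex("01000000 02000000 03000000")
PROTO_IDS = bytes.fromhex("03000000 02000000 00000000")
METHOD_IDS = bytes.fromhex("00000000 00000000 01000000 00000000")

# type_id_item: one uint index into the string table.
types = [STRINGS[idx] for (idx,) in struct.iter_unpack("<I", TYPE_IDS)]

# proto_id_item: uint shorty string index, uint return type index, uint parameters offset.
protos = [
    (STRINGS[shorty_idx], types[return_idx], params_off)
    for shorty_idx, return_idx, params_off in struct.iter_unpack("<III", PROTO_IDS)
]

# method_id_item: ushort class type index, ushort proto index, uint name string index.
methods = []
for class_idx, proto_idx, name_idx in struct.iter_unpack("<HHI", METHOD_IDS):
    shorty, return_type, params_off = protos[proto_idx]
    # A parameters offset of 0 means the method takes no parameters.
    params = "" if params_off == 0 else "..."
    methods.append(f"{types[class_idx]}->{STRINGS[name_idx]}({params}){return_type}")

for index, signature in enumerate(methods):
    print(index, signature)
```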

Class Defs

This section contains the type, inheritance hierarchy, access metadata, and other class metadata such as annotations and source file indices.

01000000 01000000 00000000 00000000 FFFFFFFF 00000000 57010000 00000000
1,       1,       0,       0,       NO_INDEX,0,       343,     0

This evaluates as a public class Lcom/bugsnag/dexexample/BugsnagApp which inherits from Landroid/app/Application, whose class data is stored from byte 343. The public access modifier is determined from a bit field. Let’s view the class data.
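The same reading can be reproduced with a short sketch. The class_def_item bytes come from the hex dump, the field names and the NO_INDEX value follow the Dex format documentation, and the access flag table below is deliberately a small subset of the documented flags.

```python
import struct

NO_INDEX = 0xFFFFFFFF
# Subset of the documented class access flags.
ACCESS_FLAGS = {0x1: "public", 0x10: "final", 0x200: "interface", 0x400: "abstract"}
TYPES = ["Landroid/app/Application;", "Lcom/bugsnag/dexexample/BugsnagApp;", "V"]
FIELDS = (
    "class_idx", "access_flags", "superclass_idx", "interfaces_off",
    "source_file_idx", "annotations_off", "class_data_off", "static_values_off",
)

# Bytes 172-203 of the example file: eight little endian uints.
CLASS_DEF = bytes.fromhex(
    "01000000 01000000 00000000 00000000 FFFFFFFF 00000000 57010000 00000000"
)

class_def = dict(zip(FIELDS, struct.unpack("<8I", CLASS_DEF)))

# Each access modifier is a single bit in the access_flags bit field.
modifiers = [name for bit, name in ACCESS_FLAGS.items() if class_def["access_flags"] & bit]
description = "{} class {} extends {}".format(
    " ".join(modifiers),
    TYPES[class_def["class_idx"]],
    TYPES[class_def["superclass_idx"]],
)
print(description)
print("class data at byte", class_def["class_data_off"])
```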

Class Data

The first 4 bytes of our BugsnagApp class data define the number of static and instance fields, along with any direct or virtual methods. Every value in this section is encoded as ULEB128, a variable-length integer format that stores 7 bits per byte and sets the high bit when more bytes follow.

00  00  01  00
0,   0,   1,   0
01  81 80 04  CC 01
1,  0x10001,  204

There is only one direct method defined in this class. It has an ID of #1, which corresponds to Lcom/bugsnag/dexexample/BugsnagApp; "V" <init>, access flags of 0x10001 (public and constructor), and a code data offset of 204. Note that CC 01 is ULEB128, so it decodes as 0x4C + (0x01 << 7) = 204 rather than as a little endian 0x01CC (460). The three 00 bytes that follow are padding before the map_list. If our method was abstract or native, the code data offset would be 0.

If our class defined fields and other information, more data would be encoded in this section. Incidentally, instructions refer to methods with a 16-bit index, so if our app had needed a method ID larger than 65,535, we would have encountered the infamous 64k method limit.

Code structure

We’re now analysing the constructor method defined in our class, which has the following structure at the offset of 204:

0100  0100  0100  0000  00000000  04000000  7010 0000 0000  0E00
1,    1,    1,    0,    0,        4,        invoke-direct,  return-void

This corresponds to a register size of 1, 1 incoming argument (the implicit this reference), 1 outgoing argument, 0 try blocks, and a debug information offset of 0, meaning no debug information is stored.

The most important part however, is the last few bytes. We have an instruction list size of 4, measured in 16-bit code units, which holds two instructions: 7010 0000 0000 and 0E00.

The Dalvik Bytecode table shows that 70 corresponds to invoke-direct, here called with one argument (register v0, which holds this) and method index 0, which is Landroid/app/Application; "V" <init>. The second opcode, 0E, is return-void. This matches our expectation that the BugsnagApp constructor simply calls its superclass constructor and returns, but diving deep into Dalvik is a story for another day!
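For readers who want to check the bytes themselves, here is a sketch that decodes the class data at offset 343 and the code item that the map_list places at offset 204. It assumes the layouts from the Dex format documentation: ULEB128 values in class_data_item, and a code_item made of four ushorts, two uints, and a list of 16-bit code units. Only the two opcodes that occur in this file are handled, and the helper names are hypothetical.

```python
import struct

# Bytes 343-352 (class_data_item) and 204-227 (code_item) of the example file.
CLASS_DATA = bytes.fromhex("00 00 01 00 01 81 80 04 CC 01")
CODE_ITEM = bytes.fromhex("01000100 01000000 00000000 04000000 70100000 00000E00")

# Only the opcodes present in this file: name and width in 16-bit code units.
OPCODES = {0x0E: ("return-void", 1), 0x70: ("invoke-direct", 3)}


def read_uleb128(data, pos):
    """Decode one ULEB128 value; return (value, position after it)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def parse_class_data(data):
    """Return the four counts and a list of (method index, flags, code offset)."""
    pos, counts = 0, []
    for _ in range(4):
        value, pos = read_uleb128(data, pos)
        counts.append(value)
    static_fields, instance_fields, direct_methods, virtual_methods = counts
    for _ in range(2 * (static_fields + instance_fields)):
        _, pos = read_uleb128(data, pos)  # skip field_idx_diff and access_flags
    methods = []
    for group_size in (direct_methods, virtual_methods):
        method_idx = 0  # indices are stored as differences within each group
        for _ in range(group_size):
            diff, pos = read_uleb128(data, pos)
            flags, pos = read_uleb128(data, pos)
            code_off, pos = read_uleb128(data, pos)
            method_idx += diff
            methods.append((method_idx, flags, code_off))
    return counts, methods


def parse_code_item(data):
    """Return (registers, ins, outs, tries, debug_off) and a small disassembly."""
    registers, ins, outs, tries, debug_off, insns_size = struct.unpack_from("<HHHHII", data)
    insns = struct.unpack_from(f"<{insns_size}H", data, 16)
    listing, pc = [], 0
    while pc < len(insns):
        name, width = OPCODES[insns[pc] & 0xFF]
        if name == "invoke-direct":
            arg_count = insns[pc] >> 12
            # Up to four argument registers are packed as nibbles in the third unit.
            regs = [f"v{(insns[pc + 2] >> (4 * i)) & 0xF}" for i in range(min(arg_count, 4))]
            listing.append(f"{name} {{{', '.join(regs)}}}, method@{insns[pc + 1]}")
        else:
            listing.append(name)
        pc += width
    return (registers, ins, outs, tries, debug_off), listing


print(parse_class_data(CLASS_DATA))
print(parse_code_item(CODE_ITEM))
```

The access flags value 0x10001 combines ACC_PUBLIC (0x1) with ACC_CONSTRUCTOR (0x10000), and method@0 is the Application constructor from the method table.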

A new Android Compiler – D8

We haven’t touched too much on the compilation process, but our minimal Dex file was created using D8, a new compiler that will be rolled out by default in Android Studio 3.1. It offers performance benefits in the overall file size and build speed, so let’s test those claims.

Benchmarking D8 performance

Let’s create a greenfield app with Android Studio 3.0.1. We’ll add Kotlin Support and a Navigation Drawer, but will otherwise leave all the options as default, generate a signed APK, and view it with the APK Analyzer.

We can retrieve classes.dex from within the APK by unzipping the APK with unzip app-release.apk -d app, then measuring the file size in bytes: stat -f%z app/classes.dex (this is the macOS form; on Linux the equivalent is stat -c%s app/classes.dex).
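The same measurement can be scripted. An APK is a zip archive, so this sketch reads classes.dex straight out of it and also reports method_ids_size, the method reference count stored at byte 88 of the Dex header. The function name dex_stats is hypothetical, and the sketch assumes a little endian Dex file.

```python
import struct
import zipfile


def dex_stats(apk, entry="classes.dex"):
    """Return (size in bytes, method_ids_size) for a Dex entry inside an APK.

    `apk` may be a path or a file-like object.
    """
    with zipfile.ZipFile(apk) as archive:
        dex = archive.read(entry)
    # method_ids_size is the uint at offset 88 of the header.
    (method_ids_size,) = struct.unpack_from("<I", dex, 88)
    return len(dex), method_ids_size


# Example usage, assuming the APK from this post is in the working directory:
# size, method_refs = dex_stats("app-release.apk")
```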

Better faster smaller stronger

Metric DX D8
Uncompressed File Size (MB) 4.23 3.73
Class count 2790 2790
Method count 22038 22038
Total Method references 28653 28651

Our Dex file is approximately 88% of its previous size when compiling with D8. Your mileage may vary, as this is a very simple example project. One other interesting thing to note is that using D8, we appear to have lost the following two method references:

android.view.View#isInEditMode
java.lang.Class#desiredAssertionStatus

These don’t appear to be used at runtime, so could be an optimisation. Please get in touch if you know why these are missing!

Why minification leads to a better app

Enabling minification and obfuscation is the single greatest thing you can do for your app, and now that you’re an expert in the Dex format, you can probably think of some reasons why.

Firstly, stripping out unused Java classes with ProGuard will reduce the size of an APK, as the generated Dex file won’t contain unused class definitions and all their associated data which take up space.

Obfuscation will also reduce the Dex file size, as unless you’re the type of developer who names their classes a.a.A and z.z.Z, fewer characters will be required for each symbol, which will save space overall. Solutions exist for mapping obfuscated stacktraces, which allow you to easily diagnose crashes within your app.

Finally, a smaller Dex file leads to a smaller APK, which means users spend less on mobile data, and are less likely to give up on a download. If you offer an Instant App, then the hard limit of 4MB means keeping APK size low is a big consideration.

Would you like to know more?

Hopefully this has helped you understand Dex files, which are about to get a lot smaller with the advent of D8. If you have any questions or feedback, please feel free to get in touch.

———

Bugsnag automatically monitors your applications for harmful errors and alerts you to them, giving you visibility into the stability of your software. You can think of us as mission control for software quality.

Try Bugsnag’s Android crash reporting.
