Crashes on Android can be immensely frustrating for users: after experiencing as few as two crashes, the typical user will uninstall your app. Fortunately, the Android Framework provides some great tools for debugging crashes, along with several useful crash logs that developers can read to determine the cause of a critical issue. In this blog post we’ll cover the three most important crash logs used by the system: exception stack traces, ANR traces, and NDK tombstones.
JVM stack traces are the most common type of crash log that typical Android applications will encounter, as the majority of apps are written in either Kotlin or Java. In JVM languages, an Exception is thrown in exceptional circumstances, and contains debug information about the error, such as an error message and a stack trace with file and line number information.
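For instance, the following plain-Java sketch (the class and method names are made up for illustration) shows how a thrown exception carries both an error message and stack frames recording where it was thrown:

```java
public class CrashDemo {
    // Hypothetical method that hits an unexpected error condition
    static void sendMessage() {
        throw new RuntimeException("Fatal Crash");
    }

    public static void main(String[] args) {
        try {
            sendMessage();
        } catch (RuntimeException e) {
            // The exception carries an error message...
            System.out.println("message: " + e.getMessage());
            // ...and stack frames recording class, method, file and line
            StackTraceElement top = e.getStackTrace()[0];
            System.out.println("thrown from: " + top.getClassName() + "."
                    + top.getMethodName() + " at line " + top.getLineNumber());
        }
    }
}
```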
If an app hasn’t got a crash reporting SDK installed, then the next best method for retrieving crash logs is to use adb to view logcat. This is a convenient method if physical access to the device is an option, because the default UncaughtExceptionHandler in Android apps prints out the entire stack trace to Logcat before terminating the process, meaning the crash is effectively logged out to an accessible location for developers.
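The behaviour described above can be approximated in plain Java. The sketch below (illustrative names, not the framework’s actual implementation) wraps the current default handler, logs the trace, and then delegates so the process still terminates as the platform intends:

```java
public class HandlerDemo {
    // Recorded for demonstration purposes: the last uncaught throwable observed
    static volatile Throwable lastUncaught;

    // Wrap the current default handler: log the crash, then delegate.
    // (Sketch only; Android's real default handler lives in the framework.)
    public static void installLoggingHandler() {
        Thread.UncaughtExceptionHandler previous =
                Thread.getDefaultUncaughtExceptionHandler();
        Thread.setDefaultUncaughtExceptionHandler((thread, throwable) -> {
            lastUncaught = throwable;
            // Mirrors the "FATAL EXCEPTION" line you see in Logcat
            System.err.println("FATAL EXCEPTION: " + thread.getName());
            throwable.printStackTrace();
            if (previous != null) {
                previous.uncaughtException(thread, throwable);
            }
        });
    }
}
```

Crash reporting SDKs use the same mechanism: they install their own handler, persist a report, and then delegate to the previous handler.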
ANRs (Application Not Responding) occur when an application does not respond to user input for a noticeable period of time. The visible effect of this is that an app has ‘frozen’ from a user’s perspective, which can be immensely frustrating. Common causes include performing disk reads/writes on the main thread, and other long-running tasks, which prevents the User Interface from updating in response to user input.
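The usual remedy is to move blocking work off the main thread. The following plain-Java sketch uses an ExecutorService to stand in for Android’s background-threading options; all names here are hypothetical, and in a real app you would deliver the result back to the main thread (for example with a Handler or runOnUiThread) rather than invoking the callback directly:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

public class AnrAvoidanceDemo {
    // Single background thread for blocking I/O; daemon so it never blocks JVM exit
    private static final ExecutorService io = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "io-thread");
        t.setDaemon(true);
        return t;
    });

    // Stand-in for a slow disk read that would freeze the UI on the main thread
    static String readFileBlocking() throws InterruptedException {
        Thread.sleep(100);
        return "file-contents";
    }

    // Run the slow work on the background thread and hand the result to a callback
    static Future<?> readFileAsync(Consumer<String> callback) {
        return io.submit(() -> {
            try {
                callback.accept(readFileBlocking());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
    }
}
```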
If the app is in the foreground, after approximately 5 seconds a dialog will be shown which allows the user to kill the app. At this point a trace including details of the ANR will be written to disk, from which valuable information for debugging can be retrieved. Again, this requires physical access to the device unless you have a crash reporting SDK installed that supports ANR detection.
Tombstone crash logs are written when a native crash in C/C++ code occurs in an Android application. The Android platform writes a trace of all the running threads at the time of the crash to /data/tombstones, along with additional information for debugging, such as information about memory and open files. Tombstones are the closest to the metal in terms of information, as they will record details such as raw memory addresses, and as such can be a bit trickier to understand unless you’re familiar with debugging native code. Again, tombstones require physical access to a rooted device in order to be read.
As a prerequisite to all these steps, you should have installed Android Studio and added the command line tools to your path. These local methods make use of the adb tool. You should also have a device or emulator connected which has had developer options enabled.
Note: if you are comfortable using the Device File Explorer in Android Studio directly, you can open device files directly from there rather than using adb pull.
By default, exception stack traces are printed out to the Logcat tool on Android devices, so you can retrieve crash logs via the following steps:

1. Connect your device to your development machine and confirm it is visible by running adb devices.
2. Reproduce the crash on the device.
3. Filter Logcat so that only fatal output from the runtime is shown:

adb logcat AndroidRuntime:E *:S
If a crash has occurred recently on the device, you can skip step 2, because Logcat retains a buffer of recent logs which should include the exception. This is time-sensitive, however: if you’re looking for a crash from a day ago, that information may be gone forever unless you use a crash reporting tool such as BugSnag.
To retrieve the full ANR trace, pull the trace file from the device:

adb pull /data/anr/traces.txt <destination>
Alternatively, you can inspect summary ANR information by running the following command:
adb logcat ActivityManager:E *:S
To read a tombstone, list the files in the tombstone directory (this requires a rooted device), then pull the relevant file:

adb shell ls /data/tombstones
adb pull /data/tombstones/tombstone_01 <destination>
Reading a JVM stack trace can be intimidating at first, but by breaking it down into its constituent parts the task becomes fairly easy. Let’s walk through it step by step, with the following RuntimeException that has been thrown in an example application:
2019-08-27 16:10:28.303 10773-10773/com.bugsnag.android.example E/AndroidRuntime: FATAL EXCEPTION: main
Process: com.bugsnag.android.example, PID: 10773
java.lang.RuntimeException: Fatal Crash
    at com.example.foo.CrashyClass.sendMessage(CrashyClass.java:10)
    at com.example.foo.CrashyClass.crash(CrashyClass.java:6)
    at com.bugsnag.android.example.ExampleActivity.crashUnhandled(ExampleActivity.kt:55)
    at com.bugsnag.android.example.ExampleActivity$onCreate$1.invoke(ExampleActivity.kt:33)
    at com.bugsnag.android.example.ExampleActivity$onCreate$1.invoke(ExampleActivity.kt:14)
    at com.bugsnag.android.example.ExampleActivity$sam$android_view_View_OnClickListener$0.onClick(ExampleActivity.kt)
    at android.view.View.performClick(View.java:5637)
    at android.view.View$PerformClick.run(View.java:22429)
    at android.os.Handler.handleCallback(Handler.java:751)
    at android.os.Handler.dispatchMessage(Handler.java:95)
    at android.os.Looper.loop(Looper.java:154)
    at android.app.ActivityThread.main(ActivityThread.java:6119)
    at java.lang.reflect.Method.invoke(Native Method)
    at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:886)
    at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:776)
The first place to start is towards the top of our crash log. Here we can see the process ID that the system assigned the executing app, along with the package name, which can be useful when correlating against other information obtained via logcat. In our example app, the package name is “com.bugsnag.android.example”:
Process: com.bugsnag.android.example, PID: 10773
The next useful piece of information is the exception class. In Java/Kotlin, all exceptions and errors are classes which extend Throwable or one of Throwable’s subclasses, and each exception class can have a different semantic meaning. For example, a developer may wish to throw an IllegalStateException if the program entered an unexpected state, or an IllegalArgumentException if a user attempted to save null as their name. In our case, we’ve thrown a RuntimeException, whose fully qualified class name is displayed below:
java.lang.RuntimeException: Fatal Crash
The error message is also printed to the crash log, which can be very useful for providing additional debug information. In our case, we have just supplied the text “Fatal Crash”, but we could equally pass the values of our variables at the time of the crash if we wanted further information for debugging.
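As a hypothetical example, embedding the values of relevant variables in the message costs little and pays off when reading the crash log later:

```java
public class MessageDemo {
    // Hypothetical business rule: include the offending values in the message
    static void withdraw(long balance, long amount) {
        if (amount > balance) {
            throw new IllegalStateException(String.format(
                    "Withdrawal failed: amount=%d exceeds balance=%d", amount, balance));
        }
    }
}
```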
The next thing in our crash log is the juicy part of the information: a stack trace of the thread where the exception occurred. Looking at the top stackframe will usually allow us to find exactly where the error was thrown from, and the frames below it will allow us to observe what the program state was at the time of the crash. The top stackframe is the following:

at com.example.foo.CrashyClass.sendMessage(CrashyClass.java:10)
We can immediately see that this contains some very useful information. We are given the class, “com.example.foo.CrashyClass”, so we can open the source file and hunt for a bug there. We’re also given the method name, “sendMessage”, and the line number of the crash, 10, so we can pinpoint exactly in our source where the exception was thrown from.
Understanding the crash log from this point onwards is a case of reading the stack frames below. Frames are listed most-recent-first, so each method was invoked by the one in the frame below it. Consider the example below:
at com.example.foo.CrashyClass.sendMessage(CrashyClass.java:10)
at com.example.foo.CrashyClass.crash(CrashyClass.java:6)
at com.bugsnag.android.example.ExampleActivity.crashUnhandled(ExampleActivity.kt:55)
Starting at the top, we can tell that “sendMessage” was invoked by “crash”, which was in turn invoked by a method named “crashUnhandled”, which appears to be the origin of our crash. Of course, the full exception stack trace was somewhat more complex, and also involved method calls in the Android framework, but the basic principle remains the same.
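This ordering can be verified programmatically: Throwable.getStackTrace() returns frames most-recent-first, as the small sketch below (illustrative names) demonstrates. The trace is captured where the Throwable is constructed, so the innermost method occupies the top frame:

```java
public class FrameDemo {
    // The stack trace is captured where the Throwable is constructed,
    // so crash() appears in the top frame
    static RuntimeException crash() {
        return new RuntimeException("Fatal Crash");
    }

    static RuntimeException sendMessage() {
        return crash();
    }

    public static void main(String[] args) {
        RuntimeException e = sendMessage();
        // Frames print most-recent-first: crash, then sendMessage, then main
        for (StackTraceElement frame : e.getStackTrace()) {
            System.out.println("at " + frame);
        }
    }
}
```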
An important note is that in production, apps are often obfuscated by tools such as ProGuard, which can mean the original symbols are not present and the stack trace becomes unintelligible. Fortunately, most crash reporting services provide plugins that automatically upload a mapping file containing the information necessary to deobfuscate crash reports from your production apps.
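If your build uses ProGuard or R8, one commonly used step (check it against your own build configuration) is to keep the source-file and line-number attributes, so that deobfuscated stack traces still point at usable line numbers:

```
# Keep file names and line numbers so stack traces can be mapped back to source
-keepattributes SourceFile,LineNumberTable

# Optionally hide the original source file name in reported traces
-renamesourcefileattribute SourceFile
```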
ANR traces contain a very large amount of information. As an ANR can potentially be caused by multiple apps contending for a limited number of resources, the full crash log contains stack traces for multiple different processes. This full information can be very useful for debugging when all else fails, but most of the time the summary ANR information printed into Logcat is sufficient to debug the error. Consider the following:
2019-08-27 16:12:57.301 1717-1733/system_process E/ActivityManager: ANR in com.bugsnag.android.example (com.bugsnag.android.example/.ExampleActivity)
PID: 10859
Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 33. Wait queue head age: 6178.1ms.)
Load: 0.32 / 0.62 / 0.37
CPU usage from 323086ms to -1ms ago (2019-08-27 13:16:43.467 to 2019-08-27 16:12:54.131):
  5.7% 1717/system_server: 2.6% user + 3% kernel / faults: 21251 minor
  4.7% 10392/com.bugsnag.android.example: 1.3% user + 3.3% kernel / faults: 587 minor
  3.9% 2375/com.google.android.gms: 2.9% user + 0.9% kernel / faults: 71377 minor 51 major
  3.1% 16/ksoftirqd/2: 0% user + 3.1% kernel
  2.5% 2254/com.google.android.googlequicksearchbox:search: 1.1% user + 1.4% kernel / faults: 10193 minor
  1.2% 10427/kworker/u8:0: 0% user + 1.2% kernel
  0.9% 8990/kworker/u8:2: 0% user + 0.9% kernel
  0.8% 1342/surfaceflinger: 0.1% user + 0.7% kernel / faults: 35 minor
  0.5% 1344/adbd: 0% user + 0.4% kernel / faults: 8471 minor
  0.4% 1896/com.google.android.gms.persistent: 0.3% user + 0% kernel / faults: 1106 minor
  0.4% 1288/logd: 0.1% user + 0.3% kernel / faults: 43 minor
  0.3% 1806/com.android.systemui: 0.2% user + 0% kernel / faults: 404 minor
  0.2% 1916/com.android.phone: 0.1% user + 0% kernel / faults: 203 minor
  0.2% 1410/audioserver: 0% user + 0.1% kernel / faults: 119 minor
  0.1% 10429/kworker/u8:3: 0% user + 0.1% kernel
  0.1% 10378/com.google.android.apps.photos: 0.1% user + 0% kernel / faults: 426 minor
  0.1% 8/rcu_preempt: 0% user + 0.1% kernel
  0% 1396/jbd2/dm-0-8: 0% user + 0% kernel
  0% 2179/com.google.android.apps.nexuslauncher: 0% user + 0% kernel / faults: 802 minor 1 major
  0% 1409/zygote: 0% user + 0% kernel / faults: 857 minor
  0% 3951/com.android.defcontainer: 0% user + 0% kernel / faults: 265 minor
  0% 10137/kworker/u9:0: 0% user + 0% kernel
  0% 1987/wpa_supplicant: 0% user + 0% kernel
  0% 10205/com.google.android.apps.docs: 0% user + 0% kernel / faults: 50 minor
  0% 1378/dmcrypt_write: 0% user + 0% kernel
  0% 2111/com.google.process.gapps: 0% user + 0% kernel / faults: 356 minor
  0% 3882/com.android.printspooler: 0% user + 0% kernel / faults: 241 minor
  0% 8829/kworker/u9:2: 0% user + 0% kernel
  0% 9808/kworker/u9:4: 0% user + 0% kernel
  0% 19/migration/3: 0% user + 0% kernel
  0% 1420/rild: 0% user + 0% kernel
  0% 10138/kworker/u9:1: 0% user + 0% kernel
  0% 1339/lmkd: 0% user + 0% kernel
  0% 1419/netd: 0% user + 0% kernel / faults: 59 minor
  0% 1793/com.android.inputmethod.latin: 0% user + 0% kernel / faults: 12 minor
  0% 10146/com.android.gallery3d: 0% user + 0% kernel / faults: 95 minor
  0% 10181/android.process.acore: 0% user + 0% kernel / faults: 52 minor
  0% 1281/kworker/0:1H: 0% user + 0% kernel
  0% 10162/kworker/2:1: 0% user + 0% kernel
  0% 10348/com.google.android.partnersetup: 0% user + 0% kernel / faults: 92 minor
  0% 20/ksoftirqd/3: 0% user + 0% kernel
  0% 10308/android.process.media: 0% user + 0% kernel / faults: 16 minor
  0% 1336/healthd: 0% user + 0% kernel
  0% 1354/logcat: 0% user + 0% kernel
  0% 1709/hostapd: 0% user + 0% kernel
  0% 3/ksoftirqd/0: 0% user + 0% kernel
  0% 1341/servicemanager: 0% user + 0% kernel
  0% 2091/com.google.android.ext.services: 0% user + 0% kernel / faults: 10 minor
  0% 10475/com.google.android.apps.photos:CameraShortcut: 0% user + 0% kernel / faults: 29 minor
  0% 4/kworker/0:0: 0% user + 0% kernel
  0% 12/ksoftirqd/1: 0% user + 0% kernel
  0% 1422/fingerprintd: 0% user + 0% kernel
  0% 1591/dhcpclient: 0% user + 0% kernel
  0% 1706/ipv6proxy: 0% user + 0% kernel
  0% 1913/sdcard: 0% user + 0% kernel
  0% 2137/com.google.android.googlequicksearchbox:interactor: 0% user + 0% kernel / faults: 3 minor
  0% 687/kworker/1:1: 0% user + 0% kernel
  0% 1297/vold: 0% user + 0% kernel / faults: 10 minor
  0% 1413/installd: 0% user + 0% kernel / faults: 35 minor
  0% 1//init: 0% user + 0% kernel
  0% 11/migration/1: 0% user + 0% kernel
  0% 466
First off, the crash log contains information about which process on the system suffered an ANR, and gives the process ID and package name, which comes in useful when finding the appropriate stack traces in the more detailed ANR trace:
E/ActivityManager: ANR in com.bugsnag.android.example (com.bugsnag.android.example/.ExampleActivity)
PID: 10859
The Android framework gives us a reason for the ANR. In this case, the user touched the screen several times, and the dispatch queue waited for over 6 seconds without showing a visible response to these touch events. This represents a very bad user experience that would be noticeable as the whole app would appear to freeze from a user’s perspective:
Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 33. Wait queue head age: 6178.1ms.)
Finally, we’re given some CPU load information. While some ANRs have simple causes such as performing IO on the main thread, this is not always the case. Sometimes an ANR can occur on a low-end device due to a lot of resource-hungry apps competing for CPU, so determining whether there are other apps using lots of resources at the same time as our application can be very helpful:
CPU usage from 323086ms to -1ms ago (2019-08-27 13:16:43.467 to 2019-08-27 16:12:54.131):
  5.7% 1717/system_server: 2.6% user + 3% kernel / faults: 21251 minor
  4.7% 10392/com.bugsnag.android.example: 1.3% user + 3.3% kernel / faults: 587 minor
  3.9% 2375/com.google.android.gms: 2.9% user + 0.9% kernel / faults: 71377 minor 51 major
  3.1% 16/ksoftirqd/2: 0% user + 3.1% kernel
  2.5% 2254/com.google.android.googlequicksearchbox:search: 1.1% user + 1.4% kernel / faults: 10193 minor
Like ANR traces, tombstones contain far too much information to walk through entirely. We’ll consider a truncated example, which shows the most important information near the top:
ABI: 'x86'
pid: 15300, tid: 15300, name: android.example >>> com.bugsnag.android.example <<<
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0xb19aa650
    eax b19aa650  ebx b19abfd8  ecx 00000005  edx b32a1230
    esi 99365d7e  edi bf9c2338
    xcs 00000073  xds 0000007b  xes 0000007b  xfs 0000003b  xss 0000007b
    eip b19aa688  ebp bf9c2148  esp bf9c2140  flags 00010296

backtrace:
    00 pc 00000688  /data/app/com.bugsnag.android.example-2/lib/x86/libentrypoint.so (crash_write_read_only+40)
    01 pc 000006ca  /data/app/com.bugsnag.android.example-2/lib/x86/libentrypoint.so (Java_com_bugsnag_android_example_ExampleActivity_doCrash+42)
    02 pc 003e9d3c  /data/app/com.bugsnag.android.example-2/oat/x86/base.odex (offset 0x399000)

stack:
    bf9c2100 b32dc140 [anon:libc_malloc]
    bf9c2104 b3159f5c /system/lib/libart.so
    bf9c2108 b315abb3 /system/lib/libart.so
    bf9c210c b315ab82 /system/lib/libart.so
    bf9c2110 b315ab69 /system/lib/libart.so
    bf9c2114 b716ced4 /dev/ashmem/dalvik-LinearAlloc (deleted)
    bf9c2118 b328b400 [anon:libc_malloc]
    bf9c211c b312b721 /system/lib/libart.so (_ZN3art14JniMethodStartEPNS_6ThreadE+17)
    bf9c2120 00430000
    bf9c2124 00590000
    bf9c2128 b328b400 [anon:libc_malloc]
    bf9c212c b32a1230 [anon:libc_malloc]
    bf9c2130 b32b00c0 [anon:libc_malloc]
    bf9c2134 b328b400 [anon:libc_malloc]
    bf9c2138 00000043
    bf9c213c b19aa66e /data/app/com.bugsnag.android.example-2/lib/x86/libentrypoint.so (crash_write_read_only+14)
 00 bf9c2140 bf9c2200
    bf9c2144 b19aa650 /data/app/com.bugsnag.android.example-2/lib/x86/libentrypoint.so
    bf9c2148 bf9c2178
    bf9c214c b19aa6cb /data/app/com.bugsnag.android.example-2/lib/x86/libentrypoint.so (Java_com_bugsnag_android_example_ExampleActivity_doCrash+43)
 01 bf9c2150 00430000
    bf9c2154 00000013
    bf9c2158 05980a40
    bf9c215c bf9c219c
    bf9c2160 b32a1230 [anon:libc_malloc]
    bf9c2164 0000000c
    bf9c2168 bf9c21cc
    bf9c216c b2bc803f /system/lib/libart.so (art_jni_dlsym_lookup_stub+15)
    bf9c2170 b328b400 [anon:libc_malloc]
    bf9c2174 0000000c
    bf9c2178 bf9c21cc
    bf9c217c 994aed3d /data/app/com.bugsnag.android.example-2/oat/x86/base.odex
 02 bf9c2180 b32a1230 [anon:libc_malloc]
    bf9c2184 bf9c219c
    bf9c2188 05980a40
    bf9c218c 00000001
    bf9c2190 b716ced4 /dev/ashmem/dalvik-LinearAlloc (deleted)
    bf9c2194 bf9c2c24
    bf9c2198 00000001
    bf9c219c 12c7b450 /dev/ashmem/dalvik-main space (deleted)
    bf9c21a0 00000006
    bf9c21a4 b31fbb74 /system/lib/libart.so
    bf9c21a8 bf9c2238
    bf9c21ac b2f3a0a4 /system/lib/libart.so (_ZNK3art7OatFile8OatClass19GetOatMethodOffsetsEj+100)
    bf9c21b0 bf9c21cc
    bf9c21b4 99365d7e /data/app/com.bugsnag.android.example-2/oat/x86/base.odex
    bf9c21b8 bf9c2338
    bf9c21bc b2bc9263 /system/lib/libart.so (art_quick_invoke_stub+339)
Breaking it down, we’re given essential crash log information towards the top, such as the package name and the process ID. Crucially, because this is a native error that could be affected by the CPU architecture, the tombstone also contains the ABI of the device, which in this case is x86 as the crash was triggered on an emulator:
ABI: 'x86'
pid: 15300, tid: 15300, name: android.example >>> com.bugsnag.android.example <<<
We’re given information about the native error, which in this case was due to a SIGSEGV signal being raised. This includes the address where the fault was triggered, along with the values of the CPU registers at the time of the crash:
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0xb19aa650
    eax b19aa650  ebx b19abfd8  ecx 00000005  edx b32a1230
    esi 99365d7e  edi bf9c2338
    xcs 00000073  xds 0000007b  xes 0000007b  xfs 0000003b  xss 0000007b
    eip b19aa688  ebp bf9c2148  esp bf9c2140  flags 00010296
The rest of the trace contains the symbols that were being executed at the time of the crash. These can be symbolicated using the debug information in your application’s shared object files, something BugSnag does automatically via a Gradle plugin integration.
While getting crash logs from your Android device manually can be very useful in certain situations, it’s possible to automate collection of crash logs by installing a crash reporting SDK such as BugSnag’s Android SDK in your application. Crash reporting SDKs automatically detect JVM crashes, NDK crashes, and ANRs, and deliver a diagnostic report about each error to a web dashboard.
There are several advantages to collecting crash logs automatically. You might not always have access to a device, particularly if a crash occurred on an end user’s device, or if a 3rd-party Quality Assurance firm doesn’t have the necessary tools installed to obtain crash logs manually from a test device. Crash reporting SDKs automatically report errors from your production application, so you can gain immediate insight into how many users are affected by a bug, along with the diagnostic information needed to fix it.
A great advantage of crash reporting SDKs is that it’s possible to attach custom metadata to an error report. On-device crash logs are limited to the information the Android framework can collect, but there is no such limitation when it comes to 3rd-party tools. For example, on Android BugSnag will automatically capture breadcrumbs of lifecycle events and common system broadcasts, which can help track down those tricky bugs that are related to unexpected state in a lifecycle event.
Crash reporting services also offer powerful search and segmentation. Perhaps a bug is only occurring on certain OS versions, or a critical error with In-App-Purchases is affecting the conversion rate of your paid customers. By adding custom metadata to error reports, it’s possible to segment and search for the important bugs, so that you can prioritise what delivers value for your business, rather than wasting time manually collating crash logs from dozens of devices yourself.