March 24, 2021

Android System WebView Causes Apps to Crash: How Mobile App Teams Can Avoid a Similar Blunder

On Monday, March 22, Android users around the world suddenly saw notifications on their devices saying that apps had stopped running. Critical apps such as Gmail, Google Pay, and various banking apps failed to open, which caused widespread concern among consumers. Google later revealed that the issue was in the Android System WebView, and many users fixed it by uninstalling the latest WebView update. The problem could be resolved, but only by relying on consumers to uninstall or update a system component themselves. Major crashes that require manual intervention have a lasting impact on end users and on the overall brand reputation.

I would be remiss if I didn’t add that bugs are inevitable, and engineering teams needn’t aim for 100% error-free software. They should, however, have pre-production QA measures in place as a safety net for situations like this. Those measures should be backed by tools that provide comprehensive error diagnostics and actionable insights. This allows engineering organizations to prioritize the bugs that cause the most damaging user experiences. Even giants like Google and Facebook still experience lapses in this process, but it is a critical step in delivering consistent, high-quality software.

That said, an operating system component like the Android System WebView should never crash the applications that depend on it. One of the core tenets of good component design is that a failing component must not take its host app down with it.

Key app stability observations amid the crash 

Before and during the crash, Bugsnag did not drop a single error, thanks to its elastic queuing architecture. We described the same architecture in our blog post on Facebook’s SDK outages in May and July of 2020. As during those outages, our autoscaling capabilities helped our customers absorb the surge this time around too.

Bugsnag monitored errors across the Android ecosystem in real time from the start of the outage on Monday night. That monitoring revealed the following key findings:

  • The Bugsnag platform registered four times its usual daily volume of Android errors, showing significant impact across the Android user base.
  • The WebView bug caused approximately 75% of the crashes in the leading Android projects monitored by Bugsnag. These projects saw around 40 times as many crashes as in the same period the previous week.
  • The worst-affected projects saw 200 times as many crashes as in the same period the previous week.
  • We estimate that approximately 2 million users were impacted across all apps that are monitored by Bugsnag. 
  • Bugsnag also detected a drop of at least 2% in overall stability across Android applications. The worst-affected projects saw their stability scores fall by 10%, meaning roughly 1 in 10 of their Android users were experiencing a crash.
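The stability figures above are easier to interpret with the arithmetic spelled out. The sketch below makes two assumptions. First, it defines a stability score as the percentage of sessions that ended without a crash; the same arithmetic applies to a user-based score. Second, it reads a "10% decrease" as a drop of about 10 percentage points. The session and crash counts are hypothetical, chosen only to illustrate the calculation; they are not Bugsnag data.

```python
def stability_score(total_sessions: int, crashed_sessions: int) -> float:
    """Percentage of sessions that ended without a crash (assumed definition)."""
    if total_sessions <= 0:
        raise ValueError("total_sessions must be positive")
    return 100.0 * (total_sessions - crashed_sessions) / total_sessions


def crash_multiplier(current_crashes: int, baseline_crashes: int) -> float:
    """How many times more crashes a window saw than the baseline window."""
    if baseline_crashes <= 0:
        raise ValueError("baseline_crashes must be positive")
    return current_crashes / baseline_crashes


# Hypothetical counts for one app: the same weekday window, one week apart.
sessions = 100_000
crashes_last_week = 250
crashes_this_week = 10_000  # 1 in 10 sessions crashed

before = stability_score(sessions, crashes_last_week)  # 99.75
after = stability_score(sessions, crashes_this_week)   # 90.0
print(f"stability: {before:.2f}% -> {after:.2f}% ({before - after:.2f} points lower)")
print(f"crashes: {crash_multiplier(crashes_this_week, crashes_last_week):.0f}x last week")
```

A drop of about 10 points from a near-perfect baseline works out to roughly one crash in every ten sessions. The same 10-point drop from a lower baseline would mean an even higher crash rate.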

How to proactively protect your apps from similar outages

In this scenario, an operating system component was at fault, so there was little development teams could have done to prevent their applications from crashing. What they could control was how quickly they saw the problem, and immediate visibility into issues affecting the customer’s experience is critical. Engineering teams using Bugsnag were able to give clear guidance to their support and customer success teams. Those teams could then respond to customers quickly and confidently.

The steps below would not have prevented this incident, since an OS component was at fault. Still, engineering teams can take them proactively to protect their applications from similar problems that affect application stability:

  • Monitor for stability issues in production to gain immediate visibility into crashes and spikes in errors. Configure team notifications and incident management integrations to quickly align the team and deal with business-critical issues.
  • In addition to session-ending crashes, be sure to track application freezes, known as ANRs (Application Not Responding errors), to learn whether specific features are causing them. Use the stack trace to see the line of code that was running when the application froze and triggered the ANR.
  • A/B test new features to understand how they affect application stability before releasing them to everyone. You can also phase rollouts, testing features with a small group of users before releasing them more broadly.
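The first and third practices above can be combined into a simple automated gate. This is a minimal sketch, not Bugsnag's implementation or API. The `WindowStats` type, the 4x spike multiplier, the 50-error minimum, the 99% stability floor, and the rollout steps are all hypothetical values chosen for illustration. The sketch compares the current window's error count with the same window from the previous week, and it only advances a phased rollout while stability stays above the floor.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Session and error counts for one monitoring window (e.g. one hour)."""
    sessions: int
    crashed_sessions: int
    errors: int


def stability(window: WindowStats) -> float:
    """Fraction of sessions in the window that ended without a crash."""
    if window.sessions == 0:
        return 1.0  # no traffic yet, so nothing has crashed
    return 1.0 - window.crashed_sessions / window.sessions


def is_error_spike(current: WindowStats, same_window_last_week: WindowStats,
                   multiplier: float = 4.0, min_errors: int = 50) -> bool:
    """Flag a spike when errors reach `multiplier` times last week's baseline.

    `min_errors` suppresses alerts on tiny absolute counts (e.g. 3 errors vs. 0).
    """
    if current.errors < min_errors:
        return False
    baseline = max(same_window_last_week.errors, 1)  # avoid a zero baseline
    return current.errors >= multiplier * baseline


def next_rollout_percentage(current_pct: int, window: WindowStats,
                            min_stability: float = 0.99,
                            steps: tuple = (1, 5, 20, 50, 100)) -> int:
    """Advance a phased rollout one step, or return 0 to roll it back."""
    if stability(window) < min_stability:
        return 0  # stability is below the floor: turn the feature off for everyone
    for step in steps:
        if step > current_pct:
            return step
    return current_pct  # already fully rolled out


# Hypothetical counts, for illustration only.
last_week = WindowStats(sessions=20_000, crashed_sessions=40, errors=120)
now = WindowStats(sessions=20_000, crashed_sessions=1_600, errors=4_800)

print(is_error_spike(now, last_week))         # True: 4,800 >= 4 x 120
print(next_rollout_percentage(5, last_week))  # 20: stability 0.998 clears 0.99
print(next_rollout_percentage(5, now))        # 0: stability 0.92 triggers a rollback
```

Returning 0 rolls the feature back for everyone rather than merely pausing the rollout. Holding at the current percentage is an equally reasonable design if the new feature is unlikely to be the cause, as it was not in the WebView incident.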

Additionally, this Android WebView crash originated in native code built with the Native Development Kit (NDK). Such crashes can only be detected if your crash reporting tool supports NDK crash detection and that support is enabled. Bugsnag is well suited to situations like this one because, unlike some other systems, it does not require you to opt in to NDK crash monitoring. It is available by default.

Considerations for the community

Since consumers rely heavily on mobile apps to navigate day-to-day life, application stability is absolutely critical, especially in today’s relentlessly competitive environment. The silver lining of outages such as this one is that they draw attention to good software design and process. They show where software engineering teams need to introduce new best practices or fine-tune existing ones.
