Jamie Lynch

The big rewrite: creating a consistent API across all our SDKs

Here at Bugsnag we’ve recently released major version updates for our Android, Cocoa, JS, and React Native SDKs. Behind the scenes this involved a rewrite of several codebases to create a consistent API for users. This post dives into what we changed, how we did it, and why we went to the effort of the big rewrite.

What do Bugsnag’s SDKs do?

Bugsnag automatically captures errors and reports them to your development team, allowing you to track the stability of your application in real time. To achieve this Bugsnag offers SDKs for dozens of different languages which can be seamlessly integrated into your application.

Why was a rewrite required?

Although the existing SDKs were functioning well in production, several factors led us to consider a rewrite. One of the main pain points was that new functionality had been added to each SDK organically over the years, leading to subtle differences in naming conventions.

The perfect example of how divergent the API could be is one of the simplest methods: the preferred method for initializing Bugsnag. What should have been a simple and standardized method used three different names across our Android, Cocoa, and JavaScript SDKs:

-- CODE language-kotlin --
Bugsnag.init(this, "your-api-key") // Android
-- CODE language-objectivec --
[Bugsnag startWithApiKey:@"your-api-key"]; // iOS
-- CODE language-js --
bugsnag({ apiKey: "your-api-key" }) // JavaScript

As you can imagine, having different terminology and subtle differences in behaviour across the entire API surface caused quite a headache when discussing the SDKs within the engineering team, particularly when designing new features. It also created significant overhead for our customer support teams and, most importantly, for users who integrate Bugsnag into several applications written in different languages.

Additional advantages of a codebase rewrite

There were several other advantages to a rewrite. Firstly, technical debt was a non-trivial issue in some codebases, and code was structured in a way that was hard to test in isolation. Rewriting would allow us to make changes to our API surface and also increase the testability of our code.

Secondly, the existing SDKs had a problem with mutability. In several places it was possible to change Bugsnag’s configuration after Bugsnag had been initialized. For example, if a user wished to configure a custom URL to send Bugsnag error reports to, they would have to specify this after initialization:

-- CODE language-kotlin --
Bugsnag.init(this, "your-api-key")
// alter endpoint which bugsnag sends data to
Bugsnag.setEndpoint("http://internal-bugsnag.example.com")

This was problematic as the URL could be altered at any time in the application’s lifecycle, adding complexity to the implementation. Perhaps worse, any errors captured between initialization and setting the endpoint would be sent to the wrong endpoint - which is almost certainly not what the user wanted!
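The timing problem can be illustrated with a small sketch. This is not Bugsnag's code: `MutableNotifier`, its `sent` list, and the default URL are hypothetical stand-ins, assumed only to show how a report captured before the endpoint is changed ends up in the wrong place.

```javascript
// Hypothetical stand-in for a notifier with a mutable endpoint; not Bugsnag's real code.
class MutableNotifier {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.endpoint = "https://default.example.com"; // assumed default for this sketch
    this.sent = []; // records where each report would have been delivered
  }

  // The endpoint can be changed at any point after initialization.
  setEndpoint(url) {
    this.endpoint = url;
  }

  // Each report goes to whatever endpoint is set at the moment it is captured.
  notify(message) {
    this.sent.push({ message, endpoint: this.endpoint });
  }
}

const notifier = new MutableNotifier("your-api-key");
notifier.notify("crash during startup"); // captured before the endpoint is set
notifier.setEndpoint("http://internal-bugsnag.example.com");
notifier.notify("crash after startup");

// The first report was addressed to the default endpoint, not the internal one.
console.log(notifier.sent.map((report) => report.endpoint));
```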

To fix this, we redesigned the Configuration class so that any alterations to Bugsnag’s default behaviour must be made before initialization. Any subsequent changes are ignored, resolving the issue:

-- CODE language-kotlin --
val config = Configuration("your-api-key")
// alter endpoint which bugsnag sends data to
config.endpoint = "http://internal-bugsnag.example.com"
Bugsnag.start(this, config)
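One way to get the "changes after start are ignored" behaviour is for the client to take a frozen copy of the configuration when it starts. The sketch below is written in JavaScript under that assumption. The `Configuration` and `Client` classes and the default URL are illustrative, and this is not necessarily how the SDKs implement it.

```javascript
// Hypothetical sketch of "configure first, then start"; names are illustrative.
class Configuration {
  constructor(apiKey) {
    this.apiKey = apiKey;
    this.endpoint = "https://default.example.com"; // assumed default for this sketch
  }
}

class Client {
  constructor(config) {
    // Copy and freeze the settings so later edits to `config` have no effect here.
    this.config = Object.freeze({ ...config });
  }
}

const config = new Configuration("your-api-key");
config.endpoint = "http://internal-bugsnag.example.com";
const client = new Client(config);

// A late change only touches the original object, not the client's snapshot.
config.endpoint = "http://somewhere-else.example.com";
console.log(client.config.endpoint);
```

Because the client only ever reads its own snapshot, every report is sent using the settings that were in place at start.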

Starting the rewrite

The catalyst for a rewrite was planned improvements to our React Native SDK. What started out as a small project quickly got bogged down in the product design phase when it became apparent that the Android, iOS, and JavaScript SDKs were highly divergent in functionality. As React Native uses all these languages, we quickly realized that the planned improvements would not have been possible without rationalizing our API surface, so we embarked on the big rewrite.

Gathering our requirements in a technical specification

Bugsnag’s SDKs had evolved organically over the years, and many had originally been written by the founders. So step one of our process was to extract all the requirements from their heads and to write them down in a technical specification.

Naming things is famously one of the hardest problems in computer science. It was harder still here because we needed to avoid reserved keywords in the dozens of languages Bugsnag supports, whilst still conveying each API’s meaning concisely. On top of that, the team was distributed across two time zones, which left a limited window of overlap each day for design discussions.

As everyone had their own opinion on what APIs should be named, it took around 3 months to solidify a technical specification - longer than we had expected. In retrospect this was a critical step and it feels unlikely that the project would have been a success if we hadn’t taken the time up front to very clearly define our requirements before writing code.

Rewriting the Bugsnag SDKs

Once the specification had matured we started rewriting the public interface of our JavaScript and Android SDKs, followed by Cocoa. Due to the size of the changes this inevitably raised a few more questions along the way, which led to some rework during development.

We decided to split the project into three main sections, roughly aligned with the three main APIs Bugsnag exposes: Configuration, which controls the default behaviour of the SDK; Client, which captures user-submitted and unhandled errors; and Event, which allows users to manipulate error payloads before they are sent to Bugsnag.
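As a rough picture of how these three areas fit together, here is a toy sketch. It is not the SDK's implementation. `createClient`, the `onError` callback list, the `delivered` array, and the convention that a callback returning `false` discards the event are all assumptions made for illustration.

```javascript
// Illustrative only: a toy client showing the three roles described above.
function createClient(configuration) {
  const delivered = []; // stands in for the network in this sketch

  return {
    delivered,
    // Client: captures an error reported by the application.
    notify(error) {
      // Event: the payload built from the error, which callbacks may edit.
      const event = { errorMessage: error.message, metadata: {} };

      // Configuration: callbacks registered up front decide what is sent.
      for (const callback of configuration.onError) {
        if (callback(event) === false) return; // a callback can discard the event
      }
      delivered.push(event);
    },
  };
}

const toyClient = createClient({
  apiKey: "your-api-key",
  onError: [
    (event) => {
      event.metadata.team = "platforms"; // attach extra diagnostic data
    },
    (event) => !event.errorMessage.includes("ignore me"),
  ],
});

toyClient.notify(new Error("something broke"));
toyClient.notify(new Error("please ignore me"));
console.log(toyClient.delivered);
```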

As mentioned earlier, one of the key drivers of the project was to reduce unnecessary mutability in Bugsnag’s SDK, so fixing the design problems of Bugsnag’s Configuration class was the first port of call. Firstly we added any new methods in the specification that weren’t implemented in the SDK, adding unit and end-to-end tests for each change. This increased test coverage gave us more confidence that future changes would not introduce unintended behaviour.

The second step was to decide whether any remaining methods should be retained in the SDK, or removed if they were obsolete. Some methods were only implemented on one platform, but were deemed useful enough to add to other SDKs - for example, Android’s sendThreads configuration option was implemented on every SDK where the language supported threading. Again, the discussions around this resulted in some additional unanticipated work towards the end of the project.

Testing the rewritten Bugsnag SDKs

Testing took a bit longer than expected, but was well worth the effort! Along with adding unit tests for most PRs, we added end-to-end tests using a custom black-box test framework. This gave us a lot of confidence that the SDKs were behaving as expected, as it allowed us to run crashy code in a test fixture and assert that the Bugsnag SDK sent appropriate error information to a mock server.
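The shape of such a scenario can be sketched in a few lines. This is a simplified, in-process stand-in rather than our actual test framework. `createMockServer`, `createNotifier`, `crashyFixture`, and the payload fields are hypothetical names, and the "server" here is just a function that records what it receives.

```javascript
// Sketch of a black-box scenario; the notifier and "server" are stand-ins.
function createMockServer() {
  const requests = [];
  // Record every payload the notifier delivers, parsed as a real server would.
  return { requests, receive: (payload) => requests.push(JSON.parse(payload)) };
}

function createNotifier(apiKey, deliver) {
  return {
    // Run app code and report anything it throws, as a crash handler would.
    run(appCode) {
      try {
        appCode();
      } catch (error) {
        deliver(JSON.stringify({
          apiKey,
          errorClass: error.name,
          message: error.message,
          unhandled: true,
        }));
      }
    },
  };
}

// Test fixture: deliberately crashy code.
function crashyFixture() {
  throw new RangeError("fixture crash");
}

const mockServer = createMockServer();
createNotifier("your-api-key", mockServer.receive).run(crashyFixture);

// Assert on what the server received, not on the notifier's internals.
const [request] = mockServer.requests;
console.log(request.errorClass, request.message, request.unhandled);
```

Asserting only on the received payload keeps the test independent of how the notifier is structured internally, so a scenario like this can survive a rewrite of the code under test.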

Adding new end-to-end test cases gave us peace of mind that core functionality was automatically tested against each change, and that we had covered each major requirement written in the specification. However, it did take a considerable amount of time to write new scenarios which we had not fully accounted for at the start of the project.

In addition to automated testing, we ensured that multiple engineers on the team manually tested each SDK for ease of use. Where possible we also dogfooded our SDKs, by using our JavaScript SDK to monitor errors on app.bugsnag.com. Both these activities provided invaluable feedback which allowed us to fix some bugs and sharp corners before considering a general release.

Completing the rewrite

Approximately 9 months from when we had started, we finally shipped new versions of our Android, JS, and Cocoa SDKs, all of which used a consistent API. The product improvements we had originally planned for React Native shipped a couple of months after this.

What went well

We shipped it!

It’s quite easy for a ‘big rewrite project’ to accumulate more and more scope as time goes on, to the point where the final product is never released. While the project’s scope definitely increased in the early days, the addition of new functionality never became a major blocker preventing a release.

But did we meet the objectives we set when originally starting the project?

Our first objective of rationalizing the API surface was certainly met. The vast majority of the APIs on our Android, JavaScript, and Cocoa SDKs are now derived from a written specification. This has made communication much easier when discussing design problems that affect all of our SDKs - a common task on the platforms team. A lot of the pain that we had felt around duplication of effort in design and documentation has been much reduced.

Anecdotally there have also been fewer support requests from customers who are using the latest versions of our SDKs, compared to customers using the previous offering. Removing unnecessary mutability within our SDKs has effectively eliminated a whole class of support issues, leaving us more time to develop new features rather than fighting technical debt.

An additional benefit is that knowledge has become less siloed. If anyone wants or needs to learn how a piece of SDK functionality works, then they can read the technical specification to get an overview of how something should be built. This evergreen document seems to have been helpful already for onboarding new-starters. We also hope that a detailed specification will make it easier to implement SDKs for new languages and platforms than it has been historically.

Overall the rewrite seems to have worked well - we’ve added more tests, rationalized our public API, improved internal communication, and created a solid foundation which we can build upon for years to come. So what could we have done differently?

Things that could have gone better

Collecting our requirements into a written specification took longer than we first intended, taking around 3 months and several iterations to get to a mature stage. This is perhaps not surprising given how critical it was to get this part right, but scheduling more regular meetings earlier on would have helped us progress faster. Agreeing on the structure of the technical specification up-front would also have made it easier to parallelize contributions to the document.

We should also have considered writing end-to-end tests at the very start of the project to verify the core functionality of each SDK. Although we did write tests along the way and at the end of the project, high test coverage from the outset would have been invaluable. It would likely have caught several bugs that were only identified later on, and consequently reduced the amount of effort spent manually testing the SDKs.

Splitting the rewrite into logical sections was also difficult because in practice each component was intertwined. This led to the rewrite taking longer than expected, particularly in the Cocoa SDK which had more technical debt than initially realized. Creating more granular tickets for each piece of work would have helped us more accurately estimate the project’s end date.

As always, hindsight is 20/20.

Would we do it again?

We feel the big rewrite worked well for our SDKs, although that doesn’t necessarily mean it will work for everyone - perhaps we were just lucky that it worked out for us! Crash reporting SDKs are relatively small and well-understood codebases, so proposing a rewrite for an SDK is a bit different than rewriting a 2-million LOC mega-project from the ground up.

———

Thank you for reading about the big rewrite! If you’re currently using Bugsnag on your mobile applications, be sure to upgrade to the latest version of our notifiers to take advantage of all our newest features. You can check out the upgraded Android, JavaScript, Cocoa, and React Native SDKs for yourself at bugsnag.com.
