Chef is a popular configuration management tool that many companies (including giants like Bloomberg, Etsy, Yahoo, Facebook, and Intuit) use to automate the management of their infrastructure and reduce operational overhead. With it, fewer engineers can run a larger fleet of servers, and they get the added benefit of codifying their desired infrastructure state.
Chef is written in Ruby and exposes a very nice API for hooking into its internal events. As such, it’s possible to treat Chef as just another application, monitoring it at run time to catch any errors it throws. Using Bugsnag, I was able to set up Chef monitoring with just 12 lines of code that send exceptions from a failed chef-client run to Bugsnag, increasing visibility into when configuration is not applied to servers properly.
We have set up all our servers to report any errors that a chef-client run encounters to a Chef project in Bugsnag. For example, if we configure a server to fetch a file from S3 and the file turns out not to be there during the run, Chef will fail and execute our registered Bugsnag report handler, making the error appear in our dashboard. This gives us much better visibility into whether our infrastructure configuration is being applied correctly.
The following screenshot demonstrates how easy it is to hook up Chef to Bugsnag:
Let’s break that down:
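In outline, a hook like this has two moving parts: a callback registered for the failed-run event, and a notifier call inside that callback. The sketch below models that shape in plain Ruby so it runs without Chef or Bugsnag installed; it is not the code from the screenshot. `TinyDispatcher` and `RecordingNotifier` are hypothetical stand-ins. In a real cookbook, the corresponding pieces would presumably be Chef's `Chef.event_handler` block with `on :run_failed` and Bugsnag's `Bugsnag.notify`.

```ruby
# Hypothetical stand-in for Chef's event dispatcher: it stores callbacks per
# event name and invokes them when that event is published.
class TinyDispatcher
  def initialize
    @callbacks = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a block to run whenever `event` is published.
  def on(event, &block)
    @callbacks[event] << block
  end

  # Invoke every callback registered for `event` with the given arguments.
  def publish(event, *args)
    @callbacks[event].each { |callback| callback.call(*args) }
  end
end

# Hypothetical stand-in for an error-reporting client: it only records what
# it was asked to report instead of sending anything over the network.
class RecordingNotifier
  attr_reader :reports

  def initialize
    @reports = []
  end

  def notify(exception)
    @reports << { error_class: exception.class.name, message: exception.message }
  end
end

dispatcher = TinyDispatcher.new
notifier = RecordingNotifier.new

# The hook itself: forward the exception from a failed run to the notifier.
dispatcher.on(:run_failed) do |exception|
  notifier.notify(exception)
end

# Simulate a run that fails because a remote file is missing.
begin
  raise IOError, 'remote file not found in S3 bucket'
rescue IOError => e
  dispatcher.publish(:run_failed, e)
end

puts notifier.reports.inspect
```

Registering the callback before anything else runs matters for the same reason in the real setup: a failure that happens before the hook is registered has nothing to report it.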
That’s it! All you have to do is put this as early in your Chef run as possible (we put it in a “base” cookbook which we include everywhere), and any exceptions raised during a chef-client run will automatically appear in the dashboard:
In our case, we have only configured Chef to send general errors raised by a failed client run to Bugsnag. However, Chef exposes many different types of events, so you can report Chef errors at whatever level of granularity you choose. The exposed events are not limited to errors, either; they include other interesting and potentially valuable events, such as:
These are just some examples of the types of events you can register callbacks for (and potentially report to Bugsnag). For a full list, see https://docs.chef.io/handlers.html#event-types
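To illustrate what choosing a granularity could look like, here is a small plain-Ruby sketch that registers one coarse callback (per failed run) and one fine-grained callback (per failed resource). `EventRegistry` and `build_registry` are hypothetical names, not part of Chef. The event names `run_failed` and `resource_failed`, and the `(resource, action, exception)` argument order, are my assumption of what Chef passes and should be checked against the event list linked above.

```ruby
# Hypothetical registry mapping an event name to a list of callbacks.
class EventRegistry
  def initialize
    @handlers = Hash.new { |hash, key| hash[key] = [] }
  end

  # Register a callback for `event`; returns self so calls can be chained.
  def on(event, &block)
    @handlers[event] << block
    self
  end

  # Call every callback registered for `event`; unknown events are ignored.
  def publish(event, *args)
    @handlers[event].each { |handler| handler.call(*args) }
  end
end

# Builds a registry whose callbacks append report hashes to `sink`.
# In a real setup the callbacks would call an error reporter instead.
def build_registry(sink)
  registry = EventRegistry.new

  # Coarse granularity: one report per failed run.
  registry.on(:run_failed) do |exception|
    sink << { event: :run_failed, message: exception.message }
  end

  # Fine granularity: one report per failed resource, with extra context.
  registry.on(:resource_failed) do |resource, action, exception|
    sink << { event: :resource_failed, resource: resource, action: action,
              message: exception.message }
  end

  registry
end

reports = []
registry = build_registry(reports)
registry.publish(:resource_failed, 'remote_file[/tmp/app.tar.gz]', :create,
                 RuntimeError.new('404 Not Found'))
registry.publish(:run_failed, RuntimeError.new('chef-client run failed'))
reports.each { |report| puts report.inspect }
```

The finer-grained callback carries the resource and action along with the exception, which is the kind of context that makes a report easier to triage than a bare run failure.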
As you can see, it’s possible to treat Chef as just another application, with monitoring in place to catch any errors it throws at run time. Since Bugsnag is built specifically for production error monitoring, Chef fits this use case well, making Chef and Bugsnag a perfect match. At only 12 lines of code, this is also much more effective and straightforward than building an in-house solution for monitoring infrastructure configuration.