Reliable Releases on a Reliable Schedule

By Rick Fillion

Building the next major version of an established application should always take user feedback into consideration. There are many ways of doing this. You can solicit that feedback, amalgamate it, distill it, come up with your next target and start working (approximately what we did with Kaleidoscope). You can be completely open and let users vote on feature sets. Or anything in between. What matters is that you get input from the users.

In the case of NetNewsWire, the Google Reader shutdown forced us to change our approach to its development in a big way. Since the final design was not complete, we decided to take advantage of the situation: instead of two or three bigger releases on the road to 4.0, we would do as many smaller releases as it took to reach the point where we were happy calling it 4.0. This meant pushing out reliable builds on a reliable schedule, roughly every two weeks, which gave us an opportunity to incorporate feedback as quickly as possible.

General Structure

Based on the release schedule of Public Betas, you might have been able to guess that NetNewsWire 4.0's development team is working with two-week sprints. After a sprint is completed, we enter a week of QA where we ensure that no regressions have been introduced. Let's take a look at how this all works.

Sprint Planning

Our sprints begin on a Monday, and on the Thursday or Friday before the sprint begins, the team meets to plan the upcoming sprint. This might seem very last-minute, but the sprint planning session is really just a formality, if you think about it. Everyone already has their pet features/bugs they'd like to see worked on: we already have a general idea of what's going to happen next.

We start by looking at what didn't get done in the current sprint, and determine why that was. Is it no longer applicable? Was it too big a task and is sitting somewhere half-done on a branch? Maybe we just didn't have enough time, and it wasn't high priority. For each of these we have to decide if we want to bring them into the next sprint, send them back to our backlog of tasks, or possibly just close them out as no longer applicable to the scope of work. If bringing them into the next sprint, we re-evaluate the priority of the task and its time estimate.

Once that's completed we can go through our backlog of work (all of those feature requests you've sent us? They live in there) and decide what new work will be added to the sprint. Whenever we create a feature request or file a bug, we assign it a priority and an estimate, so during this backlog triage we simply start at the highest-priority cases and slowly move down until we feel like we have enough work to keep the team busy during the next sprint. The mix of feature work, bug fixing, and code cleanup changes from sprint to sprint.
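As a rough illustration of that backlog walk, here is a minimal Python sketch. The `Case` type, the `plan_sprint` function, and the sample cases are hypothetical names invented for the example, not part of any real tracker, and the "skip what no longer fits" rule is an assumption about how a team might stop filling the sprint.

```python
from dataclasses import dataclass


@dataclass
class Case:
    title: str
    priority: int          # 1 = P1 "Showstopper" ... 4 = P4 "Nice to Have"
    estimate_hours: float  # time estimate assigned when the case was filed


def plan_sprint(backlog, budget_hours):
    """Walk the backlog from highest priority down and fill the sprint.

    Assumption: a case that no longer fits in the remaining budget is
    skipped and stays on the backlog, and the walk continues downward.
    """
    planned, total = [], 0.0
    # sorted() is stable, so cases with equal priority keep backlog order.
    for case in sorted(backlog, key=lambda c: c.priority):
        if total + case.estimate_hours > budget_hours:
            continue
        planned.append(case)
        total += case.estimate_hours
    return planned, total


if __name__ == "__main__":
    backlog = [
        Case("Sync engine rewrite", 2, 40),
        Case("Crash on tab close", 1, 8),
        Case("New icon", 4, 6),
        Case("Sidebar polish", 3, 16),
    ]
    planned, total = plan_sprint(backlog, budget_hours=60)
    for case in planned:
        print(f"P{case.priority} {case.title} ({case.estimate_hours}h)")
    print("Total hours:", total)
```

In practice the stopping point is a judgment call made in the planning meeting rather than a hard cutoff; the budget here only stands in for "enough work to keep the team busy".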

Priority & Estimates

Priority and time estimates are very important to our process. We use P1-P4 for priorities, where P1 is "Showstopper" and P4 is "Nice to Have". Technically P5 and P6 exist, but they're known internally as the place where pet ideas go to retire, since no one cares enough to push them further. Every case we bring into the sprint needs a priority that we all agree on and an estimate that sounds reasonable. We don't use estimates as a way to measure performance, just to get an idea of how much work we can accomplish in two weeks. We've learned that despite there being roughly 240 hours of developer time available, it's usually pretty safe to add between 250 and 280 hours' worth of work to the sprint (i.e. our estimates tend to be slightly pessimistic).
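To put numbers on that overbooking, the short Python sketch below computes how far the estimated load exceeds the available time. The 240, 250, and 280 hour figures come from the paragraph above; the constant and function names are made up for the example.

```python
CAPACITY_HOURS = 240          # developer time available in one sprint
SAFE_LOAD_HOURS = (250, 280)  # estimated work that usually still fits


def load_ratio(estimated_hours, capacity_hours=CAPACITY_HOURS):
    """Return estimated work as a multiple of the available time."""
    return estimated_hours / capacity_hours


if __name__ == "__main__":
    low, high = (load_ratio(hours) for hours in SAFE_LOAD_HOURS)
    # Prints: 1.04x to 1.17x of capacity
    print(f"{low:.2f}x to {high:.2f}x of capacity")
```

In other words, a sprint can be loaded with roughly 4% to 17% more estimated work than there are hours, because the estimates run a little long.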

Development Week One

We start a sprint by branching our code for the release that includes the previous sprint's work. We do this so that development can continue at our regular pace on our main 'develop' branch. QA can now start officially testing this new release branch. QA is looking for bugs, naturally, but what we're specifically concerned with at this stage is regressions (functionality that worked in a previous release and has since stopped working). Any bug that we've introduced during the last sprint, or any critical bug in a new feature, gets marked as a regression. Regressions get treated specially: they automatically become P1 cases and immediately get put into the current sprint.
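The regression rule is simple enough to express in a few lines. The Python sketch below is only an illustration: the `Bug` type, its field names, and `triage_qa_bug` are hypothetical, and sending non-regressions to the backlog is an assumption about what happens to everything else QA finds.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    title: str
    priority: int                         # 1 = P1 ... 4 = P4
    introduced_last_sprint: bool = False
    critical_in_new_feature: bool = False
    is_regression: bool = False


def triage_qa_bug(bug, current_sprint, backlog):
    """Apply the regression rule to a bug found on the release branch."""
    if bug.introduced_last_sprint or bug.critical_in_new_feature:
        bug.is_regression = True
        bug.priority = 1             # regressions automatically become P1
        current_sprint.append(bug)   # and go straight into the current sprint
    else:
        backlog.append(bug)          # assumption: other bugs wait for planning
    return bug


if __name__ == "__main__":
    sprint, backlog = [], []
    triage_qa_bug(Bug("Tab close crashes", 3, introduced_last_sprint=True),
                  sprint, backlog)
    triage_qa_bug(Bug("Typo in preferences", 4), sprint, backlog)
    print([b.title for b in sprint], [b.title for b in backlog])
```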

The order of what gets tackled by the development team is defined by case priority. During the first week we try to tackle the larger cases (which tend to be P1s and P2s), and address all of the regressions. Regressions being marked P1 isn't just symbolic; P1 is Showstopper, and that's exactly what they are. We do not want to ship with regressions we know about.

Regressions are always fixed on our 'develop' branch first, where we can confirm that the fix is valid, and we then need to get this fix onto the release branch. Thanks to the wonders of modern version control systems, that usually means doing a simple git cherry-pick. But depending on how much our branches have diverged, we might need to reimplement the fix in a completely different way for the release branch. This is rare, but has certainly happened more than a couple of times.
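For readers who have not backported a fix this way, the following Python sketch shows the two git commands involved. The function names, the commit id, and the `release/pb9` branch name are placeholders invented for the example; only `git checkout` and `git cherry-pick -x` are real git commands.

```python
import subprocess


def backport_commands(fix_commit, release_branch):
    """Build the git commands that copy one fix onto a release branch."""
    return [
        ["git", "checkout", release_branch],
        # -x appends "(cherry picked from commit ...)" to the commit message,
        # which records where the fix originally came from.
        ["git", "cherry-pick", "-x", fix_commit],
    ]


def backport(fix_commit, release_branch, repo_dir="."):
    """Run the commands in repo_dir.

    check=True raises CalledProcessError if git exits non-zero, for example
    when the cherry-pick hits a conflict. That is the case where the fix
    may have to be reimplemented by hand on the release branch.
    """
    for argv in backport_commands(fix_commit, release_branch):
        subprocess.run(argv, cwd=repo_dir, check=True)


if __name__ == "__main__":
    # Placeholder values; nothing is executed against a repository here.
    for argv in backport_commands("abc1234", "release/pb9"):
        print(" ".join(argv))
```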

Midway through the week we do an internal release build, something of a Release Candidate, and distribute it within the company to start getting real-world use. We have some cool stuff to help with this, but that's another post.

Development Week Two

By the second development week QA has been completed, regressions fixed for the previous sprint's build, and we're ready for the release. If everyone agrees, we push the build to the world, and you have an official Public Beta. This means that what you're seeing when we publish the update is roughly one week behind current development.
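The one-week lag falls out of the calendar. The Python sketch below lays out the key dates for a single sprint under the schedule described so far; the function name, the dictionary keys, and the example date are all illustrative, and the example Monday is arbitrary rather than an actual NetNewsWire sprint date.

```python
from datetime import date, timedelta

SPRINT_LENGTH = timedelta(weeks=2)
QA_LENGTH = timedelta(weeks=1)


def sprint_timeline(sprint_start):
    """Return the key dates for the sprint that starts on sprint_start."""
    if sprint_start.weekday() != 0:
        raise ValueError("sprints begin on a Monday")
    next_sprint_start = sprint_start + SPRINT_LENGTH
    return {
        # Planning happens on the Thursday or Friday before the sprint.
        "planning_from": sprint_start - timedelta(days=4),
        "sprint_start": sprint_start,
        # The release branch for this sprint's work is cut as the next
        # sprint begins, and QA tests it during that sprint's first week.
        "release_branch_cut": next_sprint_start,
        # The build goes public once that week of QA is done.
        "public_beta": next_sprint_start + QA_LENGTH,
    }


if __name__ == "__main__":
    for name, day in sprint_timeline(date(2024, 1, 1)).items():
        print(f"{name}: {day.isoformat()} ({day.strftime('%A')})")
```

Because every sprint follows the same pattern, consecutive public betas land two weeks apart, each one week after the work in it was finished.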

The branch QA has been testing for the last week is essentially done, so they switch over to testing straight off of our 'develop' branch and start spotting issues before we even start the release process for the sprint.

Typically by this point most P1s are completed and we get to start focusing on lower-priority bugs, which is where the polish lives. By stacking the more disruptive work in the previous week, we get more eyes on it and find bugs before QA gets their hands on the code, so it's common for us to be filing bugs against the previous week's work here.

Any work that is sizable gets done on feature branches. Testing might even happen on that branch before we merge it back into 'develop', so as to avoid slowing down the other developers and to try to keep the 'develop' branch stable-ish. If the end of week two approaches and we still have unmerged feature branches, we have to start making calls about which will get merged in before we finish the sprint. Merging a branch means committing to resolving every issue we can identify with the work before it hits the public. We'll make those calls, and will often let some features sit on their branches until after the next sprint has started, then merge them in. There will always be another release, and that feature will ship eventually.

Crash Triage

In theory QA catches everything. Reality isn't quite as pretty though, and you'll never ship a build without crashers. We use Hockey to manage our crash reports. Twice per week, a developer will look at every new crash group that has been created, and determine what needs to be done with this information. They have to decide if this is our problem to fix, and when it should get fixed. The two options available are:

  1. File the case and put it onto the backlog to get addressed during the next sprint planning meeting, or
  2. If the crash is severe enough, it might be worth jumping the queue and injecting this case straight into the current sprint to ensure we address it by the time the next Public Beta is released.
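The decision above can be sketched as a small Python function. Everything in it is hypothetical: the `CrashGroup` type is not Hockey's data model, and using a report count with a threshold of 50 to define "severe enough" is an assumption made purely for illustration, since the real call is a developer's judgment.

```python
from dataclasses import dataclass


@dataclass
class CrashGroup:
    signature: str     # e.g. the top frame shared by the grouped reports
    report_count: int  # how many reports landed in this group
    is_our_bug: bool   # the developer's call: is this ours to fix?


def triage_crash_group(group, severe_threshold=50):
    """Return where a new crash group should go.

    severe_threshold is an invented stand-in for "severe enough".
    """
    if not group.is_our_bug:
        return "not ours"
    if group.report_count >= severe_threshold:
        return "current sprint"  # jump the queue
    return "backlog"             # handled at the next sprint planning


if __name__ == "__main__":
    print(triage_crash_group(CrashGroup("tab close", 120, True)))
    print(triage_crash_group(CrashGroup("rare launch crash", 2, True)))
```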

It's really important that crash triage be done by the same person all the time. Looking at individual crash logs can be deceiving and will lead you to dismiss issues as one-offs. Patterns will slowly arise, and with the same person seeing them all there's a higher likelihood of them getting caught.

This All Works Great, In Theory.

For the most part, this is how we operate. If you look at the release dates of our public betas, you'll notice that the vast majority are every second Monday. However, rules are meant to be broken. These are guidelines that we try to adhere to (and we try really. really. hard.), but sometimes reality has a different plan for us. I think it's useful and interesting to look back at those times and how we handled them.

Where'd Public Beta 5 go?

There's a four week gap between Public Beta 4 and Public Beta 6, and no Public Beta 5. We jokingly say "There is no missing Beta 5" in the release notes for PB6. Beta 5 did exist, it just never got to the public. During development of Public Beta 5, we switched out our QA and brought in someone new. There's ramp-up time for anyone on a team, and we had concerns about our ability to properly QA the PB5 build in time for release. We looked at different options and decided that instead of just extending QA time for the build which would have shifted the development schedule, we would proceed as originally planned but would omit the final step of publishing to the public. This included internal release candidate builds and a final build, all of which were available as official builds inside the company. This gave the new team member time to learn about the app and our process, and the development team could continue as usual.

Public Beta 8.1: Whoops.

Public Beta 8 shipped normally. A few hours later the first crash log came in. A crash in WebCore. We see this a lot, since we don't have out-of-process WebView rendering yet. We didn't think much of it. Then more came in. Way more. By the time we did Crash Triage the next day, it was obvious that this was one of the more unstable builds. We had modified some of the code relating to tabs, but the stack traces did not make it obvious that this was our problem. Luckily some of our users managed to find steps to reproduce the crash: if you closed a tab that contained certain web sites, it would lead to a crash. We just happened to never hit these sites during our testing process. With steps to reproduce clearly identifying this as a regression, it was obvious that this bug should go straight into the current sprint. Within a day we had a fix for it, and knew that by Public Beta 9 this issue would be gone. The crash logs kept coming in though, as did support inquiries. We decided to make a special 8.1 build that was exactly like PB8, but with this one additional fix in place. This saved us the support time of answering all of those users emailing in about the crashes.

Mondays

The plan is to release on Mondays, but sometimes we feel like we're simply cutting it too close. A regression fix may have required more work than expected, either to do the original fix or to get it over to the release branch. It's not abnormal for us to still be doing those fixes on the Friday before release. Doing them on the Monday is rare, but happens. In some cases we have to make a call whether to ship with a known regression or, if we feel strongly enough that it needs to be fixed, to push back the release a day to get the fix in. Public Beta 8 was one of these, and though we had the regression fixed by mid-day Monday, we felt it safer to keep testing for another day. That fix was unrelated to the bug that necessitated 8.1.

How has this worked?

This schedule has worked quite well for us. It allows us to get builds into users' hands on a frequent basis, which has given us the feedback we wanted in order to finalize the feature set for NetNewsWire 4.0. The stability of NetNewsWire has almost always increased between builds, despite the fact that we've been adding new features and rebuilding core parts of the app. Every app and team will have different circumstances. NetNewsWire is the first app we've done this with on the product side of Black Pixel, and this has worked well enough that we plan on migrating the development scheduling of our other products over to something similar.