How to Conduct A/B Tests in Mobile Apps: Part I

  • A developer thinks they know how to improve the app without testing
  • The developer believes they can simply compare “before” and “after”
  • A/B tests are time-consuming and expensive
  1. Statistical significance
  • They are revealing. A radical change has a strongly positive or negative effect, so its impact is easier to assess. Even a negative outcome tells you which direction to move in, whereas trivial tests create the illusion that an optimum has been found. Say the $5 option lost to the $4 option: it is tempting to conclude that testing even higher prices is pointless, since they will surely lose too. In our experience, it doesn’t work that way.
  • They save money. A radical test takes more iterations, but each one is cheaper to run and reaches the desired significance with fewer conversions: the larger the effect, the less data you need to draw a confident conclusion.
  • Lower error probability. The closer the tested variations are to each other, the more likely an observed difference is due to chance rather than a real effect.
  • A chance for a pleasant surprise. Once at AppQuantum we tested an unreasonably high offer price of $25. Our entire team and our developer partner’s team were convinced it was too expensive and nobody would buy the offer at that price; competitors’ similar offers cost at most $15. Yet this variation won. Pleasant surprises happen!
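The point about reaching significance with fewer conversions can be made concrete with the standard sample-size formula for comparing two conversion rates. The sketch below is an illustration, not the article’s own tooling: `sample_size_per_variant` is a hypothetical helper using the normal approximation for a two-proportion test.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(p_control: float, p_variant: float,
                            alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate users needed per variant to detect the difference
    between two conversion rates (two-sided z-test, normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # significance threshold
    z_beta = NormalDist().inv_cdf(power)            # desired power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    delta = p_variant - p_control
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# A radical change (2% -> 3% conversion) needs far fewer users
# than a subtle one (2% -> 2.2%), which is why radical tests
# reach a confident conclusion on less data.
radical = sample_size_per_variant(0.02, 0.03)
subtle = sample_size_per_variant(0.02, 0.022)
```

Here the radical variant needs a few thousand users per arm, while the subtle one needs tens of thousands, an order of magnitude more traffic for the same statistical confidence.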
  • Narrative and quality of localisation
  • User interface
  • User experience design
  • Tutorial and onboarding
  • We know that these changes only work effectively together.
  • We are sure no change will give a negative result.
  • It is easier to simultaneously test several elements that are inexpensive to design.
  • Defining the biggest potential bonus and the biggest risk of the feature;
  • Asking why this superfeature could succeed and why it could fail;
  • Estimating whether the bonus is worth the possible risks at all;
  • Determining the minimum implementation needed to capture the bonus;
  • Formalising how the bonus and risk will be assessed in the test;
  • As a result, comparing the variation not only with the control group but also with alternative ones.
  1. By demography. The audience is commonly split by country or by gender + age. This factor determines which traffic sources to use for the campaign;
  2. By payers. If there is enough data, we make several segments;
  3. By new and returning users. If possible, though, it is worth testing only new users;
  4. By platform and traffic source.
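Once the audience is segmented, each user still has to be assigned to a variant in a stable way, so the same person never flips between A and B. A common approach is deterministic bucketing by hashing the user ID together with the experiment name; the sketch below is a minimal illustration, with `assign_variant` and the experiment names being hypothetical.

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants: tuple = ("A", "B")) -> str:
    """Deterministically assign a user to a variant.

    Hashing experiment + user_id means the same user always sees the
    same variant within one experiment, while different experiments
    shuffle users independently of each other.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# Stable within an experiment, independent across experiments:
variant = assign_variant("user_42", "paywall_price")
```

Segmentation by country, platform, or payer status then happens before assignment: you filter the audience first and bucket only the users who belong to the segment under test.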
  1. Embed analytics and tracking in the app.
  2. Understand how much one user costs you to acquire and whether that acquisition can scale.
  3. Have resources for constant hypothesis testing.
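The second prerequisite boils down to simple unit economics: a user is worth acquiring, and traffic is worth scaling, only if their lifetime value comfortably exceeds the cost per install. The check below is a hypothetical sketch; the names `ltv`, `cpi`, and the 1.2 safety margin are illustrative assumptions, not figures from the article.

```python
def can_scale(ltv: float, cpi: float, margin: float = 1.2) -> bool:
    """Scaling traffic only makes sense when a user brings in
    comfortably more (here, a 20% margin) than they cost to acquire."""
    return ltv >= cpi * margin

profitable = can_scale(ltv=2.40, cpi=1.50)    # True: room to buy more traffic
unprofitable = can_scale(ltv=1.10, cpi=1.50)  # False: acquisition loses money
```

With this check in place, an A/B test that lifts LTV directly widens the set of traffic sources you can profitably scale.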
