Measure and improve performance with Macrobenchmark

Introduction to Jetpack Macrobenchmark and Baseline Profiles

Tomáš Mlynarič
Android Developers

Are you thinking about optimizing your app’s performance, but don’t know where to start? Or have you already optimized it and wonder whether there’s room for improvement?

You could benchmark your app!

In this article, we’ll take a look at how the Jetpack Macrobenchmark library helps you understand your app’s performance and how you can improve your app’s startup time by up to 30% using Baseline Profiles!

What is Jetpack Macrobenchmark

Jetpack Macrobenchmark is a library for measuring (and benchmarking) the performance of your app. Macrobenchmarks are suited for end-to-end use cases, such as app startup, cross-Activity navigation, scrolling lists, or other UI manipulation. The library provides results directly in Android Studio and also writes them to a JSON file. This makes it suitable both for measuring locally on your workstation and for performance testing in continuous integration (CI).

With Jetpack Macrobenchmark you can:

  • measure the app multiple times with deterministic launch patterns and scroll speeds
  • control the compilation state of your app — a major factor in performance stability
  • check real-world performance by locally reproducing the install-time optimizations performed by the Google Play Store

Instrumentations using this library don’t call your application code directly, but instead navigate your application as a user would. If you want to measure parts of your application code directly, see Jetpack Microbenchmark instead.

Macrobenchmark tests run in a separate process to allow restarting or pre-compiling of your app. This means in-process mechanisms like Espresso won’t work; instead, you can use UiAutomator to interact with the target application.

Enough of the theory, let’s get started.

Add benchmark to your project

In this article, we’ll add macrobenchmarks to the Sunflower sample app.

Macrobenchmarks require adding a new Gradle module to your project. The simplest way to start is with the Android Studio template (requires at least Arctic Fox 2020.3.1).

  1. Right-click your project or module in the Project panel.
  2. Select New > Module.
  3. Select Benchmark from the Templates pane.
  4. Select Macrobenchmark as the Benchmark module type and fill in the details.
  5. Set the Minimum SDK you want to benchmark on; at least Android 6 (API level 23) is required.

The wizard does several things for you:

  • Creates a com.android.test module for the macrobenchmarks.
  • Adds a benchmark buildType that sets debuggable to false and signingConfig to debug.
  • Adds the <profileable> tag to the AndroidManifest.
  • Creates a basic startup benchmark scaffold.

The signingConfig is set to debug only so that you can build without your production keystore. debuggable needs to be disabled because it adds a lot of performance overhead and makes the result timings unstable. However, with debuggable disabled, the <profileable> tag is needed to allow benchmarks to profile your app with release performance. To get more information about what <profileable> does, check our documentation.
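In Gradle Kotlin DSL, such a build type might look roughly like the sketch below. It shows only the two settings described above; the file path is an assumption, and the file the template actually generates can differ between Android Studio versions.

```kotlin
// app/build.gradle.kts (assumed location; sketch of the settings described above)
android {
    buildTypes {
        create("benchmark") {
            // Debuggable builds add overhead and make timings unstable.
            isDebuggable = false
            // Reuse the debug keystore so no production keystore is required.
            signingConfig = signingConfigs.getByName("debug")
        }
    }
}
```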

Now we’re good to get started writing the actual benchmarks.

Measure app startup

App startup time, or the time it takes for users to begin using your app, is a key metric for user engagement. To measure app startup time with macrobenchmarks, write a @Test as follows (if you created the module with the template, Android Studio already created it for you):
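A minimal sketch of such a startup benchmark is shown here. The class name is hypothetical, and the package name is assumed to be the application id of the Sunflower sample; adjust both to your project.

```kotlin
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class StartupBenchmark { // hypothetical class name

    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun startup() = benchmarkRule.measureRepeated(
        // Assumed application id of the Sunflower sample.
        packageName = "com.google.samples.apps.sunflower",
        metrics = listOf(StartupTimingMetric()),
        iterations = 5,
        startupMode = StartupMode.COLD
    ) {
        // Everything in this block is traced and measured.
        pressHome()
        startActivityAndWait()
    }
}
```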

Let’s break down what all this means.
Macrobenchmarks are regular instrumented tests, so they use the JUnit syntax (@RunWith, @Rule, @Test, etc.). When writing a benchmark, your entry point is the measureRepeated function of the MacrobenchmarkRule, where you need to specify at least these parameters:

  • packageName – Because benchmarks run in a separate process, you need to specify which app to measure.
  • metrics – The main type of information captured. In our case, we care about startup timing.
  • iterations – How many times the loop will repeat. More iterations mean more stable results, but at the cost of longer execution time.
  • measureBlock (the last lambda parameter) – Macrobenchmark will trace and record the defined metrics during this block. You perform the actions you want to measure here.

Optionally, you can also specify CompilationMode and StartupMode.

The CompilationMode defines how the application is pre-compiled into machine code and has the following options:

  • None() – Doesn’t pre-compile the app, but JIT is still enabled during execution of the app.
  • Partial() – Pre-compiles the app with Baseline Profiles and/or warm up runs.
  • Full() – Pre-compiles the whole app. It’s the only option on Android 6 (API 23) and lower.

The StartupMode allows you to define how your application should be started at the beginning of your benchmark. The available options are COLD, WARM, and HOT. You can start with StartupMode.COLD, which represents the largest amount of work your app has to do.

Let’s run it — the same way as you’d run any unit test — with the gutter icon next to the test.

You should benchmark on real devices and not on Android emulators. If you attempt to run the benchmarks on an emulator, it will fail at runtime with a warning that it’s likely to give incorrect results. While technically you can run it on an emulator (if you suppress the warning), you’re basically measuring your host machine performance — if it’s under heavy load, your benchmarks will appear slower and vice versa.

During execution, the benchmark will start and stop your app several times (based on iterations) and afterwards it will output results to Android Studio:

The results show how long it took your app to start, in milliseconds (timeToInitialDisplayMs). Each result is also a link to a system trace that you can open in Android Studio to investigate the startup further and decide how to optimize it.

A common case is to measure that the app has fully loaded the content and the user can interact with it, also called time to full display. To tell the system when that happens, you must call Activity.reportFullyDrawn(). If you do, the startup benchmark will automatically capture timeToFullDisplayMs. Be aware that you need to wait for the content in your benchmarks, otherwise the benchmark finishes with the first rendered frame and can skip the metric.
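On the app side, the call might be placed as in the following sketch. The activity and its two helper functions are hypothetical placeholders for real loading and binding code; only Activity.reportFullyDrawn() is the platform API.

```kotlin
import android.app.Activity
import android.os.Bundle

// Hypothetical activity; loadPlants and showPlants stand in for real app code.
class GardenActivity : Activity() {

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        loadPlants { plants ->
            showPlants(plants)
            // The content is ready for interaction, so report it to the system.
            reportFullyDrawn()
        }
    }

    // Placeholder for an asynchronous data load.
    private fun loadPlants(onLoaded: (List<String>) -> Unit) {
        onLoaded(listOf("Apple", "Tomato"))
    }

    // Placeholder for binding the loaded data to the UI.
    private fun showPlants(plants: List<String>) {
        title = "${plants.size} plants"
    }
}
```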

For example, the following snippet waits until the garden_list has some children:
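A sketch of such a wait with UiAutomator follows. The helper name and the 5-second timeout are assumptions, and the child selector simply matches any view that belongs to the app's package.

```kotlin
import androidx.benchmark.macro.MacrobenchmarkScope
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Until

// Hypothetical helper, meant to be called from the measure block.
fun MacrobenchmarkScope.startAndWaitForGardenList() {
    pressHome()
    startActivityAndWait()

    // Matches the garden_list view once it has at least one child from the app.
    val listWithChildren = By.res(packageName, "garden_list")
        .hasChild(By.pkg(packageName))

    // Blocks until the selector matches or the (assumed) timeout expires.
    device.wait(Until.hasObject(listWithChildren), 5_000)
}
```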

Okay, we’ve measured the app startup time! Can we do more? You can measure frames and investigate jank!

Measure frames timing and detect jank

After your users land in your app, the second metric they encounter is how smooth the app is. Or in our terms, how fast the app can produce frames. To measure it, we’ll use FrameTimingMetric.

To achieve the mentioned flow with Macrobenchmark, you’d write a benchmark as follows:
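A sketch of a frame timing benchmark, assuming the flow is opening the plant list after the app has started. The class name, the goToPlantListTab helper, the "Plant list" label and the plant_list resource id are all illustrative assumptions rather than names taken from the sample.

```kotlin
import androidx.benchmark.macro.FrameTimingMetric
import androidx.benchmark.macro.MacrobenchmarkScope
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Until
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class PlantListBenchmark { // hypothetical class name

    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    @Test
    fun openPlantList() = benchmarkRule.measureRepeated(
        // Assumed application id of the Sunflower sample.
        packageName = "com.google.samples.apps.sunflower",
        metrics = listOf(FrameTimingMetric()),
        iterations = 5,
        setupBlock = {
            // Not measured: bring the app to its start screen first.
            pressHome()
            startActivityAndWait()
        }
    ) {
        // Measured: only the navigation to the plant list.
        goToPlantListTab()
    }
}

// Hypothetical helper; the tab label and the list id are assumptions.
private fun MacrobenchmarkScope.goToPlantListTab() {
    device.wait(Until.findObject(By.text("Plant list")), 5_000)?.click()
    device.wait(Until.hasObject(By.res(packageName, "plant_list")), 5_000)
    device.waitForIdle()
}
```

Keeping the app launch in setupBlock means the frames produced during startup do not end up in the measured frame timings.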

Let’s run it the same way as the startup benchmark and get the following results:

This metric outputs the duration of frames in milliseconds (frameDurationCpuMs) at the 50th, 90th, 95th, and 99th percentiles. On Android 12 (API level 31) and higher, it also returns how far your frames went over their deadline (frameOverrunMs). A negative value means the frame finished early and shows how much time was left to produce it.

We’ve measured it, so what?

Macrobenchmarks generate system traces for each iteration to let you investigate further what’s happening during execution. You can open the trace file directly in Android Studio by clicking on the iteration from the results (or min/median/max in case of app startup).

As we’ve seen in the benchmark, some frames are skipped when opening the plant detail. We can start investigating the problem with the trace file.

The system trace shows various sections captured by platform code and by libraries that are part of your app. Oftentimes it won’t have enough information. To improve that, add custom trace sections with the AndroidX Tracing library using trace("MySection") { /* this will be in the trace */ }. For more information about reading traces and adding custom events, visit Overview of system tracing.
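For instance, a custom section could wrap a piece of app code as in this sketch. The function and the "ParsePlants" label are hypothetical; trace comes from the Kotlin extensions of the AndroidX Tracing library.

```kotlin
import androidx.tracing.trace

// Hypothetical function; "ParsePlants" is an arbitrary section label.
fun parsePlants(lines: List<String>): List<String> =
    trace("ParsePlants") {
        // Work done inside this block is recorded as a "ParsePlants" section.
        lines.map { it.trim() }.filter { it.isNotEmpty() }
    }
```

Because trace returns the value of its block, it can wrap existing code without changing how callers use the result.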

Now, onto improving the performance.

Improve performance with Baseline Profiles

Baseline Profiles are a list of classes and methods included in your APK that are pre-compiled into machine code during the installation of your app. This can make your app startup faster, reduce jank, and improve general performance. This is because the JIT compiler doesn’t need to be triggered when encountering the specified part of the code.

You can add custom items into the Baseline Profile by adding them into the baseline-prof.txt file in your src/main directory. However, if you don’t want to write hundreds or thousands of important methods to the file yourself, you can simplify the process and generate the Baseline Profile with Macrobenchmark! Check out Baseline Profiles for more information about how these work.

Usually, you’d generate profiles for the typical user journeys of your app.
In our example, we could identify these three journeys:

  1. Start the application (this will be critical for most applications)
  2. Go to plant list (from the previous example)
  3. Go to plant detail

For generating the profile, you’ll use BaselineProfileRule (not the MacrobenchmarkRule as previously) and call collectBaselineProfile(packageName). The following snippet shows how to generate profiles for the mentioned journeys:
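A sketch of such a generator, covering the three journeys listed above. The class name, both navigation helpers, the "Plant list" label and the plant_list resource id are assumptions, and depending on the library version the API may be marked experimental (requiring an opt-in annotation) or exposed under a different method name.

```kotlin
import androidx.benchmark.macro.MacrobenchmarkScope
import androidx.benchmark.macro.junit4.BaselineProfileRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Until
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

// Note: some library versions require an experimental opt-in annotation here.
@RunWith(AndroidJUnit4::class)
class BaselineProfileGenerator { // hypothetical class name

    @get:Rule
    val baselineProfileRule = BaselineProfileRule()

    @Test
    fun startPlantListPlantDetail() =
        baselineProfileRule.collectBaselineProfile(
            // Assumed application id of the Sunflower sample.
            packageName = "com.google.samples.apps.sunflower"
        ) {
            // 1. Start the application.
            pressHome()
            startActivityAndWait()
            // 2. Go to plant list.
            goToPlantListTab()
            // 3. Go to plant detail.
            goToPlantDetail()
        }
}

// Hypothetical helper; the tab label is an assumption.
private fun MacrobenchmarkScope.goToPlantListTab() {
    device.wait(Until.findObject(By.text("Plant list")), 5_000)?.click()
    device.waitForIdle()
}

// Hypothetical helper; opens the first item of the assumed plant_list view.
private fun MacrobenchmarkScope.goToPlantDetail() {
    val list = device.wait(
        Until.findObject(By.res(packageName, "plant_list")), 5_000
    )
    list?.children?.firstOrNull()?.click()
    device.waitForIdle()
}
```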

To run this, you need a userdebug or rooted emulator (without Google Play Store) running Android 9 (API level 28) or higher. Before running the test, restart adb with root permission by calling adb root from the terminal. Now you can run the test to generate the profile file.

Once you run the test, you need to do several things to make the Baseline Profile work with your app:

  1. You need to place the generated Baseline Profiles file into your src/main folder (next to AndroidManifest.xml). To retrieve the file, you can copy it from the connected_android_test_additional_output folder, which is located in project_root/macrobenchmark/build/outputs/ as shown in the following screenshot.

Alternatively, you can click on the Baseline Profile results link in the Android Studio output and save the content, or use the adb pull command printed in the output.

  2. You need to rename the file to baseline-prof.txt.

  3. You need to add the profileinstaller dependency to your app.
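In Gradle Kotlin DSL, the dependency declaration might look like this sketch. The file path is an assumption, and <version> is a placeholder to replace with a current profileinstaller release.

```kotlin
// app/build.gradle.kts (assumed location)
dependencies {
    // Replace <version> with a current release of the library.
    implementation("androidx.profileinstaller:profileinstaller:<version>")
}
```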

To verify the Baseline Profile was loaded correctly, you can run the benchmarks we defined earlier, but pass CompilationMode.Partial() as the compilationMode parameter. By default, this mode requires Baseline Profiles to be available during the installation of the app.
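One way to compare both states is a pair of startup benchmarks that differ only in their compilation mode, as in this sketch. The class and method names are illustrative, and the package name and timeout are assumptions.

```kotlin
import androidx.benchmark.macro.CompilationMode
import androidx.benchmark.macro.StartupMode
import androidx.benchmark.macro.StartupTimingMetric
import androidx.benchmark.macro.junit4.MacrobenchmarkRule
import androidx.test.ext.junit.runners.AndroidJUnit4
import androidx.test.uiautomator.By
import androidx.test.uiautomator.Until
import org.junit.Rule
import org.junit.Test
import org.junit.runner.RunWith

@RunWith(AndroidJUnit4::class)
class StartupCompilationBenchmark { // hypothetical class name

    @get:Rule
    val benchmarkRule = MacrobenchmarkRule()

    // No pre-compilation; JIT is still active while the app runs.
    @Test
    fun startupCompilationNone() = startup(CompilationMode.None())

    // Pre-compiled with the Baseline Profile shipped in the app.
    @Test
    fun startupCompilationPartial() = startup(CompilationMode.Partial())

    private fun startup(compilationMode: CompilationMode) =
        benchmarkRule.measureRepeated(
            // Assumed application id of the Sunflower sample.
            packageName = "com.google.samples.apps.sunflower",
            metrics = listOf(StartupTimingMetric()),
            compilationMode = compilationMode,
            startupMode = StartupMode.COLD,
            iterations = 5
        ) {
            pressHome()
            startActivityAndWait()
            // Wait for the content so that time to full display is captured.
            device.wait(
                Until.hasObject(By.res(packageName, "garden_list")), 5_000
            )
        }
}
```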

In the following screenshot we can see results of the two benchmarks — startupCompilationPartial that uses the Baseline Profiles and startupCompilationNone that does not.

From the results we can see that the timeToFullDisplayMs median of the non-compiled app is 293.9ms and for the app using Baseline Profiles it’s 239.9ms. That’s 54ms, or about 18%, less startup time; put the other way around, the non-compiled app takes ~22% longer to start! And that’s for a sample app that uses Views, which are pre-compiled for the whole system. If your app uses Jetpack Compose, where UI code is contained in your APK, the performance gain will be even bigger!
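The relative difference between the two medians depends on which run is taken as the baseline. The following plain Kotlin sketch computes both readings; the function names are hypothetical.

```kotlin
import java.util.Locale

// Share of the baseline duration that was saved, in percent.
fun timeReductionPercent(baselineMs: Double, improvedMs: Double): Double =
    (baselineMs - improvedMs) / baselineMs * 100

// How much longer the baseline takes relative to the improved run, in percent.
fun extraTimePercent(baselineMs: Double, improvedMs: Double): Double =
    (baselineMs - improvedMs) / improvedMs * 100

fun main() {
    val none = 293.9    // median without Baseline Profiles, in ms
    val partial = 239.9 // median with Baseline Profiles, in ms

    println("%.1f".format(Locale.US, timeReductionPercent(none, partial))) // 18.4
    println("%.1f".format(Locale.US, extraTimePercent(none, partial)))     // 22.5
}
```

When reporting such numbers, it helps to state which of the two readings is meant.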

What next?

This article just scratched the surface of what’s possible with macrobenchmarks.

It’s nice to run benchmarks locally to find if there’s an issue, but it’s even better to track performance over time on your CI. For now, you can check the documentation where we show how to benchmark in CI, and check the setup for Firebase Test Lab in our samples.

If you want to dig deeper, we recently revamped our documentation, so check it out. If there’s anything you’re missing or you have an idea what you would like to measure, let us know in our tracker! We appreciate any feedback.

And also, let us know what performance benefit you have achieved in your app using Baseline Profiles!
