Garmin Benchmark Whitepaper

WristBench Methodology and Scoring Model

WristBench measures real Connect IQ application performance across integer logic, VM memory behavior, persistent storage I/O, and 2D rendering throughput. It deliberately omits floating-point Pi-calculation tests because that kind of workload is less representative of everyday watch apps.

  • Logic & Bitwise: 1000 ints x 256 passes
  • Memory & GC: strings, slices, 5000 dictionaries
  • Storage I/O: 500 local storage keys
  • 2D Render: 5-second frame throughput

1. Methodology

The benchmark is designed to measure what Garmin apps can actually use, not an isolated silicon peak number. Every workload runs inside Monkey C and the Connect IQ runtime, and every non-UI phase is timed with System.getTimer().

CPU-heavy phases are sliced into short steps to avoid tripping the Connect IQ watchdog. The timer still spans the whole phase from first slice to last, so slicing keeps the app responsive while any overhead between slices remains part of the measured elapsed time.
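The slicing idea can be sketched as follows. This is an illustration in Python rather than the benchmark's Monkey C source: `run_sliced_phase`, `steps_per_slice`, and `yield_fn` are hypothetical names, and `time.monotonic()` stands in for System.getTimer(). The point it shows is that the clock starts before the first slice and stops after the last one, so time spent between slices is counted.

```python
import time


def run_sliced_phase(total_steps, step_fn, steps_per_slice=50, yield_fn=None):
    """Run step_fn for total_steps steps in short slices and time the whole phase.

    All names and the default slice size are assumptions for illustration.
    """
    start_ms = time.monotonic() * 1000.0  # stands in for System.getTimer()
    done = 0
    slices = 0
    while done < total_steps:
        # One slice: a bounded amount of work, kept short for the watchdog.
        end = min(done + steps_per_slice, total_steps)
        for i in range(done, end):
            step_fn(i)
        done = end
        slices += 1
        if yield_fn is not None:
            yield_fn()  # hand control back to the runtime between slices
    # The timer covers every slice plus the gaps between them.
    elapsed_ms = time.monotonic() * 1000.0 - start_ms
    return elapsed_ms, slices
```

On a watch, the hand-off between slices would presumably be driven by a timer callback rather than a loop; the loop here only keeps the sketch self-contained.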

2. Test Modules

  • Logic & Bitwise: 1000 deterministic pseudo-random integers are processed in 256 passes of bitmask, bit-shift, XOR, OR, and accumulation operations, followed by a QuickSort. This stresses integer execution, array access, VM loop dispatch, and recursive sorting.
  • Memory & GC: dynamic string construction, 2400 substring slices, and 5000 short-lived Dictionary objects. This stresses allocation, string copying, object lifetime, and garbage collection behavior.
  • Storage I/O: 500 Application.Storage keys are prepared, written, read, and deleted. This measures the storage interface visible to a Connect IQ app, not theoretical raw flash throughput.
  • Graphics Render: for five seconds, each frame issues 500 circle, polygon, and text drawing operations. The result is the total number of completed frames, which reflects the drawing API, display pipeline, antialiasing support, and screen resolution.
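To make the shape of the Logic & Bitwise workload concrete, here is a rough Python sketch. It is not the benchmark's code: the generator constants, the seed, the exact mix of operations in `bitwise_passes`, and all function names are assumptions chosen only to illustrate "deterministic inputs, repeated bitwise passes, then a recursive sort".

```python
def make_inputs(count=1000, seed=12345):
    """Deterministic pseudo-random integers from a simple LCG (assumed constants)."""
    values = []
    state = seed
    for _ in range(count):
        state = (state * 1103515245 + 12345) & 0x7FFFFFFF
        values.append(state)
    return values


def bitwise_passes(values, passes=256):
    """Apply mask, shift, XOR, OR, and accumulate over every value, once per pass."""
    acc = 0
    for p in range(passes):
        shift = p & 7
        for v in values:
            mixed = (((v & 0xFFFF) << shift) ^ (v >> 3)) | p
            acc = (acc + mixed) & 0xFFFFFFFF  # keep the accumulator in 32 bits
    return acc


def quicksort(items):
    """Recursive QuickSort that returns a new sorted list."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```

Because the inputs come from a fixed seed, every run and every device processes the same data, which is what makes the elapsed times comparable.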

3. Scoring

WristBench normalizes each metric against a fixed baseline, where 1000 points represents the reference device. A time-based test scores above 1000 when its measured time is lower than the baseline. The UI throughput test scores above 1000 when it renders more frames than the baseline.

Raw Metrics (ms / frames) -> Normalize (baseline 1000) -> Weighted Score (25 / 35 / 30 / 10)

Time-based scores use (Base_Time / Actual_Time) * 1000.

The UI throughput score uses (Actual_Frames / Base_Frames) * 1000.

Baseline constants are Logic 2000 ms, Memory 3500 ms, I/O 1500 ms, and UI 60 frames.

Final = Logic * 0.25 + Memory * 0.35 + UI * 0.30 + I/O * 0.10
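The formulas and baseline constants above can be combined into a short worked example. The sketch below is in Python and is only an illustration of the arithmetic; the function and dictionary names are hypothetical, while the baselines, weights, and formulas are the ones stated in this section.

```python
# Baselines and weights as stated in the text; names are illustrative.
BASELINES = {"logic_ms": 2000.0, "memory_ms": 3500.0, "io_ms": 1500.0, "ui_frames": 60.0}
WEIGHTS = {"logic": 0.25, "memory": 0.35, "ui": 0.30, "io": 0.10}


def time_score(base_ms, actual_ms):
    """(Base_Time / Actual_Time) * 1000: lower time gives a higher score."""
    if actual_ms <= 0:
        raise ValueError("actual_ms must be positive")
    return base_ms / actual_ms * 1000.0


def frame_score(actual_frames, base_frames):
    """(Actual_Frames / Base_Frames) * 1000: more frames gives a higher score."""
    return actual_frames / base_frames * 1000.0


def final_score(logic_ms, memory_ms, io_ms, ui_frames):
    """Return (weighted final score, per-module scores)."""
    scores = {
        "logic": time_score(BASELINES["logic_ms"], logic_ms),
        "memory": time_score(BASELINES["memory_ms"], memory_ms),
        "io": time_score(BASELINES["io_ms"], io_ms),
        "ui": frame_score(ui_frames, BASELINES["ui_frames"]),
    }
    total = sum(scores[name] * WEIGHTS[name] for name in WEIGHTS)
    return total, scores
```

For example, a hypothetical device measuring Logic 1000 ms, Memory 3500 ms, I/O 3000 ms, and 90 frames gets module scores of 2000, 1000, 500, and 1500, for a final score of 2000 * 0.25 + 1000 * 0.35 + 1500 * 0.30 + 500 * 0.10 = 1350. A device that matches every baseline scores exactly 1000.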

4. Weighting

Memory and GC receive the largest weight because many real watch apps spend meaningful time on strings, object allocation, and VM-managed lifetimes. UI rendering is next because watch app responsiveness is highly visible to users. Logic reflects VM integer execution. Storage I/O is important but weighted lower because it is usually not part of every frame.

  • Logic: 25%
  • Memory: 35%
  • UI: 30%
  • I/O: 10%

5. Leaderboard and Deduplication

The watch uploads raw metrics. The server recalculates scores, hashes the device identifier for deduplication, stores records server-side, and aggregates model rankings by average score.
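The server-side flow described above might look roughly like the Python sketch below. Several details are assumptions, since the text does not specify them: SHA-256 as the hash, the salt value, the rule that a later submission from the same device replaces an earlier one, and every function and field name. The scoring function is passed in as `score_fn` so that scores are always recomputed from raw metrics on the server.

```python
import hashlib
from collections import defaultdict


def hash_device_id(device_id, salt="example-salt"):
    """Hash the device identifier so the raw value is never stored (assumed scheme)."""
    return hashlib.sha256((salt + device_id).encode("utf-8")).hexdigest()


def build_leaderboard(submissions, score_fn):
    """Deduplicate by hashed device id, then rank models by average score.

    submissions: iterable of dicts with 'device_id', 'model', and 'metrics'.
    score_fn: recomputes a score from raw metrics on the server.
    """
    latest = {}
    for sub in submissions:
        key = hash_device_id(sub["device_id"])
        # Assumption: a later submission from the same device replaces the earlier one.
        latest[key] = {"model": sub["model"], "score": score_fn(sub["metrics"])}

    by_model = defaultdict(list)
    for record in latest.values():
        by_model[record["model"]].append(record["score"])

    ranking = [(model, sum(s) / len(s)) for model, s in by_model.items()]
    ranking.sort(key=lambda row: row[1], reverse=True)
    return ranking
```

Recomputing scores from raw metrics means a modified client cannot simply upload an inflated score, and deduplicating by device keeps one enthusiastic owner from skewing a model's average.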

6. How to Interpret Results

Scores can vary with firmware version, Connect IQ version, battery policy, display type, screen resolution, and thermal state. For cross-device comparison, use the same app version and compare results from similar firmware conditions. For one device, repeated runs are more useful than a single peak result.
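One simple way to act on the advice about repeated runs is to summarize several results from the same device instead of quoting the best one. The Python sketch below is a suggestion, not part of WristBench; `summarize_runs` and its output fields are hypothetical.

```python
import statistics


def summarize_runs(scores):
    """Summarize repeated final scores from one device (illustrative helper)."""
    if not scores:
        raise ValueError("need at least one run")
    median = statistics.median(scores)
    spread = max(scores) - min(scores)
    return {
        "runs": len(scores),
        "median": median,        # typical result, robust to one outlier
        "peak": max(scores),     # best single run, shown for contrast
        "spread": spread,        # max minus min across runs
        "spread_pct": spread / median * 100.0,
    }
```

A large spread relative to the median suggests the runs were taken under different conditions, such as a changing thermal state or battery policy, and that a single peak value would be misleading.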