A/B Testing Platform
Core frontend engineer — 199 commits, led delivery of metrics system, segmentation, and data layer refactor
Overview
King runs hundreds of A/B tests across its games at any given time. Data scientists and product managers need to create experiments, monitor them in real time, and analyse results with statistical rigour — confidence intervals, power analysis, multiple comparison corrections. This platform is what they use every day.
I joined as a core frontend engineer and over three years delivered several major features: a complete metrics management system, a user segmentation system, a React-Query data layer refactor, and continuous improvements to the statistical visualisation layer used to interpret experiment results.
Interactive Demo — Statistical Analysis
The hardest part of building this platform was translating statistics into decisions. Try it yourself — pick a scenario, collect more data, and see how the chart guides you to a clear answer.
Pick a scenario. You know the truth — but the platform doesn't yet.
Drag right = more users in the experiment = narrower range on the chart.
Can't decide yet. The new tutorial might be better, but we've only tested 5,000 players — not enough to be sure. Keep the experiment running.
The Challenge
The hardest part was making statistical concepts accessible to non-statisticians. A p-value or confidence interval means nothing to a product manager who just wants to know if the test "worked". The UI had to present complex statistical outputs — uplift percentages, confidence bands, smoothed time series, power curves — in a way that guides the right decision without hiding the underlying rigour.
Technical Highlights
Statistical visualisation layer
The core of the platform is a Highcharts-based chart layer that renders time series with confidence intervals, cumulative uplift curves, and per-segment breakdowns. Charts handle 10,000+ data points via the Highcharts Boost module, display smoothed trend lines, and adapt to the user's chosen confidence level (95% or 99%). Every chart is tied to the filter system: changing a parameter re-fetches the data and re-renders the chart in real time.
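To illustrate how a confidence band can follow the chosen confidence level, here is a minimal sketch. It is not the platform's actual statistics code: the `DailyCounts` shape, the `upliftBand` name, and the use of a simple Wald interval on the absolute difference in conversion rates are all assumptions made for the example. The output uses the `[x, low, high]` point shape that a Highcharts `arearange` series accepts.

```typescript
// Hypothetical input shape: per-day user and conversion counts for two groups.
interface DailyCounts {
  timestamp: number;
  controlUsers: number;
  controlConversions: number;
  variantUsers: number;
  variantConversions: number;
}

type ConfidenceLevel = 0.95 | 0.99;

// [x, low, high], the point format used by range series.
type BandPoint = [number, number, number];

// Two-sided critical values of the standard normal distribution.
function criticalValue(level: ConfidenceLevel): number {
  return level === 0.95 ? 1.959964 : 2.575829;
}

// Cumulative absolute uplift (variant rate minus control rate) with a
// Wald confidence interval. Days with an empty group so far are skipped.
function upliftBand(rows: DailyCounts[], level: ConfidenceLevel): BandPoint[] {
  const z = criticalValue(level);
  const band: BandPoint[] = [];
  let cu = 0, cc = 0, vu = 0, vc = 0;

  for (const row of rows) {
    cu += row.controlUsers;
    cc += row.controlConversions;
    vu += row.variantUsers;
    vc += row.variantConversions;
    if (cu === 0 || vu === 0) continue;

    const pc = cc / cu;
    const pv = vc / vu;
    const diff = pv - pc;
    const standardError = Math.sqrt((pc * (1 - pc)) / cu + (pv * (1 - pv)) / vu);
    band.push([row.timestamp, diff - z * standardError, diff + z * standardError]);
  }
  return band;
}
```

As more users accumulate, the standard error shrinks and the band narrows. While the band still contains zero, the data cannot yet distinguish the variant from the control, which is the "can't decide yet" state a chart like this is meant to make visible.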
Self-service metrics system
I designed and built the metrics management module from scratch: a CRUD interface backed by a SQL editor for defining custom metric queries, a measurement units layer, and a validation pipeline. Data scientists can now define new metrics themselves without involving the backend team. As part of this work, I migrated the module's state management from MobX to React-Query, which made it simpler, cacheable, and easier to test.
Custom report builder (ETL UI)
The Custom Table Builder lets analysts define their own data extraction queries, schedule them as background jobs, and export results. The UI manages complex ETL task state — pending, running, failed, completed — with a DAG visualisation showing task dependencies (built with Dagre-D3). Analysts can share report templates with colleagues via URL.
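The task-state handling described above can be sketched as follows. This is an illustration under assumptions, not the platform's code: the `EtlTask` shape and both function names are hypothetical, and each `dependsOn` list is assumed to contain unique ids. The sketch orders tasks with Kahn's algorithm and then derives which tasks can never run because something upstream failed, the kind of derived state a DAG view might highlight.

```typescript
type TaskStatus = "pending" | "running" | "failed" | "completed";

// Hypothetical task shape; dependsOn lists the ids this task waits for.
interface EtlTask {
  id: string;
  status: TaskStatus;
  dependsOn: string[];
}

// Kahn's algorithm: returns task ids so that every task appears after
// all of its dependencies. Throws on unknown ids or cycles.
function topologicalOrder(tasks: EtlTask[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const task of tasks) {
    indegree.set(task.id, task.dependsOn.length);
    dependents.set(task.id, []);
  }
  for (const task of tasks) {
    for (const dep of task.dependsOn) {
      const list = dependents.get(dep);
      if (!list) throw new Error(`Unknown dependency: ${dep}`);
      list.push(task.id);
    }
  }

  const queue = tasks.filter((t) => t.dependsOn.length === 0).map((t) => t.id);
  const order: string[] = [];
  while (queue.length > 0) {
    const id = queue.shift() as string;
    order.push(id);
    for (const next of dependents.get(id) ?? []) {
      const remaining = (indegree.get(next) ?? 0) - 1;
      indegree.set(next, remaining);
      if (remaining === 0) queue.push(next);
    }
  }
  if (order.length !== tasks.length) throw new Error("Dependency cycle detected");
  return order;
}

// Ids of tasks that cannot run because a direct or transitive
// dependency has failed.
function blockedByFailure(tasks: EtlTask[]): Set<string> {
  const byId = new Map<string, EtlTask>(
    tasks.map((t): [string, EtlTask] => [t.id, t]),
  );
  const blocked = new Set<string>();
  for (const id of topologicalOrder(tasks)) {
    const task = byId.get(id)!;
    const upstreamBroken = task.dependsOn.some(
      (dep) => byId.get(dep)?.status === "failed" || blocked.has(dep),
    );
    if (upstreamBroken) blocked.add(id);
  }
  return blocked;
}
```

Walking the tasks in topological order means each task's dependencies have already been classified when it is visited, so a single pass is enough to propagate a failure downstream.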
Power calculator
Before running an experiment, teams need to know how many users they need to detect a meaningful effect and how long the experiment must run to reach that number. The power calculator interface takes the minimum detectable effect (MDE), population size, number of test groups, alpha, and beta as inputs and returns the required sample size and estimated run duration, so teams can design statistically sound experiments before they start.
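As a rough illustration of the calculation behind such a calculator, the sketch below uses the standard two-proportion sample size formula, n = (z_{1-α/2} + z_{1-β})² · (p₁(1-p₁) + p₂(1-p₂)) / (p₂-p₁)². The platform's actual formula is not described here, so treat this as an assumption. The input names, the absolute (not relative) MDE, the `dailyUsers` stand-in for population size, and the absence of any multiple comparison correction are also choices made for the example.

```typescript
// Standard normal CDF via the Abramowitz and Stegun 7.1.26
// approximation of erf (absolute error around 1.5e-7).
function normalCdf(x: number): number {
  const y = Math.abs(x) / Math.SQRT2;
  const t = 1 / (1 + 0.3275911 * y);
  const poly =
    t * (0.254829592 +
    t * (-0.284496736 +
    t * (1.421413741 +
    t * (-1.453152027 +
    t * 1.061405429))));
  const erf = 1 - poly * Math.exp(-y * y);
  return x >= 0 ? 0.5 * (1 + erf) : 0.5 * (1 - erf);
}

// Inverse of normalCdf by bisection; simple rather than fast.
function normalQuantile(p: number): number {
  if (!(p > 0 && p < 1)) throw new RangeError("p must be in (0, 1)");
  let lo = -10;
  let hi = 10;
  for (let i = 0; i < 100; i++) {
    const mid = (lo + hi) / 2;
    if (normalCdf(mid) < p) lo = mid;
    else hi = mid;
  }
  return (lo + hi) / 2;
}

// Hypothetical input shape for the example.
interface PowerInputs {
  baselineRate: number; // control conversion rate, e.g. 0.10
  mde: number;          // absolute minimum detectable effect, e.g. 0.02
  alpha: number;        // two-sided significance level, e.g. 0.05
  beta: number;         // type II error rate, e.g. 0.20 (power = 1 - beta)
  groups: number;       // number of test groups, control included
  dailyUsers: number;   // eligible users entering the experiment per day
}

function requiredSample(inputs: PowerInputs): { perGroup: number; days: number } {
  const { baselineRate: p1, mde, alpha, beta, groups, dailyUsers } = inputs;
  const p2 = p1 + mde;
  const zAlpha = normalQuantile(1 - alpha / 2);
  const zBeta = normalQuantile(1 - beta);
  const variance = p1 * (1 - p1) + p2 * (1 - p2);
  const perGroup = Math.ceil(((zAlpha + zBeta) ** 2 * variance) / (mde * mde));
  const days = Math.ceil((perGroup * groups) / dailyUsers);
  return { perGroup, days };
}
```

For example, detecting a lift from 10% to 12% at alpha 0.05 and beta 0.20 needs roughly 3,800 users per group under this formula. Halving the MDE roughly quadruples the requirement, which is why the MDE is usually the input worth debating before launch.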
Data layer architecture
The codebase had grown to use three different data-fetching approaches: legacy SWR, MobX stores with direct API calls, and ad-hoc fetch calls. I led a systematic refactor of key modules to React-Query 3, establishing a consistent pattern: declarative hooks, stale-time management, and devtools visibility. The platform connects to five separate backend services, all behind a unified API gateway layer.
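One way such a consistent pattern might look is sketched below: a query-key factory plus a function that builds the options object for a single query. Everything named here is hypothetical (the `experimentKeys` factory, the `/api/experiments/...` path, the filter fields, the one-minute stale time), and the fetcher is injected so the sketch stays self-contained without importing the library.

```typescript
// Hypothetical filter shape for an experiment results view.
interface ResultFilters {
  metric: string;
  segment: string;
  confidenceLevel: 0.95 | 0.99;
}

// Injected so the example does not depend on a real HTTP client.
type Fetcher = (url: string) => Promise<unknown>;

// Central key factory: every query for a resource derives its key here,
// so invalidating ["experiments", id] also covers the nested results keys.
const experimentKeys = {
  all: ["experiments"] as const,
  detail: (id: string) => ["experiments", id] as const,
  results: (id: string, filters: ResultFilters) =>
    ["experiments", id, "results", filters] as const,
};

// Builds the options for one query. The returned object is meant to be
// passed to useQuery from react-query.
function experimentResultsQuery(id: string, filters: ResultFilters, fetchJson: Fetcher) {
  const params = [
    `metric=${encodeURIComponent(filters.metric)}`,
    `segment=${encodeURIComponent(filters.segment)}`,
    `confidence=${encodeURIComponent(String(filters.confidenceLevel))}`,
  ].join("&");

  return {
    queryKey: experimentKeys.results(id, filters),
    queryFn: () =>
      fetchJson(`/api/experiments/${encodeURIComponent(id)}/results?${params}`),
    staleTime: 60 * 1000,   // treat results as fresh for one minute (assumed value)
    keepPreviousData: true, // keep the old chart visible while new filters load
  };
}
```

In a component this would be consumed as `useQuery(experimentResultsQuery(id, filters, fetchJson))`. Because the filters are part of the key, changing a filter produces a new cache entry and a refetch without any manual store wiring.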