Scripting WebPageTest for Frame Budget Regressions

Scripted WebPageTest runs drive multi-step flows, capture the full main-thread trace, and expose custom metrics you compute yourself — letting CI assert a specific interaction’s long-task cost against the 16.6ms frame budget rather than a single summary number. This builds on Lab Tooling and CI, part of Rendering Performance Metrics and Tooling.

Why Scripting, Not a Single URL

A one-URL audit measures cold page load. Real frame-budget regressions often hide behind interactions: a click that triggers a 90ms filter, a tab switch that forces a layout flush. WebPageTest’s scripting language navigates to that state, marks the step boundaries, and records a trace for each step, so you can attribute main-thread time to the exact interaction instead of averaging it into a page-level total.

The Scripting Language

WebPageTest scripts are line-oriented: one command per line, with the command and its parameters separated by tabs. The ones that matter for frame-budget work are navigate, logData (turn result recording off or on), exec (run JS in the page), setEventName (label a measurement step), and execAndWait (run JS and wait for activity to settle).

// wpt-search.txt: load without recording, then record a search interaction as its own step
logData	0
navigate	https://example.com/
logData	1
setEventName	SearchInteraction
execAndWait	document.querySelector('#q').value='laptop'; document.querySelector('#q').dispatchEvent(new Event('input'))

logData 0 suppresses metrics during setup; logData 1 re-enables them so only the search step is measured. setEventName makes the step show up as a discrete entry in the result, with its own filmstrip and trace.
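Because a script is just lines of delimited fields, it can be generated in CI instead of being kept as a hand-edited file. The following is a minimal sketch, assuming a hypothetical `buildWptScript` helper (not part of WebPageTest), tab-separated fields, and in-page JavaScript that has to fit on a single script line:

```javascript
// Hypothetical helper: builds a WebPageTest script for one measured interaction.
// Assumption: command and parameter are separated by a tab, one command per line.
function buildWptScript({ url, eventName, js }) {
  // Collapse the snippet onto one line so it stays a single script command.
  const oneLine = js.split('\n').map((s) => s.trim()).filter(Boolean).join(' ');
  const commands = [
    ['logData', '0'],            // stop recording while the page loads
    ['navigate', url],
    ['logData', '1'],            // record from here on
    ['setEventName', eventName], // label the measured step
    ['execAndWait', oneLine],    // run the interaction and wait for it to settle
  ];
  return commands.map((fields) => fields.join('\t')).join('\n');
}

const script = buildWptScript({
  url: 'https://example.com/',
  eventName: 'SearchInteraction',
  js: `
    document.querySelector('#q').value='laptop';
    document.querySelector('#q').dispatchEvent(new Event('input'))
  `,
});
console.log(script);
```

Generating the script this way keeps the step name in one place, so the CI gate and the script cannot drift apart on what the step is called.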

Custom Metrics from the Trace

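WebPageTest lets you declare custom metrics as JavaScript that runs in the page at the end and returns a number. To assert against the frame budget you need the longest task and total main-thread blocking for the interaction step, which you derive from the page's long-task entries. Long-task entries are delivered through a PerformanceObserver rather than performance.getEntriesByType, so the metrics below read the observer's buffered records. Each metric is declared under its own bracketed name.

[longestTask]
const po = new PerformanceObserver(() => {});
po.observe({ type: 'longtask', buffered: true }); // replay buffered long tasks
return po.takeRecords()
  .reduce((max, t) => Math.max(max, t.duration), 0); // ms of the worst block

[totalBlocking]
const po = new PerformanceObserver(() => {});
po.observe({ type: 'longtask', buffered: true });
return po.takeRecords()
  .reduce((sum, t) => sum + Math.max(0, t.duration - 50), 0); // TBT-style sum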

These surface as longestTask and totalBlocking in the JSON result alongside the filmstrip and the raw trace, so a CI step can read them and compare against budgets.
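To make the arithmetic concrete, the two reductions can be run outside the browser on plain duration values. This is an illustrative sketch only: `summarizeLongTasks` and the sample durations are invented for the example, and the 50ms subtraction mirrors the TBT-style sum used by `totalBlocking`.

```javascript
// Illustrative only: mirrors the two custom metrics on plain { duration } objects.
const BLOCKING_THRESHOLD_MS = 50; // portion of a task beyond this counts as blocking

function summarizeLongTasks(entries) {
  return {
    // Worst single block on the main thread, in ms.
    longestTask: entries.reduce((max, t) => Math.max(max, t.duration), 0),
    // Sum of the part of each task that runs past 50ms (TBT-style).
    totalBlocking: entries.reduce(
      (sum, t) => sum + Math.max(0, t.duration - BLOCKING_THRESHOLD_MS),
      0,
    ),
  };
}

// Hypothetical sample: one 88ms task and one 62ms task.
const summary = summarizeLongTasks([{ duration: 88 }, { duration: 62 }]);
console.log(summary); // { longestTask: 88, totalBlocking: 50 }
```

The 88ms task contributes 38ms of blocking and the 62ms task contributes 12ms, which is why the total is 50 while the longest task stays at 88.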

Asserting Against the 16.6ms Budget

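CI fetches the result JSON through the WebPageTest API and checks it. The frame-budget assertion is simply: did any task in the interaction step exceed 16.6ms? Because long-task entries only exist for tasks of 50ms or more, longestTask reads 0 until a task crosses that line, so this gate trips on any recorded long task. Blocks between 16.6ms and 50ms still need trace inspection.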

// CI gate: fail if the interaction step blocked a frame
const result = await fetchWPTResult(testId)          // poll the WebPageTest API
const step = result.data.median.firstView.SearchInteraction
const FRAME_BUDGET = 16.6

if (step.longestTask > FRAME_BUDGET) {
  console.error(`longest task ${step.longestTask}ms > ${FRAME_BUDGET}ms budget`)
  process.exit(1)                                     // non-zero blocks the merge
}
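The same gate extends to more than one metric. Below is a sketch, assuming a hypothetical `checkBudgets` helper and a step object shaped like the one above. The 50ms `totalBlocking` limit is the target from the verification checklist, and keeping the check as a pure function means it can be unit-tested without a WebPageTest run:

```javascript
// Hypothetical helper: compares a step's custom metrics against per-metric budgets.
function checkBudgets(step, budgets) {
  const failures = [];
  for (const [metric, limit] of Object.entries(budgets)) {
    const value = step[metric];
    if (typeof value !== 'number' || Number.isNaN(value)) {
      // A missing metric counts as a failure so a broken test cannot pass silently.
      failures.push(`${metric} missing from result`);
    } else if (value > limit) {
      failures.push(`${metric} ${value}ms > ${limit}ms budget`);
    }
  }
  return failures;
}

const BUDGETS = { longestTask: 16.6, totalBlocking: 50 };

// Values from a regressed run with a single 88ms task.
const failures = checkBudgets({ longestTask: 88, totalBlocking: 38 }, BUDGETS);
failures.forEach((message) => console.error(message));
// In CI, a non-empty list would be followed by process.exit(1).
```

Treating a missing metric as a failure is a design choice: if the custom metric stops being reported, the gate fails loudly instead of comparing `undefined` against the budget and passing.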

Reproduction: A Regression in the Interaction Step

// The search handler does a synchronous, layout-reading filter in one frame
input.addEventListener('input', () => {
  for (const row of rows) {
    row.style.display = matches(row, input.value) ? '' : 'none'
    void row.offsetHeight // ❌ forces a layout flush every iteration — long task
  }
})

The per-row offsetHeight read interleaves a layout flush with every write, turning the loop into one long task. The scripted WebPageTest step captures it:

[WebPageTest trace — SearchInteraction step]
  Main thread:
  ├─ Event: input ......................... 0.4ms
  ├─ Task (filter loop) .................. 88.0ms  ▣ LONG TASK
  │    └─ interleaved Layout × 240 rows (forced sync layout)
  └─ Paint ................................ 5.0ms
  Custom metrics:  longestTask = 88   totalBlocking = 38
  Frame budget 16.6ms exceeded → CI assertion FAILS (88 > 16.6)

The Fix

Separate the reads from the writes so layout flushes once, not per row, and the loop no longer blocks a frame past budget. This is the standard remedy for a forced synchronous layout — batch the measurements, then batch the mutations.

// ✅ One layout flush for all reads, then all writes — no per-row sync layout
input.addEventListener('input', () => {
  const visible = rows.map((row) => matches(row, input.value)) // reads only (single flush)
  rows.forEach((row, i) => {
    row.style.display = visible[i] ? '' : 'none' // writes only — no interleaved reads
  })
})

The re-run trace shows the filter loop finishing within the frame budget and both custom metrics back under threshold, so the gate passes. WebPageTest’s per-step attribution is what made the regression visible at the interaction level. In field data, the same stall would surface as a long animation frame entry whose scripts array names the input handler.
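For the field-side correlation, here is a sketch of how the Long Animation Frames API could surface the same handler. `worstScript` is a hypothetical helper and the sample entry is invented for illustration. The observer attaches only where the runtime reports `long-animation-frame` as a supported entry type:

```javascript
// Hypothetical helper: picks the longest-running script from a long-animation-frame entry.
function worstScript(entry) {
  const scripts = entry.scripts || [];
  if (scripts.length === 0) return null;
  const worst = scripts.reduce((a, b) => (b.duration > a.duration ? b : a));
  return {
    invoker: worst.invoker,                           // which listener or callback ran
    sourceFunctionName: worst.sourceFunctionName,
    duration: worst.duration,
    forcedLayout: worst.forcedStyleAndLayoutDuration, // ms spent in forced style/layout
  };
}

// Attach only where the entry type exists (not every browser, and not Node).
const supported =
  typeof PerformanceObserver !== 'undefined' &&
  (PerformanceObserver.supportedEntryTypes || []).includes('long-animation-frame');

if (supported) {
  new PerformanceObserver((list) => {
    for (const entry of list.getEntries()) {
      console.log(entry.duration, worstScript(entry)); // swap for a RUM beacon
    }
  }).observe({ type: 'long-animation-frame', buffered: true });
}

// Invented sample entry, carrying only the fields used above.
const sample = {
  duration: 93,
  scripts: [
    { invoker: 'INPUT#q.oninput', sourceFunctionName: 'onSearch', duration: 88, forcedStyleAndLayoutDuration: 70 },
    { invoker: 'TimerHandler:setTimeout', sourceFunctionName: '', duration: 2, forcedStyleAndLayoutDuration: 0 },
  ],
};
console.log(worstScript(sample));
```

A high `forcedLayout` share relative to `duration` points at the same forced synchronous layout the lab trace showed, which is what ties the field entry back to the scripted step.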

Verification Checklist

metric	target	how measured
`longestTask` (interaction step)	< 16.6ms	WebPageTest custom metric
`totalBlocking` (interaction step)	< 50ms	WebPageTest custom metric
Forced layout count in step	0	trace inspection of the step
CI exit code	0 on fixed build	result-JSON assertion

The `SearchInteraction` step records no task over 16.6ms.
The `longestTask` and `totalBlocking` custom metrics sit below their thresholds after batching.
The trace shows a single layout flush, not one per row.
The CI assertion exits 0 on the fixed commit and non-zero on the regression.