HCI Research · Safety · Design

Bloom

An LLM-augmented physical activity coaching app, and the primary research project we built during my MS at Stanford with Prof. James Landay. Accepted to CHI 2026.

View the Bloom website ↗ · App Guide PDF ↗
Year
2025
Role
UI/UX Design · Safety Engineering · Frontend · Second Author
Context
4-week randomized field study · N=54
Tools
Figma, React Native, LLM red-teaming, qualitative coding
Citation
Jörke, J., Genç, D., Teutschbein, M., Sapkota, S., Chung, J., Schmiedmayer, H.-B., Campero, A., King, A. C., Brunskill, E., & Landay, J. A. (2026). Bloom: Designing for LLM-Augmented Behavior Change Interactions. CHI '26. ACM. https://arxiv.org/abs/2510.05449

What it is

Bloom is an LLM-augmented physical activity coaching app built on Stanford's validated Active Choices Program. It integrates a conversational AI coach ("Beebo") with evidence-based behavior change UI, including an ambient garden display that grows as you complete your weekly exercise goals. The central question: can LLM coaching complement, not replace, established digital health interaction patterns?

5×
Longer app engagement in LLM condition
+1.2
Mindset shift vs +0.8 in control
600
Example safety benchmark
>96%
Recall across harm categories

The App

Garden Ambient Display

The core design concept I led: a garden that lives on your homescreen and lockscreen and grows in 20% increments as you complete your weekly activity plan. Every walk adds a bee (size proportional to duration); every other activity adds a butterfly (color varies by type). The garden resets if you don't hit 100% by week's end, and evolves to a new plant when you do. The goal was to make progress feel gradual and cumulative rather than binary, reducing the goal-fixation anxiety that metrics-forward health apps tend to produce. This was one of the central design decisions we made as a team.
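The garden rules above can be written down as a small state model. This is a minimal sketch and not the app's actual code (the app itself was built in React Native). It assumes plan progress is measured in minutes, and that bees and butterflies are cleared at the start of each week. The names `Garden`, `BEE_SIZE_PER_MIN`, and `BUTTERFLY_COLORS`, along with the specific sizes and colors, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass, field

# Hypothetical constants, chosen only for illustration.
BEE_SIZE_PER_MIN = 0.5                      # bee size grows with walk duration
BUTTERFLY_COLORS = {"yoga": "purple", "cycling": "blue"}
DEFAULT_COLOR = "orange"
STAGES = 5                                  # 100% split into 20% increments


@dataclass
class Garden:
    weekly_goal_min: float                  # assumed: plan measured in minutes
    plant_level: int = 0                    # which plant the garden has evolved to
    completed_min: float = 0.0
    bees: list = field(default_factory=list)         # one size per walk
    butterflies: list = field(default_factory=list)  # one color per other activity

    def log_activity(self, kind, minutes):
        """Record an activity: walks add a bee, anything else adds a butterfly."""
        self.completed_min += minutes
        if kind == "walk":
            self.bees.append(minutes * BEE_SIZE_PER_MIN)
        else:
            self.butterflies.append(BUTTERFLY_COLORS.get(kind, DEFAULT_COLOR))

    @property
    def growth_stage(self):
        """Growth stage 0..5, one stage per 20% of the weekly plan."""
        capped = min(self.completed_min, self.weekly_goal_min)
        return math.floor(STAGES * capped / self.weekly_goal_min)

    def end_week(self):
        """Evolve to a new plant at 100%; otherwise the garden simply resets."""
        if self.growth_stage == STAGES:
            self.plant_level += 1
        # Assumption: the new week starts with an empty garden either way.
        self.completed_min = 0.0
        self.bees.clear()
        self.butterflies.clear()
```

For example, with a 150-minute plan, a 30-minute walk moves the garden to stage 1 and adds one bee. Because stages come from flooring the progress, being 50% done shows stage 2, not 3.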

Two Conditions

We designed both conditions — treatment (with Beebo, the LLM coach) and control (without). Both used the same garden ambient display, plan-setting, and wearable integration. The only difference was the conversational AI layer, which let us isolate the effect of LLM coaching on engagement, mindset shift, and behavior change.

My Contributions

UI/UX Design

I had the most influence on UI design across the app — from the garden metaphor and ambient lockscreen display to the activity logging interface, onboarding flows, and overall app architecture. I also designed the Bloom website. This wasn't a design handoff role: I owned decisions end to end, working closely with the team to make sure the UI served the study's behavior change hypotheses.

Safety Engineering

I led red-teaming for the LLM coaching agent across a vulnerable participant population (adults with existing activity barriers, including chronic pain and mental health considerations). I created a taxonomy of harm categories, then validated the guardrails on a 600-example benchmark, reaching >96% recall across harm categories. This was domain-expert red-teaming, not automated, and it was critical: Beebo regularly encountered sensitive topics that required nuanced, harm-aware guardrails to keep responses safe and within scope.
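For readers unfamiliar with the metric, recall is the share of genuinely harmful examples that the safety filter catches: TP / (TP + FN), computed separately for each harm category. The sketch below shows that calculation under assumed inputs. The tuple layout, the function name `recall_by_category`, and the placeholder category names are hypothetical and do not reflect the study's actual taxonomy or evaluation code.

```python
from collections import defaultdict


def recall_by_category(examples):
    """Compute per-category recall from labeled benchmark examples.

    `examples` is an iterable of (category, should_flag, was_flagged) tuples:
    `should_flag` is the expert label, `was_flagged` is the filter's decision.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for category, should_flag, was_flagged in examples:
        if not should_flag:
            continue  # recall only considers examples that should be flagged
        if was_flagged:
            true_pos[category] += 1
        else:
            false_neg[category] += 1
    categories = set(true_pos) | set(false_neg)
    return {c: true_pos[c] / (true_pos[c] + false_neg[c]) for c in categories}


# Placeholder data with hypothetical category names.
examples = [
    ("category_a", True, True),
    ("category_a", True, True),
    ("category_a", True, False),   # a miss: harmful but not flagged
    ("category_b", True, True),
    ("category_b", False, False),  # benign example, ignored by recall
]
recalls = recall_by_category(examples)
```

Reporting recall per category, not just one pooled number, keeps a weak category from being hidden by strong performance elsewhere.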

Frontend & Research Execution

I had the most influence on frontend implementation decisions — building in React Native alongside the team. I also owned participant recruitment for the 54-person study, managed onboarding logistics, and completed qualitative coding of all offboarding interviews. Second author on the published paper (CHI 2026, accepted).

Study Results

What we found

Both conditions doubled weekly goal achievement (36% → 72% meeting 150 min/week). The LLM condition showed 5× longer engagement and greater mindset shifts (+1.2 vs +0.8 points in beliefs about activity benefits), with greater improvements in exercise enjoyment and self-compassion. No single conversational strategy drove these shifts consistently — the system's flexibility was the mechanism, allowing different participants to benefit from different aspects of the AI interaction.

LLM coaching's primary value is psychological, not behavioral — surfacing behaviors people already do so they realize they're doing more than they've given themselves credit for.

Key Findings

  • Even participants with shallow engagement showed meaningful mindset changes: the garden display made progress feel real even without deep conversational interaction.
  • The coach highlighted existing behaviors (gardening, walking to work) that count as exercise, helping participants discover they were already doing more than they realized.
  • Safety filtering was essential: participants raised chronic pain, mental health struggles, and grief, requiring nuanced, harm-aware responses that the red-teamed guardrails handled correctly.