CHI 2026 · Best Paper · Top 1%

Bloom

At Stanford with Prof. Landay, I co-designed and evaluated Bloom, an LLM-based physical activity coaching intervention. My contributions spanned early-stage design through full-stack implementation. CHI 2026, accepted.

View the Bloom website ↗ · App Guide PDF ↗

Year

2025

Role

UI/UX Design · Safety Engineering · Frontend · Second Author

Context

4-week randomized field study · N=54

Tools

Figma, React Native, LLM red-teaming, qualitative coding

Awards

Best Paper Award · Top 1% of submissions

Citation

Jörke, J., Genç, D., Teutschbein, M., Sapkota, S., Chung, J., Schmiedmayer, H.-B., Campero, A., King, A. C., Brunskill, E., & Landay, J. A. (2026). Bloom: Designing for LLM-Augmented Behavior Change Interactions. CHI '26. ACM. https://arxiv.org/abs/2510.05449

What it is

Bloom is an LLM-augmented physical activity coaching app built on Stanford's validated Active Choices Program. It integrates a conversational AI coach ("Beebo") with evidence-based behavior change UI, including an ambient garden display that grows as you complete your weekly exercise goals. The central question: can LLM coaching complement, not replace, established digital health interaction patterns? We ran a 4-week randomized field study with 54 participants to find out.

5×

Longer app engagement in LLM condition

+1.2

Mindset shift vs +0.8 in control

600

Examples in the safety benchmark

>96%

Recall across harm categories

The App

Garden Ambient Display

The core design concept I led: a garden that lives on your homescreen and lockscreen and grows in 20% increments as you complete your weekly activity plan. Every walk adds a bee (size proportional to duration); every other activity adds a butterfly (color varies by type). The garden resets if you don't hit 100% by week's end, and evolves to a new plant when you do. The goal was to make progress feel gradual and cumulative rather than binary, reducing the goal-fixation anxiety that metrics-forward health apps tend to produce. It was one of the central design decisions we made as a team.
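The garden rules above can be sketched in a few lines. This is an illustrative model only, not the app's actual React Native code: the `Garden` class, the `BUTTERFLY_COLORS` mapping, the 30-minute bee scale, rounding progress down to the nearest 20% step, and clearing the garden at the start of every week are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical activity-to-color mapping; the real palette is not given here.
BUTTERFLY_COLORS = {"yoga": "purple", "cycling": "blue", "swimming": "teal"}


@dataclass
class Garden:
    planned_min: int = 150          # weekly plan in minutes (assumed unit)
    completed_min: int = 0
    plant_index: int = 0            # which plant the garden is currently growing
    bees: list = field(default_factory=list)         # one size value per walk
    butterflies: list = field(default_factory=list)  # one color per other activity

    @property
    def growth(self) -> int:
        """Progress toward the weekly plan, snapped down to 20% steps (0-100)."""
        if self.planned_min <= 0:
            return 0
        steps = min(5, (self.completed_min * 5) // self.planned_min)
        return steps * 20

    def log_activity(self, kind: str, minutes: int) -> None:
        self.completed_min += minutes
        if kind == "walk":
            # Bee size proportional to duration; the 30-minute scale is assumed.
            self.bees.append(minutes / 30)
        else:
            self.butterflies.append(BUTTERFLY_COLORS.get(kind, "white"))

    def end_week(self) -> None:
        """Evolve to a new plant at 100%, otherwise reset; both start a fresh week."""
        if self.growth == 100:
            self.plant_index += 1
        self.completed_min = 0
        self.bees.clear()
        self.butterflies.clear()


garden = Garden(planned_min=150)
garden.log_activity("walk", 30)   # adds a bee
garden.log_activity("yoga", 45)   # adds a butterfly
print(garden.growth)  # 40
```

Snapping to discrete steps keeps the display from changing on every logged minute, which matches the intent of making progress feel gradual rather than metric-driven.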

Two Conditions

We designed both conditions — treatment (with Beebo, the LLM coach) and control (without). Both used the same garden ambient display, plan-setting, and wearable integration. The only difference was the conversational AI layer, which let us isolate the effect of LLM coaching on engagement, mindset shift, and behavior change.

My Contributions

UI/UX Design

I had the most influence on UI design across the app — from the garden metaphor and ambient lockscreen display to the activity logging interface, onboarding flows, and overall app architecture. I also designed and built the Bloom website. This wasn't a design handoff role: I made key decisions on app architecture and user experience end to end, working closely with the team to ensure the UI served the study's behavior change hypotheses.

Safety Engineering

I led red-teaming of the LLM coaching agent for a vulnerable participant population: adults with existing activity barriers, including chronic pain and mental health considerations. I created a taxonomy of harm categories and validated a 600-example benchmark achieving >96% recall across harm categories. This was manual, domain-expert red-teaming rather than automated testing, and it was critical: Beebo regularly encountered sensitive topics that required nuanced, harm-aware guardrails to keep responses safe and within scope.
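For readers unfamiliar with the metric, here is a minimal sketch of how per-category recall could be computed over a labeled benchmark. It assumes recall means the share of harmful examples in each category that the safety layer flagged. The function name, tuple layout, and category labels are hypothetical, and the toy rows are not data from the study.

```python
from collections import defaultdict


def recall_by_category(results):
    """Per-category recall over (category, is_harmful, was_flagged) tuples."""
    harmful = defaultdict(int)
    caught = defaultdict(int)
    for category, is_harmful, was_flagged in results:
        if not is_harmful:
            continue  # recall only counts examples that should be flagged
        harmful[category] += 1
        if was_flagged:
            caught[category] += 1
    return {category: caught[category] / harmful[category] for category in harmful}


# Toy data with hypothetical category labels, not the study's benchmark.
toy_results = [
    ("chronic_pain", True, True),
    ("chronic_pain", True, False),
    ("mental_health", True, True),
    ("mental_health", False, False),
]
print(recall_by_category(toy_results))
# {'chronic_pain': 0.5, 'mental_health': 1.0}
```

Reporting recall per category rather than as one pooled number makes it harder for a weak category to hide behind strong ones.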

Frontend & Research Execution

I had the most influence on frontend implementation decisions, building in React Native alongside the team. I was heavily involved in participant recruitment for the 54-person study, managed onboarding logistics, and completed qualitative coding of all offboarding interviews. Second author on the published paper (CHI 2026, accepted).

Study Results

What we found

Both conditions doubled weekly goal achievement (36% to 72% meeting 150 min/week). Treatment participants showed larger mindset shifts (+1.2 vs +0.8 points in beliefs about activity benefits), with greater improvements in exercise enjoyment and self-compassion. No single conversational strategy drove these shifts consistently. The system's flexibility was the mechanism: different participants benefited from different aspects, whether activity reframing, goal alignment, or acknowledgment of existing efforts. Even those with shallow engagement showed meaningful changes, suggesting that adaptive, personalized representations can shift self-perception without requiring deep conversational interaction.

LLM coaching's primary value is psychological, not behavioral — surfacing behaviors people already do so they realize they're doing more than they've given themselves credit for.

Key Findings

—Even participants with shallow engagement showed meaningful mindset changes — the garden display made progress feel real even without deep conversational interaction.
—Beebo highlighted existing behaviors that count as exercise (gardening, walking to work), helping participants discover they were already doing more than they realized.
—Safety filtering was essential: participants raised chronic pain, mental health struggles, and grief — requiring nuanced, harm-aware responses that the red-teamed guardrails handled correctly.
