Full-Stack · AI · Mobile

Menuto

Personalized restaurant dish recommendations powered by an LLM agent that learns your taste over time, using your favorites from other restaurants to inform what you'll love at new ones. Solo-built end to end: product, design, React Native frontend, FastAPI backend, and deployment.

Try on TestFlight ↗ · View on GitHub ↗

Year

2026

Role

Solo — Product · Design · Full-Stack

Context

Personal Project · End-to-End Ownership

Tools

React Native · Expo 53 · FastAPI · Google Gemini 2.5 Flash · Supabase · PostgreSQL · Google Places API

I’m always indecisive at restaurants, and when I do decide, it’s always the wrong thing.

Why Not Just Ask an LLM?

You could send ChatGPT a photo of the menu and ask “what should I order?” You’d get a generic answer with no memory of what you’ve liked before, no awareness of what reviewers say about this specific restaurant, and no ability to learn from the fact that you rated the cacio e pepe 5 stars last week but hated the carbonara. Every conversation starts from zero. I wanted a system with state: one that tracks your favorites across restaurants, extracts taste signals from your ratings, and runs an 8-component scoring algorithm with Bayesian weight learning that adapts to how you specifically make decisions over time.

~50

Dishes scored per request across 8 signal sources

Menu input modes: photo, URL, paste

LLM calls per recommendation (embeddings + agent reasoning)

LLM-analyzed dietary flags per dish (catches hidden ingredients)

The App

Search for a restaurant by name or browse nearby. Tap into one to see its full menu.

Find a restaurant

Browse the full menu

Set your mood: how hungry you are, how adventurous, what you're craving, and how you're dining.

Hunger and taste sliders

Cravings and dining context

The agent reasons about your signals and returns personalized picks with explanations.

Browsing the kitchen

Your picks with reasons

Rate dishes after your meal. Your favorites carry across restaurants for future visits.

Rate and save favorites

Your restaurant list

The Recommendation Engine

Agent-First Architecture

Rather than rigid scoring formulas, an LLM agent receives all available signals about the user and reasons about what to recommend. An earlier version used 10 hand-tuned scoring components (personal taste: 0.30, sentiment: 0.17, etc.). Those weights were identical for every user, and a fixed formula couldn't account for context.

The Pipeline

—Data Gathering: Each dish draws on 8 signal sources: parsed menu items, Google Places reviews (cached 14 days), review-based dish popularity via mention frequency, cross-user order counts, past ratings, behavioral signals (views/orders/favorites), LLM-extracted taste keywords from feedback text, and embedding-based taste similarity (cosine similarity computed in 2 API calls: one for the taste profile, one batch for all candidates).
—Dietary Filtering: The only rigid step. LLM-generated dietary flags per dish, with explicit instructions to catch hidden ingredients (anchovy in Caesar dressing, fish sauce in Pad Thai, parmesan in pesto). Falls back to a 30+ term keyword list for menus parsed before LLM tagging was added.
—Signal Enrichment: Each candidate gets readable flags attached, such as MATCHES YOUR TASTE, POPULAR (60%), WELL-REVIEWED, LOOKED AT BUT NEVER ORDERED, and HAS FLAVORS YOU LIKE. There is no numerical scoring, just facts the agent can reason about.
—Agent Selection: The agent receives the full user narrative: taste profile, spice tolerance, learned flavor preferences, hunger level, cravings, adventure-vs-safe slider, dining occasion, free-text mood input, history at this restaurant, and what's popular. It reasons about meal composition, honors cravings, and writes personal explanations per dish.
—Feedback Loop: After ordering, the user rates dishes with quick-tap tags and optional free-text notes. The LLM extracts taste signals from the text. “Loved the cream sauce” becomes a liked: [“cream”, “rich sauce”] signal that boosts similar dishes in future visits.
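
The dietary filtering and signal enrichment steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `Dish` fields, the short `FALLBACK_KEYWORDS` list (the real list is described as 30+ terms), and the 0.75 / 0.5 thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical, abbreviated stand-in for the keyword fallback list.
FALLBACK_KEYWORDS = {
    "vegetarian": ["chicken", "beef", "pork", "bacon", "anchovy", "fish sauce"],
    "dairy-free": ["cheese", "cream", "butter", "parmesan", "burrata"],
}


@dataclass
class Dish:
    name: str
    description: str
    # Restrictions the LLM tagged this dish as violating.
    # None means the menu was parsed before LLM tagging existed.
    violates: Optional[Set[str]] = None
    popularity: float = 0.0        # assumed share of mentions/orders, 0..1
    taste_similarity: float = 0.0  # assumed cosine similarity to the taste profile
    viewed: bool = False
    ordered: bool = False


def violates_restriction(dish: Dish, restriction: str) -> bool:
    """Prefer LLM-generated flags; fall back to keyword matching for old menus."""
    if dish.violates is not None:
        return restriction in dish.violates
    text = f"{dish.name} {dish.description}".lower()
    return any(term in text for term in FALLBACK_KEYWORDS.get(restriction, []))


def filter_dishes(dishes: List[Dish], restrictions: List[str]) -> List[Dish]:
    """The rigid step: drop any dish that violates any restriction."""
    return [d for d in dishes
            if not any(violates_restriction(d, r) for r in restrictions)]


def enrich(dish: Dish) -> List[str]:
    """Attach readable flags (no numeric score) for the agent to reason about."""
    flags = []
    if dish.taste_similarity >= 0.75:  # threshold is an assumption
        flags.append("MATCHES YOUR TASTE")
    if dish.popularity >= 0.5:         # threshold is an assumption
        flags.append(f"POPULAR ({dish.popularity:.0%})")
    if dish.viewed and not dish.ordered:
        flags.append("LOOKED AT BUT NEVER ORDERED")
    return flags
```

Keeping the filter as plain code while leaving everything else to the agent reflects the split described above: dietary restrictions are a hard constraint, while the flags are only hints.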

Research Foundations

The design is informed by Microsoft’s RecAI framework (Lian et al., ACM Web Conference 2024): the “LLM-as-brain, traditional-models-as-tools” pattern where traditional signals handle candidate generation and the LLM handles final reasoning. The serendipity slot draws from SERAL (Chen et al., “Serendipity-Enhanced Recommender Agent with LLM,” arXiv 2502.07132, Feb 2025) on filter bubble mitigation. The implicit negative feedback model follows Hu, Koren & Volinsky’s foundational work on collaborative filtering for implicit feedback datasets (IEEE ICDM 2008).

How It Learns

Thompson Sampling for Weight Learning

The 8-component scoring algorithm doesn't use fixed weights. Each user has Bayesian priors (alpha/beta per component) that update every time they rate a dish. Over time, the system learns whether a specific user responds more to popularity signals vs. personal taste matching vs. craving alignment, without needing a cold-start dataset. After ~10 ratings, the weights diverge meaningfully from the uniform prior.
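
One way the per-user alpha/beta update might look is sketched below. The uniform Beta(1, 1) starting prior, the 4-star "liked" threshold, the credit assignment proportional to each component's contribution, and the snake_case component names are all assumptions for illustration; the source only states that each component has alpha/beta priors updated on every rating.

```python
import random

# Names adapted from the 8 scoring components; identifiers are hypothetical.
COMPONENTS = [
    "personal_taste", "craving_match", "hunger", "popularity",
    "dietary", "cuisine_affinity", "price_fit", "friend_boost",
]


def new_priors():
    """Uniform Beta(1, 1) prior per component (an assumed starting point)."""
    return {c: {"alpha": 1.0, "beta": 1.0} for c in COMPONENTS}


def update(priors, component_scores, rating, liked_threshold=4):
    """Credit or blame each component in proportion to how much it drove the pick.

    component_scores maps component name -> contribution in [0, 1] for the
    dish that was rated.
    """
    liked = rating >= liked_threshold
    for component, score in component_scores.items():
        key = "alpha" if liked else "beta"
        priors[component][key] += score


def sample_weights(priors, rng=random):
    """Thompson sampling: draw one value per Beta posterior, then normalize."""
    draws = {c: rng.betavariate(p["alpha"], p["beta"]) for c, p in priors.items()}
    total = sum(draws.values())
    return {c: d / total for c, d in draws.items()}


def mean_weights(priors):
    """Posterior-mean weights, useful for inspecting what has been learned."""
    means = {c: p["alpha"] / (p["alpha"] + p["beta"]) for c, p in priors.items()}
    total = sum(means.values())
    return {c: m / total for c, m in means.items()}
```

Sampling rather than always using the posterior mean keeps some exploration in the weights, so a component that looked weak early on still gets occasional chances to prove itself.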

Embedding-Based Taste Compatibility

Both the user's taste profile and each dish description are embedded into the same vector space via gemini-embedding-001, then scored by cosine similarity. Someone who likes "creamy burrata" will score well on "stracciatella with olive oil" even though no keywords overlap. Two API calls total: one for the taste profile, one batch for all candidates.
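
The similarity step itself is small once the vectors exist. The sketch below assumes the embeddings have already been fetched and uses toy 3-dimensional vectors in place of real gemini-embedding-001 output; `rank_by_taste` is a hypothetical helper name, and no embedding API call is shown.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_by_taste(profile_vec, dish_vecs):
    """Return (dish name, similarity) pairs, best match first."""
    scored = [(name, cosine_similarity(profile_vec, vec))
              for name, vec in dish_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings.
profile = [0.9, 0.1, 0.0]  # e.g. a user who likes "creamy burrata"
dishes = {
    "stracciatella with olive oil": [0.8, 0.2, 0.1],
    "spicy chicken wings": [0.0, 0.2, 0.9],
}
ranking = rank_by_taste(profile, dishes)
```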

Review Sentiment Decomposition

Google Places reviews are processed through the LLM to extract per-dish sentiment. “The cacio e pepe was transcendent but the tiramisu was dry” gets decomposed into dish-level praise and criticism scores that feed directly into the recommendation's customer_praise component. Cached 14 days in Supabase to stay within the Places API free tier.
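
As a rough sketch of what happens after the LLM has extracted dish mentions, the snippet below rolls them up per dish and checks the 14-day cache window. The mention format (a `dish` name plus a `sentiment` in [-1, 1]) and the function names are assumptions; the LLM call and the Supabase read/write are omitted.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

CACHE_TTL = timedelta(days=14)  # matches the 14-day cache described above


def is_fresh(cached_at, now=None):
    """True if a cached review analysis is still inside the 14-day window."""
    now = now or datetime.now(timezone.utc)
    return now - cached_at < CACHE_TTL


def aggregate_mentions(mentions):
    """Roll per-review dish mentions up into per-dish counts and averages.

    mentions: list of {"dish": str, "sentiment": float}, an assumed shape for
    what the LLM might return for each review.
    """
    by_dish = defaultdict(list)
    for mention in mentions:
        by_dish[mention["dish"].strip().lower()].append(mention["sentiment"])
    return {
        dish: {"mentions": len(scores), "avg_sentiment": sum(scores) / len(scores)}
        for dish, scores in by_dish.items()
    }
```

Normalizing dish names before grouping matters here because reviewers rarely spell or capitalize a dish the same way twice.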

System Design

Architecture Decisions

—Multi-modal menu ingestion: Three input paths (URL/HTML scraping, PDF extraction via PyMuPDF, camera photo via LLM vision) all normalize into the same ParsedDish schema. Auto-detects content type from response headers with byte-sniffing fallback.
—Composite scoring with 8 independent components: Personal taste (embedding similarity), craving match, hunger appropriateness, popularity/sentiment, dietary compliance, cuisine affinity, price fit, and friend boost. The system can explain exactly why a dish was recommended by surfacing which components dominated.
—Behavioral signals as separate normalized tables: dish_views, dish_ratings, dish_orders, and dish_favorites are kept apart rather than merged into a single interactions table. This enables efficient per-signal queries and signal-specific columns (hunger_level_when_ordered on orders, taste_signals JSONB on ratings).
—Cold start via cross-user popularity: New users with no history get recommendations weighted toward what other users ordered and toward review sentiment. Free-text mood input ("celebrating tonight") gives the agent rich context even without rating history.
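
The content-type detection with byte-sniffing fallback mentioned in the first item above could look something like this. It is a sketch under assumptions: the function name, the returned labels, and the exact set of signatures checked are illustrative, though the magic bytes themselves are the standard PDF, JPEG, and PNG file signatures.

```python
# Standard file signatures, checked only when headers are missing or unhelpful.
MAGIC_BYTES = [
    (b"%PDF-", "pdf"),
    (b"\xff\xd8\xff", "image"),       # JPEG
    (b"\x89PNG\r\n\x1a\n", "image"),  # PNG
]


def detect_menu_type(content_type, body):
    """Pick an ingestion path: trust the header first, then sniff the bytes."""
    header = (content_type or "").split(";")[0].strip().lower()
    if header == "application/pdf":
        return "pdf"
    if header.startswith("image/"):
        return "image"
    if header in ("text/html", "application/xhtml+xml"):
        return "html"

    # Header was absent or generic (e.g. application/octet-stream).
    for magic, kind in MAGIC_BYTES:
        if body.startswith(magic):
            return kind

    head = body[:512].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"
```

Each label would then route to the matching parser (HTML scraping, PDF extraction, or LLM vision), with all three producing the same ParsedDish records.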
