Solo Project · RecSys · LLM Agents

Learning Et Al.

Learning Et Al. (“learning it all”). A daily research digest that finds, synthesizes, and contrasts academic papers and news articles based on your interests.

Year: 2026
Role: Solo (Product · Design · Full-Stack)
Context: Personal Project · End-to-End Ownership
Tools: Next.js 16 · Turso/libsql · Drizzle ORM · Tailwind + shadcn/ui · ONNX Embeddings · Vercel
After leaving research, I didn’t want to stray from the literature, but I didn’t want to read entire papers either. I wanted to see what’s out there and find new things to be curious about.
The Core Idea

It generates a central question from your interests before searching, then deliberately pulls in papers from adjacent fields so the pairings are cross-domain by design. Candidates are ranked by relevance and diversity, and an LLM picks the two or three that make the best argument together. Synthesis builds a structured skeleton before writing, so you get an argument, not parallel summaries.

One Digest Per Day

One curated digest each morning, and you can’t regenerate it. The constraint is the product: engage with today’s papers or wait for tomorrow. It’s anti-engagement-maximizing by design. The value is in curation, not volume.

Why Not Just Summarize Papers?

Because a summary of papers from different fields just lists them side by side; the interesting part is the argument between them, and that has to be built, not summarized.

[Pipeline diagram. Discovery: Interests (decay + rotation, weighted random) → Question gen (theme-first, novelty enforced) → Paper search (BM25 + embeddings, RRF + MMR diversity). Synthesis (15+ LLM calls): Skeleton (roles + tensions, structured JSON) → Prose (argument arc, not summaries) → Critique + revise (self-refine, mandatory revision). Output: Digest (one per day, gap-based Q&A).]
The Synthesis Pipeline

The synthesis writes a structured skeleton first: which paper supports the theme, which complicates it, where the tension is. Only then is prose written, scored across six dimensions, and revised. The skeleton-first approach draws on Tree of Thoughts (Yao et al., 2023) and Self-Refine (Madaan et al., 2023).
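A minimal sketch of what such a skeleton could look like as structured JSON, with a check that runs before any prose is written. The field names, the role labels, and the validation rules are assumptions for illustration, not the project's actual schema.

```typescript
// Hypothetical skeleton shape: one role per paper plus the tension between them.
type Role = "supports" | "complicates" | "explains";

interface SkeletonPaper {
  nickname: string; // short name used to cite the paper in the prose
  role: Role;
  claim: string; // the specific finding this paper contributes
}

interface Skeleton {
  question: string;
  papers: SkeletonPaper[];
  tension: string; // where the papers pull against each other
}

// Returns a list of problems; an empty list means the skeleton is usable.
function validateSkeleton(skeleton: Skeleton): string[] {
  const problems: string[] = [];
  if (skeleton.papers.length < 2 || skeleton.papers.length > 3) {
    problems.push("expected two or three papers");
  }
  if (skeleton.tension.trim() === "") {
    problems.push("tension is empty");
  }
  for (const paper of skeleton.papers) {
    if (paper.claim.trim() === "") {
      problems.push("paper " + paper.nickname + " has no claim");
    }
  }
  return problems;
}
```

Rejecting a skeleton with an empty tension field is the cheap way to catch parallel summaries before any prose exists.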

Follow-Up Questions

Each digest suggests questions targeting the detail most likely to make a reader want to know more: the glossed-over mechanism, the assumption doing heavy lifting. Generic questions (“What are the implications?”) are banned. Answers are pre-generated, so they’re instant.

How It Works
Interest sampling

It samples your interests, down-weighting recent topics, then turns them into a central question (max 8 words) and search queries. If the question is too similar to a recent one, it tries a different angle.

Paper fetching

For each query: OpenAlex first, then Semantic Scholar, then arXiv. About 10 results per query, deduplicated across sources.
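Deduplication across sources can be sketched as below. Matching on DOI or normalized title is an assumption, as are the `Candidate` fields; the only thing taken from the description above is that results arrive in source-priority order and duplicates are dropped.

```typescript
// Hypothetical candidate shape; field names are assumptions.
interface Candidate {
  title: string;
  doi?: string;
  source: "openalex" | "semantic_scholar" | "arxiv";
}

// Lowercase and collapse punctuation so trivial differences don't defeat matching.
function normalizeTitle(title: string): string {
  return title.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Keeps the first occurrence of each paper, so input order encodes source priority.
function dedupe(candidates: Candidate[]): Candidate[] {
  const seen = new Set<string>();
  const unique: Candidate[] = [];
  for (const candidate of candidates) {
    const keys = ["title:" + normalizeTitle(candidate.title)];
    if (candidate.doi) keys.push("doi:" + candidate.doi.toLowerCase());
    if (keys.some((key) => seen.has(key))) continue;
    keys.forEach((key) => seen.add(key));
    unique.push(candidate);
  }
  return unique;
}
```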

Relevance scoring

Each candidate is scored by semantic similarity and keyword overlap, fused into one signal. Predatory journals are dropped; recent papers and strong venues get a small boost. Anything below threshold is cut, and if too few pass, the threshold relaxes or it restarts with a new theme.
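One way to fuse the two signals is reciprocal rank fusion (RRF), which combines rank positions rather than raw scores. The sketch below assumes each scorer has already produced an ordered list of paper IDs; the constant `k = 60` is the conventional default from the RRF literature, not a value taken from this project.

```typescript
// RRF: score(d) = sum over rankings of 1 / (k + rank of d), with ranks starting at 1.
function reciprocalRankFusion(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + index + 1));
    });
  }
  return scores;
}

// Returns IDs sorted by fused score, best first.
function fusedOrder(rankings: string[][], k = 60): string[] {
  return Array.from(reciprocalRankFusion(rankings, k).entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Because only ranks matter, keyword scores and cosine similarities never need to be normalized onto a shared scale.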

Diversity pool

Six papers are selected so each pick maximizes relevance while minimizing overlap with the others. This prevents six variations of the same finding.
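Greedy selection of this kind is usually written as maximal marginal relevance (MMR). A minimal sketch, assuming each candidate carries a relevance score and an embedding vector; the `lambda` trade-off value and the `Scored` shape are assumptions.

```typescript
// Cosine similarity between two equal-length vectors; returns 0 for zero vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Hypothetical candidate shape for illustration.
interface Scored {
  id: string;
  relevance: number;
  embedding: number[];
}

// Repeatedly picks the candidate with the best relevance-minus-redundancy score.
function mmrSelect(pool: Scored[], count: number, lambda = 0.7): Scored[] {
  const remaining = pool.slice();
  const picked: Scored[] = [];
  while (picked.length < count && remaining.length > 0) {
    let bestIndex = 0;
    let bestScore = -Infinity;
    remaining.forEach((candidate, index) => {
      // Redundancy is the similarity to the closest already-picked paper.
      const maxSimilarity =
        picked.length === 0
          ? 0
          : Math.max(...picked.map((p) => cosine(candidate.embedding, p.embedding)));
      const score = lambda * candidate.relevance - (1 - lambda) * maxSimilarity;
      if (score > bestScore) {
        bestScore = score;
        bestIndex = index;
      }
    });
    picked.push(remaining.splice(bestIndex, 1)[0]);
  }
  return picked;
}
```

A lower `lambda` favors diversity: a near-duplicate of an already-picked paper loses to a less relevant but different one.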

Complementarity

The LLM picks the two or three papers that make the most interesting argument together: ones that support, complicate, or explain each other rather than just agreeing. Each gets a short nickname from the author’s name.

News (parallel)

While papers are scored, it searches the web for recent coverage of the theme. The count is dynamic: with three or more strong papers, news is skipped; when papers are thin, news fills the gap.
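The dynamic count can be sketched as a small function. The score cutoff for a "strong" paper and the fill-to-three rule are assumptions; the description above only fixes the endpoint, that three or more strong papers means no news.

```typescript
// Hypothetical rule: top the digest up to `target` items using news.
function newsItemsNeeded(
  paperScores: number[],
  strongThreshold = 0.6, // assumed cutoff for a "strong" paper
  target = 3
): number {
  const strong = paperScores.filter((score) => score >= strongThreshold).length;
  return Math.max(0, target - strong);
}
```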

Synthesis

Multi-stage writing: skeleton, draft, self-critique for specificity and clichés, then targeted revision. A final check verifies every paper was cited correctly.

Self-Correcting Loops

Each stage assumes the previous one may have erred. Metadata summaries are checked against the abstract; if one looks disconnected from its source, it falls back to the abstract’s first sentence. Drafts are checked for factual accuracy before style critique. A final coverage gate verifies each paper appears by name, re-inserting any dropped during revision.
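The final coverage gate can be sketched as follows. Appending a prepared fallback sentence is an assumed re-insertion strategy, and the `CitedPaper` shape is hypothetical.

```typescript
// Hypothetical shape: the nickname used in prose plus a sentence to fall back on.
interface CitedPaper {
  nickname: string;
  fallbackSentence: string;
}

// Escapes regex metacharacters so a nickname is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Papers whose nickname does not appear as a whole word in the draft.
function missingPapers(draft: string, papers: CitedPaper[]): CitedPaper[] {
  return papers.filter((paper) => {
    const pattern = new RegExp("\\b" + escapeRegExp(paper.nickname) + "\\b", "i");
    return !pattern.test(draft);
  });
}

// Appends a fallback sentence for each paper that was dropped during revision.
function enforceCoverage(draft: string, papers: CitedPaper[]): string {
  const missing = missingPapers(draft, papers);
  if (missing.length === 0) return draft;
  const body = draft.replace(/\s+$/, "");
  return [body, ...missing.map((paper) => paper.fallbackSentence)].join("\n\n");
}
```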

Staying Interesting
Theme novelty

New themes are compared against the last five by cosine similarity. Above 0.5, it retries with different interest combinations. Without this, themes converge to a template within weeks.
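A minimal sketch of the novelty check, assuming each theme has already been embedded as a vector and that `recent` is ordered oldest first. The function names are hypothetical; the window of five and the 0.5 threshold come from the description above.

```typescript
// Cosine similarity between two equal-length vectors; returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// A theme is novel if it is not too similar to any of the last `window` themes.
function isNovelTheme(
  candidate: number[],
  recent: number[][], // oldest first (assumption)
  threshold = 0.5,
  window = 5
): boolean {
  return recent
    .slice(-window)
    .every((previous) => cosineSimilarity(candidate, previous) <= threshold);
}
```

When this returns false, the caller would retry with a different combination of interests.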

Interest decay

Topics lose weight daily (×0.95) with a penalty for recent use. Selection is weighted-random rather than top-N, so low-weight interests still surface. Engagement signals are kept small: one starred paper once dominated the feed.
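Decay and weighted-random selection can be sketched together. The ×0.95 daily factor is from the text above; the size of the recency penalty, the cooldown length, and the `Interest` fields are assumptions.

```typescript
// Hypothetical interest record.
interface Interest {
  topic: string;
  weight: number;
  daysSinceUsed: number;
}

// Applies one day of decay to every interest.
function decayInterests(interests: Interest[], factor = 0.95): Interest[] {
  return interests.map((interest) => ({ ...interest, weight: interest.weight * factor }));
}

// Recently used topics are down-weighted (penalty values are assumed).
function effectiveWeight(interest: Interest, recencyPenalty = 0.5, cooldownDays = 3): number {
  return interest.daysSinceUsed < cooldownDays
    ? interest.weight * recencyPenalty
    : interest.weight;
}

// Weighted-random pick: selection probability is proportional to effective weight.
function sampleInterest(interests: Interest[], random: () => number = Math.random): Interest {
  if (interests.length === 0) throw new Error("no interests to sample from");
  const weights = interests.map((interest) => effectiveWeight(interest));
  const total = weights.reduce((sum, weight) => sum + weight, 0);
  let remaining = random() * total;
  for (let i = 0; i < interests.length; i++) {
    remaining -= weights[i];
    if (remaining < 0) return interests[i];
  }
  return interests[interests.length - 1]; // guards against rounding at the upper edge
}
```

Unlike top-N selection, every interest with nonzero weight keeps some chance of being picked, which is what lets low-weight topics resurface.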

Antipattern prompting

Models route around banned strings: banning “here’s where it gets interesting” produces “here’s where it gets messier.” So the self-critique scans for pattern shapes, not literal phrases. Vague claims (“barriers”, “limitations”) need a concrete example in the same sentence or they’re dropped.
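Scanning for pattern shapes rather than literal phrases can be sketched with regular expressions. The single pattern family and the markers that count as a "concrete example" below are illustrative assumptions, not the project's actual lists, and the sentence splitter is deliberately naive.

```typescript
// Each family matches a shape, so swapping the final word does not evade it.
const PATTERN_FAMILIES: { name: string; pattern: RegExp }[] = [
  { name: "heres-where-it-gets-X", pattern: /\bhere['’]s where (?:it|things) gets? \w+/i },
];

// Names of the families that appear anywhere in the text.
function findPatternHits(text: string): string[] {
  return PATTERN_FAMILIES.filter((family) => family.pattern.test(text)).map(
    (family) => family.name
  );
}

const VAGUE_TERMS = /\b(?:barriers|limitations)\b/i;
// Assumed signals that a sentence grounds its claim: an example marker or a number.
const CONCRETE_MARKERS = /\b(?:such as|for example|including)\b|\d/i;

// Sentences that use a vague term without any concrete marker alongside it.
function vagueSentences(text: string): string[] {
  const sentences = text.match(/[^.!?]+[.!?]*/g) ?? [];
  return sentences
    .map((sentence) => sentence.trim())
    .filter((sentence) => VAGUE_TERMS.test(sentence) && !CONCRETE_MARKERS.test(sentence));
}
```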

Things I Reworked
Theme generation

Started with a “best paper” anchor, scrapped when highly-cited papers pulled in wrong-field methods. Mandatory theme revision helped, then backfired when the LLM warped themes to fit weak papers. Conditional revision with a “keep if it fits” exit works better.

Paper selection and filtering

Citation graph → keyword matching → embedding-only → BM25+embedding RRF with MMR diversity. Hard blocklists for predatory publishers, soft penalties for high-volume journals, and a domain gate, added once the complementarity step started producing cross-field analogies instead of connected papers.

Synthesis quality

Iterated from a single call to a 7-call pipeline to the current skeleton-first approach. The prose stayed vague and pattern-heavy until I moved from banned phrases to pattern families, and from style rules to factual requirements.

News sources

Hardcoded RSS → DuckDuckGo scraping (broke on one CSS change) → Serper/DDG with User-Agent rotation and field-specific RSS fallback.

The Vault

Past digests live in a searchable archive where you can browse themes over time and compare any two papers side by side. Brutalist aesthetic: hard borders, box shadows, crosshair cursor, accent colors only in tags.
