Learning Et Al.
Learning Et Al. (“learning it all”). A daily research digest that finds, synthesizes, and contrasts academic papers and news articles based on your interests.
After leaving research, I didn’t want to stray from the literature, but I didn’t want to read entire papers either. I wanted to see what’s out there and find new things to be curious about.
It generates a central question from your interests before searching, then deliberately pulls in papers from adjacent fields so the pairings are cross-domain by design. Candidates are ranked by relevance and diversity, and an LLM picks the two or three that make the best argument together. Synthesis builds a structured skeleton before writing, so you get an argument, not parallel summaries.
One curated digest each morning, and you can’t regenerate it. The constraint is the product: engage with today’s papers or wait for tomorrow. It’s anti-engagement-maximizing by design. The value is in curation, not volume.
Why Not Just Summarize Papers?
Because a summary of papers from different fields just lists them side by side; the interesting part is the argument between them, and that has to be built, not summarized.
So the synthesis writes a structured skeleton first: which paper supports the theme, which complicates it, and where the tension lies. Only then is the prose written, scored across six dimensions, and revised. The skeleton-first approach draws on Tree of Thoughts (Yao et al., 2023) and Self-Refine (Madaan et al., 2023).
Each digest suggests questions targeting the detail most likely to make a reader want to know more: the glossed-over mechanism, the assumption doing heavy lifting. Generic questions (“What are the implications?”) are banned. Answers are pre-generated, so they’re instant.
The pipeline samples your interests, down-weighting recently used topics, then turns them into a central question (at most 8 words) and a set of search queries. If the question is too similar to a recent one, it tries a different angle.
For each query: OpenAlex first, then Semantic Scholar, then arXiv. About 10 results per query, deduplicated across sources.
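Cross-source deduplication can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the result dictionaries, their `doi` and `title` fields, and the helper names `paper_keys` and `dedupe` are all assumptions. It treats two results as the same paper if they share a normalized DOI or a normalized title, and keeps the first copy seen, so the earlier source wins.

```python
import re


def paper_keys(paper: dict) -> set[str]:
    """Return the identity keys for one result (hypothetical schema)."""
    keys = set()
    doi = (paper.get("doi") or "").lower().strip()
    # Strip a resolver prefix so "https://doi.org/10.x/y" equals "10.x/y".
    doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi)
    if doi:
        keys.add("doi:" + doi)
    # Lowercase the title and collapse punctuation and whitespace.
    title = re.sub(r"[^a-z0-9]+", " ", (paper.get("title") or "").lower()).strip()
    if title:
        keys.add("title:" + title)
    return keys


def dedupe(results: list[dict]) -> list[dict]:
    """Keep the first occurrence of each paper, in source-priority order."""
    seen: set[str] = set()
    unique = []
    for paper in results:
        keys = paper_keys(paper)
        is_duplicate = bool(keys & seen)
        seen |= keys  # remember every key, even for duplicates
        if not is_duplicate:
            unique.append(paper)
    return unique
```

Matching on either key matters because preprint listings often lack a DOI that the published version carries.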
Each candidate is scored by semantic similarity and keyword overlap, fused into one signal. Predatory journals are dropped; recent papers and strong venues get a small boost. Anything below threshold is cut, and if too few pass, the threshold relaxes or it restarts with a new theme.
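The fusion step is described later in these notes as BM25 plus embeddings combined with reciprocal rank fusion (RRF). A minimal sketch of RRF follows, assuming each scorer has already produced a ranked list of candidate IDs. The function name and the constant `k=60` (a commonly used default for RRF) are assumptions here, not values taken from the project.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Fuse several ranked lists into one score per candidate.

    Each candidate earns 1 / (k + rank) from every list it appears in,
    so agreement between rankers counts for more than any single
    ranker's raw score.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return scores


def fused_order(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Candidate IDs sorted from highest to lowest fused score."""
    scores = reciprocal_rank_fusion(rankings, k)
    return sorted(scores, key=scores.get, reverse=True)
```

Because RRF uses only ranks, it sidesteps the problem that cosine similarities and keyword scores live on different scales. Venue boosts and the cutoff threshold would be applied to the fused score afterwards.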
Six papers are selected so that each pick maximizes relevance while minimizing overlap with the others, which prevents six variations of the same finding.
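This kind of selection is usually called maximal marginal relevance (MMR), which the iteration notes also name. Below is a small greedy sketch, assuming each candidate already has a relevance score and an embedding vector. The trade-off weight `lam=0.7` and the function names are illustrative assumptions.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mmr_select(candidates, relevance, embeddings, n=6, lam=0.7):
    """Greedily pick up to n candidates, trading relevance against redundancy."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < n:

        def mmr_score(c):
            # Redundancy is the similarity to the closest paper already picked.
            redundancy = max(
                (cosine(embeddings[c], embeddings[s]) for s in selected),
                default=0.0,
            )
            return lam * relevance[c] - (1 - lam) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam` close to 1 this degenerates into plain top-N by relevance; lower values push harder toward diverse picks.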
The LLM picks the two or three papers that make the most interesting argument together: ones that support, complicate, or explain each other rather than just agreeing. Each gets a short nickname from the author’s name.
While papers are being scored, it searches the web for recent coverage of the theme. The number of news items is dynamic: with three or more strong papers, news is skipped; when the papers are thin, news fills the gap.
Multi-stage writing: skeleton, draft, self-critique for specificity and clichés, then targeted revision. A final check verifies every paper was cited correctly.
Each stage assumes the previous one may have erred. Metadata summaries are checked against the abstract; if one looks disconnected from its source, it falls back to the abstract’s first sentence. Drafts are checked for factual accuracy before style critique. A final coverage gate verifies each paper appears by name, re-inserting any dropped during revision.
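The detection half of the coverage gate could look something like the sketch below. It assumes each paper has a nickname, as described earlier, and checks that the nickname appears as a whole word in the draft. The `missing_papers` helper and the example nicknames are hypothetical, and how a dropped paper gets re-inserted is left out because the text does not specify it.

```python
import re


def missing_papers(draft: str, nicknames: dict[str, str]) -> list[str]:
    """Return the IDs of papers whose nickname never appears in the draft."""
    missing = []
    for paper_id, nickname in nicknames.items():
        # Lookarounds instead of \b so nicknames ending in punctuation still match.
        pattern = r"(?<!\w)" + re.escape(nickname) + r"(?!\w)"
        if not re.search(pattern, draft, flags=re.IGNORECASE):
            missing.append(paper_id)
    return missing
```

For example, a draft that mentions "Okafor" and "Lindqvist" but not "Reyes" would return the ID mapped to "Reyes", and a revision step could then be asked to restore that paper.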
New themes are compared against the last five by cosine similarity. Above 0.5, it retries with different interest combinations. Without this, themes converge to a template within weeks.
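The similarity check can be sketched as follows, using the window of five and the 0.5 threshold stated above. The theme embeddings are assumed to be plain lists of floats produced by some embedding model, and the function names are illustrative.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def is_too_similar(new_theme, recent_themes, threshold=0.5, window=5):
    """True if the new theme is too close to any of the last `window` themes.

    `recent_themes` is ordered oldest to newest.
    """
    return any(cosine(new_theme, past) > threshold for past in recent_themes[-window:])
```

A caller would retry with a different combination of interests while `is_too_similar` returns True, ideally with a cap on the number of retries.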
Topics lose weight daily (×0.95) with a penalty for recent use. Selection is weighted-random rather than top-N, so low-weight interests still surface. Engagement signals are kept small: one starred paper once dominated the feed.
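A rough sketch of the decay and the weighted-random draw is below. Only the 0.95 daily factor comes from the text. The recent-use penalty of 0.5, the three-day window, the choice to count decay in idle days, and the function names are all assumptions made for illustration.

```python
import random

DAILY_DECAY = 0.95  # from the text: topics lose weight daily (x0.95)


def effective_weight(weight, days_idle, days_since_used,
                     recent_penalty=0.5, recent_window=3):
    """Decay a topic's weight and penalize it if it was used recently.

    `recent_penalty` and `recent_window` are hypothetical values.
    """
    w = weight * DAILY_DECAY ** days_idle
    if days_since_used is not None and days_since_used < recent_window:
        w *= recent_penalty
    return w


def sample_interests(weights: dict[str, float], k: int, rng: random.Random) -> list[str]:
    """Weighted-random sampling without replacement.

    Unlike top-N, every topic with a positive weight has some chance
    of being drawn. Weights must be positive.
    """
    pool = dict(weights)
    picked = []
    for _ in range(min(k, len(pool))):
        names = list(pool)
        choice = rng.choices(names, weights=[pool[n] for n in names], k=1)[0]
        picked.append(choice)
        del pool[choice]
    return picked
```

Passing a seeded `random.Random` makes a day's draw reproducible for debugging, while a fresh instance gives the usual variety.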
Models route around banned strings: banning “here’s where it gets interesting” produces “here’s where it gets messier.” So the self-critique scans for pattern shapes, not literal phrases. Vague claims (“barriers”, “limitations”) need a concrete example in the same sentence or they’re dropped.
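Scanning for shapes rather than strings might look like the sketch below. The two checks are simplified stand-ins: the `reveal_teaser` regex, the vague-term list, and the markers that count as "a concrete example" (a digit or a phrase like "such as") are assumptions chosen for illustration, not the project's actual rules.

```python
import re

# One regex per pattern family: this one matches any
# "here's where it gets <word>" variant, not a single banned phrase.
PATTERN_FAMILIES = {
    "reveal_teaser": re.compile(r"here[’']s where (?:it|things) gets? \w+", re.IGNORECASE),
}

VAGUE_TERMS = re.compile(r"\b(?:barriers|limitations)\b", re.IGNORECASE)
# Hypothetical heuristic: a number or an example phrase counts as concrete.
EXAMPLE_MARKERS = re.compile(r"\d|\b(?:such as|for example|for instance|including)\b", re.IGNORECASE)


def flag_sentences(text: str) -> list[tuple[str, str]]:
    """Return (sentence, reason) pairs for sentences that need revision."""
    flags = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        for name, pattern in PATTERN_FAMILIES.items():
            if pattern.search(sentence):
                flags.append((sentence, name))
        if VAGUE_TERMS.search(sentence) and not EXAMPLE_MARKERS.search(sentence):
            flags.append((sentence, "vague_claim"))
    return flags
```

The same regex catches both "here’s where it gets interesting" and "here’s where it gets messier", which is the point of matching the family instead of the literal phrase.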
Started with a “best paper” anchor, which was scrapped when highly cited papers pulled in methods from the wrong field. Mandatory theme revision helped, then backfired when the LLM warped themes to fit weak papers. Conditional revision with a “keep if it fits” exit works better.
Citation graph → keyword matching → embedding-only → BM25+embedding RRF with MMR diversity. Along the way: hard blocklists for predatory publishers, soft penalties for high-volume journals, and a domain gate added after selecting for complementarity started producing cross-field analogies instead of connected papers.
Iterated from a single call to a 7-call pipeline to the current skeleton-first approach. The prose stayed vague and pattern-heavy until I moved from banned phrases to pattern families, and from style rules to factual requirements.
Hardcoded RSS → DuckDuckGo scraping (broke on one CSS change) → Serper/DDG with User-Agent rotation and field-specific RSS fallback.
Past digests live in a searchable archive where you can browse themes over time and compare any two papers side by side. Brutalist aesthetic: hard borders, box shadows, crosshair cursor, accent colors only in tags.