An opinionated news summarizer racing the clock
Compression is the easy part. A news summarizer makes calls on sources, context, and trust while racing a clock. The text is short. The system is opinionated by design. 🗞️👇
I found this note from Apr 15, 2017 while cleaning out my old notes. I hacked “Quickie News” in Python (yes, really!) to do what Inshorts did with humans. Today, it looks a lot like what Perplexity does with LLMs. Here’s how it aged.
2017 reality in plain text:
- Most production systems were extractive: TextRank/LexRank.
- Early abstractive seq2seq showed promise; pointer-generator with coverage cut repetition (tiny sketch after this list).
- Training data was CNN/DailyMail and Gigaword; ROUGE drove tuning.
- Pipeline: crawl RSS, cluster near-duplicates, pick a centroid, write a tight blurb, attach 2 to 3 links.
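On the coverage trick in the second bullet: this isn’t anything I shipped, just a toy sketch of the coverage penalty pointer-generator models used to discourage re-attending to (and so repeating) the same source tokens. Numpy and the array shapes are my assumptions here.

```python
import numpy as np

def coverage_loss(attentions: np.ndarray) -> float:
    """Coverage penalty in the pointer-generator spirit: penalize a decoder
    step for attending to source positions earlier steps already covered.

    attentions: shape (decoder_steps, source_len), one attention
    distribution per generated token.
    """
    coverage = np.zeros(attentions.shape[1])  # running sum of past attention
    loss = 0.0
    for step_attention in attentions:
        # overlap between this step's attention and what is already covered
        loss += np.minimum(step_attention, coverage).sum()
        coverage += step_attention
    return loss
```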
The pipeline worked (rough sketch below), but discourse was brittle, entities drifted, and blurbs felt shallow.
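Here is the pipeline shape from memory. This is not the original Quickie News script; feedparser, scikit-learn, the feed URL, the similarity threshold, and the two-sentence blurb cutoff are all stand-ins.

```python
import feedparser
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

FEEDS = ["https://example.com/rss"]  # placeholder feed list

def fetch_items(feeds):
    items = []
    for url in feeds:
        for entry in feedparser.parse(url).entries:
            items.append({"title": entry.get("title", ""),
                          "text": entry.get("summary", ""),
                          "link": entry.get("link", "")})
    return items

def cluster_near_duplicates(items, threshold=0.55):
    """Greedy single-pass clustering on TF-IDF cosine similarity."""
    texts = [item["title"] + " " + item["text"] for item in items]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    sims = cosine_similarity(tfidf)
    clusters, assigned = [], set()
    for i in range(len(items)):
        if i in assigned:
            continue
        members = [j for j in range(len(items))
                   if j not in assigned and sims[i, j] >= threshold]
        assigned.update(members)
        clusters.append(members)
    return clusters, tfidf

def centroid_blurb(cluster, items, tfidf, max_sentences=2, max_links=3):
    """Pick the story closest to the cluster centroid, cut a short blurb,
    and attach a few source links."""
    centroid = np.asarray(tfidf[cluster].mean(axis=0))
    closeness = cosine_similarity(tfidf[cluster], centroid).ravel()
    best = cluster[int(np.argmax(closeness))]
    sentences = items[best]["text"].split(". ")
    blurb = ". ".join(sentences[:max_sentences]).strip()
    links = [items[j]["link"] for j in cluster[:max_links]]
    return blurb, links

if __name__ == "__main__":
    items = fetch_items(FEEDS)
    clusters, tfidf = cluster_near_duplicates(items)
    for cluster in clusters:
        blurb, links = centroid_blurb(cluster, items, tfidf)
        print(blurb, links)
```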
2025 reality I use and see:
- LLMs do multi-document synthesis with retrieval, re-ranking, and citations (sketched after this list).
- You can steer style and length, and add critique loops for self-checks.
- Tradeoffs shifted: faithfulness and attribution matter more than raw brevity; latency and cost vs deeper verification; personalization vs a steady editorial voice.
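A rough sketch of that 2025 shape, under loud assumptions: `call_llm` and `search_index` are placeholders for whatever model client and retriever you actually run, and the prompts are illustrative, not any specific product’s.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    title: str
    url: str
    text: str

def call_llm(prompt: str) -> str:
    """Placeholder: swap in whatever chat-model client you use."""
    raise NotImplementedError

def search_index(query: str, k: int = 20) -> list[Doc]:
    """Placeholder: swap in your news retriever (BM25, embeddings, a search API)."""
    raise NotImplementedError

def rerank(query: str, docs: list[Doc], keep: int = 5) -> list[Doc]:
    # Crude LLM-as-reranker: score each candidate's relevance, keep the top few.
    scored = []
    for doc in docs:
        answer = call_llm(f"Rate 0-10 how relevant this is to '{query}'. "
                          f"Reply with a number only.\n\n{doc.text[:500]}")
        scored.append((float(answer.strip()), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:keep]]

def summarize_with_citations(query: str, style: str = "neutral, about 80 words") -> str:
    docs = rerank(query, search_index(query))
    sources = "\n\n".join(f"[{i + 1}] {doc.title} ({doc.url})\n{doc.text[:800]}"
                          for i, doc in enumerate(docs))
    draft = call_llm(
        f"Summarize the story '{query}' in a {style} register. "
        f"Cite sources inline as [n]. Use only these sources:\n\n{sources}"
    )
    # One critique loop: ask for unsupported claims, then revise if any are found.
    critique = call_llm(
        f"List any claims in this summary that the sources do not support. "
        f"Say 'none' if all are supported.\n\nSummary:\n{draft}\n\nSources:\n{sources}"
    )
    if "none" not in critique.lower():
        draft = call_llm(
            f"Revise the summary so every claim is supported and cited.\n\n"
            f"Summary:\n{draft}\n\nIssues:\n{critique}\n\nSources:\n{sources}"
        )
    return draft
```

That critique pass is where the latency-and-cost vs deeper-verification tradeoff above shows up most directly.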
What held up from 2017: treat the summary as an interface to explore.
What changed: synthesis quality, real-time grounding, and user expectations for transparency.
Open questions I’m still asking:
- Should summaries declare a lens and be opinionated, or start neutral with sliders for viewpoint?
- How do we audit bias and factuality at scale without blowing the budget or the latency?
- What licensing and attribution model actually pays upstream publishers and journalists?
Here’s the original note from 2017: https://github.com/AnandChowdhary/notes/blob/main/notes/2017/news-summarizer.md