Free Audible on LibriVox
I had a silly but stubborn idea: a free Audible on top of LibriVox. I wasn’t trying to outsmart distribution. I was betting that open catalogs plus great UX can compete when discovery, metadata, and listening ergonomics are excellent. The product is how it feels to listen, not the MP3s. 🎧👇
Found this in my notes from Aug 16, 2017. Audible was the default, React Native was finally usable for real apps, and I’d just shipped Saga Music with youtube-dl (yes, really). The thought was simple: same playbook, different source - LibriVox’s public‑domain books with Saga‑style polish.
Tech reality check: LibriVox recordings are public domain and mostly live on the Internet Archive. Legally clean, practically messy - variable audio quality, uneven metadata, inconsistent chaptering, and multiple editions per title.
A 2025 build starts as a data project:
- Ingest LibriVox/Archive feeds
- Hash/fingerprint and dedupe editions
- Normalize into chaptered manifests (m4b-like)
- Enrich with Wikidata/ISNI, subjects, languages
- Index for search/browse
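To make the dedupe and normalize steps concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the `RawEdition` shape is hypothetical (real LibriVox and Internet Archive records look different), and the "fingerprint" is just a metadata key built from title, author, reader, and runtime rounded to the minute, not an audio fingerprint.

```typescript
// Hypothetical input shape; real LibriVox / Internet Archive records differ.
interface RawTrack { title: string; url: string; seconds: number }
interface RawEdition {
  sourceId: string;
  title: string;
  author: string;
  reader: string;
  tracks: RawTrack[];
}

// Chaptered manifest: one entry per track, with a running start offset.
interface Chapter {
  index: number;
  title: string;
  url: string;
  startSeconds: number;
  durationSeconds: number;
}
interface Manifest {
  editionKey: string;
  title: string;
  author: string;
  reader: string;
  totalSeconds: number;
  chapters: Chapter[];
  sourceIds: string[]; // every source record that collapsed into this edition
}

// ASCII-only simplification: lowercase, then collapse anything that is not
// a-z or 0-9 into a single space. Non-Latin titles would need a better rule.
function normalizeText(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Metadata-based key. Rounding runtime to the minute lets re-encodes that
// differ by a few seconds match; values near a rounding boundary can still split.
function editionKey(e: RawEdition): string {
  const total = e.tracks.reduce((sum, t) => sum + t.seconds, 0);
  const minutes = Math.round(total / 60);
  return [
    normalizeText(e.title),
    normalizeText(e.author),
    normalizeText(e.reader),
    String(minutes),
  ].join("|");
}

function toManifest(e: RawEdition): Manifest {
  let cursor = 0;
  const chapters = e.tracks.map((t, i): Chapter => {
    const chapter: Chapter = {
      index: i,
      title: t.title.trim(),
      url: t.url,
      startSeconds: cursor,
      durationSeconds: t.seconds,
    };
    cursor += t.seconds;
    return chapter;
  });
  return {
    editionKey: editionKey(e),
    title: e.title.trim(),
    author: e.author.trim(),
    reader: e.reader.trim(),
    totalSeconds: cursor,
    chapters,
    sourceIds: [e.sourceId],
  };
}

// Keep the first edition seen for each key and record later duplicates on it.
function buildManifests(editions: RawEdition[]): Manifest[] {
  const byKey: Record<string, Manifest> = {};
  const order: string[] = [];
  for (const e of editions) {
    const m = toManifest(e);
    const existing = byKey[m.editionKey];
    if (existing) {
      existing.sourceIds.push(e.sourceId);
    } else {
      byKey[m.editionKey] = m;
      order.push(m.editionKey);
    }
  }
  return order.map((k) => byKey[k]);
}
```

Two uploads of the same reading collapse into one manifest with both source IDs, while a different reader of the same book stays a separate edition. That is probably the right default, since narrator is most of what distinguishes one edition from another.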
Backend: worker queue to fetch/transform, Postgres for manifests, Meilisearch for search, CDN with range requests, optional edge cache (R2/Backblaze). App: React Native + TrackPlayer (ExoPlayer on Android, AVPlayer on iOS), offline-first caching, precise chapter progress, playback speed, EQ, silence trimming, bookmarks, and solid resume across devices.
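"Precise chapter progress" and "resume across devices" mostly come down to two small pure functions, sketched here under assumptions. The `ChapterSpan` and `Progress` shapes are hypothetical, position is stored as one book-wide offset in seconds, and conflicts are resolved by a simple last-write-wins rule. None of this is TrackPlayer's API; the player would only be told which chapter and offset to seek to.

```typescript
// Hypothetical shapes for illustration only.
interface ChapterSpan { startSeconds: number; durationSeconds: number }
interface Progress {
  bookId: string;
  positionSeconds: number; // offset from the start of the whole book
  updatedAtMs: number;     // client timestamp of the last write
  deviceId: string;
}

// Map a book-wide position to (chapter, offset). Chapters must be sorted by
// startSeconds. Binary search finds the last chapter starting at or before pos.
function locateChapter(
  chapters: ChapterSpan[],
  positionSeconds: number
): { chapterIndex: number; offsetSeconds: number } {
  if (chapters.length === 0) throw new Error("manifest has no chapters");
  const last = chapters[chapters.length - 1];
  const total = last.startSeconds + last.durationSeconds;
  // Clamp so stale or corrupt positions never point outside the book.
  const pos = Math.min(Math.max(positionSeconds, 0), total);
  let lo = 0;
  let hi = chapters.length - 1;
  while (lo < hi) {
    const mid = Math.ceil((lo + hi) / 2);
    if (chapters[mid].startSeconds <= pos) lo = mid;
    else hi = mid - 1;
  }
  return { chapterIndex: lo, offsetSeconds: pos - chapters[lo].startSeconds };
}

// Last-write-wins merge of local and remote progress for the same book.
// On a timestamp tie, prefer the position further into the book.
function mergeProgress(local: Progress, remote: Progress): Progress {
  if (local.bookId !== remote.bookId) throw new Error("bookId mismatch");
  if (local.updatedAtMs !== remote.updatedAtMs) {
    return local.updatedAtMs > remote.updatedAtMs ? local : remote;
  }
  return local.positionSeconds >= remote.positionSeconds ? local : remote;
}
```

Last-write-wins is the simplest rule that works, but it trusts device clocks. A server-assigned timestamp or version counter would be the next step if clock skew turns out to matter.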
What changed since 2017: audiobooks aren’t a single-store world anymore. Spotify and Apple sell them, Libby normalized “free,” and neural TTS made decent narration cheap. The bar moved from availability to taste - curation, narrator quality, and personalization.
Saga-style scraping for music was a legal gray area. Here the public-domain (PD) base is clean, but the hard parts are metadata ops, duplicate-version moderation, rehost vs link policy, and the ethics of synthetic voices. The viable business is service value - discovery, queues, annotations, and community - not exclusives.
Open questions I’m still chewing on:
- Can a PD-first app win on discovery and ergonomics alone?
- Should synthetic narration be allowed, and how do we label and rank it next to human reads?
- Rehost for reliability or respect source hosting and cache at the edge?
- What’s a fair way to fund bandwidth/metadata work and also support LibriVox and Internet Archive?
Here’s the original note from 2017: https://github.com/AnandChowdhary/notes/blob/main/notes/2017/libri-vox-material.md