One-line thesis
“Voice data, collected and validated by AI.”
The problem
ASR datasets are usually scraped, mislabeled, and biased toward whoever was easiest to record. Underrepresented languages and accents are an afterthought.
The fix is recording at scale with native speakers in their own environments, with first-pass QA in the loop, and humans for the edge cases.
The build
Layer
FEED
Stack
Next.js · WebRTC · Whisper · Postgres
Timeline
Shipped in 6 weeks
Screens
What it looks like in production.
Outcomes
Live as of today.
0
Hours collected
0
Languages covered
0
Validation accuracy (%)
Stack
Next.jsWebRTCWhisperPostgres
Related missions