How do you make Tamil text-to-speech dependable inside an SLS interactive when the learner's browser has no Tamil voice? This case study shows how Codex converted a browser-dependent activity into a portable, offline-ready package containing genuine Singapore Tamil audio.
The finished interactive uses Microsoft's ta-SG-VenbaNeural voice, includes ten local MP3 clips, preserves touch-to-play on classroom devices, reports truthful diagnostics, and can be downloaded as an SLS-ready ZIP.
Resource type: SLS-ready HTML5 interactive, Singapore Tamil text-to-speech, offline audio package, reproducible agentic AI workflow
Language: Singapore Tamil (ta-SG)
The problem: Tamil text can render while Tamil speech is missing
The original interactive correctly requested the ta-SG locale through the browser's Web Speech API. However, speechSynthesis.getVoices() only returns voices installed or exposed by the current browser and operating system. A learner's device may therefore show Tamil text perfectly while having no Tamil voice available for playback.
Setting utterance.lang = "ta-SG" does not install a Singapore Tamil voice. It only requests one. When no exact match exists, a browser may remain silent or substitute a voice from another locale. For a Singapore classroom resource, silently falling back to Indian Tamil, Malaysian Tamil, or an unrelated system voice is not dependable localisation.
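The exact-match requirement can be sketched as a small lookup. This is an illustration only, not code from the interactive: `findExactVoice` is a hypothetical helper, and the sample voice list is stand-in data. It assumes the list comes from `speechSynthesis.getVoices()` and that some browsers may report language tags with an underscore instead of a hyphen.

```javascript
// Hypothetical helper: return a voice whose lang matches the requested
// locale exactly (case-insensitive, "_" treated as "-"), or null.
function findExactVoice(voices, locale) {
  const normalise = tag => String(tag || "").replace(/_/g, "-").toLowerCase();
  const wanted = normalise(locale);
  return voices.find(voice => normalise(voice.lang) === wanted) || null;
}

// Browser usage (the list may be empty until "voiceschanged" fires):
// const voice = findExactVoice(speechSynthesis.getVoices(), "ta-SG");
// if (voice) { utterance.voice = voice; }

// Stand-in data for illustration only.
const sampleVoices = [
  { name: "Example Tamil (India)", lang: "ta-IN" },
  { name: "Example English (Singapore)", lang: "en-SG" }
];
console.log(findExactVoice(sampleVoices, "ta-SG")); // null
```

Returning `null` instead of the nearest Tamil voice keeps the decision explicit: the caller chooses whether to stay silent, play packaged audio, or accept another locale.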
The engineering decision
Microsoft lists dedicated Singapore Tamil neural voices, including ta-SG-VenbaNeural and ta-SG-AnbuNeural. Azure Speech is an online service, not a JavaScript library that should be bundled into a public interactive, and credentials must never be placed in client-side HTML.
Because this activity speaks a fixed vocabulary set, the robust design was to generate each required phrase once during development, save the MP3 files inside the interactive, and use those packaged files as the primary playback path. An exact browser ta-SG voice remains only as a secondary fallback.
- Audit every Tamil phrase that the activity can speak.
- Generate the phrase once with a Singapore Tamil neural voice.
- Store the MP3 beside the HTML, CSS, and JavaScript.
- Map each phrase to its packaged file.
- Play local audio first and use browser speech only when appropriate.
- Package and test the complete folder as an SLS ZIP.
Step 1: Audit the speech inventory
The activity contains eight vocabulary words and two listening-discrimination words. Codex traced every code path that could request speech, then created a complete manifest. This prevents an overlooked word from unexpectedly falling back to browser speech.
| Tamil | Meaning | Packaged file |
|---|---|---|
| பள்ளி | School | audio/palli.mp3 |
| புத்தகம் | Book | audio/puththagam.mp3 |
| நண்பன் | Friend | audio/nanban.mp3 |
| ஆசிரியர் | Teacher | audio/aasiriyar.mp3 |
| வீடு | House | audio/veedu.mp3 |
| உணவு | Food | audio/unavu.mp3 |
| தண்ணீர் | Water | audio/thanneer.mp3 |
| மரம் | Tree | audio/maram.mp3 |
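One way to automate such an audit is to compare the list of speakable phrases against the manifest. The sketch below is an assumption about how that check could look, not the script Codex used; `findUnmappedPhrases` and the sample data are hypothetical, with three manifest entries borrowed from the table above.

```javascript
// Hypothetical audit helper: list every speakable phrase that has no
// packaged clip in the manifest (duplicates are reported once).
function findUnmappedPhrases(speakablePhrases, audioManifest) {
  return [...new Set(speakablePhrases)].filter(phrase => !audioManifest[phrase]);
}

// Three entries from the table above, used as sample data.
const audioManifest = {
  "பள்ளி": "audio/palli.mp3",
  "புத்தகம்": "audio/puththagam.mp3",
  "நண்பன்": "audio/nanban.mp3"
};

// Phrases the activity can speak; the last one is deliberately unmapped here.
const speakable = ["பள்ளி", "புத்தகம்", "நண்பன்", "மரம்"];
console.log(findUnmappedPhrases(speakable, audioManifest)); // [ 'மரம்' ]
```

An empty result means every phrase the activity can speak has a packaged file, so nothing falls through to browser speech by accident.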
Step 2: Generate Singapore Tamil audio
For this implementation, Codex used the open-source edge-tts command-line client during development. The complete repeatable process is included in generate_tamil_audio.ps1.
```powershell
python -m pip install edge-tts
python -m edge_tts --list-voices | Select-String '^ta-SG-'
python -m edge_tts `
  --voice ta-SG-VenbaNeural `
  --text "பள்ளி" `
  --write-media "audio/palli.mp3"
```
The same pattern generated ten small files, together about 50 KB. A formal production service can use the official Azure Speech SDK or REST API through a secure backend. The important architectural rule is unchanged: never expose speech-service credentials in public HTML or JavaScript.
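For readers who prefer a Node.js build step over PowerShell, repeating the command across a manifest might look like the sketch below. It is not a port of generate_tamil_audio.ps1: `buildEdgeTtsArgs` is a hypothetical helper, the two phrases are sample entries, and the code only builds argument lists using the flags shown in the command above.

```javascript
// Hypothetical helper: build the argument list for one phrase, using only
// the edge-tts flags shown in the command above.
function buildEdgeTtsArgs(voice, text, outputPath) {
  return ["-m", "edge_tts", "--voice", voice, "--text", text, "--write-media", outputPath];
}

// Sample entries; a real manifest would list every phrase the activity speaks.
const phrases = [
  { tamil: "பள்ளி", file: "audio/palli.mp3" },
  { tamil: "மரம்", file: "audio/maram.mp3" }
];

const jobs = phrases.map(p => buildEdgeTtsArgs("ta-SG-VenbaNeural", p.tamil, p.file));
console.log(jobs.length); // 2

// In a Node.js build script, each job could then be run with, for example:
// require("node:child_process").spawnSync("python", job, { stdio: "inherit" });
```

Passing the arguments as an array rather than one command string avoids shell quoting problems with Tamil text.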
Step 3: Map words to local files
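```javascript
// Each vocabulary entry carries the path of its packaged clip.
const vocabulary = [
  {
    tamil: "பள்ளி",
    roman: "paḷḷi",
    meaning: "School",
    audio: "audio/palli.mp3"
  }
];

// Look up the packaged clip for a phrase. minimalPairAudio, defined
// elsewhere in the script, maps the listening-discrimination words.
function getPackagedAudioPath(text) {
  const word = vocabulary.find(item => item.tamil === text);
  return word?.audio || minimalPairAudio[text] || null;
}
```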
Step 4: Prefer offline playback
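```javascript
// stopCurrentSpeech, speakWithBrowserVoice, and currentAudio are defined
// elsewhere in the interactive's script.
async function speakText(text, slow = false) {
  stopCurrentSpeech();

  const audioPath = getPackagedAudioPath(text);
  if (!audioPath) {
    // No packaged clip for this text: use the browser voice as a fallback.
    speakWithBrowserVoice(text, slow);
    return;
  }

  const audio = new Audio(audioPath);
  audio.playbackRate = slow ? 0.7 : 1; // slow mode plays at 70% speed
  audio.preservesPitch = true;         // keep the voice's pitch when slowed
  currentAudio = audio;
  await audio.play();
}
```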
The learner still gets normal and slow playback, but the result no longer depends on the device's installed speech voices. The ZIP carries the Singapore Tamil speech with it.
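The promise returned by `audio.play()` can reject, for example when autoplay policy blocks playback or a file fails to load. The sketch below shows one possible way to route such failures to the secondary path. It is an assumption, not the interactive's code: `playWithFallback` and its injected functions are hypothetical names, passed in as parameters so the logic can be exercised outside a browser.

```javascript
// Hypothetical wrapper: try the packaged clip first, and fall back to
// browser speech if there is no clip or playback is rejected.
async function playWithFallback(text, { resolvePath, playFile, speakFallback }) {
  const path = resolvePath(text);
  if (path) {
    try {
      await playFile(path);
      return "packaged";
    } catch (error) {
      // Playback was rejected (blocked by policy, or the file is
      // missing or corrupt); continue to the fallback below.
    }
  }
  speakFallback(text);
  return "browser";
}

// Browser usage, with the functions from this article:
// playWithFallback(text, {
//   resolvePath: getPackagedAudioPath,
//   playFile: path => new Audio(path).play(),
//   speakFallback: phrase => speakWithBrowserVoice(phrase, false)
// });
```

The returned label ("packaged" or "browser") can feed the diagnostics panel. A fallback that starts after a rejected promise may itself be subject to the browser's activation rules, so it remains a best-effort path.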
Step 5: Preserve touch user activation
Browser testing uncovered another subtle issue. The smartboard touch handler delayed its callback by 50 milliseconds. That delay could move audio.play() outside the browser's trusted user gesture, causing playback to be blocked even though the learner had tapped a button.
Codex moved playback back into the touchend handler, where it runs synchronously, and retained timestamp-based debouncing. This is particularly important for classroom touchscreens, tablets, and LMS iframes, where autoplay restrictions can be stricter.
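A minimal sketch of that pattern follows, assuming a 300 ms debounce window. `createTapHandler` is a hypothetical name, and the window length and injectable clock are illustrative choices, not values taken from the interactive.

```javascript
// Hypothetical helper: run the action synchronously inside the event
// handler, and ignore repeat events that arrive within minIntervalMs.
function createTapHandler(action, minIntervalMs = 300, now = () => Date.now()) {
  let lastRun = -Infinity;
  return function handleTap(event) {
    const timestamp = now();
    if (timestamp - lastRun < minIntervalMs) {
      return false; // duplicate, e.g. the click that follows a touchend
    }
    lastRun = timestamp;
    action(event); // no setTimeout: playback starts inside the gesture
    return true;
  };
}

// Browser usage:
// const onTap = createTapHandler(() => speakText("பள்ளி"));
// button.addEventListener("touchend", onTap);
// button.addEventListener("click", onTap);
```

Debouncing by timestamp rather than by timer means the first event always runs immediately, so the call to play audio stays inside the learner's gesture.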
Step 6: Make diagnostics truthful
The interactive no longer treats a missing Tamil browser voice as a failure of the activity. Its diagnostics now distinguish between:
- Primary audio: ten packaged Singapore Tamil clips;
- Voice: Microsoft Venba Neural (ta-SG); and
- Browser fallback: available or unavailable on the current device.
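The three diagnostic fields could be assembled as in the sketch below. This is an illustration under assumptions: `buildDiagnostics`, its field names, and the `ready` flag are hypothetical, and the voice list is assumed to come from `speechSynthesis.getVoices()`.

```javascript
// Hypothetical diagnostics builder: packaged audio is the primary path,
// so a missing browser voice is reported as information, not failure.
function buildDiagnostics(packagedClipCount, browserVoices) {
  const hasExactVoice = browserVoices.some(
    voice => String(voice.lang).replace(/_/g, "-").toLowerCase() === "ta-sg"
  );
  return {
    primaryAudio: `${packagedClipCount} packaged Singapore Tamil clips`,
    voice: "Microsoft Venba Neural (ta-SG)",
    browserFallback: hasExactVoice ? "available" : "unavailable",
    ready: packagedClipCount > 0 // readiness depends on the clips only
  };
}

console.log(buildDiagnostics(10, []).browserFallback); // unavailable
```

With ten clips and no browser voices, `ready` is still `true`, which is the distinction the diagnostics are meant to convey.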
Step 7: Validate before packaging
Codex checked the JavaScript syntax, decoded every MP3 with FFmpeg, served the activity locally, exercised the controls in a real browser, and confirmed that the public HTML, audio, and ZIP URLs all respond successfully.
```powershell
node --check script.js
Get-ChildItem audio -Filter *.mp3 | ForEach-Object {
  ffmpeg -v error -i $_.FullName -f null -
}
python -m http.server 8766 --bind 127.0.0.1
```
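A further check that could be added before zipping is to compare the paths the script expects with the files actually present in the audio folder. The sketch below is an assumption, not part of the documented workflow; `compareManifestToFiles` is a hypothetical helper and the file lists are sample data.

```javascript
// Hypothetical pre-packaging check: compare the paths the script expects
// with the files actually present in the audio folder.
function compareManifestToFiles(expectedPaths, actualPaths) {
  const expected = new Set(expectedPaths);
  const actual = new Set(actualPaths);
  return {
    missing: [...expected].filter(path => !actual.has(path)),
    unused: [...actual].filter(path => !expected.has(path))
  };
}

// Sample data: one expected clip is absent and one stray file is present.
const report = compareManifestToFiles(
  ["audio/palli.mp3", "audio/maram.mp3"],
  ["audio/palli.mp3", "audio/old-take.mp3"]
);
console.log(report.missing); // [ 'audio/maram.mp3' ]
console.log(report.unused);  // [ 'audio/old-take.mp3' ]
```

A missing file would cause a silent failure or an unintended fallback in the classroom, while an unused file only adds weight to the ZIP.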
What agentic AI contributed
This was more than asking an AI to write a code snippet. Codex inspected the existing project, verified the available ta-SG voices, generated assets, modified the playback architecture, diagnosed a touch-specific browser failure, validated all media, rebuilt the ZIP, prepared the documentation, and deployed the result.
Each individual step is possible with conventional tools. Agentic AI made the complete loop practical: investigate, implement, test, document, package, publish, and verify without losing the educational intent between steps.
When this pattern is appropriate
- Use packaged audio for fixed vocabulary, instructions, feedback, or assessment prompts that must work consistently and offline.
- Use a secure speech backend when learners can enter arbitrary text that must be synthesised dynamically.
- Use browser speech only as an enhancement when exact locale support is not guaranteed across managed classroom devices.
Replicate the result
- Download the completed SLS package and inspect its folder structure.
- Open the PowerShell phrase manifest.
- Replace the Tamil phrases and filenames with those from your own interactive.
- Generate audio with an exact ta-SG voice.
- Map the assets in JavaScript and keep playback inside the user's click or touch event.
- Test every file and then rebuild the ZIP.
Result: the learner hears the intended Singapore Tamil voice even when the browser reports that no Tamil speech-synthesis voice is installed.
Credits and license
Interactive engineering, testing, packaging, documentation, and deployment were completed with Codex as an agentic AI collaborator. The resource is published by Open Educational Resources / Open Source Physics @ Singapore for educational reuse.
Content is shared under the Creative Commons Attribution-Share Alike 4.0 Singapore License unless otherwise stated.