Why this project exists
Rukai (魯凱語) is an Austronesian language spoken in southern Taiwan, classified as severely endangered by UNESCO. There is no commercial speech-synthesis support for it, no public TTS training set, and no maintained pronunciation dictionary that a developer could build on top of.
This project is an attempt to change the first of those by building an open TTS engine that the community can use, study, and extend, while preserving the biometric privacy of the few remaining fluent speakers.
Co-developed with a Rukai community partner
This project is co-authored with a Rukai community partner from Pingtung. At their request, they are not publicly named on this work. That choice is itself an expression of the CARE Principle of Authority to Control — the right of an Indigenous contributor to decide how their participation is attributed, including the right not to be visible. Their contribution is foundational; their preference about visibility is theirs to make.
Open code, community-controlled corpus
The engine code will be released under an open-source license. The training corpus will not.
This separation is deliberate, in line with the CARE Principles for Indigenous Data Governance — Collective benefit, Authority to control, Responsibility, and Ethics. The audio recordings, transcripts, and annotations belong to the Rukai-speaking community. They retain authority over how that material is used, shared, or reused.
A longer essay, "Open code, community-controlled corpus: a privacy-first design for Rukai TTS", explains the technical and ethical reasoning in full.
Current status — May 2026
- The project abstract is archived on the Open Science Framework (OSF DOI link to be added once finalised).
- A working paper version of this project will be presented at SEALS 35 — the Annual Meeting of the Southeast Asian Linguistics Society — on 3 June 2026.
- Engine architecture and corpus protocol are under active development.
- Code is not yet public; release will happen alongside the SEALS publication.
Following along
If you work on Formosan languages, language documentation, or TTS for low-resource languages and would like to get in touch, the studio email at the bottom of this page reaches me directly.
Request access to the Rukai TTS demo →