In progress · Est. 2026 · Role: Lead researcher

Rukai TTS — an open-code speech engine for an endangered Formosan language

An open-code text-to-speech engine for Rukai, a Formosan language of southern Taiwan. It is built under the CARE Principles for Indigenous data sovereignty: the code is open, while the training data stays under community control.

Why this project exists

Rukai (魯凱語) is an Austronesian language spoken in southern Taiwan, classified as severely endangered by UNESCO. There is no commercial speech-synthesis support for it, no public TTS training set, and no maintained pronunciation dictionary that a developer could build on top of.

This project addresses the first of these gaps, the absence of any speech-synthesis support. It does so by building an open TTS engine that the community can use, study, and extend, while protecting the biometric privacy of the few remaining fluent speakers.

Co-developed with a Rukai community partner

This project is co-authored with a Rukai community partner from Pingtung. At their request, they are not publicly named on this work. That choice is itself an expression of the CARE Principle of Authority to Control: the right of an Indigenous contributor to decide how their participation is attributed, including the right not to be visible. Their contribution is foundational, and the decision about their visibility is theirs to make.

Open code, community-controlled corpus

The engine code will be released under an open-source license. The training corpus will not.

This separation is deliberate and follows the CARE Principles for Indigenous Data Governance: Collective Benefit, Authority to Control, Responsibility, and Ethics. The audio recordings, transcripts, and annotations belong to the Rukai-speaking community, which retains authority over how that material is used, shared, or reused.

A longer essay, "Open code, community-controlled corpus: a privacy-first design for Rukai TTS", explains the technical and ethical reasoning in full.

Current status — May 2026

  • The project abstract is archived on the Open Science Framework (OSF DOI link to be added once finalised).
  • A working-paper version of this project will be presented at SEALS 35, the Annual Meeting of the Southeast Asian Linguistics Society, on 3 June 2026.
  • Engine architecture and corpus protocol are under active development.
  • Code is not yet public; release will happen alongside the SEALS publication.

Following along

If you work on Formosan languages, language documentation, or TTS for low-resource languages and would like to get in touch, the studio email at the bottom of this page reaches me directly.

Request access to the Rukai TTS demo →
