Data Sovereignty Policy

Developed by the Data Sovereignty Working Group:
Lisa Mesher, Natasha Ita MacDonald, Thomassie Mangiok, and Robin Anawak, in consultation with Abundant Intelligences, including Fenwick McKelvey and Maroussia Lévesque.

Purpose

Heritage Lab is an Indigenous-led, values-driven organization committed to advancing AI development in alignment with Indigenous data sovereignty principles outlined in this policy. Technology development is not neutral—it is a process of community governance and an articulation of sovereignty. This policy ensures that Indigenous communities maintain control over their knowledge, language, and cultural heritage as they engage with artificial intelligence and digital technologies. This policy outlines measures to protect Indigenous rights to ownership and control over data while establishing clear, ethical standards for how Heritage Lab develops and deploys AI tools in partnership with Indigenous communities.

1. Our Responsibilities

Heritage Lab is a steward, not an owner, of community-contributed data, knowledge, and linguistic materials. We build technology in partnership with Indigenous communities through transparent agreements and shared governance, store data on Indigenous-controlled infrastructure, and support community capacity-building in data governance and AI technology. We are accountable to the communities we serve through regular review, transparent communication, and good faith consultation on any policy changes.

2. Scope

This policy applies to:

All data collected, stored, and/or processed by Heritage Lab
All AI models, tools, and technologies developed by Heritage Lab
All Partners of Heritage Lab
All Heritage Lab staff, contractors, and organizational users of Heritage Lab tools
All community members and individuals who contribute data or use Heritage Lab services

3. Definitions

Partners: Indigenous organizations working in collaboration with Heritage Lab. Non-Indigenous partners are explicitly identified as such when applicable.
Community-Contributed Content: Stories, historical information, cultural knowledge, and materials submitted for reference and retrieval
AI Training Data: Data used to train translation and language learning AI models
Public Data: Publicly available materials (social media, websites, published documents)
Partner Data: Non-public documents provided by Partners
AI Model: A software system trained on language data to perform tasks such as translation, text generation, or language learning support. Heritage Lab’s AI models are trained specifically on Indigenous language data to understand, translate, and generate content in Indigenous languages.

4. Heritage Lab’s Foundational Commitments

Indigenous communities maintain perpetual ownership and control of their epistemologies, ontologies, knowledge, stories, art, cultural heritage, linguistic materials, and how these become data
Community-contributed data, stories, knowledge, and linguistic materials remain the property of the contributing communities — Heritage Lab retains ownership only of the technology platforms and intellectual property it develops
All technologies developed in collaboration between Heritage Lab and Indigenous partners are subject to community agreements over use and ownership
Local autonomy and sovereignty over AI development and tools must be enhanced, not diminished
Technology development is a process of community governance and an articulation of sovereignty
Data practices must align with established principles of Indigenous data sovereignty outlined in this policy
Heritage Lab tools and community data shall never be used for surveillance, tracking, or monitoring of individuals or communities without consent; discrimination, persecution, or unfairness of any kind; building competing Indigenous language corpora or datasets for commercial purposes; mining Indigenous data for other AI training purposes; or training commercial AI models without explicit community partnership and agreement. Any use inconsistent with Indigenous data sovereignty principles is prohibited.

This policy is informed by and aligned with the following frameworks:

OCAP® Principles (First Nations Information Governance Centre)
CARE Principles for Indigenous Data Governance (Global Indigenous Data Alliance)
Te Mana Raraunga Māori Data Sovereignty Principles
Kaituhi Kaitiakitanga License (Te Reo Irirangi o Te Hiku o Te Ika)
AFNQL Quebec First Nations Information Governance
Paulatuk Statement on project-specific consent
Smith, L. (2024), The Computerized Database of Labrador Inuttut: A Language Revitalization Technology Component, Études/Inuit/Studies, 48(1–2), 183–205

5. Distinctions-Based Data Governance

Heritage Lab treats different kinds of data differently based on their purpose and community agreements.

5.1 Community-Contributed Data

Example: When a community member shares a story, piece of cultural knowledge, or terminology with Ayaguta (the community search tool under the Ai! Project), it is stored as reference material and made retrievable through the search tool. The original content is never modified, generated, or reformatted; it remains as the contributor shared it.

How we handle this data:

Closed-loop retrieval system: We use a retrieval system that references trusted sources and community contributions without modifying, generating, or reformatting the original information
Ownership and attribution: Contributors retain full ownership with rights to attribution or anonymity
Retention: The data is stored indefinitely, unless permission is revoked
Right to revoke: Contributors may request removal at any time; Heritage Lab will permanently delete the content within 30 days of the request
Citation: All sources are always referenced
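
The retention, revocation, and attribution rules above can be sketched as a small record type. This is an illustrative sketch only, not Heritage Lab's implementation: the names `Contribution`, `request_removal`, `deletion_deadline`, and `citation` are hypothetical, and counting the 30 days from the date of the removal request is an assumption.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# From the policy text: permanent deletion within 30 days.
# Assumption: the window starts on the day the removal request is received.
DELETION_WINDOW = timedelta(days=30)


@dataclass
class Contribution:
    """Hypothetical record for one community-contributed item."""
    contributor: str
    content: str                      # stored exactly as shared, never rewritten
    attributed: bool = True           # False means the contributor chose anonymity
    revoked_on: Optional[date] = None

    def request_removal(self, today: date) -> None:
        """Record a removal request, keeping the earliest request date."""
        if self.revoked_on is None:
            self.revoked_on = today

    def deletion_deadline(self) -> Optional[date]:
        """Latest date for permanent deletion, or None if not revoked."""
        if self.revoked_on is None:
            return None               # retained indefinitely
        return self.revoked_on + DELETION_WINDOW

    def citation(self) -> str:
        """Source reference shown alongside retrieved content."""
        return self.contributor if self.attributed else "Anonymous contributor"
```

A record with no removal request has no deadline; once a request is recorded, the deadline is fixed and a later duplicate request does not extend it.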

5.2 AI Language Model Training Data

Purpose: Data used to train translation and language learning AI models

Data sources:

Public Data:

Definition: Publicly available materials (social media, websites, published documents)
Retention: Data is kept as part of the training corpus for model improvement

Partner Data:

Definition: Non-public documents provided by Partners specifically for AI training
Requires a Memorandum of Understanding (MOU) specifying:
- Exact data to be used
- Retention period for source files
- Model ownership
Retention: Data is kept only for the duration specified in the MOU
Automatic deletion upon MOU expiry unless the MOU is renewed
Physical and digital materials are transferred or deleted if the contract is terminated; Heritage Lab does not retain materials after the agreement ends
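
The Partner Data retention rule can be expressed as a simple check. This is a minimal sketch under stated assumptions, not a description of Heritage Lab's systems: `PartnerMOU` and `source_files_must_be_removed` are hypothetical names, a renewal is assumed to be modelled by issuing an MOU with a later `retention_until` date, and data is assumed to remain permitted through the end date itself.

```python
from dataclasses import dataclass
from datetime import date
from typing import Tuple


@dataclass(frozen=True)
class PartnerMOU:
    """Hypothetical record of the three items an MOU must specify."""
    partner: str
    data_items: Tuple[str, ...]   # exact data to be used
    retention_until: date         # end of the retention period for source files
    model_owner: str              # who owns the resulting model
    terminated: bool = False      # set when the agreement ends early


def source_files_must_be_removed(mou: PartnerMOU, today: date) -> bool:
    """True once the MOU has expired or been terminated.

    Assumption: files may be kept through retention_until itself and must
    be transferred or deleted starting the following day.
    """
    return mou.terminated or today > mou.retention_until
```

A scheduled job could run this check daily against every active MOU and queue the listed `data_items` for transfer or deletion when it returns true.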

5.3 AI Model File Governance

Heritage Lab manages the AI Models on behalf of the Partners through:

Oversight of model updates and improvements by the Partner’s Language Committee
The right of the Partner’s community to request retraining if bias or inaccuracies are detected. If a community identifies errors, inappropriate outputs, or systematic inaccuracies in the AI model’s results, it can request that Heritage Lab retrain or adjust the model to correct these issues.
Feedback mechanisms built into Heritage Lab services

5.4 User-Generated Content

How we handle user content: Users retain full control over their translations — Heritage Lab will not use any user-generated translations for AI training without explicit consent. All translation tools include a clear disclaimer about data usage and the need for review by a language speaker.

5.5 Compliance

Heritage Lab ensures adherence to its data governance commitments and foundational principles through:

Technical safeguards including API authentication, usage rate limits, and audit logging
Annual partnership reviews to assess compliance with community agreements and data sovereignty commitments
Complaint resolution where concerns are shared with Partner organizations and access to tools can be restricted or suspended as needed
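
As an illustration of how the three technical safeguards named above could fit together, the sketch below combines an API key check, a sliding-window rate limit, and an append-only audit log. It is a simplified, in-memory example under stated assumptions and does not describe Heritage Lab's actual systems; `GuardedAPI`, its parameters, and the outcome labels are all hypothetical.

```python
import hmac
import time
from collections import defaultdict, deque


class GuardedAPI:
    """Hypothetical gatekeeper: authentication, rate limiting, audit logging."""

    def __init__(self, api_keys, max_calls, window_seconds):
        self.api_keys = api_keys          # client_id -> secret key
        self.max_calls = max_calls        # allowed calls per window
        self.window = window_seconds
        self.calls = defaultdict(deque)   # client_id -> recent call times
        self.audit_log = []               # (timestamp, client_id, outcome)

    def check(self, client_id, key, now=None):
        """Return True if the call is allowed; every attempt is logged."""
        now = time.time() if now is None else now
        expected = self.api_keys.get(client_id, "")
        # Constant-time comparison avoids leaking key contents via timing.
        if not expected or not hmac.compare_digest(expected.encode(), key.encode()):
            outcome = "denied:auth"
        else:
            recent = self.calls[client_id]
            # Drop calls that have aged out of the sliding window.
            while recent and now - recent[0] >= self.window:
                recent.popleft()
            if len(recent) >= self.max_calls:
                outcome = "denied:rate"
            else:
                recent.append(now)
                outcome = "allowed"
        self.audit_log.append((now, client_id, outcome))
        return outcome == "allowed"
```

Logging denied attempts as well as allowed ones gives annual partnership reviews and complaint resolution a record to consult.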

Heritage Lab reserves the right, at its sole discretion, to suspend, terminate, or revoke any User’s access to all or any portion of its services without prior notice or liability. This action may be taken upon Heritage Lab’s determination that a User has engaged in conduct that Heritage Lab reasonably deems inappropriate, abusive, unlawful, or otherwise in violation of Heritage Lab’s policies or applicable law.

6. Storage and Infrastructure

All Heritage Lab data is stored within Indigenous-controlled infrastructure. As of the inception of this policy (February 2026), our data is stored in Kahnawà:ke, Quebec, Canada through a partnership with Mohawk Internet Technologies (MIT). Storing data on Indigenous-controlled infrastructure allows for:

Data sovereignty and geographic control within Indigenous territory
Jurisdictional alignment with Indigenous governance
Physical and digital security under Indigenous oversight

This is a living document that will evolve with community input and technological change. Community feedback is essential and welcome at any time. Please direct feedback to the Data Sovereignty Working Group: privacy@heritagelab.ca
