How to Create a Multilingual Glossary (Step-by-Step)

Last updated: February 2, 2026

Translation quality falls apart when teams use different words for the same thing. If you’re localizing product UI, docs, and marketing across languages, a shared terminology foundation is the quickest way to cut errors and speed up releases. This guide shows how to build a multilingual glossary and termbase that works in real life: what to plan, how to model your data (including TBX), how to extract and approve terms, how to integrate with CAT/TMS, and how to keep everything healthy over time.

Glossary vs Termbase: What’s the Difference?

A glossary is the human‑friendly list of approved terms and definitions. A termbase is the structured database behind it that stores concepts, language variants, metadata, and workflow states.

  • Glossary: Readable, searchable view for writers, translators, support. Often a filtered output of “Approved” terms.
  • Termbase: Concept‑centric store with IDs, definitions, preferred/forbidden variants, context, domains, locale attributes, status, and audit trail.

Key principle: model by concept, not by string. One concept (e.g., “Sign in”) can have multiple terms per language and usage notes per platform/audience.

Why a Multilingual Glossary Pays Off

  • Consistency: Fewer contradictory translations and fewer “which word should we use?” threads.
  • Speed: Translators and writers move faster when terminology is clear and in‑tool.
  • Quality: Fewer support tickets caused by confusing labels or mismatched wording.
  • Compliance: Regulated industries need exact, approved phrases.
  • Scalability: New vendors and languages onboard with less coaching.
  • Measurability: You can track issues and prove ROI (fewer terminology QA fails, faster approvals).

Plan Before You Build: Scope, Languages, and Brand Rules

Decide what you will cover first and who owns what. Align this in one page before you open a spreadsheet.

  • People: Who uses the glossary (translators, UX writers, engineers, support)? Who approves?
  • Domains: Start with UI + help center (highest visibility), then marketing/legal.
  • Locales: Pilot with 3–5 key languages; expand after governance holds.
  • Brand names: Set rules per market on translate vs transliterate vs keep original. For a practical framework, see Translate vs Transliterate Brand Names: Best Practices.

Engineering note: Treat the termbase like a product. Version the schema (v1, v2), keep stable concept IDs (e.g., CON-000142), and log migrations so exports/imports don't break.

Model Your Termbase: Fields, IDs, and TBX Mapping

Start small but future-proof. These fields cover most real-world use cases.

Minimum viable fields

| Field | Purpose | Example |
| --- | --- | --- |
| Concept ID | Stable identifier | CON-000142 |
| Source Term | Canonical term in source language | Sign in |
| Definition | Unambiguous meaning | Authenticate to access an account |
| Part of Speech | Grammar and UI guidance | Verb (button label) |
| Domain | Module/subject area | Authentication |
| Context Sentence | Real usage | Tap “Sign in” to continue |
| Usage Note | Style/case rules | Title case in buttons |
| Forbidden Terms | What not to use (and why) | Login (as verb) |
| Locale Fields | Approved translation + attributes | fr-FR: Se connecter (verb) |
| Status | Lifecycle | Proposed / Approved / Deprecated |
| Source/Authority | Where definition came from | Spec v3.1 |
| Last Updated | Auditability | 2026-01-15 |
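
As a rough illustration, the minimum fields above could be modeled as a concept-centric record. This is only a sketch: the class and attribute names (Concept, LocaleTerm, Status) are invented for this example and do not come from any particular terminology tool.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Status(Enum):
    PROPOSED = "Proposed"
    APPROVED = "Approved"
    DEPRECATED = "Deprecated"


@dataclass
class LocaleTerm:
    """One approved translation plus its attributes."""
    locale: str               # language tag such as "fr-FR"
    term: str
    part_of_speech: str = ""


@dataclass
class Concept:
    """One concept; every language variant hangs off the same stable ID."""
    concept_id: str
    source_term: str
    definition: str
    part_of_speech: str
    domain: str
    context_sentence: str
    usage_note: str = ""
    forbidden_terms: list[str] = field(default_factory=list)
    locales: dict[str, LocaleTerm] = field(default_factory=dict)
    status: Status = Status.PROPOSED
    authority: str = ""
    last_updated: date = field(default_factory=date.today)


# Example record built from the values in the table above.
sign_in = Concept(
    concept_id="CON-000142",
    source_term="Sign in",
    definition="Authenticate to access an account",
    part_of_speech="Verb (button label)",
    domain="Authentication",
    context_sentence="Tap \u201cSign in\u201d to continue",
    usage_note="Title case in buttons",
    forbidden_terms=["Login (as verb)"],
    status=Status.APPROVED,
    authority="Spec v3.1",
    last_updated=date(2026, 1, 15),
)
sign_in.locales["fr-FR"] = LocaleTerm("fr-FR", "Se connecter", "verb")
```

Keying translations by locale under a single concept ID is what lets one concept carry several terms per language later without changing the identifier.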

Helpful extras

  • Synonyms (admitted/avoid) with priority flags.
  • Morphology (gender, pluralization) where languages require it.
  • Relations (broader/narrower/related concepts).
  • Platform/audience qualifiers (web, iOS, Android; consumer vs admin).
  • Risk tags (legal/medical/payments) to route to SMEs.

Arabic/RTL: Add attributes for digits policy (Arabic-Indic vs Latin), preferred punctuation (Arabic comma “،”), and bidi isolation needs (e.g., Latin codes inside Arabic). These remove ambiguity in UI.
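
To make the digits and bidi attributes concrete, here is a small sketch of how a consumer of the termbase might apply them. The policy names ("arabic-indic", "latin") and both function names are assumptions made for this example; the isolate characters are the standard Unicode FSI (U+2068) and PDI (U+2069).

```python
# Translation table from ASCII digits to Arabic-Indic digits (U+0660..U+0669).
ARABIC_INDIC = str.maketrans(
    "0123456789", "".join(chr(0x0660 + i) for i in range(10))
)

FSI = "\u2068"  # FIRST STRONG ISOLATE
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def apply_digits_policy(text: str, policy: str) -> str:
    """Render digits according to the locale's digits policy."""
    if policy == "arabic-indic":
        return text.translate(ARABIC_INDIC)
    if policy == "latin":
        return text
    raise ValueError(f"unknown digits policy: {policy!r}")


def isolate(fragment: str) -> str:
    """Wrap an embedded fragment (e.g., a Latin code) in a bidi isolate."""
    return f"{FSI}{fragment}{PDI}"


year_ar = apply_digits_policy("2026", "arabic-indic")
code_in_rtl = isolate("CON-000142")
```

Isolating the Latin fragment keeps its characters from being reordered by the surrounding right-to-left text.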

TBX mapping at a glance

| Your field | TBX element |
| --- | --- |
| Concept ID | <termEntry id="CON-000142"> |
| Language section | <langSet xml:lang="fr-FR"> |
| Term (preferred) | <tig><term>Se connecter</term></tig> |
| Definition | <descrip type="definition">…</descrip> |
| Forbidden term | <termNote type="administrativeStatus">deprecatedTerm-admn-sts</termNote> |
| Status | <admin type="status">approved</admin> |

Example entry:
<termEntry id="CON-000142">
  <descrip type="definition">Authenticate to access an account</descrip>
  <langSet xml:lang="en">
    <tig><term>Sign in</term></tig>
  </langSet>
  <langSet xml:lang="fr-FR">
    <tig>
      <term>Se connecter</term>
      <termNote type="partOfSpeech">verb</termNote>
      <admin type="status">approved</admin>
    </tig>
  </langSet>
</termEntry>
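
If the termbase lives in a spreadsheet or database, an entry like the one above can be generated rather than hand-written. The following is a minimal sketch using Python's standard xml.etree.ElementTree; the input dictionary layout and the build_term_entry name are assumptions for illustration, and a real export would also need the TBX header and body wrapper elements.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_term_entry(concept: dict) -> ET.Element:
    """Build a <termEntry> element from a plain dictionary."""
    entry = ET.Element("termEntry", {"id": concept["id"]})
    definition = ET.SubElement(entry, "descrip", {"type": "definition"})
    definition.text = concept["definition"]
    for lang, info in concept["terms"].items():
        lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
        tig = ET.SubElement(lang_set, "tig")
        ET.SubElement(tig, "term").text = info["term"]
        if "pos" in info:
            note = ET.SubElement(tig, "termNote", {"type": "partOfSpeech"})
            note.text = info["pos"]
        if "status" in info:
            ET.SubElement(tig, "admin", {"type": "status"}).text = info["status"]
    return entry


entry = build_term_entry({
    "id": "CON-000142",
    "definition": "Authenticate to access an account",
    "terms": {
        "en": {"term": "Sign in"},
        "fr-FR": {"term": "Se connecter", "pos": "verb", "status": "approved"},
    },
})
tbx_fragment = ET.tostring(entry, encoding="unicode")
```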

Choose the Right Tools

  • Spreadsheet (MVP): Fast start; add validation and filters. Plan to export CSV/TBX later.
  • Terminology module (in CAT/TMS): In‑editor term suggestions, QA checks, workflows, TBX I/O.
  • Standalone terminology tools: Rich metadata and API integration with product systems.

Choose based on TBX support, approval workflows, QA rules (forbidden terms, casing), API access, and search UX. Pilot in one domain and 3–5 locales before scaling.

Step‑by‑Step Workflow (extraction → publication)

1) Harvest candidate terms

  • Pull from UI strings, docs, release notes, support tickets, analytics queries, sales decks.
  • Run monolingual extractors to spot frequent noun phrases; align bilingual corpora to find stable pairs.
  • Ask PMs/support for “must‑keep” terms and pain points.
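
A first pass at harvesting can be as simple as counting recurring word sequences. The sketch below is a plain n-gram frequency count, not a real noun-phrase extractor (it does no part-of-speech tagging), and the stopword list, thresholds, and sample strings are placeholders chosen for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "to", "of", "and", "or", "in", "on",
             "for", "your", "you", "is", "with"}


def candidate_terms(texts, max_n=3, min_count=2):
    """Return (phrase, count) pairs for n-grams seen at least min_count times."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip phrases that start or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    return [(t, c) for t, c in counts.most_common() if c >= min_count]


SAMPLE = [
    "Start your free trial today.",
    "Your free trial ends soon.",
    "Enable two-factor authentication.",
    "Two-factor authentication protects your account.",
]
found = dict(candidate_terms(SAMPLE))
```

The output is only a candidate list; curation by concept in the next step decides what actually becomes a term.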

2) Curate and normalize (by concept)

  • Group duplicates under one concept; write a single‑sense definition.
  • Set preferred term, list admitted/forbidden variants and the “why”.
  • Capture part of speech, domain, platform/audience qualifiers.

Arabic/RTL: Record gender/plural, digits policy, and a sample RTL sentence to sanity-check punctuation and bidi behavior.

3) Approve with a lightweight workflow

  • Roles: Terminologist (curates), SME (meaning), Language leads (locale approvals).
  • States: Proposed → In Review → Approved → Deprecated (with reasons).
  • Keep an audit trail (who/when/why) for regulated domains.
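
The states and audit trail above can be enforced with a very small state machine. This sketch assumes a strictly linear lifecycle as listed, with no path back from In Review to Proposed; the entry layout and the transition function are hypothetical.

```python
from datetime import datetime, timezone

# Allowed moves, following Proposed -> In Review -> Approved -> Deprecated.
ALLOWED = {
    "Proposed": {"In Review"},
    "In Review": {"Approved"},
    "Approved": {"Deprecated"},
    "Deprecated": set(),
}


def transition(entry: dict, new_state: str, actor: str, reason: str = "") -> dict:
    """Move an entry to new_state and record who/when/why."""
    current = entry["status"]
    if new_state not in ALLOWED[current]:
        raise ValueError(f"illegal transition: {current} -> {new_state}")
    if new_state == "Deprecated" and not reason:
        raise ValueError("deprecation requires a reason")
    entry["status"] = new_state
    entry.setdefault("audit", []).append({
        "who": actor,
        "when": datetime.now(timezone.utc).isoformat(),
        "why": reason,
        "from": current,
        "to": new_state,
    })
    return entry


term = {"id": "CON-000142", "status": "Proposed"}
transition(term, "In Review", actor="terminologist", reason="definition drafted")
transition(term, "Approved", actor="fr-FR language lead", reason="matches UI usage")
```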

4) Localize per language

  • Provide context sentences and screenshots.
  • Capture locale attributes (gender, formality, platform quirks).
  • Check against trusted sources (e.g., IATE) when helpful.

5) Enrich metadata

  • Link related concepts (“Sign in” ↔ “Sign out” ↔ “Register”).
  • Add pronunciation/transliteration where non‑Latin scripts help support teams.
  • Flag regulated terms for mandatory SME review.

6) QA the terminology

  • Run automatic checks for forbidden terms, casing, duplicates, and locale completeness.
  • Preview in context (staging UI/docs) to catch truncation and RTL/LTR issues.
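
A few of the automatic checks above (duplicates, locale completeness, and preferred terms that collide with forbidden ones) fit in a short script. This is a sketch over an assumed in-memory format; the field names and the second concept ID (CON-000999) are made up for the example.

```python
def qa_termbase(concepts, required_locales):
    """Return a list of (concept_id, message) issues."""
    issues = []
    seen = {}  # casefolded source term -> first concept ID that used it
    for c in concepts:
        key = c["source_term"].casefold()
        if key in seen:
            issues.append((c["id"], f"duplicate of {seen[key]}"))
        else:
            seen[key] = c["id"]

        missing = sorted(set(required_locales) - set(c["locales"]))
        if missing:
            issues.append((c["id"], "missing locales: " + ", ".join(missing)))

        forbidden = {f.casefold() for f in c.get("forbidden", [])}
        if key in forbidden:
            issues.append((c["id"], "source term is also listed as forbidden"))
        for locale, term in c["locales"].items():
            if term.casefold() in forbidden:
                issues.append((c["id"], f"{locale} term is listed as forbidden"))
    return issues


concepts = [
    {"id": "CON-000142", "source_term": "Sign in", "forbidden": ["Logon"],
     "locales": {"fr-FR": "Se connecter", "de-DE": "Anmelden"}},
    {"id": "CON-000999", "source_term": "sign in", "forbidden": [],
     "locales": {"fr-FR": "Connexion"}},
]
issues = qa_termbase(concepts, required_locales=["fr-FR", "de-DE"])
```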

7) Publish and integrate

  • Expose a read‑only glossary (searchable, filterable) for non‑linguists.
  • Enable in‑tool suggestions and QA in CAT/TMS; enforce forbidden terms.
  • Sync via TBX or API to keep systems aligned.
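
The read-only glossary can be generated straight from the termbase by filtering on status. Below is a minimal sketch that writes approved entries to CSV; the record layout, the column headers, and the second sample concept ("Passkey", CON-000777) are assumptions for illustration.

```python
import csv
import io


def export_glossary(concepts, locales):
    """Return CSV text containing only Approved concepts, sorted by term."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Concept ID", "Term", "Definition", *locales])
    for c in sorted(concepts, key=lambda c: c["source_term"].casefold()):
        if c["status"] != "Approved":
            continue  # Proposed and Deprecated entries stay out of the glossary
        writer.writerow([
            c["id"], c["source_term"], c["definition"],
            *(c["locales"].get(loc, "") for loc in locales),
        ])
    return buf.getvalue()


termbase = [
    {"id": "CON-000142", "source_term": "Sign in", "status": "Approved",
     "definition": "Authenticate to access an account",
     "locales": {"fr-FR": "Se connecter", "de-DE": "Anmelden"}},
    {"id": "CON-000777", "source_term": "Passkey", "status": "Proposed",
     "definition": "Draft entry", "locales": {}},
]
glossary_csv = export_glossary(termbase, ["fr-FR", "de-DE"])
```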

8) Train the team

  • Share a one‑pager: “How to use the glossary” with examples.
  • Offer a simple form to propose terms or flag issues.

9) Maintain and measure

  • Review high-impact domains quarterly; deprecate stale entries.
  • Track KPIs: terminology QA fails, lookup rate, time‑to‑approve, and support tickets mentioning terms.
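
Time-to-approve is straightforward to compute once proposal and approval dates are recorded. The sketch below uses the median, which is one reasonable choice rather than a prescribed one; the field names and the sample dates are invented for the example.

```python
from datetime import date
from statistics import median


def median_days_to_approve(entries):
    """Median days from proposal to approval; None if nothing is approved."""
    durations = [
        (e["approved_on"] - e["proposed_on"]).days
        for e in entries
        if e.get("approved_on") is not None
    ]
    return median(durations) if durations else None


history = [
    {"id": "A", "proposed_on": date(2026, 1, 5), "approved_on": date(2026, 1, 8)},
    {"id": "B", "proposed_on": date(2026, 1, 5), "approved_on": date(2026, 1, 12)},
    {"id": "C", "proposed_on": date(2026, 1, 9), "approved_on": None},  # still open
]
kpi = median_days_to_approve(history)
```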

Example Entries and Templates

Concept: Sign in (Authentication)

  • ID: CON‑000142
  • Definition: Action that authenticates a user to access an account.
  • Source: Sign in (preferred); Login (noun only). Avoid “Logon”.
  • Context: Button label on login screen.
  • Locales:
    • es‑ES: Iniciar sesión (verb)
    • fr‑FR: Se connecter
    • de‑DE: Anmelden
    • ar‑SA: تسجيل الدخول (RTL)
    • zh‑CN: 登录
  • Status: Approved

Concept: Two‑factor authentication (2FA)

  • Definition: Security process requiring two independent verification factors.
  • Note: Use “2FA” in UI where space is tight; expand on first mention in help.
  • Locales: es‑ES: Autenticación de dos factores; fr‑FR: Authentification à deux facteurs; ja‑JP: 2要素認証
  • Status: Approved

Concept: Free trial (Subscription)

  • Definition: Time-limited access at no charge; billing starts when the trial ends unless the user cancels.
  • Locales: ar‑SA: تجربة مجانية (digits policy: Arabic‑Indic on UI), fr‑FR: Essai gratuit, pt‑BR: Avaliação gratuita
  • Usage: Add legal disclaimer link in UI when required.

Automation and AI‑Assisted Term Extraction

  • Monolingual extraction: Identify frequent domain phrases; filter out stopwords and boilerplate.
  • Bilingual alignment: Align legacy translations to surface stable term pairs and inconsistencies.
  • LLM support: Draft definitions or disambiguation notes; keep human approvers in the loop.
  • Linting: Add CI checks in source repos to block forbidden terms before localization starts.

Measure: Track precision/recall of extractors. Favor high precision for UI (few false positives) and wider recall for docs (curate later).
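
The linting idea from the list above could look something like this in a CI job. It is a sketch only: the rule table, the string-key format, and the lint_strings name are assumptions, and a whole-word regex cannot tell a noun from a verb, so rules such as "Login (noun only)" still need human review.

```python
import re


def lint_strings(strings: dict, forbidden: dict) -> list:
    """Return (key, matched_text, advice) for each forbidden-term hit."""
    if not forbidden:
        return []
    rules = {term.lower(): advice for term, advice in forbidden.items()}
    # Longest alternatives first so multi-word terms win over their prefixes.
    alternation = "|".join(
        re.escape(t) for t in sorted(rules, key=len, reverse=True)
    )
    pattern = re.compile(rf"\b({alternation})\b", re.IGNORECASE)
    findings = []
    for key, text in strings.items():
        for match in pattern.finditer(text):
            findings.append((key, match.group(0), rules[match.group(0).lower()]))
    return findings


FORBIDDEN = {"Logon": "Use 'Sign in'"}
UI_STRINGS = {
    "auth.button": "Logon to continue",
    "auth.title": "Sign in",
}
findings = lint_strings(UI_STRINGS, FORBIDDEN)
```

In a pipeline, a non-empty findings list would typically be printed and turned into a non-zero exit code so the build fails before strings reach translators.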

Governance, Workflows, and QA

  • RACI: Who proposes, reviews, approves, audits.
  • SLA: e.g., 5 business days for high‑impact terms; 10 for low‑impact.
  • Lifecycle: Proposal → Review → Approval → Publication → Periodic review → Deprecation.
  • QA gates: Conflicts, casing, duplicates, forbidden hits, locale coverage.
  • KPI dashboard: Approval backlog, time‑to‑approve, QA fail trend, regulated term queue.

Troubleshooting: Symptoms and Fixes

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| Translators keep using different words | No concept-level modeling; glossary hard to find | Group by concept; expose a read-only glossary; enable in-tool suggestions |
| UI truncates or looks wrong in RTL | No context check; digits/punctuation not specified | Add context screenshots; add digits and punctuation attributes; preview in staging |
| Legal blocks last-minute | Regulated terms not flagged early | Tag “payments/privacy/medical” terms; auto-route to SMEs; set SLA |
| TBX import loses statuses | Custom fields not mapped; spec mismatch | Document extensions; validate TBX; run round-trip tests before switching |
| Search shows wrong script/variant | No hreflang or schema; mixed usage online | Add alternateName in Organization schema; use hreflang; publish a clarification page |

Common Pitfalls (and how to avoid them)

  • Modeling strings, not concepts: consolidate meaning first, then terms.
  • Over‑translating brand names: document per‑market rules and stick to them.
  • Ignoring morphology: capture gender/plural rules where needed.
  • Siloed systems: use TBX and APIs so tools can talk to each other.
  • No owner: assign a terminologist and language leads; publish SLAs.
  • No context: add examples/screenshots for critical UI terms.
  • Skipping i18n basics: ensure Unicode, RTL support, and proper segmentation in product.

Export/Import with TBX

TBX (TermBase eXchange, standardized as ISO 30042) is the open standard for moving terminology between tools. Use it for vendor sharing and platform migrations.

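  • Map concept IDs to <termEntry>, languages to <langSet>, and terms to <tig>/<term> with admin/descriptive data. These are the element names from the 2008 edition of TBX; the 2019 edition renames them to <conceptEntry>, <langSec>, and <termSec>, so check which version your tools read and write.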
  • Keep any custom fields in a documented extension; confirm round‑trip symmetry.
  • Validate against the spec before large imports to avoid silent data loss.

Data integrity: Freeze edits → export TBX → import to target → export again → diff. Pay attention to status, forbidden terms, and locale attributes.
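
The diff step in the round trip above works better on extracted facts than on raw XML text, since tools reorder and reformat elements. Here is a minimal sketch that assumes un-namespaced, 2008-style element names as used in this guide; tbx_facts and round_trip_diff are hypothetical helpers, and files in the newer namespaced dialect would need adjusted tag names.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tbx_facts(xml_text: str) -> set:
    """Flatten TBX into (concept_id, lang, term, field, value) tuples."""
    root = ET.fromstring(xml_text)
    facts = set()
    for entry in root.iter("termEntry"):
        cid = entry.get("id", "")
        for lang_set in entry.iter("langSet"):
            lang = lang_set.get(XML_LANG, "")
            for tig in lang_set.iter("tig"):
                term = tig.findtext("term") or ""
                facts.add((cid, lang, term, "term", term))
                for child in tig:
                    if child.tag in ("termNote", "admin"):
                        facts.add((cid, lang, term,
                                   child.get("type", ""), child.text or ""))
    return facts


def round_trip_diff(before: str, after: str):
    """Return (lost, gained) facts between two exports."""
    a, b = tbx_facts(before), tbx_facts(after)
    return sorted(a - b), sorted(b - a)


BEFORE = """<termEntry id="CON-000142">
  <langSet xml:lang="fr-FR">
    <tig>
      <term>Se connecter</term>
      <admin type="status">approved</admin>
    </tig>
  </langSet>
</termEntry>"""
# Simulate a target tool that silently drops the status field.
AFTER = BEFORE.replace('<admin type="status">approved</admin>', "")
lost, gained = round_trip_diff(BEFORE, AFTER)
```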

Integrate with CAT/TMS and Product Content

  • CAT/TMS: Real‑time term lookups; automatic warnings for forbidden terms; enforce casing.
  • Docs/CMS: Glossary widget in help center and knowledge base for consistent linking.
  • Design systems: Sync preferred UI terms to component libraries (e.g., via API) so product copy matches the glossary.
  • Dev portals: Expose approved API terminology and parameter names for consistency.

SEO and UX Benefits

  • Use glossary terms in titles, headings, and snippets where natural to improve discoverability.
  • Align paid/organic keywords to approved terms per locale to reduce mixed nomenclature.
  • Create FAQ or landing pages for high‑value terms; link them across docs and product guides.

Build Checklist

  • Define scope, locales, roles, and brand rules.
  • Draft data model and TBX mapping; set stable IDs.
  • Extract candidate terms (UI/docs/tickets); gather SME input.
  • Normalize by concept; write definitions and usage notes.
  • Localize with attributes; set status; add context and relations.
  • QA: conflicts, forbidden terms, casing, locale completeness; preview in context.
  • Publish a read‑only glossary; integrate with CAT/TMS and CMS; set CI linting.
  • Train teams; track KPIs; review quarterly; deprecate stale entries.

FAQ

What’s the fastest way to start?

Use a spreadsheet with the minimum fields and data validation. Pilot on one product area and 3–5 locales. When stable, export to TBX and move to a terminology tool.

Glossary vs termbase—do I need both?

Yes. The termbase is your source of truth; the glossary is its readable, filtered output. Many tools generate the glossary automatically from the termbase.

Who approves terms?

A terminologist or language lead, with SMEs for domain accuracy. Publish SLAs and keep an audit trail.

How often should we review?

Quarterly for high‑impact domains (auth, payments); twice a year for stable areas. Also review after major feature or branding changes.

How is a termbase different from translation memory (TM)?

TM stores previous sentence‑level translations. A termbase stores concept‑level terms and rules. They complement each other: termbase guides wording; TM speeds repeated segments.

Conclusion and Takeaways

  • Model terminology by concept, not string. Assign clear owners and SLAs.
  • Start small (spreadsheet + validation), then standardize with TBX and integrate with CAT/TMS.
  • Document Arabic/RTL specifics (digits, punctuation, bidi) where relevant to avoid UI rework.
  • Automate checks (forbidden terms, casing) and preview in context before release.
  • Measure impact (QA fails, time‑to‑approve, support issues) and keep the glossary visible and easy to use.

With a disciplined termbase and a simple workflow, you’ll ship multilingual content faster, reduce rework, and give every team—from engineering to support—the same reliable words to work with.
