Introduction
Character encoding is one of the most overlooked yet critical aspects of subtitle creation. An incorrect encoding can produce garbled text, missing characters, or completely unreadable subtitles. This guide covers how encodings work, the problems they commonly cause, and how to detect and fix them.
What is Character Encoding?
Character encoding defines how text is represented as bytes in a file. Different encoding standards support different character sets:
•ASCII: 128 characters (unaccented English letters, digits, and basic punctuation)
•Latin-1 (ISO-8859-1): Western European languages
•UTF-8: All languages, most common standard
•UTF-16: All languages, used by some Windows applications
•Windows-1256: Arabic script
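To make the differences concrete, the following minimal Python sketch encodes the same text with each of the encodings listed above and reports the resulting byte length. The function name `encoded_sizes` and the sample strings are illustrative assumptions, not part of any subtitle tool; `cp1256` is Python's codec name for Windows-1256.

```python
from typing import Dict, Optional

# Python codec names for the encodings listed above (cp1256 = Windows-1256).
CODECS = ("ascii", "latin-1", "utf-8", "utf-16", "cp1256")


def encoded_sizes(sample: str) -> Dict[str, Optional[int]]:
    """Map each codec to the encoded byte length of `sample`.

    A value of None means the codec cannot represent the text at all.
    """
    sizes: Dict[str, Optional[int]] = {}
    for codec in CODECS:
        try:
            sizes[codec] = len(sample.encode(codec))
        except UnicodeEncodeError:
            sizes[codec] = None
    return sizes


# Hypothetical samples: plain English, German, and Arabic.
german = encoded_sizes("Straße")
arabic = encoded_sizes("مرحبا")
```

With these samples, "Straße" takes 6 bytes in Latin-1 and 7 bytes in UTF-8 and cannot be stored in ASCII, while the Arabic word fits in Windows-1256 and UTF-8 but not in Latin-1.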
Why UTF-8 is the Standard
UTF-8 has become the universal encoding standard for good reasons:
•Can encode every character defined in Unicode, covering all modern languages and scripts
•Backward compatible with ASCII
•Efficient storage (1-4 bytes per character)
•Supported by all modern platforms and players
•Required by YouTube, Netflix, and most streaming services
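Two of the properties above, ASCII compatibility and the variable width of 1 to 4 bytes, can be checked directly. This is a small Python sketch; the sample characters and variable names are chosen only for illustration.

```python
# One sample character for each UTF-8 width, written as escapes so the
# exact code points are unambiguous.
SAMPLES = {
    "ASCII letter A": "A",
    "accented e (U+00E9)": "\u00e9",
    "CJK character (U+8A9E)": "\u8a9e",
    "emoji (U+1F600)": "\U0001F600",
}

# Number of bytes UTF-8 needs for each sample character.
widths = {label: len(ch.encode("utf-8")) for label, ch in SAMPLES.items()}

# Pure ASCII text produces identical bytes in ASCII and UTF-8.
ascii_compatible = "Hello".encode("ascii") == "Hello".encode("utf-8")

for label, width in widths.items():
    print(f"{label}: {width} byte(s)")
```

Because ASCII text is byte-for-byte identical in UTF-8, an English-only subtitle file saved as ASCII is already valid UTF-8.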
Common Encoding Problems
1. Mojibake (Garbled Text)
Mojibake appears when text encoded in one format is decoded as another:
"Straße" → "StraÃŸe" (UTF-8 read as Windows-1252, which many tools label Latin-1)
"日本語" → "æ—¥æœ¬èªž" (UTF-8 read as Windows-1252)
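The garbling can be reproduced, and often reversed, in a few lines of Python. This is a sketch under the assumption that the text was UTF-8 decoded as Windows-1252 (`cp1252` in Python); the helper names `garble` and `repair` are hypothetical.

```python
def garble(text: str, wrong_codec: str = "cp1252") -> str:
    """Simulate mojibake: encode as UTF-8, then decode with the wrong codec."""
    return text.encode("utf-8").decode(wrong_codec)


def repair(garbled: str, wrong_codec: str = "cp1252") -> str:
    """Undo the mistake by reversing the two steps.

    This only works if the garbled text was not altered afterwards and every
    character in it exists in `wrong_codec`; otherwise an exception is raised.
    """
    return garbled.encode(wrong_codec).decode("utf-8")


broken = garble("Straße")
restored = repair(broken)
print(len("Straße"), len(broken), restored == "Straße")
```

The garbled string is one character longer than the original because the two UTF-8 bytes of "ß" are each shown as a separate character.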
2. Missing Characters
When the encoding doesn't support certain characters:
•Arabic characters in Latin-1 encoding
•Chinese characters in Windows-1252
•Emoji in older subtitle formats
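Whether a target encoding can hold a given subtitle line can be tested before saving. The Python sketch below uses hypothetical helper names (`can_encode`, `unsupported_chars`) and relies on the fact that `str.encode` raises `UnicodeEncodeError` for characters the codec lacks.

```python
from typing import List


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


def unsupported_chars(text: str, codec: str) -> List[str]:
    """List the distinct characters of `text` that `codec` cannot represent."""
    return sorted({ch for ch in text if not can_encode(ch, codec)})


# With errors="replace", unsupported characters silently become "?",
# which is how missing characters end up in a saved file.
lossy = "Hello \U0001F600".encode("latin-1", errors="replace")
```

Here `lossy` holds `b"Hello ?"`: the emoji is gone and cannot be recovered from the saved bytes, which is why checking before conversion matters.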
3. BOM Issues
A byte order mark (BOM, the bytes EF BB BF) at the start of a UTF-8 file can cause:
•Extra invisible characters at the start of subtitles
•Sync offset in some players
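A BOM is easy to detect and remove programmatically. The following Python sketch uses the standard `codecs.BOM_UTF8` constant and the `utf-8-sig` codec; the helper names and the sample cue are illustrative assumptions.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the raw file bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


# Hypothetical SRT content saved with a BOM.
SAMPLE = codecs.BOM_UTF8 + b"1\n00:00:01,000 --> 00:00:02,000\nHello\n"

# Decoding as plain "utf-8" keeps the BOM as an invisible U+FEFF character,
# so the first line is no longer exactly "1".
first_line_plain = SAMPLE.decode("utf-8").splitlines()[0]

# The "utf-8-sig" codec drops a leading BOM while decoding.
first_line_sig = SAMPLE.decode("utf-8-sig").splitlines()[0]
```

The difference between the two first lines is the invisible character described above: a parser that compares the first line against a cue number sees "\ufeff1" instead of "1".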
How to Detect and Fix Encoding Issues
Detection
1. Open the subtitle file in a text editor
2. Look for garbled text, question marks, or empty boxes
3. Use our Metadata Extractor tool to check file encoding
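4. Compare character count with file size: in UTF-8, each non-ASCII character takes 2 to 4 bytes, so a file containing accented or CJK text should be larger in bytes than in characters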
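Some of these checks can be automated. The Python sketch below is a rough heuristic, not a full encoding detector, and its function names are hypothetical: it tests whether the bytes decode as strict UTF-8, counts bytes that do not, and compares byte count with character count.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def count_undecodable(data: bytes) -> int:
    """Count replacement characters (U+FFFD) produced when decoding as UTF-8.

    Note: a file that genuinely contains U+FFFD is counted as well.
    """
    return data.decode("utf-8", errors="replace").count("\ufffd")


def bytes_per_char(data: bytes) -> float:
    """Average bytes per character for valid UTF-8 data (0.0 if empty)."""
    text = data.decode("utf-8")
    return len(data) / len(text) if text else 0.0


# "Straße" saved as Latin-1 is not valid UTF-8; saved as UTF-8 it is.
legacy_bytes = "Straße".encode("latin-1")
utf8_bytes = "Straße".encode("utf-8")
```

A file that fails `is_valid_utf8` is almost certainly in a legacy encoding. A ratio of exactly 1.0 from `bytes_per_char` means the file contains only ASCII characters.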
Fixing
1. Identify the current encoding
2. Choose the correct target encoding (UTF-8 recommended)
3. Use our Encoding Converter tool
4. Verify all special characters display correctly
5. Test on multiple players
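For readers who prefer to script the conversion, the steps above can be sketched in Python as follows. This is a minimal illustration and not the Encoding Converter tool itself; the function name `convert_to_utf8` is hypothetical, and the source encoding must already be known (step 1).

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str) -> None:
    """Re-encode a subtitle file as UTF-8 without a BOM.

    Decoding is strict, so a wrong `source_encoding` that hits an invalid
    byte raises UnicodeDecodeError instead of silently corrupting text.
    """
    raw = Path(src).read_bytes()
    text = raw.decode(source_encoding)

    # Drop a leading BOM character if the source file had one.
    if text.startswith("\ufeff"):
        text = text[1:]

    # Write raw bytes so line endings are preserved exactly as they were.
    Path(dst).write_bytes(text.encode("utf-8"))

    # Verify the round trip: the new file must decode to the same text.
    if Path(dst).read_bytes().decode("utf-8") != text:
        raise ValueError(f"verification failed for {dst}")
```

A call such as `convert_to_utf8("movie.srt", "movie.utf8.srt", "cp1252")` (hypothetical file names) writes a new file and leaves the original untouched. Strict decoding does not catch every mistake, because many single-byte encodings accept any byte, so checking special characters by eye and testing in players is still needed.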
Best Practices
•Save subtitles in UTF-8 by default
•Use Unicode for any multilingual content
•Test subtitles on at least 3 different players
•Check that special characters (é, ü, ñ, ç, Arabic, CJK) display correctly
•Avoid BOM unless specifically required by your target platform
Platform Encoding Requirements
| Platform | Required Encoding | Notes |
|----------|------------------|-------|
| YouTube | UTF-8 | Required for all languages |
| Netflix | UTF-8 | Specified in delivery specs |
| Vimeo | UTF-8 | Recommended |
| VLC | UTF-8, UTF-16 | Auto-detects encoding |
| Windows Media Player | UTF-8, ANSI (system code page) | Best with UTF-8 |