Encoding issues
Encoding issues in CSV files can cause import failures or data corruption. Common problems include mismatched UTF-8 and Latin-1 encodings, BOM markers, smart quotes, and hidden control characters. This guide helps you identify and fix these issues quickly.
Common causes
- Wrong encoding (UTF-8 vs Latin-1 / Windows-1252)
- BOM (Byte Order Mark) at the start of the file (often shows up as stray characters such as ï»¿ before the first header)
- Smart quotes and other “pretty” punctuation from copy/paste
- Hidden control characters (non-printing characters that break parsers)
- Inconsistent encoding within the same file (mixed sources)
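One rough way to check for the last cause (mixed encodings in one file) is to test each physical line for valid UTF-8. The sketch below is only an illustration: the function name find_non_utf8_lines and the sample data are made up for this example, and it assumes the file is meant to be UTF-8.

```python
def find_non_utf8_lines(data: bytes) -> list[int]:
    """Return 1-based physical line numbers whose bytes are not valid UTF-8."""
    bad_lines = []
    # Splitting on b"\n" is safe for UTF-8: byte 0x0A never occurs inside
    # a multi-byte sequence.
    for number, line in enumerate(data.split(b"\n"), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            bad_lines.append(number)
    return bad_lines


# Hypothetical sample: the first two lines are UTF-8, the third is Latin-1.
sample = "id,name\n1,José\n".encode("utf-8") + "2,Zoë\n".encode("latin-1")
print(find_non_utf8_lines(sample))
```

Lines that pass this check are not guaranteed to be correct (plain ASCII is valid in all of these encodings), but a line that fails was almost certainly written in a different encoding. Note that the numbers are physical lines, so a quoted field containing a line break counts as more than one line.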
Examples of encoding issues in CSV files
1) UTF-8 vs Latin-1 mismatch (mojibake)
id,name
1,José
2,Ana
If UTF-8 text is read as Latin-1 (or Windows-1252), each non-ASCII character is displayed as two or more wrong characters (e.g., José → JosÃ©).
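The mismatch can be reproduced in a few lines of Python, which also shows why it is sometimes reversible. This is a minimal sketch; the variable names are illustrative only.

```python
original = "José"

# Encode as UTF-8, then decode the same bytes with the wrong encoding.
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)  # JosÃ©

# If nothing else has altered the text, reversing the two steps restores it.
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # José
```

The repair step only works when the garbled text has not been saved or edited in a way that lost bytes along the way. Re-exporting the file from its source with the correct encoding is usually the safer fix.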
2) BOM at the start of the header
id,name
1,Ada
A BOM can get interpreted as visible characters, which can break header matching (e.g., the first column becomes ï»¿id instead of id).
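In Python, the standard "utf-8-sig" codec removes a leading UTF-8 BOM while decoding, whereas plain "utf-8" keeps it as an invisible U+FEFF character attached to the first header. A small sketch, with read_header as a hypothetical helper name:

```python
import csv
import io

# The sample file from above, preceded by the three BOM bytes EF BB BF.
raw = b"\xef\xbb\xbfid,name\n1,Ada\n"


def read_header(data: bytes, encoding: str) -> list[str]:
    """Decode the bytes and return the first CSV row."""
    text = io.StringIO(data.decode(encoding), newline="")
    return next(csv.reader(text))


with_bom = read_header(raw, "utf-8")         # first header still carries U+FEFF
without_bom = read_header(raw, "utf-8-sig")  # BOM stripped during decoding

print(with_bom[0] == "id")     # False
print(without_bom[0] == "id")  # True
```

Using "utf-8-sig" is harmless for files that have no BOM, so it can serve as a default when reading UTF-8 CSV files of unknown origin.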
3) Smart quotes (non-standard punctuation)
id,comment
1,“hello world”
2,”ok”
Some tools export or paste smart quotes. Many parsers only treat the plain quote character (") as valid quoting.
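If the smart quotes are not wanted, one option is to map them to their plain ASCII equivalents before parsing. The sketch below covers only the four most common curly quotes; the names SMART_PUNCTUATION and normalize_quotes are made up for this example.

```python
# Map curly quotes to the plain characters that CSV parsers expect.
SMART_PUNCTUATION = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark (also used as an apostrophe)
})


def normalize_quotes(text: str) -> str:
    """Replace curly quotes with straight quotes."""
    return text.translate(SMART_PUNCTUATION)


print(normalize_quotes("1,\u201chello world\u201d"))  # 1,"hello world"
```

Be aware that this changes how the line is parsed: once the quotes are plain, the parser treats them as field delimiters instead of as part of the value. Whether that is the intended result depends on the data.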
4) Hidden characters in values
id,city
1,New York
2,LA
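Non-breaking spaces or other invisible characters can cause validation or matching failures even when the text looks identical. For example, if the space in New York is a non-breaking space (U+00A0) instead of a regular space, the value will not match "New York" in a lookup.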
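Characters like these can be found by checking each character's Unicode category. The following is a minimal sketch, assuming that control characters (Cc), format characters (Cf), and any space separator other than the regular space are worth flagging; suspicious_characters is a hypothetical name.

```python
import unicodedata


def suspicious_characters(value: str) -> list[tuple[int, str, str]]:
    """Return (position, code point, name) for each invisible or unusual character."""
    found = []
    for index, char in enumerate(value):
        category = unicodedata.category(char)
        is_control_or_format = category in ("Cc", "Cf")
        is_unusual_space = category == "Zs" and char != " "
        if is_control_or_format or is_unusual_space:
            name = unicodedata.name(char, "<unnamed>")
            found.append((index, f"U+{ord(char):04X}", name))
    return found


print(suspicious_characters("New\u00a0York"))
# [(3, 'U+00A0', 'NO-BREAK SPACE')]
```

This is meant to be run on individual field values after parsing, so that tabs or line breaks used as delimiters are not reported as problems.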
How to diagnose encoding issues quickly
Encoding problems are hard to spot by eye. CSV Checker detects likely encoding mismatches, highlights suspicious characters, and points you to the exact rows and columns affected, so you can fix the file before importing.