Saturday, January 31, 2015

Is my boss's preference for Latin-1 over UTF-8 when it comes to database configuration justified?


We are using MySQL at the company I work for, and we build both client-facing and internal applications using Ruby on Rails.


When I started working here, I ran into a problem that I had never encountered before: the database on the production server is set to Latin-1, so the MySQL gem throws an exception whenever a user copies and pastes input containing UTF-8 characters that Latin-1 cannot represent.
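To make the failure concrete, here is a minimal Ruby sketch of why pasted text can be rejected. It does not touch MySQL at all; it only checks whether a string can be converted to ISO-8859-1, which is used here as a stand-in for the database charset (as far as I know, MySQL's `latin1` is actually closer to Windows-1252, so the exact set of accepted characters differs slightly). The helper name `latin1_safe?` is made up for illustration.

```ruby
# Hypothetical helper: returns true if the text can be converted to
# ISO-8859-1 without losing any character, false otherwise.
# ISO-8859-1 is an assumption standing in for the database charset.
def latin1_safe?(text)
  text.encode("ISO-8859-1")
  true
rescue Encoding::UndefinedConversionError, Encoding::InvalidByteSequenceError
  false
end

puts latin1_safe?("café")       # é (U+00E9) exists in Latin-1
puts latin1_safe?("“quoted”")   # curly quotes (U+201C, U+201D) do not
```

The second string is the typical copy-and-paste case: word processors insert curly quotes, which are perfectly printable but fall outside ISO-8859-1.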


My boss calls these "bad characters" since most of them are non-printable characters, and says that we need to strip them out. I've found a few ways to do this, but we eventually hit a case where a UTF-8 character was actually needed. Stripping is also a bit of a hassle, especially since the only solution I ever read about for this issue is to just set the database to UTF-8 (which makes sense to me).
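For reference, one way the stripping approach could look in plain Ruby is sketched below. This is only an illustration under assumptions, not what our application actually does: the method name `strip_non_latin1` is hypothetical, and ISO-8859-1 again stands in for the database charset. It round-trips the string through ISO-8859-1 and silently drops anything that does not fit.

```ruby
# Hypothetical helper: drops every character that ISO-8859-1 cannot
# represent, then converts the result back to UTF-8 for the application.
def strip_non_latin1(text)
  text.encode("ISO-8859-1", invalid: :replace, undef: :replace, replace: "")
      .encode("UTF-8")
end

puts strip_non_latin1("Tokyo (東京) costs €5")
# prints: Tokyo () costs 5
```

The output shows the downside: the Japanese city name and the euro sign are both printable, meaningful characters, and both are lost without any error.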


The only argument that I've heard for sticking with Latin-1 is that allowing non-printable UTF-8 characters can mess up text/full-text searches in MySQL. Is this really true?


Aside from that point, I see no reason why we shouldn't switch to UTF-8. It's my understanding that it is superior and becoming more ubiquitous.


Which of us is right?




