Alright Encoding, Let’s Do It

My former colleagues recently ran into an issue with a Rails 3.1 application. After they upgraded several gems to their latest versions, text stored in a serialized field suddenly started showing raw escape sequences in place of curly quotes, e.g. I don’t turned into I donâ\u0080\u0099t

Let’s pause for station identification (here, watch this duel for some cinematic flavor) and write up a few terms for Google to find, to save others this headache: Problem. Encoding. YAML. Serialized. Rails. Delayed Job. Upgrade. Syck. Psych. Characters look funny. Display Issues. Latin1 is the root of all evil. UNHOLY TEXT CRAPTASM.

They resolved it with some phpMyAdmin text field editing. But I thought I had beaten down this encoding mess once and for all, first with a great big UTF-8 MySQL push years ago, and then by heading into the promised land that was Ruby 1.9 with regard to string handling. So I wanted to know the root cause.

What went wrong?

I had this YAML file of stock questions that I used to seed the database. Unfortunately, I paid no attention to what was actually being stored in the database.

Let’s observe – I can’t paste the “right single quotation mark” (which the Mac OS X character viewer gleefully reports as Unicode: U+2019, UTF-8: E2 80 99) into IRB, but I can cheat:

% echo "I’m your huckleberry." > test.yml
% rails console
Loading development environment (Rails 3.1.3)
>> string = YAML.load(File.open('test.yml'))
=> "I’m your huckleberry."
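For what it’s worth, the same character can also be built from a \u escape, which sidesteps the pasting problem. This is only a sketch in plain Ruby (no Rails needed); hex_bytes is a hypothetical helper of mine, not something from the app. It confirms the byte sequence the character viewer reports:

```ruby
# U+2019 RIGHT SINGLE QUOTATION MARK, written as an escape so nothing
# non-ASCII has to be pasted into the console.
quote    = "\u2019"
sentence = "I#{quote}m your huckleberry."

# Hypothetical helper: render a string's raw bytes as space-separated hex.
def hex_bytes(str)
  str.bytes.map { |b| format("%02X", b) }.join(" ")
end

puts quote.encoding     # the encoding Ruby tagged the string with
puts hex_bytes(quote)   # E2 80 99
puts sentence.length    # counted in characters
puts sentence.bytesize  # counted in bytes (two more than the length)
```

One character, three bytes: that gap between characters and bytes is where the rest of this story goes wrong.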

And when we convert that to YAML, as Rails does by default when serializing it:

>> string.to_yaml
=> "--- \"I\\xE2\\x80\\x99m your huckleberry.\"\n"

Doh! But then de-yamling it seems okay:

>> newstring = YAML.load(string.to_yaml)
=> "I’m your huckleberry."

Which is why I never noticed. I mean, how often do you look at a man’s shoes? er. I mean, in the database. Sorry, mixing the movie metaphors.

That held until the gem upgrade, which we’ll simulate here with a hint of foreshadowing:

>> yamlstring = string.to_yaml
=> "--- \"I\\xE2\\x80\\x99m your huckleberry.\"\n"
>> YAML::ENGINE.yamler = 'psych'
=> "psych"
>> newstring = YAML.load(yamlstring)
=> "Iâ\u0080\u0099m your huckleberry."

Doh! And all I wanted was a normal encoding-free life.
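My reading of what happens, sketched in plain Ruby with the psych-based YAML library that ships with current Rubies: syck wrote the three UTF-8 bytes out as \x escapes, and a parser that follows the YAML spec reads each \xNN as the single character U+00NN. The net effect is the same as misreading every UTF-8 byte as one Latin-1 character. The constant and helper names below are mine, for illustration only:

```ruby
require "yaml"

# What syck stored: the UTF-8 bytes of U+2019 spelled out as \x escapes
# inside a double-quoted YAML scalar (%q keeps the backslashes literal).
SYCK_STYLE_YAML = %q(--- "I\xE2\x80\x99m your huckleberry.") + "\n"

# Hypothetical helper: list a string's code points in U+XXXX form.
def codepoints_of(str)
  str.codepoints.map { |c| format("U+%04X", c) }
end

# psych turns each \xNN escape into the character U+00NN.
loaded = YAML.load(SYCK_STYLE_YAML)

# Same effect with no YAML involved: treat each UTF-8 byte as Latin-1.
original = "I\u2019m your huckleberry."
misread  = original.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

p codepoints_of(original).first(2)  # ["U+0049", "U+2019"]
p codepoints_of(loaded).first(4)    # ["U+0049", "U+00E2", "U+0080", "U+0099"]
p loaded == misread                 # true
```

So nothing in the database changed during the upgrade; only the reader's interpretation of those escapes did.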

So after observing the problem in its native form, I turn to Google – which turns up this Stack Overflow post – and yep:

% rails console
Loading development environment (Rails 3.1.3)
>> YAML::ENGINE.yamler
=> "syck"

We have the culprit! But not where it’s coming from.

At first, I blame Rails, because that’s usually the easiest thing to do, right? Surely they changed something between 3.1 and 3.2? But searching the source code and grepping the log indicate that Rails got some psych tenderlove a long time ago.

% git log | grep 'psych'
c29eef7 [1 year, 2 months ago] (Aaron Patterson) load psych by default if possible
59f3218 [1 year, 2 months ago] (Aaron Patterson) load and prefer psych as the YAML parser when it is available

So then I do a grep on the gems:

% grep -ir 'syck' .
[...]
./delayed_job-2.1.4/lib/delayed/yaml_ext.rb:YAML::ENGINE.yamler = "syck" if defined?(YAML::ENGINE)

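And there we have it, and here’s why (note Aaron Patterson’s prophetic warning): Delayed Job 2.1.4 forced the YAML engine to ‘syck’ for the whole application. Delayed Job 3 doesn’t force ‘syck’ anymore, so after the upgrade the engine fell back to ‘psych’.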

% rails console
Loading development environment (Rails 3.2.2)
>> YAML::ENGINE.yamler
=> "psych"
>> string = YAML.load(File.open('test.yml'))
=> "I’m your huckleberry."
>> string.to_yaml
=> "--- I’m your huckleberry.\n...\n"

There’s still the issue of cleaning up the old data. It’s a little late for my colleagues, but an easy fix for our serialized fields (at least for the stock questions) could have been to read everything with syck and write it back with psych, though you may want to turn off timestamping first:

>> YAML::ENGINE.yamler = 'syck'
>> all_responses = {}
>> StockQuestion.all.each{|sq| all_responses[sq.id] = sq.responses}
>> YAML::ENGINE.yamler = 'psych'
>> StockQuestion.all.each do |sq|
>>   sq.responses = all_responses[sq.id]
>>   sq.save!
>> end
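If syck is no longer around to do the reading, an already-garbled string can in principle be repaired by reversing the misreading. This is a minimal sketch, not something my colleagues or I actually ran: repair_mojibake is a hypothetical helper, and it assumes the damage is exactly the bytes-read-as-Latin-1 kind shown earlier. Strings that don’t fit that pattern come back untouched.

```ruby
# Hypothetical helper: undo "UTF-8 bytes read as Latin-1" damage.
# Returns the original string when it does not look garbled in that way.
def repair_mojibake(str)
  # Map each character back to a single byte, then re-tag the bytes as UTF-8.
  candidate = str.encode(Encoding::ISO_8859_1).force_encoding(Encoding::UTF_8)
  # If the bytes are not valid UTF-8, the string was not this kind of garble.
  candidate.valid_encoding? ? candidate : str
rescue Encoding::UndefinedConversionError
  # A character above U+00FF (e.g. a real U+2019) means nothing to undo.
  str
end

garbled = "I\u00E2\u0080\u0099m your huckleberry."
puts repair_mojibake(garbled) == "I\u2019m your huckleberry."  # true
puts repair_mojibake("I\u2019m fine") == "I\u2019m fine"        # true
```

The validity check matters: a correctly stored Latin-1-range character such as é would otherwise be turned into an invalid byte sequence. It is still a heuristic, so try it on a copy of the data first.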

I’m sure I’ll meet up with encoding again. Then we’ll have us another reckoning.

p.s. syck must have the most unique dual license ever

Perspective

Sunrise over Savannah

Four weeks ago, I wrote about starting a new job for the first time in 15 years.

Last Friday, I resigned.

The company was Rails Machine, a web operations company based in Savannah with an orientation toward managing Ruby on Rails applications. I have a tremendous amount of respect and appreciation for what the people there do, how they treat their customers, and the level of talent and problem-solving skill in the team. They live and breathe web operations.

For me, at the end of the day, it just wasn’t the right fit or the right place. And I don’t feel I was the right person for Rails Machine. I don’t know how fair it is to say that after just four weeks. I’m not sure any new, challenging role is the right fit after four weeks. But sometimes you just know about the place. Staying wasn’t going to be right for me, for my family, or for the people at Rails Machine.

I am looking forward to some upcoming opportunities, but for now, I’m taking the break that I should have taken prior to starting four weeks ago. I’m learning some new things, paying down some technical debt, working on the honeydo list, taking walks with the dogs, and gaining some perspective.