Scenes from the Insane Asylum

Yet more fun from the ongoing series “These are the days of our lives of a system administrator” – today’s email (edited slightly for the blog):


Hi all,

Just to provide you far more information than you ever wanted to know about some of the login problems for a few of our services and the source of the @#$%@#%&%^ “500 Internal Server Errors” for our sourcecode repository.

Most of the staff can happily ignore this – and normally I wouldn’t even send this to the whole staff, since there’s really no impact outside of our few developers and possibly a few instant messaging users – but it might be vaguely educational or entertaining.

As you may or may not know, for reasons that seemed like grand ideas at the time (early last year), our account management application writes usernames/encrypted passwords to two different db tables whenever accounts are created/enabled/passwords changed, etc. One of these tables is a backend for openldap: openldap connects through an openldap-sql provider, which connects through odbc to mysql.

Our sourcecode repository is using subversion which uses Apache for its various operations. In turn, we use the ldap authentication module for apache to authenticate against ldap.

If you are playing along with the whole version of the fairy tale/nursery rhyme home game here – that’s:

sourcecode repository uses subversion which uses apache which uses mod_auth_ldap to connect to openldap which uses openldap-servers-sql which uses odbc which uses the mysql-odbc-connector to connect to mysql which uses a table managed by the account management application.

At this point, please feel free to conjure images of mice, clocks, and houses that Jack built.

All this does, by the way, is to provide for authentication for subversion, openfire (the IM server), and a few one-off applications – I don’t know if those one-off applications are at all used or live anymore – or if there’s an expectation they’ll be used again.

Anyway – there seems to be a timeout or failed connections or mice or something somewhere between odbc and mysql.

What this is, I don’t know for sure. Google searches were singularly unhelpful. Most sane people, it seems, don’t use openldap-servers-sql.

Well, according to the release notes for the mysql-odbc-connector, there was a change in the mysql libs at some point that made it stop reconnecting on dropped connections – and we may have crossed that maginot, er, mysql-odbc line with the move to Red Hat EL v5. So there’s an option flag for the mysql-odbc-connector to have it reconnect. mysql option flags are bit fields, so when you enter the integer representation into the odbc configuration file, the magic auto-reconnect line is:

option = 4194304
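
For those playing along at home, here’s a rough sketch (in ruby; the non-reconnect flag is just an example pulled from my memory of the connector docs – check the release notes for your version) of where that particular integer comes from: each option is one bit, and the integer you put in the config is just the bits OR’d (or added) together.

FOUND_ROWS     = 1 << 1      # => 2, "return matching rows" (for example)
AUTO_RECONNECT = 1 << 22     # => 4194304, the auto-reconnect bit

FOUND_ROWS | AUTO_RECONNECT  # => 4194306, if you ever needed both flags at once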

If it works, 4194304 shall hereafter be a special number, worthy of rewriting the lyrics to “Jenny” (four, one, nine, four three OH four-our-our-our) to sing its praises.

That’s the update on possible login issues and a semi-rare glimpse into various inner workings of services that we all would rather happily ignore – myself especially 🙂

p.s. In case you’ve ever wondered why it seems that I’m notoriously hesitant about the number of applications we run, or services we provide, it’s this “number of moving parts” issue. Every application carries a dependency chain or matrix. Some of them are just normal OS/network dependencies that we all know and love. But many are like this one – chains of multiple independent software packages. Obviously, the more moving parts you introduce into a situation – either to patch/mitigate problems, or just “because you can” – the worse it gets when you try to narrow down where problems can be. What just might surprise you is how many moving parts “good ideas” and seemingly useful services can carry with them.

After you get a fair number of these individual dependency chains… well, there are lots of great books written about engineering disasters in history that make particularly applicable reading.

p.p.s We have a lot more of these behind the scenes than anyone might imagine.

Peeling the Onion

And no, I don’t mean The Onion – which would have been far more entertaining.

Through Joi Ito’s blog I have recently become aware of the phrase “Yak Shaving.” Joi wrote about it in 2005, here’s Wikipedia’s take – and here’s an etymology from Alexandra Samuel.

When I first read Joi’s blog, I took Yak Shaving to mean a pointless activity (Joi writes in a bit more layered fashion than most folks). It’s partly that, of course (read Alexandra’s post). But it’s more about good problem solving. Especially when you go and read the entry in the Jargon File:

Any seemingly pointless activity which is actually necessary to solve a problem which solves a problem which, several levels of recursion later, solves the real problem you’re working on.

One of the things that I’ve spent my entire career doing is looking for the root causes of problems. Yes, just like every other system administrator/developer, there are times that I’ll defer the problem to another day (to this day, I’m still avoiding a mime types/magic file problem with Red Hat and PHP and MediaWiki that I’ve spent too much time on already). But I recognized a long time ago that digging into something, rather than stopping when the problem was mitigated, was going to be much better for everyone. I spent a lot of long nights doing this early on, and still do occasionally – and I’m thankful for some early examples from mentors who encouraged this. It’s made me a much, much better troubleshooter over the years.

The latest peeling-the-onion activity came last Thursday. I arrived at work with every intent of taking the example OpenID rails server in the ruby-openid distribution and beginning to pull it into our internal authentication tool. Doing that is very much a “Yak Shaving” activity. There are some other more pressing problems, but doing OpenID in the internal application solves part of 2 or 3 other problems.

Well, that fell by the wayside by mid-morning. We have a preview version of our public website. Most days I’m actually surprised that it works, given our Rube Goldberg-esque technical framework for taking mediawiki pages and getting them to the public web site. But it’s been a real benefit internally to have the preview site. Making it happen made the public application more flexible, too.

Well, mid-morning Thursday, there was a report that content wasn’t updating on the preview site. At first it was thought this might be a by-product of the previous day’s activity – pulling out a rewrite rule that was eating encoded “?” characters (i.e. %3F) in MediaWiki page titles and causing 404’s by the time those URL links made it into a Flash application and a Rails application. In the process of fixing that, we actually fixed another problem where the source URL for each page in our atom update feed was wrong.

Making that URL correct was what broke things the next day. It turns out that Problem #1 was that the process that pulled in the atom feed keyed on the URL as the unique identifier for the content (a fake unique identifier actually – it wasn’t enforced by MySQL). Since the URLs changed when they were fixed – voila! duplicate content – and of course the find was doing a find :first – and pulling the original, un-updated article.
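
A rough sketch of what was going on (WikiArticle, its columns, and the parsed entry object here are assumptions for illustration, not our actual code):

# one parsed atom entry, with its freshly corrected source URL
entry_url   = entry.link
entry_title = entry.title

# Keyed on URL: the corrected URL doesn't match the old row, so the importer
# happily creates a second copy of the page -- and later finds pull back
# whichever row happens to come first.
article = WikiArticle.find(:first, :conditions => ["url = ?", entry_url])
article ||= WikiArticle.new(:url => entry_url)

# Keyed on the (now unique) title instead, the existing row is found and
# updated in place.
article = WikiArticle.find(:first, :conditions => ["title = ?", entry_title])
article ||= WikiArticle.new(:title => entry_title)
article.update_attributes(:url => entry_url, :body => entry.content)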

There was a whole lot of scratching our heads (okay, there was a lot of cursing) about that unique key. The URLs in question are “internal” and pretty much guaranteed to change. Keying off that certainly wasn’t great design. I guess it goes back to that original issue – it solved the problem for a day, but no one gave any future thought as to what it would impact.

So we needed to key off the page title. Well, the page titles weren’t unique either. Which was also a head scratching/cursing problem. MediaWiki page titles are unique in a namespace, and our find functions assume they’ll be unique as imported, but that uniqueness was not enforced.

Well, MySQL folks can guess what happened next. We’ve never actually ever dealt with the collation issues with our MySQL server (there’s a lot we haven’t dealt with with our MySQL server – but that’s another post for another day).

For good or for bad, I really didn’t understand why our collation was set to “latin1_swedish_ci” – and thought that I had made a major mistake setting up the server defaults in the first place, one that no dev ever caught when thinking about their schemas. I was pretty relieved to find out that it’s simply MySQL’s default.

James’ absolute groaner of a quote?

Well at least we didn’t bork it up

Well, MediaWiki titles are case sensitive, and it made sense for that column to be case sensitive too – so in went the migration. This gave the additional benefit that searches for article titles would actually be accurate now (even though we have some content that differs only in case that needs to be fixed).

execute "DROP INDEX title_idx ON wiki_articles" execute "alter table wiki_articles change title title text character set latin1 collate latin1_general_cs null default null" execute "alter table wiki_articles add unique title_idx (title(255))"

(p.s. select somecolumn,count(*) as n from table group by somecolumn having n > 1 is a wonderful tool to stick in the tool belt)

After all this was done, we had to import the content again. It’s about 25MB of an atom file – 5,000+ pages of content dating back to last September. Our standard process of trying to pull this data in with an HTTP GET takes too long to run with the HTTP timeouts in the libraries we use – so a long time ago I modified our code to read the data from a file when needed.

Well, when a contractor modified the code to stop using the FeedTools library and just do some simplified parsing for our internal purposes, they took out the “read from file” functionality and didn’t replace it. Which generated some more head scratching and cursing. So that had to go back in to get all the content back in and corrected.
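
What went back in was roughly this shape (a sketch only – the method name, URL, and file path are made up for illustration): read from a local file when one is available, otherwise fall back to the HTTP GET with its too-short timeouts.

require 'net/http'
require 'uri'

def fetch_feed(url, local_path = nil)
  if local_path && File.exist?(local_path)
    File.read(local_path)              # big re-imports: read the saved atom file
  else
    Net::HTTP.get(URI.parse(url))      # normal case: small incremental pull
  end
end

feed_xml = fetch_feed("http://wiki.example.org/atom.xml", "/tmp/full-export.atom")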

A simple problem of content not being updated highlighted 4 different problems: wrong key for the content, no unique enforcement for any keys, wrong collation on the column, and data import from files missing.

We could have stopped early on by just killing the duplicated content with the wrong URL, updating it, and reimporting the latest changes. But we didn’t. The application we fixed didn’t matter for public use – but our fixes prevented some future problems.

I guess we shaved a few yaks that day. And proved yet again how important it is to get to the root of problems. And how painful it is later when you have to go back in behind yourself and others because it wasn’t done originally.

A Day In the Life

Most days when I get to the end of the day, I can’t remember half of what I did that day. When people ask me what I did – I can’t really tell them. This, of course, is horribly demoralizing because I begin to doubt whether I actually did anything, or maybe I just zoned out doing my TPS reports.

This isn’t some “woe is me, I’m soooo busy” statement. Or some arrogant “I’m so important, I do all kinds of things” statement. I’m not relatively busier than anyone else in similar positions (or in dissimilar positions on our specific team) – and my importance is like the Dilbert cartoon that showed up a few months ago, where Dilbert had a choice between doing an important anonymous activity – or doing something useless that looked like an accomplishment and then attending meetings until he couldn’t appreciate the difference.

Nope, not remembering is more a natural function of most system administration positions – there are a number of clearly distinct, interrupt-oriented tasks that don’t lend themselves to spending any quality, concentrated study time (which most technical tasks really need) – and add to that the information overload of aggregated feeds, twitters, and IMs that I let myself indulge in and whammo! there goes the memory.

I can’t figure out whether the current position is worse for the amnesia or my last position (more management oriented) was. The current one has more distinctly separate technology pieces, but the last one involved way too many meetings and a lot of EDS-esque cat herding. I think the current one is worse – maybe because I’ve gotten older, or maybe because our project is trying to tackle too many completely different scopes at the same time.

Anyway, I actually remembered what I did yesterday. (Well, almost – I remembered the two or three big things, and went back through my email and IM and console logs for the others.) And the more I thought about it, the more yesterday looked like a completely typical day in my Life as a Grant-Sponsored University Systems Manager. So I present to you – a day in the life (maybe it will start a meme. I doubt it though).

Arrive at work – 7:51 am

Run mail.app. Run Adium.

I review my overnight email, trash the spam, and make sure the expected emails from some of the overnight processes are there indicating they completed. Check all the shared folders and make sure all those mails came in from the overnight processes. Some days this is a “gut feel” check – I make sure what’s supposed to be there is there, and is about the right size. Some mornings I read each of the emails. It depends on what’s going on. When I don’t read all of them every day, I usually catch up with everything by the end of the week. Yesterday I read about half of them, because I started IM’ing with James about some changes I made to our pubsite application the day before to fix issues we were having with underscores, plusses, and %20 characters all being treated as spaces, and the side effects of that.

We also talked about one of us trying to explore the mediawiki python bot to see if it could do anything to help our colleagues do any mass changes to categories in our mediawiki acting as a CMS for our public website.

Sadly, I volunteer to do this 🙂

8:30am

I download the python wikipedia bot. I respond to some comments on one of my flickr photos, go through anything that looks incredibly interesting in Google Reader and Del.icio.us. Decide to go ahead and update the three wordpress installs to WordPress 2.1.3 and put it in the blog

8:55

Start IM’ing with Daniel – who works with me part-time. He’s trying out the custom Locomotive Bundle I put together that points to our own gem server. And I remember that I need to open the conf rules on that to allow folks at home to point to our gem repo instead of limiting that to campus. So I make the change, check it into svn, update the server’s conf, and restart Apache.

9:15

Got an IM that a mistake was made in deploying a bug fix to our public site application, and some other code that hadn’t cleared through all our internal discussions got deployed – along with db migrations that make it impossible to revert the change. Whoops. Big Whoops. Huge Whoops. And the content needs to be refreshed for the site, because some bug fixes were also in that code that make it necessary to reimport content to make sure the timestamps are right. I put the site in maintenance mode and got to that.

9:30

Help another staff member debug a problem with Google Reader and the feeds for some of our applications.

Chatted with Ben about some user interface things that could help ease the transition of the code change we just made, and how putting in certain changes just might make it harder to describe to folks how it really works.

9:45am

Return to the python mediawiki bot. Get the CVS directories yanked out of it, turn the .cvsignore’s into svn:ignore – and check it into our deployment repository. Create the account for the bot. Start reading all the instructions for configuring the thing.

Start cursing.

Problem #1: the bot doesn’t understand our redirection on login to https://. I don’t know python. But I know enough ruby and perl and php to hack – so I hack. I figure out where in the world in the bot’s login script to change it to understand SSL. But I don’t have much hope that it’s going to work. I change httplib.HTTPConnection to httplib.HTTPSConnection and pray that it works – and wow, wonder of wonders, it does. I begin to praise Guido van Rossum and temporarily overlook that the language requires proper indentation.

Problem #2: the bot really hates that we eschew wiki style and allow category (and page) names that start with lower case letters. I begin cursing again, because this one is really buried, and my debug-by-print attempts are throwing syntax errors because of the indentation issues.

James tries to help, because he actually likes python. I curse James for liking python.

I fix that problem, and feel proud and smug because I actually made it a configuration option. I give up while I’m ahead and go have lunch with the wife.

11:30am

Lunch with the wife. Best part of the whole day.

1:10pm

Get back to the python mediawiki bot. It doesn’t appear to work at all for editing the content in the test wiki. Waste the next hour and a half of my life running python interactively, importing the bot libraries, creating my own site and page objects, trying to figure out what’s wrong – and why the page content was blank – and traipsing through the code to figure out how the thing parses the edit page to get at the content (it parses for the textareas).

And then come to find out, an add-on to mediawiki written by our colleagues to try to make it a little easier to pick images and templates for an article is full of a bunch of hidden javascript-presented blocks with – you guessed it – a bunch of textareas. Turn off the plugin for a bit, and voila! the bot works.

2:45pm

Start talking with James about whether or not this is ever going to work at all (the bot, not the project). We don’t really resolve whether or not the bot is going to be useful. But we now have enough information to run with if it does come up again.

3:00pm

Go traipsing through the shared mailboxes – including the inbound and outbound mail from our support system. See that a colleague has entered a support call about how article summaries aren’t showing any markup. Go looking through our code to figure out how the summaries are created and how the tags are stripped, to make sure that the summary truncation doesn’t break things with dangling tags. IM my colleague to explain what was going on with the summary thing and ask her thoughts on trying to actually do anything with it. (Begin to contemplate how in the world to even do that – probably whitelisting em and strong tags and making sure they are closed. Groan thinking about the regexes.)
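
For the curious, the sort of thing being contemplated looks roughly like this (a sketch, not our actual summary code – the method name and truncation length are made up): strip everything but the whitelisted tags, truncate, then close whatever the truncation left dangling.

ALLOWED_TAGS = %w[em strong]

def summarize(html, length = 200)
  # drop every tag that isn't on the whitelist
  text = html.gsub(/<\/?(\w+)[^>]*>/) do |tag|
    ALLOWED_TAGS.include?($1.downcase) ? tag : ""
  end

  summary = text[0, length]

  # close any whitelisted tags the truncation left open (naive, but that's
  # exactly the regex-y flavor being groaned about above)
  ALLOWED_TAGS.each do |t|
    opened = summary.scan(/<#{t}\b/i).size
    closed = summary.scan(/<\/#{t}>/i).size
    summary << "</#{t}>" * (opened - closed) if opened > closed
  end

  summary
end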

3:20pm

Read through the staff list mails about some of the issues that resulted from the morning’s premature deployment. Write up some explanations about how the mediawiki include functionality works (the mediawiki includes are used in some of the content preparation to reproduce content nav blocks in pages that eventually display on our public site). Try to clarify some additional confusion that results from how we look for specific category tags to display specific content pages (like in a sidebar – it’s very blog-like).

3:45pm

Talk with James for a bit about ongoing issues and group priorities

4:00pm

Talk with the wife for a bit in IM. Catch up on the day’s Google Reader.

4:15pm

Start figuring out, again, the find command to let me grep for a string in all the files created in the last day. Settle on find . -ctime -1 -print -exec grep -i jason {} \;

The goal: I have 75,000+ spam emails sent to our server in the last three months, and occasionally browsing through them looking for any false positives was choking up Mail.app so badly that I pulled the account out of Mail.app and went to the server side to poke through the spam folders instead.

4:45pm

Run that in the wrong directory. Whoops!

5:10pm

Start reviewing the meeting items posted for the all-staff meeting the next day. Send Kevin some questions on e-commerce and what was discussed at the meeting retreat the previous week.

5:30pm

Leave for the day. Head home trying to figure out someway, somehow to ask questions in our staff meeting about our focus and direction.


Pretty typical actually. (usually less time spent on one problem like the python bot problem) Some days are a little longer. Some days have more Google Reader 😉 Some days are more systems, less dev, some days more dev, less systems.

And that folks, was the way it was, April 4 2007 🙂 A day in the life of this systems manager.

Don’t Do That

So… maybe you are coding up your totally way rad awesome application in Rails – and you are thinking to yourself:

“Self, I really would like to set my own created_at and updated_at timestamps. Look – there’s even a way to do that in the Rails documentation:”

class Feed < ActiveRecord::Base
  self.record_timestamps = false
  # ...
end

At this point you need to back away from the keyboard. Quickly. If you don’t, pretty soon, somewhere in your application – you are going to run into this error:

Mysql::Error: Column 'created_at' cannot be null

Or ALL KINDS OF OTHER FUN SIDE EFFECTS (FUN is actually a euphemism here for various four letter words)

See, record_timestamps is a class variable for ActiveRecord::Base created with the Rails cattr_accessor – maybe the self.record_timestamps should have tipped us all off – maybe not (there’s also a class_inheritable_accessor – I’m not sure where all that gets used, though).

Even experienced developers not all that fluent in ruby minutiae (I think class variables count as minutiae) cut-and-paste first and figure out how it works second (yeah, don’t do that either).

So – anyway – once you change record_timestamps, you change it for all descendants of ActiveRecord::Base.
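
A minimal illustration of the footgun (the model names here are hypothetical): cattr_accessor means one class variable shared by ActiveRecord::Base and every model that inherits from it.

class Feed < ActiveRecord::Base
  self.record_timestamps = false   # looks like it only affects Feed...
end

class User < ActiveRecord::Base
end

User.record_timestamps   # => false -- User no longer gets created_at/updated_at
                         # filled in either, hence the "cannot be null" errors

If you really must write your own timestamps, the less-bad route is to flip the flag off, do the save, and flip it back on in an ensure block – which is exactly where the threading issue linked below comes into play.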

There’s a bit of discussion of this, on a separate but related problem, at Evan Weaver’s blog (pay special attention to that threading issue for those playing along with the home game). And of course, your friendly neighborhood reminder of what happens with class variables is at Nic Williams’ blog (I recommend reading that twice and breaking out the home game version of irb).

So the moral of the story? self.record_timestamps – Don’t Do That.

p.s. Production of this blog entry was made possible through various grants and assessments, and with some moans, groans, sighs, and “what tha–?” from my colleagues James Robinson and Aaron Hundley (doesn’t have a blog, he needs to get with the program) 🙂

p.p.s edited to change “you chance it for all descendants…” to “you change it for all descendants” – I think the first one is quite apropos however

Computing Expertise

In higher education – and I imagine within IT support in most small organizations, where the “IT Gal” or “IT Guy” is called upon to do everything from running the servers to managing the routers to “doing the web page” to answering “how exactly do I do that in Word again?” – the “Eye-Tee” person will find that everyone else attributes computing expertise to one’s proficiency (or even beginning-icy) in the company’s/organization’s software packages.

Let me make something absolutely, positively clear. Knowing how to create a table of contents in Microsoft Word is not a function of computing expertise. Knowing how to do anything in any given consumer software package is not an activity reserved for “eye-tee.”

Knowing how to use a hammer and a drill does not make me an architect, or a builder.

This myth – that somehow software expertise is an Information Technology function – is one of the worst myths ever to pervade our currently-heavily-dependent-on-software society. It means that ordinary, hard-working people who in every other area of their life would roll up their sleeves, break out the instruction book, and learn whatever task (and tool) they have in front of them instead give up, imagining some mystical, magical wall – as if even being a beginner in a piece of software were a scarce, specialized skill.

Yes, being an expert in a given software package is a scarce skill. But the best qualified to learn that are those that the software was written for. Just like a Chemistry PhD is going to know far more about the intimate details of their field, but everyone has the ability to understand that Dihydrogen Monoxide is safe in low doses.

Quite honestly – those of us in “eye-tee” are actually the absolute worst people to rely on for specific expertise in software – be that Word, or Photoshop, or Google Reader, or whatever the software application. It’s like the faculty member in Math answering Physics questions. Of course they get the math, but it’s not like they have the same expertise.

And for those of you who are proficient in a given software package – or in several, or have beginner knowledge in several? You aren’t computing experts. So give up thinking that – and stop trying to tell people otherwise. Because you are just as bad as the people who aren’t trying. (There are no computing experts, btw; the more you know, the more you realize there’s way more that you don’t know.)

I have specialization in certain areas of computing technology. I know the fundamentals. I know how programs are constructed. I know how the operating systems work – at least a certain level of their architecture. I can make use of software written for folks like me to deliver services – but like I told a colleague today: I use software like everyone else does – one menu option or button at a time. The only reason I know how to use Photoshop, or use Firefox, or use anything, is the fact that I clicked on it and started exploring menus and trying things. I don’t use a tenth of the power they hold, but my beginner-level usage has nothing to do with the fact that I have computing specialization.

It just means I tried.

p.s. it’s never really about a single post

So I went into snarky overdrive with my vendor dependencies post. It wasn’t quite as funny as last year when I went off on the people that can’t unsubscribe from lists either. Nor was it as funny as some other commentary on Rails I’ve snarkily made.

I really do love the err.the.blog guys – they have the best rails blog – bar none – that I’ve ever found – I even have their toolbox post burned into my retina, I think.

They really know their stuff. Rails needs devs like this – and they do a great service educating other folks about the framework.

But I don’t agree (obviously) with packaging up all the dependencies with an application. I get all the reasons for doing so. It just sets a really dangerous precedent for the people that are going to take it as the gospel and never think about the ramifications of what packaging up everything with your application means. (like simple things – remind me to tell the inode story sometime. 20 capistrano delivered copies of edge rails might not kill your storage but it dang sure can eat some inodes)

But hell, you can’t really trust the system administrators to get it right either about not breaking dozens of rails apps that they don’t have a clue about. I, um, er, have known some sysadmins to do that (more than once even).

p.p.s Oh, man, I forgot about the sponsorship link. This post sponsored by the Static Linking Historical Society. And support also comes from Microsoft Corporation. Proud facilitators of DLL Hell for all the static linkers that decided to go dynamic, but distributed their own libraries.

Good Grief People, stop with the local gems

From Err The Blog: Vendor Everything

For hosted environments? sure.

But if you are responsible for the application AND the server? (or your shop is?)

No.

Not just no. But HELL No. And I’d really like to write “HELL No” in an <h1> but I’m going to avoid that for the sake of sanity

I’ve yet to figure out why the rails community has this inbred desire to cause harm to their reputation in organizations that aren’t pure dev shops. I’m not even talking about the enterprise, I’m talking about small business, non-profits, companies they contract with, academic shops…

I don’t disagree with Chris’s reason here for using vendor for WayCoolFunkyGemThatYouThinkIsTheBeesKnees (WCFGTYTITBK) – not being “That Person” who breaks the build (and your peeps) is laudable. But really – I don’t buy it. If you are a small Rails shop and you plan on using test/spec or any other WCFGTYTITBK – for goodness sake, you communicate that with the rest of your team (hello? IM? email? even that ringy thing on the hip or desk we all hate to use?)

If you think that someone else’s code is so great that it ought to be in your application – well then it ought to be in everyone’s install too. Go get up and install it for them (take the train if you can’t fly there). That’s what good developers do. They have sane development environments set up and they are completely proficient at “gem install blah” – which makes them completely aware that a brand-new third-party dependency just showed up in the application. I dare say that “gem install blah” is a lot less intrusive than “why in the sam heck did 1000 lines of crap just show up in vendor – I evaluated that third-party code last month and it was crap then and is crap now.”

Local copies of every gem are madness – especially gems that are core to your application (and would break builds). It creates situations where the whole team (and often the people that run the servers and are ultimately responsible for the application) is not fully aware of the dependency needs of the application. Let me repeat that again – EVERY DEVELOPER ON A SMALL TEAM SHOULD KNOW EXACTLY WHAT AN APPLICATION DEPENDS ON, WHAT VERSION, AND SHOULD TASK THEMSELVES WITH CHECKING UP ON THOSE VERSIONS.

One app? not a big deal either way. 5 or 6 apps running in the same environment? It’s a Big deal. (of course it’s probably a complete architectural failure to have your 5 person team working on 5 or 6 apps at the same time – but that’s another post)

We had pinned rails in our applications – at least until the “Upgrade Your Rails NOW NOW NOW” event – and going through multiple applications on multiple staging servers and multiple versions was a complete pain in the ass. Okay, so that’s a little hyperbolic – but it was more trouble than it needed to be. You upgrade the server – when you control the server and your application – and you know that the dependencies are handled.
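
For reference, “pinning” here was nothing fancier than the standard line in config/environment.rb (version number illustrative) – the gem stays installed system-wide, and the app just declares which one it expects:

# config/environment.rb
RAILS_GEM_VERSION = '1.2.3' unless defined? RAILS_GEM_VERSION

Upgrading then means upgrading the gem on the server and bumping one visible line per application – which is exactly the multi-app, multi-staging-server slog described above, but at least every dependency decision is sitting out in the open instead of buried in vendor/.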

I had these arguments a few months ago with a developer that was contracting with us – and it was as if imposing a little structure on the process (certainly not anything like some waterfall corporate development shop – just “Hey, tell us exactly why you are using edge rails so that we all understand the issues”) meant we were impeding progress (“no, we are trying to make sure we understand what you’ve done when you get bored with us”). I know that new software introductions are disruptive. But that’s what developers (and I’m counting myself here for the sake of that sentence) do. Things break, we tell others, and we fix them. (And some of my other colleagues think WE are the ones lacking planning – you have no idea.)

While every application should have a definite lifecycle – you know, and I know, and everyone else knows that in many, many, many environments apps get written and then live well beyond the developers, the systems people, and everyone else that ever had any responsibility for them – and local copies of everything create a maze of having to upgrade the third-party dependencies all over the place when some script kiddie decides to take advantage of that 2-year-old failure to sanity check a POST.

Rails developers have to start figuring out that someone beyond them is going to be responsible for inheriting what they’ve done – and they have to start thinking more seriously about dependencies, third-party code, add-ons, and the lifecycle of what they do. It’s like two-bytes for the year value all over again. Seriously people, no amount of “unit tests,” “syntactic sugar,” and vendor kung-foo will ever trump communication and documentation (I don’t mean constantly out-of-date systems analyst documentation – I mean documentation about decisions and why something was done, or why it was added, etc.)

Uploading KML files to MediaWiki

So you want to upload KML files to your MediaWiki install? Simple as putting ‘kml’ in your allowed file extensions right?

Wrong.

Blame the almighty power of a string match that’s not actually a regular expression, but instead a strpos match in the SpecialUpload::detectScript function (yes, that’s right, a strpos match, not a stripos match – but a strtolower takes care of that a few lines before, and that’s probably faster anyway).

The strpos looks for <head in the chunk o’ text from the uploaded file – which of course matches KML’s <heading> tag – producing a detectScript match.

Yes, one could modify the function in MediaWiki to handle KML file uploads – pulling <head out of the following code block:

		$tags = array(
			'<body',
			'<head',
			'<html',   # also in safari
			'<img',
			'<pre',
			'<script', # also in safari
			'<table'
			);
		if( ! $wgAllowTitlesInSVG && $extension !== 'svg' && $mime !== 'image/svg' ) {
			$tags[] = '<title';
		}
		foreach( $tags as $tag ) {
			if( false !== strpos( $chunk, $tag ) ) {
				return true;
			}
		}

And then writing a regex to match <head but not <heading.
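
Something along these lines, say (sketched in ruby just to illustrate the idea – the actual change would be in MediaWiki’s PHP): require the tag name to be followed by whitespace, “>”, or “/”, so real <head> tags still get flagged while KML’s <heading> element slips through.

pattern = /<head[\s>\/]/i

pattern =~ "<head><script>evil()</script>"   # => 0   (match -- still flagged)
pattern =~ "<heading>270</heading>"          # => nil (no match -- KML passes)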

Which is certainly do-able, due to the beauty of open-source software.(*) But thankfully (very thankfully) there’s a KMZ (scroll down) format, which should upload just fine with just a file extension addition. (And the bonus is that it’s far more feature-rich to use.)

(* Which, of course, raises the whole other discussion topic of making custom local modifications to your open-source software packages that don’t merit patch submissions back to the package authors. Look for the future post and/or Conversations with Plastic Dinosaurs about being on the hook to maintain custom changes for any and all future updates to the open-source software packages you use – changes which, inevitably, you’ll forget you made, and then you’ll upgrade, and you’ll break expected functionality, which won’t be noticed for months after you’ve forgotten you ever even upgraded, at which point someone will complain, memos will be written, you’ll get blamed, you’ll bitch about getting blamed, and everyone but you, given that you actually do the work, will promise not to forget about it again, which you’ll promptly do six months and thousands of tasks later on the next upgrade. Lather, Rinse, Repeat.)

It ain’t magic

Dealing with magic, magic.mime, and mime.types on Red Hat Enterprise Linux and with PHP, FileInfo, and MediaWiki is a serious pain in the ass.

Who in hell came up with this mess? Apache has a magic file, the OS has a magic file, FileInfo complains that it can’t find /usr/share/misc/magic – when it’s really looking for /usr/share/misc/magic.mime. There are about twenty billion mime.types files – including the one that MediaWiki has itself. And there are that many symlinks from hell trying to link some of these together.

What a freakin’ cluster-you-know-what.