I like data, I cannot lie.
Out of all the things I enjoy in my job, I think what I like the most is pulling together the bits and pieces of data that we have, aggregating it together and trying to extract some meaning out of it. Maybe it’s because I was a system administrator first, I don’t know. I like the logging, I like the queries, I just like doing it. It’s fun. It feels meaningful.
But I always feel like it is amateur hour when I do. Sure, I know my basic statistics, and if it’s a dataset about activity that I know about, I have an intuitive feel for what’s right and wrong about the data surrounding it. But I don’t really feel like I know how to make something meaningful out of it.
I was, and am, pretty excited that Stanford decided to experiment this fall with mass delivery of three topics: databases, machine learning, and artificial intelligence.
I decided to follow along with the machine learning class, because I was curious about the material, and I think it may help me make more meaningful relationships out of the data that my apps and systems collect.
If you want a perspective on the course from a professional writer, you can follow along with the course from Chris Wilson at Slate, but four weeks in, I wanted to share a little about how I’m feeling about it.
First, let me say what’s great about the course. Dr. Andrew Ng is really good at teaching. He is what amazes me about the best faculty I meet: demonstrable experts in their field who are able to connect with those of us who are not experts, and likely never will be. He has done what I remember the best of the faculty I’ve learned from doing: he’s guided us into having a better intuition about what’s happening with the various mathematical algorithms he’s presented.
Also great is the technical implementation of the course. The student assistants putting together the website, the video delivery, and the submission system have all done a great job – especially given the constraints of what delivering a course to thousands (maybe tens of thousands) of students entails. I’m not the biggest fan of the Q&A forum format – but it’s not because of the technical implementation, it’s because I don’t think anyone has quite gotten the “mass of people able to fluidly form niche groups” thing right.
I have learned. And that’s what matters. I’ve completed all the review assignments, scoring 4.25 or better (out of 5) on the first try, learning if I missed parts of a multiple choice review question, and being able to resubmit the review questions with 100% credit on the second.
And I’ve submitted each one of my programmatic assignments with 100% credit on the first try.
But I’ve still felt lost (especially with the neural network assignments).
Part of that is me. I get lost in the math. I don’t know what is different between being able to follow and deconstruct variable names and statements in programming languages and trying to follow and deconstruct mathematical notation. Maybe it’s all the Greek letters, I don’t know, but I struggle with it; I always have. I can see it in code, but I can’t just “see it” in the math notation. That frustration is part of the reason I didn’t pursue graduate degrees, and what (little) advanced math there is in this machine learning class (we get to pretty much skip all the formula derivation) just serves to throw me off my game for a bit.
So the programming assignments have been great because they’ve helped to clarify the math for me – I come out of them (usually) having a better understanding of the math (but that’s only once I grok enough of the math to get started).
But here’s the one place that I think that the Machine Learning class falls short. The programming assignments are treated as the protected resource. And it’s not limited to Stanford, I think it’s endemic to most Computer Science programs, and that’s the great failure I think of Computer Science instruction.
We’ve been asked to agree to the Stanford Honor Code and not share code before the assignment is due. I can understand that, given that the course is structured to use the programming assignments as a “gradeable” resource (and the internet-wide course is sharing itself with Stanford’s own applied section of their CS229 machine learning course) – and it makes total sense that they would, because you can do automated mass-checking on outputs with mathematical programming. I don’t begrudge that.
But by treating the code as the protected resource, it fails at the very thing that these programs need to be doing the most: getting us to solve problems with others, to build on each other’s work (just like we are building on written recognition research in our own programming assignment) – to begin to develop efficient, readable, understandable implementations of the methods to solve these problems.
At the least, there needs to be a time “after” the assignment for sharing and reflection. I’ve scored 100% and I still feel lost. My code passes the test, but is it efficient? Is it obfuscated? Are there better implementations? How could I build on it and make it better? I still feel a little lost even though I’ve gotten it right. I’ve seen hints in the Q&A forums that I could learn even more if I had the chance to compare and contrast. I’ve thankfully seen pushing-the-honor-code hints that got me started in the right direction past the math notation, in a way that the assignments themselves did not.
But that’s not possible in this course, because the programming assignments are still “open” after the deadline for reduced credit.
(It’s made a little worse that the “credit” for a course like this is moot; the internet mass of volunteers is here to learn, not for a grade of any kind – but again, we share with the for-credit work of students at Stanford.)
I know from my own degree that Computer Science is not software development, and probably never has been. I think that things have changed in the 15 years since I got my own degree, not so much in the early courses where the algorithms still do the grading, but in the later courses. But it still hasn’t changed enough.
I am very thankful for the opportunity that Stanford and Dr. Ng have provided to us, to give us Stanford-quality instruction in a way that gives us a live progression with others going through it at the same time. I think that part of it may well be a coming future for education.
But until the software part of computer science instruction becomes a lot less of a protected resource in the process – this isn’t the future of computing education.