My fiancée is a graduate student in neuroscience, and a good one. Talking with her about her work, I get a really good look at just how much data a research scientist generates over the course of one experiment, and just how much of her time is spent managing and analyzing the data she produces. Nothing makes the programmer part of my brain wince more than hearing about her having to spend hours painstakingly transferring or transforming data from one spreadsheet or file into another by hand. On several occasions, I’ve volunteered, or been asked, to write simple scripts to automate some of the repetitive tasks, like aggregating data from multiple spreadsheets into one cohesive whole.

That got me thinking. As a programmer, it’s obvious to me to look at every process for places I can automate to save time in the long run. Having graduated from CMU, my fiancée has had an introductory programming class, at the time taught in Java. I don’t think, however, that the course gave her anything that really met her needs. One introductory, programming-focused course doesn’t really build the skills to quickly handle problems like the ones I see her facing on a day-to-day basis. The class certainly taught the fundamentals of control flow, loops, etc., but it didn’t properly teach how to identify problems where writing a program can be worth the time invested. It didn’t teach the mindset of looking for places to automate, something that is second nature to most programmers.

I think a much better class would be “Scripting and Data Management 101: Practical Programming and Data Management for Non-Programmers.” Basic scripting would obviously be a must, in an easy teaching language. There are plenty to choose from, but Python already has good traction as a scientific language, so it would seem like a natural choice. I can see other topics dovetailing nicely into the class, maybe a quick intro to a simple database system and how to design and query tables. Assignments could be case studies where the goal is not simply to produce a working program, but rather to analyze a process and look for places where writing a program might be the right solution.
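
To give a feel for the scale of script I have in mind, here is a minimal sketch of the "combine several spreadsheets into one" chore, assuming the sheets have been exported as CSV files with a header row. The function name `merge_csv_files` and the extra `source_file` column are my own inventions for illustration, not anything from a real lab workflow.

```python
import csv
from pathlib import Path


def merge_csv_files(input_paths, output_path):
    """Combine several CSV files into one and return the number of rows written.

    Columns are matched by header name, so the files do not need identical
    layouts; a cell is left blank when a file lacks that column.
    """
    rows = []
    # "source_file" is a hypothetical extra column recording where each row came from.
    columns = ["source_file"]
    for path in input_paths:
        path = Path(path)
        with path.open(newline="") as handle:
            reader = csv.DictReader(handle)
            # Collect every header seen so far, preserving first-seen order.
            for name in reader.fieldnames or []:
                if name not in columns:
                    columns.append(name)
            for row in reader:
                row["source_file"] = path.name
                rows.append(row)

    with Path(output_path).open("w", newline="") as handle:
        # restval fills in missing columns; extrasaction="ignore" drops
        # stray cells from rows that are longer than their header.
        writer = csv.DictWriter(
            handle, fieldnames=columns, restval="", extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Something like `merge_csv_files(sorted(Path("experiment_data").glob("*.csv")), "combined.csv")` would then fold a whole folder of exports into one file, where `experiment_data` is just a placeholder folder name. The point is less this particular script than recognizing that an afternoon of copy and paste is really a twenty-line program.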

I feel like some variant of the class above should be taught in most research-based graduate programs. It might be my programming-centric views talking, but I feel like such a class would be of immense value. I’ll certainly stand by the claim that the amount of data a researcher generates is only going to increase with time. Maybe this way they can spend more brain cycles thinking about solving cancer, and fewer on how to deal with their piles of data. All I know is, I’ve won over at least one convert. My fiancée is rapidly picking up Python.

Anyway, wordy post. Here’s something funny and relevant from xkcd: