Is “data scientist” the new “programmer”?

Back in the 1970s, being a “programmer” meant writing one or two files of code that input data, processed it in some way, and then output a result. A program that occupied more than 256 KB of memory, even on a mainframe, would have been considered bloated (and wouldn’t have run at all on a “minicomputer,” at least not without a painful process of overlaying). Thus, there tended to be a lot of interesting stuff going on within every few lines of code, and an entire file of code might well contain nearly everything interesting about an application.

Today’s “software developer” is typically mired in tedium. Tracing the code behind a simple function might require going through 25 files, each of which contains a Java method that passes a message to another method in some other file. Development tools such as Eclipse can speed up the tedious process of looking at a 20-layer call stack, but the density of interesting stuff to look at remains low. A line of code that actually does something is buried amidst hundreds of lines of glue, interface, and overhead code. How did applications get so bloated and therefore so boring to look at? I blame hardware engineers! They delivered the gift of infinite memory to the world’s coders, and said coders responded with bloat beyond anyone’s wildest imagination.

Does the interesting 1970s “programmer” job still exist? While teaching an intro “data science” class at Harvard, I wondered if the person we call a “data scientist” is doing essentially the same type of work as a 1970s Fortran programmer. Consider that the “data scientist” uses compact languages such as SQL and R. An entire interesting application may fit in one file. There is an input, some processing, and an output answer.

Readers: What do you think? Is it more interesting to work in “data science” than “software engineering” or “programming”?

Older readers: Is today’s “data science” more like a programming job from the 1970s “scarce memory” days?


Full post, including comments

If Piketty is right about rich people getting high returns, why do banks lend at low rates?

A critical assumption in Thomas Piketty’s Capital in the Twenty-First Century is that rich people get a better return on investment than average S&P 500 buy-and-hold idiots. This assumption leads to runaway wealth inequality and the necessity for a new worldwide tax on wealth.

Why do rich people get such a great return, according to Piketty? They have access to brilliant financial managers and investments that the rest of us can’t find.

What about big banks, then? With billions in assets, they are unarguably rich. Their office towers are stuffed full of the best financial managers that $500k-$20 million/year salaries can buy. They sit in the biggest cities and have access to every conceivable business idea. Big banks should have even better investment opportunities than Mr. Generic Rich Bastard. Yet they are happy to lend out money right now for a return of about 2 percent (e.g., margin interest on stock holdings, best adjustable mortgage rates for the first five years). If there are such great investment opportunities out there for the sufficiently rich and sufficiently connected, why would a big international bank want to lend out money at 2%?

Limits to government power

In his autobiography, Alan Greenspan says that the Fed could not control interest rates; if the Fed had insisted on a high rate of interest for dollars, banks could have borrowed dollars at lower rates from Chinese holders of dollars.

An article in today’s New York Times, “Why Markets, Not the Treasury, Determine Bank Capital,” tries to explain why the government’s scheme for bolstering bank capital turned into higher dividends for shareholders, acquisitions (e.g., Bank of America buying Merrill and Countrywide), executive bonuses, and just about anything other than more bank capital.

Looking back at my economic recovery plan, I think that it still makes sense given the evidence that the government is not all-powerful. All of the changes that I propose are to things that the government does control or run, e.g., tax rates, rules for corporate governance, schools, public employee unions, immigration decisions.

Java is fading as a Web development tool… along with the SUV?

In September 2003, based on the fact that students in 6.171 who’d chosen to use Java were incapable of getting anything done, I innocently posted “Java is the SUV of programming languages?” It created quite a stir in the comments and on Slashdot. This semester is the first time that we’ve taught 6.171 since then. Despite the fact that all the students are expert Java programmers, having used Java to build a big project in 6.170, none have chosen to use Java this semester. It is all Ruby on Rails, Microsoft .NET (C#), and a touch of Python.


Is it safe to pronounce Java dead as a programming environment for Web applications?  Who is using Java these days to build great things?


Fallout from the Java = SUV posting

The “Java = SUV” posting continues to resonate in my inbox.


The last two students using Java dropped 6.171. They were not keeping pace with the PHPers and those who sold their souls to Bill Gates. (Recall that all the students in 6.171 had built a 10,000-line Java program in 6.170, so they all knew the language itself quite well.)


Lots of professional Java programmers emailed to say “If only those students had used Libraries X and Y, they would have done okay.” Sadly, X and Y were never the same in any two emails, so it is easy to understand how the students went wrong (i.e., it is not obvious how one is supposed to choose among the 100 different ways to get something done in the world of Java tools).


Similarly, there was no agreement among Java programmers as to whether it is good to have SQL queries prominently featured in source code or better to make everything into Java objects and magically generate SQL behind the programmers’ backs. Half of those emailing said that SQL was impossibly hard to write and what people really needed was to see the programmers’ custom-created methods. The other half seemed to think that a database application ought to be primarily expressed in SQL, a concise declarative query language that has been standard for 25+ years. These are 100% incompatible points of view.


My friend Curtis, an old-time Silicon Valley monster C hacker, AIMed me to say that he’d seen the Slashdot article:



“My problem with Java is that it makes hard things hard, and easy things hard. The amount of hassle doesn’t scale with the complexity of the problem. Whereas with PHP you can write ‘Hello World’ without having to read a 200-page book. Java is a train wreck with dozens of classes with slightly different methods that do similar things. On the other hand, it kills me that the PHP database interface is so bad. Actually PHP just kills me anyway… why they had to invent a new language, I’ll never know.”


I pointed out to Curtis that the latest Technology Review, MIT’s alumni rag, picked the developer of PHP as one of its “100 Bold Young Innovators You Need to Know”:



“Rasmus Lerdorf has learned five languages while living around the world.  But it’s the language he invented that has had global impact.  In 1995, without any formal programming training, Lerdorf developed a server language to help him set up Web sites. … He named the language PHP, for PHP hypertext preprocessor.”


Curtis’s response to Tech Review?  “People mistake creation for innovation”.


Lisp diehards = Holocaust deniers

Hmm… it seems that the “Java = SUV of programming languages” posting has stirred up a bit of controversy over at Slashdot and right here on this server. Some people read it as a personal endorsement of PHP, VB, and other semi-baked programming languages. Actually, my personal preference is a much darker, uglier, and more shameful secret: Common Lisp, CLOS, plus an ML-like type-inferencing compiler/error checker (with some things done in a sublanguage with Haskell semantics and Lisp syntax). Common Lisp dates from around 1982 and ML from 1984.


I try to keep this preference concealed from young people who’ve been raised on a diet of C, Java, C#, Perl, etc.  They just wouldn’t find it credible that 20-year-old systems and ideas are actually better than the latest and greatest from Microsoft and Sun.


Imagine my delight when I ran into a friend yesterday. She’s a 23-year-old graduate student in computer science at Harvard. Conversation rolled around to programming tools. Unprompted, she said, “What I think would be best is Common Lisp Object System with a modern type system.” I was stunned. I thought it was only dinosaurs like me that clung to Lisp.


I had a second epiphany for the week… Believing that Lisp circa 1982 plus some mid-1980s ML tricks thrown in is better than all of the new programming tools (C#, Java) that have been built since then is sort of like being a Holocaust denier.

Java is the SUV of programming tools

Our students this semester in 6.171, Software Engineering for Internet Applications, have divided themselves into roughly three groups. One third has chosen to use Microsoft .NET, building pages in C#/ASP.NET connecting to SQL Server. One third has chosen to use scripting languages such as PHP connecting to PostgreSQL and sometimes Oracle. The final third, which seems to be struggling the most, is using JavaServer Pages (JSP) with Oracle on Linux. JSP is fantastically simpler than “full-blown J2EE”, which is the recommended-by-Sun way of building applications, but it still seems to be too complex for seniors and graduate students in the MIT computer science program, despite the fact that they all had at least one semester of Java experience in 6.170.


After researching how to do bind variables in Java (see the very end of http://philip.greenspun.com/internet-application-workbook/software-structure), which turns out to be much harder and more error-prone than in 20-year-old C interfaces to relational databases, I had an epiphany:  Java is the SUV of programming tools.


A project done in Java will cost 5 times as much, take twice as long, and be harder to maintain than a project done in a scripting language such as PHP or Perl. People who are serious about getting the job done on time and under budget will use tools such as Visual Basic (which controlled all the machines that decoded the human genome). But the programmers and managers using Java will feel good about themselves because they are using a tool that, in theory, has a lot of power for handling problems of tremendous complexity. It’s just like the suburbanite who drives his SUV to the 7-11 on a paved road but feels good because in theory he could climb a 45-degree dirt slope. If a programmer is attacking a truly difficult problem, he or she will generally have to use a language with systems programming and dynamic type extension capability, such as Lisp. This corresponds to the situation in which my friend, the proud owner of an original-style Hummer, got stuck in the sand on his first off-road excursion; an SUV can’t handle a true off-road adventure for which a tracked vehicle is required.


With Web applications, nearly all of the engineering happens in the SQL database and the interaction design, which is embedded in the page flow links. None of the extra power of Java is useful when the source of persistence is a relational database management system such as Oracle or SQL Server. Mostly what you get with Java are reams of repetitive declarations at the top of every script, so the relevant code for serving a page is buried several screens down. With a dynamic language such as Lisp, PHP, Perl, Python, or Tcl, you could do bind variables by having the database interface look at local variables in the caller’s environment. With Java, the programmer is counting question marks in the SQL query and saying “Associate the 7th question mark with the number 4247”, an action that will introduce a bug into the program as soon as someone adds a bind variable earlier in the query (the 7th question mark has now become the 8th).
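
The dynamic-language approach described above can be sketched in Python with the standard-library sqlite3 module (the choice of Python and SQLite is an assumption for illustration; the post does not tie the idea to any particular database interface). The hypothetical helper db_query looks up each :name placeholder among the caller’s local variables, so there are no question marks to count, and swapping the order of conditions in the WHERE clause cannot shift a value onto the wrong placeholder. The table and the functions users_in_city and users_in_city_reordered are likewise made up for the example.

```python
import sqlite3
import sys


def db_query(conn, sql):
    """Run sql, filling each :name placeholder from the caller's local variables.

    Hypothetical helper for illustration; not part of sqlite3 or any library.
    """
    # sys._getframe(1) is the frame of whatever function called db_query
    # (a CPython implementation detail). Copy its locals into a plain dict,
    # which sqlite3 accepts as the mapping for named placeholders; locals
    # not mentioned in the query are simply ignored.
    caller_vars = dict(sys._getframe(1).f_locals)
    return conn.execute(sql, caller_vars).fetchall()


def users_in_city(conn, city, min_age):
    # :city and :min_age are looked up by name in this function's locals.
    return db_query(
        conn,
        "select name from users where city = :city and age >= :min_age order by name",
    )


def users_in_city_reordered(conn, city, min_age):
    # Same query with the conditions swapped. With positional "?" markers
    # the arguments would have to be reordered too; named binding needs no change.
    return db_query(
        conn,
        "select name from users where age >= :min_age and city = :city order by name",
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("create table users (name text, city text, age integer)")
    conn.executemany(
        "insert into users values (?, ?, ?)",
        [("alice", "Boston", 34), ("bob", "Boston", 19), ("carol", "Cambridge", 40)],
    )
    print(users_in_city(conn, "Boston", 21))            # prints [('alice',)]
    print(users_in_city_reordered(conn, "Boston", 21))  # prints [('alice',)]
```

Peeking at the caller’s stack frame is a CPython-specific trick (sys._getframe is documented as an implementation detail). A more conventional middle ground is to pass an explicit dictionary of named parameters to execute, which gives the same protection against miscounted question marks without the magic.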


A silent PC

http://www.hushtechnologies.net/ shows a reasonably fast (933 MHz, up to 1 GB of RAM), reasonably cheap (under $1000) WinXP machine that is cooled via heat sinks rather than fans. Another very quiet PC option is the Gateway Profile, which looks like half of a laptop computer mounted on a small pedestal. My friend Doug and I removed the (pretty quiet) fans from a couple of old ones (500 MHz Celerons), and they continued to run just fine.