Thursday, January 23, 2014

Response to Wikipedia on Software Engineering

     Let's start with the formal definition of software engineering.  I would say the one that begins this article is the most accurate: ". . . the study and application of engineering to the design, development, and maintenance of software".  I think that this definition, unlike some of the alternative ones following it, covers the whole field without being too narrow.  One of these definitions states that the field exists to ". . . economically obtain software", which is basically true, but omits the critical area of software maintenance.  Another of the definitions calls it an 'engineering discipline'; as we'll talk about later, this description is not without contention.  One of the informal definitions given by the article is especially interesting to me.  It states that software engineering is a broad term relating to the practice of programming computers, as opposed to computer science, which pertains to the theory of programming computers.  Given the myriad courses everyone in this class has taken that involve little or no programming, that synopsis of computer science seems inaccurate at best.  The analogy I always think of is that saying computer science is about programming is akin to saying math is about arithmetic.  Despite that, I would wager that many among us (myself included) have given a similar answer when asked to explain what we are going to school for.  That description at least gets most of the point across, and prevents the listener from thinking we are learning to fix computers (physically) or provide over-the-phone tech support.

     Some of the notions floating around about how to measure the complexity of software are, to me at least, laughable.  This article lists a formula, called the "Constructive Cost Model" (COCOMO), to estimate the number of man-years required to make a piece of software having a certain number of lines of code (SLOC, or Source Lines Of Code).  The very definition of a "line of code" is itself ambiguous and doesn't consider the varying expressiveness of programming languages (think x86 assembly vs. Haskell).  Recently, I was tasked with estimating the size and complexity of a large piece of open-source software at my job.  There are various software tools that do such a task, and their results can vary enormously.  When I used a few different tools on the same source code repository, I got answers that varied by hundreds of thousands of lines of code (something like ~25-50% of the approximate code base).  These differences are the result of a lack of established rules on what should be included in the count; for example, does a line of C++ containing only a brace ('{' or '}') count as a line of code?  The Wikipedia article later counters this practice by saying "This [counting SLOC] is like describing and measuring a complicated piece of machinery in kilograms only".  I couldn't agree with that statement more.  Furthermore, measuring the number of lines of code does not account for myriad other factors relating to the development speed and quality of code.  These factors include development tool quality and utilization, physical development environment (e.g. loud and cooperative vs. quiet and isolated), novelty of the problem(s) to be solved, and more.  The claim made by some (including the man with the man-years formula, Barry W. Boehm) that the main factor in software development is the skill of the developers themselves woefully underestimates the number of factors present in the process.
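     To make the counting problem concrete, here is a rough Python sketch of how much the answer can move depending on what you decide a "line of code" is, and how that difference carries into a size-based estimate.  Everything in it is my own illustration: the function names and the little C++ sample are made up, and the estimate uses the published Basic COCOMO "organic" coefficients (a = 2.4, b = 1.05), which yield person-months.  Whether that is exactly the form of the formula shown in the Wikipedia article is an assumption on my part.

```python
def count_lines(source, skip_blank=True, skip_brace_only=False, skip_comments=False):
    """Count lines of C++-style source under a chosen (hypothetical) rule set."""
    count = 0
    for line in source.splitlines():
        stripped = line.strip()
        if skip_blank and not stripped:
            continue  # blank line
        if skip_brace_only and stripped in ("{", "}", "};"):
            continue  # line holding nothing but a brace
        if skip_comments and stripped.startswith("//"):
            continue  # single-line comment (block comments are ignored here)
        count += 1
    return count


def basic_cocomo_person_months(sloc, a=2.4, b=1.05):
    """Basic COCOMO effort, organic mode: a * KLOC**b, in person-months.

    Assumption: these are the textbook Basic COCOMO coefficients, not
    necessarily the exact variant quoted in the article.
    """
    return a * (sloc / 1000.0) ** b


# Made-up sample: ten physical lines, four of them brace-only.
SAMPLE = """\
// Return the larger of two ints.
int max_of(int a, int b)
{
    if (a > b)
    {
        return a;
    }

    return b;
}
"""

if __name__ == "__main__":
    rules = {
        "physical lines": dict(skip_blank=False),
        "non-blank": dict(),
        "non-blank, no brace-only": dict(skip_brace_only=True),
        "code only": dict(skip_brace_only=True, skip_comments=True),
    }
    for name, options in rules.items():
        print(f"{name}: {count_lines(SAMPLE, **options)}")

    # The same project measured by two tools that disagree by 50%.
    for sloc in (500_000, 750_000):
        print(f"{sloc} SLOC: {basic_cocomo_person_months(sloc):.0f} person-months")
```

     On this tiny sample the count already ranges from 10 lines down to 4 depending on the rule, and none of the four rules is obviously the "right" one.  Feed two such disagreeing counts into any formula driven by size and the estimates disagree too, before any of the other factors above even enter the picture.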

     A short word on licensing and certification for software engineers: namely, why I think it has proved more difficult than expected to nail down and define.  First, the field is so new; it has existed for only about 70 years, and for most people, much less.  It has grown and changed tremendously during that time and continues to do so.  Second, it can be argued that some aspects of software development are less of an engineering discipline and more of an art.  After all, authoring good software is in some ways like writing a book.  Sure, there are basic guidelines that most everyone should follow (like grammar and layout), but what makes a 'good' book is hard to define and analyze at present.  I believe the same is often true for software.  This essay by Paul Graham describes this idea pretty well, but don't feel obligated to read the whole thing:

http://www.paulgraham.com/hp.html

     In closing, I would like to mention something I strongly believe in, but that was only briefly stated in the article.  Under "Software Development Process", it says, "the software engineering market is being gradually shifted towards component based".  I really think that modular design (individual components that can be connected together in flexible ways) is the only future for powerful and reliable software in this world.  I think we have all had the experience of writing code that we are sure has already been written by others in the same programming language, probably a hundred times.  We, as an industry, do not seem to reuse our work nearly as much as we could.  Modular design will hopefully do for software what similar principles did for manufacturing and mechanical systems.  Before the days of Henry Ford, Eli Whitney, and Samuel Colt, many machines were custom-built rather than assembled from standardized parts.  This made repairs difficult and different for each machine.  I think this design principle is critical, and applies to software just as much as to cotton gins and revolvers.
