Saturday, August 24, 2013

Helping Users

There are two broad categories of documentation that are, or can be, maintained outside the code: User Help/Specifications and Design/Architectural Descriptions. This posting is about User Help/Specifications. Originally the two were intended to share one posting, but because of size and some work I want to do for the Design/Architectural posting, they have been split.

This type of documentation is primarily intended to be read and used by end users. It most commonly takes the form of 'Help' information presented at run time. The 'Specification' part most often takes the form of manuals, but that is not a topic for this blog. The distinction between run-time help and manuals is sometimes blurred, for example when a description of the format of an input/output file is needed.

The first question I ask is "does the user need to know this before the program starts?" If the answer is yes, then a manual is needed. The second question is "does the information take more than half a screen height to display?" If the answer is yes, then either a manual or a more capable display system, such as a web page, may be needed.

This topic is part of maintainable software because keeping the help accurate is often part of a maintainer's job, and because the help also serves as a reminder of what the application is, and is not, supposed to do. The run-time 'Help' information is typically the most volatile aspect of maintaining software. If it is implemented with some forethought, however, it can be updated without code changes, thus avoiding the overhead of a formal release.

The display of help information is usually triggered by a command line option, which causes the program either to display the information itself or to hand control to a web page that presents it. When there is no such command line option, the program may display a single line with the program name, its version id (number) and possibly a statement of what the program is supposed to do.

The choices of where to put this type of information are simple: build it into the software or put it in files outside the program. Building it into the software works quite well for scripting languages such as Perl, but less well for compiled languages such as C or C++. If the information must also be available for web or manual publication, then the only practical choice is to put it into files.

One of the most successful methods I have used to minimize the maintenance effort is to put the information in topic-specific files, then use Doxygen to format and organize the files for program and web use. This means that there is only one copy of the text to maintain and that the code is not affected by changes to the content. This is an example of what I call the WORM (Write Once Read Many) principle, which basically says that a piece of information should be stored in only one place no matter how many references to it exist.

The above pattern can be implemented by a simple encapsulation function that lets the calling code request the display of a specific topic (a string). The only thing the calling code may need to know is whether the display succeeded (the return value), i.e. whether the information was available and displayed.

Maintainability can be enhanced by having the encapsulating routine convert the topic name (as supplied by a user) to a file name simply by adding a suffix and/or prefixing the program name. This mechanism allows help topics to be added by changing only the help files. This assumes that the command line description is also contained in a file. A variation on this is to place the topic name in a URL which is then sent to a browser for display and handling.
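The encapsulation function described above might be sketched as follows. This is only an illustration under stated assumptions, not the implementation used in any particular program: the function names, the ".txt" suffix and the "<program>_<topic>" naming convention are all hypothetical.

```python
import re
import sys
from pathlib import Path

# Assumed convention (hypothetical): "<program>_<topic>.txt" in one help directory.
HELP_SUFFIX = ".txt"


def topic_to_path(topic, help_dir, program=""):
    """Translate a user-supplied topic name into a help file path."""
    # Keep only letters, digits, '-' and '_' so a topic can never point
    # outside help_dir (e.g. "../secret" becomes "secret").
    safe_topic = re.sub(r"[^A-Za-z0-9_-]", "", topic).lower()
    prefix = f"{program}_" if program else ""
    return Path(help_dir) / f"{prefix}{safe_topic}{HELP_SUFFIX}"


def show_help(topic, help_dir, program="", out=sys.stdout):
    """Display one help topic; return True only if it was found and shown."""
    path = topic_to_path(topic, help_dir, program)
    if not path.is_file():
        return False
    out.write(path.read_text(encoding="utf-8"))
    return True
```

With this shape, adding a new help topic means dropping a new file into the help directory; the calling code only checks the boolean result, for example to fall back to a list of available topics.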

A related aspect of maintaining information a user might want to know is maintaining a program's version id/number. At one point in my career I designed and implemented a Subversion-based system which, among other things, automated the checkout, testing and release of numerous programs. Part of this was a mechanism to automate the update of version ids for these programs (independent of each program's implementation language) and support for all types of external documentation. The premise was that if it is easy to do the right thing (including maintaining documentation), then maintainers, even engineers in a hurry, will do it. This has proved to be true, even years later.

In my opinion, one of the best examples of a good command-based help system coupled with a manual is the one implemented by the Subversion red book (http://svnbook.red-bean.com/en/1.7/svn-book.pdf) and the svn command. I have used this as a model for several applications. I have also designed and implemented applications which used web pages to present the help information. All of these applications were designed with the goal of not having to make code changes just to update the help information.

Wednesday, August 21, 2013

Comments on comments

I have been around long enough to realize that there are as many sets of commenting guidelines as there are organizations which produce software, and that the rules will change over time. Here are my personal thoughts and opinions on this subject, mostly from a maintainer's perspective.

The most important thing is to give information which will quickly let a maintainer know what the code/method/subroutine/module is supposed to do.

The simplest type of documentation is to use symbol names (variables, subroutines, modules etc.) which convey an idea of what the item is used for. I once thought this was not particularly important until I was asked to make a few changes to a program whose comments and code were written by a French programmer. I also once had a very inventive colleague who chose names and wrote comments which made the code read something like a novel. The code was very memorable but very difficult to debug or to change rationally.

An exception to the above rule is the use of variables named I, J, K etc. as counter or index variables, which are used only to count or to index into a data structure and have no meaning at all outside the loop.
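A small sketch of that exception, using a hypothetical function: the data that outlives the loop gets descriptive names, while the loop indices stay as single letters.

```python
def column_totals(table):
    """Sum each column of a rectangular table given as a list of rows."""
    column_count = len(table[0]) if table else 0
    totals = [0] * column_count
    for i in range(len(table)):        # i: row index, no meaning outside the loop
        for j in range(column_count):  # j: column index, no meaning outside the loop
            totals[j] += table[i][j]
    return totals
```

Explicit indices are used here only to illustrate the naming point; in Python the same result could be had without index variables at all.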

It is not a good thing to describe in great detail what the code is doing. This is typically not very helpful, and when the code is changed for any reason the comments will need to be updated as well, thus doubling the required effort (it might be better to just remove the offending comments). The worst thing would be to update the code but not the comments, because the poor maintainer will not know whether the comments or the code are correct.

The simplest type of comment is appended to a line of code. This at least has the virtue of being removed if the code line is removed.

One of the most useless comments looks like the following:
    A = B;  // assign B to A
A better comment would describe why the assignment is needed, e.g.
    A = B;  // preserve parameter B for later restoration

If for some reason you feel compelled to place a comment on every line of code, please do it at the end of the line so that a maintainer does not have to skip every other line while trying to figure out what the code is actually doing.
    getParams(args);         // put command line parameters into the global space

For code blocks, a simple one or two line comment just ahead of or at the end of a block can be very helpful.
    // Initialize all framework global data areas
    getParams(args);
    getProperties();
    getEnvVariables();
    // end of initialization

The above is a very simplistic example and the comments may not be needed, but:
  • They should never need to be changed (i.e. no maintenance required)
  • They very effectively mark where any initialization changes should be made

For larger blocks of code, e.g. modules and subroutines, it can be effective to create a block of comments before coding begins which at least describes the code blocks in the order in which they appear, with TBD (to be done) in the places where code is supposed to go. This serves as a guide during implementation and encourages simple changes to maintain accuracy. The TBD, or something similar, can be quickly recognized as a place where code is incomplete.
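As a rough sketch of the comment-first approach, here is a hypothetical routine part way through implementation. The function name and the "key=value" settings format are assumptions made only for illustration; the first two blocks have been filled in and the remaining ones are still marked TBD.

```python
def parse_settings(text):
    """Parse "key=value" lines of settings text into a dictionary."""
    # Split the text into individual lines
    lines = text.splitlines()

    # Parse each "key=value" line, ignoring lines without an '='
    settings = {}
    for line in lines:
        if "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()

    # Validate that all required keys are present
    # TBD

    # Apply defaults for optional keys
    # TBD

    return settings
```

The block comments were written first and have not needed to change as code was added beneath them; searching for TBD shows what is still missing.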

Many IDEs, such as Eclipse, support comment generation for various types of code blocks; use them.

Comments should be brief and describe what is being done and why, not how it is being done.
All rules have exceptions, including this one.

An exception to the 'brief' rule is for library routines written in scripting languages. In this situation, the comments near the front of the code also serve as the user documentation and of necessity should contain enough information to allow proper use of the code.
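For instance, a library routine in a scripting language might carry its user documentation at the front of the code, as in this minimal sketch (the routine itself is hypothetical):

```python
def clamp(value, low, high):
    """Restrict a number to the inclusive range [low, high].

    Parameters:
        value: the number to restrict.
        low:   the smallest value that may be returned.
        high:  the largest value that may be returned; must not be less than low.

    Returns:
        low if value is below low, high if value is above high,
        otherwise value unchanged.

    Raises:
        ValueError: if low is greater than high.
    """
    if low > high:
        raise ValueError("low must not be greater than high")
    return max(low, min(value, high))
```

The comment is far longer than the code, which is acceptable here because a caller should be able to use the routine correctly without reading its body.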

An exception to the 'don't explain how' rule should be made for blocks of very dense or intricate logic. In these cases the comments should explain how the code accomplishes its stated purpose.
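A short illustration of a case where explaining 'how' earns its keep. The routine is a hypothetical example chosen because the key line is correct but far from obvious:

```python
def count_set_bits(n):
    """Return how many bits are set in a non-negative integer."""
    if n < 0:
        raise ValueError("n must be non-negative")
    count = 0
    # How this works: subtracting 1 turns the lowest set bit into 0 and
    # every bit below it into 1. ANDing that with the original value
    # therefore clears exactly the lowest set bit, so the loop runs once
    # per set bit rather than once per bit position.
    while n:
        n &= n - 1
        count += 1
    return count
```

Without the comment, a maintainer would have to rediscover why `n &= n - 1` terminates and why the count is correct.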

Another type of comment I have seen many times reads something like:
// Fixes bug 234
Unless the bug report database is readily available, the only purpose this serves is to raise a flag if there are several such comments in a given block of code. That situation should cause serious consideration to be given to a plan to re-implement the affected functionality. I stated it the way I did because the problem may not lie in the flagged code; it may be that the calling code is using this code incorrectly or with incorrect expectations.

To re-iterate, the most important thing is to give information which will quickly let a maintainer know what the code/method/subroutine/module is supposed to do.

Saturday, August 17, 2013

Maintainable Software Overview

The purpose of this blog is to promote the creation of software and firmware which is designed to be maintained.

The focus is on ways and means of creating software which are as independent as possible of any development environment or ecosystem. That is to say, it will focus on aspects of software architecture, design and coding practices which will maximize maintainability.

All of the information presented here is based on forty-plus years of personal experience as a software engineer and on anecdotal material from co-workers. The material presented is expected to be applied where practical and with at least a good understanding of why it does or does not apply to a specific situation.

My experience has been primarily with mainframe and workstation class computer systems. Programs requiring a Graphical User Interface and Web-based applications have not been considered, primarily due to insufficient experience. Non-application software (device drivers, file systems, mid-level network code, language compilers etc.) is also not specifically considered, primarily because the need for speed or severe constraints on RAM and/or other resources often override some of the recommendations.

I have defined maintainability as:  a measure of the effort required to change the functionality of application software.  A measure of ‘effort’ must include time, resources and expertise.

In general, any software development manager is familiar with this definition of 'effort' as it applies to creating software. The term 'change the functionality' applies to both enhancements and bug fixes. It might also be said that maintainable code is designed to be leveraged.

Maintainability is related to several other "ilities" such as:

  • Flexibility:  The ability to work with unanticipated data/conditions without code changes.
  • Portability:  The ability to operate in environments other than the one originally deployed in.
  • Reliability:  The ability to operate correctly in spite of failures in the program's environment or inconsistencies in the supplied data.
  • Reusability:  The ability to use code in a different application without modification.  It could be said that this is the ultimate goal of maintainability.

A program has been described as being composed of data and algorithms. I also add control, or control flow. I use the term 'algorithm' to refer to any section of code which manipulates application data. The term 'control' refers to code which determines which algorithms are executed and/or the order in which they are executed. It can therefore be stated that 'data + algorithms + control = program'.
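The 'data + algorithms + control = program' decomposition can be made concrete with a toy sketch. Everything in it (the order records, the discount and fee rules, the function names) is invented purely for illustration.

```python
# Data: the application data being processed (hypothetical order records).
orders = [
    {"id": 1, "amount": 120.0, "rush": False},
    {"id": 2, "amount": 40.0, "rush": True},
]


# Algorithms: sections of code that manipulate application data.
def apply_discount(order):
    """Return a copy of the order with a 10% discount applied."""
    return {**order, "amount": round(order["amount"] * 0.9, 2)}


def add_rush_fee(order):
    """Return a copy of the order with a flat rush fee added."""
    return {**order, "amount": order["amount"] + 15.0}


# Control: decides which algorithms are executed, and in what order.
def process(order):
    if order["amount"] >= 100.0:
        order = apply_discount(order)
    if order["rush"]:
        order = add_rush_fee(order)
    return order


processed = [process(order) for order in orders]
```

Changing the threshold or the order of the two steps touches only `process`, while the algorithms and the data layout stay as they are.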

Experience has shown that the areas of volatility from greatest to least are:

  1.  Control
  2.  Algorithms
  3.  Application data.  Note the qualification on data.

What this order means is that the effort to improve maintainability will be most effective when applied first to flow control, then to algorithms, then to application data. Experience has also shown that during the architecture and design phases of creating a program the order of importance is generally reversed. That is to say, the most important thing to understand is the nature of the data to be processed, then the algorithms (code) which will be required to process the data, and then the conditions under which, and the order in which, the algorithms will be applied.

In keeping with the above definition of a program, the associated postings are generally organized into the following broad topics. Each topic covers associated maintainability problems and ideas on how to mitigate them.

  • Documentation
    • This is something of an anomaly in that it can be the simplest and the least volatile as well as the most difficult to do well and the most volatile.
  •  Data
    • This covers application data, control data and parameters.
  •  Algorithms
    • This mostly covers encapsulation considerations and is confined to application specific data manipulation.
  • Control
    • This covers control patterns and strategies
      • Simple Controls (e.g. if, loop and case)
      • State machines
      • Simplified AI machines
  • Multi-threaded Applications (might be considered a subset of Control)
  • Multitasking
    • Cooperative multitasking
    • Interrupt driven multitasking
  • Considerations for Object Oriented and non-Object Oriented designs