Introduction to software engineering entropy

Software engineering entropy vs software entropy
When I started thinking about this topic, I planned to call this post "Some new thoughts on software engineering entropy". After some searching and reading, I realized that "software engineering entropy" is not an established term. The studies I found all focus on "software entropy", which, as I understand it, is a measure of overall software complexity. The major contributor to software complexity is the complexity of the problem domain. The concept I want to discuss in this post, however, is a way to measure how well the problem is solved. So I changed the title to what it is now, and I hope it inspires similar thoughts.

The complexity of software comes from two major sources: the complexity of the problem domain itself and the complexity introduced by an imperfect implementation. Most discussions I found focus on the first, although in practice it is unavoidably mixed with the second. I would like to focus on the second, which, from a software engineering perspective, contributes most to the quality of software.

Entropy can be viewed as a measure of disorder or of ignorance. The second view is quite helpful for understanding the quality of software engineering. I would like to give a working definition of software engineering entropy here. Assume you are studying an immediate context to understand how it works. Every time you have to leave that context to seek knowledge elsewhere in order to understand it, that is an instance of ignorance about the context. All such instances together determine the entropy of the context in question. This entropy is called software engineering entropy.
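To make the definition a little more concrete, here is a minimal sketch of how one reader's instances of ignorance could be recorded by hand. Everything in it is an assumption made for illustration: the ReadingSession class, its method names, and the choice of a simple count as the score are hypothetical, and the count is only a toy proxy, not a real measurement of entropy.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReadingSession:
    """Hypothetical log of one engineer reading one immediate context."""

    context: str
    lookups: List[str] = field(default_factory=list)

    def leave_for(self, target: str) -> None:
        # Record one instance of ignorance: the reader had to leave
        # the immediate context and look at `target` instead.
        self.lookups.append(target)

    def score(self) -> int:
        # Toy proxy (an assumption): the number of recorded instances.
        return len(self.lookups)


session = ReadingSession(context="parse_order")
session.leave_for("definition of OrderState")
session.leave_for("caller of parse_order")
print(session.context, session.score())
```

Because the log belongs to one reader, two sessions are only comparable when the same person recorded both.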

A few clarifications are needed for this definition.

First, the concept is not a completely new discovery. Software engineering has been an active area of study for decades. Many insights have been gained and new methodologies are being put into practice. I hope this concept can provide new insight into existing topics and help improve existing methodologies.

Second, the definition depends strongly on the observer. For example, an engineer who joins a new project faces maximum entropy because of complete ignorance. The entropy declines over time and eventually settles around some value. The engineering entropy can never be zero, but the stable value varies from engineer to engineer. This reflects the reality that more experienced engineers can grasp key concepts and move forward with a certain amount of ambiguity, or entropy. Putting these two points together, an entropy comparison is useful when one observer measures two different projects. In addition, for engineers who have stayed on one project long enough, the average entropy could reflect the quality of the project.

Third, the complexity of the problem domain can be a cause of engineering entropy. But its impact decreases as an engineer learns more about the core problem over time. Eventually, and ideally, no ignorance comes from the problem itself.

Lastly, it is true that engineering entropy defined this way has no practical method of measurement yet. But high-entropy areas can be identified from the definition, and improvements can be found accordingly.

Engineering entropy as a thinking tool
As stated above, this entropy is a new way to see how well engineering is done, and we will study a few cases to demonstrate it. The idea can be applied at quite different scales: it can be used to evaluate the implementation of a single function, and it can also be used to evaluate the engineering process of a multidisciplinary engineering group. We will see cases for both.

Case 1
Naming matters to every careful engineer. Many people, myself included, have discussed its importance and the confusion it causes for years. But arguments about confusing names are more or less subjective because they lack a definitive answer, and this will remain true regardless of how much effort is put into resolving them. In this article, we want to discuss why bad naming causes problems from the perspective of engineering entropy, and how the entropy concept can bring some rationality to everyday naming arguments.

Two typical naming arguments concern variable naming and function naming.

Variable naming relates to maintaining an invariant within a function. When a name keeps a single meaning throughout, engineers can understand it without tracking how the variable changes meaning along the code path. They do not have to leave the current part of the function to find out what the variable means, and this avoids engineering entropy. In contrast, a bad name can be interpreted in two or more ways. When reading a line of the implementation, the engineer has to leave the current context to find the real meaning, which results in higher engineering entropy.

Function naming is another contested area, because the name of a function represents the protocol between caller and callee. For the caller, the intention of the call should be visible at the call site. It can be perceived from the function name or from a comment explaining the reason. If one feels the need to go inside the implementation to understand why the caller is making the call, it signifies ignorance and thus higher engineering entropy. Here the function name always plays an important role. From the callee's perspective, the name should ideally cover no more and no less than what the implementation does. This is hard to maintain in everyday work, so the bottom line is that a function name should be specific enough to be easily distinguished from others.

Putting these together, an experienced developer who has sufficient knowledge of the problem domain and the overall architecture should be able to understand a piece of code in its local context. If this is not the case, the switching between contexts should be considered evidence of higher engineering entropy. An effort should be made to identify what is causing the ignorance and correct it so that understanding comes easily. This can also serve as a definition of a good name: a good name ensures that everyone who works on the same codebase can understand what the variable or function means in its local context.
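As a small illustration of the naming argument, the sketch below shows the same logic written twice. The functions, the data, and the "credit limit" scenario are all hypothetical and invented for this example. In the first version, a reader has to leave the function to learn what d, t, and the tuple positions mean. In the second, the names carry that knowledge in the local context.

```python
def process(d, t):
    # High entropy: what is d? What is t? What do x[0] and x[1] hold?
    # The reader must go and find a caller to answer these questions.
    r = []
    for x in d:
        if x[1] > t:
            r.append(x[0])
    return r


def customers_over_credit_limit(balances, credit_limit):
    """Return the names of customers whose balance exceeds credit_limit.

    balances is a list of (customer_name, balance) pairs.
    """
    # Lower entropy: each name keeps one meaning for the whole function.
    return [name for name, balance in balances if balance > credit_limit]


balances = [("alice", 120), ("bob", 80), ("carol", 300)]
print(process(balances, 100))
print(customers_over_credit_limit(balances, 100))
```

Both functions compute the same result; only the amount of knowledge the reader must fetch from outside differs.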

Case 2
We will now discuss the nonadditivity of engineering entropy. Suppose there is a component K. Its input and output are so clear that you can fully understand how K works with K as your immediate context. This means component K has an engineering entropy of 0. This is certainly an idealized example, but it fits this case study well. K has two sub-components; let's call them A and B. After understanding that K has an entropy of 0, we want to take a closer look at A. By studying A, we find that A is very hard to understand unless we know how B works. The same is true of B. So the sub-systems A and B each have a large entropy while the parent system K has an entropy of 0. This is the nonadditivity of software engineering entropy. Physicists know a similar phenomenon as "quantum entanglement", where the state of the whole system can be fully known while each part, taken alone, is not. In software engineering, we call it "coupling". While quantum entanglement is a key resource for quantum computing, coupling is something we want to avoid in software engineering. This short case study demonstrates how we can use engineering entropy as a thinking tool to evaluate the patterns we observe in everyday engineering work.
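The situation can be sketched in a few lines of code. The tax scenario and every name below are hypothetical, chosen only to mirror K, A, and B from the text. The function total_with_tax plays the role of K: its input and output are clear. The classes A and B, however, each rely on the other's internals, so neither can be understood alone.

```python
class A:
    """Sub-component A: hard to understand without reading B."""

    def __init__(self):
        self.b = None        # wired up later by K
        self.subtotal = 0.0

    def add(self, price):
        # Reaches into B for the rate, so the meaning of `subtotal`
        # (it already includes tax) is only clear after reading B.
        self.subtotal += price * (1 + self.b.rate)


class B:
    """Sub-component B: hard to understand without reading A."""

    def __init__(self, a, rate):
        self.a = a
        self.rate = rate

    def tax_included(self):
        # Relies on A's hidden convention that subtotal includes tax.
        return self.a.subtotal - self.a.subtotal / (1 + self.rate)


def total_with_tax(prices, rate):
    """Component K: prices and a tax rate in, (total, tax) out."""
    a = A()
    b = B(a, rate)
    a.b = b
    for price in prices:
        a.add(price)
    return round(a.subtotal, 2), round(b.tax_included(), 2)


print(total_with_tax([100.0, 50.0], 0.25))
```

A reader of total_with_tax never needs to leave it, while a reader of A or B has to keep switching between the two classes.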

Case 3
In the last case study, I want to apply engineering entropy to a larger context: a multidisciplinary engineering group. In such a group, many feature teams own different things in different layers of the final product. Nowadays, many groups use Scrum as their development framework. Scrum makes it possible for things to change at a fast pace, so it should not be surprising when one component layer changes its implementation dramatically. But such flexibility inevitably introduces engineering entropy, because one's understanding of the dependencies goes out of date quickly. This is a topic in project management rather than software engineering, as the ignorance results from the separation of engineering entities. Once the source of this entropy is understood, it becomes clear that it needs careful management. One important way to reduce the entropy is to communicate changes clearly so that developers are aware of them. When a change starts causing issues, the "why and how" of the change should be immediately available. At the very least, contact information should be available so that help in resolving the issues can be sought as early as possible.
Another similar example is what can happen inside a feature group. As developers in the same group focus on different areas, their knowledge of the same component or system can gradually go out of date. As time goes by, the entropy of that component builds up, and resolving it can require more effort. A planned catch-up can help reduce this entropy. A catch-up can be formal, so that everyone on the team can focus on gaining insight together, or informal, so that each developer can arrange a catch-up that fits his or her own schedule. The key point is that it should be done with a clear goal in mind: gaining new knowledge of a product that has been changing for some time.