It always annoys me when I find an error in either an operating system or a user program. The more stupid a bug is, the more it seems incomprehensible how such bugs can remain in a product with today’s modern program development tools.
Perhaps the theory and practice of programming is not as advanced as we would like to believe? Perhaps haste and the rapid hunt for money explain the presence of primitive errors? Or is it something else entirely?
I started to think about this and imagine a testing system that could be used to filter out all programming and system errors. And I realized that such a system cannot be created, at least as far as one type of tests is concerned, the “black-box” (black-box, or external) testing.
This is a testing technique where the tester does not have the source program, only the final product. This is what you need to determine if it is flawless or not. Let’s say that you have the specification in your hands, you check the expectations contained in it one by one, if you have taken them all one by one and found no errors during the run, then the tested product can be considered flawless. Not at all. The specification may not include special cases. The disk runs out of space, the memory runs out, the network connection is interrupted, the user can give the program data that he could not give according to the specification, but the user is just like this, intentionally or accidentally, gives data that the specification makers didn’t think so.
But even if we have a complete specification that includes behavior descriptions for all edge cases and special situations, we cannot perform our testing task perfectly.
To understand this, let’s look at a very simple case, we get a max() function from somewhere (say, in a DLL), which expects two integers and returns the larger of the two. Can we black-box test this function in such a way that we can declare that the function is 100% error-free and will return the larger of the two received integer values under all circumstances?
Let’s write a test program, include the calls max(1, 2) and max(2 ,1), in both cases we should get 2 as a result. To be on the safe side, we call the function in the max(2, 2) way, so we should still get 2. Are we ready with the test? Is the code 100% error free?
In principle, yes. In practice, however, we have to consider the case of the malicious coder, this is a programmer who deliberately hides an error in the code, and can do this in very sophisticated ways. In principle, our 100% error-free code can actually contain a lot of errors.
Because what if there is a detail in the code that works depending on the received arguments, and if one of the arguments is, say, 1,000,000, it returns not the larger argument, but the smaller one. We can only test this and detect the error by calling the max() function in all kinds of combinations up to 1,000,000. And since we can’t know where the error is hidden, we have to try all combinations of calls not only up to 1,000,000, but up to the maximum integer that can be represented on the machine.
This test will take longer than the very first one, we will run it, and we will not find any errors. Can we now declare the function 100% flawless?
No not yet. Our programmer can be even more malicious than that and hide an error based on date and time. To find this, we need to set our machine’s clock to all possible dates and times and repeat the test for all possible integer combinations, since the programmer could have combined the date, time and argument tests in such a way that the function could conceivably will only give an incorrect answer to a single combination in one specific second of the next fifty years, otherwise it will be flawless. But a single error is enough to make the code not completely error-free.
I think we already see the hopeless future. In addition to the arguments and the system time, our malicious coder can take the size of the program, the size of the total and free memory, whether there is an even or odd number in a certain register, it can store how many times it has been called, and it works well or poorly depending on it. The possibilities are practically endless, and because of this it is completely impossible to finish in human time.
Of course, we can say that these examples assume rather absurd malice on the part of our coder, but we didn’t examine how viable this idea was, but we wanted to find out whether perfect black-box testing is possible in principle.
Based on what I have said so far, I see it as proven that external means cannot be used to determine whether a program or code is flawless or not. We can’t even give a percentage estimate, since we don’t know how many undetected errors there are for each detected error.
I don’t want to give the impression that all those who issue a wrong code deserve to be exempted. There’s no question about that. It is simply a fact that black-box testing cannot give us complete security.
So where do we look for complete security? First of all, we can trust our own code created for ourselves, since we know that it was not written by a malicious coder and not with the intention of causing harm. On the other hand, we can increasingly trust those open source codes, which in principle can be checked by many people, so it can be found out if there is possibly malicious code in it. Although, just because something is open source doesn’t mean it’s verified. In principle, every open source code should be accompanied by a checklist of who analyzed the code, when, what parts, and to what depth. Needless to say, such a thing does not exist nowadays, and it is hard to imagine how many man-years it would take to create a checklist for an open source program. And this list would only be reliable until the first modification, after which it would have to be revised after each modification.
Now that we’ve seen how reliable black-box testing is, let’s see if we can improve its reliability. For this, we have to open the “black-box” to some extent, i.e. we have to look into the code as well. For this, we can use a tracker, a reverse translator, or an automatic analysis program. Thus, we have slightly more chances to find a strange constant or conditional jump instruction, which may indicate malicious code. For example, this may work for a max() function, but we can write a max() function ourselves instead of spending a lot of time analyzing it. However, the real codes are usually much, much more complicated than the max() function in our example.
Finding hidden code in a larger program by tracing or disassembling it without understanding the operation of the entire program is a very big task.
So it looks like we won’t be able to reach 100% certainty with the “open black-box” technique, or even with the analysis of the open source code.
To examine if we can go further with other tools, it is best to take a closer look at antivirus programs to see what level they have reached in the automatic detection of malicious codes. Unfortunately, we can say that there is a lot of uncertainty. When the heuristic analysis of an otherwise popular, widely used, purchased search engine has to be turned off because it deletes programs that we cannot work without (not one we wrote ourselves), then it seems that the analysis finds code that is not a virus, or it is too lenient or meek, thus making even truly viral code look harmless.
Perhaps expecting 100% performance from automatic code analysis is as impossible an expectation as trusting writing a program to tell any other program whether it will enter an infinite loop or stop at some point. We already know that this is impossible. Perhaps the 100% detection of malicious code and program errors by automatic analysis is an equally impossible task. Of course, I can’t prove this.
Now is the time for someone to make a list of open, yet to be solved computing problems, similar to David Hilbert’s 23-point list (which listed problems to be solved in mathematics), this list could include the question: “Is it possible to write a program that proves the correctness of a program, if not, why not?”. Perhaps we already see the answer to the question, probably the intractability of the stopping problem is the key here as well.
I know that if such a program were to be completed, it would first have to be verified that it is correct, and perhaps this is the reason why this program will never be completed either…
Nyíregyháza, September 27, 2018 – April 7, 2019