Batch totals, control totals and check sums

In batch processing, data is often entered into a file offline so that it can be processed later. Integral to this concept is ensuring that the data is collected into the file accurately. To help us ensure this, we can make use of batch totals, control totals and check sums.

Batch totals
One method used to ensure that the integrity of a set of data is maintained when it is entered into a computer system is to use a ‘batch total’. Before an operator enters a set of data or a set of records into the computer, they calculate a number known as the ‘batch total’. This is worked out by counting the number of data items or records in the set to be entered. This ‘batch total’ is then entered into the computer along with the data or records.

For example, if an operator was entering a set of product codes into a computer system, they could generate an extra number by counting up manually all the actual codes that had to be entered. This number is the ‘batch total’ and is entered into the computer by the operator along with the product codes. Once the codes and the batch total have been entered, the batch total is re-calculated automatically by the computer. It is then compared to the original batch total entered by the operator. If they are the same, then the data’s integrity has (probably) been maintained. If they are different, then the operator may have missed out a product code or may have entered a code twice accidentally. The computer system would then generate an error message to highlight the problem to the operator.
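The comparison the computer makes can be sketched in a few lines of Python. This is only an illustration: the function name check_batch_total and the three product codes are made up for the example, and a real data-entry system would do more than print a result.

```python
def check_batch_total(records, entered_batch_total):
    """Recount the records and compare with the operator's manual count."""
    recalculated = len(records)
    return recalculated == entered_batch_total


# Hypothetical product codes; the operator counted 3 by hand.
codes = ["A100", "A101", "A102"]
print(check_batch_total(codes, 3))       # True: counts agree

# The operator missed one code while typing, so only 2 were entered.
print(check_batch_total(codes[:2], 3))   # False: an error message is needed
```

Note that a matching count only shows that the right number of items was entered. It cannot show that each item was typed correctly, which is why the text says the integrity has "probably" been maintained.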

Control totals (also known as hash totals)
This is another check that can easily be done on sets of data, and it can be used in addition to batch totals. To generate a control total, the operator adds up pieces of the data entered. In the product code example used above for batch totals, the operator might add up the first digit of each of the product codes. This produces the ‘control total’. The control total is then entered into the computer system (perhaps also with a batch total as a double-check). Once again, the computer will recalculate the control total automatically and signal to the operator if it differs from the one the operator entered. For example, an operator wants to enter the following product codes:

2324455   3434333   7823444    6555678    6556665

The batch total would be calculated as 5 (because there are 5 codes to be entered). A control total might be calculated by using the algorithm ‘add the first digits of all data entered’. This would give 2 + 3 + 7 + 6 + 6 = 24. The operator would then enter not only the data but also the batch total and the control total. Once all the product codes and the batch and control totals had been entered, the computer would recalculate the batch and control totals automatically. If they differed from what the operator entered, then an error message would be displayed on the input screen. If the data consisted of words rather than numbers, then the characters in each word could be converted into numbers using their ASCII equivalent codes. These can then be used in a suitable algorithm to generate a control total.
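The worked example above can be reproduced with a short Python sketch. The function names are invented for illustration, and the ASCII version shows just one possible ‘suitable algorithm’ (adding up the code of every character); the two words used are made up.

```python
def control_total(codes):
    """Add up the first digit of every product code."""
    return sum(int(code[0]) for code in codes)


def ascii_control_total(words):
    """One possible algorithm for words: add up every character's ASCII code."""
    return sum(ord(ch) for word in words for ch in word)


codes = ["2324455", "3434333", "7823444", "6555678", "6556665"]
print(len(codes))            # 5, the batch total
print(control_total(codes))  # 24, the control total (2 + 3 + 7 + 6 + 6)

# Hypothetical word data: P=80, E=69, N=78, I=73, N=78, K=75
print(ascii_control_total(["PEN", "INK"]))  # 453
```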

Check Sums
When computerised data is sent from one computer system to another, the sending computer can calculate a ‘check sum’ automatically. This is simply a number produced by adding up all of the individual pieces of data. When the receiving computer gets the data and the check sum, it recalculates the check sum and compares it to the check sum it received. If they are different, it means the data was changed or damaged while it was being sent. The receiving computer can then signal to the sending computer to resend the data. Check sums are very similar to batch totals. The main distinction is that check sums are calculated by computer systems whereas batch totals are calculated by humans!
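A minimal Python sketch of the sending and receiving steps is given below. It assumes the ‘pieces of data’ are bytes and keeps only the remainder after dividing the sum by 256 so that the check sum fits in one byte. That detail, and the function names, are assumptions made for the example rather than anything fixed by the description above.

```python
def checksum(data: bytes) -> int:
    """Add up every byte; keep the remainder mod 256 (an assumed convention)."""
    return sum(data) % 256


def send(data: bytes):
    """The sender transmits the data together with its check sum."""
    return data, checksum(data)


def receive(data: bytes, received_checksum: int) -> bool:
    """The receiver recalculates the check sum and compares the two values."""
    return checksum(data) == received_checksum


data, total = send(b"HELLO")
print(total)                     # 116, since 72+69+76+76+79 = 372 and 372 % 256 = 116
print(receive(data, total))      # True: accept the data
print(receive(b"HELLP", total))  # False: ask the sender to resend
```

A simple sum like this will not catch every error. If two bytes swap places, for instance, the total is unchanged.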

Data consistency
As well as ensuring that data gets entered into the computer correctly, designers also try to ensure that the consistency of the data is maintained. The term ‘data consistency’ refers to the ‘correctness’ of the data. If sensible and comprehensive validation rules have been set up in a database, for example, then it should be possible to exclude data that does not conform to those rules at the data input stage. What you don’t want, for example, are some dates entered into a database in the form DD/MM/YY whilst others in the same field are entered as DD/MM/YYYY or MM/DD/YY. If this happened, we would say the data is inconsistent. We want our data to be consistent because then we will be able to sort and search it accurately in the queries we set up. We cannot successfully do that if the data is inconsistent.
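A validation rule for the date example might look like the following Python sketch. It assumes the chosen format for the field is DD/MM/YYYY; the function name is made up, and a real database would normally apply such a rule through its own validation settings rather than through code like this.

```python
import re
from datetime import datetime


def is_consistent_date(text: str) -> bool:
    """Accept only dates written as DD/MM/YYYY (the assumed format for the field)."""
    # Check the shape first: two digits, two digits, four digits.
    if not re.fullmatch(r"\d{2}/\d{2}/\d{4}", text):
        return False
    # Then check that it is a real calendar date.
    try:
        datetime.strptime(text, "%d/%m/%Y")
    except ValueError:
        return False
    return True


print(is_consistent_date("25/12/2024"))  # True
print(is_consistent_date("25/12/24"))    # False: two-digit year
print(is_consistent_date("12/25/2024"))  # False: there is no month 25
```

A rule like this cannot catch everything. A date such as 03/04/2024 passes whether the person typing it meant 3 April or 4 March.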

Data integrity
So far, we have considered the use of validation rules to try to maintain data consistency, as well as the use of batch totals, control totals and check sums to ensure that all the data gets entered into a system. However, data already in a system can become corrupted as a result of some processing that is done on that data. The phrase ‘data integrity’ refers to whether the data already in a computer has been corrupted after some processing has taken place on it. For example, consider a computer system that had to work out the VAT on some prices and then store the new prices (including VAT) back in the system, over-writing the old ones. If, as a result of the processing, some negative numbers were produced and were then stored back in the system, then clearly something went wrong with the processing of the data! As a result of the processing, the data became corrupted. It has lost its integrity.
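One way to guard against the VAT problem is to check the results of the processing before the old prices are over-written. The Python sketch below is only an illustration: the 20% rate, the use of whole pence and the function names are all assumptions made for the example.

```python
VAT_RATE_PERCENT = 20  # assumed rate, for illustration only


def add_vat(prices_pence):
    """Return new prices (in whole pence) with VAT added."""
    return [p + p * VAT_RATE_PERCENT // 100 for p in prices_pence]


def integrity_ok(old_prices, new_prices):
    """Check the processed prices before they replace the old ones."""
    same_count = len(new_prices) == len(old_prices)
    no_negatives = all(p >= 0 for p in new_prices)
    return same_count and no_negatives


old = [1000, 250]
new = add_vat(old)
print(new)                               # [1200, 300]
print(integrity_ok(old, new))            # True: safe to over-write
print(integrity_ok(old, [1200, -300]))   # False: keep the old prices
```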
