Back

Dealing with collisions

 

Good hashing algorithms
Creating a random file is an excellent way of creating a fast access file structure. You do need, of course, a direct access storage device, not a magnetic tape, for example. You also need to ensure that you have a 'good' hashing algorithm. The one above is very poor indeed because there will be lots of different surnames that give the same memory address! This is called a 'clash' or 'collision'. You need to design a hashing algorithm that minimises clashes because they slow down access times. On the other hand, an algorithm might also spread out the data so much that large areas of storage are used up! Having large areas of storage that aren't used efficiently is known as ‘redundancy’. A further consideration is the hashing algorithm itself. It mustn’t be so complicated that it takes ages to calculate anything!

A good hashing algorithm will:

    • Minimise clashes.
    • Ensure that the hash codes of data aren't spread too far apart, wasting memory.
    • Be quick to calculate.

Dealing with clashes
Some clashes (collisions) are inevitable. When they do happen, there are two techniques that can be used to deal with the situation:

    1. The computer can still go to the memory address that was calculated by the hashing algorithm. It then starts searching sequentially from that point to either find the data that it was searching for or to store the data it wanted to store.
    2. The computer can set-up an overflow area. When there is a clash, the computer can jump to this special area to either find the data that it was searching for or to store the data it wanted to store.

Back