Back

Fixed and variable length records

An example of sizing a fixed-length file
Let us look in more detail at how a computerised database might store data using fixed-length records. We will use an example to see how to work out the size of a file that holds fixed-length records and what we need to consider. Information in a computer database is stored in an organised manner. Consider the following database of information, about dogs. There are seven ‘fields’ in each ‘record’ in this ‘file’.

ID

Name

Type

Date of Birth

Gender

Weight (Kg)

Owner’s phone number

0001

Prince

Poodle

21/08/02

M

15.7

234232

0002

Gnash

Bulldog

03/04/99

F

16.5

554543

0003

Sammy

Terrier

10/09/01

F

13.9

654455

0004

Freddy

Alsatian

12/12/00

M

20.0

712323

0005

Sammy

Greyhound

07/01/97

F

18.5

665643

 A file of records about dogs.

There are five database terms you need to be familiar with.

Field
A field is an individual piece of information. In the above example, ID is a field, Name is a field, Type is a field and so on. Fields are also known as ‘attributes’. Note that fields correspond to columns in a table.

Record
When you put some data into fields, all the data about one person, or one pet, or one subject and so on, make up a ‘record’. Records correspond to rows in a table. For example, all the information about Prince is referred to as ‘Prince's record’.

File
All of the records together make up a file of information.

Database
Whether you have just one file, or whether you have lots of files in a system, together they are known as a ‘database’. If your files are interconnected in some way, then they are known as a ‘relational database’. You will learn to love relational databases later in the book!

Primary key
Every record needs a primary key. It is a piece of data that is unique for that record. It can be used to pick out one and only one record. Looking at the small example above, there are 2 dogs called Sammy. Imagine if there were hundreds of dogs called Sammy. We will need to pick out from time to time individual records. Often, the only way we can do this is to add a new field to our database. This field is called the ‘Primary Key’. You may know lots of examples of primary keys - a pupil ID number on the pupil database, a National Insurance number, a driving licence number, a club membership number and so on. Without primary keys, you have no way of picking out only one particular record. If you are making a database and you don't have a field whose data is unique for each record, then you need to add one, perhaps an ID field, for example.

Decide on the data type and size to allow
If you have ever set up a database, then you will have told your application what data type you want to use for each of your fields in a file, and also some information about how much space to allow for each field. This information would be brought together in a document called a ‘Data Dictionary'. For example, in the dog file, you may have decided on the following:

ID. This field is not a number but is an ID code. Therefore we will not use data type Integer but will use data type text instead. We will assume that the maximum number of dogs that will ever be in this file is 5000 so that an ID code of 4 characters long will be fine. We will allow 4 bytes for the ID code, one byte for each character.

Name. We know that some people give their dogs very long names. It is difficult to judge what to allow so we will allow plenty of room for error. We will allow 50 bytes, data type text.

    • Type. Let us assume that we have identified 203 different breeds of dog. The longest breed name is 28 characters long. If we allow that for each breed, there will be a lot of waste. For example, ‘Poodle’ only needs six bytes not 28. Because there are a fixed number of choices, we will code up the breeds. If we use one character, we can represent 26 breeds. If we use 2 characters, we can represent 26 x 26 = 676 breeds. This is more than enough. We could give Poodle the code PO, Alsatian AL and so on. We will therefore make this field data type text and allow 2 bytes.
    • Date of birth. This needs to be in the format DD/MM/YY. This is therefore data type Date and requires 6 bytes.
    • Gender. This is data type Boolean because a dog can only be either male or female. Allow 1 bit.
    • Weight. This is a real number. We only need 1 decimal place and the range of numbers is small therefore allow 2 bytes. (Real numbers will be represented using the floating-point system).
    • Owner's telephone number. This is data type text (because telephone numbers may include leading zeros and spaces). We will allow 6 bytes, to represent 12 digit numbers in BCD format.

We now have the following:

ID            Text       4 bytes 
Name     Text      50 bytes               
Type       Text       2 bytes 
DofB       Date     6 bytes 
Gender  Bool      1 bit      
Weight   Real     2 bytes 
Phone    BCD     6 bytes 

sizing
 

We need to remember when sizing a file that we are estimating the file size, not working out an exact size. Therefore, we can round numbers as we see fit. Let’s now look at the five steps in estimating the file size.

STEP 1 - Decide on the fields.
Write down the fields you need in your database.

STEP 2 - Allocate data types and sizes.
For each field, state what data type it is and then state the number of bytes you are going to allow for that field.

STEP 3 - Work out the total size of one record.
The total size for one record is 4 + 50 + 2 + 6 +1bit + 2 + 6 = 70 bytes and 1 bit exactly, or 70 bytes approximately.

STEP 4 - Work out the maximum size the file could ever be.
Let’s assume that the maximum number of dogs we are ever going to need to store is 5000. Therefore, the storage space required for the file will be up to 5000 x 70 = 350000 bytes.

STEP 5 - Work out the overheads.
When you store files, however, you don't just store the data! For example, information about when the file was created, last accessed, last changed and its size are stored. We refer to these items as ‘overheads’. We need to allow some extra storage for these overheads. The rule is to allow 10% of our estimated file size. The size of the overheads in this example can be worked out therefore as 10% of 350000, or 35000 bytes.

STEP 6 - Work out the total maximum file size, including overheads.
The total file size is now 350000 + (10% of 350000) = 350000 + 35000 = 385000 bytes

STEP 7 - Put the answer into appropriate units and round the answer up or down as appropriate.
Our final answer is 385000 bytes. The problem is that the units are not the best ones to select. We can do better than this. If we divide our answer by 1000, we will get the answer in Kbytes (Kilobytes) approximately. If we divide an answer by 1000000, we will get the answer in Mbytes (Megabytes). If we divide by 1000000000 we will get the answer in Gbytes (Gigabytes). The most sensible thing to do in this example is to divide by 1000. This gives us 385000 / 1000 = 385 Kbytes. The answer to this problem, then, is that the file size is approximately 400 Kbytes long. Don't be afraid to round up or round down numbers! Just remember to put 'approximately' with the answer. If you round an answer, it demonstrates that you understand what you are doing, that you are estimating and not working out exact numbers.

A summary of how to size a fixed-length file of records

    • Decide what fields you need.
    • Decide how many bytes you need for each field, justifying your decisions.
    • Add up the total numbers of bytes for 1 record. Round the answer if it helps.
    • Multiply the size of one record by the maximum number of records you are likely to store.
    • Add 10% for overheads.
    • Put your answer in appropriate units. Round the answer if it helps.
    • Put approximately next to your answer.

Variable-length records
Whilst fixed-length records are relatively easy to process, they can waste a lot of storage space on disk, especially when a short length data item of, for example, five bytes has been entered into a text field that has been defined as a long fixed length of perhaps 30 characters long. 25 bytes of valuable storage space are going to be wasted each time we do this. One way to address this problem is to use variable-length records. This system is more complicated than fixed-length systems because calculations have to be done but this system will save storage space. Suppose you had to store the following data:

First name: Fred
Second name: Jones
Name of course: Art

First name: Mandy
Second name: Shu
Name of course: English

First name: Ali
Second name: Patel
Name of course: Mathematics

Some records we want to store as variable-length records.

In a fixed-length system, you might have defined the following:

    • First name: data type text, allow 30 characters.
    • Second name: data type text, allow 30 characters.
    • Name of course: data type text, allow 4 characters, coded.

If we stored Fred Jones who is on an Art course, we should actually need only 4+5+3 = 12 bytes but would in fact use 64 bytes. This is because all of the fields in the record are a fixed size. If we stored the records using a variable-length system, we could diagrammatically represent what we are storing as:

Fred$Jones$Art#Mandy$Shu$English#Ali$Patel$Mathematics#

Note two points about this system:  

    • The end of a field is signalled by a dollar sign (EndOfField marker).
    • A hash sign signals the end of a record (EndOfRecord marker).

Imagine a user wants to enter some data into their database. The user starts to enter data. They enter the first piece of data into the data input form where it says “First name”. When the user presses ENTER and moves to the next data entry field, a dollar sign is automatically inserted after the field by the database software. The second name is then entered. As soon as the user presses ENTER again to go to the ‘Name of course’ field, another dollar sign is automatically entered after that field. After the user presses ENTER again (to enter the name of the course and to get a new blank data input form up on the screen) a hash symbol is entered. This signals the end of the record. The whole process is repeated for the next record, and the next, and so on.

Now there is no space wasted to store each record. If we stored Fred Jones who is on an Art course, we would need only
5+6+4 = 15 bytes (including two EndOfField markers and an EndOfRecord markers). These markers can easily be used by a program when reading records back from a file. All it has to do is keep looking for EndOfField markers and EndOfRecord markers. These will signal the end of a particular piece of data or a particular record.

Appropriate uses of fixed and variable length records

    • Variable-length records save space and produce a smaller overall file of records. If storage space is important, then this might be the record type to design.
    • A smaller file of records is quicker to work with (for example, to search and sort) than a larger file.
    • Variable-length systems are more complex to program compared to fixed-record systems.
    • It is easier to estimate the overall file size with fixed-length records than variable-length records. You have to make assumptions about the average size of fields, which may not be easy to do or correct.

Back