File systems are an integral part of any operating system with the capacity for long-term storage. A file system has two distinct parts: the mechanism for storing files and the directory structure into which they are organised. In modern operating systems, where several users can access the same files simultaneously, it has also become necessary to implement features such as access control and other forms of file protection.
A file is a collection of binary data. A file could represent a program, a document or, in some cases, part of the file system itself. In modern computing it is quite common for several different storage devices to be attached to the same computer. The file system abstraction allows the computer to access many different storage devices in the same way: for example, when you look at the contents of a hard drive or a CD you view them through the same interface, even though they are entirely different media with data mapped onto them in entirely different ways. Files can have very different internal structures but can all be accessed by the same methods built into the file system; the arrangement of data within a file is decided by the program that creates it. The file system also stores several attributes for each file within it.
All files have a name by which the user can access them. In most modern file systems the name consists of three parts: a unique name, a period and an extension. For example, the file ‘bob.jpg’ is identified by the name ‘bob’; the extension ‘jpg’ indicates that it is a JPEG image file. The file extension lets the operating system decide what to do with the file when someone tries to open it: the operating system maintains a list of file extension associations, so a user opening ‘bob.jpg’ would most likely see it in whatever the system’s default image viewer is.
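The extension-association mechanism can be sketched as a simple lookup table. This is a minimal illustration in Python; the table contents and handler names are invented for the example, not taken from any real operating system's registry.

```python
# Illustrative table mapping file extensions to default applications.
# The entries here are made up for the example.
ASSOCIATIONS = {
    "jpg": "image_viewer",
    "txt": "text_editor",
    "mp3": "media_player",
}

def default_handler(filename: str) -> str:
    """Return the program associated with a file's extension."""
    # Split on the last period; everything after it is the extension.
    _, _, extension = filename.rpartition(".")
    return ASSOCIATIONS.get(extension.lower(), "unknown")
```

A file with no recognised extension simply falls through to an "unknown" handler, which is roughly what happens when an operating system asks the user to choose a program.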
The system also stores the location of a file. In some file systems, files can only be stored as one contiguous block. This simplifies storage and access, as the system then only needs to know where the file begins on the disk and how large it is. It does, however, lead to complications if the file is to be extended or removed, as there may not be enough contiguous space available to fit the larger version of the file. Most modern file systems overcome this problem by using linked file allocation, which allows the file to be stored in any number of segments; the file system then has to record where every block of the file is and how large each one is. This dramatically simplifies file space allocation but makes access slower than contiguous allocation, as the file may be spread out all over the disk. Modern operating systems mitigate this flaw by providing a disk defragmenter, a utility that rearranges the files on the disk so that each occupies a contiguous run of blocks.
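The contrast between the two schemes can be sketched in a few lines. The following is a toy model of linked allocation, assuming a fixed number of equal-sized blocks and ignoring on-disk layout entirely; real allocators are far more involved.

```python
# Toy model of linked file allocation: each file is a chain of block
# numbers, so its blocks need not be adjacent on the "disk".
class LinkedAllocator:
    def __init__(self, total_blocks: int):
        self.free = list(range(total_blocks))  # free-space list
        self.files = {}                        # name -> list of block numbers

    def create(self, name: str, blocks_needed: int) -> bool:
        if blocks_needed > len(self.free):
            return False                       # not enough space anywhere
        # Grab any free blocks; they need not be contiguous.
        self.files[name] = [self.free.pop() for _ in range(blocks_needed)]
        return True

    def extend(self, name: str, extra: int) -> bool:
        # Extending is easy: just chain on more blocks from the free list.
        if extra > len(self.free):
            return False
        self.files[name] += [self.free.pop() for _ in range(extra)]
        return True

    def delete(self, name: str) -> None:
        # Return every block the file occupied to the free-space list.
        self.free += self.files.pop(name)
```

Note that `extend` never fails for lack of *contiguous* space, only for lack of space overall, which is exactly the advantage over contiguous allocation described above.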
Information about the file’s protection is also integrated into the file system. Protection can range from the simple systems implemented in the FAT file system of early versions of Windows, where files could be marked as read-only or hidden, to the more secure systems implemented in NTFS, where the administrator can set up separate read and write access rights for different users or user groups. Although file protection adds a great deal of complexity and potential difficulty, it is essential in an environment where many different computers or users can access the same drives via a network or a time-shared system such as raptor.
Some file systems also store data about which user created a file and at what time they created it. Although this is not essential to the running of the file system, it is useful to the users of the system.
In order for a file system to function correctly, it needs a number of defined operations for creating, opening and editing files. Almost all file systems provide the same basic set of methods for manipulating files.
A file system must be able to create a file. To do this, there must be enough space left on the drive to fit the file, and there must be no other file with the same name in the directory in which it is to be placed. Once the file is created, the system makes a record of all the attributes noted above.
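The two preconditions for creating a file can be expressed directly. This is a hedged sketch, assuming the directory is just a set of names and free space is a single number; a real file system would also have to reserve the space atomically.

```python
# Sketch of the checks a file system makes before creating a file:
# enough free space on the drive, and no existing file with the same
# name in the target directory.
def can_create(name: str, size: int, directory: set, free_space: int) -> bool:
    if size > free_space:
        return False   # not enough space left on the drive
    if name in directory:
        return False   # a file with this name already exists here
    return True
```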
Once a file has been created, we may need to edit it. This may mean appending data to the end of it, or removing or replacing data already stored within it. While doing this, the system keeps a write pointer marking where the next write operation to the file should take place.
In order for a file to be useful, it must, of course, be readable. To do this, all you need to know is the name and path of the file; from this the file system can ascertain where on the drive the file is stored. While reading a file the system keeps a read pointer, which records which part of the file is to be read next.
In some cases, it is not possible to simply read the whole file into memory. File systems therefore also allow you to reposition the read pointer within a file. To perform this operation, the system needs to know how far into the file the read pointer should jump. An example of where this is useful is a database system: when a query is made, it is obviously inefficient to read the whole file up to the point where the required data sits. Instead, the application managing the database determines where in the file the required data is and jumps straight to it. This operation is often known as a file seek.
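The write pointer, read pointer and seek operations described above are exactly what a language's file API exposes. A short demonstration using Python's standard file objects (the filename and contents are arbitrary):

```python
# Demonstration of write and read pointers using ordinary file I/O.
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.bin")

with open(path, "wb") as f:   # create the file
    f.write(b"hello world")   # the write pointer advances as we write
    f.write(b"!")             # so this appends at the current position

with open(path, "rb") as f:   # open the file for reading
    first = f.read(5)         # the read pointer now sits at offset 5
    f.seek(6)                 # a "file seek": jump the read pointer
    rest = f.read()           # read from offset 6 to the end of the file
```

After this runs, `first` holds the bytes before the space and `rest` holds everything after it, without ever reading the byte in between.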
File systems also allow you to delete files. To do this, the system needs to know the name and path of the file; it then simply removes the file’s entry from the directory structure and adds all the space it previously occupied to the free-space list (or whatever other free-space management system it uses).
These are the most basic operations required for a file system to function correctly. They are present in all modern computer file systems, but the way they function may vary. For example, performing the delete operation in a modern file system like NTFS, which has file protection built in, is more involved than the same operation in an older file system like FAT: both systems would first check whether the file was in use before continuing, but NTFS would then also have to check whether the user deleting the file has permission to do so.
Some file systems also allow multiple users to open the same file simultaneously, and must decide whether users have permission to write a file back to the disk while other users have it open. If two users have read and write permission to a file, should one be allowed to overwrite it while the other still has it open? And if one user has read-write permission and another only read permission, should the user with write permission be allowed to overwrite the file if there is no chance of the other user also trying to do so?
Different file systems also support different access methods. The simplest method of accessing information in a file is sequential access, where the information is read from the beginning one record at a time. To change the position in a file, it can be rewound or forwarded a number of records, or reset to the beginning. This access method grew out of file storage on tape drives, but it works as well on sequential-access devices (like modern DAT tape drives) as it does on random-access ones (like hard drives). Although this method is straightforward in its operation and ideally suited to specific tasks such as playing media, it is very inefficient for more complex tasks such as database management.
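The restricted operations of sequential access can be made concrete with a small sketch: the only ways to move through the file are reading the next record, skipping forward, rewinding or resetting. The record contents here are invented for the example.

```python
# Toy model of a sequentially accessed file: only forward reads,
# rewinds and resets are possible, never a jump to an arbitrary record.
class SequentialFile:
    def __init__(self, records):
        self.records = records
        self.pos = 0              # current position in the file

    def read_next(self):
        record = self.records[self.pos]
        self.pos += 1             # reading always advances the position
        return record

    def forward(self, n=1):
        self.pos = min(self.pos + n, len(self.records))

    def rewind(self, n=1):
        self.pos = max(self.pos - n, 0)

    def reset(self):
        self.pos = 0              # back to the beginning of the file
```

Reaching the last record from the first always means passing (or explicitly skipping) everything in between, which is precisely why this model suits media playback but not database queries.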
A more modern approach that better facilitates non-sequential reading is direct access. Direct access allows records to be read or written in any order the application requires. This suits modern hard drives well, as they too allow any part of the drive to be read in any order with little reduction in transfer rate. Direct access is better suited to most applications than sequential access because it is designed around the most common storage medium in use today, rather than one that is now rarely used except for massive offline back-ups. Given the way direct access works, it is also possible to build other access methods on top of it, such as sequential access, or an index of all the records in a file to speed up finding data.
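The idea of an index built on top of direct access can be shown in miniature. This sketch assumes fixed records identified by a key; building the index once means a later lookup jumps straight to the right record instead of scanning from the start.

```python
# An index built on top of direct access: map each key to its record
# number so lookups jump straight to the data. Record contents are
# invented for the example.
records = [("id3", "carol"), ("id1", "alice"), ("id2", "bob")]

# Build the index once: key -> position of the record in the file.
index = {key: position for position, (key, _) in enumerate(records)}

def lookup(key):
    # Direct access: go straight to the record's position, no scanning.
    return records[index[key]][1]
```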
On top of storing and managing files on a drive, the file system also maintains a system of directories in which the files are referenced. Modern hard drives store hundreds of gigabytes, and the file system helps organise this data by dividing it up into directories. A directory can contain files or further directories. As with files, there are several basic operations that a file system needs to be able to perform on its directory structure in order to function correctly.
It needs to be able to create a file. This was covered by the overview of file operations above, but as well as creating the file, an entry for it must be added to the directory structure.
When a file is deleted, the space taken up by the file needs to be marked as free space. The file itself also needs to be removed from the directory structure.
Files may need to be renamed. This requires an alteration to the directory structure but the file itself remains unchanged.
List a directory. In order to use the disk effectively, the user needs to know what is in all the directories stored on it, and on top of this needs to be able to browse through the directories on the drive.
Since the first directory structures were designed, they have gone through several substantial evolutions. Before directory structures were applied to file systems, all files were stored on the same level: effectively a system with one directory in which all the files are kept. The next advancement, which could be considered the first true directory structure, was the two-level directory. In this there is a single list of directories, all on the same level, with the files stored inside them; this allows different users and applications to store their files separately. After this came directory structures as we know them today: directory trees. Tree-structured directories improve on two-level directories by allowing directories, as well as files, to be stored inside directories. All modern file systems use tree-structured directories, but many have additional features, such as security, built on top of them.
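A tree-structured directory is naturally recursive: a directory maps names to files or to further directories. The following sketch uses nested dictionaries, with the path names invented for the example.

```python
# Minimal sketch of a tree-structured directory: each directory maps
# names to either files (plain strings here) or sub-directories (dicts).
root = {
    "home": {
        "alice": {"bob.jpg": "file", "notes.txt": "file"},
        "carol": {},
    },
    "etc": {"hosts": "file"},
}

def resolve(tree, path):
    """Walk a path like 'home/alice/bob.jpg' down the directory tree."""
    node = tree
    for part in path.split("/"):
        node = node[part]   # descend one level per path component
    return node
```

The two-level scheme described above is just this structure with the recursion cut off at depth two; the tree removes that limit.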
Protection can be implemented in many ways. Some file systems allow password-protected directories: the file system will not allow access to a directory until it is given a username and password for it. Others extend this by giving different users or groups separate access permissions; the operating system requires the user to log in before using the computer and then restricts their access to areas for which they lack permission. The system used by the computer science department for storage space and coursework submission on raptor is an excellent example of this. In a file system like NTFS, all types of storage space, network access and use of devices such as printers can be controlled in this way. Other types of access control can also be implemented outside the file system: for example, applications such as WinZip allow you to password-protect files.
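Per-group access rights in the style just described can be modelled as a table mapping paths to the actions each group may perform. The paths, group names and permissions below are invented for illustration, not taken from NTFS itself.

```python
# Illustrative access-control table: for each protected path, the set
# of actions each group is granted. All names here are made up.
ACL = {
    "/coursework": {
        "students": {"read"},
        "staff": {"read", "write"},
    },
}

def allowed(path: str, group: str, action: str) -> bool:
    # A group may perform an action only if it is explicitly granted.
    return action in ACL.get(path, {}).get(group, set())
```

The default-deny behaviour (anything not granted is refused) mirrors how permission checks in protected file systems are usually structured.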
There are many different file systems currently available on many different platforms, and depending on the type of application and the size of the drive, different situations suit different file systems. If you were to design a file system for a tape backup system, a sequential access method would be better suited than direct access, given the constraints of the hardware. Likewise, for a small hard drive on a home computer there would be no real advantage in using a more complex file system with features such as protection, as it is unlikely to be needed. If I were to design a file system for a 10-gigabyte drive, I would use linked allocation over contiguous allocation to make the most efficient use of the drive space and limit the time needed to maintain the drive.
I would also choose a direct access method over a sequential one to make the most of the strengths of the hardware. The directory structure would be tree-based to allow better organisation of information on the drive, and would allow for acyclic directories to make it easier for several users to work on the same project. It would also have a file protection system that allowed different access rights for different groups of users, and password protection on directories and individual files. Several file systems that already implement the features I have described as ideal for a 10-gigabyte drive are currently available; these include NTFS for the Windows NT and XP operating systems, and ext2, which is used in Linux.