Unix is general-purpose, multi-user, interactive operating system, it offers several new features hardly found in other larger operating systems back in the day. These features include (1) a hierarchical file system incorporating demountable volumes; (2) compatible file, device, and inter-process I/O; (3) the ability to initiate asynchronous processes; (4) system command language selectable on a per-user basis; and (5) over 100 subsystems including a dozen languages.
Simplicity at its Core
Simplicity was engraved into the gene of Unix since its birth, as the paper states: “Perhaps the most important achievement of UNIX is to demonstrate that a powerful operating system for interactive use need not be expensive either in equipment or in human effort”. Therefore, it is important to keep in mind how simplicity is reflected in the design of Unix.
The File System
Perhaps the singly most important part of Unix. The “everything is file” concept that influences all modern system designs. Here is a short description of each major file types.
- Ordinary Files: no particular structuring is expected by the system. The structure of files is controlled by the programs which use them, not by the system.
- Directories provide the mapping between the names of files and the files themselves, inducing a structure on the file system. The only difference between directory and normal file is the the directory can’t be written on by unprivileged programs, meaning the contents of directories are controlled by the system.
linking allows the same non-directory file to appear in several directories under possibly different names; a directory entry for a file is sometimes called a link. All links to a file have equal rights. A directory entry for a file consists merely of its name and a pointer to the file metadata. Therefore a file exists independently of any directory entry. Directory can be considered as link.
- Special Files: perhaps the most prominent feature of the “everything is a file” principle. They are read and written just like ordinary disk files, but requests to read and write will result in activation of the I/O device. It blurs the line between file and device I/O since they share identical interfaces and are subject to the same protection mechanism.
Removable File System
The Unix file system has a mount system request which, in effect, replaces a leaf of the hierarchy tree (the ordinary file) by a whole new subtree (the hierarchy stored on the removable volume). It provides a unified abstraction of the file system hierarchy where the underlying storage components become transparent to the user.
One exception to the identical treatment of files on different devices: no link may exist between one file sys hierarchy and another. Otherwise, some form of bookkeeping would be required to when a removable volume is dismounted from one file system but not the other.
Each user is assigned a unique user ID. A file, upon its creation, is marked with the user ID of its owner. Also given for new files is a set of seven protection bits. Six of these specify independently read, write, and execute permission for the owner of the file and for all other users. This is a perfect example of ACL (access control list) system.
Once again, we see how Unix is trying to provide a unified interface such that performing I/O on different devices doesn’t would not require different accessing patterns or styles. There is no distinction between “random” and sequential I/O, nor is any logical record size imposed by the system. Calls like open, seed, read, and write can be found in all major Unix-like systems today.
I found it interesting that the authors were arguing why there are no user-visible locks in the file system. The first argument says: “they are unnecessary because we are not faced with large, single-file data bases maintained by independent processes”. It might be different today on modern systems so I have some doubts on that argument. The next one is “they are insufficient because locks in the ordinary sense, whereby one user is prevented from writing on a file which another user is reading, cannot prevent confusion when, for example, both users are editing a file with an editor which makes a copy of the file being edited.” This certainly is true because the the copies are separate files with distinct metadata during editing but once the editing is finished then it becomes tricky when the updated content needs to be written back to the original file without some form of synchronization or ordering.
The paper further explains the the system has sufficient internal interlocks to prevent these situations from happening. The exact details of how it works is not quite clear at this stage.
As we’ve already known, a directory entry contains only a name for the associated file and a pointer to the file itself. This pointer is an integer called the i-number. When the file is accessed, its i-number is used as an index into a system table (the i-list) stored in a known part of the device on which the directory resides.
Directory entry -> (File Name, i-number) -> i-list -> i-node -> description of the file
Because the file is described by its corresponding i-node, any copy and deleting operations are circulating around modifying directory entry or i-node link-count field without actually touching the bulk of the file itself.
It important to distinguish between file descriptor and inode. By definition, files are represented by inodes. The inode of a file is a structure kept by the filesystem which holds information about a file, like its type, owner, permissions, inode links count and so on. Other other hand, the file descriptor is the value returned by an open call is termed a file descriptor and is essentially an index into an array of open files kept by the kernel. There is an inode in the i-list but every process can have its own file descriptor for one file.
A process is the execution of an image. An image is a computer execution environment. It includes a core image, general register values, status of open files, current directory, and the like. An image is the current state of a pseudo computer. You can imagine the image as a motionless snapshot of current state of the processor, or you can image as the content saved to the main memory when a currently executing process is preemptied by another one.
The user-core part of an image has three logical segments. The program text segments starting from location 0. At the first 8K byte boundary above the text segment is a non-shared, writable data segment. The highest address in the virtual address space is a stack segment.
One key feature of UNIX is a new process can come into existence only by ise of the fork system call. Another system primitive is invoked by execute. This call resembles a “jump” machine instruction rather than a sub-routine call.
Shell is a command line interpreter. Programs executed by the Shell start off with two open files which have file descriptors 0 and 1, representing files for reading and writing. The symbol “<” and “>” represent what files the file descriptor 0 and 1 will refer to for the duration of the command passed to shell.
A filter, represented by “|”, is a program that copies its standard input to its standard output (without processing).
Command separator, represented by “;”, is used to separate multiple commands. A related feature is “&”, which execute the command in the background. When the shell doesn’t wait for the completion of a command, the identification of the process running that command is printed. In addition, parentheses can be used to enforce order of execution.
It’s worth noting the shell is itself a command, and may be called recursively.
Since it’s a command, it also shared the luxury of having standard I/O file descriptor. Thus, command such as:
sh < file_containing_shell_commands would work.
The last step in the initialization of UNIX is the creation of a single process and the invocation of a program called init. init have various sub-instances prompting for user login information. If the login succeeds, init performs an execute of the Shell. Essentially, init is the parent process of Shell.