Naming and Binding

Paul Krzyzanowski

March 9, 2023

Goal: Understand the terminology of names, addresses, and binding and issues in name choices.

What’s in a name? That which we call a rose By any other word would smell as sweet.
Shakespeare, Romeo and Juliet 2.2.43–44

Definitions

We use names throughout computer systems and life. We use them to refer to users, computers, files, objects, devices, variable names, and services – essentially everything we need to reference in any way.

One danger in any discussion of naming is that of getting carried away with both the metaphysical and semantic definitions of naming. We will attempt to avoid that here. John F. Shoch’s definitions in A note on Inter-Network Naming, Addressing, and Routing does a fine job in defining the basic terms:

Name
A name identifies what you want.
Address
An address identifies where it is.
Route
A route identifies a way to get there.

A few useful additional definitions put forth by [J.H. Saltzer[(https://web.mit.edu/Saltzer/www/publications/nbo/nbo.html) include:

Binding
Binding is the process of mapping a name to an address. J. H. Saltzer defines it as choosing “a lower-level implementation for a particular higher-level semantic construct.” We can avoid the semantic problems of differentiating names from addresses by thinking of binding as the process of mapping one name into some lower-level name. For example, the domain name ls.cs.rutgers.edu binds to the IP address 128.6.13.171. However, at the level of the IP driver, we can think of 128.6.13.171 as a name that binds to the lower-level representation, the ethernet MAC address b0:7b:25:1d:23:f3.
Directory or Naming Network
A set of catalogs (name→object binding tables) that may include other directories
Naming hierarchy
A naming network organized in a tree-structured form.
Pathname
A multi-component name traversing a path in a naming hierarchy.
Root
A starting catalog in a naming network.
Indirect entry
An entry in a catalog that binds to a name instead of the underlying object.
Name service
A service that provides a binding function. A computing environment will have many names, each relevant within a specific context. On a networked system, we may need names for:
Name Description
Services for example, time of day, file service, domain name service, …
Nodes identify a computer that can run services or programs
Network attachment points ports on a network: places where a node is attached (not exactly applicable for an ethernet)
Paths the route between network attachment points
Objects within services for example, file names on a file server

A name can be anything that is convenient or makes sense in describing some object or service. For humans, it may be a human-readable character string. For machines, it may be a binary identifier (what we humans would think of as an “address”).

A service may run at one or more nodes. It may need to move from node to node without losing its identity. A node may be connected to one or more network attachment points and may need to move from one place to another on the network without losing its identity. Multiple paths may exist between network attachment points or the path might change. These conditions warn us that it is not a good idea for a name to contain an implicit binding, one that can be derived from the name alone (for example, having a node name be a textual representation of the machine’s Ethernet address).

Names may be pure or impure. A pure name is one that contains no information about the object or where that object may be found. For example, a user ID is a pure name. An ethernet MAC address is a pure name (a low level one) since it does not tell you anything about the device or where it is located. Pure names contain no discernable context. An impure name contains information about the underlying object encoded within the name. For example, an email address is an impure name since it contains a domain name as a context – supplementary information about the name. An IP domain name, such as cs.rutgers.edu, is another example of an impure name since it contains embedded within it a hierarchy that identifies the context of that name: cs is a pure name but it is a component within rutgers, which is within edu. A file pathname is similarly impure since it contains identification about where the object is located. If the object is moved to a different place in the directory hierarchy, the name is no longer relevant.

Note that impure names are not inherently bad. They can make lookup of names easier in many cases. The basic problem with impure names is with moving objects around in the namespace; it may lead to their name changing. A problem with pure names is managing the namespace: how do you look up an object efficiently if you have billions of names? How do you know for certain that an object does not exist?

Uniqueness of names

Uniqueness for names can be a challenge, particularly when there is a huge amount of names involved and multiple users (or organziations) that choose them. A way of achieving uniqueness is to use a hierarchy. A hierarchy is a collection of pure names that creates a compound name. This leads to an impure name but each level of the hierarchy can manage the scope of names at that level. For example, the .edu administrators manage the uniqueness of the name rutgers.edu. The rutgers.edu administrators manage the uniqueness of the name cs.rutgers.edu and the CS department at Rutgers can pick whatever names they want as long as they are unique within that domain. Examples of compound names are domain names (www.cs.rutgers.edu), URLs (http://pk.org/417/notes/naming.html), and file pathnames (/usr/share/dict/words).

Naming convention

A naming convention is a set of rules and guidelines that are used to create consistent, meaningful names for files, variables, functions, and other elements in the environment. The purpose of a naming convention is to make it easier to identify, organize, and manage these elements by providing a consistent and predictable way of naming them.

Examples of naming conventions include:

  • Rules for choosing variable names in specific programming languages and organizations. These rules may be based on the syntax of the language (e.g., a name must start with an alphabetic character or an underscore and contain only alphanumeric characters or underscores) or guidelines (e.g., function names should be verbs and variables should be nouns).
  • Linux pathnames – how we can represent a valid filename in a system. For instance, /home/Paul/src/gps/gui.c.
  • The syntax of Internet domain names, such as www.cs.rutgers.edu.
  • The syntax of LDAP (lightweight directory access protocol) names, such as cn=Paul Krzyzanowski, o=Rutgers, c=US (this format is used within digital certificates, among other places).

Context and namespace

A context is a particular set of name → bindings. A name only has meaning relative to some context and should be unique within that context. For example, the file name /etc/postfix/main.cf refers to a specific object on one computer but to something different on another computer.

A namespace is the container for the entire set of names that are present within a context. Namespaces have a scope, which is the region where the name exists and refers to an object. Examples of namespaces include:

  • The names of functions that your program can access (a scope may be the entire function or a specific class).
  • The set of variable names that a function knows about (again, the scope may be global or from the point of view of a specific function or a method within a specific class).
  • The set of files on a specific computer.

A namespace may be a flat set of names or a tree-structured hierarchy. Hierarchical names may present the full hierarchy (these are fully-qualified names) to identify objects outside the local namespace. For instance, the file readme.txt refers to a file by that name in the current directory while /usr/local/src/boing/readme.txt may refer to a different file even though the base name of the files is the same.

Resolution and binding

The term resolution refers to looking up a name to find its underlying representation. The association between the name and the lower-level representation is called binding.

Let us consider the basic mechanism of sending a data packet to a service and the bindings that are involved in doing this.

  1. Find a node on which the required service resides. This requires service name resolution. The binding is service name to node name.

  2. Find a network attachment point to which the node is connected. This requires locating the node name. The binding is node name to attachment point (or address).

  3. Find a path from this location to that point. This is the routing service. The binding is address to route.

As an example of naming and binding, we can consider the machine names on the Internet. A name of cs.rutgers.edu may bind to the IP address of 128.6.4.2. In turn, the IP address 128.6.4.2 may bind to the Ethernet address 08:00:20:1f:13:83. Note that the term address becomes context-dependent. We tend to think of addresses as the “lower-level” representation of a name. In essence, addresses are just names. A human is comfortable dealing with cs.rutgers.edu. An IP driver finds it easier to work with the 32-bit name 128.6.4.2, and an Ethernet driver prefers the 48-bit name 08:00:20:1f:13:83.

If we consider naming and binding in file systems, we have the user-friendly and programmer-friendly textual names that are bound to internal names that are a function of the file system implementation. For example, a pathname of /bin/ls may bind to {device major number=259, minor number=2, inode 393896} on a particular computer (ls.cs.rutgers.edu in this case).

An important issue in binding is when the binding takes place. It’s desirable to delay it for as long as possible to ensure that we have the very latest binding in case names change or resources move but we need to balance that with performance implications. A machine’s address may change while its name remains the same, yet the services on that machine should be accessible. A service may run on a different port, yet a client should be able to locate it. The inode allocated to a file may change, yet that should not cause problems in accessing the file by name.

The least flexible binding is static binding. This is essentially a hard-coded binding. For example, a program may assume that SMTP mail service is always available on port 25 and simply access port 25 instead of attempting to resolve the binding through other means. Fortunately, well-followed conventions will often save that program. A more dangerous example is that of a program attempting to contact a machine by using a hard-coded IP or Ethernet address.

An alternative to static binding is dynamic binding. With dynamic binding, we no longer rely on a hard-coded name address binding but have some mechanism for resolving the name on demand. One form of dynamic binding is early binding. In this case a binding operation is actually performed, but it is performed some time before the binding is needed. For example, if a program needs to contact a server multiple times during a long period of execution, it might perform a name to IP address binding once at the start for efficiency. The danger here is that the server’s address may change during the execution of this program and the program will be unable to contact the server at some point in time (or will contact the wrong server). At times, early binding is a crucial optimization. IP address to Ethernet address binding is an example of a case where it would be prohibitively expensive to resolve an IP address for every packet sent from a machine. It makes sense to maintain a cache (the ARP cache) of previously used bindings. Problems arise if the bindings change. With ethernet addresses, the problem is usually that of not being able to reach the destination machine and the system attempts to perform another binding. Early bindings hurt in a dynamically changing environment.

Most flexible is late binding, a form of dynamic binding where the binding is performed on demand, just before use. Previous bindings are not cached. An example where this is useful is accessing a file system. Consider a user editing a file and frequently saving changes. Assume that the editor is implemented in such a way that it writes into a temporary file during the edit and renames the file to the permanent name upon save. In this case, a different inode is allocated each time the file is saved, yet the name remains the same. The only way of assuring that the correct file is opened (say, by another process) is by performing the exernal to internal name binding at the time of the open. The caveat with dynamic binding is that the name resolution process often takes quite a bit of time. If the same name has to be resolved over and over again, early binding may yield considerable performance gains.

Name service

It’s one thing to say that we have a name for an object and have a name-to-address binding for that object but where does this list of bindings reside? How do we look up names?

A solution, of course, is to provide a naming system for a particular set of names. This system is often made available as a network service: a name service (and the program providing this service is called a name server). that is accessible on some well-known system on a known port.

The domain name server, DNS, which resolves IP domain names to IP addresses, is an example of a name server. On a large scale, a single server can become a bottleneck so it may be implemented as a distributed collection of name servers.

Replication of name server contents is useful for both scalability and fault tolerance. It helps scalability because several servers with replicated data can handle increased client load. It helps with fault tolerance because a client can contact another replicated server if one is not accessible. Caching is also a form of replication since a copy of frequently used name-to-address bindings resides in the cache.

An ARP (address resolution protocol) cache (a cache of recently used IP address to ethernet address bindings) is an example of this. Any form of replication is at risk of consistency problems. If a master copy is modified, replicated copies will have stale bindings. For updates to be consistent, all replicated copies will need to be locked from access during the update (cached copies can be locked or invalidated).

References

  • Shoch, John F., “Inter-Network Naming, Addressing, and Routing,” IEEE Proc. COMPCON Fall 1978, pp. 72–79. Also in Thurber, K. (ed.), Tutorial: Distributed Processor Communication Architecture, IEEE Publ. #EHO 152–9, 1979, pp. 280–287.

  • J. Saltzer, On the Naming and Binding of Network Destinations, RFC 1498, 1993. [A short document discussing issues of naming in communication networks.]

  • John F. Shoch, Inter-Network Naming, Addressing, and Routing, Internet Experiment Note #19, January 1978. [An early, short, and easy-to-read document covering the basic definitions of naming, addressing, binding.]

  • J. H. Saltzer, Naming and Binding of Objects, Lecture Notes in Computer Science, 60, Operating Systems – An Advanced Course, Chapter 3A, © 1978 J. H. Saltzer.

  • Sape Mullender, Ed., Distributed Systems, Chapter 12: Names by Roger M. Needham, © 1993 Addison-Wesley [good coverage, but make sure you read Saltzer’s paper first]

This is a revision of an original version that was written on september 29, 2012.

Last modified March 9, 2023.
recycled pixels