Tag vs. Type Names
C treats tags as second-class types. C++ isn't much kinder. Here's how to give them first-class treatment in both languages.
Identifiers are among the most basic elements of programming languages. Languages use them to name entities such as functions, objects, constants, and types.
In C and C++, an identifier is a contiguous sequence of characters starting with a letter or underscore and followed by zero or more letters, underscores, or digits. Identifiers are case sensitive, so that DAN, Dan, and dan are distinct identifiers.
For as long as I've known C, I've been struck by the curious way that C treats identifiers for structures, unions, and enumerations as second-class citizens. An identifier that names a structure, union, or enumeration names a type, yet you cannot use the identifier to refer to that type in the same way that you use a typedef name.
This month, I'll look at this quirky behavior and how it carries over into C++ as behavior that's cleaner on the surface but messier underneath. I'll also suggest a simple programming style that tidies up the mess in both languages.
Tag names in C
In C, the name s appearing in:
struct s { ... };
is a tag. A tag by itself is not a type name. If it were, then C compilers would accept declarations such as:
s x;    // error in C
s *p;   // ditto
But they don't. You must write the declarations as:
struct s x;    // OK
struct s *p;   // OK
The combination of struct and s, in that order, is called an elaborated type specifier.
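Here is a small sketch of what that means in practice. The struct point and the function point_sum are hypothetical names chosen for this example; every mention of the type spells out the elaborated type specifier struct point.

```c
/* Hypothetical struct: the tag point and its members are
   illustrative only. */
struct point {
    int x;
    int y;
};

/* Every use of the type is written "struct point";
   in C, point by itself is only a tag, not a type name. */
int point_sum(const struct point *p)
{
    return p->x + p->y;
}

/* point *q;   <- would not compile as C: point is only a tag */
```

A caller writes struct point pt = { 3, 4 }; and then point_sum(&pt), which yields 7.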
The names of unions and enumerations are also tags rather than types. For example:
enum day { Sunday, Monday, ... };
...
day today;           // error
enum day tomorrow;   // OK
Here, enum day is the elaborated type specifier.
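The same rule applies to parameters and return types. In this sketch the full list of enumerators and the helper next_day are assumptions added for illustration; note that every mention of the type needs the keyword enum.

```c
/* The enumerator list is spelled out here for illustration only. */
enum day { Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday };

/* Both the parameter and the return type must be written
   "enum day", because day by itself is only a tag. */
enum day next_day(enum day d)
{
    /* Saturday (6) wraps around to Sunday (0). */
    return (enum day)((d + 1) % 7);
}
```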
For the most part, C does not permit different entities to have the same name in the same scope. For example, when these two declarations appear in the same scope:
int status();   // function
int status;     // object
the compiler will flag the second one as an error. But C treats tags differently than it does other identifiers. C compilers hold tags in a symbol table that's conceptually, if not physically, separate from the table that holds all other identifiers. Thus, it's possible for a C program to have both a tag and another identifier with the same spelling in the same scope.
For example, C compilers will accept both:
int status();           // function
enum status { ... };    // enumeration
in the same scope. They will even accept:
struct s s;
which declares object s of type struct s. Such declarations may not be good practice, but they are C.
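The following sketch puts both kinds of collision in one translation unit. The enumerators ok and failed, the member value, and the body of status are assumptions for illustration; the point is only that each pair of identically spelled names coexists at file scope.

```c
/* Tags and ordinary identifiers live in separate name spaces,
   so each pair below shares one spelling at file scope. */
int status(void);               /* function */
enum status { ok, failed };     /* enumeration tag */

struct s { int value; };        /* structure tag s */
struct s s = { 42 };            /* object s of type struct s */

/* "status" alone names the function; "enum status" names the type.
   Likewise, "s" alone names the object; "struct s" names the type. */
int status(void)
{
    enum status result = (s.value == 42) ? ok : failed;
    return (int)result;
}
```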
Tags and typedefs
Many programmers (including yours truly) prefer to think of struct tags as type names, so they define an alias for the tag using a typedef. For example, defining:
struct s { ... };
typedef struct s T;
lets you use T in place of struct s, as in:
T x;    // OK
T *p;   // OK
A program cannot use T as the name of both a type and an object (or a function or enumeration constant), as in:
T T; // error
This is good.
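A sketch of the typedef in use, assuming two int members a and b and a hypothetical helper sum. Because T and struct s name the same type, the two spellings can be mixed without casts.

```c
/* The tag s and the typedef name T follow the article;
   the members a and b are assumed for this sketch. */
struct s { int a; int b; };
typedef struct s T;

/* T and struct s are the same type, so a const T * converts
   to a const struct s * with no cast. */
int sum(const T *p)
{
    const struct s *q = p;
    return q->a + q->b;
}

/* T T;   <- rejected: T is already a type name in this scope */
```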
The tag name in a struct, union, or enum definition is optional. Many programmers fold the struct definition into the typedef and dispense with the tag altogether, as in:
typedef struct { ... } T;
This works well, except in self-referential structures containing pointers to structures of the same type. For example:
struct list_node {
    ...
    struct list_node *next;
};
defines a struct called list_node, which contains, among other things, a pointer to another list_node. If you wrap the struct definition in a typedef and omit the tag, as in:
typedef struct {
    ...
    list_node *next;    // error
} list_node;
the compiler will complain because the declaration for member next refers to list_node before list_node is declared.
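Putting the tag back into the typedef fixes the problem. In this sketch the int member value stands in for the elided members and list_length is a hypothetical helper; the tag list_node is in scope as soon as it appears, so member next can use it, while the typedef name list_node is usable only after the closing brace.

```c
#include <stddef.h>

/* The tag list_node is visible from the point where it appears,
   so "struct list_node *" is valid for member next. The typedef
   name list_node comes into scope only after the closing brace. */
typedef struct list_node {
    int value;                  /* assumed payload */
    struct list_node *next;
} list_node;

/* Return the number of nodes reachable from head. */
size_t list_length(const list_node *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->next)
        ++n;
    return n;
}
```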
With a self-referential struct, you have no choice but to declare a tag for the struct. If you prefer to use a typedef name thereafter, you must declare both the tag and the typedef. Many programmers follow naming conventions suggested by Kernighan and Ritchie.[1] In the first edition of their book, they use a short, somewhat cryptic identifier for the tag and a longer, uppercase identifier for the typedef, as in:
typedef struct tnode {
    ...
    struct tnode *left;
    struct tnode *right;
} TREENODE;
For the second edition, they changed TREENODE to Treenode.[2]
I've never understood why they use different names for the tag and the typedef when one name will do just fine:
typedef struct tree_node tree_node;
What's more, you can write this definition before, rather than after, the struct definition, as in:
typedef struct tree_node tree_node;
struct tree_node {
    ...
    tree_node *left;
    tree_node *right;
};
You need not use the keyword struct in declaring members, such as left and right, that refer to other tree_nodes.
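Here is a minimal, compilable version of that layout. The int member key and the helper tree_size are assumptions for illustration; the typedef comes first, so both the members and the function can use the plain name tree_node.

```c
#include <stddef.h>

/* Typedef first, struct definition second. */
typedef struct tree_node tree_node;

struct tree_node {
    int key;                /* assumed payload */
    tree_node *left;        /* no "struct" keyword needed */
    tree_node *right;
};

/* Return the number of nodes in the tree rooted at t. */
size_t tree_size(const tree_node *t)
{
    if (t == NULL)
        return 0;
    return 1 + tree_size(t->left) + tree_size(t->right);
}
```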
Tags in C++
The syntax for classes in C++ is an extension of the syntax for structs and unions. In fact, C++ considers structs and unions to be special cases of classes, so I will, too.
Although the C++ standard doesn't call them tags, class names act very much like tags. For example, you can declare an object of class string with a declaration such as:
class string s;
Of course, few, if any, C++ programmers actually do this.
C++ was designed so that user-defined types can look very much like built-in types. Declarations using built-in types, such as:
int n;
don‘t use the keyword class, so using the keyword class in the declaration for s just above only serves to remind readers that string is not a built-in type. Therefore, C++ lets you omit the keyword class and use class names as if they were type names, as in:
string s;
Again, the C++ standard never utters the word tag. In C++, class and enumeration names are just type names. However, there are several rules that single out these type names for special treatment. I find it easier to continue to refer to class and enumeration names as tags.
If you want, you can imagine that for each tag, C++ automatically generates a typedef with the same spelling. For example, when the compiler encounters a class definition such as:
class gadget { ... };
it automatically generates a typedef, as if the program defined:
typedef class gadget gadget;
Unfortunately, this is not entirely accurate. C++ can't generate such typedefs for structs, unions, or enums without introducing incompatibilities with C.
For example, suppose a C program declares both a function and a struct named query:
int query();
struct query;
Again, this may be bad practice, but it might happen if the function declaration and struct declaration reside in separate headers by different authors. In any event, given these declarations, query by itself refers to the function and struct query refers to the type.
If C++ automatically generated typedefs for tags, then when you compile this program as C++, the compiler would generate:
typedef struct query query;
Unfortunately, this type name would conflict with the function name, and the program would not compile.
In C++, tags act just like typedef names, except that a program can declare an object, function, or enumerator with the same name in the same scope as a tag. In that case, the object, function, or enumerator name hides the tag name, and the program can refer to the tag only by using the keyword class, struct, union, or enum (as appropriate) in front of the tag name. Thus, a C program that contains both:
int query();
struct query;
behaves the same when compiled as C++. Again, the name query alone refers to the function. The elaborated type specifier struct query refers to the type.
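The sketch below exercises that rule. The member id and both function bodies are assumptions for illustration. As C it relies on the separate tag name space; as C++ it relies on the function hiding the class name, so the code is intended to mean the same thing in both languages.

```c
/* A function and a struct tag with the same spelling. */
int query(void);

struct query {
    int id;                 /* assumed member */
};

/* Plain "query" names the function... */
int query(void)
{
    return 7;
}

/* ...so the type must be written with the elaborated
   type specifier "struct query". */
int query_id(const struct query *q)
{
    return q->id;
}
```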
The way to avoid accidentally hiding a tag with a non-type name is to intentionally hide the tag name with a typedef name. That is, each time you declare a tag, you should also define a typedef with the same spelling as an alias for the tag, as in:
typedef class gadget gadget;
class gadget { ... };
Then, if there's a function, object, or enumeration constant with the same name in the same scope, you'll get an overt error message from the compiler, rather than a potentially subtle bug you'll have to track down at run time.
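Here is the same guideline written with struct instead of class, so the identical text compiles as C and as C++. The member parts and the helper gadget_parts are assumptions for this sketch. Because the typedef claims the ordinary name gadget, a function declared with that name in the same scope is rejected rather than silently hiding the tag.

```c
/* Typedef with the same spelling as the tag, placed first. */
typedef struct gadget gadget;

struct gadget {
    int parts;              /* assumed member */
};

/* int gadget(void);   <- uncommenting this draws a compile-time
   error: gadget is already declared as a typedef name here. */

int gadget_parts(const gadget *g)
{
    return g->parts;
}
```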
Recommended guidelines
What I suggest is that you adopt a uniform style for turning tags into typedefs.
For each tag, define a typedef name with the same spelling in the same scope as an alias for the tag.
This style works equally well in both C and C++.
For each class, you can place the typedef definition either before or after the class definition. (Again, classes in C++ include structs and unions.) Placing the typedef before the class definition lets you use the typedef even inside the class, so that's what I recommend.
For each enumeration, you must place the typedef after the enumeration definition, as in:
enum day { Sunday, Monday, ... };
typedef enum day day;    // OK
Placing the typedef before the enumeration definition provokes a compile-time error:
typedef enum day day;    // error
enum day { Sunday, Monday, ... };
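In standard C, an enum type specifier without an enumerator list may appear only after the enumeration is complete, which is why the order matters. This sketch follows the correct order; the full enumerator list and the helper is_weekend are assumptions added for illustration.

```c
/* Definition first, typedef second. */
enum day { Sunday, Monday, Tuesday, Wednesday, Thursday, Friday, Saturday };
typedef enum day day;

/* From here on, day and enum day are interchangeable. */
int is_weekend(day d)
{
    return d == Sunday || d == Saturday;
}
```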
Admittedly, the incidence of errors arising from tag name hiding appears pretty small. You may never run afoul of these problems. But if an error in your software might cause bodily injury or death, then you should write the typedefs no matter how unlikely the error.
I can't imagine why anyone would ever want to hide a class name with a function or object name in the same scope as the class. The hiding rules in C were a mistake, and they shouldn't have been extended to classes in C++. You can compensate for the mistake, but it requires more programming effort than should have been necessary.
Dan Saks is the president of Saks & Associates, a C/C++ training and consulting company. He served for many years as secretary of the C++ standards committee. With Thomas Plum, he wrote C++ Programming Guidelines.