Handles and opaque structures

In chapter 11 we are going to encounter the getaddrinfo function, which uses a special technique for one of its parameters. These notes give you the background you need to understand that technique.

Pass by value and pass by reference

C passes every parameter by value: the function receives a copy of the argument. Here is a simple example.

#include <stdio.h>

void f(int x) {
  x = 4;  /* changes only f()'s local copy */
  printf("Inside f() x is %d\n", x);
}

int main() {
  int y = 2;
  f(y);
  printf("Outside f() y is %d\n", y);  /* y is still 2 */
  return 0;
}

When we pass y as a parameter to f(), we pass its value (2) to the parameter x. If f() later changes the value of x, that has no impact on the value of y.

C simulates pass-by-reference with pointers: the pointer itself is still passed by value, but the function can use it to reach the caller's variable. Here is an example:

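#include <stdio.h>

void f(int *x) {
  *x = 4;  /* writes through the pointer into the caller's variable */
  printf("Inside f() *x is %d\n", *x);
}

int main() {
  int y = 2;
  f(&y);
  printf("Outside f() y is %d\n", y);  /* y is now 4 */
  return 0;
}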

Since f() now receives a pointer to the variable y it can use that pointer to assign a new value to y.

Structs and pass by reference

The pass-by-reference strategy is very often used with structs, because structs can be quite large and pass by value makes a copy of the whole structure on every call. Passing a pointer instead avoids that unnecessary copying.

struct big_struct {
  int a;
  double b;
  char c[64];
};

void init_big_struct(struct big_struct *A) {
  A->a = 0;
  A->b = 0.0;
  A->c[0] = '\0';
}

int main() {
  struct big_struct B;
  init_big_struct(&B);
  return 0;
}

Designing all functions that work with structs to use pointers to the structs also has the added bonus of allowing us to be agnostic about how best to create the struct in the first place. Both of the following alternatives are legitimate:

int main() {
  struct big_struct B;
  init_big_struct(&B);
  return 0;
}

int main() {
  /* malloc() and free() are declared in <stdlib.h> */
  struct big_struct *B = malloc(sizeof(struct big_struct));
  if (B == NULL)
    return 1;
  init_big_struct(B);
  free(B);
  return 0;
}

The second alternative is even preferable when the struct has to outlive the function that creates it, that is, when creation and destruction happen in widely separated parts of the program.

Handles

A variant on the strategy that uses malloc to allocate a struct is a strategy that relies on a utility function to do the memory allocation for you as part of the initialization process. Here is a variant on the last example that uses this approach:

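#include <stdlib.h>

struct big_struct {
  int a;
  double b;
  char c[64];
};

void init_big_struct(struct big_struct **A) {
  *A = malloc(sizeof(struct big_struct));
  if (*A == NULL)
    return;  /* allocation failed; the caller's pointer is now NULL */
  (*A)->a = 0;
  (*A)->b = 0.0;
  (*A)->c[0] = '\0';
}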

void free_big_struct(struct big_struct *A) {
  free(A);
}

int main() {
  struct big_struct *B;
  init_big_struct(&B);
  free_big_struct(B);
  return 0;
}

In this example init_big_struct works with a pointer to a pointer variable. Since we have a pointer to the pointer variable, we can modify the pointer variable itself by assigning to the dereferenced parameter, *A. init_big_struct makes use of this capability to allocate the structure with malloc and assign the address that malloc returns to the original pointer variable.

This technique of passing a pointer to a pointer is widely used. A pointer to a pointer variable is commonly known as a handle.

Opaque structs

The handle technique makes possible another common programming trick, the opaque structure. In this technique a set of functions in a package all work with a common structure. An added twist is that the exact nature of the structure is hidden from the outside world.

Here is some code to show how this works. Suppose we want to make a package that sets up a particular structure and then offers a set of functions designed to work with this structure. To do this we would first set up a header file, my_struct.h:

struct my_struct {
   int n;
   char mystery[16];
};

void init_my_struct(struct my_struct **A);
void do_something(struct my_struct *A,int x);
void free_my_struct(struct my_struct *A);

A client program would then look something like this:

#include "my_struct.h"

int main() {
  struct my_struct *B;
  init_my_struct(&B);
  do_something(B, 7);
  free_my_struct(B);
  return 0;
}

Here now is the implementation file for this package, my_struct.c:

#include <stdlib.h>
#include "my_struct.h"

struct real_struct {
  int n;
  double y;
  int m;
  char padding[4];
};

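void init_my_struct(struct my_struct **A) {
  /* Allocate room for the implementation type, which is what the
     package really stores behind the my_struct pointer. */
  struct real_struct *B = malloc(sizeof(struct real_struct));
  *A = (struct my_struct*) B;
  if (B == NULL)
    return;
  B->n = 0;
  B->y = 0.0;
  B->m = 42;
}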

void do_something(struct my_struct *A,int x) {
  struct real_struct *B = (struct real_struct*) A;
  B->m = x;
}

void free_my_struct(struct my_struct *A) {
  free(A);
}

What is going on here is that the my_struct package advertises a structure definition to the world and then claims to be able to do things with that structure. Behind the scenes, though, the package is really working with a real_struct structure and not a my_struct. It can carry off this bit of trickery because real_struct is designed to occupy the same amount of memory as a my_struct. That match depends on the platform's alignment rules: where double is aligned to 8 bytes, as on typical 64-bit systems, the real_struct shown here takes 24 bytes against 20 for my_struct, so the sizes should be checked rather than assumed. The package takes any pointer to a my_struct that it receives and immediately type casts the pointer to the actual structure type the package uses. This is essential, because the package is going to make use of some fields in the structure that do not appear in the version presented to the outside world.

In any case where the structure definition that a package presents to the outside world does not give complete information about what is really going on behind the scenes, we say that the public structure, my_struct, is an example of an opaque structure. Behind the scenes, all pointers to my_structs are really pointers to real_structs, and we say that real_struct is the implementation type.

Why carry out this trickery? The main motivation is change management. Suppose the designers of the package suspect that it will have to evolve over time and may eventually need new capabilities. As those capabilities come on line, they may force the designers to change the real_struct data type. If the designers had put the actual definition of real_struct in the header file, they would have opened themselves up to a problem: once clients can see the actual structure of a real_struct, they can write code that relies on that information. If the structure of real_struct changed in a later version of the package, that client code would suddenly break. By hiding the actual structure in the package and advertising only an opaque version of the struct to the outside world, the package designers make it easier to change the real_struct definition in a later version without breaking client code.