Code Style & Taste
A blog where I share thoughts on code and practices

I Like and Use Global Variables

I know that saying "global variables" will make some of you twitch. If that's you, go to your happy place; otherwise, strap yourself in. Not only will I show you how I like to use global variables, I'll show you in the hostile language C++.

Inside some_func(), what do you consider to be using a global?
#include <cstdio>
extern int moduleLogLevel; //assumed to be defined elsewhere
int logLevel;
static int prv_counter;
int counter() { return ++prv_counter; }
struct MyStruct { int debugID; MyStruct() { static int serial; debugID = ++serial; } };

void some_func() {
	auto originalLevel = logLevel;
	logLevel = moduleLogLevel;
	auto count = counter();
	MyStruct data;
	printf("What have ");
	//code
	printf("I done\n");
	logLevel = originalLevel;
}
Take a moment to think.
Here is what you may want to consider.
  • logLevel doesn't affect the state of the program. It strictly changes the verbosity level.
  • counter() is implemented with a global but some_func isn't directly reading or writing it.
  • The MyStruct constructor uses a static int, which lives outside the stack and retains its value across calls.
  • printf may copy your string into an internal buffer. You can't access that state directly, but several threads can each write partial lines, so their output can interleave.
I consider all of the above to be using a global variable. Not only that, I think they are all good uses.
Here are some bad uses
extern int moduleLogLevel; //assumed to be defined elsewhere
int logLevel;
static int counter;
struct MyStruct { int id; MyStruct() { static int serial; id = ++serial; } };
void clear();
char*print(const char*sz);

void func2(const char*title) {
	clear();
	print("at ");
	auto sz = print(title);
	//use sz
}

void some_func() {
	logLevel = moduleLogLevel;
	auto id = ++counter;
	clear();
	auto tempSz = print("some_func ");
	auto idA = counter;
	auto idB = counter+1;
	counter += 2;
	func2("before some_func code");
	//more code
}

If this code looks familiar, it's no wonder you dislike global variables. You can't tell the good from the bad.

print is a disaster. Unlike printf in the first example, it gives you access to the internal buffer. Do people expect tempSz to be appended to or, as happens here, overwritten after calling a function? Is the intended use to call clear before print (which func2 does), or is that incorrect because the caller already cleared it? Why does print return a non-const pointer? Do some use cases concatenate text outside the print function? Is it mutable so you can shorten the text by writing a null terminator? All these open questions make that print function a disaster.

A Few Rules For Using Globals

  1. It should be hard or impossible to use incorrectly. For example, counter() keeps increments consistent.
  2. If you change observable state, restore it.
  3. Don't return references or pointers to internal state. Just because a function is callable from anywhere doesn't mean its state should be directly modifiable from anywhere. printf doesn't return a pointer, so there's no way for you to hold onto its buffer long enough for the contents to be overwritten.
  4. Don't make it hard to test. You could load a config file, keep a single read-only instance of it, and technically satisfy the rules above, but all you've really created is a set of globally accessible hard-coded values. Not only is that nearly useless, people will write workarounds because they can't change the state. Once dozens of files use their own global variables to tweak behavior, state quickly goes out of sync and bugs pile up. The harder the code is to test, the more likely someone will write a workaround to avoid the limitations.

If you're using threads, global and static variables should be thread local. If they aren't, the discussion becomes one about sharing data across threads and synchronization, which is a different topic.
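To make the thread-local point concrete, here is a minimal sketch rather than code from any real project. The names tl_counter, next_id, and second_id_on_new_thread are hypothetical and chosen only for illustration. It shows that a thread_local counter starts fresh on every thread and that a worker thread cannot disturb the caller's count.

```cpp
#include <cassert>
#include <thread>

// Hypothetical counter in the spirit of counter() above, made thread local:
// every thread gets its own copy, initialized to 0.
static thread_local int tl_counter = 0;

int next_id() { return ++tl_counter; }

// Calls next_id() twice on a fresh thread and returns the second result.
int second_id_on_new_thread() {
	int result = 0;
	std::thread worker([&result] { next_id(); result = next_id(); });
	worker.join(); // join() synchronizes, so reading result afterwards is safe
	return result;
}

// Usage sketch: the worker starts from zero and leaves the caller's count alone.
void usage_example() {
	int before = next_id();
	assert(second_id_on_new_thread() == 2);
	assert(next_id() == before + 1);
}
```

Because nothing is shared between threads here, no locking is needed. The moment you want threads to see the same counter, you are in the synchronization discussion instead.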

Let's take a look at one of my favorite uses
#include <vector>
class INode {
public:
	virtual void process()=0;
	virtual ~INode() {}
};
void append_work(INode*task);

class NodeA : public INode { void process() override { } };
class NodeB : public INode { void process() override { append_work(new NodeA); } };
class NodeC : public INode { void process() override { 
		append_work(new NodeB); append_work(new NodeB);
	}
};

//Assume everything below is private
thread_local std::vector<INode*>workList;

void append_work(INode*task) { workList.push_back(task); }

void process_nodes(INode*startNode) {
	//Save state so this function can be reentrant
	std::vector<INode*>original;
	original.swap(workList);
	std::vector<INode*>current;
	workList.push_back(startNode);
	while(workList.size()) {
		current.clear();
		current.swap(workList);
		for(auto*node : current) {
			node->process();
			delete node; //You can avoid this with arenas
		}
	}
	workList.swap(original);
}

int main(int argc, const char*argv[]) {
	process_nodes(new NodeC);
}
Let's go through the checklist.
  1. ✅ It's hard to use incorrectly. Nodes only have access to append_work. They can't clear the workList or modify nodes on it.
  2. ✅ Original work list is restored.
  3. ✅ workList isn't returned and nothing inside it is returned.
  4. ✅ Not hard to test. You can use multiple threads or reorder tests just fine. The only thing that affects process_nodes/workList is the node passed in.
Let's look at a bad example
#include <string>

//Assuming we're using a language that forces us to use static variables
class Globals {
	static std::string personName, personEmail;
	static int personNumber; //technically wrong, supposedly there are numbers with leading 0's
public:
	static std::string getName()  { return personName; } //a copy is made here
	static std::string getEmail() { return personEmail; }//and here
	static int getNumber() { return personNumber; }

	static void setName(std::string name) { personName = name; }
	static void setEmail(std::string email) { personEmail = email; } 
	static void setNumber(int number) { personNumber = number; }
};

void second_func() {
	auto originalName = Globals::getName();
	auto originalNumber = Globals::getNumber();
	Globals::setName("New test");
	Globals::setNumber(321);
	//do work
	Globals::setName(originalName);
	Globals::setNumber(originalNumber);
}

void some_func() {
	Globals::setName("Test Name");
	Globals::setEmail("Test email");
	Globals::setNumber(12345);
	second_func();
}

int main(int argc, const char*argv[]) {
	some_func();
}
//Ignore these, C++ requires this
std::string Globals::personName, Globals::personEmail;
int Globals::personNumber;
Let's go through the checklist.
  1. ☢ It's extremely easy to use incorrectly. Did you catch the problem?
  2. ✅ Lets you change and restore the state
  3. ✅ Doesn't return internal objects, ⚠️ but this sure does make pointless copies
  4. ✅ Not hard to test, except for how error-prone it is to write the test (first rule).

Is breaking one rule that bad? YES! A person not familiar with the codebase may not notice that, on reaching "do work" in the second function, the email is being used with the wrong name/number. Imagine adding a fourth attribute: are you going to save and restore it correctly everywhere?

The third rule is misunderstood here. The getters/setters suggest that these values can be read and written from anywhere. If that weren't the case, the function/file/module should access the variable directly. Since anyone can set the name/email/number however they like, it's clearly not internal state. The copies make no sense and could get in the way. Here's what I would like to see

#include <string>

struct Person {
    std::string name, email;
    int number;
};
//Assuming we're using a language that forces us to use static variables
class Globals {
public:
    static Person*activePerson;
};

void second_func() {
    //Brace-initialize the aggregate; parentheses only work from C++20 on
    Person secondTest{"New test", "", 321};
    
    auto originalPerson = Globals::activePerson;
    Globals::activePerson = &secondTest;
    //do work
    Globals::activePerson = originalPerson;
}

void some_func() {
    Person test{"Test Name", "Test email", 12345};
    
    auto originalPerson = Globals::activePerson;
    Globals::activePerson = &test;
    second_func();
    Globals::activePerson = originalPerson;
}

int main(int argc, const char*argv[]) {
    some_func();
}
//Ignore this, C++ requires this
Person*Globals::activePerson;

Now it's impossible to have the wrong name/email/number used together, since all fields change when you swap in a single object. You may notice that the active person is always null when you enter some_func here, so isn't it pointless to save and restore it? No, that doesn't matter. If some_func is a public function, it may be called at any time. You don't want code to break because you assumed you can always overwrite a variable.

Many languages have a defer feature that lets you execute code when you leave the block or function. Using it makes restoring variables less error-prone. C++ does not have defer, but you can simulate it with a macro and a destructor. I write PUSH(globalVar, newValue); and the macro+destructor handles the rest.
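The actual PUSH macro isn't shown in this post, so what follows is only a minimal sketch of how a macro and a destructor could combine to do this. ScopedRestore, PUSH_CONCAT, PUSH_CONCAT2, and run_verbose are hypothetical names, and logLevel here is a stand-in for the global from the first example.

```cpp
#include <cassert>
#include <utility>

// Hypothetical guard: remembers a variable's old value, installs a new one,
// and puts the old value back when the guard goes out of scope.
template <typename T>
class ScopedRestore {
	T& target;
	T saved;
public:
	ScopedRestore(T& var, T newValue) : target(var), saved(var) {
		target = std::move(newValue);
	}
	~ScopedRestore() { target = std::move(saved); }
	ScopedRestore(const ScopedRestore&) = delete;
	ScopedRestore& operator=(const ScopedRestore&) = delete;
};

// Two-step concatenation so __LINE__ expands before pasting,
// which gives every guard in a function a distinct name.
#define PUSH_CONCAT2(a, b) a##b
#define PUSH_CONCAT(a, b) PUSH_CONCAT2(a, b)
#define PUSH(var, value) \
	ScopedRestore<decltype(var)> PUSH_CONCAT(pushGuard_, __LINE__)(var, value)

int logLevel = 1; // stand-in for the global from the first example

int run_verbose() {
	PUSH(logLevel, 5);
	return logLevel; // the return value is read while the guard is still alive
} // the guard's destructor restores logLevel here, on every exit path

void usage_example() {
	assert(run_verbose() == 5);
	assert(logLevel == 1); // restored after the call
}
```

The restore happens on early returns and exceptions too, which is exactly where hand-written restore code tends to get forgotten. This sketch assumes var names a plain variable rather than a reference, and that value contains no top-level commas.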

Some libraries have a context variable they want you to put state in. All of the above applies to that too. If you change state, you likely want to restore it. If you want to avoid globals by passing arguments around, you still need to remember to restore what you change, or you could have separate arguments for every set of variables you mutate. If you do the latter and there are a lot of call sites, the code may become hard to change because of how many lines/functions you would need to modify.

Closing Thoughts

Global variables are extremely useful. Used right, they can make your codebase feel less like spaghetti because there are so few places your object can be stored. You may remember reusing variables as a beginner and hitting bugs because a value you needed was overwritten. We figured out that we shouldn't reuse variables; I'm sure we can also understand that you shouldn't overwrite global variables with complete disregard for the values they were holding. If you never use global variables, start on a personal project. I find globals most useful when walking over a tree. The next time you find yourself walking over one, try using them to collect stats or to be where you write your output. Good luck, and remember those 4 rules.
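As one way to picture the tree-walking suggestion, here is a minimal sketch under stated assumptions. TreeNode, WalkStats, activeStats, visit, and collect_stats are all hypothetical names invented for illustration. The walk writes its stats through a file-private, thread-local pointer, saves and restores that pointer, and hands back a copy instead of a pointer to internal state.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical tree type, for illustration only.
struct TreeNode {
	int value = 0;
	std::vector<std::unique_ptr<TreeNode>> children;
};

struct WalkStats { int nodes = 0; int leaves = 0; int maxDepth = 0; };

// Private to this file: where the walk in progress writes its stats.
static thread_local WalkStats* activeStats = nullptr;

static void visit(const TreeNode& node, int depth) {
	assert(activeStats != nullptr); // only valid while collect_stats is running
	++activeStats->nodes;
	if (node.children.empty()) ++activeStats->leaves;
	if (depth > activeStats->maxDepth) activeStats->maxDepth = depth;
	for (const auto& child : node.children) visit(*child, depth + 1);
}

WalkStats collect_stats(const TreeNode& root) {
	WalkStats stats;
	WalkStats* original = activeStats; // save, so a nested walk can't clobber an outer one
	activeStats = &stats;
	visit(root, 1);
	activeStats = original; // restore what we changed
	return stats; // a copy, not a pointer to internal state
}
```

In a walk this small you could pass the stats as a parameter just as easily. The global starts to pay off when visit fans out into many helper functions that would otherwise all need the extra argument.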