In this article I'll show an example where avoiding a global variable led to a bug, define what global variables are, explain the real problem, and then give examples where I have used globals successfully.
Global Variables Are Not the Problem
We're all taught that global variables are bad. They can be modified from anywhere, sometimes force functions to be called in a specific order, and can be nearly impossible to debug if the program is large enough or the state is random enough. We're usually taught not to use them within our first year of programming, but many of us never figure out when we should use them.
First, we'll look at code without globals. Here we want to count how many times the simple function is entered before it throws an exception (not shown), so we can set a breakpoint at the top of the function on the right call. There's a bug in our counter logic. If the problem isn't obvious, you may find this frustrating.
let counter = { count: 0 };
let obj = { counter: counter };
function simple(obj) {
  console.log(++obj.counter.count);
  if (obj.counter.count === 123) {
    // let's set a breakpoint here, before the exception
  }
  /* rest of func with buggy logic */
}
function complex(obj) {
  let temp = structuredClone(obj);
  simple(temp);
  simple(temp);
}
simple(obj);
simple(obj);
complex(obj);
simple(obj);
If you run the code, you'll see 1 2 3 4 3 printed instead of 1 2 3 4 5. Before I explain the problem, let's look at the version that uses a global variable, which runs correctly.
let count = 0; // global
let obj = {};
function simple(obj) {
  console.log(++count);
  if (count === 123) {
    // let's set a breakpoint here, before the exception
  }
  /* rest of func with buggy logic */
}
function complex(obj) {
  let temp = structuredClone(obj);
  simple(temp);
  simple(temp);
}
simple(obj);
simple(obj);
complex(obj);
simple(obj);
The problem is that structuredClone made a deep copy of our object, including the nested counter. When complex called simple, it incremented the clone's counter instead of the original one, so we saw repeated numbers and couldn't be sure when to break.
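To isolate the difference, here is a small sketch (separate from the example above) comparing structuredClone with a shallow copy made by object spread. It assumes an environment where structuredClone is available as a global, such as modern browsers or Node.js 17 and later.

```javascript
const original = { counter: { count: 0 } };

// Deep copy: structuredClone duplicates nested objects, so the clone has its own counter.
const deep = structuredClone(original);
deep.counter.count++;
console.log(original.counter.count); // 0: the original is untouched

// Shallow copy: spread copies only the top level, so the nested counter object is shared.
const shallow = { ...original };
shallow.counter.count++;
console.log(original.counter.count); // 1: the original sees the increment
```

Had complex used a shallow copy instead, the first example would print 1 2 3 4 5, but only because the copy shares the very state a clone is usually meant to isolate.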
Now that we've seen that avoiding globals can cause problems too, let's define what a global variable is and what counts as using one.
Defining Global Variables
My definition of a global variable is any variable that isn't passed in as an argument or defined within the function. Here are some kinds of such variables across different languages:
- Global: These are defined outside of any class or function and are visible to other files. I rarely use these.
- Private/Static: These are global variables that are not visible outside of their file. These are the ones I use most.
- Thread Local: These are global variables (possibly static) that have a separate instance per thread. When I work with threads, I avoid all other kinds of globals and use thread-local ones instead.
- Static Member: A class-level variable that exists exactly once, no matter how many instances of the class there are. In some languages, you may use this for read-only values such as 'empty', 'min', and 'max'. I generally avoid these when possible.
- Static Function Variable: In C, you can declare a static variable inside a function. You could consider it a local variable, but I don't, since it has static storage duration (it lives for the entire program instead of on the stack), and a pointer to it can be returned and used to modify it outside the function. I absolutely avoid this except as a counter whose address I never return.
Would you consider any of the following to be using a global variable?
- Calling a function that uses a global internally. For example, instead of ++count in our code above, we could call inc(), which increments a counter and returns the new value.
- Calling a function that has side effects we can't easily see, such as printing, writing to a file, or writing to an audio device.
- Mutating variables that don't change the state of our program, such as a logger's verbosity level, or a flag that enables recalculating some data so we can detect whether the logic is wrong.
The Problem?
The problem is data access. Nothing more, nothing less. There's a term for this that has nothing to do with global variables: "action at a distance." If a program keeps copies of a pointer you passed in, you may have objects that affect one another when you had no idea they were related. One reason people like to clone objects is to avoid unexpected mutations. However, in the example code above, the clone is what causes the problem.
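Here is a minimal illustration of action at a distance with no global variable involved. The Registry class is hypothetical; it simply stores the reference it is given instead of a copy.

```javascript
// Hypothetical Registry: it keeps whatever reference it is handed.
class Registry {
  constructor() {
    this.items = [];
  }
  register(config) {
    this.items.push(config); // stores the caller's object itself, not a copy
  }
}

function demo() {
  const registry = new Registry(); // a local, passed around explicitly: not a global
  const settings = { verbose: false };
  registry.register(settings);
  // Later, code that only knows about `settings` changes it...
  settings.verbose = true;
  // ...and the registry sees the change.
  return registry.items[0].verbose;
}

console.log(demo()); // true
```

Copying inside register (for example, with structuredClone) would stop this particular leak, which is the same trade-off the counter example shows going the other way.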
When you make a mistake with global variables, it's easy to blame the fact that they're global. People usually don't fix such mistakes by cloning the global and restoring its value afterward. It's also easy to form bad habits with them. A beginner might lazily use a global variable instead of changing the signatures of a dozen functions, which blows up in their face when they overwrite a value they still need.
Global Variable Use Cases
I try not to write long posts, so if you'd like code samples, let me know and I may write a follow-up article. Here are some cases where I like to use globals:
- A counter to print how many times I entered a specific function, which I may use to set a breakpoint.
- A serial ID to easily tell apart two objects that have the same contents (pointer addresses aren't reliable).
- Logging, custom allocators, and a thread-safe database connection (or memory) pool.
- A message queue or append-only worklist. For example, say I have a function that processes an event, and handling that event may produce many more events that also need to be processed. The entry function allocates (or clears) both the current worklist and the global worklist, adds the first event to the current worklist, and then processes it in a loop. Events are handled through virtual functions, which append any new work to the global worklist. Once the current worklist has no more jobs, it is cleared and swapped with the global worklist, and the loop repeats until all events have been processed. It's hard to get this wrong if the only operation available to handlers is append and only the entry function swaps (and clears) the lists.
- An active file/buffer/node that is set in an entry function and used while walking a tree. If the function is recursive, you'll need to restore the original active node before returning. Instead of an active global, you could pass a context object to each function. I once had over a hundred types (I'm being literal; each overrode at least two virtual functions), and it felt extremely verbose to pass a context argument at nearly a thousand call sites. Most of the tree only passed the context along to a leaf node, which made it feel even more unnecessary.
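The worklist pattern above can be sketched in JavaScript as follows. This is one possible shape, not the author's code: events are plain objects whose handle method stands in for the virtual functions, post is a hypothetical helper that is the only way to queue work, and processAll is the entry function and the only place the lists are swapped or cleared.

```javascript
// Module-private "global" worklist; handlers can only append to it via post().
let pending = [];

// The only way handlers can add work: append-only.
function post(event) {
  pending.push(event);
}

// Entry function: the only place that clears and swaps the lists.
function processAll(firstEvent) {
  pending = [];
  let current = [firstEvent];
  while (current.length > 0) {
    for (const event of current) {
      event.handle(); // handlers may call post() to queue follow-up events
    }
    current.length = 0;                      // clear the finished batch
    [current, pending] = [pending, current]; // swap: queued work becomes the next batch
  }
}

// Hypothetical event: each one logs its name and spawns two children until depth reaches 0.
const log = [];
function makeEvent(name, depth) {
  return {
    handle() {
      log.push(name);
      if (depth > 0) {
        post(makeEvent(name + "a", depth - 1));
        post(makeEvent(name + "b", depth - 1));
      }
    },
  };
}

processAll(makeEvent("root", 2));
console.log(log.join(" ")); // root roota rootb rootaa rootab rootba rootbb
```

Because handlers can only append, one handler can't drop or reorder work that another handler queued. Note that this sketch is not reentrant: calling processAll from inside a handler would reset the pending list.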
With a little encapsulation, you can make globals nearly error-proof. After all, no one complains about print or memory allocation, except when there's too much of it. As an example of encapsulation, the global counter in our original example could be hidden behind an inc() function, and the append-only worklist could be accessed through a function or a type that only allows append operations.
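Applied to the opening example, the encapsulated counter might look like the sketch below. The name inc comes from the text; wrapping the counter in a closure is an assumption made so the snippet runs as a single script. Like the original example, it assumes structuredClone is available.

```javascript
// The counter lives inside a closure: nothing outside can read or reset it directly.
const inc = (() => {
  let count = 0;
  return () => ++count; // the only operation: increment and return the new value
})();

function simple(obj) {
  const n = inc();
  console.log(n);
  if (n === 123) {
    // set a breakpoint here, before the exception
  }
  /* rest of func with buggy logic */
}

function complex(obj) {
  const temp = structuredClone(obj); // cloning no longer matters: the count isn't in obj
  simple(temp);
  simple(temp);
}

const obj = {};
simple(obj);
simple(obj);
complex(obj);
simple(obj); // across all five calls, prints 1 2 3 4 5 (one per line)
```

In a real project, a module-level variable that simply isn't exported gives the same protection, which matches the private/static kind of global described earlier. A serial ID generator for the use case listed above could take the same shape.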