How I Use OOP
I watched Casey Muratori's talk, The Big OOPs: Anatomy of a Thirty-five-year Mistake. I liked it, and I saw someone leave a comment about how people are either for or against OOP, and not many use it in the right place. I thought it'd be fun to write about how I use objects.
The kind of software I write
I write complex software. The most complex has been my language+compiler. It's a non-trivial compiler: it has automatic memory management without RC or a GC, features not seen in other languages (error blocks and on statements are two), and static analysis, so I can have compile-time bounds checking on arrays and slices. Here is an example of how that works.
At the moment I'm working on an IDE called Bold. Bold is completely multi-threaded, has to deal with third-party code (which has been a pain), and has a ton of custom data structures so that opening a 1 GB file doesn't take up 10 GB of memory (especially after highlighting). The kind of code I write may be very different from yours.
The Most Important Idea
I look at data and state as separate things. What's the data and what's the state in this example?
template <typename T>
class Stack {
    T* ptr;
    int size, capacity;
public:
    void push(T val);
    void pop();
};
void exampleUsage(Stack<Point>& stack) {
    stack.push({x, y});
    stack.push({100, 200});
}
All the members here are state to me. My data is what I put into the object (the two points I pushed). I never want to think about the capacity or pointer. I might want to know if the stack is empty or not, but I never care about the value of 'size', so it's just state that I might query once in a while.
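Here is a minimal C++ sketch of that split, not the real container: it assumes a std::vector as the hidden storage in place of the pointer, size, and capacity, and the Point type, top(), empty(), and sumOfX are hypothetical names added for illustration.

```cpp
#include <utility>
#include <vector>

struct Point { int x, y; };

// Hypothetical, reduced version of the Stack above.
// Everything private is state; the caller never reads it directly.
template <typename T>
class Stack {
    std::vector<T> items;  // stands in for ptr, size, and capacity
public:
    void push(T val) { items.push_back(std::move(val)); }
    void pop() { items.pop_back(); }               // precondition: !empty()
    const T& top() const { return items.back(); }  // precondition: !empty()
    bool empty() const { return items.empty(); }   // the one query the caller needs
};

// The caller only handles its own data (the points it pushed).
inline int sumOfX(Stack<Point>& stack) {
    int total = 0;
    while (!stack.empty()) {
        total += stack.top().x;
        stack.pop();
    }
    return total;
}
```

The calling code asks "is it empty?" and otherwise only touches the points, so the storage could change without the caller noticing.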
Generic Functions
Other objects I use are file and memory streams.
interface IStreamWriter {
    void print(Text text);
    void write(Bytes data);
    void unicode(int glyph);
}
Clearly I put data in. I don't care about the state/internals of the writer I call.
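As a rough C++ sketch of what a generic function over such a writer could look like: the interface is cut down to a single print method taking std::string, and MemoryWriter, NullWriter, and writeGreeting are hypothetical names, not part of any real library.

```cpp
#include <cstddef>
#include <string>

// Hypothetical C++ rendering of the interface above, reduced to one method.
struct IStreamWriter {
    virtual ~IStreamWriter() = default;
    virtual void print(const std::string& text) = 0;
};

// Keeps everything in memory.
struct MemoryWriter : IStreamWriter {
    std::string buffer;
    void print(const std::string& text) override { buffer += text; }
};

// Throws the text away and only counts it, like writing to null.
struct NullWriter : IStreamWriter {
    std::size_t discarded = 0;
    void print(const std::string& text) override { discarded += text.size(); }
};

// Generic function: it puts data in and never looks at the writer's state.
inline void writeGreeting(IStreamWriter& out, const std::string& name) {
    out.print("Hello, ");
    out.print(name);
    out.print("!\n");
}
```

writeGreeting works unchanged with either writer, which is the whole reason for taking the interface.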
What about this?
struct MenuItem {
    string title;
    int actionID;
    Array<MenuItem>* subMenu;
};
struct Project {
    int projectID;
    string path;
    Setting* setting;
    map<string, Array<MenuItem>> menus;
};
This is data. I absolutely do not want to couple data to functions. One piece of code may be responsible for adding/removing a project, but it doesn't have any idea about menus. The code that creates the menu may want to know what the project path is or what the settings are in order to apply customizations, but it doesn't care about the rest of the project. The rendering code will use the menu data to display a project-specific UI, and it may also display the project path, but the rendering code should never ever be part of the project code. These two structs are data and shouldn't have functions.
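One way this separation could look in C++, as a sketch under assumptions: Array is replaced by std::vector, the Setting pointer is left out, and buildFileMenu and renderMenu are hypothetical free functions standing in for the menu and rendering code.

```cpp
#include <map>
#include <string>
#include <vector>

// Plain data, mirroring the structs above. No member functions.
struct MenuItem {
    std::string title;
    int actionID;
    std::vector<MenuItem> subMenu;
};

struct Project {
    int projectID;
    std::string path;
    std::map<std::string, std::vector<MenuItem>> menus;
};

// Menu-building code: reads the path, knows nothing about rendering.
inline void buildFileMenu(Project& project) {
    project.menus["File"] = {
        MenuItem{"Open " + project.path, 1, {}},
        MenuItem{"Close", 2, {}},
    };
}

// Rendering code: reads menu data, knows nothing about how projects are managed.
inline std::string renderMenu(const Project& project, const std::string& name) {
    std::string out;
    auto it = project.menus.find(name);
    if (it == project.menus.end()) return out;  // unknown menu renders as nothing
    for (const MenuItem& item : it->second) {
        out += item.title;
        out += '\n';
    }
    return out;
}
```

Each function touches only the fields it needs, and neither one has to live inside Project or MenuItem.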
Trees & Virtual Functions
Sometimes I parse documents (or source code) and build trees.
interface IStatement { void buildAST(); LineCol lineInfo(); }
struct IfStatement : IStatement { public: RValue cond; IStatement trueBlock, falseBlock; }
struct DeclareStatement : IStatement { public: StringNode type; Array<StringNode> vars; }
struct ContinueStatement : IStatement { public: StringNode keyword; /* used for lineInfo */ }
This is different: there is no state here and there are no private members, only data and virtual functions. Some reasons I may want virtual functions are:
- I have many objects (let's say a large source file) and would rather have a virtual function than a switch statement that mispredicts multiple times per input.
- I look at a node more than once and want the functions near each other.
- Debugging. Sometimes a tree is complex enough that I want to print a graph or diagnostics, which I call in a log statement and in a debugger. If I were using a tree without virtual functions, I would have a very bad time in languages that don't support function overloading, and if the debugger can't use overloaded functions I wouldn't be able to look at it there.
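The debugging case above can be sketched in C++ roughly like this. It is an illustration only: dump and indent are hypothetical names, the condition is a plain string instead of an RValue, and child nodes are held through std::unique_ptr, which is an assumption about ownership.

```cpp
#include <cstddef>
#include <memory>
#include <string>

struct LineCol { int line, col; };

// Hypothetical node interface: public data plus virtual functions, no hidden state.
struct IStatement {
    LineCol pos{0, 0};
    virtual ~IStatement() = default;
    // Returns an indented text form of the subtree, callable from a log or a debugger.
    virtual std::string dump(int depth) const = 0;
};

inline std::string indent(int depth) {
    return std::string(static_cast<std::size_t>(depth) * 2, ' ');
}

struct ContinueStatement : IStatement {
    std::string dump(int depth) const override {
        return indent(depth) + "continue\n";
    }
};

struct IfStatement : IStatement {
    std::string cond;
    std::unique_ptr<IStatement> trueBlock, falseBlock;  // falseBlock may be null
    std::string dump(int depth) const override {
        std::string out = indent(depth) + "if " + cond + "\n";
        if (trueBlock) out += trueBlock->dump(depth + 1);
        if (falseBlock) out += indent(depth) + "else\n" + falseBlock->dump(depth + 1);
        return out;
    }
};
```

Calling dump(0) on any node prints its whole subtree without the caller needing to know which node type it is holding.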
Reviewing The Above
- The first example is a container. It holds my data and I don't want to think about its state. I usually care about the implementation since an array, double-ended queue, and set have different characteristics.
- The second is an object that consumes my data. It's generic and uses virtual functions. If a function uses an interface, it doesn't care about the details, which is common when doing IO. If I'm writing text I may not care if the stream is compressing, writing to a file, writing to a socket, writing to null, or encoding the data so it can be written elsewhere (ex: bytes to a base64 encoder to write into a JSON file).
- The third is a tree with no state, just virtual functions and the tree node type. The start of a line in a source file might be an if statement or a return statement, which have nothing to do with each other except that both are valid at the start of a line.
Just because I don't want to think about state doesn't mean data should be hidden from me. Data and state are different things. Data is mine; state belongs to the object I'm using. If I have to think about state, it's a bad object, or it should be data that I control outside of the object.
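To illustrate the "encodes it so it can be written elsewhere" case, here is a small C++ sketch of one writer wrapping another. Hex encoding stands in for base64 to keep it short, and IStreamWriter, MemoryWriter, HexEncoder, and saveHeader are all hypothetical names.

```cpp
#include <string>

struct IStreamWriter {
    virtual ~IStreamWriter() = default;
    virtual void write(const std::string& bytes) = 0;
};

// Final destination: keeps the bytes in memory.
struct MemoryWriter : IStreamWriter {
    std::string buffer;
    void write(const std::string& bytes) override { buffer += bytes; }
};

// Encodes each byte as two lowercase hex digits and forwards the result.
struct HexEncoder : IStreamWriter {
    IStreamWriter& next;
    explicit HexEncoder(IStreamWriter& target) : next(target) {}
    void write(const std::string& bytes) override {
        static const char digits[] = "0123456789abcdef";
        std::string encoded;
        for (unsigned char byte : bytes) {
            encoded += digits[byte >> 4];
            encoded += digits[byte & 0x0F];
        }
        next.write(encoded);
    }
};

// The producer does not know or care whether its output gets encoded.
inline void saveHeader(IStreamWriter& out) { out.write("OK\n"); }
```

saveHeader is identical whether it is handed the MemoryWriter directly or the HexEncoder wrapped around it; only the bytes that land in the buffer differ.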
An Example Of What I Don't Like
I like C# so I'll pick on it. Take a quick look at the code in (Introduction to Razor Pages in ASP.NET Core). The first codeblock shows many app.UseSomething() functions. I heavily dislike it because
- What happens if I call app.UseExceptionHandler twice? Could there be more than one handler based on the state of app when I called the function?
- There are many functions and just about none of them take a parameter. Why aren't these set by using a flag enum?
- Does order affect anything?
- Is anything required? If I don't want to use razor pages do I need to call something in place of MapRazorPages?
- If I configure both builder and app, would that not complicate some code? Is there a situation where I need to call a function with the builder, then remember to call a second function with app?
Maybe you'd think my questions are silly and this is standard. But it's outright bad. Here's what I'd prefer:
var webOpt = WebApplication.OptionsFromFile("file.config");
webOpt |= WebOpt.RazorPages;
WebApplicationAppFlags appOptions = 0;
var handlers = new WebHandlers();
if (Environment.IsDevelopment()) {
    handlers.ExceptionHandler = "/Error";
    appOptions |= WebApplicationAppFlags.HSTS;
}
appOptions |= WebApplicationAppFlags.HttpsRedirection
    | WebApplicationAppFlags.StaticFiles
    | WebApplicationAppFlags.Routing;
WebApplication.Run(webOpt, appOptions, handlers, MapRazorPages);
This is better because
- There's clearly one exception handler
- Using a flag variable IMO looks much better than calling many parameterless functions
- The order we write the flags does not matter
- What is required is a parameter. Here I imagine MapRazorPages as an enum of several options
- The code is simple. If there was a situation where I needed to call two functions, one for the builder and a follow up with app, it can now be a single call.
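The same flag-based shape can be sketched in C++ to make the first three points concrete. This is a hypothetical analogue, not any framework's API: AppFlags, Handlers, AppConfig, has, and makeConfig are invented names that loosely follow the C# sketch above.

```cpp
#include <cstdint>
#include <string>

// Hypothetical flag enum; each option is one bit.
enum class AppFlags : std::uint32_t {
    None = 0,
    HSTS = 1u << 0,
    HttpsRedirection = 1u << 1,
    StaticFiles = 1u << 2,
    Routing = 1u << 3,
};

constexpr AppFlags operator|(AppFlags a, AppFlags b) {
    return static_cast<AppFlags>(static_cast<std::uint32_t>(a) |
                                 static_cast<std::uint32_t>(b));
}

constexpr bool has(AppFlags set, AppFlags flag) {
    return (static_cast<std::uint32_t>(set) & static_cast<std::uint32_t>(flag)) != 0;
}

// One field means one exception handler; assigning again replaces it.
struct Handlers { std::string exceptionHandler; };

struct AppConfig {
    AppFlags flags;
    Handlers handlers;
};

inline AppConfig makeConfig() {
    AppConfig config{AppFlags::None, {}};
    config.handlers.exceptionHandler = "/Error";
    // Bitwise OR is commutative, so the order of the flags cannot matter.
    config.flags = AppFlags::HttpsRedirection | AppFlags::StaticFiles | AppFlags::Routing;
    return config;
}
```

Because the whole configuration is a value, it can be inspected, compared, or logged before anything runs, which is hard to do with a sequence of parameterless calls.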
In Closing
I wouldn't be surprised if this is obvious to many of you. However, there are a lot of people writing Java and C# code in the style of the C# example. IMO that code is horrendous. I hope people start showing different ways of encapsulation that worked for their problem.