Mainstay Monday: Linked Lists

Dear data structures, which of you is most useful? I don’t know, but linked lists are pretty awesome. Linked lists are great for any number of things: a list of data that can grow without bound, a structure that can be built up incrementally, a queue or a stack. There are even more things that can be done with linked lists, but before we can dig into what can be done with a linked list, we need to understand what it even is.

The most basic linked list is a series of objects that point from one to another. This is what we are going to dig into today. To get a basic understanding of how a linked list works, let’s have a look at a basic diagram of a linked list.

|Object A| -> |Object B| -> |Object C| -> null

By this diagram, it makes sense to say that an empty list is null. It’s not merely null-like or falsey, it is actually null. Null contains nothing and points to nothing. As soon as you put your first element into the list, you get this:

null
=> |Object A| -> null

The last object in our linked list is always null. This makes it really easy to identify when we have hit the end of the list. This kind of list is called a singly-linked list, which means we can only traverse the list in one direction. Typically when a linked list is created, the head of the list is held as a pointer. Each item, including the head, is just an object, and they are all, essentially, the same. Let’s have a look at an implementation of a basic linked list item.
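
The exact original listing isn’t shown here, so take this as a minimal sketch along the lines the next paragraph describes:

```javascript
function ListItem(value) {
    this.value = value;
    this.nextItem = null;
}

// val is our read-only accessor for the internal value
ListItem.prototype.val = function () {
    return this.value;
};

// append sets the next item in the list
ListItem.prototype.append = function (item) {
    this.nextItem = item;
};

// next retrieves the next item, or null at the end of the list
ListItem.prototype.next = function () {
    return this.nextItem;
};
```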

There’s really not much to this, but the important items here are the constructor which sets the internal value, which I am treating as read-only for our case. Let’s just say it’s better that way. After that, we need an accessor, so let’s use val. Finally we need to be able to set the next value in the list and retrieve the next value in the list; append and next will do just fine for this. Now if we want to create and use our list it would look like the following:
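
Something like this, assuming the ListItem sketch above:

```javascript
var itemA = new ListItem('a'),
    itemB = new ListItem('b'),
    itemC = new ListItem('c');

itemA.append(itemB);
itemB.append(itemC);

console.log(itemA.val());                // a
console.log(itemA.next().val());         // b
console.log(itemA.next().next().val());  // c
console.log(itemA.next().next().next()); // null
```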

This is a pretty manual process, but it gets the job done and we can see the basic use of our linked list object. Now we can see how each of the objects links to the next and the last object always refers to null. This gives us a nice, predictable structure and clear, obvious algorithm for accessing each element in the list.

Let’s add some convenience functionality around our list so we can dig into some of the interesting characteristics of a linked list. We can create a list object and an iteration object. These will give us a nice API to work with.
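
Here’s one way that wrapper might look; the method names are my own choices:

```javascript
function LinkedList() {
    this.head = null;
    this.last = null;
}

LinkedList.prototype.append = function (value) {
    var item = new ListItem(value);

    if (this.head === null) {
        this.head = item;
    } else {
        this.last.append(item);
    }

    this.last = item;
};

LinkedList.prototype.getHead = function () {
    return this.head;
};

LinkedList.prototype.getIterator = function () {
    return new ListIterator(this.head);
};

function ListIterator(head) {
    this.current = head;
}

// Returns the current value and advances, or null past the end of the list
ListIterator.prototype.next = function () {
    var value = this.current === null ? null : this.current.val();

    this.current = this.current === null ? null : this.current.next();

    return value;
};
```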

Here’s what our original usage looks like once we wrap everything up:
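
Using the sketch above:

```javascript
var myList = new LinkedList(),
    iterator;

myList.append('a');
myList.append('b');
myList.append('c');

iterator = myList.getIterator();

console.log(iterator.next()); // a
console.log(iterator.next()); // b
console.log(iterator.next()); // c
console.log(iterator.next()); // null
```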

This is much cleaner and it gives us a little insight into the first aspect of linked lists: access performance. If we want to access the first or last element of our list, it happens in constant time, which we can express in big-O notation as O(1). This is really fast. It’s just about as fast as you can get for value access.

On the other hand, we also lose something for this gain at the front and back of the list. Elements in the middle of the list can only be accessed in linear, or O(n), time. This means, to reach the nth element, you have to cycle through each element before it. For small lists this is not a problem, but for large lists it can be a real cost.

These performance characteristics make linked lists great for small data sets and things like stacks and queues. Linked lists, however, are not suited for random access or repetitive search. Sorting’s not great either, but that’s another discussion for another day. Let’s look at accessing elements.
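
To make the linear access cost concrete, here’s a small lookup helper against the sketch API above (getNthValue is my own name, not part of the list itself):

```javascript
function getNthValue(list, index) {
    var current = list.getHead();

    // Walk the links one at a time; there is no shortcut to the middle
    for (var i = 0; i < index && current !== null; i++) {
        current = current.next();
    }

    return current === null ? null : current.val();
}

getNthValue(myList, 2); // c, reached only after stepping past a and b
```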

Modification characteristics, on the other hand, are fantastic. If you need to add elements to a list, it’s fast. The addition action is nearly as fast as reading the head or tail of the list. If you have the list element you want to insert after, you get O(1) insertion speed. The most costly part is the instantiation of a ListItem object. Each time you call append, it just adds an element and you’re done. Speedy!

At the end of the day, there is another kind of list: the doubly-linked list. As it turns out, the performance characteristics aren’t terribly different. You get the benefit of moving both up and down through the list, but access is about the same speed, as is appending.

Linked lists, by themselves, are useful for fast writing and a small memory footprint for storage expansion. They also don’t require any pre-allocation and can grow incrementally. When we look at other data structures, linked lists can make a good foundation structure because of the fast write behavior. There is also a removal characteristic that is equally fast. We’ll take a look at those structures in another post. Until then, think about your data and how you need to use it. Perhaps a list is what you’ve needed all along.

What the Heck is a Functor?

With functional programming ideas becoming more popular in languages not known for functional practices, words that are understood in functional and mathematics worlds are starting to surface in discussions outside of academia and in business solutions circles. One word people like to throw around is “functor.”

What the heck is a functor?

It turns out a functor is just a function that maps one set to another set. That’s really it. Functors come in a variety of forms, but they all, at the most abstract level, do the same thing. The particulars of how the mapping happens are what make your function a solution to a distinct problem. To start off, let’s have a look at a mathematical functor and the underlying logic, then we can turn that into code. I promise this will be gentle.

Φ() is a one-to-one, onto function that maps x -> 3x.

A = { 1, 2, 3, 4 }
B = { 3, 6, 9, 12 }

Φ(A) = { 3, 6, 9, 12 } = B

=> Φ(A) = B

Φ(S) can be represented by the function f(x) = 3x

Sooo mathy.

Fortunately, Javascript can easily express this in a way that is a little less symbolic and a little more like what we might do in our day-to-day lives. Let’s take a look at a Javascript equivalent of the same operation. Incidentally, we are going to use the map function to accomplish our goal. Let’s have a look:
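
Here’s a sketch; tripleValue is just a name I picked for our f(x) = 3x:

```javascript
var A = [1, 2, 3, 4];

function tripleValue(x) {
    return 3 * x;
}

var B = A.map(tripleValue);

console.log(B); // [3, 6, 9, 12]
```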

Mainstay Monday: SOLID – Dependency Inversion

This post is part of a series on the SOLID programming principles.

This is it, the final SOLID principle. Dependency inversion is probably one of the most talked about SOLID principles since it drives so much of what we see in programming today. From Inversion of Control (IoC) libraries to the dependency injection in modern Javascript frameworks like Angular and Ember, popular OO programming has embraced this principle more than any other. It’s possible that the success of dependency inversion is related to the fact that it can be enforced with a technical solution.

Let’s start by talking about what dependency inversion is. As Uncle Bob Martin says, dependency inversion is the dependency on abstractions instead of concretions. More specifically, this means that any object or function should not rely on the existence of specific concrete parts of an object, but instead, it should expect and use a contract while letting an outside entity define the concrete object that will be used.

To demonstrate this idea, let’s first take a look at a counter-example to our principle. This is probably one of the most flagrant violations of the dependency inversion principle I have encountered in Javascript and it is right on a popular examples page. The following example is lifted directly from that examples page:
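
I won’t pretend to quote the page exactly; reconstructed from the discussion below, the example had this general shape, with a model newing up its own collaborator (take the specific names as approximations):

```javascript
var Mailbox = Backbone.Model.extend({
    initialize: function () {
        // The concrete dependency is baked right into the model
        this.messages = new Messages();
        this.listenTo(this.messages, 'reset', this.updateCounter);
    },

    updateCounter: function () {
        // ...
    }
});
```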

Clearly, this is an example from the Backbone site. Backbone, you’re doing it wrong.

If you look back to my post on dependency injection, you’ll see we could easily create a factory for each of these instance-creation lines. It could be simple, like the following code:
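
A sketch of that kind of factory arrangement, building on the reconstruction above:

```javascript
var messagesFactory = {
    build: function () {
        return new Messages();
    }
};

var Mailbox = Backbone.Model.extend({
    initialize: function () {
        // The model now asks an outside entity for its collaborator
        this.messages = messagesFactory.build();
        this.listenTo(this.messages, 'reset', this.updateCounter);
    },

    updateCounter: function () {
        // ...
    }
});

var mailboxFactory = {
    build: function () {
        return new Mailbox();
    }
};
```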

It’s a few extra lines of code, but the separation of concerns you get can make a huge difference when you write unit tests or if the creation of a message or mailbox ever becomes more complicated than simply calling new. By inverting the dependency chain on this small example, we centralize our creation, eliminate the new keyword from our code and provide a facility for easily injecting a substitute factory to help in testing that our code does only what it is supposed to do.

The other thing that happens when we break out the creation logic is, it becomes obvious what is really happening in the code: we are creating a messages object that is, really, an event handler. Now we can isolate this behavior fully and put guarantees around the actions our model will take when we trigger certain events.

Let’s take a look at the other side of the picture and write a little Jasmine code to test our message model.
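
A sketch using Jasmine 2 spies and the factories above; the fake messages object is nothing but a bare event emitter:

```javascript
describe('Mailbox', function () {
    it('updates its counter when its messages are reset', function () {
        // A bare events object stands in for a full Messages collection
        var fakeMessages = _.extend({}, Backbone.Events);

        // Substitute our factory output and spy on the handler before building
        spyOn(messagesFactory, 'build').and.returnValue(fakeMessages);
        spyOn(Mailbox.prototype, 'updateCounter');

        mailboxFactory.build();
        fakeMessages.trigger('reset');

        expect(Mailbox.prototype.updateCounter).toHaveBeenCalled();
    });
});
```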

If we had tried to do that without our factory, it would have been a lot more challenging to wrap everything up in tests. Since we broke the object instantiation out of the mailbox definition, testing became as easy as replacing our definition for the factory, and we got direct access to everything we needed inside the object setup.

Finally, when we separate our object instantiation out from the core body of our code, it becomes much easier to modularize all of our code and separate pieces as they become unwieldy. This gives us better guarantees around the stability of each part of our code, since each module can be written and tested independently, breaking the code apart at intentional seams.

The takeaway from all of this is, when we invert our dependencies in our code and rely on abstractions to define our program, we improve stability, guarantee contract integrity and improve the testing story. All of these things add up to better programming and more stable output. Isn’t that really what we all want?

An Open Letter to Single-Language Programmers

I work with several programmers every day and I end up giving advice a lot. Let’s not discuss whether people are actually soliciting my advice, or not, but I give it regardless. This evening I started thinking about things I think every programmer should experience. There are lots of different topics that come to mind, including math, hardware, networking and so on. One thing that keeps coming back to me, though, is languages.

When I started studying computer science in college, I thought, “learn one language, you basically understand them all.”

Incorrect!

Some languages are very similar to one another, but each one has its own specific flavor or idiom. Some languages, on the other hand, are radically different from anything that is used in the mainstream commercial development landscape.

Early in my career resurrection I bounced from language to language, always staying close to home. PHP, C#, Java, JavaScript and back again. I wrote the same way in each of these languages, doing things slightly differently only when one language offered a slightly different method to dig into what I needed.

The first time I really got a clear understanding that all languages are not made equal was when I built a content cache in Java. Everything was in the runtime and I just created a static cache instance that lived outside of the current control flow and stored content I could retrieve later. I wasn’t really breaking new ground, but I had to deal with things like thread safety, cache invalidation and so on.

I swung back around and decided I was going to do something similar in PHP. Lo and behold, I discovered that no matter what I did, there was no way I could spin up a long-running thread or long-lived memory location I could easily store and retrieve anything from. The PHP lifecycle is akin to the lifespan of a housefly.

That was the first time I got a really clear picture of how different languages could be, even when they feel similar to an outsider.

My next eye-opening experience was when I decided I was going to look into a functional language and started seriously playing with Clojure. For the record, I love Clojure. It just feels right to me. I don’t know why, but it does. Moving from languages which borrow syntax from C to a Lisp really changed the way I think about programming altogether.

All of a sudden my view of programming took a hard right turn and I stopped thinking about shared state and started thinking about data transformations. State became nothing more than a transition from one data form to another.

My day job revolves almost exclusively around writing JavaScript and I couldn’t stand looking at sets of loops and conditional blocks. It was a rat’s nest of too many variables and too much ceremony. Every line looked like a duplication of code and all of the code looked like a bomb waiting to go off.

Clojure changed me.

This doesn’t mean that I spend my days writing Clojure now, in fact I still write JavaScript, but every language I have touched has changed me a little. Java taught me about working in a long-running environment, PHP taught me about short-lived scripts. C# taught me about data structures and Clojure a lot about data management.

In the end, the reason I continue to work with JavaScript is because it reminds me a lot of myself. It draws upon a little bit from a lot of different influences. The important thing is, JavaScript reached out and touched all of those things and brought something back with it.

I share all of this because I want to say to anyone who has gone their entire career only ever writing in a single language, try something new. Reach outside of your comfort zone and do something that looks nothing like the work you’ve done until now. Write a program that scares you. Use a language that changes you. Make yourself better and never look back.


Sanitary Data: Maybe, Either and Deref

Although there are times when conditional blocks are necessary for functionality, it is common that conditions simply wrap up data which may be null or undefined. These kinds of conditional blocks have become so common that they are considered idiomatic code. The underlying goal, however, is to guarantee sanitary data.

Sanitary data is data that is guaranteed to be safe for use in your function without worrying about edge cases which could arise around data that could be subtly wrong or unsafe. Sanitary data is more than just data that exists. It is data that adheres to a set of qualifications which make the program stable and safe. These qualifications include things such as truthyness and type guarantees.

Since it is considered idiomatic to bundle up data in conditional blocks, often times functionality is wrapped up in the condition, muddying the waters between sanitary data and function execution. I would like to break that idiom with a concept introduced in Haskell, known as maybe.

In Haskell, maybe(a) returns Just(a) or Nothing. Javascript does not have algebraic data types, Just or Nothing, so we are going to have to adjust a little. That said, there are still some guarantees we can put into our maybe that will flow forward to other variables. Let’s have a look at an implementation of maybe that will set the foundation.
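
Here’s a minimal sketch of the maybe I have in mind, along with some usage; the optional predicate argument and the default truthyness check are my own choices:

```javascript
function maybe(value, predicate) {
    // The default qualification is plain truthyness
    var isAcceptable = typeof predicate === 'function' ? predicate : Boolean;

    return isAcceptable(value) ? value : null;
}

function isNumber(value) {
    return typeof value === 'number';
}

// Usage: note we are still testing the sanitized value
function addFive(value) {
    var sanitizedValue = maybe(value, isNumber);

    return sanitizedValue === null ? 0 : sanitizedValue + 5;
}

addFive(7);       // 12
addFive('seven'); // 0
```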

As you can see, it gives you a strong guarantee of what your data should look like. If your data is acceptable, you will just get your data back, our Just(a). If your data fails the provided conditions, you will get null, which is as close as we can come to Nothing in Javascript.

Maybe alone will not give us the kind of data integrity we want. It reduces the conditions we have to consider down to two, which is much better than the open landscape of the unknown, but it is not sufficient alone. As you can see in the usage section above, we are still testing our ‘sanitized’ value. Our goal is to remove as many of our data conditions as possible, so we still have more work to do.

When maybe isn’t enough, we can look to another function: either. I use either so often that if I am ever without my functional library which provides either, I build it in as a utility. It’s honestly that important. Let’s take a look at what either looks like.
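
A sketch built directly on the maybe above (the argument order is my own convention):

```javascript
function either(defaultValue, value, predicate) {
    var sanitizedValue = maybe(value, predicate);

    return sanitizedValue === null ? defaultValue : sanitizedValue;
}

// Usage: no conditions left in our function at all
function addFive(value) {
    return either(0, value, isNumber) + 5;
}

addFive(7);       // 12
addFive('seven'); // 5, the safe default plus five
```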

As you can see, either gives us the strength of maybe with the sanitary nature of a safe default. Now we have removed all of the conditions from our small function. This is excellent! Now we can guarantee our data is sanitary even in a hurricane of strange calls and bad user data, and it’s a tiny helper function that declares exactly what it does every time we use it.

The extra benefit we have with functions like maybe and either is they really do introduce the idea of function purity in a way that works with our everyday, practical application. We can write functions that should never throw type errors, never get null pointers and always give us clean, safe data.

Except…

What happens when you need a value that is buried deep in an object? What if any layer in your object might not exist? This happens a lot. Actually it happens so much that I found myself using large blocks of either statements to guarantee I didn’t break my function. This is bad. We don’t want big blocks of function calls when our entire goal was to remove large blocks of code around data to begin with.

Enter deref. Deref is the automated application of either over and over, ensuring we always get good data all the way through our object. Let’s take a look at what deref could look like:
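
One possible sketch, leaning on either at every step of the walk down the object:

```javascript
function deref(baseObj, keyStr) {
    return keyStr.split('.').reduce(function (current, key) {
        // A missing layer becomes an empty object, a missing value becomes null
        return either(null, either({}, current)[key]);
    }, baseObj);
}

// Usage
var user = { address: { city: 'Chicago' } };

deref(user, 'address.city');    // 'Chicago'
deref(user, 'address.zipCode'); // null
deref(null, 'address.city');    // null
```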

Now we really have a set of functions that break us out of the data conditional cycle of madness! With as few as one or two lines, we can provide strong guarantees that our data will always match the expectations of our code. By introducing three simple helper functions, we have reduced our code down to the essence of what we mean. We no longer have to spin our wheels cleaning up data or ensuring we aren’t sending bad values through functions that could introduce dangerous runtime errors that could crash our applications when the user does something unexpected.

What is really happening underneath these helper functions is, we are creating sanitary data in an isolated way. Most programs, when you scratch beneath the surface, are little more than data manipulations. By placing strong guarantees on the data we are interacting with, we can stabilize our programs and reduce the amount of stress we feel about strange data circumstances and edge cases that are, largely, unimportant to producing meaningful output.

If you don’t want to litter your code with copy/paste functions from a blog post, download the JFP library and use these functions which are pre-built to make your code sane. Read the docs to see other ways you can make these functions work for you.

The next time you find yourself writing large conditional blocks to handle strange data, think about maybe, either and deref and then breathe a sigh of relief. When we introduce sanitary data into our code and think a little more functionally, the world looks a little brighter, so make your unit tests and your QA person happier and sanitize that data!

Mainstay Monday: SOLID – Interface Segregation Principle

This post is part of a series on the SOLID programming principles.

The Interface Segregation Principle is a close relative to the Single Responsibility Principle. The idea behind interface segregation is your API should use several small functions or methods to accomplish tasks instead of one large function. The traditional definition states you should use many client-specific interfaces instead of one general purpose interface.

Since Javascript doesn’t have support for abstract classes or interfaces, we are going to focus on the functional application of interface segregation. Let’s start off supposing your program is going to deal with a few different cases for objects which will be handed around and managed. You know that you are going to potentially receive objects, which you want the keys from, arrays, which you want just the strings out of, and a JSON string provided by the user through some request or input. Here’s the way a single function looks when we try to manage all of these cases:
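
A reconstruction of that kind of function; the names are stand-ins:

```javascript
function getValues(rawData, isArray, isUserData) {
    if (isUserData) {
        try {
            return JSON.parse(rawData);
        } catch (e) {
            return null;
        }
    } else if (isArray) {
        return rawData.filter(function (value) {
            return typeof value === 'string';
        });
    } else {
        return Object.keys(rawData);
    }
}
```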

Obviously this fails single responsibility, but there is more going on here. This function receives two different boolean values and changes the way it behaves based on configuration values. This is a dangerous path to walk and I strongly suggest people avoid it. The other problem that we have here is practically every executable line returns a value. This whole function is a setup for danger.

Note, I have actually seen functions like this. This kind of practice is easy to find in the wild, especially when looking at the code written by novice developers. There is also a variation on this function which handles the creation of the booleans inside the function. The code, when rewritten, looks like this:
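
Roughly like this:

```javascript
function getValues(rawData) {
    var isArray = Object.prototype.toString.call(rawData) === '[object Array]',
        isUserData = typeof rawData === 'string';

    if (isUserData) {
        try {
            return JSON.parse(rawData);
        } catch (e) {
            return null;
        }
    } else if (isArray) {
        return rawData.filter(function (value) {
            return typeof value === 'string';
        });
    } else {
        return Object.keys(rawData);
    }
}
```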

I’m not sure which is worse, but we are really solving three completely different problems with this code. Let’s suppose, instead of all the booleans, we were to start breaking this function down and solving the problems independently. This is where our segregation comes into play. We have one case where we want object keys. By inspection we can see this is not related to the array problem or the user entered data problem. Let’s split that function out.
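
The object case shrinks to a single, obvious line:

```javascript
function getObjectKeys(data) {
    return Object.keys(data);
}
```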

This function clearly cuts straight to the heart of what it does. Now we can take an object and safely capture the keys. This reduces the cognitive load, since we no longer have to reason about which booleans to pass or what will go wrong when the cases don’t line up. More importantly, any place in our code where we need to call this function can do it without any knowledge that our program could ever receive arrays or user-entered data. Those behaviors are completely outside the scope of this particular piece of functionality.

Let’s deal with our array logic.
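
The array case shrinks the same way:

```javascript
function getArrayStrings(data) {
    return data.filter(function (value) { return typeof value === 'string'; });
}
```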

This is another one-liner, but it serves a very specific purpose. We no longer have this bundled in with our object or user input logic, which means we can understand precisely the role it plays. Now our code can safely assume it will always get the same kind of information back, so we can call our array function in the correct context and reduce the overhead that accompanies a single, general-purpose function.

Finally, let’s have a look at our segregated user input function.
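
A sketch with a single entry point and a single exit:

```javascript
function parseUserData(userData) {
    var parsedData = null;

    try {
        parsedData = JSON.parse(userData);
    } catch (e) {
        // Unparseable input simply falls through to the single exit below
    }

    return parsedData;
}
```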

This one is the biggie of the bunch. User data is notoriously unreliable, and this is the one that muddied the water the most. Originally we had a return statement in the try and one in the catch block. This seems like a terrible idea. More importantly, this really added a lot of complexity to our original function since we not only had to know this was user data, but we also had to drop in a block to handle any of the fallout that happens when JSON.parse is called on something that won’t parse.

With this function isolated, we get the same benefits we would get with the segregation of the other parts of our function, but we also get the added bonus of being able to rewrite this function without having to dirty up a general purpose function’s scope with a bunch of variables which may never be used in any of the other behaviors. Now we can clearly define a single entry point and a single exit. This function starts to approach the kind of purity we like when we want to wrap things up in unit tests.

Let’s take a look at one last element of the interface segregation principle. We have looked at how interface segregation and single responsibility work together to clean up a function that increases cognitive load; now let’s take a look at the value of wrapping up general purpose behaviors in specific purpose functions. This is where interface segregation can really shine and simplify your programming.

Below is a set of functions I’ve created to demonstrate satisfying specific needs and reducing the exposure of general purpose functions to our code in the large.
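
These are sketches of the idea rather than the original listing; the names and the ten-character cutoff are mine:

```javascript
function isType(typeStr) {
    return function (value) {
        return typeof value === typeStr;
    };
}

function filterStrings(valueList) {
    return valueList.filter(isType('string'));
}

function filterNumbers(valueList) {
    return valueList.filter(isType('number'));
}

// Composite: built from filterStrings rather than from raw filter
function filterShortStrings(valueList) {
    return filterStrings(valueList).filter(function (str) {
        return str.length < 10;
    });
}

// Composite: built from filterNumbers rather than from raw filter
function filterEvenNumbers(valueList) {
    return filterNumbers(valueList).filter(function (num) {
        return num % 2 === 0;
    });
}
```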

Here we can see several things at work. First, we have wrapped up the filter function in a few convenience functions which give us a specific type of output. This is great for sanitizing data as well as function composition. With each of the produced functions, we can provide value: filtering strings or numbers, filtering strings of a certain length or filtering only numbers which are even.

What is even better is, we can actually use these functions in a composite way to build more complex functions or new functions that do something different. Imagine if we had to do this directly in our code. That would be a LOT of duplication and we would have to interact with the general purpose filter function every time.

We’ve looked at two interesting cases around the concept of segregating our interfaces and providing solutions for problems which can be reused throughout our code. First we looked at how interface segregation and single responsibility principle are related, and how one strengthens the other. Then we had a look at wrapping up broad-use functions in solution-driven structures to simplify the process of solving problems in our program.

Interface segregation is a strong principle for simplifying code and providing a clearer strategy for solving problems. It works hand in hand with other principles to make your program cleaner, simpler and more stable, which is what we all really want, isn’t it?

All in the Family: Filter, Map, Reduce, Recur

In programming it is becoming more common to see functional patterns. Filter, map and reduce are all discussed openly in programming circles that live far away from Lisp, Haskell, ML and other functional languages. It is common, in Javascript especially, to see these functions being misunderstood and misused. My goal is to uncover the relationship between these functions and provide a really deep understanding of what is going on when these common functions are used correctly.

By the end of this post, I hope you will see how filter, map, reduce and recursion all come together in a harmonious way, allowing the programmer to transform data without loops and without brutish, heavy-handed data manipulations. This journey will be part technical understanding and part new-age enlightenment. Don’t worry if everything doesn’t make sense on the first pass. This is a deep dive into the world of functional programming and a departure from the imperative methods people commonly use.

Let’s start off with reduce. Reduce is a function that, as you might have guessed, reduces a list of values. As a side note, in my little corner of the world, I think of arrays as lists and I will refer to them as such throughout this post.1 A common example of reduce is adding numbers. An adder is a simple function to implement and allows for a very simple explanation. Here’s what it might look like:
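
Something like this:

```javascript
function add(a, b) {
    return a + b;
}

[1, 2, 3, 4].reduce(add, 0); // 10
```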

I want to dig under the surface of this behavior and understand what is really going on with reduce. Let’s take a look at a way to implement reduce. The first implementation is just going to use a standard loop. Here’s what it might look like:
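
A loop-based sketch:

```javascript
function reduce(reduction, initialValue, valueList) {
    var result = initialValue;

    for (var i = 0; i < valueList.length; i++) {
        result = reduction(result, valueList[i]);
    }

    return result;
}

reduce(add, 0, [1, 2, 3, 4]); // 10
```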

It’s not elegant, but it gets the job done. We loop over everything in the list and apply a reduction. You can create an add function and a list of numbers and try it for yourself. Anyway, if you really want to do this functionally, you would pull out our good friend, recursion. Recursion is a more mathematical means for looping over a set of values, and it dates back to some of the earliest models of computing in the 1930s.

Before we go any further, I want to introduce a few short functions that will make everything a lot more functional. Our loop-based function would have gotten little utility out of these, but moving forward they are going to be very important.
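
Here they are; isEmpty is one extra helper I am including for the recursive base case:

```javascript
function first(valueList) {
    return valueList[0];
}

function rest(valueList) {
    return valueList.slice(1);
}

function isEmpty(valueList) {
    return valueList.length === 0;
}
```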

In the next function we are going to use recursion to handle the looping so the body of the function only needs to be concerned with a single step in the process of reducing our list. Now that we have those out of the way, let’s crack open a functional version of reduce and see what we can find. Let’s have a look.
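
A recursive sketch using the helpers above:

```javascript
function reduce(reduction, accumulator, valueList) {
    return isEmpty(valueList) ?
           accumulator :
           reduce(reduction,
                  reduction(accumulator, first(valueList)),
                  rest(valueList));
}
```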

If this is your first time digging into the world of functional programming and using functions like first and rest, you might want to stop and absorb all of this for a moment. Reduce is a rather complex transformation function that requires keeping a fair amount in your head, so this is a lot to absorb. Another challenge we encounter in Javascript is the lack of pattern matching, which would simplify this function significantly. Nonetheless, that’s a pretty heavy change from where we started at the beginning of the post and we still have more of this mountain to climb.

For sufficiently small lists of values, this reduction technique will work fine, but as the list grows, our reduce function will begin to slow down and, eventually, fail. This is because recursion in Javascript is not tail-optimized, so each call goes on the stack, which will eventually overflow. This overflow is the primary reason why many modern imperative languages discourage recursive algorithms.

Clojure introduces an idea that helps us to remedy this issue. It is possible to use recursion inefficiently in Clojure and fill the stack; however, by using the recur special form in tail position, you get the tail optimization you are looking for. Similarly, the JFP library offers a recur function that allows for tail-optimized recursion.2 Let’s rewrite our function using the JFP recur function.
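
The exact JFP signature may differ from this sketch; the assumption here, matching note 2 below, is that our function receives a continuation (self) and an internal while loop unwraps each step so the native stack never grows:

```javascript
// Assumes the JFP library is loaded and exposed as j
function reduce(reduction, initialValue, valueList) {
    return j.recur(function (self, accumulator, remaining) {
        return isEmpty(remaining) ?
               accumulator :
               self(reduction(accumulator, first(remaining)), rest(remaining));
    }, initialValue, valueList);
}
```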

Phew! That was a long walk to get to a really efficient and effective reduce function. It’s elegant, declarative, and it uses tail-optimized recursion so we can really say we are operating in a functional way from the ground up. What’s even cooler is, now we can see that even recursion can be looked at from a different angle and managed as a function instead of just a concept. However, the title of this post mentions filter and map as well. Fortunately, we don’t have to take the same long walk to get to those functions. We already have nearly everything we need: looping, function application, even data copying!

Let’s start with filter. Anyone who has used filter correctly understands the power you hold when you start manipulating lists of elements and stripping elements down to the bare bones, leaving only the data you need. It generally looks a little like this:
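
For instance:

```javascript
function isEven(value) {
    return value % 2 === 0;
}

[1, 2, 3, 4, 5, 6].filter(isEven); // [2, 4, 6]
```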

If we peel back the skin and look at what is happening in filter, we can look at it as another recursion. This means that filter is a function of recursion, just like reduce. Here’s an idea of what that would look like:
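
A recursive sketch, using the same helpers as before:

```javascript
function filter(predicate, valueList) {
    if (isEmpty(valueList)) {
        return [];
    }

    var keptValues = predicate(first(valueList)) ? [first(valueList)] : [];

    return keptValues.concat(filter(predicate, rest(valueList)));
}
```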

This is a lot like our implementation of reduce. The problem is, it’s actually so much like reduce that we are duplicating code. Duplicate code is a bad thing when you already have a function that will do what you need. Let’s rewrite our filter function as a function of reduce.
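
All that remains is the wrapper:

```javascript
function filter(predicate, valueList) {
    return reduce(function (keptValues, value) {
        return predicate(value) ? keptValues.concat([value]) : keptValues;
    }, [], valueList);
}
```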

If we think of filter as a function of reduce, then, all of a sudden, almost all of the logic goes away. Our original function was so close to reduce that they could have been separated at birth. The only thing we really need to make filter unique is a wrapper function to evaluate our predicate and capture the values that pass. Filter is a reduction function. In much the same way, map is also a reduction function. Here’s an implementation skipping all of the intermediate steps and drawing from what we know about filter:
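
Sketched the same way:

```javascript
function map(mapper, valueList) {
    return reduce(function (mappedValues, value) {
        return mappedValues.concat([mapper(value)]);
    }, [], valueList);
}
```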

That’s it! Map and filter are clearly little more than a wrapper around the reduce function. With the relation to reduce, we can say that filter and map are functions of reduce. We have essentially built a family tree of filter and map, which we can see are cousins related through the same reductive heir. This relationship could not work the other way around since only reduce has the flexibility to transform lists in a variety of ways, all while maintaining function purity. So, now when people talk about the filter-map-reduce pattern, we know that they are really talking about the reduce-reduce-reduce pattern.

Why is this important?

When I first started learning functional programming, I had a really hard time separating myself from the idea that filter, map and reduce were really any different than a wrapper over forEach. I thought of them as highly specialized looping functions that did a little bit of extra work under the covers to add some power to the language.

This idea that these functions are wrappers over a loop is a common misconception made by many people coming from the imperative world into functional programming. It is common to see these functions called with function arguments with side effects, treating them as nothing more than a looping structure.

By breaking down the barrier between the programmer and high-performance recursion, it becomes obvious that traditional loops are not really part of the game at all. Any looping that might be done is used simply to access values within a list. This data access is what’s really important. As it turns out, the order in which the elements are accessed is really unimportant as long as the output list adheres to the same order as the input list.

This break from conventional thinking, seeing these functions which perform operations on a list as functions of reduce, helps us to understand what is really happening: we are actually transforming the data into something new! The original data is left unmodified not because of some quirk of the language or its mutability rules; building a new list instead of altering the old one is the only way immutability could exist at all.

When you return to looking at the code you are writing, and use map or filter again, it will be as if you are seeing beyond the curtain into the way the cogs and wheels are really working together. You may, like me, wonder why tail-optimized recursion is not core to the language. Things will make sense in a new way, and instead of writing programs which work, by brute force, to capture and strong-arm data into a form the user can make sense of, you will allow the data to glide effortlessly through transformations and become something the user can enjoy.

Blog Post Notes

  1. As it turns out, arrays in Javascript are more closely related to vectors in languages like Clojure. This distinction can be discussed at length elsewhere, but it is important to note. Ultimately, arrays and vectors have extra memory access features which give them a performance boost when data is accessed at places other than the head or the tail. Lists must be accessed sequentially. I prefer the term list when dealing with arrays in Javascript because many of the functions we use to process arrays come directly from list processing, and do not benefit from the vector characteristics.
  2. The tail optimization that JFP uses is a generic form of the trampolining method. Each pass of the recursive function finishes and returns. Within the recur function is a while loop which reduces the recursion into a high-performing loop structure. This allows a central recur function to capture the return and either execute the next step or exit, depending on the state of the recursion.

Mainstay Monday: SOLID – Liskov Substitution Principle

This post is part of a series on the SOLID programming principles.

We’ve reached the middle and, possibly, one of the more subtle principles in SOLID. Up to now we have dealt in ideas that either prescribe a way to separate and clean up your code, or provide rules for ways to maintain consistency in behavior while adding new functionality. Liskov substitution offers a means to guarantee expectations for developers are met when things change over time.

Arguably this is one of the most difficult principles to apply to functional programming, since subclassing doesn’t exist and method overriding is meaningless; however, there are still some examples we can look at to identify similarities. Let’s take a look at a function called deref, which dereferences an object value or returns null if the reference does not exist.
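
A sketch of such a deref; the exact original surely differs, but the shape is what matters:

```javascript
function deref(baseObj, keyStr) {
    var keys = keyStr.split('.'),
        key = keys.shift(),
        result = Boolean(baseObj) && baseObj[key] !== undefined ? baseObj[key] : null;

    // Either the current result, or the result of the next call on the stack
    return keys.length === 0 || result === null ?
           result :
           deref(result, keys.join('.'));
}
```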

This is a recursive algorithm, which we discussed a couple weeks ago. As you can see, it has two return states, either the current result or the result from the next call on the stack. We’ll assume that the key string passed in won’t be long enough to overflow the stack.

Now, suppose we wanted to take our current deref implementation and extend it to return a default value if the reference we want doesn’t exist. We could, theoretically, add something else to this implementation, but that would violate Open/Closed at the very least. Instead, let’s create a wrapper function that extends the contract.

When we extend the contract for the function, we need to make sure that we don’t break functionality for older code that is only using deref. This means the new argument must be managed in an optional way. In classical OO languages, we could use method overloading to accomplish this, and in purely functional languages, we would have pattern matching, but Javascript lives in two worlds, so we’re going to handle this our own way.
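
A sketch of the wrapper, with the default managed optionally:

```javascript
function derefWithDefault(baseObj, keyStr, defaultValue) {
    // Omitting the new argument preserves the old null-returning contract
    var sanitizedDefault = defaultValue === undefined ? null : defaultValue,
        result = deref(baseObj, keyStr);

    return result === null ? sanitizedDefault : result;
}
```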

It only took a couple extra lines of code and we’ve now created a new function that will give us some very powerful added functionality. What’s better with this implementation is, we have maintained the original code, keeping our old functionality insulated from the new behavior. This means any new code that is written can call our new pseudo-subclassed function just as it would have the old function, and get predictable behavior, and we can revisit old code in time and refactor to the new behavior with nothing more than a function name change. Code stability is the name of this game.

Now, let’s have a look at an object oriented approach. Suppose we have a pet class, and we are describing pets which can do the trick “speak.” It’s pretty safe to assume we’re really talking about parrots and dogs, but we’ll assume there are a whole large class of animals that could be pets and do the trick called “speak.” Let’s have a look at our base class:
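
The names and the noise are my own, but it might look like this:

```javascript
function Pet() {}

Pet.prototype.speak = function () {
    return 'beep beep boop';
};
```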

Obviously our base pet is some sort of program or computer. Perhaps it’s a highly-evolved OpenWorm or a Tamagotchi. At any rate, our pet isn’t very interesting, but it’s easy to extend and that’s what we’re going to do.

Let’s make our pet a dog. Dogs can speak, so we’re okay there. Let’s add another trick, too. Dogs can roll over. Well, mine won’t because they are stubborn, but you can teach a dog to roll over, so let’s use that trick. Here’s what our dog would look like:
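
Sketched the same way:

```javascript
function Dog() {}

Dog.prototype = Object.create(Pet.prototype);
Dog.prototype.constructor = Dog;

// speak is inherited from Pet, intentionally unchanged

Dog.prototype.rollOver = function () {
    return '*rolls over*';
};
```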

If we look at this code, it’s pretty clear that anywhere something is looking for a generic Pet instance, you could pass in Dog and it would be acceptable. It is critical to understand that we intentionally did not change speak. Suppose we were to create another pet that would only speak if you gave it a cracker and it didn’t do any other tricks. This would definitely be a picky pet. Let’s go ahead and call it just that:
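
A sketch of our contract-breaking pet:

```javascript
function PickyPet() {}

PickyPet.prototype = Object.create(Pet.prototype);
PickyPet.prototype.constructor = PickyPet;

// This override changes the contract: speak now demands an argument
// and throws when it is wrong, so PickyPet cannot stand in for Pet
PickyPet.prototype.speak = function (treat) {
    if (treat !== 'cracker') {
        throw new Error('No cracker, no tricks.');
    }

    return 'Polly wants a cracker!';
};
```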

As it turns out this is such a well-known violation of Liskov Substitution that my code editor highlighted the new speak method and informed me that it was an invalid extension of the base class. Obviously, anything expecting a conforming instance of Pet would have a problem with our new subclass. As it turns out, Javascript doesn’t care about this violation until runtime and by then, it’s too late.

There are more subtle violations that could also happen, but it’s hard to list them all. Suppose speak did take an argument, but threw no error for any kind of type violation; this kind of code is still a violation since our new picky pet does throw an error. Other kinds of problems can be type mismatches, variations on what is returned by the method or function, removal of functionality that is critical for the parent class to work properly and so on.

Liskov Substitution is a fairly subtle principle which is meant to protect the deepest, most core parts of your program. I have a friend who claims that every other SOLID principle flows forth from Liskov and I would generally tend to agree, though that’s a discussion for another day. Ultimately, if you adhere to the Liskov Substitution principle, your code is more likely to behave well under a broad set of conditions and remain stable even as you enhance your program over time. Think about Liskov Substitution as you work and you will write better code and craft better software.

Refactoring with Boolean Algebra and De Morgan’s Laws

Many junior and mid-level programmers working today have opted to skip a university education and, instead, have either gone through an Associate’s program or a coding bootcamp. Some programmers started their career out of college without a formal background in CS, getting degrees in physics, biology, chemistry or even liberal arts like journalism or design. Although these developers may be quite effective, there are many topics that are core to the standard Computer Science curriculum which developers without a formal CS education may not have been exposed to.

Today we start leveling the playing field. Dear reader, if you are programming, guess what, you’re doing math! Are you surprised? Don’t be. All of the logic that is key to making programs work came out of mathematics.1 We’ve all written some sort of conditional statement at one point or another or we wouldn’t be here, but could we do it better?

We can rebuild him. We have the technology.

Boolean algebra is the field of mathematics that provides us with the conditional logic we all know and use every day. In order to really understand, at a deeper level, what we are doing, it is important to really get a grasp on managing conditions using the rules uncovered by mathematicians who came before. At the end of this post, we will look at De Morgan’s laws and how they can be used to transform blocks of code with a couple of simple rules.

Before we dive into the bag of tricks, let’s get a little bit of vocabulary and syntax out of the way.

Vocab:

  • Predicate expression – An expression that evaluates to either true or false
  • Tautology – An expression that always evaluates to true
  • Apenantology – An expression that always evaluates to false2

Syntax:

  • && – Logical and
  • || – Logical or
  • ! – Logical not
  • <=> – Logical equivalence, technically means “can be replaced by”

Okay, now that we have that out of the way, let’s talk predicates. The other day I was working through some older code which had been touched by several people. In this code was a conditional statement that looked roughly like this:
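
I won’t reproduce the original exactly; here is a reconstruction with stand-in names (valueList, isComplete and isValid) that keeps the same shape:

```javascript
if ((valueList.length > 0 && isComplete) ||
    (valueList.length > 0 && isValid && !isComplete)) {
    // do the work
}
```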

This isn’t the worst conditional I have ever seen, but it’s pretty hard to read. It’s not entirely clear what all of the condition states are, or why. Something I knew for sure was that there was a common factor here. Let’s look all the way back to grade-school algebra and the distributive property. We know this is true:
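
a(b + c) = ab + ac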

The distributive property holds for Boolean algebra as well. Let’s look at some boolean variables we’ll call P, Q and R.3 I won’t present a formal proof for any of the claims I am going to make, mainly because that’s not what this post is about, and they are already written and available on the web with a simple search on Google. Let’s have a look at the distributive property for predicates.
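
P && (Q || R) <=> (P && Q) || (P && R)
P || (Q && R) <=> (P || Q) && (P || R)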

Going back to our original problem, I could use the distributive law to pull one of the variables completely out of the predicate expression and simplify what I was looking at immediately. Let’s take a look at our new conditional.
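
```javascript
if (valueList.length > 0 && (isComplete || (isValid && !isComplete))) {
    // do the work
}
```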

That’s already a lot easier to look at and we’ve barely even started. I see another value in the mix I really want to simplify, but somehow it just doesn’t seem like it will be as easy as pulling out our valueList. Let’s do some distribution again and see where that leaves us.
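
```javascript
if (valueList.length > 0 &&
    ((isComplete || isValid) && (isComplete || !isComplete))) {
    // do the work
}
```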

Well, that’s just about as ugly as it was when we started, except for one thing. We introduced the concept of a tautology in the vocab. We actually have one right here. Let’s take a moment and look at a symbolic representation, first.
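
P || !P <=> true
Q && (P || !P) <=> Q && true <=> Q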

This means we can do a little bit of trickery and get rid of some stuff. Let’s take a look at the tautology in our code.
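
```javascript
// (isComplete || !isComplete) is always true, so the whole term drops away
if (valueList.length > 0 && (isComplete || isValid)) {
    // do the work
}
```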

With this refactoring we are left with a much simpler expression to evaluate. Before presenting the reduced predicate expression, let’s take a look at one other rule in Boolean Algebra, associativity. When all operators are the same, you can freely associate any set of values. Here’s what it looks like:
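
(P && Q) && R <=> P && (Q && R)
(P || Q) || R <=> P || (Q || R)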

With all of that put together, we get a final reduced result which relies on just one instance of each variable. This is, as far as I can see, about as far as the train goes. Here’s our final conditional statement:
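
```javascript
if (valueList.length > 0 && (isComplete || isValid)) {
    // do the work
}
```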

This isn’t where the story for Boolean Algebra ends. I introduced a new word, created using some Greek roots, apenantology. This is as useful for evaluating conditional statements as our buddy, the tautology. Apenantology is the state where something always evaluates to false.4 Let’s have a look at a symbolic example.
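
P && !P <=> false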

Here’s where it gets interesting. Where a tautology can be removed from a predicate expression, an apenantology can either be eliminated, or it will actually define the entire expression. Here’s how it works:
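
Q || (P && !P) <=> Q || false <=> Q
Q && (P && !P) <=> Q && false <=> false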

Let’s take our predicate expression from when we did our second distribution. Let’s replace the or with and and see what we get instead.
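
```javascript
// (isComplete || !isComplete) becomes (isComplete && !isComplete),
// an apenantology, so it defines the entire expression: always false
if (valueList.length > 0 &&
    ((isComplete || isValid) && (isComplete && !isComplete))) {
    // this branch can never run
}
```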

What about De Morgan’s laws?

De Morgan’s laws are named for the mathematician Augustus De Morgan. He discovered two very useful little rules that can be applied with assurance to any logical statements and guarantee equivalence and maintain truth values.

De Morgan’s laws are actually pretty simple, but they give us some tools that can help to either simplify an expression or identify an expression that would give us the negation of the original expression. When I was learning these rules I kind of liked to think of them as the either/or rules. There are two and they look like this:
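
!(P && Q) <=> !P || !Q
!(P || Q) <=> !P && !Q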

For something that was named for a mathematician, these seem like a couple of really simple rules. As it turns out, these can be really useful when working with conditional logic to either reverse or simplify a conditional block. For an example, let’s take a look at a conditional reversal using one of De Morgan’s laws:
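
Here’s a sketch (handleBadData is a hypothetical stand-in):

```javascript
// Before: one negation wrapping the whole expression
if (!(isComplete && isValid)) {
    handleBadData();
}

// After applying !(P && Q) <=> !P || !Q
if (!isComplete || !isValid) {
    handleBadData();
}
```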

Whoa. That was kind of a drink from the proverbial firehose. For someone who is seeing this material for the first time, all of this math and its relation with code might seem a little dense. I wouldn’t worry too much about getting it all at once. The important thing is to start looking at your code and identifying places where it seems like simplification should be doable. The more complicated your conditional statements are, the more likely a bug is lurking in the dark.

I would recommend you start getting your feet wet with the concept of tautologies. By simply recognizing where an idea is repeated and removing the repetition, your conditional blocks will become clearer and more precise. After you have applied tautologies comfortably to your code, try playing with the distributive and associative laws. These three ideas will clean most of the garbage out of complex, messy conditionals.

Once the foundation work is comfortably in your arsenal, come back and start playing with the ideas around identifying apenantologies and flipping conditionals around to identify the best order for your conditions to be presented in. Sometimes reordering conditions is all you need to make your code clean, clear and the best it can be. These principles of logic lay the foundation for what is crucial to make a program do all the things you ever wanted, so use them and make your code shine.

Blog Post Notes

  1. Technically the logical paradigm came from a combination of Mathematics and Philosophy. A guy by the name of Bertrand Russell worked with mathematicians early in the 20th century to create a formal language which could be used to describe mathematics work that was being done. He and several important mathematicians helped to make formal proofs work in a predictable, and readable way. Thanks, forebears! Without you, computing might not be where it is today.
  2. Apenantology is a neologism I am introducing here in order to accurately describe a situation which is diametrically opposed to a tautology. Tautology is built from the Greek roots tautos, meaning identical, and logos, meaning word. I constructed apenantology from apenanti, meaning opposite, and logos, meaning word.
  3. This naming is common for mathematics and it makes it easier to read the expressions we are tinkering with. Most logical expressions are written using P, Q, R, S and occasionally T. When more than five variables are involved, I personally start to worry about complexity.
  4. A rough proof is provided as a gist for those curious as to whether apenantologies are supportable mathematically.

Mainstay Monday: SOLID – Open/Closed Principle

This post is part of a series on the SOLID programming principles.

Last week we discussed the concept of single responsibility in programming. Continuing the process of discussing SOLID principles, let’s take a look at the Open/Closed Principle. The name of the principle isn’t quite enough to declare what it intends — open to extension, closed to modification — so we’ll need to figure out how this applies to our daily life.

What does it really mean to be open to extension or closed to modification? Clearly, we can’t strictly enforce that someone can’t come along and change our code, so there has to be a certain amount of self control that comes along with this principle. Something else that might help us along the way is the discussion we had about contracts a little while ago.

Uncle Bob Martin states, in his article on the open closed principle, that when requirements change, it is incorrect to modify working code to manage new or updated requirements; rather, the existing code should be extended to support the updates. There is an assumption here that the code satisfies a single responsibility to the original requirement. When new requirements arise, an extension to the original functionality should be created instead of modifying the original code. Let’s take a look at a functional approach to this.
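
A sketch of the idea (the ten-character constraint is an arbitrary choice):

```javascript
function isString(value) {
    return typeof value === 'string';
}

// The new requirement is handled by extension; isString stays untouched
function isShortString(value) {
    return isString(value) && value.length < 10;
}
```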

In this example, isShortString is clearly an extension of the requirements that were defined in the original isString function. Both functions accept any kind of value and return a boolean, so they adhere to the contract, and isShortString is intentionally built on the assumption that isString will always perform the necessary check for the basic string type before any further checks happen. This means that isShortString will, effectively, act as isString for any strings that are naturally shorter than the constraint.

Since SOLID was originally developed as a tool for people working in Object Oriented Programming (OOP) we can’t overlook the original intent. If we want to apply this principle in an OO context, we need something to work with as a foundation. Let’s pick something easy that everyone can relate to from their grade-school years. Let’s have a look at a basic rectangle object, which defines the shape and has a function for getting the area and the perimeter.
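
Something like this:

```javascript
function Rectangle(width, height) {
    this.width = width;
    this.height = height;
}

Rectangle.prototype.getArea = function () {
    return this.width * this.height;
};

Rectangle.prototype.getPerimeter = function () {
    return 2 * (this.width + this.height);
};
```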

For as unexciting as this is, rectangles have a special form we all know and love: squares. Suppose we created a rectangle class and we’re using it throughout our code. One day the product owner says we need certain sections of the code to only support squares. We could, theoretically, modify this code to handle rectangles and their regular form, squares, but that violates the single responsibility and open/closed principles. This is a perfect opportunity to subclass and extend our original object.
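
A sketch of that subclass:

```javascript
function Square(width, height) {
    // The only addition: an assurance that the sides are equal
    if (width !== height) {
        throw new Error('Height and width of a square must be equal.');
    }

    Rectangle.call(this, width, height);
}

Square.prototype = Object.create(Rectangle.prototype);
Square.prototype.constructor = Square;
```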

What makes our square class interesting is, the only addition to the original class we made is a check to make sure the height and width are equal. Rectangle does everything we need without modification, aside from adding an assurance that business requirements are met. Another item of note is, we ensured the original contract was enforced, so anything using the Rectangle class would be able to use the Square class without modification.

When we take time to carefully extend existing functionality there is a little bit of magic that happens – we end up writing less code! If we had created a rectangle and then created a square, independent of the rectangle definition, we would have created duplicate code. If, for some reason, we needed to add something to Rectangle, we would have to go track down Square and add it there too. I don’t know about you, but I hate doing double duty.

Being a good programmer is a combination of being diligent and lazy

By front-loading the effort of good, well-written code, you get the benefit of doing less work later when things change. I will trade ease of upkeep for a little work now any day. I have seen code bases that weren’t crafted carefully and, as a result, required a tremendous amount of work to make even minor changes. In the foolishness of my youth, I might have even created code like that.

Coming back around to something Uncle Bob wrote in his article, expecting that all code will be closed to modification is generally unreasonable. Sometimes things change in a way that the original code no longer works as desired at all. Due to the nature of changing business requirements, it’s important to keep in mind that closure should be applied carefully and appropriately.

Since this is about Javascript, it’s easy to point to view-related code. Views can change on a whim and whole swaths of code may be rendered obsolete. This kind of sweeping change makes it important to clearly separate and define your business rules from the way information is presented to the user. Business rules seldom change. If someone wanted to write an accounting app, the view layer could change to enable easier entry. Common accounting principles like GAAP are safe, and wise, to consider set in stone.

The Open/Closed Principle is designed as a tool to support good programming practice. By following the ideas presented, it becomes much easier to write an application that supports business rules and allows developers to safely and correctly enhance behavior as the business continues to grow and change over time. Consider your code as you write and ask yourself if you can define your work in extensible pieces which can support good development in the future without over-developing now.
