Javascript: Require and Import Found Harmful

For the moment, let’s go ahead and make an assumption: automated tests (unit tests, integration tests, etc.) make code safer to write, update and change. Even a breaking test tells us something about how the code was written and how its pieces fit together at runtime. I will address this concern in another post at a later date. Nevertheless, I am going to rely on this assumption throughout this post, so if you disagree with it, you might be better served to drop out now.

Now that the grumpy anti-testers are gone, let’s talk, just you and I.

I don’t actually believe that require or import (from the freshly minted ES module system) is inherently bad; somewhere, someone needs to be in charge of loading stuff from the filesystem, after all. Nevertheless, require and import tie us directly to the filesystem, which makes our code brittle and tightly coupled. This coupling makes all of the following things harder to accomplish:

  • Module Isolation
  • Extracting Dependencies
  • Moving Files
  • Testing
  • Creating Test Doubles
  • General Project Refactoring

The Setup

Let’s take a look at an example which will probably make things clearer:

To get a sense of what we have to do to isolate this code, let’s talk about a very popular library for introducing test doubles into Node tests: Mockery. This package manipulates the Node module cache, inserting a module into the runtime to break dependencies for a module. Particularly worrisome is the fact that you must copy the path for your module dependencies into your test, tightening the noose and deeply seating this dependence on the actual filesystem.

When we try to test this, we either have to use Mockery to jam fakes into the node module cache or we actually have to interact directly with the external systems: the filesystem, and the external logging system. I would lean — and have leaned — toward using Mockery, but it leads us down another dangerous road: what happens if the dependencies change location? Now we are interacting with the live system whether we want to or not.

This actually happened on a project I was on. At one point all of our tests were real unit tests: that is, they tested only the local unit we cared about. Then something moved, a module changed, and all of a sudden we were interacting with real databases and cloud services. Our tests slowed to a crawl and we noticed unexpected spikes on systems which should have been under relatively low load.

Mind you, this is not an indictment of test tooling. Mockery is great at what it does. Instead, the tool highlights the pitfalls built into the system. I offer an alternative question: is there a better tool we could build which breaks the localized dependence on the filesystem altogether?

It’s worthwhile to consider a couple design patterns which could lead us away from the filesystem and toward something which could fully decouple our code: Inversion of Control (of SOLID fame) and the Factory pattern.

Breaking it Down

To get a sense of how the factory pattern helps us, let’s isolate our modules and see what it looks like when we break all the pieces up.
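As a sketch of that factory-style breakup (the names here are invented stand-ins, not the post's actual code), each module becomes a factory that only names its collaborators:

```javascript
// A factory receives its dependencies as arguments instead of
// require-ing them, so nothing here knows where anything lives on disk.
function recordSaverFactory(writeFile, log) {
    return {
        saveRecord: function (target, record) {
            writeFile(target, JSON.stringify(record));
            log('Wrote record to ' + target);
        }
    };
}

// No require calls remain in the module body at all.
module.exports = recordSaverFactory;
```

Swapping in a fake writer or logger is now just a matter of passing different arguments.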

With this refactoring, some really nice things happen: our abstractions are cleaner, our code becomes more declarative, and all of the explicit module references simply disappear. When modules no longer need to be concerned with the filesystem, we gain much more freedom to move files around and decouple concerns. Of course, it’s unclear who is actually in charge of loading the files into memory…

Whether it be in your tests or in your production code, the ideal solution would be some sort of filesystem aware module which knows what name is associated with which module. The classic name for something like this is either a Dependency Injection (DI) system or an Inversion of Control (IoC) container.

My team has been using the Dject library to manage our dependencies. Dject abstracts away all of the filesystem concerns which allows us to write code exactly how we have it above. Here’s what the configuration would look like:
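To give a feel for the idea, here is a purely illustrative configuration object; the option names below are guesses at the shape of a Dject-style configuration, not Dject's documented API.

```javascript
// Illustrative only: tells a hypothetical container where application
// modules live, so individual modules never mention paths themselves.
module.exports = {
    cwd: './app',
    modulePaths: ['factories', 'services', 'util'],
    allowNodeModules: true
};
```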

Module Loading With Our Container

Now our main application file can load dependencies with a container and everything can be loosely referenced by name alone. If we only use our main module for loading core application modules, it allows us to isolate our entire application module structure!
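Dject's real API differs, so treat every name below as an assumption; this tiny hand-rolled container only illustrates what "loosely referenced by name alone" means.

```javascript
// A minimal dependency-injection container sketch.
function buildContainer() {
    const registry = {};

    return {
        register: function (name, factory) {
            registry[name] = factory;
        },
        build: function (name) {
            const factory = registry[name];
            // Resolve declared dependencies by name, recursively
            const dependencies = (factory.dependencies || []).map(this.build, this);
            return factory.apply(null, dependencies);
        }
    };
}

// Application modules reference each other by name alone
function loggerFactory() {
    return { log: function () { } };
}

function appFactory(logger) {
    return {
        run: function () {
            logger.log('starting');
            return 'started';
        }
    };
}
appFactory.dependencies = ['logger'];

const container = buildContainer();
container.register('logger', loggerFactory);
container.register('app', appFactory);

const app = container.build('app');
```

The main file is the only place that knows how modules are wired together; everything else just declares names.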

Containers, Tests and A Better Life

Let’s have a look at what a test might look like using the same application modules. A few things will jump out. First, faking system modules becomes a trivial affair. Since everything is declared by name, we don’t have to worry about the module cache. In the same vein, any of our application internals are also easy to fake. Since we don’t have to worry about file paths and file-relative references, simply reorganizing our files doesn’t impact our tests which continue to be valid and useful. Lastly, our module entry point location is also managed externally, so we don’t have to run around updating tests if the module under test moves. Who really wants to test whether the node file module loading system works in their application tests?
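As a hedged sketch of that testing story (the registry and the module under test are invented stand-ins), faking a dependency is just registering a different value under the same name:

```javascript
// A toy name-based registry standing in for a real container.
const registry = {};

function register(name, value) { registry[name] = value; }
function resolve(name) { return registry[name]; }

// Module under test: written purely against names, no file paths
function makeGreeter(logger) {
    return function greet(name) {
        logger.log('greeting ' + name);
        return 'Hello, ' + name + '!';
    };
}

// The test registers a fake logger by name; the Node module cache is
// never touched and no paths need to be copied anywhere.
const logged = [];
register('logger', { log: function (message) { logged.push(message); } });

const greet = makeGreeter(resolve('logger'));
```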

Wrapping it All Up

With all of the overhead around filesystem management removed, it becomes much easier to think about what our code is doing in isolation. This means our application is far easier to manage and our tests are easier to write and maintain. Now, isn’t that really what we all want in the end?

For examples of full applications written using DI/IoC and Dject in specific, I encourage you to check out the code for JS Refactor (the application that birthed Dject in the first place) and Stubcontractor (a test helper for automatically generating fakes).

Typed Thinking in Javascript

Javascript is a dynamically typed language. I suspect this is not news to anyone reading this. The reason this is important, however, is that I have heard people say Javascript is untyped. This statement is most definitely false. Javascript has and supports types; it simply does not actively expose this to the world at large.

Javascript is in good company when it comes to dynamically typed languages. Python and Ruby are also popular languages which are dynamically typed. Other venerable languages which are dynamically typed include Clojure, Elixir, Io and Racket. People coming from statically typed languages often say that Javascript’s dynamic typing is a hindrance to good programming. I disagree. Bad programming is a hindrance to good programming. I feel programmers coming from the languages listed above would probably agree.

What’s the Difference?

Several popular languages today, including C#, Java and C++, are statically typed. This means the programmer declares the types of the values they plan on using to accomplish a task when they define a method. There are distinct benefits to this kind of programming; specifically, the compiler can quickly determine whether a method call is valid. This kind of validation is useful and can prove a good tool for programmers, no doubt.

In a statically typed language like Java, everything is explicitly annotated with a type definition. This kind of annotation is effectively a note to anyone who reads the code, including the compiler, that the function behaves in a particular way. Unfortunately, this convenience comes with a price. Suppose you wanted an add function for any sort of number, including mixed arguments…

Modern improvements to type systems, such as generics, have helped with this problem (don’t shoot me, Java people), but it becomes obvious rather quickly that restricted type flexibility means a lot more work must be done to accomplish a seemingly simple task. We have to make a trade to get this compile-time help from the language.

Dynamic typing, on the other hand, does not have this restriction. In Javascript (or Python, Clojure, etc.) no type annotation is needed. Instead, types are determined at runtime and the language attempts to do the right thing. Languages like Python or Clojure are less forgiving when types don’t line up correctly. If, for instance, you attempted to add a number and an array in either of those languages, an error would occur and everything would go downhill from there.

Javascript works a little harder to do the right thing; perhaps a little too hard. In a strange twist of fate, I once attempted to demonstrate that Javascript would throw an error when trying to add a string and a function. Instead I got a string containing the original string and the source code of the function. Suffice it to say, this is not what I expected.

Nevertheless, this kind of type management is both the weakness and the strength of a dynamically typed language. Rather than having to spend time really thinking about strings, ints, doubles, bools and so on, you can spend more time thinking about the way your program works…

Until it doesn’t.

Correctness and Types in a Dynamic World

One of the most important things to consider in Javascript is intent. Although the strange things that can be accomplished by applying common actions to unexpected values can be entertaining, they are not particularly helpful when attempting to write a correct program.

Correctness in programming is when a program performs the expected action and, within the domain of acceptable values, returns the correct output. In other words, an adder would be incorrect if it always returned 9, regardless of the input; an adder which always returned a valid sum would be considered correct.

By considering correctness, we must consider input and output types. Let’s keep using our add function because it’s easy to understand. Above, when we discussed type annotations, we looked at an add function in Java. We said that the input values a and b were both integers and the output is an integer. This forces the idea of correctness upon our function which, actually, could be defined as correct in a broader sense. What if, instead of declaring all of the different types and overloading the function again and again, we made up a new type? Let’s call this type Addable. Suppose we had an Addable type in Java and rewrote our function accordingly.

We can actually define a notation which will help us to understand the correct input/output values of our function. We can say add has a function signature which looks like this: Addable, Addable => Addable. In other words, our function takes two Addable values and returns a new, Addable, value. All of this is true and we could test this function via various methods to prove the specific addition behavior is correct.

This new Addable type is effectively what we get in Javascript with the type “number.” We know that any number can be added to any number, so whether one number is an integer and another is a floating point value, we know they can still be added together. This means we can actually go so far as to eliminate the type annotations altogether and simply write our function as follows:
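A minimal sketch of that annotation-free version:

```javascript
// No annotations needed: any two "Addable" numbers work,
// whether integer or floating point.
function add(a, b) {
    return a + b;
}
```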

Of course, the problem we face here is there is no annotation to tell the next programmer what types a and b should be. Moreover, Javascript is quite forgiving and will allow a programmer to pass anything in which might be usable with a “+” operator. There are solutions to each of these, though we will only look at solutions for telling the next programmer what we intended.

Ad Hoc Properties to the Rescue

Under the hood, Javascript shares some really interesting characteristics with Smalltalk. Specifically, nearly everything in Javascript, as managed within the runtime, behaves like an object. This means we can do all kinds of neat things with functions, like assigning properties.

What this means is we can actually do something real about making our programming intentions more clear. What if we took our add function and assigned an ad hoc property called “signature” to the function object instance? By creating a property which declares what the function should do, we get two benefits. First, anyone reading the source can immediately see what we meant to do and, second, we create an artifact in our code which can be called upon elsewhere to get immediate feedback on what our behavior should look like. Here’s an example:
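A minimal sketch of the idea (the signature notation follows the Addable discussion above):

```javascript
function add(a, b) {
    return a + b;
}

// Functions are objects, so we can hang our intent right on them
add.signature = 'number, number => number';
```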

Now, looking at our code we can see what add does. It takes two numbers and returns a number. We can use this same property to our advantage elsewhere in our code. If we were planning to use add and wanted to see what the expected input and output are, we can simply output the signature. Here’s how we could do that:
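A quick sketch of reading the signature back (add is redefined here so the snippet stands alone):

```javascript
function add(a, b) {
    return a + b;
}
add.signature = 'number, number => number';

// Ask the function itself what its contract is
console.log(add.signature); // number, number => number
```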

Now we know! Better yet, if add was somewhere deep in a third-party library, we wouldn’t have to dig through third-party code to understand what the contract for add might be.

Thinking Types

The really important idea here is, even if they aren’t expressed in code, types live within everything we do in Javascript. As we develop software, it becomes really easy to simply not think about what a function signature looks like and call it with whatever we have, hoping it does what we expect.

Programming this way is dangerous and can lead to bugs which are hard to triage and fix. Instead of using the spray-and-pray approach, it is helpful to understand, more fully, what you intend to do and to work with the types a function is intended to accept and return.

What this means to the dynamic programmer is, we have to be more vigilant, more cautious and more prepared while solving a problem than someone working with a statically typed, explicitly annotated language.

There are two ideas we must always keep in mind when programming: the goal of a correct program and what we must do to get there. The first idea ties to the business goal behind whatever problem we are actually trying to solve. The second idea encompasses types and actions almost exclusively.

Summary

Regardless of the typing mechanism of the language with which we solve a problem, types are part of the solution. Javascript does not express value and function types explicitly in the source code, but the types we use are every bit as important as they are in a statically typed language.

We can solve the problem of expressing our function signature by using comments or by adding a property which can be read and understood by other programmers. This will help alleviate the challenges which arise from misunderstanding source code.

Finally, as we work we must always be aware of the types we are interacting with and how they lead to the solution for whichever problem we are solving at the time. Instead of throwing things at the wall and seeing what sticks, let’s work carefully and with intent to write correct, valid programs.

P.S. If you don’t want to remember all of the metadata stuff you have to do, check out signet.

Objects Are Still Shared State

Dear programmers coming from Classical Object Oriented programming, please stop thinking that encapsulation of variables eliminates the “globalness” of your variable. It’s a hard truth, but you had to hear it from someone; you have a problem. Consider this an intervention.

I had a conversation a couple months ago where I looked at some code a senior developer had written and asked, “why are you using a global variable?” The response I got was “it’s the exposing module pattern, so it’s local and encapsulated. It’s not global.” The variable was a cache object exposed outside of the module, and it was effectively global anyway.

When I say global, it is not about whether the entire program, or the entire world, can access your value, it’s about how your variable gets managed and modified. Really, the problematic aspect of a global variable comes from the fact that global variables, in many popular languages, represent shared, mutable state.

Consider a world where every variable is actually immutable, i.e. once you create a variable, you can’t change the value. In this particular case, a global variable is really nothing more than a globally readable value. You can’t write to it, so you can’t impact the rest of the running program. Is that global variable actually a problem? Decidedly less so, that’s for sure.

Mutating Object State

Let’s take a look at a very simple, though rather common, example of the way variables are often managed inside objects.
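A minimal sketch of that pattern (the counter is an invented stand-in): the state is fully exposed, and the "getter" quietly mutates it, returning the previous state.

```javascript
const counter = {
    value: 0,
    get: function () {
        // Mutates internal state, then reports the *previous* state
        this.value += 1;
        return this.value - 1;
    }
};
```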

There are two things wrong with this if value is actually important to the internal state of the object. First, since Javascript does not explicitly support private variables (we’ll come back to that), this object suffers from the Indecent Exposure code smell. Essentially, anyone in the world can directly access and modify its internal state. That’s bad news.

The second issue with this object is the getter actually modifies the internal value of our object and returns a representation of the previous object state. Effectively, our getter is modifying the internal state of the object and lying to us about it.

Before you proclaim “I never do that! How very dare you,” keep in mind that this pattern shows up all the time. Popular frameworks like Angular and Ember actually encourage this kind of thing through the controller pattern. This is a sneaky trap that is hard to avoid.

Although we can’t quickly resolve the code smell, let’s take a look at a remedy for the lie that is our “get” method name.
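A sketch of that renaming fix (getAndUpdate is my invented name for the honest version):

```javascript
const counter = {
    value: 0,
    // Same smelly behavior, but the name now tells the truth
    getAndUpdate: function () {
        this.value += 1;
        return this.value - 1;
    }
};
```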

Now we understand and declare what the method does. For some people this is enough and we need to go no further. I, on the other hand, feel this is still rather suspect and would prefer to see a cleaner, more elegant construction.

Separate The Activity

The issue I take with our updated object is that we have one method which does all the things. This is a really bad idea since it doesn’t protect the programmer from a micro-god function. (Hey, you can have micro-frameworks and micro-services.) Effectively we have fixed the naming problem, but we haven’t actually resolved the smelly code which lives within our method.

Typically I prefer a single function which returns the current state of affairs and another function, if you MUST, which modifies the internal state. This kind of separation of concerns actually helps to keep object state sane and useful. If not for the exposed internal value of the object, we would be on our way to saner code.
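As a sketch of that separation (again with invented names), reading and writing become distinct concerns:

```javascript
const counter = {
    value: 0,
    // Read-only view of the current state
    get: function () {
        return this.value;
    },
    // The only method that changes anything
    update: function () {
        this.value += 1;
    }
};
```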

We can see this code actually separates the functionality and has the lovely side effect of making the code more readable. If I were working in a project using an MVC paradigm, I would call this good and move on. We have separated the behaviors and tried to keep everything clean, tidy and meaningful. Our view would be able to access the values it needs and we keep our state management safe from accidental update.

Turn Up The Encapsulation

From here we can start looking at working on our fine detail. Up to now, we have accepted that our internal values are exposed and available for the world to manipulate, AKA Indecent Exposure. It’s time to fix that little bit of nastiness and make our object water- and tamper-proof.

The only way to actually protect a variable from external access in Javascript is through closures. Since functions are objects and objects are built atop function constructors, we can perform a little scope management surgery and make our object really safe and secure. Let’s take a look and see what we can do to lock things down.
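One way to do that locking-down (a sketch with invented names) is a factory whose returned methods close over the state:

```javascript
function buildCounter() {
    let value = 0; // unreachable from outside this closure

    return {
        get: function () {
            return value;
        },
        update: function () {
            value += 1;
        }
    };
}

const counter = buildCounter();
```

The outside world can call get and update, but the value itself is invisible and untouchable.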

This code does a little fiddling around with scope by closing over the object’s internal state, which protects our value from being accessed by the outside world while allowing our get and update methods to reach it freely. When your data must be locked away, this will get you there.

Our Code Goes to 11

In order to finish up this journey, it seemed only right to create a completely pure, immutable object just to see where it would lead us. If we were to really go all the way, we would need to do a little more work to ensure everything still worked as we would expect.

We know the variable “value” maintains a count for some reason, so it will be important to ensure value is always an integer. We also want to make sure the get method always gives the current count. Finally, update should do just that: update the count value. What does it mean to make an update if everything is immutable? Let’s have a look and find out.
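A sketch of one way the fully immutable version could look (the integer guard and names are my assumptions): update builds a brand new object instead of mutating anything.

```javascript
function buildCounter(initialValue) {
    // Keep value an integer no matter what we are handed
    const value = Number.isInteger(initialValue) ? initialValue : 0;

    return Object.freeze({
        get: function () {
            return value;
        },
        // "Updating" means constructing a new, frozen counter
        update: function () {
            return buildCounter(value + 1);
        }
    });
}
```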

This is just chock full of pure functions and added behavior. With all of that added behavior, we get something magical. Instead of having an object which is mutable and, ultimately, somewhat unpredictable and hard to test, we end up with an object which has the following properties:

  • Immutable
  • Contains pure methods
  • Has a single, pure, static method
  • Is compositionally built
  • Updates through new object construction

This whole object construction could lead us down many discussions which would get into types, values, mutability, function composition and more. For now, it will suffice to say, this kind of development creates the ideal situation for developing safely and really turns our code up to 11.

The numbers all go to 11.

Summing Up

Although we got a little spacey at the end, the important thing to take away from this whole thing is, any time an object is built and modifies its own state through method calls, the methods are actually relying on shared, mutable state.

Shared mutable state in an object really is just a micro-global and should be viewed as such. This means, any value which can be accessed and modified should be considered unsafe and untrustworthy. Untrustworthy data should never be viewed as the source of truth.

From here forward, if you start to add a variable to an object or module, ask yourself, does this really need to be global, or can I localize it? Perhaps you will find a better way to keep your code clean and easy to reason about.


Anonymous Functions: Extract and Name

It’s really great to see functional patterns become more accepted since they add a lot of really powerful tools to any programmer’s toolbox. Unfortunately, because functional programming was relegated primarily to the academic world for many years, there aren’t as many professional programmers who have developed a strong feel for good patterns and share them with more junior programmers. This is not to say there are none, but it is important to note that most programmers think of functional programming and say “it has map, filter and reduce; it’s functional.”

Though having those three higher-order functions does provide a functional flavor, it is more important that there are higher-order functions at all. With higher-order functions comes the use of anonymous functions. Anonymous functions (also known as lambda functions) provide a great facility for expressing singleton behavior inline. This kind of expressiveness is great when the function is small and does something unexciting, like basic arithmetic or testing with a predicate expression. The problem is anonymous functions introduce cognitive load very quickly, which makes them a liability when code gets long or complex.

Today I’d like to take a look at a common use of anonymous functions and how they can cause harm when used incorrectly. There are many times that anonymous functions are assigned directly to variables, which actually introduces one of the same issues we are going to deal with today, but I am not going to linger on that topic. Please consider this a more robust example of why even assigning anonymous functions to variables is dangerous.

Jumbled Anonymous Functions – Our First Contestant

In Javascript, people use promises; it’s a fact of life. Kris Kowal’s Q library is a common library to see used in a variety of codebases and it works pretty well. Now, when someone writes an async function, it’s common to return the promise so it can be “then’ed” against with appropriate behavior. The then function takes two arguments, a resolve state function and a reject state function. These basically translate into a success and error state. I’ve created a common promise scenario so we have something to refer to.
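Here is a stand-in with that shape; native promises take the place of Q for self-containment, and the data-access details are invented for illustration.

```javascript
// Two anonymous handlers, each duplicating the error logging
function getUserSummary(api) {
    return api.fetchUser().then(function (response) {
        return {
            name: response.data.user.name,
            visits: response.data.meta.visits
        };
    }, function (error) {
        console.log(error.message);
    }).then(function (user) {
        return user.name + ' (' + user.visits + ' visits)';
    }, function (error) {
        console.log(error.message);
    });
}
```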

Extract Method

The very first problem I see here is that we have two functions logging an error. This behavior is not DRY, which is a code smell and violates a commonly held best practice. There is a known refactoring for this kind of redundancy called “extract method,” or “extract function.” Technically we already have a function in place, so we can simply lift it and name it. This will reduce our footprint and make this code cleaner already. Let’s see what this would look like with our logging behavior extracted.
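Sketched against the same invented scenario, the duplicated logging lifts out into one named function:

```javascript
// The logging behavior, extracted and named once
function logError(error) {
    console.log(error.message);
}

function getUserSummary(api) {
    return api.fetchUser().then(function (response) {
        return {
            name: response.data.user.name,
            visits: response.data.meta.visits
        };
    }, logError).then(function (user) {
        return user.name + ' (' + user.visits + ' visits)';
    }, logError);
}
```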

With this simple extraction, we now know more about what our function does and our code has become more declarative. Although logError is a one-line function, the fact that it does exactly one thing makes it both easy to reason about and easy to test. We can inject a fake logger and capture the logging side effect, which gives us direct insight into what it does. Another benefit we get is that we can hoist this function further if need be, so we can reuse it across different modules or files.

Debugging Problems

Now we get to the real nitty gritty. We have two anonymous functions which do not explicitly tell us what they do. Instead, they just contain a bunch of code which reaches into an object. We run up against two different issues because of this. First, the lack of declarative code means the next person who looks at this, which might be you, will have to sit and stare at it to understand what is happening.

Another, bigger issue than immediate comprehension is debugging. Suppose we take this file and concatenate it with all of the other files in our project and then uglify the whole thing and deploy it out for use in someone’s browser. All of our code now lives on a single line and may not even have meaningful variable names anymore. Now, suppose one of the data objects comes back null. Our debugging error will contain something like “error at line 1:89726348976 cannot treat null as an object.”

This is bad, bad news. Now we have an error which we can’t easily identify or triage. One of the calls we are making no longer does what we think it does and it’s causing our code to break… somewhere. Whoops! We can actually use the same pattern we used for our error logging to extract our methods and make sense of the madness. Let’s take a look at what our refactoring would look like.
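Applied to the same invented scenario, the fully extracted version might look like this:

```javascript
function logError(error) {
    console.log(error.message);
}

function extractUserSummary(response) {
    return {
        name: response.data.user.name,
        visits: response.data.meta.visits
    };
}

function formatUserSummary(user) {
    return user.name + ' (' + user.visits + ' visits)';
}

// The promise chain now reads as a list of named steps
function getUserSummary(api) {
    return api.fetchUser()
        .then(extractUserSummary, logError)
        .then(formatUserSummary, logError);
}
```

Each step is independently testable, and every function has a name that will survive concatenation and minification in a stack trace.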

Now that we have lifted our last two functions out of our promise chain, everything makes a little more sense. Each of our behaviors is easy to reason about, we can test each function independently and all of our functions have a unique identifier in memory which saves us from the insidious debugger issue which can cost time and money.

There are other places we could go from here with our code to make it more fault tolerant, but that’s outside of the scope of this article. Instead, when you look at your code, see if you can easily understand what is going on. Look at it like you’ve never seen it before. How many anonymous functions are you using? How many different steps are crammed into a single function?

When you see this kind of muddy programming, think back on our reduction to simpler functions, avoid complex anonymous functions and think “extract and name.”


Commenting Code: Why, Not How

If you have written any amount of code in any language, you are likely aware of code comments or remarks and how they work. That isn’t really what I’m interested in here. I came across a discussion about an open source project which required that all code be left completely uncommented. People were surprised and alarmed that the project would require all code to be uncommented, and I tend to agree with them. This post addresses comment content. Hopefully, by the end, you will share my opinion that comments are important.

New programmers are often told to comment their code by instructors, but they aren’t told what the comments should contain. I remember my C language instructor chastising me for commenting all of my functions without regard to the importance of the function or the value of the comment. He said “there are too many comments, are you trying to make your code unreadable?” Others received feedback that they didn’t comment enough.

While we are on the topic of novice programmers, let’s take a look at a comment style I have seen in code written by inexperienced developers. It usually contains information about what the function does and how it does it. Although it is lovely to see the program explained in clear English, it is not particularly helpful since good code should be not only functional but illuminating.
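An invented example of that "how" style, with the mechanics narrated step by step:

```javascript
// Loop over the list of prices, add each one to a running total,
// then divide the total by the count and return the result.
function averagePrice(prices) {
    let total = 0;

    for (let i = 0; i < prices.length; i++) {
        total += prices[i];
    }

    return total / prices.length;
}
```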

From this description anyone with experience in the language could probably devise a body of code which would perform the actions listed in the comment. The problem is, I have no idea why this code exists. Code which exists for no other purpose than just to be written is not useful code. This could be dead code, or it could be a problem which could be solved another way. If this code does something which the surrounding context gives us no clue to, we would never understand the value, just the means.

Instead I would write a comment like the following:
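Using the same invented function, a "why" comment might read like this (the business scenario is made up for illustration):

```javascript
// The pricing service hands back one quote per region, but the UI
// displays a single blended figure, so we average the quotes here.
function averagePrice(prices) {
    let total = 0;

    for (let i = 0; i < prices.length; i++) {
        total += prices[i];
    }

    return total / prices.length;
}
```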

Now we understand the context for the function. We can not only see what the function does by the name alone, but the comment provides immediate context for its use. Even more than that, this comment is far less likely to be out of date by the next time someone looks at this bit of the code. Comments which detail the inner workings of a function are more likely to fall out of date as the life of the code gets longer. People may modify our function to behave differently; however, the context of the creation of the function is unlikely to become obsolete even with (anti-SOLID) modifications.

This brief discussion can be summed up by the phrase “comments should tell the programmer why something was done, not how.” I like to call this my “why, not how” principle. This helps to guide the comment writer’s hand when they feel the need to add a comment to the code. If your code needs explanation as to how something was accomplished, the code is likely too clever. On the other hand, sometimes obscure solutions are unavoidable, which means the function context may not exist within the codebase. This is precisely the why that should be added as a comment. For example, this:
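A small invented example of that kind of unavoidable cleverness, where the why lives in a comment:

```javascript
// Bit shift instead of Math.floor(value / 2): this runs in a tight
// loop and the arithmetic version showed up hot in profiling.
function halve(value) {
    return value >> 1;
}
```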

In Javascript there is another situation where comments are desirable. Although I believe JSDoc is a tool which was ported almost blindly from its source (JavaDoc), it is, hands down, the best tool for the job available in Javascript. While I don’t believe that every function, method, class and module in your code should carry JSDoc comments, it is useful to annotate functions which might be used in a variety of situations. Essentially, JSDoc is a good initial way to document slow-changing program APIs which others might want to use. Following is an example of JSDoc use for the same function.
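Continuing with the invented averaging example, the JSDoc version keeps the context comment as the description:

```javascript
/**
 * Averages the per-region quotes from the pricing service so the UI
 * can display a single blended figure.
 *
 * @param {number[]} prices - one quote per region
 * @returns {number} the blended display price
 */
function averagePrice(prices) {
    let total = 0;

    for (let i = 0; i < prices.length; i++) {
        total += prices[i];
    }

    return total / prices.length;
}
```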

As you can see, the context comment is still part of the description associated with our function. This contextual clue is still the best part of our comment. Hopefully this function will be slow to change, so the arguments and return values will remain consistent. Even if they don’t, however, our context clue will still provide important insight into why this function was created and what purpose it serves.

In the end, comments are not always necessary. They can be extra noise, or they can be misleading. As programs grow large and the context becomes unclear, comments become a life raft for programmers who are digging deep into old code to identify functionality which is important or no longer in use. If you choose to write comments, and I hope you do, think about WHY you wrote your code and let the code tell the next programmer HOW.

Bottlenecks and Micro-Performance

After my last blog, I got a response regarding functional programming and performance. This is actually a common theme when people talk about functional programming versus imperative programming. Before we move into the actual performance discussion, I will openly admit, there are often times when functional programming is slower, performance-wise, than imperative programming. I have never claimed otherwise, nor will I begin doing so today.

Now, let’s move away from the very specific case of functional versus imperative programming and take a look at application performance in general. It is common to look for performance bottlenecks in applications. An application slows down and we want to dig in and uncover the performance issue. This particular situation arose at my work just a few weeks ago.

We had a form which, when big enough, slowed the entire experience to a crawl. This was bad news for us as our customers were quite unhappy with the way our application was behaving. I couldn’t blame them. The experience was miserable.

My colleague and I started digging into the offending code. I discovered a few methods which were running in O(n^2) time and he discovered a seemingly innocuous call to perform an external validation. When we moved our search to the validation code, it became obvious this was the problem. The entire form was being revalidated multiple times for every single element on the screen.

I fixed the O(n^2) algorithm, reducing it to an O(n) execution, which made a visible difference, but the real win was decoupling the localized validation logic from the form logic. We estimated that, each time validation was triggered, validation work was being done in the neighborhood of 60,000 times.

This demonstrates the difference between micro-performance and macro-performance. My algorithm enhancement was a macro-performance fix when looking at the execution of those few lines in isolation, but, when looking at the application as a whole, it was really just micro-performance tuning. The real win came when a true macro-performance fix was implemented and our total iteration count was reduced from 60,000 to about 600. That kind of performance gain can be measured in orders of magnitude and saved the experience of our customers.

Jeff Atwood talks about micro-performance benchmarking as something that only matters when a bad choice is made. If a piece of code is not optimally performant, but it is only executed once, does it matter? Jeff and I agree, it doesn’t.

Let’s take a look at two different blocks of code:
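The original code blocks were not preserved here, but based on the names and analysis that follow, they likely resembled something like this sketch: both sum the even numbers in an array, one imperatively and one functionally.

```javascript
// Imperative: one pass over the array with an explicit accumulator.
function addEvensImperative(numbers) {
    let sum = 0;
    for (let i = 0; i < numbers.length; i++) {
        if (numbers[i] % 2 === 0) {
            sum += numbers[i];
        }
    }
    return sum;
}

// Functional: a filter pass followed by a reduce pass, so the list
// may be traversed twice in the worst case.
function addEvensFunctional(numbers) {
    return numbers
        .filter(value => value % 2 === 0)
        .reduce((sum, value) => sum + value, 0);
}
```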

Clearly, addEvensImperative and addEvensFunctional produce the same output. If we look at the behavior with regard to constants, addEvensImperative loops over the array once, so we can say it has a characteristic function something like 1n + c_0. Meanwhile, addEvensFunctional actually loops through the entire list twice in the pathological (worst) case, so we can estimate its characteristic function to look something like 2n + c_1. This means, each time the functional behavior is called, the pathological case will run half as fast as the imperative call.

Let’s take a look at this using big-O notation. In big-O notation, the efficiency of the algorithm is reduced to the highest-power term in the approximate function. This means, all constants are discarded as well as coefficients. When we annotate our functions the imperative function performance is O(n) and the functional function performance is O(n) as well.

What this means is both of these functions have a linear growth behavior. The more values we have in our list, the longer each of these take to complete. Typically what we will see is total execution time measured in microseconds or a few milliseconds. Even large arrays of numbers can be iterated over very very quickly, so, even though the functional behavior is half as fast, the overall performance characteristic loss is negligible.

This is the pitfall of micro-optimization. Even though the perceived performance doesn’t change drastically for the user, it is fussed over because it’s easy to see and easy to measure. The problem is, there is a large blind spot that opens up when thinking about managing efficiency.

There is a phrase, “what gets measured gets managed.” When everything important is measurable, and is properly and fully measured, this is valuable. The problem is, by measuring only micro-optimizations, we can fail to see the forest for the trees. We end up with accurate measurements of parts of a system, but we haven’t actually accounted for the system as a whole.

It’s common to talk about early optimization both in positive and negative light (depending on where you stand). I tend to work on the task at hand, and measure when the work is done. This means, if one function is not fully optimized, but the overall performance is as good or better than business requirements dictate, there is no need to optimize further. If, on the other hand, the system is under-performing and slow, I will look for new optimizations.

The important thing to note, however, is I prefer to optimize from the top down. If a system is slow, it is highly unlikely that the culprit is a single function which is slower than it could be. More likely, we have a hidden optimization problem which takes an O(n) function and loops over it, creating O(n^2) behavior, then loops again, creating O(n^3) behavior, and so on.

This hidden growth characteristic is precisely what bit us in our slow-validating application. The reason my O(n^2) to O(n) optimization only gave us a small win is that we had another set of functions creating an algorithm performing at O(n^5) or so. Once we converted that behavior to an O(n) algorithm, the app sped up significantly.

In the end, the difference between a function performing at 2n versus n is significantly less critical than a system of functions performing at n^5 instead of 5n. Micro-performance benchmarks are important in systems like robotics, neural nets and so on. Common application behaviors, like working with small, finite sets of data, typically benefit most from macro-optimizations, so fix those problems first, then measure again. Ultimately, it is far more important to fix the bottleneck and leave the non-micro-optimized code readable and simple.

Testing Notes

Running a test on my local computer, I found no measurable difference in performance between the two functions until the list of numbers reached 10,000 values, at which point the difference appears to be less than 1ms (0ms vs 1ms). At 10,000,000 values the difference had grown significantly, to about 850ms. Please don’t make your users process 10,000,000 numbers in their browser.

Here is my test script:
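The original script was not preserved, so here is a sketch in the same spirit: time each implementation over increasingly large arrays (the timing helper and array sizes are my own choices, not the author’s).

```javascript
function addEvensImperative(numbers) {
    let sum = 0;
    for (let i = 0; i < numbers.length; i++) {
        if (numbers[i] % 2 === 0) {
            sum += numbers[i];
        }
    }
    return sum;
}

function addEvensFunctional(numbers) {
    return numbers
        .filter(value => value % 2 === 0)
        .reduce((sum, value) => sum + value, 0);
}

// Run a function against a list of values and report wall-clock time.
function timeIt(label, fn, values) {
    const start = Date.now();
    const result = fn(values);
    console.log(label + ': ' + (Date.now() - start) + 'ms (result: ' + result + ')');
}

[10000, 100000, 1000000].forEach(function (size) {
    const values = [];
    for (let i = 0; i < size; i++) {
        values.push(i);
    }
    timeIt('imperative n=' + size, addEvensImperative, values);
    timeIt('functional n=' + size, addEvensFunctional, values);
});
```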

Output:

This demonstrates that, even with a large number of arithmetic operations, my computer (an older MacBook Pro) can still push out better than 10,000,000 functional operations* a second. Arguably, in any software system processing fewer than 100,000 values at a time, the difference in performance when running each function once would be imperceptible to the end user.

* Like megaFLOPS, but for function calls: call it 10 megaFOPS (million functional operations per second).

Leveling Up With Reduce

It was pointed out to me the other day that I suffer from the curse of knowledge. What this basically means is, I know something so I don’t understand what it means to NOT know that thing. This can happen in any aspect of life and it’s common for people, especially software developers, to experience this. Many of us have been working with computers in some way or another for most or all of our lives. This means, when we talk to people who haven’t shared our experiences, we don’t understand their position, i.e. we talk and they don’t have any clue what we are saying.

Within various programming communities this can also happen when more experienced developers talk to developers who are still learning and growing. The experienced developer says something they think is imparting great wisdom and knowledge on the person they are talking with, meanwhile the inexperienced developer is baffled and lost.

Functional programming has become one of these dividing lines in the community. There are people who have dug in deep and have an understanding of the paradigm which they then have trouble conveying to people who haven’t had the same experiences. Ultimately the message falls on deaf ears.

One of the least understood, but, possibly, easiest to comprehend concepts is reduce. We perform reductions every day. We reduce lists of values to sums. We reduce records down to a single selected record based on user preferences or our need. Programming and reduction really go hand in hand.

To come to grips with the kinds of behavior we’re talking about, let’s have a look at some common patterns programmers use in their day to day development. The following block of code contains functions for taking the sum of an array, finding a maximum number and filtering an array of integers. If you have written loops, conditionals and functions before, these will probably be completely unsurprising.
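The original block was not preserved, but given the description, the three functions were likely close to this sketch (the exact names are my own):

```javascript
// Sum every number in the array.
function sum(numbers) {
    let total = 0;
    for (let i = 0; i < numbers.length; i++) {
        total += numbers[i];
    }
    return total;
}

// Find the largest number in the array.
function max(numbers) {
    let largest = numbers[0];
    for (let i = 1; i < numbers.length; i++) {
        if (numbers[i] > largest) {
            largest = numbers[i];
        }
    }
    return largest;
}

// Keep only the even integers.
function filterEvens(numbers) {
    const evens = [];
    for (let i = 0; i < numbers.length; i++) {
        if (numbers[i] % 2 === 0) {
            evens.push(numbers[i]);
        }
    }
    return evens;
}
```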

These functions are written in an imperative style, and express every minute detail of the reduction process. We start with some sort of accumulator, whether it’s an array or a number; this variable is meant to capture the outcome as we move through the iteration. We iterate over the array, performing some action at each step, then returning the result at the end.

These functions aren’t beautiful, but they are effective and predictable. For many readers, this pattern feels warm and cozy like a winter blanket. The problem we run into is, this methodology is really verbose and bloats the code. It also introduces a lot of noise. Do we really care about the inner workings of the iteration process or do we merely care about the output of our functions?

Let’s take a couple of examples from our initial three functions and rewrite them. It has been said that any recursive algorithm may be rewritten as an iterative loop. I have no evidence to support the converse, but I can say, with certainty, that we can rewrite all of these as recursive functions.

Just to catch everyone up, recursion is when a function calls itself internally to perform an iterative operation. We discussed recursion relatively recently in a different post. Essentially what we are going to do is put more focus on what happens in each step of the iteration, and make the iteration process less prominent in our code. Let’s take a look at a recursive strategy for sum and max behaviors.
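A sketch of what those recursive versions might look like. Matching the discussion that follows, these are deliberately destructive, consuming the array with pop; callers can pass `numbers.slice()` to protect the original.

```javascript
// Recursive sum: take one value off the end, add it to the sum of the rest.
function sumRecursive(numbers) {
    return numbers.length === 0
        ? 0
        : numbers.pop() + sumRecursive(numbers);
}

// Recursive max: compare one value against the max of the rest.
function maxRecursive(numbers) {
    if (numbers.length === 1) {
        return numbers.pop();
    }
    return Math.max(numbers.pop(), maxRecursive(numbers));
}
```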

An interesting point to note is, these functions are actually destructive in nature. We could have written them in a way that is not destructive, however it would have added complexity we don’t need to dig through at the moment. Instead, we can slice the array we are sending in to ensure the original array is not modified by the pop behavior.

Each of these recursive algorithms does something very similar. They highlight a single step in the process, allowing the programmer to focus on the immediate problem of reducing no more than two values at a time. This allows us to identify the real behavior we are interested in.

Recursion, of course, leaves us in a position where we have to identify a stopping condition, which was more obvious in the original, imperative, code. Nonetheless, if we choose to halt the process on the occurrence of an empty array, we can just replicate the behavior without needing to put too much extra thought in.

When we review these recursive functions, it becomes apparent the main difference is the accumulation versus comparison behavior. Without too much work, we can strip out this unique behavior and create a generic recursion function which accepts a behavior parameter as part of its argument list. Although this makes our recursion function fairly abstract, and possibly a little harder to read, it reduces the load when we start thinking about what we want to do. The recursion function can disappear as a referentially transparent black box function.

This level of abstraction allows the implementation details of our recursion to be safely separated from the details of our immediate functional need. Functions of this type, which take functions as arguments, are called higher-order functions. Higher order functions are commonly highly-abstract and can lead down a rabbit hole known as generic programming. Let’s not go there today, instead let’s cut to the chase and see our abstraction!
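One plausible shape for that abstraction: the unique two-value behavior becomes a function argument, and the recursion machinery disappears behind it (the names here are illustrative).

```javascript
// Generic recursion: fold a two-value behavior over the array,
// consuming values from the end with pop.
function genericRecursor(behavior, accumulator, values) {
    if (values.length === 0) {
        return accumulator;
    }
    return genericRecursor(behavior, behavior(accumulator, values.pop()), values);
}

// The unique behaviors shrink to single expressions.
const add = (a, b) => a + b;
const greater = (a, b) => (a > b ? a : b);
```

Usage would look like `genericRecursor(add, 0, numbers.slice())` for a sum, or `genericRecursor(greater, -Infinity, numbers.slice())` for a max.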

This generic recursion is actually the final step toward our goal, the reduce function. Technically, our generic recursor, given the way it behaves, will perform a right-reduction, but that is more than we need to bite off at the moment. We could easily rename genericRecursor to rightReduce and we would truly have a reduction function. The problem we would encounter is, our function is backwards! If we really want to replicate the behavior from our original function we need to make a small modification. Let’s rewrite our genericRecursor as a first, and final, hand-built reduce function.
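A sketch of that hand-built reduce: the recursor renamed, with pop swapped for shift so values are consumed left to right.

```javascript
// Hand-rolled left reduce: fold a two-value behavior over the array,
// consuming values from the front with shift.
function reduce(behavior, accumulator, values) {
    if (values.length === 0) {
        return accumulator;
    }
    return reduce(behavior, behavior(accumulator, values.shift()), values);
}
```

With an order-sensitive behavior like subtraction, the left-to-right consumption becomes visible: `reduce((a, b) => a - b, 0, [1, 2, 3])` folds as ((0 - 1) - 2) - 3.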

The two key changes we made were renaming and changing from pop to shift. Shift is notoriously slower than pop, so this function is useful for illustration, but it lacks characteristics we would like to see in a production-ready reduce function. Instead, let’s jump from our hand-rolled reduce function to the Javascript native implementation.

Javascript’s native implementation really is a black box function if you are working only from the Javascript side. Implemented in C++, reduce works only on arrays, and has a couple of shortcomings we won’t address here. Nevertheless, the native reduce is key to leveling up your fluent Javascript skills, and is a valuable tool for reducing cognitive load and SLOC bloat. Let’s take a look at a couple of examples of using reduce.

If we return to our original filtering function, we can easily replicate the behavior using reduce. We will also introduce a mapping function. Reduce is so incredibly flexible we can actually accomplish many of the iterative tasks we do every day. The primary pitfall of using reduce for all of the iterative tasks is we will begin to introduce bloat again as we replicate more generic behavior. We won’t dig into the details today. Instead, let’s take a look at some of the power we get from reduce as a tool. It’s kind of the electric drill of the programming world: many uses, all of which save time and energy better spent elsewhere.
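A sketch of both behaviors rebuilt on the native Array.prototype.reduce; the function names are illustrative.

```javascript
// Filtering via reduce: accumulate only the values that pass the test.
function filterEvens(numbers) {
    return numbers.reduce(function (evens, value) {
        if (value % 2 === 0) {
            evens.push(value);
        }
        return evens;
    }, []);
}

// Mapping via reduce: accumulate a transformed copy of every value.
function double(numbers) {
    return numbers.reduce(function (doubled, value) {
        doubled.push(value * 2);
        return doubled;
    }, []);
}
```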

This discussion is merely the tip of the iceberg, but it exposes the kind of work which can be done with reduce and the energy we can save by using it more often. For as frequently as complex data types like arrays and objects appear in our code, it only makes sense to work smarter and faster. With the power that comes from first class functions and higher-order functions, we can accomplish large amounts of work with small, but highly declarative behaviors.

As you look at your code, try to spot places where behaviors are repeated and the real focus should be on the data you are working with. Perhaps reduce is an appropriate solution. You might even be able to use it in an interview. I leave you with FizzBuzz performed using reduce.
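One way to express FizzBuzz as a reduction (a sketch, not necessarily the author’s version): fold the numbers 1 through n into an array of output strings.

```javascript
// Classic FizzBuzz rules for a single number.
function fizzBuzzValue(n) {
    if (n % 15 === 0) { return 'FizzBuzz'; }
    if (n % 3 === 0) { return 'Fizz'; }
    if (n % 5 === 0) { return 'Buzz'; }
    return String(n);
}

// Reduce the sequence 1..count into the list of FizzBuzz outputs.
function fizzBuzz(count) {
    const numbers = [];
    for (let i = 1; i <= count; i++) {
        numbers.push(i);
    }
    return numbers.reduce(function (output, n) {
        output.push(fizzBuzzValue(n));
        return output;
    }, []);
}

console.log(fizzBuzz(15).join('\n'));
```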


Callback Streams With Function Decoration

Regardless of whether you prefer raw callbacks or promises, there comes a time where asynchronous behavior pops up in your application. It’s an artifact of working on the web and working with Javascript. This means that, although a function was originally written to solve a particular problem, eventually that function may need to be extended. If we follow the open/closed principle, we should not modify the original function since it almost certainly still solves the original problem for which it was designed. What to do…

Function decoration through composition gives us a powerful way to enhance existing function behavior without modifying the original function. This provides guarantees that our program remains more stable for more use cases and only introduces changes in a surgical, requirements-driven way.

Let’s start off with a core request service. It’s important to note that this is written with certain assumptions being made, i.e. we have specific modules which are already defined and that we only care about the service because it is the foundation for our request construction. This service only does a single thing: it makes a get call to a predefined endpoint with a provided ID. It’s uninteresting, but it helps to illuminate how we are going to construct our function stack.
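A sketch of the kind of service described; the injected httpClient with a `get(url, callback)` method and the endpoint path are both assumptions.

```javascript
// Core request service: the single object-oriented piece, built with
// an injected HTTP client.
function RequestService(httpClient) {
    this.httpClient = httpClient;
}

// The one thing it does: a get call to a predefined endpoint with an ID.
RequestService.prototype.get = function (id, callback) {
    this.httpClient.get('/api/widgets/' + id, callback);
};
```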

This is the last piece of object oriented code we are going to look at in this post. We are going to assume from here forward that this service has been instantiated with the correct dependencies injected. Now, let’s create a basic function that uses an instance of our service to make a request. This would look like the following.
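A sketch of that business-layer wrapper; `service` stands in for the already-instantiated request service, assumed to expose a `get(id, callback)` method, and the function name is illustrative.

```javascript
// Business-layer wrapper: callers hand over an ID and a callback
// matching the error-first contract, and nothing else.
function getSomeData(service, id, callback) {
    service.get(id, callback);
}
```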

So far I know nothing about what the callback does, but that’s okay. This is a simple wrapper to handle our request in some business-layer service. The view-model or controller code will be able to blindly call this service with a request matching the contract.

Technically we have already exposed everything that needs to be known about callback streams, but it’s a little early to end the post, since there isn’t much to be gained here, yet. If all we did was wrap up our request in another function, the goodness isn’t going to be readily obvious to someone who is coming fresh to this concept. Let’s take a look at what a callback stream looks like as an image before we start really digging in.

Callback Decoration Diagram

The important thing to take away from our diagram is no one layer needs to know anything more than what is passed from the layer above. It is unimportant to understand what the layer above does or why. It is, however, very important to know how to respond to the callback that is passed in. This is why contracts become so important in decoration. If, at any point, we break a contract, our stream will break and our application will fail. Fortunately, this adheres to the same requirements as calling any other function, so we are not introducing any greater rule strictness than we had before.

So, back to our business-layer abstraction. Suppose something changed at the data layer and a property name in the JSON that is returned was changed. Although we would like to hope this would never happen, we live in the real world and things are never perfect. Fortunately our abstraction layer allows us to handle this gracefully, rather than having our entire application break because of a database or service change.

Here’s a transformation function.
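A sketch of such a transformation; the renamed property (display_name back to displayName) is purely illustrative.

```javascript
// Map the data layer's renamed property back to the name the rest of
// the application still expects.
function transformRecord(data) {
    return {
        id: data.id,
        displayName: data.display_name
    };
}
```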

You’ve probably already noticed our transformation function isn’t tied to our callback at all. That’s actually a good thing. This function is simple, but if it contained complex logic, it would be important to isolate it and unit test it appropriately. This function does exactly one thing and the declaration is clear. Since callback streams already introduce an abstraction layer, anything we can do at each layer to make the code clear and clean will make debugging easier.

Now, let’s take a look at an approach to handle transformation decoration. We will start off with a simple pattern and expand from there. If Josh Kerievsky taught us anything it’s that we should identify patterns as they appear in the code and refactor to them instead of doing extra, unnecessary work. Let’s write some code.
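A first pass at the decoration pattern might look like this sketch: wrap the original callback in a new function that applies the transformation before passing data along. The names and the error-first callback contract are assumptions.

```javascript
// The isolated transformation from before (illustrative property names).
function transformRecord(data) {
    return { id: data.id, displayName: data.display_name };
}

function getSomeData(service, id, callback) {
    // Decorate the callback: transform successful results, pass
    // errors through untouched.
    function decoratedCallback(error, data) {
        if (error) {
            callback(error);
        } else {
            callback(null, transformRecord(data));
        }
    }

    service.get(id, decoratedCallback);
}
```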

By making changes this way, we silently introduce changes to fix our application without having to go and touch every place where this call is made. All of a sudden data changes become a much smaller liability to mitigate. We have broken a hard dependency that would be scattered throughout our code by adding an abstraction between our view layer and our data access layer. This is one of the biggest wins the common n-tier architecture provides to us. Let’s take a look at what happens when we have a bunch of changes that happen over time.

The amount of cut and paste I had to do to create all those functions made me die a little inside. This is really smelly code. This is where we can start recognizing patterns and cut out a bunch of duplication. What we really care about is the set of data transformations that need to be managed in our call. The rest of this has become boilerplate. Unnecessary boilerplate in Javascript is bad. Don’t do it. Let’s make a change and fix all this. I like to do this one step at a time. Sometimes things appear as you refactor that might not have been immediately obvious.
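One plausible shape for that first cleanup: collapse the copied functions into a single decorated callback where the accumulated transformations read off in order. transformA, transformB and transformC are stand-ins for the real data fixes.

```javascript
// Stand-in transformations representing data fixes accumulated over time.
const transformA = data => Object.assign({}, data, { a: true });
const transformB = data => Object.assign({}, data, { b: true });
const transformC = data => Object.assign({}, data, { c: true });

function getSomeData(service, id, callback) {
    // One decorated callback applies every transformation, innermost first.
    function decoratedCallback(error, data) {
        if (error) {
            callback(error);
        } else {
            callback(null, transformC(transformB(transformA(data))));
        }
    }

    service.get(id, decoratedCallback);
}
```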

That’s a lot better already. Now we don’t have to struggle with a bunch of function duplication and copy/paste hoopla. All we care about is the set of transformations we are going to use on the data. We can practically read off the transformation functions we are using in order. This is actually more descriptive of what we intended to begin with anyway!

Let’s actually do one more refactoring on our code. By eliminating one duplication problem, we introduced another, although less-painful, duplication.
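A sketch of that final step: extract the remaining decoration boilerplate into a reusable helper that folds a list of transformations over the data. The names are illustrative.

```javascript
// Reusable decorator: returns a callback that applies each
// transformation in order before handing the result along.
function decorateCallback(transformations, callback) {
    return function (error, data) {
        if (error) {
            callback(error);
            return;
        }
        const transformed = transformations.reduce(
            (value, transform) => transform(value),
            data
        );
        callback(null, transformed);
    };
}

// Stand-in transformations representing accumulated data fixes.
const transformA = data => Object.assign({}, data, { a: true });
const transformB = data => Object.assign({}, data, { b: true });

function getSomeData(service, id, callback) {
    service.get(id, decorateCallback([transformA, transformB], callback));
}
```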

Now we’re cooking with gas! Our getSomeData function can be extended with almost no effort whatsoever now. We can simply create a new transform and then decorate the callback as many times as we need to. This decoration process relies on our original idea: callback streams. Since each layer only cares about adhering to a single contract, and we can wrap the callback as many times as we want, multiple decorations, all behaving asynchronously, can be created as a stream of decorated callbacks without worrying about a failure somewhere in the middle of it all.

The more important item to note is, this could be a single step in a long line of behaviors within a stream. We are adhering to the callback contract in our getSomeData function, so we could, just as easily, use this as an intermediate step between the requesting function and the final request. We really only care about the behavior that happens at the edges of our function, so it really doesn’t matter where this code lives!

This discussion fits in the middle of a couple of different common issues. First, this kind of decoration and function streams behavior directly combats the “pyramids of doom” callback issue many people encounter. The other issue this deals with is exposed promise objects that worm their way through many modern Javascript projects which force us to tightly couple our data access layer to our view code. The abstractions are lost unless a new promise is created and resolved at every level throughout the stack. By thinking about the goal of your code, you take back the power of tiered applications and provide smart, well-isolated functionality which can be enhanced while keeping the rest of your codebase blissfully unaware of the ever-changing data that lives just beyond the edges of your application.


Code Smells – Conditional Obsession

Jeff Atwood of Stack Exchange and Coding Horror fame wrote a post quite a long time ago about code smells and what they mean. A couple weeks ago, I discussed eliminating switch statements using hashmaps. In that post, I introduced a new code smell that I want to discuss in a little more depth – conditional obsession.

Conditional obsession is when a programmer introduces more conditional logic than would ever be necessary to solve a particular problem. Sometimes conditional obsession comes in the form of a conditional structure taking the place of a common data structure, such as switches and hashmaps, while other times, it is just overwrought code that grew block by block until it became so unmanageable that developers are now afraid to even touch it.

Following is a dramatization of the kind of code I am talking about. This has been taken from real code I have encountered in the wild, but the variable names have been changed to protect the innocent.
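The original dramatization was not preserved, so here is a recreation in the same spirit: names changed, logic deliberately left as baffling as the code it stands in for.

```javascript
// A dramatization of conditional obsession. What does this do? Exactly.
function updateThing(foo, bar, baz) {
    let result = '';
    if (foo !== null && foo !== undefined) {
        if (typeof foo === 'string' && foo.length > 0) {
            if (bar) {
                result = baz ? foo.toUpperCase() : foo;
            } else if (foo.indexOf('x') > -1) {
                if (baz !== false) {
                    result = foo.split('x').join('');
                }
            } else {
                result = foo;
            }
        } else if (typeof foo === 'number') {
            result = bar && !baz ? String(foo * 2) : String(foo);
        }
    } else if (bar) {
        result = baz ? 'default' : '';
    }
    return result;
}
```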

It’s a little like the Twilight Zone movie where Dan Aykroyd says, “do you want to see something really scary,” isn’t it?

Clearly there are more smells at work here than conditional obsession, but you can see that this programmer was clearly testing every possible situation under the sun. Even with the original variable names in place, I would defy you to explain to me what this code actually does. This code is so incomprehensible I’m not going to even attempt to restructure it in a single blog. This could take anywhere from a day to a full sprint to unravel and clean up, depending on how pathological the problem is.

I have reached a point in my programming life where I view conditional blocks as a code smell. Sometimes they are necessary, but, often, they are just a bug magnet. The more conditions you attempt to satisfy, the more likely you are to get one of them wrong. The deeper in a code block your condition is, the more likely it is to only occasionally surface, making it extremely difficult to diagnose.

No good code smell exists without some sort of remedy. Conditional obsession is no different. Let’s have a look at different ways we can fix up our code and make it easier on ourselves and nicer for the next programmer who has to take over what we have written.

Refactoring 1 – Reduce nesting depth

If you have your conditions nested two or more layers deep, consider refactoring your logic to handle the cases at a single layer, instead. This will reduce the number of cases where your code becomes unreachable except for a very specific, difficult-to-identify edge case. Let’s take a look at an example.
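A hypothetical example with conditions nested two layers deep; the discount logic here is invented purely for illustration.

```javascript
// Nested conditions: the 0.1 rate is buried two layers down and easy
// to miss when reading or changing this function.
function getDiscountRate(user) {
    let rate = 0;
    if (user.isActive) {
        if (user.isPremium) {
            rate = 0.2;
        } else {
            if (user.yearsActive > 2) {
                rate = 0.1;
            }
        }
    }
    return rate;
}
```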

Now let’s apply refactoring 1.
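The same hypothetical discount logic with the nesting flattened to a single layer of conditions:

```javascript
// Every case now sits at one level; note the repeated user.isActive
// check, which is not DRY yet.
function getDiscountRate(user) {
    let rate = 0;
    if (user.isActive && user.isPremium) {
        rate = 0.2;
    } else if (user.isActive && user.yearsActive > 2) {
        rate = 0.1;
    }
    return rate;
}
```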

Even with just the first refactoring, we get code that is easier to reason about. It’s not perfect, and it’s not DRY, but it’s a step in the right direction. Now that we have applied the refactoring, we can identify what some of the conditionals we had in our original code were really trying to accomplish.

Refactoring 2 – Factor conditionals

Factoring conditionals is a lot like factoring in algebra. Suppose we had the following expression from Algebra 1:

5x + 10

We know that a simple factorization would look like the following:

5(x + 2)

Clearly the second expression describes the outcome of the first expression directly. The main difference is, we now know that we are simply dealing in a linear expression, x + 2, which is being multiplied by 5.

The same can be done with conditional statements to help clarify meaning and help us to reduce complexity in our applications. We can factor out common conditionals and separate our logical concerns, simplifying what we must digest to improve our program’s readability and/or maintainability.
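A sketch of factoring in action on a hypothetical getDiscountRate function: the repeated activity check is hoisted out of each branch, the way 5 factors out of 5x + 10.

```javascript
// user.isActive is factored out; the remaining branches only express
// the discount rules themselves.
function getDiscountRate(user) {
    let rate = 0;
    if (user.isActive) {
        if (user.isPremium) {
            rate = 0.2;
        } else if (user.yearsActive > 2) {
            rate = 0.1;
        }
    }
    return rate;
}
```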

Now that we’ve performed our conditional factorization, it becomes trivial to finish the function cleanup. We are doing a lot of variable manipulation here. This kind of juggling leads to small, difficult to spot bugs, so let’s just get rid of all the unnecessary assignments.
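A sketch of the finished cleanup on a hypothetical getDiscountRate function: return directly from each branch instead of juggling a rate variable.

```javascript
// No intermediate assignments left to get wrong: each branch returns
// its answer immediately.
function getDiscountRate(user) {
    if (!user.isActive) {
        return 0;
    }

    if (user.isPremium) {
        return 0.2;
    }

    return user.yearsActive > 2 ? 0.1 : 0;
}
```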

By identifying the conditional obsession code smell, we were able to take a function that was small, but still difficult to read, and reduce complexity while improving readability. We trimmed about 33% of the bulk from the code and cut much closer to the real goal the original code was trying to accomplish.

A nose for code smells is generally developed over time and with practice, but once you learn to identify distinct smells, you can become a code sommelier and make quick, accurate assessments of code that could use the careful work that refactoring provides. While you code, watch out for conditional obsession and work to reduce the complexity of your application.


Eliminating Switch Statements with Hashmaps

It has been a really, really long time since I created a switch statement. I’m not saying there is no place for switch statements in programming, I’m just saying, I haven’t had a reason to use one in a long time. Even though I haven’t written a switch in a long time, I have seen them popping up in code examples at work, online and other places a lot lately.

After seeing several different uses, I started asking “what is the programmer really trying to say with these?” Most of the examples I have seen look like the following:
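A sketch typical of the switch statements described: mapping an error code to a message. The codes and messages here are illustrative.

```javascript
// A switch doing the job of a lookup table.
function getErrorMessage(errorCode) {
    let message;

    switch (errorCode) {
        case 'BAD_REQUEST':
            message = 'The request was malformed.';
            break;
        case 'NOT_FOUND':
            message = 'The resource could not be found.';
            break;
        case 'SERVER_ERROR':
            message = 'Something went wrong on our end.';
            break;
        default:
            message = 'An unknown error occurred.';
    }

    return message;
}
```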

This has a very particular code smell that I haven’t encountered a name for yet. I’m going to call it conditional obsession. In this particular case, the programmer has opted for conditional logic to emulate a well-known and commonly used data structure. Reducing this kind of conditional overhead is akin to using a stack to eliminate recursion.

Switch statements are intended to be a way to simplify multiple conditionals in a more readable way. Since this code is not really, actually handling a set of conditionals, the switch statement has become little more than an extravagant replacement for a hashmap.

For those of you in Javascript land who aren’t familiar with hashmaps, they are a very close relative to the object literal we have all come to know and love. They are so close, in fact, that you can substitute an object literal in for a hashmap at any point in order to maintain an idiomatic look and feel to your code.

Let’s take a look at what a data structure containing our error messages would look like:
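The same illustrative error messages as a plain object literal standing in for a hashmap:

```javascript
// The mapping is now data, not control flow.
const errorMessages = {
    BAD_REQUEST: 'The request was malformed.',
    NOT_FOUND: 'The resource could not be found.',
    SERVER_ERROR: 'Something went wrong on our end.'
};
```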

Hey, that makes a lot more sense to me. I can look at this and, at a glance, immediately tell you what our hashmap contains and what the relation means. This, of course, still doesn’t satisfy one thing a switch statement can do: default behaviors.

Fortunately, we can build a quick, painless mechanism to handle default values and keep all of the readability we have started here.
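A small lookup sketch that restores the switch’s default behavior on top of the hashmap (codes and messages remain illustrative):

```javascript
const errorMessages = {
    BAD_REQUEST: 'The request was malformed.',
    NOT_FOUND: 'The resource could not be found.',
    SERVER_ERROR: 'Something went wrong on our end.'
};

const DEFAULT_ERROR_MESSAGE = 'An unknown error occurred.';

// Look the code up in the map; fall back to the default for unknowns.
function getErrorMessage(errorCode) {
    return errorMessages.hasOwnProperty(errorCode)
        ? errorMessages[errorCode]
        : DEFAULT_ERROR_MESSAGE;
}
```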

Now we have reduced our switch statement down to what we really meant to say: find my error message in this set of keys; if a message can’t be found, then provide a default value instead. This leaves us with a single data structure and one conditional that handles the case we were really interested in: when the error code is unknown.

We will need to make one more modification to our original code to really clean it up and give us the clarity we are looking for:

Now sendError doesn’t require every function to perform some preprocessing to capture the error message it needs to send. This reduces the complexity of our code every place an error code switch statement might have existed and allows us to centralize our error messaging and let our core functionality do what it is intended to do.

Here’s our final, refactored code:
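The final block was not preserved, so here is a sketch of its likely shape. Since the original sending mechanism isn’t shown, this sendError simply throws; treat that, along with the codes and messages, as an assumption.

```javascript
// All messages live in one structure, easy to lift into configuration.
const errorMessages = {
    BAD_REQUEST: 'The request was malformed.',
    NOT_FOUND: 'The resource could not be found.',
    SERVER_ERROR: 'Something went wrong on our end.'
};

const DEFAULT_ERROR_MESSAGE = 'An unknown error occurred.';

function getErrorMessage(errorCode) {
    return errorMessages.hasOwnProperty(errorCode)
        ? errorMessages[errorCode]
        : DEFAULT_ERROR_MESSAGE;
}

// Callers only ever pass a code; the message is resolved here.
function sendError(errorCode) {
    throw new Error(getErrorMessage(errorCode));
}
```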

Depending on the size and complexity of your code, this refactoring provides the perfect opportunity to abstract all of your error codes out into a centralized configuration file and then provide an error service that will allow you to simply capture an error code and then send it up through the stack and abstract your error messaging away from your core code altogether.

Switch statements, along with other conditional statements, should be used when an action should be taken only when the condition is satisfied. When conditionals are used to replicate core language data structures, it is often preferable to fall back to the core data structure and reduce the complexity of your code. Hashmaps are faster and more intuitive than a switch statement will ever be, so think about your data, refactor your code, then take a couple minutes to marvel at how your code will say what you really meant to say.