Oliver Creswell – 2 September 2020

What dependency hell looks like, and how to avoid it

Every developer, when faced with a tricky problem, has experienced the same excitement and temptation: “oooh I bet I could use a library for this!” The thought is full of the promise of time saved, complexity abstracted and efficiency gained. Just think of all the problems that could be solved without writing a line of your own code!

Fast-forward on a couple of weeks and there’s probably a fifty-fifty chance you’ll find that same developer desperately trying to wrangle a square-peg library into a round hole problem, tussling with inadequate documentation or finding security holes three levels deep in node_modules.

We need libraries

Libraries, packages, dependencies, crates, gems or whatever else you want to call it, downloadable third-party code that you can use within your project is a necessary part of software development. Without it, we’d all be stuck reimplementing the same basic functionality from scratch over and over again.

But while no one in their right mind would argue that code reuse isn’t a vital part of software engineering, I want to argue that dependency on third-party code has a set of benefits that developers tend to overestimate and a set of drawbacks that we tend to underestimate. I’ll also cover how I try to avoid the worst dependency pitfalls while maximising the benefits of third-party code reuse.

High profile cases

Most readers should be familiar with the left-pad debacle, wherein one open-source programmer’s spat with npm sent ripples across the industry after he removed code that was found deep in the dependency tree of numerous other projects. Another high profile controversy involved the event-stream dependency, which was hijacked by a malicious party pretending to be a legitimate open-source contributor in order to attempt to steal digital currency. More recently, in an incident that beats left-pad and event-stream for absurdity, if not magnitude of impact, a broken version of a library whose code could be written on the back of an envelope caused issues for consumers.

Github screenshot showing is-promise code — The problematic is-promise code is only 6 lines long.

These events should clue the conscientious developer into the fact that adding a dependency to an application or to your own library code comes with risks. It’s true that in each case you can point to aspects of the JavaScript, npm and open-source ecosystems that were not functioning as well as they could have, and it’s true that in each case measures have been put in place to reduce the likelihood of similar issues cropping up in the future. However, complex systems involving messy human interactions and thousands of moving parts will always tend towards vulnerabilities and flaws that are tricky to completely eliminate. This analysis points out that looking to solve a single “root cause” can be the wrong way to look at the problem: there were numerous contributing factors to the event-stream issue, some of which are more tractable than others.

With that said, such incidents are thankfully fairly rare and can be blamed, somewhat, on the chaotic youth of npm and JavaScript code reuse culture. These are dramatic, stark examples of the problems that can come with over-reliance on third-party dependencies, but they are far from the only ones.

Everyday dependency issues

Many developers will be familiar with the sinking feeling that overtakes you when you realise the library you eagerly grabbed early in a project has somehow deflated and started acting more like deadweight than the lifejacket you wanted. The promise of problems solved has long since departed, and you’re struggling to cajole the library’s API into doing what you want despite its objections, or you’re staring baffled at its inscrutable behaviour. On other projects, you’ll have third-party code that is doing 95% of what you want, but the remaining 5% is giving you constant paper cuts, or worse, has deal-breaking behaviour for your app.

Why can this happen? I like to divide the issues into problems you can blame squarely on the library, and those that might have more to do with how you’re using it and whether it’s right for your project.

Simple, “bad library” problems first:

The library code might just be terrible! This sounds like a very obvious point, and one that should be easy to avoid – just don’t use the terrible libraries! – but as a developer, you should know that code can be terrible in devious, multifarious ways. Perhaps the dependency works well most of the time, but in specific circumstances, it inexplicably breaks. Perhaps it’s a paragon of simple, beautiful API design, but performance falls off a cliff at the scale you need in production. Perhaps it has security flaws that haven’t been discovered yet. It’s true that libraries with no redeeming qualities probably won’t be popular enough to come to your attention, but there are popular packages out there with all of the issues I’ve just covered.

Many packages are poorly maintained. After their initial enthusiasm and a burst of popularity, open-source maintainers can become understandably fatigued. The project falls into abandonment, with issues unfixed and security holes unaddressed. The transfer of ownership between maintainers can also be fraught, with event-stream as an extreme example.

Documentation can be poor or non-existent. It’s one thing to write great code, but explaining the behaviour of that code to others is a whole different skill set, which is, unfortunately, lacking among many solo or small-scale open-source maintainers. Even if the person or team has that skill, they can only cover the languages they speak, and often rely on community translations to cover the gap.

Bloated bundle size. When developing a front-end web application, libraries that are not written with tree shaking in mind can inflate your bundle size unnecessarily. You may only use a small fraction of the library’s functionality, but you could be paying the bundle cost for all of its code, making your site slower for users.

The problems above can often be avoided, or at least minimised, by reasonably simple auditing of packages before they are included in your project. This can help avoid some of the worst pitfalls. However, even very “good” libraries can cause issues:

You’re tied in. Using a third-party package binds you to the specific API that the package authors have designed. This may be a very “good” API in terms of meeting their specific goals, but those goals might not coincide completely with yours. This can be hard to assess accurately upfront, and such inconsistencies sometimes only become clear deep into a project or feature’s development.

You’ll have no control over changes. The maintainers may not fix the issues you want, or they may take the entire library in a direction that isn’t ideal for your codebase. While you may be able to submit issues and fixes for open-source packages, the core direction of the library will be largely out of your control.

You’ll still need to write some code. Libraries that solve a general and/or complicated problem can have a very general and/or complicated API surface, that may be difficult to learn and fully understand. You may also need to write your own abstraction layers on top of such packages to cater to your more specific use cases. That can be a good way to work, but don’t make the mistake of thinking that a library will solve the whole problem when it will actually take the library plus a significant amount of your own code.

Libraries can be incompatible with each other. Worse, these incompatibilities can be subtle and might only manifest in certain scenarios, for certain users, or after you update something seemingly unrelated. You should always try to check what is officially supported and proceed with caution outside of that.

Strategies for responsible library use

For most software teams, there isn’t a one-size-fits-all set of rules and criteria that you can apply in order to strike a perfect balance between the benefits of code reuse via libraries, and the drawbacks covered above. There will always be judgement calls and trade-offs, and even seasoned engineers can make decisions they will come to regret. With that said, having the right attitude towards dependency management, asking sensible questions, and doing honest cost-benefit analysis can go a very long way.

XKCD comic depicting dependency fragility. Credit XKCD.com — Credit xkcd.com

Overall, responsible developers should apply the maxim that “dependencies are liabilities not assets”. This should not be interpreted as a blanket anti-dependency statement: just as it’s often the right move to take on financial liability, it’s often the right move to take on a code liability, because you’ve determined that the benefits will outweigh the costs.

In order to establish whether or not the benefits outweigh the costs in a particular case, and to make the best use of a package if you decide to include it, you should have a set of questions you ask whenever you are considering adding a package to your project. Your questions might differ depending on your project, team, deadlines and myriad other factors, but this is the set that I start from:

1.
Is the library solving a problem that we do not have the time and resources to solve internally? Is it feasible to implement the functionality internally at a later stage, and use a library as a temporary solution?

2.
Is the library popular? Is it recognised as a standard solution to the problem it is addressing? Has anyone on the team used it before? If not, has anyone that we know and trust used it that we can talk to?

3.
How well does the library fit our use case? Does it have a very general API that we will need to build on top of?

4.
Is the library well maintained? Are issues being addressed? Are the maintainers active in the community? If appropriate, is there a roadmap or plan for future development

5.
Does the library have clear, well-written documentation? Does it seem easy to learn? Does that apply to everyone on the team that will be using it?

6.
Does the library have a lot of its own dependencies? Are those dependencies compatible with our project’s existing dependencies?

7.
Will the library interact with a lot of other moving parts in our project, or will the separation be quite clean?

8.
Is the dependency intended to work well with the kind of project we are building, or will we need to write code “glue” to make it work? For example, using a React component library in a React project should work out of the box, but a generic charting library may need some glue code to work cleanly.

9.
Can we feasibly build an abstraction around our use of the library so that it will be easier to replace if we decide later that we need to do so?

10.
Could the package realistically introduce security flaws? If so, is this addressed by the maintainers or the documentation at all?

11.
Could the package realistically introduce performance issues? If so, is scaling and performance addressed by the maintainer or the documentation at all?

12.
If developing for the web, how much does the library add to bundle size? Does it allow tree shaking?

Answering these questions should not only give you a good basis on which to analyse the gains you will see from using the library versus the potential negatives, but should put you in the right mindset to make sensible, cautious use of a library whenever possible.

Clearly, some packages like a core framework will form an intrinsic part of your codebase that would be tough to abstract, but other libraries can be used behind an abstraction layer. This not only makes them easier to replace if needed, but could have other benefits such as making major-version breaking changes easier to handle, and unit tests easier to write.

Conclusion

We all know that using libraries will always be part of software development. What is often missed is that using libraries well is a skill that engineers should seek to cultivate.

That skill involves being cautious and sceptical, both when choosing whether or not to use a library in the first place, and as you make use of various libraries within your projects. Above all, try to avoid seeing the benefits of third party code without also understanding the potential costs.