One of the things I love about working at Copia is that we’re always challenging ourselves to be better. In this article, I’ll discuss how we narrowed down and addressed a memory issue when rendering Rockwell ACD and L5X files, ultimately improving pre-render memory utilization by an order of magnitude.
The challenge: how do we improve Rockwell file performance metrics?
Copia is more than a change management tool for Programmable Logic Controller (PLC) code; it's a change visualization tool. The challenge is that PLC files are usually difficult to work with because they're often large and stored in proprietary formats.
Displaying a Rockwell Studio 5000 .ACD or .L5X file requires a lot more than just “getting” the information into our applications. Our web and desktop applications need to gather and prepare files before they can render a graphical interface.
After a number of customers noticed issues with larger Rockwell files, we decided it was time to dig into improving our performance metrics. In a computer application, performance is typically separated into “space” and “time” metrics (this is not known as “spacetime” performance, much to my dismay).
In other words: how much memory does a process use, and how long does it take? Applications "freeze" or "crash" if a process is too weighty or too slow.1
Diving into the metrics and identifying the problem
As we began our investigation, it became clear that the issues we were facing started in the "preparation" stage of our visualizations. Using Chrome's performance tools, we identified our first culprit: a "hashing" function that was taking a 5MB file and maxing out memory utilization at nearly 10 times that size! For reference, if we were starting this project from scratch, we would aim for a memory utilization increase roughly equal to the size of the initial file.
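Chrome's performance tools are what we used for the real investigation, but for quick numbers to plot you can also sample heap usage around a suspect step. Here's a rough sketch in Node; the file and function names at the bottom are hypothetical:

```javascript
// Quick programmatic check: sample heap usage before and after a
// memory-heavy step. (Chrome's performance tools gave us the full
// picture; this is just an easy way to collect numbers to plot.)
function measureHeap(label, fn) {
  global.gc?.(); // only available when Node is run with --expose-gc
  const before = process.memoryUsage().heapUsed;
  const result = fn();
  const after = process.memoryUsage().heapUsed;
  console.log(`${label}: ${((after - before) / 1024 / 1024).toFixed(1)} MB`);
  return result;
}

// Example usage (hypothetical file and function names):
// const file = require("fs").readFileSync("routine.L5X", "utf8");
// measureHeap("hash routine", () => hashRoutine(file));
```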
It’s important to explain what hashes are and how we use them. A hash is a short, fixed-length identifier that can stand in for a dataset of any size. While hashes are often used to store obfuscated passwords, we use them to compare large amounts of data as short, lightweight values. Searching two entire Rockwell Studio 5000 Routines for a small tag change takes significantly longer than comparing two 16-character hashes.2
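As a rough illustration, here's how two serialized routines could be reduced to short fingerprints and compared. This is a sketch using Node's built-in crypto module, not our exact hashing code:

```javascript
const { createHash } = require("crypto");

// Reduce an arbitrarily large string (say, a serialized routine) to a
// short, fixed-length fingerprint. 16 hex characters is enough for a
// quick equality check.
function shortHash(text) {
  return createHash("sha256").update(text).digest("hex").slice(0, 16);
}

const routineA = "<Routine Name='Main'> ... </Routine>"; // imagine megabytes of XML
const routineB = "<Routine Name='Main'> ... </Routine>";

// Comparing two 16-character strings is effectively instant,
// no matter how large the original routines were.
console.log(shortHash(routineA) === shortHash(routineB)); // true
```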
To make Copia more usable, we’ve added numerous filters, such as ignoring tag value changes or applying “Smart Filter” settings to skip irrelevant sections of code. In implementing these filters, we created the problem ourselves: so as not to modify the source code displayed on the screen, we would copy the entire object and remove the ignored items from the values used to generate the hashes.
This graphic shows us copying each routine once, but in actuality our approach copied each file three times! In fact, some quick data plotting showed a direct correlation between the three copies we saw in the code and the amount of memory used:
The regression line formula is y = 3x + 21 !!!
To “copy” means to read data from one location and save it in another. This takes time and, as shown in the chart above, lots of space. Going from three copies to one might seem like the right approach, but since our desired output was a set of hashes and not a copy, we asked ourselves if we could get the hashes without any copies at all.
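To make the contrast concrete, the copy-and-strip pattern looked roughly like this. It's a simplified sketch, not our production code, and the attribute names are made up:

```javascript
const { createHash } = require("crypto");

// Simplified sketch of the old approach: deep-copy the parsed routine,
// strip the ignored fields from the copy, then hash what's left. The
// copy exists only so the object rendered on screen is never mutated.
// (Attribute names are illustrative, not our real filter settings.)
const IGNORED_ATTRIBUTES = ["TagValues", "EditedDate"];

function hashWithCopy(routine) {
  // structuredClone reads and rewrites every node; this is where the
  // extra memory went, and in practice it happened three times over.
  const copy = structuredClone(routine);
  for (const attribute of IGNORED_ATTRIBUTES) {
    delete copy[attribute];
  }
  return createHash("sha256")
    .update(JSON.stringify(copy))
    .digest("hex")
    .slice(0, 16);
}
```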
How did we flip our approach and use a zero-copy solution?
In JavaScript, there’s a concept of “pass-by-reference.” This means that when a new object is created from other objects, it stores pointers to where the data can be found rather than fresh copies of the data. On the flip side, creating full copies (or “pass-by-value”) takes more intentionality in JavaScript. The example below shows how pass-by-reference works.
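It's a minimal sketch; the tag object is purely illustrative:

```javascript
// A set stores a reference to the object, not a copy of it.
const tag = { name: "Motor_Speed", value: 100 };

const tracked = new Set();
tracked.add(tag);

// Mutate the original object...
tag.value = 250;

// ...and the object inside the set reflects the change, because the set
// only holds a pointer to the same underlying data.
for (const item of tracked) {
  console.log(item.value); // 250
}
```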
In this example, we show that a “set” contains a reference by adding an object with an initial value, changing that value, and seeing that the data in the set has updated as well.
Sets, in JavaScript, are special objects that serve as collections of data values where no value is ever duplicated. Since sets, like other objects, contain references, we can think of them as ledgers that point to where data is actually stored. This “pointer” concept became the backbone of our zero-copy approach:
1. Traverse the initial file for all elements and attributes that should be excluded from the hash.
2. Add references to those elements to a set.
3. Traverse the initial file again to create the 16-character hashes, using only the elements and attributes that are not found in the set created in step 2 (sketched below).
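Putting those steps together, the shape of the solution looks roughly like the sketch below. It treats the parsed file as a plain object tree, and the made-up `shouldIgnore` callback stands in for our tag-value and Smart Filter rules; the real implementation differs, but the zero-copy idea is the same:

```javascript
const { createHash } = require("crypto");

// Steps 1 & 2: walk the parsed file once and collect *references* to the
// nodes we want to exclude. No data is copied; the set just points at
// the original objects.
function collectIgnored(node, shouldIgnore, ignored = new Set()) {
  if (node && typeof node === "object") {
    if (shouldIgnore(node)) ignored.add(node);
    for (const child of Object.values(node)) {
      collectIgnored(child, shouldIgnore, ignored);
    }
  }
  return ignored;
}

// Step 3: walk the same objects again, streaming everything that is NOT
// in the ignored set straight into the hash. The original file remains
// the only copy of the data that ever exists.
function hashExcluding(node, ignored, hash = createHash("sha256")) {
  if (ignored.has(node)) return hash;
  if (node && typeof node === "object") {
    for (const [key, child] of Object.entries(node)) {
      hash.update(key);
      hashExcluding(child, ignored, hash);
    }
  } else {
    hash.update(String(node));
  }
  return hash;
}

// Usage: a 16-character fingerprint of `routine`, minus the filtered parts.
// const ignored = collectIgnored(routine, (node) => "TagValue" in node);
// const fingerprint = hashExcluding(routine, ignored).digest("hex").slice(0, 16);
```

The key difference from the old approach is that nothing is cloned: the set from step 2 only holds pointers to nodes that already exist in the parsed file.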
When we applied this new approach, the extra memory used never even reached the size of a single copy of our initial file.
The change of slope from 3 to 0.3 was wildly exciting.
The regression line formula is y = 0.32x + 32 !!!
What comes next?
I mentioned earlier that this is just one step in improving our customers’ experience. As an engineering team, we’re proud of what we were able to accomplish here, and we’ve been talking far more about how we can apply these tools and methods moving forward. The challenges that lie ahead are going to be a lot of fun and I, for one, can’t wait.
1 If you’re still curious about why an application crashes, try this exercise: Multiply 2x2 in your head. You probably did it right away! Now, see how long it takes you to do 23434123001234 x 1234.55892. Did you stop calculating because it was taking too long or did you stop because you couldn’t keep track of the result?
2 Another exercise: spot the difference in each of the following two examples.
Short string:
- aAsDF48jkkVo9jip
- aAsDF48jrkVo9jip
Long string:
- asdflj3q8aasdfJAWEFASIHEFnjldsahfl798ha;osuehf89ewaoisdhfa8osdiuncalsh8p943qj’apfip8a’pse9ufa;jis8ruvashd.af/jp0a;9wefujhizsdhfasudohfga79peay;ow8iefhaiwelufghaew8989w;ohueuiaohew.hf;8eaisyrh3;fe98noia80yewio/y32eh8;aoiwly/u90f;e28;a/ofy23a80oyha;iowefyha;8o3f
- asdflj3q8aasdfJAWEFASIHEFnjldsahfl798ha;osuehf89ewaoisdhfa8osdiuncalsh8p943qj’apfip8a’pse9ufa;jis8ruvashd.af/jr0a;9wefujhizsdhfasudohfga79peay;ow8iefhaiwelufghaew8989w;ohueuiaohew.hf;8eaisyrh3;fe98noia80yewio/y32eh8;aoiwly/u90f;e28;a/ofy23a80oyha;iowefyha;8o3f