I work for a company that produces .NET client-server software. We have a process that will generate VB.NET code automatically based on a grid of data that is filled in by a person who knows nothing about coding. To simplify the explanation, assume that this grid represents a bunch of "If [condition] Then [Set a value]" statements.
Our automatic code generation process would loop over the rows, and create a new If statement for each set of data. This was all added into a single function. The code was compiled and then we called that function at appropriate times.
Most of the time, when the number of rows was small enough, we would generate a function with a couple hundred blocks of If statements. However, we had some spreadsheets that had 20,000+ rows. This would generate a function with 60,000 lines of code. And we noticed that when this function was called the first time, it would take almost 2 minutes before it completed. Subsequent calls only took a couple seconds.
We assumed that the JIT compiler was taking a long time to compile this huge function the first time it was called. So, we thought about two approaches to fix this:
1) Move the body of each If statement into its own function, so that it is only called (and JIT-compiled) when the condition is true
2) Break up all the code into smaller functions and call each one sequentially
Approach 1 seemed like it would gain us the most. The JIT compiles code one function at a time, as each function is first called. So, if we only JIT'd the functions whose logic we actually used, this would give us a huge gain, since many of the If conditions evaluate to false. But we were a little hesitant, since many of the statements inside the If conditions were single lines of code. Calling a function just to execute a single statement seemed like additional overhead that might actually hurt us. We also considered the possibility of the compiler optimizing our code by undoing what we did and inlining all of it.
We tried approach 2. It worked well with our existing code... we could arbitrarily stop after so many lines of code and start a new function. We also made sure not to chain the functions (each one calling the next), since a long enough chain could produce a stack overflow. Instead, we had one main function that called all of our other functions in order.
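To make approach 2 concrete, here is a minimal sketch of what such a generator could look like. It is written in Python for brevity and only emits VB.NET source as text; it is not our actual generator. The names `generate_vb`, `ApplyRules_N`, and `ApplyAllRules` are made up for illustration, and it assumes each row becomes a three-line If block and that the values being set are class-level fields, so that every generated Sub can see them.

```python
from typing import List, Tuple

# Hypothetical rule shape: (condition, assignment), both VB.NET source fragments.
Rule = Tuple[str, str]


def generate_vb(rules: List[Rule], max_lines: int = 200) -> str:
    """Emit VB.NET source: one Sub per chunk of rules, plus a main Sub calling each."""
    lines_per_rule = 3  # If ... Then / assignment / End If
    rules_per_chunk = max(1, max_lines // lines_per_rule)

    out: List[str] = []
    chunk_names: List[str] = []

    # Cut the rule list into fixed-size chunks, one generated Sub per chunk.
    for start in range(0, len(rules), rules_per_chunk):
        name = f"ApplyRules_{len(chunk_names)}"
        chunk_names.append(name)
        out.append(f"Private Sub {name}()")
        for condition, assignment in rules[start:start + rules_per_chunk]:
            out.append(f"    If {condition} Then")
            out.append(f"        {assignment}")
            out.append("    End If")
        out.append("End Sub")
        out.append("")

    # The main Sub calls every chunk in order. Chunks never call each other,
    # so the call stack stays two frames deep no matter how many chunks exist.
    out.append("Public Sub ApplyAllRules()")
    for name in chunk_names:
        out.append(f"    {name}()")
    out.append("End Sub")
    return "\n".join(out)


if __name__ == "__main__":
    demo_rules = [(f"Amount > {i}", f"Tier = {i}") for i in range(4)]
    print(generate_vb(demo_rules, max_lines=6))
```

The `max_lines` parameter plays the role of the threshold we experimented with: a lower value means more, smaller functions for the JIT to compile.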
Since we could break at any time, we tested a number of different thresholds for lines of code in our functions. Below are the results of our tests:
Test Case | Lines of Code per Function | JIT + Execution Time |
Original | 60,000 | 97 seconds |
Test 1 | 10,000 | 9 seconds |
Test 2 | 1,000 | 7 seconds |
Test 3 | 200 | 2 seconds |
We didn't even test any further optimizations since we were extremely pleased with our results. We took a function and cut its first-call time from 97 seconds down to 2!!
This worked amazingly for us. If you have long functions that can be split into multiple functions, consider doing so. The execution time itself won’t change, but the JIT cost shapes your users' initial experience, because it is paid the first time the function is called. As I’ve described, a small change gained us almost a 5000% performance boost on that first call (97 seconds down to 2 is roughly 48 times faster)!
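For anyone who wants to check the arithmetic, here is a quick sketch in Python that works out the speedup for each row of the table above. The timings are the measured values from the table; the helper name `speedup` is just for illustration.

```python
# Measured first-call times from the table: lines per function -> seconds.
timings = {60_000: 97, 10_000: 9, 1_000: 7, 200: 2}


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new timing is than the baseline."""
    return baseline_seconds / new_seconds


baseline = timings[60_000]
for lines, seconds in timings.items():
    factor = speedup(baseline, seconds)
    increase_pct = (factor - 1) * 100  # percentage increase in speed
    print(f"{lines:>6} lines/function: {seconds:>2} s, "
          f"{factor:.1f}x faster ({increase_pct:.0f}% increase)")
```

For the 200-line case this gives a factor of 48.5, which is where the "almost 5000%" figure comes from.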