Reducing Optimization Time

Quicker Optimizations in Trading Blox 4 (Web Link)


Trading Blox versions 3.x and earlier executed stepped optimizations serially, one test step at a time. Serial testing requires each step to finish before the next one begins, so the total optimization time is the sum of every step's test time.


Version 4 changes how optimizations are executed by running each test step in a thread. Two threads are available when Trading Blox 4 is first installed, and more threads are available as an option. Each thread tests one step at a time, while other threads test other steps during the same period. Because version 4 always has two threads available, an optimization in version 3.x takes roughly twice as long as the same optimization in version 4.x, assuming the same computer and the same test suite.


Multiple threads save time because each thread running a step test provides an independent area where calculations run in parallel. Running two threads at the same time completes the work in nearly half the time a single thread requires.
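To make the serial versus threaded comparison concrete, here is a minimal Python sketch (not Trading Blox code) that models total optimization time. It assumes every step is independent and each thread runs at full speed with no memory or scheduling overhead; the function names are hypothetical.

```python
import heapq


def serial_time(step_seconds):
    """Total time when steps run one after another (version 3.x style)."""
    return sum(step_seconds)


def parallel_time(step_seconds, threads):
    """Wall-clock time when each step is handed to the next free thread.

    Assumes steps are independent and ignores memory and scheduling overhead.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    # Each heap entry is the time at which one thread becomes free.
    free_at = [0.0] * threads
    heapq.heapify(free_at)
    for seconds in step_seconds:
        start = heapq.heappop(free_at)  # earliest available thread
        heapq.heappush(free_at, start + seconds)
    return max(free_at)
```

For 120 equal steps of 5 seconds each, `serial_time` gives 600 seconds and `parallel_time(..., 2)` gives 300 seconds, the "about half" described above. With an odd number of steps the last step waits for a free thread, which is one reason real savings come in slightly under 50%.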


To put this in more concrete terms, the following table shows how long four different optimization tests took to run using one thread. All four tests were executed on three different computers. One computer ran the 32-bit versions of both Trading Blox 3.8.4 and Trading Blox 4. The two computers that support 64-bit software ran both the 32-bit and 64-bit editions of version 3 and version 4.


The table lists the number of steps in each of the four tests and the number of seconds each test required. All tests in this table were limited to one thread. Versions before 4.x can only run single-thread tests, so the single-thread times serve as the baseline for showing how additional threads reduce stepped test times.

TB 1-Thread Step Testing Durations 20130717


Active Threads:

Being able to perform the intensive processing that good system development requires quickly is a significant enhancement to our trading platform. It lets us see the results of more ideas in less time, and the second thread alone provides a substantial time reduction compared with what was possible in version 3.


In the next table, two threads are used to perform the same tests shown above. Only version 4 data is available because version 3 cannot use a second thread, so we have to accept that the single-thread times are the best the older versions can provide.

TB 2-Thread Step Test Times 20130717


Running two threads in Trading Blox 4 instead of one produces a significant time reduction across all three optimizations:

TB 1_vs_2 Thread % Saved Bar Chart 20130717
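One natural way to compute the "% saved" figure in a chart like this is to measure the reduction against the single-thread time. The short Python sketch below assumes that definition; the function name is hypothetical.

```python
def percent_saved(baseline_seconds, new_seconds):
    """Percentage of the baseline (1-thread) time removed by the new run."""
    if baseline_seconds <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline_seconds - new_seconds) / baseline_seconds
```

A perfect halving, such as 600 seconds down to 300 seconds, corresponds to 50% saved; measured results will usually fall somewhat below that.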


A second thread running in parallel is a noticeable improvement, but it also raises the question of whether two threads are enough. My answer is no. Our tests, which range from 20 to 120 steps, are fairly small optimizations, although they are a practical size for careful data collection, and even they show how little of the computer's capacity two threads put to work.


The computer's capability should be a factor in deciding how many threads are enough, because two threads on a multi-core computer produce only low utilization. Two threads occupy at most two of the CPU's logical processors, the capacity of a single hyperthreaded core, so the remaining cores do nothing to reduce test times.


The next image shows the six-core computer running our 120-step test. The Performance tab of Windows Task Manager shows that, with this thread limit, utilization reaches only 18%, leaving the other 82% of the computer's capacity idle.

R10 TB4 120-Step 2-Thread Test CPU Utilization Detail


For reference, the Performance tab of Windows Task Manager gives a good estimate of CPU usage relative to the computer's total design capacity.


In each of these CPU Usage images, the narrow vertical graphs represent the number of threads the CPU was designed to handle at the same time. In this case the chip has six physical cores, each capable of running two threads, for a total of 12 threads by design.
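The utilization figures can be approximated with simple arithmetic: busy logical processors divided by total logical processors. The Python sketch below is a back-of-envelope model that assumes each optimization thread keeps exactly one logical processor fully busy and everything else is idle; the function name is hypothetical.

```python
def expected_utilization(busy_threads, cores, threads_per_core=2):
    """Rough share (in %) of the CPU's logical processors kept busy.

    Assumes one fully busy logical processor per optimization thread
    and no other load on the machine.
    """
    logical = cores * threads_per_core
    busy = min(busy_threads, logical)  # cannot exceed the design limit
    return 100.0 * busy / logical
```

Two threads on the six-core chip give about 16.7% (2 of 12), and on a quad-core chip 25% (2 of 8). These sit just below the 18% and 27% Task Manager readings in this article, a gap plausibly explained by Windows and other background work. In Python, `os.cpu_count()` reports the number of logical processors if you want the total for your own machine.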


In the next image, a quad-core CPU running the same test shows an estimated 27% utilization, leaving 73% of the CPU's capacity idle.

LT2 TB4 120-Step 2-Thread Test Quad-CPU Utilization Detail


Being able to support 12 threads, or 8 threads on a quad-core CPU, doesn't mean all the cores should be loaded to 100%, because that would interfere with the user's ability to do other things while the optimization test runs. Adding threads up to the design limit will improve utilization as long as the operating system and the installed memory can support all of them.

TB 2 & 6 Thread % Saved Bar Chart 20130717


In the next table, the same 120-step test was run on each of the three computers with thread counts ranging from 1 to 12. Test result details for the other step counts are available on request.

TB4.2.4.5 120 Step Optimization Test Results 201307017

This table shows that the performance improvement on a 4-core computer slows significantly once Trading Blox is set to use 6 of the 8 threads a quad-core CPU supports. On the 6-core computer, the time reduction from each additional thread shrinks at 8 to 9 threads. If these thread-to-core ratios are indicative, a dual-core CPU should easily handle 3 threads, and a single-core CPU will be very busy with 2 threads.
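Reading the point where extra threads stop paying off can be automated by checking how much each additional thread saves relative to the current time. The Python sketch below assumes you have your own timings keyed by thread count; the 5% threshold and the function name are hypothetical choices, not Trading Blox settings.

```python
def diminishing_returns_point(times_by_threads, min_gain_pct=5.0):
    """Return the thread count after which one more thread saves less
    than min_gain_pct percent of the current time.

    times_by_threads maps thread count -> measured seconds.
    """
    counts = sorted(times_by_threads)
    for n, nxt in zip(counts, counts[1:]):
        current = times_by_threads[n]
        gain_pct = 100.0 * (current - times_by_threads[nxt]) / current
        if gain_pct < min_gain_pct:
            return n  # the next thread adds little benefit
    return counts[-1]  # every step still cleared the threshold
```

With illustrative (not measured) timings of 100, 52, 36, 30 and 29 seconds for 1 to 5 threads, the function returns 4, because going from 4 to 5 threads saves only about 3%.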


Thread counts up to the design limit can give the best time, but if you do something else while a stepped simulation is executing, the CPU will need to swap threads and memory, which slows everything down. Swapping within memory has little impact on performance, but swapping to disk will reduce performance significantly.


Having enough memory to keep all the active threads in memory for the entire test is critical to keeping the CPU from swapping threads to disk, as the results from the R9 computer demonstrate. Because of the maximum working memory limit in Windows XP Pro, R9 ran slower than the less capable LT2 computer: XP was doing a lot of swapping between threads and memory.

Computer Memory Requirements:

Stepped optimizations run fastest when the entire optimization test executes in memory, without Windows needing disk space to support any part of the test.


This means each computer's hardware configuration must have enough memory to load all the instrument files and still have room to allocate dedicated memory to each thread active during the stepped simulation. It also means that large simulations, such as a large stock portfolio over a long period of time, might not find enough memory to run without a lot of swapping between memory and disk.
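The sizing rule in the paragraph above can be written as a rough linear check: shared instrument data, plus a working set per thread, plus headroom for Windows. This Python sketch is a simplification under stated assumptions (data loaded once and shared, a fixed per-thread working set, a hypothetical 2 GB reserve); every size is an input you would measure yourself.

```python
def fits_in_memory(data_gb, per_thread_gb, threads, installed_gb, reserve_gb=2.0):
    """Rough check that instrument data plus per-thread working memory
    fits in RAM without paging to disk.

    Assumes the data is loaded once and shared by all threads, each
    thread adds a fixed working set, and reserve_gb is left for Windows
    and other programs. Returns (fits, estimated_gb_needed).
    """
    needed = data_gb + per_thread_gb * threads + reserve_gb
    return needed <= installed_gb, needed
```

For example, 1.5 GB of data and 0.5 GB per thread fit on a 6 GB machine with 4 threads (5.5 GB needed) but not with 6 threads (6.5 GB needed).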


When Windows begins swapping memory to disk so it can execute other threads before finishing the current thread's processing, the simulation takes longer than it would if the computer had more memory available.


If your simulation runs slower than expected, check the Windows Task Manager Performance tab to see how much memory is in use. A high percentage suggests Windows is doing a lot of swapping between memory and disk, which more installed memory might prevent if the computer supports a larger capacity.


While there are no simple rules for determining how much memory is needed, most optimization tests with futures data can easily be executed with 6 GB of memory. Optimization tests on large stock portfolios need more memory, but 12 GB should be enough most of the time.


Stepped Optimization Test Guidelines:

Short period tests require less memory than long period tests.

Small portfolios require less memory than large portfolios.

Multiple system optimization tests require more memory than a single system optimization test.

Optimizations that write custom data output to disk during processing require more time and memory than optimizations that don't require custom data output operations.

Small optimization step sizes create more steps to process and take longer to complete.

Setting the active thread count to between 65% and 75% of the CPU's supported threads allows high CPU utilization without stalling other work such as email and web browsing.
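The 65% to 75% guideline translates directly into a thread count range. This Python sketch rounds down so the result stays under the upper bound; the rounding choice and function name are assumptions, and `os.cpu_count()` is used only to read the number of logical processors when none is given.

```python
import math
import os


def suggested_thread_range(logical_processors=None, low=0.65, high=0.75):
    """Thread counts covering roughly 65%-75% of the CPU's logical processors."""
    if logical_processors is None:
        logical_processors = os.cpu_count() or 1
    lo = max(1, math.floor(logical_processors * low))
    hi = max(lo, math.floor(logical_processors * high))
    return lo, hi
```

A six-core, 12-thread CPU gives 7 to 9 threads, a quad-core, 8-thread CPU gives 5 to 6, and a dual-core, 4-thread CPU gives 2 to 3, in line with the thread-to-core observations from the 120-step test.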

Thread Testing Caution:

Keep in mind how threads operate in parallel. When a stepped optimization is testing the first thread (Thread-1), the computer is also testing the second thread (Thread-2), starting at nearly the same time. This means that if Thread-2, or any other thread, depends on an earlier thread's completion to get or report information, your source code will need to reference the test.threadIndex property to ensure the data is passed when it is needed.
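The general pattern that avoids this timing problem can be shown with a Python analogy (this is not Trading Blox Basic, and `run_step` is a hypothetical stand-in for one optimization step): let each step write only its own result, keyed by its step index, and combine results only after every step has finished, instead of having steps read or update a shared variable mid-run.

```python
from concurrent.futures import ThreadPoolExecutor


def run_step(step_index, parameter):
    """Hypothetical stand-in for one optimization step.

    It reads only its own inputs and returns its own result, so it
    never depends on what another thread has or hasn't finished.
    """
    return parameter * parameter


def run_optimization(parameters, threads=2):
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in step order, one slot per step.
        results = list(pool.map(run_step, range(len(parameters)), parameters))
    # All steps are complete here, so combining them is safe.
    return results, sum(results)
```

`run_optimization([1, 2, 3, 4])` returns `([1, 4, 9, 16], 30)` regardless of which thread finishes first.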


Here is the warning that appears in the Log Area of the main screen. It always appears when a Simulation Scoped variable is used, and a separate warning is reported for each Simulation Scoped variable:

Using Simulation Scoped variable <your-variable name1> in multi-threaded test could have unintended consequences. Use with caution.


Using Simulation Scoped variable <your-variable name2> in multi-threaded test could have unintended consequences. Use with caution.


Speed Test Rules: (Web Link)


The original Trading Blox optimization speed test, the “Original Speed Test”, was created in January 2008 to provide a fixed set of data that we could run with our own Trading Blox installations. Comparing test results let us see how our local performance fit into the overall range of reported results.


When version 3.4.2 was released, the “Second Speed Test” extended the date range to include more data, and the test times showed that Trading Blox had improved.


Version 4.x can now run multiple threads, so it does more work in the same amount of time. With Trading Blox running even faster, we needed to increase the number of test steps and lengthen the data range again. Both changes in the new Speed Test requirements improve the resolution of time differences between thread counts, and they help show where additional threads begin to produce smaller time improvements.
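When comparing Speed Test results across thread counts, two standard measures make the point of shrinking improvements easy to see: speedup (1-thread time divided by N-thread time) and parallel efficiency (speedup divided by N). This Python sketch assumes your timings include a 1-thread entry; the function name is hypothetical.

```python
def scaling_table(times_by_threads):
    """Rows of (threads, seconds, speedup, efficiency) relative to 1 thread."""
    base = times_by_threads[1]
    rows = []
    for n in sorted(times_by_threads):
        speedup = base / times_by_threads[n]
        rows.append((n, times_by_threads[n], speedup, speedup / n))
    return rows
```

An efficiency near 1.0 means each thread is pulling its full weight; a falling efficiency marks where added threads contribute less.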


This table shows the actual test details used to generate the top posting:

Optimization Speed Test Details 20130717


Trader's RoundTable Blox Files: Speed Test


The modules in the ZIP files were exported from Trading Blox 4 using the Suite’s Export option, and each Suite can be imported into Trading Blox 4 using the Suite’s Import feature. Installing the files into Trading Blox 3.x requires copying them from their respective folders into the same folders in your Trading Blox directory. Trading Blox 3.x should be closed while the files are copied so its start-up logic finds them for display.


To import the Speed Test ZIP files contained in the file above, first extract them from it so Trading Blox can find the folders it needs when moving the modules.


When you unpack the ZIP file above, you should see these file names:

·All Liquid - Orig.set: Futures portfolio file list

·Export Speed Test Def: Single-step test

·Export Speed Test Step 020: 20-step test

·Export Speed Test Step 060: 60-step test

·Export Speed Test Step 120: 120-step test

·_Show Thread Portfolio Test-TB3.tbx: Log Window reporting blox (TB 3.x only)


The last module listed, "_Show Thread Portfolio Test-TB3.tbx", is a modified version of the Trading Blox 4 module included in the Suite packages above. All the modules in the Speed Test packages work with Trading Blox 3 except the Auxiliary module used for Log Window reporting.


For testing in Trading Blox 3, remove the module "_Show Thread Portfolio Test-TB4.tbx" from all the System lists and replace it with the "_Show Thread Portfolio Test-TB3.tbx" Auxiliary Blox. The TB4 version uses thread properties that are not available in TB3, so TB3 will report an error if the Auxiliary module isn’t replaced.


Before testing, check the portfolio symbols in All Liquid - Orig.set to confirm that the correct futures data files will be used.


Testing is best performed with the Log Window on Trading Blox’s main screen open. The data from each test is displayed in the Log Window, where it can be copied. Log Window output looks like this:

Sample Test Output Log Window Example


Test results generated on your computer should give you a relative understanding of the time differences possible on your hardware.


Last Edit: 5/8/2017

Edit Time: 5/8/2017 1:58:50 PM

Topic ID#: 191

Created with Help & Manual 7 and styled with Premium Pack Version 2.80 © by EC Software