Quicker Optimizations in Trading Blox 4
Trading Blox versions 3.x and earlier executed stepped optimizations serially, one test step at a time. Serial testing requires each step to finish before the next one begins, so the total optimization time is the sum of every step's test time.
Version 4 changes how optimizations are executed by running each test step in a thread. Two threads are available when Trading Blox 4 is first installed, and more threads are available as an option. Each thread tests one step at a time, while other threads test other steps during the same period. Because version 4 always has two threads available, an optimization in version 3.x takes about twice as long as the same optimization in version 4.x, assuming the same computer and test suite.
Multiple threads save time because each thread running a step test provides an independent area where calculations run in parallel. Running two threads at the same time takes nearly half the time of a single-thread test.
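As a rough illustration of why two threads come close to halving the time, the sketch below models an optimization as identical, independent steps handed out to a fixed number of threads. It is a simplified Python model under stated assumptions (equal step times, no threading or memory overhead), not Trading Blox's actual scheduler; `estimated_optimization_seconds` and the 2-second step time are hypothetical.

```python
import math


def estimated_optimization_seconds(steps: int, seconds_per_step: float, threads: int) -> float:
    """Idealized wall-clock time for a stepped optimization.

    Assumes every step takes the same time, steps are independent, and each
    thread tests one step at a time with no scheduling or memory overhead.
    """
    if steps < 0 or seconds_per_step < 0 or threads < 1:
        raise ValueError("need steps >= 0, seconds_per_step >= 0, threads >= 1")
    # Steps run in "waves": each wave tests up to `threads` steps in parallel.
    waves = math.ceil(steps / threads)
    return waves * seconds_per_step


if __name__ == "__main__":
    # Hypothetical 2-second steps in a 120-step optimization.
    serial = estimated_optimization_seconds(120, 2.0, threads=1)
    parallel = estimated_optimization_seconds(120, 2.0, threads=2)
    print(serial, parallel)  # 240.0 120.0
```

Real runs fall short of this ideal because steps differ in length and threads compete for memory and CPU, which is why measured two-thread times are nearly half, rather than exactly half, of single-thread times.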
To put this in more concrete terms, the following table shows how long four different optimization tests took to run using one thread. All four tests were executed on three different computers. One computer ran the 32-bit versions of both Trading Blox 3.8.4 and Trading Blox 4. The two computers that support 64-bit software ran both the 32-bit and 64-bit editions of version 3 and version 4.
The table lists the number of steps in each of the four tests and the number of seconds each test required. All tests in this table were limited to one thread. Versions before 4.x can only run single-thread tests, so the single-thread times serve as the baseline for showing how additional threads reduce stepped test times.
Active Threads:
Being able to do the intensive processing that good system development requires more quickly is a significant enhancement to our trading platform. It lets us see the results of more ideas in less time, and the second thread alone provides a significant time reduction compared with what was possible in version 3.
In the next table, two threads are used to perform the same tests shown above. Only version 4 data is available, because version 3 cannot use a second thread; the single-thread times are the best the older versions can provide.
Compared with a single thread, a 2-thread test in Trading Blox 4 produces a significant time reduction across all three optimizations:
Running a second thread in parallel during an optimization is a noticeable improvement, but it raises the question of whether two threads are enough. My answer is no. Our tests, which range from 20 to 120 steps, are fairly small optimizations, though a practical size for careful data collection, and even at this size they show how little of the computer's capacity two threads actually use.
A computer's capability should be a factor in deciding how many threads are enough; two threads on a multi-core computer produce only low utilization. Utilization is low because only two threads carry the entire test, so the remaining cores do nothing to reduce test times.
The next image shows the six-core computer running our 120-step test with two threads. The Performance tab of Windows Task Manager shows only 18% utilization, leaving the other 82% of the computer's capability idle.
For reference, the Windows Task Manager Performance tab gives a good estimate of CPU usage as a share of the computer's design capacity.
In each of these CPU Usage images, the number of narrow vertical windows equals the number of threads the CPU was designed to handle at the same time. In this case the chip has six physical cores, each capable of managing two threads, for a total of 12 threads by design.
The next image shows a quad-core CPU running the same test at an estimated 27% utilization, leaving 73% of the CPU's capacity idle.
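These idle percentages can be sanity-checked with simple arithmetic: if each busy optimization thread keeps at most one of the CPU's hardware threads fully occupied, expected usage is the busy threads divided by the hardware threads the chip supports. The sketch below assumes Task Manager reports usage that way; `expected_cpu_usage_pct` is a hypothetical helper.

```python
def expected_cpu_usage_pct(busy_threads: int, hardware_threads: int) -> float:
    """Idealized CPU usage when each busy thread fully occupies one hardware thread."""
    if hardware_threads < 1 or busy_threads < 0:
        raise ValueError("need hardware_threads >= 1 and busy_threads >= 0")
    # A thread can't use more than one hardware thread, and usage can't exceed 100%.
    busy = min(busy_threads, hardware_threads)
    return 100.0 * busy / hardware_threads


if __name__ == "__main__":
    print(round(expected_cpu_usage_pct(2, 12), 1))  # six-core, 12 hardware threads: 16.7
    print(round(expected_cpu_usage_pct(2, 8), 1))   # quad-core, 8 hardware threads: 25.0
```

Two threads give about 16.7% on the 12-thread six-core chip and 25% on an 8-thread quad-core chip, close to the 18% and 27% observed; the small difference is presumably other Windows activity running at the same time.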
Supporting 12 threads, or 8 threads on a quad-core CPU, doesn't mean all the cores should be loaded to 100%, because that would interfere with the user's need to do other things while the computer runs an optimization test. Adding threads up to the design limit improves utilization as long as the operating system and the installed memory can support all of them.
The next table shows results for our 120-step test run with every thread count from 1 to 12 on all three computers. Detailed results for the other step counts are available on request.
This table shows that the quad-core computer's performance improvement slows significantly once Trading Blox is set to use 6 of the 8 threads a quad-core CPU supports. On the six-core computer, the time reduction from each added thread slows at 8 to 9 threads. If these thread-to-core ratios are indicative, a dual-core CPU should easily handle 3 threads, and a single-core CPU will be very busy with 2 threads.
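One way to locate the point where added threads stop paying off is to compare each thread count's time with the next-lower count and flag the first count whose improvement falls below a threshold. The sketch below assumes times are keyed by thread count; the 5% threshold, the example times, and the name `first_diminishing_thread_count` are illustrative assumptions, not values taken from the tables.

```python
from __future__ import annotations


def first_diminishing_thread_count(
    times_by_threads: dict[int, float], min_gain_pct: float = 5.0
) -> int | None:
    """Return the first thread count whose run time improves on the next-lower
    tested thread count by less than min_gain_pct percent, or None if every
    added thread still clears the threshold."""
    counts = sorted(times_by_threads)
    for prev, cur in zip(counts, counts[1:]):
        prev_time = times_by_threads[prev]
        cur_time = times_by_threads[cur]
        gain_pct = 100.0 * (prev_time - cur_time) / prev_time
        if gain_pct < min_gain_pct:
            return cur
    return None


if __name__ == "__main__":
    # Hypothetical seconds per thread count, for illustration only (not measured results).
    example = {1: 100.0, 2: 52.0, 3: 36.0, 4: 28.0, 5: 24.0, 6: 23.0, 7: 22.5}
    print(first_diminishing_thread_count(example))  # 6
```

Measuring against the previous thread count, rather than against the single-thread baseline, is what exposes the flattening: total speedup keeps growing slowly even after each extra thread has stopped helping much.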
Thread counts up to the design limit can give the best times, but if you do something else while a stepped simulation is running, the CPU will need to swap threads or memory, which slows everything down. Swapping within memory has little effect on performance, but swapping to disk reduces performance significantly.
Having enough memory to keep all active threads in memory for the entire test is critical to keeping the CPU from swapping threads to disk. The R9 computer's results demonstrate this: because of the maximum working-memory limit of Windows XP Pro, the R9 runs slower than the less capable LT2 computer, since XP is doing a lot of thread swapping to disk.
Computer Memory Requirements:
Stepped optimizations run fastest when the entire optimization test can execute in memory, without Windows needing disk space to support any part of it.
This means the computer must have enough memory to load all the instrument files and still have room to dedicate memory to each thread active during the stepped simulation. It also means that a large simulation, such as a large stock portfolio tested over a long period, might not find enough memory without a lot of memory-to-disk swapping.
When Windows begins swapping memory to disk so it can run other threads before the current thread finishes, the simulation takes longer than it would on a computer with more memory.
If your simulation runs slower than expected, check the Windows Task Manager Performance tab to see how much memory is in use. If Task Manager shows a high percentage of memory in use, chances are Windows is doing a lot of memory-to-disk swapping, which more installed memory might prevent if the computer can support it.
While there are no simple rules for determining how much memory is needed, most optimization tests with futures data can easily be executed with 6 GB of memory. Optimization tests on large stock portfolios need more, but 12 GB should be enough most of the time.
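The memory budget described in this section (instrument files loaded once, plus dedicated memory for each active thread) can be turned into a quick back-of-the-envelope check. In the sketch below, the data and per-thread sizes are inputs you would estimate yourself, and the 2 GB reserved for Windows and other programs is an assumption, not a Trading Blox figure; `memory_check` is a hypothetical helper.

```python
from __future__ import annotations


def memory_check(
    instrument_data_gb: float,
    per_thread_gb: float,
    threads: int,
    installed_gb: float,
    reserved_gb: float = 2.0,  # assumed headroom for Windows and other programs
) -> tuple[float, bool]:
    """Return (estimated GB required, whether it fits in installed memory).

    Hypothetical model: instrument files are loaded once, each active thread
    needs its own dedicated memory, and reserved_gb is kept free.
    """
    required_gb = instrument_data_gb + per_thread_gb * threads + reserved_gb
    return required_gb, required_gb <= installed_gb


if __name__ == "__main__":
    # Illustrative inputs only; estimate your own data and per-thread sizes.
    print(memory_check(1.5, 0.5, threads=8, installed_gb=12.0))  # (7.5, True)
```

Because the per-thread term grows with the thread count, raising threads is the quickest way to push a test that fits into one that swaps to disk.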
Stepped Optimization Test Guidelines:
•Short-period tests require less memory than long-period tests.
•Small portfolios require less memory than large portfolios.
•Multiple system optimization tests require more memory than a single system optimization test.
•Optimizations that write custom data output to disk while processing require more time and memory than optimizations that don't write custom data output.
•Small optimization step sizes produce more test steps, which increases processing and the time needed to complete.
•Setting active threads to 65% to 75% of the threads the CPU supports allows high CPU utilization without stalling other work such as email and web browsing.
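The 65% to 75% guideline translates directly into a thread count. The sketch below uses Python's `os.cpu_count()`, which reports logical processors (hardware threads), and rounds down to stay on the conservative side; the rounding choice and the helper name `suggested_thread_range` are assumptions.

```python
from __future__ import annotations

import math
import os


def suggested_thread_range(
    hardware_threads: int | None = None, low: float = 0.65, high: float = 0.75
) -> tuple[int, int]:
    """Thread counts covering roughly 65% to 75% of the CPU's hardware threads.

    Rounds down to stay conservative and never suggests fewer than one thread.
    """
    n = hardware_threads or os.cpu_count() or 1  # os.cpu_count() counts logical CPUs
    lo = max(1, math.floor(n * low))
    hi = max(lo, math.floor(n * high))
    return lo, hi


if __name__ == "__main__":
    print(suggested_thread_range(12))  # (7, 9)
    print(suggested_thread_range(8))   # (5, 6)
    print(suggested_thread_range())    # depends on this machine
```

For the 12-thread six-core chip this gives 7 to 9 threads, and for an 8-thread quad-core chip 5 to 6, which lines up with where the thread-count tests showed improvements slowing.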
Thread Testing Caution:
Keep in mind that threads operate in parallel. While a stepped optimization is testing the first thread (Thread-1), the computer is also testing the second thread (Thread-2), which started at nearly the same time. If Thread-2, or any other thread, depends on an earlier thread finishing to get or report information, your source code will need to reference the test.threadIndex property to ensure the data is passed when it is needed.
Here is the warning that appears in the Log Area of the main screen. It always appears when a Simulation Scoped variable is used, and one line is reported for each Simulation Scoped variable:
Using Simulation Scoped variable <your-variable name1> in multi-threaded test could have unintended consequences. Use with caution.
Using Simulation Scoped variable <your-variable name2> in multi-threaded test could have unintended consequences. Use with caution.
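The hazard behind this warning is the usual one for shared state: several step threads can read and update the same variable at nearly the same time. The Python sketch below is an analogy, not Blox Basic; it stands in for a Simulation Scoped variable with a shared object, serializes updates with a lock, and stores each result under its step number so nothing depends on which thread finishes first. `SharedSimulationState`, `run_step`, and `run_optimization` are hypothetical names.

```python
from __future__ import annotations

import threading
from concurrent.futures import ThreadPoolExecutor


class SharedSimulationState:
    """Hypothetical stand-in for a Simulation Scoped variable that every
    step thread can see. It does not model Trading Blox's own scoping."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.completed_steps = 0
        self.results_by_step: dict[int, float] = {}

    def record(self, step: int, result: float) -> None:
        # Without the lock, two threads could both read the old count and
        # one increment would be lost.
        with self._lock:
            self.completed_steps += 1
            self.results_by_step[step] = result


def run_step(step: int, state: SharedSimulationState) -> float:
    result = float(step * step)  # placeholder for one test step's calculations
    # Keyed by step number, so completion order does not matter.
    state.record(step, result)
    return result


def run_optimization(steps: int, threads: int) -> SharedSimulationState:
    state = SharedSimulationState()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() submits every step; list() waits for all of them to finish
        # and re-raises any exception from a worker.
        list(pool.map(lambda s: run_step(s, state), range(steps)))
    return state


if __name__ == "__main__":
    state = run_optimization(steps=120, threads=4)
    print(state.completed_steps)  # 120
```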
The original Trading Blox optimization Speed Test was created in January 2008 to provide a fixed set of data that we could use with our Trading Blox installations. Comparing results let us see how our local performance fit into the overall range of reported results.
When version 3.4.2 was released, the Second Speed Test extended the date range to include more data, and the test times showed that Trading Blox had improved.
Version 4.x can add multiple threads, so it does more in the same amount of time. Because Trading Blox is even faster, we needed to increase the number of test steps and lengthen the data range again. Both changes in the new Speed Test improve the resolution of time differences between thread counts, and they help show where added threads begin to yield smaller time improvements.
This table shows the actual test details used to generate the results at the top of this posting:
Trader's RoundTable Blox Files: Speed Test Modules.zip
The modules in the zip files were exported from Trading Blox 4 using the Suites Export option, and each Suite can be imported into Trading Blox 4 with the Suites Import feature. To install them in Trading Blox 3.x, copy the files from their respective folders into the same folders in your Trading Blox directory. Close Trading Blox 3.x while copying so its start-up logic finds the files and displays them.
To import the Speed Test ZIP files contained in the file above, first extract them from it so Trading Blox can find the folders it needs to move the modules.
When you unpack the above Zip file you should see this list of file names:
·All Liquid - Orig.set - Futures Portfolio File List
·Export Speed Test Def Sim.zip - Single Step Test
·Export Speed Test Step 020 Sim.zip - 20-Step Test
·Export Speed Test Step 060 Sim.zip - 60-Step Test
·Export Speed Test Step 120 Sim.zip - 120-Step Test
·_Show Thread Portfolio Test-TB3.tbx - Log Window Reporting Blox TB 3.x Only
The last Blox module listed, "_Show Thread Portfolio Test-TB3.tbx", is a modified version of a similar Trading Blox 4 module included in the Suite packages above. All the modules in the Speed Test packages work with Trading Blox 3 except the Auxiliary module used for Log Window reporting.
For testing in Trading Blox 3, remove the "_Show Thread Portfolio Test-TB4.tbx" module from all the System lists and replace it with the "_Show Thread Portfolio Test-TB3.tbx" Auxiliary Blox. The thread properties used in the TB4 version are not available in TB3 and will cause TB3 to report an error if the Auxiliary module isn't replaced.
Before testing, check the portfolio symbols in All Liquid - Orig.set to make sure the correct futures data files will be used.
Testing is best performed with the Log Window on Trading Blox's main screen open. The data from each test is displayed in the Log Window, where it can be copied. Log Window output will look like this:
[img] Sample_Test-Output_Log_Window_Example.png [/img]
The test results generated on your computer should give you a relative sense of the time differences possible on it.
Last Edit: 9/20/2017
Edit Time: 9/20/2017 07:56:26 AM
Topic ID#: 191