
On Building a Platform for Very Large Tests

Posted: Wed Jun 08, 2011 9:21 pm
by ratio
I have decided to build a platform that can run a very large number of tests in the minimum amount of time. As most of our development is on the stock market, every time we want to design a new system we have to load thousands of instruments: 3,952 for the TSX market, 8,433 on the NYSE.

We usually test the period from 1997 to the present day.

So in most instances one test can take up to 4 minutes. Even a fairly simple test with a single parameter that loops over 6 values will take 24 minutes.

So far our testing efforts have been focused on finding parameter ranges by subset. For example, we will test for the best support/resistance level, then go on to test the best stop strategy, and so on.

The problem with this is that as you move through the test subsets, the results are biased by the previously optimized parameter choices.

So my idea is to build a platform that can test all the parameter combinations. As you all know, every time you add a parameter to test, you multiply the number of tests to run by the number of values that parameter loops over.

As an example, take a test with 6 parameters that each loop over 6 values, where each test takes 4 minutes. That would tie up your machine for 46,656 tests, or roughly 130 days.
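The multiplication is easy to sanity-check with a short sketch (Python here, purely for the arithmetic; the figures are the ones from the example above):

```python
# Each parameter multiplies the total run count by the number of
# values it loops over, so a full sweep grows multiplicatively.
def total_tests(values_per_param):
    total = 1
    for n in values_per_param:
        total *= n
    return total

runs = total_tests([6] * 6)   # 6 parameters, 6 values each
minutes = runs * 4            # 4 minutes per test
days = minutes / (60 * 24)

print(runs, round(days, 1))   # 46656 runs, about 130 days on one machine
```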

Now, we had already built a parallel platform for our daily CSI download and TB runs, so we decided to take that experience and create a parallel testing platform.

Our goal is to achieve the same result in much less time.

So in the upcoming days, as we build this platform, I will document the steps here, talking about the choice of equipment, the software involved, and the results.

The first step was to acquire a Dell R810 server. This server has 4 X7560 CPUs, each with 8 cores, 256 GB of RAM, and 3.6 TB of hard drive space.

We will be using Windows Server 2008 Datacenter and Hyper-V for virtualization.

Tomorrow I will talk about building the algorithm that splits the test file (the .vts file in the Simulation Suite directory).


Posted: Tue Jun 14, 2011 9:36 am
by cordura21
Hey Ratio. How's your project going? I'm looking forward to hearing about it.
Cheers, Cord

Posted: Tue Jun 14, 2011 10:11 pm
by ratio
The machine came in a couple of days ago. I had to decide how to partition the disks. The machine has six 600 GB disks. I read a bit on disk configuration and decided to go with a RAID 10 configuration, which according to the documentation should give us the fastest disk read speed. I would have preferred to have one SSD drive to store the CSI data, but there is no space left in the machine.

I included a picture of the machine.

So far I have installed Windows Server 2008 R2 Datacenter Edition 64-bit. We installed the Datacenter edition because it comes with Hyper-V, the Microsoft virtualization platform. I prefer Hyper-V over VMware ESX Server because we are used to Microsoft programming, and there is a very good API for interfacing with the Hyper-V controller. This enables us to start, stop, and query the status of the virtual machines as the process runs.

After installing Windows 2008 and Hyper-V, I was left with 247 GB of RAM. This will allow me to run 30 virtual machines with 8 GB of RAM each in parallel, and I will be able to allocate 2 virtual cores per virtual machine.

Also, the program to read the Simulation Suite .vts file is finished. I will post the steps it takes to read and parse it and to create the child .vts files.

Posted: Wed Jun 15, 2011 9:09 am
by drm7
Eric Crittenden (ecritt) has done a lot of large-scale testing on stocks (maybe even larger than your test), so he may have some useful advice on the logistics of it all.

Posted: Wed Jun 15, 2011 10:09 am
by ratio
Installing Hyper-V is as simple as adding a server role in Windows 2008.

Posted: Wed Jun 15, 2011 10:12 am
by ratio

Well, we already do them on the NASDAQ: 20,000+ listed and delisted stocks. We just want to speed things up. And I think that sharing the experience can provide some helpful information.

Posted: Wed Jun 15, 2011 1:39 pm
by ratio
In order to submit the processing jobs to our multiple virtual machines, we had to design a way to split the simulation suite into a number of subset files.

If we have 30 virtual machines and 6,000 tests to do, we want to create 30 VTS files, each one with a subset of the tests. To accomplish that, we use TB and configure our complete test.

This is a section of the VTS file that I use for demonstration. Each variable whose range we need to test is presented with a > on the right side of the = sign, except for the Booleans, which are presented as VARIES, meaning we must switch between TRUE and FALSE.

If we look in the included sample at the parameter Ranking Overweight Factor Q-1, it means we need to test it from value 1 to 5, stepping +1 each time.

Code:

[Rebalanced Momentum:Momentum Ranking]
Ranking Overweight Factor Q-1=1 > 5 +1.0
Minimum Price * Volume=3000000 > 10000000 +1000000
Buy Only IF Over MA=VARIES
Buy Over MA Length=30 > 90 +30

[Rebalanced Momentum:Incentive_Fees EOM]
Perf. Fee (%)=0%
Management Fee (%)=0%
Start Date for Simulation=19971225

[Rebalanced Momentum:Momentum Rebalanced Entry Exit Money]
Buy Threshold=10 > 30 +5
Sell Threshold=15
Rebalance Threshold (%)=1% > 10% +2%
Buy Over MA Exit Length=120
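A range line in this notation could be parsed along the following lines (a Python sketch; the regular expression and function name are my own assumptions, not the author's actual parser):

```python
import re

# Matches lines like "Buy Threshold=10 > 30 +5" or
# "Rebalance Threshold (%)=1% > 10% +2%"; fixed values and
# VARIES lines deliberately do not match.
RANGE_RE = re.compile(
    r"^(?P<name>[^=]+)=(?P<lo>[\d.]+)%?\s*>\s*(?P<hi>[\d.]+)%?\s*\+(?P<step>[\d.]+)%?$")

def expand(line):
    """Return (name, [values]) for a range line, or None otherwise."""
    m = RANGE_RE.match(line.strip())
    if m is None:
        return None
    lo, hi, step = (float(m.group(g)) for g in ("lo", "hi", "step"))
    values, v = [], lo
    while v <= hi + 1e-9:          # tolerate float rounding at the upper bound
        values.append(v)
        v += step
    return m.group("name"), values

print(expand("Ranking Overweight Factor Q-1=1 > 5 +1.0"))
# → ('Ranking Overweight Factor Q-1', [1.0, 2.0, 3.0, 4.0, 5.0])
```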

Once we've read all those variables, we construct a matrix of all the combinations. We then sort it and split it by ranges of rows so that we get one range per virtual machine. E.g., if we have 300 tests to do and 3 virtual machines, each machine receives a VTS covering 100 tests of the matrix.

Once the matrix is in memory, it is fairly easy to split: we take the first set of rows and look for the minimum and maximum values of each variable in those rows. These become the test range for that variable in the new VTS.

We then rewrite one VTS file per machine. So if the original VTS is called ORIGINAL.VTS and we have 3 virtual machines, the files will be called ORIGINAL_01.VTS, ORIGINAL_02.VTS, and ORIGINAL_03.VTS, each with a subset to test.
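The construct-sort-split step can be sketched as follows (Python; the parameter names and the dict-based input are illustrative assumptions, not the author's data structures):

```python
from itertools import product

def split_ranges(param_values, n_machines):
    """Build the full combination matrix, sort it, cut it into
    contiguous chunks of rows (one per machine), and report each
    variable's min/max within each chunk: the child-VTS test ranges."""
    matrix = sorted(product(*param_values.values()))
    chunk = -(-len(matrix) // n_machines)          # ceiling division
    names = list(param_values)
    out = []
    for i in range(0, len(matrix), chunk):
        rows = matrix[i:i + chunk]
        out.append({name: (min(col), max(col))
                    for name, col in zip(names, zip(*rows))})
    return out

# Illustrative parameters, as if expanded from the VTS ranges
params = {"Buy Threshold": [10, 15, 20, 25, 30],
          "Rebalance (%)": [1, 3, 5, 7, 9]}
for i, ranges in enumerate(split_ranges(params, 3), start=1):
    print(f"ORIGINAL_{i:02d}.VTS", ranges)
```

Because the matrix is sorted before it is cut, each contiguous chunk of rows maps back to compact per-variable min/max ranges for the child file.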

This process will later be included in a much larger step handler that will execute the complete process automatically upon submission of a VTS file.

Next step tomorrow, create the Virtual Machine template.

Posted: Wed Jun 15, 2011 2:36 pm
by ratio
Creating the virtual machine template is an easy step. All you need is an ISO file of Windows 2008, or the CD in your server.

You simply use Hyper-V Manager, select New Virtual Machine, and give it a name; here we call it TBMASTER. Assign the RAM (8 GB) and the virtual network.

Once this is done, the VM is created. Upon booting it, Windows 2008 Standard will install from the CD; just follow the normal installation steps.

Once this is done, we boot the virtual machine, log in, and use the Run menu to execute: control userpasswords2

This applet allows you to set the default username and password that will be used when the virtual machine starts. This will let us use a startup script to launch TB automatically when we boot the virtual machine.

All the files and scripts that automate the process are stored on the main server. In this case we have a VBS file called RatioProcessTBQueue.vbs.
VBS is Visual Basic Script; the file is stored on the server and run by each virtual machine upon startup. This VBS file will know (I will explain how later) whether there is processing to do, and will do it.

Posted: Fri Jun 17, 2011 1:31 pm
by ratio
The RatioRunTBQueue.vbs file is executed upon virtual machine launch and connects to the Microsoft Message Queue.

It reads the next entry, which is the name of a subset VTS file.
It then launches a batch file that runs TB from the command line with the VTS file as a parameter.

Code:

Option Explicit

' MSMQ access modes and flags
Const MQ_RECEIVE_ACCESS = 1
Const MQ_SEND_ACCESS    = 2
Const MQ_PEEK_ACCESS    = &h20
Const MQ_ADMIN_ACCESS   = &h80
Const MQ_DENY_NONE      = 0
Const MQ_NO_TRANSACTION = 0

Dim qi
Dim qm
Dim count
Dim queuename
Dim machine
Dim Ret
On Error Resume Next

queuename = "DIRECT=OS:HSITRADING\private$\RatioRunTBQueue"

' Management object: lets us query the number of pending messages
Set qm = WScript.CreateObject("MSMQ.MSMQManagement")
qm.Init machine, , queuename
count = qm.MessageCount

' Open the queue itself for receiving
Set qi = WScript.CreateObject("MSMQ.MSMQQueueInfo")
qi.FormatName = queuename

Dim q
Set q = qi.Open(MQ_RECEIVE_ACCESS, MQ_DENY_NONE)

Dim WshShell
Set WshShell = WScript.CreateObject("WScript.Shell")

' Each message label is the name of a subset VTS file
Dim commandline
Dim msg
While qm.MessageCount > 0
    Set msg = q.Receive(MQ_NO_TRANSACTION, False, True, 100)
    commandline = "\\HSITRADING\dataroot\data\RAtioScript\RUNTB " & msg.Label

    ' Run the batch file and wait for TB to finish
    ' before taking the next message off the queue
    Ret = WshShell.Run(commandline, 1, True)
Wend

' Queue is empty: power this virtual machine down
ShutDown

Sub ShutDown
    Const nLogOff         = 0
    Const nReboot         = 2
    Const nForceLogOff    = 4
    Const nForceReboot    = 6
    Const nPowerDown      = 8
    Const nForcePowerDown = 12
    Dim oOS
    Dim oOperatingSystem

    ' Power down via WMI, requesting the Shutdown privilege
    Set oOS = GetObject("winmgmts:{(Shutdown)}").ExecQuery("Select * from Win32_OperatingSystem")
    For Each oOperatingSystem In oOS
        oOperatingSystem.Win32Shutdown nForcePowerDown
    Next
End Sub
The beauty of this is that even if you have more VTS files than virtual machines, they simply queue up, and the process continues until the queue is empty.

The virtual machine will then shut down automatically.
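The overall pattern the script implements — each worker pulling job names off a shared queue until it is empty, then shutting down — can be sketched generically (Python, with MSMQ replaced by an in-process queue purely for illustration):

```python
import queue
import threading

def worker(jobs, run_job, worker_id):
    # Drain the queue: the same loop as the VBS script, which keeps
    # receiving messages while qm.MessageCount > 0.
    while True:
        try:
            vts_name = jobs.get_nowait()
        except queue.Empty:
            break                     # queue empty: this worker "shuts down"
        run_job(worker_id, vts_name)
        jobs.task_done()

jobs = queue.Queue()
for i in range(1, 8):                 # 7 jobs for 3 workers: extras just wait
    jobs.put(f"ORIGINAL_{i:02d}.VTS")

done = []                             # list.append is thread-safe in CPython
workers = [threading.Thread(target=worker,
                            args=(jobs, lambda w, v: done.append(v), w))
           for w in range(3)]
for t in workers:
    t.start()
for t in workers:
    t.join()
print(sorted(done))                   # every subset file processed exactly once
```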

Posted: Fri Jun 17, 2011 1:37 pm
by ratio
The required directory structure

Now, each virtual machine needs to have its own installation of TB. Rather than doing that, I do one install of TB on the host server and then replicate the directory 30 times (one for each VM).

Each VM then updates its specific directory as required. This is done by the batch file that the VBS script launches, named RUNTB.CMD.

Code:

rem RUNTB.CMD - all four parameters are required:
rem   %1 = data set name, %2 = suite (VTS) name, %4 = TB version (%3 is checked but not used below)
if X%1==X goto parameter
if X%2==X goto parameter
if X%3==X goto parameter
if X%4==X goto parameter

net use t: /delete
net use t: \\hsitrading\dataroot\data

cd /d T:\tb\tbv%4\%RATIONAME%

rem Clear this VM's private copy of the TB directories
del "Simulation Suites\*.vts" /Q
del "blox\" /Q
del "Stock Sets\" /Q
del "Futures Sets\" /Q
del "Forex Sets\" /Q
del "Data\Dictionaries\" /Q
del "Systems\" /Q

rem Copy in the data and the subset suite for this run
xcopy "T:\tb\TBRunningData\%1\Simulation Suites\%2.*" "Simulation Suites\" /Y
xcopy "T:\tb\TBRunningData\%1\Stock Sets\*.set" "Stock Sets\" /Y
xcopy "T:\tb\TBRunningData\%1\Dictionaries\*.csv" "Data\Dictionaries\" /Y
xcopy "T:\tb\TBRunningData\%1\Dictionaries\*.txt" "Data\Dictionaries\" /Y

rem Copy in the master TB configuration for this version
xcopy "T:\tb\tbv%4\TradingBloxMAIN%1\Blox\*.*" "Blox\" /Y
xcopy "T:\tb\tbv%4\TradingBloxMAIN%1\*.ini" /Y
xcopy "T:\tb\tbv%4\TradingBloxMAIN%1\Futures Sets\*.*" "Futures Sets\" /Y
xcopy "T:\tb\tbv%4\TradingBloxMAIN%1\Forex Sets\*.*" "Forex Sets\" /Y
xcopy "T:\tb\tbv%4\TradingBloxMAIN%1\Systems\*.*" "Systems\" /Y

rem Run TB on the subset suite, then make sure the process is gone
tradingblox.exe -suite "%2" -orders -exit

taskkill /IM tradingblox.exe
goto :eof

:parameter
echo RUNTB: missing parameter - all four parameters are required

For all of this to run, and for the virtual machines to know where to find their data, the directory structure needs to be well defined. Here is what I have created.

Posted: Mon Jun 20, 2011 10:33 pm
by laurens3

Thanks for all the information.

If one is trading a portfolio of 7,000 stocks, trading various strategies combined, and using 1 PC exclusively for Trading Blox running 1 version at a time:

What are the most important elements you recommend for choosing a PC with a budget of at most 1,000 USD?
- Processor: dual-core, quad-core?
- Or is the type of processor not that important, but the amount of RAM is?

My testing covers 10 or more years in each run.

Thank you for your input.


Posted: Tue Jun 21, 2011 9:17 pm
by ratio
7,000 stocks will not need more than 8 GB, which is pretty standard these days.

So go for the maximum CPU speed; the Intel Core i7-965 to 990 are probably still near the top these days.

I use a Core i7-975 on my desktop.

I would also suggest getting a good SSD drive to store your data. This will speed up data loading a lot.


Posted: Wed Jun 22, 2011 7:01 am
by laurens3
Thank you very much for your reply.

Now let's say that in the future I start trading a bigger stock portfolio of 15,000+ stocks and 4 different systems. Is it then still the number of cores and the processor speed that matter most, or does RAM become more important?

Many thanks!


Posted: Wed Jun 22, 2011 9:43 am
by longmemory
One has to wonder what such models do...

Any possibility of rethinking the trading heuristic so the compute-intensive parts could be implemented as pre-processing steps?

Some traders have implemented speed-critical EasyLanguage components with ADE (All Data Everywhere) and could show a 1,000x speed-up for their effort.

I use many C++ DLLs to replace EasyLanguage functions. Even with the calling overhead (not using ADE, so less efficient), the speed-up is in the order-of-magnitude range.