Memory, Floating Point Representation, and Tradeoffs
Posted: Mon Apr 19, 2004 8:09 am
MODERATOR'S NOTE: The following posts were split from the topic "backtesting dynamic portfolio system". This quote was extracted from Bernd's post in that topic.
The amount of data is really huge. My current backtesting engine keeps all data in RAM, which is a problem when I want to backtest a portfolio selection algorithm for several years and for several thousands of stocks.
I don't think that the data size should be a problem for daily data.
Assuming 250 trading days per year, 24 bytes per stock per day for date, open, high, low, close, and volume (4 bytes each), 10,000 stocks, and 25 years of data, you get 1.5 gigabytes. That still fits reasonably comfortably into a 2 gigabyte machine.
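The arithmetic behind that estimate can be checked with a short sketch. The `DailyBar` layout and the `totalBytes` helper are hypothetical names used only for illustration, and storing the date as a 32-bit integer is an assumption; the post only says each field takes 4 bytes.

```cpp
#include <cstdint>

// Hypothetical daily bar: six 4-byte fields, 24 bytes in total.
struct DailyBar {
    std::int32_t date;  // assumption: date held as a 32-bit integer
    float open;
    float high;
    float low;
    float close;
    float volume;
};

static_assert(sizeof(DailyBar) == 24, "expected six 4-byte fields with no padding");

// Bytes needed to hold every daily bar for every stock in memory.
constexpr std::uint64_t totalBytes(std::uint64_t tradingDaysPerYear,
                                   std::uint64_t years,
                                   std::uint64_t stocks) {
    return tradingDaysPerYear * years * stocks * sizeof(DailyBar);
}
```

Calling `totalBytes(250, 25, 10000)` gives 1,500,000,000 bytes, which matches the 1.5 gigabyte figure.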
It's unlikely that you'd want to test 10,000 stocks for liquidity reasons so your actual numbers should be much lower.
Survivorship bias is indeed a serious problem, but I don't think it affects "traders" as much as "investors" since we are likely to be either out of the market or short markets for companies that go bankrupt.
Posted: Tue Apr 20, 2004 9:14 am
Posted: Tue Apr 20, 2004 3:19 pm
I understand your tradeoffs. With memory being so cheap, I generally prefer to keep things in memory as well.
I chose to implement the indicators in VeriTrader using computation. Current processors are fast enough that recomputing them doesn't slow down the drawing of graphs, and you need hardly any additional memory per indicator. The problem comes down to having many systems and many indicators per system. Since VeriTrader supports an arbitrary number of systems, we wanted to keep the testing speed up even for 20 to 30 simultaneous systems, and storing every indicator could have increased our memory requirements 10-fold over keeping just the price data in memory.
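As a rough illustration of computing an indicator on demand instead of storing it, here is a minimal sketch. The simple moving average and the name `smaAt` are assumptions chosen for the example; the post does not say which indicators VeriTrader recomputes or how.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Simple moving average of the `period` closes ending at index `last`.
// Nothing is cached: the value is recomputed from the price data on each call.
inline float smaAt(const std::vector<float>& closes,
                   std::size_t last,
                   std::size_t period) {
    assert(period > 0 && last < closes.size() && last + 1 >= period);
    double sum = 0.0;  // accumulate in double, return as float
    for (std::size_t i = last + 1 - period; i <= last; ++i) {
        sum += closes[i];
    }
    return static_cast<float>(sum / static_cast<double>(period));
}
```

The only memory in use is the price array itself. The cost is `period` additions per lookup, which is the speed-for-memory trade described above.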
Testing with intraday pricing is where the data gets really big. For testing, using bars shorter than 5 minutes makes it pretty hard to keep more than a few hundred markets' worth of data in memory with today's 4 GB limit.
Using 32 bit floats for pricing and indicators is an easy way to save a lot of memory since there's enough precision to store price and indicator values. We use 64 bit floats for dollar values where precision to the penny is required and the numbers can get into the billions.
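To see why the split works, recall that a 32 bit float carries a 24-bit significand, roughly 7 significant decimal digits. The sketch below measures how much a value changes when it is squeezed into a float; `floatRoundTripError` is a hypothetical helper name, not something from VeriTrader.

```cpp
#include <cmath>

// Absolute error introduced by storing `value` in a 32-bit float.
inline double floatRoundTripError(double value) {
    float narrowed = static_cast<float>(value);
    return std::fabs(static_cast<double>(narrowed) - value);
}
```

For a price such as 123.45 the error is far below a hundredth of a cent. For a dollar amount such as 1,234,567,890.12 the nearest float is whole dollars away, so the pennies are lost, while a 64 bit float still resolves that amount well below one cent.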
The good news is that 64 bit computers and the required operating systems are coming into the reach of mere mortals. These can address as much memory as you can afford. I expect these to be pretty reasonably priced in two or three years.
- Forum Mgmnt
Posted: Wed Apr 21, 2004 4:36 am
I agree with c.f.: using Float (32-bit floating point) for price data is a good idea and saves you a lot of memory.
But using Float for calculations and quantitative algorithms is problematic because of rounding errors and far lower precision. Use Double (64-bit floating point) there, and check for floating-point exceptional values (like INF, NaN...) that can occur.
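A small sketch of both points, using hypothetical helper names. `repeatedSum` adds the same step many times with an accumulator of the chosen type, and `isFiniteResult` wraps the standard `std::isfinite`, which accepts both float and double and returns false for INF and NaN.

```cpp
#include <cmath>
#include <cstddef>

// Add `step` to an accumulator of type T `count` times and return the total.
template <typename T>
T repeatedSum(float step, std::size_t count) {
    T total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += static_cast<T>(step);
    }
    return total;
}

// True when the value is neither infinite nor NaN.
template <typename T>
bool isFiniteResult(T value) {
    return std::isfinite(value);
}
```

Summing 0.1f one million times with `repeatedSum<float>` drifts visibly away from 100,000, while `repeatedSum<double>` stays within a small fraction of a unit. Checking intermediate results with `isFiniteResult` catches values such as `std::sqrt(-1.0)` or `std::log(0.0)` before they spread through later calculations.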
The current problem with 32-bit CPUs is that they can only address 2^32 bytes = 4 GB of memory. It will take some time for the OS vendors to port their operating systems to 64-bit.
Posted: Wed Apr 21, 2004 5:32 am
Posted: Wed Apr 21, 2004 8:37 am
I store dates as 32 bit integers.
So in your example, Hiramhon, I would store "20040421" as 20040421. There's plenty of precision in 32 bits to store this without an offset.
The upside is that it's easy to read dates during debugging sessions. I can also easily do the mapping back and forth in my head.
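A minimal sketch of that mapping, with hypothetical helper names (`makeDate`, `yearOf`, `monthOf`, `dayOf`) that are not taken from the post:

```cpp
#include <cstdint>

// Pack a calendar date into a readable YYYYMMDD integer, e.g. 20040421.
constexpr std::int32_t makeDate(int year, int month, int day) {
    return year * 10000 + month * 100 + day;
}

// Unpack the individual fields again.
constexpr int yearOf(std::int32_t date)  { return date / 10000; }
constexpr int monthOf(std::int32_t date) { return (date / 100) % 100; }
constexpr int dayOf(std::int32_t date)   { return date % 100; }
```

Dates in this form compare correctly with `<` and `==`. Subtracting two of them does not give the number of days between them, so day arithmetic needs a separate calendar routine.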
I also often set breakpoints on lines I insert in my code like:
Code:
if ( currentDate == 19970413 )
    int a = 0; // <<<< Breakpoint on this line
when I'm having a problem that occurs only on a specific date, or I'm trying to figure out why an order was or was not executed on a particular date, etc.
You could also store dates as a union with a 16-bit integer for the year, and two 8-bit integers for the month and day. I once used this format but later switched to the format above, which is simpler to read and write, since I use dates in debugging so much.
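For comparison, here is a sketch of the 16-bit year plus two 8-bit fields layout. It uses explicit shifts rather than an actual union, which is a swapped-in technique: reading a union member other than the one last written is not portable C++, and the byte order would affect comparisons. All function names are hypothetical.

```cpp
#include <cstdint>

// Year in the top 16 bits, month in the next 8, day in the low 8.
constexpr std::uint32_t packDate(std::uint16_t year,
                                 std::uint8_t month,
                                 std::uint8_t day) {
    return (static_cast<std::uint32_t>(year) << 16) |
           (static_cast<std::uint32_t>(month) << 8) |
            static_cast<std::uint32_t>(day);
}

constexpr std::uint16_t packedYear(std::uint32_t d) {
    return static_cast<std::uint16_t>(d >> 16);
}
constexpr std::uint8_t packedMonth(std::uint32_t d) {
    return static_cast<std::uint8_t>((d >> 8) & 0xFF);
}
constexpr std::uint8_t packedDay(std::uint32_t d) {
    return static_cast<std::uint8_t>(d & 0xFF);
}
```

Packed this way, April 21, 2004 becomes 0x07D40415. It still sorts correctly, but it is much harder to recognize in a debugger than 20040421.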
- Forum Mgmnt
Posted: Wed Apr 21, 2004 12:17 pm
I am using an Int64 as the type for date/time. This way I can express any date/time (in millisecond resolution for tick data) in a single value and can also use these numbers in calculations (add, subtract, ...).
The advantage over a Double (64-bit floating point) is that it eliminates rounding errors. You can also easily compare two Int64 values exactly, without needing an epsilon to absorb rounding errors.
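The difference shows up as soon as timestamps are stepped forward repeatedly. In this sketch, `TimestampMs`, `advanceMs`, `advanceSeconds`, and `nearlyEqual` are hypothetical names, and the choice of epoch is left open as an assumption.

```cpp
#include <cmath>
#include <cstdint>

// Milliseconds since some agreed epoch, held in a 64-bit integer.
using TimestampMs = std::int64_t;

// Step an integer timestamp forward `steps` times by `stepMs` milliseconds.
inline TimestampMs advanceMs(TimestampMs start, int steps, std::int64_t stepMs) {
    TimestampMs t = start;
    for (int i = 0; i < steps; ++i) t += stepMs;
    return t;
}

// The same stepping with time held as seconds in a double.
inline double advanceSeconds(double start, int steps, double stepSeconds) {
    double t = start;
    for (int i = 0; i < steps; ++i) t += stepSeconds;
    return t;
}

// Tolerance-based comparison that the double representation needs.
inline bool nearlyEqual(double a, double b, double eps) {
    return std::fabs(a - b) <= eps;
}
```

Ten steps of 100 ms land exactly on 1000 with `advanceMs`, so `==` is safe. Ten steps of 0.1 seconds with `advanceSeconds` do not land exactly on 1.0, so the comparison has to go through something like `nearlyEqual`.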
Posted: Thu Nov 16, 2006 12:13 pm
That is my problem. I use tick data. I am switching all my code to my Apple, which is FreeBSD anyway. When I have my code finished I will buy one of the new 64 bit jobbies. I started my analysis programming on Unix and I am really looking forward to getting back onto it. To be honest, I loathe Windows.