Time and sales data interpretation

pkoufalas
Contributor
Posts: 3
Joined: Mon Jul 11, 2005 10:59 pm
Location: Adelaide Australia


Post by pkoufalas

G'day all.

I am using EOD OHLC (+V+OI) data for backtesting. I have modelled slippage and spread using a percentage move away from the nominal fill price, as suggested by some members.
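A percentage slippage-and-spread allowance like the one described can be sketched as below. This is a minimal sketch that assumes a single combined percentage is applied against the trader on every fill. The function name `modelled_fill` and the "buy"/"sell" side convention are hypothetical, not part of Trading Blox or any other platform.

```python
def modelled_fill(nominal_price: float, side: str, slippage_pct: float) -> float:
    """Move the nominal fill price against the trader by slippage_pct percent.

    side: "buy" or "sell" (hypothetical convention for this sketch).
    slippage_pct: combined slippage-plus-spread allowance in percent
    (e.g. 0.1 means 0.1% of the nominal price).
    """
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    if slippage_pct < 0:
        raise ValueError("slippage_pct must be non-negative")
    adjustment = nominal_price * slippage_pct / 100.0
    # Buys fill higher and sells fill lower than nominal, so slippage always costs money.
    return nominal_price + adjustment if side == "buy" else nominal_price - adjustment


# A buy stop at 1250.0 and a sell stop at 1250.0, each with a 0.1% allowance
print(modelled_fill(1250.0, "buy", 0.1))   # → 1251.25
print(modelled_fill(1250.0, "sell", 0.1))  # → 1248.75
```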

I have access to some time and sales data and am wondering how that differs from the EOD OHLC data that I have. That is, how do I interpret the prices shown? For the EOD OHLC data, I understand the prices shown are midpoint prices, (bid+ask)/2.

Is that true for the time and sales data? I'm told by my broker that time and sales represents actual trades, each trade shown has a buyer and seller represented. How does this relate to the OHLC data?

I was interested in the time and sales data in order to verify my EOD OHLC backtesting "calibration" for slippage and spread.
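One hedged way to use time and sales for that calibration: if each trade can be paired with the bid and ask prevailing when it printed (an assumption; not every T&S listing carries quotes), the average distance of trades from the (bid+ask)/2 midpoint gives a data-driven estimate of the spread cost, in the same percentage units as the allowance. The names `TradeWithQuote` and `effective_half_spread_pct` are hypothetical. Note this only measures the spread component; it does not capture extra slippage on stop orders in fast markets.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass
class TradeWithQuote:
    price: float  # traded price from the time and sales listing
    bid: float    # best bid when the trade printed (assumed available)
    ask: float    # best ask when the trade printed (assumed available)


def effective_half_spread_pct(trades: List[TradeWithQuote]) -> float:
    """Average distance of trades from the quote midpoint, as a percent of the midpoint."""
    if not trades:
        raise ValueError("need at least one trade")
    distances = []
    for t in trades:
        mid = (t.bid + t.ask) / 2.0  # midpoint, (bid+ask)/2
        distances.append(abs(t.price - mid) / mid * 100.0)
    return mean(distances)


sample = [
    TradeWithQuote(price=100.50, bid=100.00, ask=100.50),  # buyer paid the ask
    TradeWithQuote(price=100.00, bid=100.00, ask=100.50),  # seller hit the bid
]
print(round(effective_half_spread_pct(sample), 4))  # → 0.2494
```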

Any helpful comments much appreciated.

Cheers,
Paul.
Roger Rines
Roundtable Knight
Posts: 2038
Joined: Wed Oct 06, 2004 10:52 am
Location: San Marcos, CA

Post by Roger Rines

My experience indicates that not all Time & Sales information is the same. For example, the information broadcast by a data service may include everything the data feed received. That may not be what you want or need: if there are missing or erroneous prices, it is best to be able to filter out the errors and include the corrections. At a minimum, the T&S listing should indicate missing and corrected information as well as Sales, Bid and Ask values.
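Filtering a raw T&S stream so that erroneous prints are removed and late insertions and corrections are applied might look like the sketch below. The record layout (a sequence id plus a kind of "trade", "insert", "correct" or "cancel") is a hypothetical stand-in; real vendors encode these events differently, so check your own feed's documentation.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass
class TickRecord:
    seq: int                # hypothetical id identifying the trade being reported
    kind: str               # "trade", "insert", "correct" or "cancel" (hypothetical codes)
    time: float             # seconds since session open
    price: Optional[float]  # None for cancels


def clean_ticks(records: List[TickRecord]) -> List[TickRecord]:
    """Apply inserts, corrections and cancels; return validated trades in time order."""
    book: Dict[int, TickRecord] = {}
    for r in records:
        if r.kind in ("trade", "insert"):
            book[r.seq] = r  # original print, or a missing price inserted later
        elif r.kind == "correct":
            if r.seq in book:
                # Keep the original trade time; only the price is corrected.
                book[r.seq] = replace(book[r.seq], price=r.price)
        elif r.kind == "cancel":
            book.pop(r.seq, None)  # erroneous print removed
        else:
            raise ValueError(f"unknown record kind: {r.kind}")
    return sorted(book.values(), key=lambda t: (t.time, t.seq))


feed = [
    TickRecord(1, "trade", 1.0, 100.0),
    TickRecord(2, "trade", 2.0, 1000.0),   # bad print
    TickRecord(3, "trade", 4.0, 101.0),
    TickRecord(2, "cancel", 5.0, None),
    TickRecord(4, "insert", 3.0, 100.5),   # missing price inserted after the fact
    TickRecord(3, "correct", 6.0, 100.75),
]
print([(t.seq, t.price) for t in clean_ticks(feed)])  # → [(1, 100.0), (4, 100.5), (3, 100.75)]
```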

On my CQG data feed, they provide a T&S Toolbar that will either suppress or display the items mentioned above, plus Trade Volume, Contributor ID, etc. Being able to choose makes the values easier to understand, because you aren’t forced to read every displayed price to work out how the bar was built.

In a CQG data system, price bars contain only actual trade values. In simple terms, the first validated trade is used as the Open, the highest validated value as the High, the lowest validated value as the Low, and the final settlement price published by the exchange is labeled the Last or Close of the bar. When missing prices are inserted or errors are corrected, those values are added to or removed from the price bar, no matter when the insertion or correction happens.
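The bar-construction rule described above can be sketched as follows. This is an illustration of the rule, not CQG's actual code; it assumes the trades passed in have already been validated and that the exchange settlement price is supplied separately. `Bar` and `build_daily_bar` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float


def build_daily_bar(trades: List[Tuple[float, float]], settlement: float) -> Bar:
    """First validated trade is the Open, extremes of validated trades are
    High/Low, and the exchange settlement price is the Close.

    trades: (time, price) pairs that have already passed validation.
    """
    if not trades:
        raise ValueError("no validated trades")
    # Stable sort on time only, so trades with equal timestamps keep feed order.
    ordered = sorted(trades, key=lambda tp: tp[0])
    prices = [p for _, p in ordered]
    return Bar(open=prices[0], high=max(prices), low=min(prices), close=settlement)


bar = build_daily_bar([(1.0, 100.0), (3.0, 100.5), (4.0, 100.75), (2.0, 99.5)], settlement=100.6)
print(bar)  # → Bar(open=100.0, high=100.75, low=99.5, close=100.6)
```

Using settlement rather than the last trade as the Close follows the description above; the High and Low still come only from trades.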

If you are using T&S data files, the last update to the file might not include inserted missing prices or corrections, so it might not be a good source if accuracy is important.

You will probably get the best answer to your question from the data service that provides your data and T&S listing, as they are the only ones who really know how they handle their process. You should also get a feel for how much of the data broadcast by the exchange is rebroadcast by the data service. Some data vendors don’t broadcast all the ticks because of load limitations, and they don’t talk about this filtering because of the problems it can cause. Even vendors who don’t get all the ticks out will usually do a decent job of constructing the daily bar, but intraday bars built from this restricted feed won’t match what actually happened in the market. The mismatch is most noticeable during fast markets, which is exactly when accurate data matters most.
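A rough way to get that feel is to compare the extremes of the intraday bars you built from the real-time feed against the vendor's daily bar for the same session. The sketch below is a hypothetical check, not a vendor tool: a positive gap suggests ticks never reached you, and a negative gap suggests a bad print was kept in your feed but removed from the daily bar. Gaps are rounded to whole ticks; `tick` is your contract's minimum price increment.

```python
from typing import List, Tuple


def intraday_vs_daily_gaps(intraday: List[Tuple[float, float]],
                           daily_high: float, daily_low: float,
                           tick: float) -> Tuple[int, int]:
    """Return (high_gap, low_gap) in ticks between intraday bars and the daily bar.

    intraday: (high, low) of each intraday bar built from the real-time feed.
    Positive gaps mean the daily bar reached prices the feed never showed.
    """
    if not intraday:
        raise ValueError("no intraday bars")
    feed_high = max(hi for hi, _ in intraday)
    feed_low = min(lo for _, lo in intraday)
    high_gap = round((daily_high - feed_high) / tick)
    low_gap = round((feed_low - daily_low) / tick)
    return high_gap, low_gap


# Two intraday bars from the feed versus a daily bar of H=102.00 / L=99.75, tick 0.25
print(intraday_vs_daily_gaps([(101.0, 100.0), (101.5, 100.25)], 102.0, 99.75, 0.25))  # → (2, 1)
```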

You might also find that subscribing to a reliable EOD data service like CSIDATA is a frugal choice: the data gets scrubbed before it is broadcast, and it gets corrected even when the exchange publishes the correction after the day’s activity. Buying reliable data is an important principle, and for serious traders it is certainly a better use of time.
Paul King
Roundtable Knight
Posts: 207
Joined: Mon Feb 23, 2004 9:13 am
Location: Vermont, USA

Post by Paul King

If you are using time and sales or tick data to perform historical backtesting, you have to seriously consider whether using 'corrected' data is really representative of the actual market you are testing on.

Using corrected data, when the correction only became available at a later date in real time, is a form of 'look forward' (look-ahead) error: your test uses information that would not have been available at the time.
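One way to avoid this is to keep every published version of a price together with the time it became known, and let the backtest see only the versions published by its decision time. The sketch below assumes such a "published_at" timestamp is available, which many historical files do not record; `PriceVersion` and `as_known_at` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PriceVersion:
    value: float
    published_at: float  # when this value became known in real time (assumed field)


def as_known_at(versions: List[PriceVersion], decision_time: float) -> Optional[float]:
    """Return the latest version published by decision_time, or None if none yet.

    Ignoring versions published after decision_time avoids look-ahead.
    """
    known = [v for v in versions if v.published_at <= decision_time]
    if not known:
        return None
    return max(known, key=lambda v: v.published_at).value


# A bad print at t=9.5 that the exchange corrected at t=20.0
versions = [PriceVersion(1000.0, 9.5), PriceVersion(100.0, 20.0)]
print(as_known_at(versions, 10.0))  # → 1000.0 (what a real-time system actually saw)
print(as_known_at(versions, 25.0))  # → 100.0 (the corrected value)
```

A decision at t=10.0 should be simulated against 1000.0, because that is what the live feed showed; testing it against 100.0 is exactly the look-ahead described above.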

This is just one of the numerous caveats of automated backtesting to watch out for. Any data feed is only one representation of what actually happened, so it is wise to test any strategy in real time and confirm that the results are close enough to the historical test results to suggest those results are likely valid.

Paul