Stock data

Discussions specific to trading the stock market.
Chuck B
Roundtable Knight
Posts: 481
Joined: Thu Apr 17, 2003 6:34 am

Stock data

Post by Chuck B »

From the testing I have tried to do on stocks in the past, I always ran into significant data quality problems, far worse than those seen in the futures markets. This is particularly true of real-time stock data delivery. I found that TickData has the cleanest stock data I could find. I have also used their tick futures data since 1991, and the new owner has dramatically improved their whole service in recent years (I have no connection to them other than being a satisfied customer).

TickData Inc. has an interesting white paper on scrubbing equity data, which can be seen here:
http://www.tickdata.com/EquityScrubbing.html
Forum Mgmnt
Roundtable Knight
Posts: 1842
Joined: Tue Apr 15, 2003 11:02 am

Post by Forum Mgmnt »

Chuck, thanks for the link to the document. I had run across it before but didn't download it because I expected it to be mindless marketing drivel. It is very well done.

The other issues with stocks that won't be obvious to futures traders stem from the fact that a stock represents a company rather than a thing, and many things happen to companies.

Some examples:
  • Stock Splits - Stocks are split so that shareholders receive new shares in proportion to the split ratio. A two-for-one split turns 100 shares at $100 into 200 shares at $50.
  • Mergers - Usually the acquired company's stock is converted into the acquirer's stock at some exchange ratio.
  • Spin-outs - Companies launch other companies (e.g. AT&T spun off Lucent), and the shareholders of the parent company get shares in the new entity.
  • Dividends - Some companies pay dividends; if you happen to be a shareholder on the date that matters, you get the dividend.
  • Symbol Reuse - After mergers or delistings some symbols are reused, and sometimes this results in data from different companies being hooked together erroneously.
  • Symbol Changes - Sometimes (especially recently) symbols are changed as companies merge or become subject to certain exchange actions; NASDAQ, for example, adds letters when companies are pending delisting. This can cause the data to show up as two contiguous but separate data series, one under each symbol, with no indication that they are really the same stock.
  • Etc.
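To make the split case concrete, here is a minimal Python sketch of back-adjusting a daily history for a split. It assumes bars are plain (date, close, volume) tuples and that the split date and ratio come from some corporate-actions record (both assumptions; no particular vendor format is implied). Prices before the split are divided by the ratio and volumes multiplied by it, mirroring 100 shares at $100 becoming 200 shares at $50.

```python
from datetime import date


def back_adjust_for_split(bars, split_date, ratio):
    """Back-adjust (date, close, volume) bars for a split effective on split_date.

    ratio is new shares per old share, e.g. 2.0 for a two-for-one split.
    Bars dated before the split get price / ratio and volume * ratio, so the
    older history lines up with post-split prices; later bars are unchanged.
    """
    adjusted = []
    for d, close, volume in bars:
        if d < split_date:
            adjusted.append((d, close / ratio, volume * ratio))
        else:
            adjusted.append((d, close, volume))
    return adjusted


# Hypothetical history: a two-for-one split takes effect on 2003-05-05.
bars = [
    (date(2003, 5, 1), 100.0, 1000),
    (date(2003, 5, 2), 101.0, 1200),
    (date(2003, 5, 5), 50.75, 2600),  # first bar trading post-split
]
print(back_adjust_for_split(bars, date(2003, 5, 5), 2.0))
```

Without this step, a long-term system would see the stock "lose" half its value overnight.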
These issues generally don't affect traders of short-term (minutes to days) or medium-term (weeks to months) systems much (except for the dividend effect), but they can have a large effect on anything that holds positions for a long time (many months or years).
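One common convention for handling the dividend effect (not necessarily the one any vendor or poster here uses) is a multiplicative back-adjustment: every close before the ex-dividend date is scaled by 1 - D/P, where D is the dividend and P is the close on the day before the ex-date. A minimal sketch, with the function name and the list-of-closes input as illustrative assumptions:

```python
def dividend_adjust(closes, ex_index, dividend):
    """Scale closes before the ex-dividend bar by (1 - dividend / prior close).

    closes: daily closes, oldest first.
    ex_index: position of the first bar that trades ex-dividend (must be >= 1).
    """
    prior_close = closes[ex_index - 1]
    factor = 1.0 - dividend / prior_close
    return [c * factor if i < ex_index else c for i, c in enumerate(closes)]


# Hypothetical stock that goes ex a $1.00 dividend on bar 2 and opens lower by it.
closes = [50.0, 50.0, 49.0]
print(dividend_adjust(closes, 2, 1.0))
```

The raw series shows a $1 drop on the ex-date; the adjusted series is flat, reflecting that a holder who received the dividend did not actually lose money.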

All data suppliers have significant mistakes in their information. Sometimes a price is off by a factor of 10 or 100.
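One crude way to surface factor-of-10 or factor-of-100 errors is to compare each price with the median of its neighbors and flag ratios close to 10, 100, 1/10, or 1/100. The sketch below assumes a plain list of daily closes; the window and tolerance are arbitrary illustrative values, not tested settings.

```python
from statistics import median


def flag_scale_errors(prices, window=5, tolerance=0.15):
    """Return indices whose price is roughly 10x or 100x off from nearby prices.

    Each price is compared with the median of up to `window` prices on each
    side (excluding itself). If that ratio is within `tolerance` (relative)
    of 10, 100, 1/10, or 1/100, the index is flagged for review.
    """
    suspects = []
    for i, p in enumerate(prices):
        neighbors = prices[max(0, i - window):i] + prices[i + 1:i + 1 + window]
        if not neighbors:
            continue
        ratio = p / median(neighbors)
        for factor in (10.0, 100.0, 0.1, 0.01):
            if abs(ratio / factor - 1.0) <= tolerance:
                suspects.append(i)
                break
    return suspects


prices = [25.1, 25.3, 2.54, 25.4, 254.0, 25.6, 25.5]
print(flag_scale_errors(prices))  # → [2, 4]
```

A genuine 10-for-1 split would also be flagged, which is arguably what you want: each flag should be checked against corporate-action records or a chart rather than "fixed" automatically.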

You get the basic idea. There is more here than one would think at first glance if you have a futures background.
blueberrycake
Roundtable Knight
Posts: 125
Joined: Mon Apr 21, 2003 11:04 pm
Location: California

Post by blueberrycake »

Forum Mgmnt wrote: All of the suppliers of data have significant mistakes in their information. Sometimes it is off by a factor of 10 or 100.
So what data do you use for your testing? Do you just scrub it manually?

-bbc
Howard Brazzil
Roundtable Fellow
Posts: 54
Joined: Wed Apr 16, 2003 12:45 pm
Location: Houston, TX USA

re: Stock data

Post by Howard Brazzil »

Hi Forum Mgmnt,

Do you mind taking a minute to specifically address the issue of survivorship bias?

When testing a momentum strategy, for instance, I can't help thinking that the absence of (now delisted)
companies such as Netscape, and Ascend, just to name two, could have a significant impact on results.
And thinking a few years farther forward in time, there are a large number of internet companies that would
have been stellar momentum candidates, that have since just...gone away.

But various data imperfections have different effects on different strategies, as you mentioned.
Is survivorship one of the factors that doesn't really affect short-term and medium term traders?
How do you go about making the distinction?

Thanks,
- Howard
Bernd
Roundtable Knight
Roundtable Knight
Posts: 126
Joined: Wed Apr 30, 2003 6:39 am

Re: Stock data

Post by Bernd »

Chuck,

If you have errors in the real-time data you base your trading decisions on, wouldn't it be good to have the same errors in the historical data you use to test your systems?

In other words: I see a danger in performing tests on data that is cleaner than the data you use for real trading.

Bernd
Forum Mgmnt
Roundtable Knight
Posts: 1842
Joined: Tue Apr 15, 2003 11:02 am

Post by Forum Mgmnt »

We currently use a hybrid of data from various sources that has been scrubbed a bit.

We are working on a new joint project with another research company, based in Eastern Europe, that has a great database of company information. We will be combining this with our daily and tick data to get a unified database that does a decent job of representing reality as one would have seen it over time.

Most of our testing has been done with decent data but still data with problems. We usually look at charts for trades that have unusual results (biggest winners and losers) and sometimes find bad data points that way. There is no automatic way to do this that doesn't have some downside. I prefer using human beings augmented with tools.
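The review step described above (pull the biggest winners and losers from a test and look at their charts) is easy to support with a small tool. A minimal sketch, assuming each trade is a hypothetical Trade record carrying a percent return; it only selects trades to inspect, and the inspection itself stays manual:

```python
from dataclasses import dataclass


@dataclass
class Trade:
    symbol: str
    entry: str          # ISO date string, e.g. "2002-10-09"
    return_pct: float   # percent return of the trade


def trades_to_review(trades, n=5):
    """Return (winners, losers): the n best and n worst trades by return_pct."""
    ranked = sorted(trades, key=lambda t: t.return_pct)
    losers = ranked[:n]
    winners = ranked[::-1][:n]  # slicing the reversed list also handles n == 0
    return winners, losers


# Hypothetical test results; an outsized winner or loser is a prompt to check the chart.
trades = [
    Trade("AAA", "2002-01-15", 4.2),
    Trade("BBB", "2002-03-04", -3.1),
    Trade("CCC", "2002-05-20", 410.0),
    Trade("DDD", "2002-07-09", -92.0),
    Trade("EEE", "2002-09-30", 1.5),
]
winners, losers = trades_to_review(trades, n=1)
print([t.symbol for t in winners], [t.symbol for t in losers])
```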

We still have a lot of work to do before I am happy with this.

Howard brings up a good point. There are many issues associated with survivorship bias and related effects. Some will make performance look worse, others will make it look better.

One of my favorites is testing something like the S&P 500. If you test today's S&P 500 components, you'll get a lot of companies that are there because they have been very successful in the past 5 or 10 years and have grown from small companies into large ones. At the start of the test period, you couldn't have known they would end up on the list. So you really need to test using whichever stocks were actually components at each point in time. Getting the data to do this is no small task and not cheap.
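Testing on point-in-time membership can be sketched as follows, assuming you have membership records of the form (symbol, date added, date removed). The symbols and dates below are made up for illustration; real historical index membership data has to be sourced separately.

```python
from datetime import date

# Hypothetical membership records: (symbol, date added, date removed or None).
MEMBERSHIP = [
    ("AAA", date(1990, 1, 1), None),
    ("BBB", date(1995, 6, 1), date(2001, 3, 15)),  # later dropped from the index
    ("CCC", date(1999, 9, 1), None),               # added only after it grew
]


def constituents_on(day, membership):
    """Return the symbols that were index members on `day` (removal date exclusive)."""
    return sorted(
        sym for sym, added, removed in membership
        if added <= day and (removed is None or day < removed)
    )


print(constituents_on(date(1997, 1, 2), MEMBERSHIP))  # → ['AAA', 'BBB']
print(constituents_on(date(2003, 5, 8), MEMBERSHIP))  # → ['AAA', 'CCC']
```

A 1997 trade can then only be taken in stocks that were members in 1997, so a later winner like the hypothetical CCC cannot leak into the early part of the test.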

The easiest way around a lot of these issues for the independent trader is to use only dollar trading volume to filter the big (or at least liquid) stocks from the little ones instead of using an index.
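A minimal sketch of the dollar-volume filter, assuming each symbol's history is a list of (close, volume) bars containing only data up to the test date. The 50-bar lookback and $5 million threshold are illustrative assumptions, not values recommended in this thread.

```python
def liquid_universe(history, lookback=50, min_dollar_volume=5_000_000.0):
    """Select symbols whose average daily dollar volume over the last
    `lookback` bars is at least `min_dollar_volume`.

    history maps symbol -> list of (close, volume) bars, oldest first, and
    must contain only data available as of the test date.
    """
    selected = []
    for symbol, bars in history.items():
        recent = bars[-lookback:]
        if len(recent) < lookback:
            continue  # not enough history yet to judge liquidity
        avg = sum(close * volume for close, volume in recent) / lookback
        if avg >= min_dollar_volume:
            selected.append(symbol)
    return sorted(selected)


history = {
    "BIG": [(40.0, 500_000)] * 60,    # about $20M traded per day
    "TINY": [(3.0, 20_000)] * 60,     # about $60K traded per day
    "NEW": [(25.0, 1_000_000)] * 10,  # liquid, but too little history
}
print(liquid_universe(history))  # → ['BIG']
```

Because the filter uses only trailing data, a stock enters the universe only once it was actually liquid, which sidesteps the look-ahead problem of using today's index list.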
Last edited by Forum Mgmnt on Thu May 08, 2003 5:46 am, edited 1 time in total.
Chuck B
Roundtable Knight
Posts: 481
Joined: Thu Apr 17, 2003 6:34 am

Post by Chuck B »

Bernd wrote: If you have errors in the real-time data you base your trading decisions on, wouldn't it be good to have the same errors in the historical data you use to test your systems?
In other words: I see a danger in performing tests on data that is cleaner than the data you use for real trading.
Bernd,

To some extent yes, but to a great extent no, at least with stock data. Stock data is full of junk ticks that are often far away from the market. You have to scrub this data to some extent just to have an intraday database that is usable and mostly representative of the stock's actual traded prices. In real trading, your stops and limits will not be filled at the prices of erroneous ticks, so there is no point in trying to use data filled with inserted trades, wrong symbol insertions, out-of-sequence prices (sometimes even from the previous day!), etc.
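The kinds of scrubbing described here (dropping out-of-sequence prints and prints far away from the market) can be sketched as a simple streaming filter. This is an illustrative sketch, not TickData's method: ticks are assumed to be (timestamp_seconds, price) pairs, and the 20-tick window and 5% deviation limit are arbitrary.

```python
from collections import deque
from statistics import median


def scrub_ticks(ticks, window=20, max_deviation=0.05):
    """Drop ticks that go backwards in time or stray too far from recent prices.

    ticks: list of (timestamp_seconds, price), in feed order.
    A tick is kept only if its timestamp is not earlier than the last kept
    tick and its price is within max_deviation (fractional) of the median
    of the last `window` kept prices.
    """
    kept = []
    recent = deque(maxlen=window)
    for ts, price in ticks:
        if kept and ts < kept[-1][0]:
            continue  # out of sequence
        if recent and abs(price / median(recent) - 1.0) > max_deviation:
            continue  # far away from the market
        kept.append((ts, price))
        recent.append(price)
    return kept


ticks = [(1, 30.10), (2, 30.12), (3, 3.01), (4, 30.15), (2, 29.80), (5, 30.14)]
print(scrub_ticks(ticks))  # → [(1, 30.1), (2, 30.12), (4, 30.15), (5, 30.14)]
```

A filter like this can lock onto a bad first print or reject a genuine fast move such as a gap on news, so its rejections still deserve a human look.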

Take a look at TickData's paper I referenced above.
Bernd
Roundtable Knight
Posts: 126
Joined: Wed Apr 30, 2003 6:39 am

Post by Bernd »

:wink: