CSI Stock Data: Reliability of Start Dates?

AFJ Garner
Roundtable Knight
Posts: 2071
Joined: Fri Apr 25, 2003 3:33 pm
Location: London

Post by AFJ Garner »

I have great doubt that CSI Stock data is the right tool for those concerned to make an accurate test of a long term trend following strategy, such as that produced by ecritt.

I have a list of over 8,000 US IPOs stretching back to 1975 together with their first trading dates. In many, many instances, it appears that CSI has only picked up the daily data some days, months or even years after the date the stock first traded.

In fairness, there may be an explanation: and I will report on it if I find it.

But prima facie, this does not look at all satisfactory.

There are numerous other considerations which necessitate going stock by stock through the entire CSI database. This part is not a criticism, but you need to exclude multiple classes of preference shares, Pink Sheet stocks, OTC BB stocks and so on. I suspect many of these penny stocks would be weeded out by a volume filter, but of course that is all extra processing time.

Post by AFJ Garner »

Here is an example: Hutchinson Technology Inc, ticker HTCH. The company went public on 15th August 1985 according to my information (and the information on the company website confirms this), and yet CSI only have data from 26th March 1990, more than four and a half years after the company supposedly went public.

There may be some explanation, but I am finding this a very frequent occurrence in the CSI database.
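
For anyone wanting to put a number on how late a series starts, here is a minimal Python sketch. It assumes you already have, for each ticker, an IPO date and the first date in the CSI series. The function name `late_start_days` and the `records` list are made up for illustration; only the HTCH dates come from the post above.

```python
from datetime import date


def late_start_days(ipo_date: date, first_data_date: date) -> int:
    """Days between the IPO and the first available bar (0 if data starts on time)."""
    return max((first_data_date - ipo_date).days, 0)


# (ticker, IPO date, first date in the data series); HTCH dates are from the post.
records = [("HTCH", date(1985, 8, 15), date(1990, 3, 26))]

for ticker, ipo, first_bar in records:
    gap = late_start_days(ipo, first_bar)
    print(f"{ticker}: data starts {gap} days ({gap / 365.25:.1f} years) after the IPO")
```

Run over a full IPO list, sorting by the gap would show quickly whether the late starts cluster before a particular date.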
Last edited by AFJ Garner on Wed Jan 31, 2007 10:47 am, edited 1 time in total.
sluggo
Roundtable Knight
Posts: 2987
Joined: Fri Jun 11, 2004 2:50 pm

Post by sluggo »

I don't know about CSI + Stocks, but for CSI + Futures you have to pay extra to get more than 10 years of historical data. Perhaps they charge a similar fee to let you access stock data from 22 years ago.

Their website contains lots of stuff, including this image:
Attachment: extra_csi.png

Post by AFJ Garner »

Thanks for the input, Sluggo, but no, that is not the problem. I have stock data going right back to 1962. I have a chart of Alcoa, for instance, going back to that date.

I fear the problem is simply incomplete data series. And it does present a serious problem. I am sorry to say I have not listed the instances I have found where data starts late but it is a very considerable number.

ecritt's enormously helpful study probably shows, to most people's satisfaction, that ULTTF does work on stocks. But it would be better to be able to test the hypothesis for oneself.

Rather like the conclusion that a very long term system can trade almost any future profitably, one can probably assume the same for almost any basket of stocks with a system like the one ecritt proposes.

Having paid $999 + $300 for US listed and de-listed stocks, I'm irritated to find that, at least so far as IPOs go, my test results will be of very limited use.

I must ask ecritt what data provider he used.

See http://www.crsp.chicagogsb.edu/index.html

I fear one probably has to resort to such a body as this for complete historic stock data. And god knows who one goes to for Europe and the Far East.
nickmar
Roundtable Knight
Posts: 192
Joined: Tue Oct 26, 2004 12:38 pm

Post by nickmar »

AFJ, Sluggo: Do either of you know how to extract the CUSIP or PermNo identifiers for each security in the CSI database? I wanted to perform an issue date lookup comparison with the following excellent sources of data in order to determine the extent of the problem:

Field-Ritter dataset of company founding dates:
http://bear.cba.ufl.edu/ritter/foundingdates.htm
Excel Spreadsheet link: http://bear.cba.ufl.edu/ritter/work_papers/age7505.xls
Additional relevant info:
http://bear.cba.ufl.edu/ritter/ipodata.htm

Paper: IPO Information Aggregation: Do Underwriters Differ?
http://leeds-faculty.colorado.edu/yungc ... gation.pdf
Excel Spreadsheet link:
http://leeds-faculty.colorado.edu/yungc ... 20Data.xls

Post by AFJ Garner »

I admit that I'm a boring pedantic old fart but here is another reason to check CSI stocks line by line:

40194 IBAS OTC Ibasis Inc 19991110 20070124
68901 IBAS OTC IBasis Inc 20060622 20060622

OK, with only one day's data, no problem but there are quite a number of such duplications where the overlap is considerably longer.
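
A rough Python sketch of how such duplications might be flagged mechanically. It assumes each line has the layout of the two shown above (CSI number, ticker, exchange, name, first date, last date), which is only inferred from those two lines. The function names are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime
from itertools import combinations


def parse_line(line: str) -> dict:
    """Split one listing line; the name may contain spaces, so take dates from the end."""
    parts = line.split()
    return {
        "csi": parts[0],
        "ticker": parts[1],
        "exchange": parts[2],
        "name": " ".join(parts[3:-2]),
        "start": datetime.strptime(parts[-2], "%Y%m%d").date(),
        "end": datetime.strptime(parts[-1], "%Y%m%d").date(),
    }


def overlapping_duplicates(rows: list) -> list:
    """Return (ticker, csi_a, csi_b, overlap_days) for same-ticker series that overlap."""
    by_ticker = defaultdict(list)
    for row in rows:
        by_ticker[row["ticker"]].append(row)
    hits = []
    for ticker, group in by_ticker.items():
        for a, b in combinations(group, 2):
            overlap_start = max(a["start"], b["start"])
            overlap_end = min(a["end"], b["end"])
            if overlap_start <= overlap_end:
                days = (overlap_end - overlap_start).days + 1  # inclusive count
                hits.append((ticker, a["csi"], b["csi"], days))
    return hits


sample_lines = [
    "40194 IBAS OTC Ibasis Inc 19991110 20070124",
    "68901 IBAS OTC IBasis Inc 20060622 20060622",
]
rows = [parse_line(line) for line in sample_lines]
print(overlapping_duplicates(rows))
```

Note that the overlap is measured in calendar days, not trading days, and that a re-used ticker belonging to a different company would also be flagged if its dates overlapped, so the hits still need checking by eye.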

Plus there are similarly misleading instances where the same stock is listed for one period on one exchange, then separately for a following period on a bulletin board, where the company has been forced to or has decided to de-list from Nasdaq. Or at least I would imagine that is the reason.

With a ULTTF system it is likely that one would be well out of the stock before it was de-listed. And I would not propose to trade penny stocks on bulletin boards anyway. But my point is that these problems need to be considered and the database gone through line by line.

Nickmar, no, I do not have any fast or mechanised way of making such a comparison but I am working on it. It clearly needs to be done.

I have also queried CSI and asked for an explanation of their policy/why these late listings occur.

Ain’t straightforward is it?

And yes, Ritter's papers and sources are excellent. I particularly like the paper on long term returns for IPOs for buy and hold. Horrid underperformance!

The "money left on the table" paper is interesting and fits exactly with my long experience of trading IPOs: you tend only to get a first day pop if the price range has been raised evidencing that the stock is hot.
Tim Arnold
Site Admin
Posts: 9015
Joined: Tue Apr 06, 2004 1:41 pm
Location: Boston, MA

Post by Tim Arnold »

Within CSI UA, if you check the "Output Stock Market Specs in CSV Format" and then Get Data, it will create a sdbfacts.csv file in the Archives folder. This has all the stock information listed.

Unfortunately, the CUSIP field is blank, so I'm not sure they really have this info, or maybe it is a premium service.
Attachment: csiOutput.gif

Post by nickmar »

Darn - thanks for the info Tim. The lack of a unique security identifier is unfortunate as Ticker symbols are often re-used.

Post by Tim Arnold »

Be sure to use this option so that you have a unique ID. Unique to CSI, but still can't connect with the outside world...
Attachment: csinumber.gif
ecritt
Roundtable Knight
Posts: 134
Joined: Sat Aug 28, 2004 3:44 am
Location: Phoenix, AZ

Post by ecritt »

CSI's stock data prior to March of 1990 is "hit or miss". Others and myself have been unable to find any material problems post 1990; there were about 20 missed splits and various corporate actions that they have since fixed at my request. CSI's real-time (day to day) performance, which I've been evaluating for 3 years now, has been quite good.

If you want to be completely legit pre-1990 be prepared to spend about $5,000 per month, with at least a 1-year commitment. Firms to look at are Quantitative Analytics, Factset, FT-Interactive.

Post by AFJ Garner »

Thanks Eric. Yes, I can confirm my problems seem mostly to be with the pre 1990 data. And of course cutting out data one does not want (ADRs, prefs, closed end funds, bulletin boards, pink sheets) is relatively easy, especially given the current 3, 4 and 5 symbol tickers for NYSE, NASDAQ and bulletin boards respectively.
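
As a rough illustration of that ticker-length rule of thumb, here is a small Python sketch. It is a heuristic only: treating one and two letter symbols as NYSE is an added assumption (the post mentions three), the function names are invented, and real symbols will not always follow the pattern, so it is worth cross-checking against the exchange field.

```python
def venue_from_ticker(ticker: str) -> str:
    """Guess the venue from symbol length, per the 3/4/5 rule of thumb."""
    symbol = ticker.strip().upper()
    length = len(symbol)
    if 1 <= length <= 3:
        return "NYSE"  # assumption: shorter symbols treated like 3-letter ones
    if length == 4:
        return "NASDAQ"
    if length == 5:
        return "bulletin board"
    return "unknown"


def keep_for_testing(ticker: str) -> bool:
    """Keep only symbols that look like NYSE or NASDAQ listings."""
    return venue_from_ticker(ticker) in {"NYSE", "NASDAQ"}


# "ABCDE" is a made-up five-letter symbol used only to exercise the filter.
for symbol in ["HTCH", "AA", "ABCDE"]:
    print(symbol, venue_from_ticker(symbol), keep_for_testing(symbol))
```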

Your answer is much appreciated.
AG

Post by AFJ Garner »

Oh gawd, I continue to plough through the data and now of course keep noticing POST 1990 problems! Many of them.

Just come across a couple of post 1990 IPOs whose prices are not picked up until 2 years after the listing date. No, the data is riddled with these errors, it's not just pre 1990.

What this is saying to me is that the data is by no means wholly reliable for my specific purpose: testing various forms of mechanical trading on the universe of US IPOs as from the listing date.

It does NOT of course invalidate LTTF on stocks generally.

It is, however, intensely irritating.
DPH
Roundtable Knight
Posts: 115
Joined: Tue Nov 17, 2009 1:07 am
Location: Central Pennsylvania

Post by DPH »

CSI has other significant problems. First of all, they limit the maximum number of stocks you can export to 10,000. To build an entire database you have to go through a cumbersome process of creating, exporting, and then deleting a portfolio of fewer than 10,000 stocks, and then doing it again and again until you have the whole universe exported in a usable format. If you want this database updated daily you would have to go through the whole process each day!
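
The batching part of that routine can at least be planned ahead. The Python sketch below only splits a list of identifiers into portfolios that stay under the limit; it does not talk to CSI's software at all, and the ID numbers and the `portfolio_batches` name are invented for illustration.

```python
from typing import Iterator, List, Sequence


def portfolio_batches(ids: Sequence[str], max_size: int = 9_999) -> Iterator[List[str]]:
    """Yield consecutive slices of ids, each with at most max_size entries."""
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    for start in range(0, len(ids), max_size):
        yield list(ids[start:start + max_size])


# Made-up identifiers standing in for a 25,000-stock universe.
universe = [str(n) for n in range(1, 25_001)]
batches = list(portfolio_batches(universe))
print([len(batch) for batch in batches])
```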

No problem, you say: “I’ll just use the scanner in CSI to narrow down my universe to under 10,000 stocks”
cully
Full Member
Posts: 11
Joined: Sat Mar 03, 2007 11:28 pm

Post by cully »

So what is the consensus here? Using CSI data post 1990 is as "good as it gets" and should be "good enough" to develop stock trend following strategies?
bf2
Full Member
Posts: 16
Joined: Sun Jun 26, 2011 2:54 pm

Post by bf2 »

As we can safely assume that there will never be a perfect solution for clean data what's the best alternative?

What if we add an average volume filter? That should weed out most of the rubbish.

I have read Ecritt's paper and I am not convinced that omitting delisted stocks (i.e. survivorship bias) is a major problem for a method like his that buys new all-time highs. Not having delisted stocks in your universe while testing such a strategy may inflate your system performance slightly, so just deduct a couple of percentage points from the CAGR the test shows.
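
A minimal Python sketch of what an average volume filter could look like. The 100,000 share threshold, the 50 day lookback and the sample volumes are all invented for illustration, not recommendations, and the function name is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def passes_volume_filter(volumes: Sequence[float],
                         min_avg_volume: float = 100_000,
                         lookback: int = 50) -> bool:
    """True if the trailing average daily volume meets the threshold."""
    if lookback < 1:
        raise ValueError("lookback must be at least 1")
    recent = volumes[-lookback:]
    if len(recent) < lookback:
        return False  # too little history to judge
    return mean(recent) >= min_avg_volume


# Invented daily share volumes, oldest first, purely for illustration.
history: Dict[str, List[float]] = {
    "LIQUID": [250_000.0] * 60,
    "THIN": [5_000.0] * 60,
    "NEW": [1_000_000.0] * 10,
}

tradable = [ticker for ticker, vols in history.items() if passes_volume_filter(vols)]
print(tradable)
```

In a backtest the filter would need to be evaluated on each date using only the volumes known up to that date, otherwise it introduces look-ahead bias of its own.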

Post by AFJ Garner »

I have not tested de-listed stocks and therefore do not know whether you are right or wrong. What I do know is that ecritt wrote a later and equally fascinating paper (following his paper about buying stocks at all time highs) based on his own research showing just how low the survivorship rate is on equities over the years. Doubtless a search will reveal this further paper for you - I highly recommend it. I had no idea that quite such a large proportion of listed stocks crash and burn. This paper may change your views. Or not.

Post by bf2 »

AFJ Garner wrote:What I do know is that ecritt wrote a later and equally fascinating paper (following his paper about buying stocks at all time highs) based on his own research showing just how low the survivorship rate is on equities over the years. Doubtless a search will reveal this further paper for you - I highly recommend it.
I have searched for the last two hours (obviously not very well) but cannot find the paper you mention. I have found other papers by Ecritt, including 'Does Trendfollowing Work on Stocks Part II', 'If You Miss Certain Stocks', 'Real or Random' and 'The Capitalism Distribution', but not the one that deals with low survivorship.

I initially thought it was The Capitalism Distribution you were talking about, but that doesn't go into that area.

Post by AFJ Garner »

Hmm, sorry about that. Perhaps this is what I was thinking of:

viewtopic.php?t=3300&highlight=blackstar

Also see attachment. Alternatively perhaps the whole thing was in my failing imagination.
Attachment: TrendingStocksDriveTheMarket.pdf

Post by ecritt »

bf2 wrote:As we can safely assume that there will never be a perfect solution for clean data what's the best alternative?

What if we add an average volume filter? That should weed out most of the rubbish.

I have read Ecritt's paper and I am not convinced that omitting delisted stocks (i.e. survivorship bias) is a major problem for a method like his that buys new all-time highs. Not having delisted stocks in your universe while testing such a strategy may inflate your system performance slightly, so just deduct a couple of percentage points from the CAGR the test shows.
A couple of important points about delisted stocks. 1) They outnumber active stocks. 2) About a third of them were acquired at or near all time highs.

By omitting delisted stocks you are limiting yourself to a sample (active stocks) that lacks most of the downside dispersion, and much of the upside dispersion, in long term individual stock returns. What kind of biases do you think this creates?
Attachment: DelistedStocks.gif

Post by bf2 »

Thanks ecritt. That does change things.