How to select a portfolio of stocks for testing and trading?

choppystride
Full Member
Posts: 22
Joined: Thu Jul 17, 2003 4:54 am

Post by choppystride »

I've been lurking around this forum for the last little while and I can see that some of you are stock traders. I have the following newbie questions:

How do you select your portfolio of stocks for your system? Do you:
  • 1) Simply use the constituents of some well known indices such as S&P or Russell?
  • 2) Define the list based on some fundamental/technical characteristics such as market cap, avg volume, etc?
  • 3) Apply your system to all the stocks and include those that meet certain performance criteria?
  • 4) Have some other procedures that you wouldn't mind discussing just a bit? :)
Once you have defined your portfolio, how often do you change its components? (Of course, here I'm assuming that you're not using method #1.)

On a related topic, how do you deal with the pesky problem of survivorship bias? That is, in your testing, how do you account for stocks that no longer exist due to bankruptcy, M&A, LBO, or whatever other reasons? I've done some searching on the Internet and found that historical data for defunct stocks is generally hard to find and very expensive. Even if you find it, it could be a maintenance nightmare due to symbol changes and all sorts of corporate actions.

So do you:
  • a) Simply go out and buy all the necessary data for all the defunct stocks and include them in your testing?
  • b) Design your system in such a way to avoid or minimize the inclusion of defunct stocks over your test period? This way, you only need to obtain and maintain a manageable amount of defunct stock data.
  • c) Exclude them altogether from your test data and simply discount your final test results (i.e. just assume that their absence inflates your performance results such as Sharpe, W/L Ratio, etc. by 30% or some such number).
  • d) Again, have some other procedures that you wouldn't mind discussing just a bit? :)
Jason Czech
Roundtable Fellow
Posts: 68
Joined: Fri May 16, 2003 8:28 am
Location: Atlanta, GA

Post by Jason Czech »

On the issue of inactive (defunct) stocks, I've been in contact with CSI Data & they've told me that they have 'nearly' all of these issues included in their database. I'm thinking about forking out the $240 for 20 years of historical data & giving them a try. I think it's very important (for my own confidence) to have these issues included.

Anyone here have any experience w/ CSI equities data?

Jason
choppystride

Post by choppystride »

Hi Jason,

Thanks for the heads-up! Actually, I'm more interested in intraday data. However, if CSI sells 20 years of EOD for $240, it does seem to be an interesting offer. I sent them an enquiry via email. One of the responses I got was that their data history is complete and without any missing data. This obviously contradicts what you said about their data being only *nearly* complete. Also, they didn't quote me a price even though I specifically asked for one. All they told me was that they're opening a Yahoo store soon and asked me to check back when it's up and running.

So I'm just wondering, how did you get your info regarding their price and data integrity? It was actually a staff member from their marketing department who answered my email. And judging from her writing, it does seem that she is not totally familiar with the features of their products.
Jason Czech

Post by Jason Czech »

Well, they didn't exactly give me a quote on the 20 years of history, but according to their website if you order Unfair Advantage you receive 10 years of data and additional years can be purchased at $20 per year. The setup cost for UA is $40, so I concluded that 20 years of history would cost me $240. Of course, this is assuming that I subscribe to their service...which you may not want to do. It's also assuming that they actually have 20 years in their database, which I haven't verified.
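The arithmetic behind that $240 figure can be written out as a quick Python sketch. The prices are simply the ones quoted above from CSI's website (unverified), the function name is made up for illustration, and any ongoing subscription fee is left out.

```python
def ua_history_cost(years_wanted, included_years=10, per_extra_year=20, setup_fee=40):
    """Estimated one-time cost in dollars, using the figures quoted in the post.

    All defaults are assumptions taken from the post, not confirmed prices.
    """
    extra_years = max(0, years_wanted - included_years)
    return setup_fee + extra_years * per_extra_year


print(ua_history_cost(20))  # 240
```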

The response that they have 'nearly all' of the inactive symbols included came from someone there named Denise.

On the issue of selecting a portfolio of stocks...that's the key question isn't it? :) I'm currently going with the approach that I should start with the universe of all stocks and then remove those that are illiquid (the definition of which is probably a function of your available capital) and those that are super large cap. I'm still trying to define what 'super large cap' means, but now I'm basically assuming that I don't want to trade anything with a market cap of over say...$40 billion. This is just based on the logic that there are limits to growth & it's much more difficult for a $40b company to become a $120b company than it is for a $4b company to become a $12b company. I also want to exclude some industry groups, certainly utilities, and probably a few more. I don't think that 'I' have the ability to pick which industries are going to be trending next so I want to include lots of different groups. I like the idea of setting correlation limits between positions in different industries (borrowing from the Turtle Rules), but I've yet to figure out how I want to go about doing this.
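As a minimal sketch of that screening idea in Python: start with everything, then drop the illiquid names, the super large caps, and the excluded industry groups. The $40 billion cap and the utilities exclusion come from the paragraph above; the liquidity threshold, the field names, and the sample symbols are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stock:
    symbol: str
    market_cap: float          # dollars
    avg_dollar_volume: float   # average daily traded value, dollars
    industry: str


def select_universe(stocks, max_cap=40e9, min_dollar_volume=1e6,
                    excluded_industries=("Utilities",)):
    """Keep stocks that are liquid enough, not too large, and not in an excluded group."""
    return [
        s for s in stocks
        if s.market_cap <= max_cap
        and s.avg_dollar_volume >= min_dollar_volume   # threshold is an assumption
        and s.industry not in excluded_industries
    ]


# Hypothetical sample data, for illustration only.
candidates = [
    Stock("AAA", 4e9, 5e6, "Software"),
    Stock("BBB", 120e9, 900e6, "Software"),   # too large
    Stock("CCC", 2e9, 2e5, "Retail"),         # too illiquid
    Stock("DDD", 8e9, 3e7, "Utilities"),      # excluded group
]
universe = select_universe(candidates)
print([s.symbol for s in universe])  # ['AAA']
```

The liquidity floor would in practice scale with account size, as noted above.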

I'm also interested in researching the idea of trading some of the 'emerging' market exchanges, but that's something that will probably have to wait until I have a larger capital base :x . I don't have the available data to test foreign markets, nor do I have a broker that can execute trades on these markets.

These are just my initial ideas on the subject, I'm sure that as I continue to research I'll come up with better ideas...at least I hope so ;)

Jason
choppystride

Post by choppystride »

Jason,

Unfortunately, it seems that the 20 years of history may cost a lot more than $240. The $240 rate you talked about is for the daily download service, which doesn't seem to include all the inactive stocks. I think you may need to separately purchase a data CD from CSI to get the complete history. I've sent another email to CSI. Hopefully, they'll clarify this with a reply tomorrow. I'll let you know.

Using market cap as a selection criterion is something that I've considered but I can't figure out a good way (or find a good, affordable data source) to keep track of the historical market caps of all the companies. I know that you can simply calculate it as price x shares o/s but keeping track of shares o/s seems to be a messy endeavour since it fluctuates due to splits, new issuances, buybacks, etc. On top of that, for delisted companies, I bet it's pretty hard to find much historical info on their market cap and/or shares o/s.
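To make the bookkeeping concrete, here is a rough Python sketch of a point-in-time market cap lookup. It assumes you have somehow assembled a dated history of shares outstanding (which is the hard part) and that you multiply by the unadjusted price of that day. The function names and the sample numbers are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def shares_as_of(history, day):
    """Return the share count in effect on `day`.

    history: list of (effective_date, shares_outstanding), sorted by date.
    """
    dates = [d for d, _ in history]
    i = bisect_right(dates, day)
    if i == 0:
        raise ValueError("no share count on or before this date")
    return history[i - 1][1]


def market_cap(unadjusted_price, history, day):
    """Market cap = price x shares outstanding, both as of the same day."""
    return unadjusted_price * shares_as_of(history, day)


# Hypothetical company with a 2-for-1 split effective 2002-06-03.
history = [
    (date(2000, 1, 3), 100_000_000),
    (date(2002, 6, 3), 200_000_000),
]
before = market_cap(50.0, history, date(2002, 5, 31))
after = market_cap(25.0, history, date(2002, 6, 3))
```

In this example the split doubles the share count and halves the price, so the computed market cap is the same on both sides of it. Mixing split-adjusted prices with unadjusted share counts would break that.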

As for diversification thru different industries/sectors, I find that the classification of companies is too much of a gray area, as opposed to commodities where gold is gold and wheat is wheat. Commodities are intrinsically different and have their own fundamental identities. For stocks, however, the perception of what a company does often changes due to various factors. For instance, the management of an old boring utility decides to "re-invent" itself as a dot-com to capitalize on the internet craze. Or perhaps a software company that used to sell products gradually shifts into a service-oriented consulting firm. I just think that if I really wanted to sort it all out by going thru all the companies' fundamental data/reports/filings, I might as well become a discretionary trader.

My interest in individual stock data is to find some ways to use them in an aggregate manner. That is, I am thinking of using baskets of stocks to compute my own breadth statistics/indicators to facilitate the trading of the equity indices. However, there's still the problem of defining the composition of the baskets. Currently, I'm leaning towards using the baskets defined by the Russell indices. Admittedly, letting someone else decide the composition is probably just an easy way out. There are probably better alternative methodologies. This is what I would like to figure out.
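One simple example of such a breadth statistic is the advance/decline line: count advancers minus decliners in the basket each day and accumulate the result. The Python sketch below assumes daily closes aligned by date for every basket member; the symbols and prices are made up, and this is only one of many possible breadth measures.

```python
from itertools import accumulate


def net_advances(closes_by_symbol):
    """Daily (advancers - decliners) for a basket.

    closes_by_symbol: dict of symbol -> list of closes, oldest first,
    all lists the same length and aligned by date.
    """
    series = list(closes_by_symbol.values())
    n_days = len(series[0])
    result = []
    for t in range(1, n_days):
        advancers = sum(1 for c in series if c[t] > c[t - 1])
        decliners = sum(1 for c in series if c[t] < c[t - 1])
        result.append(advancers - decliners)
    return result


def advance_decline_line(closes_by_symbol):
    """Cumulative sum of daily net advances."""
    return list(accumulate(net_advances(closes_by_symbol)))


# Hypothetical three-stock basket.
basket = {
    "AAA": [10.0, 11.0, 12.0],
    "BBB": [20.0, 19.0, 21.0],
    "CCC": [5.0, 5.0, 4.0],
}
print(net_advances(basket))          # [0, 1]
print(advance_decline_line(basket))  # [0, 1]
```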
Jason Czech

Post by Jason Czech »

I believe the daily download service from CSI includes 10 years of historical data, and you have the option to purchase additional years of historical data at a rate of $20 per year. I was led to believe that the historical data includes inactives as well. But I think 'they' are the best source for the truth on this...so I'll wait to hear what they say to you.

If you are interested in quality historical fundamental data, I've heard good things about VectorVest, though I haven't tried their service myself. Supposedly their data is clean, and they have fundamental data going back about 10 years. But their service is a bit more expensive, about $55 per month I think, and they charge $200 for the historical data on top of that. But it's still much less than Bloomberg would cost.

I agree with everything you say about the problems with defining market cap/industry groups etc. I guess the attitude I have is that I have to make the best of the information that is available to me, knowing that it's imperfect. I've been using the Media General Industry Groups classifications from TC2000, and I've found that some groups perform pretty well in my system, and some perform terribly. I know I may miss some good trades in groups that I exclude, but hey...that's life in an imperfect world ;)

I'm also researching the use of some 3rd party stock rating system to weed out stocks with poor fundamentals from the system. The Investor's Business Daily rating system makes logical sense and I'm considering using that, but I'm unable to get their historical ratings so I'll have to save their ePaper for a year or two & evaluate the ratings later.
choppystride

Post by choppystride »

Jason,

Just a quick note to confirm that your original post regarding CSI is correct: their UA service does contain the inactive stocks. I have not tried it personally but after a series of emails, they've pretty much confirmed it. When I asked them about obtaining the last 20 years of stock data, they didn't indicate that it will be a problem. I take that as meaning that the data is available.

Now I'm wondering about the quality of their stock data. It would be great if someone who has experience with CSI could comment on it.
Jason Czech

Post by Jason Czech »

Thanks for checking with them on the historical data. It's always nice to get a confirmation from someone else.

Regarding the quality of the data, I've heard others say that CSI is much better than TC2000 or Quotes Plus. Supposedly they often go back and correct bad data when it is discovered, whereas many other services aren't very diligent about this.

I've just started a 5 week trial with VectorVest & they have the most complete data I've seen so far. I'm pretty impressed with the fundamental and industry data they provide. They also seem to be diligent about correcting bad data. Downside is that they only offer historical daily data back to 1996, and it costs about $200 for the history. The other downside is that they don't provide a mechanism for exporting the historical data to a format that could be used by other backtesting engines.
ecritt
Roundtable Knight
Posts: 134
Joined: Sat Aug 28, 2004 3:44 am
Location: Phoenix, AZ

Post by ecritt »

I use CSI's Unfair Advantage and pay professional fees. The data I have goes back to 1962. CSI claims to have complete data, including delisted stocks, 99% of the time going back to day 1 of trading. From what I can see this is true.

As for correlations, I have found that they are high (>70%) and unstable during bear markets and sideways markets and medium (35% to 70%) and relatively stable during bull markets. Correlations between sectors (Nat Gas, Internet, utilities, banks, tobacco, etc.) are unstable and quite high when you need them to be low. Moral of the story: don't rely on a lack of correlation if you want to survive; at least not for equities.
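For anyone who wants to check how unstable these correlations are on their own data, a rolling-window Pearson correlation is one straightforward way to look. The Python sketch below assumes two aligned lists of daily returns; the window length and the sample numbers are arbitrary choices for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def rolling_correlation(x, y, window):
    """Correlation over each trailing window; one value per full window."""
    return [
        pearson(x[i - window:i], y[i - window:i])
        for i in range(window, len(x) + 1)
    ]


# Made-up daily returns for two sectors.
sector_a = [0.01, -0.02, 0.015, 0.00, -0.01, 0.02]
sector_b = [0.008, -0.015, 0.01, 0.002, 0.012, -0.018]
rolling = rolling_correlation(sector_a, sector_b, window=3)
```

Plotting `rolling` against the index for bull, bear, and sideways stretches would show whether the regime pattern described above holds in a given data set.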

I would not rely on someone else's sector or industry group classification. There are several reasons for this: 1) lack of historical data for such classifications; 2) overly restrictive liquidity requirements applied for membership in a sector or industry group; 3) no assurance that the rules/criteria will not change or have been consistent through time; 4) dependence upon some vendor that may go out of business or be acquired by a company that will not continue the project.
ecritt

Post by ecritt »

I will also point out that CSI's Unfair Advantage allows one to adjust stock data for dividends. If you are using a long-term approach, backtesting on data that is not adjusted for dividends will lead to unrealistic results. The greater the dividend yield, the larger the error.
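To show what a dividend adjustment does to a price series, here is a minimal Python sketch of one common approach, proportional back-adjustment: every close before an ex-dividend date is scaled by (1 - dividend / prior close). This is an illustration only and is not claimed to be how Unfair Advantage performs its adjustment; the function name and sample prices are made up.

```python
def dividend_adjust(closes, dividends):
    """Proportionally back-adjust closes for cash dividends.

    closes: raw closing prices, oldest first.
    dividends: dict of ex-date index -> cash dividend per share.
    Returns a new list; the most recent prices are left unchanged.
    """
    adjusted = list(closes)
    factor = 1.0
    # Walk from newest to oldest, compounding the factor at each ex-date.
    for i in range(len(closes) - 1, -1, -1):
        adjusted[i] = closes[i] * factor
        if i in dividends and i > 0:
            factor *= 1.0 - dividends[i] / closes[i - 1]
    return adjusted


# Hypothetical stock: closes at 100, goes ex a $2 dividend, closes at 98.
raw = [100.0, 98.0, 99.0]
adj = dividend_adjust(raw, {1: 2.0})
```

In the raw series the stock appears to lose 2% on the ex-date even though the holder received $2 in cash. In the adjusted series that artificial drop disappears, which is why unadjusted data understates long-term returns on high-yield stocks.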

If you are using liquidity filters, something you must do to reflect the realities of trading, it is important to dividend adjust the volume data as well. In the past CSI has NOT offered this. However, their most recent beta version does and it is working well for me.