
Dreaming up the ultimate contract stitching software

Posted: Thu Dec 24, 2009 6:35 pm
by jklatt
I'm currently writing some custom software in Matlab to stitch together futures contracts for me.

I'm curious what some of the users on this forum WISH they could do but cannot.

Right now I've covered most of the basic things like:

1) How to stitch together the contract? Close to Close? Close to Open? etc.

2) When to stitch together the contract? OI crossover? Vol crossover? Vol/OI crossover? X-day moving averages of Vol/OI crossovers?

3) Fixed date rollovers such as days prior to expiration, days prior to first notice, a specific day of the month, a specific date of the month, etc.
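The "X-day moving average" crossover in item 2 amounts to comparing trailing averages of the front and next contracts' volume (or open interest) and rolling on the first day the next contract's average is higher. Below is a minimal Python sketch. It assumes both series are already aligned on the same trading days, and the function names are hypothetical:

```python
from typing import Optional, Sequence


def trailing_mean(values: Sequence[float], window: int, i: int) -> float:
    """Mean of the `window` values ending at index i (inclusive)."""
    return sum(values[i - window + 1 : i + 1]) / window


def ma_crossover_roll_index(
    front: Sequence[float], nxt: Sequence[float], window: int = 3
) -> Optional[int]:
    """First index at which the next contract's trailing-average volume (or OI)
    exceeds the front contract's; None if that never happens.

    window=1 reduces to a plain day-by-day crossover.
    """
    if len(front) != len(nxt):
        raise ValueError("front and nxt must be aligned on the same trading days")
    for i in range(window - 1, len(front)):
        if trailing_mean(nxt, window, i) > trailing_mean(front, window, i):
            return i
    return None
```

On a made-up series where volume migrates to the next contract over a week, the 3-day average signals one day later than the raw crossover. This is the usual trade-off: the average is less sensitive to one-day volume spikes, but it lags.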

So lay it on me. What have you always wished you could have?

Posted: Thu Dec 24, 2009 8:22 pm
by sluggo
Lots of stock index futures expire on the Nth Xday of the month, e.g., N=3, X=Fri for the Italian stock index futures. Rollovers in these are therefore anchored to (but occur before!) the Nth Xday, e.g., 3 days before the Nth Xday.
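Computing an "Nth Xday" anchor and stepping back a few days from it takes only a few lines. The Python sketch below assumes weekdays are numbered Mon=0 to Sun=6 (Python's `date.weekday()` convention) and that "days before" means weekdays. It ignores exchange holidays, which a real implementation would need to take from an exchange calendar. The names `nth_weekday` and `weekdays_before` are hypothetical:

```python
import calendar
import datetime as dt


def nth_weekday(year: int, month: int, weekday: int, n: int) -> dt.date:
    """Date of the n-th given weekday (Mon=0 ... Sun=6) of year/month, e.g. the 3rd Friday."""
    first = dt.date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # days until the first such weekday
    day = 1 + offset + 7 * (n - 1)
    if day > calendar.monthrange(year, month)[1]:
        raise ValueError("month has fewer than n such weekdays")
    return dt.date(year, month, day)


def weekdays_before(anchor: dt.date, k: int) -> dt.date:
    """Step back k weekdays (Mon-Fri) from anchor; exchange holidays are ignored."""
    d = anchor
    while k > 0:
        d -= dt.timedelta(days=1)
        if d.weekday() < 5:
            k -= 1
    return d
```

For example, the 3rd Friday of December 2009 is `nth_weekday(2009, 12, 4, 3)`, i.e. December 18, 2009, and `weekdays_before` of that date with `k=3` gives December 15, 2009.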

Some are even stranger. Have a look at the continuous contracts sold by Pinnacle Data, and note the various roll days in their grey table.

Posted: Sat Dec 26, 2009 2:05 pm
by sluggo

Posted: Sun Dec 27, 2009 9:40 pm
by Roscoe
I would absolutely LOVE to have software that does that!!

These would be some options that I would want:
  1. Standalone stitching application (maybe in C++?)
  2. Specify input files location and naming convention
  3. Specify output files location and naming with an option to change the output filename (this is for uniform naming)
  4. Roll on specified days/dates as per Pinnacle Data example (thanks Sluggo) or OI, Volume, etc.
  5. Roll type: Ratio adjust, Reverse adjust
  6. Contracts to roll from and to (not necessarily the next contract)
  7. Decimal point adjust (for consistency of output)
  8. All user preferences stored in a text file - the set and forget approach
That is all I can think of off the top of my head. My underlying motivation has always been to process data from any source into a uniform output, so that if a data vendor has issues, the data can be obtained elsewhere and business can continue.
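Item 5's "Ratio adjust" and the common additive ("difference") back-adjustment can both be written as one backward pass over the held segments, using close-to-close stitching (item 1 of the first post). The Python sketch below assumes you already have each contract's closes for the days it is held, plus both contracts' closes on each roll date. The name `back_adjust` and its argument layout are hypothetical:

```python
from typing import List, Sequence, Tuple


def back_adjust(
    segments: Sequence[Sequence[float]],
    roll_closes: Sequence[Tuple[float, float]],
    method: str = "difference",
) -> List[float]:
    """Stitch held segments (oldest first) into one back-adjusted close series.

    roll_closes[i] = (old_close, new_close) on the roll date between
    segments[i] and segments[i + 1]. The newest contract keeps real prices.
    """
    if len(roll_closes) != len(segments) - 1:
        raise ValueError("need one (old_close, new_close) pair per roll")
    if method not in ("difference", "ratio"):
        raise ValueError("method must be 'difference' or 'ratio'")
    out = [float(p) for p in segments[-1]]
    offset, factor = 0.0, 1.0
    # Walk backwards: each older segment absorbs every roll gap after it.
    for seg, (old_close, new_close) in zip(reversed(segments[:-1]), reversed(roll_closes)):
        offset += new_close - old_close
        factor *= new_close / old_close
        if method == "difference":
            adjusted = [p + offset for p in seg]
        else:
            adjusted = [p * factor for p in seg]
        out = adjusted + out
    return out
```

A difference adjustment preserves point moves but can push old prices negative; a ratio adjustment preserves percentage moves instead.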

klatt_attack, would you make this software, or just the source code, available or would it be only for your own use? PM me if you wish.

Posted: Tue Dec 29, 2009 1:12 pm
by equa
My favorite would be software that could read intraday data and from it build:

1) continuous backadjusted daily bars using user defined trading hours.
2) continuous backadjusted intraday data using user defined trading hours.

The Tickdata engine does this; unfortunately, they only provide monthly updates for their futures data... sniff. Also, their custom compression methodology means you cannot feed the software your own data.
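Building daily bars from intraday data with user-defined trading hours mostly comes down to deciding which session each print belongs to. The Python sketch below assumes the input is `(timestamp, price, volume)` tuples for a single contract. A session whose start is later than its end (e.g. 18:00 to 17:00) is treated as wrapping midnight, and its evening prints are assigned to the next calendar day. Holidays and weekend-spanning sessions are not handled, and the function names are hypothetical:

```python
import datetime as dt
from typing import Dict, Iterable, Optional, Tuple

Tick = Tuple[dt.datetime, float, float]  # (timestamp, price, volume)
Bar = Tuple[float, float, float, float, float]  # (open, high, low, close, volume)


def session_date(ts: dt.datetime, start: dt.time, end: dt.time) -> Optional[dt.date]:
    """Trading date a timestamp belongs to, or None if it is outside the session."""
    t = ts.time()
    if start <= end:
        return ts.date() if start <= t < end else None
    if t >= start:  # overnight session: evening prints count toward the next day
        return ts.date() + dt.timedelta(days=1)
    return ts.date() if t < end else None


def daily_bars(ticks: Iterable[Tick], start: dt.time, end: dt.time) -> Dict[dt.date, Bar]:
    """Aggregate intraday prints into one OHLCV bar per user-defined session."""
    bars: Dict[dt.date, list] = {}
    for ts, price, volume in sorted(ticks):
        day = session_date(ts, start, end)
        if day is None:
            continue
        if day not in bars:
            bars[day] = [price, price, price, price, volume]
        else:
            bar = bars[day]
            bar[1] = max(bar[1], price)
            bar[2] = min(bar[2], price)
            bar[3] = price
            bar[4] += volume
    return {d: tuple(b) for d, b in sorted(bars.items())}
```

Continuous back-adjusted daily bars could then be stitched from the per-contract daily bars, using whichever roll and adjustment rules are chosen.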

Posted: Tue Jan 12, 2010 5:59 pm
by jklatt
Thanks for the comments all.

I've implemented some of the suggestions and I'm working on implementing others. I've been really busy being sick, buying a new house, and renting out the old one, on top of being an expectant dad!

I wouldn't have any problems sharing the code when it's finished, but I don't know how user friendly it would be. The code is written in Matlab (I should probably finally move to a different programming platform one of these days but I'm just so used to it) and there really isn't a graphical user interface to change the various settings. Right now I simply edit the source code and flip a bunch of "switches" to tell the program what to do. I never really went into this with the idea of making it easy for others to use, but we'll see what happens.

If there are any other suggestions that one of you have always wanted in a futures stitching program, speak up! ;)

Posted: Wed Jan 13, 2010 6:01 am
by babelproofreader
The code is written in Matlab .... and there really isn't a graphical user interface to change the various settings. Right now I simply edit the source code and flip a bunch of "switches" to tell the program what to do

I use Octave, which is the open source equivalent of Matlab, and so the code for the two is mostly compatible. I would refer you to the following Octave manual pages ... inal-Input ... -Statement

In your code you should be able to use terminal input to drive the switch/case statement, so you no longer have to keep editing the source code. This will also make your code much more user friendly: simply follow the terminal prompts to choose settings.
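Terminal prompts (suggested here) and a preferences text file (Roscoe's item 8) come down to the same idea: read the settings at run time and validate them before they reach the "switches". The Python sketch below shows the text-file variant, assuming a simple `key = value` format with `#` comments. The setting names, allowed values and `parse_settings` are all hypothetical, and answers from terminal prompts could be passed through the same validation:

```python
from typing import Dict

# Hypothetical setting names and choices, for illustration only.
DEFAULTS: Dict[str, str] = {
    "roll_trigger": "volume",
    "adjust": "difference",
    "max_gap_months": "6",
}
ALLOWED = {
    "roll_trigger": {"volume", "oi", "volume_oi", "fixed_date"},
    "adjust": {"difference", "ratio"},
}


def parse_settings(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines ('#' starts a comment) on top of DEFAULTS."""
    settings = dict(DEFAULTS)
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or key not in DEFAULTS:
            raise ValueError(f"line {lineno}: unknown or malformed setting {raw!r}")
        if key in ALLOWED and value not in ALLOWED[key]:
            raise ValueError(f"line {lineno}: {key} must be one of {sorted(ALLOWED[key])}")
        settings[key] = value
    return settings
```

The validated `settings["roll_trigger"]` value can then select the rolling routine in the same way the hand-edited switches do now.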
...(I should probably finally move to a different programming platform one of these days but I'm just so used to it)...
No need. In Octave you can write additional callable functions in C++, and I assume you can do the same in Matlab. If you take this approach, you'll be able to rapidly prototype ideas with Octave/Matlab scripts, which you're already familiar with, and still get the speed benefits of compiled code for production functions.

Posted: Wed Jan 13, 2010 7:15 pm
by Roscoe
N00b question here, if I may: I have no idea about Octave or Matlab, but I can do things in C++. Is there any chance the code would be relatively portable to C++, or is that unlikely? TIA. Edit: I do use R a bit, if that helps. On the surface it seems to do some of what Octave does.

Posted: Fri Jan 29, 2010 10:44 pm
by jklatt
I've made some pretty good headway in this project. I'm not sure this "software" (it's really just a script) will ever be available to the public, but I thought I'd share some of the thought process in all of this so far and maybe some more new ideas will emerge.

For a while now I've been frustrated with not making any progress in trading system design. My problem has been that I tend to bounce from topic to topic (entries, risk control, back to entries, pattern recognition, back to entries, etc.) and never make any real progress. So in November I decided to create a blog for myself and brainstorm every category I could think of in trading system design. I then prioritized them (adding some as I thought of them) and committed to working through them in order, in hopes of finally having a finished product at some point. I use the blog to brainstorm ideas by writing, to document my thought process along the way, and to keep track of unfinished ideas. Kind of a private diary, if you will. So far, it has been immensely helpful. Below is a screenshot and an excerpt. As you can see, I'm currently on Test Data Construction, which is the whole reason I started this thread.

I've been thinking about this topic off and on over the past few weeks without posting much. I have a lot of questions on this subject. How do the different ways of stitching the contracts together affect performance? When long a market in backwardation, is it beneficial to roll closer to expiration, since prices tend to move toward cash? If it is, does doing so increase the odds of taking delivery on physically delivered products? These are all good questions that I think eventually need to be answered, but I also feel like I could get really bogged down researching them.

So... instead of getting bogged down in all the tiny "what ifs", what is really important in my mind?

1) Volume -- I want a rolling strategy that does a good job of keeping me in the most liquid contract without rolling back and forth among contracts. I only want to roll once, and only forward into a later contract.

2) Not taking delivery -- I think I wrote about this before, but taking delivery sounds like something I do not want to experience.

As I wrote previously, there are lots of different ways to signal a rollover:

1) Fixed date
2) Days before Expiration
3) First Notice Date
4) Days before First Notice Date
5) Volume Crossover
6) Open Interest Crossover
7) Volume and OI Crossover

The First Notice Date isn't easy to figure out and isn't published in CSI's instrument database (from what I can see). So instead of going through all of the 300+ instruments I've downloaded data for and trying to figure out the First Notice Dates for each one, I think it'd be more efficient to ignore First Notice Dates for the time being and simply focus on a rolling methodology that gives me the most liquidity on any given day. In my mind, liquidity is a function of front month contract volume and the bid/ask spread expressed in minimum ticks, and since I don't have any intraday data to analyze bid/ask spreads, I'm going to focus on front month contract volume.

To complicate matters even further, I need to come up with a way to define the rollover dates. From my limited experience, contracts often expire on the Nth Weekday of the expiration month (think the 3rd Friday of the contract expiration month). I assume this is done to help avoid expiration landing on a weekend. So... if expiration is typically expressed like that, are typical rollover dates expressed in a similar fashion (the first Thursday before expiration, for example)? I've also heard that bonds tend to roll over on the last couple of days of the month prior to the expiration month. Geez!

So to get things up and running, let's try to keep things as simple as possible and add as I go.

1) Let's write a program to figure out the expiration for each instrument in my database. This program will classify the expiration date as the Nth Weekday of the expiration month and check whether there is a common expiration pattern throughout. If there isn't, I'll need to define the expiration date a different way (maybe weeks or trading days into the expiration month?).

2) Next, let's calculate the very best roll date (with respect to volume) between the expiration dates of successive contracts, going back a maximum of 45 calendar days. This will act as a baseline "best" roll date against which all of the other roll dates for that timeframe will be judged.

3) Then the program will go through each of the various rolling triggers (fixed date, days before expiration, V, OI, V+OI, with the dates defined as the Nth Weekday with a contract month offset) and check whether a specific rolling method tends to get the best grade at rollover.
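Step 1's classification maps each expiration date to an (N, weekday) pair and then checks whether one pair dominates. The Python sketch below does this. The `min_share` threshold is an assumption added so that a few holiday-shifted expirations do not hide an otherwise consistent rule, and both function names are hypothetical:

```python
import datetime as dt
from collections import Counter
from typing import Optional, Sequence, Tuple


def nth_weekday_of(d: dt.date) -> Tuple[int, int]:
    """(n, weekday) such that d is the n-th such weekday of its month (Mon=0 ... Sun=6)."""
    return (d.day - 1) // 7 + 1, d.weekday()


def common_expiration_rule(
    expirations: Sequence[dt.date], min_share: float = 1.0
) -> Optional[Tuple[int, int]]:
    """Most frequent (n, weekday) pattern, if it covers at least min_share of contracts."""
    if not expirations:
        return None
    counts = Counter(nth_weekday_of(d) for d in expirations)
    rule, hits = counts.most_common(1)[0]
    return rule if hits / len(expirations) >= min_share else None
```

On a made-up list of three 3rd-Friday expirations plus one shifted to a Thursday, the strict default returns `None`, which falls back to the "define it a different way" branch. With `min_share=0.75` the same list returns `(3, 4)`, i.e. the 3rd Friday.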

This shouldn't be too difficult to do. #3 seems like the most difficult part, with the awkward date definitions and pattern recognition, but once I can bolt down a good rollover method for each contract (again, with respect to volume), I can move on to the next subject (Liquidity) and try to create a liquidity filter to weed out some of the less liquid instruments in my database, in hopes of eventually coming back to figure out which contracts have physical delivery and when their First Notice Dates are.

EDIT - replaced incredibly wide image by narrower version, for the benefit of readers without ultra high resolution screens --- Moderator

Posted: Fri Jan 29, 2010 10:45 pm
by jklatt
Another excerpt from tonight.

I finally got most of the program up and running. Right now, I'm able to do the following:

1) Read through every file in the historical data directory and organize the files into chronological order based upon symbol, contract year and contract month.

2) I look at every symbol and make sure that there is a continuous data set from a CONTRACT MONTH point of view. The maximum allowed gap is a variable, and right now I have it set to 6 months. This means that if the first contract in the symbol's data set is June 1990, the next contract available to trade needs to be no later than December 1990. I had to do this because some of the data from CSI is really, really messed up. AD1 is a good example. It looks like a lot of these electronic-only instruments had weird stuff going on back in the 90s and only started to really take hold and have "real" data in the mid 2000s. Below is an example of the output for this section. The fields, in order, are: Symbol, Historical Data Directory Start Index, Starting Contract Year, Starting Contract Month, Historical Data Directory End Index, Ending Contract Year, Ending Contract Month. As you can see below, AD1 doesn't have a single continuous data set with contract months 6 months or less apart: data is missing from the December 1999 contract all the way to the December 2003 contract. When I started writing this program, I got quite a few errors because the data sets weren't continuous, so I decided to first look through all of the contracts and make sure I have some continuous data sets to work on. Remember, I'm only looking at the contract years and months; I'm not digging into the actual files to verify that the data is indeed continuous from a daily trading day perspective.

'AD1' 3 1995 'Z' 20 1999 'Z'
'AD1' 21 2003 'Z' 45 2009 'Z'
'AD_' 52 1987 'H' 143 2009 'Z'
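The contract-month continuity check can be expressed as splitting a symbol's sorted contract list wherever two consecutive contract months are more than `max_gap` months apart. The Python sketch below assumes contracts are identified by (year, standard futures month letter) and leaves out the directory indices shown in the output above. The function names are hypothetical:

```python
from typing import List, Sequence, Tuple

MONTH_CODES = "FGHJKMNQUVXZ"  # standard futures month letters, January..December
Contract = Tuple[int, str]    # (contract year, month code), e.g. (1995, "Z")


def month_number(c: Contract) -> int:
    year, code = c
    return year * 12 + MONTH_CODES.index(code)


def continuous_runs(
    contracts: Sequence[Contract], max_gap: int = 6
) -> List[Tuple[Contract, Contract]]:
    """Split contracts into runs whose consecutive contract months are at most
    max_gap months apart; return the (first, last) contract of each run."""
    ordered = sorted(contracts, key=month_number)
    if not ordered:
        return []
    runs = []
    start = prev = ordered[0]
    for c in ordered[1:]:
        if month_number(c) - month_number(prev) > max_gap:
            runs.append((start, prev))  # gap too large: close the current run
            start = c
        prev = c
    runs.append((start, prev))
    return runs
```

With the 6-month setting, June 1990 followed by December 1990 stays in one run, while June 1990 followed by January 1991 (a 7-month gap) starts a new run, matching the rule described above.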

Once I had some hopefully continuous data sets defined, I wrote a program to calculate and record each contract's expiration date and the very best roll dates from a volume perspective (the most volume). The program uses a variable to define how far into the future you want to look to roll. For example, if I'm in January 1990 Crude Oil and I set the variable to 12 months, I can roll into any contract I have data for, all the way out 12 months to January 1991. From there, I let the program look into the future and cherry-pick what would have been the very best dates to roll. The only limitation is that I can't roll backward: if I'm in January Crude Oil and it would have been best to roll into April Crude Oil at some point in February, then once I'm in April I can't roll back into March Crude Oil at any point. Building that functionality would have been extra work, it didn't seem like the juice was worth the squeeze, and I haven't really heard of people rolling backward very often, if at all. Below is what the output looks like; the columns are Contract Year, Contract Month, Historical Data Directory Start Index, Best Roll Date, Data File Roll Date Index:

1983 6 3 19830330 1
1983 7 4 19830502 23
1983 8 5 19830523 38
1983 9 6 19830701 66
1983 10 7 19830722 80
1983 12 9 19830817 98
1984 1 10 19831109 114
1984 2 11 19831121 106
1984 3 12 19831219 85
1984 4 13 19840130 97
1984 5 14 19840223 98
1984 6 15 19840320 90
1984 7 16 19840430 104
1984 8 17 19840531 115
1984 9 18 19840622 149
1984 10 19 19840801 184
1984 11 20 19840904 207
1984 12 21 19840925 222
1985 1 22 19841112 217
1985 2 23 19841203 182
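The forward-only rule can be sketched without the hindsight optimization. On each day, any later contract within the look-ahead window that has data that day is a candidate, and the script rolls as soon as a candidate out-trades the held contract. This greedy "first day it is out-traded" rule is a stand-in for illustration, not the hindsight "best date" search described above. Because contracts with no data that day are simply skipped, it also sidesteps the stop-on-missing-data problem described next. Contracts are assumed to be month numbers (year * 12 + month), and the names are hypothetical:

```python
from typing import Dict, List, Tuple

# volume[day][contract] -> that day's volume; contracts are month numbers
# (year * 12 + month), so a larger number is a later-dated contract.
Volume = Dict[str, Dict[int, float]]


def forward_only_rolls(
    days: List[str], volume: Volume, start: int, lookahead: int = 12
) -> List[Tuple[str, int, int]]:
    """Roll whenever a later contract within `lookahead` months out-trades the
    held one; never roll backward. Returns (day, from_contract, to_contract)."""
    held, rolls = start, []
    for day in days:
        todays = volume.get(day, {})
        # Only later contracts that actually have data today are eligible.
        candidates = {c: v for c, v in todays.items() if held < c <= held + lookahead}
        if not candidates:
            continue
        best = max(candidates, key=lambda c: (candidates[c], -c))  # ties -> nearer month
        if candidates[best] > todays.get(held, 0.0):  # missing held data counts as 0
            rolls.append((day, held, best))
            held = best
    return rolls
```

Treating a missing print for the held contract as zero volume forces a roll once that contract stops trading. The trade-off is that a single missing day can also trigger an early roll.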

So... the last problem on the agenda is that the original program only allows rolls into contracts that have data for the very first possible roll date. For example, if I roll into January 1990 Crude Oil on December 1st, 1989 and the data file for February 1990 Crude Oil (and the further-out months within my 12-month look-ahead window) has no data for December 1st, 1989, the program stops with an error. It's disappointing that this seems to happen with quite a few of the contracts I've pulled data for. Most of the errors occur early in the simulation (3-5 rolls in), so I'm hoping this is because the data set is fairly new at that point, and that deeper into the data set there will be a lot more historical data for further-out contracts (for example, data on January 3rd, 1990 for February 1991 Crude Oil).

So I think I'm close to being finished with finding the best roll dates for each contract and recording the dates and the corresponding expiration dates. Once I have all of this data, I can write a program to try to classify the best roll dates and expiration dates. This doesn't look like it's going to be easy, as every contract seems to expire in a different fashion and a lot of the roll dates seem to be tied to expiration one way or another.

So far I think this exercise has been pretty enlightening. It's giving me a good idea of some of the limitations in the data provided by CSI, and it has opened my eyes to just how much flexibility there really is when it comes to rolling your own contracts. I'm excited to get this done; once it is, I'll have pretty much the highest-volume continuous contract (with a few limitations) available to grade various rolling methods against.

So hopefully, with a little more effort, I'll be able to roll contracts with the type of flexibility described above and also define a benchmark against which I can grade a rolling method. Right now my only benchmark is highest volume, but down the road I could probably build different benchmarks, such as spreads between cash and contracts (or whatever), and then test which rolling method performs best. I haven't read anywhere about giving your rolling method a "grade", so I thought I'd share this with everyone tonight.
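One simple way to turn the highest-volume benchmark into a grade is to divide the volume traded in the contracts a rolling method holds by the volume traded in the benchmark's contracts over the same days. The Python sketch below uses that ratio. The grading formula itself is an assumption, not something defined in the thread, and the names are hypothetical:

```python
from typing import Dict, List, Tuple

Roll = Tuple[str, int, int]  # (roll day, from contract, to contract)


def held_series(days: List[str], start: int, rolls: List[Roll]) -> Dict[str, int]:
    """Contract held on each day; a roll takes effect on its roll day."""
    roll_to = {day: to for day, _from, to in rolls}
    held, out = start, {}
    for day in days:
        held = roll_to.get(day, held)
        out[day] = held
    return out


def volume_grade(
    days: List[str],
    volume: Dict[str, Dict[int, float]],
    method_rolls: List[Roll],
    benchmark_rolls: List[Roll],
    start: int,
) -> float:
    """Volume in the contracts a method holds, as a fraction of the volume in
    the benchmark's contracts (1.0 = as liquid as the benchmark)."""
    method = held_series(days, start, method_rolls)
    bench = held_series(days, start, benchmark_rolls)
    captured = sum(volume[d].get(method[d], 0.0) for d in days)
    best = sum(volume[d].get(bench[d], 0.0) for d in days)
    return captured / best if best else 0.0
```

A method that rolls a day late scores below 1.0 because it sits in the fading contract while the benchmark has already moved. A different benchmark, such as the cash/contract spread, would only change what is summed.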

Posted: Sat Jan 30, 2010 8:53 am
by sluggo
You could consider having a file of "manual overrides" for each instrument.

For each rollover trade (GC_93G to GC_93J for example), first check to see if there is a manual override. If there is, then roll over exactly as the manual override dictates .... the human trader (who specified the override) is always right. On the other hand, if this rollover trade does not have a manual override, then perform the rollover according to the chosen software algorithm for that instrument: Roll on Open Interest Crossover, Roll on Date, etc.

Overrides let you handle anomalous & weird situations that spring up from time to time, such as the cut-over from old Harbor Unleaded Gasoline futures contracts to new Reformulated (RBOB) Gasoline futures contracts. Volume and open interest went wonky during that weird period, so human traders stepped in and manually selected their own rollovers. Why not have your software do the same thing? A more recent example is the transition from 5 tonne Robusta Coffee futures "LKD", to 10 tonne futures "LRC", which happened just 13 months ago. (LINK). Human traders might not have immediately jumped in to the very first contract of the new product ("Nobody wants to be the first, but Everybody wants to be the first to be second."). So a manual override was needed.

A simple implementation is to have your software maintain a list of rollovers for each instrument, perhaps in .CSV format. It might be something as simple as four fields or columns: A=date; B=Contract_rolling_out_of; C=Contract_rolling_in_to; D=Override_Flag

When the software runs, it (re)builds this list except for the entries marked as Overrides, which are slavishly obeyed no matter what.

To perform an override, the human trader merely edits this file and puts in the manually-chosen rollover data. Easy!
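The four-column file can be rebuilt on each run while keeping the flagged rows untouched. The Python sketch below assumes a header row, `YYYYMMDD` dates, and override flags written as Y/YES/1/TRUE. It also assumes that an override replaces the computed roll out of the same contract. The flag spellings, that matching rule and `merge_rollovers` are all hypothetical choices:

```python
import csv
import io
from typing import List, Tuple

TRUE_FLAGS = {"Y", "YES", "1", "TRUE"}


def merge_rollovers(
    rollover_csv: str, computed: List[Tuple[str, str, str]]
) -> List[Tuple[str, str, str, bool]]:
    """Rebuild the rollover list, keeping rows flagged as manual overrides.

    Columns: date, contract rolling out of, contract rolling into, override flag.
    """
    overrides = {}
    for row in csv.reader(io.StringIO(rollover_csv)):
        if len(row) < 4 or row[0].strip().lower() == "date":
            continue  # skip blank lines and the header
        date, out_c, in_c, flag = (field.strip() for field in row[:4])
        if flag.upper() in TRUE_FLAGS:
            overrides[out_c] = (date, out_c, in_c, True)
    merged = []
    for date, out_c, in_c in computed:
        # The human is always right: a flagged row beats the algorithm's roll.
        merged.append(overrides.pop(out_c, (date, out_c, in_c, False)))
    merged.extend(overrides.values())  # overrides the algorithm had no roll for
    return sorted(merged)  # YYYYMMDD strings sort chronologically
```

If an override rolls into a different contract than the algorithm picked, the computed rolls after it would need to be regenerated from that point. This sketch does not attempt that.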

Posted: Sat Jan 30, 2010 11:45 am
by Jez Liberty
Hey klatt_attack, it looks like you are making great progress on this.
Good idea on the blog, too. I find that mine helps me keep on track as well. Is yours public or private?

Also, I was wondering: what is your goal in building this, compared to using CSI's roll algorithms, which already have a lot of options?

Posted: Sun Jan 31, 2010 12:51 pm
by jklatt
The blog is pretty much private. There's not a whole lot of interesting stuff in there right now anyway.

The goal of all of this is to come up with a way to roll contracts for my eventual trading system and to familiarize myself with CSI and its data. If you really dig deep into the individual contract data, you'll see a lot of weird things going on (especially the further back in time you go).

For example, for STW (not sure of the CSI number; I'm on the Mac right now), my "best roll" calculator was humming along just fine, saying it was best to roll at roughly the same time every month into the very next contract month (Jan to Feb to March to April, etc.). Then, when rolling out of the December 01 contract, it picked the March 02 contract. That was a bit out of the norm, so I dug deeper and found that the December 01 contract lists data on Christmas (December 25th). The January 02 and February 02 contracts do not list data for that date, but the March 02 contract DOES. So the program favored the data set that best matched the current December 01 contract. So which one is correct? Do they celebrate Christmas in Taiwan? Was the market closed on that date? I have no idea, and it's something I need to investigate.
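Disagreements like the Christmas case can be flagged automatically before any roll logic runs. For each date that at least one contract reports, the check below lists the contracts that were alive on that date (it falls between their first and last records) but have no record for it. This Python sketch only finds the conflicts. Deciding which file is right still needs a holiday calendar or a human. It assumes `YYYYMMDD` date strings, and the names are hypothetical:

```python
from typing import Dict, List, Set


def inconsistent_dates(dates_by_contract: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each date that some contract reports to the contracts that were alive
    (date between their first and last record) but have no record that day."""
    spans = {c: (min(ds), max(ds)) for c, ds in dates_by_contract.items() if ds}
    all_dates = sorted(set().union(*dates_by_contract.values()))
    report = {}
    for day in all_dates:
        missing = [
            c for c, (first, last) in spans.items()
            if first <= day <= last and day not in dates_by_contract[c]
        ]
        if missing:
            report[day] = sorted(missing)
    return report
```

Fed a toy version of the STW case (December 01 and March 02 reporting 20011225, January 02 and February 02 not), it reports `{"20011225": ["STW_F02", "STW_G02"]}`.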

So far, I haven't really had an opportunity to do a whole lot of analysis or draw any conclusions. Up until now I've just been getting hung up on data inconsistencies. Hopefully sometime this week I'll be able to get past most of the inconsistencies, find "the best" continuous rollover contract with respect to volume, stitch together various rolling methods (volume crossover, OI crossover, fixed date), compare them to the ideal, select a method for me, and move on to the next category. I think I'm close, but I've felt close before and haven't been. ;)

Posted: Tue Feb 02, 2010 8:20 am
by sluggo
A few days ago, I wrote:... the transition from 5 tonne Robusta Coffee futures "LKD", to 10 tonne futures "LRC", which happened just 13 months ago. (LINK). Human traders might not have immediately jumped in to the very first contract of the new product ("Nobody wants to be the first, but Everybody wants to be the first to be second."). So a manual override was needed.
Another example, which vexed Roundtable Forum members in 2008 (link#1) and again today (link#2 -- in the Blox Customers area), is the Russell-2000 futures. Human traders decided to stick with the old contract at the old exchange right up to the very end of its life. This meant that volume (plotted in blue) and open interest (plotted in red) were severely depressed in the new contract at the new exchange. Continuous contracts meant for historical system testing would have benefited from a manual override. An override could avoid feeding misleading sewage, like the chart below, into your backtests.