Page 1 of 1

script to format correlations data from CSI Correlation Lab

Posted: Fri Sep 02, 2011 12:58 pm
by leslie
Hi All,

Suppose you have a big futures and commodities portfolio and wish to update correlation data in the FuturesInfo.txt file. For the Euro Currency, the strong correlated symbol list looks something like this
AD
..
BAX
BO
CD
...
SF
...

How would you generate such a list with minimum effort?
My solution is to use CSI on-line "Correlation Lab" and start off with 15 year correlation study.

The Correlation Lab generates a lot of data so it needs to be stripped and formatted for FuturesInfo.txt consumption. Doing this in Microsoft Excel is a very time consuming task. Better solution is to automate the task with single line shell script. Here is how.

$ cut -d "," -f 5 cu_80.txt | sed s/[0-9]// | sort | uniq | tr -d \"

OK, this is the whole automation.
This will run in any shell. (While Microsoft includes Posix compliance for government customers--mandated--retail version of Windows lacks Unix tools.
For users with no prior experience, try

http://www.cygwin.com/

Cygwin is a free GNU distro. Cygwing allows Windows users access to tens of thousands of Unix programs.)

Back to the one line shell script: This program does is take the CSI Correlation lab listing as input. (The Euro Currency results were saved to a file cu_80.txt)

The -d defines Delimiter as , (comma) and -f selects Field 5 to cut.

At that point, we'd like to get rid of duplicate lines, but first need to remove numbers from symbol names. CSI will distinguish between SF, SF2 as electronic vs pit sessions. Pit and electronic correlation ought to be about equal, so root symbol is all we need. Therefore, next in the pipeline,

sed s/[0-9]//

The Stream Editor sed Subsitutes (s directive) any number [0-9] for nothing.
The [] square brackets allow pattern specification.
The // indicates the the new pattern is nothing since nothing was specified.

Now, we have just a list of symbol names except we may have duplicates AD, SF, etc., occurrences. To filter duplicates,

sort | uniq

sort them into alphabetic sequence.
The | pipes sort output to uniq.
uniq strip duplicates, more precisely prints unique lines only.

Final clean up:
tr -d \"

The TRanslate command -d Delete something, in this case the quote mark.
Since quotes is special character, it has to be escaped using \ backslash.

Final output looks like this:
AD
ATX
BAX
BO
CD
.......
SF
SI
W
.....

Alternate formatting in case you'd like to instrument name & correlation factor.

cut -d "," -f 5-6,8 cu_80.txt | sort

Output looks like this:

"AD","Australian Dollar-(Floor+Electronic Combined)-CME",0.907785
"AD1","Australian Dollar-GLOBEX(Electronic Only; Last In Settle Field)",0.830185
"ATX","Austrian Trade Index-OTOB",0.870716
"BAX","Canadian Bankers' Acceptance-3Mth-(24 hr)-ME",0.912522
... etc.,

The cut -f picks out fields 5, 6, 8 so you can see instrument name and correlation factor. Again, sort gives them in alpha sequence. CSI lists them from high to low correl sequence.


Here is what the input looked like for 15 year correlation study of Euro Currency.
Selection at: 80%
File format: ASCII file

csinum1,IsStock1,csinum2,IsStock2,symbol2,name2,Industry2,RankCorrelation,FinalZScore
524,0,263,0,"DX","U.S. Dollar Index-FINEX","",-0.992245,1.408867
524,0,1205,0,"DX2","U.S. Dollar Index(Floor+Electronic Combined)-FINEX","",-0.992245,1.408867
524,0,25,0,"SF","Swiss Franc-(Floor+Electronic Combined)-CME","",0.959681,-3.825230
524,0,996,0,"E2M","Euro German Bobl(Last Trade in Close Field)-EUREX","",0.959451,-2.004922
524,0,553,0,"EBM","Euro German Bobl-EUREX","",0.959445,-2.012217
524,0,185,0,"FES","Euro Swiss Franc-EURONEXT(LIFFE)","",0.946236,-3.583397
524,0,552,0,"EBL","Euro German Bund-EUREX","",0.945188,-2.390609
524,0,995,0,"E2L","Euro German Bund(Last Trade in Close Field)-EUREX","",0.945160,-2.372542
524,0,878,0,"SF1","Swiss Franc (Electronic Only)","",0.931905,-3.433959
524,0,298,0,"MFF","Spanish Govt Bond(10Yr)-MEFF","",0.930323,1.276485
524,0,225,0,"YT2","Australian Govt Bond 6%-(10Yr)(rounded #381)-SFE","",0.923142,-2.533295
524,0,231,0,"YTT","Australian Govt Bond 6%(3Yr)-SFE","",0.922097,-2.582321

... etc.,