
Cool Free Data Mining Software

Posted: Wed Dec 21, 2011 12:23 pm
by DPH
Here is an interesting piece of data-mining freeware from a pair of Harvard and MIT students that helps find meaningful relationships in data.

Data Mining Software

Here is an introductory video about it:

MINE Software Introduction

I used it to find a lot of intriguing relationships in a 50,000-data-point spreadsheet of CTA statistics.

I would be curious to know what other kinds of trading-related uses others might find for it... thanks.

Posted: Wed Dec 21, 2011 1:23 pm
by drm7
Looks interesting, Dean. I may take a look at this over Christmas break.

Posted: Wed Feb 15, 2012 4:31 pm
by squaredQ

I was wondering if you would be able to share the spreadsheet you ran, as I would be interested in looking at the results (via MINE). We could compare results and save some time perhaps, as there are a few parameters to explore in the tool.

I would have PM'd you, but do not have it enabled (and have sent a request to the webmaster). Otherwise, I believe I have incoming PM privileges.

If it's proprietary .csv data, no problem. Thank you.

Posted: Wed Feb 15, 2012 10:31 pm
by DPH
Hi SQ,

Yeah, I am not at liberty to share all that data, sorry... Lots of providers you can buy it from, though.

For CTA analysis I use BarclayHedge data, and unfortunately it's not cheap: I pay $6,000 per year for the data and their analysis tool "MAP". Sadly, that's only the beginning! I am currently using another tool from AlternativeSoft that runs between $8,000 and $20,000 per year depending on the modules you use (great product, talk to Vic if interested). I am also trialing MATLAB and am likely going to be shelling out $10-15k with them too. This is on top of various other software products, public and private (XLSTAT, Statistica, various neural net packages, and data mining / visual analysis tools and applications).

It may sound like total overkill, and there is like 95%+ overlap on some of them, BUT each one of them has a few things, or often ONE THING, that nobody else has and it's important to me! Spending a few grand on a product that I already have 95% to 99% covered kills me, but if you really want that one thing they offer that nobody else does, it's your only option, unless you want to spend significantly more in time developing it yourself! There is no "one tool" in this business as far as I can tell :(

Then there is the research time; I hate this one (spending money is relatively easy). I knew one day, when my young daughter referred to me as Dean instead of Dad, that it had gotten VERY out of hand and was not worth it at the level I was pursuing it. (I am happy to report that she and I are now having the time of our lives and I am "Dad" again.) Any other OCD-type researchers know how this stuff can consume your life if you let it!

I don't say this to try and impress (or discourage) you; I have no idea who you are... You might be David Harding for all I know!!

These are simply my reflections on where you will likely be headed if you really want to evaluate this stuff at the level many "Trading Blox'er" types seem to demand. Like I say, for all I know you do all this and more already!

As far as the MINE software goes, it was (is) neat; it has a unique spin on standard correlation analysis. However, I honestly don't, and never really did, use it much, as I've got much more practical (useful, powerful) tools for this sort of thing. But by all means, if you want to play with their tool I don't think it will be time wasted: its maximal information coefficient (MIC) caused me to question and ultimately alter something significant I was doing elsewhere.

Posted: Wed Feb 15, 2012 10:58 pm
by squaredQ

Thanks so much for the detailed reply. Yes, it certainly sounds like you shell out quite a bit for various tools. I spend a lot of time developing my own, as 1) I like to look under the hood and understand what it is doing, and 2) as you inadvertently pointed out, one size does not fit all.

Anyway, I suppose I was interested in running some larger data sets simply to look at what MINE might produce, as it might narrow down the initial search space of interesting relationships. I like the idea of using the MIC measure to quantitatively look at relationships that common linear methods may dismiss.

They include a few sample data sets (DNA, baseball), but none of them financial, and the documentation is very sparse at the moment. I've run a few noisy sets that did not do much better than linear correlation, which suggests to me that the data needs to have very low noise for MIC to outperform standard linear correlation.
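
For anyone who wants to see the effect without the tool, here is a rough Python sketch. It is not the MINE software and not the actual MIC algorithm: `binned_mi_score` is a hypothetical stand-in I made up that computes mutual information on a single fixed 8x8 equal-width grid and divides by log(8), whereas real MIC searches over many grid shapes. The parabola, sample size, seed, and noise levels are all arbitrary assumptions. The idea is only to compare Pearson correlation against a grid-based dependence score on a nonlinear relationship as noise is added.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def binned_mi_score(xs, ys, bins=8):
    """Mutual information on one equal-width bins x bins grid, over log(bins).

    Crude stand-in for MIC (assumption): a single fixed grid, no search
    over grid resolutions as the real algorithm does.
    """
    def to_bin(v, lo, hi):
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    n = len(xs)
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    joint = {}          # counts per occupied grid cell
    px = [0] * bins     # marginal counts along x
    py = [0] * bins     # marginal counts along y
    for x, y in zip(xs, ys):
        i = to_bin(x, x_lo, x_hi)
        j = to_bin(y, y_lo, y_hi)
        joint[(i, j)] = joint.get((i, j), 0) + 1
        px[i] += 1
        py[j] += 1
    # Plug-in estimate: sum p_ij * log(p_ij / (p_i * p_j))
    mi = 0.0
    for (i, j), c in joint.items():
        mi += (c / n) * math.log(c * n / (px[i] * py[j]))
    return mi / math.log(bins)


def demo(noise_sd, n=2000, seed=0):
    """Return (pearson, binned score) for y = x^2 plus Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [x * x + rng.gauss(0, noise_sd) for x in xs]
    return pearson(xs, ys), binned_mi_score(xs, ys)


for sd in (0.0, 0.2, 1.0):
    r, s = demo(sd)
    print(f"noise sd={sd}: pearson={r:+.3f}  binned MI score={s:.3f}")
```

On the clean parabola Pearson should sit near zero while the grid score is clearly positive; as the noise grows the grid score should fall back toward zero as well, which is roughly the behaviour I saw on my noisy sets.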

Thanks Again,