Jump Off
For the better part of the last year, I have been writing about the use of Information Theory as a means of quantifying equity movement. Usually this has taken the form of spreadsheet-based mechanical systems, both to illustrate the efficacy of the analytical process and to show readers how straightforward it is to apply this little-known tool to something as nonsensical as Mother Market herself.
Finding myself between gigs and wanting to keep my skillset sharp, I decided to take what was out there and find a more straightforward way to use the ID3 algorithm. Additionally, I wanted to learn how to build a GUI app with my second favourite programming language, Ruby.
What follows is the presentation part of a technical demo of this very simple application, which I gave last week to the president of our provincial professional association.
On with the show.
ID3A
ID3A is a windowed app which facilitates the use of the ID3 algorithm as implemented by Sergio Fierens in his Rubygem ai4r.
To wit, it provides an interface that is more familiar to the average computer user, as opposed to an application that requires cryptic commands typed in a terminal session.
The additional benefits of this machine learning paradigm are its speed and the transparency of the rulesets that are developed with ID3.
The ID3 algorithm was introduced by Australian Ross Quinlan in the 1970s, was published in the first issue of Machine Learning in 1986, and has been taught in AI courses ever since. Tried and true, it is often a “beginner’s introductory tool” to the field and is subsequently passed over during studies. This is probably why the Wikipedia entry shows a robust variety of implementations. However, none of them are aimed at an end user who is not already extremely comfortable with computers.
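The transparency comes from how ID3 grows its tree: at each step it picks the column whose answers best separate the outcomes, measured by information gain (the drop in Shannon entropy). What follows is a minimal Ruby sketch of that measure only. It is not the ai4r code, and the function names and the four toy rows are made up for illustration.

```ruby
# Shannon entropy, in bits, of a list of class labels such as "Y" / "N".
def entropy(labels)
  total = labels.size.to_f
  counts = Hash.new(0)
  labels.each { |label| counts[label] += 1 }
  counts.values.inject(0.0) do |bits, count|
    share = count / total
    bits - share * Math.log2(share)
  end
end

# Information gain from splitting the rows on the column at `index`.
# Each row is an array of answers whose last element is the outcome.
def information_gain(rows, index)
  outcomes = rows.map(&:last)
  groups = rows.group_by { |row| row[index] }
  remainder = groups.values.inject(0.0) do |acc, subset|
    weight = subset.size.to_f / rows.size
    acc + weight * entropy(subset.map(&:last))
  end
  entropy(outcomes) - remainder
end

# Toy data (hypothetical): two question columns, then the outcome.
rows = [
  %w[Y Y Y],
  %w[Y N Y],
  %w[N Y N],
  %w[N N N]
]

gains = [0, 1].map { |index| information_gain(rows, index) }
best = gains.each_with_index.max[1] # index of the most informative column
puts "gains: #{gains.inspect}, split first on column #{best}"
```

In the toy rows the first column predicts the outcome perfectly and the second tells you nothing, so the first column is chosen as the root of the tree.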
The Demonstration – Batch Querying
The most recent addition to the application is the means of querying a csv file of data in a batch manner so that one is not forced to deal with a dialog for each query.
Step 1: Download the csv file output from the Dow Jones Industrial Average (^dji on Yahoo!) from October 1, 1928 to June 23, 2009
Step 2: In the columns following the “Adjusted Price” begin adding these headers: Above 5 Days Ago?, Above 10 Days Ago?, Above 20 Days Ago?, Above 10 DMA?, Above 20 DMA?, Above 50 DMA?, Above 75 DMA?, Above 100 DMA?, Higher 100 Days from today?
Step 3: Next create formulae that answer the above questions in columns H to U with each cell giving a “Y” or “N” to the column’s header question.
You will now find that your data set has been reduced slightly to a range of Feb 26, 1929 to Jan 29, 2009.
Step 4: Copy and paste only the answers into a separate spreadsheet.
Step 5: Split the data 80/20, which will give 16,157 examples to train on and 4,017 examples to test against.
Step 6: Now that you have split the data, take the spreadsheet with the 80% training set and save it as “dji_train.csv”.
Step 7: Save the testing set as “dji_test_master.csv”.
Next you will need to create the sample set for the batch query. The only difference is that the last column, “Higher 100 Days from today?”, will be blank beneath the header. Save this file as “dji_test.csv”.
Step 8: Download the v.02 version of ID3A from its location on github (http://github.com/cuervoslaugh/ID3A/tree/v.02) and unzip it on your desktop
Step 9: In the /bin folder you will find ID3A.exe and it launches with a double click
Step 10: Under the File menu, select “Load CSV” and select the “dji_train.csv” that was created in Step 6
Step 11: Under the Analyse menu, select “Generate Rules”. A ruleset will be created from the 16,157 examples (speed will vary with CPU; on my dual-core machine it takes about 10 seconds)
Step 12: Under File, select “Save Rules” and give the ruleset a name
Step 13: Close ID3A and restart it
Step 14: Under the Analyse menu, select “Batch Query”. It will ask you to pick first which ruleset you want to use (select the one you saved in Step 12) and then the batch samples to load (select “dji_test.csv”)
Step 15: Open the report that was generated in the /reports folder and copy the last column into the last column of the dji_test_master.csv file.
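For anyone who would rather script Steps 2 to 7 than build the spreadsheets by hand, here is a rough Ruby sketch. It rests on assumptions the steps above do not spell out: prices are adjusted closes ordered oldest first, “Above N DMA?” means today's price is above the simple N-day moving average (today included), the target compares the price 100 trading days ahead with today's, and the 80/20 split is chronological. All function and constant names are hypothetical, and the rising price series is synthetic.

```ruby
LOOKBACKS = [5, 10, 20].freeze          # "Above N Days Ago?" columns
AVERAGES  = [10, 20, 50, 75, 100].freeze # "Above N DMA?" columns
HORIZON   = 100                          # "Higher 100 Days from today?"

# Simple moving average of the `window` prices ending at index i.
def sma(prices, i, window)
  prices[(i - window + 1)..i].inject(0.0) { |sum, x| sum + x } / window
end

def yn(condition)
  condition ? "Y" : "N"
end

# One row of nine Y/N answers for every day that has enough history
# for the longest average and enough future for the target column.
def label_rows(prices)
  first = AVERAGES.max - 1
  last  = prices.size - 1 - HORIZON
  return [] if last < first
  (first..last).map do |i|
    today = prices[i]
    row = LOOKBACKS.map { |n| yn(today > prices[i - n]) }
    row += AVERAGES.map { |n| yn(today > sma(prices, i, n)) }
    row << yn(prices[i + HORIZON] > today)
  end
end

# Chronological split: the earliest share trains, the rest tests.
def split_rows(rows, train_share = 0.8)
  cut = (rows.size * train_share).floor
  [rows[0...cut], rows[cut..-1]]
end

# Batch-query copy of the test set: same rows, target left blank.
def blank_target(rows)
  rows.map { |row| row[0...-1] + [""] }
end

prices = (1..300).map(&:to_f) # synthetic, steadily rising series
rows = label_rows(prices)
train, test_master = split_rows(rows)
batch = blank_target(test_master)
puts "#{rows.size} rows: #{train.size} train, #{test_master.size} test"
```

Writing `train`, `test_master` and `batch` out with Ruby's csv library, under a header row matching Step 2, would give the three files named in Steps 6 and 7. A floor-based cut like this will not necessarily reproduce the exact 16,157 / 4,017 counts quoted above.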
The Results
At this point, you will have noticed that 90% of your time has been spent building the spreadsheets to feed into ID3A. The application itself is able to chew through an immense amount of data fairly quickly. (While I have not pushed it to its limits, I was able to generate rulesets for 10,000 examples of 1,000 columns in under 30 seconds and process a similar batch query in the same amount of time.)
System Results
As you may have surmised from above, the system is a very slow motion 100 day trading system that buys a share of the Dow Jones and holds for 100 days.
For the period from Feb 23, 1993 to March 3, 2009 the system gave these results:
- Number of trades: 40
- Average Trade: $135.43
- Average Win: $563.24
- Average Loss: $1,115.56
- Win Percent: 70%
- Expectancy: $59.60
- Annualised Returns: $158.94
For those that care, the two-tailed P value is 0.0114, which is considered “statistically significant”.
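The expectancy figure can be checked from the win percent, average win and average loss listed above, assuming the usual definition (win rate times average win, minus loss rate times average loss). The variable names in this small Ruby check are mine:

```ruby
win_rate     = 0.70     # Win Percent
average_win  = 563.24   # Average Win, in dollars
average_loss = 1115.56  # Average Loss, in dollars

# Expected dollars per trade under the assumed definition.
expectancy = win_rate * average_win - (1 - win_rate) * average_loss
puts format("Expectancy: $%.2f", expectancy)
```

That works out to 394.27 minus 334.67, which matches the $59.60 reported.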
Closing comments:
ID3A is an open source piece of software. I could not, in good faith, create an application based on open source software and attempt to monetise the application itself. As for any consulting services relating to developing and using machine learning techniques, I’m open to discussions. My contact information is on github.
I’d like to thank the beta-testers who took the time to thrash it around and see if it broke. The fast and furious emails in the hours after it went into closed beta release helped keep last weekend interesting.
Special thanks go out to DPeezy who figured out how to get it to run on a Mac and to Woodsheddar who kept a steady stream of feedback and to Jeremy who didn’t use the internet laser beam on my FEMA trailer.
I’m about to pack up our belongings and relocate the fam back to Toronto this summer, so I don’t know how often I’ll be posting in the near future, but I’d like to thank The Fly for his series on how he dealt with his first brokerage job and his ultimate win in the year that followed.
Cuervos, Nice work 5 Stars. Good luck with the move and finding a new gig.
Cuervo, man, this step by step is just what I need. I’m sorry I’m fairly retarded when it comes to this type of software. Anyway, this will enable me to play around with ID3A a lot more.
Thanks, sincerely for sharing this with all of us. I know it has been a journey for you, and I hope that it will lead to more opportunity in the future.
Best wishes for your move. Please stay in touch.
cuervo, have you ever played around with http://www.rmetrics.org/ ?
(I’m noticing an ad running in the bottom panel of ibc right now for RLink which also uses R. so obviously even google notices that you should try R )
@artha – why should I do that?
Max Dama’s work with R (MaxDama.com) is second to none, so I’m not going to tread on his work.
Right now, aside from getting ready to pack, I’m porting ID3A to Scheme with a focus on PocketScheme for all the Windows Mobile OS users out there.
Testing to see if my gravatar shows…
Best of luck Cuervos, it’s an awesome thing to contribute your hard work and ability for an open source goal, very cool………..!!
@cuervoslaugh I don’t know much about rmetrics, I was actually asking your opinion on it.
MaxDama is very interesting, stuffing my brain now.
I am a good programmer, but I’m only now reaching the point where I want to investigate serious system trading.