Thursday, March 11th, 2010

Programming a custom Backtest Profile in R


Posted by milktrader at 9:22 pm

One of the many issues with systems trading is trying to make sense of the vast amounts of data you accumulate with the backtest of a system. Historical backtesting is the first step in testing your trading idea. If it is a trading idea that ought to work across many different markets, then you need to test it on many different markets to see how it performs. Yes, you are looking to see how …. robust (I said it) … your trading idea is in the crucible of historical data. It’s easy to get lost in the data and that’s why I’m embarking on creating a custom Backtest Profile Report, dubbed version 1.0. I’ve chosen to create this profile using the R statistical package, which is offered for free to those who elect to use it.

I’m just getting started and I can already see how this code will reach thousands of lines. The good news is that once the basic logic is set up, filling in the rest of the code to reach the final product is basically trivial. Let’s start at the beginning. This is the code I currently have saved for the beginnings of the Backtest Profile Report.

################### Call packages

library("zoo")
library("xts")

################### Read Data

BT <- read.csv("C:/R/TESTDATA/BUMBL.WHITE.BTEST.ALL.test1.csv", sep = ",", header = TRUE)

At this point, the file I created and massaged a little is being read into the R software. The next command will print the first 6 records of the file, and will expose what the header looks like.

head (BT)
Buy.Sell Entry.Name Market File
1 SELL Short US US_REV.CSV
2 BUY Long CT CT_REV.CSV
3 SELL Short LX LX_REV.CSV
4 SELL Short TY TY_REV.CSV
5 BUY Long CN CN_REV.CSV
6 SELL Short ZB ZB_REV.CSV

  EntryDate EntryPrice Exit.Date Exit.Year Exit.Name
1 3/19/1990 21.71875 3/28/1990 1990 Cover
2 3/19/1990 130.30000 3/29/1990 1990 Sell
3 3/19/1990 4327.00000 3/29/1990 1990 Cover
4 3/20/1990 29.60938 3/29/1990 1990 Cover
5 3/20/1990 78.22000 5/15/1990 1990 Sell
6 3/21/1990 21.03000 4/3/1990 1990 Cover

Exit.Price Trade.P.L Running.P.L
1 22.40625 -687.50 -687.50
2 129.85000 -225.00 -912.50
3 4354.00000 -270.00 -1182.50
4 30.20312 -593.75 -1776.25
5 78.79000 570.00 -1206.25
6 24.63000 -1512.00 -2718.25

Now you can see what R sees when it looks at the .csv file that was read in with the read.csv function.
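If you want to try this without my file, here is a minimal sketch that reads the same six trades from an inline string instead of a path on disk. The name BT_demo and the inline csv_text are my own stand-ins for illustration; the column names and values are copied from the head(BT) output above.

```r
# Six trades copied from the head(BT) output; the real file has many more rows.
csv_text <- "Buy.Sell,Entry.Name,Market,File,EntryDate,EntryPrice,Exit.Date,Exit.Year,Exit.Name,Exit.Price,Trade.P.L,Running.P.L
SELL,Short,US,US_REV.CSV,3/19/1990,21.71875,3/28/1990,1990,Cover,22.40625,-687.50,-687.50
BUY,Long,CT,CT_REV.CSV,3/19/1990,130.30000,3/29/1990,1990,Sell,129.85000,-225.00,-912.50
SELL,Short,LX,LX_REV.CSV,3/19/1990,4327.00000,3/29/1990,1990,Cover,4354.00000,-270.00,-1182.50
SELL,Short,TY,TY_REV.CSV,3/20/1990,29.60938,3/29/1990,1990,Cover,30.20312,-593.75,-1776.25
BUY,Long,CN,CN_REV.CSV,3/20/1990,78.22000,5/15/1990,1990,Sell,78.79000,570.00,-1206.25
SELL,Short,ZB,ZB_REV.CSV,3/21/1990,21.03000,4/3/1990,1990,Cover,24.63000,-1512.00,-2718.25"

# read.csv accepts a text= argument in place of a file name.
# stringsAsFactors = FALSE keeps the date columns as plain character strings.
BT_demo <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# str() shows the type R assigned to each column.
str(BT_demo)

# Sanity check: Running.P.L should be the cumulative sum of Trade.P.L.
running_ok <- isTRUE(all.equal(cumsum(BT_demo$Trade.P.L), BT_demo$Running.P.L))
print(running_ok)
```

Checking the running total against cumsum is a cheap way to confirm the file was read in the order and with the types you expect.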

R looks at dates in a specific way, so we need to convert our date format to one that R likes. This is done with simple code calling the as.Date function.

############### Convert date character to R Date

ENTER <- as.Date(BT$EntryDate, "%m/%d/%Y")
EXIT <- as.Date(BT$Exit.Date, "%m/%d/%Y")

A quick note. To get the EntryDate column (or vector as R likes to call it), we first identify the file (or data.frame as R likes to call it) and then use the ‘$’ symbol to identify the vector.

So, BT$Exit.Date is the BT file, Exit.Date column. We’ll use these dates later, but best practice requires us to get them fixed early on.
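Here is a small sketch of what the conversion buys you, using three of the entry and exit dates from the listing above. The names ending in _demo and days_held are mine, made up for illustration. Once the strings are real Date objects you can subtract them, and a format string that doesn’t match the data quietly gives you NA instead of an error.

```r
# Entry and exit dates for rows 1, 5 and 6 of the head(BT) output.
entry_chr <- c("3/19/1990", "3/20/1990", "3/21/1990")
exit_chr  <- c("3/28/1990", "5/15/1990", "4/3/1990")

# %m = month, %d = day, %Y = four-digit year.
ENTER_demo <- as.Date(entry_chr, format = "%m/%d/%Y")
EXIT_demo  <- as.Date(exit_chr,  format = "%m/%d/%Y")

# Subtracting Dates gives a difftime; as.numeric turns it into plain day counts.
days_held <- as.numeric(EXIT_demo - ENTER_demo)
print(days_held)

# A format that does not match the string returns NA rather than an error.
bad_parse <- as.Date("3/19/1990", format = "%Y-%m-%d")
print(bad_parse)
```

That silent NA is the reason to fix dates early: a quick check with anyNA() right after the conversion catches a wrong format before it spreads.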

Next, we’re going to clean up some of these bizarre headers by redefining them.

################## Define Variables from existing vectors

Market <- BT$Market
PnL <- BT$Trade.P.L
Year <- BT$Exit.Year

So instead of always referring to a column (or vector) with some unintuitive nomenclature, we’re just going to assign a simple name to some important ones.

Now we’re ready to start drilling down on some data we’ll use to get important data to the fore.

##################### Define Variables as new vectors

################ Format = Statistic.Market.Year

############ the PUZZLE – subset a subset
##################### PUZZLE solved with & symbol to add conditions

PnL.AN <- subset(PnL, Market == "AN")
PnL.AN.1990 <- subset(PnL, Market == "AN" & Year == "1990")
PnL.AN.1991 <- subset(PnL, Market == "AN" & Year == "1991")
PnL.AN.1992 <- subset(PnL, Market == "AN" & Year == "1992")
PnL.AN.1993 <- subset(PnL, Market == "AN" & Year == "1993")
PnL.AN.1994 <- subset(PnL, Market == "AN" & Year == "1994")
PnL.AN.1995 <- subset(PnL, Market == "AN" & Year == "1995")
PnL.AN.1996 <- subset(PnL, Market == "AN" & Year == "1996")
PnL.AN.1997 <- subset(PnL, Market == "AN" & Year == "1997")
PnL.AN.1998 <- subset(PnL, Market == "AN" & Year == "1998")
PnL.AN.1999 <- subset(PnL, Market == "AN" & Year == "1999")

I’ve included the same comments I put into my code because sometimes I forget how I got to where I am, and it helps to include comments. (R ignores stuff after the # sign). I had some trouble figuring out how to subset a vector to include only values I’m interested in (such as a specific market), but got it figured out. There is more than one way to do this, but this works for now. There are also some issues with trying to subset an already subsetted object, so it’s best to use the ‘&’ symbol to specifically define a subset right from the get go.
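One plausible reason subsetting a subset goes wrong (this is my guess at the cause, not something I have confirmed against the full file) is a length mismatch: once PnL is cut down to one market, the Year vector still has its original length, so the two no longer line up. The sketch below uses made-up trades, apart from the two AN wins of 2740 and 1500, and all the _demo names are hypothetical.

```r
# Made-up trades for illustration; only 2740 and 1500 come from the real backtest.
Market_demo <- c("AN", "AN", "CT", "AN", "CT")
Year_demo   <- c(1990, 1991, 1990, 1990, 1991)
PnL_demo    <- c(2740, -300, 150, 1500, -75)

# One step: both conditions are the same length as PnL_demo, so & lines them up.
PnL_AN_1990 <- PnL_demo[Market_demo == "AN" & Year_demo == 1990]

# Two steps: the Year vector has to be cut down the same way as the PnL vector,
# otherwise the second condition has 5 elements and the subset only 3.
is_AN   <- Market_demo == "AN"
PnL_AN  <- PnL_demo[is_AN]
Year_AN <- Year_demo[is_AN]
PnL_AN_1990_two_step <- PnL_AN[Year_AN == 1990]

# Both routes should give the same trades.
same_result <- identical(PnL_AN_1990, PnL_AN_1990_two_step)
print(PnL_AN_1990)
print(same_result)
```

Doing it in one step with & is less bookkeeping, which is why I stick with it.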

Alright, let’s take a quick break and see what sort of object we have created. Take PnL.AN.1999 for example. It looks at the PnL vector (the original big Kahuna) and keeps only those trades in the AN market with a Year value of 1999. Essentially, it’s a little nugget that shows how trades fared in 1999 in the Australian Dollar.

So far so good. Now let’s take only those trades that were profitable and then we’ll take a break. Thanks for hanging in this long.

############ the following breakdown does not account for zero trades
############ the order is critical
############ create a subset of a subset and THEN take positive values
############ the other way doesn’t work for some reason

WinPnL <- PnL [PnL>0]

WinPnL.AN <- PnL.AN [PnL.AN>0]
WinPnL.AN.1990 <- PnL.AN.1990 [PnL.AN.1990>0]
WinPnL.AN.1991 <- PnL.AN.1991 [PnL.AN.1991>0]
WinPnL.AN.1992 <- PnL.AN.1992 [PnL.AN.1992>0]
WinPnL.AN.1993 <- PnL.AN.1993 [PnL.AN.1993>0]
WinPnL.AN.1994 <- PnL.AN.1994 [PnL.AN.1994>0]
WinPnL.AN.1995 <- PnL.AN.1995 [PnL.AN.1995>0]
WinPnL.AN.1996 <- PnL.AN.1996 [PnL.AN.1996>0]
WinPnL.AN.1997 <- PnL.AN.1997 [PnL.AN.1997>0]
WinPnL.AN.1998 <- PnL.AN.1998 [PnL.AN.1998>0]
WinPnL.AN.1999 <- PnL.AN.1999 [PnL.AN.1999>0]

Here I have defined another object, WinPnL, which holds only those trades that showed a profit. It’s one level deeper than the PnL code just above it. It takes the PnL vector and extracts only the positive values with the indexing expression PnL[PnL > 0]. Fairly simple code. Now to test it we type out an object and R should return its value. Thusly,

WinPnL.AN.1990
[1] 2740 1500

There, it works. I’ve checked it against the original .csv file and there were indeed two profitable trades in 1990 with the values listed. R lets you test stuff very quickly and efficiently by simply running the script. It’s best to write the script in an editor and run it from there instead of typing it directly into the terminal.
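The same indexing trick covers losers and zero trades too, and from there a winning percentage is one line. This is only a sketch on a made-up vector (pnl_demo and the other names are hypothetical), and counting zero trades in the denominator is my assumption about how the percentage should be defined.

```r
# Made-up P&L values, including one zero (scratch) trade.
pnl_demo <- c(2740, -687.5, 0, 1500, -225)

wins    <- pnl_demo[pnl_demo > 0]   # strictly profitable trades
losses  <- pnl_demo[pnl_demo < 0]   # strictly losing trades
scratch <- pnl_demo[pnl_demo == 0]  # zero trades, in neither group above

# Winning percentage, assuming zero trades count toward the total.
win_pct <- length(wins) / length(pnl_demo) * 100

# Average winning and losing trade.
avg_win  <- mean(wins)
avg_loss <- mean(losses)

print(win_pct)
print(avg_win)
print(avg_loss)
```

Keeping the zero trades in their own vector makes it obvious when they exist, instead of having them vanish from both the win and loss counts.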

There are still quite a few more statistics to code for our final product. As you can see, it’s easy to define variables and manipulate them in R, so it won’t be too hard to get to the statistics we want to see. For version 1.0, I’m focusing on overall profitability by market, overall positive expectancy by market, average winning percentage, percentage of yearly profitability by market, and percentage of yearly positive expectancy by market. I’m using 47 markets so I’ve got some copying and pasting in my future, along with some editing. Because of the potentially explosive number of lines to deal with, I’ll have to figure out a good way to do this. I’m thinking I need to learn VIM so I can do it efficiently, because I’m sure not going to move around with the stupid arrow keys, and delete and type in all those markets.
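One way to dodge the copy and paste, sketched here as an alternative and not what my script currently does, is to let tapply do the grouping by market and year. The trades data frame and its values are made up for illustration, and treating expectancy as the mean P&L per trade is my assumption about the definition.

```r
# Made-up trades for two markets over two years.
trades <- data.frame(
  Market = c("AN", "AN", "AN", "CT", "CT", "CT"),
  Year   = c(1990, 1990, 1991, 1990, 1991, 1991),
  PnL    = c(2740, 1500, -300, -225, 570, -100)
)

# Total P&L for every Market x Year combination in one call.
total_by_market_year <- tapply(trades$PnL, list(trades$Market, trades$Year), sum)

# Winning percentage by market: the mean of a logical vector is the share of TRUEs.
win_pct_by_market <- tapply(trades$PnL > 0, trades$Market, mean) * 100

# Expectancy by market, assumed here to be the average P&L per trade.
expectancy_by_market <- tapply(trades$PnL, trades$Market, mean)

print(total_by_market_year)
print(win_pct_by_market)
print(expectancy_by_market)
```

The result is a matrix with one row per market and one column per year, so adding the 47th market costs nothing beyond having it in the data.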

The end-game is to create intuitive histograms that give us a good feel for the system we’re testing. Don’t complain about how hard it is, etc. I’m no programmer and I’m getting it to work, so you can too. Now get coding.
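As a taste of where the histograms are headed, here is a minimal sketch with base R’s hist function. The P&L values are a sample pulled from the listings above (the six head(BT) trades plus the two AN wins), and the 1000-wide bins are an arbitrary choice of mine. I pass plot = FALSE so the bin counts can be inspected without opening a graphics window; drop that argument to draw the chart.

```r
# Sample of trade P&L values taken from the listings in this post.
pnl_sample <- c(-687.50, -225.00, -270.00, -593.75, 570.00, -1512.00, 2740, 1500)

# Bin edges every 1000 from -2000 to 3000 (an arbitrary choice for illustration).
bin_edges <- seq(-2000, 3000, by = 1000)

# plot = FALSE returns the histogram object without drawing it.
h <- hist(pnl_sample, breaks = bin_edges, plot = FALSE)

# Number of trades that fell into each bin.
print(h$counts)

# To actually draw it:
# hist(pnl_sample, breaks = bin_edges, main = "Trade P&L", xlab = "P&L per trade")
```

Fixing the bin edges by hand keeps the charts comparable from one market to the next.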


Comments

7 Responses to “Programming a custom Backtest Profile in R”
  1. admin says:

    Thanks for the great post.

    It seems that, as of late, a lot of talented people have been gravitating toward R for systems trading (e.g., Soren over at http://dopeness.org/). Interesting…

    I don’t have a ton of experience in quantitative trading, but I can’t help but wonder why people would choose R over OpenQuant, AmiBroker, etc. Is the statistical support in the former that much greater than in the latter? Or is it a matter of execution speed / control? What I mean to say is, writing in R seems a bit cumbersome relative to some of the software that’s already out there — but, as I said, I’m not familiar with R.

  2. Newt says:

    With all the different programming languages and software platforms, how can I choose which is best to start with? or which is the most universal?

    (i.e. TradeStation, TradersStudio, R etc.)

    I’d like to start simple by back testing RSI entry and exits, sector rotation and such to get the hang of things

    Thanks

  3. milktrader says:

    The backtest report that generated the csv file used in the example comes from TradersStudio. So you can use different tools for different tasks without excluding yourself from a powerful software. I use TradersStudio to generate backtesting, optimization and walk-forward data, but I’d like to have a very specific view of the data and that’s why I turned to R. You can write a script and apply different files to it to get a standard Profile Report.

    Besides TradersStudio (which I know is slower than AmiBroker) and R, I’m also doing some work with Python, another free, open-source programming language. My interest in Python is centered around its neural network capabilities. You can do NNs in TradersStudio and R, but Python is the leading candidate for me at this time.

    The good thing about R and Python is that they’re both free, and they have a substantial number of packages and modules that will help you achieve whatever you’re trying to do with them. R has the nnet package for neural networks (which I haven’t explored fully) while Python has the PyBrain module (which I have toyed around with).

    To get started with backtesting, it might be best to use software-specific reporting functions that AmiBroker or TradersStudio provide as you can always grow into R and Python.

  4. Woodshedder says:

    Newt, AmiBroker, without a doubt.

    Good post Milk, as always.

    I do not have the patience for something like R, unfortunately. Maybe when life slows down a tad.

  5. milktrader says:

    Is there some backtest metric that you’d like to see that is not visually apparent from the AmiBroker backtest data? Let me know and I’ll include it in version 2.0.

    Once the basic architecture is set up correctly (and I’m working on that now), the addition of unique data views is fairly simple.

  6. Hiya Milk,

    Nice to see someone else applying some statistical analysis around here.
    I would suggest that before you waste your time with artificial neural nets you take a look at Random Forests or SVMs.

    The problem with ANNs is that while they have a certain ‘sexiness’, you end up with a solution that’s a freakin’ black box.

    And I hate black boxes.

    If you’re going to move into Python you might want to look at Jython instead. The reason I say this is that I’ve been a Rubyist for years now, and the problem with the C implementation is that it doesn’t take advantage of multiple cores, so you’re pushing one while the rest sit idle. It becomes an issue with some of my projects immediately, since I spend a fair bit of my time whipping up Genetic Algos and the like.

    jRuby solved that issue for me since the Java JVM is multi-core aware.

    Best of luck with R. It’s a brain breaker, that language.

  7. @admin,

    I wouldn’t use OpenQuant if my life depended on it.
    Who in their right mind would try to build a RealTime trading system in VB.Net or C#?
    Ugh.
