At the foundation of ScoreMetrics is the principle of back-testing.  This is a fancy way of saying you need to make sure your system would have performed well over a specific historical period.  If you develop a system because it worked last year, but you don’t go back further to make sure that last year wasn’t an outlier and that the previous years didn’t prove the system faulty, then you could be setting yourself up for some serious disappointment.  You will also need a tool or tools for back-testing performance, and you will need data going back at least ten years.  

Why ten years? System development, if done correctly, goes beyond a fad or a trend. Anyone can identify a trend and ride it for a while, but they inevitably miss the point where the trend ends or reverses, and they ultimately give back some, all, or even more than they made for however long they caught the trend.  I believe ten years goes beyond a trend and uncovers systems that will stand the test of time.  

The ten-year mark also does something else that is critical to this whole process: it helps you gauge downside risk, unit allocation, and a stop loss point.  This is because you now have perspective over a considerable period of time, giving you a data pool that reveals the anticipated drawdown risk of the system and, to a degree, sets expectations as to what a worst-case scenario might look like.

There are some exceptions to this rule.  Let’s say, for example, that the system involves a team that was formed 4 years ago, an umpire or referee that started 6 years ago, or a coach who got his job 5 years ago, or perhaps the industry wasn’t tracking the data you are using for the system for a full ten years.  In those instances, there still might be a great system in a smaller data pool. We don’t want to pass on a great system only because we can’t have a full data history, but we need to proceed with caution: a smaller data pool means less confidence that the system will hold up over time and less data to determine a stop loss point and drawdown risk.  Typically, with systems using smaller data pools, we will reduce the unit allocation in the portfolio until the system proves itself over time. This limits the risk while allowing you to implement the system with caution.

When back-testing, there are some critical rules to be aware of in order to avoid systems that just ‘don’t fit quite right.’ This applies to trading in traditional investment vehicles as well as sports trading.  Think of back-testing analysis as that childhood game when you were a toddler learning shapes, or a youngster doing your first jigsaw puzzle. You keep trying to make the puzzle piece fit, but you have to really jam it in there to get it to go where you want.  It really is the wrong puzzle piece, but you keep trying to wedge it into your spot and force it to fit. It’s that old round peg in a square hole saying. Back-testing works much the same way. You will come across many systems that are almost perfect but have flaws you don’t want to see, and a bad trader will ignore the data to focus on a system that may have big ROI potential.  This is how traders who put the work in still lose their butts! It’s also known as gambling and, as I stated several times along the way, we don’t gamble.  

I subscribe to the perfect puzzle piece approach.  It needs to fit just right, or I move on to the next one.  Sure, it can be frustrating, but that is the barrier to entry I was talking about before…remove the emotion of wanting so badly to win and stay non-emotional, not only in your application of a system in a trading environment but also in the creation and testing of the system.  

So, what are the right elements of a system to verify in your back-testing? Here are the ScoreMetrics guidelines I discussed earlier that pertain to the back-testing analysis:

Limit the ‘trades’ within a season to between 5 and 80 trades

Have a positive annual ROI that exceeds 20%

A reasonable worst-case drawdown (in relation to the positive ROI years)

A positive return in at least 80% of the years in the back-test group
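Three of these four guidelines are mechanical checks you can run directly against back-test results. Here is a minimal sketch in Python; the function and field names are mine, not ScoreMetrics code, and the drawdown rule is left as a flag for manual review because the text treats it as a judgment call:

```python
# Screen a back-tested system against the ScoreMetrics guidelines.
# All names here are illustrative; annual_roi is assumed to be
# computed separately (the funding arithmetic is covered later).

def passes_guidelines(trades_per_year, profits_per_year, annual_roi):
    """Return a dict of pass/fail results for the four rules."""
    return {
        # Every season must fall in the 5-80 trade window
        "trade_count_5_to_80": all(5 <= t <= 80 for t in trades_per_year),
        # Average annual ROI must exceed 20%
        "annual_roi_over_20pct": annual_roi > 0.20,
        # 'Reasonable worst-case drawdown' is a judgment call in the text,
        # so we only flag it for manual review here
        "drawdown_review_needed": True,
        # At least 80% of back-tested years must be positive
        "winning_years_80pct": (
            sum(1 for p in profits_per_year if p > 0)
            / len(profits_per_year) >= 0.80
        ),
    }
```

A system that fails any automated check is set aside, exactly as the perfect-puzzle-piece approach demands.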

Let’s use some examples to explain the back-testing analysis you will need to do.

System A

When looking at back-testing results, we put aside the system logic for a moment, as we are simply trying to analyze the data itself.  System A looks like this over a ten-year back-tested performance using a single unit allocation per trade (in this analysis we assign a dollar value of $1 per unit):

Last year: +$.73, 44 trades, worst in-season drawdown of 6 trades

Year 2: +$4.61, 19 trades, worst in-season drawdown of 3 trades 

Year 3: +$8.29, 33 trades, worst in-season drawdown of 2 trades

Year 4: -$8.44, 46 trades, worst in-season drawdown of 11 trades

Year 5: +$2.22, 94 trades, worst in-season drawdown of 9 trades

Year 6: +$4.49, 40 trades, worst in-season drawdown of 4 trades

Year 7: -$7.90, 29 trades, worst in-season drawdown of 10 trades

Year 8: +$3.20, 12 trades, worst in-season drawdown of 3 trades

Year 9: +$5.01, 35 trades, worst in-season drawdown of 4 trades

Year 10: +$3.88, 42 trades, worst in-season drawdown of 4 trades

OK, deep breath.  There is a lot to analyze and dissect here.  The first thing we have to do is decide the stop loss based on the risk of the system.  This is a proprietary formula I created to determine the ‘walk away point’ of a given system.  While I can’t disclose the specifics of the formula, I can tell you that it takes into account the history of drawdowns based on the quantity and variance of trades per season, the worst in-season drawdown, and the overall pattern of those drawdowns, gauging whether a worst drawdown was an outlier or a consistent indicator of the system’s performance variance.  In this case, the stop loss for the system would be 16 trades.   

We do this step first because once we know the max drawdown allowance, or stop loss point, we know the number of units we have to ‘fund’ this particular system with for the season.  Once we know the funding amount, we can determine the ROI of the system. Based on funding 16 units for each unit we allocate per trade, we can add up all the returns for the 10 years, divide by 16 to get the total return, and then divide by 10 to get the average annual return.

System A over 10 years generated $32.43 (8 winning years) – $16.34 (2 losing years) = $16.09 in net profit.  Based on needing 16 units allocated in the portfolio, this produced a total return of 101% ($16.09/16) or 10% ROI per season (101%/10).  
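The arithmetic above reduces to a small helper. This sketch takes the stop loss point as a given (the formula that produces it is proprietary) and computes the funding-based returns; the function name is mine:

```python
def system_roi(yearly_profits, stop_loss_units):
    """Net profit, total return, and average annual ROI for a
    back-tested system funded at its stop-loss level."""
    net_profit = sum(yearly_profits)              # wins minus losses
    total_return = net_profit / stop_loss_units   # return on funded units
    annual_roi = total_return / len(yearly_profits)
    return net_profit, total_return, annual_roi

# System A: 10 years of per-unit results, 16-unit stop loss
system_a = [0.73, 4.61, 8.29, -8.44, 2.22, 4.49, -7.90, 3.20, 5.01, 3.88]
net, total, annual = system_roi(system_a, 16)
# net ≈ $16.09, total ≈ 101%, annual ≈ 10% — matching the figures above
```

Note how sensitive the ROI is to the stop loss point: the same $16.09 of profit funded at 8 units instead of 16 would double the return.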

Let’s take a look at the rules we talked about earlier and how they stacked up against System A:

Limit the ‘trades’ within a season to between 5 and 80 trades (failed)

Have a positive annual ROI that exceeds 20% (failed)

A reasonable worst-case drawdown (in relation to the positive ROI years) (failed)

A positive return in at least 80% of the years in the back-test group (passed)

In this example, System A failed 3 of the 4 back-testing requirements to be considered for the portfolio.  One year had 94 trades, which, in addition to exceeding the 80-trade threshold, should stand out as an anomaly compared to the other 9 years.  We like to see consistency, and when there is an anomaly it adds overall risk to the system because it introduces an unpredictable element to the results.

This system also doesn’t hit the ROI numbers to make it into the portfolio, as we target a minimum of 20% annual ROI.  The other glaring issue is that the drawdown risk is very high. Not only are we allocating 16 units per unit traded (which is what affects the ROI numbers so dramatically), but years 4 & 7 show that the losing years offset the best winning years.  Overall risk is too high in this system for the returns given in the winning years. The MARROC (MAR Ratio on Crack) is our proprietary risk analysis formula that takes the MAR Ratio discussed earlier and applies probability analysis to it. In this case, the MARROC indicates a very poor system because of the elevated risk without the returns to justify it, and because 30% of the data period is affected by this risk (years 4, 5 & 7).  The fact that it passes the 8/10 winning years rule is a positive for the system, but it cannot justify the failure of the other rules we apply in analyzing the historical performance.
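The MARROC formula itself is proprietary, but the standard MAR ratio it builds on is public: annualized return divided by worst drawdown. A rough sketch of that base metric applied to the unit figures above, with both quantities expressed as fractions of the funded amount (my function name and framing, not the MARROC):

```python
def mar_ratio(annual_roi, worst_drawdown_units, funding_units):
    """Standard MAR ratio: annualized return divided by worst
    drawdown, both as fractions of the funded amount."""
    drawdown_fraction = worst_drawdown_units / funding_units
    return annual_roi / drawdown_fraction

# System A: ~10% annual ROI, worst drawdown of 11 units, funded at 16 units
mar_a = mar_ratio(0.10, 11, 16)   # ≈ 0.15 — a weak risk-adjusted return
```

A MAR ratio well below 1 means the worst drawdown dwarfs the annual return, which is exactly the elevated-risk-without-returns picture described above.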

Whew!  That was a lot of data points and rules to process.  The takeaway on all this should be that you go step-by-step and review the data non-emotionally to find a system that can fit our stringent requirements.  Let’s take a look at another example.

System B

Last year: +$5.23, 28 trades, worst in-season drawdown of 3 trades

Year 2: +$1.80, 22 trades, worst in-season drawdown of 3 trades 

Year 3: +$4.29, 30 trades, worst in-season drawdown of 2 trades

Year 4: -$1.44, 31 trades, worst in-season drawdown of 5 trades

Year 5: +$0.12, 28 trades, worst in-season drawdown of 5 trades

Year 6: +$3.20, 25 trades, worst in-season drawdown of 3 trades

Year 7: -$1.20, 35 trades, worst in-season drawdown of 4 trades

Year 8: +$5.90, 24 trades, worst in-season drawdown of 2 trades

Year 9: +$4.29, 24 trades, worst in-season drawdown of 2 trades

Year 10: +$3.12, 31 trades, worst in-season drawdown of 3 trades

After looking at System A, I am sure you can take one look at System B and realize it’s a far superior system.  Because the drawdowns are so consistently low, the stop loss point for this system is 8 trades. Based on a single unit allocation per trade, we have a funding requirement of 8 units.  Therefore, System B over 10 years generated $27.95 (8 winning years) – $2.64 (2 losing years) = $25.31 in net profit. Based on needing 8 units allocated in the portfolio, this produced a total return of 316% ($25.31/8) or 32% ROI per season (316%/10).  

Let’s take a look at the rules we talked about earlier and how they stacked up against System B:

Limit the ‘trades’ within a season to between 5 and 80 trades (passed)

Have a positive annual ROI that exceeds 20% (passed)

A reasonable worst-case drawdown (in relation to the positive ROI years) (passed)

A positive return in at least 80% of the years in the back-test group (passed)

System B didn’t have some of the incredibly high return years like System A, but it has so many other things going for it.  It hit the mark with 8/10 winning years. The drawdown risk and portfolio unit allotment are relatively small given the performance of the system.  The annualized return is well above the 20% minimum requirement. The number of trades per season is relatively consistent and right in the sweet spot in terms of trade frequency and per-trade risk.  Overall, this system checks all the boxes and looks like a great addition to the portfolio.

Back-testing analysis requires a rule-based approach to avoid the round peg in a square hole mistake most traders make.  ScoreMetrics applies stringent rules, setting a very high bar a system must clear in order to qualify. The reality is that by doing this we make it a lot harder to find systems that fit the portfolio, which means a lot more research time and hours of discouraging deep data dives that end with starting all over again on another system concept.  The result, however, is a portfolio of systems that all offer phenomenal back-tested returns with favorable risk/reward profiles, which greatly increases the odds of success.