Put That Hypothesis to the Test

NRG Top8 Competitor Dominic Vitello covers a unique method of informing your plays and answering your biggest questions about your deck.

Answering your Deck’s Questions

Is turn 1 Ragavan, Nimble Pilferer better than Dragon's Rage Channeler for my chance to win? How many cyclers should I keep in my opening hand when playing Living End? How key is it to make the third land drop on schedule in Rakdos Scam? Can anyone ever even attempt to answer these questions? In this article, I’m going to introduce you to a powerful tool that you can use to ask questions and analyze the deck-building decisions you’ve always wondered about for your builds. While the examples from here on pertain specifically to the Modern format, this methodology can be applied to decks in any format, from Standard to Pioneer and even Commander.

Making Associations

A deep-seated part of being human is the desire to find associations. If a fruit is colorful, does that mean it’s safe to eat? If I drive a red car, am I more likely to be pulled over? Does taking an antibiotic before surgery decrease the chance of getting an infection after surgery? Most generally: does X affect Y? Each of these questions drives at a central point. Are these two things, whatever they might be, associated? Good question.

This type of thinking doesn’t just apply to being a hunter-gatherer, buying a car, or being in healthcare, respectively. You can apply this paradigm to almost anything, and today we’re applying it to Magic. Our next question is how. To find out, we’re going to look at the simplest example of an association. Just like in our earlier questions, we’ll be looking at two “things.” We’ll call each of these “things” a “variable.” In this article, our variables are all going to be binary: they can only be “yes” or “no.” This is key; there is no wishy-washy here. Every deck raises questions of its own, whether they guide you toward being a better pilot of that deck or help you show, beyond a gut feeling, that a change you’d like to make to a list is correct. Many times, we can boil these sorts of questions down into yes/no questions. From this, we can make what they call in the business a “contingency table.” Let’s look at what the contingency table looks like so far. We’ll revisit it throughout the article and see how it evolves.

You Must Ask the Right Question

Now that we have this table, we have to come up with a question. I play a lot of Yawgmoth. Let’s use an example of a question I have asked in the past. My question is: does playing and resolving a mana dork on turn one affect my percent chance to win the game? Kind of a simple question, when you think about it. Next, we have to figure out how that applies to our contingency table. Well, we have to break our question down into two parts. Each of these parts is going to be a variable. So, let’s say “Variable 1” will determine whether or not we landed the turn-one dork. “Variable 2” is going to be whether we got the W. Let’s look at our contingency table now.
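For anyone who would rather keep the tally in a script than on paper, here is a minimal sketch of how two yes/no variables become a contingency table. The five sample games and the name `contingency_table` are made up for illustration; they are not my real data.

```python
from collections import Counter

# Each game is recorded as two yes/no answers:
# (landed the turn-one dork, won the game).
# These five games are invented purely to show the bookkeeping.
games = [
    (True, True),
    (True, False),
    (False, True),
    (True, True),
    (False, False),
]

def contingency_table(records):
    """Build a 2x2 table: rows are Variable 1 (yes, no), columns are Variable 2 (yes, no)."""
    counts = Counter(records)
    return [
        [counts[(True, True)], counts[(True, False)]],
        [counts[(False, True)], counts[(False, False)]],
    ]

table = contingency_table(games)
print(table)  # [[2, 1], [1, 1]]
```

Row one holds the games with the dork and row two the games without it; column one holds wins and column two holds losses.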

This is starting to look like something we can put some numbers into, isn’t it? Well, spoiler alert, we certainly can. I’m going to pump the brakes here very briefly, because I’m sure by now I’m starting to generate some skepticism. “Is the turn-one dork really that integral to winning? I mean, come on; how could you possibly relate those two things? Magic is just too complicated. The game is too variable and there are too many other factors in a game for that to have any solid statistical meaning!” Hold onto those thoughts, as we’ll come back and address them.

There is a nearly limitless number of questions that can be asked of any given deck. “Does my deck mulligan well?” could be parsed into “did I win the game?” and “did I mulligan to five or fewer?”. “Should I counter the control deck’s planeswalker before turn four?”, “Is my one-of a worthwhile inclusion in my deck?”, and “Should I spend a card interacting with my opponent’s plays before turn three?” are all reasonable questions that could be asked. This is a system for finding the answers to smaller questions, and those answers, assembled correctly and coherently, can lead to big and impactful changes in play.

Doing the Legwork

Next is the hard part, but also the fun part. Play Magic, and a lot of it. How much? It depends. The exact answer is based on something called, no meme, a “power calculation.” Performing a power calculation is outside the scope of this article. In the biomedical world, it is proper and often necessary form to do a power calculation prior to testing a hypothesis. For our purposes, I’m going to ask that you believe me when I say it isn’t imperative for what we’re up to. I did do a power calculation for this little experiment, however, and found that I would have to play about 250 games of Magic to get to the bottom of the question. For each one of those games, I put the results into our contingency table. Below, I have tallied up everything from 241 games with my Golgari Yawgmoth deck (the power calculation was pretty close).

Basically, a power calculation tells you how much data you need to collect before you can draw a reasonable conclusion from it. If we ran these calculations with a mere 20 games, we’d certainly have data, but it would be by no means reliable. The power calculation will tell us how much data to collect before we can get to the analysis, and this will change depending on the number of variables we have. A power calculation tool is available [here] for anyone interested in trying their own, but for questions with two variables on two outcomes, we’ll assume that 250 is the correct place to be.
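If you are curious what a power calculation looks like under the hood, here is a rough sketch using the textbook normal-approximation formula for comparing two win rates. This is not the calculation I ran, and the inputs are assumptions: the guessed win rates (65% versus 50%), the 5% significance level, the 80% power, and the function name `games_per_group` are all chosen for illustration. It also assumes equally sized groups, which real game logs rarely give you, so don’t expect it to reproduce my 250.

```python
import math
from statistics import NormalDist

def games_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate games needed in EACH group to detect win rates p1 vs p2.

    Textbook normal-approximation formula for two proportions,
    two-sided test, equal group sizes (an assumption).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # cutoff for the significance level
    z_beta = NormalDist().inv_cdf(power)           # cutoff for the desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical guess: 65% with the play in question, 50% without it.
n = games_per_group(0.65, 0.50)
print(n, "games per group,", 2 * n, "games total")
```

The main takeaway is the shape of the formula: the smaller the gap you are trying to detect, the more games you need, and the number climbs quickly.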

It might seem intuitive to some, but let’s quickly run through what this info is telling us. Specifically, what was the outcome of the games where we did versus did not have the turn-one dork hit the board? This is just some quick maths. I played 104+60=164 games where we had the turn-one dork (the first row added up: 104 wins and 60 losses). I played 38+39=77 games where I did not have the turn-one dork (the second row added up: 38 wins and 39 losses).

My win rate with the turn-one dork is row one, column one divided by the sum of row one. So that is 104/164*100%=63.41%. When I didn’t have the dork, the analogous calculation is 38/77*100%=49.35%. This is our first glimpse into the result of our experiment. It seems like, on average, my win rate with a turn-one dork is about 63%. On the other hand, I only win about 49% of the time when I do not have a turn-one mana dork. Cool. So, it looks like my win rate goes up a lot when I have the turn-one dork. I guess I should really favor keeping opening hands that can play the turn-one dork then, right? Well, so it seems, but how can I be so sure?
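The same quick maths can be written as a few lines of Python. The four counts are the ones from my 241 games; the layout (rows for dork yes/no, columns for win/loss) and the helper name `win_rate` are just one way to organize them.

```python
# Rows: turn-one dork (yes, no). Columns: game result (win, loss).
table = [[104, 60], [38, 39]]

def win_rate(row):
    """Wins divided by total games in one row of the table."""
    wins, losses = row
    return wins / (wins + losses)

with_dork = win_rate(table[0])
without_dork = win_rate(table[1])

print(f"{with_dork:.2%}")     # 63.41%
print(f"{without_dork:.2%}")  # 49.35%
```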

Now it’s time to finish up the analysis. To explain this, I’m going to turn our attention back to the question I posed earlier: does this result matter? Is the result that we calculated above something we can hang our hats on? Magic is complex and games vary. Random things happen. Everything from timely top decks to flooding out and more seem to influence the game. The matchup matters, play-draw matters, how your hand lines up with your opponent’s matters, and so on.

To address these concerns with our little experiment, I’m going to invoke the infamous “RNG.” In other words, I am going to say that all these things that I have mentioned fall into the category of “randomness.” In doing this, I am asserting that, for example, over a “large” number of games, who gets the play or the draw doesn’t matter. Those occasional games where our hand “just doesn’t line up” don’t matter. You can think of these things as elements that wouldn’t matter if, hypothetically, we tested our hypothesis after playing infinite games of Magic. The problem is, we haven’t played an infinite number of games of Magic. We’ve only played 241. Is this good enough? Are 63% and 49% really all that different after all? From a statistical standpoint? Is this a true association?

Tests of Statistical Significance

To answer these last few questions, we need to know what is meant when something is “statistically significant.” Essentially, it means just what we think it means. On a deeper level though, it means testing one’s data with a statistical test of significance.

In the most general sense, this is a deeply complicated question in both science and statistics. For the sake of ease, I’m going to blast past a lot of the theory and distill it down to what you need to know. In biomedical science, when we are comparing two binary variables, we use something called Fisher’s Exact Test. This is the same test I might use if I were, say, comparing whether getting antibiotics before surgery is associated with a decreased risk of infection after surgery.

There’s nothing that says we can’t apply Fisher’s Exact Test to our Magic questions too: is resolving a turn-one mana dork associated with winning the game? So how do we do that? Actually performing Fisher’s Exact Test by hand requires some tedious (slow) maths. When biomedical scientists test for statistical significance, they frequently use some kind of calculator or program that does the “hard” parts. Sometimes this requires software that needs to be purchased. Fortunately, the statistical software GraphPad offers a free version as a webapp. A link to that can be found [here]. It looks like this:

You can fill in the little headings for each row and column with whatever you want. It doesn’t affect the calculation; it just helps you interpret the results. Leave the settings below as they are. Exactly what they mean is a bit beyond what we need to spend time learning about today. Let’s plug our data in and click “calculate”:

If the calculator says it, it must be true. Having the turn-one dork is associated with winning the game more often, and it is considered to be statistically significant. This means that the likelihood that this association is due purely to randomness/variance/RNG is low. It’s not impossible that the association we are making is happening because of something random. But by using Fisher’s Exact Test, we’ve shown, through statistics, that it is unlikely that the association is random.
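If you would rather not rely on a webapp, the test itself fits in a few lines. Below is a sketch of a two-sided Fisher’s Exact Test using only Python’s standard library. It assumes the common “add up every table that is no more likely than the one we observed” definition of two-sided, which I have not confirmed is exactly what GraphPad does, and `fisher_exact_two_sided` is a name I made up for this example.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher's Exact Test p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d   # games with / without the dork
    col1 = a + c                # total wins
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Chance that exactly x of the wins fall in row one,
        # if wins were dealt out at random with the totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = prob(a)
    # Add up every possible table that is no more likely than ours.
    # The tiny tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )

# Rows: turn-one dork (yes, no). Columns: game result (win, loss).
p_value = fisher_exact_two_sided([[104, 60], [38, 39]])
print(f"p = {p_value:.4f}")
```

With the counts from my 241 games, this should land close to the p-value the calculator reports, just under the usual 0.05 cutoff.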

Correlation is Not Causation

Fisher’s Exact Test is telling us that when I play a mana dork on turn 1 with Golgari Yawgmoth, I am more likely to win than when that does not happen. There are two more elements I want to address before I call this case closed, though. The first is the difference between correlation and causation. The second is the meaning behind the small (but important) p-value.

You’ve probably heard the saying correlation is not causation. I think most people know, roughly, what that means. After reading this article, though, it’s important to know what it means in a statistical context. It means that Fisher and his test don’t tell us why the association exists. It simply tells us, the association probably exists. Hold on to the “probably” part, we’ll come back to it. But again, the test doesn’t give us any insight into the mechanics of what’s really going on. This is what is meant by correlation is not causation. In embarking upon this whole investigation, we haven’t found anything out about why the turn-one dork helps me convert for the win. All we know now is that something about it just does.

That is still useful info, of course. Because, at this point, the meat and potatoes of the analysis is complete. We know what we need to know. When I look at my opening hands, I can now use this info to help inform my decision to mulligan. The fact is, I should try to keep hands that have a turn-one mana dork in them. If I’m considering keeping a hand without turn-one dork, I need to be thinking a bit more carefully. Either I should consider taking a mulligan, or I have to be pretty convinced that my hand is still good enough to keep.

Well, OK, so if we can trust the calculator’s assessment, why does it also give us this p-value thing? And what is it? Why did I say earlier that the association probably exists? There is a lot to unpack with the p-value, but I do want to mention it because it is very important, both for our purposes and more broadly. The p-value is a probability (in our example it is 0.0491, or 4.91%) that roughly reflects the chance that the association we’re making is actually a false association. This is a bit of an oversimplification, but it will work for now. Strictly speaking, it is the chance of seeing results at least this lopsided if the turn-one dork truly made no difference. As with anything, reality favors complexity and nuance. Our subject is no different. Tests of significance, at their core, are designed to calculate this p-value. When we look at the p-value, we get a sense of how likely it is that we’re making a false association. In the biomedical literature, it’s generally held that the threshold for calling something statistically significant is a p-value of 0.05 or less. So when the Fisher’s Exact Test calculator on GraphPad says “the association is statistically significant,” it is actually checking whether p<0.05. Setting the level of statistical significance at a p-value of 0.05 is a bit arbitrary. If we wanted to be more confident that our association is true, we could say, again arbitrarily, that we want a p-value of 0.01 or less. But we would need to play a lot more games of Magic (probably thousands) to reach that level of significance. I say all this not to undermine the value or quality of the work we have done. Instead, I hope that you take away a deeper understanding of the tool you will be using. By doing so, you’ll be able to really understand the results you are getting.
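One way to build intuition for the p-value is to let the computer play pretend. The sketch below takes my 142 wins and 99 losses, shuffles them at random across the “dork” and “no dork” games thousands of times, and counts how often pure chance produces a win-rate gap at least as big as the one I observed. This is a simulation (a permutation test), not Fisher’s Exact Test itself, so its answer is approximate; the function name, the 5,000 trials, and the seed are arbitrary choices for illustration.

```python
import random

def shuffled_p_value(table, trials=5000, seed=0):
    """Estimate how often random shuffling matches or beats the observed win-rate gap."""
    (a, b), (c, d) = table
    n_dork, n_no_dork = a + b, c + d
    total_wins = a + c
    results = [1] * total_wins + [0] * (b + d)  # 1 = win, 0 = loss
    observed_gap = abs(a / n_dork - c / n_no_dork)

    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(trials):
        rng.shuffle(results)
        # Pretend the first n_dork shuffled results were the "dork" games.
        wins_dork = sum(results[:n_dork])
        wins_no_dork = total_wins - wins_dork
        gap = abs(wins_dork / n_dork - wins_no_dork / n_no_dork)
        if gap >= observed_gap - 1e-12:  # tolerance for floating-point ties
            at_least_as_extreme += 1
    return at_least_as_extreme / trials

estimate = shuffled_p_value([[104, 60], [38, 39]])
print(f"simulated p is roughly {estimate:.3f}")
```

If the dork truly did nothing, only about one shuffle in twenty should look as lopsided as my real results, which is the same story the p-value tells.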

Wrap Up and Closing Thoughts

First, give yourself some props. We’ve covered a lot of ground. We learned how associations can be organized into a contingency table, how the results entered into a contingency table can be analyzed using Fisher’s Exact Test, and how to interpret those results. We learned what all that means along the way. And, of course, we learned that having a turn-one mana dork in Golgari Yawgmoth is associated with a higher chance of winning the game. Most importantly, we learned the reasoning behind all of it. We have also learned the limitations and some of the faults of this process. Before I say so long for now, I want to address a couple of loose ends.

I do want to mention something that we glossed over a little bit. When you are asking your initial question, as we did about the turn-one dork, and while you’re collecting data, it’s important that you do so in a scientific way. You have to really put on your (mad) scientist thinking cap when you do this. For example, I didn’t mention that when I was going through and recording games, I made the decision to exclude mulligans to five. I made this decision because I felt that I wouldn’t really be using my data when mulling to five or fewer. When I go to five, I’m just looking for a coherent hand at that point. The win rate just goes down pretty significantly (in my experience) overall, and I didn’t think that I would really be addressing my initial question by including those games in my data. So I didn’t. This is just one example of how our simple, initial question can become complicated.

I also excluded games where my opponent disconnected on MTGO. My point is this: you’re in the driver’s seat when you collect your data. Collecting the data and performing the analyses are skills. The more sound the science, the more valid the result. If you ask “is keeping three or fewer lands in the opener associated with losing,” you get to decide if having two legendary lands counts. You get to decide if you still want to record the game if your opponent mulls to three. Just know that each of these decisions influences how you interpret your result at the end of the day. If you exclude every game against your worst matchup, that’s fine. But in turn, you have to remember that your result is meaningless when thinking about in-game decisions in that matchup. If you exclude a game just because “they had the nuts,” you might end up making a false association when you run Fisher’s Exact Test.

One additional point is that several of these tests may be run simultaneously, using the same data set (games of Magic played). As we mentioned above, a lot of things can happen in a game. This means that it’s wholly likely that many of the situations that we’re investigating will occur. For example, a couple of my questions for Yawgmoth, alongside the turn-one dork question that we’ve been discussing, are “Does having Essence Warden matter?” and “Did having the second Endurance matter?”. As I look to tune and refine my list with some fringe choices, I’m using this scientific approach to figure out the most reasonable conclusion to each question, and can find the answers to them objectively and passively as I play the deck. There will be plenty of games where I don’t have the turn-one dork, I win, and Essence Warden mattered, and all of that data is usable. As you can see in my personal Yawgmoth doc below, the process is ongoing as I input data after each relevant game. Some questions are at different stages than others, but all of the data can be analyzed either separately or as a whole to find the fair answer to any specific question I may have.
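Here is a sketch of how one game log can feed several questions at once. It is not a copy of my doc: the field names, the `GameRecord` class, and the four sample games are invented for illustration. The exclusion rule mirrors the choices I described above (drop mulligans to five or fewer and opponent disconnects), and you would swap in your own.

```python
from dataclasses import dataclass

@dataclass
class GameRecord:
    # One row of the log. Every field name here is hypothetical.
    won: bool
    turn_one_dork: bool
    essence_warden_mattered: bool
    mulligan_to: int = 7
    opponent_disconnected: bool = False

def keep(game):
    """Exclusion rules, decided BEFORE looking at the results."""
    return game.mulligan_to > 5 and not game.opponent_disconnected

def table_for(games, question):
    """Build the 2x2 table for one yes/no question against winning."""
    table = [[0, 0], [0, 0]]  # rows: question (yes, no); columns: (win, loss)
    for game in games:
        if not keep(game):
            continue
        row = 0 if getattr(game, question) else 1
        col = 0 if game.won else 1
        table[row][col] += 1
    return table

# Four made-up games; the third is excluded as a mulligan to five.
log = [
    GameRecord(won=True, turn_one_dork=False, essence_warden_mattered=True),
    GameRecord(won=True, turn_one_dork=True, essence_warden_mattered=False),
    GameRecord(won=False, turn_one_dork=True, essence_warden_mattered=False, mulligan_to=5),
    GameRecord(won=False, turn_one_dork=False, essence_warden_mattered=True),
]

print(table_for(log, "turn_one_dork"))            # [[1, 0], [1, 1]]
print(table_for(log, "essence_warden_mattered"))  # [[1, 1], [1, 0]]
```

Each question gets its own contingency table from the same games, and each table can then be run through Fisher’s Exact Test on its own.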

Let’s round it out by coming back to our main finding: I win more often in games where I land the turn-one dork than in games where I don’t. Based on what we’ve covered in the article, we know that there is probably an association there. This is based on things in aggregate. If I cast a dork on turn one, I’m not guaranteed the win. I’m not even guaranteed that my chance to win is 63%. I just know that when that dork lands, my chance to win just went up. And, again, that is kind of a big deal. Because I know now, when I look at my opening hand, I am incentivized to look for a turn-one dork. If my opener doesn’t have a turn-one dork, again, I should think about shipping it. To this end though, there’s no statistical association police. No one is going to arrest you if you don’t universally and rigidly make decisions based on these associations. In fact, you shouldn’t do that. Just because we have found this association doesn’t mean that I haven’t kept (and won with) turn-one Young Wolf or turn-one Essence Warden. You’re perfectly within your rights to ignore a known association and make a judgment call. But each statistically significant association you know enhances your ability to make the best possible play. Or maybe it can even show you that a given play was better (or worse) than you might have thought. All said, I wrote this article because I love competitive Magic and knowledge is power. I hope I gave you a little more knowledge today and a tool to help you min-max every play and put your own hypotheses to the test.

  • Dominic Vitello

    Dominic Vitello is an NRGSeries top 8 competitor and has enjoyed competitive Magic since 2011. He is also a licensed surgeon at Northwestern Memorial Hospital in Chicago, Illinois. He has an interest in liver, bile duct, and pancreatic cancer, and he performs biomedical research in these diseases. Dominic holds a doctor of medicine (MD) from Case Western Reserve University School of Medicine and a bachelor of science in aeronautical and astronautical engineering from Purdue University.
