For Kraken Club, however, Studio Gobli gives the distribution of swings per attack round as "5:15:25:25:15:10:3:2". This corresponds to 3.82 expected swings per attack round. Another source (I don't know how "authoritative" it is) specifies 3.2 expected swings per attack round without giving any proportions.
Out of curiosity, I'd like to know how exactly these claims are justified. Did an SE representative give out this information, so it must be true? If these claims were justified empirically, where are the data?
But why should anyone really care about the distribution of the number of hits for multi-hit weapons? Believe it or not, some freaks have been concerned that double attack traits (from the warrior job trait, equipment, Fighter's Roll, whatever) attenuate the number of triple attacks for Ridill (and the number of attacks greater than 2 for K. Club). So it may be helpful to know whether this attenuation makes Ridill (and other multi-hit weapons) perform worse in the presence of double attack than without it. I myself am more interested in how to analyze any data collected in support or contradiction of such a belief, for the sake of drawing conclusions that are at least marginally better than hand-waving about "margin of error" without even quantifying it.
Collecting data for Kraken Club from English-language sources appears to be a non-starter, but some data for Ridill is easily found. The talk page for Ridill on FFXIclopedia has some good data sets for the number of x hits (x = 1, 2, 3). This is assuming that FFXI's random number generator is sufficiently random (no reason to believe otherwise).
Apparently, the purpose of this data collection was to find evidence that DA affects Ridill's output. But how would DA affect Ridill's output? There were two claims implied by the inane discussion:
(1) Double attack trait processes on all attack rounds equally. This means that both single attacks and triple attacks are "converted" to double attacks. (The DA trait "overrides" the Ridill proc.) DA may also process on a round that would already have been a double attack, but then there is no difference in the result. The net effect is that the proportions of single and triple attacks are reduced by the same percentage.
(Since the proportions sum to 1, the mean number of hits/round can be written as 2 - p(single) + p(triple), so converting singles and triples to doubles at the same rate pulls the mean toward 2. If the average number of hits/round is less than 2, the net result is a slight increase in Ridill output. If exactly 2, no change regardless of DA level. If greater than 2, a slight decrease in Ridill output.)
(2) DA trait "disproportionately" reduces the number of triple attacks compared to single attacks. Ridill nerfed!
The second claim is really a poorly formed and vague hypothesis; there is no suggestion as to how to express this hypothesis in numerical terms. In contrast, the first claim at least provides some basis for statistical inference because there is a specific claim of how DA interacts with Ridill.
Supposing that the multihit distribution of Ridill as stated previously is really true (a working assumption), then we can calculate Ridill's hit distribution in the presence of warrior's double attack job trait (10% DA) under the first claim:
single: .3(1-.1) = .27
double: .5 + .1(.3 + .2) = .55
triple: .2(1-.1) = .18
The very first data set on the talk page was collected using a WAR/NIN with no other DA from equipment or other sources. The sample proportions are
single: 276/1020 = 0.2705882
double: 541/1020 = 0.5303922
triple: 203/1020 = 0.1990196
At first blush, there seems to be no need to go through the motions of performing a statistical analysis. (Never mind that I saw the data before proposing a hypothesis...) The usual logic of a statistical hypothesis test doesn't really hold here, since I'm not trying to assemble evidence against a "null" hypothesis but rather to find corroborating evidence to support one. Still, I use this example to illustrate a few approaches one might use to analyze the data.
One approach is to generate simultaneous confidence intervals (with some pre-specified confidence level) for the proportions of single, double, and triple attacks.
Formally speaking, these multihit distributions can be modeled using a multinomial distribution with educated guessing about the parameters (the proportions of x-hits). Given the data above, a set of approximate simultaneous CIs, using the approach of Goodman (1965), will give a range of probable values of the true proportions of Ridill's x-hits.
If I wanted to be (at least) 95% confident that all the confidence intervals contained the true proportions, then I obtain this set of CIs for the given data:
single: (0.23864, 0.30510)
double: (0.49292, 0.56753)
triple: (0.17081, 0.23059)
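Goodman's intervals are straightforward to compute by hand. Here's a minimal stdlib-Python sketch using the Bonferroni-style chi-square quantile, which reproduces the intervals quoted above:

```python
# Sketch of Goodman's (1965) simultaneous confidence intervals for
# multinomial proportions. The chi-square(1 df) upper quantile at
# alpha/k is the square of a standard normal quantile.
from statistics import NormalDist

def goodman_cis(counts, conf=0.95):
    n = sum(counts)
    k = len(counts)
    z = NormalDist().inv_cdf(1 - (1 - conf) / (2 * k))
    a = z * z
    cis = []
    for c in counts:
        p = c / n
        half = (a * (a + 4 * n * p * (1 - p))) ** 0.5
        cis.append(((a + 2 * n * p - half) / (2 * (n + a)),
                    (a + 2 * n * p + half) / (2 * (n + a))))
    return cis

for lo, hi in goodman_cis([276, 541, 203]):  # singles, doubles, triples
    print(f"({lo:.5f}, {hi:.5f})")
```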
I think a family of (simultaneous) CIs is more useful than a CI for an individual proportion, if only to get some sense of the "big picture" and limit your attention to "plausible" sets of multiple proportions. With the right techniques, your CIs won't be much wider than the individual CIs you would calculate the usual way. The downside is that most statistical packages don't have built-in options to generate simultaneous intervals.
Conclusion: The above CIs happen to cover the null parameters, so the proposed model seems like a good fit to the data, using the logic of a goodness-of-fit test ("accepting" a null hypothesis in the absence of contradictory data). ("Double attack trait processes on all attack rounds equally.")
Instead of dealing with confidence intervals for multiple proportions, you could instead focus on confidence intervals for the sample mean (expected value) of the number of hits per attack round. The sample mean is a random variable just as the numbers of single/double/triple attacks are random variables (all of which depend on the sample size, hence the use of the sample mean).
Indeed, the mean number of hits per attack round is a linear function of the numbers of single/double/triple attacks, and we can use this observation to compute the variance of the sample mean, using the fact that the sum of the individual proportions must equal 1 (for any multinomial distribution).

Thus, for the "null" hypothesis we are currently considering, the expected value of the sample mean of hits/round is 1.91, and the variance of the sample mean is 0.0004332353. By the central limit theorem, the sampling distribution of the sample mean is approximately normal for sufficiently large n. We can use this fact to obtain confidence intervals for the (sample) expected value of number of hits/round.
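The two quantities just stated can be checked with a few lines of Python (the null proportions are the .27/.55/.18 distribution derived earlier):

```python
# Mean and variance of the per-round hit count under the null distribution,
# and the variance of the sample mean for n = 1,020 rounds.
probs = {1: 0.27, 2: 0.55, 3: 0.18}
mu = sum(x * p for x, p in probs.items())                 # ~1.91
var = sum(x * x * p for x, p in probs.items()) - mu ** 2  # ~0.4419
n = 1020
print(mu, var / n)  # variance of the sample mean: ~0.0004332
```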
Personally, I don't think I would bother employing this method, but it might be easier to understand, if only for the sake of debunking bullshit assertions that arise from point estimates of the expected value for a given sample size. (I'll point out a few of these assertions after applying this method to the data considered earlier.) But you lose a sense of the "big picture" when you sacrifice detail for concision.
From the data above, it can be shown that the sample mean of hits/round is 1.928431. Since we already have an assumption about the expected value of the sample mean, we might as well use the population variance of the sample mean (0.0004332353) instead of fussing with a sample variance. (You could also argue that with a sample size of 1,020, who cares?) Then, a 95% confidence interval for the sample mean of hits/round is
1.928431 ± (1.959964)(0.02081431) or (1.888, 1.969)
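For reference, that interval is reproduced by this self-contained sketch (the variance comes from the null distribution, not the data, as discussed above):

```python
# 95% CI for the sample mean of hits/round, using the population variance
# implied by the null distribution (.27/.55/.18) rather than a sample variance.
null_probs = {1: 0.27, 2: 0.55, 3: 0.18}
counts = {1: 276, 2: 541, 3: 203}  # singles, doubles, triples
n = sum(counts.values())
xbar = sum(x * c for x, c in counts.items()) / n
mu = sum(x * p for x, p in null_probs.items())
var = sum(x * x * p for x, p in null_probs.items()) - mu ** 2
z = 1.959964                       # 97.5% standard normal quantile
half = z * (var / n) ** 0.5
print(f"{xbar:.6f} ± {half:.6f}")
```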
Recall that the expected value of the sample mean is 1.91. There is no reason to believe that 1.928 is an "extreme" result, assuming that the true distribution of Ridill multi-hits with DA job trait is .27/.55/.18. This can be illustrated with a histogram of a simulated sampling distribution of hits/round (dotted vertical line denoting 1.928431 from the sample and red vertical lines denoting the bounds of the CI), overlaid with a graph of a normal distribution with mean 1.91 and variance 0.0004332353:

Note that the normal distribution and the simulated sampling distribution agree, as expected.
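A simulation along those lines is easy to rerun yourself. Here's a stdlib-only sketch (histogram plotting omitted; the seed and simulation count are arbitrary):

```python
# Simulate the sampling distribution of the mean: draw many samples of
# 1,020 attack rounds from the null distribution and record each sample mean.
import random

random.seed(0)
n, sims = 1020, 5000
means = []
for _ in range(sims):
    rounds = random.choices([1, 2, 3], weights=[0.27, 0.55, 0.18], k=n)
    means.append(sum(rounds) / n)

grand_mean = sum(means) / sims
print(round(grand_mean, 3))  # should be close to the null mean of 1.91
```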
Conclusion: Using the criterion of "average swings per attack round", the proposed model seems like a good fit to the data. ("Double attack trait processes on all attack rounds equally.")
So how does this apply to the discussion of Ridill on FFXIclopedia? To reiterate, from the first data set (Ridill multihits with WAR DA trait only), the estimated sample value was 1.928. Later on, there is a data set for Ridill multihits in the presence of WAR DA trait, Brutal Earring (assumed DA 5%), Warrior's Cuisses (1%), and Fighter's Calligae (1%), for a total of 17% DA. Does going from 10% DA to 17% DA "nerf" Ridill or not? (Whether or not it's really 17% DA, it's higher than 10%.)
Similar to what was shown earlier, it is easy to calculate an alternative distribution under 17% DA (null being 10% DA), assuming DA affects all x-hits equally:
single: .3(1-.17) = .249
double: .5 + .17(.3 + .2) = .585
triple: .2(1-.17) = .166
The sample proportions from the data are
single: 257/1022 = 0.2514677
double: 611/1022 = 0.5978474
triple: 154/1022 = 0.1506849
The sample mean of hits/round for Ridill is 1.899 given 17% DA, which is less than 1.928 given 10% DA.
I recall on BG someone drew the erroneous conclusion that additional DA (from equipment) has the effect of "nerfing" Ridill without accounting for random variability! But before evaluating this assertion, I want to finish up discussing whether the alternative hypothesis is a good fit to the data.
Is 1.899 an "extreme" result given the "alternative" hypothesis just specified? Under the alternative, the expected value of the sample mean of hits/round is 1.917, and the variance is 0.0003993258. We can then repeat the exercise of generating a graph, this time of a normal distribution with mean 1.917 and variance 0.0003993258, along with a simulated sampling distribution of the mean:
As you can see, 1.899 is not an extreme result under the above distribution. Furthermore, because the expected value of hits/round is 1.917 and the sampling distribution is approximately normal (hence symmetric), if you repeated this experiment many, many times, about half of the observed sample means would fall below 1.917 and about half above it.
But this wasn't the null distribution, or the point of the comparison. Even under the null distribution (first graph), 1.899 is not an extreme result. This shows that for sample sizes around 1,000 (1,000 is really large for any typical hypothesis testing that "really matters"), the effect of DA, if it really exists, is obscured by random error, at least under the assumptions I'm subscribing to.
If the Japanese sources are really correct, then there is no point in doing statistics. But if they are not correct, statistics probably won't help to reveal what seems to be a very slight effect from a change in DA (without resorting to enormous sample sizes). Assuming that comparing the average number of hits/round is valid, going from 10% DA to 17% DA is, in the long run, a 0.37% increase in hits/round.
Conclusion: using the "number of hits/round" criterion, the evidence doesn't show that a DA increase has a "statistically significant" effect, for worse or for better. (Here, I wanted to find evidence against the null of "no change from 10% DA to ~17% DA.")
(If you used the method of obtaining simultaneous 95% confidence intervals instead, you would get (0.22043, 0.28528) for singles, (0.56068, 0.63392) for doubles, and (0.12585, 0.17942) for triples, each of which covers its corresponding parameter under the 17% DA case. Incidentally, they don't cover the parameters under the 10% DA case. In fact, a chi-square goodness-of-fit test would "reject" at the 5% level the null hypothesis that the data are a random sample from the case where DA is 10%. Such are the perils of choosing appropriate statistics for inference.
Since the null hypothesis model is a not-so-good fit to the data, maybe you would favor the idea that DA improves the output of Ridill, however negligible.)
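That chi-square figure can be checked with a quick sketch (null proportions from the 10% DA case; the 5% critical value for 2 degrees of freedom is about 5.991):

```python
# Pearson chi-square goodness-of-fit test of the 17% DA data set against
# the 10% DA null distribution (.27/.55/.18).
observed = [257, 611, 154]          # singles, doubles, triples
null_probs = [0.27, 0.55, 0.18]
n = sum(observed)
expected = [p * n for p in null_probs]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 2))  # exceeds the 5.991 critical value, so reject at 5%
```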
At this point, you might be wondering what's the point of this post then, and I'm wondering that myself, too. The point is that when taking a random sample of data, remember the "random" part. An effect that you happen to observe in a one-shot sample could easily be ascribed to sampling error, and a goal of statistical inference is to rule out random variability as a possible explanation.
Finally, would random error explain what appears to be an increase in triple attacks in the presence of DA from equipment? Sure. (I already said it's possible earlier, but here is yet another illustrative example.) Consider the following data set (source: QCDN):
War/Drg + Askar Korazin & Brutal Earring:
Triples: 18.37%
Doubles: 59.77%
Singles: 21.86%
Total: 430 Rounds. 845 Swings. (1.97 Swings/Round)
I would have to assume that out of 430 rounds, 79 triples, 257 doubles, and 94 singles occurred. If DA processes on all attack rounds equally (17% here), then the hypothesized proportions of single/double/triple attacks are .249/.585/.166, respectively. Note that the sample size is only 430. A 95% confidence interval for the number of swings/round (this time using the sample variance) is
1.965116 ± (1.959964)(0.03054195) or (1.905, 2.025)
This CI happens to cover what we assume is the true expected value (1.917). 1.965 swings/round is not so "extreme" a result if our assumptions are indeed true.
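For completeness, here's the same sketch applied to the QCDN data, with the variance estimated from the sample itself (counts back-calculated from the percentages as above):

```python
# 95% CI for swings/round from the QCDN data set, using the sample variance
# (computed with the population formula, i.e. dividing by n).
counts = {1: 94, 2: 257, 3: 79}    # singles, doubles, triples out of 430 rounds
n = sum(counts.values())
xbar = sum(x * c for x, c in counts.items()) / n
ex2 = sum(x * x * c for x, c in counts.items()) / n
var_mean = (ex2 - xbar ** 2) / n   # variance of the sample mean
z = 1.959964
half = z * var_mean ** 0.5
print(f"{xbar:.6f} ± {half:.6f}")
```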