Wednesday, October 29, 2008

Double attack and weapon skills, part 2

So many "known" things about random mechanics in FFXI seem poorly substantiated due to a lack of data, bad methodology when data are collected, and poor or non-existent analysis and interpretation after the data collection. Then again, it's not as though you really need to know, say, how many hits per attack round you can expect from a Kraken Club. Even if you have one, such considerations are beside the point.

That said, it's almost delightful to see some real data (not some useless parse), and even better when there are some easily tested hypotheses that follow from the purpose of the data collection. This thread on double attack during WS generated some interesting speculation, based on the data gathered, about how many times double attack can proc, but ventured no further. No one really proposed and tested a model of how DA interacts with weapon skills; the closest was a proposal that Penta Thrust may receive up to 3 DA "checks" per WS.

This proposal followed from data collection on TP return (a measure of number of hits in a WS) for Penta Thrust, which is summarized as follows:

10% DA rate (warrior subjob)
95% hit rate (lv 73 dragoon vs. lv 47-54 diatryma)

196 total WS

3 hits: 3 (.015)
4 hits: 42 (.214)
5 hits: 120 (.612)
6 hits: 30 (.153)
7 hits: 1 (.005)
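The proportions in parentheses, and the average number of hits per WS, can be recomputed directly from the counts. A minimal R sketch (the variable names are mine, not from the thread):

```r
# Penta Thrust TP-return data from the thread (196 weapon skills)
hits   = 3:7
counts = c(3, 42, 120, 30, 1)

n = sum(counts)
props = counts / n
round(props, 3)                    # sample proportions listed above
mean_hits = sum(hits * counts) / n
mean_hits                          # average number of hits per WS
```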

However, I am not interested in seeing whether a "3 DA check" model is a good fit to the data, since it is "known" that double attack cannot proc more than twice on a WS. (I hope this is a correct assumption. Besides, it seems unlikely that the people who love to jack off to WS damage, and make their obnoxious asses known on popular FFXI forums, would keep quiet about an 8-hit Penta Thrust. Sometimes the persistent absence of evidence is strong evidence--NOT PROOF--of absence.) Rather, building on my previous post, I want to clarify exactly how double attack procs at most twice.

As a reminder, I proposed the following models for how DA might work with WS: (1) double attack can proc twice on specific hits of the WS (thought to be the first two hits per FFXIclopedia), and (2) double attack may proc a maximum of two times on a WS (not restricted to specific hits). Is it even possible to tell the difference between these two models for Penta Thrust, given only 10% DA rate?

Fortunately, the probability distributions under each model are fairly easy to calculate (the calculations for Penta Thrust are similar to those for the 3-hit WS in my last post) and are summarized in the following graph:



The difference between the two is fairly stark, so it wouldn't take that much data to support one over the other, assuming either one is true. In particular, the difference between the two models is most pronounced for the 6-hit and 7-hit cases. A sample proportion of .153 for 6-hits is very unlikely under the "2 DA maximum" model, where the theoretical proportion is .262. The "DA 2 hits only" model seems a decent fit, so I'll run with that.
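To put a number on "very unlikely," an exact binomial test of the 30 six-hit results out of 196 can be run against each model's theoretical 6-hit probability. A minimal sketch, assuming (as the .262 figure implies) that each DA check yields a landed extra hit with probability p1*p2 and that the two-proc cap applies to those extra hits:

```r
# Inputs from the post: 95% hit rate, 10% DA (WAR subjob), 5 normal hits
p1 = 0.95; p2 = 0.10; y = 5
q = p1 * p2   # assumed chance that a DA check adds a landed hit

# P(6 hits) = P(5 normal hits land) P(1 extra) + P(4 land) P(2 extra)
p6_2x  = dbinom(5, y, p1) * dbinom(1, 2, q) + dbinom(4, y, p1) * dbinom(2, 2, q)
p6_max = dbinom(5, y, p1) * dbinom(1, y, q) +
         dbinom(4, y, p1) * pbinom(1, y, q, lower.tail = FALSE)
round(c(two_hits_only = p6_2x, two_da_max = p6_max), 3)

# Exact two-sided binomial tests of 30 six-hit WS out of 196
x6 = 30; n = 196
binom.test(x6, n, p6_2x)$p.value
binom.test(x6, n, p6_max)$p.value
```

A small p-value under "2 DA maximum" and an unremarkable one under "DA 2 hits only" is what the eyeball comparison above suggests; the test only quantifies it for this one cell.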

The FFXIclopedia article on DA was changed February 10 of this year to state that DA can activate on the first two hits only (instead of being able to proc twice at most, on any of the hits). Aside from the fact that one cannot distinguish between the 4 ways DA can proc on two consecutive hits in Penta Thrust (saying it procs on the first two hits is nothing more than a guess if you don't know how it's programmed), I wonder whether that change was motivated by sample data or was just a shot in the dark. At least I've found some evidence for it.

If you are interested in playing around with the probabilities of the number of hits for your favorite multi-hit weapon skill, the following is some R code I wrote to generate them. You can change p1 (hit rate), p2 (double attack rate), and y (number of normal hits in the WS) to suit your particular situation. Some slight modification would have to be made to isolate the probability of the first hit occurring for the purposes of calculating average WS damage (where fTP isn't 1.0).


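# p1 - hit rate
# p2 - double attack rate
# y  - number of normal hits in the weapon skill (the code assumes y >= 2)

p1 = .95
p2 = .15
y = 2

# double attack can process on only two hits:
# landed normal hits ~ Binomial(y, p1), landed DA hits ~ Binomial(2, p1*p2),
# so P(i total hits) is a sum over the possible numbers of landed normal hits
# (note that ':' binds tighter than '-', so i-a:b means i-(a:b))

p_2x = rep(0,(y+3))   # one entry for each total of 0, 1, ..., y+2 hits
for (i in 0:(y+2)) {
  p_2x[i+1] = sum(dbinom(max(i-2,0):min(i,y),y,p1)*dbinom(i-max(i-2,0):min(i,y),2,p1*p2))
}

# double attack may process a maximum of two times:
# landed DA hits ~ Binomial(y, p1*p2), capped at 2

p_max = rep(0,(y+3))
for (i in 0:(y+2)) {
  if (i < 2) {
    # fewer than 2 total hits: the cap cannot come into play
    p_max[i+1] = sum(dbinom(max(i-2,0):min(i,y),y,p1)*dbinom(i-max(i-2,0):min(i,y),y,p1*p2))
    next
  }

  # two DA hits (2nd DA success within y checks) plus i-2 landed normal hits
  p_max[i+1] = dbinom(i-2,y,p1)*sum(dnbinom(0:(y-2),2,p1*p2))

  # one DA hit plus i-1 normal hits, or no DA hit plus i normal hits
  if (i != (y+2)) {
    p_max[i+1] = p_max[i+1] + sum(dbinom((i-1):i,y,p1)*dbinom(i-(i-1):i,y,p1*p2))
  }
}

# probability mass functions

round(p_2x,10)
round(p_max,10)

# expected number of hits

hit = seq(0,(y+2))
exp_hit_2x = sum(hit*p_2x)
exp_hit_max = sum(hit*p_max)

exp_hit_2x
exp_hit_max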


Some checks: for a 2-hit WS, the two models are indistinguishable. As DA tends to 0%, the two models become indistinguishable in the limit. (The negative binomial distribution is degenerate when p2 = 0.) When hit rate is 100%, the probability of getting fewer hits than the number of normal hits is zero.
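These checks can also be run mechanically. Below is a minimal sketch that restates the same two models as convolutions of two independent counts; the helper names `convolve_pmf`, `pmf_2x`, and `pmf_max` are hypothetical, not part of the code above. It uses the binomial upper tail for the capped DA count instead of the negative binomial, so p2 = 0 can be plugged in directly rather than only approached as a limit.

```r
# Same assumptions as the code above: landed normal hits ~ Binomial(y, p1),
# and each DA check adds a landed extra hit with probability p1*p2.

# pmf of the sum of two independent counts (vectors indexed from 0 hits)
convolve_pmf = function(a, b) {
  out = rep(0, length(a) + length(b) - 1)
  for (i in seq_along(a)) for (j in seq_along(b))
    out[i + j - 1] = out[i + j - 1] + a[i] * b[j]
  out
}

# "DA two hits only": exactly two DA checks
pmf_2x = function(p1, p2, y) {
  convolve_pmf(dbinom(0:y, y, p1), dbinom(0:2, 2, p1 * p2))
}

# "2 DA maximum": a DA check on every hit, extra hits capped at 2
pmf_max = function(p1, p2, y) {
  q = p1 * p2
  extra = c(dbinom(0:1, y, q), pbinom(1, y, q, lower.tail = FALSE))
  convolve_pmf(dbinom(0:y, y, p1), extra)
}

# Check 1: for a 2-hit WS the two models coincide
all.equal(pmf_2x(0.95, 0.15, 2), pmf_max(0.95, 0.15, 2))
# Check 2: at 0% DA the two models coincide
all.equal(pmf_2x(0.95, 0, 5), pmf_max(0.95, 0, 5))
# Check 3: at 100% hit rate, 0 to 4 hits on a 5-hit WS have probability 0
pmf_2x(1, 0.15, 5)[1:5]
pmf_max(1, 0.15, 5)[1:5]
```

The seventh entry of `pmf_max(0.95, 0.10, 5)` is the 6-hit probability for Penta Thrust at 10% DA, which is where the .262 figure quoted above comes from.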

Friday, October 24, 2008

Double attack and weapon skills

Previously, I estimated the average damage of both Raging Rush and King's Justice for my character on lv 82 greater colibri (link), but there was one major unmentioned assumption I made concerning how the double attack trait processes on weapon skills.

Suppose that "conventional wisdom" assumes that double attack can proc twice, at most, on a WS (I haven't seen any evidence to prove that DA can proc more than twice), but under this assumption there are two possibilities: (1) double attack must proc on only two hits of the WS (2 or more normal hits in the WS; this is usually thought of as occurring on the first two hits of the WS); and (2) double attack may proc a maximum of two times on a WS. Which one is it?

There is a subtle difference between the two "hypotheses." If DA can proc on any hit in a multi-hit weapon skill, there are more opportunities for DA to proc twice (when the number of normal hits in the WS is greater than 2) than there would be if DA is limited to proc on specific hits in the WS. Intuitively, if the number of normal hits in the WS is greater than 2, there will be, on average, more WS hits under the second hypothesis even in the presence of a cap to exclude 3+ DA procs.

If you aren't convinced, the following probability exercise will help. Suppose I'm looking at a 3-hit WS (examples: Raging Rush, King's Justice, Blade: Jin, Tachi: Rana) and I want to know the probability of seeing n hits (n = 1, 2, ..., 5) in one WS, given my DA level. Assume 95% hit rate.

Since DA procs are independent of normal hits (in the sense that normal hits must occur in a WS even if they miss), it's simple to calculate these probabilities when DA must proc on only two hits in the WS. Here, the second DA proc is assumed to be independent of the first DA proc, and vice versa. For the other case, the DA procs are dependent, so the calculations are less simple, but they can be done.
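One way to sanity-check the calculations is a quick Monte Carlo simulation of a 3-hit WS under both cases. This is a sketch under the same convention the expected values in this post imply: each DA check produces a landed extra hit with probability p1*p2, and the two-proc cap applies to those extra hits.

```r
set.seed(1)
n_ws = 1e5; p1 = 0.95; p2 = 0.10; y = 3
q = p1 * p2   # assumed: chance that a DA check adds a landed hit

normal    = rbinom(n_ws, y, p1)              # landed normal hits per WS
extra_2x  = rbinom(n_ws, 2, q)               # "DA two hits only": two DA checks
extra_max = pmin(rbinom(n_ws, y, q), 2)      # "2 DA maximum": check every hit, cap at 2

hits_2x  = normal + extra_2x
hits_max = normal + extra_max

table(hits_2x) / n_ws                        # simulated distribution, DA two hits only
table(hits_max) / n_ws                       # simulated distribution, 2 DA maximum
c(mean(hits_2x), mean(hits_max))             # compare with the expected values in this post
```

The dependence in the second case shows up only through `pmin`: once two extra hits have landed, further DA checks contribute nothing.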

When the DA rate is 10%, the probability distributions for both cases are illustrated as follows:



People are more likely to notice 5-hit results than other results, but in either case the probability of observing a 5-hit is pretty low. However, under "2 DA maximum" there are more opportunities for DA to proc (even if there is a 2-DA cap). The expected number of hits is 3.04 for "DA two hits only," and 3.13 for "2 DA maximum."

If you increase your DA rate, the expected number of hits for a WS should always increase (you will see relatively more 4- and 5-hit WSes), and this is the case going from 10% DA to 19% DA:



The expected number of hits is 3.211 for "DA two hits only," and 3.39 for "2 DA maximum." Given 19% DA, it is now fairly easy to distinguish between the two hypotheses, and collecting enough sample data on n-hits of a 3-hit WS should provide evidence in favor of one or the other.

If you can manage to push your DA rate even higher (through merits or elsewhere; I myself have 2 DA merits), the difference between the two hypotheses becomes more stark. Consider when DA is 22%:



The expected number of hits is 3.268 for "DA two hits only," and 3.47 for "2 DA maximum."
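The six expected values quoted for 10%, 19%, and 22% DA follow from short closed forms: y*p1 + 2*p1*p2 for "DA two hits only," and y*p1 plus the mean of a Binomial(y, p1*p2) count capped at 2 for "2 DA maximum." A minimal sketch (the function name `exp_hits` is mine), assuming the same 95% hit rate and p1*p2 convention:

```r
p1 = 0.95   # hit rate assumed in the post
y  = 3      # normal hits (Raging Rush, King's Justice, etc.)
da = c(0.10, 0.19, 0.22)

exp_hits = function(p2) {
  q = p1 * p2   # chance that a DA check adds a landed hit
  k = 0:y
  c(two_hits_only = y * p1 + 2 * q,
    two_da_max    = y * p1 + sum(pmin(k, 2) * dbinom(k, y, q)))
}
res = sapply(da, exp_hits)
colnames(res) = paste0(100 * da, "% DA")
round(res, 3)
```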

Which one do I believe to be the case? I don't have any stake in believing one over the other, but it was easier for me to assume that DA procs on two hits only (there are three ways this can happen for a 3-hit WS, but it doesn't matter in calculating the probabilities).

Monday, October 20, 2008

The relationship between DEX and critical hit rate

My previous post somehow got over 40 "click-throughs" on TTTO, perhaps because its authoritative title, "King's Justice versus Raging Rush," promised a decisive comparison yet its conclusions were slightly less touchy-feely than eyeballing. (I was actually looking for some feedback, but I guess it wasn't meant to be.) In that vein, I also offer this bait-and-switch regarding the relationship between DEX and critical hit rate.

I would not care about such things if not for the prospect of obtaining Byakko's Haidate one day; with its 15 DEX, surely there must be some obvious increase in critical hit rate, right?

In fact, for some reason or another 15 DEX was once "thought" always to increase critical hit rate by a paltry 1-2% despite the reality of sampling error. (I've always wondered how people arrived at such conclusions by sampling. Even if you collected data through a parse, if you had a sample of 2500 hits, the margin of error associated with your crit rate estimate would be as much as 2%.) This conventional "wisdom" was then debunked around March 2007 with a discussion of the DEX/crit relation motivated by the observation that lots of DEX sent crit rates soaring up to some maximum. Coincidentally or not, around that time there was a parallel discussion on Allakhazam about the same topic.
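The "as much as 2%" figure is the usual normal-approximation margin of error at its worst case, p = 0.5; at crit rates near 10% it is smaller. A quick check in R (the function name `moe` is mine):

```r
# 95% margin of error (normal approximation) for a proportion from n trials
moe = function(p, n) qnorm(0.975) * sqrt(p * (1 - p) / n)

moe(0.5, 2500)    # worst case (p = 0.5): about 0.0196
moe(0.10, 2500)   # near a 10% crit rate: about 0.0118
```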

Sure, these people didn't bother to control for mob AGI. Now, it appears evident that your DEX relative to your target's AGI is a factor in the critical hit rate determination. But for the experiments discussed in those threads, AGI wasn't controlled. The AGI of Robber Crabs, a test subject in the Alla thread, apparently is either 39 or 42, and the AGI of Tavnazian Sheep and Miner Bees, targets in the BG thread, probably varies too. But despite the lack of control it was obvious that piling on enough DEX will increase your critical hit rate markedly at some point.

Unfortunately, this conclusion is couched in the lazy terminology of "tiers." Some examples are

(1) "Stack enough DEX to break some critical rate tier, where each point of DEX you add within that tier has a larger effect."

(2) "Any large amounts of DEX before a critical rate tier will not have a major effect on critical hit rate."

Implicit in such statements is that if you don't break a "tier," it isn't worth trying to pile on DEX. In turn, considering that "tiers" in crafting refer to discontinuous jumps in HQ rate, it isn't surprising that a "tier" in terms of crit rate is also thought of as a sudden, discontinuous jump at some critical level of DEX. But the evidence provided in the above threads doesn't really point to such a discontinuous phenomenon.

First, consider the results from the BG thread. Amazingly, the point estimates were given as approximations based on sample sizes of about 300 (really, too lazy to record the exact sample sizes?), but that isn't a big deal. These point estimates are themselves random variables with corresponding distributions, so it is helpful to visualize confidence intervals for the true crit rates at given levels of DEX, and I created a graph to help with that:



The 95% confidence intervals are represented by black bars with the point estimates centered within the CIs. I also marked what are thought to be the minimum and maximum crit rates for DEX only with gray lines, 9% minimum and 24% maximum with 4/4 critical hit rate merits (who doesn't have those?). Critical hit rate bonuses from equipment are not subject to the caps.
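Bars with the point estimate in the centre are consistent with Wald (normal-approximation) intervals; that method is my assumption here, not something stated in the thread. A sketch of how wide such bars are at n = 300, using the 9% floor and 24% cap as illustrative estimates (these are not the thread's point estimates):

```r
# Wald interval: the point estimate sits at the centre of the bar
wald_ci = function(p_hat, n, level = 0.95) {
  z = qnorm(1 - (1 - level) / 2)
  hw = z * sqrt(p_hat * (1 - p_hat) / n)
  cbind(estimate = p_hat, lower = p_hat - hw, upper = p_hat + hw)
}

# Illustration only: estimates placed at the 9% floor and 24% cap, n = 300 each
ci = wald_ci(c(0.09, 0.24), 300)
round(ci, 3)
```

With only about 300 swings per DEX level, each bar spans roughly 6 to 10 percentage points, which is why adjacent DEX levels overlap so much.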

The data corresponding to "low" and "high" DEX on this graph conform to the minimum and maximum crit rates. (At least there is no reason to believe otherwise.) At some point, though, crit rate increases with DEX in a seemingly linear fashion, which could awkwardly be described as a "tier," I suppose. This evokes a parallel with overall hit rate versus accuracy, with a minimum of 20% and a maximum of 95% and hit rate thought to vary linearly with accuracy in between. So if crit rate does increase (linearly) within a certain range of DEX, it is worth adding DEX within this interval, all other things being equal. Sure, I guess you are within a "tier" when this happens, but where's the evidence for a discontinuous jump to reach this "tier"?

Furthermore, there is hardly any evidence for the plural tiers.

I've also graphed the first set of data from Allakhazam (first post), which is similar to the BG one:



Interestingly, here the crit rate estimates increase over a 15-DEX range, even more evidence against the idea of a discontinuous jump.

Finally, in the Alla discussion, data from the Robber Crabs were pooled. Pooled data generally pose statistical hazards (for one, we're assuming identical experimental conditions for each person involved, but you figure there's gotta be some idiot to fuck it up or some other factor... like the fact that the AGI of Robber Crabs varies!), but let's just run with this. I created a graph of 95% CIs for the pooled data as follows:



Even while violating statistical assumptions (independence), it is obvious there is no discontinuous jump in crit rate that cannot be attributed to sampling error. And even with the fundamental shadiness of this experiment (not controlling AGI), I had the cheerful temerity to do least-squares linear regression (itself inappropriate for a variety of reasons) on the data points for which over 1000 samples were collected, in the DEX region where crit rate seems to increase linearly. For me it's enough to know that there is an obvious increase in crit rate; it doesn't matter what the exact increase will be for 1 additional DEX.

Also, the region is fairly narrow (10-15 DEX) for Robber Crabs, which would explain why people observe a sudden jump when adding DEX, as there is the view that adding DEX for the purposes of increasing crit rate should be an all-or-nothing thing (never mind the reality that the tradeoffs you make to stack DEX make such an attempt impractical).

It isn't necessarily true that the results from robber crabs can be generalized to other mobs. But if this phenomenon is real and can be generalized, then you may not have to go for an all-or-nothing attempt to increase crit rates with DEX, either in an auto-attack or WS phase, as long as your DEX is within the region where DEX is considered helpful.

For robber crabs, this region appears to be between 77 and 92 DEX. The higher level robber crabs in Kuftal Tunnel have 42 AGI, which jibes with the idea that your crit rate is capped when your DEX is 50 higher than your target's AGI.

The "transition region" clearly doesn't start when your DEX is equal to your target's AGI, but where should it start? The statement in the previous paragraph implies that it could start at about 35 DEX above your target's AGI, but this is a troublesome statement to make given that the crit rates consistently appear to be above 9% (the minimum) before 77 DEX. One possible explanation is that crit rate could be a minimum when (DEX - AGI) is less than or equal to 0, and rises very slowly from 0 to around 35. This could be why it's difficult to see any improvement in crit rates from adding DEX on your usual merit mobs, which all have AGI above 67.

I admit I didn't break any new ground, but I thought it might be fun to show my take on this.

Thursday, October 16, 2008

King's Justice versus Raging Rush

How does King's Justice stack up to Raging Rush? I decided to waste my time providing an answer to this question by creating some frivolous graphs to compare the average WS damage of Raging Rush with that of King's Justice on everyone's favorite canonical merit party fodder, the greater colibri (lv 82).

Given that the current incarnation of the physical damage equation is still a reasonable approximation (a generous assumption), I calculated these averages based on the attributes of my character's WS setup. (And to make more approximations upon approximations, I assumed the pDIF distribution for my cRatio, 1.433, was uniform over [1, 1.719].) Interestingly, FFXIclopedia gives an fTP "bonus" of 0.5 for the first hit of Raging Rush, which contradicts other sources (Gobli among them) and seems incorrect. I used 1.0 because if it were 0.5, Raging Rush would obviously be inferior. (I suppose I should get into some merit party for the first time in months to see if my calculations are way off.)

I plotted average WS damage of R.R. and K.J. versus critical hit rate since I don't know how exactly the TP modifiers affect crit rate for Raging Rush and neither does anyone else:



Suppose that at 100 TP there is no crit rate bonus for Raging Rush. Looking around for the relationship between DEX and crit rate, I place my overall crit rate at 12% on colibri, and behold, Raging Rush and King's Justice are pretty close in average damage. If this is indeed the case in practice, I probably won't bother unlocking King's Justice just for better Mighty Strikes/300 TP weapon skills. Skillchains? No one cares.

Recall that Raging Rush's first-hit damage used to vary with TP (1.00/1.50/2.50 at 100/200/300 TP, but .35 STR modifier as it is now), so you can get a sense of the magnitude of the increase in average R.R. damage since the exalted "2-hander update" just by looking at the graph (starting at 0% critical hit rate and ending at whatever crit rate you think is associated with R.R.).

Of course, mere averages don't give any idea of the distribution of possible WS damage values. I've seen a few comments that King's Justice is more consistent than Raging Rush, and that Raging Rush yields higher "spikes." You certainly don't need to do any frivolous simulation to lend credence to this perception. I'm not even going to say the shapes of these simulated distributions of WS damage for R.R. and K.J. are even accurate (after piling on approximation after approximation, I wouldn't think so), but they do give some idea of their variance. Even though the average WS damage is close, there is slightly less variance in WS damage associated with King's Justice. (The "sample" means for both R.R. and K.J. damage were within single digits of one another.)

Sunday, October 12, 2008

La Vaule seized

It seems at the eleventh hour some of the Japanese population on Fenrir server took the initiative to gain control of La Vaule for this week. Does this mean the "Splitting Heirs" Campaign Op is available on Fenrir? Will Fenrir be swimming in Cuchulain's Mantles and Witch Sashes? Not so fast...


As you can see, San d'Oria lost control of Jugner Forest, of all areas! Way to hold it down! At the moment though Sandy is still up on the beastmen in La Vaule, but their advantage will probably be erased by the end of the day.

How's this for idiotic: some user on FFXIclopedia observed that the existence of Cuchulain's Mantles on Asura, without any nation having control of all its contiguous areas, "disproves" the idea that control of all areas is required for access to "Splitting Heirs" and the like. Have you ever heard of server transfers? I see that the concept of arbitrage is way beyond your ken.

Moving on, I've heard that Windurst has control of all its adjacent areas on Phoenix server, so its "beastman assassination" analogue to "Splitting Heirs" should be available. The Campaign op is called "Plucking Wings," and I'm sure we can find some information about it in a little bit. Check wiki.ffo.jp periodically. Phoenix server already has an auction house listing for Karasutengu Kogake (INT +3, Campaign: refresh effect) and Roundel Earring.

Edit (Monday): Cuchulain's Belt is also a potential reward.

To summarize, here are the treasure pools for "Splitting Heirs" and "Plucking Wings," and an inferred one for Bastok's "Cracking Shells":

Splitting Heirs (La Vaule):
0-1 of Cuchulain's Mantle, Orcish Gauntlets, Witch Sash
1 of Brave Grip, Wise Strap
2 of "miscellaneous" items (gems, Spectacles, Vile Elixir +1, etc)

Plucking Wings (Castle Oztroja):
0-1 of Cuchulain's Belt, Karasutengu Kogake, Roundel Earring
1 of Brave Grip, Wise Strap
2 of "miscellaneous" items (gems, etc)

Cracking Shells (Beadeaux):
0-1 of Airy Buckler, Balestarius, Crapaud Earring
1 of Brave Grip, Wise Strap
2 of "miscellaneous" items

Looking at the dwindling number of items unaccounted for from the June version update, it's a reasonable guess that Crapaud Earring and Airy Buckler come from Bastok's "Cracking Shells." Neither item has an "ex" flag, but both have limited appeal. The shield can be used only by THF/PUP/DNC. Well, the Crapaud Earring is of interest to the vast majority of black mages who don't have Novio Earring. It'll help push my Thunder IV to 1470 without food on Ebony Puddings if I maximize INT in all my slots. If I had Novio, that number would be 1512 before food. (I currently give up 8 INT out of cheapness and desire for a "maximum MP" setup for NW Apollyon.)

The fact that Castle Oztroja can be overtaken gives me hope that Beadeaux can be taken over as well even though the logistics are daunting. I just don't expect any group on Fenrir to be willing and capable of doing so.

Saturday, October 11, 2008

Number of quests

It's not uncommon when visiting FFXIclopedia to see in the "Latest Activity" box some asshole updating a personal checklist of maps or quests. (I myself maintain a list of incomplete quests, but not for public display.) FFXIclopedia does not distinguish between real quests that appear in the quest logs and those that could be considered quests but do not appear in the logs. In case you have some interest in knowing the number of quests available for each region, I've tallied the number of quests that appear in the quest log. (This excludes garbage like "beastman treasure" and chocobo riding.) Please correct me if my totals are wrong.

As of the Sept 2008 version update:
Jeuno: 119
Other: 62
Outlands: 48
Aht Urhgan: 64
Crystal War: 41
San d'Oria: 79
Bastok: 87
Windurst: 89

Total: 589

I will be updating these totals when new version updates come out. Not really useful, but if you ever feel like rebutting some player's claims about doing 500+ quests, you can see if he's (is it ever a she?) BS'ing by asking how many quests he's completed for each region. It doesn't take too long to count accurately, eight at a time, by scrolling with shift + right key.

Thursday, October 2, 2008

Occasionally posts once

It's hard to tell what the consensus is about how frequently Ridill and Kraken Club process x number of hits. Two Japanese sources indicate either directly or indirectly that for Ridill the proportions for one, two, and three hits per attack round are .3, .5, and .2, respectively.

For Kraken Club, however, Studio Gobli gives the distribution of swings per attack round as "5:15:25:25:15:10:3:2". This corresponds to 3.82 expected hits per attack round. Another source (I don't know how "authoritative" it is) specifies 3.2 expected swings per attack round without giving any proportions.
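The 3.82 figure is just the weighted mean of 1 through 8 swings using Gobli's ratios, which happen to sum to 100. A quick check in R:

```r
# Studio Gobli's quoted ratios for 1 to 8 swings per attack round
swings  = 1:8
weights = c(5, 15, 25, 25, 15, 10, 3, 2)

sum(weights)                          # the ratios sum to 100
sum(swings * weights) / sum(weights)  # expected swings per attack round
```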

Out of curiosity, I'd like to know how exactly these claims are justified. Did an SE representative give out this information, so it must be true? If these claims were justified empirically, where are the data?

But why should anyone really care about the distribution of number of hits for multihit weapons? Believe it or not, some freaks have been concerned that double attack traits (from the warrior job trait, equipment, Fighter's Roll, whatever) attenuate the number of triple attacks for Ridill (and number of attacks greater than 2 for K. Club), so it may be helpful to know whether this attenuation results in worse performance of Ridill (and other multi-hit weapons) in the presence of double attack than without DA. I myself am more interested in how to analyze any data collected in support or contradiction of a belief. This is for the sake of making conclusions that are marginally better than hand-waving about "margin of error" without even quantifying it.

Collecting data for Kraken Club from English-language sources appears to be a non-starter, but some data for Ridill is easily found. The talk page for Ridill on FFXIclopedia has some good data sets for the number of x hits (x = 1, 2, 3). This is assuming that FFXI's random number generator is sufficiently random (no reason to believe otherwise).

Apparently, the purpose of this data collection was to find evidence that DA affects Ridill's output. But how would DA affect Ridill's output? There were two claims implied by the inane discussion:

(1) Double attack trait processes on all attack rounds equally. This means that single attacks are "converted" to double attacks and triple attacks are "converted" to double attacks. (DA trait "overrides" the Ridill proc.) DA trait may also process when a double attack occurs, but there is no difference in result. As a result, the proportions of single and triple attacks are reduced by the same percentage.

(If the average number of hits/round is less than 2, the net result is a slight increase in Ridill output. If exactly 2, no change regardless of DA level. If greater than 2, a slight decrease in Ridill output.)
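The parenthetical follows from a one-line calculation. Writing p_1, p_2, p_3 for the base proportions of single, double, and triple attacks, E_0 = p_1 + 2p_2 + 3p_3 for the base mean, and d for the DA rate under claim (1):

```latex
\begin{aligned}
p_1' &= (1-d)\,p_1, \qquad p_3' = (1-d)\,p_3, \qquad p_2' = p_2 + d\,(p_1 + p_3),\\
E_d &= p_1' + 2p_2' + 3p_3' = (1-d)\,E_0 + 2d\,(p_1 + p_2 + p_3) = E_0 + d\,(2 - E_0).
\end{aligned}
```

So the sign of 2 - E_0 decides the direction of the change. For the assumed Ridill distribution (.3/.5/.2), E_0 = 1.9, which gives 1.91 hits/round at 10% DA and 1.917 at 17% DA.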

(2) DA trait "disproportionately" reduces the number of triple attacks compared to single attacks. Ridill nerfed!

The second claim is really a poorly formed and vague hypothesis; there is no suggestion as to how to express this hypothesis in numerical terms. In contrast, the first claim at least provides some basis for statistical inference because there is a specific claim of how DA interacts with Ridill.

Supposing that the multihit distribution of Ridill as stated previously is really true (a working assumption), then we can calculate Ridill's hit distribution in the presence of warrior's double attack job trait (10% DA) under the first claim:

single: .3(1-.1) = .27
double: .5 + .1(.3 + .2) = .55
triple: .2(1-.1) = .18
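The same conversion rule can be written as a small R function so that any DA level can be plugged in. A minimal sketch (the names `ridill` and `with_da` are mine), taking the Japanese-source proportions as a working assumption:

```r
# Ridill's assumed base distribution from the Japanese sources
ridill = c(single = 0.3, double = 0.5, triple = 0.2)

# Claim (1): DA converts single and triple rounds to doubles with probability d
with_da = function(p, d) {
  c(single = p[["single"]] * (1 - d),
    double = p[["double"]] + d * (p[["single"]] + p[["triple"]]),
    triple = p[["triple"]] * (1 - d))
}

with_da(ridill, 0.10)   # WAR job trait only
with_da(ridill, 0.17)   # job trait plus DA equipment
```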

The very first data set on the talk page was collected using a WAR/NIN with no other DA from equipment or other sources. The sample proportions are

single: 276/1020 = 0.2705882
double: 541/1020 = 0.5303922
triple: 203/1020 = 0.1990196

At first blush, there seems to be no need to go through the motions of performing a statistical analysis. (Never mind that I saw the data before proposing a hypothesis...) Even though the usual logic of using some statistical hypothesis test doesn't really hold (not trying to assemble evidence against a "null" hypothesis, but rather trying to find corroborating evidence to support one), I use this example to illustrate a few approaches one might use to analyze the data.

One approach is to generate simultaneous confidence intervals (with some pre-specified confidence level) for the proportions of single, double, and triple attacks.

Formally speaking, these multihit distributions can be modeled using a multinomial distribution with educated guessing about the parameters (the proportions of x-hits). Given the data above, a set of approximate simultaneous CIs, using the approach of Goodman (1965), will give a range of probable values of the true proportions of Ridill's x-hits.

If I wanted to be (at least) 95% confident that all the confidence intervals contained the true proportions, then I obtain this set of CIs for the given data:

single: (0.23864, 0.30510)
double: (0.49292, 0.56753)
triple: (0.17081, 0.23059)
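Goodman's intervals have a closed form, so they are easy to compute by hand even without a built-in option. The sketch below uses the form with A equal to the upper alpha/k quantile of a chi-square distribution with 1 df, which reproduces the intervals above; the function name `goodman_ci` is mine.

```r
# Goodman (1965) simultaneous CIs for multinomial proportions
goodman_ci = function(counts, alpha = 0.05) {
  k = length(counts); N = sum(counts)
  A = qchisq(1 - alpha / k, df = 1)
  centre = A + 2 * counts
  spread = sqrt(A * (A + 4 * counts * (N - counts) / N))
  cbind(lower = (centre - spread) / (2 * (N + A)),
        upper = (centre + spread) / (2 * (N + A)))
}

ridill_10 = c(single = 276, double = 541, triple = 203)   # WAR DA trait only
ci = goodman_ci(ridill_10)
round(ci, 5)

# Do all three intervals cover the hypothesized .27/.55/.18?
null_p = c(0.27, 0.55, 0.18)
all(ci[, "lower"] <= null_p & null_p <= ci[, "upper"])
```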


I think a family of (simultaneous) CIs is more useful than a CI for an individual proportion if only to get some sense of the "big picture" and limit your attention to "plausible" sets of multiple proportions. With the right techniques, your CIs won't be much wider than the individual CIs you would calculate the usual way. The downside is that there aren't any statistical packages that have built-in options to generate simultaneous intervals.

Conclusion: The above CIs happen to cover the null parameters, so the proposed model seems like a good fit to the data, using the logic of a goodness-of-fit test ("accepting" a null hypothesis in the absence of contradictory data). ("Double attack trait processes on all attack rounds equally.")

Instead of dealing with confidence intervals for multiple proportions, you could focus your attention on a confidence interval for the expected number of hits per attack round, estimated by the sample mean. The sample mean is a random variable, just as the numbers of single/double/triple attacks are random variables (all of which depend on the sample size, hence the use of the sample mean).

Indeed, the mean number of hits per attack round is a linear function of the numbers of single/double/triple attacks, and we can use this observation to compute the variance of the sample mean, using the fact that the sum of the individual proportions must equal 1 (for any multinomial distribution).
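One way to write out that step, using the standard multinomial moments Var(N_j) = n p_j (1 - p_j) and Cov(N_i, N_j) = -n p_i p_j, with N_1, N_2, N_3 the counts of single, double, and triple attacks in n rounds:

```latex
\begin{aligned}
\bar{X} &= \frac{N_1 + 2N_2 + 3N_3}{n} = 1 + \frac{N_2 + 2N_3}{n}
  \qquad (\text{since } N_1 = n - N_2 - N_3),\\
\operatorname{Var}(\bar{X}) &= \frac{\operatorname{Var}(N_2) + 4\operatorname{Var}(N_3) + 4\operatorname{Cov}(N_2, N_3)}{n^2}
  = \frac{p_2(1-p_2) + 4p_3(1-p_3) - 4p_2p_3}{n}.
\end{aligned}
```

With p_2 = .55, p_3 = .18, and n = 1020, this is (.2475 + .5904 - .396)/1020 = .4419/1020.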


Thus, for the "null" hypothesis we are currently considering, the expected value of the sample mean of hits/round is 1.91, and the variance of the sample mean is 0.0004332353. By the central limit theorem, the sampling distribution of the sample mean is approximately normal for sufficiently large n. We can use this fact to obtain confidence intervals for the true expected number of hits/round.
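Both numbers come straight from the hypothesized proportions. A minimal R sketch (the function name `mean_var` is mine) that also covers the 17% DA alternative used later:

```r
# Mean of hits/round and variance of the sample mean under an assumed distribution
mean_var = function(p, n) {
  x = 1:3                       # hits per round: single, double, triple
  mu = sum(x * p)
  c(mean = mu, var_of_mean = (sum(x^2 * p) - mu^2) / n)
}

mean_var(c(0.27, 0.55, 0.18), 1020)     # null: 10% DA, 1,020 rounds
mean_var(c(0.249, 0.585, 0.166), 1022)  # alternative: 17% DA, 1,022 rounds
```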

Personally, I don't think I would bother employing this method. It might be easier to understand if only for the sake of debunking bullshit assertions that arise from point estimates of the expected value for a given sample size. (I'll point out a few of these assertions after I use this method for the previously considered data.) But you lose a sense of the "big picture" when you sacrifice detail for concision.

From the data above, the sample mean of hits/round is 1.928431. Since we already have an assumption about the expected value of the sample mean, we might as well use the population variance of the sample mean (0.0004332353) instead of fussing with a sample variance. (You could also argue that with a sample size of 1,020, who cares?) Then, a 95% confidence interval for the mean number of hits/round is

1.928431 ± (1.959964)(0.02081431) or (1.888, 1.969)
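The interval can be reproduced from the raw counts. A short sketch using the variance from the .27/.55/.18 model, as in the text:

```r
counts = c(276, 541, 203)            # single, double, triple (WAR DA trait only)
n = sum(counts)
xbar = sum(1:3 * counts) / n         # sample mean of hits/round
se = sqrt(0.0004332353)              # standard error under the assumed model
ci_mean = xbar + c(-1, 1) * qnorm(0.975) * se
round(c(xbar = xbar), 6)
round(ci_mean, 3)
```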

Recall that the expected value of the sample mean is 1.91. There is no reason to believe that 1.928 is an "extreme" result, assuming that the true distribution of Ridill multi-hits with DA job trait is .27/.55/.18. This can be illustrated with a histogram of a simulated sampling distribution of hits/round (dotted vertical line denoting 1.928431 from the sample and red vertical lines denoting the bounds of the CI), overlaid with a graph of a normal distribution with mean 1.91 and variance 0.0004332353:


Note that the normal distribution and the simulated sampling distribution agree, as expected.
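A simulated sampling distribution like the one in the graph can be generated with multinomial draws. A minimal sketch, assuming the .27/.55/.18 model and 1,020 rounds per simulated experiment (the seed and the number of replications are arbitrary choices of mine):

```r
set.seed(2008)
p = c(0.27, 0.55, 0.18)     # assumed model: WAR DA trait only
n = 1020                    # attack rounds per simulated experiment
reps = 10000                # number of simulated experiments

counts = rmultinom(reps, size = n, prob = p)   # 3 x reps matrix of counts
sim_means = colSums(1:3 * counts) / n          # hits/round for each experiment

c(mean(sim_means), var(sim_means))             # compare with 1.91 and 0.0004332353
mean(sim_means <= 1.928431)                    # share at or below the observed mean
```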

Conclusion: Using the criterion of "average swings per attack round", the proposed model seems like a good fit to the data. ("Double attack trait processes on all attack rounds equally.")

So how does this apply to the discussion of Ridill on FFXIclopedia? To reiterate, from the first data set (Ridill multihits with WAR DA trait only), the estimated sample value was 1.928. Later on, there is a data set for Ridill multihits in the presence of WAR DA trait, Brutal Earring (assumed DA 5%), Warrior's Cuisses (1%), and Fighter's Calligae (1%), for a total of 17% DA. Does DA "nerf" Ridill or not going from 10% DA to 17% DA? (Whether or not it's really 17% DA, it's higher than 10%.)

Similar to what was shown earlier, it is easy to calculate an alternative distribution under 17% DA (null being 10% DA), assuming DA affects all x-hits equally:

single: .3(1-.17) = .249
double: .5 + .17(.3 + .2) = .585
triple: .2(1-.17) = .166
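The same adjustment can be written as a small function of the DA rate. This is a sketch under the post's assumptions: the base Ridill distribution is .3/.5/.2, and, as the formulas above imply, a DA proc turns a round that would have been a single or a triple into a double. The function name `ridill_with_da` is hypothetical.

```python
# Assumed base Ridill multi-hit rates with no DA: singles/doubles/triples
BASE = {1: 0.3, 2: 0.5, 3: 0.2}

def ridill_with_da(da, base=BASE):
    """Hypothetical helper: x-hit distribution when DA affects all rounds equally.

    Following the formulas above, a DA proc on a would-be single or triple
    makes that round a double; doubles stay doubles either way.
    """
    single = base[1] * (1 - da)
    double = base[2] + da * (base[1] + base[3])
    triple = base[3] * (1 - da)
    return {1: single, 2: double, 3: triple}

for da in (0.10, 0.17):
    dist = ridill_with_da(da)
    mean = sum(k * p for k, p in dist.items())
    print(da, {k: round(p, 3) for k, p in dist.items()}, round(mean, 3))
```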

The sample proportions from the data are

single: 257/1022 = 0.2514677
double: 611/1022 = 0.5978474
triple: 154/1022 = 0.1506849

The sample mean of hits/round for Ridill is 1.899 given 17% DA, which is less than 1.928 given 10% DA.
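The proportions and the mean both come straight from the counts; a minimal sketch (the counts are the ones listed above, the names are mine):

```python
counts = {1: 257, 2: 611, 3: 154}  # singles/doubles/triples, 17% DA data set
n = sum(counts.values())           # total rounds (1022)

proportions = {k: c / n for k, c in counts.items()}
sample_mean = sum(k * c for k, c in counts.items()) / n  # total hits / rounds

print(n, {k: round(p, 7) for k, p in proportions.items()}, round(sample_mean, 3))
```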

I recall that someone on BG concluded, without accounting for random variability, that additional DA (from equipment) has the effect of "nerfing" Ridill! But before evaluating this assertion, I want to finish discussing whether the alternative hypothesis is a good fit to the data.

Is 1.899 an "extreme" result given the "alternative" hypothesis just specified? Under the alternative, the expected value of the sample mean of hits/round is 1.917, and the variance is 0.0003993258. We can then repeat the exercise of generating a graph, this time of a normal distribution with mean 1.917 and variance 0.0003993258, along with a simulated sampling distribution of the mean:


As you can see, 1.899 is not an extreme result under the above distribution. Furthermore, because the expected value of hits/round is 1.917 and the sampling distribution of the mean is approximately normal (hence symmetric), if you repeat this experiment many, many times, about half of the observed sample means of hits/round should fall below 1.917, and about half above it.

But this wasn't the null distribution, or the point of the comparison. Even under the null distribution (first graph), 1.899 is not an extreme result. This shows that for sample sizes around 1,000 (1,000 is really large for any typical hypothesis testing that "really matters"), the effect of DA, if it really exists, is obscured by random error, at least under the assumptions I'm subscribing to.
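One way to quantify "not extreme" is a z-score for 1.899 under each model. The sketch below assumes independent rounds and uses n = 1022 for both models (the first graph used n = 1,020, which makes no practical difference); the helper name `moments` is my own. In both cases |z| comes out well under 1.96.

```python
import math

def moments(dist):
    """Mean and variance of hits per round for a {hits: probability} model."""
    mu = sum(k * p for k, p in dist.items())
    var = sum(k * k * p for k, p in dist.items()) - mu ** 2
    return mu, var

null = {1: 0.27, 2: 0.55, 3: 0.18}    # 10% DA model
alt = {1: 0.249, 2: 0.585, 3: 0.166}  # 17% DA model
n = 1022                              # rounds in the 17% DA data set
observed = 1941 / n                   # (257 + 2*611 + 3*154) / 1022, about 1.899

z_scores = {}
for name, dist in (("10% DA", null), ("17% DA", alt)):
    mu, var = moments(dist)
    se = math.sqrt(var / n)           # SD of the sample mean under this model
    z_scores[name] = (observed - mu) / se
    print(f"{name}: E = {mu:.3f}, Var(mean) = {var / n:.10f}, z = {z_scores[name]:.2f}")
```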

If the Japanese sources are really correct, then there is no point in doing statistics. But if they are not correct, statistics probably won't help to reveal what seems to be a very slight effect from a change in DA (without using excessive sample sizes). Assuming that average hits/round is a valid criterion, going from 10% DA to 17% DA is, in the long run, a 0.37% increase in hits/round (1.910 to 1.917).
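To put "excessive sample sizes" in rough numbers, here is a back-of-the-envelope sketch. It is my own assumption-laden estimate, not something from the data sets: a two-sided 5% one-sample z-test, 80% power, and the per-round SD under the 10% DA model. Under those assumptions you would need on the order of 70,000 rounds to reliably detect the 1.910 to 1.917 shift.

```python
import math
from statistics import NormalDist

# Expected hits/round under each model (singles/doubles/triples)
mean_10 = 0.27 * 1 + 0.55 * 2 + 0.18 * 3      # 10% DA
mean_17 = 0.249 * 1 + 0.585 * 2 + 0.166 * 3   # 17% DA
rel_increase = mean_17 / mean_10 - 1

# Rough rounds needed for a two-sided 5% z-test with 80% power,
# using the per-round SD under the 10% DA model (an assumption)
sd_10 = math.sqrt(0.27 * 1 + 0.55 * 4 + 0.18 * 9 - mean_10 ** 2)
z_alpha = NormalDist().inv_cdf(0.975)
z_beta = NormalDist().inv_cdf(0.80)
rounds_needed = math.ceil(((z_alpha + z_beta) * sd_10 / (mean_17 - mean_10)) ** 2)

print(f"{mean_10:.3f} -> {mean_17:.3f}: +{100 * rel_increase:.2f}% hits/round")
print(f"rough rounds needed at 80% power: {rounds_needed}")
```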

Conclusion: using the "number of hits/round" criterion, the evidence doesn't show that a DA increase has a "statistically significant" effect, either worse or better. (Here, I wanted to find evidence against the null of "no change from 10% DA to ~17% DA".)

(If you used the method of obtaining simultaneous 95% confidence intervals instead, you would get (0.22043, 0.28528) for singles, (0.56068, 0.63392) for doubles, and (0.12585, 0.17942) for triples, each of which covers the corresponding parameter for the 17% DA case. Incidentally, the intervals for doubles and triples don't cover the corresponding parameters under the 10% DA case (.55 and .18), although the interval for singles does cover .27. In fact, a chi-square goodness-of-fit test would "reject" at the 5% level the null hypothesis that the data are a random sample from the case where DA is 10%. Such are the perils of choosing appropriate statistics for inference.

Since the null hypothesis model is a not-so-good fit to the data, maybe you would favor the idea that DA improves the output of Ridill, however negligibly.)
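The simultaneous intervals quoted above appear to match Goodman's (1965) method for multinomial proportions; I'm inferring that from the numbers, since the method isn't named. Below is a sketch of both those intervals and a Pearson chi-square goodness-of-fit test against each model, assuming independent rounds. With three categories the test has 2 degrees of freedom, and the chi-square(2) upper-tail probability is exactly exp(-x/2), so no stats library is needed.

```python
import math
from statistics import NormalDist

counts = [257, 611, 154]          # singles, doubles, triples (17% DA data)
N = sum(counts)
null_10 = [0.27, 0.55, 0.18]      # 10% DA model
alt_17 = [0.249, 0.585, 0.166]    # 17% DA model

# Goodman-style simultaneous 95% intervals: A is the chi-square(1) quantile
# at 1 - alpha/k, i.e. the square of a normal quantile at 1 - alpha/(2k)
alpha, k = 0.05, len(counts)
A = NormalDist().inv_cdf(1 - alpha / (2 * k)) ** 2

def goodman_interval(n_i):
    center = A + 2 * n_i
    half = math.sqrt(A * (A + 4 * n_i * (N - n_i) / N))
    denom = 2 * (N + A)
    return (center - half) / denom, (center + half) / denom

intervals = [goodman_interval(c) for c in counts]

def chi_square(probs):
    """Pearson goodness-of-fit statistic of the counts against a model."""
    return sum((o - N * p) ** 2 / (N * p) for o, p in zip(counts, probs))

for label, (lo, hi) in zip(("singles", "doubles", "triples"), intervals):
    print(f"{label}: ({lo:.5f}, {hi:.5f})")

for name, probs in (("10% DA", null_10), ("17% DA", alt_17)):
    stat = chi_square(probs)
    print(f"{name}: chi-square = {stat:.2f}, p = {math.exp(-stat / 2):.4f}")
```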

At this point, you might be wondering what the point of this post is, and I'm wondering that myself. The point is that when taking a random sample of data, remember the "random" part. An effect that you happen to observe in a one-shot sample could easily be ascribed to sampling error, and a goal of statistical inference is to rule out random variability as a possible explanation.

Finally, would random error explain what appears to be an increase in triple attacks in the presence of DA from equipment? Sure. (I already said it's possible earlier, but here is yet another illustrative example.) Consider the following data set (source: QCDN):

War/Drg + Askar Korazin & Brutal Earring:

Triples: 18.37%
Doubles: 59.77%
Singles: 21.86%
Total: 430 Rounds. 845 Swings. (1.97 Swings/Round)

I have to assume that, out of 430 rounds, 79 triples, 257 doubles, and 94 singles occurred (these counts reproduce the reported 845 swings). If DA procs on all hits equally (17% here), then the hypothesized proportions of single/double/triple attacks are .249/.585/.166, respectively. Note that the sample size is 430. Using the sample variance of hits/round this time, a 95% confidence interval for the mean number of swings/round is

1.965116 ± (1.959964)(0.03054195) or (1.905, 2.025)

This CI happens to cover what we assume is the true expected value (1.917). 1.965 swings/round is not so "extreme" a result if our assumptions are indeed true.
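A minimal sketch of this last check, assuming the QCDN percentages were rounded from whole-number counts. Judging from the standard error of 0.03054195, the interval appears to use the divide-by-n sample variance of hits/round rather than the 17% DA model variance (which would give a standard error of about 0.0308); the code follows that reading.

```python
import math
from statistics import NormalDist

rounds = 430
pct = {3: 18.37, 2: 59.77, 1: 21.86}   # QCDN: triples/doubles/singles (%)

# Recover whole-number counts by rounding percent * rounds
counts = {k: round(p / 100 * rounds) for k, p in pct.items()}
swings = sum(k * c for k, c in counts.items())

mean = swings / rounds
# Plug-in (divide-by-n) sample variance of hits per round
var = sum(c * (k - mean) ** 2 for k, c in counts.items()) / rounds
se = math.sqrt(var / rounds)
z = NormalDist().inv_cdf(0.975)
ci = (mean - z * se, mean + z * se)

print(counts, swings, round(mean, 6), round(se, 8))
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f}); covers 1.917: {ci[0] <= 1.917 <= ci[1]}")
```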