The Unbearable Triteness of Preening: weapon skills


Saturday, June 19, 2010

Weapon skill critical hit rate bonus: summary of evidence

(Edit #2: added information for Backhand Blow and Blade: Jin, and another source for Rampage.)

(Edit #1: added another source for Drakesbane.)

This is an attempt to summarize any evidence following attempts to determine the critical hit rate bonus at or around 100 TP (if any) for weapon skills whose "chance of critical varies with TP."

I am not aware of any (non-anecdotal) evidence for the following weapon skills: Ascetic's Fury, Vorpal Blade, Power Slash, Sturmwind, Keen Edge, Vorpal Scythe, Vorpal Thrust, Skewer, Blade: Rin, True Strike, Hexa Strike, Sniper Shot, Heavy Shot, Dulling Arrow, and Arching Arrow (15 weapon skills). That leaves only six: Backhand Blow, Evisceration, Rampage, Raging Rush, Drakesbane, and Blade: Jin.

For now, "convenient" determination of critical hit rate is possible only for the first hit. Most of the testing done concerns the first hit, and conclusions are based on the assumption that the bonus (where it exists) is additive.

Backhand Blow (hand-to-hand, 2 hits)

Source: dex/crit relation, WS crits, WS gorgets discussion (Blue Gartr forums)

Comparing the sample proportions 22/50 (.44) at 9% baseline critical rate and 37/50 (.74) at 30% baseline (with 6% from Destroyers), it is obvious that there is some kind of innate critical rate bonus for at least the first hit of Backhand Blow.

But with Backhand Blow TP varying between 100 and 120 TP, it seems likely that the critical rate was not fixed for each sample. The consequences of this on the allocation of Type I error and coverage probability of the corresponding interval estimate are explored for Blade: Jin bonus estimation (later in the post), as data for that was obtained by the same person, but for now I will just describe briefly how to go about estimating the bonus for Backhand Blow.

Assume that the innate bonus is additive and constant (meaning it's independent of whatever the baseline critical rate is). Also assume that the critical rate bonus from Destroyers (6%) increases the critical hit rate of Backhand Blow by an additional 6% (starting from 24%).

Let X₁ be the number of critical hits observed at 9% baseline, n₁ the total number of hits observed at 9%, X₂ the number of critical hits observed at 30% baseline, and n₂ the total number of hits observed at 30%. A natural "pooled" estimator for Backhand Blow's critical hit rate bonus is

\[
\hat{b} = \frac{X_1 + X_2 - (0.09\,n_1 + 0.30\,n_2)}{n_1 + n_2}
\]

and its standard error is

\[
\mathrm{SE}(\hat{b}) = \frac{\sqrt{n_1\,\hat{p}_1(1-\hat{p}_1) + n_2\,\hat{p}_2(1-\hat{p}_2)}}{n_1 + n_2}, \qquad \hat{p}_i = X_i / n_i .
\]

The pooled estimate is .395 and a corresponding 95% confidence interval for the WS bonus is (30.32%, 48.68%).
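As a rough check on the arithmetic, the pooled estimate and its large-sample interval can be reproduced in a few lines of R. This is only a sketch under the same assumptions (additive bonus, known baselines, normal approximation); the helper name `pooled_bonus` and its arguments are made up for illustration.

```r
# Pooled estimate of an additive critical hit rate bonus from two samples
# taken at known baseline rates (hypothetical helper, not from the source thread).
pooled_bonus <- function(x1, n1, base1, x2, n2, base2, level = 0.95) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  # observed critical hits minus those expected from the baselines alone
  est <- (x1 + x2 - (base1 * n1 + base2 * n2)) / (n1 + n2)
  se <- sqrt(n1 * p1 * (1 - p1) + n2 * p2 * (1 - p2)) / (n1 + n2)
  me <- qnorm(1 - (1 - level) / 2) * se
  c(estimate = est, lower = est - me, upper = est + me)
}

# Backhand Blow: 22/50 at 9% baseline, 37/50 at 30% baseline
bb <- pooled_bonus(22, 50, 0.09, 37, 50, 0.30)
round(100 * bb, 2)   # estimate 39.50, interval (30.32, 48.68)
```

The same call with 3/30 and 8/30 gives the Blade: Jin figures discussed later in the post.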

Conclusion: there is a critical hit rate bonus for Backhand Blow at 100 TP. A bonus of 40% would be consistent with the given data.

Evisceration (dagger, 5 hits)

Source: Evis crit rate testing (Allakhazam forums)

At ~100 TP and given 24% base critical hit rate, the pooled sample gives a sample proportion 248/696 = .3563. A 95% confidence interval for the critical hit rate bonus is (8.61%, 15.32%).

Conclusion: there is a critical hit rate bonus for Evisceration at 100 TP, with +10% being a possibility.

Rampage (axe, 5 hits)

Source (1): ランページとDEXの関係 (The relationship between Rampage and DEX)

There are two sets of estimates: one for DEX 68, and one for DEX 124, with Gigantobugard as the target mob in both cases. I'm not much interested in calculating base AGI and confirming that the Megalobugard's level range is 40-43, so I ignored the estimates for DEX 68. DEX 124 ensures a 24% base critical hit rate.

At 100 TP, the sample proportion of critical hits is 35/130 = .2692. A 95% confidence interval for the critical hit rate bonus is (-4.48%, 11.40%). But suppose there actually is a 10% critical hit bonus. For a sample size of 130, the probability that the sample is sufficient to show a statistically significant bonus is about .7388 (power calculation).
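The post does not say how the single-sample intervals were computed. One possibility, assumed here rather than confirmed, is an exact (Clopper-Pearson) binomial interval as returned by R's `binom.test`, shifted down by the 24% base rate. For the 100 TP sample it lands close to the interval quoted above.

```r
# Exact binomial interval for the observed crit rate, shifted by the base rate.
# Assumption: the base critical hit rate is exactly 24% and the bonus is additive.
base <- 0.24
rampage_100 <- binom.test(35, 130)$conf.int - base
round(100 * rampage_100, 2)   # lower and upper bounds of the bonus, in percent
```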

At 200 TP, the sample proportion of critical hits is 68/150 = .4533. A 95% confidence interval for the critical hit rate bonus is (13.20%, 29.66%).

Source (2): dex/crit relation, WS crits, WS gorgets discussion (Blue Gartr forums)

I did say I wasn't interested in calculating a mob's AGI, but a Clipper's AGI is either 18 or 21 regardless of the levels reported on FFXIclopedia, and either AGI value doesn't affect the actual crit rate for the DEX 57 case, which is indeed 13%. (See this for details about critical hit rate as a function of your DEX - mob AGI.)

Using the same "pooled" estimator rationale I used for Backhand Blow (earlier in the post), the sample proportion for Rampage's crit bonus at 300 TP is .465 and a corresponding 95% confidence interval for the rate bonus is (31.80%, 61.20%). For the sake of completeness, estimates for the bonus at 100 TP and 200 TP are (-9.26%, 25.40%) and (3.20%, 48.80%), respectively.

Conclusion: if there is a critical hit rate bonus for Rampage at 100 TP, the known evidence is insufficient to show that, but if the bonus were 10%, for n = 130 the power to reject the null hypothesis of no bonus is fairly high (.7388). Given all the data, it is relatively unlikely that the bonus is 10%, but a smaller bonus cannot be ruled out with such small samples.

Unsurprisingly, there is a bonus at 200 TP and 300 TP.

Raging Rush (great axe, 3 hits)

Source (1): レイグラのクリティカル率について　その１ (On Raging Rush's critical hit rate, part 1)

The sample proportion is 20/40 given the usual 24% base. The "control" data for base critical rate (which is a good idea to have, by the way), however, gives the sample proportion 44/130 = .3385, which is somewhat high for a 24% base, but I write that off as an unusual sample rather than a sign of experimental error. This data alone gives the tentative impression that there is a bonus.

Source (2): RagingRush Critical rate test (Killing Ifrit forums)

The raw data (showing damage values) are in a spreadsheet, but you don't need to download it.

At 100 TP and given 24% base critical hit rate, the proportion of critical hits is 155/373 = .4155. A 95% confidence interval for the critical hit rate bonus is (12.50%, 22.74%). This is strong evidence that the critical hit rate bonus is not 10%. Possible candidates are 15% and 20%.

More interesting to me is that the damage for 1 TP return (20 occurrences) was also noted, providing an opportunity to determine whether a critical hit rate bonus also applies to off-hand hits (despite there being no way to tell the difference between a double attack hit and a regular off-hand hit). Assuming a 24% base critical hit rate, with 9 observed critical hits out of 20, the corresponding p-value is .03614, which suggests a critical hit rate bonus.
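The quoted p-value appears to match a two-sided exact binomial test, which is what R's `binom.test` reports by default. A minimal sketch, assuming the 24% base rate is the null value:

```r
# 9 critical hits in 20 off-hand hits, tested against a 24% null rate
offhand <- binom.test(9, 20, p = 0.24)
offhand$p.value   # two-sided exact p-value, about .036
```

With only 20 hits the test has little power, so this is suggestive rather than conclusive.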

Conclusion: there is a critical hit rate bonus for Raging Rush at 100 TP, with +15% and +20% being possible candidates. The small sample for critical hits from off-hand hits suggests a critical hit rate bonus for off-hand hits of Raging Rush as well.

Drakesbane (polearm, 4 hits)

Source (1): drakesbane native crit% (FFXIclopedia forums)

The first sample is 38/100 and the second, 24/100 (given 106 TP).

38/100 is a fairly extreme observation given 24% base critical hit rate (if there were no bonus). On the other hand, 24/100 is not that extreme an observation given a 34% rate. Since there is no good reason to think the conditions changed between the two samples, pool the data and crank out an interval estimate for the rate bonus, which is (0.66%, 13.91%).

Source (2): 雲蒸竜変の検証 (Testing Drakesbane)

There are four samples: three for 100 TP and one for 300 TP.

For 100 TP, the sample proportions are 12/49, 15/45, and 15/41 (given 24% base critical hit rate). The pooled estimate is 42/135 = .3111 and a 95% confidence interval for the bonus is (-0.57%, 15.64%). While this interval covers 0, 0 is again close to the left endpoint (in the other source, 0 fell just outside the interval, as expected).

As for 300 TP, the sample proportion is 16/30 and a 95% confidence interval for the rate bonus is (10.32%, 47.66%), which rules out 50% (tentatively).

Conclusion: there is suggestive evidence for a critical hit rate bonus at 100 TP, with +5% and +10% being possible candidates. At 300 TP, a +50% bonus appears to be an "unlikely" possibility.

Blade: Jin (katana, 3 hits)

Source: dex/crit relation, WS crits, WS gorgets discussion (Blue Gartr forums)

The sampling was done in the same fashion as for Backhand Blow, with observed critical hit proportions 3/30 at 9% baseline crit rate and 8/30 at 30% baseline (with Senjuinrikio's 6% bonus) at 100 TP. Using the same estimator that I used for Backhand Blow, the "pooled" sample proportion for Blade: Jin's critical bonus is -0.01167, and a corresponding 95% confidence interval is (-10.73%, 8.39%).

Taking the confidence interval at face value, if there is a critical bonus for Blade: Jin at 100 TP, it is unlikely that it's 10% or higher, especially considering the "sloppy" manner in which the data was likely collected (with TP not being held fixed, the critical hit rate could have varied), which further supports that contention. If the bonus were 10%, obviously, the probability that a 95% confidence interval wouldn't cover 10% at the right endpoint of the interval would be near .025 (half the Type I error). The consequences of experimental "error" are explored in a simulation study described at the end of this post.

Conclusion: if there is a critical hit rate bonus for Blade: Jin at 100 TP, it is unlikely that the bonus is as high as 10%.

Simulation study: is a 10% critical hit rate bonus that unlikely for Blade: Jin?

Consider the following simulation study based on hypotheticals: if there actually were a 10% bonus at 100 TP, with a 1% increase for every 5 TP, then with TP varying between 100 and 119 TP, the critical rate bonus varies between 10% and 13%.

Given that "TP overflow" is inevitable with dual wield, and that extra hits occurring beyond TP were quite possible because data collection was reported to be boring, suppose that each of the critical rate bonuses between 10% and 13% (inclusive) is equally likely to be "chosen" for Blade: Jin.

The purpose of the study is to show how likely it is that the "pooled" large-sample confidence interval covers 10% given the above conditions.

A histogram of the simulated sampling distribution of the critical hit rate bonus shows that it's obviously not normal, with the mean (about 11.5%) higher than 10%, which is supposed to be the "actual" bonus at 100 TP for this simulation. (The shape of the large-sample approximation of the sampling distribution is traced with the solid curve.)

On the other hand, the simulated margin of error exceeds 9.56% (the margin of error for the actual sample) about 97.7% of the time. (The mean margin of error is 11.19%.) Also, the "actual" (in the context of the simulation) Type I error is about .059, with about .040 allocated to the right tail (meaning the null hypothesis of .10 is rejected with probability .0402 because the estimate is significantly higher than .10) and about .019 allocated to the left tail (meaning the null is rejected with probability .019 because the observed estimate is significantly lower than .10). By comparison, the nominal left-tail error is .025.

Repeating this exercise under the condition that there is no bonus, the simulated margin of error exceeds 9.56% only 58.0% of the time, and the probability that a confidence interval's right endpoint is higher than 8.39% is less than 0.1%.

If Blade: Jin's critical hit rate bonus at 100 TP were actually 10%, considering TP overflow and additional hits occurring beyond TP overflow, it would be very unlikely that a given 95% confidence interval would not cover 10%. The margin of error would also be very likely to be higher than 9.56%. Therefore, it is more plausible that its critical rate bonus is significantly less than 10%, if it even exists.

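The following is some R code for the simulation. The per-hit draws are vectorized instead of looped so that it finishes faster.

# Blade: Jin with a hypothetical 10% bonus at 100 TP that rises 1% per 5 TP,
# so the bonus on any given weapon skill is 10%, 11%, 12%, or 13%.
n = 100000        # number of simulated experiments
hits = 30         # weapon skills observed at each baseline
ci.lower = numeric(n)
ci.upper = numeric(n)
p.pool = numeric(n)

for (i in 1:n) {
  # each observation draws its own equally likely rate (baseline + bonus)
  X1 = sum(rbinom(hits, 1, sample(seq(.19, .22, by = .01), hits, replace = TRUE)))
  X2 = sum(rbinom(hits, 1, sample(seq(.40, .43, by = .01), hits, replace = TRUE)))

  # pooled bonus estimate: subtract the crits expected from the 9% and 30% baselines
  p.pool[i] = (X1 + X2 - .39*hits)/(2*hits)

  me.i = qnorm(.975)*sqrt((X1/hits*(1 - X1/hits) + X2/hits*(1 - X2/hits))/(4*hits))
  ci.upper[i] = p.pool[i] + me.i
  ci.lower[i] = p.pool[i] - me.i
}

mean(p.pool)                     # mean of the simulated bonus estimates
me = (ci.upper - ci.lower)*.5    # simulated margins of error
# proportion of simulated margins of error above the observed one (9.56%)
mean(me > sqrt((3/30*(1 - 3/30) + 8/30*(1 - 8/30))/120)*qnorm(.975))
mean(ci.upper < .10)             # left tail: interval lies entirely below .10
mean(ci.lower > .10)             # right tail: interval lies entirely above .10
mean(ci.upper < .10) + mean(ci.lower > .10)   # total Type I error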

Friday, July 24, 2009

A comparison of 5-hit Rindomaru with Hagun

The great katana Rindomaru is one of those new "fey" weapons that can be augmented through the quest "Succor to the Sidhe," and with the possibility that Rindomaru can be augmented with a heap of Store TP, a "5-hit Rindomaru" setup could theoretically rival the boilerplate "6-hit Hagun."

The main comparison here is whether the increased WS frequency from a 5-hit Rindomaru overcomes the Hagun's TP bonus, that is, whether a hypothetical 25% increase in weapon skill frequency overcomes the 20% increase in weapon skill damage from the TP bonus for the Yukikaze/Gekko/Kasha triumvirate.

A crude calculation of efficiency with "most things being equal" could be something like [(88*4+700)/30/(86*5+800)*37.5 - 1]*100 = 6.91%, that is, Rindomaru is more efficient with really crude simplifications. But surely we can be more sophisticated than that.

To start, here is a description of a pretty good augmented Rindomaru (15 Store TP, +4% weapon skill damage, etc.) as well as some specific, full-Usukane equipment setups for both 5-hit Rindomaru and 6-hit Hagun with minor differences, such as Sword Strap for Hagun and White Tathlum for Rindomaru. I will base my calculations on these setups... and the canonical Greater Colibri.

Calculating average time to 100 TP

Weapon	Average no. of rounds	Average no. of hits	Average time (s)
Rindomaru	3.774	4.123	28.310
Hagun	4.690	5.123	34.159

Here, I am assuming 15% double attack rate and 95% hit rate. For Hagun, the reduction of delay is from 450 to 437. Under these conditions, the increase in WS frequency is about 21%.
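The columns of the table are tied together by simple arithmetic, sketched below under the stated 15% double attack and 95% hit rates. The average number of rounds is taken from the table as given (its derivation is not shown here), and the variable names are only illustrative.

```r
# Average rounds to 100 TP, copied from the table above
rounds <- c(Rindomaru = 3.774, Hagun = 4.690)
delay  <- c(Rindomaru = 450,   Hagun = 437)   # Hagun's delay after the reduction

hits_per_round <- (1 + 0.15) * 0.95           # 15% double attack, 95% hit rate
avg_hits <- rounds * hits_per_round           # average hits landed before 100 TP
avg_time <- rounds * delay / 60               # 60 delay corresponds to 1 second

round(avg_hits, 3)
round(avg_time, 3)
# relative increase in WS frequency for Rindomaru, roughly 0.21
ws_freq_gain <- avg_time[["Hagun"]] / avg_time[["Rindomaru"]] - 1
```

Small differences in the last digit against the table come from starting with rounded inputs.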

Calculating average damage to 100 TP (including weapon skill damage)

Weapon	AA "base" damage	WS "base" damage	Average AA damage	Average WS damage	Total damage
Rindomaru	88	167	580.640	709.992	1290.632
Hagun	86	169	705.051	818.463	1523.515

The assumption of using a weapon skill immediately upon getting 100 TP is not all that realistic, but then again all comparisons like this are based on "ideality" and the excuse that this is all supposed to be the case "in the long run." Take the comparison with a brick of salt as you consider the conditions under which it's made.

As another "ideal" assumption, I suppose that pDIF is maxed out for weapon skill damage, and is 1.6 on average in the auto-attack phase. The average number of hits per weapon skill is 1.0925.

How do I account for the "+4% weapon skill damage" on that hypothetical Rindomaru? It sounds like it could be incorporated into the fTP factor. Thus, the fTP factor for the first weapon-skill hit is 1.975 for Hagun and 1.7025 for Rindomaru. For this case, Hagun's average WS damage is about 15% higher.

I also accounted for Meditate but no Overwhelm, mainly because I don't know if Overwhelm affects all hits of a weapon skill.

Damage per second

Weapon	AA proportion of total damage	DPS	Relative efficiency
Rindomaru	.450	45.588	+2.21%
Hagun	.462	44.600	---

Here, you would want to compare the theoretical AA proportion of total damage to what you actually experience.
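For anyone who wants to plug in their own numbers, the table boils down to the following arithmetic. This is a sketch using the values from the earlier tables; the variable names are illustrative.

```r
# Values copied from the tables above
total_damage <- c(Rindomaru = 1290.632, Hagun = 1523.515)   # AA + WS damage per cycle
aa_damage    <- c(Rindomaru = 580.640,  Hagun = 705.051)    # auto-attack damage per cycle
cycle_time   <- c(Rindomaru = 28.310,   Hagun = 34.159)     # average seconds to 100 TP

dps      <- total_damage / cycle_time
aa_share <- aa_damage / total_damage                        # AA proportion of total damage
rel_eff  <- dps[["Rindomaru"]] / dps[["Hagun"]] - 1         # about +2.2% for Rindomaru

round(dps, 3)
round(aa_share, 3)
```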

On paper, this idealized Rindomaru is actually better than Hagun. At this point, you may wonder if using a weapon skill immediately at 100 TP is realistic, so consider also the fairly extreme case of waiting an additional round past 100 TP before WSing (equivalent to starting from 0 TP).

Another comparison of damage per second by "wasting" an attack round beyond 100 TP

Weapon	AA proportion of total damage	DPS	Relative efficiency
Rindomaru	.496	41.286	---
Hagun	.495	41.631	+0.83%

You can think of the last two tables as representing the ideal lower and upper bounds of how fast you WS after attaining 100 TP... in the long run. So if you hold TP, the additional TP will "benefit" Hagun more than Rindomaru and the relative benefit of having a 5-hit setup with Rindomaru is eroded.

Finally, note that this hypothetical Rindomaru has +4% weapon skill damage, which I assumed is an fTP bonus. I consider this the key attribute that allows Rindomaru to eke out a slight edge.

Of course, as far as Greater Colibri are concerned, polearm with attack buffs would be more "fun" for cranking out Penta Thrusts with an average easily exceeding 1k. I did manage to find a forum thread with a parse of a 5-hit Tomoe where the AA proportion was about .38.

Sunday, July 19, 2009

Comparison of Love Halberd and Tomoe for samurai

I allocated way too much time this weekend to putzing around with spreadsheets, but let's just finish this off, shall we? Here's an example of doing a fairly simple comparison of a 7-hit Love Halberd with a 5-hit Tomoe, which is based on ideas presented in a prior comparison of weapons for warrior. In particular, I utilize the concepts of "expected number of rounds to clear 100 TP" and "expected number of hits to clear 100 TP" to make the arithmetic more tractable.

I didn't see any (good) hypothetical comparison of Tomoe 5-hit versus Love Halberd 7-hit for samurai (using Penta Thrust), so I thought I could do this really fast because I already set up the "black box" (this mess of a spreadsheet) to spit out an answer.

Calculating average time to 100 TP

Weapon	Average no. of rounds	Average no. of hits	Average time (s)
Love Halberd	3.959	6.488	26.131
Tomoe	3.774	4.123	30.197

This is the easiest step as the assumptions are reasonable if idealized, such as 95% hit rate, 15% double attack rate, and starting with some initial TP from the previous weapon skill.

Calculating average damage to 100 TP (including weapon skill damage)

Weapon	AA "base" damage	WS "base" damage	Average AA damage	Average WS damage	Total damage
Love Halberd	70	110	454.179	650.335	1104.515
Tomoe	96	136	395.890	822.614	1218.505

Again, there are more simple assumptions, like using Penta Thrust immediately after attaining 100 TP, using the same fSTR throughout, and assuming an average pDIF of 1. Using the expected values from the previous table, the average auto-attack and WS damage can be calculated.

Also, average WS damage is based on an average return of 5.035 hits.
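For reference, 5.035 is what comes out if each of Penta Thrust's five hits lands 95% of the time and double attack gets two chances at 15%, each of which must also land. This reading is inferred from the numbers rather than taken from the spreadsheet:

\[
5(0.95) + 2(0.15)(0.95) = 4.75 + 0.285 = 5.035
\]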

Did I account for the effect of Meditate? Assuming Meditate recast is 150 seconds, we can assume all the TP goes to one WS and incorporate that damage into one cycle of AA and WS damage. For example, a "Meditate WS" is about 0.174 of a full WS in one cycle for Love Halberd, 0.201 for Tomoe, which makes sense as Meditate will benefit "slower-to-WS" weapons relatively more (Tomoe being slower).
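The two fractions are consistent with dividing the average time to 100 TP by the Meditate recast, that is, the share of one Meditate that each cycle "earns." A small sketch under that assumption (the variable names are illustrative):

```r
meditate_recast <- 150                                   # seconds, as assumed above
cycle_time <- c(LoveHalberd = 26.131, Tomoe = 30.197)    # average time to 100 TP (s)

# fraction of a "Meditate WS" attributed to one AA + WS cycle
meditate_ws_share <- cycle_time / meditate_recast
round(meditate_ws_share, 3)   # 0.174 for Love Halberd, 0.201 for Tomoe
```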

Damage per second

Weapon	AA proportion of total damage	DPS	Relative efficiency
Love Halberd	.411	42.267	+4.75%
Tomoe	.324	40.351	---

Time for a reality check. Is it really possible for Tomoe auto-attack damage to account for only about 32% of total damage? I would have to see some parser output to validate these calculations. If you ignore Meditate, the proportions increase to .451 and .366. I will update this post when I can track down some parser output.

Even accounting for Meditate, Love Halberd comes out ahead on paper by almost 5%. Whether that 5% is worth expending virtue stones in a merit party is another issue altogether. You can't really argue differences in hit rate (if you want hit rate to drop below 95%) since the only real difference would be whatever is used in the ammo slot. As for attack differences, who knows how DEX +7 would compare to attack +5 and whatever's in the ammo slot.

Of course, the major issue, at least to me, is whether DA really stacks with virtue weapons. I've been assuming it does. Even if it doesn't though, Love Halberd is still slightly more efficient.

Cutting corners with Store TP and weapon skills

Edit: Another table appended.

Last week I referred to "minimum store TP" to achieve so-called n-hit builds from the standpoint of going from 0 to 100 TP in n hits or reaching 100 TP in n - 1 hits starting with sufficient TP return from the previous weapon skill, but practically speaking I should have called it "worst-case scenario store TP if you're using a multi-hit WS." With all the TP you'll get after the first hit (and when was the last time you saw only the first hit land when your WS didn't kill your target?), there aren't too many compelling reasons to maintain "true" store TP totals if it means using equipment you wouldn't touch otherwise. The question is how much store TP to drop while still maintaining a "virtual" n-hit.

As you might have guessed, you can turn to probability to answer this. Consider first the case of a 5-hit polearm with a 5-hit weapon skill. After calculating the probabilities for obtaining sufficient TP returns from a single WS (no need to present such clutter, but I hope I didn't screw up), we can see the relationship between dropping store TP and the lowered probability that you will be able to get 100 TP in n - 1 hits of the next TP-generating "cycle." Of course, these probability calculations are based on the assumption that DA can proc only twice on a multi-hit weapon skill (fewer than seven hits).

I am assuming 95% hit rate for the first WS hit. Since (lack of) accuracy does affect TP return, I thought it would be useful to show the effect of a lower hit rate.

Table 1. Probability of getting 100 TP in 4 hits (after a WS) for a 5-hit polearm (480 delay, 17% double attack rate)

Minimum hits after 1st WS hit	Store TP	95% hit rate	80% hit rate
0	54	.95	.95
1	53	.949996	.948865
1	52	.949996	.948865
2	51	.949677	.930353
2	50	.949677	.930353
3	49	.940513	.815681
3	48	.940513	.815681
4	47	.822485	.490462
4	46	.822485	.490462
5	45	.233998	.105841

You can see there is not much of a drop by shedding up to 6 store TP and still being pretty close to a true 5-hit. Remember that the first hit of a WS can still miss.

The probabilities shown are cumulative probabilities in the sense that, given some amount of store TP, what is the probability that I will be able to get 100 TP in 4 hits after a weapon skill? More specifically, given some amount of store TP, what is the smallest number of hits I need to land to be able to get 100 TP in 4 hits with an acceptable probability? Remember that .95 is pretty much as good as it gets.

If you have 95% hit rate, 48 store TP gives you a 94% chance of generating 100 TP in 4 hits, requiring at least a 4-hit return from the previous WS (1st hit TP and TP from at least 3 other hits). If you have a "true" 5-hit build, shedding 6 store TP may be a good trade-off. For example, I've seen 5-hit polearm builds with 49 store TP (including merits), suggesting awareness that 54 store TP is rather superfluous.

If you have 80% hit rate, 48 store TP gives you an 82% chance of generating 100 TP in 4 hits, so you might want at least 50 store TP if being around 80% hit rate is more realistic for whatever you are doing.
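The probabilities in Table 1 can be reproduced with a short convolution. The sketch below rests on assumptions inferred from the table rather than on the original spreadsheet: the first WS hit always uses a 95% hit rate, the remaining normal hits land independently at the stated hit rate, and double attack gets exactly two chances, each adding a hit only if it both procs and lands. The function name `p_enough_hits` is made up for illustration.

```r
# Probability of landing the first WS hit plus at least k further hits.
# Assumptions: first hit at 95% regardless of column; other normal hits at p1;
# two double attack chances, each succeeding with probability p1 * p2.
p_enough_hits <- function(k, other_hits, p1, p2, first_hit = 0.95) {
  normal <- dbinom(0:other_hits, other_hits, p1)   # normal hits after the first
  extra  <- dbinom(0:2, 2, p1 * p2)                # double attack hits
  total  <- numeric(other_hits + 3)                # pmf of normal + extra hits
  for (n in 0:other_hits) {
    for (d in 0:2) {
      total[n + d + 1] <- total[n + d + 1] + normal[n + 1] * extra[d + 1]
    }
  }
  first_hit * sum(total[(k + 1):length(total)])
}

# Table 1: Penta Thrust (4 normal hits after the first), 17% double attack
k <- 0:5
table1 <- cbind(
  min_hits = k,
  hit95 = sapply(k, p_enough_hits, other_hits = 4, p1 = 0.95, p2 = 0.17),
  hit80 = sapply(k, p_enough_hits, other_hits = 4, p1 = 0.80, p2 = 0.17)
)
round(table1, 6)
```

Changing `other_hits` and `p2` covers other weapon skills and double attack rates.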

This exercise can be repeated for both 6-hit polearm and 6-hit great axe.

Table 2. Probability of getting 100 TP in 5 hits (after a WS) for a 6-hit polearm (480 delay, 21% double attack rate)

Minimum hits after 1st WS hit	Store TP	95% hit rate	80% hit rate
0	29	.95	.95
1	28	.949996	.948948
1	27	.949996	.948948
2	26	.949705	.931688
3	25	.941320	.823837
3	24	.941320	.823837
4	23	.832807	.513000
5	22	.284425	.130744

29 store TP is not all that easy to obtain as a warrior (maybe you want to use Aurum Cuirass), but 24 is possible with a bunch of ticky-tack pieces. 15 from /SAM, 5 from Rajas, 1 from Brutal, 1 from Chivalrous Chain, 1 from Ecphoria Ring, and 1 from Engetsuto gives 24 total. Then again, if you're spamming crab sushi, some of these may not be very optimal for Penta Thrust.

Table 3. Probability of getting 100 TP in 5 hits (after a WS) for a 6-hit great axe (504 delay, 21% double attack rate)

Minimum hits after 1st WS hit	Store TP	95% hit rate	80% hit rate
0	22	.95	.95
1	21	.948478	.923695
2	20	.889887	.702636
2	19	.889887	.702636
3	18	.034124	.017160

I have only 6 store TP on equipment for warrior anyway. I can live with 21 store TP if I actually am using /SAM for some reason. What about the likes of 6-hit scythe and 6-hit polearm, both with four-hit weapon skills (like Guillotine and Drakesbane)? The following table compares the minimum TP for a "true" 6-hit build to the minimum TP for a "virtual" 6-hit build.

Table 4. Minimum Store TP requirements for 6-hit builds with 4-hit weapon skills

		Minimum Store TP
Delay	Base TP	True	Virtual
528	14.4	16	14
513	13.9	21	18
501	13.6	23	20
492	13.3	26	23
480	13.0	29	26

With "virtual" store TP builds, the corresponding probability is .9449 given 95% hit rate. (Of course, lower hit rate will lower this probability.) If that .0051-difference in probability really troubles you and is unacceptable, by all means be hyper-conservative.
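The "True" column can be reproduced from the base TP values alone, assuming that TP per hit is the base TP scaled by (1 + store TP/100) and truncated to one decimal place. That truncation rule is an assumption about the game mechanics rather than something stated here, although it does reproduce the table. The "Virtual" column is not attempted because it also depends on the WS hit distribution. The function name `min_true_store_tp` is illustrative.

```r
# Smallest store TP for which n_hits hits reach 100 TP.
# Works in tenths of a TP to avoid floating-point surprises.
# Assumption: TP per hit = base TP * (1 + store TP / 100), truncated to 0.1.
min_true_store_tp <- function(base_tp, n_hits = 6) {
  base_tenths <- round(base_tp * 10)
  for (stp in 0:200) {
    tp_tenths <- (base_tenths * (100 + stp)) %/% 100   # truncate to 0.1 TP
    if (n_hits * tp_tenths >= 1000) return(stp)
  }
  NA
}

# Base TP per hit by delay, copied from Table 4
base_tp <- c("528" = 14.4, "513" = 13.9, "501" = 13.6, "492" = 13.3, "480" = 13.0)
sapply(base_tp, min_true_store_tp)   # 16 21 23 26 29, matching the "True" column
```

Calling `min_true_store_tp(13.0, n_hits = 5)` gives 54, the "true" 5-hit figure for a 480-delay polearm in Table 1.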

Dumb thread(s) of the day

Here's a new feature where I talk briefly about crappy replies to decent questions. It would be a lot easier just to take pot-shots all day at shitty FFXI forum threads, which I might just do rather than play with numbers all the time.

Apparently, there is a thread on BG discussing why Allakhazam is so maligned, which usually is done by repeatedly knocking down the straw man that anyone actively endorses TPing in STR or DEX rings. When talking about a signal-to-noise ratio, the noise component is rather substantial on Allakhazam but the signal is pretty small in absolute terms for any FFXI forum, really. Even BG has threads like this, where bald-faced assertions are made without referencing sources and people can say they get 8-hit Drakesbanes with a straight face.

As another example, if you were talking about the relative efficiency of a 6-hit polearm build, you would pretty much get the same content-free, inane answers whether you posed this question on Allakhazam or Blue Gartr. Apparently, it is so difficult to use an average auto-attack damage, use an average WS damage, estimate the time between weapon skills, and use all this information to estimate roughly the relative efficiency of a 6-hit build. (Hint: a 6-hit is not even close to being 20% more efficient than a 7-hit). Instead, you have a reasonable OP followed mostly by dumb-fuck snark and drivel.

Wednesday, October 29, 2008

Double attack and weapon skills, part 2

So many "known" things about random mechanics in FFXI seem poorly substantiated due to a lack of data, bad methodology when data are collected, and poor or non-existent analysis and interpretation after the data collection. Then again, it's not as though you really need to know, say, how many hits per attack round you can expect from a Kraken Club. Even if you have one, such considerations are beside the point.

That said, it's almost delightful to see some real data (not some useless parse), and even better when there are some easily tested hypotheses that follow from the purpose of the data collection. This thread on double attack during WS generated some interesting speculation about how many times double attack can process based on the data gathered but ventured no further, and no one really provided and tested a model of how DA interacts with weapon skills, the closest being a proposal that Penta Thrust may receive up to 3 DA "checks" per WS.

This proposal followed from data collection on TP return (a measure of number of hits in a WS) for Penta Thrust, which is summarized as follows:

10% DA rate (warrior subjob)
95% hit rate (lv 73 dragoon vs. lv 47-54 diatryma)

196 total WS

3 hits: 3 (.015)
4 hits: 42 (.214)
5 hits: 120 (.612)
6 hits: 30 (.153)
7 hits: 1 (.005)

However, I am not interested in seeing whether a "3 DA check" model is a good fit to the data since it is "known" that double attack cannot proc more than twice on a WS. (I hope this is a correct assumption. Besides, it doesn't seem likely that people who love to jack off to WS damage, and make their obnoxious asses known on popular FFXI forums, wouldn't run their mouths about an 8-hit Penta Thrust. Sometimes the persistent absence of evidence is strong evidence--NOT PROOF--of absence.) Rather, I'm looking to clarify how exactly double attack can proc twice at most based on my previous post.

As a reminder, I proposed the following models for how DA might work with WS: (1) double attack can proc twice on specific hits of the WS (thought to be the first two hits per FFXIclopedia), and (2) double attack may proc a maximum of two times on a WS (not restricted to specific hits). Is it even possible to tell the difference between these two models for Penta Thrust, given only 10% DA rate?

Fortunately, the probability distributions under each model are fairly easy to calculate (the calculations for Penta Thrust are similar to those for 3-hit WS last time) and are summarized in the following graph:

The difference between the two is fairly stark, so it wouldn't take that much data to support one over the other, assuming either one is true. In particular, the difference between the two models is most pronounced for the 6-hit and 7-hit cases. A sample proportion of .153 for 6-hits is very unlikely for the "2 DA maximum" model, where the theoretical proportion is .262. The "DA 2 hits only" seems a decent fit, so run with that.

The FFXIclopedia article on DA was changed February 10 of this year to state that DA can activate on the first two hits only (instead of being able to proc twice at most and on any of the hits). Aside from the fact that one cannot distinguish between the 4 ways DA can proc consecutively on two hits in Penta Thrust (saying it procs on the first two hits is nothing more than a guess if you don't know how it's programmed), I wonder if that change was motivated by the evidence of sample data or if it was just a shot in the dark. At least I found some evidence for that.

If you are interested in playing around with the probabilities of the number of hits for your favorite multi-hit weapon skill, the following is some R code I wrote to generate them. You can change p1 (hit rate), p2 (double attack rate), and y (number of normal hits in the WS) to suit your particular situation. Some slight modification would have to be made to isolate the probability of the first hit occurring for the purposes of calculating average WS damage (where fTP isn't 1.0).


# p1 - hit rate
# p2 - double attack rate
#  y - number of normal hits in the weapon skill

p1 = .95
p2 = .15
y = 2

# double attack can process on only two hits

p_2x = rep(0,(y+3))  # one entry for each possible number of hits, 0 to y+2
for (i in 0:(y+2)) {
  p_2x[i+1] = sum(dbinom(max(i-2,0):min(i,y),y,p1)*dbinom(i-max(i-2,0):min(i,y),2,p1*p2))
}

# double attack may process a maximum of two times

p_max = rep(0,(y+3))  # one entry for each possible number of hits, 0 to y+2
for (i in 0:(y+2)) {
  if (i < 2) {
    p_max[i+1] = sum(dbinom(max(i-2,0):min(i,y),y,p1)*dbinom(i-max(i-2,0):min(i,y),y,p1*p2))
    next
  }

  p_max[i+1] = dbinom(i-2,y,p1)*sum(dnbinom(0:(y-2),2,p1*p2))

  if (i != (y+2)) {
    p_max[i+1] = p_max[i+1] + sum(dbinom((i-1):i,y,p1)*dbinom(i-(i-1):i,y,p1*p2))
  }
}

# probability mass functions

round(p_2x,10)
round(p_max,10)

# expected number of hits

hit = seq(0,(y+2))
exp_hit_2x = sum(hit*p_2x)
exp_hit_max = sum(hit*p_max)

exp_hit_2x
exp_hit_max

Some checks: for a 2-hit WS, the two models are indistinguishable. As DA tends to 0%, the two models are indistinguishable in the limit. (The negative binomial distribution is degenerate when p2 = 0.) When hit rate is 100%, the number of hits is never less than the number of normal hits.
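These checks can be run directly against a compact re-implementation of the two models. This is a sketch and not the code used for the graph: it builds each distribution by convolving the normal-hit and double-attack-hit distributions, and the function name `ws_hit_pmf` is made up for illustration.

```r
# Distribution of the number of hits in a WS with y normal hits.
# "two_hits": double attack is checked on exactly two hits.
# "max_two":  double attack is checked on every hit, capped at two procs.
# Normal hits and double attack hits are treated as independent.
ws_hit_pmf <- function(y, p1, p2, model = c("two_hits", "max_two")) {
  model <- match.arg(model)
  q <- p1 * p2                      # a double attack must proc and then land
  normal <- dbinom(0:y, y, p1)
  extra <- if (model == "two_hits") {
    dbinom(0:2, 2, q)
  } else {
    c(dbinom(0:1, y, q), pbinom(1, y, q, lower.tail = FALSE))
  }
  pmf <- numeric(y + 3)
  for (n in 0:y) {
    for (d in 0:2) {
      pmf[n + d + 1] <- pmf[n + d + 1] + normal[n + 1] * extra[d + 1]
    }
  }
  setNames(pmf, 0:(y + 2))
}

# Penta Thrust data from above: 95% hit rate, 10% double attack, 196 WS
observed <- c("3" = 3, "4" = 42, "5" = 120, "6" = 30, "7" = 1) / 196
two_hits <- ws_hit_pmf(5, 0.95, 0.10, "two_hits")
max_two  <- ws_hit_pmf(5, 0.95, 0.10, "max_two")
round(cbind(observed, two_hits = two_hits[4:8], max_two = max_two[4:8]), 3)
```

The 6-hit entry under `max_two` comes out near the .262 quoted earlier, well above the observed .153.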

Friday, October 24, 2008

Double attack and weapon skills

Previously, I estimated the average damage of both Raging Rush and King's Justice for my character on lv 82 greater colibri (link), but there was one major unmentioned assumption I made concerning how the double attack trait processes on weapon skills.

Suppose, as "conventional wisdom" has it, that double attack can proc at most twice on a WS (I haven't seen any evidence that DA can proc more than twice). Under this assumption there are still two possibilities for a WS with two or more normal hits: (1) double attack can proc on only two specific hits of the WS (usually thought of as the first two), or (2) double attack can proc on any hit, up to a maximum of two times per WS. Which one is it?

There is a subtle difference between the two "hypotheses." If DA can proc on any hit in a multi-hit weapon skill, there are more opportunities for DA to proc twice (when the number of normal hits in the WS is greater than 2) than there would be if DA is limited to proc on specific hits in the WS. Intuitively, if the number of normal hits in the WS is greater than 2, there will be, on average, more WS hits under the second hypothesis even in the presence of a cap to exclude 3+ DA procs.

If you aren't convinced, the following probability exercise will help. Suppose I'm looking at a 3-hit WS (examples: Raging Rush, King's Justice, Blade: Jin, Tachi: Rana) and I want to know the probability of seeing n hits (n = 0, 1, ..., 5) in one WS, given my DA level. Assume 95% hit rate.

Since DA procs are independent of normal hits (in the sense that normal hits must occur in a WS even if they miss), it's simple to calculate these probabilities when DA must proc on only two hits in the WS. Here, the second DA proc is assumed to be independent of the first DA proc, and vice versa. For the other case, the DA procs are dependent, so the calculations are less simple, but they can be done.

When the DA rate is 10%, the probability distributions for both cases are illustrated as follows:

People are more likely to notice 5-hit results than other results, but in either case the probability of observing a 5-hit is pretty low. However, under "2 DA maximum" there are more opportunities for DA to proc (even if there is a 2-DA cap). The expected number of hits is 3.04 for "DA two hits only," and 3.13 for "2 DA maximum."

If you increase your DA rate, the expected number of hits for a WS should always increase (you will see relatively more 4- and 5-hit WSes), and this is the case going from 10% DA to 19% DA:

The expected number of hits is 3.211 for "DA two hits only," and 3.39 for "2 DA maximum." Given 19% DA, it is now fairly easy to distinguish between the two hypotheses, and collecting enough sample data on n-hits of a 3-hit WS should provide evidence in favor of one or the other.

If you can manage to push your DA rate even higher (through merits or elsewhere; I myself have 2 DA merits), the difference between the two hypotheses becomes more stark. Consider when DA is 22%:

The expected number of hits is 3.268 for "DA two hits only," and 3.47 for "2 DA maximum."
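The expected values quoted above can be reproduced without building the full distributions. This is a minimal sketch under the same assumptions as the rest of the post: a DA hit lands with probability p1*p2, independently of the normal hits. The function names exp_hits_two_only and exp_hits_max_two are made up for illustration.

```r
# Expected number of hits for a y-hit WS under the two DA hypotheses.
# Assumption: a DA hit lands with probability q = p1*p2, independently
# of the normal hits. Function names are illustrative only.

exp_hits_two_only = function(y, p1, p2) {
  y*p1 + 2*p1*p2                             # two DA rolls, whatever y is
}

exp_hits_max_two = function(y, p1, p2) {
  q = p1*p2
  k = 0:y                                    # DA hits landed if there were no cap
  y*p1 + sum(pmin(k, 2)*dbinom(k, y, q))     # count at most two of them
}

da = c(.10, .19, .22)
two_only = sapply(da, function(p2) exp_hits_two_only(3, .95, p2))
max_two  = sapply(da, function(p2) exp_hits_max_two(3, .95, p2))

round(rbind(da, two_only, max_two), 3)
```

The gap between the two rows widens as DA goes up, which is what makes the hypotheses easier to tell apart at 19% or 22% than at 10%.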

Which one do I believe to be the case? I don't have any stake in believing one over the other, but it was easier for me to assume that DA procs on two hits only (there are three ways this can happen for a 3-hit WS, but it doesn't matter in calculating the probabilities).

Thursday, October 16, 2008

King's Justice versus Raging Rush

How does King's Justice stack up to Raging Rush? I decided to waste my time providing an answer to this question by creating some frivolous graphs to compare the average WS damage of Raging Rush with that of King's Justice on everyone's favorite canonical merit party fodder, the greater colibri (lv 82).

Assuming that the current incarnation of the physical damage equation is still a reasonable approximation (a generous assumption), I calculated these averages based on the attributes of my character's WS setup. (And to pile approximations upon approximations, I assumed the pDIF distribution for my cRatio, 1.433, was uniform over [1, 1.719].) Interestingly, FFXIclopedia gives an fTP "bonus" of 0.5 for the first hit of Raging Rush, which contradicts other sources (Gobli among them) and seems incorrect. I used 1.0 because if it were 0.5, Raging Rush would obviously be inferior. (I suppose I should get into some merit party for the first time in months to see if my calculations are way off.)

I plotted average WS damage of R.R. and K.J. versus critical hit rate since I don't know how exactly the TP modifiers affect crit rate for Raging Rush and neither does anyone else:

Suppose that at 100 TP there is no crit rate bonus for Raging Rush. Looking around for the relationship between DEX and crit rate, I place my overall crit rate at 12% on colibri, and behold, Raging Rush and King's Justice are pretty close in average damage. If this is indeed the case in practice, I probably won't bother unlocking King's Justice just for better Mighty Strikes/300 TP weapon skills. Skillchains? No one cares.

Recall that Raging Rush's first-hit damage used to vary with TP (1.00/1.50/2.50 at 100/200/300 TP, but .35 STR modifier as it is now), so you can get a sense of the magnitude of the increase in average R.R. damage since the exalted "2-hander update" just by looking at the graph (starting at 0% critical hit rate and ending at whatever crit rate you think is associated with R.R.).

Of course, mere averages don't give any idea of the distribution of possible WS damage values. I've seen a few comments that King's Justice is more consistent than Raging Rush, and that Raging Rush yields higher "spikes." You certainly don't need to do any frivolous simulation to lend credence to this perception. I'm not even going to say the shapes of these simulated distributions of WS damage for R.R. and K.J. are accurate (after piling on approximation after approximation, I wouldn't think so), but they do give some idea of their variance. Even though the average WS damage is close, there is slightly less variance in WS damage associated with King's Justice. (The "sample" means for both R.R. and K.J. damage were within single digits of one another.)

Tuesday, September 16, 2008

Magic damage calculations

As someone who's wasted time caring about magic damage on Ebony Puddings in a vain attempt to approach 9,000 exp/hr any day of the Vana'diel week, I've found the so-called magic damage formula to be quite useful in getting my hopes up. I've also wondered why no one ever bothered listing the "base damage" values ("V values" in the FFXIclopedia article) of offensive spells other than black magic for the sake of reference.

After further investigation, I managed to dig up a reference of "magic D values" not only for black magic but also for white magic and ninjutsu, along with some other sundry details. Earlier, I endeavored to calculate the D-value for the ni line of elemental ninjutsu, and I was happy to find out that I managed to get the right value (78). Interestingly, the san line has a D-value of 105 and a multiplier of 1.5, so you can pretty much dismiss claims of 400-500 damage from san spells as the result of targeting tiny mandies. (You certainly can't do that well on Ebony Puddings.)

As an example, suppose you had +40 MAB (24 from /BLM, 5 from Moldavite, 8 from Uggalepih Pendant, 3 from Denali Kecks), 40 INT from equipment and an elemental staff to boost damage further. If you're a taru with no INT merits (74 INT), your dINT will be 25 going up against an Ebony Pudding (INT 89). Based on this, I computed the damage to a pudding from a san spell to be 285. 400+ damage, where?

Going back to my previous rigmarole about great axe break weapon skills, I confirmed for myself that the unresisted duration of their debuffs should be 180/270/360 seconds corresponding to 100/200/300 TP; in a partially resisted case, the duration is lowered to 90/180/270 seconds, or by exactly 90 seconds. (Finally, I got a Full Break where the accuracy down was partially resisted and the other effects weren't. The time difference between accuracy down wearing off and the rest wearing off was exactly 90 seconds.)

It's kind of funny that these durations are easy to confirm compared to the degree of the debuff effects, yet the reported debuff effects are treated as gospel while the reported durations are obviously wrong now (allowing for the possibility that they used to be 3/4/5 minutes unresisted).

Based on what we know about the effects of the break weapon skills, I have to wonder whether the accuracy down effect of Blade: Kamu can be resisted (lowered duration). That would depend I suppose on whether Blade: Kamu has an associated element (probably earth, but who knows yet...). Break WSes are never used in practice, anyway, and I wouldn't count on Blade: Kamu either, especially if its effect is really accuracy -10 as listed on Studio Gobli currently. There are probably better things to do with your TP, especially subbing dancer in a soloing context.

Ninjutsu damage and break WS duration, two things that I've always wondered about and that I finally know more about. Hooray for utterly useless knowledge, yet arguably better than talking about income from chocobo racing or heaven forbid, gear I want to primp and preen in.

Monday, September 15, 2008

Break weapon skills

(I should let it go, but there are a few things I've always wanted a decent explanation for...)

Studio Gobli has summarized the properties of the new job-specific weapon skills, and the traits of some of these WS are intriguing if only in a tangential "oh, that could be useful in some situation that will never come to pass."

1) King's Justice (WAR) has a primary skillchain attribute of Fragmentation (thunder/wind), giving the warrior the ability to participate in a Light skillchain using a two-handed weapon without having to obtain Ground Strike (great sword). (Its secondary SC attribute is Scission or earth.)

2) Vidohunir (BLM) lowers magic defense (as though you would ever melee) and the duration of this effect is presumably 60/120/180 seconds with 100/200/300 TP.

3) Blade: Kamu (NIN) lowers accuracy and the duration of this effect is 60/90/120 seconds with 100/200/300 TP. Ninja can now participate in a Light skillchain (Fragmentation attribute), as though you'd want to. I assume Blade: Kamu is earth element given the accuracy down effect.

There are a few other WS with "duration of effect varies with TP" (RDM, SMN), but I'd really like to know whether these effects are explicitly stated in the chat logs when they actually process (even better, whether they're resisted or not). If they actually are, then why not do the same for the great axe "break" weapon skills?

I always disliked the fact that you have no idea whether the "[attribute] down" effect actually kicked in immediately after using one of the break WS. In practice, you have to look at the chat logs to see if the effect eventually wears off ("The [monster's] [Attribute] Down effect wears off"), implying that the effect was actually applied by the WS, and if you don't see any message, you usually end up concluding one of the following: (1) the effect was still active when the mob died (if the effect is really there, shouldn't it be obvious, you say?); (2) the mob used an ability/spell to override the effect, or; (3) the effect was never there to begin with.

Why would you care? Well, for those of you that proselytize low-level WARs into using Shield Break exclusively, you might actually want to know:

(1) how long the evasion down effect should actually last
(2) whether it actually works reliably on exp mobs of interest

Regarding (1), Studio Gobli lists durations of 3/4/5 minutes with 100/200/300 TP for all the break weapon skills, and this seems to have been generally accepted as true, if hardly widely known. Yet I just had to convince myself that this was true, and after messing around in Lufaise Meadows for a bit, I was surprised to find that, on the neighborhood mobs at least, the effect duration is more along the lines of 90/180/270 seconds with 100/200/300 TP, as summarized by the following graph:

I had no evidence of partial resists (just an all or nothing effect); this was all the data I got from break weapon skills where the effect was applied.

Most of the Shield Break results were from bees, which are supposedly weak to ice. The one Weapon Break was used on an orc, which is supposedly weak to water, and the Armor Break was used on a bugard, which isn't known to be weak to wind.

As for (2), one way to convince yourself that Shield Break should work, without waiting for a "wearing off" message that you may or may not see, is to target mobs that are known to be weak to ice. This is merely a rule of thumb, as Shield Break might work on mobs that aren't weak to ice (such as Goblins) because resists should come into play, and it's possible Shield Break won't work on mobs whose crystal drop actually contradicts a "known" ice weakness. Bugards, which drop fire crystals yet are putatively weak to ice, are one example (though admittedly not an exp target).

I myself was more interested in whether I could ever see a full complement of debuffs from Full Break on mobs that aren't weak to earth, to no avail:

Makara
113% TP
101 seconds
accuracy and evasion ONLY

Makara
114% TP
103 seconds
evasion ONLY

Gigantobugard
138% TP
nothing in 8 minutes!

Death Jacket
167% TP
150 seconds
accuracy, attack, and evasion (no defense)

Death Jacket
123% TP
111 seconds
accuracy and evasion ONLY

Gigas Warwolf
185% TP
nothing in 3 minutes!

Gigas Martialist
112% TP
100 seconds
evasion ONLY

Orcish Bowshooter
171% TP
154 seconds
attack and evasion ONLY

Note that earlier I said the above results applied only to the mobs in Lufaise Meadows... to my chagrin, I later obtained the following preliminary results from Bull Dhalmels in Buburimu Peninsula:

Bull Dhalmel
128% Full Break
230 seconds
evasion and defense ONLY

Bull Dhalmel
300% Full Break
360 seconds
evasion and defense ONLY

Bull Dhalmel
150% Full Break
135 seconds (Windsday)
attack and defense ONLY

Bull Dhalmel
107% Full Break
96 seconds (Windsday)
full debuff!

From this data alone, I could conclude that the duration of the debuffs from Full Break is up to 6 minutes at 300% TP. The other data, hmm...

So, what gives?

Did Studio Gobli use to be right? The evidence doesn't support 3/4/5 minutes at the moment.

Are there actually partial resists? Perhaps a partial resist reduces effect duration by 90 seconds flat regardless of TP level. The dhalmel data suggest an unresisted duration of 180/270/360 seconds with 100/200/300 TP. I propose a "partially resisted" duration of 90/180/270 seconds with 100/200/300 TP, which would match up with the results from Lufaise Meadows. But if those are really "resisted" durations, why didn't I ever see unresisted durations in Lufaise Meadows?

What other factors am I not considering (day of the week)? No, I didn't use a Martial Bhuj.
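One way to see how well 90/180/270 matches the Lufaise Meadows results: if duration scales linearly with TP between those points (an assumption on my part, since only 100/200/300 are listed anywhere), it works out to 0.9 seconds per 1% TP. The sketch below fits a line through the origin to the six Full Break results above where at least one debuff landed. The variable names are just for illustration.

```r
# Full Break results from Lufaise Meadows where at least one debuff landed
tp  = c(113, 114, 167, 123, 112, 171)   # TP (%)
dur = c(101, 103, 150, 111, 100, 154)   # seconds until the effects wore off

# Least-squares line through the origin: dur = slope * tp
# (linearity in TP is an assumption, not something the game states)
fit = lm(dur ~ 0 + tp)
slope = unname(coef(fit))

slope                    # fitted seconds per 1% TP
max(abs(resid(fit)))     # largest miss, in seconds
```

The two Windsday dhalmel results (150% for 135 seconds, 107% for 96 seconds) sit on the same 0.9 line, so they look like the Lufaise durations rather than the longer ones.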

Wednesday, August 6, 2008

Heat without light

Hypothesis testing pertaining to game mechanics is mostly a waste of time because the kinds of questions asked are mostly a waste of time.

For example, on the BG forums I found a proposal to determine whether the critical hit rate bonus on Senjuinrikio increases the critical hit rate of the first hit of Blade: Jin, involving testing on "too weak" mobs so that the mob dies on the first hit of the weapon skill. Two sets of data, one using Mamushito +1 (DMG 38), the other using Senjuinrikio (also DMG 38), are to be produced.

Never mind how tedious it would be to carry out such an experiment. Never mind the widely accepted conventional wisdom that Senjuinrikio's crit bonus does affect Blade: Jin. Let's pretend a formal statistical test is actually worth using.

Then the concern here is: what sample size (the same for each set of data) is sufficient to detect the effect of Senjuinrikio on Blade: Jin? To answer that question, it might help to do some prospective power computations for a test of two independent proportions (under the Neyman-Pearson paradigm of hypothesis testing). In other words, given a sample size that is the same for each group, what is the probability of rejecting the null hypothesis (Senjuinrikio has no effect) when the null hypothesis is false (so that you are inclined instead to accept that Senji has an effect)?

When determining a "suitable" sample size, one practical concern is that the variance of a (known) binomial proportion is a function of both the actual value of the proportion and n, the number of trials, and that when the number of trials is fixed, the variance is maximized for p = .5.

So not only can you control the sample size for your experiment, you can also attempt to keep your "base" critical hit rate as far away from 50% as possible in your attempt to achieve adequate statistical power. If you want to try to keep your sample size low (relatively speaking), and you know people willing to be your bitch on demand, you could try to conduct these tests in Salvage and, hell, throw in some Stumbling Sandals, too. (I honestly don't know what kind of critical hit rates you can expect in Salvage, though.)
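To put numbers on the variance point, here is a small sketch of p(1-p)/n over a grid of proportions. The value n = 200 is arbitrary and only there for illustration.

```r
# Variance of a sample proportion, p(1-p)/n, for a fixed number of trials
n = 200                                # illustrative sample size (an assumption)
p = seq(0.05, 0.95, by = 0.05)
v = p*(1 - p)/n

p_at_max = p[which.max(v)]             # proportion with the largest variance
p_at_max

# standard errors at 5% versus 20% base critical hit rate
se_05 = sqrt(.05*.95/n)
se_20 = sqrt(.20*.80/n)
c(se_05, se_20, se_20/se_05)
```

The standard error at a 20% base rate is nearly twice that at 5%, so the same 6% bump is harder to pick out of the noise.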

Assuming that Senji does actually have an effect on Blade: Jin, power curves of this two-sample test (Fisher's exact test in this case) for various null proportions (base critical hit rates) show that the power is lower for higher null proportions:

Let's say you'll accept a power of .80 given a Type I error of .05 ("false positive"). If your base critical hit rate is 20%, you'll need a sample size of 639 for each sample proportion. But if your base critical hit rate is 5%, you'll need a sample size of 278.

Of course, you can always accept a higher Type I error in exchange for lower sample sizes for a given level of power:

Given a Type I error of .10, if your base critical hit rate is 20%, you'll need a sample size of 475 for each sample proportion to achieve a power of .80. But, if your base critical hit rate is 5%, you'll need a sample size of "just" 211.

Again, this is all under the assumption that Senji does have an effect on Blade: Jin in the way that we expect.
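For a rough cross-check of these sample sizes, R's power.prop.test gives the normal-approximation answer for a one-sided test of two proportions. This is a different test from Fisher's exact test, which is conservative, so the numbers below should come out somewhat smaller than the ones quoted above. The sketch assumes Senji adds 6 percentage points to the base rate, and n_for is a made-up helper name.

```r
# Normal-approximation sample sizes (per group) for a one-sided test of
# two proportions at power .80. This is NOT Fisher's exact test, so expect
# smaller n than the figures quoted in the post.
# Assumption: Senji adds 6 percentage points to the base crit rate.
n_for = function(base, alpha) {
  power.prop.test(p1 = base, p2 = base + .06, sig.level = alpha,
                  power = .80, alternative = "one.sided")$n
}

n_approx = c(base20_alpha05 = n_for(.20, .05),
             base05_alpha05 = n_for(.05, .05),
             base20_alpha10 = n_for(.20, .10),
             base05_alpha10 = n_for(.05, .10))
ceiling(n_approx)
```

The ordering is the same as with the exact test: a lower base rate or a looser Type I error both bring the required sample size down.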

Let us suppose it is possible to achieve a base critical hit rate of 5% and, therefore, assume that the critical hit rate with Senji is 11%. Assuming this, here is some R code that estimates the power of the test of two proportions from 1,000 simulated experiments (as expected, Fisher's exact test will correctly reject the null hypothesis about 80% of the time), using n = 211 for each group:

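# simulate 1,000 experiments and record the one-sided p-value of each
p_value = rep(0,1000)
n = 211  # sample size per group

for (i in 1:1000) {
  a = rbinom(1,n,.05)  # crits without Senji (base rate 5%)
  b = rbinom(1,n,.11)  # crits with Senji (11%)
  c_table = matrix(c(a,n-a,b,n-b),nrow=2)  # columns: without, with; rows: crit, no crit
  p_value[i] = fisher.test(c_table,alternative="less")$p.value  # H1: odds ratio < 1
}

# estimated power: proportion of experiments rejecting at Type I error .10
power = sum(p_value < .1)/1000
power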