The Unbearable Triteness of Preening

Saturday, November 29, 2008

On sophistry

(Edit - Dec. 23: changed link to document.)

Two years too late, but I prepared some comments on this so-called "advanced analysis" of paralyze proc data, mainly concerning the statistical sophistry involved. (I really hope insights have been further developed since then.) Such are the perils of idleness. (I don't recommend that you continue reading further; you've been warned.) I address specific sections of the write-up (sections in boldface).

Introduction

The author claims that it is not desirable to maximize the duration of a paralyze effect. Instead, he (is it ever a she when bloviating about some B.S.?) seems to think that maximizing the number of processes (procs) per cast is the relevant goal. He cites two hypothetical situations where the durations are different yet the rate of procs per unit time is the same. He argues that the scenario with the shorter duration gives an opportunity to reapply a possibly stronger paralyze (higher rate of procs per unit time).

However, he proposed a model that assumes that MND, enfeebling magic skill, and a HQ staff have an effect (statistically significant or not) on both spell duration and the number of paralyze procs. So why not just model the rate of procs per unit time to begin with? The author argues we must "account for" (control for) the effect of duration (something we cannot directly control) so we can see how the controlled factors affect the number of procs directly within some varying time interval that is supposed to be under statistical control. But this is also modeling the rate of procs per unit time (when duration is controlled).

Finally, his "analysis" shows that the duration of the paralyze effect has the greatest effect on the number of procs (MND also does), which he considers unfortunate. However, it goes without saying (but I'll say it anyway) that you cannot change duration purposefully without changing some combination of MND and enfeebling skill (not to mention any omitted variables that may affect duration). (In most practical situations MP-users don't cast without elemental staves.) So what, exactly, did you expect?

Preliminary Analysis

Note that the presence of the 10 missing observations affects the calculation of the correlation matrix. The missing observations are excluded from the subsequent path analysis.

Path Analysis

First off, I must acknowledge that I have never used path analysis for anything, so as I become more familiar with it I may revise my comments later.

The pair-wise "sample" correlations between the so-called exogenous variables here, MND, enfeebling skill, and HQ staff, are meaningless as the variables are not random. (What multicollinearity?) I don't even know why they are indicated on the diagram other than to follow some rote procedure rigidly.

"Clustered ordinary least-squares (OLS) regression" is an oxymoron. Generally speaking, using a robust least-squares method of estimation is a departure from what is ordinarily done. Furthermore, the justification for "clustered robust" LS estimation--that observations within each group (naked, enfeebling, MND, etc.) are not independent--is not valid. The author attributes lack of independence of observations within groups to the "experimental setup of this test," but there is absolutely nothing in the description of the "experimental setup" that suggests this should be so. Autocorrelation is not an issue. (Why would catoblepas build up resistance to paralyze anyway?) But even if it were, a "clustered robust" method cannot account for that. What he basically did was control for group effects twice, which is absolute nonsense and has no effect on his parameter point estimates anyway. (The coefficient of determination, R², is the same whether improperly accounting for nonexistent "clustering" or not.)

There is also the issue of not controlling for test subject (monster), but regardless of the magnitude of the effect of test subject, this concern is not discussed while comparatively more frivolous concerns are. To wit, the author's irrelevant aside about Bayesian inference has nothing to do with the use of BIC here, even though he is not really doing model selection but providing cover for arguing that MND may be a more "important" predictor of duration than the use of a HQ staff.

That cover is rather weak though since individual (non-simultaneous) interval estimates for the "standardized coefficients" are rather wide in the model that the author actually "chose":

MND: (.086, .404)
skill: (.017, .337)
staff: (.057, .377)

Now consider the second regression (modeling number of procs). Again, the author uses completely inappropriate clustered robust linear regression, which leads him to trump up enfeebling skill as highly significant. In reality, the enfeebling effect is barely significant at the 5% level, hardly convincing evidence of a real effect (if it exists, which I doubt). Moreover, something fishy could be going on with the last set of observations. If you omit those from the analysis, the enfeebling effect does not even approach significance. But the data are what they are.

Discussion

Again, the author fails to recognize the imprecision of his parameter estimates (standardized beta coefficients) despite curiously devoting time earlier to a frivolous comparison of two population correlations in Appendix A.

Today, it may be "commonly known" that MND does affect the accuracy of a MND-based magic spell in some way, but arguing that MND has a relatively stronger effect on paralyze duration (a measure of accuracy) than enfeebling skill on the basis of standardized effects is spurious because of the poor parameter estimates and because of the interpretation. Obviously, the main effects are not random variables, so their associated standard deviations don't have any particular meaning as they are just an artifact of experimental control.

Consider the interpretations in real-unit terms. From the first linear regression, the duration is estimated to increase by 6.38 seconds for every 22.8-point increase in MND (controlling for the other main effects). Similarly, the duration is estimated to increase by 4.93 seconds for every 14.4-point increase in enfeebling magic skill (controlling for the other main effects). Point for point, enfeebling magic skill is more effective than MND, and I don't know anyone who would argue for a comparison other than by a per-point basis.
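The per-point comparison is simple division, sketched here in R. The only inputs are the four numbers quoted above; the variable names are mine and purely illustrative.

```r
# Estimated duration gains and the increments they correspond to (from the text above)
gain_sec  <- c(MND = 6.38, enfeebling = 4.93)   # seconds of paralyze duration
increment <- c(MND = 22.8, enfeebling = 14.4)   # points of each attribute

# Seconds of duration gained per single point of each attribute
per_point <- gain_sec / increment
round(per_point, 3)

# Per-point effect of enfeebling skill relative to MND
per_point[["enfeebling"]] / per_point[["MND"]]
```

That works out to roughly 0.28 seconds per point of MND against roughly 0.34 seconds per point of enfeebling skill, taking the point estimates at face value.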

Certainly, there are distinct levels of resists, but there is no reason to believe that HQ staves have a privileged role in determining the distribution of partial resists any more than other factors that affect magic "hit rate," especially since magic accuracy bonuses for both NQ and HQ staves have been estimated.

As for unexplained variability in the number of procs, the author provides a laundry list of possible explanatory factors, none of which are as important as the ones under one's direct control. (Do you do anything only during specific moon phases?)

Reaction and criticism (not in the write-up)

These people had the temerity to broadcast this "analysis" on both Allakhazam and Killing Ifrit.

On Allakhazam, you typically had the usual sucking off. Not unexpectedly, a reasonable objection was raised about the relationship between duration and number of procs. It seems practical enough to consider an increase in duration (holding other factors constant) as increasing the number of procs that are observed. The exogenous factors (MND), on the other hand, actually affect the potency of paralyze (proc rate), also measured as the number of procs, but holding duration constant. But instead of recognizing this line of reasoning, these numbnuts hid behind numbers (and statistics) without even thinking about how to interpret effects and the implications of their "analysis." (This is actually all too common for all the so-called "mathematicians" on Allakhazam.)

On Killing Ifrit, there were a few somewhat naïve criticisms of the experimental design (all from the same poster). Yes, it would be nice to use more than two levels of each independent variable, but there is no compelling case for a nonlinear trend. Again, generating standardized effects for each predictor is a pointless exercise for this data (as discussed previously). A multi-factor ANOVA is superfluous as you can construct simultaneous confidence intervals for the parameter estimates from regression (in general). Sample size and power are brought up, but concern for "too much power" (with excessive sample sizes) is simply a trivial objection.

Alternative (not in the write-up)

I don't have any particular objection to path analysis per se. The low-hanging fruit are that the statistical procedures are questionable, the write-up mired in irrelevant details and the interpretations awkward.

Let us return to the original motivation for the path "analysis." Modeling proc rate was criticized (false distinction between that and number of procs when controlling for duration) but the interpretations involved in path analysis concern proc rate anyway. (Potency must be a proc rate. This is beyond dispute.) So why not model the proc rate directly? (And if you care so much about modeling duration too, you can regress that on your favorite predictors. No one's stopping you.)

It seems natural enough to use Poisson regression to model proc rate, and I carried out this procedure in R (output below):


Call:
glm(formula = proc ~ MND + enfeebling + staff + iceday + offset(log(duration)), 
    family = poisson, data = paralyze)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2293  -0.8721  -0.0776   0.6353   3.0779  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -6.260517   1.120601  -5.587 2.31e-08 ***
MND          0.007941   0.002259   3.516 0.000439 ***
enfeebling   0.008038   0.003644   2.206 0.027404 *  
staff        0.047913   0.107654   0.445 0.656271    
iceday      -0.027394   0.114415  -0.239 0.810772    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 155.82  on 139  degrees of freedom
Residual deviance: 138.19  on 135  degrees of freedom
AIC: 510.53

Number of Fisher Scoring iterations: 5

The model deviance indicates that this model is an acceptable fit to the data. (Note: I facetiously specified an Iceday effect in the model.) Controlling for other factors, proc rate is estimated to increase by .797% for every one-point increase in MND. Note that the z-values are similar to the t-values using OLS estimation.
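For anyone who wants to reproduce the percent interpretation, here is a minimal sketch that works only from the numbers printed in the summary above (the raw data are not needed). The Wald intervals and the deviance check are my own additions and are approximate; in particular, the chi-square approximation to the residual deviance can be rough when counts are small.

```r
# Estimates and standard errors copied from the glm summary above
est <- c(MND = 0.007941, enfeebling = 0.008038, staff = 0.047913, iceday = -0.027394)
se  <- c(MND = 0.002259, enfeebling = 0.003644, staff = 0.107654, iceday = 0.114415)

# With a log link, exp(coefficient) is a rate ratio; express it as a percent change
pct_change <- 100 * (exp(est) - 1)

# Approximate (Wald) 95% intervals on the same percent scale
z <- qnorm(0.975)
pct_lower <- 100 * (exp(est - z * se) - 1)
pct_upper <- 100 * (exp(est + z * se) - 1)
round(cbind(pct_change, pct_lower, pct_upper), 3)

# Rough goodness-of-fit check: residual deviance against a chi-square on its df
gof_p <- pchisq(138.19, df = 135, lower.tail = FALSE)
gof_p
```

A large value of gof_p is what "acceptable fit" means here: the residual deviance is about what you would expect for its degrees of freedom.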

Monday, November 24, 2008

Crystal Stakes collapse

Note: this is a profanity-laced rant.

On November 4, the day of the last server maintenance, my winning rate in C1 races was 32/68. Since then, the results have been atrocious: 5 1st place finishes, 8 2nd or 3rd place finishes, and 3 below 3rd place. For perspective, in the first 68 races, I placed worse than 3rd only once. Or, to put it in terms that really make me incensed as I write this, a loss of chocobucks between 542 and 764 in sixteen fucking races, or 33.9 to 47.8 per fucking race, or between 72 and 103 fucking minutes wasted per race farming chocobucks for basically jack shit.

It's not merely that I have been losing but that losing more often entails dealing more frequently with a timesink that is in place basically to deter RMT. But so be it... not that I don't have methods of dealing with it.

During this period, aside from one uncontested race (and I didn't even finish first), all the C1 races I've entered (and recorded the toteboards for) have involved at least one other PC chocobo. Rationally, I must acknowledge that C1 races are more hotly contested than ever. Irrationally, I am pissed off that the same chocobos keep placing first over me (and to rub more salt in the wound, I end up placing behind garbage SS/SS/B/F chocobos and "off the podium"), even ones with nominally the same attribute profile as mine. Literally, the same SS/B/B/B chocobo has placed first 3 times against me while I have gotten 5 first-place finishes in three weeks. (Interestingly, I've observed that chocobo was raised with an enlarged beak, for what it's worth. I might even laugh if that owner read this blog to get some ideas.) It would be even more crackpot and solipsistic to associate this string of poor results with the latest server maintenance, but you can't count the FFXI "dev team" out for fucking with its players without even being upfront about it, especially that fat-fuck CoP director.

Sure, in the long run things may even out all things being equal (all things being equal is a huge assumption, not knowing what saddles they are using), if I even get a chance to even things out. But this is FFXI the zero-sum MMORPG, where illiterate, proudly ignorant, gloating motherfuckers get to rake it in and bolt once they get theirs (fuck the rest!) while you get jack shit for the same amount of "effort." Even in chocobo racing.

Estimating changes in magic hit rate with skill

For the purposes of estimating melee hit rate, the functional relationship among accuracy, dexterity, combat skill, mob level, and mob evasion has long been established, thanks to the clever use of the check function. Sadly, no such relationship has really been justified for magic "hit rate" (or resist rate), but that doesn't mean we are condemned to flail in the dark.

Having wondered myself about the utility of meriting elemental magic skill for the purposes of reducing the frequency of resists on "hard stuff," I looked for some information on the relationship between magic skill and resist rate, but solid evidence was hard to come by. Fortunately, after wading through senseless conjecture on BG, I managed to come across an interesting data set for which the "success" rates of casting magic on Ebony Puddings were recorded, given specific levels of elemental skill, magic accuracy, and INT. Even better, this data all but invites me to take a swing at it using some kind of linear regression analysis.

But first, if the factors that go into "magic hit rate" (rate of success or rate of no resists) are similar to those that go into melee hit rate, there are several issues that immediately come to mind when trying to suss out some kind of relationship, such as the relationship between magic accuracy and magic resistance (or evasion?). (Dec. 15: I wrote "Is a ratio involved, as is the case with melee accuracy and melee evasion?" which is incorrect. I probably was thinking of MAB/MDB, but that would be analogous to melee attack and defense.) Furthermore, even if magic resistance/evasion were constant among the flans on Mount Zhayolm, there is a range of levels for Ebony Puddings (supposedly 75-80 on Mount Zhayolm), and if a "magic hit rate" calculation involves a level correction, there is no practical way to account for that.

Still, looking specifically at the nuke data (tests II, III, IV), there appears to be some evidence of a linear association between magic skill alone (holding other relevant factors constant) and success rate. You can do your own plot if you're not convinced.

But as far as magic accuracy is concerned, there are only three combinations of magic accuracy and elemental skill where the success rate was measured. One may argue that magic accuracy seems to be less effective at 242 elemental skill than at higher levels of skill, which may seem persuasive (random variability and unaccounted sources of variability notwithstanding). Really, though, it's a reach to conclude from the limited data here that the effects of elemental skill and magic accuracy interact.

Finally, INT seems to have no effect at 242 elemental skill, yet has some effect in large quantities at 274 skill. Maybe it's not all that far-fetched to say that the effect of INT on magic hit rate is dependent on magic skill level, which can compromise the estimates associated with a regression analysis. Even worse, perhaps the relationship between INT and magic hit rate (holding other factors constant) is not strictly linear but follows some weird piecewise function depending on your target mob's INT. This calls attention to the need for more data at other levels of INT, macc, and elemental skill (or perhaps a better choice of target whose level and magic resistance value is known to be fixed, but in practice this will be extremely difficult to achieve).

At any rate, using linear regression (with whether each cast landed unresisted as the binary response) on the above observations (ignoring the middle rows of test II because they contribute to a poor model fit) gives the following parameter estimates (I truncated output to save space):

                     Standard   Wald 95% Confidence
Parameter  Estimate     Error         Limits         Pr > ChiSq

Intercept   -1.9393    0.1872   -2.3062     -1.5724      <.0001
skill        0.0095    0.0007    0.0082      0.0109      <.0001
macc         0.0147    0.0022    0.0103      0.0190      <.0001
int          0.0028    0.0010    0.0009      0.0047      0.0038

I included both INT and magic accuracy in the model just for the heck of it even though the parameter estimates associated with them aren't all that reliable. Certainly, including more observations with varying levels of INT and magic accuracy may improve those estimates (assuming magic hit rate is linear over some range of either factor), and they should be included in a model for the sake of a comprehensive view of magic hit rate. But for now, we can see that the data suggest that magic hit rate increases by about 1 percentage point (estimate 0.95, 95% interval 0.82 to 1.09) for every one-point increase in elemental magic skill (holding INT and magic accuracy fixed). The range of elemental magic skill considered is between 242 and 295.
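To put the skill estimate on a more familiar scale, the following sketch converts it to percentage points and extends it over the range of skill in the data. It uses only the numbers in the table above and assumes the straight-line relationship holds across the whole range, which is an assumption, not something the output proves.

```r
# Estimate and Wald 95% limits for skill, copied from the table above
skill_est   <- 0.0095
skill_limit <- c(lower = 0.0082, upper = 0.0109)

# Per-point change in hit rate, in percentage points
100 * skill_est
100 * skill_limit

# Implied change across the range of skill in the data (242 to 295),
# assuming the linear relationship holds over the whole range
skill_range  <- 295 - 242
range_change <- 100 * skill_est * skill_range
range_change
```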

One can also perform a similar analysis with the Sleep trials (tests V and VI), but note that the "success" rate encompasses partial resists also:

                     Standard   Wald 95% Confidence
Parameter  Estimate     Error         Limits         Pr > ChiSq

Intercept   -1.3636    0.5853   -2.5108     -0.2164      0.0198
skill        0.0056    0.0018    0.0021      0.0091      0.0016
macc         0.0085    0.0025    0.0035      0.0134      0.0008

It seems that the effects of magic skill (enfeebling in this case) and magic accuracy are weaker for sleeping than for nuking. (Granted, the interval estimates are rather wide.) The range of enfeebling magic skill is between 307 and 333. It's possible that the acts of sleeping and nuking are just not comparable (unlikely) with respect to resist rates. It's also possible that the effects of general magic skill and accuracy on magic hit rate are diminished past the 300 level of general magic skill. Either way, this complicates understanding of magic hit rate somewhat and steps can be taken to rule out either explanation.

It hasn't escaped my attention that magic accuracy seems to increase magic hit rate more than magic skill, ignoring the wide interval estimates. If this is really the case, the difference is so slight and direct competition between the two attributes so rare that it's not worth caring about. Even comparing Oracle's Robe (magic accuracy +6) to Igqira Weskit (elemental magic skill +5), I would first argue the benefits of using Oracle's Robe to replace both Errant Houppelande (like anyone cares about the elemental enfeebling line) and Igqira Weskit. The HP+20 for Sorcerer's Ring activation can be useful, too.

It also occurred to me that one may try to argue, in analogy to melee accuracy and melee hit rate, that these data support the contention that each point of magic skill increases magic hit rate by 0.9% above the 200 skill level (1% at or below 200), although it is ludicrous to distinguish between 0.9% and 1% based on random data without excessive sample sizes.

But, if all you cared about was estimating the change in magic hit rate for every one-point increase in elemental skill, you might as well focus on the change in magic hit rate between two levels of elemental skill that are relatively far apart, assuming the rate of change is constant (in other words, a linear relationship between hit rate and skill), an assumption that is borne out by the previously considered data.

The regression analysis for the nuke data used 1,400 total trials; these trials could be allocated equally between, say, 242 skill and 292 skill. Then you'll have an easier time showing that the increase in magic hit rate is less than 50% (less than 1% per point of elemental skill). (Use a test for two proportions.)
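A sketch of what that test might look like in R follows. The counts are made up purely for illustration (they are not from any data set), and the one-sided z test against a difference of 0.50 with an unpooled standard error is my choice of procedure, not the only reasonable one.

```r
# Hypothetical allocation: 700 casts at each of two skill levels
n_low  <- 700   # casts at 242 skill
n_high <- 700   # casts at 292 skill

# Made-up counts of unresisted casts, for illustration only (not real data)
x_low  <- 252
x_high <- 574

p_low  <- x_low / n_low
p_high <- x_high / n_high
delta  <- p_high - p_low

# One-sided z test of H0: difference >= 0.50 against H1: difference < 0.50
# (0.50 corresponds to 1% per point over a 50-point gap), unpooled standard error
se      <- sqrt(p_low * (1 - p_low) / n_low + p_high * (1 - p_high) / n_high)
z       <- (delta - 0.50) / se
p_value <- pnorm(z)

c(difference = delta, z = z, p_value = p_value)
```

With these invented counts the observed difference is 0.46, and the test rejects a 50-point increase at the 5% level, though not by much.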

Saturday, November 8, 2008

Aggressor and double attack merits

After meriting on greater colibri for a bit, I was wondering whether I would be "better off" had I merited double attack to level 5 instead of Aggressor recast. (Unsynchronized Berserk and Aggressor timers would be really annoying though.) This May 2007 discussion comparing Aggressor and double attack merits shows, despite the muddled presentation, a situation where fully merited double attack is more effective than fully merited Aggressor recast, since Aggressor supposedly provides an accuracy bonus of 25, which corresponds to only a 12.5% hit rate increase (on average). However, we might be interested in the magnitude of difference between the two Group 1 schemes, which is more difficult to quantify.

One approach is to calculate the average number of attack rounds to reach 100 TP for both 5 DA/0 Aggressor and 0 DA/5 Aggressor. (The number of attack rounds is independent of specific damage values.) Of course, the relative effectiveness of Aggressor is higher when your hit rate is lower, as is usually the case when targeting anything more difficult than greater colibri. Then it might be useful to compare max DA and max Aggressor for lower levels of a baseline hit rate.

Ultimately we want to know what the differences in long-run "damage over time" are, but first we can look at the average number of attack rounds, as that is an indirect measure of time. (Assume number of seconds per attack round is constant.) Unfortunately, an analytic expression of the average number of attack rounds to reach 100 TP is too annoying to derive primarily because the number of attack rounds needed to reach 100 TP depends on the TP return of the previous weapon skill, which is almost never zero for a multi-hit weapon skill with a decent hit rate. The number of hits to 100, given initial TP, seems basically to follow a Poisson process, but I'd rather not worry about cumbersome calculations. Therefore, I resorted to simulation to generate the following approximate values based on my warrior setup (varying the Group 1 merit configurations, obviously), given baseline hit rate and the use of a 3-hit weapon skill (Raging Rush or King's Justice):


Average number of attack rounds given baseline hit rate

       5/0   2/4   0/5
0.2   20.19 19.81 19.87
0.3   14.60 14.52 14.61
0.4   11.40 11.39 11.47
0.5    9.31  9.33  9.41
0.6    7.83  7.86  7.95
0.7    6.73  6.78  6.84
0.75   6.29  6.33  6.39
0.8    5.88  5.93  5.99
0.825  5.71  5.74  5.81

Here, the first column corresponds to baseline hit rate (before the Aggressor bonus), and the next three columns correspond to different Group 1 merit configurations:

"5/0": 5 double attack, 0 Aggressor
"2/4": 2 double attack, 4 Aggressor (mine)
"0/5": 0 double attack, 5 Aggressor

Then, we can obtain values representing "damage over time" in terms of hits per round, given the baseline (or nominal) hit rate:


Average number of hits per round given baseline hit rate

       5/0   2/4   0/5
0.2   0.336 0.341 0.340
0.3   0.458 0.460 0.456
0.4   0.580 0.579 0.573
0.5   0.702 0.698 0.691
0.6   0.819 0.818 0.811
0.7   0.946 0.935 0.925
0.75  1.006 0.996 0.983
0.8   1.069 1.054 1.041
0.825 1.097 1.085 1.071

The max DA configuration is already about even with max Aggressor at 30% baseline hit rate, and it really starts to pull away as the baseline hit rate increases (especially after the point where Aggressor does not provide the full accuracy bonus, past 82.5% hit rate), so to me there is scant justification for 5/5 Aggressor. This makes sense as fully merited Aggressor provides an average 1.5% hit rate increase over non-merited Aggressor, which pales in comparison to the increase in "damage over time" that can be conferred by 5 double attack in the presence of high levels of accuracy. This analysis doesn't account for multi-hit weapons such as Ridill and Joyeuse, but the relative differences between 5/0 and 0/5 should still favor 5 DA merits even though the gap may close. And of course, this post doesn't account for actual damage per hit, but DA and hit rate are "independent" of damage per hit anyway (hits/time × damage/hit = damage/time!) and it's not that much of a reach to estimate real "damage over time" by factoring in an average damage per hit.
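The comparison between the 5/0 and 0/5 columns can be made explicit with a few lines of R. The values are simply retyped from the hits-per-round table above, so they carry whatever simulation noise the table has.

```r
# Simulated hits per round, copied from the table above
baseline <- c(0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75, 0.8, 0.825)
hits <- cbind(
  "5/0" = c(0.336, 0.458, 0.580, 0.702, 0.819, 0.946, 1.006, 1.069, 1.097),
  "2/4" = c(0.341, 0.460, 0.579, 0.698, 0.818, 0.935, 0.996, 1.054, 1.085),
  "0/5" = c(0.340, 0.456, 0.573, 0.691, 0.811, 0.925, 0.983, 1.041, 1.071)
)
rownames(hits) <- baseline

# Percent advantage of max DA (5/0) over max Aggressor (0/5) at each baseline hit rate
advantage <- 100 * (hits[, "5/0"] / hits[, "0/5"] - 1)
round(advantage, 2)
```

On these values the 5/0 advantage goes from slightly negative at a 20% baseline to a bit over 2% at 82.5%.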

I found it helpful to plot attack rounds vs. hit rate to illustrate that the average number of attack rounds to 100 TP levels off as hit rate increases:

Obviously the rate of change in the number of attack rounds to 100 TP is decreasing in magnitude (but is still negative) with hit rate. But the number of attack rounds is not a direct measure of damage over time. Damage over time is a ratio of, yes, damage over time. The number of attack rounds is a proxy for time, and is not a ratio.

The number of hits, given the number of attack rounds, on the other hand, is a measure of damage, so dividing the number of hits by the number of attack rounds gives a quantity that can stand in for "damage over time," as plotted below vs. nominal hit rate:

Of course, there is no reason to plot such a thing because intuitively the rate of change of hits/round must be constant (we're plotting hit rate vs. hit rate!), especially if you believe that 2 points of accuracy always corresponds to 1% hit rate between 20% hit rate and 95% hit rate. If you do, it's complete nonsense to speak of damage over time showing "diminishing returns" to hit rate. Hit rate leveling off with accuracy in some logistic fashion is another story though.

Wednesday, October 29, 2008

Double attack and weapon skills, part 2

So many "known" things about random mechanics in FFXI seem poorly substantiated due to a lack of data, bad methodology when data are collected, and poor or non-existent analysis and interpretation after the data collection. Then again, it's not as though you really need to know, say, how many hits per attack round you can expect from a Kraken Club. Even if you have one, such considerations are beside the point.

That said, it's almost delightful to see some real data (not some useless parse), and even better when there are some easily tested hypotheses that follow from the purpose of the data collection. This thread on double attack during WS generated some interesting speculation about how many times double attack can process based on the data gathered but ventured no further, and no one really provided and tested a model of how DA interacts with weapon skills, the closest being a proposal that Penta Thrust may receive up to 3 DA "checks" per WS.

This proposal followed from data collection on TP return (a measure of number of hits in a WS) for Penta Thrust, which is summarized as follows:

10% DA rate (warrior subjob)
95% hit rate (lv 73 dragoon vs. lv 47-54 diatryma)

196 total WS

3 hits: 3 (.015)
4 hits: 42 (.214)
5 hits: 120 (.612)
6 hits: 30 (.153)
7 hits: 1 (.005)

However, I am not interested in seeing whether a "3 DA check" model is a good fit to the data since it is "known" that double attack cannot proc more than twice on a WS. (I hope this is a correct assumption. Besides, it doesn't seem likely that people who love to jack off to WS damage, and make their obnoxious asses known on popular FFXI forums, wouldn't run their mouths about an 8-hit Penta Thrust. Sometimes the persistent absence of evidence is strong evidence--NOT PROOF--of absence.) Rather, I'm looking to clarify how exactly double attack can proc twice at most based on my previous post.

As a reminder, I proposed the following models for how DA might work with WS: (1) double attack can proc twice on specific hits of the WS (thought to be the first two hits per FFXIclopedia), and (2) double attack may proc a maximum of two times on a WS (not restricted to specific hits). Is it even possible to tell the difference between these two models for Penta Thrust, given only 10% DA rate?

Fortunately, the probability distributions under each model are fairly easy to calculate (the calculations for Penta Thrust are similar to those for 3-hit WS last time) and are summarized in the following graph:

The difference between the two is fairly stark, so it wouldn't take that much data to support one over the other, assuming either one is true. In particular, the difference between the two models is most pronounced for the 6-hit and 7-hit cases. A sample proportion of .153 for 6-hits is very unlikely for the "2 DA maximum" model, where the theoretical proportion is .262. The "DA 2 hits only" seems a decent fit, so run with that.
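The 6-hit comparison can be checked numerically. The sketch below restates both models as a convolution of landed normal hits with landed extra hits; the function name ws_pmf and its arguments are mine, and it assumes (as in the calculations here) that an extra hit requires a DA proc that also lands, with probability hit rate times DA rate.

```r
# Probability of 0..(y+2) landed hits for a y-hit WS (y >= 2 for "two_hits")
# p1 = hit rate, p2 = double attack rate
ws_pmf <- function(y, p1, p2, model = c("two_hits", "max_two")) {
  model  <- match.arg(model)
  q      <- p1 * p2                    # an extra swing occurs and lands
  normal <- dbinom(0:y, y, p1)         # landed normal hits: 0..y
  if (model == "two_hits") {
    extra <- dbinom(0:2, 2, q)         # DA checked on two specific hits only
  } else {
    b     <- dbinom(0:y, y, q)         # DA checked on every hit...
    extra <- c(b[1], b[2], sum(b[-(1:2)]))  # ...but capped at two extra hits
  }
  pmf <- numeric(y + 3)
  for (k in 0:y) {
    for (e in 0:2) {
      pmf[k + e + 1] <- pmf[k + e + 1] + normal[k + 1] * extra[e + 1]
    }
  }
  pmf
}

# Penta Thrust sample summarized above: 30 six-hit results in 196 weapon skills
n       <- 196
pmf_two <- ws_pmf(5, 0.95, 0.10, "two_hits")
pmf_max <- ws_pmf(5, 0.95, 0.10, "max_two")
p6_two  <- pmf_two[7]   # index 7 holds the probability of exactly 6 hits
p6_max  <- pmf_max[7]

# Chance of seeing 30 or fewer 6-hit results under each model
tail_two <- pbinom(30, n, p6_two)
tail_max <- pbinom(30, n, p6_max)
c(p6_two = p6_two, p6_max = p6_max, tail_two = tail_two, tail_max = tail_max)
```

The theoretical 6-hit proportions come out near .135 and .262, and 30 or fewer 6-hit results in 196 tries is a far-tail event only under the "2 DA maximum" model.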

The FFXIclopedia article on DA was changed February 10 of this year to state that DA can activate on the first two hits only (instead of being able to proc twice at most and on any of the hits). Aside from the fact that one cannot distinguish between the 4 ways DA can proc consecutively on two hits in Penta Thrust (saying it procs on the first two hits is nothing more than a guess if you don't know how it's programmed), I wonder if that change was motivated by the evidence of sample data or if it was just a shot in the dark. At least I found some evidence for that.

If you are interested in playing around with the probabilities of the number of hits for your favorite multi-hit weapon skill, the following is some R code I wrote to generate them. You can change p1 (hit rate), p2 (double attack rate), and y (number of normal hits in the WS) to suit your particular situation. Some slight modification would have to be made to isolate the probability of the first hit occurring for the purposes of calculating average WS damage (where fTP isn't 1.0).


# p1 - hit rate
# p2 - double attack rate
#  y - number of normal hits in the weapon skill

p1 = .95
p2 = .15
y = 2

# total hits can be 0, 1, ..., y+2, so each vector needs y+3 entries
# an extra hit requires a DA proc that also lands (probability p1*p2)

# double attack can process on only two hits

p_2x = rep(0,(y+3))
for (i in 0:(y+2)) {
  k = max(i-2,0):min(i,y)   # possible numbers of landed normal hits
  p_2x[i+1] = sum(dbinom(k,y,p1)*dbinom(i-k,2,p1*p2))
}

# double attack may process a maximum of two times

p_max = rep(0,(y+3))
for (i in 0:(y+2)) {
  if (i < 2) {
    # fewer than two hits in total: zero or one extra hit
    k = max(i-2,0):min(i,y)
    p_max[i+1] = sum(dbinom(k,y,p1)*dbinom(i-k,y,p1*p2))
    next
  }

  # exactly two extra hits: the second DA success arrives within y chances
  p_max[i+1] = dbinom(i-2,y,p1)*sum(dnbinom(0:(y-2),2,p1*p2))

  # zero or one extra hit (impossible when i = y+2)
  if (i != (y+2)) {
    k = (i-1):i
    p_max[i+1] = p_max[i+1] + sum(dbinom(k,y,p1)*dbinom(i-k,y,p1*p2))
  }
}

# probability mass functions (entries correspond to 0, 1, ..., y+2 hits)

round(p_2x,10)
round(p_max,10)

# expected number of hits

hit = seq(0,(y+2))
exp_hit_2x = sum(hit*p_2x)
exp_hit_max = sum(hit*p_max)

exp_hit_2x
exp_hit_max

Some checks: for a 2-hit WS, the two models are indistinguishable. As DA tends to 0%, the two models are indistinguishable in the limit. (The negative binomial distribution is degenerate when p2 = 0.) When the hit rate is 100%, the number of hits is never less than the number of normal hits.

Friday, October 24, 2008

Double attack and weapon skills

Previously, I estimated the average damage of both Raging Rush and King's Justice for my character on lv 82 greater colibri (link), but there was one major unmentioned assumption I made concerning how the double attack trait processes on weapon skills.

Suppose that "conventional wisdom" assumes that double attack can proc twice, at most, on a WS (I haven't seen any evidence that DA can proc more than twice). Under this assumption there are two possibilities: (1) double attack can proc on only two specific hits of the WS (given 2 or more normal hits in the WS; these are usually thought to be the first two hits of the WS); and (2) double attack may proc a maximum of two times on a WS, on any of its hits. Which one is it?

There is a subtle difference between the two "hypotheses." If DA can proc on any hit in a multi-hit weapon skill, there are more opportunities for DA to proc twice (when the number of normal hits in the WS is greater than 2) than there would be if DA is limited to proc on specific hits in the WS. Intuitively, if the number of normal hits in the WS is greater than 2, there will be, on average, more WS hits under the second hypothesis even in the presence of a cap to exclude 3+ DA procs.

If you aren't convinced, the following probability exercise will help. Suppose I'm looking at a 3-hit WS (examples: Raging Rush, King's Justice, Blade: Jin, Tachi: Rana) and I want to know the probability of seeing n hits (n = 0, 1, ..., 5) in one WS, given my DA level. Assume 95% hit rate.

Since DA procs are independent of normal hits (in the sense that normal hits must occur in a WS even if they miss), it's simple to calculate these probabilities when DA must proc on only two hits in the WS. Here, the second DA proc is assumed to be independent of the first DA proc, and vice versa. For the other case, the DA procs are dependent, so the calculations are less simple, but they can be done.

When the DA rate is 10%, the probability distributions for both cases are illustrated as follows:

People are more likely to notice 5-hit results than other results, but in either case the probability of observing a 5-hit is pretty low. However, under "2 DA maximum" there are more opportunities for DA to proc (even if there is a 2-DA cap). The expected number of hits is 3.04 for "DA two hits only," and 3.13 for "2 DA maximum."

If you increase your DA rate, the expected number of hits for a WS should always increase (you will see relatively more 4- and 5-hit WSes), and this is the case going from 10% DA to 19% DA:

The expected number of hits is 3.211 for "DA two hits only," and 3.39 for "2 DA maximum." Given 19% DA, it is now fairly easy to distinguish between the two hypotheses, and collecting enough sample data on n-hits of a 3-hit WS should provide evidence in favor of one or the other.
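To get a feel for how much data "enough" might be, it helps to look at expected tallies under each hypothesis. The following is a rough sketch under the same independence assumptions as above: the total number of hits is treated as the sum of normal hits landed and double attack hits landed. The sample size n_ws and the helper names convolve_pmf and loglik are made up for illustration.

```r
p1 <- 0.95; p2 <- 0.19; y <- 3
q  <- p1 * p2                        # chance a double attack hit procs and lands

normal      <- dbinom(0:y, y, p1)    # normal hits landed: 0..y
da_two_hits <- dbinom(0:2, 2, q)     # DA restricted to two specific hits
da_two_max  <- c(dbinom(0:1, y, q),  # DA on any hit, capped at two
                 pbinom(1, y, q, lower.tail = FALSE))

# distribution of the sum of two independent counts
convolve_pmf <- function(a, b) {
  out <- numeric(length(a) + length(b) - 1)
  for (i in seq_along(a)) for (j in seq_along(b)) {
    out[i + j - 1] <- out[i + j - 1] + a[i] * b[j]
  }
  out
}

pmf_hits <- convolve_pmf(normal, da_two_hits)
pmf_max  <- convolve_pmf(normal, da_two_max)

n_ws <- 1000                         # hypothetical number of weapon skills recorded
expected <- round(rbind(two_hits_only = pmf_hits, two_da_max = pmf_max) * n_ws, 1)
colnames(expected) <- 0:(y + 2)
expected

# log-likelihood of your own tally of 0-, 1-, ..., 5-hit results under a pmf
loglik <- function(counts, pmf) sum(counts * log(pmf))
```

With a real tally in hand, comparing loglik(counts, pmf_hits) against loglik(counts, pmf_max) is one simple way to see which hypothesis the data favor. The 5-hit column is where the two rows differ most in relative terms.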

If you can manage to push your DA rate even higher (through merits or elsewhere; I myself have 2 DA merits), the difference between the two hypotheses becomes more stark. Consider when DA is 22%:

The expected number of hits is 3.268 for "DA two hits only," and 3.47 for "2 DA maximum."
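The expected values quoted above can be recomputed without building the full distributions. This is a minimal sketch, assuming (as in the calculations here) that each double attack hit both procs and lands with probability hit rate times DA rate; the function name expected_hits is made up for illustration.

```r
expected_hits <- function(p1, p2, y) {
  q <- p1 * p2                                        # DA hit procs and lands
  da_capped <- sum(pmin(0:y, 2) * dbinom(0:y, y, q))  # E[min(X, 2)], X ~ Binomial(y, q)
  c(two_hits_only = y * p1 + 2 * q,
    two_da_max    = y * p1 + da_capped)
}

# columns: 10%, 19%, and 22% DA on a 3-hit WS at 95% hit rate
sapply(c(0.10, 0.19, 0.22), function(p2) round(expected_hits(0.95, p2, 3), 3))
```

Rounded to two decimals, the second row should match the "2 DA maximum" figures quoted above, and the first row the "DA two hits only" figures.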

Which one do I believe to be the case? I don't have any stake in believing one over the other, but it was easier for me to assume that DA procs on two hits only (there are three ways this can happen for a 3-hit WS, but it doesn't matter in calculating the probabilities).

Monday, October 20, 2008

The relationship between DEX and critical hit rate

My previous post somehow got over 40 "click-throughs" on TTTO, perhaps because its authoritative title, "King's Justice versus Raging Rush," promised a decisive comparison yet its conclusions were slightly less touchy-feely than eyeballing. (I was actually looking for some feedback, but I guess it wasn't meant to be.) In that vein, I also offer this bait-and-switch regarding the relationship between DEX and critical hit rate.

I would not care about such things if not for the prospect of obtaining Byakko's Haidate one day; with its 15 DEX, surely there must be some obvious increase in critical hit rate, right?

In fact, for some reason or another 15 DEX was once "thought" always to increase critical hit rate by a paltry 1-2% despite the reality of sampling error. (I've always wondered how people arrived at such conclusions by sampling. Even if you collected data through a parse, if you had a sample of 2500 hits, the margin of error associated with your crit rate estimate would be as much as 2%.) This conventional "wisdom" was then debunked around March 2007 with a discussion of the DEX/crit relation motivated by the observation that lots of DEX sent crit rates soaring up to some maximum. Coincidentally or not, around that time there was a parallel discussion on Allakhazam about the same topic.
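The 2% figure comes from the usual normal-approximation margin of error for a proportion. A minimal sketch, with a made-up function name (margin_of_error) and a made-up 15% crit rate for the second call:

```r
# half-width of a normal-approximation confidence interval for a proportion
margin_of_error <- function(p, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  z * sqrt(p * (1 - p) / n)
}

margin_of_error(0.50, 2500)   # worst case: about 0.0196
margin_of_error(0.15, 2500)   # hypothetical 15% crit rate: about 0.014
```

Either way, a sample of 2500 hits cannot reliably pin down a difference of 1-2 percentage points.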

Sure, these people didn't bother to control for mob AGI, even though it now appears evident that your DEX relative to your target's AGI is a factor in determining critical hit rate. The AGI of Robber Crabs, a test subject in the Alla thread, apparently is either 39 or 42, and the AGI of Tavnazian Sheep and Miner Bees, targets in the BG thread, probably varies too. But despite the lack of control, it was obvious that piling on enough DEX will increase your critical hit rate markedly at some point.

Unfortunately, this conclusion is couched in the lazy terminology of "tiers." Some examples are

(1) "Stack enough DEX to break some critical rate tier, where each point of DEX you add within that tier has a larger effect."

(2) "Any large amounts of DEX before a critical rate tier will not have a major effect on critical hit rate."

Implicit in such statements is that if you don't break a "tier," it isn't worth trying to pile on DEX. In turn, considering that "tiers" in crafting refer to discontinuous jumps in HQ rate, it isn't surprising that a "tier" in terms of crit rate is also thought of as a sudden, discontinuous jump at some critical level of DEX. But the evidence provided in the above threads doesn't really point to such a discontinuous phenomenon.

First, consider the results from the BG thread. Amazingly, the point estimates were given as approximations based on sample sizes of about 300 (really, that lazy not to record the exact sample sizes?), but that isn't that big a deal. These point estimates are themselves random variables with corresponding distributions, so it is helpful to visualize confidence intervals for the true crit rates at given levels of DEX, and I created a graph to help with that:

The 95% confidence intervals are represented by black bars with the point estimates centered within the CIs. I also marked what are thought to be the minimum and maximum crit rates for DEX only with gray lines, 9% minimum and 24% maximum with 4/4 critical hit rate merits (who doesn't have those?). Critical hit rate bonuses from equipment are not subject to the caps.
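For readers who want to build intervals like these from their own parses, here is a minimal sketch of a symmetric normal-approximation (Wald) interval. Treat it as one common choice rather than a statement of how the graph was made. The function name wald_ci and the tally of 45 crits in 300 hits are made up for illustration.

```r
wald_ci <- function(crits, hits, conf = 0.95) {
  p  <- crits / hits
  hw <- qnorm(1 - (1 - conf) / 2) * sqrt(p * (1 - p) / hits)
  c(lower = p - hw, estimate = p, upper = p + hw)
}

ci <- wald_ci(45, 300)   # made-up tally: 45 crits in 300 hits
round(ci, 3)

# do the assumed 9% floor and 24% cap fall inside the interval?
c(floor_inside = ci[["lower"]] <= 0.09 && 0.09 <= ci[["upper"]],
  cap_inside   = ci[["lower"]] <= 0.24 && 0.24 <= ci[["upper"]])
```

With only about 300 hits, the interval is roughly 8 percentage points wide, which is why single point estimates say little on their own. For small samples or rates near the floor, binom.test(crits, hits)$conf.int gives an exact interval instead.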

The data corresponding to "low" and "high" DEX on this graph conform to the minimum and maximum crit rates. (At least there is no reason to believe otherwise.) At some point, though, crit rate increases with DEX in a seemingly linear fashion, which could awkwardly be described as a "tier," I suppose. This evokes a parallel with overall hit rate versus accuracy, with a minimum of 20% and a maximum of 95% and hit rate thought to vary linearly with accuracy in between. So if crit rate does increase (linearly) within a certain range of DEX, it is worth adding DEX within this interval, all other things being equal. Sure, I guess you are within a "tier" when this happens, but where's the evidence for a discontinuous jump to reach this "tier"?

Furthermore, there is hardly any evidence for the plural tiers.

I've also graphed the first set of data from Allakhazam (first post), which is similar to the BG one:

Interestingly, here the crit rate estimates increase over a 15-DEX range, even more evidence against the idea of a discontinuous jump.

Finally, in the Alla discussion data from the Robber Crabs was pooled. Pooled data generally poses statistical hazards (for one, we're assuming the exact same experimental conditions for each person involved, but you figure there's gotta be some idiot to fuck it up or some other factor... like the fact that the AGI of Robber Crabs varies!), but let's just run with this. I created a graph of 95% CIs for the pooled data as follows:

Even in violating statistical assumptions (independence) it is obvious there is no discontinuous jump in crit rate to be seen that cannot be attributed to sampling error. And even with the fundamental shadiness of this experiment (not controlling AGI), I even had the cheerful temerity to do least-squares linear regression (which itself is inappropriate for a variety of reasons) on the data points for which over 1000 samples were collected, in the DEX region where crit rate seems to increase linearly. For me it's enough to know that there is an obvious increase in crit rate; it doesn't matter what the exact increase will be for 1 additional DEX.

Also, the region is fairly narrow (10-15 DEX) for Robber Crabs, which would explain why people observe a sudden jump when adding DEX, as there is the view that adding DEX for the purposes of increasing crit rate should be an all-or-nothing thing (never mind the reality that the tradeoffs you make to stack DEX make such an attempt impractical).

It isn't necessarily true that the results from robber crabs can be generalized to other mobs. But if this phenomenon is real and can be generalized, then you may not have to go for an all-or-nothing attempt to increase crit rates with DEX, either in an auto-attack or WS phase, as long as your DEX is within the region where DEX is considered helpful.

For robber crabs, this region appears to be between 77 and 92 DEX. The higher level robber crabs in Kuftal Tunnel have 42 AGI, which jibes with the idea that your crit rate is capped when your DEX is 50 higher than your target's AGI.

The "transition region" clearly doesn't start when your DEX is equal to your target's AGI, but where should it start? The statement in the previous paragraph implies that it could start at about 35 DEX above your target's AGI, but this is a troublesome statement to make given that the crit rates consistently appear to be above 9% (the minimum) before 77 DEX. One possible explanation is that crit rate could be a minimum when (DEX - AGI) is less than or equal to 0, and rises very slowly from 0 to around 35. This could be why it's difficult to see any improvement in crit rates from adding DEX on your usual merit mobs, which all have AGI above 67.
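The simplest reading of these numbers is a clamped ramp: a floor up to 35 DEX above the target's AGI, a linear rise, and a cap at 50 above. The following sketch is purely hypothetical. The 9% floor, 24% cap, and both breakpoints are assumptions taken from the discussion above, the function name crit_rate is made up, and the possible slow rise between 0 and 35 is deliberately ignored.

```r
# Hypothetical clamped-ramp model; every default here is an assumption, not a known formula.
crit_rate <- function(dex, agi, lo = 0.09, hi = 0.24, start = 35, end = 50) {
  ramp <- lo + (hi - lo) * ((dex - agi) - start) / (end - start)
  pmin(hi, pmax(lo, ramp))   # clamp to the floor and the cap
}

crit_rate(c(60, 77, 85, 92, 100), agi = 42)
```

Under these assumptions each point of DEX inside the ramp is worth one percentage point of crit rate, and nothing outside it.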

I admit I didn't break any new ground, but I thought it might be fun to show my take on this.