Saturday, November 29, 2008

On sophistry

(Edit - Dec. 23: changed link to document.)

Two years too late, but I prepared some comments on this so-called "advanced analysis" of paralyze proc data, mainly concerning the statistical sophistry involved. (I really hope insights have been further developed since then.) Such are the perils of idleness. (I don't recommend reading further; you've been warned.) I address specific sections of the write-up (section headings in boldface).

Introduction

The author claims that it is not desirable to maximize the duration of a paralyze effect. Instead, he (is it ever a she when bloviating about some B.S.?) seems to think that maximizing the number of processes (procs) per cast is the relevant goal. He cites two hypothetical situations where the durations are different yet the rate of procs per unit time is the same. He argues that the scenario with the shorter duration gives an opportunity to reapply a possibly stronger paralyze (higher rate of procs per unit time).

However, he proposes a model that assumes that MND, enfeebling magic skill, and a HQ staff have an effect (statistically significant or not) on both spell duration and the number of paralyze procs. So why not just model the rate of procs per unit time to begin with? The author argues that we must "account for" (control for) the effect of duration (something we cannot directly control) so that we can see how the controlled factors directly affect the number of procs within some varying time interval that is supposed to be under statistical control. But that is also modeling the rate of procs per unit time (with duration controlled).

Finally, his "analysis" shows that the duration of the paralyze effect has the greatest effect on the number of procs (with MND having some effect as well), which he considers unfortunate. However, it goes without saying (but I'll say it anyway) that you cannot change duration purposefully without changing some combination of MND and enfeebling skill (not to mention any omitted variables that may affect duration). (In most practical situations MP-users don't cast without elemental staves.) So what, exactly, did you expect?

Preliminary Analysis

Note that the presence of the 10 missing observations affects the calculation of the correlation matrix. The missing observations are excluded from the subsequent path analysis.


Path Analysis

First off, I must acknowledge that I have never used path analysis for anything, so as I become more familiar with it I may revise my comments later.

The pair-wise "sample" correlations between the so-called exogenous variables here, MND, enfeebling skill, and HQ staff, are meaningless as the variables are not random. (What multicollinearity?) I don't even know why they are indicated on the diagram other than to follow some rote procedure rigidly.

"Clustered ordinary least-squares (OLS) regression" is an oxymoron. Generally speaking, using a robust least-squares method of estimation is a departure from what is ordinarily done. Furthermore, the justification for "clustered robust" LS estimation--that observations within each group (naked, enfeebling, MND, etc.) are not independent--is not valid. The author attributes lack of independence of observations within groups to the "experimental setup of this test," but there is absolutely nothing in the description of the "experimental setup" that suggests this should be so. Autocorrelation is not an issue. (Why would catoblepas build up resistance to paralyze anyway?) But even if it were, a "clustered robust" method cannot account for that. What he basically did was control for group effects twice, which is absolute nonsense and has no effect on his parameter point estimates anyway. (The coefficient of determination, R2, is the same whether improperly accounting for nonexistent "clustering" or not.)

There is also the issue of not controlling for test subject (monster); regardless of the magnitude of the effect of test subject, this concern is never discussed while comparatively frivolous ones are. To wit, the author's irrelevant aside about Bayesian inference has nothing to do with the use of BIC here; he is not really doing model selection anyway, but providing cover for arguing that MND may be a more "important" predictor of duration than the use of a HQ staff.

That cover is rather weak, though, since the individual (non-simultaneous) interval estimates for the "standardized coefficients" are rather wide in the model that the author actually "chose":

MND: (.086, .404)
skill: (.017, .337)
staff: (.057, .377)

Now consider the second regression (modeling number of procs). Again, the author uses completely inappropriate clustered robust linear regression, which leads him to trump up enfeebling skill as highly significant. In reality, the enfeebling effect is barely significant at the 5% level, hardly convincing evidence of a real effect (if it exists, which I doubt). Moreover, something fishy could be going on with the last set of observations. If you omit those from the analysis, the enfeebling effect does not even approach significance. But the data are what they are.

Discussion

Again, the author fails to recognize the imprecision of his parameter estimates (standardized beta coefficients) despite curiously devoting time earlier to a frivolous comparison of two population correlations in Appendix A.

Today, it may be "commonly known" that MND does affect the accuracy of a MND-based magic spell in some way, but arguing that MND has a relatively stronger effect on paralyze duration (a measure of accuracy) than enfeebling skill on the basis of standardized effects is spurious because of the poor parameter estimates and because of the interpretation. Obviously, the main effects are not random variables, so their associated standard deviations don't have any particular meaning as they are just an artifact of experimental control.

Consider the interpretations in real-unit terms. From the first linear regression, the duration is estimated to increase by 6.38 seconds for every 22.8-point increase in MND (controlling for the other main effects). Similarly, the duration is estimated to increase by 4.93 seconds for every 14.4-point increase in enfeebling magic skill (controlling for the other main effects). Point for point, that is about 0.28 seconds per point of MND (6.38/22.8) versus about 0.34 seconds per point of skill (4.93/14.4), so enfeebling magic skill is more effective than MND, and I don't know anyone who would argue for a comparison other than on a per-point basis.

Certainly, there are distinct levels of resists, but there is no reason to believe that HQ staves have a privileged role in determining the distribution of partial resists any more than other factors that affect magic "hit rate," especially since magic accuracy bonuses for both NQ and HQ staves have been estimated.

As for unexplained variability in the number of procs, the author provides a laundry list of possible explanatory factors, none of which are as important as the ones under one's direct control. (Do you do anything only during specific moon phases?)

Reaction and criticism (not in the write-up)

These people had the temerity to broadcast this "analysis" on both Allakhazam and Killing Ifrit.

On Allakhazam, you typically had the usual sucking off. Not unexpectedly, a reasonable objection was raised about the relationship between duration and number of procs. It seems practical enough to consider an increase in duration (holding other factors constant) as increasing the number of procs that are observed. The exogenous factors (MND), on the other hand, actually affect the potency of paralyze (proc rate), also measured as the number of procs, but with duration held constant. But instead of recognizing this line of reasoning, these numbnuts hid behind numbers (and statistics) without even thinking about how to interpret effects and the implications of their "analysis." (This is all too common among the so-called "mathematicians" on Allakhazam.)

On Killing Ifrit, there were a few somewhat naïve criticisms of the experimental design (all from the same poster). Yes, it would be nice to use more than two levels of each independent variable, but there is no compelling case for a nonlinear trend. Again, generating standardized effects for each predictor is a pointless exercise for these data (as discussed previously). A multi-factor ANOVA is superfluous as you can construct simultaneous confidence intervals for the parameter estimates from regression (in general). Sample size and power are brought up, but concern for "too much power" (with excessive sample sizes) is simply a trivial objection.

Alternative (not in the write-up)

I don't have any particular objection to path analysis per se. The low-hanging fruit: the statistical procedures are questionable, the write-up is mired in irrelevant details, and the interpretations are awkward.

Let us return to the original motivation for the path "analysis." Modeling proc rate was criticized (a false distinction between it and the number of procs when controlling for duration), but the interpretations involved in path analysis concern proc rate anyway. (Potency must be a proc rate. This is beyond dispute.) So why not model the proc rate directly? (And if you care so much about modeling duration too, you can regress that on your favorite predictors. No one's stopping you.)

It seems natural enough to use Poisson regression to model proc rate, and I carried out this procedure in R (output below):


Call:
glm(formula = proc ~ MND + enfeebling + staff + iceday + offset(log(duration)),
    family = poisson, data = paralyze)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-2.2293  -0.8721  -0.0776   0.6353   3.0779

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.260517   1.120601  -5.587 2.31e-08 ***
MND          0.007941   0.002259   3.516 0.000439 ***
enfeebling   0.008038   0.003644   2.206 0.027404 *
staff        0.047913   0.107654   0.445 0.656271
iceday      -0.027394   0.114415  -0.239 0.810772
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 155.82  on 139  degrees of freedom
Residual deviance: 138.19  on 135  degrees of freedom
AIC: 510.53

Number of Fisher Scoring iterations: 5


The model deviance indicates that this model is an acceptable fit to the data. (Note: I facetiously specified an Iceday effect in the model.) Controlling for other factors, proc rate is estimated to increase by .797% for every one-point increase in MND. Note that the z-values are similar to the t-values using OLS estimation.
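In case the .797% figure looks mysterious: it is just the exponentiated MND coefficient. A minimal sketch, refitting the model from the output above:

# convert the Poisson coefficient for MND into a percent change in proc rate
fit <- glm(proc ~ MND + enfeebling + staff + iceday + offset(log(duration)),
           family = poisson, data = paralyze)
100 * (exp(coef(fit)["MND"]) - 1)   # ~0.797 percent per one-point increase in MND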

Monday, November 24, 2008

Crystal Stakes collapse

Note: this is a profanity-laced rant.

On November 4, the day of the last server maintenance, my winning rate in C1 races was 32/68. Since then, the results have been atrocious: 5 1st place finishes, 8 2nd or 3rd place finishes, and 3 below 3rd place. For perspective, in the first 68 races, I placed worse than 3rd only once. Or, to put it in terms that really make me incensed as I write this, a loss of chocobucks between 542 and 764 in sixteen fucking races, or 33.9 to 47.8 per fucking race, or between 72 and 103 fucking minutes wasted per race farming chocobucks for basically jack shit.

It's not merely that I have been losing but that losing more often entails dealing more frequently with a timesink that is in place basically to deter RMT. But so be it... not that I don't have methods of dealing with it.

During this period, aside from one uncontested race (and I didn't even finish first), all the C1 races I've entered (and recorded the toteboards for) have involved at least one other PC chocobo. Rationally, I must acknowledge that C1 races are more hotly contested than ever. Irrationally, I am pissed off that the same chocobos keep placing first over me (and to rub more salt in the wound, I end up placing behind garbage SS/SS/B/F chocobos and "off the podium"), even ones with nominally the same attribute profile as mine. Literally, the same SS/B/B/B chocobo has placed first 3 times against me while I have gotten 5 first-place finishes in three weeks. (Interestingly, I've observed that chocobo was raised with an enlarged beak, for what it's worth. I might even laugh if that owner read this blog to get some ideas.) It would be even more crackpot and solipsistic to associate this string of poor results with the latest server maintenance, but you can't count the FFXI "dev team" out for fucking with its players without even being upfront about it, especially that fat-fuck CoP director.

Sure, in the long run things may even out all things being equal (all things being equal is a huge assumption, not knowing what saddles they are using), if I even get a chance to even things out. But this is FFXI the zero-sum MMORPG, where illiterate, proudly ignorant, gloating motherfuckers get to rake it in and bolt once they get theirs (fuck the rest!) while you get jack shit for the same amount of "effort." Even in chocobo racing.

Estimating changes in magic hit rate with skill

For the purposes of estimating melee hit rate, the functional relationship among accuracy, dexterity, combat skill, mob level, and mob evasion has long been established, thanks to the clever use of the check function. Sadly, no such relationship has really been justified for magic "hit rate" (or resist rate), but that doesn't mean we are condemned to flail in the dark.

Having wondered myself about the utility of meriting elemental magic skill for the purposes of reducing the frequency of resists on "hard stuff," I looked for some information on the relationship between magic skill and resist rate, but solid evidence was hard to come by. Fortunately, after wading through senseless conjecture on BG, I managed to come across an interesting data set for which the "success" rates of casting magic on Ebony Puddings were recorded, given specific levels of elemental skill, magic accuracy, and INT. Even better, this data all but invites me to take a swing at it using some kind of linear regression analysis.

But first, if the factors that go into "magic hit rate" (rate of success or rate of no resists) are similar to those that go into melee hit rate, there are several issues that immediately come to mind when trying to suss out some kind of relationship, such as the relationship between magic accuracy and magic resistance (or evasion?). (Dec. 15: I wrote "Is a ratio involved, as is the case with melee accuracy and melee evasion?" which is incorrect. I probably was thinking of MAB/MDB, but that would be analogous to melee attack and defense.) Furthermore, even if magic resistance/evasion were constant among the flans on Mount Zhayolm, there is a range of levels for Ebony Puddings (supposedly 75-80 on Mount Zhayolm), and if a "magic hit rate" calculation involves a level correction, there is no practical way to account for that.

Still, looking specifically at the nuke data (tests II, III, IV), there appears to be some evidence of a linear association between magic skill alone (holding other relevant factors constant) and success rate. You can do your own plot if you're not convinced.

But as far as magic accuracy is concerned, there are only three combinations of magic accuracy and elemental skill where the success rate was measured. One may argue that magic accuracy seems to be less effective at 242 elemental skill than at higher levels of skill, which may seem persuasive (random variability and unaccounted sources of variability notwithstanding). Really, though, it's a reach to conclude from the limited data here that the effect of magic accuracy depends on the level of elemental skill.

Finally, INT seems to have no effect at 242 elemental skill, yet has some effect in large quantities at 274 skill. Maybe it's not all that far-fetched to say that the effect of INT on magic hit rate depends on magic skill level, which can compromise the estimates associated with a regression analysis. Even worse, perhaps the relationship between INT and magic hit rate (holding other factors constant) is not strictly linear but follows some weird piecewise function depending on your target mob's INT. This calls attention to the need for more data at other levels of INT, macc, and elemental skill (or perhaps a better choice of target whose level and magic resistance are known to be fixed, though in practice this will be extremely difficult to achieve).

At any rate, using linear regression (with unresisted magic hit rate as the binary response) on the above observations (ignoring the middle rows of test II because they contribute to a poor model fit) gives the following parameter estimates (I truncated output to save space):

                        Standard     Wald 95%
Parameter   Estimate       Error   Confidence Limits   Pr > ChiSq

Intercept    -1.9393      0.1872   -2.3062   -1.5724       <.0001
skill         0.0095      0.0007    0.0082    0.0109       <.0001
macc          0.0147      0.0022    0.0103    0.0190       <.0001
int           0.0028      0.0010    0.0009    0.0047       0.0038


I included both INT and magic accuracy in the model just for the heck of it even though the parameter estimates associated with them aren't all that reliable. Certainly, including more observations with varying levels of INT and magic accuracy may improve those estimates (assuming magic hit rate is linear over some range of either factor), and they should be included in a model for the sake of a comprehensive view of magic hit rate. But for now, we can see that the data suggest that magic hit rate increases by about 1% for every one-point increase in elemental magic skill (holding INT and magic accuracy fixed). The range of elemental magic skill considered is between 242 and 295.
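For anyone who wants to reproduce this kind of fit in R, a minimal sketch follows; the data frame nukes and its column names (success as the 0/1 unresisted indicator, skill, macc, int) are my own inventions, not the original poster's:

# linear probability model: binary response, identity link
fit <- glm(success ~ skill + macc + int, family = gaussian, data = nukes)
summary(fit)
confint.default(fit)   # Wald 95% confidence limits, as in the output above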

One can also perform a similar analysis with the Sleep trials (tests V and VI), but note that the "success" rate encompasses partial resists also:

                        Standard     Wald 95%
Parameter   Estimate       Error   Confidence Limits   Pr > ChiSq

Intercept    -1.3636      0.5853   -2.5108   -0.2164       0.0198
skill         0.0056      0.0018    0.0021    0.0091       0.0016
macc          0.0085      0.0025    0.0035    0.0134       0.0008

It seems that the effects of magic skill (enfeebling in this case) and magic accuracy are weaker for sleeping than for nuking. (Granted, the interval estimates are rather wide.) The range of enfeebling magic skill is between 307 and 333. It's possible that the acts of sleeping and nuking are just not comparable (unlikely) with respect to resist rates. It's also possible that the effects of general magic skill and accuracy on magic hit rate are diminished past the 300 level of general magic skill. Either way, this complicates understanding of magic hit rate somewhat and steps can be taken to rule out either explanation.

It hasn't escaped my attention that magic accuracy seems to increase magic hit rate more than magic skill, ignoring the wide interval estimates. If this is really the case, the difference is so slight and direct competition between the two attributes so rare that it's not worth caring about. Even comparing Oracle's Robe (magic accuracy +6) to Igqira Weskit (elemental magic skill +5), I would first argue the benefits of using Oracle's Robe to replace both Errant Houppelande (like anyone cares about the elemental enfeebling line) and Igqira Weskit. The HP+20 for Sorcerer's Ring activation can be useful, too.

It also occurred to me that one may try to argue, in analogy to melee accuracy and melee hit rate, that these data support the contention that magic skill increases magic hit rate by 0.9% per point above the 200 skill level (1% at or below 200), although it is ludicrous to distinguish between 0.9% and 1% based on random data without excessive sample sizes.

But, if all you cared about was estimating the change in magic hit rate for every one-point increase in elemental skill, you might as well focus on the change in magic hit rate between two levels of elemental skill that are relatively far apart, assuming the rate of change is constant (in other words, a linear relationship between hit rate and skill), an assumption that is borne out by the previously considered data.

The regression analysis for the nuke data used 1,400 total trials; these trials could be allocated equally between, say, 242 skill and 292 skill. Then you'll have an easier time showing that the increase in magic hit rate is less than 50% (less than 1% per point of elemental skill). (Use a test for two proportions.)
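A sketch of that two-proportion test in R, with counts invented purely for illustration:

# hypothetical results: unresisted casts out of 700 each at 242 and 292 skill
hits  <- c(310, 650)
casts <- c(700, 700)
prop.test(hits, casts)   # confidence interval for the difference; compare against 0.50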

Saturday, November 8, 2008

Aggressor and double attack merits

After meriting on greater colibri for a bit, I was wondering whether I would be "better off" had I merited double attack to level 5 instead of Aggressor recast. (Unsynchronized Berserk and Aggressor timers would be really annoying though.) This May 2007 discussion comparing Aggressor and double attack merits shows, despite the muddled presentation, a situation where fully merited double attack is more effective than fully merited Aggressor recast, since Aggressor supposedly provides an accuracy bonus of 25, which corresponds to only a 12.5% hit rate increase (on average). However, we might be interested in the magnitude of difference between the two Group 1 schemes, which is more difficult to quantify.

One approach is to calculate the average number of attack rounds to reach 100 TP for both 5 DA/0 Aggressor and 0 DA/5 Aggressor. (The number of attack rounds is independent of specific damage values.) Of course, the relative effectiveness of Aggressor is higher when your hit rate is lower, as is usually the case when targeting anything more difficult than greater colibri. Then it might be useful to compare max DA and max Aggressor for lower levels of a baseline hit rate.

Ultimately we want to know what the differences in long-run "damage over time" are, but first we can look at the average number of attack rounds, as that is an indirect measure of time. (Assume the number of seconds per attack round is constant.) Unfortunately, an analytic expression for the average number of attack rounds to reach 100 TP is too annoying to derive, primarily because the number of attack rounds needed to reach 100 TP depends on the TP return of the previous weapon skill, which is almost never zero for a multi-hit weapon skill with a decent hit rate. The number of hits to 100, given initial TP, seems basically to follow a Poisson process, but I'd rather not worry about cumbersome calculations. Therefore, I resorted to simulation (a rough sketch appears after the table explanations below) to generate the following approximate values based on my warrior setup (varying the Group 1 merit configurations, obviously), given baseline hit rate and the use of a 3-hit weapon skill (Raging Rush or King's Justice):

Average number of attack rounds given baseline hit rate

         5/0      2/4      0/5
0.2     20.19    19.81    19.87
0.3     14.60    14.52    14.61
0.4     11.40    11.39    11.47
0.5      9.31     9.33     9.41
0.6      7.83     7.86     7.95
0.7      6.73     6.78     6.84
0.75     6.29     6.33     6.39
0.8      5.88     5.93     5.99
0.825    5.71     5.74     5.81


Here, the first column corresponds to baseline hit rate (before the Aggressor bonus), and the next three columns correspond to different Group 1 merit configurations:

"5/0": 5 double attack, 0 Aggressor
"2/4": 2 double attack, 4 Aggressor (mine)
"0/5": 0 double attack, 5 Aggressor

Then, we can obtain values representing "damage over time" in terms of hits per round, given the baseline (or nominal) hit rate:

Average number of hits per round given baseline hit rate

         5/0      2/4      0/5
0.2     0.336    0.341    0.340
0.3     0.458    0.460    0.456
0.4     0.580    0.579    0.573
0.5     0.702    0.698    0.691
0.6     0.819    0.818    0.811
0.7     0.946    0.935    0.925
0.75    1.006    0.996    0.983
0.8     1.069    1.054    1.041
0.825   1.097    1.085    1.071


The max DA configuration is already about even with max Aggressor at 30% baseline hit rate, and it really starts to pull away as the baseline hit rate increases (especially after the point where Aggressor does not provide the full accuracy bonus, past 82.5% hit rate), so to me there is scant justification for 5/5 Aggressor. This makes sense as fully merited Aggressor provides an average 1.5% hit rate increase over non-merited Aggressor, which pales in comparison to the increase in "damage over time" that can be conferred by 5 double attack in the presence of high levels of accuracy. This analysis doesn't account for multi-hit weapons such as Ridill and Joyeuse, but the relative differences between 5/0 and 0/5 should still favor 5 DA merits even though the gap may close. And of course, this post doesn't account for actual damage per hit, but DA and hit rate are "independent" of damage per hit anyway (hits/time × damage/hit = damage/time!) and it's not that much of a reach to estimate real "damage over time" by factoring in an average damage per hit.
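As a trivial illustration of that identity, with invented numbers for damage per hit and round length:

# hits/round (from the table above) x damage/hit / seconds/round = damage/second
hits_per_round <- 0.946   # 5/0 configuration at 0.7 baseline hit rate
dmg_per_hit    <- 100     # invented average damage per landed hit
sec_per_round  <- 8       # invented seconds per attack round
hits_per_round * dmg_per_hit / sec_per_round   # ~11.8 damage per second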

I found it helpful to plot attack rounds vs. hit rate to illustrate that the average number of attack rounds to 100 TP levels off as hit rate increases:

[Plot: average number of attack rounds to 100 TP vs. baseline hit rate]

Obviously the rate of change in the number of attack rounds to 100 TP is decreasing in magnitude (but is still negative) with hit rate. But the number of attack rounds is not a direct measure of damage over time. Damage over time is a ratio of, yes, damage over time. The number of attack rounds is a proxy for time, and is not a ratio.

The number of hits, given the number of attack rounds, on the other hand, is a measure of damage, so dividing the number of hits by the number of attack rounds gives a quantity that can stand in for "damage over time," as plotted below vs. nominal hit rate:

[Plot: average hits per round vs. nominal hit rate]

Of course, there is no reason to plot such a thing because intuitively the rate of change of hits/round must be constant (we're plotting hit rate vs. hit rate!), especially if you believe that 2 points of accuracy always correspond to 1% hit rate between 20% and 95% hit rate. If you do, it's complete nonsense to speak of damage over time showing "diminishing returns" to hit rate. Hit rate leveling off with accuracy in some logistic fashion is another story, though.