Saturday, July 31, 2010

Aspir and Drain modeling: an incomplete picture

Earlier, I made some bold statements (at least by my standards) about the distribution of unresisted values of Drain and Aspir. I proposed a model from which I can make explicit, testable predictions.

Specifically, I wanted to see if the model holds at 114 dark magic skill, which can be attained by subbing /DRK on any job without any native dark magic skill (in my case NIN/DRK). Based on the model I described previously, the maximum Aspir without any potency-increasing equipment used is 114/3 + 20 = 58 MP, and the minimum is half that, or 29 MP.

I went out to cast Aspir on Stone Eaters (North Gustaberg (S)) and after the seventh cast I obtained an Aspir of 63 MP, which exceeds the stipulated maximum, so the model doesn't hold for 114 dark magic skill. The following are the observed data values (in order observed):

51 44 56 49 25 51 63 53 35 62 62 46 41 32 44
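The comparison above can be rechecked with a few lines of Python. This is only a sketch of the model as stated in the text; the helper name `aspir_bounds` is a hypothetical label, not anything taken from the game or from the earlier posts.

```python
def aspir_bounds(skill):
    """Predicted (minimum, maximum) unresisted Aspir under the model being tested."""
    maximum = skill // 3 + 20   # floor(skill/3 + 20) for integer skill
    minimum = maximum // 2      # floor of half the maximum
    return minimum, maximum

# Observed Aspir values at 114 dark magic skill, in the order listed above.
observed = [51, 44, 56, 49, 25, 51, 63, 53, 35, 62, 62, 46, 41, 32, 44]

low, high = aspir_bounds(114)
above = [x for x in observed if x > high]   # values the model says cannot occur
below = [x for x in observed if x < low]    # possibly resists, possibly not

print("predicted range:", low, "to", high)
print("above predicted maximum:", above)
print("below predicted minimum:", below)
```

Three of the fifteen casts land above the predicted maximum, so the mismatch does not rest on a single outlier.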


I then cast Drain on the same Stone Eaters with 246 dark magic skill (NIN75/SCH35 with Dark Arts) and obtained the following results (stem-and-leaf plot):

 6 | 49
 7 |
 8 | 4
 9 |
10 |
11 |
12 |
13 | 4445
14 | 68
15 | 044
16 | 46
17 | 37
18 | 9
19 | 4
20 | 23789
21 | 79
22 | 026
23 | 55
24 | 277
25 | 36
26 | 01346


(The reason I used Drain and not Aspir was that my scholar is only level 35, and Aspir is accessible at level 36. I would prefer to gather data for Aspir because Aspir values are obviously less variable than Drain values.)

Given 246 dark magic skill, the predicted maximum for Drain is 266 and the predicted unresisted minimum is 133 (both without any potency gear or day/weather effects), so the distribution of Drain (as represented by the sample) seems consistent with the model.
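To make the consistency claim concrete, the stem-and-leaf plot can be expanded back into individual values and compared with the predicted bounds. The sketch below assumes each stem is in tens of HP and each leaf is a final digit (so `13 | 4445` means 134, 134, 134, 135); `expand` is a hypothetical helper name. Treating everything below the predicted minimum as a resist is also an assumption, and a somewhat circular one.

```python
STEM_AND_LEAF = """
6 | 49
7 |
8 | 4
9 |
10 |
11 |
12 |
13 | 4445
14 | 68
15 | 044
16 | 46
17 | 37
18 | 9
19 | 4
20 | 23789
21 | 79
22 | 026
23 | 55
24 | 277
25 | 36
26 | 01346
"""

def expand(plot):
    """Turn 'stem | leaves' rows into a flat list of integers."""
    values = []
    for line in plot.strip().splitlines():
        stem, _, leaves = line.partition("|")
        values.extend(int(stem) * 10 + int(leaf) for leaf in leaves.strip())
    return values

skill = 246
predicted_max = skill + 20           # 266
predicted_min = predicted_max // 2   # 133

values = expand(STEM_AND_LEAF)
unresisted = [v for v in values if v >= predicted_min]
resisted = [v for v in values if v < predicted_min]

print(len(values), "casts;", len(resisted), "below the predicted minimum:", resisted)
print("unresisted range observed:", min(unresisted), "to", max(unresisted))
```

The largest observed value equals the predicted maximum, and the smallest value not treated as a resist sits one point above the predicted minimum.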

In conclusion, if one were to be technical in describing the scope of the model I proposed earlier, I would say the model is (likely) valid for Drain between 246 and 300 dark magic skill. It is valid for Aspir between 269 and 300 dark magic skill. And, finally, it is valid for Drain II between 285 and 300 dark magic skill. What happens in the 100s is just not that relevant. (In case you are wondering what data I am referring to, you'd have to check my old posts on Drain and Aspir).

Saturday, July 24, 2010

The "new" melee pDIF

This post is definitely not for those who neither understand nor care about what melee "pDIF" is all about and why it can be of interest, so I find no point in making some sort of "for dummies" kind of introduction and will just jump into the results.

First, a reference to the "new" melee pDIF should be seen as a sarcastic gesture, as there have likely been no wholesale changes to pDIF since the August 2007 version update that brought the gameplay-altering "two-handed weapon adjustment." Therefore, the following results are assumed to reflect the actual changes to pDIF made in August 2007.

Data and results

The guy who plays Masamunai (currently of Cerberus) provided this spreadsheet of data, having tabulated the observed damage values for various ratios of attack to defense (without level correction), using both one-handed and two-handed weapons, on level 63-65 Lesser Colibri and then "standardizing" them to approximate observed pDIF values (acknowledging estimation error associated with in-game truncation of values). There are more details concerning the raw data and he provided his own analysis, but I prefer to do my own analysis so you don't necessarily have to review the spreadsheet yourself.

The following is an image attempting to plot 67,123 of the observed pDIF data values (almost all of the data) to show primarily how the minimum, maximum, and (most important to me) mean pDIF for both critical and non-critical ("normal") hits vary with the ratio of attack to defense:



It is somewhat difficult to plot 67,123 data values cleanly and elegantly with limited resolution, so I exploited transparency of data points, resulting in narrow "bands" that vary in opacity from top to bottom, an attempt to illustrate roughly the relative "density" of observed values. Each band represents the entirety of the data collected for a given attack/defense ratio. Another interpretation is that each band represents the observed conditional distribution of pDIF for a given attack/defense ratio.

The bands for critical pDIF are generally less "dense" or less opaque than those for normal pDIF, reflecting the fact that there are many more data points for normal pDIF (55,996 versus 11,127). Also, the bands are generally most translucent at the endpoints, reflecting the fact that the observed data at the extremes of each conditional pDIF distribution (for a given attack/defense ratio) occur relatively less frequently. This is consistent with the idea that pDIF is now a function of two uniform random variables (either the sum or the product), which follows a trapezoidal(-like) distribution. (But I will not be discussing probability distributions today.)

Aside from the plotting of the data values, regression lines for the mean pDIF (controlling for attack/defense ratio) were also plotted (lines based on ordinary least squares, which is justifiable as there are a lot of data points involved for each level of attack/defense considered). Regression was done in an informal piecewise fashion, as there are specific ranges of attack/defense ratio where the variance of pDIF is obviously not constant, specifically for three cases:
  • where there is a critical pDIF upper limit imposed (3.15 when attack/defense is approximately greater than 1.65)
  • where there is a normal pDIF lower limit imposed (1.00 when attack/defense is between 1.25 and 1.5), and
  • where the mode of normal pDIF is 1.00 and the mode does not occur at the left endpoint of the pDIF distribution (when attack/defense is less than 1.25). It should be noticed that it is impossible to discern the mode of pDIF (conditional on a given attack/defense ratio) based on the above graph. One would have to consult the original source as cited above.
I hope that will suffice as an explanation for the elements of the graph.

Interpretations and conclusions

These are a few of the things one could take away from the graph above.

Aside from the maximum attack/defense ratio attainable, there appear to be no differences in pDIF between one-handed weapons and two-handed weapons. I have incorrectly thought otherwise in the past, but I assumed people who cared about this knew what they were talking about. Obviously not.

While there is no data for two-handed weapons below 1.398 attack/defense ratio, I would invoke model parsimony and assert there is no good reason to expect differences at lower values of attack/defense. Although it is not shown above (and cannot be shown above cleanly), 2.00 is the maximum attack/defense ratio for one-handed weapons, and 2.25 is the maximum attack/defense ratio for two-handed weapons. Support for these maxima can be found in the spreadsheet.

The ceiling on critical hit pDIF first occurs near 1.65 attack/defense. Moreover, the value of the ceiling, 3.15, is the modal (most frequently occurring) pDIF for attack/defense ratios above 1.65.

Mean pDIF, as a function of attack/defense, does NOT increase at the same rate for critical hits as for normal hits for a given value of attack/defense. A consequence of this is there is no pat way to relate normal pDIF to critical pDIF, like critical pDIF = normal pDIF + 1. To see what I mean, refer to this blog entry (JP), particularly the first image, to get a sense of how pDIF was incorrectly perceived more than a year after the August 2007 version update (a mish-mash of the critical hit pDIF ceiling of 3.15, increased attack/defense ratio maximum, and the old pDIF model).

Irrelevant considerations

This is a matter of personal preference, but I consider the so-called "secondary randomizer" an irrelevant red herring. pDIF as a product of two uniform random variables or sum of two uniform random variables, so what? (But I will note that the slope of mean pDIF without the second random factor does not change if the factor is added and does change if the factor is multiplied.) I just know it's there and I can explain what can cause it, but it is not very important for estimating mean pDIF, which is why I even made this post in the first place.

I also do not care about exactness of any pDIF model. Approximately true is fine with me as far as modeling rates of damage is concerned. (There are other factors that, when completely ignored or incorrectly computed, cause much more error than mere sampling error based on 60,000+ samples.)

Formulas for mean pDIF as a function of attack/defense ratio and whether the weapon is one-handed or two-handed

These formulas are based on the regression estimates. (You may have noticed discontinuities in the piecewise mean pDIF functions suggested in the graph, but I do not care that much about fudging the estimates to eliminate that.) For normal-hit pDIF, which I have denoted as MNormal, the estimated mean of MNormal, as a function of H, the number of hands required to wield a weapon (H = 1, 2), and R, the attack/defense ratio (level-corrected or otherwise), is



For critical-hit pDIF, the functional relationship between that and H and R is



The following is the output of the regression procedure. I only include this to show that there is no reason to expect that the coefficient of determination be high, mainly because there is inherent variability of pDIF.

***Regression for normal pDIF, ATK/DEF < 1.25****

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.225403   0.010107    22.3   <2e-16 ***
ratio       0.782699   0.009748    80.3   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1621 on 12257 degrees of freedom
Multiple R-squared: 0.3447, Adjusted R-squared: 0.3446
F-statistic: 6447 on 1 and 12257 DF, p-value: < 2.2e-16



***Regression for normal pDIF, 1.25 < ATK/DEF < 1.5****

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.07274    0.04129   1.762   0.0781 .
ratio        0.90232    0.02969  30.390   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2369 on 14254 degrees of freedom
Multiple R-squared: 0.06085, Adjusted R-squared: 0.06078
F-statistic: 923.6 on 1 and 14254 DF, p-value: < 2.2e-16



***Regression for normal pDIF, ATK/DEF > 1.5****

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.294339   0.016866  -17.45   <2e-16 ***
ratio        1.162306   0.009566  121.50   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.2518 on 29479 degrees of freedom
Multiple R-squared: 0.3337, Adjusted R-squared: 0.3336
F-statistic: 1.476e+04 on 1 and 29479 DF, p-value: < 2.2e-16



***Regression for critical pDIF, ATK/DEF < 1.65****

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.94139    0.01739   54.14   <2e-16 ***
ratio2       1.07335    0.01273   84.29   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.232 on 6745 degrees of freedom
Multiple R-squared: 0.513, Adjusted R-squared: 0.5129
F-statistic: 7106 on 1 and 6745 DF, p-value: < 2.2e-16



***Regression for critical pDIF, ATK/DEF > 1.65****

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.59673    0.04287   37.25   <2e-16 ***
highratio    0.68607    0.02344   29.27   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1905 on 4377 degrees of freedom
Multiple R-squared: 0.1637, Adjusted R-squared: 0.1635
F-statistic: 856.8 on 1 and 4377 DF, p-value: < 2.2e-16
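
One way to turn the printed regression estimates into a usable function is sketched below in Python. The coefficients are copied from the output above at full precision, so they are not necessarily the rounded values used in the formulas of this post. Two things are assumptions on my part rather than anything the regression output states: that the number of hands enters only through the maximum attack/defense ratio (2.00 for one-handed, 2.25 for two-handed), and that each breakpoint belongs to the upper piece. The name `mean_pdif` is hypothetical.

```python
def mean_pdif(ratio, hands=1, critical=False):
    """Estimated mean pDIF from the piecewise OLS fits printed above."""
    # Assumption: hands only changes the cap on the attack/defense ratio.
    ratio_cap = 2.00 if hands == 1 else 2.25
    r = min(ratio, ratio_cap)

    if critical:
        if r < 1.65:
            return 0.94139 + 1.07335 * r
        return 1.59673 + 0.68607 * r

    if r < 1.25:
        return 0.225403 + 0.782699 * r
    if r < 1.5:
        return 0.07274 + 0.90232 * r
    return -0.294339 + 1.162306 * r


for r in (1.0, 1.4, 2.0, 2.25):
    print(r, round(mean_pdif(r, hands=2), 3), round(mean_pdif(r, hands=2, critical=True), 3))
```

Because the pieces were fitted separately, the function has small jumps at 1.25, 1.5, and 1.65, which matches the discontinuities already acknowledged in the text.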

Thursday, July 22, 2010

Drain and Aspir minimum and maximum

Figuring out things that were already figured out years ago

After reviewing all my posts on Drain and Aspir, I finally realized that I had enough information to specify the minimum and maximum unresisted values, as a function of dark magic skill, for both Drain and Aspir, and since the distribution of unresisted values can be considered approximately uniformly distributed (acknowledging flooring effects), the mean Drain and Aspir can also be specified with reasonable certainty.

Of course, this was something that was figured out years ago. This Drain and Aspir summary page (JP) has some interesting things to say about Vampiric Mitts/Boots, Diabolos's Pole, Y's Scythe, and factors other than equipment, day/weather, and magic burst affecting Drain that do not affect Aspir, but I will leave it to those interested to check out that source while I summarize the minima and maxima for Drain, Aspir, and Drain II, and speculate on those for Aspir II.

All specified extrema are defined for dark magic skill between 0 and 300, and for clarity's sake, factors that increase Drain or Aspir potency are not included in the statements aside from dark magic skill. It should be understood that any of those other factors increase the maximum and minimum by some multiplicative constant, and these constants can be multiplied in succession (generally, the order of multiplication is first the constant for equipment, then that for day/weather, then that for magic burst, with some twists on Drain potency you can look up yourself) with flooring after each multiplication step to obtain the final values of the maximum and minimum.

Maximum and minimum of Drain

The maximum unresisted Drain is

skill + 20,

and the minimum unresisted Drain is

floor(0.5(skill + 20)).

It follows that the mean unresisted Drain is approximately

0.75(skill + 20).

Currently 320 HP is the highest possible Drain return before any other potency-increasing factors are considered.

Each 1-point increase in dark magic skill (up to 300 total skill) increases the maximum Drain by 1 HP. Other potency-enhancing factors can increase the magnitude of this marginal return.

Maximum and minimum of Aspir

The maximum unresisted Aspir is

floor(skill/3 + 20),

and the minimum unresisted Aspir is

floor(0.5floor(skill/3 + 20)).

It follows that the mean unresisted Aspir is approximately

0.75(skill/3 + 20).

Currently 120 MP is the highest possible Aspir return before any other potency-increasing factors are considered.

Each 3-point increase in dark magic skill (up to 300 total skill) increases the maximum Aspir by 1 MP. Other potency-enhancing factors can increase the magnitude of this marginal return.

Maximum and minimum of Drain II

This discussion of the Hirudinea Earring effect (Hirudinea Earring seems to increase the potency of Drain or Aspir by either 2.5% or 3%) also has Drain II data that is consistent with the following expressions for the minimum and maximum of Drain II.

One thing to note is that Drain II has less variability than Drain I. (A dot plot would help show this.) Another thing to note is that the observed minima seem as though they are based on the expression for the maximum of Drain I. Then the minimum unresisted Drain II is

skill + 20.

The maximum unresisted Drain II is

skill + 85.

The mean unresisted Drain II is

skill + 52.5.

Each 1-point increase in dark magic skill (up to 300 total skill) increases both the minimum and maximum Drain II by 1 HP. Other potency-enhancing factors can increase the magnitude of this marginal return.

Currently 385 HP is the highest possible Drain II return before any other potency-increasing factors are considered.

Note that unlike Drain and Aspir, the variability of Drain II values appears not to vary with dark magic skill. Instead, varying skill merely shifts the distribution of unresisted values to the left or the right.
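
The expressions for Drain, Aspir, and Drain II can be collected into a small Python sketch. The function names are hypothetical, and treating skill above 300 as equal to 300 is my reading of the "up to 300 total skill" remarks above.

```python
SKILL_CAP = 300  # skill beyond this is assumed to add no potency

def drain_range(skill):
    """(minimum, maximum) unresisted Drain with no other potency factors."""
    maximum = min(skill, SKILL_CAP) + 20
    return maximum // 2, maximum

def aspir_range(skill):
    """(minimum, maximum) unresisted Aspir with no other potency factors."""
    maximum = min(skill, SKILL_CAP) // 3 + 20
    return maximum // 2, maximum

def drain2_range(skill):
    """(minimum, maximum) unresisted Drain II with no other potency factors."""
    capped = min(skill, SKILL_CAP)
    return capped + 20, capped + 85

def approximate_mean(bounds):
    """Midpoint of the range, assuming a roughly uniform distribution."""
    low, high = bounds
    return (low + high) / 2

for skill in (114, 246, 300):
    print(skill, drain_range(skill), aspir_range(skill), drain2_range(skill))
```

The width of the Drain II range is 65 HP at every skill level, while the Drain and Aspir ranges widen as skill increases.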

Minimum and maximum of Aspir II?

While I have yet to see anything regarding Aspir II, I wouldn't be surprised to see the minimum be

floor(skill/3 + 20)

and the maximum be

floor(skill/3 + 85),

but this is subject to verification based on others' experience.

Summary

The following table (image) summarizes the maximum, minimum, and (approximate) mean unresisted value of Drain, Aspir, and Drain II as a function of dark magic skill ("capping" at 300 skill) when no other potency-increasing factors are present.

Table of extrema and means for Drain, Aspir, and Drain II

Other potency-increasing factors increase the minimum and maximum by their corresponding multiplicative constant (e.g. that for equipment), with flooring after each multiplication step.
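
The "multiply, floor, multiply again" procedure can be sketched as follows in Python. The factor values in the example (1.15 for equipment, 1.10 for weather) are purely illustrative, and `apply_factors` is a hypothetical helper. Factors are passed as decimal strings and converted to exact fractions so that a product such as 320 × 1.15 is not pushed just below 368 by binary floating-point error before flooring.

```python
import math
from fractions import Fraction

def apply_factors(base, factors):
    """Apply multiplicative potency factors in order, flooring after each step.

    `factors` is a sequence of decimal strings, e.g. ["1.15", "1.10"],
    ordered as equipment, then day/weather, then magic burst.
    """
    value = base
    for factor in factors:
        value = math.floor(value * Fraction(factor))
    return value

base_min, base_max = 160, 320   # Drain at 300 skill, no other factors
example_factors = ["1.15", "1.10"]  # illustrative values only

print(apply_factors(base_min, example_factors), apply_factors(base_max, example_factors))
```

Flooring after each step rather than once at the end can change the result by a point or so, which matters when comparing against observed extrema.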

Tuesday, July 20, 2010

Increasing Drain potency

This is the last of a series of posts describing my investigation of Drain mechanics, particularly what things increase Drain potency and by how much. The following findings are based yet again on casting Drain on Zvahl Fortalices; I described the details (justification, limitations, and whatnot) in previous posts, though not at great length.

The motivation: Excelsis Ring


Having received the Excelsis Ring from the Bastok sequence of Wings of the Goddess nation-oriented quests, I wondered whether this ring actually increased the potency of Drain, which should be understood as increasing the maximum Drain or the average (mean) Drain value.

Having previously shown that Drain potency could possibly "cap" at 300 dark magic skill holding all other relevant factors fixed, I felt that it would be useful to verify, in the process of determining whether Excelsis Ring actually increases Drain potency, that Drain potency doesn't increase at higher levels of skill (again), specifically 331 dark magic skill.

After establishing two baseline samples at 331 dark magic skill given no other potency-enhancing equipment (or day/weather effects), I then obtained a sample adding Excelsis Ring to the "baseline." After that, I figured it would be helpful also to quantify the effects of Pluto's Staff and Dark weather on Drain potency, both relative to the baseline.

Wait a second... aren't you ever going to describe an experimental procedure in more detail?

I could, but seriously, is it that hard to think of equipment swaps that guarantee your current HP is 350-400 HP lower than your maximum HP? Hint: HP-increasing equipment upon Drain casting counts. Also, it isn't that hard to think of ways 331 dark magic skill can be attained. I could take a screenshot of the equipment used, but that would mean logging into the game.

Results




The above image is yet another set of dot plots providing a visual summary of the five samples obtained. In the past, I made no statements about the actual distribution of Drain (controlling for resist level), but by now it should be pretty obvious that a uniform distribution is a good model for the data, so further insights will be based on this model.

Also, there were a few Drain resists observed, but as they are obviously resists they can be ignored for the purposes of potency estimation, and the vertical lines denote the suspected minimum (160 HP) and maximum HP (320 HP) drained for the control samples, to be discussed later. A table (ASCII, yes I am that lazy) summarizing the extrema and median of the unresisted Drain values for each sample follows:

                    ·--------·-------------·-------------·------------·
                    |   n    |   Minimum   |   Maximum   |   Median   |
·-------------------·--------·-------------·-------------·------------·
| Baseline 1        |   48   |     162     |     305     |   223.0    |
| Baseline 2        |   50   |     160     |     319     |   238.0    |
| Excelsis Ring     |   51   |     171     |     336     |   255.0    |
| Pluto's Staff     |   61   |     184     |     365     |   263.0    |
| Dark weather      |   30   |     190     |     349     |   288.5    |
·-------------------·--------·-------------·-------------·------------·

Speculating on what the extrema could tell us, it looks as though the variability of the data (as indicated by the range of observed values) is highest for Pluto's Staff, which is consistent with the idea that Pluto's Staff increases potency by some multiplicative factor. Also of interest is the possibility that the minimum is merely half of the maximum, so the observed minima could provide some insight on what the maxima should be.

Another obvious thing to note is that even at 331 dark magic skill, there is still no evidence that Drain potency is higher compared to that at 300 skill, so it is reasonable to conclude, considering all the data to date, that the Drain "cap" (holding other potency-enhancing factors fixed) seems to be met somewhere near 300 dark magic skill (if not exactly at 300). Technically, I could invoke an equivalence test yet again (as I did in a previous post), but you would have to be a tool to demand one here.

Since we know that a Pluto's Staff and single Dark weather each increase direct-damage magic (as shown by changes to the initial damage of any of the Bio spells, allowing for truncation) by 15% and 10% respectively, it wouldn't be surprising if these factors increased Drain potency by the same amount. It stands to reason that Excelsis Ring could behave in the same way (but increasing potency by a lesser amount).

One way to estimate the multiplicative factors is to divide the observed maximum Drain for each of the non-baseline samples by the observed maximum for the baseline samples combined.

Excelsis Ring appears to increase Drain potency by a percentage near (336/319 - 1)100% = 5.3%. But I suspect the maximum Drain for the baseline is 320 (I just wasn't lucky enough to observe it), which would mean the increase in Drain potency from Excelsis Ring could be 5%.

Keeping in mind that the maximum Drain for the baseline could be 320, then Pluto's Staff appears to increase Drain potency by 15%, and Dark weather appears to increase Drain potency by 10%, and these figures are consistent with their effects on direct-damage magic.
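
The ratio-of-maxima arithmetic can be laid out in a short Python sketch, using the observed maxima from the summary table and both candidate baselines (the observed 319 and the suspected 320). The helper name `potency_ratios` is hypothetical.

```python
observed_max = {
    "Excelsis Ring": 336,
    "Pluto's Staff": 365,
    "Dark weather": 349,
}

def potency_ratios(maxima, baseline_max):
    """Ratio of each sample's observed maximum to a baseline maximum."""
    return {name: value / baseline_max for name, value in maxima.items()}

vs_observed = potency_ratios(observed_max, 319)   # best observed baseline
vs_suspected = potency_ratios(observed_max, 320)  # suspected true baseline

for name in observed_max:
    print(f"{name:14s}  vs 319: {vs_observed[name]:.4f}   vs 320: {vs_suspected[name]:.4f}")
```

An observed maximum can only understate the true maximum, so ratios a little below 1.15 and 1.10 for Pluto's Staff and Dark weather are what one would expect if those are the true factors and the very top of each range simply was not sampled.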

An alternative, more statistical method of potency estimation is to divide the mean Drain value for each non-baseline sample by the mean Drain value of the pooled baseline sample. The ratio is then an estimate of the multiplicative factor for the potency-increasing equipment of interest. From the basic bootstrap, a set of simultaneous 95% confidence intervals for the ratios of the means can be obtained.

                    ·----------------·-------------------------------·
                    |   Mean ratio   |   Confidence interval (95%)   |
·-------------------·----------------·-------------------------------·
| Excelsis Ring     |    1.088056    |     (0.996482, 1.183355)      |
| Pluto's Staff     |    1.144933    |     (1.057611, 1.237342)      |
| Dark weather      |    1.193882    |     (1.085225, 1.303625)      |
·-------------------·----------------·-------------------------------·

(Edit: 07/24/2010. I realized the bootstrapped data also included Drain resists. Not correct to include them in the data. Estimates have been corrected.) Not surprisingly, statistical significance for the potency effect of Excelsis Ring isn't achieved, but this is merely a consequence of the sample size. The effects of Pluto's Staff and Dark weather are large enough to yield statistical significance.
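
For readers who want to try the same kind of estimate, here is a minimal Python sketch of a basic bootstrap interval for a ratio of means. It is not the code used for the table above: the raw samples are not listed in this post, so the data below are simulated from uniform distributions as stand-ins, the function name is hypothetical, and the interval is a single 95% interval with no adjustment for simultaneity.

```python
import random
import statistics

def basic_bootstrap_ratio_ci(treated, baseline, level=0.95, reps=5000, seed=0):
    """Basic bootstrap CI for mean(treated) / mean(baseline)."""
    rng = random.Random(seed)
    estimate = statistics.fmean(treated) / statistics.fmean(baseline)

    # Resample each group independently, with replacement.
    ratios = []
    for _ in range(reps):
        t = rng.choices(treated, k=len(treated))
        b = rng.choices(baseline, k=len(baseline))
        ratios.append(statistics.fmean(t) / statistics.fmean(b))
    ratios.sort()

    alpha = 1 - level
    lower_q = ratios[int((alpha / 2) * reps)]
    upper_q = ratios[int((1 - alpha / 2) * reps) - 1]

    # Basic bootstrap: reflect the bootstrap quantiles around the estimate.
    return estimate, 2 * estimate - upper_q, 2 * estimate - lower_q


# Simulated stand-in data only (NOT the observed Drain values).
data_rng = random.Random(1)
baseline = [data_rng.randint(160, 320) for _ in range(98)]
treated = [data_rng.randint(184, 368) for _ in range(61)]

estimate, lower, upper = basic_bootstrap_ratio_ci(treated, baseline)
print(round(estimate, 4), (round(lower, 4), round(upper, 4)))
```

Resists should be removed from both samples before resampling, for the reason given in the edit note above.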

Conclusion and open issues

Excelsis Ring appears to increase the potency of Drain by 5%. Pluto's Staff was verified to increase the potency of Drain by 15%, and single Dark weather was verified to increase the potency of Drain by 10%.

Now, this still leaves the accuracy effect of Excelsis Ring to be determined, but quantifying it is difficult as it likely does not impart a large accuracy bonus (if there is one at all). If I were to do so, I'd be interested in determining whether Dark weather or Darksday has an effect on Drain accuracy. If so, it wouldn't be unreasonable to generalize to other types of magic and conclude that the weather or day can affect magic accuracy in general.

Regarding the mechanics of Drain in particular, it seems that the distribution of Drain could be uniform (discrete or continuous with truncation, it doesn't really matter) and that it is parameterized only by the maximum, with the minimum possibly being exactly one-half the maximum. Provided that this holds, one obvious implication is that the variability of Drain increases with dark magic skill (up to a point), and that it also increases with other various potency factors present. Taken altogether, these factors can make it very difficult to draw any conclusions about Drain (and Aspir by analogy) based on eyeballing alone.

As far as the distribution of Drain values given some sort of resist, it still isn't clear how resists relate to non-resists, but resists aren't that interesting to me so I would never willingly investigate Drain resists.

Finally, why have I never made any explicit claims about what the actual value of the Drain cap should be? Considering that Zvahl Fortalices take increased magic damage, it's possible that they also take increased Drain damage, and 320 might not be the "true" cap given that no other potency-enhancing factors are in play. But I was interested in differences, not actual amounts.

Thursday, July 8, 2010

When to use Snake Eye?

What is the appropriate use of Snake Eye?

Recently, I was challenged on the notion that "it is typical to use Snake Eye to get a lucky Phantom Roll total or avoid an unlucky one" (my words).

Instead, it is recommended to save Snake Eye to get an 11 (XI) or to avoid an unlucky roll, and not to use it to get a lucky total. (Note that nothing prevents Snake Eye from being used on a 10 if it wasn't used to get there, whether or not the intent is to use Snake Eye for a lucky roll.) Can it be shown probabilistically which Snake Eye tactic is better?

Example: Corsair's roll

Corsair's roll being the only relevant roll for experience parties, let's just use this as an example. The underlying tactic is to keep using Double Up whenever the expected value of the bonus from continuing, given your current total, exceeds the bonus for your current total; I have called this an "expected value on Double Up" (EVDU) criterion. For reference, here are the experience point bonuses for the "desirable" outcomes under that tactic. Of course, with Snake Eye available, the unlucky total can be avoided entirely. (See this spreadsheet if that doesn't make sense):

Roll total   EXP Bonus
------------------------
5            20%
7            15%
8            16%
9             8%
10           17%
11           24%

For all probability calculations, assume job abilities occur instantaneously without recast restrictions for the sake of clarity. Also assume for simplicity that Phantom Roll always has one of several desirable outcomes in effect (Phantom Roll never wearing off) and that Snake Eye can be used only once toward the final outcome. Also assume Phantom Roll lasts either 5 or 10 minutes because who merits Winning Streak? Finally, let us assume no bust-mitigation measures (for now).

First, let's consider the "recommended" course of action, which is to save Snake Eye to get an eleven or to avoid an unlucky roll. In other words, use Snake Eye on 9 or 10. This kind of makes sense to do, since eleven lasts longer and I don't really see a downside to Corsair's roll lasting 10 minutes.

The probability distribution (numbers rounded to ten digits) of getting one of the possible outcomes (shown previously) follows:

Roll total   Probability
--------------------------
5            .3087705771
7            .1935656731
8            .1657878952
10           .1333804876
11           .1470336084
Bust         .0514617630

Obviously, with 11 lasting twice as long as each of the others, the relevant consideration is the "proportion of time spent under each effect," so the probabilities need to be adjusted:

Roll total   Probability
--------------------------
5            .2691905221
7            .1687532701
8            .1445362135
10           .1162829808
11           .2563719263
Bust         .0448650871

Based on that, the expected (long run) EXP bonus is 18.357% if you use Snake Eye to get 11 and to avoid a 9.

Now, what if you take a more conservative tack and Snake Eye given three conditions: when you are on a 4 (one less than lucky, 5), when you are on a 9 (unlucky), or when you are on a 10? The probabilities (adjusting for time duration differences) shake out as follows:

Roll total   Probability
--------------------------
5            .4864098317
7            .1305837864
8            .1050579060
10           .0752777122
11           .1621366109
Bust         .0405341527

Based on that, the expected (long run) EXP bonus is 18.538% if you use Snake Eye to get a 5 (lucky), 10 (avoid an unlucky), or 11. This approach is better probabilistically and you have a lower probability of busting!
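
The time adjustment and the expected-value step can be reproduced from the tables above with a short Python sketch. Two things are inferred rather than stated: a bust is counted as a 0% bonus, and a bust occupies the same amount of time as a non-11 result. Both choices reproduce the figures quoted in the text. The function names are hypothetical.

```python
BONUS = {5: 20, 7: 15, 8: 16, 10: 17, 11: 24, "bust": 0}  # EXP bonus in percent
RELATIVE_DURATION = {11: 2}  # an 11 lasts twice as long as anything else

def time_weighted(probs):
    """Convert outcome probabilities into proportions of time spent under each."""
    weights = {k: p * RELATIVE_DURATION.get(k, 1) for k, p in probs.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def expected_bonus(time_shares):
    """Long-run EXP bonus (percent) given time shares; normalizes away rounding."""
    total = sum(time_shares.values())
    return sum(share * BONUS[k] for k, share in time_shares.items()) / total

# Snake Eye on 9 or 10: unadjusted outcome probabilities from the first table.
snake_eye_9_10 = {5: .3087705771, 7: .1935656731, 8: .1657878952,
                  10: .1333804876, 11: .1470336084, "bust": .0514617630}

# Snake Eye on 4, 9, or 10: already time-adjusted in the table above.
snake_eye_4_9_10_adjusted = {5: .4864098317, 7: .1305837864, 8: .1050579060,
                             10: .0752777122, 11: .1621366109, "bust": .0405341527}

bonus_a = expected_bonus(time_weighted(snake_eye_9_10))
bonus_b = expected_bonus(snake_eye_4_9_10_adjusted)
print(round(bonus_a, 3), round(bonus_b, 3))
```

The gap between the two tactics is under 0.2 percentage points, so the ranking is sensitive to the simplifying assumptions listed earlier.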

For the sake of comparison, the expected EXP bonus if you take a timid approach and avoid busting completely (never getting an 11), but use Snake Eye where it makes sense, is 17.89%.

Um... why don't you account for bust mitigation?

Suppose hypothetically that you can re-roll (start over) indefinitely (and instantaneously) to avoid a bust. Then the asymptotic probabilities for both Snake Eye tactics are as follows:

Roll total   Snake Eye on 9 or 10   Snake Eye on 4, 9, or 10
----------------------------------------------------------------
5            .2818350774            .5069589846
7            .1766800353            .1361005051
8            .1513254427            .1094962435
10           .1217450847            .0784579382
11           .2684143600            .1689863286

The (limiting) expected EXP bonus when using Snake Eye on 9 or 10 is 19.22%, while the (limiting) expected EXP bonus when using Snake Eye on 4, 9, or 10 is 19.322%.
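
Numerically, the asymptotic probabilities in the table coincide with the time-adjusted probabilities conditioned on not busting, that is, each non-bust share divided by one minus the bust share. The Python sketch below checks this; it is an observation about the numbers, not necessarily how they were originally derived, and `without_bust` is a hypothetical helper name.

```python
BONUS = {5: 20, 7: 15, 8: 16, 10: 17, 11: 24}  # EXP bonus in percent

def without_bust(time_shares):
    """Drop the bust outcome and renormalize the remaining time shares."""
    kept = {k: p for k, p in time_shares.items() if k != "bust"}
    total = sum(kept.values())
    return {k: p / total for k, p in kept.items()}

def expected_bonus(time_shares):
    return sum(share * BONUS[k] for k, share in time_shares.items())

# Time-adjusted shares from the earlier tables.
snake_eye_9_10 = {5: .2691905221, 7: .1687532701, 8: .1445362135,
                  10: .1162829808, 11: .2563719263, "bust": .0448650871}
snake_eye_4_9_10 = {5: .4864098317, 7: .1305837864, 8: .1050579060,
                    10: .0752777122, 11: .1621366109, "bust": .0405341527}

limit_a = without_bust(snake_eye_9_10)
limit_b = without_bust(snake_eye_4_9_10)
print(round(expected_bonus(limit_a), 3), round(expected_bonus(limit_b), 3))
```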

Conclusion

Two Snake Eye tactics were considered for Corsair's roll. One tactic is to use Snake Eye on 9 or 10 (emphasizing getting an 11 at the expense of getting a 5), and the other tactic is to use Snake Eye on 4, 9, or 10 (no emphasis on getting an 11).

Using Snake Eye on 4, 9, or 10 is a superior tactic regardless of attempts at bust mitigation.

By "going for broke" (getting an 11), you give up a sure thing, and the trade-off is not worth it (even if the difference is slight), and this is before considering time spent under suboptimal EXP bonuses in the process of achieving a desirable total.

Does Drain have a cap?

This is a continuation of a line of inquiry regarding the mechanics of Drain, specifically whether an "unconditional" maximum value of Drain exists (still thinking statistically...) and, if so, whether it can be achieved with a minimum level of dark magic skill such that additional skill does nothing to increase the unconditional maximum (holding all other relevant factors fixed, obviously).

To put it another way, does additional skill do nothing to increase Drain potency?

First, I supposed that a maximum Drain value could be achieved with a minimum dark magic skill of 300 (or somewhere around 300), and that 25 additional skill would therefore not change the average potency of Drain. Data were collected in the usual manner (taking care not to cast Drain with Dark weather in effect; see previous posts for more details on "experimental procedure"). All other factors affecting potency and/or accuracy were held fixed for both samples. I obtained the following results:

The maximum Drain observed given 300 dark skill was 319 and the maximum observed given 325 dark skill was 318. Also note that there were more "obvious" resists under 300 skill than under 325.

Considering the data as a whole, does 25 additional dark magic skill really have no effect on Drain potency? Noting the lack of difference in maximum Drain values should be enough for most (and it is for me), but let us also consider a way to apply statistics.

First, we can treat this data as though we were doing equivalence testing, so we need to decide what average difference would be considered a sign that there is a difference in Drain potency.

Recall that last time, given 269 dark magic skill, I observed a maximum Drain of 288. Here, I observed a maximum Drain of 319 given 300 dark magic skill. Does one point of additional skill really increase the maximum (and minimum) Drain by 1 HP? Who knows, but suppose it were the case. Then 25 additional skill would have increased the maximum (and average) Drain by 25 HP. Let us then consider any observed difference in averages smaller than 25 HP in magnitude to be the result of chance, provided that there really is no difference in potency. Of course, there are some observations that are obviously resists, so let us also use the crude cutoff that any values below 150 be excluded from the analysis.

By the logic of an equivalence test ("two one-sided tests"), a 90% confidence interval for the difference in average potency between 325 skill and 300 skill is (-7.003157, 19.665328). This confidence interval is completely contained in the interval [-25, 25], so from the standpoint of statistical significance one can say that 325 skill is equivalent to 300 skill as far as average potency is concerned.
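The equivalence check can be sketched in R as follows. This is only a sketch: the function name `equiv_check` and its arguments are hypothetical, and it assumes a Welch t interval (the default of `t.test`), since the post does not say which interval was actually computed. The raw Drain samples are not reproduced here, so the last lines only check the reported interval against the margin.

```r
# Equivalence check by the "two one-sided tests" logic: build a 90% confidence
# interval for the difference in means and see whether it lies inside the margin.
# x, y   : Drain values at the lower and higher skill level (hypothetical inputs)
# margin : equivalence margin in HP
# cutoff : values below this are treated as obvious resists and dropped
equiv_check <- function(x, y, margin = 25, cutoff = 150) {
  x <- x[x >= cutoff]
  y <- y[y >= cutoff]
  # t.test(y, x) gives an interval for mean(y) - mean(x); Welch is an assumption
  ci <- as.numeric(t.test(y, x, conf.level = 0.90)$conf.int)
  list(ci = ci, equivalent = ci[1] > -margin && ci[2] < margin)
}

# The interval reported above, checked against the +/-25 HP margin
reported <- c(-7.003157, 19.665328)
reported[1] > -25 && reported[2] < 25
```

A 90% interval is used because each of the two one-sided tests is run at the 5% level.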

Conclusions

There appears to be no difference in Drain potency between 300 dark magic skill and 325 dark magic skill, holding all other relevant factors fixed. Additional dark magic skill still appears to affect Drain accuracy beyond 300 dark magic skill, however. Drain potency could cap at 300 dark magic skill (holding other factors fixed).

Personally, these findings somewhat devalue my dark skill equipment that is devoted solely to Drain or Aspir, of which there are three pieces. Certain pieces such as Sorcerer's gloves and Wizard's tonban should still provide large boosts to accuracy (or at least that is the expectation).

Sunday, July 4, 2010

Magian weapons: multi-attack rate estimation

First, I will recap what is "known" about the "occasionally attacks twice" (OAT) rate of Magian weapons, and then present an estimation of the probability distribution of attacks for "occasionally attacks 2-3 times" (OA2-3T) Magian weapons.

Is there a universal "occasionally attacks twice" rate for Magian weapons?

Appealing to Occam's razor, one could assert that the lack of duplicate entries for "occasionally attacks twice" (OAT) in the .DATs means that all the Magian weapons with that trait have the same OAT rate. While I do not know whether that assertion is actually true, at least I can look at various pieces of published evidence to see if this notion of a universal rate has any traction.

The track record of English-language FFXI sources is dismal: there is exactly one forum post (that I am aware of) that presents any data that can be used to estimate the OAT rate. The "general consensus" is that it's 40%. (By comparison, the Joyeuse rate is 45%.) As far as I know, this is the only evidence that English-language users cite or allude to when making claims about the Magian OAT rate, which is pathetic, but in line with the natural incuriosity of the FFXI sheep.

Another data set that someone shared with me, concerning the OAT rate of a Magian great axe (Luchtaine) with a 19% base double attack rate, showed 344 double attacks out of 689 total attack rounds. Based on these counts, an interval estimate of the Magian OAT rate (given 95% confidence) is (.3357284, .4279119), which is also consistent with the idea of a 40% OAT rate. (Reasoning leading to the OAT rate estimation is similar to that for virtue weapons I discussed previously.)
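For reference, the interval above can be sketched in R. The model used here is an assumption on my part as editor of the arithmetic: a round has two attacks if double attack triggers (19%) or, failing that, if OAT triggers, so P(two attacks) = DA + (1 - DA) * OAT. A plain Wald interval for the observed proportion, pushed through that relation, appears to reproduce the figures quoted to the digits shown.

```r
# Counts reported for the Magian great axe
doubles <- 344   # rounds with two attacks
rounds  <- 689   # total attack rounds
da      <- 0.19  # base double attack rate stated in the text

# Wald 95% interval for the proportion of two-attack rounds
p    <- doubles / rounds
half <- qnorm(0.975) * sqrt(p * (1 - p) / rounds)
ci_double <- c(p - half, p + half)

# Assumed model: P(two attacks) = da + (1 - da) * oat, solved for oat
oat_hat <- (p - da) / (1 - da)
oat_ci  <- (ci_double - da) / (1 - da)
oat_hat
oat_ci
```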

Among Japanese sources, there is more data but an annoying lack of statistical consistency, if this one blog post is to be taken as a summary of all pieces of evidence regarding the Magian OAT rate. They can be grouped into two categories: evidence consistent with a 40% rate and evidence consistent with a rate higher than 40% but lower than 50% ("statistically significantly," what a gauche phrase). The foolish conclusion that the OAT rate is 43.75%, based on an idiotic pooling of the data, has no traction.

The attack distribution of "occasionally attacks 2-3 times" Magian weapons

Given the above discussion of the lack of reliable information on the Magian OAT rate, the prospect of getting reliable data concerning the attack distribution of Magian weapons that "occasionally attack 2-3 times" (OA2-3T) appears poor. In fact, there is one set of count data (source) for Magian OA2-3T hand-to-hand with MNK (whatever its actual name is for the weapon, I don't give a fuck) that can shed light on the matter, but it can only do so provided that the attack distribution associated with OA2-3T is the same for all Magian weapons and the data are actually credible. The counts are as follows (302 total):

2 attacks: 57
3 attacks: 98
4 attacks: 81
5 attacks: 48
6 attacks: 16
7 attacks: 2

In order to obtain estimates of the attack distribution probabilities for Magian OA2-3T, a probability model needs to be specified and the estimation based on that model.

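Let H denote the number of attacks in a given attack round. Let π1, π2, and π3 denote the probabilities of 1, 2, and 3 attacks from a single hand in an attack round (with π1 + π2 + π3 = 1), and let k denote the probability of a kick attack in an attack round. Provided that the number of attacks of one hand, the number of attacks of the other hand, and the number of kick attacks (all in a given attack round) are mutually independent, the probability mass function of H is

\[
P(H = h) =
\begin{cases}
\pi_1^2 (1-k), & h = 2 \\
2\pi_1\pi_2 (1-k) + \pi_1^2 k, & h = 3 \\
(2\pi_1\pi_3 + \pi_2^2)(1-k) + 2\pi_1\pi_2 k, & h = 4 \\
2\pi_2\pi_3 (1-k) + (2\pi_1\pi_3 + \pi_2^2) k, & h = 5 \\
\pi_3^2 (1-k) + 2\pi_2\pi_3 k, & h = 6 \\
\pi_3^2 k, & h = 7
\end{cases}
\]

and 0 otherwise.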

Aside: "Why do you care about kicks?" is a valid question. The answer is that the data were collected with a parser. Just as WAR cannot have a 0% double attack rate, MNK cannot have a 0% kick attack rate, and kparser cannot make the distinction between a kick and a punch (nor should we expect that kind of distinction to be made). Surely, a person can tell the difference, but why would you expect anyone to count manually when a parser is available? The occurrence of kicks does not provide any useful information about the attack distribution of an OA2-3T weapon (but can help validate the probability model), so all kicks do is introduce undesirable variability to the proceedings, but you can't do anything about it (other than get the data using PUP).

With the above data and probability model, maximum likelihood estimation can proceed. Of immediate concern is whether to assume that the kick attack rate, given 5/5 Kick Attack merits, is actually 17.5%. (Of course, I could let the kick attack rate be yet another parameter to estimate, but estimating four parameters with a sample size of 302 is not really that helpful.) People who play monk are generally fucking retarded, but I'll just use that rate. Using numerical methods, a set of point estimates and 95% simultaneous confidence intervals (Bonferroni, too lazy to care about other methods) is generated:

         p.hat  ci.lower  ci.upper
[1,] 0.4795746 0.4160971 0.5430521
[2,] 0.3377920 0.2492500 0.4263339
[3,] 0.1826334 0.1283014 0.2369655

Assuming the 17.5% kick attack rate is valid (the weakest assumption by far in my view, to go along with all the other assumptions upon which the analysis is based), the probability distribution of attacks for Magian OA2-3T is obviously not the same as that for the likes of Ridill, Mercurial Kris, and Soboro Sukehiro. The alleged 30:50:20 ratio for 1-3 attacks obviously does not agree with the data (and the corresponding estimates). Given the data, the multi-attack probability (including 2 and 3 attacks) could be 1/2, partitioning to 3/10 for two attacks and 1/5 for three attacks.

To put it another way, the ratio of 1-3 attacks could be 50:30:20 for Magian "occasionally attacks 2-3 times" weapons (generalizing from hand-to-hand to all weapons), and that's what I'll stand by until other data persuasively rejects that working hypothesis.
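To make the comparison between the two ratios concrete, here is a small R sketch that computes expected counts under each one. It uses the same model as the addendum script (two independent hands plus a kick at an assumed 17.5% rate); the function name `attack_pmf` is made up for this illustration.

```r
# Probability of 2..7 total attacks in a round for a hand-to-hand user.
# p: per-hand probabilities of 1, 2, 3 attacks; k: kick attack rate (17.5% assumed)
attack_pmf <- function(p, k = 0.175) {
  # two independent hands: totals of 2, 3, 4, 5, 6 attacks
  hands <- c(p[1]^2, 2*p[1]*p[2], 2*p[1]*p[3] + p[2]^2, 2*p[2]*p[3], p[3]^2)
  # a kick adds one attack with probability k, so totals run from 2 to 7
  pmf <- c(hands, 0) * (1 - k) + c(0, hands) * k
  names(pmf) <- 2:7
  pmf
}

counts <- c(57, 98, 81, 48, 16, 2)  # observed rounds with 2..7 attacks
names(counts) <- 2:7
n <- sum(counts)

# Observed counts next to expected counts under each candidate ratio
round(rbind(observed       = counts,
            ratio_30_50_20 = n * attack_pmf(c(.3, .5, .2)),
            ratio_50_30_20 = n * attack_pmf(c(.5, .3, .2))), 1)
```

Under 30:50:20 only about 22 two-attack rounds would be expected against the 57 observed, while 50:30:20 gives about 62.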

Addendum: numerical maximum likelihood estimation

Analytical MLE for the above case is a complete waste of time if it is even possible, so I tapped out an R script for the purposes of numerical estimation.

# Negative log-likelihood for the attack-count model.
# p = c(p1, p2): per-hand probabilities of 1 and 2 attacks (p3 = 1 - p1 - p2)
# X = observed counts of rounds with 2, 3, ..., 7 attacks; k = kick attack rate
ll <- function(p, X, k) {
  p1 <- p[1]; p2 <- p[2]; p3 <- 1 - p1 - p2
  # distribution of the two-hand total (2 to 6 attacks)
  hands <- c(p1*p1, 2*p1*p2, 2*p1*p3 + p2*p2, 2*p2*p3, p3*p3)
  # add a kick with probability k: totals run from 2 to 7
  probs <- c(hands, 0)*(1-k) + c(0, hands)*k
  -sum(X*log(probs))
}

counts <- c(57, 98, 81, 48, 16, 2)   # rounds with 2, 3, ..., 7 attacks
k <- .175                            # assumed kick attack rate (5/5 merits)
est <- optim(c(.05, .05), ll, X = counts, k = k, hessian = TRUE,
             control = list(reltol = 1E-40))
# inverse of the observed information = estimated covariance of (p1, p2)
vc <- solve(est$hessian)
p.hat <- c(est$par, 1 - sum(est$par))

# standard errors; Var(p3) = Var(p1) + Var(p2) + 2*Cov(p1, p2)
se <- c(sqrt(diag(vc)), sqrt(sum(diag(vc)) + 2*vc[1,2]))
# Bonferroni-adjusted 95% simultaneous intervals for the three probabilities
z <- qnorm(1 - .05/(2*3))
ci.lower <- p.hat - z*se
ci.upper <- p.hat + z*se
cbind(p.hat, ci.lower, ci.upper)

# goodness of fit: fitted probabilities for 2, 3, ..., 7 attacks
# (chisq.test does not adjust the degrees of freedom for the two estimated parameters)
hands <- c(p.hat[1]^2, 2*p.hat[1]*p.hat[2], 2*p.hat[1]*p.hat[3] + p.hat[2]^2,
           2*p.hat[2]*p.hat[3], p.hat[3]^2)
fitted <- c(hands, 0)*(1-k) + c(0, hands)*k
chisq.test(counts, p = fitted)

Thursday, July 1, 2010

Restraint: tentative findings

(Edit 07/09/2010: edited some figures to account for a Critical Attack bonus of 5%.)

This time I'm going to start with the current legitimate claims about the effect of Restraint, a level 78 warrior job ability that "enhances your weapon skill power with each normal attack you land, but prevents you from dealing critical hits" per the help description. Then I will go over the support for those claims, one at a time, and then discuss some implications for the effectiveness of Restraint.

Claims

  • Restraint's enhancement seems to manifest as a damage multiplier distinct from other factors such as TP bonus, pDIF, and "TP modifier" (fTP).
  • Restraint's enhancement is not exact but actually has some variability, controlling for the number of attacks landed. (The damage multiplier is effectively a random variable.)
  • The damage multiplier of Restraint appears to have a maximum of 1.5 (+50% bonus).
  • Restraint's enhancement is dependent on weapon delay. Generally speaking, the higher the weapon delay, the higher the damage increase per normal attack landed.
  • The damage multiplier appears to increase linearly with the number of landed attacks up to a maximum of 1.5 (controlling for weapon delay).

Is Restraint's effect really a simple damage multiplier?

Using the weapon skill Spirits Within, whose damage function I described when discussing how I determined Fencer's TP bonus, it is straightforward for anyone to show that Restraint doesn't provide a TP bonus like Fencer does.

But first, why use Spirits Within? Its damage is completely deterministic and can be calculated exactly given your current HP and current TP. If there is then any deviation from the predicted value, that deviation can be attributed to whatever factor you had changed. Of course, this doesn't say anything about weapon skills one would actually want to use (for reasons to be discussed later), but getting a general idea of how Restraint appears to work should help focus further investigation (in theory because no one really gives a shit about doing it).

Anyway, ruling out a TP bonus is easy enough as soon as you observe a damage return from Spirits Within under Restraint that is impossible were it the result of a TP bonus. To go over briefly how I determined this, I whacked a Zvahl Fortalice with a Trainee Sword/Trainee's Needle combination (5.1 TP/hit with Dual Wield II) until I got 107.1 TP, then used Spirits Within. (Actually this basically is the general experimental procedure performed to reach some of the conclusions about Restraint I described earlier.) Given my current HP of 1148, the predicted damage, given 107.1 TP, is 147, but the observed damage was 164. It is impossible to obtain 164 damage from a TP bonus (the damage equation I provided is exact and has yet to fail), so a TP bonus can be ruled out.

As for pDIF, obviously pDIF doesn't enter into Spirits Within damage, so one cannot really speak of any kind of pDIF bonus, whether additive or multiplicative.

Ruling out an fTP bonus like that from an elemental gorget is not as straightforward. A conceptually simple method is to determine, using the same weapon(s) (holding weapon delay and therefore TP/hit constant), whether the damage of one weapon skill scales by approximately (accounting for flooring) the same factor as the damage of another weapon skill with a different fTP "profile" given the same TP and the same number of landed attacks under Restraint.

If the scaling factors are dramatically different, this could be considered evidence of an additive fTP bonus and one can rule out the idea of a damage multiplier. However, I chose not to do this for the following reason.

The damage increase from Restraint has some variability...

Continuing to whack on a Fortalice, I made the unpleasant discovery that, even though Spirits Within damage is supposed to be completely deterministic (knowing only two pieces of information, TP and current HP, means being able to calculate the damage exactly), I observed some variability of damage return holding TP constant under the effect of Restraint. The predicted damage values (based on 1148 current HP) and the observed damage values (which have a relationship to the number of attacks landed with Restraint active) are given in the following text table as I was too lazy to use my inelegantly constructed table markup:
Attacks landed   TP      Predicted damage   Observed damage (Restraint)
------------------------------------------------------------------------
19               102     143                160, 158, 160, 160, 158, 158
20               107.1   147                164, 163, 160, 166, 163, 164
21               112.2   147                167
22               117.3   152                174, 170, 179
23               122.4   156                177, 179
24               127.5   161                183, 180
25               132.6   165                184, 189
26               137.7   170                192, 197
27               142.8   170                195, 202
28               147.9   174                201, 201
29               153     179                209, 209, 209
30               158.1   183                214, 219
31               163.2   188                218, 223
32               168.3   188                225, 229, 218, 218
33               173.4   192
34               178.5   197                232, 236

To me, the fact that the "designers" apparently decided to make the Restraint enhancement a random variable is extremely obnoxious. (What the fuck is the point? Or was this unintentional?) If this is not merely an "anomaly" specific to Spirits Within, it makes it that much more annoying to pin down the effect of Restraint using weapon skills whose damage is normally variable. But at least now people should be aware of this.

Perhaps this is a glitch specific to dual wielding? I also checked single wield and again observed variability of Spirits Within damage. I also attempted to check for damage variability with a magical WS whose damage is also "deterministic" (controlling for resist), but after getting an 87-damage quarter-resist and then a 171-damage half-resist with Seraph Blade (given 117.3 TP), I got very annoyed and switched back to Spirits Within for the purposes of exploring Restraint's effect further. Note that 171 is less than twice 87 (174), which can be considered evidence of variability fundamental to Restraint.

Restraint's maximum damage increase appears to be 50% (1.5 damage multiplier)

For one iteration of Restraint, I wanted to see how much of a damage increase to Spirits Within I could get by accumulating as many landed attacks as possible within the 5-minute duration. I managed to get 110 hits in before using Spirits Within, which gave 807 damage, which happens to be exactly 1.5 times 538, the usual damage at 300 TP given 1148 HP.

Studio Gobli's version update notes also suggest that 50% might be the upper bound for the weapon skill damage bonus. But more important, the update notes indicate that Restraint's effect appears to be dependent on weapon delay.

The effect of Restraint depends on weapon delay

Reiterating Studio Gobli's notes, given 20 landed attacks, the weapon skill bonus (the weapon skill used is not stated) is highest (+21%) for the weapon with the highest delay (444), lower (+17%) for the weapon with the second-highest delay (264), and lowest (+13%) for the weapon with the lowest delay (218). Again, I used a Trainee Sword/Trainee's Needle combination (187 delay per weapon given Dual Wield II), and referring back to my text table, the damage increase given 20 hits was observed to vary from +8.84% (160/147) to +12.9% (166/147), so my results are consistent with Studio Gobli's claims (seemingly unsourced by the way). Moreover, my use of dual wield suggests that Restraint is affected by effective weapon delay.
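The comparison by delay is just arithmetic, sketched here in R. The `per_hit` column assumes the bonus grows linearly from zero with each landed attack, which is a working assumption rather than something the cited notes state.

```r
# Reported WS bonus after 20 landed attacks (Studio Gobli's notes), by weapon delay
gobli <- data.frame(delay = c(444, 264, 218), bonus = c(0.21, 0.17, 0.13))

# Spirits Within at 20 landed attacks with 187 effective delay:
# smallest and largest observed damage divided by the predicted 147, minus 1
obs20 <- c(164, 163, 160, 166, 163, 164)
mine  <- range(obs20) / 147 - 1
round(100 * mine, 2)   # percent bonus, low and high

# Bonus per landed attack under the assumed linear growth from zero
gobli$per_hit <- gobli$bonus / 20
gobli
```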

Therefore, the effect of Restraint on Spirits Within cannot be generalized to other weapon skills because of this dependence on weapon delay. But findings from my Spirits Within investigation can be considered a kind of lower bound on Restraint's WS damage bonus (not that I would actually check for anything below 187 weapon delay).

For a given weapon delay, Restraint's (apparent) damage multiplier may increase linearly with the number of landed attacks

Given the above data in the text table, as well as some other observations toward the "extremes" (based on the number of landed attacks below 19, the minimum number to get to 100 TP after Spirits Within, which never misses, and above 34), I just performed OLS regression mainly to see visually if it is "safe" to assume that the Restraint damage bonus scales linearly with the number of landed attacks:


It appears that linearity is a valid enough assumption, and it could be said that the Restraint damage multiplier increases by about 0.00588515 (0.59%) for every additional landed attack, up to a maximum of 1.5 (+50%). Also note that my 110-attack observation is well off the trend line. Extrapolation here is not fatal; I predict that either 85 or 86 is the minimum number of landed attacks to reach the maximum bonus.
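The linearity check can be sketched in R using only the rows of the text table. The extra observations below 19 and above 34 landed attacks are not listed in this post, so the fitted slope will differ somewhat from the 0.00588515 reported. Dividing observed by predicted damage to get a multiplier for each observation is an assumption about how the regression was set up.

```r
# Rows of the text table (the 33-attack row has no observations and is omitted)
n_hits    <- c(19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 34)
predicted <- c(143, 147, 147, 152, 156, 161, 165, 170, 170, 174, 179, 183, 188, 188, 197)
observed  <- list(c(160, 158, 160, 160, 158, 158), c(164, 163, 160, 166, 163, 164),
                  167, c(174, 170, 179), c(177, 179), c(183, 180), c(184, 189),
                  c(192, 197), c(195, 202), c(201, 201), c(209, 209, 209),
                  c(214, 219), c(218, 223), c(225, 229, 218, 218), c(232, 236))

# One row per observation: landed attacks and implied damage multiplier
reps <- lengths(observed)
d <- data.frame(hits = rep(n_hits, reps),
                mult = unlist(observed) / rep(predicted, reps))

# Ordinary least squares fit of multiplier on landed attacks
fit <- lm(mult ~ hits, data = d)
coef(fit)

# Landed attacks at which the fitted line reaches the 1.5 cap (an extrapolation)
(1.5 - coef(fit)[["(Intercept)"]]) / coef(fit)[["hits"]]
```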

Implications for Restraint use

Here it may be useful to reflect on snap judgments about Restraint's potential utility (or lack thereof).

First, it has been shown that Mighty Strikes is unaffected by Restraint (but is Restraint unaffected by Mighty Strikes?), so there shouldn't be any disadvantage to using Restraint for the purposes of zerging, whatever the bonus is (or isn't).

Second, suppose that it is even desirable to achieve the maximum Restraint bonus to start a zerg off (say, for a 300 TP Steel Cyclone). There aren't that many situations where this is feasible, due to unavailability of mobs to "power up" Restraint and/or time limitations. (The "stored" WS damage potential disappears when Restraint wears off.)

Other than that, let's look at the use of Restraint from the long-run, "optimal" perspective of dealing damage, which means WS spamming and whatnot. Losing critical hit damage for increased weapon skill damage may not seem like a good trade-off, but whether the trade-off is acceptable is determined primarily by what the actual bonus is, which no one has yet determined for the "usual" range of landed attacks before using a great axe weapon skill. (I would say the range is between 5 and 9.)

Of course, this doesn't mean one can't estimate how much of a WS damage increase there needs to be to offset the loss of critical hit damage during Restraint. To do this, consider that for two-handed weapons, the loss of critical hits in the auto-attack phase is relatively more "severe" for a low attack/defense ratio than a high one. Also consider that the loss of critical hit damage is relatively more severe if your critical hit rate is high than when it is low. Yet another factor to keep in mind is that the more hits that end up being landed in the process of getting to 100 TP (think multi-hit weapons and being lazy), the greater the Restraint bonus must be to offset the loss of critical hits. Finally, since Restraint has often been mentioned with King's Justice (because it is thought that Raging Rush is adversely affected by Restraint), I will base my estimate on improving KJ damage.

With that in mind, I estimate that, for a high attack/defense ratio (such that the maximum average pDIF is attained without any level correction involved), the Restraint damage bonus (as a percent increase) needed to offset the loss of critical hit damage is between 3.1% and 4.6% given a 9% critical hit rate and between 8.2% and 12.3% given a 24% critical hit rate (on average). It may be that the actual WS damage bonus (for 5, 6, ... landed attacks) exceeds the above estimates given 504 delay, although it remains to be determined.

For a relatively more modest attack/defense ratio (corresponding to an average non-critical pDIF of 1.5 and average critical pDIF of 2.6; these are rough estimates based on someone's empirical observations, which I will not go over at this time), the Restraint damage bonus needed to offset the loss of critical hit damage is between 6.0% and 9.0% given a 9% critical hit rate and between 16.0% and 24.0% given a 24% critical hit rate. (I give ranges based on some of the current "final upgrade" Magian great axes. But the low estimates are based on the horrible "occasionally attacks 2-3 times" great axe.)

The above suggests a place for Restraint where attack is high or where critical hit rate is low (or a combination of both, which might be experienced in a merit party), but more work needs to be done to justify that contention.