The Unbearable Triteness of Preening: 2010

Monday, September 13, 2010

Drain, Aspir, and Aspir II after the update.

Previous posts have shown that increasing dark magic skill beyond 300 did not increase Drain potency prior to the September 8, 2010 version update, but the version update details state that "maximum values have been increased for certain enhancing magic, dark magic, and blue magic spells whose potency is commensurate with casting skill." The following summarizes the results of checking the potency of Drain, Aspir, and Aspir II without any equipment or other factors that "enhance" any of these spells other than equipment with dark magic skill bonuses.

The raw data are available upon request, or I may just post them in the comments below after a week if there is no feedback.

Drain

The distribution of Drain potency (unresisted Drain values) was checked at both 303 dark magic skill (my current dark magic skill level without any equipment) and 340 dark magic skill (with this equipment) on Zvahl Fortalices in Castle Zvahl Baileys (S). The following dot plots show an obvious increase in the maximum value of Drain. Although I was unable to check the distribution of Drain at 300 dark magic skill, it seems that the change in the maximum Drain was done merely by removing the "cap" on increasing it with dark magic skill beyond the 300 level, holding all other potency-increasing factors (maximum-increasing factors) fixed. (In this case, what was fixed was that there were no other factors "present" to affect my results.)

Previously, I gave a formula for the maximum Drain as a function of dark magic skill (without any other potency-affecting factors) as (skill) + 20. The following summary of the minimum and maximum suggests that this formula does not apply beyond 300 skill.

                        ·--------·---------·--------·------------·
                        |    n   |   Min   |   Max  |   Median   |
·-----------------------·--------·---------·--------·------------·
|   Drain (303 skill)   |   60   |   170   |   321  |    247.5   |
|   Drain (340)         |   60   |   173   |   345  |    248.5   |
·-----------------------·--------·---------·--------·------------·

For now, the maximum observed Drain without any other maximum-increasing factors present (or, for shorthand, "naked") is 345.

I do not have access to Drain II so I cannot check it at this time.

Aspir and Aspir II

I actually checked the distributions of Aspir and Aspir II (not omitting resisted values) before that for Drain, and I collected data at both 300 and 302 dark magic skill (on two separate occasions), thinking I had done so only at 300 and not realizing until later that I had skilled up beyond 300. I was lazy and pooled the data, but it is straightforward to see that, similarly to the Drain maximum, the increase in the Aspir (and Aspir II) maximum was done merely by removing the "cap" on increasing it with dark magic skill beyond the 300 level. I checked at 339 dark magic skill with this equipment set. All data collection was done on Stone Eaters in North Gustaberg (S).

I had no idea what the distribution of Aspir II was like compared to Aspir before the update. The previous image shows that, at least after the September 2010 update, the variability of Aspir II (controlling for resist level) is higher than that of Aspir (no reason to assume anything about this beforehand) even though the maximum Aspir II is higher (as expected) as shown in the above image and following summary statistics table:

                            ·--------·---------·--------·------------·
                            |    n   |   Min   |   Max  |   Median   |
·---------------------------·--------·---------·--------·------------·
|   Aspir (300/302 skill)   |   50   |    60   |   120  |     85.0   |
|   Aspir (339)             |   50   |    50   |   135  |     95.5   |
|   Aspir II (300/302)      |   50   |    83   |   180  |    130.5   |
|   Aspir II (339)          |   50   |    56   |   202  |    152.0   |
·---------------------------·--------·---------·--------·------------·

Previously, I gave a formula for the maximum Aspir as a function of dark magic skill (without any other potency-affecting factors) as (skill/3) + 20 (decimals truncated where necessary). The above data shows that this formula does not really apply above 300 dark magic skill. I make no attempt to suggest a formula for the maximum of Aspir II at this time.

The maximum "naked" Aspir observed is 135, and the maximum observed Aspir II is 202.

Summary

Announced in the September 2010 version update, the increase in the maximum Drain, Aspir, and Aspir II values seems to be the result of allowing their potency to increase by increasing dark magic skill beyond 300, holding all other maximum-increasing factors fixed. Before the update, the Drain and Aspir maximum did not increase beyond 300 dark magic skill.

The variability of Aspir II values (controlling for resist level) is higher than that for Aspir values.

I observed a maximum "naked" Drain of 345 (340 dark magic skill), a maximum "naked" Aspir of 135 (339 skill) and maximum "naked" Aspir II of 202 (339 skill). The actual maxima are likely higher and are likely to be attained only by increasing dark magic skill further (well above 340).
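
As a rough check on the figures above, the following Python sketch compares the observed post-update maxima with the earlier formulas, once with the old 300-skill cap and once with the cap simply removed. The function names are made up for illustration, and extending the old formulas unchanged past 300 is an assumption made only for comparison, not a claim about the actual mechanics.

```python
# Old "naked" maxima from earlier posts: Drain = skill + 20,
# Aspir = floor(skill/3) + 20. The cap argument is an assumption used
# only to compare "capped at 300" against "cap removed".

def drain_max(skill, cap=None):
    s = skill if cap is None else min(skill, cap)
    return s + 20

def aspir_max(skill, cap=None):
    s = skill if cap is None else min(skill, cap)
    return s // 3 + 20

# (spell, skill, observed maximum) taken from the tables in this post
observed = [
    ("Drain", 303, 321),
    ("Drain", 340, 345),
    ("Aspir", 339, 135),
]

formulas = {"Drain": drain_max, "Aspir": aspir_max}

for spell, skill, seen in observed:
    f = formulas[spell]
    print(f"{spell} @ {skill}: capped={f(skill, cap=300)}, "
          f"uncapped={f(skill)}, observed={seen}")
```

Every observed maximum exceeds the capped value, which is consistent with the cap being gone. The observed Drain maximum at 340 skill falls short of the uncapped value while the observed Aspir maximum at 339 skill slightly exceeds it, so the old formulas do not appear to extend unchanged past 300 (keeping in mind that a sample maximum is only a lower bound on the true maximum).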

Friday, August 27, 2010

Blitzer's Roll

Blitzer's Roll (COR Lv.83)
Reduces melee attack delay for party members within area of effect. Lucky number: 4. Unlucky number: 9.

Seeing as this is the only interesting thing from the update as far as TP-burning for one-handed melee DD vs. Hasso (you self-identify as WAR/DRK/DRG/SAM? Are you a tool? It's either piercing Hasso or non-piercing Hasso) is concerned (although that supposes anyone still cares about FFXI), let's anticipate what the effect might actually be.

Thoughts:

If this really reduces melee attack delay a la Sword Strap, maybe it will be exactly the same form of delay reduction as that from dual wield. (Why not?) If Blitzer's Roll actually stacks with dual wield and the bonuses are good, this would benefit dancer and ninja the most.
One-handed melee jobs get more from Blitzer's Roll even if it reduces weapon delay separately from dual wield, but it doesn't mean it'd be bad for two-handed melee jobs (depends on the actual bonuses), just that Fighter's and Samurai would be better. n-hit builds are overrated.
What if it's "job ability haste"? Wouldn't it be funny if the bonuses were large enough to obviate the use of Hasso and Haste Samba? Among other things that would follow: why are we using two-handed jobs for TP-burn then?

Friday, August 13, 2010

Samurai Roll and Fighter's Roll... for ninja

Having shown in the past that Fighter's Roll can be better than Samurai Roll for increasing warrior's damage output (despite the lack of consideration of WS delay, which actually would favor Fighter's Roll), I thought it might be nice to have a worked example for ninja (this time accounting for WS delay).

This image (Imageshack host) summarizes the computations, based on 95% hit rate, 15% DA rate, 55% haste, 40% dual wield, and a "cRatio" of 1.5, with a 10% critical hit rate, with Blade: Jin as the weapon skill. The specific katana combination is Hochomasamune/Uzura, which is something I would consider getting as a "high-value" option, meaning that it's a highly effective option that doesn't require wasting as much time doing boring shit as other options would.

First, let's start with Samurai Roll (augmented by the presence of a samurai). The following is a table summarizing the relative increase in the (theoretical maximum) rate of damage from each of several desirable Samurai Roll outcomes.

Roll total      Bonus     Damage/second     Relative efficiency
---------------------------------------------------------------
 -              -               113.607                       -
 2             42 STP           123.979                   9.13%
 7             26               119.672                   5.34%
 8             30               121.410                   6.87%
 9             32               121.414                   6.87%
10             34               121.458                   6.91%
11             50               126.684                  11.51%

The relative increase for each store TP bonus is decent, but not as high as you would expect for two-handed weapons since the proportion of total damage from Blade: Jin for katana is realistically never going to be as high as the proportion of total damage from the (generally) best weapon skills for two-handed weapons (e.g. Drakesbane, Raging Rush, Tachi: Gekko).

In contrast, Fighter's Roll (here augmented from the presence of a warrior) increases auto-attack damage, weapon skill frequency, and weapon skill damage, so it's not a surprise that Fighter's Roll is generally easily superior to Samurai Roll for increasing rates of damage:

Roll total      Bonus     Damage/second     Relative efficiency
---------------------------------------------------------------
 -               -              113.607                       -
 5             15% DA           129.567                  14.05%
 7             11%              125.291                  10.28%
 8             12%              126.358                  11.22%
10             13%              127.427                  12.16%
11             19%              133.857                  17.82%
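
The "relative efficiency" column in both tables is just the percentage increase in damage per second over the no-roll baseline. A minimal Python sketch using the figures from the two tables (the function and variable names are only illustrative):

```python
BASELINE_DPS = 113.607  # damage/second with no roll

def relative_efficiency(dps, baseline=BASELINE_DPS):
    """Percentage increase in damage/second over the baseline."""
    return (dps / baseline - 1) * 100

# roll total -> damage/second, copied from the tables
samurai = {2: 123.979, 7: 119.672, 8: 121.410,
           9: 121.414, 10: 121.458, 11: 126.684}
fighters = {5: 129.567, 7: 125.291, 8: 126.358,
            10: 127.427, 11: 133.857}

for name, table in (("Samurai Roll", samurai), ("Fighter's Roll", fighters)):
    for total, dps in table.items():
        print(f"{name} {total:>2}: {relative_efficiency(dps):5.2f}%")
```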

Note that these are supposed to be the actual DA percent bonuses after the infamous August 2007 version update (source), you know, the one that resulted in ninja being disregarded as DD. If the bonuses were higher before that update (assuming they were changed at all), Fighter's Roll would have been even better.

So, if Fighter's Roll can be shown to be better for the job that gets the least relative benefit from Fighter's Roll (warrior), as well as for ninja (though this could be considered self-evident), are there even any legitimate reasons to use Samurai Roll? No, "TP overflow" is an incorrect and stupid answer that betrays a lack of conceptual understanding.

Now, that's not to say Samurai Roll has no application. Maybe it would be desirable to maximize the ratio of damage efficiency to TP "fed" to a mob rather than consider damage efficiency in isolation. I doubt Monk's Roll would do the job, but a case could be made for Samurai Roll.

Also, if it is desirable to maximize weapon skill frequency, Samurai Roll would generally be better for that purpose than Fighter's Roll. (Also related is maximizing TP per hit, which would be desirable for dancer.)

Also, Fighter's Roll can be said to be "contraindicated" for use with multi-hit weapons (that don't use virtue stones) such as Magian multi-hit weapons. (But this assumes Magian multi-hit weapons are actually good, which is not necessarily true.)

Saturday, July 31, 2010

Aspir and Drain modeling: an incomplete picture

Earlier, I made some bold statements (at least by my standards) about the distribution of unresisted values of Drain and Aspir. I proposed a model from which I can make explicit, testable predictions.

Specifically, I wanted to see if the model holds at 114 dark magic skill, which can be attained by subbing /DRK on any job without any native dark magic skill (in my case NIN/DRK). Based on the model I described previously, the maximum Aspir without any potency-increasing equipment used is 114/3 + 20 = 58 MP, and the minimum is half that, or 29 MP.

I went out to cast Aspir on Stone Eaters (North Gustaberg (S)) and on the seventh cast I obtained an Aspir of 63 MP, which exceeds the stipulated maximum, so the model doesn't hold for 114 dark magic skill. The following are the observed data values (in order observed):

51 44 56 49 25 51 63 53 35 62 62 46 41 32 44
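
The comparison of these values against the model can be sketched in a few lines of Python. The bounds come from the formulas stated above; the variable names are only illustrative.

```python
skill = 114
predicted_max = skill // 3 + 20      # floor(114/3) + 20 = 58
predicted_min = predicted_max // 2   # half the maximum, floored = 29

# Observed Aspir values, in the order they were obtained
casts = [51, 44, 56, 49, 25, 51, 63, 53, 35, 62, 62, 46, 41, 32, 44]

above = [v for v in casts if v > predicted_max]
below = [v for v in casts if v < predicted_min]

print("predicted range:", predicted_min, "to", predicted_max)
print("above predicted maximum:", above)
print("below predicted minimum:", below)
```

A value below the predicted minimum could be a partial resist, so only the values above the predicted maximum count against the model.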

I then cast Drain on the same Stone Eaters with 246 dark magic skill (NIN75/SCH35 with Dark Arts) and obtained the following results (stem-and-leaf plot):

(The reason I used Drain and not Aspir was that my scholar is only level 35, and Aspir is accessible at level 36. I would prefer to gather data for Aspir because Aspir values are obviously less variable than Drain values.)

Given 246 dark magic skill, the predicted maximum for Drain is 266 and the predicted unresisted minimum is 133 (both without any potency gear or day/weather effects), so the distribution of Drain (as represented by the sample) seems consistent with the model.

In conclusion, if one were to be technical in describing the scope of the model I proposed earlier, I would say the model is (likely) valid for Drain between 246 and 300 dark magic skill. It is valid for Aspir between 269 and 300 dark magic skill. And, finally, it is valid for Drain II between 285 and 300 dark magic skill. What happens in the 100s is just not that relevant. (In case you are wondering what data I am referring to, you'd have to check my old posts on Drain and Aspir).

Saturday, July 24, 2010

The "new" melee pDIF

This post is definitely not for those who neither understand nor care about what melee "pDIF" is all about and why it can be of interest, so I find no point in making some sort of "for dummies" kind of introduction and will just jump into the results.

First, a reference to the "new" melee pDIF should be seen as a sarcastic gesture, as there likely have been no wholesale changes to pDIF after the August 2007 version update that brought the gameplay-altering "two-handed weapon adjustment." Therefore, the following results are assumed to reflect the actual changes to pDIF made in August 2007.

Data and results

The guy who plays Masamunai (currently of Cerberus) provided this spreadsheet of data, having tabulated the observed damage values for various ratios of attack to defense (without level correction), using both one-handed and two-handed weapons, on level 63-65 Lesser Colibri and then "standardizing" them to approximate observed pDIF values (acknowledging estimation error associated with in-game truncation of values). There are more details concerning the raw data and he provided his own analysis, but I prefer to do my own analysis so you don't necessarily have to review the spreadsheet yourself.

The following is an image attempting to plot 67,123 of the observed pDIF data values (almost all of the data) to show primarily how the minimum, maximum, and (most important to me) mean pDIF for both critical and non-critical ("normal") hits varies with the ratio of attack to defense:

It is somewhat difficult to plot 67,123 data values cleanly and elegantly with limited resolution, so I exploited transparency of data points, resulting in narrow "bands" that vary in opacity from top to bottom, an attempt to illustrate roughly the relative "density" of observed values. Each band represents the entirety of the data collected for a given attack/defense ratio. Another interpretation is that each band represents the observed conditional distribution of pDIF for a given attack/defense ratio.

The bands for critical pDIF are generally less "dense" or less opaque than those for normal pDIF, reflecting the fact that there are many more data points for normal pDIF (55,996 versus 11,127). Also, the bands are generally most translucent at the endpoints, reflecting the fact that the observed data at the extremes of each conditional pDIF distribution (for a given attack/defense ratio) occur relatively less frequently, which is consistent with the idea that pDIF is now a function of two uniform random variables (either the sum or the product), which follows a trapezoidal(-like) distribution. (But I will not be discussing probability distributions today.)
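
For anyone who wants to see where the trapezoid shape comes from, here is a small exact enumeration in Python of the sum of two independent discrete uniform variables with different ranges. The ranges (0 to 9 and 0 to 2) are arbitrary and purely illustrative; they are not the ranges used by the game.

```python
from collections import Counter
from itertools import product

def sum_counts(n_a, n_b):
    """Count how often each total occurs for A in 0..n_a-1 plus
    B in 0..n_b-1, with every (A, B) pair equally likely."""
    return Counter(a + b for a, b in product(range(n_a), range(n_b)))

counts = sum_counts(10, 3)

# Crude text histogram: short ramps at both ends, flat in the middle
for total in sorted(counts):
    print(f"{total:>2} {'#' * counts[total]}")
```

The extremes of the sum can each be reached in only one way, which is why values near the endpoints of each band show up less often than values in the middle.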

Aside from the plotting of the data values, regression lines for the mean pDIF (controlling for attack/defense ratio) were also plotted (lines based on ordinary least squares, which is justifiable as there are a lot of data points involved for each level of attack/defense considered). Regression was done in an informal piecewise fashion, as there are specific ranges of attack/defense ratio where the variance of pDIF is obviously not constant, specifically for three cases:

where there is a critical pDIF upper limit imposed (3.15 when attack/defense is approximately greater than 1.65)
where there is a normal pDIF lower limit imposed (1.00 when attack/defense is between 1.25 and 1.5), and
where the mode of normal pDIF is 1.00 and the mode does not occur at the left endpoint of the pDIF distribution (when attack/defense is less than 1.25). It should be noticed that it is impossible to discern the mode of pDIF (conditional on a given attack/defense ratio) based on the above graph. One would have to consult the original source as cited above.

I hope that will suffice as an explanation for the elements of the graph.

Interpretations and conclusions

These are a few of the things one could take away from the graph above.

Aside from the maximum attack/defense ratio attainable, there appear to be no differences in pDIF between one-handed weapons and two-handed weapons. I have incorrectly thought otherwise in the past, but I assumed people who cared about this knew what they were talking about. Obviously not.

While there is no data for two-handed weapons below 1.398 attack/defense ratio, I would invoke model parsimony and assert there is no good reason to expect differences at lower values of attack/defense. Although it is not shown above (and cannot be shown above cleanly), 2.00 is the maximum attack/defense ratio for one-handed weapons, and 2.25 is the maximum attack/defense ratio for two-handed weapons. Support for these maxima can be found in the spreadsheet.

The ceiling on critical hit pDIF first occurs near 1.65 attack/defense. Moreover, the value of the ceiling, 3.15, is the modal (most frequently occurring) pDIF for attack/defense ratios above 1.65.

Mean pDIF, as a function of attack/defense, does NOT increase at the same rate for critical hits as for normal hits for a given value of attack/defense. A consequence of this is there is no pat way to relate normal pDIF to critical pDIF, like critical pDIF = normal pDIF + 1. To see what I mean, refer to this blog entry (JP), particularly the first image, to get a sense of how pDIF was incorrectly perceived more than a year after the August 2007 version update (a mish-mash of the critical hit pDIF ceiling of 3.15, increased attack/defense ratio maximum, and the old pDIF model).

Irrelevant considerations

This is a matter of personal preference, but I consider the so-called "secondary randomizer" an irrelevant red herring. pDIF as a product of two uniform random variables or sum of two uniform random variables, so what? (But I will note that the slope of mean pDIF without the second random factor does not change if the factor is added and does change if the factor is multiplied.) I just know it's there and I can explain what can cause it, but it is not very important for estimating mean pDIF, which is why I even made this post in the first place.

I also do not care about exactness of any pDIF model. Approximately true is fine with me as far as modeling rates of damage is concerned. (There are other factors that, when completely ignored or incorrectly computed, cause much more error than mere sampling error based on 60,000+ samples.)

Formulas for mean pDIF as a function of attack/defense ratio and whether the weapon is one-handed or two-handed

These formulas are based on the regression estimates. (You may have noticed discontinuities in the piecewise mean pDIF functions suggested in the graph, but I do not care that much about fudging the estimates to eliminate that.) For normal-hit pDIF, which I have denoted as M_Normal, the estimated mean of M_Normal, as a function of H, the number of hands required to wield a weapon (H = 1, 2), and R, the attack/defense ratio (level-corrected or otherwise), is

For critical-hit pDIF, the functional relationship between that and H and R is

The following is the output of the regression procedure. I only include this to show that there is no reason to expect that the coefficient of determination be high, mainly because there is inherent variability of pDIF.

***Regression for normal pDIF, ATK/DEF < 1.25****

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.225403   0.010107    22.3   <2e-16 ***
ratio       0.782699   0.009748    80.3   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.1621 on 12257 degrees of freedom
Multiple R-squared: 0.3447,     Adjusted R-squared: 0.3446 
F-statistic:  6447 on 1 and 12257 DF,  p-value: < 2.2e-16 



***Regression for normal pDIF, 1.25 < ATK/DEF < 1.5****

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.07274    0.04129   1.762   0.0781 .  
ratio        0.90232    0.02969  30.390   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.2369 on 14254 degrees of freedom
Multiple R-squared: 0.06085,    Adjusted R-squared: 0.06078 
F-statistic: 923.6 on 1 and 14254 DF,  p-value: < 2.2e-16 



***Regression for normal pDIF, ATK/DEF > 1.5****

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.294339   0.016866  -17.45   <2e-16 ***
ratio        1.162306   0.009566  121.50   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.2518 on 29479 degrees of freedom
Multiple R-squared: 0.3337,     Adjusted R-squared: 0.3336 
F-statistic: 1.476e+04 on 1 and 29479 DF,  p-value: < 2.2e-16



***Regression for critical pDIF, ATK/DEF < 1.65****

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.94139    0.01739   54.14   <2e-16 ***
ratio2       1.07335    0.01273   84.29   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.232 on 6745 degrees of freedom
Multiple R-squared: 0.513,      Adjusted R-squared: 0.5129 
F-statistic:  7106 on 1 and 6745 DF,  p-value: < 2.2e-16 



***Regression for critical pDIF, ATK/DEF > 1.65****

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.59673    0.04287   37.25   <2e-16 ***
highratio    0.68607    0.02344   29.27   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

Residual standard error: 0.1905 on 4377 degrees of freedom
Multiple R-squared: 0.1637,     Adjusted R-squared: 0.1635 
F-statistic: 856.8 on 1 and 4377 DF,  p-value: < 2.2e-16
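
The regression estimates above can be turned into a rough lookup for mean pDIF. The following Python sketch is a reconstruction from the printed coefficients and may differ from the formulas originally given. In particular, clamping R at 2.00 for one-handed and 2.25 for two-handed weapons before applying the fit, and the handling of values exactly at the breakpoints, are assumptions based on the discussion above. The names are hypothetical.

```python
# (upper end of attack/defense range, intercept, slope),
# copied from the regression output above.
NORMAL = [(1.25, 0.225403, 0.782699),
          (1.50, 0.07274, 0.90232),
          (float("inf"), -0.294339, 1.162306)]
CRITICAL = [(1.65, 0.94139, 1.07335),
            (float("inf"), 1.59673, 0.68607)]

# H = number of hands needed to wield the weapon -> maximum ratio
MAX_RATIO = {1: 2.00, 2: 2.25}

def mean_pdif(ratio, hands, critical=False):
    """Estimated mean pDIF for attack/defense ratio R and H hands."""
    r = min(ratio, MAX_RATIO[hands])  # assumption: ratio is clamped first
    for upper, intercept, slope in (CRITICAL if critical else NORMAL):
        if r < upper:
            return intercept + slope * r

for r in (1.0, 1.4, 1.8, 2.25):
    print(r, round(mean_pdif(r, 2), 3),
          round(mean_pdif(r, 2, critical=True), 3))
```

Because the pieces were fit separately, the function has small jumps at 1.25, 1.5, and 1.65, which matches the discontinuities mentioned earlier.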

Thursday, July 22, 2010

Drain and Aspir minimum and maximum

Figuring out things that were already figured out years ago

After reviewing all my posts on Drain and Aspir, I finally realized that I had enough information to specify the minimum and maximum unresisted values, as a function of dark magic skill, for both Drain and Aspir, and since the distribution of unresisted values can be considered approximately uniformly distributed (acknowledging flooring effects), the mean Drain and Aspir can also be specified with reasonable certainty.

Of course, this was something that was figured out years ago. This Drain and Aspir summary page (JP) has some interesting things to say about Vampiric Mitts/Boots, Diabolos's Pole, Y's Scythe, and factors other than equipment, day/weather, and magic burst affecting Drain that do not affect Aspir, but I will leave it to those interested to check out that source while I summarize the minima and maxima for Drain, Aspir, and Drain II, and speculate on those for Aspir II.

All specified extrema are defined for dark magic skill between 0 and 300, and for clarity's sake, factors that increase Drain or Aspir potency are not included in the statements aside from dark magic skill. It should be understood that any of those other factors increase the maximum and minimum by some multiplicative constant, and these constants can be multiplied in succession (generally, the order of multiplication is first the constant for equipment, then that for day/weather, then that for magic burst, with some twists on Drain potency you can look up yourself) with flooring after each multiplication step to obtain the final values of the maximum and minimum.

Maximum and minimum of Drain

The maximum unresisted Drain is

skill + 20,

and the minimum unresisted Drain is

floor(0.5(skill + 20)).

It follows that the mean unresisted Drain is approximately

0.75(skill + 20).

Currently 320 HP is the highest possible Drain return before any other potency-increasing factors are considered.

Each 1-point increase in dark magic skill (up to 300 total skill) increases the maximum Drain by 1 HP. Other potency-enhancing factors can increase the magnitude of this marginal return.

Maximum and minimum of Aspir

The maximum unresisted Aspir is

floor(skill/3 + 20),

and the minimum unresisted Aspir is

floor(0.5floor(skill/3 + 20)).

It follows that the mean unresisted Aspir is approximately

0.75(skill/3 + 20).

Currently 120 MP is the highest possible Aspir return before any other potency-increasing factors are considered.

Each 3-point increase in dark magic skill (up to 300 total skill) increases the maximum Aspir by 1 MP. Other potency-enhancing factors can increase the magnitude of this marginal return.

Maximum and minimum of Drain II

This discussion of the Hirudinea Earring effect (Hirudinea Earring seems to increase the potency of Drain or Aspir by either 2.5% or 3%) also has Drain II data that is consistent with the following expressions for the minimum and maximum of Drain II.

One thing to note is that Drain II has less variability than Drain I. (A dot plot would help show this.) Another thing to note is that the observed minima seem as though they are based on the expression for the maximum of Drain I. Then the minimum unresisted Drain II is

skill + 20.

The maximum unresisted Drain II is

skill + 85.

The mean unresisted Drain II is

skill + 52.5.

Each 1-point increase in dark magic skill (up to 300 total skill) increases both the minimum and maximum Drain II by 1 HP. Other potency-enhancing factors can increase the magnitude of this marginal return.

Currently 385 HP is the highest possible Drain II return before any other potency-increasing factors are considered.

Note that unlike Drain and Aspir, the variability of Drain II values appears not to vary with dark magic skill. Instead, varying skill merely shifts the distribution of unresisted values to the left or the right.
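
The expressions given so far for Drain, Aspir, and Drain II can be collected into a short Python sketch. The function names are only illustrative, and the 300-skill cap is applied as described in this post; no other potency-increasing factors are modeled.

```python
def cap(skill, limit=300):
    """Dark magic skill beyond the limit is assumed to add nothing."""
    return min(skill, limit)

def drain_bounds(skill):
    top = cap(skill) + 20
    return top // 2, top          # floor(0.5(skill + 20)), skill + 20

def aspir_bounds(skill):
    top = cap(skill) // 3 + 20    # floor(skill/3 + 20)
    return top // 2, top          # minimum is half the maximum, floored

def drain2_bounds(skill):
    s = cap(skill)
    return s + 20, s + 85

def uniform_mean(bounds):
    """Mean under the approximately uniform model: the midpoint."""
    lo, hi = bounds
    return (lo + hi) / 2

for name, f in (("Drain", drain_bounds), ("Aspir", aspir_bounds),
                ("Drain II", drain2_bounds)):
    b = f(300)
    print(name, b, uniform_mean(b))
```

At 300 skill the midpoints agree with the approximate means stated above: 0.75(320) = 240 for Drain, 0.75(120) = 90 for Aspir, and 300 + 52.5 = 352.5 for Drain II.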

Minimum and maximum of Aspir II?

While I have yet to see anything regarding Aspir II, I wouldn't be surprised to see the minimum be

floor(skill/3 + 20)

and the maximum be

floor(skill/3 + 85),

but this is subject to verification based on others' experience.

Summary

The following table (image) summarizes the maximum, minimum and (approximate) mean unresisted value of Drain, Aspir, and Drain II as a function of dark magic skill ("capping" at 300 skill) when no other potency-increasing factors are present.

Table of extrema and means for Drain, Aspir, and Drain II


Other potency-increasing factors increase the minimum and maximum by their corresponding multiplicative constant (e.g. that for equipment), with flooring after each multiplication step.
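
A minimal Python sketch of the "multiply, then floor, at each step" rule follows. The 15% and 10% bonuses are placeholder values for an equipment bonus followed by a day/weather bonus, and restricting bonuses to whole-number percentages is a simplification so that integer arithmetic can be used (which avoids floating-point rounding surprises when flooring).

```python
def apply_bonuses(base, percent_bonuses):
    """Apply each percentage bonus in order, flooring after every step."""
    value = base
    for pct in percent_bonuses:
        value = value * (100 + pct) // 100  # integer floor division
    return value

# Hypothetical example: equipment (+15%), then weather (+10%),
# applied to the Drain maximum and minimum at 300 skill.
print(apply_bonuses(320, [15, 10]))
print(apply_bonuses(160, [15, 10]))
```

Since the flooring happens after each multiplication, the order of the bonuses can matter by a point or so in some cases, which is why the order given earlier (equipment, then day/weather, then magic burst) is kept.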

Tuesday, July 20, 2010

Increasing Drain potency

This is the last of a series of posts describing my investigation of Drain mechanics, particularly what things increase Drain potency and by how much. The following findings are based yet again on casting Drain on Zvahl Fortalices, details of which (justification, limitations, and whatnot) I described in previous posts (though not in much detail).

The motivation: Excelsis Ring

Having received the Excelsis Ring from the Bastok sequence of Wings of the Goddess nation-oriented quests, I wondered whether this ring actually increased the potency of Drain, which should be understood as increasing the maximum Drain or the average (mean) Drain value.

Having previously shown that Drain potency could possibly "cap" at 300 dark magic skill holding all other relevant factors fixed, I felt that it would be useful to verify, in the process of determining whether Excelsis Ring actually increases Drain potency, that Drain potency doesn't increase at higher levels of skill (again), specifically 331 dark magic skill.

After establishing two baseline samples at 331 dark magic skill given no other potency-enhancing equipment (or day/weather effects), I then obtained a sample adding Excelsis Ring to the "baseline." After that, I figured it would be helpful also to quantify the effects of Pluto's Staff and Dark weather on Drain potency, both relative to the baseline.

Wait a second... aren't you ever going to describe an experimental procedure in more detail?

I could, but seriously, is it that hard to think of equipment swaps that guarantee your current HP is 350-400 HP lower than your maximum HP? Hint: HP-increasing equipment upon Drain casting counts. Also, it isn't that hard to think of ways 331 dark magic skill can be attained. I could take a screenshot of the equipment used, but that would mean logging into the game.

Results

The above image is yet another set of dot plots providing a visual summary of the five samples obtained. In the past, I made no statements about the actual distribution of Drain (controlling for resist level), but by now it should be pretty obvious that a uniform distribution is a good model for the data, so further insights will be based on this model.

Also, there were a few Drain resists observed, but as they are obviously resists they can be ignored for the purposes of potency estimation. The vertical lines denote the suspected minimum (160 HP) and maximum (320 HP) drained for the control samples, to be discussed later. A table (ASCII, yes I am that lazy) summarizing the extrema and median of the unresisted Drain values for each sample follows:

                    ·--------·-------------·-------------·------------·
                    |    n   |   Minimum   |   Maximum   |   Median   |
·-------------------·--------·-------------·-------------·------------·
|   Baseline 1      |   48   |       162   |       305   |    223.0   |
|   Baseline 2      |   50   |       160   |       319   |    238.0   |
|   Excelsis Ring   |   51   |       171   |       336   |    255.0   |
|   Pluto's Staff   |   61   |       184   |       365   |    263.0   |
|   Dark weather    |   30   |       190   |       349   |    288.5   |
·-------------------·--------·-------------·-------------·------------·

Speculating on what the extrema could tell us, it looks as though the variability of the data (as indicated by the range of observed values) is highest for Pluto's Staff, which is consistent with the idea that Pluto's Staff increases potency by some multiplicative factor. Also of interest is the possibility that the minimum is merely half of the maximum, so the observed minima could provide some insight on what the maxima should be.

Another obvious thing to note is that even at 331 dark magic skill, there is still no evidence that Drain potency is higher compared to that at 300 skill, so it is reasonable to conclude, considering all the data to date, that the Drain "cap" (holding other potency-enhancing factors fixed) seems to be met somewhere near 300 dark magic skill (if not exactly at 300). Technically, I could invoke an equivalence test yet again (as I did in a previous post), but you would have to be a tool to demand one here.

Since we know that a Pluto's Staff and single Dark weather each increase direct-damage magic (as shown by changes to the initial damage of any of the Bio spells, allowing for truncation) by 15% and 10% respectively, it wouldn't be surprising if these factors increased Drain potency by the same amount. It stands to reason that Excelsis Ring could behave in the same way (but increasing potency by a lesser amount).

One way to estimate the multiplicative factors is to divide the observed maximum Drain for each of the non-baseline samples by the observed maximum for the baseline samples combined.

Excelsis Ring appears to increase Drain potency by a percentage near (336/319 - 1)100% = 5.3%. But I suspect the maximum Drain for the baseline is 320 (I just wasn't lucky enough to observe it), which would mean the increase in Drain potency from Excelsis Ring could be 5%.

Keeping in mind that the maximum Drain for the baseline could be 320, then Pluto's Staff appears to increase Drain potency by 15%, and Dark weather appears to increase Drain potency by 10%, and these figures are consistent with their effects on direct-damage magic.
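The arithmetic behind these ratios can be laid out in a few lines of R. This is only a sketch: the use of floor() for the implied maxima is my assumption about how the game truncates, and the variable names are mine.

```r
# Observed maxima from the summary table
observed_max <- c("Excelsis Ring" = 336, "Pluto's Staff" = 365, "Dark weather" = 349)

# Ratios against the observed baseline maximum (319) and the suspected one (320)
ratio_319 <- observed_max / 319
ratio_320 <- observed_max / 320
round(cbind(ratio_319, ratio_320), 4)

# Largest Drain each conjectured bonus (5%, 15%, 10%) would allow from a 320 baseline;
# integer arithmetic avoids floating-point trouble before the floor
implied_max <- floor(320 * c(105, 115, 110) / 100)
names(implied_max) <- names(observed_max)
implied_max
```

Under these assumptions the implied maxima are 336, 368, and 352. The observed maxima (336, 365, 349) never exceed them, which is consistent with the conjectured multipliers if the true maximum simply was not sampled for Pluto's Staff and Dark weather.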

An alternative, more statistical method of potency estimation is to divide the mean Drain value for each non-baseline sample by the mean Drain value of the pooled baseline sample. The ratio is then an estimate of the multiplicative factor for the potency-increasing equipment of interest. From the basic bootstrap, a set of simultaneous 95% confidence intervals for the ratios of the means can be obtained.

                    ·----------------·-------------------------------·
                    |   Mean ratio   |   Confidence interval (95%)   |
·-------------------·----------------·-------------------------------·
|   Excelsis Ring   |     1.088056   |        (0.996482, 1.183355)   |
|   Pluto's Staff   |     1.144933   |        (1.057611, 1.237342)   |
|   Dark weather    |     1.193882   |        (1.085225, 1.303625)   |
·-------------------·----------------·-------------------------------·

(Edit: 07/24/2010. I realized the bootstrapped data also included Drain resists. Not correct to include them in the data. Estimates have been corrected.) Not surprisingly, statistical significance for the potency effect of Excelsis Ring isn't achieved, but this is merely a consequence of the sample size. The effects of Pluto's Staff and Dark weather are large enough to yield statistical significance.
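For readers unfamiliar with the basic bootstrap, here is a minimal R sketch of a ratio-of-means interval. The samples are simulated stand-ins, not my raw data, and the Bonferroni split used to make the intervals simultaneous is an assumption for illustration only; the function and variable names are hypothetical.

```r
set.seed(2010)

# Hypothetical stand-in samples (NOT the post's raw data), drawn uniformly
# between roughly half the suspected maximum and the maximum
baseline     <- floor(runif(98, 160, 321))
plutos_staff <- floor(runif(61, 184, 369))

mean_ratio <- function(x, y) mean(x) / mean(y)

# Basic bootstrap interval: reflect the bootstrap quantiles around the estimate
basic_boot_ci <- function(x, y, B = 5000, alpha = 0.05) {
  est  <- mean_ratio(x, y)
  boot <- replicate(B, mean_ratio(sample(x, replace = TRUE),
                                  sample(y, replace = TRUE)))
  q    <- quantile(boot, c(alpha / 2, 1 - alpha / 2), names = FALSE)
  c(estimate = est, lower = 2 * est - q[2], upper = 2 * est - q[1])
}

# alpha / 3 is a Bonferroni split across three comparisons (an assumption)
result <- basic_boot_ci(plutos_staff, baseline, alpha = 0.05 / 3)
result
```

An interval that excludes 1 corresponds to a statistically significant potency effect; with small samples the interval widens quickly, which is what happened with Excelsis Ring.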

Conclusion and open issues

Excelsis Ring appears to increase the potency of Drain by 5%. Pluto's Staff was verified to increase the potency of Drain by 15%, and single Dark weather was verified to increase the potency of Drain by 10%.

Now, this still leaves the accuracy effect of Excelsis Ring to be determined, but quantifying it is difficult as it likely does not impart a large accuracy bonus (if there is one at all). If I were to do so, I'd be interested in determining whether Dark weather or Darksday has an effect on Drain accuracy. If so, it wouldn't be unreasonable to generalize to other types of magic and conclude that the weather or day can affect magic accuracy in general.

Regarding the mechanics of Drain in particular, it seems that the distribution of Drain could be uniform (discrete or continuous with truncation, it doesn't really matter) and that it is parameterized only by the maximum, with the minimum possibly being exactly one-half the maximum. Provided that this holds, one obvious implication is that the variability of Drain increases with dark magic skill (up to a point), and that it also increases with other various potency factors present. Taken altogether, these factors can make it very difficult to draw any conclusions about Drain (and Aspir by analogy) based on eyeballing alone.

As far as the distribution of Drain values given some sort of resist, it still isn't clear how resists relate to non-resists, but resists aren't that interesting to me so I would never willingly investigate Drain resists.

Finally, why have I never made any explicit claims about what the actual value of the Drain cap should be? Considering that Zvahl Fortalices take increased magic damage, it's possible that they also take increased Drain damage, and 320 might not be the "true" cap given that no other potency-enhancing factors are in play. But I was interested in differences, not actual amounts.

Thursday, July 8, 2010

When to use Snake Eye?

What is the appropriate use of Snake Eye?

Recently, I was challenged on the notion that "it is typical to use Snake Eye to get a lucky Phantom Roll total or avoid an unlucky one" (my words).

Instead, it is recommended to save Snake Eye to get an 11 (XI) or to avoid an unlucky roll, while not using Snake Eye to get a lucky total. (Note that there is no reason that Snake Eye couldn't be used on 10 if Snake Eye wasn't used to get there regardless of whether there is intent to use Snake Eye to get a lucky roll.) Can it be shown probabilistically which Snake Eye tactic is better?

Example: Corsair's roll

Corsair's roll being the only relevant roll for experience parties, let's just use this as an example. For reference, here are the experience point bonuses for "desirable" outcomes. These are the totals you end up on if you keep using Double Up only while the expected value of the bonus from continuing, given your current total, exceeds the bonus for your current total, which I have called an "expected value on Double Up" criterion (EVDU). Of course, with Snake Eye available, the unlucky total can be avoided entirely. (See this spreadsheet if that doesn't make sense):


Roll total     EXP Bonus
------------------------
 5             20%
 7             15%
 8             16%
 9              8%
10             17%
11             24%

For all probability calculations, assume job abilities occur instantaneously without recast restrictions for the sake of clarity. Also assume for simplicity that Phantom Roll always has one of several desirable outcomes in effect (Phantom Roll never wearing off) and that Snake Eye can be used only once toward the final outcome. Also assume Phantom Roll lasts either 5 or 10 minutes because who merits Winning Streak? Finally, let us assume no bust-mitigation measures (for now).

First, let's consider the "recommended" course of action, which is to save Snake Eye to get an eleven or to avoid an unlucky roll. In other words, use Snake Eye on 9 or 10. This kind of makes sense to do, since eleven lasts longer and I don't really see a downside to Corsair's roll lasting 10 minutes.

The probability distribution (numbers rounded to ten digits) of getting one of the possible outcomes (shown previously) follows:


Roll total     Probability
--------------------------
   5           .3087705771
   7           .1935656731
   8           .1657878952
  10           .1333804876
  11           .1470336084
Bust           .0514617630

Obviously, with 11 lasting twice as long as each of the others, the relevant consideration is the "proportion of time spent under each effect," so the probabilities need to be adjusted:


Roll total     Probability
--------------------------
   5           .2691905221
   7           .1687532701
   8           .1445362135
  10           .1162829808
  11           .2563719263
Bust           .0448650871

Based on that, the expected (long run) EXP bonus is 18.357% if you use Snake Eye to get 11 and to avoid a 9.
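The time adjustment and the expected bonus can be reproduced with a few lines of R. This sketch takes the outcome probabilities from the first table as given and assumes, as the adjusted table implies, that a bust occupies one normal-length period with no bonus; the variable names are mine.

```r
# Outcome probabilities when Snake Eye is used on 9 or 10 (first table above)
outcome  <- c("5", "7", "8", "10", "11", "Bust")
prob     <- c(.3087705771, .1935656731, .1657878952, .1333804876, .1470336084, .0514617630)
bonus    <- c(20, 15, 16, 17, 24, 0)  # EXP bonus in percent; a bust gives none
duration <- c(1, 1, 1, 1, 2, 1)       # relative duration; an 11 lasts twice as long

# Proportion of time spent under each effect
time_share <- prob * duration / sum(prob * duration)
names(time_share) <- outcome
round(time_share, 10)

# Long-run expected EXP bonus (percent)
expected_bonus <- sum(time_share * bonus)
expected_bonus
```

The result agrees with the 18.357% figure, and the time shares match the adjusted table up to rounding in the inputs.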

Now, what if you take a more conservative tack and Snake Eye given three conditions: when you are on a 4 (one less than lucky, 5), when you are on a 9 (unlucky), or when you are on a 10? The probabilities (adjusting for time duration differences) shake out as follows:


Roll total     Probability
--------------------------
   5           .4864098317
   7           .1305837864
   8           .1050579060
  10           .0752777122
  11           .1621366109
Bust           .0405341527

Based on that, the expected (long run) EXP bonus is 18.538% if you use Snake Eye to get a 5 (lucky), 10 (avoid an unlucky), or 11. This approach is better probabilistically and you have a lower probability of busting!

For the sake of comparison, the expected EXP bonus if you take a timid approach and avoid busting completely (never getting an 11), but use Snake Eye where it makes sense, is 17.89%.

Um... why don't you account for bust mitigation?

Suppose hypothetically that you can re-roll (start over) indefinitely (and instantaneously) to avoid a bust. Then the asymptotic probabilities for both Snake Eye tactics are as follows:


Roll total     Snake Eye on 9 or 10     Snake Eye on 4, 9, or 10
----------------------------------------------------------------
 5             .2818350774              .5069589846
 7             .1766800353              .1361005051
 8             .1513254427              .1094962435
10             .1217450847              .0784579382
11             .2684143600              .1689863286

The (limiting) expected EXP bonus when using Snake Eye on 9 or 10 is 19.22%, while the (limiting) expected EXP bonus when using Snake Eye on 4, 9, or 10 is 19.322%.
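One way to see where the limiting figures come from: re-rolling every bust amounts to conditioning on not busting, so the bust row is dropped and the remaining time-adjusted probabilities are renormalized. A short R sketch under that reading (variable and function names are mine):

```r
bonus <- c("5" = 20, "7" = 15, "8" = 16, "10" = 17, "11" = 24)

# Time-adjusted probabilities from the earlier tables (bust listed last)
tactic_9_10   <- c(.2691905221, .1687532701, .1445362135, .1162829808, .2563719263, .0448650871)
tactic_4_9_10 <- c(.4864098317, .1305837864, .1050579060, .0752777122, .1621366109, .0405341527)

# Drop the bust entry and renormalize the five desirable outcomes
drop_bust <- function(p) p[1:5] / sum(p[1:5])

limit_9_10   <- drop_bust(tactic_9_10)
limit_4_9_10 <- drop_bust(tactic_4_9_10)

# Limiting expected EXP bonus (percent) for each tactic
ev <- c(snake_eye_9_10 = sum(limit_9_10 * bonus),
        snake_eye_4_9_10 = sum(limit_4_9_10 * bonus))
ev
```

This recovers the asymptotic table and the two limiting expected bonuses, with the 4, 9, or 10 tactic still ahead.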

Conclusion

Two Snake Eye tactics were considered for Corsair's roll. One tactic is to use Snake Eye on 9 or 10 (emphasizing getting an 11 at the expense of getting a 5), and the other tactic is to use Snake Eye on 4, 9, or 10 (no emphasis on getting an 11).

Using Snake Eye on 4, 9, or 10 is a superior tactic regardless of attempts at bust mitigation.

By "going for broke" (getting an 11), you give up a sure thing, and the trade-off is not worth it (even if the difference is slight), and this is before considering time spent under suboptimal EXP bonuses in the process of achieving a desirable total.

Does Drain have a cap?

This is a continuation of a line of inquiry regarding the mechanics of Drain, specifically whether an "unconditional" maximum value of Drain exists (still thinking statistically...) and, if so, whether it can be achieved with a minimum level of dark magic skill such that additional skill does nothing to increase the unconditional maximum (holding all other relevant factors fixed, obviously).

To put it another way, does additional skill do nothing to increase Drain potency?

First, I supposed that a maximum Drain value could be achieved with a minimum dark magic skill of 300 (or somewhere around 300), and that 25 additional skill would therefore not change the average potency of Drain. Data were collected in the usual manner (taking care not to cast Drain with Dark weather in effect; see previous posts for more details on "experimental procedure"). All other factors affecting potency and/or accuracy were held fixed for both samples. I obtained the following results:

The maximum Drain observed given 300 dark skill was 319 and the maximum observed given 325 dark skill was 318. Also note that there were more "obvious" resists under 300 skill than under 325.

Considering the data as a whole, does 25 additional dark magic skill really have no effect on Drain potency? Noting the lack of difference in maximum Drain values should be enough for most (and it is for me), but let us also consider a way to apply statistics.

First, we can treat this data as though we were doing equivalence testing, so we need to decide what average difference would be considered a sign that there is a difference in Drain potency.

Recall that last time I observed a maximum Drain of 288 given 269 dark magic skill. Here, I observed a maximum Drain of 319 given 300 dark magic skill. Does one point of additional skill really increase the maximum (and minimum) Drain by 1 HP? Who knows, but suppose it were the case. Then 25 additional skill would have increased the maximum (and average) Drain by 25 HP. Let us then consider any observed difference in averages smaller than 25 in magnitude (that is, between -25 and 25) to be the result of chance, provided that there really is no difference in potency. Of course, there are some observations that are obviously resists, so let us also use the crude cutoff that any values below 150 be excluded from the analysis.

By the logic of an equivalence test ("two one-sided tests"), a 90% confidence interval for the difference in average potency between 325 skill and 300 skill is (-7.003157, 19.665328). This confidence interval is completely contained in the interval [-25, 25], so from the standpoint of statistical significance one can say that 325 skill is equivalent to 300 skill as far as average potency is concerned.
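The logic of the equivalence check can be sketched in R as follows. The samples are simulated stand-ins rather than my data, and the Welch t interval from t.test() is only one reasonable way to get the 90% interval; it is not necessarily how the interval quoted above was computed.

```r
set.seed(42)

# Hypothetical stand-in samples (NOT the post's data): unresisted Drain values
# plus a few low "resist" values to exercise the cutoff
skill_300 <- c(floor(runif(40, 160, 320)), 80, 75, 40)
skill_325 <- c(floor(runif(40, 160, 320)), 78)

cutoff <- 150  # crude resist cutoff from the text
margin <- 25   # equivalence margin from the text

x <- skill_325[skill_325 >= cutoff]
y <- skill_300[skill_300 >= cutoff]

# A 90% two-sided interval corresponds to two one-sided tests at the 5% level
ci <- t.test(x, y, conf.level = 0.90)$conf.int

# Equivalence is declared only if the whole interval lies inside (-margin, margin)
equivalent <- ci[1] > -margin && ci[2] < margin
ci
equivalent
```

The key design point is that the burden of proof is reversed: a wide interval from a small sample fails to show equivalence, rather than failing to show a difference.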

Conclusions

There appears to be no difference in Drain potency between 300 dark magic skill and 325 dark magic skill, holding all other relevant factors fixed. Additional dark magic skill still appears to affect Drain accuracy beyond 300 dark magic skill, however. Drain potency could cap at 300 dark magic skill (holding other factors fixed).

Personally, these findings devalue somewhat my dark skill equipment that is devoted solely to Drain or Aspir, of which there are three pieces. Certain pieces such as Sorcerer's gloves and Wizard's tonban still provide large boosts to accuracy (at least it is expected).

Sunday, July 4, 2010

Magian weapons: multi-attack rate estimation

First, I will recap what is "known" about the "occasionally attacks twice" (OAT) rate of Magian weapons, and then present an estimation of the probability distribution of attacks for "occasionally attacks 2-3 times" (OA2-3T) Magian weapons.

Is there a universal "occasionally attacks twice" rate for Magian weapons?

Appealing to Occam's razor, one could assert that the lack of duplicate entries for "occasionally attacks twice" (OAT) in the .DATs means that all the Magian weapons with that trait have the same OAT rate. While I do not know whether that assertion is actually true, at least I can look at various pieces of published evidence to see if this notion of a universal rate has any traction.

The track record of English-language FFXI sources is dismal: there is exactly one forum post (that I am aware of) that presents any data that can be used to estimate the OAT rate. The "general consensus" is that it's 40%. (By comparison, the Joyeuse rate is 45%.) As far as I know, this is the only evidence that English-language users cite or allude to when making claims about the Magian OAT rate, which is pathetic, but in line with the natural incuriosity of the FFXI sheep.

Another data set that someone shared with me, concerning the OAT rate of a Magian great axe (Luchtaine) with a 19% base double attack rate, showed 344 double attacks out of 689 total attack rounds. Based on these counts, an interval estimate of the Magian OAT rate (given 95% confidence) is (.3357284, .4279119), which is also consistent with the idea of a 40% OAT rate. (Reasoning leading to the OAT rate estimation is similar to that for virtue weapons I discussed previously.)
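The interval can be reproduced in R under two assumptions that I am making explicit here: that OAT is only checked when double attack does not proc, so that P(two attacks) = DA + (1 - DA) x OAT, and that a Wald interval for the overall proportion is simply transformed through that relationship. Variable names are mine.

```r
double_attacks <- 344
rounds         <- 689
base_da        <- 0.19  # base double attack rate reported with the data

# Wald 95% interval for the overall probability of a two-attack round
p          <- double_attacks / rounds
half_width <- qnorm(0.975) * sqrt(p * (1 - p) / rounds)
p_ci       <- c(p - half_width, p + half_width)

# Back out the OAT rate from p = DA + (1 - DA) * OAT
oat_hat <- (p - base_da) / (1 - base_da)
oat_ci  <- (p_ci - base_da) / (1 - base_da)
round(c(estimate = oat_hat, lower = oat_ci[1], upper = oat_ci[2]), 7)
```

The transformed interval matches (.3357284, .4279119), with a point estimate of about 38%.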

Among Japanese sources, there is more data but an annoying lack of statistical consistency, if this one blog post is to be taken as a summary of all pieces of evidence regarding the Magian OAT rate. They can be grouped into two categories: evidence consistent with a 40% rate and evidence consistent with a rate higher than 40% but lower than 50% ("statistically significantly," what a gauche phrase). The foolish conclusion that the OAT rate is 43.75%, based on an idiotic pooling of the data, has no traction.

The attack distribution of "occasionally attacks 2-3 times" Magian weapons

Given the above discussion of the lack of reliable information on the Magian OAT rate, the prospect of getting reliable data concerning the attack distribution of Magian weapons that "occasionally attack 2-3 times" (OA2-3T) appears poor. In fact, there is one set of count data (source) for Magian OA2-3T hand-to-hand with MNK (whatever its actual name is for the weapon, I don't give a fuck) that can shed light on the matter, but it can only do so provided that the attack distribution associated with OA2-3T is the same for all Magian weapons and the data are actually credible. The counts are as follows (302 total):

2 attacks: 57
3 attacks: 98
4 attacks: 81
5 attacks: 48
6 attacks: 16
7 attacks: 2

In order to obtain estimates of the attack distribution probabilities for Magian OA2-3T, a probability model needs to be specified and the estimation based on that model.

Let H denote the number of attacks in a given attack round. Let π_n denote the probability of n = 1, 2, 3 attacks of a single hand in an attack round, with π_1 + π_2 + π_3 = 1, and let k denote the probability of a kick attack in an attack round (at most one kick per round, as in the R script in the addendum). Provided that the number of attacks of one hand, the number of attacks of the other hand, and the number of kick attacks (all in a given attack round) are mutually independent, the probability mass function of H is

\[
P(H = h) =
\begin{cases}
\pi_1^2 (1-k), & h = 2 \\
2\pi_1\pi_2 (1-k) + \pi_1^2 k, & h = 3 \\
(2\pi_1\pi_3 + \pi_2^2)(1-k) + 2\pi_1\pi_2 k, & h = 4 \\
2\pi_2\pi_3 (1-k) + (2\pi_1\pi_3 + \pi_2^2) k, & h = 5 \\
\pi_3^2 (1-k) + 2\pi_2\pi_3 k, & h = 6 \\
\pi_3^2 k, & h = 7
\end{cases}
\]

and 0 otherwise.

Aside: "Why do you care about kicks?" is a valid question. The answer is that the data were collected with a parser. Just as WAR cannot have a 0% double attack rate, MNK cannot have a 0% kick attack rate, and kparser cannot make the distinction between a kick and a punch (nor should we expect that kind of distinction to be made). Surely, a person can tell the difference, but why would you expect anyone to count manually when a parser is available? The occurrence of kicks does not provide any useful information about the attack distribution of an OA2-3T weapon (but can help validate the probability model), so all kicks do is introduce undesirable variability to the proceedings, but you can't do anything about it (other than get the data using PUP).

With the above data and probability model, maximum likelihood estimation can proceed. Of immediate concern is whether to assume that the kick attack rate, given 5/5 Kick Attack merits, is actually 17.5%. (Of course, I could let the kick attack rate be yet another parameter to estimate, but estimating four parameters with a sample size of 302 is not really that helpful.) People who play monk are generally fucking retarded, but I'll just use that rate. Using numerical methods, a set of point estimates and 95% simultaneous confidence intervals (Bonferroni, too lazy to care about other methods) is generated:

         p.hat  ci.lower  ci.upper
[1,] 0.4795746 0.4160971 0.5430521
[2,] 0.3377920 0.2492500 0.4263339
[3,] 0.1826334 0.1283014 0.2369655

Assuming the 17.5% kick attack rate is valid (the weakest assumption by far in my view, to go along with all the other assumptions upon which the analysis is based), the probability distribution of attacks for Magian OA2-3T is obviously not the same as that for the likes of Ridill, Mercurial Kris, and Soboro Sukehiro. The alleged 30:50:20 ratio for 1-3 attacks obviously does not agree with the data (and the corresponding estimates). Given the data, the multi-attack probability (including 2 and 3 attacks) could be 1/2, partitioning to 3/10 for two attacks and 1/5 for three attacks.

To put it another way, the ratio of 1-3 attacks could be 50:30:20 for Magian "occasionally attacks 2-3 times" weapons (generalizing from hand-to-hand to all weapons), and that's what I'll stand by until other data persuasively rejects that working hypothesis.
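To make the comparison between the two candidate ratios concrete, the following R sketch computes a Pearson chi-square distance between the observed counts and the counts expected under each ratio. It relies on the same assumptions as the estimation (independent hands, at most one kick per round, a 17.5% kick rate); the function names are mine.

```r
counts <- c(57, 98, 81, 48, 16, 2)  # rounds with 2, 3, ..., 7 attacks
k      <- 0.175                     # assumed kick attack rate

# PMF of total attacks per round: two independent hands plus at most one kick
round_pmf <- function(p, k) {
  hands <- c(p[1]^2, 2*p[1]*p[2], 2*p[1]*p[3] + p[2]^2, 2*p[2]*p[3], p[3]^2)
  c(hands, 0) * (1 - k) + c(0, hands) * k
}

# Pearson chi-square distance between observed and expected counts
pearson <- function(p) {
  expected <- sum(counts) * round_pmf(p, k)
  sum((counts - expected)^2 / expected)
}

c(ratio_50_30_20 = pearson(c(.5, .3, .2)),
  ratio_30_50_20 = pearson(c(.3, .5, .2)))
```

The 50:30:20 ratio yields a far smaller distance than 30:50:20, mainly because 30:50:20 predicts far too few two-attack rounds.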

Addendum: numerical maximum likelihood estimation

Analytical MLE for the above case is a complete waste of time if it is even possible, so I tapped out an R script for the purposes of numerical estimation.

# Negative log-likelihood of the attack-round counts.
#   p: c(p1, p2), per-hand probabilities of 1 and 2 attacks (p3 = 1 - p1 - p2)
#   X: counts of rounds with 2, 3, ..., 7 attacks
#   k: kick attack rate, treated as known
ll <- function(p, X, k) {
  p1 <- p[1]; p2 <- p[2]; p3 <- 1 - p1 - p2
  # distribution of the two-hand total (2, ..., 6 attacks)
  hands <- c(p1*p1, 2*p1*p2, 2*p1*p3 + p2*p2, 2*p2*p3, p3*p3)
  # a kick adds one attack with probability k
  pmf <- c(hands, 0)*(1 - k) + c(0, hands)*k
  -sum(X*log(pmf))
}

counts <- c(57, 98, 81, 48, 16, 2)
k <- .175
est <- optim(c(.05, .05), ll, X = counts, k = k, hessian = TRUE, control = list(reltol = 1E-40))

# inverse Hessian of the negative log-likelihood: estimated covariance matrix of (p1, p2)
vc <- solve(est$hessian)
p.hat <- c(est$par, 1 - sum(est$par))

# Var(p3) = Var(p1) + Var(p2) + 2*Cov(p1, p2)
se <- c(sqrt(diag(vc)), sqrt(sum(diag(vc)) + 2*vc[1,2]))
# Bonferroni-adjusted 95% simultaneous intervals for the three probabilities
z <- qnorm(1 - .05/(2*3))
ci.lower <- p.hat - z*se
ci.upper <- p.hat + z*se
cbind(p.hat, ci.lower, ci.upper)

# goodness of fit: fitted probabilities of 2, ..., 7 attacks
hands <- c(p.hat[1]^2, 2*p.hat[1]*p.hat[2], 2*p.hat[1]*p.hat[3] + p.hat[2]^2, 2*p.hat[2]*p.hat[3], p.hat[3]^2)
fitted <- c(hands, 0)*(1 - k) + c(0, hands)*k
chisq.test(counts, p = fitted)

Thursday, July 1, 2010

Restraint: tentative findings

(Edit 07/09/2010: edited some figures to account for a Critical Attack bonus of 5%.)

This time I'm going to start with the current legitimate claims about the effect of Restraint, a level 78 warrior job ability that "enhances your weapon skill power with each normal attack you land, but prevents you from dealing critical hits" per the help description. Then I will go over the support for those claims, one at a time, and then discuss some implications for the effectiveness of Restraint.

Claims

Restraint's enhancement seems to manifest as a damage multiplier distinct from other factors such as TP bonus, pDIF, and "TP modifier" (fTP).
Restraint's enhancement is not exact but actually has some variability, controlling for the number of attacks landed. (The damage multiplier is effectively a random variable.)
The damage multiplier of Restraint appears to have a maximum of 1.5 (+50% bonus).
Restraint's enhancement is dependent on weapon delay. Generally speaking, the higher the weapon delay, the higher the damage increase per normal attack landed.
The damage multiplier appears to increase linearly with the number of landed attacks up to a maximum of 1.5 (controlling for weapon delay).

Is Restraint's effect really a simple damage multiplier?

Using the weapon skill Spirits Within, whose damage function I described when discussing how I determined Fencer's TP bonus, it is straightforward for anyone to show that Restraint doesn't provide a TP bonus like Fencer does.

But first, why use Spirits Within? Its damage is completely deterministic and can be calculated exactly given your current HP and current TP. If there is then any deviation from the predicted value, that deviation can be attributed to whatever factor you had changed. Of course, this doesn't say anything about weapon skills one would actually want to use (for reasons to be discussed later), but getting a general idea of how Restraint appears to work should help focus further investigation (in theory because no one really gives a shit about doing it).

Anyway, ruling out a TP bonus is easy enough as soon as you observe a damage return from Spirits Within under Restraint that is impossible were it the result of a TP bonus. To go over briefly how I determined this, I whacked a Zvahl Fortalice with a Trainee Sword/Trainee's Needle combination (5.1 TP/hit with Dual Wield II) until I got 107.1 TP, then used Spirits Within. (Actually this basically is the general experimental procedure performed to reach some of the conclusions about Restraint I described earlier.) Given my current HP of 1148, the predicted damage, given 107.1 TP, is 147, but the observed damage was 164. It is impossible to obtain 164 damage from a TP bonus (the damage equation I provided is exact and has yet to fail), so a TP bonus can be ruled out.

As for pDIF, obviously pDIF doesn't enter into Spirits Within damage, so one cannot really speak of any kind of pDIF bonus, whether additive or multiplicative.

Ruling out an fTP bonus like that from an elemental gorget is not as straightforward. A conceptually simple method is to determine, using the same weapon(s) (holding weapon delay and therefore TP/hit constant), whether the damage of one weapon skill scales by approximately (accounting for flooring) the same factor as the damage of another weapon skill with a different fTP "profile" given the same TP and the same number of landed attacks under Restraint.

If the scaling factors are dramatically different, this could be considered evidence of an additive fTP bonus and one can rule out the idea of a damage multiplier. However, I chose not to do this for the following reason.

The damage increase from Restraint has some variability...

Continuing to whack on a Fortalice, I made the unpleasant discovery that, even though Spirits Within damage is supposed to be completely deterministic (knowing only two pieces of information, TP and current HP, means being able to calculate the damage exactly), I observed some variability of damage return holding TP constant under the effect of Restraint. The predicted damage values (based on 1148 current HP) and the observed damage values (which have a relationship to the number of attacks landed with Restraint active) are given in the following text table as I was too lazy to use my inelegantly constructed table markup:

Attacks landed     TP        Predicted damage     Observed damage (Restraint)
------------------------------------------------------------------------------
19                 102       143                  160, 158, 160, 160, 158, 158
20                 107.1     147                  164, 163, 160, 166, 163, 164
21                 112.2     147                  167
22                 117.3     152                  174, 170, 179
23                 122.4     156                  177, 179
24                 127.5     161                  183, 180
25                 132.6     165                  184, 189
26                 137.7     170                  192, 197
27                 142.8     170                  195, 202
28                 147.9     174                  201, 201
29                 153       179                  209, 209, 209
30                 158.1     183                  214, 219
31                 163.2     188                  218, 223
32                 168.3     188                  225, 229, 218, 218
33                 173.4     192
34                 178.5     197                  232, 236

To me, the fact that the "designers" apparently decided to make the Restraint enhancement a random variable is extremely obnoxious. (What the fuck is the point? Or was this unintentional?) If this is not merely an "anomaly" specific to Spirits Within, it makes it that much more annoying to pin down the effect of Restraint using weapon skills whose damage is normally variable. But at least now people should be aware of this.

Perhaps this is a glitch specific to dual wielding? I also checked for single wield and also observed variability of Spirits Within damage. I also attempted to check for damage variability with a magical WS whose damage is also "deterministic" (controlling for resist), but after getting an 87-damage quarter-resist and then a 171-damage half-resist with Seraph Blade (given 117.3 TP), I got very annoyed and switched back to Spirits Within for the purposes of exploring Restraint's effect further. Note that 171 is less than twice that of 87 (174), which can be considered evidence of variability fundamental to Restraint.

Restraint's maximum damage increase appears to be 50% (1.5 damage multiplier)

For one iteration of Restraint, I wanted to see how much of a damage increase to Spirits Within I could get by accumulating as many landed attacks as possible within the 5-minute duration. I managed to get 110 hits in before using Spirits Within, which gave 807 damage, which happens to be exactly 1.5 times 538, the usual damage at 300 TP given 1148 HP.
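The arithmetic is easy to check in R. This sketch assumes the 300 TP modifier of 120/256 quoted in the Fencer post, with damage floored; the variable names are mine.

```r
hp <- 1148

# Unboosted Spirits Within damage at 300 TP, assuming a 120/256 TP modifier and flooring
unboosted_300tp <- floor(hp * 120 / 256)

# Damage observed after 110 landed attacks under Restraint
observed <- 807

unboosted_300tp
observed / unboosted_300tp
```

Under that assumption the unboosted figure is 538 and the observed damage is exactly 1.5 times that value.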

Studio Gobli's version update notes also suggest that 50% might be the upper bound for the weapon skill damage bonus. But more important, the update notes indicate that Restraint's effect appears to be dependent on weapon delay.

The effect of Restraint depends on weapon delay

Reiterating Studio Gobli's notes, given 20 landed attacks, the weapon skill bonus (the weapon skill used is not stated) is highest (+21%) for the weapon with the highest delay (444), lower (+17%) for the weapon with the second-highest delay (264), and lowest (+13%) for the weapon with the lowest delay (218). Again, I used a Trainee Sword/Trainee's Needle combination (187 delay per weapon given Dual Wield II), and referring back to my text table, the damage increase given 20 hits was observed to vary from +8.84% (160/147) to +12.9% (166/147), so my results are consistent with Studio Gobli's claims (seemingly unsourced by the way). Moreover, my use of dual wield suggests that Restraint is affected by effective weapon delay.

Therefore, the effect of Restraint on Spirits Within cannot be generalized to other weapon skills because of this dependence on weapon delay. But findings from my Spirits Within investigation can be considered a kind of lower bound on Restraint's WS damage bonus (not that I would actually check for anything below 187 weapon delay).

For a given weapon delay, Restraint's (apparent) damage multiplier may increase linearly with the number of landed attacks

Given the above data in the text table, as well as some other observations toward the "extremes" (based on the number of landed attacks below 19, the minimum number to get to 100 TP after Spirits Within, which never misses, and above 34), I just performed OLS regression mainly to see visually if it is "safe" to assume that the Restraint damage bonus scales linearly with the number of landed attacks:

It appears that linearity is a valid enough assumption, and it could be said that the Restraint damage multiplier increases by about 0.00588515 (roughly 0.59% of unboosted damage) for every additional landed attack, up to a maximum multiplier of 1.5 (+50%). Also note that my 110-attack observation is well off the trend line. Extrapolation here is not fatal; I predict that either 85 or 86 is the minimum number of landed attacks to reach the maximum bonus.
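For anyone who wants to repeat the regression, here is a minimal R sketch using only the observations in the text table above. Since my fit also included observations below 19 and above 34 landed attacks that are not tabulated here, this will not reproduce the slope quoted above exactly; the intercept-included model and the variable names are assumptions for illustration.

```r
# Observed Spirits Within damage under Restraint, keyed by attacks landed (text table)
observed <- list(
  "19" = c(160, 158, 160, 160, 158, 158), "20" = c(164, 163, 160, 166, 163, 164),
  "21" = 167,                 "22" = c(174, 170, 179), "23" = c(177, 179),
  "24" = c(183, 180),         "25" = c(184, 189),      "26" = c(192, 197),
  "27" = c(195, 202),         "28" = c(201, 201),      "29" = c(209, 209, 209),
  "30" = c(214, 219),         "31" = c(218, 223),      "32" = c(225, 229, 218, 218),
  "34" = c(232, 236)
)

# Predicted (unboosted) damage for the same numbers of landed attacks
predicted <- c("19" = 143, "20" = 147, "21" = 147, "22" = 152, "23" = 156,
               "24" = 161, "25" = 165, "26" = 170, "27" = 170, "28" = 174,
               "29" = 179, "30" = 183, "31" = 188, "32" = 188, "34" = 197)

n_obs      <- lengths(observed)
attacks    <- rep(as.numeric(names(observed)), n_obs)
multiplier <- unlist(observed, use.names = FALSE) /
              unname(rep(predicted[names(observed)], n_obs))

# Ordinary least squares: apparent damage multiplier against attacks landed
fit <- lm(multiplier ~ attacks)
coef(fit)

# Extrapolated number of landed attacks at which the fitted line reaches 1.5
unname((1.5 - coef(fit)[1]) / coef(fit)[2])
```

A residual plot from plot(fit) is a quick way to judge visually whether the linearity assumption is tolerable over this range.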

Implications for Restraint use

Here it may be useful to reflect on snap judgments about Restraint's potential utility (or lack thereof).

First, it has been shown that Mighty Strikes is unaffected by Restraint (but is Restraint unaffected by Mighty Strikes?), so there shouldn't be any disadvantage to using Restraint for the purposes of zerging, whatever the bonus is (or isn't).

Second, suppose that it is even desirable to achieve the maximum Restraint bonus to start a zerg off (say, for a 300 TP Steel Cyclone). There aren't that many situations where this is feasible, due to unavailability of mobs to "power up" Restraint and/or time limitations. (The "stored" WS damage potential disappears when Restraint wears off.)

Other than that, let's look at the use of Restraint from the long-run, "optimal" perspective of dealing damage, which means WS spamming and whatnot. Losing critical hit damage for increased weapon skill damage may not seem like a good trade-off, but whether the trade-off is acceptable is determined primarily by what the actual bonus is, which no one has yet determined for the "usual" range of landed attacks before using a great axe weapon skill. (I would say the range is between 5 and 9.)

Of course, this doesn't mean one can't estimate how much of a WS damage increase there needs to be to offset the loss of critical hit damage during Restraint. To do this, consider that for two-handed weapons, the loss of critical hits in the auto-attack phase is relatively more "severe" for a low attack/defense ratio than a high one. Also consider that the loss of critical hit damage is relatively more severe if your critical hit rate is high than when it is low. Yet another factor to keep in mind is that the more hits that end up being landed in the process of getting to 100 TP (think multi-hit weapons and being lazy), the greater the Restraint bonus must be to offset the loss of critical hits. Finally, since Restraint has often been mentioned with King's Justice (because it is thought that Raging Rush is adversely affected by Restraint), I will base my estimation on improving KJ damage.

With that in mind, I estimate that, for a high attack/defense ratio (such that the maximum average pDIF is attained without any level correction involved), the Restraint damage bonus (as a percent increase) needed to offset the loss of critical hit damage is between 3.1% and 4.6% given a 9% critical hit rate and between 8.2% and 12.3% given a 24% critical hit rate (on average). It may be that the actual WS damage bonus (for 5, 6, ... landed attacks) exceeds the above estimates given 504 delay, although it remains to be determined.

For a relatively more modest attack/defense ratio (corresponding to an average non-critical pDIF of 1.5 and average critical pDIF of 2.6; these are rough estimates based on someone's empirical observations, which I will not go over at this time), the Restraint damage bonus needed to offset the loss of critical hit damage is between 6.0% and 9.0% given a 9% critical hit rate and between 16.0% and 24.0% given a 24% critical hit rate. (I give ranges based on some of the current "final upgrade" Magian great axes. But the low estimates are based on the horrible "occasionally attacks 2-3 times" great axe.)

The above suggests a place for Restraint where attack is high or where critical hit rate is low (or a combination of both, which might be experienced in a merit party), but more work needs to be done to justify that contention.

Wednesday, June 23, 2010

TP bonus of Fencer

(Edit: This is for WAR at level 75. I didn't consider the possibility of "increasing levels of mastery" bullshit.)

Saw this dumb shit so I thought I would act a dumb shit too by wasting my time figuring this out. (Figuring out the critical hit rate bonus did not waste that much time as I was sleeping while the data was being collected...)

One way to characterize the TP bonus of Fencer is to see how (and whether) the damage of the weapon skill Spirits Within varies with TP in the presence of Fencer and then compare the results to the damage-TP relationship of Spirits Within without Fencer. (Then you assume the findings can be generalized to all weapon skills and hope your observed damage with other weapon skills is consistent with the findings from Spirits Within testing.)

Some preliminary considerations

The problem is that the latter has not been fully characterized to account for flooring, so after retrieving Spirits Within damage observations between 100 and 300 TP without Fencer (using a Trainee Sword with store TP +5 for 6.7 TP and given 1000 current HP), I came up with a formula that matched the observations exactly.

Let D denote Spirits Within damage, H denote current HP and T denote current TP. Then Spirits Within damage appears to follow the piecewise function

This function describes the TP modifier (the fraction) increasing with TP in increments of 1/256 (other increments, such as 1/128 and 1/1024, result in calculated damage values that disagree with the set of actually observed damage values), so perhaps the TP modifiers at 100, 200, and 300 TP are better described as 32/256, 48/256, and 120/256, respectively. (Note: the inner bracket is there to ensure TP values are floored for the purposes of damage calculation, as TP values, while discrete, need not be integers.)

Now, with the same 1000 current HP and 6.7 TP, we can then observe how Fencer affects Spirits Within damage in terms of modifying base TP. We assume the TP bonus is additive and hope it is constant.

Actual TP bonus determination

The actual TP bonus (assuming it's additive and constant) was determined by a step-wise process of elimination by identifying "candidates" for the TP bonus as follows:

Step 1: For 100.5 TP, the predicted Spirits Within damage is 125. The observed damage is 148. The TP bonus could be 38, 39, 40, 41, 42, or 43. (At this point, 40 is the most plausible candidate as one would expect SE to make the TP bonus a multiple of 5 or 10.)

Step 2: For 120.6 TP, the predicted Spirits Within damage is 136. The observed damage is 160. The TP bonus could be 37, 38, 39, 40, 41, or 42, but only 38, 39, 40, 41, or 42 are consistent with both observations.

Step 3: For 147.4 TP, the predicted Spirits Within damage is 152. The observed damage is 175. The TP bonus could be 35, 36, 37, 38, 39, or 40, but only 38, 39, or 40 are consistent with all three observations.

Step 4: For 140.7 TP, the predicted Spirits Within damage is 148. The observed damage is 171. The TP bonus could be 35, ..., 41, but, again, only 38, 39, or 40 are consistent with all four observations.

Step 5: For 154.1 TP, the predicted Spirits Within damage is 156. The observed damage is 183. The TP bonus could be 40, 41, 42, 43, 44, or 45, but only 40 is consistent with all five observations. Assuming the TP bonus is additive and constant, Fencer adds +40 TP to the current TP for WS damage calculation.
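The elimination in Steps 1 to 5 can be sketched in R. The damage function below is an assumption rather than the original formula: it takes the 100-200 TP branch to rise linearly from 32/256 to 48/256 in steps of 1/256, floors the net TP (current TP plus bonus) to an integer, and floors the final damage. The names `sw_damage` and `candidates` are made up for this illustration, and only bonuses that keep the net TP below 200 are considered.

```r
# Assumed lower branch (100 <= TP < 200) of the Spirits Within damage function.
# Integer division keeps the flooring exact and avoids floating-point surprises.
sw_damage <- function(hp, tp) {
  tp_floor <- floor(tp)                            # floor the (net) TP
  numer <- 32 + (16 * (tp_floor - 100)) %/% 100    # TP modifier in units of 1/256
  (hp * numer) %/% 256                             # floor the damage
}

# Integer TP bonuses that reproduce an observed damage value.
candidates <- function(tp, observed, hp = 1000, bonuses = 0:60) {
  ok <- bonuses[tp + bonuses < 200]                # stay on the lower branch
  ok[sw_damage(hp, tp + ok) == observed]
}

# Observations from Steps 1 to 5 (current TP, observed damage with Fencer).
obs <- data.frame(tp     = c(100.5, 120.6, 147.4, 140.7, 154.1),
                  damage = c(148, 160, 175, 171, 183))

per_step <- Map(candidates, obs$tp, obs$damage)    # candidate set for each step
bonus <- Reduce(intersect, per_step)               # consistent with all steps

print(per_step)
print(bonus)
```

Under these assumptions the per-step candidate sets match the ones listed in Steps 1 to 5, and their intersection is the single value 40.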

At this point, we should make sure adding 40 TP to the current TP allows us to predict correctly Spirits Within damage when the "net" TP exceeds 200 TP (so that damage is calculated based on the other part of the function).

For 167.5 TP, the predicted Spirits Within damage, based on 207.5 TP, is 207, which is also the observed value.

For 174.2 TP, the predicted Spirits Within damage, based on 214.2 TP, is 226, which is also the observed value.

For 180.9 TP, the predicted Spirits Within damage, based on 220.9 TP, is 246, which is also the observed value. (Note that you cannot floor the current TP to 180 and then add 40, which would give a predicted value of 242 based on 220 TP, which is wrong.) At this point, it seems reasonable to conclude that there is a 40 TP bonus from Fencer between 100 and 200 TP.

Now what about between 200 and 300 TP?

For 201.0 TP, the predicted Spirits Within damage, based on 241.0 TP, is 300, which is also the observed value.

For 227.8 TP, the predicted Spirits Within damage, based on 267.8 TP, is 375, which is also the observed value.

Finally, to check that the TP used for damage calculation is actually min(TP + 40, 300): at 300 TP, the predicted Spirits Within damage would be 578 given 340 TP, but the observed damage is 468, which is consistent with the 300 TP maximum.

Conclusion

Fencer gives a constant TP bonus of 40 TP for weapon skills independent of what the current TP is.

Tuesday, June 22, 2010

Critical hit rate bonus of Fencer

(Edit: now with information on Fencer with dual wield.)

Fencer is a new job trait from the June 21, 2010 version update that is available to the Warrior job at level 45 and the Beastmaster job at level 80. It has the following help description: "Increases rate of critical hits when wielding with the main hand only. Grants a TP bonus to weapon skills." The critical hit rate bonus was estimated using the following procedure.

Methods (brief)

An estimate of the critical hit rate bonus was obtained by auto-attacking overnight a level 69 Ul'hpemde, which has AGI 65 (source). WAR75/MNK01 was used. The following equipment was used to obtain STR 57, DEX 64, and an accuracy score of 276:

Trainee Knife (240 dagger skill)
Walahra Turban
Dusk Gloves
Snow Ring (STR -2)
Swift Belt (Accuracy +3)
Aurum Sabatons (DEX +3, accuracy +5)

STR 57 ensures 0 damage to any Ul'hpemde, and DEX 64 ensures (with 4/4 critical hit rate merits) a 9% critical hit rate before the effect of Fencer (source). kparser was used for automated data collection.

The level of the targeted Ul'hpemde was inferred by comparing the predicted hit rate for a level 69 Ul'hpemde (.92) against a point estimate of the hit rate of 5628/6133 = .9176, with 95% confidence interval (.9105, .9244). The observed hit rate is consistent with the prediction.

Estimation of Fencer's effect with dual wield was also done with a Trainee Knife/Trainee's Needle combination, but the Ul'hpemde was level 68. (Critical hit rate is "directly" independent of level, but not AGI, which depends on level to some extent. But for both level 67 and level 68 Ul'hpemdes, the AGI is 65.) The following image summarizes the final base attribute values for this particular trial:

Results

Single wield: A point estimate of 802/5628 - .09 = .0525 was obtained for the critical hit rate bonus, with a 95% confidence interval (.0434, .0619).

Dual wield: A point estimate of 464/4983 - .09 = .0031 was obtained for the critical hit rate bonus, with a 95% confidence interval (-.0048, .0115).
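
The interval estimates above can be approximated in R. The post does not state which interval method was used; this sketch assumes the exact (Clopper-Pearson) interval returned by `binom.test`, so the endpoints may differ slightly from the ones quoted. `fencer_bonus` is a hypothetical helper name, and the 9% baseline is the pre-Fencer rate given in the methods.

```r
# Point estimate and 95% interval for an additive critical hit rate bonus,
# assuming a known baseline rate and an exact binomial interval.
fencer_bonus <- function(crits, hits, base = 0.09) {
  ci <- binom.test(crits, hits)$conf.int   # interval for the raw critical hit rate
  c(estimate = crits / hits - base,
    lower    = ci[1] - base,
    upper    = ci[2] - base)
}

single <- fencer_bonus(802, 5628)   # Trainee Knife only
dual   <- fencer_bonus(464, 4983)   # Trainee Knife / Trainee's Needle

print(round(rbind(single, dual), 4))
```

The single-wield interval sits well above zero, while the dual-wield interval straddles it.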

Interpretation and conclusion

Since critical hit rate has statistically been shown to take only integer percent values, assuming that the bonus is additive, the critical hit rate bonus of Fencer is either 5% or 6% with 95% confidence.

For the dual-wield case, suppose there were a 5% bonus for the main hand and none for the off hand. The effective bonus would then be 2.5%. Yet the observed estimate is much less than 2.5%, which should be taken as evidence that Fencer has no effect when dual wielding.

Saturday, June 19, 2010

Weapon skill critical hit rate bonus: summary of evidence

(Edit #2: added information for Backhand Blow and Blade: Jin, and another source for Rampage.)

(Edit #1: added another source for Drakesbane.)

This is an attempt to summarize any evidence following attempts to determine the critical hit rate bonus at or around 100 TP (if any) for weapon skills whose "chance of critical varies with TP."

I am not aware of any (non-anecdotal) evidence for the following weapon skills: Ascetic's Fury, Vorpal Blade, Power Slash, Sturmwind, Keen Edge, Vorpal Scythe, Vorpal Thrust, Skewer, Blade: Rin, True Strike, Hexa Strike, Sniper Shot, Heavy Shot, Dulling Arrow, and Arching Arrow (15 weapon skills). That leaves only six: Backhand Blow, Evisceration, Rampage, Raging Rush, Drakesbane, and Blade: Jin.

For now, "convenient" determination of critical hit rate is possible only for the first hit. Most of the testing done concerns the first hit, and conclusions are based on the assumption that the bonus (where it exists) is additive.

Backhand Blow (hand-to-hand, 2 hits)

Source: dex/crit relation, WS crits, WS gorgets discussion (Blue Gartr forums)

Comparing the sample proportions 22/50 (.44) at 9% baseline critical rate and 37/50 (.74) at 30% baseline (with 6% from Destroyers), it is obvious that there is some kind of innate critical rate bonus for at least the first hit of Backhand Blow.

But with Backhand Blow TP varying between 100 and 120 TP, it seems likely that the critical rate was not fixed for each sample. The consequences of this on the allocation of Type I error and coverage probability of the corresponding interval estimate are explored for Blade: Jin bonus estimation (later in the post), as data for that was obtained by the same person, but for now I will just describe briefly how to go about estimating the bonus for Backhand Blow.

Assume that the innate bonus is additive and constant (meaning it's independent of whatever the baseline critical rate is). Also assume that the critical rate bonus from Destroyers (6%) increases the critical hit rate of Backhand Blow by an additional 6% (starting from 24%).

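Let X₁ be the number of critical hits observed at 9% baseline, n₁ the total number of hits observed at 9%, X₂ the number of critical hits observed at 30% baseline, and n₂ the total number of hits observed at 30%. Write p₁ = .09 and p₂ = .30 for the two baseline rates. A natural "pooled" estimator for Backhand Blow's critical hit rate bonus (the form used in the simulation code at the end of this post) is

\[ \hat{b} = \frac{X_1 + X_2 - n_1 p_1 - n_2 p_2}{n_1 + n_2} \]

and its standard error is

\[ \widehat{\mathrm{SE}}(\hat{b}) = \frac{\sqrt{n_1 \hat{p}_1 (1 - \hat{p}_1) + n_2 \hat{p}_2 (1 - \hat{p}_2)}}{n_1 + n_2}, \qquad \hat{p}_i = X_i / n_i. \]
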
The sample proportion is .395 and a corresponding 95% confidence interval for the WS bonus is (30.32%, 48.68%).
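
A small R sketch of this calculation, assuming the pooled estimator and large-sample standard error used in the simulation code at the end of this post (there with n₁ = n₂ = 30). The function name `pooled_bonus` and its argument names are illustrative only.

```r
# Pooled estimate of an additive critical hit rate bonus from two samples
# taken at known baseline rates p1 and p2, with a large-sample (Wald) interval.
pooled_bonus <- function(x1, n1, p1, x2, n2, p2, level = 0.95) {
  est <- (x1 + x2 - n1 * p1 - n2 * p2) / (n1 + n2)
  ph1 <- x1 / n1                                   # observed rate, sample 1
  ph2 <- x2 / n2                                   # observed rate, sample 2
  se  <- sqrt(n1 * ph1 * (1 - ph1) + n2 * ph2 * (1 - ph2)) / (n1 + n2)
  z   <- qnorm(1 - (1 - level) / 2)
  c(estimate = est, lower = est - z * se, upper = est + z * se)
}

# Backhand Blow: 22/50 at the 9% baseline, 37/50 at the 30% baseline.
bb <- pooled_bonus(22, 50, 0.09, 37, 50, 0.30)
print(round(100 * bb, 2))
```

With the Backhand Blow counts this gives an estimate of 39.5% and an interval of about (30.32%, 48.68%), matching the figures above.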

Conclusion: there is a critical hit rate bonus for Backhand Blow at 100 TP. A bonus of 40% would be consistent with the given data.

Evisceration (dagger, 5 hits)

Source: Evis crit rate testing (Allakhazam forums)

At ~100 TP and given 24% base critical hit rate, the pooled sample gives a sample proportion 248/696 = .3563. A 95% confidence interval for the critical hit rate bonus is (8.61%, 15.32%).

Conclusion: there is a critical hit rate bonus for Evisceration at 100 TP, with +10% being a possibility.

Rampage (axe, 5 hits)

Source (1): ランページとDEXの関係

There are two sets of estimates: one for DEX 68, and one for DEX 124, with Gigantobugard as the target mob in both cases. I'm not much interested in calculating base AGI and confirming that the Megalobugard's level range is 40-43, so I ignored the estimates for DEX 68. DEX 124 ensures a 24% base critical hit rate.

At 100 TP, the sample proportion of critical hits is 35/130 = .2692. A 95% confidence interval for the critical hit rate bonus is (-4.48%, 11.40%). But suppose there actually is a 10% critical hit bonus. For a sample size of 130, the probability that the sample is sufficient to show a statistically significant bonus is about .7388 (power calculation).

At 200 TP, the sample proportion of critical hits is 68/150 = .4533. A 95% confidence interval for the critical hit rate bonus is (3.20%, 29.66%).

Source (2): dex/crit relation, WS crits, WS gorgets discussion (Blue Gartr forums)

I did say I wasn't interested in calculating a mob's AGI, but a Clipper's AGI is either 18 or 21 regardless of the levels reported on FFXIclopedia, and either AGI value doesn't affect the actual crit rate for the DEX 57 case, which is indeed 13%. (See this for details about critical hit rate as a function of your DEX - mob AGI.)

Using the same "pooled" estimator rationale I used for Backhand Blow (earlier in the post), the sample proportion for Rampage's crit bonus at 300 TP is .465 and a corresponding 95% confidence interval for the rate bonus is (31.80%, 61.20%). For the sake of completeness, estimates for the bonus at 100 TP and 200 TP are (-9.26%, 25.40%) and (3.20%, 48.80%), respectively.

Conclusion: if there is a critical hit rate bonus for Rampage at 100 TP, the known evidence is insufficient to show that, but if the bonus were 10%, for n = 130 the power to reject the null hypothesis of no bonus is fairly high (.7388). Given all the data, it is relatively unlikely that the bonus is 10%, but a smaller bonus cannot be ruled out with such small samples.

Unsurprisingly, there is a bonus at 200 TP and 300 TP.

Raging Rush (great axe, 3 hits)

Source (1): レイグラのクリティカル率について　その１

The sample proportion is 20/40 given the usual 24% base. The "control" data for base critical rate (which is a good idea to have by the way), however, gives the sample proportion 44/130 = .3384, which is somewhat unusual, but I write that off merely as that, not a sign of dubious experimental error. This data alone gives the tentative impression that there is a bonus.

Source (2): RagingRush Critical rate test (Killing Ifrit forums)

The raw data (showing damage values) are in a spreadsheet, but you don't need to download it.

At 100 TP and given 24% base critical hit rate, the proportion of critical hits is 155/373 = .4155. A 95% confidence interval for the critical hit rate bonus is (12.50%, 22.74%). This is strong evidence that the critical hit rate bonus is not 10%. Possible candidates are 15% and 20%.

More interesting to me is that the damage for 1 TP return (20 occurrences) was also noted, providing an opportunity to determine whether a critical hit rate bonus also applies to off-hand hits (despite there being no way to tell the difference between a double attack hit and a regular off-hand hit). Assuming a 24% base critical hit rate, with 9 observed critical hits out of 20, the corresponding p-value is .03614, which suggests a critical hit rate bonus.
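
Both Raging Rush calculations can be sketched with R's exact binomial test. The post does not say how the interval and the p-value were obtained, so using `binom.test` (Clopper-Pearson interval, two-sided exact p-value) is an assumption here, and the variable names are illustrative.

```r
base <- 0.24   # base critical hit rate assumed in the source data

# First hit at 100 TP: 155 critical hits out of 373.
first_hit <- binom.test(155, 373, p = base)
bonus_ci  <- as.numeric(first_hit$conf.int) - base   # interval for the bonus

# Hits with 1 TP return: 9 critical hits out of 20.
off_hand <- binom.test(9, 20, p = base)

print(round(100 * bonus_ci, 2))       # bonus interval, in percent
print(signif(off_hand$p.value, 4))    # two-sided exact p-value
```

The lower end of the first-hit interval stays above 10%, and the small-sample p-value falls below .05.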

Conclusion: there is a critical hit rate bonus for Raging Rush at 100 TP, with +15% and +20% being possible candidates. The small sample for critical hits from off-hand hits suggests a critical hit rate bonus for off-hand hits of Raging Rush as well.

Drakesbane (polearm, 4 hits)

Source (1): drakesbane native crit% (FFXIclopedia forums)

The first sample is 38/100 and the second, 24/100 (given 106 TP).

38/100 is a fairly extreme observation given 24% base critical hit rate (if there were no bonus). On the other hand, 24/100 is not that extreme an observation given a 34% rate. Since there is no good reason to think the conditions changed between the two samples, pool the data and crank out an interval estimate for the rate bonus, which is (0.66%, 13.91%).

Source (2): 雲蒸竜変の検証

There are four samples: three for 100 TP and one for 300 TP.

For 100 TP, the sample proportions are 12/49, 15/45, and 15/41 (given 24% base critical hit rate). The pooled estimate is 42/135 = .3111 and a 95% confidence interval for the bonus is (-0.57%, 15.64%). While this interval covers 0, 0 is again close to the left endpoint (for the first source, 0 fell just outside the interval, on the side one would expect if there is a bonus).

As for 300 TP, the sample proportion is 16/30 and a 95% confidence interval for the rate bonus is (10.32%, 47.66%), which rules out 50% (tentatively).

Conclusion: there is suggestive evidence for a critical hit rate bonus at 100 TP, with +5% and +10% being possible candidates. At 300 TP, a +50% bonus appears to be an "unlikely" possibility.

Blade: Jin (katana, 3 hits)

Source: dex/crit relation, WS crits, WS gorgets discussion (Blue Gartr forums)

The sampling was done in the same fashion as for Backhand Blow, with observed critical hit proportions 3/30 at 9% baseline crit rate and 8/30 at 30% baseline (with Senjuinrikio's 6% bonus) at 100 TP. Using the same estimator that I used for Backhand Blow, the "pooled" sample proportion for Blade: Jin's critical bonus is -0.01167, and a corresponding 95% confidence interval is (-10.73%, 8.39%).

Taking the confidence interval at face value, if there is a critical bonus for Blade: Jin at 100 TP, it is unlikely that it's 10% or higher, especially considering the "sloppy" manner in which the data was likely collected (with TP not being held fixed, the critical hit rate could have varied), which further supports that contention. If the bonus were 10%, obviously, the probability that a 95% confidence interval wouldn't cover 10% at the right endpoint of the interval would be near .025 (half the Type I error). The consequences of experimental "error" are explored in a simulation study described at the end of this post.

Conclusion: if there is a critical hit rate bonus for Blade: Jin at 100 TP, it is unlikely that the bonus is as high as 10%.

Simulation study: is a 10% critical hit rate bonus that unlikely for Blade: Jin?

Consider the following simulation study based on hypotheticals: if there actually were a 10% bonus at 100 TP, with a 1% increase for every 5 TP, then with TP varying between 100 and 119 TP, the critical rate bonus varies between 10% and 13%.

Given that "TP overflow" is inevitable with dual wield, and that extra hits occurring beyond TP were quite possible because data collection was reported to be boring, suppose that each of the bonus values 10%, 11%, 12%, and 13% is equally likely to be "chosen" for Blade: Jin.

The purpose of the study is to show how likely it is that the "pooled" large-sample confidence interval covers 10% given the above conditions.

A histogram of the simulated sampling distribution of the critical hit rate bonus shows that it's obviously not normal, with the mean (about 11.5%) higher than 10%, which is supposed to be the "actual" bonus at 100 TP for this simulation. (The shape of the large-sample approximation of the sampling distribution is traced with the solid curve.)

On the other hand, the margin of error for all simulated sample proportions is higher than 9.56%, the margin of error for the actual sample, about 97.7% of the time. (The mean margin of error is 11.19%.) Also, the "actual" (in the context of the simulation) Type I error is about .059, with about .040 allocated to the right tail (meaning there is a probability of .0402 that the null hypothesis of .10 is rejected because the estimate is higher than .10 based on the criterion of statistical significance) and about .019 allocated to the left tail (meaning the null is rejected with probability .019 because the observed estimate is significantly lower than .10). By comparison, the nominal left-tail error is .025.

Repeating this exercise under the condition that there is no bonus, the margin of error for all simulated sample proportions is higher than 9.56% only 58.0% of the time, and the probability that a confidence interval's right endpoint is higher than 8.39% is less than 0.1%.

If Blade: Jin's critical hit rate bonus at 100 TP were actually 10%, considering TP overflow and additional hits occurring beyond TP overflow, it would be very unlikely that a given 95% confidence interval would not cover 10%. The margin of error would also be very likely to be higher than 9.56%. Therefore, it is more plausible that its critical rate bonus is significantly less than 10%, if it even exists.

The following is some code for the simulation, but the inner loop should probably be expanded so that it finishes faster.


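n = 100000                    # number of simulated experiments
ci.lower = numeric(n)
ci.upper = numeric(n)
p.pool = numeric(n)
for (i in 1:n) {
  X1 = 0                      # critical hits at the 9% baseline
  X2 = 0                      # critical hits at the 30% baseline

  # 30 weapon skills per sample; each one draws its bonus uniformly from 10%-13%
  for (j in 1:30) {
    X1 = X1 + rbinom(1,1,sample(seq(.19,.22,by=.01),1))
    X2 = X2 + rbinom(1,1,sample(seq(.40,.43,by=.01),1))
  }

  # pooled bonus estimate: subtract the expected baseline critical hits (.09 + .30)*30
  p.pool[i] = (X1 + X2 - .39*30)/60

  # large-sample 95% margin of error
  me.i = qnorm(.975)*sqrt((X1/30*(1-X1/30) + X2/30*(1-X2/30))/120)
  ci.upper[i] = p.pool[i] + me.i
  ci.lower[i] = p.pool[i] - me.i
}

mean(p.pool)                  # mean simulated bonus estimate
me = (ci.upper - ci.lower)*.5
# how often the simulated margin of error exceeds that of the actual sample
mean(me>sqrt((3/30*(1-3/30)+8/30*(1-8/30))/120)*qnorm(.975))
mean(ci.upper<.10)            # left tail: interval lies entirely below .10
mean(ci.lower>.10)            # right tail: interval lies entirely above .10
mean(ci.upper<.10) + mean(ci.lower>.10)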
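
A vectorized sketch of the same simulation, which draws all rates and outcomes at once instead of looping. It assumes the same setup as the code above (30 weapon skills per sample, bonus drawn uniformly from 10% to 13% for each one); `simulate_ci` and its arguments are names introduced here, and the fixed seed is only there for reproducibility.

```r
simulate_ci <- function(n = 100000, size = 30,
                        bonus = seq(.10, .13, by = .01)) {
  # Critical hit counts for n samples of `size` weapon skills at one baseline.
  draw <- function(base) {
    rates <- base + sample(bonus, n * size, replace = TRUE)  # one rate per WS
    rowSums(matrix(rbinom(n * size, 1, rates), nrow = n))    # hits per sample
  }
  X1 <- draw(.09)
  X2 <- draw(.30)

  # Pooled bonus estimate and large-sample 95% margin of error.
  p.pool <- (X1 + X2 - .39 * size) / (2 * size)
  p1 <- X1 / size
  p2 <- X2 / size
  me <- qnorm(.975) * sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / (4 * size))

  data.frame(p.pool = p.pool, lower = p.pool - me, upper = p.pool + me, me = me)
}

set.seed(1)
sim <- simulate_ci()
mean(sim$p.pool)         # mean simulated bonus estimate
mean(sim$upper < .10)    # left-tail rejection rate
mean(sim$lower > .10)    # right-tail rejection rate
```

Because every weapon skill is simulated independently, filling the matrix column by column does not change the distribution of the row sums.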