The Unbearable Triteness of Preening: meleé

Friday, June 18, 2010

Why Love Halberd is underrated... for dragoon

While I personally have yet to determine the virtue stone consumption rate for virtue weapons other than Fortitude Axe (so far, I'm assuming it's 55% across all virtue weapons given the limited evidence thus far), how exactly the normal double attack trait interacts with the virtue weapon's "occasionally attacks twice" (OAT) property seems to be described correctly. With a reasonable level of confidence, one can draw conclusions about how effective the other virtue weapons are compared to their "peers."

I can't say the likes of Hope Staff and Prudence Rod are worth discussing, but Love Halberd has some properties relevant for dragoon and samurai that seem to be misunderstood and even dismissed out of hand, the inconvenience of acquiring virtue stones notwithstanding. I go through them in order of importance and then compare Love Halberd to its competing options for DRG.

Is Love Halberd's delay undesirable?

Love Halberd has 396 delay, so with current quantities of Store TP available, it's possible and reasonable to achieve an "8-hit setup" with 23 Store TP (12.5/10.2 = 1.22549, which rounds up to 1.23).
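As a quick check of that arithmetic, here is a minimal R sketch. It takes the 10.2 TP per hit quoted above as given and assumes Store TP acts as a plain whole-number percentage bonus on TP per hit; the variable names are only for illustration.

```r
hits_to_100   <- 8        # target: 100 TP in 8 landed hits
base_tp_hit   <- 10.2     # TP per hit at 396 delay, as quoted above
tp_needed_hit <- 100 / hits_to_100             # 12.5 TP per hit
multiplier    <- tp_needed_hit / base_tp_hit   # about 1.22549

# assumption: Store TP comes in whole percentage points, so round up
store_tp <- ceiling((multiplier - 1) * 100)
print(store_tp)
```

Rounding down instead would leave the eighth hit just short of 100 TP, which is why the ratio is rounded up.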

People act like this is a bad thing. But so what if it takes Love Halberd 8 hits to get to 100 TP? Noting how many hits it takes to get to 100 TP is trivial and irrelevant, especially because of Love Halberd's OAT property. Instead, one should ask: how many attack rounds does it take for Love Halberd to get to 100 TP, given that 8 hits are required to get there?

It may help to show a graph illustrating the relationship between the nominal number of hits to get to 100 TP and the "actual" average number of attack rounds it takes to get there (in a long-run, weapon skill-spamming context where the first hit of a WS misses 5% of the time). The graph covers both a virtue weapon and a weapon without any multi-hit property, each singly wielded, under a 9% double attack rate and a 95% hit rate:

First, look for the average number of attack rounds it takes for a weapon without any multi-hit property to get to 100 TP in 6 hits. On the graph, the average number of attack rounds appears to be 5, and the actual value is 4.9526 rounds. This figure is reasonable because even though the first hit of a WS misses 5% of the time (most of the time it takes 5 hits to get to 100 TP), the 9% double attack rate results in the average value falling slightly below 5.

Now, look for the average number of attack rounds it takes for a virtue weapon to get to 100 TP in 8 hits. "Wait a second," you observe, "isn't the corresponding average number of rounds below 4.9526?" In fact, on average it takes a virtue weapon only 4.7305 rounds to get to 100 TP in 8 hits, so an 8-hit virtue weapon setup ideally has a higher weapon skill frequency than a 6-hit setup with a non-multi-hit weapon.

Is the average attack round argument unconvincing? Let's instead examine the probability distributions of the number of attack rounds it takes for a virtue weapon, a weapon without a multi-hit property, and, for comparison's sake, a "Trial of the Magians" OAT weapon (for dragoon, Bradamante) to get to 100 TP:

These probability distributions were obtained via Markov chain methods.

For a weapon without a multi-hit property, the probability of getting to 100 TP in 5 attack rounds is .580, and the probability for fewer than 5 attack rounds is higher than the probability for greater than 5 attack rounds, which is consistent with the average attack round figure of 4.9526.

In comparison, while the probability of getting to 100 TP in 5 attack rounds is lower for a virtue weapon (.403), the higher probability of getting to 100 TP in 4 attack rounds (.373) contributes to the average number of attack rounds to get to 100 TP being lower (4.7305).

And for the sake of comparison, it takes about 3.783 rounds on average for a Magian OAT weapon to get to 100 TP in 6 hits. Most of the probability sits on either 3 or 4 attack rounds.

Note that for all three types of weapons, the probability that it takes 7 or more attack rounds to get to 100 TP is, at most, about .028 (for both the virtue weapon and the non-multi-hit weapon), which underscores the fact that, at least given 95% hit rate, it's not like the virtue weapon "needs" 7 or more attack rounds to get to 100 TP with any significant probability just because 8 landed hits are required to generate 100 TP.

In short, delay for virtue weapons, and the corresponding nominal number of hits it takes to get to 100 TP, is relatively unimportant because of the OAT property. In the case of the 8-hit Love Halberd setup, this property results in a lower average number of attack rounds to get to 100 TP than that for a 6-hit setup for a weapon without a multi-hit property (assuming a 55% virtue stone consumption rate).
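The figures for the weapon without a multi-hit property can be roughly reproduced with a small absorbing Markov chain. The sketch below is a reconstruction under stated assumptions, not necessarily the exact setup behind the graphs: the state is the number of landed hits so far, each round lands 0, 1, or 2 hits under a 95% hit rate and 9% double attack, and 5 hits are needed 95% of the time (first WS hit landed) and 6 hits otherwise. The virtue weapon is left out because the way OAT and double attack combine is not spelled out in this post. The function name `rounds_dist` is made up for illustration.

```r
hit <- 0.95   # hit rate
da  <- 0.09   # double attack rate

# probability of landing 0, 1, or 2 hits in one attack round
p_round <- c(
  (1 - da) * (1 - hit) + da * (1 - hit)^2,
  (1 - da) * hit + da * 2 * hit * (1 - hit),
  da * hit^2
)

# distribution of the number of rounds needed to land `need` hits
rounds_dist <- function(need, p_round, max_rounds = 60) {
  state <- numeric(need + 1)   # state[k + 1] = P(k hits so far, not finished)
  state[1] <- 1
  done <- numeric(max_rounds)  # done[r] = P(finishing exactly on round r)
  for (r in seq_len(max_rounds)) {
    nxt <- numeric(need + 1)
    for (k in 0:(need - 1)) {
      for (h in 0:2) {
        j <- min(k + h, need)  # overshoot is absorbed at `need`
        nxt[j + 1] <- nxt[j + 1] + state[k + 1] * p_round[h + 1]
      }
    }
    done[r] <- nxt[need + 1]
    nxt[need + 1] <- 0         # remove the absorbed mass before the next round
    state <- nxt
  }
  done
}

# mix the two cases: first WS hit lands (5 more hits) or misses (6 hits)
dist <- hit * rounds_dist(5, p_round) + (1 - hit) * rounds_dist(6, p_round)
mean_rounds <- sum(seq_along(dist) * dist)
print(mean_rounds)
print(round(dist[3:8], 3))
```

Under these assumptions the mean should land close to the 4.9526 rounds quoted above, with roughly .58 of the probability on exactly 5 rounds.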

Is the Love Halberd's base damage rating too low?

Love Halberd's 60 base damage is only 4 lower than Fortitude Axe's 64, which has 504 delay, so I'd say dragoons and samurai are relatively "spoiled" with access to a weapon with such high attack frequency and low delay.

Also, with a low base damage, the relative damage gap between Love Halberd and a higher-damage weapon decreases with additional fSTR.

Does Love Halberd's DEX +7 matter?

This is relatively unimportant, but with DEX +8 generally guaranteeing a 1% increase in critical hit rate when the target's AGI is not obscenely higher than your DEX, one can expect, effectively, a +1% critical hit bonus most of the time with DEX +7, which is not bad. DEX +7 is also a nice amount of DEX in the weapon slot that could help to ramp up one's critical hit rate if the opportunity presents itself (yeah, yeah, Greater Colibri...).

At least you can say it counters the loss of any attack (or accuracy) bonus associated with equipment for the ammo slot, Smart Grenade, Tiphia Sting, or whatever it is that DRG uses.

An additional +5 or +6 accuracy, if actually realized from the DEX bonus, is nothing to ignore, either.

Finally, a comparison of polearm options

All the features of Love Halberd described culminate such that Love Halberd is better than "conventional wisdom" allegedly holds.

Earlier, I did a write-up of how to model (approximately) the effect of Jump on damage rate as a preliminary step to doing a comparison of polearms that accounts for the increased WS frequency that Jumps provide. As usual, this comparison is done in terms of a long-run, WS-spamming, Jump-spamming situation so that one gets a decent idea of the relationship among the weapons in terms of maximum potential.

The weapons to be compared are

Valkyrie's Fork (6 hits to 100 TP)
Bradamante (with 75 base damage and 6 hits to 100 TP)
Love Halberd (8 hits to 100 TP).

Some of the conditions I specified are

fSTR 6 (+5 for Drakesbane)
42 additional WS "base" damage from the STR 50% modifier
95% hit rate
0% Zanshin rate
base double attack rate of 9%
ATK/DEF ratio of 1.5 and base critical hit rate of 9%, corresponding to an (approximate) average pDIF of 1.599 across all weapons (the critical hit rate bonus of Love Halberd treated as though it offsets the use of virtue stones at the expense of any attack bonus from the ammo slot)

Also, for Drakesbane, I am assuming a critical hit rate bonus of +10% and basing WS damage on 100 TP (ignoring excess TP effects, if they even exist). For Jumps (when accounted for), I treat the damage of Jumps as equivalent to normal hits (yet another simplification).

Let's start with a high quantity of haste, say, 64%, which accounts for Hasso (10%), double March (20%), Haste spell (15%), and haste from equipment (19%), which would relatively favor Valkyrie's Fork, a weapon with fundamentally lower WS frequency than the others, because of weapon skill delay (2 seconds).

Without accounting for the effect of Jumps, the summary of relevant numbers comes out as follows:

Weapon	Avg. TP dmg	Avg. WS dmg	Time per WS	Dmg/sec	TP:WS dmg
Valkyrie's Fork	832.01	1041.54	16.29 s	114.98	444:556
Bradamante	701.52	894.93	13.78 s	115.83	439:561
Love Halberd	793.79	789.77	13.23 s	119.61	501:499

These figures are merely a point of comparison to the more "realistic" figures that account for the effect of Jumps. But first, as an aside, I have to point out that the OAT effect of virtue weapons doesn't proc on Jumps and discuss the major implication for using Jumps with Love Halberd.

In general, Jumps can be considered an attack round that occurs "on demand." Moreover, Jumps generally delay the start of the following attack round by 2 seconds (a consequence of job ability or weapon skill delay in general), so Jumps, in effect, help to decrease the time between weapon skills except when the time between auto-attack rounds falls below 2 seconds. This is the primary effect of Jumps as slight increases in Jump damage per hit compared to auto-attack damage per hit are minor in comparison.

But since Jumps with Love Halberd are effectively normal attack rounds, they generate less TP (on average) than its auto-attack rounds. Therefore, there is a critical value of haste after which jumping with Love Halberd is unproductive.

Given the above conditions, Love Halberd averages about 1.579 landed hits per attack round, and "normal" jumps average exactly .95*1.09 = 1.0355 landed hits per "attack round" or 0.51775 landed hits per second (if spammed, so this is the upper limit for Jumps). It follows that it's counterproductive to jump with Love Halberd (in a long-run sense, not in a "need damage on demand" sense) when haste is above 53% (an approximate critical value). Therefore, for the following table, the effect of Jumps is considered only for Valkyrie's Fork and Bradamante:
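The critical haste value can be sketched as follows. This assumes the usual conversion of 60 delay to 1 second and that haste simply scales the time between attack rounds; the 1.579 and 1.0355 figures are taken from the paragraph above, and the variable names are only illustrative.

```r
delay_sec      <- 396 / 60          # assumption: 60 delay = 1 second
hits_per_round <- 1.579             # Love Halberd landed hits per round (from the text)
jump_hits_sec  <- 0.95 * 1.09 / 2   # spammed Jumps: 1.0355 landed hits per 2 seconds

# auto-attack landed hits per second at haste h:
#   hits_per_round / (delay_sec * (1 - h))
# setting that equal to jump_hits_sec and solving for h:
critical_haste <- 1 - hits_per_round / (delay_sec * jump_hits_sec)
print(round(critical_haste, 3))
```

Above this value, an auto-attack round lands more hits per second than a spammed Jump does, so giving up 2 seconds of auto-attack for a Jump costs TP in the long run.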

Weapon	Avg. TP dmg	Avg. WS dmg	Time per WS	Dmg/sec	TP:WS dmg
Valkyrie's Fork	832.01	1041.54	16.00 s	117.08	444:556
Bradamante	701.52	894.93	13.51 s	118.13	439:561
Love Halberd	793.79	789.77	13.23 s	119.61	501:499

As stated previously, the primary effect of Jumps is to decrease the time per weapon skill. Given 64% haste, the effective increase in damage per second is at most around 2%. (At lower levels of haste, the contribution of Jumps to increasing the rate of damage is higher.) Even when Jumps are accounted for, Love Halberd is still slightly better than either Valkyrie's Fork or Bradamante. (The TP:WS damage ratios are my usual check on how well the calculations represent what is observed in the game, but I have no idea if these are typical ratios.)
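To make the "at most around 2%" figure concrete, here is a small R sketch built only from the two 64% haste tables. It assumes the Dmg/sec column is (TP damage + WS damage) divided by time per WS, which is my reading of the tables; small differences in the second decimal place are expected because the times are rounded.

```r
weapons   <- c("Valkyrie's Fork", "Bradamante", "Love Halberd")
tp_dmg    <- c(832.01, 701.52, 793.79)
ws_dmg    <- c(1041.54, 894.93, 789.77)
time_base <- c(16.29, 13.78, 13.23)   # seconds per WS without Jumps
time_jump <- c(16.00, 13.51, 13.23)   # seconds per WS with Jumps (Love Halberd unchanged)

dps_base <- (tp_dmg + ws_dmg) / time_base
dps_jump <- (tp_dmg + ws_dmg) / time_jump
gain_pct <- 100 * (dps_jump / dps_base - 1)            # % gain from Jumps
tp_share <- round(1000 * tp_dmg / (tp_dmg + ws_dmg))   # TP part of the TP:WS ratio

print(data.frame(weapons, dps_base, dps_jump, gain_pct, tp_share))
```

Since the damage per cycle is held fixed, the gain from Jumps in this setup comes entirely from the shorter time per WS.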

Certainly, virtue stone consumption is a strike against Love Halberd for everyday, humdrum situations, and it's possible Bradamante can be further augmented after future updates, but can Bradamante be enhanced to the point where formerly top-end polearms (like Valkyrie's Fork) are completely outclassed after accounting for human "inefficiency"? It remains to be seen, but now let's consider the viability of these weapons in a zerg-like situation with 80% haste:

Weapon	Avg. TP dmg	Avg. WS dmg	Time per WS	Dmg/sec	TP:WS dmg
Valkyrie's Fork	832.01	1041.54	9.94 s	188.47	444:556
Bradamante	701.52	894.93	8.55 s	186.80	439:561
Love Halberd	793.79	789.77	8.24 s	192.08	501:499

As discussed in a previous post, the benefit of increasing haste is higher for weapons with lower WS frequency than weapons with higher frequency, a consequence of weapon skill delay. Unsurprisingly, Bradamante falls behind Valkyrie's Fork, yet Love Halberd still has a slight advantage over Valkyrie's Fork even at maximum haste, lending actual credence to the use of Love Halberd for high-haste zergs (and discrediting the idea of using Bradamante for such, at least when compared to Valkyrie's Fork).

Conclusions

Love Halberd's delay in conjunction with its OAT property can give it a weapon skill frequency higher than that of weapons without any multi-hit property. For example, an 8-hit Love Halberd setup has a higher WS frequency than a 6-hit setup for a polearm without any multi-hit property. This, along with its relatively high base damage (for a multi-hit weapon) and DEX +7, makes it a "peer" to the likes of Bradamante, the latest fashionable polearm. At 80% haste, Bradamante is a relatively poor weapon compared to Love Halberd.

Tuesday, May 26, 2009

pDIF and obsession with polynomial fits

A recent thread on the BG forums about an investigation of "cRatio for two handers" really underscores the ignorance coming out of the "playerbase" that actively chooses to post on forums. To wit:

You got some motherfucker implying the "center" of advanced knowledge about game mechanics lies among those who were banned for Salvage duping, when the "center" is actually maybe 10 people at best, and not all are necessarily English-speaking, much less posters on BG.
Someone rightly points out that said motherfucker is some obsequious cock-gobbler (ok, those are my terms) since information on pDIF has been outdated since the "2-hander update" (well before the bannings) yet no one has actually bothered to do an honest investigation.
Another one actually bothers to collect some data on damage frequency to see what kind of distribution the data follows, but is easily derailed with a fetish for polynomial fitting to the data and data following a normal distribution (polynomial fitting and normality are contradictions as I will discuss soon).

This inexplicable obsession with polynomial fitting and normality is misguided for several reasons:

Normal distributions have obvious tails at the extremes. Moreover, the tails are neither too short nor too long. The data do not show evidence of any real tails.
Normal distributions are not parameterized by extrema (minimum and maximum). The parameters are the mean (center) and variance (spread).
A second-order polynomial fit cannot "account" for tails. This is obvious because normal distributions have inflection points. So you cannot use a polynomial fit and argue for "normality" at the same time.
The coefficient of determination can be thought of as a summary of a model fit. It doesn't mean the model is actually good. You can draw a squiggly line through all the data points and that will give you an R² of 1, but that would be a terrible and useless model. Polynomial fitting is similarly terrible and useless for the above reasons.
Why even bother with any kind of fitting? As long as the distribution is symmetrical, at least you know the minimum, maximum, and median (same as mean) for pDIF given some value of cRatio, so you can use an expected value argument for long-run damage.

That said, there were some useful comments about the "shape" of the data. One poster actually suggested the data may follow some trapezoidal distribution. This is actually quite plausible under probability theory!

Obviously, the data do not appear to follow a uniform distribution. Even acknowledging the discreteness of damage (due to rounding) such that the minimum and maximum might be observed rarely, a uniform distribution of pDIF (NOT DAMAGE) is not all that likely. Although we cannot observe the distribution of pDIF directly, we can observe a histogram of damage (NOT pDIF). For this histogram, if pDIF were actually uniform, one might see an extreme "discontinuous" jump from the minimum to the minimum plus 1, or from the maximum to the maximum minus 1. In other words, the underlying true distribution of damage (NOT pDIF) would appear to be uniform except at the endpoints.

However, from probability theory, it is known that

The sum of two independent uniform random variables with the same variance (regardless of actual minimum and maximum) follows a triangular distribution
The sum of two independent uniform random variables with different variances follows a trapezoidal distribution (so a triangular distribution can be thought of as a degenerate trapezoidal distribution)
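Both facts can be checked without any simulation, since the density of the sum is just the convolution of two flat densities. The following R sketch (the function name `dsum_unif` is made up here) evaluates that density directly for independent uniforms; the Unif(1, 1.8) plus Unif(-0.1, 0.1) pairing is the one used in the simulation further down.

```r
# density of X + Y for independent X ~ Unif(a1, b1) and Y ~ Unif(a2, b2):
# the convolution reduces to the length of the overlap between
# [a1, b1] and [x - b2, x - a2], divided by the product of the widths
dsum_unif <- function(x, a1, b1, a2, b2) {
  overlap <- pmax(0, pmin(b1, x - a2) - pmax(a1, x - b2))
  overlap / ((b1 - a1) * (b2 - a2))
}

# equal widths: triangular, with a single peak at the midpoint
tri <- dsum_unif(seq(0, 2, by = 0.5), 0, 1, 0, 1)

# unequal widths: trapezoidal, with a ramp, a flat top, and a ramp
x    <- seq(0.9, 1.9, by = 0.1)
trap <- dsum_unif(x, 1, 1.8, -0.1, 0.1)

print(tri)
print(round(trap, 3))
```

With equal widths the flat top shrinks to a point, which is the sense in which the triangular case is a degenerate trapezoid.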

How can I argue that the underlying random component of melee damage could follow a trapezoidal distribution?

Does anyone actually expect the "developers" to have done anything particularly fancy with pDIF? In many cases, random number generation is basically "sampling" from a uniform distribution, usually Unif(0,1).
If pDIF does follow a uniform distribution (conditional on cRatio) for some ranges of cRatio (I realize this has been shown not to be the case for certain ranges of cRatio), there could easily be another random component to introduce "jitter" into the damage calculation, which would increase the variability of damage output yet keep the mean the same. This would account for the "1.05x correction" on maximum damage I've seen bandied about from time to time.
So, there could be an "effective" pDIF that includes jitter.

To illustrate the plausibility of the last two points, I simulated 9,885 realizations of non-critical melee damage given 55 "base damage," with pDIF that follows Unif(1, 1.8) and a "jitter" component that follows Unif(-0.1,0.1). Here, the random components are summed together so that the end result is that the fake data are trapezoidal. I plotted the frequencies and I also drew a nonsense curve through all the data points (in blue).

Yes, this is completely fake and is not meant to demonstrate the truth of anything, but merely the plausibility that pDIF follows (or has followed) a trapezoidal distribution for some values of cRatio. I even put in a second-order polynomial trend line, which has a very high coefficient of determination, which shows that R² cannot say anything about whether the model is even appropriate. Here, we know it is grossly inappropriate because I know what the underlying probability model is.

Here's the R code for the simulation. Data were exported to Excel. And so goes an hour of my life.

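N <- 9885   # number of simulated non-critical hits
# damage = truncated 55 * (pDIF + jitter), pDIF ~ Unif(1, 1.8), jitter ~ Unif(-0.1, 0.1)
a <- trunc(55 * (runif(N, min = 1, max = 1.8) + runif(N, min = -0.1, max = 0.1)))
dmg <- seq(min(a), max(a))   # every damage value in the observed range
N2 <- length(dmg)
dmg.counts <- numeric(N2)

# tally how often each damage value occurred (values never seen keep a count of 0)
for (i in 1:N2) {
  dmg.counts[i] <- sum(a == dmg[i])
}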

Thursday, December 4, 2008

A half-year in parses

December 11: I now have the time to add some comments for all the parser output I posted last week.

Treasure and Tribulations BCNM, 1st attempt (July 11)

Melee Damage
Player            Melee Dmg   Hit/Miss  M.Low/Hi    M.Avg
NIN/WAR                 470      38/85      4/18    11.62


Spell Damage
Player                 Spell Dmg   Spell %  #Spells  S.Low/Hi     S.Avg
NIN/WAR                      914   64.55 %       29      4/44     31.52
- Doton: Ni                  164   17.94 %        4     40/44     41.00
- Huton: Ni                  140   15.32 %        4     20/40     35.00
- Hyoton: Ni                 200   21.88 %        6     20/40     33.33
- Katon: Ni                  110   12.04 %        5     10/40     22.00
- Raiton: Ni                 196   21.44 %        5     36/40     39.20
- Suiton: Ni                 104   11.38 %        5      4/40     20.80

Comments: it certainly is more palatable to fight a mimic (Small Box) straight up rather than hope you pick the right treasure chest. Comments on FFXIclopedia recommend sushi "except if you have really good gear," but melee accuracy against this mimic was a joke. I felt better off using the "wheel" lest the fight take 25 minutes.

Treasure and Tribulations BCNM, 2nd attempt (July 12)

Melee Damage
Player            Melee Dmg   Hit/Miss  M.Low/Hi    M.Avg
NIN/WAR                 214      20/78      5/13     8.47


Spell Damage
Player                 Spell Dmg   Spell %  #Spells  S.Low/Hi     S.Avg
NIN/WAR                     1008   80.45 %       36      4/44     28.00
- Doton: Ni                  115   11.41 %        5      5/40     23.00
- Huton: Ni                  190   18.85 %        6     10/40     31.67
- Hyoton: Ni                 220   21.83 %        7     20/40     31.43
- Katon: Ni                  164   16.27 %        7      4/40     23.43
- Raiton: Ni                 145   14.38 %        6      5/40     24.17
- Suiton: Ni                 174   17.26 %        5     10/44     34.80

Comments: more of the same (Small Box again), mainly to corroborate the hideous evasion of these mimics. I am curious whether there is any difference in hit rate targeting the larger boxes instead.

Evasion vs. Water Leaper (August 1)

Attacks Against:
Player           Total   Avoided   Avoid %
NIN/THF            253       247   97.63 %


Standard Defenses
Player           M.Evade  M.Evade %   Shadow  Shadow %   Parry  Parry %
NIN/THF              148    58.73 %       93   93.94 %       6   5.77 %

Comments: I trot out the thief support job to maximize my evasion. (I've seen "Evasion Bonus II" job trait from thief to be both +22 and +23 total.) This may be indispensable for something like Fenrir (I may try soloing it again now that Reraise effects can't be dispelled) but for mundane things not so much. Trading 12 or 13 evasion for all the abilities available to DNC37 (dancer also gets an Evasion Bonus trait) seems like a no-brainer for menial tasks, if I can ever bother to finish leveling it.

Evasion vs. Goblin Slaughtermen, Temenos - Northern Tower (August 8)

Attacks Against:
Player           Total   Avoided   Avoid %
NIN/THF            241       234   97.10 %


Standard Defenses
Player           M.Evade  M.Evade %   Shadow  Shadow %   Parry  Parry %
NIN/THF              155    65.13 %       71   91.03 %       8   9.64 %

Comments: Ninja soloing for AF+1 in Temenos seems "common" enough for those who have the patience and adequate equipment. I've tended to err toward mixing both haste and evasion if only to speed up the process just a little, so even without maximum evasion, one can still evade a fair amount of attacks. (At least I assume that was the case for this, one of my last Temenos runs.) Sadly, in the past I have actually timed out mainly because of mediocre DD output, but it doesn't really matter to me whether I finish in 20 minutes or 28 minutes.

Enfeebling Despot (October 10)

BLM/RDM
Debuff      # Times   # Successful   # No Effect   % Successful
Bind             26             21             0        80.77 %
Gravity           8              8             0       100.00 %
Poison II        12             11             0        91.67 %

RDM
Debuff      # Times   # Successful   # No Effect   % Successful
Bind              3              2             0        66.67 %
Gravity           6              6             0       100.00 %

Comments: I had such extraordinary success (by my standards) binding Despot that I feel this is an anomaly. I am pretty sure my enfeebling magic skill a few months ago was 269, which isn't good for BLM. Although binding isn't necessary for soloing Despot as a black mage (yes, I didn't solo it here), it can give you a little slack.

Pahluwan Khazagand effect on crit rate (October 16)

Melee Damage
Player            Hit/Miss   M.Avg  #Crit     Crit%
WAR/NIN             459/39  143.43     40    8.71 %
SAM/WAR (Askar)    581/184  140.84     54    9.29 %
MNK72/WAR36       1270/283   52.96    148   11.65 %

Total Experience : 19012
Number of Fights : 100
Start Time       : 10:06:51 AM
End Time         : 11:07:50 AM
Party Duration   : 1:00:58
Total Fight Time : 1:35:08
Avg Time/Fight   : 36.59 seconds
Avg Fight Length : 57.08 seconds
XP/Fight         : 190.12
XP/Minute        : 311.77
XP/Hour          : 18706.50

Comments: I am no fan of the "Mamool Ja north" merit camp, but whatever it takes. I even included the experience summary to show that the exp rate was great (by my standards). Also, I noticed that the monk was wearing the Pahluwan body piece. I have seen bandied about the claim that the crit bonus on Pahluwan is "broken," and I believe this nonsense originated from this idiotic post from 2006. Such fuckers don't realize that the margin of error associated with the sample crit rate in question, even for 718 total hits, will be fairly wide. For example, a 95% Clopper-Pearson interval for the crit rate with Pahluwan body is (0.1366717, 0.1920368), so I wouldn't be talking shit about how the body makes the crit rate "worse."

Going back to the parser output, it seems to confirm the notion that critical hit rate is minimized at 5% (9% with 4 merits). This is to be expected without a sufficient amount of dexterity at this camp. If the monk's base crit rate before equipment was indeed 9% (assuming the monk had all the merits), then there is strong evidence that Pahluwan body does have an effect (trivial conclusion since the item description explicitly states there is one). As for the magnitude of the effect, a 95% CI for the crit rate bonus is (0.00939805, 0.04546965), so I am 95% confident that the true bonus is somewhere in that interval. So much for crit rate being "broken" (not that the effect isn't weak).
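For anyone who wants to redo the interval for the crit rate bonus, here is a minimal R sketch. It assumes the interval is a Clopper-Pearson interval on the monk's row of the parse (148 crits in 1270 landed hits) shifted down by the assumed 9% base rate; `binom.test` in base R reports the Clopper-Pearson interval.

```r
crits     <- 148    # monk's critical hits in the parse above
hits      <- 1270   # monk's landed hits
base_rate <- 0.09   # assumed crit rate before equipment (5% floor + 4 merits)

# exact (Clopper-Pearson) 95% interval for the crit rate
ci_rate  <- binom.test(crits, hits, conf.level = 0.95)$conf.int
# interval for the bonus, treating the 9% base rate as known
ci_bonus <- as.numeric(ci_rate) - base_rate
print(ci_bonus)
```

Treating the 9% base rate as a known constant is itself an assumption; if the monk lacked the merits, the whole interval shifts accordingly.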

Enfeebling Aura Statues

≥ 82 (October 23)

BLM/RDM
Debuff      # Times   # Successful   # No Effect   % Successful
Bind             92             52             0        56.52 %
Dispel            1              0             1         0.00 %
Gravity         234            184             1        78.63 %
Sleep            34             22             0        64.71 %
Sleep II        120             94             0        78.33 %
Sleepga II        1              1             0       100.00 %
Stun             40             39             0        97.50 %

≥ 25 (October 24)

BLM/RDM
Debuff      # Times   # Successful   # No Effect   % Successful
Bind             11              8             0        72.73 %
Gravity          80             62             0        77.50 %
Sleep            11              7             0        63.64 %
Sleep II         26             24             0        92.31 %
Stun             22             22             0       100.00 %

≥ 9 (November 13)

BLM/RDM
Debuff      # Times   # Successful   # No Effect   % Successful
Bind              6              4             0        66.67 %
Gravity          25             20             0        80.00 %
Sleep II          5              5             0       100.00 %
Stun              8              8             0       100.00 %

Comments: Now that I have some working hypothesis on the relationship between magic skill and magic "hit rate" (again, to make a distinction between a "lack of resist" rate and the magic accuracy attribute), I am going to put it to the test against Aura Statues once I reach 289 enfeebling magic skill. (Merciful Cape is absolutely out of the question as I am not that masochistic; Enfeebling Torque is overpriced and obtaining Wizard's Coat +1 is contingent on luck getting the materials.) Oddly, the resist rate estimates seem consistent only for gravity. I'll have to look into it. Again, I wouldn't be surprised if a level correction plays some role.

Direct magic damage to Genbu (October 26)

Bio II
    3:   17
   35:    1
Burst II
 1067:    1
Thundaga III
  532:    1
Thunder IV
   73:    1
   99:    3
  199:    1
  398:    3
  795:    1
  798:   12

Comments: I seemed to have pretty good success damaging Genbu this time.

Dancer (lv14-15) EXP/hour (November 7)

Total Experience : 5845
Number of Fights : 82
Start Time       : 3:07:05 PM
End Time         : 4:17:42 PM
Party Duration   : 1:10:37
Total Fight Time : 1:43:29
Avg Time/Fight   : 51.68 seconds
Avg Fight Length : 75.73 seconds
XP/Fight         : 71.28
XP/Minute        : 82.76
XP/Hour          : 4965.81

Mob Listing
Mob                        Base XP   Number   Avg Fight Time
Akbaba                         ---        1             0.00
Canyon Crawler                  80        1            35.00
Canyon Rarab                    60        2            24.50
Canyon Rarab                    65        5            29.01
Canyon Rarab                    70        9            40.56
Canyon Rarab                    75        4            49.29
Goblin Digger                   80        1          1:33.09
Goblin Thug                     60        1         37:07.29
Goblin Thug                     65        1            35.00
Goblin Tinkerer                 80        1            54.01
Goblin Tinkerer                 90        1          1:00.04
Killer Bee                      70        6            36.41
Killer Bee                      75        5            38.22
Killer Bee                      80        4          2:37.06
Pygmaioi                        65        3            34.68
Pygmaioi                        70        3            50.68
Pygmaioi                        75        7            45.59
Pygmaioi                        80        2          3:35.61
Strolling Sapling               65        8            33.02
Strolling Sapling               70       10            42.63
Strolling Sapling               75        1             6.00
Yagudo Acolyte                  60        2            19.51
Yagudo Persecutor               90        2            42.54
Yagudo Piper                    90        1          1:01.01
Yagudo Scribe                   60        1            13.01
Yagudo Scribe                   65        1            10.00

Comments: I've become progressively less patient with leveling subjobs even though the last few have been easy to solo (against goblin pets), from samurai to dark knight to red mage and, now, dancer. I just don't see myself leveling another job as my playing time wanes, especially considering no job other than dancer will let me spend 70 minutes mowing down every EP nonstop for almost 5k exp/hr.

Monday, October 20, 2008

The relationship between DEX and critical hit rate

My previous post somehow got over 40 "click-throughs" on TTTO, perhaps because its authoritative title, "King's Justice versus Raging Rush," promised a decisive comparison yet its conclusions were slightly less touchy-feely than eyeballing. (I was actually looking for some feedback, but I guess it wasn't meant to be.) In that vein, I also offer this bait-and-switch regarding the relationship between DEX and critical hit rate.

I would not care about such things if not for the prospect of obtaining Byakko's Haidate one day; with its 15 DEX, surely there must be some obvious increase in critical hit rate, right?

In fact, for some reason or another 15 DEX was once "thought" always to increase critical hit rate by a paltry 1-2% despite the reality of sampling error. (I've always wondered how people arrived at such conclusions by sampling. Even if you collected data through a parse, if you had a sample of 2500 hits, the margin of error associated with your crit rate estimate would be as much as 2%.) This conventional "wisdom" was then debunked around March 2007 with a discussion of the DEX/crit relation motivated by the observation that lots of DEX sent crit rates soaring up to some maximum. Coincidentally or not, around that time there was a parallel discussion on Allakhazam about the same topic.
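The "as much as 2%" figure is the worst case of the usual normal-approximation margin of error for a proportion. A minimal R sketch, assuming a 95% confidence level and independent hits (the function name `moe` is only for illustration):

```r
# normal-approximation (Wald) margin of error for a sample proportion
moe <- function(p, n, conf = 0.95) {
  z <- qnorm(1 - (1 - conf) / 2)
  z * sqrt(p * (1 - p) / n)
}

print(moe(0.5, 2500))   # worst case: true rate of 50%
print(moe(0.1, 2500))   # a rate closer to typical crit rates
```

Even at a 10% crit rate, the margin on 2500 hits is a bit over 1%, which is about the size of the effect that a "paltry 1-2%" claim is trying to detect.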

Sure, these people didn't bother to control for mob AGI. Now, it appears evident that your DEX relative to your target's AGI is a factor in the critical hit rate determination. But for the experiments discussed in those threads, AGI wasn't controlled. The AGI of Robber Crabs, a test subject in the Alla thread, apparently is either 39 or 42, and the AGI of Tavnazian Sheep and Miner Bees, targets in the BG thread, probably varies too. But despite the lack of control it was obvious that piling on enough DEX will increase your critical hit rate markedly at some point.

Unfortunately, this conclusion is couched in the lazy terminology of "tiers." Some examples are

(1) "Stack enough DEX to break some critical rate tier, where each point of DEX you add within that tier has a larger effect."

(2) "Any large amounts of DEX before a critical rate tier will not have a major effect on critical hit rate."

Implicit in such statements is that if you don't break a "tier," it isn't worth trying to pile on DEX. In turn, considering that "tiers" in crafting refer to discontinuous jumps in HQ rate, it isn't surprising that a "tier" in terms of crit rate is also thought of as a sudden, discontinuous jump at some critical level of DEX. But the evidence provided in the above threads doesn't really point to such a discontinuous phenomenon.

First, consider the results from the BG thread. Amazingly, the point estimates were given as approximations based on sample sizes of about 300 (really, that lazy not to record the exact sample sizes?), but that isn't that big a deal. These point estimates are themselves random variables with corresponding distributions, so it is helpful to visualize confidence intervals for the true values of these crit rates for given levels of DEX, and I created a graph to help with that:

The 95% confidence intervals are represented by black bars with the point estimates centered within the CIs. I also marked what are thought to be the minimum and maximum crit rates for DEX only with gray lines, 9% minimum and 24% maximum with 4/4 critical hit rate merits (who doesn't have those?). Critical hit rate bonuses from equipment are not subject to the caps.

The data corresponding to "low" and "high" DEX on this graph conform to the minimum and maximum crit rates. (At least there is no reason to believe otherwise.) At some point, though, crit rate increases with DEX in a seemingly linear fashion, which could awkwardly be described as a "tier," I suppose. This evokes a parallel with overall hit rate versus accuracy, with a minimum of 20% and a maximum of 95% and hit rate thought to vary linearly with accuracy in between. So if crit rate does increase (linearly) within a certain range of DEX, it is worth adding DEX within this interval, all other things being equal. Sure, I guess you are within a "tier" when this happens, but where's the evidence for a discontinuous jump to reach this "tier"?

Furthermore, there is hardly any evidence for the plural tiers.

I've also graphed the first set of data from Allakhazam (first post), which is similar to the BG one:

Interestingly, here the crit rate estimates increase over a 15-DEX range, even more evidence against the idea of a discontinuous jump.

Finally, in the Alla discussion data from the Robber Crabs was pooled. Pooled data generally poses statistical hazards (for one, we're assuming the exact same experimental conditions for each person involved, but you figure there's gotta be some idiot to fuck it up or some other factor... like the fact that the AGI of Robber Crabs varies!), but let's just run with this. I created a graph of 95% CIs for the pooled data as follows:

Even in violating statistical assumptions (independence) it is obvious there is no discontinuous jump in crit rate to be seen that cannot be attributed to sampling error. And even with the fundamental shadiness of this experiment (not controlling AGI), I even had the cheerful temerity to do least-squares linear regression (which itself is inappropriate for a variety of reasons) on the data points for which over 1000 samples were collected, in the DEX region where crit rate seems to increase linearly. For me it's enough to know that there is an obvious increase in crit rate; it doesn't matter what the exact increase will be for 1 additional DEX.

Also, the region is fairly narrow (10-15 DEX) for Robber Crabs, which would explain why people observe a sudden jump when adding DEX, as there is the view that adding DEX for the purposes of increasing crit rate should be an all-or-nothing thing (never mind the reality that the tradeoffs you make to stack DEX make such an attempt impractical).

It isn't necessarily true that the results from robber crabs can be generalized to other mobs. But if this phenomenon is real and can be generalized, then you may not have to go for an all-or-nothing attempt to increase crit rates with DEX, either in an auto-attack or WS phase, as long as your DEX is within the region where DEX is considered helpful.

For robber crabs, this region appears to be between 77 and 92 DEX. The higher level robber crabs in Kuftal Tunnel have 42 AGI, which jibes with the idea that your crit rate is capped when your DEX is 50 higher than your target's AGI.

The "transition region" clearly doesn't start when your DEX is equal to your target's AGI, but where should it start? The statement in the previous paragraph implies that it could start at about 35 DEX above your target's AGI, but this is a troublesome statement to make given that the crit rates consistently appear to be above 9% (the minimum) before 77 DEX. One possible explanation is that crit rate could be a minimum when (DEX - AGI) is less than or equal to 0, and rises very slowly from 0 to around 35. This could be why it's difficult to see any improvement in crit rates from adding DEX on your usual merit mobs, which all have AGI above 67.
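The shape speculated about above can be written down as a piecewise-linear function, purely as a sketch. The 9% floor, 24% cap, and the breakpoints at 35 and 50 (DEX - AGI) come from the discussion above; the crit rate at the 35 breakpoint (`knee_rate`) is a placeholder I made up, since the data only suggest "somewhat above 9%." None of this is a confirmed game formula.

```python
def crit_rate(dex, agi, floor=0.09, cap=0.24, knee=35, full=50, knee_rate=0.12):
    """Hypothetical crit rate as a function of (DEX - AGI).

    floor/cap: assumed min and max crit rate with 4/4 merits.
    knee/full: where the steep region is guessed to start and end.
    knee_rate: ASSUMED crit rate at the knee (placeholder value).
    """
    d = dex - agi
    if d <= 0:
        return floor                      # at or below target AGI: minimum
    if d >= full:
        return cap                        # 50+ above target AGI: capped
    if d <= knee:
        # slow rise from the floor up to the knee
        return floor + (knee_rate - floor) * d / knee
    # steep, roughly linear "transition region"
    return knee_rate + (cap - knee_rate) * (d - knee) / (full - knee)

if __name__ == "__main__":
    # Robber Crab with 42 AGI: steep region would span 77 to 92 DEX.
    for dex in (42, 60, 77, 85, 92, 100):
        print(dex, round(crit_rate(dex, 42), 4))
```

With these placeholder numbers, one point of DEX inside the 77-92 region is worth several times what it is worth below 77, which would look like a "tier" to anyone comparing only two gear sets.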

I admit I didn't break any new ground, but I thought it might be fun to show my take on this.

Monday, September 22, 2008

pDIF distributions

Anyone with at least a passing interest in how the game calculates physical damage is likely aware of the so-called pDIF factor, which is a function of the ratio of one's attack to the opponent's defense (ATK/DEF). A given value of ATK/DEF corresponds to a specific range, or distribution, of possible damage values constrained by a minimum and a maximum, and one can treat the pDIF graph as a concise summary of the possible distributions from 0 to 2 ATK/DEF.

But what is the underlying probability distribution for all possible ranges? A uniform distribution with the parameters of pDIF minimum and pDIF maximum has strong intuitive appeal because random numbers from a uniform distribution are simple to compute. It would seem impractical for the programmers to mess with normal distributions, and the apparent reliability of pDIF max and min in predicting a range of damage values (at least for one-handed weapons these days) basically precludes the use of standard normal. (It makes no sense to parameterize a normal distribution with pDIF min and max, anyway.)

Moreover, assuming a uniform distribution makes it easier to calculate damage with an expected value of pDIF, which would be just the midpoint between the endpoints of a given distribution if it and all others were really uniform.

But is it really the case? To get a sense of it I considered what would be the easiest, least risky, least costly and least time-consuming way to collect data without actually paying attention to the game, which basically meant poking at Campaign fortifications with dual-wield katanas I already had (Mamushito +1).

I acknowledge that my original goal in doing so was not really to gather evidence for a uniform distribution but rather to see to what extent the distribution of damage values might change with an increase in attack from a meat mithkabob (told you I was going on the cheap, and I was thinking maybe the distributions aren't uniform). I also ended up concluding that fortifications are a nice target for testing this, in a way; because of the extremely limited range of actual damage values due to their damage-reduction properties, I had no need to trouble myself with appropriate histogram binning.

A rank promotion later, I put together some "composite histograms" in Excel to summarize my peculiar results:

"Lower attack"

"Higher attack"

While the "higher attack" case didn't yield any surprises, the "lower attack" case was definitely not uniform in the least, but why the bias toward 6 damage? What's up with that?

Nonplussed, I attempted to find any snippets of comments regarding pDIF using Google, and I came across an interesting statement about pDIF, which is paraphrased as follows:

"For a given pDIF distribution, if pDIF 1.0 is within the range of possible pDIF values, pDIF 1.0 has a probability of 1/3, with the other possible values being uniformly distributed otherwise."

This statement, if true, would apply to cases of ATK/DEF between 0.5 and 1.5, which pretty much encompasses everyday conditions when fighting. It seems plausible enough in light of the data I collected, but why would anyone go to the trouble of making it so?
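If the paraphrased statement holds, the expected pDIF is no longer the midpoint of the range. A minimal sketch of the difference follows; the endpoints passed in are illustrative inputs, not the game's actual pDIF minimum and maximum for any particular ATK/DEF, and the function names are hypothetical.

```python
def expected_pdif_uniform(lo, hi):
    """Mean of a plain uniform distribution on [lo, hi]."""
    return (lo + hi) / 2.0

def expected_pdif_with_spike(lo, hi, spike_prob=1.0 / 3.0, spike_at=1.0):
    """Mean when spike_at carries probability spike_prob and the
    remaining probability is spread uniformly over [lo, hi].

    Falls back to the plain uniform mean if spike_at is outside the range,
    as the paraphrased statement only applies when 1.0 is attainable.
    """
    if not (lo <= spike_at <= hi):
        return expected_pdif_uniform(lo, hi)
    return spike_prob * spike_at + (1.0 - spike_prob) * expected_pdif_uniform(lo, hi)

if __name__ == "__main__":
    lo, hi = 1.0, 1.6  # illustrative endpoints only
    print(expected_pdif_uniform(lo, hi))
    print(expected_pdif_with_spike(lo, hi))
```

For a range whose lower endpoint is 1.0, the spike drags the average down toward 1.0, so damage estimates based on the midpoint would run high.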

At this point, I thought it might help to try some simulation with random uniform numbers to see if I could obtain similar results to what I showed in the graphs above, and by doing so illustrate a possible method for creating a bias toward pDIF 1.0. The biggest problem was making an educated guess about the fortification's attributes, especially the damage reduction property, but I had to run with something.

For the "higher attack" case I managed to get a similar result to my obtained data with an ATK/DEF ratio of about 1.521:

For the "lower attack" case I was unsure how to simulate a result similar to what I obtained from data collection and I looked for further clarification. One idea held that pDIF 1.0 at the endpoint of a distribution is the result of random values below 1.0 (or above) being rounded up (or down) to 1.0. But this doesn't jibe with a large data set I collected while poking at a fortification (when I regrettably neglected to record STR and attack) where 6 damage (the mode) seems to correspond to 1.0, yet 5 damage was recorded also:

But wait! Ignoring the 6, don't the data suggest a long right-hand tail? A uniform distribution doesn't have tails! And why does the range of damage go from 5 to 14? At 395 attack, maximum damage was shown to be 11. I probably wasn't using a meat mithkabob, and I try to maximize attack speed so I don't bother with attack equipment. But, it might be useful for reference later.

So ultimately, I have no conclusion that I'd rely on. I did perform another simulation to demonstrate how the "lower attack" case described a long time ago might come to pass. Let's say about 25% of all pDIF random values on the interval [1,1.65] (ATK/DEF ratio 1.375) end up being converted to pDIF 1.0, ensuring that 1.0 is the mode of any pDIF distribution that contains it. Otherwise, the data are random uniform numbers. Then, this criterion works in my simulation (rather, I ran the simulation a bunch of times until I found a result that looks similar to the one above):

It's too bad getting a feel for the underlying distribution from a random sample is quite annoying in the case of pDIF. Maybe I'll try again with attack lower than 344 next time.
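For anyone who wants to play with the second simulation described above, here is a rough Python sketch of the idea (I did mine in Excel, so this is a reconstruction of the concept rather than the original). The 25% conversion rate and the [1, 1.65] interval are from the text; the base damage of 6 is taken from the observation that 6 damage appears to correspond to pDIF 1.0, and flooring the product is my assumption. The fortification's damage reduction is not modeled.

```python
import random
from collections import Counter

def sample_pdif(rng, lo=1.0, hi=1.65, snap_prob=0.25, snap_to=1.0):
    """Draw one pDIF value: with probability snap_prob it becomes snap_to,
    otherwise it is a plain uniform draw on [lo, hi]."""
    if rng.random() < snap_prob:
        return snap_to
    return rng.uniform(lo, hi)

def simulate_damage(n, base_damage=6, seed=0, **pdif_kwargs):
    """Histogram of n simulated hits.

    base_damage: ASSUMED damage at pDIF 1.0 (6, per the observed mode).
    Damage is floored to an integer, which is also an assumption.
    """
    rng = random.Random(seed)
    return Counter(int(base_damage * sample_pdif(rng, **pdif_kwargs))
                   for _ in range(n))

if __name__ == "__main__":
    counts = simulate_damage(10000)
    for dmg in sorted(counts):
        print(dmg, counts[dmg])
```

Changing `seed` shows how much the histogram shape moves around between runs, which is the same problem as trying to read the underlying distribution off a single parse.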

Data collection was made possible with the "offense detail" feature in kparser. Otherwise I wouldn't even bother.