Monday, October 20, 2008

The relationship between DEX and critical hit rate

My previous post somehow got over 40 "click-throughs" on TTTO, perhaps because its authoritative title, "King's Justice versus Raging Rush," promised a decisive comparison yet its conclusions were slightly less touchy-feely than eyeballing. (I was actually looking for some feedback, but I guess it wasn't meant to be.) In that vein, I also offer this bait-and-switch regarding the relationship between DEX and critical hit rate.

I would not care about such things if not for the prospect of obtaining Byakko's Haidate one day; with its 15 DEX, surely there must be some obvious increase in critical hit rate, right?

In fact, for some reason or another 15 DEX was once "thought" always to increase critical hit rate by a paltry 1-2% despite the reality of sampling error. (I've always wondered how people arrived at such conclusions by sampling. Even if you collected data through a parse, if you had a sample of 2500 hits, the margin of error associated with your crit rate estimate would be as much as 2%.) This conventional "wisdom" was then debunked around March 2007 with a discussion of the DEX/crit relation motivated by the observation that lots of DEX sent crit rates soaring up to some maximum. Coincidentally or not, around that time there was a parallel discussion on Allakhazam about the same topic.
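To put a number on that margin of error, here is a minimal sketch using the usual normal-approximation (Wald) interval for a proportion. The function name and the 15% crit rate are my own illustrative choices, not anything from the threads.

```python
import math

def margin_of_error(p, n, z=1.959964):
    """Half-width of a normal-approximation 95% CI for a proportion p
    estimated from n independent hits."""
    return z * math.sqrt(p * (1.0 - p) / n)

# Worst case (p = 0.5) for a 2500-hit parse: just under 2 percentage points.
worst = margin_of_error(0.5, 2500)

# Hypothetical crit rate of 15%: the margin is smaller, but still larger
# than the 1-2% effect people claimed to have measured.
typical = margin_of_error(0.15, 2500)

print(f"worst case: {worst:.4f}, at 15%: {typical:.4f}")
```

Either way, a claimed 1-2% difference sits inside the noise unless the samples are far larger than what people usually parse.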

Sure, these people didn't bother to control for mob AGI. Now, it appears evident that your DEX relative to your target's AGI is a factor in the critical hit rate determination. But for the experiments discussed in those threads, AGI wasn't controlled. The AGI of Robber Crabs, a test subject in the Alla thread, apparently is either 39 or 42, and the AGI of Tavnazian Sheep and Miner Bees, targets in the BG thread, probably varies too. But despite the lack of control it was obvious that piling on enough DEX will increase your critical hit rate markedly at some point.

Unfortunately, this conclusion is couched in the lazy terminology of "tiers." Some examples are

(1) "Stack enough DEX to break some critical rate tier, where each point of DEX you add within that tier has a larger effect."

(2) "Any large amounts of DEX before a critical rate tier will not have a major effect on critical hit rate."

Implicit in such statements is that if you don't break a "tier," it isn't worth trying to pile on DEX. In turn, considering that "tiers" in crafting refer to discontinuous jumps in HQ rate, it isn't surprising that a "tier" in terms of crit rate is also thought of as a sudden, discontinuous jump at some critical level of DEX. But the evidence provided in the above threads doesn't really point to such a discontinuous phenomenon.

First, consider the results from the BG thread. Amazingly, the point estimates were given as approximations based on sample sizes of about 300 (really, that lazy not to record the exact sample sizes?), but that isn't that big a deal. These point estimates are themselves random variables with corresponding distributions, so it is helpful to visualize confidence intervals for the true values of these crit rates for given levels of DEX, and I created a graph to help with that:



The 95% confidence intervals are represented by black bars with the point estimates centered within the CIs. I also marked what are thought to be the minimum and maximum crit rates for DEX only with gray lines, 9% minimum and 24% maximum with 4/4 critical hit rate merits (who doesn't have those?). Critical hit rate bonuses from equipment are not subject to the caps.

The data corresponding to "low" and "high" DEX on this graph conform to the minimum and maximum crit rates. (At least there is no reason to believe otherwise.) At some point, though, crit rate increases with DEX in seemingly a linear fashion, which could awkwardly be described as a "tier," I suppose. This evokes a parallel with overall hit rate versus accuracy, with a minimum of 20% and a maximum of 95% and hit rate thought to vary linearly with accuracy in between. So if crit rate does increase (linearly) within a certain range of DEX, it is worth adding DEX within this interval all other things being equal. Sure, I guess you are within a "tier" when this happens, but where's the evidence for a discontinuous jump to reach this "tier"?

Furthermore, there is hardly any evidence for the plural tiers.

I've also graphed the first set of data from Allakhazam (first post), which is similar to the BG one:



Interestingly, here the crit rate estimates increase over a 15-DEX range, even more evidence against the idea of a discontinuous jump.

Finally, in the Alla discussion data from the Robber Crabs was pooled. Pooled data generally poses statistical hazards (for one, we're assuming the exact experimental conditions for each person involved but you figure there's gotta be some idiot to fuck it up or some other factor... like the fact that the AGI of Robber Crabs varies!), but let's just run with this. I created a graph of 95% CIs for the pooled data as follows:



Even in violating statistical assumptions (independence) it is obvious there is no discontinuous jump in crit rate to be seen that cannot be attributed to sampling error. And even with the fundamental shadiness of this experiment (not controlling AGI), I even had the cheerful temerity to do least-squares linear regression (which itself is inappropriate for a variety of reasons) on the data points for which over 1000 samples were collected, in the DEX region where crit rate seems to increase linearly. For me it's enough to know that there is an obvious increase in crit rate; it doesn't matter what the exact increase will be for 1 additional DEX.

Also, the region is fairly narrow (10-15 DEX) for Robber Crabs, which would explain why people observe a sudden jump when adding DEX, as there is the view that adding DEX for the purposes of increasing crit rate should be an all-or-nothing thing (never mind the reality that the tradeoffs you make to stack DEX make such an attempt impractical).

It isn't necessarily true that the results from robber crabs can be generalized to other mobs. But if this phenomenon is real and can be generalized, then you may not have to go for an all-or-nothing attempt to increase crit rates with DEX, either in an auto-attack or WS phase, as long as your DEX is within the region where DEX is considered helpful.

For robber crabs, this region appears to be between 77 and 92 DEX. The higher level robber crabs in Kuftal Tunnel have 42 AGI, which jibes with the idea that your crit rate is capped when your DEX is 50 higher than your target's AGI.

The "transition region" clearly doesn't start when your DEX is equal to your target's AGI, but where should it start? The statement in the previous paragraph implies that it could start at about 35 DEX above your target's AGI, but this is a troublesome statement to make given that the crit rates consistently appear to be above 9% (the minimum) before 77 DEX. One possible explanation is that crit rate could be a minimum when (DEX - AGI) is less than or equal to 0, and rises very slowly from 0 to around 35. This could be why it's difficult to see any improvement in crit rates from adding DEX on your usual merit mobs, which all have AGI above 67.
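For what it's worth, the "floor, ramp, cap" picture can be written down as a toy function. This is only a sketch under loud assumptions: the 9% and 24% bounds and the 35/50 breakpoints are just the numbers floated above for Robber Crabs, the ramp is assumed to be exactly linear, and the possible slow rise below 35 is ignored. None of this is a confirmed game formula.

```python
def crit_rate_sketch(dex, agi, floor=0.09, cap=0.24, start=35, end=50):
    """Toy model: crit rate as a clamped linear function of (DEX - AGI).

    floor/cap/start/end are assumptions read off the Robber Crab data,
    not known game constants.
    """
    delta = dex - agi
    if delta <= start:
        return floor
    if delta >= end:
        return cap
    # Linear interpolation inside the assumed transition region.
    return floor + (cap - floor) * (delta - start) / (end - start)

# Against a 42-AGI Robber Crab: floor at 77 DEX, cap at 92 DEX.
for dex in (70, 77, 85, 92, 100):
    print(dex, round(crit_rate_sketch(dex, 42), 3))
```

Under these assumptions each point of DEX inside the region is worth 1% crit rate, which is exactly the kind of claim the data can't pin down with any precision.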

I admit I didn't break any new ground, but I thought it might be fun to show my take on this.

Thursday, October 16, 2008

King's Justice versus Raging Rush

How does King's Justice stack up to Raging Rush? I decided to waste my time providing an answer to this question by creating some frivolous graphs to compare the average WS damage of Raging Rush with that of King's Justice on everyone's favorite canonical merit party fodder, the greater colibri (lv 82).

Given that the current incarnation of the physical damage equation is still a reasonable approximation (a generous assumption), I calculated these averages based on the attributes of my character's WS setup. (And to make more approximations upon approximations, I assumed the pDIF distribution for my cRatio, 1.433, was uniform over [1, 1.719].) Interestingly, FFXIclopedia gives a fTP "bonus" of 0.5 for the first hit of Raging Rush, which contradicts other sources (Gobli among them) and seems incorrect. I used 1.0 because if it were 0.5, Raging Rush would obviously be inferior. (I suppose I should get into some merit party for the first time in months to see if my calculations are way off.)

I plotted average WS damage of R.R. and K.J. versus critical hit rate since I don't know how exactly the TP modifiers affect crit rate for Raging Rush and neither does anyone else:



Suppose that at 100 TP there is no crit rate bonus for Raging Rush. Looking around for the relationship between DEX and crit rate, I place my overall crit rate at 12% on colibri, and behold, Raging Rush and King's Justice are pretty close in average damage. If this is indeed the case in practice, I probably won't bother unlocking King's Justice just for better Mighty Strikes/300 TP weapon skills. Skillchains? No one cares.

Recall that Raging Rush's first-hit damage used to vary with TP (1.00/1.50/2.50 at 100/200/300 TP, but .35 STR modifier as it is now), so you can get a sense of the magnitude of the increase in average R.R. damage since the exalted "2-hander update" just by looking at the graph (starting at 0% critical hit rate and ending at whatever crit rate you think is associated with R.R.).

Of course, mere averages don't give any idea of the distribution of possible WS damage values. I've seen a few comments that King's Justice is more consistent than Raging Rush, and that Raging Rush yields higher "spikes." You certainly don't need to do any frivolous simulation to lend credence to this perception. I'm not even going to say the shapes of these simulated distributions of WS damage for R.R. and K.J. are even accurate (after piling on approximation after approximation, I wouldn't think so), but they do give some idea of their variance. Even though the average WS damage is close, there is slightly less variance in WS damage associated with King's Justice. (The "sample" means for both R.R. and K.J. damage were within single digits of one another.)

Sunday, October 12, 2008

La Vaule seized

It seems at the eleventh hour some of the Japanese population on Fenrir server took the initiative to gain control of La Vaule for this week. Does this mean the "Splitting Heirs" Campaign Op is available on Fenrir? Will Fenrir be swimming in Cuchulain's Mantles and Witch Sashes? Not so fast...


As you can see, San d'Oria lost control of Jugner Forest, of all areas! Way to hold it down! At the moment though Sandy is still up on the beastmen in La Vaule, but their advantage will probably be erased by the end of the day.

How's this for idiotic: some user on FFXIclopedia observed that the existence of Cuchulain's Mantles on Asura, without any nation having control of all its contiguous areas, "disproves" the idea that control of all areas is required for access to "Splitting Heirs" and the like. Have you ever heard of server transfers? I see that the concept of arbitrage is way beyond your ken.

Moving on, I've heard that Windurst has control of all its adjacent areas on Phoenix server, so its "beastman assassination" analogue to "Splitting Heirs" should be available. The Campaign op is called "Plucking Wings," and I'm sure we can find some information about it in a little bit. Check wiki.ffo.jp periodically. Phoenix server already has an auction house listing for Karasutengu Kogake (INT +3, Campaign: refresh effect) and Roundel Earring.

Edit (Monday): Cuchulain's Belt is also a potential reward.

To summarize, here are the treasure pools for "Splitting Heirs" and "Plucking Wings," and an inferred one for Bastok's "Cracking Shells":

Splitting Heirs (La Vaule):
0-1 of Cuchulain's Mantle, Orcish Gauntlets, Witch Sash
1 of Brave Grip, Wise Strap
2 of "miscellaneous" items (gems, Spectacles, Vile Elixir +1, etc)

Plucking Wings (Castle Oztroja):
0-1 of Cuchulain's Belt, Karasutengu Kogake, Roundel Earring
1 of Brave Grip, Wise Strap
2 of "miscellaneous" items (gems, etc)

Cracking Shells (Beadeaux):
0-1 of Airy Buckler, Balestarius, Crapaud Earring
1 of Brave Grip, Wise Strap
2 of "miscellaneous" items

Looking at the dwindling number of items unaccounted for from the June version update, it's a reasonable guess that Crapaud Earring and Airy Buckler come from Bastok's "Cracking Shells." Neither of those items has an "ex" flag. But both of them have limited appeal. The shield can be used only by THF/PUP/DNC. Well, the Crapaud Earring is of interest to the vast majority of black mages who don't have Novio Earring. It'll help push my Thunder IV to 1470 without food on Ebony Puddings if I maximize INT in all my slots. If I had Novio, that number would be 1512 before food. (I currently give up 8 INT out of cheapness and desire for a "maximum MP" setup for NW Apollyon.)

The fact that Castle Oztroja can be overtaken gives me hope that Beadeaux can be taken over as well even though the logistics are daunting. I just don't expect any group on Fenrir to be willing and capable of doing so.

Saturday, October 11, 2008

Number of quests

It's not uncommon when visiting FFXIclopedia to see in the "Latest Activity" box some asshole updating a personal checklist of maps or quests. (I myself maintain a list of incomplete quests, but not for public display.) FFXIclopedia does not distinguish between real quests that appear in the quest logs and those that could be considered quests but do not appear in the logs. In case you have some interest in knowing the number of quests available for each region, I've tallied the number of quests that appear in the quest log. (This excludes garbage like "beastman treasure" and chocobo riding.) Please correct me if my totals are wrong.

As of the Sept 2008 version update:
Jeuno: 119
Other: 62
Outlands: 48
Aht Urhgan: 64
Crystal War: 41
San d'Oria: 79
Bastok: 87
Windurst: 89

Total: 589

I will be updating these totals when new version updates come out. Not really useful, but if you ever feel like rebutting some player's claims about doing 500+ quests, you can see if he's (is it ever a she?) BS'ing by asking how many quests he's completed for each region. It doesn't take too long to count accurately, eight at a time, by scrolling with shift + right key.

Thursday, October 2, 2008

Occasionally posts once

It's hard to tell what the consensus is about how frequently Ridill and Kraken Club process x number of hits. Two Japanese sources indicate either directly or indirectly that for Ridill the proportions for one, two, and three hits per attack round are .3, .5, and .2, respectively.

For Kraken Club, however, Studio Gobli gives the distribution of swings per attack round as "5:15:25:25:15:10:3:2". This corresponds to 3.82 expected hits per attack round. Another source (I don't know how "authoritative" it is) specifies 3.2 expected swings per attack round without giving any proportions.
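The expected-hits figures follow directly from the quoted proportions. A small sketch (the helper name is mine; the weights are the ones quoted above, read as 1 through 8 swings for Kraken Club and 1 through 3 for Ridill):

```python
def expected_hits(weights):
    """Expected number of hits per round, where weights[i] is the relative
    frequency of (i + 1) hits. Weights need not sum to 1."""
    total = sum(weights)
    return sum(hits * w for hits, w in enumerate(weights, start=1)) / total

kraken = expected_hits([5, 15, 25, 25, 15, 10, 3, 2])  # Studio Gobli's ratios
ridill = expected_hits([0.3, 0.5, 0.2])                # the two Japanese sources

print(f"Kraken Club: {kraken:.2f}, Ridill: {ridill:.2f}")
```

The Kraken Club weights sum to 100, so they read naturally as percentages.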

Out of curiosity, I'd like to know how exactly these claims are justified. Did an SE representative give out this information, so it must be true? If these claims were justified empirically, where are the data?

But why should anyone really care about the distribution of number of hits for multihit weapons? Believe it or not, some freaks have been concerned that double attack traits (from the warrior job trait, equipment, Fighter's Roll, whatever) attenuate the number of triple attacks for Ridill (and number of attacks greater than 2 for K. Club), so it may be helpful to know whether this attenuation results in worse performance of Ridill (and other multi-hit weapons) in the presence of double attack than without DA. I myself am more interested in how to analyze any data collected in support or contradiction of a belief. This is for the sake of making conclusions that are marginally better than hand-waving about "margin of error" without even quantifying it.

Collecting data for Kraken Club from English-language sources appears to be a non-starter, but some data for Ridill is easily found. The talk page for Ridill on FFXIclopedia has some good data sets for the number of x hits (x = 1, 2, 3). This is assuming that FFXI's random number generator is sufficiently random (no reason to believe otherwise).

Apparently, the purpose of this data collection was to find evidence that DA affects Ridill's output. But how would DA affect Ridill's output? There were two claims implied by the inane discussion:

(1) Double attack trait processes on all attack rounds equally. This means that single attacks are "converted" to double attacks and triple attacks are "converted" to double attacks. (DA trait "overrides" the Ridill proc.) DA trait may also process when a double attack occurs, but there is no difference in result. As a result, the proportions of single and triple attacks are reduced by the same percentage.

(If the average number of hits/round is less than 2, the net result is a slight increase in Ridill output. If exactly 2, no change regardless of DA level. If greater than 2, a slight decrease in Ridill output.)

(2) DA trait "disproportionately" reduces the number of triple attacks compared to single attacks. Ridill nerfed!

The second claim is really a poorly formed and vague hypothesis; there is no suggestion as to how to express this hypothesis in numerical terms. In contrast, the first claim at least provides some basis for statistical inference because there is a specific claim of how DA interacts with Ridill.

Supposing that the multihit distribution of Ridill as stated previously is really true (a working assumption), then we can calculate Ridill's hit distribution in the presence of warrior's double attack job trait (10% DA) under the first claim:

single: .3(1-.1) = .27
double: .5 + .1(.3 + .2) = .55
triple: .2(1-.1) = .18
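The same arithmetic as a function, in case you want to plug in other DA values. This is a sketch of claim (1) only, with names of my own choosing; it assumes DA overrides every round with equal probability and that the base .3/.5/.2 split is right.

```python
def apply_double_attack(base, da):
    """Hit distribution (single, double, triple) under claim (1):
    with probability `da`, any round becomes a double attack."""
    single, double, triple = base
    return (
        single * (1 - da),               # singles that were not overridden
        double + da * (single + triple), # doubles plus converted rounds
        triple * (1 - da),               # triples that were not overridden
    )

def mean_hits(dist):
    return sum(hits * p for hits, p in enumerate(dist, start=1))

base = (0.3, 0.5, 0.2)
with_war_trait = apply_double_attack(base, 0.10)

print(with_war_trait, mean_hits(base), mean_hits(with_war_trait))
```

Since the base average (1.9) is below 2, the average creeps up with DA under this model, consistent with the parenthetical note above.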

The very first data set on the talk page was collected using a WAR/NIN with no other DA from equipment or other sources. The sample proportions are

single: 276/1020 = 0.2705882
double: 541/1020 = 0.5303922
triple: 203/1020 = 0.1990196

At first blush, there seems to be no need to go through the motions of performing a statistical analysis. (Never mind that I saw the data before proposing a hypothesis...) Even though the usual logic of using some statistical hypothesis test doesn't really hold (not trying to assemble evidence against a "null" hypothesis, but rather trying to find corroborating evidence to support one), I use this example to illustrate a few approaches one might use to analyze the data.

One approach is to generate simultaneous confidence intervals (with some pre-specified confidence level) for the proportions of single, double, and triple attacks.

Formally speaking, these multihit distributions can be modeled using a multinomial distribution with educated guessing about the parameters (the proportions of x-hits). Given the data above, a set of approximate simultaneous CIs, using the approach of Goodman (1965), will give a range of probable values of the true proportions of Ridill's x-hits.

If I wanted to be (at least) 95% confident that all the confidence intervals contained the true proportions, then I obtain this set of CIs for the given data:

single: (0.23864, 0.30510)
double: (0.49292, 0.56753)
triple: (0.17081, 0.23059)
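For anyone who wants to reproduce these, here is a sketch of the Goodman-style intervals as I understand the method: each interval is a Wilson-type score interval with the chi-square (1 df) critical value taken at alpha/k instead of alpha. The function name is mine, and the chi-square quantile is obtained as the square of a normal quantile so that only the standard library is needed.

```python
import math
from statistics import NormalDist

def goodman_intervals(counts, alpha=0.05):
    """Approximate simultaneous CIs for multinomial proportions
    (Goodman 1965 style, Bonferroni-adjusted chi-square quantile)."""
    n = sum(counts)
    k = len(counts)
    # Upper alpha/k quantile of chi-square with 1 df equals z**2, where z is
    # the upper alpha/(2k) quantile of the standard normal.
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    a = z * z
    intervals = []
    for x in counts:
        half = math.sqrt(a * (a + 4 * x * (n - x) / n))
        lower = (a + 2 * x - half) / (2 * (n + a))
        upper = (a + 2 * x + half) / (2 * (n + a))
        intervals.append((lower, upper))
    return intervals

for label, (lo, hi) in zip(("single", "double", "triple"),
                           goodman_intervals([276, 541, 203])):
    print(f"{label}: ({lo:.5f}, {hi:.5f})")
```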


I think a family of (simultaneous) CIs is more useful than a CI for an individual proportion if only to get some sense of the "big picture" and limit your attention to "plausible" sets of multiple proportions. With the right techniques, your CIs won't be much wider than the individual CIs you would calculate the usual way. The downside is that there aren't any statistical packages that have built-in options to generate simultaneous intervals.

Conclusion: The above CIs happen to cover the null parameters, so the proposed model seems like a good fit to the data, using the logic of a goodness-of-fit test ("accepting" a null hypothesis in the absence of contradictory data). ("Double attack trait processes on all attack rounds equally.")

Instead of dealing with confidence intervals for multiple proportions, you could focus your attention instead on confidence intervals for the sample mean (expected value) of the number of hits per attack round, which is a random variable just as the numbers of single/double/triple attacks are random variables (all of which depend on the sample size, hence the use of the sample mean).

Indeed, the mean number of hits per attack round is a linear function of the numbers of single/double/triple attacks, and we can use this observation to compute the variance of the sample mean, using the fact that the sum of the individual proportions must equal 1 (for any multinomial distribution).


Thus, for the "null" hypothesis we are currently considering, the expected value of the sample mean of hits/round is 1.91, and the variance of the sample mean is 0.0004332353. By the central limit theorem, the sampling distribution of the sample mean is approximately normal for sufficiently large n. We can use this fact to obtain confidence intervals for the (sample) expected value of number of hits/round.
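Those two numbers come from treating a single attack round as a random variable taking the values 1, 2, 3 with the hypothesized probabilities, then dividing its variance by the sample size. A short sketch (function name mine):

```python
def mean_and_variance_of_sample_mean(probs, n):
    """probs[i] is the probability of (i + 1) hits in one round.
    Returns (expected hits/round, variance of the sample mean over n rounds)."""
    mean = sum(h * p for h, p in enumerate(probs, start=1))
    second_moment = sum(h * h * p for h, p in enumerate(probs, start=1))
    per_round_variance = second_moment - mean * mean
    return mean, per_round_variance / n

mean, var_of_mean = mean_and_variance_of_sample_mean([0.27, 0.55, 0.18], 1020)
print(f"mean = {mean:.2f}, variance of sample mean = {var_of_mean:.10f}")
```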

Personally, I don't think I would bother employing this method. It might be easier to understand if only for the sake of debunking bullshit assertions that arise from point estimates of the expected value for a given sample size. (I'll point out a few of these assertions after I use this method for the previously considered data.) But you lose a sense of the "big picture" when you sacrifice detail for concision.

From the data above, it can be shown that the sample mean of hits/round is 1.928431. Since we already have an assumption about the expected value of the sample mean, we might as well use the population variance of the sample mean (0.0004332353) instead of fussing with a sample variance. (You could also argue that with a sample size of 1,020, who cares?) Then, a 95% confidence interval for the sample mean of hits/round is

1.928431 ± (1.959964)(0.02081431) or (1.888, 1.969)

Recall that the expected value of the sample mean is 1.91. There is no reason to believe that 1.928 is an "extreme" result, assuming that the true distribution of Ridill multi-hits with DA job trait is .27/.55/.18. This can be illustrated with a histogram of a simulated sampling distribution of hits/round (dotted vertical line denoting 1.928431 from the sample and red vertical lines denoting the bounds of the CI), overlaid with a graph of a normal distribution with mean 1.91 and variance 0.0004332353:


Note that the normal distribution and the simulated sampling distribution agree, as expected.
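If you'd like to redo the simulation yourself, something along these lines should do. It is only a sketch: the number of trials and the seed are arbitrary choices of mine, and it summarizes the simulated means rather than drawing the histogram.

```python
import random
import statistics

def simulate_sample_means(probs, n, trials, seed=0):
    """Simulate `trials` experiments of n attack rounds each and return the
    hits/round average of every experiment."""
    rng = random.Random(seed)
    hits = range(1, len(probs) + 1)  # 1, 2, 3 hits per round
    return [sum(rng.choices(hits, weights=probs, k=n)) / n
            for _ in range(trials)]

means = simulate_sample_means([0.27, 0.55, 0.18], n=1020, trials=2000)

# Compare against the normal approximation: mean 1.91, sd about 0.0208.
print(f"simulated mean = {statistics.mean(means):.4f}")
print(f"simulated sd   = {statistics.pstdev(means):.4f}")
```

The simulated mean and standard deviation should land close to 1.91 and 0.0208, which is all the agreement between the two curves amounts to.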

Conclusion: Using the criterion of "average swings per attack round", the proposed model seems like a good fit to the data. ("Double attack trait processes on all attack rounds equally.")

So how does this apply to the discussion of Ridill on FFXIclopedia? To reiterate, from the first data set (Ridill multihits with WAR DA trait only), the estimated sample value was 1.928. Later on, there is a data set for Ridill multihits in the presence of WAR DA trait, Brutal Earring (assumed DA 5%), Warrior's Cuisses (1%), and Fighter's Calligae (1%), for a total of 17% DA. Does DA "nerf" Ridill or not going from 10% DA to 17% DA? (Whether or not it's really 17% DA, it's higher than 10%.)

Similar to what was shown earlier, it is easy to calculate an alternative distribution under 17% DA (null being 10% DA), assuming DA affects all x-hits equally:

single: .3(1-.17) = .249
double: .5 + .17(.3 + .2) = .585
triple: .2(1-.17) = .166

The sample proportions from the data are

single: 257/1022 = 0.2514677
double: 611/1022 = 0.5978474
triple: 154/1022 = 0.1506849

The sample mean of hits/round for Ridill is 1.899 given DA 17%, which is less than 1.928 given DA 10%.

I recall on BG someone drew the erroneous conclusion that additional DA (from equipment) has the effect of "nerfing" Ridill without accounting for random variability! But before evaluating this assertion, I want to finish up discussing whether the alternative hypothesis is a good fit to the data.

Is 1.899 an "extreme" result given the "alternative" hypothesis just specified? Under the alternative, the expected value of the sample mean of hits/round is 1.917, and the variance is 0.0003993258. We can then repeat the exercise of generating a graph, this time of a normal distribution with mean 1.917 and variance 0.0003993258, along with a simulated sampling distribution of the mean:


As you can see, 1.899 is not an extreme result under the above distribution. Furthermore, because the expected value of hits/round is 1.917 and the underlying (sampling) distribution is normal, if you repeat this experiment many, many times, about half of the observed hits/round must be below 1.917, and about half must be above 1.917.

But this wasn't the null distribution, or the point of the comparison. Even under the null distribution (first graph), 1.899 is not an extreme result. This shows that for sample sizes around 1,000 (1,000 is really large for any typical hypothesis testing that "really matters"), the effect of DA, if it really exists, is obscured by random error, at least under the assumptions I'm subscribing to.
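One way to quantify "not an extreme result" without a graph is a z-score: how many standard errors the observed 1.899 sits from the expected hits/round under each hypothesis. A sketch, with a helper name of my own and the normal approximation taken for granted:

```python
import math

def z_for_sample_mean(observed, probs, n):
    """Standardized distance of an observed hits/round average from its
    expected value, where probs[i] is the probability of (i + 1) hits."""
    mean = sum(h * p for h, p in enumerate(probs, start=1))
    variance = sum(h * h * p for h, p in enumerate(probs, start=1)) - mean ** 2
    return (observed - mean) / math.sqrt(variance / n)

observed = (257 * 1 + 611 * 2 + 154 * 3) / 1022      # about 1.899

z_alt = z_for_sample_mean(observed, [0.249, 0.585, 0.166], 1022)  # 17% DA
z_null = z_for_sample_mean(observed, [0.27, 0.55, 0.18], 1022)    # 10% DA

print(f"z under 17% DA: {z_alt:.2f}, z under 10% DA: {z_null:.2f}")
```

Both values are well inside plus or minus 1.96, so by this criterion the sample can't tell the two hypotheses apart.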

If the Japanese sources are really correct, then there is no point in doing statistics. But if they are not correct, statistics probably won't help to reveal what seems to be a very slight effect from a change in DA (without using excessive sample sizes). Assuming that calculating average number of hits/round is valid, going from 10% DA to 17% DA is, in the long run, a .37% increase in hits/round.

Conclusion: using the "number of hits/round" criterion, the evidence doesn't show that a DA increase has a "statistically significant" effect, neither worse nor better. (Here, I wanted to find evidence against the null of "no change from 10% DA to ~17% DA.")

(If you used the method of obtaining simultaneous 95% confidence intervals instead, you would get (0.22043, 0.28528) for singles, (0.56068, 0.63392) for doubles, and (0.12585, 0.17942) for triples, each of which covers the parameters they correspond to for the 17% DA case. Incidentally, they don't cover the parameters under the 10% DA case. In fact, a chi-square goodness-of-fit test would "reject" at the 5% level the null hypothesis that the data are a random sample from the case where DA is 10%. Such are the perils of choosing appropriate statistics for inference.

Since the null hypothesis model is a not-so-good fit to the data, maybe you would favor the idea that DA improves the output of Ridill, however negligible.)
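The chi-square goodness-of-fit comparison is easy to check by hand. In this sketch the function name is mine, and the p-value uses the closed form exp(-x/2), which is valid only because there are three categories and hence 2 degrees of freedom.

```python
import math

def chi_square_gof(counts, probs):
    """Pearson chi-square statistic and p-value for three categories
    (2 degrees of freedom) against fully specified probabilities."""
    n = sum(counts)
    stat = sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))
    p_value = math.exp(-stat / 2.0)  # chi-square survival function, 2 df only
    return stat, p_value

counts = [257, 611, 154]  # singles, doubles, triples at about 17% DA

stat_10, p_10 = chi_square_gof(counts, [0.27, 0.55, 0.18])     # 10% DA model
stat_17, p_17 = chi_square_gof(counts, [0.249, 0.585, 0.166])  # 17% DA model

print(f"10% DA: chi2 = {stat_10:.2f}, p = {p_10:.4f}")
print(f"17% DA: chi2 = {stat_17:.2f}, p = {p_17:.4f}")
```

The 5% critical value for 2 degrees of freedom is about 5.99, so the 10% DA model is rejected while the 17% DA model is not, even though the hits/round criterion could not separate them.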

At this point, you might be wondering what's the point of this post then, and I'm wondering that myself, too. The point is that when taking a random sample of data, remember the "random" part. An effect that you happen to observe in a one-shot sample could easily be ascribed to sampling error, and a goal of statistical inference is to rule out random variability as a possible explanation.

Finally, would random error explain what appears to be an increase in triple attacks in the presence of DA from equipment? Sure. (I already said it's possible earlier, but here is yet another illustrative example.) Consider the following data set (source: QCDN):

War/Drg + Askar Korazin & Brutal Earring:

Triples: 18.37%
Doubles: 59.77%
Singles: 21.86%
Total: 430 Rounds. 845 Swings. (1.97 Swings/Round)

I would have to assume out of 430 rounds, 79 triples, 257 doubles, and 94 singles occurred. If DA procs on all hits equally (17% here), then the hypothesized proportions of single/double/triple attacks are .249/.585/.166 respectively. Note that the sample size is 430. A 95% confidence interval for the number of swings/round is

1.965116 ± (1.959964)(0.03054195) or (1.905, 2.025)

This CI happens to cover what we assume is the true expected value (1.917). 1.965 swings/round is not so "extreme" a result if our assumptions are indeed true.
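Here is a sketch that reproduces the interval from the inferred counts. One thing worth flagging: the standard error of 0.0305 in the interval above matches what you get from the sample proportions, not from the hypothesized .249/.585/.166 split (which would give roughly 0.0308), so this sketch uses the sample version. The function name is mine.

```python
import math

def swings_ci(singles, doubles, triples, z=1.959964):
    """Normal-approximation 95% CI for swings/round, with the standard
    error estimated from the observed counts."""
    n = singles + doubles + triples
    mean = (singles + 2 * doubles + 3 * triples) / n
    second_moment = (singles + 4 * doubles + 9 * triples) / n
    se = math.sqrt((second_moment - mean * mean) / n)
    return mean, se, (mean - z * se, mean + z * se)

mean, se, (low, high) = swings_ci(singles=94, doubles=257, triples=79)
print(f"mean = {mean:.6f}, se = {se:.8f}, CI = ({low:.3f}, {high:.3f})")
```

The difference between the two standard errors is too small to change the conclusion here.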

Monday, September 29, 2008

Not the path of least resistance

For players obsessed with rewards through safe, unadventurous busywork with long-established best practices, "beastman stronghold invasion" (FFXIclopedia) in "Shadowreign" areas draws almost zero interest, which is also the case for most of WoTG that isn't Campaign so far (deserved or not). Most of the items are lackluster or useless, but it's precisely because of that (nothing lootwhores want) that a group of players shouldn't have too much trouble obtaining access to the culminating battles (18 members maximum) in each of the beastman strongholds. The potential rewards from these battles aren't all that enticing, either--marginal rare/ex items aside from beastman headgear and low-value auctionable stuff--but the battles seem like a fun diversion, and if your group doesn't operate during Japan prime time you shouldn't have any problems with competition for scarce resources. (The NMs are timed spawns and are slow to respawn after being killed.) And even though the items aren't all that, you can still hope "early adopters" blow their gil on these latest toys.

Let's not forget the dealbreaker of having "friends" to do this with, but that wasn't going to stop me from frittering away the exp buffer I unintentionally built up while auto-attacking fortifications last time.

Starting with La Vaule, six of the eight NMs are monks, rangers, or mages (SCH, BLM), automatically ruling them out as feasible for ninja soloing, at least without a dancer sub ("Easy Mode") to try (in vain?) to counter a steady loss of HP. I looked for Coinbiter Cjaknokk (DRK) to see first hand if it would actually spam Shoulder Charge, but it was nowhere to be found.

Dismayed, I shifted my attention to Beadeaux, where five of eight NMs are rangers, monks, or mages, leaving Mu'Nhi Thimbletail (THF), Ga'Lhu Nevermolt (PLD), and Di'Zho Spongeshell (DRK). Spongeshell alternates between melee-absorbing and magic-absorbing states (absorbing meaning healing), and I didn't feel like committing inventory space to elemental ninjutsu (assuming I could even make inroads on it). Thimbletail was nowhere to be found, leaving Nevermolt.

Not unexpectedly, these NMs can be found among the general population, which can make isolating them very time-consuming, but Nevermolt will separate from nearby Quadav, making a pull easy. But, it was painfully obvious after several single-digit katana swings (most for 0) that I had no chance. Even worse, committing to any sort of evasion setup is completely pointless: out of 45 attacks, I evaded exactly 2 and parried 3 and I used my evasion setup exclusively. Paladin spells do consume shadows, yes, but the only reason I even lasted that long was that Nevermolt turns its shell on you periodically, giving you some time to recast shadows. Hojo resisted several times, too.

I hold out hope that I can see Thimbletail in the next 24 hours so I can proceed to get dumped and give up the futility.

Monday, September 22, 2008

pDIF distributions

Anyone with at least a passing interest in how the game calculates physical damage is likely aware of the so-called pDIF factor, which is a function of the ratio of one's attack to the opponent's defense (ATK/DEF). A given value of ATK/DEF corresponds to a specific range, or distribution, of possible damage values constrained by a minimum and a maximum, and one can treat the pDIF graph as a concise summary of the possible distributions from 0 to 2 ATK/DEF.

But what is the underlying probability distribution for all possible ranges? A uniform distribution with the parameters of pDIF minimum and pDIF maximum has strong intuitive appeal because random numbers from a uniform distribution are simple to compute. It would seem impractical for the programmers to mess with normal distributions, and the apparent reliability of pDIF max and min in predicting a range of damage values (at least for one-handed weapons these days) basically precludes the use of standard normal. (It makes no sense to parameterize a normal distribution with pDIF min and max, anyway.)

Moreover, assuming a uniform distribution makes it easier to calculate damage with an expected value of pDIF, which would be just the midpoint between the endpoints of a given distribution if it and all others were really uniform.
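To make the midpoint shortcut concrete, here is a minimal sketch in Python. The endpoints are hypothetical values picked only for illustration, not measured pDIF limits, and the function name is my own.

```python
import random

def uniform_pdif_mean(pdif_min, pdif_max):
    """Expected value of a uniform distribution on [pdif_min, pdif_max]."""
    return (pdif_min + pdif_max) / 2

# Hypothetical endpoints, chosen only for illustration.
PDIF_MIN, PDIF_MAX = 1.0, 1.65

# Check the formula against the average of many uniform draws.
rng = random.Random(0)  # fixed seed so the run is repeatable
draws = [rng.uniform(PDIF_MIN, PDIF_MAX) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)

print("midpoint:   ", uniform_pdif_mean(PDIF_MIN, PDIF_MAX))
print("sample mean:", round(sample_mean, 3))
```

If the real distribution is not uniform, the sample mean from actual game data would drift away from the midpoint, which is one cheap way to spot a problem.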

But is it really the case? To get a sense of it I considered what would be the easiest, least risky, least costly and least time-consuming way to collect data without actually paying attention to the game, which basically meant poking at Campaign fortifications with dual-wield katanas I already had (Mamushito +1).

I acknowledge that my original goal in doing so was not really to gather evidence for a uniform distribution but rather to see to what extent the distribution of damage values might change with an increase in attack from a meat mithkabob (told you I was going on the cheap, and I was thinking maybe the distributions aren't uniform). I also ended up concluding that fortifications are a nice target for testing this, in a way; because of the extremely limited range of actual damage values due to their damage-reduction properties, I had no need to trouble myself with appropriate histogram binning.

A rank promotion later, I put together some "composite histograms" in Excel to summarize my peculiar results:

"Lower attack"

"Higher attack"

While the "higher attack" case didn't yield any surprises, the "lower attack" case was not uniform in the least. Why the bias toward 6 damage? What's up with that?

Nonplussed, I attempted to find any snippets of comments regarding pDIF using Google, and I came across an interesting statement about pDIF, which is paraphrased as follows:

"For a given pDIF distribution, if pDIF 1.0 is within the range of possible pDIF values, pDIF 1.0 has a probability of 1/3, with the other possible values being uniformly distributed otherwise."

This statement, if true, would apply to cases of ATK/DEF between 0.5 and 1.5, which pretty much encompasses everyday conditions when fighting. It seems plausible enough in light of the data I collected, but why would anyone go to the trouble of making it so?
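Taken at face value, the paraphrased rule is easy to write down as a sampler. The sketch below is only my reading of that statement, not anything confirmed about the game; the function names and the 0.8 to 1.4 range are hypothetical.

```python
import random

def sample_pdif(pdif_min, pdif_max, rng, p_one=1/3):
    """Draw one pDIF value under the paraphrased rule (an assumption)."""
    # If 1.0 lies inside the range, return exactly 1.0 with probability p_one.
    if pdif_min <= 1.0 <= pdif_max and rng.random() < p_one:
        return 1.0
    # Otherwise fall back to a plain uniform draw over the range.
    return rng.uniform(pdif_min, pdif_max)

def expected_pdif(pdif_min, pdif_max, p_one=1/3):
    """Mean pDIF under the same rule: a mix of 1.0 and the uniform midpoint."""
    midpoint = (pdif_min + pdif_max) / 2
    if pdif_min <= 1.0 <= pdif_max:
        return p_one * 1.0 + (1 - p_one) * midpoint
    return midpoint

rng = random.Random(1)  # fixed seed so the run is repeatable
draws = [sample_pdif(0.8, 1.4, rng) for _ in range(60_000)]
share_one = draws.count(1.0) / len(draws)

print("share of draws at exactly 1.0:", round(share_one, 3))
print("expected pDIF under the rule: ", round(expected_pdif(0.8, 1.4), 4))
```

Under this rule the expected pDIF gets pulled toward 1.0, so the simple midpoint would no longer be the right average whenever 1.0 sits inside the range.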

At this point, I thought it might help to try some simulation with random uniform numbers to see if I could obtain similar results to what I showed in the graphs above, and by doing so illustrate a possible method for creating a bias toward pDIF 1.0. The biggest problem was making an educated guess about the fortification's attributes, especially the damage reduction property, but I had to run with something.

For the "higher attack" case I managed to get a similar result to my obtained data with an ATK/DEF ratio of about 1.521:


For the "lower attack" case I was unsure how to simulate a result similar to what I obtained from data collection and I looked for further clarification. One idea held that pDIF 1.0 at the endpoint of a distribution is the result of random values below 1.0 (or above) being rounded up (or down) to 1.0. But this doesn't jibe with a large data set I collected while poking at a fortification (when I regrettably neglected to record STR and attack) where 6 damage (the mode) seems to correspond to 1.0, yet 5 damage was recorded also:


But wait! Ignoring the 6, don't the data suggest a long right-hand tail? A uniform distribution doesn't have tails! And why does the range of damage go from 5 to 14? At 395 attack, maximum damage was shown to be 11. I probably wasn't using a meat mithkabob, and I try to maximize attack speed so I don't bother with attack equipment. Still, the data set might be useful for reference later.

So ultimately, I have no conclusion that I'd rely on. I did perform another simulation to demonstrate how the "lower attack" case described a long time ago might come to pass. Let's say about 25% of all pDIF random values on the interval [1,1.65] (ATK/DEF ratio 1.375) end up being converted to pDIF 1.0, ensuring that 1.0 is the mode of any pDIF distribution that contains it. Otherwise, the data are random uniform numbers. Then, this criterion works in my simulation (rather, I ran the simulation a bunch of times until I found a result that looks similar to the one above):
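For anyone who wants to poke at the same idea, here is a rough Python sketch of that kind of simulation. The interval and the 25% figure come from the paragraph above; everything else is an assumption, especially `BASE_DAMAGE`, a hypothetical stand-in for "damage dealt at pDIF 1.0" that ignores whatever the fortification's damage reduction really does.

```python
import math
import random
from collections import Counter

PDIF_MIN, PDIF_MAX = 1.0, 1.65  # interval quoted above
P_CONVERT = 0.25                # share of draws converted to pDIF 1.0
BASE_DAMAGE = 6                 # hypothetical: damage dealt at pDIF 1.0

def simulate(n, rng):
    """Tally floored damage values for n simulated hits."""
    counts = Counter()
    for _ in range(n):
        pdif = rng.uniform(PDIF_MIN, PDIF_MAX)
        # Convert roughly a quarter of the draws to exactly 1.0.
        if rng.random() < P_CONVERT:
            pdif = 1.0
        counts[math.floor(pdif * BASE_DAMAGE)] += 1
    return counts

rng = random.Random(2008)  # fixed seed; a different seed changes the bars
counts = simulate(2000, rng)

# Crude text histogram: one '#' per 20 hits.
for damage in sorted(counts):
    print(damage, "#" * (counts[damage] // 20))
```

With these made-up numbers the lowest damage value ends up as the mode, which is the same qualitative shape as the "lower attack" histogram, though it says nothing about whether the game actually works this way.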


It's too bad getting a feel for the underlying distribution from a random sample is quite annoying in the case of pDIF. Maybe I'll try again with attack lower than 344 next time.

Data collection was made possible with the "offense detail" feature in kparser. Otherwise I wouldn't even bother.