The Unbearable Triteness of Preening: June 2009

Friday, June 26, 2009

Retaliation

The warrior job ability Retaliation (level 60) is something I rarely use outside Nyzul (for bosses), so I was curious about how frequently it activates. This is something I would probably have a good idea about just from doing Campaign... if I actually did Campaign. Of course, FFXIclopedia's article is not much help as to what actually affects this frequency (there seems to be variability based on something else other than neglecting to reactivate Retaliation), so I went looking for any sort of data elsewhere.

From wiki.ffo.jp, apparently it is thought that Retaliation rate is based on weapon delay, from 20% at 999 delay to 50% at 200 delay, subject to further verification. (At least this is what I gleaned from Google Translate.)

As for data to support that kind of trend, so far I found one source that describes the results of dual wielding Brass Jadagna +1 (334 delay) and Caduceus (216 delay). Indeed, it is not immediately obvious whether Retaliation, in a dual wield context, is dependent only on the main weapon or based on total delay. The results indicate that Retaliation rate depends only on the main weapon:

Brass Jadagna +1 (main)/Caduceus (sub): 71/244 (.291) with 95% confidence interval (0.2347960, 0.3523304)

Caduceus (main)/Brass Jadagna +1 (sub): 110/241 (.456) with 95% confidence interval (0.3923515, 0.5215964)

I would assume the experiment was done with Retaliation constantly active and at maximum hit rate (95%), and that the Retaliation rate does not depend on the mob's melee attack speed or level (to account for the possibility that there were different mobs involved). Even more important, I would assume the rates were estimated by taking the ratio of Retaliation procs to the number of times being hit.

If these assumptions actually hold, then Retaliation frequency does appear to depend on the main weapon delay only, not the total delay from both weapons. The higher delay weapon, Brass Jadagna +1, had a significantly lower rate than the lower delay weapon, Caduceus. It is not clear whether dual wielding would be any different from using a 2-handed weapon, but I don't see why there would be a difference.
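For anyone who wants to redo the interval arithmetic, here is a minimal sketch using the Wilson score interval. The choice of Wilson is my assumption: the post does not say which method produced the intervals quoted above, so the endpoints from this sketch differ slightly from those in the third decimal place. Only the two counts (71/244 and 110/241) come from the source.

```python
import math

def wilson_interval(successes, trials, z=1.959964):
    """Wilson score interval for a binomial proportion (no continuity correction)."""
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2.0 * trials)) / denom
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / trials + z2 / (4.0 * trials ** 2)) / denom
    return center - half, center + half

# Counts quoted in the post: Retaliation procs / times hit.
samples = {
    "Brass Jadagna +1 main (334 delay)": (71, 244),
    "Caduceus main (216 delay)": (110, 241),
}

for label, (procs, hits) in samples.items():
    low, high = wilson_interval(procs, hits)
    print(f"{label}: rate {procs / hits:.3f}, 95% CI ({low:.4f}, {high:.4f})")
```

The two intervals do not overlap under this method either, which is the informal basis for calling the difference significant.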

Chocobo update: winning streak ended at 10; have won 12 of the last 13 (7 uncontested). Record is 111-79-24-15.

Wednesday, June 24, 2009

Phantom Roll and support for discretion to roll however you want

Before I discuss the optimality of several approaches to using Phantom Roll, I want to talk glibly about whether Phantom Roll even involves the use of a fair die in practice, which is the major assumption underlying Phantom Roll "strategies." I will use Pearson's chi-square test to check badness of fit of the following count data.

This is the only discussion I've seen so far (2006) that entertains the possibility that the outcome of Phantom Roll (I through VI inclusive) is not uniformly distributed. The point of the data collection was to find some evidence that the die becomes weighted in the presence of the optimal job associated with the specific roll (Bard with Choral Roll, etc.). You can read the thread for details.

The first example, with 700 uses of Corsair's Roll, did not yield compelling evidence against unbiasedness (p-value .0811).

The other eight examples involved sampling 100 times under varying conditions. At this point it bears reminding that p-values under the null hypothesis of a fair die are (asymptotically) uniformly distributed (keeping in mind bin specification for the sake of generating histograms), as illustrated below with a bunch of simulated p-values sorted into histogram bins, given a sample size of 100.

I bring this up only as a reminder of what the possible p-values can be under the null hypothesis.
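A rough way to reproduce that kind of picture, as a sketch rather than the code behind the original histogram: simulate 100 rolls of a fair six-sided die many times, run Pearson's chi-square test on each batch, and bin the p-values. The number of simulations, the seed, and the ten equal-width bins are my own choices. The p-value uses the closed-form upper tail of the chi-square distribution with 5 degrees of freedom, so only the standard library is needed.

```python
import math
import random

def chi2_sf_df5(x):
    """Upper-tail probability of a chi-square variable with 5 degrees of freedom."""
    if x <= 0:
        return 1.0
    tail = math.erfc(math.sqrt(x / 2.0))
    correction = math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0) * (math.sqrt(x) + x ** 1.5 / 3.0)
    return tail + correction

def fair_die_pvalue(counts):
    """Pearson chi-square goodness-of-fit p-value against a fair six-sided die."""
    expected = sum(counts) / 6.0
    stat = sum((c - expected) ** 2 / expected for c in counts)
    return chi2_sf_df5(stat)

def simulate_pvalues(n_rolls=100, n_sims=2000, seed=2009):
    """p-values from repeated batches of fair-die rolls (the null hypothesis is true)."""
    rng = random.Random(seed)
    pvalues = []
    for _ in range(n_sims):
        counts = [0] * 6
        for _ in range(n_rolls):
            counts[rng.randrange(6)] += 1
        pvalues.append(fair_die_pvalue(counts))
    return pvalues

pvalues = simulate_pvalues()

# Sort the p-values into ten equal-width bins and print a text histogram.
bins = [0] * 10
for p in pvalues:
    bins[min(int(p * 10), 9)] += 1
for i, count in enumerate(bins):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f}: {'#' * (count // 10)} {count}")
```

Each bin should hold roughly a tenth of the simulated p-values, with some lumpiness because the test statistic is discrete at this sample size.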

For the remaining eight data sets, tests for unbiasedness yield p-values of .4614, .2739, .3601, .007439, .7974, .3101, .09696, and .2763. Based on this crude analysis, only the data set for Healer's Roll with WHM present showed statistically significant evidence of biasedness (specifically 30/100 for a roll of I), but compared to the other non-significant results, it seems difficult to attribute this to something other than Type I error.
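The Type I error reading can be made a little more concrete with some arithmetic on the eight p-values above. This is only a sketch: treating the eight tests as independent and using a Bonferroni correction are my assumptions, not something the original thread or this post did.

```python
# p-values quoted above for the eight data sets of 100 rolls each.
reported_pvalues = [0.4614, 0.2739, 0.3601, 0.007439, 0.7974, 0.3101, 0.09696, 0.2763]
alpha = 0.05
n_tests = len(reported_pvalues)

# Chance of at least one "significant" result among independent tests
# when every null hypothesis is actually true.
familywise_rate = 1 - (1 - alpha) ** n_tests

# Bonferroni adjustment: multiply the smallest p-value by the number of tests.
smallest_adjusted = min(1.0, min(reported_pvalues) * n_tests)

print(f"P(at least one false positive in {n_tests} tests) = {familywise_rate:.3f}")
print(f"Smallest p-value after Bonferroni adjustment = {smallest_adjusted:.4f}")
```

Under these assumptions there is about a one-in-three chance of seeing at least one p-value below .05 by luck alone, and the Healer's Roll result no longer clears the .05 bar once adjusted.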

Of course, the primary question of interest was not whether Phantom Roll gives unbiased rolls regardless of situation, but whether the presence of the optimal job changes the "weight" of the roll. Tests for homogeneity for each specific roll (multiple testing duly noted) give p-values of .5402 (bard), .1077 (white mage), .6433 (ranger), and .1099 (thief).

Generally speaking, chi-square tests have pretty low power, and one tends not to "invert" these to (sets of) confidence intervals to get a good sense of how (in)adequate the sample sizes are. But considering this data as a whole there isn't a particularly good reason to think that the Phantom Roll die is biased.

Now, optimality of two Phantom Roll approaches

The following could basically be summarized as comparing the pros and cons of busting more versus busting less depending on how you go about doubling up.

There is a spreadsheet that provides a convenient summary of whether to Double-Up for various roll types, based on a criterion of conditional expectation (actual buff value), given the current roll total. Basically, consideration of (conditional) expected value is a formal way to make a decision that can mostly be carried out using common sense--you will never double up with a total of 11, as the expected value of the buff after Double-Up must be 0--while also addressing borderline cases where it may not be obvious whether one should Double-Up, for example if your current roll is 6. I awkwardly call this the "expected value on double-up" (EVDU) approach.

The spreadsheet also gives an unconditional expected value of the roll after doubling up based on the expected value criterion, which could be useful for comparing different types of rolls on a "long-run" basis.

For wannabe nerds who can't even calculate the conditional expectations or understand probability, that is one way to go about it. Not unexpectedly, these min/maxing wannabe nerds frown upon conservative approaches that seek to minimize the probability of a Bust, with the implication that people who refuse to Double-Up on a 6 are "suboptimal." For the remainder of this post, I will call categorical refusal to Double-Up on 6 (unless 6 is unlucky), while still Doubling-Up on an "unlucky" total (therefore risking a Bust), the "conservative" approach (and the only one I will consider in this post).

I am willing to bet that most of the people who advocate EVDU (implicitly or not) never actually bothered to compare EVDU with more conservative approaches quantitatively, especially for specific types of rolls. By quantitatively, I mean comparing (unconditional) expected values under each approach to see how much better in the long-run EVDU is, and also comparing the busting proportions under each approach to see how much riskier in the long-run EVDU is.

Personally, I don't really give a shit what rolling strategy a corsair actually uses, since to me it mostly falls under the purview of individual playing style.

Consider Corsair's Roll, for example. Under EVDU, the expected percentage increase in EXP is 15.66% while the conservative approach gives an expected increase of 15.55%, which to me is a really trivial difference. Moreover, the probability of busting under EVDU is .051 while the probability of busting "conservatively" is 0. If you are willing to assume an actual 5% (non-zero) chance of busting for a theoretical 0.11% long-run increase in EXP, fine. But here, the tradeoff between risk and reward is not all that good.

I also estimated the probabilities for the Corsair's Roll bonuses under each strategy (since I didn't want to waste even more time thinking about how to hand-calculate them) to make it easier to compare the strategies in probabilistic terms. (Relative frequencies may not add up to 1 due to rounding.)

COR Roll	EVDU	Conservative
Bust	.051	.000
8%	.134	.082
13%	.000	.309
15%	.193	.142
16%	.165	.114
17%	.095	.044
20%	.309	.309
24%	.052	.000

I colored the relevant probabilities one "side" might use to make a case against the other. Note that the probability of obtaining the "lucky" result (20% EXP increase) is the same regardless of approach. I also did the same for Hunter's Roll (melee and ranged accuracy) and Chaos Roll (melee and ranged attack) without the presence of the optimal job.
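The Corsair's Roll figures can also be computed exactly instead of estimated. The sketch below works backward from a total of 11 to decide where Double-Up has the higher expected value, then pushes probabilities forward from the first roll. Several things here are assumptions on my part: the bonus for each total (totals 5 through 11 are consistent with the table above, totals 1 through 4 are from memory of the in-game roll chart and never end up as final results under either approach), a Bust counting as 0, and modeling the "conservative" approach as the same expected-value rule with a forced stop on 6.

```python
from fractions import Fraction

# ASSUMPTION: % EXP bonus by roll total (5 is lucky, 9 is unlucky).
CORSAIRS_ROLL = {1: 10, 2: 11, 3: 11, 4: 12, 5: 20, 6: 13,
                 7: 15, 8: 16, 9: 8, 10: 17, 11: 24}
BUST_VALUE = 0  # ASSUMPTION: a Bust is treated as a buff worth 0

def plan(values, forced_stops=()):
    """Backward induction: decide for each total whether to Double-Up."""
    best = {}       # expected buff value when playing on from each total
    double_up = {}  # True if the policy doubles up at that total
    for total in range(11, 0, -1):
        # Totals above 11 are missing from `best`, so they count as a Bust.
        ev_roll = sum(Fraction(best.get(total + d, BUST_VALUE)) for d in range(1, 7)) / 6
        double_up[total] = total not in forced_stops and ev_roll > values[total]
        best[total] = ev_roll if double_up[total] else Fraction(values[total])
    return best, double_up

def outcomes(double_up):
    """Forward pass: probability of finishing on each total, plus Bust probability."""
    reach = {t: Fraction(0) for t in range(1, 12)}
    for first in range(1, 7):
        reach[first] += Fraction(1, 6)
    final, bust = {}, Fraction(0)
    for total in range(1, 12):
        if not double_up[total]:
            final[total] = reach[total]
            continue
        for d in range(1, 7):
            if total + d > 11:
                bust += reach[total] / 6
            else:
                reach[total + d] += reach[total] / 6
    return final, bust

def summarize(values, forced_stops=()):
    _, double_up = plan(values, forced_stops)
    final, bust = outcomes(double_up)
    expected = sum(prob * values[t] for t, prob in final.items()) + bust * BUST_VALUE
    return expected, bust, final

ev_evdu, bust_evdu, dist_evdu = summarize(CORSAIRS_ROLL)
ev_cons, bust_cons, dist_cons = summarize(CORSAIRS_ROLL, forced_stops={6})

print(f"EVDU:         expected {float(ev_evdu):.2f}% EXP, P(Bust) = {float(bust_evdu):.3f}")
print(f"Conservative: expected {float(ev_cons):.2f}% EXP, P(Bust) = {float(bust_cons):.3f}")
```

With these assumptions the exact values land within rounding of the numbers quoted for Corsair's Roll, and the only decision that differs between the two approaches is what to do on a total of 6.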

Again, the tradeoff between risk and reward is not so great. Whether you, as a corsair, want to make that tradeoff should be up to you and not to dumbasses who need to rely on mindless rules of thumb because they don't know any better. Personally, I would rather allocate all of my busting risk to another roll rather than to Corsair's Roll if the increased risk is actually worth it on another roll. But when is it worth it? I repeat the above exercise with both Hunter's Roll and Chaos Roll, rolls that are available early on.

For Hunter's Roll, the expected value under EVDU is 29.63 accuracy, and 28.09 taking the more conservative tack. Clearly, a 1.54-point difference in average accuracy is such a profound increase as to justify assuming a greater risk of busting. The estimated probabilities are given below.

RNG Roll	EVDU	Conservative
Bust	.135	.057
20	.000	.309
25	.194	.142
27	.161	.101
30	.124	.063
40	.264	.264
50	.122	.064

For Chaos Roll, the expected value under EVDU is 18.6% attack increase (47.5/256), and 17.6% attack increase (45.0/256) playing it conservatively. Again, a 0.98% average difference in attack obviously warrants the increased risk of busting. The estimated probabilities are given below.

DRK Roll (xx/256)	EVDU	Conservative
Bust	.134	.058
32	.000	.308
40	.193	.142
44	.163	.101
48	.124	.063
64	.265	.265
80	.124	.063

Sure, a 1-point or 1% difference may be important enough to you, but 0.11%?

I spent time constructing this post while considering whether to level corsair to 75. (I won't, but not because of what I found in this post. Ultimately I'd rather buy an account with a ready-to-play COR75 than waste time leveling another job to 75.) Take-home message: do whatever the hell you want as long as you can support it logically.

Thursday, June 18, 2009

Milestones

Somehow I managed to update sporadically this testament to a lack of priorities for almost one year. To be honest, I wrote foremost for myself so I didn't really bother to spend much time making these entries easily digestible for a wider audience. In particular, I found an excuse to apply some basic probability and statistics to FFXI, the playing of which is also a testament to a lack of priorities and extremely poor taste. On the other hand, I did try to focus my attention on the mechanics of the game so that these entries would have some informative value--at least some people thought so--instead of being just some masturbatory self-chronicle.

I would really like to maintain this conceit, anyway, but there is not much of an empirical mentality among the so-called playerbase to provide persuasive support for any theories that are developed or serendipitously discover non-trivial things about how the game works. Maybe some dead-enders take great pride in running their mouths without putting their bullshit to the test, but I find legitimizing claims with real data and observations to be far more interesting. I thought about turning this blog into a "digest" of sorts to summarize both new and old findings and give credit to the individuals that shed some light on some aspect of game mechanics, but actually it is just too tiresome to comb forums with crap search functions and garbage "intellects" for shiny nuggets of insight. So I will just continue talking about things that interest me enough to commit to blog, even though the posting frequency based on that criterion will be very low.

Anyway, this wasn't supposed to be just some navel-gazing exercise. Instead of making individual posts for the following topics, none of which really warrants standalone status, I decided to throw them all into a single entry.

Thoughts on Aspir

I managed to finish collecting some data to check the effect of Pluto's Staff and magic accuracy +12 on the potency of Aspir and updated the dot plot:

I think it's safe to say that neither INT nor magic accuracy (nor MAB, based on tarblm's results) plays a role in the potency of Aspir (and by analogy, Drain). Note that I am not bothering with statistics and am just arguing informally that none of MAB, macc, and INT increased the maximum in these samples.

As far as accuracy is concerned, that is much more inconvenient to check. The main purpose of my collecting data was to visualize the distribution of Aspir values. It seems here that the range of possible values of unresisted Aspir is fairly wide. The low values of Aspir observed may also be the result of a half-resist, which may cut the unresisted Aspir value in half. Unfortunately, you can see how partially resisted Aspirs are easily confounded with unresisted Aspirs if these ideas are true. Going back to tarblm's data though, the observed Aspirs generally have much greater variability, which could be attributed to partial resists.

Chocobo racing

Haven't talked about this in a while. A few months ago I actually canceled my content ID, but I let myself get pulled back into this pit of mediocrity that is FFXI. Since then I made some effort to maintain more detailed information on my Chocobo Circuit results, particularly whether my chocobo was competing against other PCs.

Not surprisingly, few of my C1 races were uncontested. In fact, 16 of the first 20 after I returned had at least one PC chocobo and 7 of those 16 had 2 PCs. I won only nine of those races, with a pretty abysmal 3-3-1-1 record against only one other PC.

During this time I came across a testimonial of another chocobo racer, also with a SS/B/B/B chocobo, who claimed to have won 67% of his races (128-41-22). This kind of pissed me off because farming chocobucks is extremely boring and here this guy was getting nearly 2 million more gil with the same chocobo profile and a similar number of races. I thought perhaps he faced much less competition on his server and that his use of leather saddles may also have been a factor in his great success. But rather than shell out another $25 to SE just for a server transfer, I tried the Sheep Leather Saddle for another month to see if taking the receptivity hit would be worth it in races with one or more other PCs.

My results were even worse with the leather saddle. In 20 races, I went 8-10-0-2 with 11 contested races, and in four of those contested races, an NPC chocobo won (I placed 2nd in all four) and in one case, two NPCs actually placed 1-2. (I finished out of the top 3 in this one.) Even worse, I won only four of the uncontested races, races in which I really "needed" to win to blunt the annoyance of losing chocobucks.

I then went back to the elm saddle and am now nurturing a nine-race winning streak (four contested), by far the longest streak I ever had. During this streak, I also reached the 10-million gil mark in net earnings. I am now going out of my way to race only in "off-hours" time slots to try to get my win rate back to 50%. My current record is 107-78-24-15.

Having reached 10 million in net earnings, I calculated an approximate rate of gil per hour earned in chocobo racing based on the time (one free race per 5 minutes) and gil spent to accumulate enough chocobucks (5,846) to enter the races and the gross earnings. Including this nine-race winning streak, chocobo racing has yielded an average of 74,644 gil per hour. (I admit this figure does not include time spent running to Chocobo Circuit.) Not as efficient as the guy winning 67% of his races, but still a nice reminder that even though chocobuck farming is a real pain in the ass, at least this huge barrier to entry allows chocobo racing to provide a steady income to those who actually put up with it... assuming the C1 races aren't so congested.

Thoughts on the possibility of chain 6 solo on Ebony Puddings without Novio and Manafont

I was extremely disappointed to find out that I had died 957 times between Adventurer "Appreciation" 2008 and A.A. 2009, the majority on black mage. (If I am really appreciated, can I get a Chocobo Wand in fewer than two weeks without being lucky?) I had considered myself more risk-averse over the past year (I did not even do Dynamis at all), but in retrospect this was not true, since you tend to die a lot in pickups, soloing, and poor event linkshells.

The pain of losing experience points on black mage (never mind the bullshit conceit that losing experience points on top of losing the time and resources you wasted is a reasonable penalty) can be blunted somewhat with efficient rate of gain of EXP. But what is considered efficient? From experience, the best I can do is around 8,000 EXP/hour (estimated by the time required to burn off an Emperor Band), and that's when being somewhat vigilant about achieving chain 5 (which is trivial if you are paying attention). Anyone who claims rates of 10,000 EXP/hr solo is full of shit until proven otherwise.

Is it possible to achieve chain 6 solo without Novio, though? By the time chain 5 rolls around I almost always have insufficient MP without Aspir to mount a chain 6 attempt, an indication that my maximum MP is not high enough. Moreover, even three "tier 4" nukes tend to leave a sliver of HP (on off-weather days), meaning that I have to rely on Drain to finish off a pudding. Casting Drain on chain 4 and 5 wastes MP that could be used for chain 6. These factors, along with half-resists and weather effects, conspire to make it really difficult to achieve chain 6. If it is easier than I think, I'd like to know, but not from shit Morrigan's users who still cast AM II on puddings.

Benevolent Despot

With the advent of Fields of Valor, the prospect of not spawning Despot in a timely fashion is less unpalatable since a training regime awards 1,550 EXP in about an hour of killing 11 placeholders. And maybe those tabs will actually be useful someday. Also, soloing without desirable rewards has grown pretty tiresome. I still have a Fenrir solo (ninja) on the back burner now that Lunar Roar supposedly does not dispel reraise, but I haven't been motivated to do that yet. At the moment, Despot is the only remotely appealing "get out there and kill shit" profit opportunity with a relatively high barrier to entry (actually being able to solo it under 30 minutes to lower the chance of vultures finding you and trying vainly to MPK you).

Though not quite as enjoyable as killing Despot and hardly a consolation prize, watching hapless groups kill Despot can provide some humor to brighten the day. Not a few weeks ago, I had the "pleasure" of witnessing this quartet of PLD/NIN, NIN/DRK, BLM and MNK struggle with Despot for over 40 minutes! Apparently, it didn't occur to these people to shed enmity via teleport in order to expedite the kill. Unfortunately, even inept players win eventually, a testimony to the dominance of the lowest common denominator in FFXI. Throw infinite time and resources at something and you can triumph! (Except for Absolute Virtue.)

Spending time farming gems of the west also has given me an opportunity to "get back" at the MPKing Tarutaru Duo That Shall Not Be Named by dispatching Despot in 22-27 minutes while those oblivious assholes continue to kill placeholders long after I'm gone.

Thursday, June 4, 2009

Aspir data and observations

Edit (June 5): some attempts at clarification.

What affects the accuracy of dark magic? Skill only? How about potency? How can we describe the distribution of MP absorbed with Aspir? What data are out there to support the prevailing assertions? Finding old Aspir data sets (from 2006) was easy enough, and I also collected some Aspir data on my own.

This may seem like treading old ground if not for the B.S. I cited earlier in the week. At least there is some data you can cite when making an argument now.

Regarding the 2006 data set, the writer (whom I will call "tarblm" for now) collected Aspir data from King Buffalo (lv 79-82) over six trials. Each trial involved only a single buffalo. The data were collected under one of three configurations, all with a Pluto's Staff:

"Dark magic skill": +40 dark magic skill (above 269) was the primary factor
"MAB": +30 MAB from equipment (relative to control) was the primary factor
Control: 269 dark magic skill

The writer also noted the experience gain for each buffalo. The data are presented in a dotplot:

First, the data give the impression that some amount of dark magic skill increases the average MP absorbed, whereas MAB doesn't, confirming previous beliefs. Certainly the maximum Aspirs are higher. Low values of MP drained seem fairly rare and set apart from the rest of the data, so describing the data as coming from a uniform distribution doesn't quite work.

Was "effective" magic accuracy capped on "very tough" buffalo? I would say yes. Otherwise, level difference would have confounded the results. If there is no difference in accuracy across the trials, one could attribute the average to an increase in so-called potency alone.

To try to avoid that uncertainty about capped magic accuracy for my data, I focused my attention on low-level Tunnel Worms and collected 50 Aspir samples for each of the following conditions without a Pluto's Staff:

Control: 77 INT, 269 dark magic skill
INT: +43 INT above control (120 total)
Dark magic skill: +22 dark magic skill above control (291 total)

Since the worms are so low in level, I just assumed my effective magic accuracy was capped. This is a major assumption but a reasonable one given magic skill level. (Correction: I earlier used a level correction argument to make this assumption. It has not been established that level difference affects mobs in the same way that it affects PCs.) The data are illustrated in the following dotplot:

Similar to tarblm's data, some amount of dark magic skill seems to increase the average MP absorbed, although the increase is not statistically significant. As magic accuracy was probably capped in this scenario, it is probably safe to say that dark magic skill would increase potency by a statistically significant amount if I had more dark magic skill to pile on. In terms of the distribution of Aspir, it seems to shift the range of possible values to the right.

INT doesn't seem to cause any change in potency. That is not to say INT doesn't affect accuracy in some way! Low values of MP (here, below 50) were infrequent and I assure you they didn't result from capping total MP. A decrease in accuracy may manifest in a higher frequency of low values, resulting in a lower average if not a shift in the range of possible values.

Note that last time I gave an example of a data set (on FFXIclopedia) that showed INT increased the average Aspir, and I said this was a potency-only effect based on the assumption of capped accuracy. Perhaps there are other confounding factors that were not cited.

Also, the data overall give the impression that Pluto's Staff affects potency: the 2006 data are more variable, and Aspirs achieve higher values despite King Buffalo being 62-72 levels higher than Tunnel Worm.

From all this, it seems reasonable to conclude that

Dark magic skill affects potency (still not sure about the magic accuracy attribute, e.g., from equipment)
MAB does not affect potency
INT does not affect potency but it seems likely to affect accuracy in some way. Do not confuse accuracy with potency. In a sense, increasing one or the other should still increase the average drained up to a point, but the way each does that is different. An analogy to melee attack and melee accuracy should make sense.
Magic accuracy probably affects accuracy but not potency, but I have neither found nor collected any data to check this.

You may also have noticed that the maxima and sample means in the 2006 data set are larger than those in the 2009 set. I'm pretty sure the difference can be attributed to Pluto's Staff.

Tuesday, June 2, 2009

Dark magic and INT

This is just going to be a quick and dirty post, though it may motivate a more enlightened one later. Just to reinforce the ignorance exhibited by the "player base," here's another chuckle-inducing, basically worthless discussion about what affects the accuracy and potency of Aspir. Here's another discussion from idiots talking about "tests" on the potency of Aspir, but where's the fucking data? Here's an obvious question. Did anyone ever actually disentangle accuracy from potency in a controlled experiment? Or how about, if you are at some hypothesized potency cap, why would you expect potency to increase with anything?

I mean, really, assertions like "Accuracy of [Aspir] is most highly affected by Dark Magic Skill, and is not affected by Magic Attack Bonus, or INT" are based on something rather than pointless anecdote, right?

Actually, in the talk discussion of that FFXIclopedia article on Aspir, there seems to be something like a controlled experiment with random sampling, with a RDM75 (200 dark magic skill) casting Aspir on a single worm between level 10 and 12. Assuming that "accuracy" is capped in some sense on such a low-level target, it appears that INT does affect potency (just do a quick two-sample t). Sure, type I error, blah blah, but it's better than total bullshit.
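For reference, a "quick two-sample t" takes only a few lines. The sketch below computes Welch's version (unequal variances), which is my choice since the post does not say which variant is meant. The two lists are toy numbers for illustration only, not the talk-page Aspir data, and the p-value lookup against a t distribution is left out because the standard library does not provide one.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, df

# Toy numbers, NOT real Aspir samples: replace with MP absorbed per cast
# under the low-INT and high-INT conditions.
low_int = [1.0, 2.0, 3.0, 4.0, 5.0]
high_int = [2.0, 3.0, 4.0, 5.0, 6.0]

t_stat, df = welch_t(low_int, high_int)
print(f"t = {t_stat:.3f} on about {df:.1f} degrees of freedom")
```

Compare the statistic against a t table at the computed degrees of freedom to get the p-value.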