The Unbearable Triteness of Preening

Tuesday, May 26, 2009

pDIF and obsession with polynomial fits

A recent thread on the BG forums about an investigation of "cRatio for two handers" really underscores the ignorance coming out of the "playerbase" that actively chooses to post on forums. To wit:

You got some motherfucker implying the "center" of advanced knowledge about game mechanics lay among those who were banned for Salvage duping, when the "center" is actually maybe 10 people at best, and not all are necessarily English-speaking, much less posters on BG.
Someone rightly points out that said motherfucker is some obsequious cock-gobbler (ok, those are my terms) since information on pDIF has been outdated since the "2-hander update" (well before the bannings) yet no one has actually bothered to do an honest investigation.
Another one actually bothers to collect some data on damage frequency to see what kind of distribution the data follows, but is easily derailed with a fetish for polynomial fitting to the data and data following a normal distribution (polynomial fitting and normality are contradictions as I will discuss soon).

This inexplicable obsession with polynomial fitting and normality is misguided for several reasons:

Normal distributions have obvious tails at the extremes. Moreover, the tails are neither too short nor too long. The data do not show evidence of any real tails.
Normal distributions are not parameterized by extrema (minimum and maximum). The parameters are the mean (center) and variance (spread).
A second-order polynomial fit cannot "account" for tails. This is obvious because a normal density has inflection points (one on each side of the mean) and a parabola has none. So you cannot use a second-order polynomial fit and argue for "normality" at the same time.
Coefficient of determination can be thought of as a summary of a model fit. It doesn't mean the model is actually good. You can draw a squiggly line through all the data points and that will give you an R² of 1, but that would be a terrible and useless model. Polynomial fitting is similarly terrible and useless for the above reasons.
Why even bother with any kind of fitting? As long as the distribution is symmetrical, at least you know the minimum, maximum, and median (same as mean) for pDIF given some value of cRatio, so you can use an expected value argument for long-run damage.
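The "squiggly line" point is easy to check numerically. Here is a minimal sketch in R, using six made-up points (the values are arbitrary and are not from the thread) and a degree-5 polynomial, which has enough coefficients to pass through every point:

```r
# Six made-up (x, y) points; the y values are arbitrary and carry no real signal.
x <- 1:6
y <- c(3, 9, 4, 8, 2, 7)

# A degree-5 polynomial has six coefficients, so it can pass through all six points.
squiggle <- lm(y ~ poly(x, 5))

# Coefficient of determination computed by hand: 1 - RSS / TSS.
rss <- sum(residuals(squiggle)^2)
tss <- sum((y - mean(y))^2)
r2_squiggle <- 1 - rss / tss
print(r2_squiggle)
```

The fit reproduces the data exactly and says nothing about where the next point would land, which is the whole problem.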

That said, there were some useful comments about the "shape" of the data. One poster actually suggested the data may follow some trapezoidal distribution. This is actually quite plausible under probability theory!

Obviously, the data do not appear to follow a uniform distribution. Even acknowledging the discreteness of damage (due to rounding) such that the minimum and maximum might be observed rarely, a uniform distribution of pDIF (NOT DAMAGE) is not all that likely. Although we cannot observe the uniform distribution of pDIF directly, we can observe a histogram of damage (NOT pDIF). For this histogram, if pDIF were actually uniform, one might see an extreme "discontinuous" jump from the minimum to the minimum plus 1, or from the maximum to the maximum minus 1. In other words, the underlying true distribution of damage (NOT pDIF) would appear to be uniform except at the endpoints.

However, from probability theory, it is known that

The sum of two independent uniform random variables with the same variance (regardless of actual minimum and maximum) follows a triangular distribution
The sum of two independent uniform random variables with different variances follows a trapezoidal distribution (so a triangular distribution can be thought of as a degenerate trapezoidal distribution)
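Both facts can be checked by simulation. The sketch below assumes the two uniforms are independent; the ranges (0 to 1, and 0 to 0.25) and the seed are arbitrary choices for illustration:

```r
set.seed(1)   # arbitrary seed so the sketch is reproducible
n <- 100000

# Equal widths (hence equal variances): Unif(0,1) + Unif(0,1) is triangular on [0, 2].
tri <- runif(n, 0, 1) + runif(n, 0, 1)

# Unequal widths: Unif(0,1) + Unif(0,0.25) is trapezoidal on [0, 1.25],
# with a flat top between 0.25 and 1.
trap <- runif(n, 0, 1) + runif(n, 0, 0.25)

# Triangular check: P(sum < 0.5) should be near 0.5^2 / 2 = 0.125.
p_tri <- mean(tri < 0.5)

# Trapezoidal check: the flat top has density 1, so P(0.25 < sum < 1) should be near 0.75.
p_trap <- mean(trap > 0.25 & trap < 1)

print(c(p_tri, p_trap))
```

Calling hist(tri) and hist(trap) shows the two shapes directly.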

How can I argue that the underlying random component of melee damage could follow a trapezoidal distribution?

Does anyone actually expect the "developers" to have done anything particularly fancy with pDIF? In many cases, random number generation is basically "sampling" from a uniform distribution, usually Unif(0,1).
If pDIF does follow a uniform distribution (conditional on cRatio) for some ranges of cRatio (I realize this has been shown not to be the case for certain ranges of cRatio), there could easily be another random component to introduce "jitter" into the damage calculation, which would increase the variability of damage output yet keep the mean the same. This would account for the "1.05x correction" on maximum damage I've seen bandied about from time to time.
So, there could be an "effective" pDIF that includes jitter.

To illustrate the plausibility of the last two points, I simulated 9,885 realizations of non-critical melee damage given 55 "base damage," with pDIF that follows Unif(1, 1.8) and a "jitter" component that follows Unif(-0.1,0.1). Here, the random components are summed together so that the end result is that the fake data are trapezoidal. I plotted the frequencies and I also drew a nonsense curve through all the data points (in blue).

Yes, this is completely fake and is not meant to demonstrate the truth of anything, but merely the plausibility that pDIF follows (or has followed) a trapezoidal distribution for some values of cRatio. I even put in a second-order polynomial trend line, which has a very high coefficient of determination, which shows that R² cannot say anything about whether the model is even appropriate. Here, we know it is grossly inappropriate because I know what the underlying probability model is.
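For reference, the trapezoid implied by these fake parameters can be written down directly. This is a sketch assuming the two components are independent; the function name d_eff_pdif is made up for this example:

```r
# Density of pDIF + jitter, assuming independent Unif(1, 1.8) and Unif(-0.1, 0.1).
# Support is [0.9, 1.9]; each ramp is 0.2 wide and the flat top has height 1 / 0.8 = 1.25.
d_eff_pdif <- function(x) {
  1.25 * pmax(0, pmin(1, (x - 0.9) / 0.2, (1.9 - x) / 0.2))
}

# Ramp up, plateau, ramp down.
print(d_eff_pdif(c(0.9, 1.0, 1.4, 1.8, 1.9)))

# A density should integrate to 1 over its support.
total_area <- integrate(d_eff_pdif, 0.9, 1.9)$value
print(total_area)
```

Multiplying the support by the 55 base damage gives the range of damage values the simulation can produce before truncation.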

Here's the R code for the simulation. Data were exported to Excel. And so goes an hour of my life.

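N = 9885
# Damage = truncated (base damage * (pDIF + jitter)),
# with pDIF ~ Unif(1, 1.8) and jitter ~ Unif(-0.1, 0.1)
a = trunc(55*(runif(N,min=1,max=1.8)+runif(N,min=-.1,max=.1)))
dmg = seq(min(a),max(a))
N2 = length(dmg)
dmg.counts = numeric(N2)

# Count how often each integer damage value was observed
for (i in 1:N2) {
  dmg.counts[i] = sum(a == dmg[i])
}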
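The trend line itself was drawn in Excel, but the same check can be sketched in R. This version repeats the simulation with an arbitrary seed, tabulates the counts, and fits a second-order polynomial; lm() with poly() is my stand-in for the spreadsheet trend line, so the exact R² will not match the plot:

```r
set.seed(2009)  # arbitrary seed so the sketch is reproducible
N <- 9885
a <- trunc(55 * (runif(N, min = 1, max = 1.8) + runif(N, min = -0.1, max = 0.1)))

# Frequency of every integer damage value from min(a) to max(a), zeros included.
dmg <- seq(min(a), max(a))
dmg.counts <- as.vector(table(factor(a, levels = dmg)))

# Second-order polynomial fit to the frequencies.
quad_fit <- lm(dmg.counts ~ poly(dmg, 2))
r2_quad <- summary(quad_fit)$r.squared
print(r2_quad)
```

Whatever value comes out, the fitted parabola is still the wrong model, because the data were generated from a trapezoid by construction.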

Saturday, April 25, 2009

Acknowledgement of comments received

Just a brief post acknowledging that I have read two comments made in the past few months when I wasn't doing anything with this blog.

Comments on data table header translations of Lodeguy's data. Technically, I did not really translate anything as I don't know Japanese (I didn't say I was translating anything), barring being able to read katakana, simple phrases and basic kanji.

Comments on my criticism of an alternative analysis of paralyze data. I do not have issues with the original analysis. The secondary analysis I nitpicked past its veneer of soundness. I made this post on a "whim," which kinda goes to show I seek out "fun."

Thursday, April 23, 2009

Expected magic damage in terms of accuracy - more parlor talk

In some sense, having to hoard magic accuracy for so-called high-resist targets when nuking (or enfeebling, etc.) is an all-or-nothing proposition for various reasons I just made up that may incidentally be shared by others. One, these high-resist targets have such high magic "evasion" that it is completely untenable (from experience or whatever) to nuke with the same equipment you would use for Ebony Puddings. Two, even if the target of interest is not quite as resistant as, say, a wyrm or sky god, it can be difficult to ascertain what amount of magic accuracy is acceptable to reach some threshold (say 90% acc.) without personal experience (or the experience of others). Three, compared to mindless melee auto-attacking and WSing, nuking specifically is inefficient from the standpoint of theoretical damage dealt in the "long run" (MP being an important limiting factor), so it seems pragmatic to accept unconditionally the trade-off of lowering maximum magic damage for fewer resists when there is any doubt.

If there comes a time where it is easier to ascertain the magic evasion of any mob of interest (probably never given the FFXI "team's" fetish for making basic game mechanics as opaque as possible, and lack of information sharing among the "playerbase"), perhaps it can be useful to quantify the difference in overall magic damage between a "high-resist" setup (with the purpose of maximizing magic accuracy) and a normal setup for resistant NMs and whatnot. But realistically this is just another parlor talk.

A long time ago, I argued that levels of resistance for a single "nuke" can be modeled by a one-parameter categorical distribution, with the parameter being the probability that a nuke is not resisted at all (full damage). This probability will be called "overall magic accuracy" for the remainder.

To reiterate, the distribution can be described as

no resist: π
1/2 resist: π(1-π)
1/4 resist: π(1-π)²
1/8 resist: (1-π)³

This assertion was based on prior observations by me and others that multinomial count data for nukes, categorized by level of resist, seemed to conform to such a pattern. (I will not discuss the speculated motivation for the programmers to use this model, assuming it is true.) If this is a reliable model, it seems reasonable to think about the effect of overall magic accuracy on magic damage in terms of expected value.
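As a sketch, the distribution can be written as a small R function. The name resist_probs, the 60% accuracy, and the 1,000 simulated nukes are all made up for illustration:

```r
# Probabilities of the four resist levels given overall magic accuracy p.
resist_probs <- function(p) {
  q <- 1 - p
  c(none = p, half = p * q, quarter = p * q^2, eighth = q^3)
}

probs <- resist_probs(0.60)
print(probs)
print(sum(probs))   # the four probabilities should add up to 1

# Simulated multinomial counts for 1,000 nukes (seed is arbitrary).
set.seed(1)
counts <- rmultinom(1, size = 1000, prob = probs)
print(counts)
```

Observed counts from a parser can be compared against size * probs in the same way.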

Ignoring rounding, let X be the actual damage of a nuke (subject to being resisted) with unresisted damage D. The expected value of X can be expressed as

E[X] = D[π+0.5π(1-π)+0.25π(1-π)²+0.125(1-π)³]

Based on this expression, overall magic accuracy can be thought of as attenuating the unresisted damage of a single nuke in the long run, multiplying that damage by some factor less than 1 that is a function of π. Therefore, in making some assessment of overall magic damage as a function of magic accuracy, we don't have to consider the actual distribution of resists given π, just as players calculating physical damage don't consider the distribution of pDIF given a ratio of attack to defense.
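The attenuating factor is simple to compute. A minimal sketch, where the function names and the 1000 unresisted damage are hypothetical:

```r
# Long-run multiplier on unresisted damage, as a function of overall magic accuracy p.
resist_factor <- function(p) {
  q <- 1 - p
  p + 0.5 * p * q + 0.25 * p * q^2 + 0.125 * q^3
}

# Expected damage of one nuke with unresisted damage D (rounding ignored).
expected_damage <- function(D, p) D * resist_factor(p)

print(resist_factor(c(0.60, 0.86, 1)))
print(expected_damage(1000, 0.60))   # D = 1000 is a made-up value
```

At p = 1 the factor is exactly 1, so nothing is attenuated.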

Just as magic accuracy attenuates unresisted magic damage by some factor less than 1, magic attack bonus (MAB) amplifies magic damage by a factor greater than 1. This is illustrated and summarized with the following graph plotting these factors described (for magic accuracy and magic attack):

As you can see, when "long run" magic damage is considered, there is decreasing return to overall magic accuracy, π (the expected value computed earlier is a third-order polynomial with respect to π), and constant return to MAB. The endpoints make sense, too. If you happen to have 100 MAB, your overall damage is twice as high. If you happen to have 100% overall magic accuracy (recognizing that this is impossible in FFXI for nukes), then there is no attenuating of your potential magic damage.
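The decreasing return can also be read off the algebra. Writing q = 1 - π (my shorthand, not used elsewhere in this post) and expanding the expected value gives:

```latex
\frac{E[X]}{D} = \pi + \tfrac{1}{2}\pi q + \tfrac{1}{4}\pi q^{2} + \tfrac{1}{8} q^{3}
             = 1 - \tfrac{1}{2} q - \tfrac{1}{4} q^{2} - \tfrac{1}{8} q^{3},
\qquad q = 1 - \pi

\frac{d}{d\pi}\,\frac{E[X]}{D} = \tfrac{1}{2} + \tfrac{1}{2} q + \tfrac{3}{8} q^{2}
```

The slope is 11/8 at π = 0 and falls to 1/2 at π = 1, so each additional point of overall magic accuracy is worth less than the one before it. The MAB factor, 1 + MAB/100, has a constant slope.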

Using the model for levels of resistance I described, it is possible and simple to estimate the percent change in long-run magic damage between two equipment setups of interest. Suppose you have a normal setup with +70 MAB that you know will achieve 60% overall magic accuracy on some target of interest (this means in the long run 60% of your nukes will be unresisted), and you are interested in assessing whether utilizing your "high-resist" setup is worth the tradeoff in potential damage. Suppose your high-resist setup has +63 MAB and +26 more magic accuracy than the normal one, which works out to 86% overall magic accuracy if 1 point of magic accuracy is worth 1%.

At this point, there should be no need for quantifying the relative performance increase, but perhaps you want to quantify it anyway.

Since the magic damage "formula" is just multiplying various factors together, it is easy to calculate a percent difference that is independent of base damage, INT, weather effect, etc. (all of which could be considered constant). One needs merely to identify the multiplicative factors associated with MAB (+63 and +70) and m. acc (60% and 86%). Through direct calculation,

(1.63)(0.924757)/[(1.70)(0.752)] - 1 = 0.179

In the "long run," the overall damage using the "high resist" setup will be almost 18% higher than that using the normal one.
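The same arithmetic as a sketch in R, with hypothetical function names and the MAB factor taken as 1 + MAB/100:

```r
# Long-run resist multiplier as a function of overall magic accuracy p.
resist_factor <- function(p) {
  q <- 1 - p
  p + 0.5 * p * q + 0.25 * p * q^2 + 0.125 * q^3
}

# MAB multiplier: +100 MAB doubles damage.
mab_factor <- function(mab) 1 + mab / 100

normal_setup      <- mab_factor(70) * resist_factor(0.60)
high_resist_setup <- mab_factor(63) * resist_factor(0.86)

# Relative gain of the high-resist setup over the normal one.
pct_gain <- high_resist_setup / normal_setup - 1
print(round(pct_gain, 3))   # 0.179
```

Swapping in other MAB and accuracy pairs gives the comparison for any two setups, as long as everything else is held constant.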

Again, is this useful or practical? Not really. But it could serve as a theoretical framework for "theorycrafting" (oh how I hate MMORPG-related jargon).

Tuesday, April 21, 2009

One more time

I am fairly amused that the conclusions from lodeguy's magic accuracy experimentation and my data analysis have been used to support the shibboleth of "320 skill/120 INT" for direct-magic damage (just browsing FFXI forums periodically). Maybe "shibboleth" is too strong a pejorative, since at least this rule of thumb acknowledges that INT contributes to overall magic accuracy (even though this acknowledgment seemed to be supported mainly with anecdotes and collective experience rather than formal data collection).

Should we really care about attaining 120 INT?

As you may recall, lodeguy gave us data that suggest (informally) a critical point for ΔINT (caster's INT minus target's INT) that "connects" two distinct regimes of rate of change of overall magic accuracy with respect to INT. To summarize, before ΔINT +10, the rate of change is estimated to be 1% per 1 INT (actually a little less from statistical significance testing), and between ΔINT +10 and ΔINT +30, 0.5% per 1 INT. I only emphasize this range because there is no data to show what might happen beyond ΔINT +30. (Moreover, there was no data to suggest, as far as I can recall, the effect of INT below 50% overall m. acc. But, realistically speaking, no one is ever going to investigate these issues. This is the best we will ever get, probably.)
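To make the two regimes concrete, here is a sketch of the piecewise-linear relationship, expressed as the change in overall magic accuracy (in percentage points) relative to ΔINT +10. The function name is made up, the slopes are the rounded 1% and 0.5% figures, and how far below +10 the steeper slope actually holds is not something the summary above pins down:

```r
# Change in overall magic accuracy (percentage points) relative to dINT = +10.
# 1% per INT at or below +10, 0.5% per INT from +10 to +30, unknown beyond +30.
macc_change_from_dint <- function(dint) {
  change <- ifelse(dint <= 10, 1.0 * (dint - 10), 0.5 * (dint - 10))
  change[dint > 30] <- NA   # no data past dINT +30
  change
}

print(macc_change_from_dint(c(0, 10, 20, 30, 40)))
```

Going from ΔINT +10 to +30 buys only 10 points of accuracy under this reading, the same as going from 0 to +10.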

With that in mind, it might be interesting to get some sense of whether 120 INT is generally suitable in "endgame" to reach the second ΔINT range with the slower rate of change. To do this, one must compare 120 INT to the INT of various "endgame" mobs.

Regrettably, information about mob INT from English-language sources is either poorly documented (sequestered in obscure FFXI forum posts) or almost non-existent (seriously, does anyone give a fuck about anything other than Ebony Puddings?), and this annoyed me to the point that I attempted to calculate the INT (as well as magic defense bonus, or MDB, and reduction of magic damage taken, or MDT-) of various mobs that I faced over the past few months to get a sense of whether I was surpassing ΔINT +10 most of the time. As I said in the last post, magic damage is deterministic (level of resist is random), so it should be fairly straightforward to calculate mob INT in many cases. Of course, I could have made calculation errors or overlooked level variability for specific mobs. I will leave it to others to verify or refute my calculations.

There isn't much variety in what I do in FFXI, though. All I have is data for mobs in NW Apollyon and those for various ZNMs. First, NW Apollyon:

Monster	INT	MDB	MDT-
Bardha	75	0	0
Pluto	82	0	0
Mountain Buffalo	60	0	0
Apollyon Scavenger	62	0	0
Gorynich	72	0	0
Kronprinz Behemoth	74	0	0
Kaiser Behemoth	???	???	???

As you can see, most of the "normal" mobs have low INT so that ΔINT +10 is easily cleared. As for Kaiser Behemoth, I didn't gather enough information, but I am pretty sure it possesses some combination of MDB and MDT- traits. I also collected similar data on some ZNMs I fought several months ago:

Monster	INT	MDB	MDT-
Lil' Apkallu	60	0	1/4
Verdelet	115	0	0
Experimental Lamia	89	0	1/8
Mahjlaef the Paintorn	112	0	1/4
Cheese Hoarder Gigiroon	81	0	0
Vulpangue	78	0.20	0
Dea	62	0	0
Iriz Ima	70	0	0
Gotoh Zha the Redolent	92	0.28	1/8
Tinnin	85	0.20	0
Achamoth	65	0.16	0

Here, MDB is reported in terms of amount above 1.00. MDT- is reported in terms of fractional reduction of magic damage.

Other than Verdelet (an imp) and Mahjlaef the Paintorn (a soulflayer), all of the ZNMs have INT such that ΔINT is well above +10. Therefore, from the standpoint of optimizing overall magic accuracy (given what we know), it seems practical to exchange INT in excess of ΔINT +10 for elemental magic skill or magic accuracy. In particular, this could be useful for Tinnin, which seems to have higher magic resistance than the "lower-tier" ZNMs (probably a result of level difference) despite having "only" 85 INT.
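The comparison is quick to tabulate. This sketch assumes a caster with exactly 120 INT (the rule-of-thumb figure) and uses the ZNM INT values from the table above; the column names are my own:

```r
znm <- data.frame(
  monster = c("Lil' Apkallu", "Verdelet", "Experimental Lamia",
              "Mahjlaef the Paintorn", "Cheese Hoarder Gigiroon", "Vulpangue",
              "Dea", "Iriz Ima", "Gotoh Zha the Redolent", "Tinnin", "Achamoth"),
  int = c(60, 115, 89, 112, 81, 78, 62, 70, 92, 85, 65),
  stringsAsFactors = FALSE
)

caster_int <- 120                      # assumed caster INT
znm$dint <- caster_int - znm$int       # dINT = caster INT minus target INT
znm$past_plus10 <- znm$dint > 10       # TRUE if in the slower 0.5%-per-INT regime

print(znm[order(znm$dint), ])
```

Changing caster_int shows how much INT could be traded away before any given ZNM drops back under ΔINT +10.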

Moreover, there could be some patterns to mob INT despite the limited information available. Beastmen and other "sentient" mob types (particularly soulflayers and imps) could have higher INT in general than other types. Magic users have higher INT in general than non-magic users (I will treat this as self-evident).

But concerning the main question, it appears, at least for most ZNMs that are worth nuking and mobs in NW Apollyon, that ΔINT +10 is surpassed most of the time. If you happen to get close to 120 INT incidentally, that's great, but not necessarily at the expense of possible improvements to magic skill/magic accuracy. For example, Dea has only 62 INT, but it is still prone to resisting Thunder IV (compared to Blizzard IV). Therefore, it would be appropriate to use Sorcerer's Petasos instead of Demon Helm +1 for the sake of improving accuracy.

None of these mobs even have INT above 120, so it's not like you would get much of an improvement to resist rates whoring INT (such that ΔINT +10 is satisfied) compared to whoring magic skill/accuracy (all things being equal).

So what about beastmen "kings" and HNMs? Bahamut ("The Wyrmking Descends") is reported to have 115 INT (from Studio Gobli, if you can actually find the documentation). (Bahamut is sentient, right? Check.) Also Jormungand is reported to have 120 INT (also from Studio Gobli). (Perhaps the example of Jormungand motivated the 120 INT figure?) Other than that, I have no other information.

Anyone can calculate mob INT, but...

... magic defense bonus (MDB) and reduction in magic damage taken (MDT-) can get in the way of calculating INT. These factors may play a role in determining overall magic damage for things like Sarameya and Tyger. Without knowing MDB and MDT- and considering the incessant flooring involved in these calculations, it is somewhat difficult to arrive at a unique set of MDB/MDT-/INT that allows you to calculate magic damage exactly without using formal optimization methods, and I am not interested in doing that.

However, this post offers some very useful facts to determine what exactly a mob's potential MDB or MDT- is. In particular,

1000 Needles is not affected by MDB.
Quick Draw is not affected by MDT-.
Damage calculations for both are independent of mob INT.

Unfortunately, I don't have access to blue mage or corsair, but these tools would be very useful if I had access to them. Practically speaking, it doesn't seem particularly appropriate to do this kind of testing during "serious" events (how seriously do you take Proto-Ultima?), but your mileage may vary (enough with the cliches!).

Saturday, February 7, 2009

The seed

First, Tarutaru Times Online no longer indexes new blog entries. Second, I have decided to stop writing about FFXI in general, so this blog will not be updated further.

However, I would like to share one last thing that I am looking into at the moment. I am calculating the INT of various "Zeni Notorious Monsters" based on data I've collected over the past few weeks.

Frankly, calculating INT should be a trivial exercise since potential (maximum) magic damage is not a function of a random variable, but I wasted a lot of time looking into the effects of magic defense bonus. Before considering possible MDB, you should first consider reductions in magic damage taken. Most of the time any overall reduction in magic damage is the result of a reduction in magic damage taken, not magic defense bonus. I am not sure to what decimal place the game rounds the ratio of MAB/MDB, either, so I would mess with that only as a last resort. (To the hundredths place? Thousandths?)

To give an example, Apkallus in general appear to take a 25% reduction in magic damage (64/256). Lil' Apkallu, then, also takes a 25% reduction in magic damage. Knowing this, it is easy to confirm what Lil' Apkallu's INT is. Other ZNMs also appear to take a similar magic damage reduction.

As another example, Verdelet does not have any magic damage reduction, so it's even easier to calculate its INT.

However, I will not give out these values as I do not feel particularly compelled anymore to share basic things to a collectively ignorant "playerbase." Fine, not everyone understands basic statistics and probability, but anyone can gather data; it's even easier to gather data automatically with a parser.

Regrettably, the magic damage formula on FFXIclopedia is incorrect in various ways, most critically the application of rounding and the order of factors that contribute to the calculation. Refer to wiki.ffo.jp for the correct expression.

Tuesday, January 27, 2009

Tears of a clown

So that this post is not a complete waste of time, unlike the vast majority of hand-wringing and gloating cluttering TTTO over the past week post-banning (as opposed to the usual gloating and preening about acquired equipment and "accomplishments"), there were some more interesting results from lodeguy's experimentation that I did not address previously but are still interesting (they just didn't require any statistical techniques to analyze).

For direct-magic damage, does weakness to a certain element guarantee half-resists at worst?

Lodeguy demonstrated that there is a case where elemental weakness--specifically a Fire Elemental's weakness to water--guarantees half-resists for direct-damage magic at worst. He never observed anything worse than a half-resist casting Water magic over 1,000 times on a Fire Elemental. Remember that the usual distribution of resists is easily predicted (well, if you have some estimate of an "unresisted" rate to begin with), and that the proportion of quarter or "full" (1/8) resists when effective magic accuracy (no resist) is .30 is expected to be almost .50.

This observation probably does not apply to high-level Notorious Monsters with "known" elemental weaknesses. If it were true, there would be little incentive to maximize magic accuracy for such targets. I think players would generally accept the tradeoff of having few unresisted nukes in exchange for guaranteeing at least half-damage. But perhaps there is more to this phenomenon than what has been observed.

Again, I do not read Japanese so there may be something important in the discussion that I overlooked.

What is the maximum effective magic accuracy?

It is obviously 95% for direct-magic damage. Again, under the usual distribution of resists, the proportion of half resists when effective magic accuracy is 95% is .0475; for quarter resists, .002375. Note the rare quarter-resist event. A "full" resist with magic accuracy capped is even rarer, with hypothetical probability .000125, or a 1-in-8,000 event.
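The proportions quoted in this post follow from the usual resist distribution. A sketch in R, with a hypothetical function name:

```r
# Probabilities of the four resist levels given effective magic accuracy p.
resist_probs <- function(p) {
  q <- 1 - p
  c(none = p, half = p * q, quarter = p * q^2, eighth = q^3)
}

# Effective magic accuracy of .30: share of quarter or "full" (1/8) resists.
low <- resist_probs(0.30)
worse_than_half <- low[["quarter"]] + low[["eighth"]]
print(worse_than_half)

# Effective magic accuracy capped at .95.
capped <- resist_probs(0.95)
print(capped)

# Full-resist probability expressed as 1-in-N odds.
print(1 / capped[["eighth"]])
```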

Now that that's taken care of, some idle thoughts on the duping-related bans.

Tears of a clown - my, my, how the ressentiment flies

It is obvious to me that the point of any carte blanche clause in a "terms of service" you may observe is so that the service provider has an easy out to rid itself of undesirables. I do not really care that SE reserves a prerogative to regulate its own product through banning of accounts.

Yet although SE certainly does not need to justify any bans or suspensions it metes out, the mere appearance of uneven application of "punishment" (regardless of the fact that punishment was unevenly applied) makes FFXI look even more of a joke than it already is.

That this exploit remained in place for over 18 months is enough of a joke. Never mind SE's execrable neglect of widespread RMT activity (considerably more pervasive than anything to do with Salvage) left unabated for years in a MMORPG whose conditions were extremely conducive to RMT activity (despite SE reserving carte blanche to terminate accounts), the "nerf" to Pandemonium Warden and Absolute Virtue in response to negative publicity, etc., etc.

And, anyone who thinks anyone at SE wasn't cognizant of the duping exploit at any time during the 18 or so months before the lead-up to patch-and-ban is an idiot. There is always some "snitch" who would report such a thing. I say snitch in the vein of someone who does the right thing mostly out of impure motives, like "seeing bitches get their comeuppance."

Face it, no one gives a fuck about the "integrity" of a consumer product/service like a MMORPG except immature 39-year-olds who deign to waste all their time fulminating about a trivial thing.

Of course, this doesn't stop idiots with no sense of proportion from riding SE's dick any chance they get. That SE should even pursue damages in court for duping in a video game to keep players "honest" is simply a farcical notion. Acting like SE is some poor besieged entity in a game rife with whining entitled players---most of whom, at the end of the day, despite crying about "intolerable" low drop rates in an endgame activity they voluntarily entered into, still waste their money on FFXI--is also a joke. SE and the "player community," you deserve one another.

At the end of the day, SE can merely point to your monthly credit card statements, cheater or not, and simply say, "monthly fee." Money is the prime mover, although SE sometimes seems not to act like it is.

Thursday, January 8, 2009

Mr. Decay

This post will be a potpourri of topics.

Chocobo racing - Crystal Stakes results

As of this post I've raced my (good) chocobo (SS/B/B/B) 112 times in the Crystal Stakes (C1) and obtained the following results:

1st: 52
2nd: 40
3rd: 16
4-8: 4

Total: 112

I haven't seen much information on results with other chocobo configurations, except from this one forum post (chocobo attributes unknown):

1st: 27
2nd: 12
3rd: 10
4-8: 7

Total: 56

Now, I have no idea how often this other chocobo faced competing PC chocobos, but seeing another chocobo's results helps to provide some more perspective.
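Since the two samples differ in size, proportions are easier to compare than raw counts. A small sketch (the row and column labels are my own):

```r
# Finishing positions for the two chocobos, as counts.
results <- rbind(
  mine  = c(first = 52, second = 40, third = 16, fourth_to_eighth = 4),
  other = c(first = 27, second = 12, third = 10, fourth_to_eighth = 7)
)

print(rowSums(results))                 # 112 and 56 races

# Share of races at each finishing position, by chocobo.
shares <- round(prop.table(results, margin = 1), 3)
print(shares)
```

The first-place shares are close; the difference shows up in how the remaining races split between 2nd and everything below it.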

Is B receptivity a good hedge if it means placing 2nd relatively more often than a chocobo with lower discernment, "all things being equal" (which never happens, but let's just finish this filler post)? On one hand, I'd rather place 1st more often at the expense of placing 3rd or lower more often, as in the long run the return could be better. (One way to think of it is that it's better to place 1st and 3rd in two races than 2nd both times.)

On the other hand, I don't like farming chocobucks.

Sure, you can calculate expected gains and losses of chocobucks per race, but I won't do it because it won't motivate me in any way to raise another chocobo.

Magic accuracy - does weather and day have an effect?

Earlier (see previous post), I observed that the (effective) magic accuracy of Paralyze seemed not to be (statistically) significantly affected when Paralyze was cast during Firesday and Iceday. I tried to search for more data sets, but I didn't find anything meaningful.

Lodeguy himself seems to have said neither day nor weather has an effect (too lazy to find the actual quote), but, really, since his goal was to measure changes in (effective) magic accuracy, whether or not there is a day/weather effect (that he didn't control for and is not practical to control for) doesn't matter all that much considering the effect, if it exists, processes only 1/3 of the time. (I haven't verified this myself though.)

Anyway, I guess I could operate under the assumption that weather and day do affect resist rates. But are the effects of day and weather on accuracy (if they exist) the same in magnitude as the effects of day and weather on damage?

If you wanted to test this assumption and you have a scholar, you could see whether single weather and day combined drastically increase the accuracy of nukes of the same element. (You could also check for the reduction in accuracy of nukes of the opposite element.)

Laziness dictates that I should do a basic statistical power calculation to obtain the number of Bernoulli trials needed to observe that a possible 20% increase in effective magic accuracy is statistically significant, given a Type I error of 5%:

Computed N Total

Actual Power    N Total
       0.801        166

This conservative (but one-sided) power calculation (details omitted) indicates I need a total of 332 samples (166 for the trials without the effect of weather and day, and 166 for the trials with the effect) to observe statistical significance (using Fisher's exact test) with a probability of .8. And this probability assumes that this 20% increase (or reduction depending on your approach) is real.
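A rough version of this kind of calculation can be sketched in base R with power.prop.test(). The baseline of 50% and the boosted rate of 70% are my assumptions for illustration, not the omitted details, and the function uses a normal approximation rather than Fisher's exact test, so it should not be expected to reproduce the figure above:

```r
# One-sided, two-sample comparison of proportions at 80% power and 5% Type I error.
# p1 = 0.50 and p2 = 0.70 are assumed values for illustration only.
pw <- power.prop.test(p1 = 0.50, p2 = 0.70, power = 0.80,
                      sig.level = 0.05, alternative = "one.sided")

# Required trials in each group (without and with the weather/day effect).
n_per_group <- ceiling(pw$n)
print(n_per_group)
```

An exact test is more conservative than the normal approximation, so it calls for somewhat more trials under the same assumptions.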

But, I would have to make sure my effective magic accuracy, without the effect of weather and day, is somewhere above 50% and less than 75%. Based on lodeguy's data, one could figure this out for Earth Elementals (...) or, better, a Qiqirn ranger.

If weather has an effect on magic accuracy, why does Klimaform exist?

The English description of Klimaform states that the ability "[i]ncreases the magic accuracy for spells of the same element as the current weather." This statement does not really imply an existing accuracy bonus from weather before Klimaform, nor does it really imply no weather bonus before Klimaform.

Magic accuracy - revisiting data sets other than lodeguy's

A long time ago I looked at this data set and then just glossed over it while talking about lodeguy's results. But "intellectual honesty" compels me to attempt to explain the results of this other data set.

Actually I do not recall all the experimental details, but I "hope" the Ebony Puddings targeted were at the infamous Mount Zhayolm experience "camp." There, Ebony Puddings have a level of 79 or 80. (Incidentally, I noticed that these flans provide an experience point bonus of 5%, which I could not find corroboration for on FFXIclopedia.) Then that makes the observed data more "plausible."

First, it would be pretty obnoxious to say that the effect of magic accuracy increases with skill level without even acknowledging the imprecision of the estimates. If you are going to claim that, then you have to claim that one point of magic accuracy input gives an effective magic accuracy increase well above 1%, as shown below, using the nuke data from "Test III" and "Test IV" together (without the INT observations):

                            Analysis Of Parameter Estimates

                              Standard     Wald 95% Confidence       Chi-
Parameter    DF    Estimate       Error           Limits            Square    Pr > ChiSq

Intercept     1     -3.2974      0.6462     -4.5639     -2.0309      26.04        <.0001
skill         1      0.0143      0.0022      0.0099      0.0186      40.56        <.0001
macc          1      0.0179      0.0027      0.0126      0.0232      43.27        <.0001

Not only can you not argue that macc is "better" than skill, you also cannot really say with a straight face that 1 point of magic accuracy input increases effective magic accuracy by some value well above 1%. That is just ridiculous on its face.

One possible explanation for the data is that the level 79 and level 80 Ebony Puddings were not targeted in roughly equal proportions; in the worst-case scenario, puddings of one level were inadvertently targeted exclusively for "Test III," and puddings of the other level were used exclusively for "Test IV." Since lodeguy provided some evidence of a level difference penalty (or bonus), we should be wary of such a phenomenon when collecting data.

For this data and experimental setting, a potential consequence of severe imbalance in the relative proportions of level 79 and level 80 Ebony Puddings targeted is a "distortion" of the true sampling distributions associated with the "skill" and "macc" effects, "true" meaning that the distributions should have a mean of 0.01.

This can be illustrated through simulation. This is not a "proof" of anything, just a whimsical example. Suppose that the difference in level penalty between a level 79 and level 80 Ebony Pudding is 10% magic accuracy. Then, using the worst-case scenario I described above, I can generate approximate sampling distributions (with many, many assumptions) for the slopes associated with the main effects.

The most important assumption for this simulation is that 1 point of skill equals 1% effective magic accuracy, and 1 point of magic accuracy input equals 1% magic accuracy output (regardless of whether this is true in reality, which I think it is).

For elemental magic skill, the approximate sampling distribution has a mean of 0.0154 (not 0.01) and a standard deviation of .00245, which is close to the standard error from the actual data.

For magic accuracy input, the approximate sampling distribution has a mean of about 0.0133 (not 0.01) and standard deviation of about .00334. The standard error from the actual data is not close to .00334, but the concept still shows the "plausibility" of the data. Moral of the story: failing to control for real effects may have deleterious consequences.
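The direction of the distortion can be seen without any random sampling. The sketch below is a simplification, not the simulation itself: it takes the six expected success rates implied by the assumptions (0.59 baseline at 274 skill, 1% per point of skill or magic accuracy, and a 10% shift for the second group of three cells) and fits ordinary least squares to them, with and without the level shift. The column names are my own:

```r
# Six experimental cells; level_shift marks the cells assumed to be fought
# against the lower-level puddings (+10% magic accuracy).
cells <- data.frame(
  skill = c(274, 274, 287, 284, 284, 295),
  macc  = c(0, 13, 0, 0, 11, 0),
  level_shift = c(0, 0, 0, 0.10, 0.10, 0.10)
)

# Expected success rate in each cell under the stated assumptions.
cells$p <- 0.59 + 0.01 * (cells$skill - 274) + 0.01 * cells$macc + cells$level_shift

# Slopes when the level difference is ignored versus controlled for.
naive <- coef(lm(p ~ skill + macc, data = cells))
full  <- coef(lm(p ~ skill + macc + level_shift, data = cells))

print(naive[c("skill", "macc")])
print(full[c("skill", "macc")])
```

Ignoring the level shift pushes both slopes above 0.01, while including it recovers 0.01 for each. Least squares weights the cells differently from the binomial fit, so the naive slopes are near, but not equal to, the simulated means.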

As for the apparent (lack of) effect of INT below 50% effective magic accuracy ("Test II"), if 30 INT really corresponds to a 15% magic accuracy bonus (assuming any bonuses are cut in half because of the hit rate penalty), observing no improvement (or worse) is virtually guaranteed not to happen. At this point, I would just keep this result in mind but take it with a grain of salt.

Here's the R code I used to generate the above graphs:

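n = 10000   # number of simulated experiments

# Storage for the fitted slopes and their standard errors
skill2 = rep(0,n)
macc2 = rep(0,n)
skill_se = rep(0,n)
macc_se = rep(0,n)

for (i in 1:n) {
 # Six cells of 100 Bernoulli trials each. Success rates assume 1% per point of
 # skill or macc, plus an extra 10% for the last three cells (the level difference).
 success = c(rbinom(100,1,.59),rbinom(100,1,.72),rbinom(100,1,.72),rbinom(100,1,.79),rbinom(100,1,.90),rbinom(100,1,.90))
 skill = c(rep(274,100),rep(274,100),rep(287,100),rep(284,100),rep(284,100),rep(295,100))
 macc  = c(rep(0,100),rep(13,100),rep(0,100),rep(0,100),rep(11,100),rep(0,100))
 trials = data.frame(cbind(success,skill,macc))
 # Binomial model with identity link, deliberately ignoring the level difference
 model = glm(success ~ skill + macc, family=binomial(link="identity"),data=trials)
 skill2[i] = coef(summary(model))[2,1]
 skill_se[i] = coef(summary(model))[2,2]
 macc2[i] = coef(summary(model))[3,1]
 macc_se[i] = coef(summary(model))[3,2]
}

# Approximate sampling distributions of the two slopes
# (dev.new() is used in place of the Windows-only win.graph())
dev.new(width = 6, height = 4.5, pointsize = 12)
hist(skill2,freq=FALSE)

dev.new(width = 6, height = 4.5, pointsize = 12)
hist(macc2,freq=FALSE)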