Genshin Impact: Probability Distributions of Artifact Critical Value

by greenturtle3141, Dec 3, 2021, 1:08 AM

Reading Difficulty: 4-5/5
Prerequisites: Know what an expected value is

I know Genshin is extremely popular nowadays, but I will write this for a general audience. Consequently, if you're reading this, you do NOT need to have played Genshin Impact to read!

Genshin Impact is a relatively new game by MiHoYo.
  • It is an open world game, meaning that it consists of a huge world map to explore. No area is locked for exploration, meaning you can basically go anywhere you want from the get-go (...except Inazuma).
  • It is an RPG. You play as various characters and you can level them up in various ways to make them stronger, so that you can tackle stronger enemies and bosses.
  • Lastly, it is a gacha game, which is one of the more notorious aspects of Genshin. Essentially, this means that there are heavy luck-based mechanics for getting new characters (and weapons). Moreover, you can spend real-life money to increase the odds of getting a certain character you want. This is how they make money, since Genshin itself is a free (!) game.
  • Genshin is insanely popular for the open world, its anime art-style, and (bluntly) cute girls.

Today, I will be showcasing how math can be surprisingly helpful in deep-end Genshin Impact theory. This is my own work, but I won't be surprised if this math has been done before.

Particularly, we'll be exploring the artifact system. An artifact is one of many things that can be levelled up and optimized in order to make a particular character better.

The main question is: How can we compute the probability of getting an artifact that is "as good" as this one (or some other artifact)? We'll need to quantify exactly what this means, but hopefully it's enough to give you an idea of what we're trying to do here.

DISCLAIMERS
1. Genshin Impact is a highly addictive game. Seriously, it will very likely take up a ton of your free time. Do not start playing unless you have a ton of time to kill.
2. The gacha system in Genshin Impact is gambling, which is addictive. Do NOT spend money on this game unless you are a millionaire. If you do spend money, please be careful and be mindful of your spending. Gambling addiction has destroyed lives, even in gachas, and it can happen to you.
(That being said, Genshin Impact is a breathtakingly beautiful game, and despite its faults, I'm very happy to have gotten to experience its world. If you've been looking for a new game to play, I would recommend giving Genshin a try. Just make sure you have the time to kill, and be careful when dealing with the gacha aspects.)

Part 1: Artifact Substat Mechanics and Theory

If you want to cut right to the chase and skip reading this part, here is a picture that summarizes everything you need to know:

https://i.imgur.com/9ZiLcOU.png

(Imgur Link: https://imgur.com/a/OluIYXA)
(Direct Image Link: https://i.imgur.com/9ZiLcOU.png)

I'm including this because this part may be long and boring. But if you want some details and/or motivation, here it is.

The Basics

An artifact is basically an accessory with magical powers or something. This might be a bit overwhelming, but here are all the possible different characteristics of an artifact:
  • Not Important Things: Seriously this isn't important
  • Rarity: There are five different rarities, from 1* to 5*.
  • Level: All artifacts start at Level 0. By doing stuff, you can level up an artifact to increase its effectiveness. The maximum level depends on its rarity, and is given by $4 \times \text{Rarity}$ (in particular, a 5* artifact has max level 20). A higher max level is one of the main benefits of higher rarity artifacts.
  • Main Stat: This is the primary "magical power/buff" of the artifact. For example, the main stat could be "+35% HP%", meaning that your HP is increased by a factor of $1.35$. The main stat increases as you level up the artifact.
  • Substats: These are the secondary "side buffs" of the artifact. Although they tend to be weaker than the main stat, there are multiple of them and they can contribute significantly to the character's overall power. The substats (kinda) increase as you level up the artifact. Unfortunately there's some nuance here, and in fact it's going to be the main subject of this post. More on that later!

The picture from before should help you digest this.

Critical Hits

If you've ever played an RPG, you probably recognize what a critical hit is. For example, let's say your character normally does $100$ damage. Every time you hit an enemy, there is a certain probability that you deal extra damage. The probability is called the critical rate, and the extra damage is called the critical damage. I'll abbreviate these as "crit rate" and "crit damage", or sometimes CR and CD.

Let's start off with a (not that important) small exercise! By the way, most people like to write $\mathbb{E}[X]$ to denote the expected value of the random variable $X$. Unfortunately, I prefer to write the bracketless $\mathbb{E}X$. I just think it looks cool. Sorry!

Warmup Exercise: Suppose that your crit rate is $\text{CR}$ and your crit damage is $\text{CD}$. That is, every hit has a $\text{CR}$ chance of dealing $1+\text{CD}$ times more damage than usual. If $A$ is the attack you normally deal, then what is the expected value of damage dealt by a single hit?

Solution
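One way to work the warmup, taking the exercise's reading that a critical hit deals $(1+\text{CD})A$ and a normal hit deals $A$: condition on whether the hit crits, then combine the two cases.
$$\mathbb{E}[\text{damage}] = (1-\text{CR})\cdot A + \text{CR}\cdot(1+\text{CD})\cdot A = A\,(1 + \text{CR}\cdot\text{CD})$$
So, in this simple model, the crit stats only enter the expected damage through the product $\text{CR}\cdot\text{CD}$.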

The conclusion of this exercise leads to the next concept...

Critical Value

Deep-end Genshin Impact theory dictates that the most important stats to maximize for optimizing any heavy-damage character are the Crit Damage and the Crit Rate! Hence, much of the focus in artifact analysis revolves around these two stats.

Although it is possible to get a crit-related main stat on an artifact, such artifacts are not worth analyzing too much because they are more "clear cut". Hence, we're going to assume that the artifact has a non-crit main stat. Then, the goal is to maximize the "value" you get from the crit substats.

How is such "value" estimated? Deep-end Genshin theory coined a simple formula for a quantity called the critical value ("crit value", or $\text{CV}$) that computes exactly this!

$$\text{CV} := \text{CD} + 2 \times \text{CR}$$
You might be wondering why crit rate (CR) is weighted twice as heavily as crit damage (CD). It's not because it's necessarily more important (see the warmup exercise!). The reason is different, but simple: CR substats are hardcoded by the game to be typically half the size of CD substats. That is, CR is twice as hard to get as CD.
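As a quick illustration, here is the formula as a tiny Python helper. The function name `crit_value` and the sample stat totals are made up for this sketch; they are not taken from any real artifact.

```python
def crit_value(crit_damage, crit_rate):
    """CV := CD + 2 * CR, with both stats given in percentage points."""
    return crit_damage + 2 * crit_rate


# Hypothetical substat totals, chosen only for illustration.
print(crit_value(21.0, 14.0))  # 49.0
```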

If you're a non-Genshin player and made it this far, then congrats! Your artifact meta-gaming knowledge is now above average. Pat yourself on the back, take a sip of water, then get up and stretch your legs a bit... because we still have one more subsection to go before we can dive into the math.

How substats increase with level

Firstly, if you're optimizing artifacts for critical value, then you're probably in the endgame. This means that you're probably not caring too much about optimizing 4* artifacts at this point. Therefore:

ASSUMPTION 1: All artifacts are 5* rarity

Consequently, the max level of an artifact is 20.

When you first obtain an artifact, it starts at level 0. What about the substats? Well, it starts out with either 3 or 4 different substats, based on chance. But that's annoying to think about. Let's assume that you got lucky.

ASSUMPTION 2: All artifacts start with 4 substats

You might be worried about the rigor of this post. Don't be! All these assumptions are in favor of the player. The final probability I get will ultimately be an upper bound anyways, so it will still work. Also, if you study the math, I'm sure you can figure out how to fix it to account for these cases if you really wanted to.

Alright, so we're assuming that any particular artifact will start with 4 substats at Level 0. In this case, it can't gain any more, so now they can only be increased. Therefore, if these four substats don't include both CD and CR, then it's... not very good. Let's assume that you got lucky, and that you have both a CD and CR substat.

ASSUMPTION 3: 2 of the 4 substats are crit damage and crit rate

This is actually not wonderfully likely. Whatever the final probability is, you can probably at least divide it by 5 (EDIT: Actually this apparently has around 1% chance of happening) to account for this single assumption (not to mention the others!). It's not too hard to factor the case into the calculations though, it'll just make this post much longer than it needs to be.

Great, with all these assumptions in favor of the player, it's now time to talk about how these substats are upgraded as the artifact is levelled up. Here's how it works:
  • A substat upgrade occurs every 4 levels. That is, an upgrade occurs at levels 4, 8, 12, 16, and 20 (a total of five times).
  • The upgrade is done as follows: A random substat is picked, and then that substat is increased.
  • The increase itself is chosen at random, and is dictated by the bottommost table here (read the page if you want, but no need).

Here are the exact values we care about:
  • If the game chose to increase a CD substat, then the increase is chosen uniformly at random from $\{5.44, 6.22, 6.99, 7.77\}$.
  • If the game chose to increase a CR substat, then the increase is chosen uniformly at random from $\{2.72, 3.11, 3.50, 3.89\}$.

How is the initial value of a substat chosen? It's simply chosen from these above sets. So, for example, if an artifact indeed starts with a CD substat at level 0, then that substat is equally likely to be any of $5.44, 6.22, 6.99,$ or $7.77$.
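The mechanics above are simple enough to simulate. Below is a minimal sketch, assuming Assumptions 1-3 and assuming (as the post does later) that each of the four substats is equally likely to be picked at every upgrade. The function name `roll_artifact` and the slot numbering are hypothetical choices for this example.

```python
import random

CD_ROLLS = (5.44, 6.22, 6.99, 7.77)
CR_ROLLS = (2.72, 3.11, 3.50, 3.89)


def roll_artifact(rng):
    """Simulate one 5* artifact from level 0 to 20; return its final (CD, CR)."""
    # Level 0: both crit substats are present, each with a uniformly random value.
    cd = rng.choice(CD_ROLLS)
    cr = rng.choice(CR_ROLLS)
    # One substat upgrade at each of levels 4, 8, 12, 16, 20.
    for _level in (4, 8, 12, 16, 20):
        slot = rng.randrange(4)  # 0 = CD, 1 = CR, 2 and 3 = the two other substats
        if slot == 0:
            cd += rng.choice(CD_ROLLS)
        elif slot == 1:
            cr += rng.choice(CR_ROLLS)
    return cd, cr


rng = random.Random(0)  # fixed seed so the run is repeatable
cd, cr = roll_artifact(rng)
print(f"CD = {cd:.2f}, CR = {cr:.2f}, CV = {cd + 2 * cr:.2f}")
```

Calling `roll_artifact` many times and counting how often the CV clears a threshold is one way to get a Monte Carlo estimate of the kind mentioned near the end of the post.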

And... that's it! Now take a look at my friend's artifact.

https://i.imgur.com/TMkCtai.png

You can see that the CV is $49$. That's pretty high!

MAIN QUESTION: How lucky did my friend get with their CV? In general, how can you estimate the probability of getting some high CV artifact?

Finally, the math can start. Buckle your seatbelts.

Part 2: The Weak Markov Bound

Theorem (Markov's Inequality): Let $X$ be a random variable that takes non-negative real values. Then for all $t > 0$:
$$\mathbb{P}(X \geq t) \leq \frac{\mathbb{E}X}{t}$$
You can see why this would be useful to use: By letting $X$ be the random variable representing the Critical Value of an artifact, we can, say, set $t = 50$ to get a bound on how unlikely it is to get an artifact of $\text{CV} \geq 50$. The only thing we really need to do is compute the expected value of $X$.

The main tool here is just linearity of expectation:
  • By linearity of expectation, $\mathbb{E}[\text{CV}] = \mathbb{E}[\text{CD}] + 2\mathbb{E}[\text{CR}]$.
  • By linearity again, the expectation of the crit damage is just the expected value of its initial value plus the sum of the expected increases across each of the 5 substat upgradings.
  • Expected initial value is $\frac{5.44 + 6.22 + 6.99 + 7.77}{4} = 6.61$.
  • If the CD is increased, then the expected value of the increase is, again, $6.61$. But there's only a $1/4$ chance that CD is chosen, so the expected increase of each upgrade is $6.61/4=1.65$.
  • In total, the expected value of the final CD is $6.61 + 5(1.65) = 14.86$.
  • Similarly, the expected value of the final CR is $7.43$.
  • Therefore $\mathbb{E}[\text{CV}] = 14.86 + 2(7.43) = \boxed{29.72}$.

Using this, we can get an upper bound on the probability that my friend got their $49$ CV artifact:
$$\mathbb{P}(\text{CV} \geq 49) \leq \frac{29.72}{49} = 0.607$$
...er, that's not a very good upper bound. This doesn't at all prove that their artifact was a miracle. We'll have to do better...
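The expectation and the Markov bound from this part can be reproduced in a few lines of Python. This is only a sketch under Assumptions 1-3; the helper names are made up for the example. It keeps full precision throughout, so it gives $\mathbb{E}[\text{CV}] \approx 29.73$ rather than the $29.72$ obtained by rounding intermediate values by hand.

```python
CD_ROLLS = (5.44, 6.22, 6.99, 7.77)
CR_ROLLS = (2.72, 3.11, 3.50, 3.89)
UPGRADES = 5          # levels 4, 8, 12, 16, 20
PICK_CHANCE = 1 / 4   # chance that a given substat is the one upgraded


def mean(values):
    return sum(values) / len(values)


def expected_final(rolls):
    """Expected level-20 value: the initial roll plus five possible upgrades."""
    return mean(rolls) + UPGRADES * PICK_CHANCE * mean(rolls)


def markov_bound(expectation, t):
    """Upper bound on P(X >= t) for a non-negative X with the given expectation."""
    return expectation / t


e_cd = expected_final(CD_ROLLS)
e_cr = expected_final(CR_ROLLS)
e_cv = e_cd + 2 * e_cr  # linearity of expectation

print(f"E[CD] = {e_cd:.4f}, E[CR] = {e_cr:.4f}, E[CV] = {e_cv:.4f}")
print(f"Markov bound on P(CV >= 49): {markov_bound(e_cv, 49):.3f}")
```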


Part 3: Chebyshev to the Rescue!

Definition (Variance): The variance of a random variable $X$ is essentially a measure of how much it differs from its mean (expected value), giving more weight to larger deviations than smaller ones. It is given by:
$$\operatorname{Var} X := \mathbb{E}[(X-\mathbb{E}X)^2]$$In this form, the definition makes a lot of sense. But, by linearity of expectation, this can be manipulated into a form that is easier to compute as follows:
$$\operatorname{Var} X = \mathbb{E}[X^2 - 2X\mathbb{E}X + (\mathbb{E}X)^2] = \mathbb{E}[X^2] - 2(\mathbb{E}X)(\mathbb{E}X) + (\mathbb{E}X)^2 = \mathbb{E}[X^2] - (\mathbb{E}X)^2$$Note that $(\mathbb{E}X)^2$ and $\mathbb{E}[X^2]$ are not the same thing!!! (If you're not convinced, try computing both quantities when $X$ represents a random dice roll.)
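If you'd rather check the dice example than take it on faith, here is a small sketch using exact fractions for a fair six-sided die. The variable names are arbitrary.

```python
from fractions import Fraction

faces = range(1, 7)

# Each face of a fair die has probability 1/6.
e_x = sum(Fraction(f, 6) for f in faces)        # E[X]
e_x2 = sum(Fraction(f * f, 6) for f in faces)   # E[X^2]
variance = e_x2 - e_x ** 2                      # Var X = E[X^2] - (E X)^2

print(e_x ** 2, e_x2, variance)  # 49/4 91/6 35/12
```

The two quantities differ ($49/4$ versus $91/6$), and the gap between them is exactly the variance.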

Now we can get to the theorem that will ultimately help us...

Theorem (Chebyshev's Inequality): Let $X$ be a random variable with finite variance. Then for all $t > 0$:
$$\mathbb{P}(|X - \mathbb{E}X| \geq t) \leq \frac{\operatorname{Var} X}{t^2}$$
You can see why this might be better: The denominator now increases quadratically (instead of linearly) in $t$. Hopefully, this will result in a sharper upper bound!

To compute $\operatorname{Var} \text{CV}$, we need to compute both $\mathbb{E}[\text{CV}]$ and $\mathbb{E}[\text{CV}^2]$. Fortunately, we did the first one already. The problem is going to be computing the second one. How the heck do we figure that out without bashing out every possible value for $\text{CV}$?

First, let's do some housekeeping. We can write $\text{CV} = \text{CD} + 2\text{CR}$, so by linearity of expectation:
$$\mathbb{E}[\text{CV}^2] = \mathbb{E}[(\text{CD} + 2\text{CR})^2] = \mathbb{E}[\text{CD}^2] + 4\mathbb{E}[\text{CD}\text{CR}] + 4\mathbb{E}[\text{CR}^2]$$We cannot simplify the middle term to $\mathbb{E}[\text{CD}]\mathbb{E}[\text{CR}]$. That's only allowed if $\text{CD}$ and $\text{CR}$ were independent random variables. But they aren't! (Exercise: Why?)

Now what? The key idea is to do some MORE housekeeping by "modelling" what exactly $\text{CD}$ and $\text{CR}$ are. Remember: Each of these random variables consists of some random initial value, plus 5 possible increases. This inspires trying to rewrite them using these parts as random variables!

Let's consider $\text{CD}$. We may write $\text{CD} = A + X_1 + X_2 + X_3 + X_4 + X_5$, where $A$ is the random variable representing the initial value, and $X_i$ is the random variable representing the increase (possibly zero) at the $i$-th upgrade.

$A$ is just uniformly chosen from $\{5.44, 6.22, 6.99, 7.77\}$, so that's easy. As for $X_i$, it's trickier. There's a $3/4$ chance that the CD is not increased, so $X_i = 0$ with probability $3/4$. But there's a $1/4$ chance that it IS chosen, in which case the value is chosen uniformly at random from $\{5.44, 6.22, 6.99, 7.77\}$. Hence the distribution of $X_i$ is given by the following table:

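$$\begin{array}{c|c} \text{Probability} & \text{Value of } X_i \\ \hline 3/4 & 0 \\ 1/16 & 5.44 \\ 1/16 & 6.22 \\ 1/16 & 6.99 \\ 1/16 & 7.77 \end{array}$$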

Now we can compute:
$$\mathbb{E}[\text{CD}^2] = \mathbb{E}\left[\left(A + \sum_{i=1}^5 X_i\right)^2\right]$$We need to do some unfortunate expanding here...
$$ = \mathbb{E}[A^2] + 2\sum_{i=1}^5\mathbb{E}[AX_i] + \sum_{i=1}^5\sum_{j=1}^5\mathbb{E}[X_iX_j]$$This looks horrific, but bear with me! First, notice that all the $X_i$ have the same distribution, because they all represent an increase of the same substat. What this means is that their expected values must be the same! The expectation of $AX_i$ should also be the same for all $i$. So we can simplify a bit:
$$ = \mathbb{E}[A^2] + 10\mathbb{E}[AX_1] + \sum_{i=1}^5\sum_{j=1}^5\mathbb{E}[X_iX_j]$$To deal with the double sum, we first split it into two sums: One where $i=j$, and another where $i \ne j$.
$$ = \mathbb{E}[A^2] + 10\mathbb{E}[AX_1] + \sum_{i=1}^5\mathbb{E}[X_iX_i] + \sum_{i \ne j} \mathbb{E}[X_iX_j]$$Now here's the punchline: The $X_i$ are actually independent! This is because, say, if CD is increased at lvl 4, then this won't affect the potential for further increases at lvl 8, 12, etc., and so $\mathbb{E}[X_iX_j] = \mathbb{E}X_i\mathbb{E}X_j = \mathbb{E}X_1\mathbb{E}X_1$. (Of course, you should note that this cannot be done when $i=j$, because variables can't be independent against themselves... unless you're doing something dumb.) This gives us more simplification!
$$ = \mathbb{E}[A^2] + 10\mathbb{E}[A]\mathbb{E}[X_1] + 5\mathbb{E}[X_1^2] + 20(\mathbb{E}X_1)^2$$That's great! All of these values are easily computable. If you want to be really fancy, you can note that $\mathbb{E}X_1 = \frac34(0) + \frac14\mathbb{E}A$, and $\mathbb{E}[X_1^2] = \frac34(0) + \frac14\mathbb{E}[A^2]$, letting us simplify even more:
$$ = \mathbb{E}[A^2] + 10\mathbb{E}[A]\left(\frac14\mathbb{E}A\right) + \frac54\mathbb{E}[A^2] + 20\left(\frac14\mathbb{E}A\right)^2$$$$ = \mathbb{E}[A^2] + \frac52(\mathbb{E}A)^2 + \frac54\mathbb{E}[A^2] + \frac54(\mathbb{E}A)^2$$$$ = \frac94\mathbb{E}[A^2] + \frac{15}4(\mathbb{E}A)^2$$Similarly, if we separate $\text{CR}$ using the variables $B,Y_1,\cdots,Y_5$ defined in the same way, then:
$$\mathbb{E}[\text{CR}^2] = \frac94\mathbb{E}[B^2] + \frac{15}4(\mathbb{E}B)^2$$We're halfway there! But now there's this pesky $\mathbb{E}[\text{CD}\text{CR}]$ term. How will we evaluate that? ...well, using the same trick of course! This is literally just:
$$\mathbb{E}[\text{CD}\text{CR}] = \mathbb{E}\left[(A+X_1+\cdots+X_5)(B+Y_1+\cdots+Y_5)\right]$$Some more nasty expanding is in order, but don't worry... it will collapse quickly.
$$ = \mathbb{E}A\mathbb{E}B + 5\mathbb{E}A\mathbb{E}Y_1 + 5\mathbb{E}B\mathbb{E}X_1 + \sum_{i=1}^5\sum_{j=1}^5 \mathbb{E}[X_iY_j]$$I jumped the gun a bit there. Make sure you understand how exactly we can get the above by using identical distributions of the $X_i$ (and $Y_i$) as well as applying independence properly (be sure to understand which pairs of variables can be considered independent!) Let's again split this sum:
$$ = \mathbb{E}A\mathbb{E}B + 5\mathbb{E}A\mathbb{E}Y_1 + 5\mathbb{E}B\mathbb{E}X_1 + \sum_{i=1}^5\mathbb{E}[X_iY_i] + \sum_{i\ne j} \mathbb{E}[X_iY_j]$$Here's the magic: I claim that $\sum_{i=1}^5\mathbb{E}[X_iY_i] = 0$. Why is that? Answer: each upgrade increases exactly one substat, so $X_i$ and $Y_i$ can never both be nonzero, which forces $X_iY_i = 0$.
So this devolves into:
$$ = \mathbb{E}A\mathbb{E}B + 5\mathbb{E}A\mathbb{E}Y_1 + 5\mathbb{E}B\mathbb{E}X_1 + 20\mathbb{E}[X_1]\mathbb{E}[Y_1]$$$$ = \mathbb{E}A\mathbb{E}B + \frac54\mathbb{E}A\mathbb{E}B + \frac54\mathbb{E}A\mathbb{E}B + \frac54\mathbb{E}A\mathbb{E}B$$$$ = \frac{19}4\mathbb{E}A\mathbb{E}B$$
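If the expanding makes you nervous, the two closed forms derived so far can be checked by brute force. The sketch below builds the exact joint distribution of $(\text{CD}, \text{CR})$ at level 20 under Assumptions 1-3, using exact fractions, and compares the moments against $\frac94\mathbb{E}[A^2] + \frac{15}4(\mathbb{E}A)^2$ and $\frac{19}4\mathbb{E}A\mathbb{E}B$. The function names `joint_distribution` and `expect` are hypothetical helpers for this example, not part of the derivation.

```python
from collections import defaultdict
from fractions import Fraction

CD_ROLLS = [Fraction(s) for s in ("5.44", "6.22", "6.99", "7.77")]
CR_ROLLS = [Fraction(s) for s in ("2.72", "3.11", "3.50", "3.89")]


def joint_distribution():
    """Exact distribution of (CD, CR) at level 20, as {(cd, cr): probability}."""
    dist = defaultdict(Fraction)
    for a in CD_ROLLS:            # initial CD value
        for b in CR_ROLLS:        # initial CR value
            dist[(a, b)] += Fraction(1, 16)
    for _ in range(5):            # the five substat upgrades
        nxt = defaultdict(Fraction)
        for (cd, cr), p in dist.items():
            nxt[(cd, cr)] += p / 2            # a non-crit substat was picked
            for r in CD_ROLLS:
                nxt[(cd + r, cr)] += p / 16   # CD picked (1/4), roll r (1/4)
            for r in CR_ROLLS:
                nxt[(cd, cr + r)] += p / 16   # CR picked (1/4), roll r (1/4)
        dist = nxt
    return dist


def expect(dist, f):
    """Expected value of f(CD, CR) under the given distribution."""
    return sum(p * f(cd, cr) for (cd, cr), p in dist.items())


dist = joint_distribution()
e_a = sum(CD_ROLLS) / 4
e_a2 = sum(r * r for r in CD_ROLLS) / 4
e_b = sum(CR_ROLLS) / 4

cd_sq_formula = Fraction(9, 4) * e_a2 + Fraction(15, 4) * e_a ** 2
cross_formula = Fraction(19, 4) * e_a * e_b

print(expect(dist, lambda cd, cr: cd * cd) == cd_sq_formula)
print(expect(dist, lambda cd, cr: cd * cr) == cross_formula)
```

Working in `Fraction` rather than floats means the comparisons are exact, so any mismatch would point to a real algebra slip rather than rounding noise.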
Voila! Therefore:
$$\mathbb{E}[\text{CV}^2] = \mathbb{E}[\text{CD}^2] + 4\mathbb{E}[\text{CD}\text{CR}] + 4\mathbb{E}[\text{CR}^2]$$$$ = \frac94\mathbb{E}[A^2] + \frac{15}4(\mathbb{E}A)^2 + 9\mathbb{E}[B^2] + 15(\mathbb{E}B)^2 + 19\mathbb{E}A\mathbb{E}B$$Finally, let's start plugging stuff in:
  • $\mathbb{E}A = \frac{5.44 + 6.22 + 6.99 + 7.77}{4} = 6.61$
  • $\mathbb{E}[A^2] = \frac{5.44^2 + 6.22^2 + 6.99^2 + 7.77^2}{4} = 44.38$
  • $\mathbb{E}B = \frac{2.72 + 3.11 + 3.50 + 3.89}{4} = 3.31$
  • $\mathbb{E}[B^2] = \frac{2.72^2 + 3.11^2 + 3.50^2 + 3.89^2}{4} = 11.11$

$$\mathbb{E}[\text{CV}^2] = \frac94(44.38) + \frac{15}{4}(6.61)^2 + 9(11.11) + 15(3.31)^2 + 19(6.61)(3.31) = 943.734775$$
Recalling that $\mathbb{E}[\text{CV}] = 29.72$, we can finally get:
$$\operatorname{Var} \text{CV} = \mathbb{E}[\text{CV}^2] - \mathbb{E}[\text{CV}]^2 = \boxed{60.46}$$Brilliant! Finally let's put this into Chebyshev's inequality:

$$\mathbb{P}(|\text{CV} - 29.72| \geq t) \leq \frac{60.46}{t^2}$$
To bound the probability that my friend got their $49$ CV artifact, we first write:

$$\mathbb{P}(\text{CV} \geq  49) = \mathbb{P}(\text{CV}-29.72 \geq 19.28)$$$$ = \mathbb{P}(|\text{CV}-29.72| \geq 19.28) - \mathbb{P}(29.72 - \text{CV} \geq 19.28)$$$$ = \mathbb{P}(|\text{CV}-29.72| \geq 19.28) - \mathbb{P}(10.44 \geq \text{CV})$$Now note that $\mathbb{P}(10.44 \geq \text{CV}) = 0$ (why?), so this is:
$$ = \mathbb{P}(|\text{CV}-29.72| \geq 19.28) \leq \frac{60.46}{19.28^2} = \boxed{0.163}$$
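The plug-in step can also be redone in Python without rounding the inputs first. This is a sketch with made-up variable names, using the closed forms derived above. Because the variance is a difference of two large numbers, rounding the inputs to two decimals shifts it noticeably: with full precision the variance comes out near $58$ rather than $60.46$, and the bound near $0.156$ rather than $0.163$. The hand-computed $0.163$ is looser, so it is still a valid upper bound.

```python
CD_ROLLS = (5.44, 6.22, 6.99, 7.77)
CR_ROLLS = (2.72, 3.11, 3.50, 3.89)


def mean(values):
    return sum(values) / len(values)


e_a, e_b = mean(CD_ROLLS), mean(CR_ROLLS)
e_a2 = mean([r * r for r in CD_ROLLS])
e_b2 = mean([r * r for r in CR_ROLLS])

# E[CV] = (1 + 5/4) * (E[A] + 2 E[B])
e_cv = 2.25 * (e_a + 2 * e_b)
# E[CV^2] = 9/4 E[A^2] + 15/4 (E A)^2 + 9 E[B^2] + 15 (E B)^2 + 19 E[A] E[B]
e_cv2 = 2.25 * e_a2 + 3.75 * e_a ** 2 + 9 * e_b2 + 15 * e_b ** 2 + 19 * e_a * e_b

var_cv = e_cv2 - e_cv ** 2
t = 49 - e_cv                      # distance from the mean to the target CV
chebyshev_bound = var_cv / t ** 2

print(f"Var CV = {var_cv:.4f}")
print(f"Chebyshev bound on P(CV >= 49): {chebyshev_bound:.4f}")
```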
This is much better than the Markov bound we had earlier! It's still quite a bit off from the actual probability (which turns out to be around $0.003$ or less, by a Monte Carlo simulation), but it's nice to have some bound that can be mathematically derived without a computer. Also, keep in mind that if you remove the assumptions that the artifact starts with 4 substats and will always have crit rate and crit damage to start with, then this probability is even lower!
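For comparison with these bounds, the tail probability can also be computed exactly instead of sampled. The sketch below is a swapped-in technique (exact convolution in integer hundredths), not the Monte Carlo script mentioned above, and it still relies on Assumptions 1-3. It uses the fact that each upgrade adds an independent increment to the CV: $0$ with probability $1/2$, a CD roll with probability $1/16$ each, or twice a CR roll with probability $1/16$ each. The name `cv_distribution` is a hypothetical helper.

```python
from collections import defaultdict
from fractions import Fraction

# Roll values in hundredths, so that all arithmetic is on integers.
CD_ROLLS = (544, 622, 699, 777)
CR_ROLLS = (272, 311, 350, 389)


def cv_distribution():
    """Exact distribution of the level-20 CV (in hundredths) as {cv: probability}."""
    # Initial CV = A + 2B, with A and B independent and uniform.
    dist = defaultdict(Fraction)
    for a in CD_ROLLS:
        for b in CR_ROLLS:
            dist[a + 2 * b] += Fraction(1, 16)
    # CV increment contributed by a single upgrade.
    step = defaultdict(Fraction)
    step[0] += Fraction(1, 2)                 # a non-crit substat was picked
    for r in CD_ROLLS:
        step[r] += Fraction(1, 16)            # CD upgraded by r
    for r in CR_ROLLS:
        step[2 * r] += Fraction(1, 16)        # CR upgraded by r, worth 2r in CV
    # Convolve with the increment distribution once per upgrade.
    for _ in range(5):
        nxt = defaultdict(Fraction)
        for cv, p in dist.items():
            for inc, q in step.items():
                nxt[cv + inc] += p * q
        dist = nxt
    return dist


dist = cv_distribution()
p_tail = sum(p for cv, p in dist.items() if cv >= 4900)
print(f"P(CV >= 49) = {float(p_tail):.6f}")
```

The threshold `4900` is the target CV of $49$ expressed in hundredths; changing it gives the tail probability for any other CV.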

Is there a way to lower this upper bound using a different method? I'm not sure. I've tried some other things like the Central Limit Theorem, but it did not improve on $0.163$. Feel free to try your hand at lowering the bound if you can :)

Takeaways
  • mAtH iS eVeRyWhErE
  • Chebyshev is better than Markov, but the obtained bound can still be a ways off from the actual probability.
  • I probably play too much Genshin

That's all from me for today! A video of me failing the AMC 12B will be coming soonish.
This post has been edited 5 times. Last edited by greenturtle3141, Dec 3, 2021, 1:44 AM

5 Comments

Amazing, not only could I understand this blog post, I also enjoyed reading it. A rarity among AoPS blog posts today!

by DirePotatoHead, Dec 4, 2021, 3:54 AM

Thanks! I'm glad you enjoyed it.

by greenturtle3141, Dec 5, 2021, 4:34 AM

omg I love this

actually such a high-quality post

by samrocksnature, Dec 14, 2021, 9:10 AM

I love applications like this that are actually relevant to students. Looking forward to that video too.

by obtuse, Dec 14, 2021, 3:35 PM

dang high quality
That game is so widespread it sure is making an impact on this gen(era)shin

by v4913, Dec 23, 2021, 11:57 AM
