2023 CMO(CHINA) Problems/Problem 3

Problem

Given a prime number $p \geq 5$, let $\Omega=\{1,2, \ldots, p\}$. For any $x, y \in \Omega$, define: \[r(x, y)= \begin{cases}y-x, & \text { if } y \geq x \\ y-x+p, & \text { if } y<x\end{cases}\]

For a non-empty subset $A$ of $\Omega$, define: \[f(A)=\sum_{x \in A} \sum_{y \in A}(r(x, y))^2\]

A subset $A$ of $\Omega$ is called a "good subset" if $0<|A|<p$ and for any subset $B$ of $\Omega$ with $|B|=|A|$, it holds that $f(B) \geq f(A)$. Find the largest positive integer $L$ such that there exist $L$ pairwise distinct good subsets $A_1, A_2, \ldots, A_L$ of $\Omega$ satisfying $A_1 \subseteq A_2 \subseteq \cdots \subseteq A_L$

Solution 1

Given a prime $p$ not less than 5 , define the directed distance between two points $x, y \in$ $\Omega:=\{1,2, \ldots, p\}$ as $D(x, y):=(y-x \bmod p)$, taking the smallest non-negative remainder. For any non-empty proper subset $A$ of $\Omega$, define \[\mu(A):=\sum_{x \in A} \sum_{y \in A}(D(x, y))^2 .\]

We call a non-empty proper subset $A$ a "good set" if and only if \[\forall B:(B \subseteq \Omega, \# B=\# A) \rightarrow \mu(B) \geq \mu(A) .\] (where the \# symbol denotes the number of elements). Find the largest positive integer $l$ such that there exists a chain of good sets \[A_1 \subsetneq A_2 \subsetneq \cdots \subsetneq A_l .\]

Related Problems/Lemma 1: Given positive integers $m$ and $n$, let $a_1, a_2, \ldots, a_m$ be integers summing to $n$. Find the necessary and sufficient condition for \[a_1^2+a_2^2+\cdots+a_m^2\] to attain its minimum possible value. Proof of Lemma 1: Since there are only finitely many ways to choose $a_i$, there must be a minimum value. The assertion is: For the minimum value, for any $i \neq j, a_i-a_j \leq 1$. Otherwise, suppose $a_i-a_j \geq 2$. Then we can replace $a_i$ and $a_j$ with $a_i-1$ and $a_j+1$, respectively, reducing the objective function, which contradicts minimality. Thus, the assertion holds, and the minimum is achieved when there are $r$ numbers equal to $k+1$ and $(n-r)$ numbers equal to $k$, where $k$ and $r$ come from the division with remainder: \[n=k m+r, \quad 0 \leq r<m\]

Lemma 2: For any $1 \leq m \leq p-1$ and any real number $1 \leq x<2$, the set consisting of the following $m$ elements is a good set: \[a_t:=\left\lfloor x+\frac{t p}{m}\right\rfloor, \quad t=0,1,2, \ldots, m-1 .\] (The notation $\lfloor\cdot\rfloor$ denotes the floor function; indices taken modulo $m$ ). Furthermore, if the values of all $a_t$ are taken modulo $p$ to the smallest positive remainder, the conclusion still holds for any real number $x$.

Proof of Lemma 2: For integers $s: 1 \leq s<m$, we have $D\left(a_t, a_{t+s}\right)=\left(\left\lfloor x+\frac{(t+s) p}{m}\right\rfloor-\left\lfloor x+\frac{t p}{m}\right\rfloor\right) \bmod p$, which is at least $\frac{s p}{m}-1$ and at most $\frac{s p}{m}+1$. Note that $\frac{s p}{m} \notin \mathbb{Z}$ (using that $p$ is prime), meaning there are at most two adjacent values. For a set $B=\left\{b_0, b_1, \ldots, b_{m-1}\right\} \subseteq \Omega$ with $b_j<b_{j+1}$, we have \[\mu(B)=\sum_{s=1}^{m-1} \sum_{t=0}^{m-1} D\left(b_t, b_{t+s}\right)^2\] interpreted modulo $m$. It is easy to verify that $\sum_{t=0}^{m-1} D\left(b_t, b_{t+s}\right)=s p$. According to Lemma 1 , the minimum possible value of $\sum_{t=0}^{m-1} D\left(b_t, b_{t+s}\right)^2$ is achieved when $b_t=a_t$ (sufficient but not necessarily necessary in general). Thus, under the constraint of fixed element numbers, these $a_t$ minimize $\mu$, proving Lemma 2.

Lemma 3: Continuing from Lemma 2's notation, if $B=\left\{b_0, b_1, \ldots, b_{m-1}\right\} \subseteq \Omega$ is a good set with $b_j<b_{j+1}$, then the multiset \[\left\{D\left(b_t, b_{t+s}\right), 0 \leq t<m\right\}=\left\{D\left(a_t, a_{t+s}\right), 0 \leq t<m\right\}\] and both satisfy the necessary and sufficient condition from Lemma 1. Proof of Lemma 3: A good set must satisfy \[\mu(A)=\mu(B)=\sum_{s=1}^{m-1} \sum_{t=0}^{m-1} D\left(b_t, b_{t+s}\right)^2 \geq \sum_{s=1}^{m-1} \sum_{t=0}^{m-1} D\left(a_t, a_{t+s}\right)^2=\mu(A),\] thus all \[\sum_{t=0}^{m-1} D\left(b_t, b_{t+s}\right)^2=\sum_{t=0}^{m-1} D\left(a_t, a_{t+s}\right)^2\] and by Lemma 1 (necessary and sufficient condition for minimizing the sum of squares), they are equal as multisets.

Lemma 4: The complement of a good set is also a good set.

Proof of Lemma 4: \[\mu(A)-\mu(\Omega \backslash A) =\sum_{x \in A} \sum_{y \in A}(D(x, y))^2-\sum_{x \notin A} \sum_{y \notin A}(D(x, y))^2\] \[=\sum_{x \in A} \sum_y(D(x, y))^2-\sum_{x \notin A} \sum_y(D(x, y))^2=(2 \# A-p)\left(1^2 \# A-p\right)\left(1^2+\cdots+(p-1)^2\right).\] which only depends on \#A, not the specific elements. Thus, the minimality of $\mu(A)$ is equivalent to the minimality of $\mu(\Omega \backslash A)$ when fixing the number of elements, making both good sets.

Returning to the original problem: First, in the case of non-Mersenne primes (i.e., $p \neq 2^k-1$ ), construct a chain of good sets of length $l=2\left(\left\lfloor\log _2 p\right\rfloor\right)=2\left(\left\lfloor\log _2(p+1)\right\rfloor\right)$. Choose any $x$ and denote \[a_{t, m}:=\left\lfloor x+\frac{t p}{m}\right\rfloor \bmod p, \quad t=0,1,2, \ldots, m-1\] taking the smallest positive remainder. According to Lemma 2, for each given $m$, the above $a_{t, m}$ forms a good set denoted as $S_m$. Construct a chain of good sets of length $\frac{l}{2}$ (choosing the same $x$ ): \[A_1=S_1, A_2=S_2, A_3=S_4, \ldots, A_{\left[\log _2 p\right]}=S_{2^{\left[\log _2 p\right]-1}} .\]

Their containment is quite obvious: in fact, $a_{t, m}=a_{2 t, 2 m}$. The key is how to connect the good set of $2^{\left[\log _2 p\right]-1}$ elements to the good set of $p-$ $2^{\left[\log _2 p\right]-1}$ elements. In fact, let $B:=S_{2\left(\left[\log _2 p\right]-1\right)+1} \bmod p($ taking the smallest positive remainder). It is easy to verify $B \cap A_{1+\left[\log _2 p\right]}=\emptyset$. Obviously, $B$ is also a good set, so $\Omega-$ $B$ is also a good set with $p-2^{\left[\log _2 p\right]-1}$ elements and contains $A_{1+\left[\log _2 p\right]}$. We only need to let \[A_{\left[\log _2 p\right]+j}:=\Omega \backslash\left(A_{\left[\log _2 p\right]+1-j}+1 \bmod p\right), \quad j=1,2, \ldots,\left\lfloor\log _2 p\right\rfloor,\] to complete the construction of the chain of good sets of length $l$. For Mersenne primes (i.e., $p=2^k-1$ ), we need to construct a chain of good sets of length $l=2 k$. Choose any $x$, and according to Lemma 2 , the good set of elements $2^{k-1}=\frac{p+1}{2}$ is denoted as $S_{2^{k-1}}$ as $A_{k+1}$. Let $A_k=\Omega-\left(A_{k+1}+1 \bmod p\right)$. Note that the intervals between elements of $A_{k+1}$ are 1 and all 2 , so $A_k \subseteq A_{k+1}$ and contains one less element. We alternately split the elements of $A_{k+1}$ into two parts: \[\left\{\left\lfloor x+\frac{t p}{2^{k-1}}\right\rfloor \bmod p\right\}=\left\{\left\lfloor x+\frac{t p}{2^{k-2}}\right\rfloor \bmod p\right\} \cup\left\{\left\lfloor x+\frac{1}{2^{k-1}}+\frac{t p}{2^{k-2}}\right\rfloor \bmod p\right\}\] which are disjoint and both good sets (Lemma 2), with one part contained in $A_k$ (corresponding to $y=x$ or $y=x+\frac{1}{2^{k-1}}$ ). Define them as $A_{k-1}$ and let \[A_j:=S_{2^{j-1}}(y)=\left\{\left\lfloor y+\frac{t p}{2^{j-1}}\right\rfloor \bmod p\right\}, \quad j=1,2, \ldots, k-1,\] and then let \[A_{2 k+1-j}:=\Omega-\left(A_j+1 \bmod p\right), \quad j=1,2, \ldots, k-1,\] which are all good sets by Lemma 4, thus constructing a chain of good sets of length $l=$ $2 k$.

Now prove the optimality of $l=2\left(\left\lfloor\log _2(p+1)\right\rfloor\right)$. We attempt to estimate the gap: if $1 \leq a<b<\frac{p}{3}$, and $A \subseteq B$ are both good sets with $\# A=a$ and $\# B=b$, we will prove $b \geq 2 a$. We use contradiction. Suppose $b<2 a$. Consider the interval composition of elements of $A$ (in the $\bmod p$ ring), assuming they are some number $t(\geq 4)$ and some number $t+1$. Since the number of new points in $B$ (i.e., $B \backslash A$ ) is fewer than in $A$, they cannot exist between each pair of elements in $A$, so the interval composition of elements in $B$ must include $t$ or $t+1$. If the interval composition of $B$ includes $t+1$, then splitting the interval in $A$ will result in an interval less than $t$, contradicting $B$ being a good set (Lemma 1). Thus, the interval composition of elements in $B$ can only be $t$ and $t-1 \geq 3$ (note $b<\frac{p}{3}$ ). But splitting an interval of $t+1$ in $A$ results in intervals still within $t$ and $t-1$, so $t=3$, contradiction.

If $1 \leq a<b<\frac{p}{2}$, and $A \subseteq B$ are both good sets with $\# A=a$ and $\# B=b$ but $b<2 a$, similarly to the above reasoning, we infer: the interval composition of elements in $A$ must be 4 or 3 , and the interval composition of elements in $B$ must be 3 or 2 . The elements of $B \backslash A$ must all divide the intervals of length 4 in $A$.

We further assert that $B$ (or $A$ ) does not contain two adjacent intervals of length 3. Otherwise, the "sum of lengths of two adjacent segments" in $B$ will simultaneously include 4, 5, and 6, contradicting the characterization of good sets by Lemma 3.

We further assert: the number of intervals of length 3 in $B$ (or $A$ ) must be 1 . Otherwise, let the number of intervals of length 3 be $k$, and the number of intervals of length 4 between any two elements in $A$ be positive integers $c_1, c_2, \ldots, c_k$ (they split into twice as many intervals of length 2 in $B$ ), satisfying \[3 k+4\left(c_1+c_2+\cdots+c_k\right)=p .\]

Clearly, these $c_i$ are not all equal (since $p$ is prime). Let $c=\min c_i$, consider the "sum of lengths of $2 c+2$ adjacent segments" in $B$, which can span over $2 c$ intervals of length 2 and extend by one interval of length 3 on each side (note that some $c_i=c$ ), resulting in a total length of $4 c+6$. But since not all $c_i$ are equal, at least one $c_i \geq c+1$, so there exists $2 c+2$ adjacent intervals of length 2 in $B$, with a total length of $4 c+4$. This contradicts the characterization of good sets by Lemma 3.

That is, "in most cases", if $1 \leq a<b<\frac{p}{2}$, and $A \subseteq B$ are both good sets with \#A $=a$ and \#B=b, then $b \geq 2 a$. The only exceptional case is $p=4 t-1, a=t$, and $b=2 t-1$.

Within the range of element numbers not exceeding $\frac{p}{2}$, the number of elements in each $A_k$ in the chain of good sets is \[\{1\},\{2,3\},\{4,5,6,7\}, \ldots,\left\{2^{\left[\log _2 p\right]-1}, \ldots, \frac{p-1}{2}\right\} .\]

When $p$ is a non-Mersenne prime, at most one number can appear in each of these sets; for Mersenne primes, the last set may contain at most 2 , while the others can contain only 1. This upper chain has a length of $\left[\log _2(p+1)\right]$.

By Lemma 4, within the range of element numbers exceeding $\frac{p}{2}$, the upper limit of the length of the chain of good sets is also $\left[\log _2(p+1)\right]$, proving the overall upper limit.

~xiaohuangya|szm

See Also

2023 CMO(CHINA) (ProblemsResources)
Preceded by
Problem 2
Followed by
Problem 4
1 2 3 4 5 6
All CMO(CHINA) Problems and Solutions