A question about automata

by t0rajir0u, Apr 12, 2009, 9:20 PM

Question: Let $ A$ be a finite alphabet and let $ L$ be a language on $ A$. Let $ L(x) = \sum_{n \ge 0} L_n x^n$, where $ L_n$ is the number of words in $ L$ of length $ n$.

Is it true that $ L(x)$ is a ratio of integer polynomials if and only if there exists another language $ L'$ such that $ L_n' = L_n$ for all $ n$ and such that $ L'$ can be recognized by a finite automaton (equivalently, a regular expression)?

For example, the set of palindromes cannot be recognized by a finite automaton, but there is a bijection from palindromes to strings where each letter except the last letter is repeated exactly twice in a row, and that language can be recognized.

The way I really want to phrase the $ L'$ requirement is that $ L$ and $ L'$, regarded as species, should be related by a natural transformation, but I don't know whether this is strictly stronger than the requirement I've written down. Anyway, I'm reasonably sure that as it stands the answer is "yes."

Here's an actual question which I wish I had the CS background to answer: characterize the languages $ L$ with the property that if $ w \in L$ then every (edit:) contiguous substring of $ w$ is in $ L$, i.e. describe what mode of computation recognizes such languages.

Comment

14 Comments

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
I don't know the answer to your particular question, but I wanted to mention that in general, having the same generating function is strictly weaker than being naturally isomorphic. (You asked for a natural transformation, but if it's not an isomorphism, it certainly won't do what you ask.)

The canonical example is that the set of bijections from a set to itself has the same size as the set of total orderings on the set: both are $ n!$. In any case, the proof that these are not naturally isomorphic species is that a permutation of the underlying set acts trivially on the set of bijections, but acts nontrivially on the set of total orderings. The standard isomorphism requires first choosing a total ordering for each set: the set of bijections from a totally ordered set to itself _is_ isomorphic to the set of permutations on a totally ordered set, because there are no automorphisms on a totally ordered set.

As for notation, for species on the category of finite sets, I use exponential generating functions: both the species of permutations and the species of total orderings have generating function $ \sum n! x^n/n! = 1/(1-x)$. For species on the category of totally ordered sets, I use ordinary generating functions: the two species mentioned are isomorphic, and are $ \sum n! x^n$.

Since there are no nontrivial automorphisms in the category of totally ordered sets, a species is determined up to natural isomorphism (but the choice of such an isomorphism is not usually determined) by its (for me always ordinary) generating function. Is your finite alphabet totally ordered?

by Anonymous, Apr 12, 2009, 11:13 PM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
Thanks! I've been wondering about the natural isomorphism question for some time, and that comment about the category of totally ordered sets clarified some things for me.

Edit: As for the second question, it's been pointed out to me that if you let $ L$ be the set of words on $ \{ 0, 1 \}$ that don't have a substring of the form $ 01^n0$ where the $ n^{th}$ Turing machine halts, anything that recognizes $ L$ can solve the halting problem.

by t0rajir0u, Apr 13, 2009, 4:39 AM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
How do you keep inventing interesting problems over and over?

Well, here is one direction of your question. Let $ A$ be a finite alphabet, and $ L$ a regular language on $ A$. We claim that the generating function $ L\left(X\right) = \sum_{n\geq 0}L_nX^n$ is a rational function, where $ L_n$ means the number of words in $ L$ that have length $ n$.

Sketch of proof (of the above direction). Since $ L$ is regular, there exists a deterministic finite state machine that recognizes $ L$. WLOG assume that it has exactly one start state, labeled $ 1$, and exactly one accept state, labeled $ m$, and all other states are labeled $ 2$, $ 3$, ..., $ m - 1$. Consider the matrix $ U = \left(u_{i,j}\right)_{1\leq i\leq m,\ 1\leq j\leq m}\in\left(\mathbb{Q}\left[\left[X\right]\right]\right)^{m\times m}$ whose entry $ u_{i,j}$ is defined as the number of all letters in $ A$ which induce a transition from state $ i$ to state $ j$.

Then, for any nonnegative integer $ n$, the $ \left(1,m\right)$-th entry of the matrix $ U^n$ will be the number of all $ n$-letter words in $ L$. Hence, $ L\left(X\right) = \sum_{n\geq 0}L_nX^n$ is the $ \left(1,m\right)$-th entry of the matrix $ \sum_{n\geq 0}U^nX^n = \left(1 - UX\right)^{ - 1}$. This is a rational function in $ X$, since $ \left(1 - UX\right)^{ - 1} = \frac {1}{\det\left(1 - UX\right)}\mathrm{adj}\left(1 - UX\right)$ (or, if you don't want to use adjoints, because the matrix $ 1 - UX$ is invertible over the ring $ \mathbb{Q}\left(X\right)$ and its inverse doesn't contain negative powers of $ X$). Proof complete.

As for the other direction, I have not really thought about it yet, but it looks promising. Note that claiming that $ L\left(X\right)$ is a rational function is equivalent to claiming that the sequence $ \left(L_0,L_1,L_2,...\right)$ satisfies a linear recurrence with constant coefficients (like, e. g., the Fibonacci sequence, which corresponds to the regular language $ \left(10\mid 0\right)^{\ast}$). So you are basically asking whether any linearly recurrent sequence consisting of nonnegative integers has a combinatorial interpretation like the classical Fibonacci subsets-of-$ \left\{1,2,...,n\right\}$-with-no-two-adjacent-elements one. Whether or not this is true, I cannot imagine it to be difficult.

darij

by darij grinberg, Apr 13, 2009, 5:56 PM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
Yes, I thought more or less the same. Apparently the left implication was proven by Chomsky and Schutzenberger in 1963 (I guess as part of the foundation of formal language theory) and I think I have, more or less, a proof of the right implication. Starting with a linear recurrence with integer coefficients, construct its companion matrix. If it is all positive entries, we're just talking about walks on some graph and we can translate to a DFA. If some of the entries are negative, replace each vertex with two copies of itself and rewrite the adjacency matrix as follows: a positive integer $ a$ becomes $ \left[ \begin{array}{cc} a & 0 \\
0 & a \end{array} \right]$ and a negative integer $ - b$ becomes $ \left[ \begin{array}{cc} 0 & b \\
b & 0 \end{array} \right]$. In other words, each vertex becomes a pair of vertices that keeps track of both "positive walks" and "negative walks" and then we're done again. (I think.)

Anyway, I have been told that apparently the more interesting function to consider is the zeta function $ \exp \left( \sum_{n \ge 1} \frac {L_n}{n} t^n \right)$, which isn't rational in general, but apparently is closely connected to the zeta function of a variety; see this paper and these slides.

by t0rajir0u, Apr 13, 2009, 10:03 PM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
If by "substring" you mean contiguous substrings, then I don't know. But if you allow non-contiguous substrings, which are usually referred to as subsequences, then the answer has been known for a while, though is perhaps not as well-known as it should be.

Theorem (Higman, 1952): If L is any language and SUBSEQ(L) is the language of all subsequences of strings in L, then SUBSEQ(L) is regular.

For a bit more on this, see:

http://weblog.fortnow.com/2006/01/theorem-that-should-be-better-known.html

by Anonymous, Apr 13, 2009, 10:57 PM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
t0rajir0u: maybe I am stupid, but I don't understand how you prove the other direction. You can transform your finite automaton into a new one which accepts exactly those words which are accepted by the old automaton and have even length. Same for odd length. But you probably want one which accepts a set of words whose cardinality is the difference between the number of even words and the number of odd words, or something like that (I guess even and odd are not what you want, but the problem is the same - it should be a difference anyway). How do you implement this?

darij

by darij grinberg, Apr 14, 2009, 10:51 AM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
According to Wikipedia the regular languages are closed under difference, which is the property I want, but I don't know the construction that proves this.

by t0rajir0u, Apr 14, 2009, 2:26 PM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
That's offtopic - we don't want the difference between the languages, but a language whose number of words (in length $ n$) equals to the difference between the numbers of words (in length $ n$) in each of the languages. This is something different, unless we are able to force one language to become a subset of the other.

As for the proof that the difference between regular languages is regular, this immediately follows from two facts:
- if $ L$ and $ M$ are regular languages, then so is $ L\cup M$ (trivial using regexes, or easy to see using nondeterministic finite automata);
- if $ L$ is a regular language, then so is its complement $ \overline{L}$ (not easy at all using regexes, but trivial using deterministic finite automata - just rename all non-final states into final ones and vice versa).

darij

by darij grinberg, Apr 14, 2009, 2:46 PM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
A first step on the way to answering your last question might be to replace "contiguous substring" with "prefix," i.e., give some classification of languages $ L$ such that $ xy \in L$ implies $ x \in L$ for any strings $ x, y$ (where $ xy$ is concatenation). I feel like this may be hiding somewhere in Sipser's book, but I wasn't able to find it. I guess I'll have to go give this a thinking-over.

by JBL, Apr 16, 2009, 7:26 PM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
This is quite easy. I think maybe perhaps we should converse on such a topic. People like you are quite scarce, so we shall converse face-to-face and discuss such math topics not only to benefit you, but in the process me too. :|

by USAMO13, Apr 24, 2009, 11:54 PM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
You want to show that regular languages are closed under difference? This seems easy to me from the automata perspective, although maybe I am missing something. Start with finite state automata A and B which recognize your two languages. Now build a new automaton which simulates running A and B in parallel (the statae space is the product of the state spaces of A and B). Output YES if and only if A outputs YES and B outputs NO.

by Anonymous, Apr 29, 2009, 5:55 PM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
Once again, I do NOT want to show that regular languages are closed under difference.

darij

by darij grinberg, May 7, 2009, 9:09 PM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
Right. The general result we want, which I am almost certain is false, is something like if $ L, L'$ are two regular languages such that $ L_n \ge L_n'$ for all $ n$, then there exists a regular language $ L''$ such that $ L_n = L_n' + L_n''.$

by t0rajir0u, May 8, 2009, 2:18 AM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
The language described as the product of A and B above would satisfy the requirements for L'', I should think?

by Anonymous, Sep 30, 2009, 5:04 AM

Various mathematical thoughts, ranging from problem-solving techniques to attempts to tie together related concepts.

avatar

t0rajir0u
Archives
+ March 2009
+ October 2007
+ May 2007
Shouts
Submit
  • orz $~~~~$

    by clarkculus, Jan 10, 2025, 4:13 PM

  • Insanely auraful

    by centslordm, Jan 1, 2025, 11:17 PM

  • Fly High :(

    by Siddharthmaybe, Oct 22, 2024, 8:34 PM

  • Dang it he is gone :(( /revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive/revive.

    by 799786, Aug 4, 2022, 1:56 PM

  • annoying precision

    by centslordm, May 16, 2021, 7:34 PM

  • rip t0rajir0u

    by OlympusHero, Dec 5, 2020, 9:29 PM

  • Shoutbox bump xD

    by DuoDuoling0, Oct 4, 2020, 2:25 AM

  • dang hes gone :(

    by OlympusHero, Jul 28, 2020, 3:52 AM

  • First shout in July

    by smartguy888, Jul 20, 2020, 3:08 PM

  • https://artofproblemsolving.com/community/c2448

    has more.

    -πφ

    by piphi, Jun 12, 2020, 8:20 PM

  • wait hold up 310,000
    people visited this man?!?!??

    by srisainandan6, May 29, 2020, 5:16 PM

  • first shout in 2020

    by OlympusHero, Apr 4, 2020, 1:15 AM

  • in his latest post he says he moved to wordpress

    by MelonGirl, Nov 16, 2019, 2:43 AM

  • Please revive!

    by AopsUser101, Oct 30, 2019, 7:10 PM

  • first shout in october fj9odiais

    by bulbasaur., Oct 14, 2019, 1:14 AM

128 shouts
Tags
About Owner
  • Posts: 12167
  • Joined: Nov 20, 2005
Blog Stats
  • Blog created: Dec 5, 2006
  • Total entries: 48
  • Total visits: 321828
  • Total comments: 202
Search Blog
a