
LLMs as minimally adequate probabilistic teachers for DFA learning

arXiv:2408.02999v1 Announce Type: cross

Abstract: The emergence of intelligence in large language models (LLMs) has inspired research on their integration into automata learning. This paper presents the probabilistic minimally adequate teacher (pMAT), a formulation built on a probabilistic oracle that may randomly yield persistent errors when answering membership queries during the learning of deterministic finite automata (DFAs). Given the tendency of LLMs to produce hallucinatory content, we develop techniques to improve the accuracy of answers and ensure the correctness of the learned automata. We propose the $\mathtt{Discrimination}$ prompt and the $\mathtt{Verification}$ prompt and explore their advantages over common prompts. Furthermore, we compare the DFA learning performance of the TTT algorithm against that of common active learning algorithms. To cope with the exponential number of persistent errors, we implement a dynamic query cache refinement algorithm that identifies and corrects conflicting queries by combining active and passive learning. Empirical results demonstrate the robustness and effectiveness of our approach, providing a theoretical foundation for automata learning with LLMs in the loop.
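To make the setting concrete, below is a minimal sketch of a membership oracle in the pMAT spirit: an LLM answers membership queries, answers are sampled several times and majority-voted, and a cache holds the (possibly persistently wrong) answers so a later refinement step can repair conflicts flagged by counterexamples. All names here (llm_answer, VERIFICATION_PROMPT, the voting scheme) are illustrative assumptions, not the paper's actual prompts or implementation.

```python
# Sketch of a probabilistic membership oracle with a refinable query
# cache, assuming a hypothetical llm_answer() LLM client. This is an
# illustration of the pMAT idea, not the paper's implementation.

from collections import Counter


def llm_answer(prompt: str, word: str) -> bool:
    """Hypothetical LLM call classifying `word` as accepted/rejected.

    May answer incorrectly; plug in a real client here."""
    raise NotImplementedError


# Illustrative stand-in for the paper's Verification-style prompt.
VERIFICATION_PROMPT = (
    "Classify the word '{word}' against the target language, "
    "then re-check your answer. Reply strictly ACCEPT or REJECT."
)


class ProbabilisticOracle:
    """Membership oracle that votes over repeated LLM answers and
    caches the result; cached mistakes are the 'persistent errors'
    that query cache refinement must later identify and correct."""

    def __init__(self, votes: int = 3):
        self.votes = votes                  # odd sample count per query
        self.cache: dict[str, bool] = {}    # word -> cached answer

    def membership(self, word: str) -> bool:
        if word not in self.cache:
            # Sample the oracle several times and take the majority,
            # reducing (but not eliminating) random answer errors.
            ballots = Counter(
                llm_answer(VERIFICATION_PROMPT.format(word=word), word)
                for _ in range(self.votes)
            )
            self.cache[word] = ballots.most_common(1)[0][0]
        return self.cache[word]

    def refine(self, word: str, truth: bool) -> None:
        # Cache refinement in miniature: once a conflicting query is
        # identified (e.g., via a counterexample from passive learning
        # over observed traces), overwrite the cached answer so the
        # active learner can rebuild a consistent hypothesis.
        self.cache[word] = truth
```

An active learner such as TTT would call membership() while building its hypothesis DFA; when equivalence checking or a passive-learning pass over collected traces exposes a conflict, refine() corrects the offending cache entry and learning resumes from the repaired observations.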