Industrial catalysis, a core field of chemical engineering, is characterized by dense professional terminology and complex knowledge structures. This makes it difficult for general-purpose large language models (LLMs) to understand and apply domain knowledge accurately. This research presents a domain-specific fine-tuning method and a retrieval-augmented generation (RAG) system for industrial catalysis. We build high-quality training corpora through a multi-model collaborative data processing pipeline, train specialized domain models with parameter-efficient fine-tuning, and design a RAG workflow based on consistency verification. We first construct a training corpus of 2.3 billion tokens, comprising 1.1 billion domain-specific tokens and 1.2 billion general tokens mixed at an approximately balanced 1:1 ratio. We then apply rank-stabilized low-rank adaptation (rsLoRA) to fine-tune the Yi-1.5-6B model, yielding the PeiYang Micro-Emergence model. It scores 76.81 on the industrial catalysis domain evaluation, substantially outperforming the general-purpose Qwen2.5-72B-Instruct (65.45), which has 12 times as many parameters, while retaining good general capabilities. We further construct a dataset of 3.37 million domain-specific retrieval pairs and optimize the embedding model with Matryoshka representation learning (MRL), improving domain retrieval recall@3 by an average of 2.87 percentage points while slightly improving general capabilities. Finally, we design a specialized RAG workflow that integrates bilingual hypothetical document generation, dual-path retrieval, and consistency verification to deliver high-quality domain knowledge services.
The system provides accurate and reliable knowledge services for industrial catalysis, demonstrates the value of domain-specific large language models in resource-constrained settings, and offers a replicable technical pathway for applying artificial intelligence in other specialized domains.
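Two of the techniques named above can be illustrated with a small amount of arithmetic. The following is a minimal sketch, not the authors' implementation, that rests on several assumptions. First, rsLoRA differs from standard LoRA in how it scales the low-rank update: it uses alpha / sqrt(r) rather than alpha / r. Second, a Matryoshka-trained embedding can be used by truncating it to a prefix of its dimensions and renormalizing. Third, recall@3 is taken here to mean the fraction of queries whose relevant document ranks in the top 3 by cosine similarity. The abstract does not give the rank, alpha, embedding dimensions, or exact evaluation protocol, so all values below are illustrative. The helper names (`rslora_scale`, `truncate_normalize`, `recall_at_k`) are hypothetical.

```python
import math


def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / r."""
    return alpha / rank


def rslora_scale(alpha: float, rank: int) -> float:
    """rsLoRA scales the update by alpha / sqrt(r), which keeps the update
    magnitude from shrinking as the rank grows."""
    return alpha / math.sqrt(rank)


def truncate_normalize(vec, dim):
    """Matryoshka-style use of an embedding (assumed here): keep the first
    `dim` coordinates and L2-normalize that prefix."""
    prefix = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in prefix))
    return [x / norm for x in prefix] if norm > 0 else prefix


def recall_at_k(query_vecs, doc_vecs, relevant, k=3, dim=None):
    """Fraction of queries whose relevant document index (relevant[i])
    appears among the top-k documents ranked by cosine similarity.
    Hypothetical helper; one relevant document per query is assumed."""
    docs = [truncate_normalize(d, dim or len(d)) for d in doc_vecs]
    hits = 0
    for qi, q in enumerate(query_vecs):
        qn = truncate_normalize(q, dim or len(q))
        # Dot product of normalized vectors equals cosine similarity.
        scores = [sum(a * b for a, b in zip(qn, d)) for d in docs]
        top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:k]
        if relevant[qi] in top:
            hits += 1
    return hits / len(query_vecs)


if __name__ == "__main__":
    # With rank 64 and alpha 16, LoRA scales by 0.25 and rsLoRA by 2.0.
    print(lora_scale(16, 64), rslora_scale(16, 64))
    docs = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
    queries = [[1, 0.1, 0], [0, 0, 1]]
    print(recall_at_k(queries, docs, relevant=[0, 1], k=1))
```

In this toy example, the first query's relevant document ranks first and the second query's does not, so recall@1 is 0.5. Evaluating at a smaller `dim` shows how retrieval quality changes when only a prefix of the embedding is kept, which is the property MRL training targets.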



