AI speech synthesis has achieved impressive naturalness, but its rise has revealed a profound educational challenge: its failure to convey complex human emotions and contextual nuance, termed the “affective gap,” threatens to undermine the ecology of voice artistry and societal aesthetic discernment. This paper first diagnoses this gap by examining its key manifestations (compound emotion flattening, contextual deafness, and the prosodic uncanny valley) and tracing its root cause to the epistemological divide between AI’s data-driven pattern recognition and human embodied experience. It then analyzes the consequent structural disruption to the voice-acting industry’s traditional “pyramid” training model and the broader risk of cultural aesthetic deskilling. In response, the paper’s central contribution is a novel pedagogical framework designed to bridge this gap. The framework advocates a decisive shift in voice education from skill transmission towards critical voice artistry, centered on cultivating students’ capacities for deep textual and contextual analysis, empathetic and embodied sense-making, and the critical evaluation and direction of AI-generated speech. The paper argues that by integrating this critical pedagogical approach with strategic technology use, educators can empower future artists to navigate and shape a hybrid human-AI creative landscape. Ultimately, this work provides a theoretically grounded and actionable roadmap for innovating performing arts education in the AI era, positioning educational technology as a vital steward of uniquely human expressive intelligence.



