1. Cao, G. The Historical Inheritance and Contemporary Value of Yellow River Culture. Jinyang Acad. J. 2022, 2, 119-124.
2. Langote, M.; Saratkar, S.; Kumar, P.; et al. Human-computer interaction in healthcare: Comprehensive review. AIMS Bioeng. 2024, 11, 343-390.
3. De Wet, L. Teaching Human-Computer Interaction Modules—And Then Came COVID-19. Front. Comput. Sci. 2021, 3, 793466.
4. Amato, F.; Barolli, L.; Cozzolino, G.; et al. An Intelligent Interface for Human-Computer Interaction in Legal Domain. In Proceedings of the International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, Tirana, Albania, 27-29 October 2022.
5. Hirsch, L.; Paananen, S.; Lengyel, D.; et al. Human-Computer Interaction (HCI) Advances to Re-Contextualize Cultural Heritage toward Multiperspectivity, Inclusion, and Sensemaking. Appl. Sci. 2024, 14, 7652.
6. Achiam, J.; Adler, S.; Agarwal, S.; et al. GPT-4 technical report. arXiv 2023, arXiv:2303.08774.
7. Touvron, H.; Lavril, T.; Izacard, G.; et al. LLaMA: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971.
8. Yang, A.; Xiao, B.; Wang, B.; et al. Baichuan 2: Open large-scale language models. arXiv 2023, arXiv:2309.10305.
9. Liu, A.; Feng, B.; Xue, B.; et al. DeepSeek-V3 technical report. arXiv 2024, arXiv:2412.19437.
10. Yang, A.; Yang, B.; Zhang, B.; et al. Qwen2.5 technical report. arXiv 2024, arXiv:2412.15115.
11. Roziere, B.; Gehring, J.; Gloeckle, F.; et al. Code Llama: Open foundation models for code. arXiv 2023, arXiv:2308.12950.
12. Li, Y.; Li, Z.; Zhang, K.; et al. ChatDoctor: A medical chat model fine-tuned on a large language model Meta-AI (LLaMA) using medical domain knowledge. Cureus 2023, 15, e40895.
13. Zhang, H.; Qiu, B.; Feng, Y.; et al. Baichuan4-Finance Technical Report. arXiv 2024, arXiv:2412.15270.
14. Cui, J.; Ning, M.; Li, Z.; et al. ChatLaw: A multi-agent collaborative legal assistant with knowledge graph enhanced mixture-of-experts large language model. arXiv 2023, arXiv:2306.16092.
15. Jiang, Z.; Wang, J.; Cao, J.; et al. Towards better translations from classical to modern Chinese: A new dataset and a new method. In Proceedings of the CCF International Conference on Natural Language Processing and Chinese Computing, Foshan, China, 12-15 October 2023.
16. Chang, E.; Shiue, Y.T.; Yeh, H.S.; et al. Time-aware ancient Chinese text translation and inference. arXiv 2021, arXiv:2107.03179.
17. Li, Z.; Sun, M. Punctuation as implicit annotations for Chinese word segmentation. Comput. Linguist. 2009, 35, 505-512.
18. Yu, P.; Wang, X. BERT-based named entity recognition in Chinese twenty-four histories. In Proceedings of the International Conference on Web Information Systems and Applications, Guangzhou, China, 23-25 September 2020.
19. Han, X.; Xu, L.; Qiao, F. CNN-BiLSTM-CRF model for term extraction in Chinese corpus. In Proceedings of the Web Information Systems and Applications: 15th International Conference, WISA 2018, Taiyuan, China, 14-15 September 2018.
20. Wang, D.; Liu, C.; Zhao, Z.; et al. GujiBERT and GujiGPT: Construction of intelligent information processing foundation language models for ancient texts. arXiv 2023, arXiv:2307.05354.
21. Chang, L.; Dongbo, W.; Zhixiao, Z.; et al. SikuGPT: A generative pre-trained model for intelligent information processing of ancient texts from the perspective of digital humanities. arXiv 2023, arXiv:2304.07778.
22. Wptoux. Bloom-7B-Chunhua. Available online: https://huggingface.co/wptoux/bloom-7b-chunhua (accessed on 1 October 2023).
23. XunziALLM. Available online: https://github.com/Xunzi-LLM-of-Chinese-classics/XunziALLM (accessed on 1 March 2024).
24. Cao, J.; Peng, D.; Zhang, P.; et al. TongGu: Mastering Classical Chinese Understanding with Knowledge-Grounded Large Language Models. arXiv 2024, arXiv:2407.03937.
25. Mallen, A.; Asai, A.; Zhong, V.; et al. When not to trust language models: Investigating effectiveness of parametric and non-parametric memories. arXiv 2022, arXiv:2212.10511.
26. Carlini, N.; Tramer, F.; Wallace, E.; et al. Extracting training data from large language models. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Virtual, 11-13 August 2021.
27. Huang, L.; Yu, W.; Ma, W.; et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. ACM Trans. Inf. Syst. 2025, 43, 1-55.
28. Izacard, G.; Lewis, P.; Lomeli, M.; et al. Atlas: Few-shot learning with retrieval augmented language models. J. Mach. Learn. Res. 2023, 24, 1-43.
29. Wu, Y.; Rabe, M.N.; Hutchins, D.; et al. Memorizing transformers. arXiv 2022, arXiv:2203.08913.
30. He, Z.; Zhong, Z.; Cai, T.; et al. REST: Retrieval-based speculative decoding. arXiv 2023, arXiv:2311.08252.
31. Kang, M.; Gürel, N.M.; Yu, N.; et al. C-RAG: Certified generation risks for retrieval-augmented language models. arXiv 2024, arXiv:2402.03181.
32. Karpukhin, V.; Oguz, B.; Min, S.; et al. Dense Passage Retrieval for Open-Domain Question Answering. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Virtual, 16-20 November 2020.
33. Ni, J.; Qu, C.; Lu, J.; et al. Large dual encoders are generalizable retrievers. arXiv 2021, arXiv:2112.07899.
34. Nogueira, R.; Cho, K. Passage Re-ranking with BERT. arXiv 2019, arXiv:1901.04085.
35. Yoran, O.; Wolfson, T.; Bogin, B.; et al. Answering questions by meta-reasoning over multiple chains of thought. arXiv 2023, arXiv:2304.13007.
36. Yao, S.; Zhao, J.; Yu, D.; et al. ReAct: Synergizing reasoning and acting in language models. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1-5 May 2023.
37. Lewis, P.; Perez, E.; Piktus, A.; et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459-9474.
38. Liu, Z.; Simon, C.E.; Caspani, F. Passage segmentation of documents for extractive question answering. In European Conference on Information Retrieval; Springer: Berlin/Heidelberg, Germany, 2025; pp. 345-352.
39. Laitenberger, A.; Manning, C.D.; Liu, N.F. Stronger Baselines for Retrieval-Augmented Generation with Long-Context Language Models. arXiv 2025, arXiv:2506.03989.
40. Edge, D.; Trinh, H.; Cheng, N.; et al. From local to global: A Graph RAG approach to query-focused summarization. arXiv 2024, arXiv:2404.16130.
41. Cho, J.; Mahata, D.; Irsoy, O.; et al. M3DocRAG: Multi-modal retrieval is what you need for multi-page multi-document understanding. arXiv 2024, arXiv:2411.04952.
42. Faysse, M.; Sibille, H.; Wu, T.; et al. ColPali: Efficient document retrieval with vision language models. arXiv 2024, arXiv:2407.01449.
43. Wang, Q.; Ding, R.; Chen, Z.; et al. ViDoRAG: Visual document retrieval-augmented generation via dynamic iterative reasoning agents. arXiv 2025, arXiv:2502.18017.
44. Memon, J.; Sami, M.; Khan, R.A.; et al. Handwritten optical character recognition (OCR): A comprehensive systematic literature review (SLR). IEEE Access 2020, 8, 142642-142668.
45. Ingemarsson, P.; Daniel, P. PDF Parsing, Unveiling the Most Efficient Method. Bachelor's Thesis, Linnaeus University, Växjö, Sweden, 2024.
46. LiveTalking: Real-Time Interactive Streaming Digital Human. 2024. Available online: https://github.com/lipku/livetalking (accessed on 16 March 2025).
47. Prajwal, K.R.; Mukhopadhyay, R. Wav2Lip: A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild. 2020. Available online: https://github.com/Rudrabha/Wav2Lip (accessed on 16 March 2025).
48. Zhang, Y.; Liu, M.; Chen, Z.; et al. MuseTalk: Real-time high quality lip synchronization with latent space inpainting. arXiv 2024, arXiv:2410.10122.
49. Metahuman-stream: Real-Time Streaming Digital Human Based on NeRF. 2023. Available online: https://github.com/tsman/metahuman-stream (accessed on 16 March 2025).
50. Adobe Systems Incorporated. Real-Time Messaging Protocol (RTMP) Specification. 2002. Available online: https://web.archive.org/web/20201001140644/https://www.adobe.com/content/dam/acom/en/devnet/rtmp/pdf/rtmp_specification_1.0.pdf (accessed on 16 March 2025).
51. IETF and W3C. Web Real-Time Communication (WebRTC) Standard. 2011. Available online: https://www.w3.org/TR/webrtc/ (accessed on 16 March 2025).
52. Synthesia. Synthesia: AI Video Generation Platform. 2017. Available online: https://www.synthesia.io/ (accessed on 16 March 2025).
53. Diener, V. VTube Studio: Live2D VTuber Streaming Software. 2021. Available online: https://github.com/mouwoo/VTubeStudio/wiki (accessed on 16 March 2025).
54. Guo, Z.; Xia, L.; Yu, Y.; et al. LightRAG: Simple and fast retrieval-augmented generation. arXiv 2024, arXiv:2410.05779.
55. Gao, Z.; Li, Z.; Wang, J.; et al. FunASR: A fundamental end-to-end speech recognition toolkit. arXiv 2023, arXiv:2305.11013.
56. Edge-tts: Use Microsoft Edge's Online Text-to-Speech Service from Python WITHOUT Needing Microsoft Edge or Windows or an API Key. 2024. Available online: https://github.com/rany2/edge-tts (accessed on 16 March 2025).
57. Team GLM; Zeng, A.; Xu, B.; et al. ChatGLM: A family of large language models from GLM-130B to GLM-4 All Tools. arXiv 2024, arXiv:2406.12793.
58. Jiang, A.Q.; Sablayrolles, A.; Mensch, A.; et al. Mistral 7B. arXiv 2023, arXiv:2310.06825.
59. Jha, R.; Wang, B.; Günther, M.; et al. Jina-ColBERT-v2: A general-purpose multilingual late interaction retriever. arXiv 2024, arXiv:2408.16672.
60. Vavekanand, R.; Sam, K. Llama 3.1: An in-depth analysis of the next-generation large language model. Preprint 2024.
61. Lu, H.; Liu, W.; Zhang, B.; et al. DeepSeek-VL: Towards real-world vision-language understanding. arXiv 2024, arXiv:2403.05525.
62. Laurençon, H.; Tronchon, L.; Cord, M.; et al. What matters when building vision-language models? Adv. Neural Inf. Process. Syst. 2024, 37, 87874-87907.
63. Guo, Z.; Xu, R.; Yao, Y.; et al. LLaVA-UHD: An LMM perceiving any aspect ratio and high-resolution images. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2024; pp. 390-406.
64. Dong, X.; Zhang, P.; Zang, Y.; et al. InternLM-XComposer2-4KHD: A pioneering large vision-language model handling resolutions from 336 pixels to 4K HD. Adv. Neural Inf. Process. Syst. 2024, 37, 42566-42592.
65. Hu, A.; Xu, H.; Ye, J.; et al. mPLUG-DocOwl 1.5: Unified structure learning for OCR-free document understanding. arXiv 2024, arXiv:2403.12895.
66. Bai, J.; Bai, S.; Chu, Y.; et al. Qwen technical report. arXiv 2023, arXiv:2309.16609.
67. Li, Z.; Yang, B.; Liu, Q.; et al. Monkey: Image resolution and text label are important things for large multi-modal models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17-18 June 2024.