
DeadMap: Open Yet Locked—Key-Controlled Dataset Protection with Feature-Level Collisions

  • Ting Yang 1,   
  • Mahabubur Rahman Miraj 1,   
  • Xinyu Lei 2,   
  • Nankun Mu 1,   
  • Hongyu Huang 1,*

Received: 28 Dec 2025 | Revised: 10 Feb 2026 | Accepted: 06 Mar 2026 | Published: 19 Mar 2026

Abstract

The performance of deep learning models relies heavily on high-quality datasets. However, under the prevailing paradigm of pretrained-model-based applications, once a dataset is publicly released, its owner largely loses control over how it is used for model training. Existing data protection approaches either focus on post hoc accountability or irreversibly destroy the training utility of the data through perturbations, making it difficult to satisfy the dual requirements of offline public sharing and authorized, controllable usage. In this paper, we propose DeadMap, a reversible training-usability control framework designed for model fine-tuning scenarios. DeadMap introduces a secret label permutation combined with synchronized multi-layer feature-alignment perturbations. While preserving visual imperceptibility and keeping the surface labels unchanged, this design causes fine-tuning without the correct key to suffer severe performance degradation. In contrast, authorized users need only a lightweight label-mapping key and a simple label remapping during training to recover performance close to that achieved with the original dataset. Experimental results show that DeadMap establishes a significant performance gap between authorized and unauthorized settings, providing a lightweight and practical solution that balances open data sharing with the controlled use of high-value datasets.
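
To make the key mechanism concrete, the Python sketch below illustrates the authorized-user workflow the abstract describes: a compact key is expanded into a secret class permutation, and surface labels are remapped through it during fine-tuning. This is a minimal illustration under stated assumptions, not the authors' implementation; the function names, the seed-based key derivation, and the mapping direction are hypothetical, and the multi-layer feature-alignment perturbations applied to the images are not shown.

    # Minimal sketch (illustrative assumptions, not the paper's API) of
    # key-controlled label remapping as described in the abstract.
    import numpy as np

    def derive_permutation(key: int, num_classes: int) -> np.ndarray:
        # Hypothetical: expand a lightweight key into a secret class
        # permutation pi, where pi[c] is the class whose features the
        # perturbation aligned samples of surface class c toward.
        rng = np.random.default_rng(key)
        return rng.permutation(num_classes)

    def remap_labels(surface_labels: np.ndarray, pi: np.ndarray) -> np.ndarray:
        # Authorized training: replace each surface label with its mapped
        # class so the labels agree with the perturbed features.
        # Unauthorized users train on surface_labels directly and inherit
        # the feature-label mismatch, degrading fine-tuned performance.
        return pi[surface_labels]

    if __name__ == "__main__":
        pi = derive_permutation(key=42, num_classes=10)
        y_surface = np.array([0, 3, 7, 9])   # labels as published
        y_train = remap_labels(y_surface, pi)  # labels an authorized user trains on
        print(pi, y_train)

Under this reading, the released dataset stays fully usable offline, and access control reduces to possession of the key: the same files yield a near-original model when labels are remapped and a severely degraded one when they are not.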


How to Cite
Yang, T.; Miraj, M. R.; Lei, X.; Mu, N.; Huang, H. DeadMap: Open Yet Locked—Key-Controlled Dataset Protection with Feature-Level Collisions. Journal of Machine Learning and Information Security 2026, 2 (1), 5. https://doi.org/10.53941/jmlis.2026.100005.
Copyright & License
Copyright (c) 2026 by the authors.