Reinforcement Learning (RL) has emerged as a promising tool for decision-making in various applications, particularly in uncertain environments. However, its adoption in embedded systems, especially hard real-time systems, faces challenges due to stringent timing constraints. Integrating shielding mechanisms may offer a pathway for RL to optimize scheduling decisions while preserving worst-case timing guarantees. This position paper presents a use case in which RL selects compliant execution versions for fault-tolerant real-time systems while minimizing system utilization at runtime. Furthermore, we discuss possible directions for further exploring RL's role in real-time systems to improve adaptability.