In the real-time systems community, the performance of newly designed scheduling algorithms or schedulability tests is typically examined either by theoretical methods, such as dominance relations or speedup factors, or by empirical evaluations. Such empirical evaluations often compare the acceptance ratio of the new algorithm/test to that of existing algorithms/tests on synthesized task sets. However, performing such a comparison is often difficult, since not all implementations are publicly available, and the available ones may use different programming languages, different task generators, etc. Hence, for self-suspending tasks, we provide an easy-to-use evaluation framework with a pre-implemented task generator, several pre-implemented schedulability tests, and an integrated plotting tool. The framework is written in Python and can be extended with additional schedulability tests.
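To illustrate the kind of acceptance-ratio experiment described above, the following is a minimal Python sketch, not the framework's actual API: the names `uunifast`, `liu_layland_test`, and `acceptance_ratio` are our own, the task generator is the standard UUniFast utilization generator, and the classic Liu-and-Layland utilization bound stands in merely as a placeholder for a pre-implemented schedulability test.

```python
import random


def uunifast(n, total_util, rng=random):
    """Generate n task utilizations summing to total_util (UUniFast)."""
    utils = []
    remaining = total_util
    for i in range(1, n):
        # Draw the utilization left for the remaining n - i tasks.
        next_remaining = remaining * rng.random() ** (1.0 / (n - i))
        utils.append(remaining - next_remaining)
        remaining = next_remaining
    utils.append(remaining)
    return utils


def liu_layland_test(utils):
    """Placeholder schedulability test: the rate-monotonic utilization bound."""
    n = len(utils)
    return sum(utils) <= n * (2 ** (1.0 / n) - 1)


def acceptance_ratio(test, n_tasks, total_util, n_sets=1000, seed=0):
    """Fraction of synthesized task sets that the given test accepts."""
    rng = random.Random(seed)
    accepted = sum(
        test(uunifast(n_tasks, total_util, rng)) for _ in range(n_sets)
    )
    return accepted / n_sets
```

A comparison between two tests then amounts to calling `acceptance_ratio` for each test over a sweep of total utilizations and plotting the resulting curves.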