Creativity research relies heavily on human ratings, a laborious and costly task given the volume of responses each rater must assess. Planned missing data designs offer a potential solution by having each rater score only a subset of responses, yet ensuring that the resulting ratings still meet psychometric standards such as reliability remains a challenge. Our work shows how judge response theory and simulations can be used to optimize these designs, provides open-source code, and demonstrates through a practical example how such fine-tuning balances savings in time and cost against the expected reliability of the ratings.
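
A minimal sketch of the kind of simulation the abstract describes, not the authors' released code: responses receive ratings generated under a simple judge model, a planned missing data design assigns each response to only a subset of the rater pool, and the reliability of the resulting mean ratings is gauged against the simulated true scores. All parameter values and the rater model are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

n_responses = 300          # number of creative responses to be rated (assumption)
n_raters = 10              # size of the rater pool (assumption)
raters_per_response = 3    # planned missingness: each response receives only 3 ratings
error_sd = 1.0             # rater error standard deviation (assumption)

# Latent "true" creativity of each response
theta = rng.normal(0.0, 1.0, size=n_responses)

# Simple judge model: each rater applies a constant severity offset
severity = rng.normal(0.0, 0.5, size=n_raters)

# Planned missing design: randomly assign a subset of raters to each response
ratings = np.full((n_responses, n_raters), np.nan)
for i in range(n_responses):
    judges = rng.choice(n_raters, size=raters_per_response, replace=False)
    ratings[i, judges] = (theta[i] + severity[judges]
                          + rng.normal(0.0, error_sd, size=raters_per_response))

# Score each response by the mean of its observed ratings
observed_means = np.nanmean(ratings, axis=1)

# Reliability proxy: squared correlation between observed means and true scores
reliability = np.corrcoef(observed_means, theta)[0, 1] ** 2
print(f"Estimated reliability with {raters_per_response} raters per response: {reliability:.2f}")
```

Re-running such a simulation over a grid of values for raters_per_response (and rater pool size) traces the trade-off between rating workload and expected reliability that the design optimization targets.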