I think the challenge is that TLC effectively "brute-forces" the state space exploration by performing a graph search. GPU instructions are generally optimized for parallel arithmetic, so I don't think traditional graph search algorithms translate neatly to optimizations with GPU instructions.
There may be a way to rephrase algorithms though to make them amenable to vector processing. I found a couple papers / presentations on the topic:
Also, if you are just interested in performance and aren't tied to using the GPU as a solution for that, there is the Apalache model checker as well:
https://apalache.informal.systems/. This converts model-checking to an SMT constraint problem which can be solved very quickly by SMT solvers such as Z3. I haven't used it, but have always thought this project looks very interesting.