(づ•ᴥ•)づ┬─┬

Notes

Trying out the GEPA prompt optimizer with chess puzzles

dspy now integrates an implementation of the GEPA optimizer, and it's quite easy to use.

To try it out, I loaded twenty chess puzzles and wrote a rudimentary signature:

import dspy

class ChessSolver(dspy.Signature):
    """From the set of allowed moves, pick the next best move for this chess puzzle."""
    puzzle: str = dspy.InputField(description="The current state of the board, represented as a FEN string.")
    allowed_moves: list[str] = dspy.InputField(description="The set of allowed moves for the current player.")
    solution: str = dspy.OutputField(description="The next move to make, represented as a UCI string.")

If you are not familiar with dspy, the docstring in the signature becomes part of the LLM prompt.

The task was for the model to pick the next best move for the puzzle. I used gpt-4.1-mini as the student model and gpt-5 as the reflection LM.
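Since the answer has to be a UCI string taken from the allowed moves, it can help to check the raw model output before scoring it. The following is a minimal sketch using only the standard library; `normalize_answer` and the `UCI_MOVE` pattern are hypothetical helpers for illustration and are not part of the setup described in this post.

```python
import re
from typing import Optional

# UCI move: from-square, to-square, optional promotion piece (q, r, b or n).
UCI_MOVE = re.compile(r"[a-h][1-8][a-h][1-8][qrbn]?")


def normalize_answer(raw_answer: str, allowed_moves: list[str]) -> Optional[str]:
    """Return the cleaned-up move if it is well-formed UCI and allowed, else None."""
    candidate = raw_answer.strip().lower()
    if UCI_MOVE.fullmatch(candidate) is None:
        return None  # e.g. SAN such as "Qg7+" or free text
    return candidate if candidate in allowed_moves else None


allowed = ["g8h8", "f8f7", "c4f1", "e7e8q"]
print(normalize_answer(" E7E8Q ", allowed))  # e7e8q
print(normalize_answer("Qg7+", allowed))     # None
print(normalize_answer("e7e8", allowed))     # None (promotion letter missing)
```

A check like this only catches format problems. It says nothing about whether the move is any good, which is what the metric is for.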

As discussed before on this blog, this is a hard task for LLMs when the board is given as a FEN string. They typically hallucinate pieces away from their true positions, among other errors, so I was interested to see what kind of prompt GEPA would come up with.
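To see why FEN is awkward to read, note that the piece-placement field run-length encodes empty squares as digits, so finding the piece on a given square means expanding every rank first. Here is a small standard-library sketch of that expansion; the function names are made up for this illustration.

```python
def expand_piece_placement(fen: str) -> list[str]:
    """Expand field 1 of a FEN string into eight 8-character rows (rank 8 first)."""
    placement = fen.split()[0]
    rows = []
    for rank in placement.split("/"):
        # A digit is a run of that many empty squares; letters are pieces.
        row = "".join("." * int(ch) if ch.isdigit() else ch for ch in rank)
        if len(row) != 8:
            raise ValueError(f"rank {rank!r} does not describe 8 squares")
        rows.append(row)
    if len(rows) != 8:
        raise ValueError("piece placement must contain 8 ranks")
    return rows


def piece_at(fen: str, square: str) -> str:
    """Return the piece letter on a square such as 'e4', or '.' if it is empty."""
    rows = expand_piece_placement(fen)
    file_index = ord(square[0]) - ord("a")
    rank_index = 8 - int(square[1])  # rows[0] is rank 8
    return rows[rank_index][file_index]


# Position after 1. e4 from the standard starting position.
fen = "rnbqkbnr/pppppppp/8/8/4P3/8/PPPP1PPP/RNBQKBNR b KQkq e3 0 1"
for rank_number, row in zip(range(8, 0, -1), expand_piece_placement(fen)):
    print(rank_number, row)
print(piece_at(fen, "e4"))  # P
print(piece_at(fen, "e2"))  # .
```

A model that reads the FEN token by token has to do this bookkeeping implicitly, which is presumably where the misplaced pieces come from.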

GEPA requires a metric with textual feedback. Using stockfish, I wrote a rudimentary one:

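import chess

def metric_with_feedback(example, prediction, trace=None, pred_name=None, pred_trace=None):
    llm_answer = prediction.solution
    correct_move = example['solution']

    # Exact match with the puzzle solution: no engine analysis needed.
    if llm_answer == correct_move:
        return dspy.Prediction(score=1, feedback="Perfect! The move is a good move.")

    board = chess.Board(example['puzzle'])
    # Guard against malformed or illegal answers, which would otherwise raise
    # or push an impossible move onto the board.
    try:
        move = chess.Move.from_uci(str(llm_answer))
    except ValueError:
        move = None
    if move is None or move not in board.legal_moves:
        feedback_text = f"{llm_answer} is not a legal move in this position (best move is {correct_move})."
        return dspy.Prediction(score=0, feedback=feedback_text)

    # Score the answer with stockfish, then play it and find the opponent's best reply.
    llm_answer_score = shallow_score_of_moves([llm_answer], board)[0]
    board.push(move)
    best_counter_move = get_best_legal_counter_move_by_opponent(board)
    feedback_text = f"The move is not the best move (best move is {correct_move}). The best counter move for {llm_answer} from the opponent is {best_counter_move}. Also stockfish score (negative is bad, positive is good) for {llm_answer} is {llm_answer_score}."
    return dspy.Prediction(score=0, feedback=feedback_text)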

In the end, the optimized prompt didn't really change the score. I suspect this is partly because the feedback from my metric is not that informative, and partly because the position-reading issues may be hard to fix by prompting alone.

Still, it is interesting to see how the optimized prompt fights the FEN reading-comprehension issues that gpt-4.1-mini has:

Task
Choose the single best move from a provided restricted set in a chess puzzle and justify it with precise, verified calculation from the given FEN.

Inputs
- puzzle: a FEN string describing the full position.
- allowed_moves: a set of legal candidate moves in UCI notation (e.g., g8h8, f8f7, c4f1, e7e8q for a promotion).

Output (exact format)
- reasoning: a concise, factual justification of why your chosen move is best among the allowed options. State the key idea(s) and, when relevant, the opponent’s best reply and one short principal variation (PV).
- solution: the chosen move in UCI notation, exactly matching one of the allowed_moves.

Board parsing and verification (mandatory)
1) Parse the FEN fully before calculating:
   - Field 1 (piece placement): ranks 8→1, files a→h; digits are consecutive empty squares; pieces: KQBNRP (white), kqbnrp (black).
   - Field 2 (side to move): w or b (determine whose turn it is).
   - Field 3 (castling rights): any of KQkq or "-" if none (rights must exist and be legally usable for castling moves).
   - Field 4 (en passant target square): a square like e6 or "-" if none (only relevant if an EP capture is possible now).
   - Fields 5–6: halfmove clock and fullmove number (generally irrelevant to tactics).
2) Reconstruct the board map:
   - Place all pieces correctly and locate both kings.
   - Confirm side to move.
   - Note castling and EP rights if any candidate could involve them or be affected by them.
3) UCI move rules you must respect:
   - UCI is from-square + to-square, plus optional promotion piece letter (q/r/b/n) for pawn promotions (e.g., e7e8q).
   - Castling is the king move in UCI: e1g1, e1c1, e8g8, e8c8; ensure:
     - Castling right exists from FEN.
     - All traversed squares are empty.
     - The king does not start, pass through, or end in check.
   - En passant is only legal if FEN EP target matches the capture square and the capturing pawn’s move matches EP rules; encode as from-to (no suffix).
   - If a promotion is among allowed moves, ensure the pawn is on the 7th (white) or 2nd (black) rank and the destination is the last rank.

Hard rules
1) Never output a move that is not in allowed_moves. The solution string must exactly equal one of the provided UCI moves (including correct promotion letter if applicable).
2) Verify every tactical and legal claim against the reconstructed board:
   - A “capture” only if the destination square currently contains an opponent piece (or is the EP target when EP is legal). Identify the captured piece and square correctly.
   - A “check” only if, after the move, the opponent king is actually attacked. State the attacking line/pattern (e.g., discovered check, rook along file with no blockers, knight L-jump).
   - A “mate” only if no legal defense exists. Explicitly consider:
     - All king moves (confirm destination squares are not occupied by own pieces and not attacked).
     - All possible interpositions on the checking line by any legal piece.
     - All legal captures of the checking piece (including en passant when relevant) and recaptures.
   - Confirm piece movement legality and clear paths for sliders (queen/rook/bishop).
   - Confirm square identities and sides (avoid misidentifying pieces or colors).
3) Evaluate the allowed moves by concrete calculation first:
   - Prioritize forcing moves: checks > captures > strong threats > quiet moves.
   - For each candidate, test the opponent’s best defense at least one move deep; if the line remains forcing or tactically critical, extend another move or two.
   - Always check for simple, strong defensive resources (king flights, interpositions, counterchecks, captures).
   - Compare outcomes by:
     - Immediate material result (clean wins/losses, forcing forks, pins, skewers).
     - King safety and attack continuity (forced mates, perpetuals avoided/forced).
     - Unstoppable threats (promotion, decisive invasion).
     - Conversion prospects (passed pawns, piece activity, coordination).
   - Do not prefer a “safe-looking” quiet move over a forcing win; validate by calculation. Conversely, avoid tempting but unsound tactics if the opponent has a clear refutation.
4) Special cases and tie-breakers:
   - If only one allowed move is given, select it (you may note it’s the only allowed move).
   - If two candidates are strong, choose the one that is more forcing or yields a more durable, unavoidable advantage (e.g., forced mate or clear material gain).
   - Do not dismiss a forcing line on vague “king safety” grounds; either refute it concretely or accept it if it stands.
5) Style and constraints:
   - Be concise and factual. Do not include engine scores or vague claims.
   - Include a brief PV for tactical/forcing positions: your move, opponent’s best reply, and one or two key follow-up moves.
   - Use precise square references and correct piece identities based on the FEN.
   - Output format must be exactly:
     reasoning: <short, factual justification with key line(s)>
     solution: <best move in UCI notation>

Domain reminders and common pitfalls (avoid them)
- Confirm the side to move from the FEN; never assume.
- Verify sliding paths are clear before asserting a check/capture.
- Before claiming a capture or check (e.g., Qg7+, Nxg5, Rxd8), confirm the piece can legally reach the destination and the line is not blocked.
- When asserting mate, systematically eliminate: king moves, interpositions, and captures (including unusual ones like EP when legal).
- Promotions must include the correct UCI promotion letter (q/r/b/n); ensure the pawn starts on the correct rank and reaches the last rank.
- En passant is only legal when the FEN’s EP square enables it in that exact ply.
- Among allowed moves, prefer decisive forcing tactics (e.g., immediate checkmate or winning material) over “safer-looking” non-forcing moves; verify by calculating the opponent’s best reply, not a cooperative one.

Decision/verification checklist before finalizing
- Parsed FEN fields correctly; board and side to move verified.
- Legalities checked: move pattern, occupancy, pins (king not left in check), castling/EP specifics where relevant.
- Forcing lines examined for each allowed move; opponent’s strongest defense considered.
- If claiming check/mate/material win, verified with explicit lines and defenses refuted.
- Chosen move is in allowed_moves and encoded correctly in UCI (with promotion when applicable).
- reasoning is concise, factual, includes one short PV and the key idea(s).

Getting this prompt took 40 minutes, with five examples used for training and five for validation.
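For reference, a run like this could be wired up roughly as follows. This is a sketch under several assumptions, not the exact code behind this post: the `auto="light"` budget, the `ChainOfThought` student (suggested by the `reasoning` field in the optimized prompt), the model id strings, and the puzzle dictionaries with `puzzle`, `allowed_moves` and `solution` keys are all guesses, and `split_puzzles` and `optimize_chess_solver` are hypothetical helpers.

```python
import random


def split_puzzles(puzzles, n_train=5, n_val=5, seed=0):
    """Shuffle the puzzles and return disjoint (train, validation) lists."""
    if n_train + n_val > len(puzzles):
        raise ValueError("not enough puzzles for the requested split")
    shuffled = list(puzzles)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]


def optimize_chess_solver(signature, metric, puzzles):
    """Run GEPA on a ChainOfThought student; needs dspy and OpenAI API access."""
    import dspy  # imported here so split_puzzles works without dspy installed

    # Student model (assumed model id string).
    dspy.configure(lm=dspy.LM("openai/gpt-4.1-mini"))

    # Assumes each puzzle is a dict with puzzle, allowed_moves and solution keys.
    examples = [
        dspy.Example(**puzzle).with_inputs("puzzle", "allowed_moves")
        for puzzle in puzzles
    ]
    trainset, valset = split_puzzles(examples)

    optimizer = dspy.GEPA(
        metric=metric,
        auto="light",  # assumed budget preset, not confirmed by the post
        reflection_lm=dspy.LM("openai/gpt-5", temperature=1.0, max_tokens=32000),
    )
    student = dspy.ChainOfThought(signature)
    return optimizer.compile(student, trainset=trainset, valset=valset)


# The split alone can be tried without any API access.
train, val = split_puzzles(list(range(20)))
print(len(train), len(val))  # 5 5
```

Calling `optimize_chess_solver(ChessSolver, metric_with_feedback, puzzles)` would then return the optimized program, whose instructions can be inspected to read the evolved prompt.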