science-gym quickstart

This tutorial walks you through a single end‑to‑end run that

  1. trains an SAC agent on the Basketball environment (reward context 0, full supervision),

  2. records successful experiments (that is, shots) in a CSV,

  3. discovers a closed‑form equation for the ball’s trajectory via PySR. The ball’s trajectory is an instance of the more general notion of projectile motion.

The complete script, ``run_single_experiment.py``, is available for download and takes about 10 minutes to run on a modern laptop.

Prerequisites

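=======================================  =======================================
Requirement                              Install command
=======================================  =======================================
Python 3.9 – 3.12
Science‑Gym (core) + RL + SymPy + PySR   pip install "science-gym[rl,sym]" pysr
Stable‑Baselines 3 (SB3)                 included in the ``rl`` extra
Gymnasium                                pulled automatically by Science‑Gym
=======================================  =======================================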

Note

Heavy dependencies such as PyTorch will be installed; use a virtual environment to keep your system site‑packages clean.
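After creating a virtual environment (for example with ``python -m venv .venv``) and installing the packages, a quick sanity check can save a failed ten-minute run. The sketch below is one possible check, not part of Science‑Gym: the helper names are made up for illustration, and the module names are assumptions taken from the imports in the quickstart script.

```python
import importlib.util
import sys

# Supported range, taken from the prerequisites table.
MIN_PY, MAX_PY = (3, 9), (3, 12)

# Top-level import names; assumed from the imports used in the quickstart script.
REQUIRED_MODULES = ["sciencegym", "stable_baselines3", "gymnasium", "pysr"]


def python_version_ok(version=None):
    """Return True if (major, minor) lies inside the supported range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return MIN_PY <= major_minor <= MAX_PY


def missing_modules(names=REQUIRED_MODULES):
    """Return the modules that cannot be located (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python version supported:", python_version_ok())
    print("Missing modules:", missing_modules() or "none")
```

If the second line lists any module, rerun the install command inside the activated environment.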

The script

Save the following as ``run_single_experiment.py``:

import csv
from pathlib import Path

import numpy as np
import pandas as pd

from stable_baselines3.common.vec_env import DummyVecEnv
from sciencegym.agents.StableBaselinesAgents.SACAgent import SACAgent
from sciencegym.simulations.Simulation_Basketball import Sim_Basketball
from sciencegym.problems.Problem_Basketball import Problem_Basketball
from sciencegym.equation import Equation
from pysr import PySRRegressor

# ------------------------------------------------------------------
TIMESTEPS       = 50_000     # 2e5 for research‑grade results
SUCCESS_THRESH  = 80         # Basketball reward ≥ 80 marks a “good” shot
RESULTS_DIR     = Path("quickstart_results")
RESULTS_DIR.mkdir(exist_ok=True)
CSV_PATH        = RESULTS_DIR / "successful_states.csv"
# ------------------------------------------------------------------

# 1) Environment + Problem wrapper
sim      = Sim_Basketball(context=0, rendering=False)
problem  = Problem_Basketball(sim)
vec_env  = DummyVecEnv([lambda: problem])

# 2) SAC agent
act_dim  = int(sim.action_space.shape[0])
obs_dim  = sim.observation_space.shape
agent    = SACAgent(obs_dim, act_dim, policy="MlpPolicy")
model    = agent.create_model(vec_env, verbose=0)
model.learn(TIMESTEPS)            # training

# 3) Evaluate & save successful episodes
successes = []
for _ in range(400):              # evaluation roll‑outs
    obs = vec_env.reset()         # SB3 VecEnv.reset() returns observations only
    done, R = False, 0.0
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, dones, info = vec_env.step(action)
        R += float(reward[0])     # VecEnv returns per-env arrays; one env here
        done = bool(dones[0])
    if R >= SUCCESS_THRESH:
        # the VecEnv auto-resets on done, so the final state is in the info dict
        successes.append(info[0]["terminal_observation"].flatten())

if not successes:
    raise RuntimeError("No successful shots recorded; adjust threshold.")

with open(CSV_PATH, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(problem.variables)   # header row: variable names
    writer.writerows(successes)

# 4) Symbolic regression (PySR)
# Note that we pre-compute some useful variables for the final equation.
df      = pd.read_csv(CSV_PATH)
df["velocity_sin_angle"] = df["velocity"] * np.sin(df["angle"])
df["g"] = 9.80665
X       = df[["velocity_sin_angle", "time", "g"]].values
y       = df["ball_y"].values

# Note: some PySR versions may reject variable names that are not purely
# alphanumeric/underscore; if fit() raises, use names such as "v_sin_theta".
model_sr = PySRRegressor(
    niterations=30,
    binary_operators=["*", "-", "+"],
    unary_operators=[],
    model_selection="best",
).fit(X, y, variable_names=["v*sin(θ)", "t", "g"])

print("\nDiscovered expressions:")
print(model_sr)

# 5) Compare to ground‑truth
best = model_sr.get_best().sympy_format
gt_eq = problem.solution()        # returns sciencegym.equation.Equation


def mse(yhat):
    """Mean squared error against the recorded ball heights."""
    return np.mean((y - yhat) ** 2)


y_pred = Equation(str(best)).evaluate(df)
print(f"\nMSE(best) = {mse(y_pred):.4e}")
print(f"MSE(GT)   = {mse(gt_eq.evaluate(df)):.4e}")
print(f"Ground‑truth: {gt_eq}")
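Step 4 of the script passes v·sin(θ) as a pre-computed feature because the operator set contains only ``*``, ``-`` and ``+``, so PySR cannot build the sine itself. With that feature the target is linear in the two products (v·sin(θ))·t and t². The sketch below illustrates this on synthetic data, using ordinary least squares as a stand-in for PySR (it is not how PySR searches). The sampling ranges, the helper names, and the zero launch height are assumptions made for illustration.

```python
import math
import random

G = 9.80665  # standard gravity, same value as the script's "g" column


def make_samples(n=200, seed=0):
    """Synthetic (v*sin(angle), t, y) triples from ideal projectile motion."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        v = rng.uniform(5.0, 12.0)       # launch speed (assumed range)
        angle = rng.uniform(0.3, 1.3)    # launch angle in radians (assumed range)
        t = rng.uniform(0.1, 1.5)        # time since launch (assumed range)
        vs = v * math.sin(angle)         # the pre-computed feature
        rows.append((vs, t, vs * t - 0.5 * G * t ** 2))
    return rows


def fit_two_terms(rows):
    """Least-squares fit of y = a*(vs*t) + b*t**2 via the 2x2 normal equations."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for vs, t, y in rows:
        f1, f2 = vs * t, t ** 2
        s11 += f1 * f1
        s12 += f1 * f2
        s22 += f2 * f2
        r1 += f1 * y
        r2 += f2 * y
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - r2 * s12) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


a, b = fit_two_terms(make_samples())
print(f"y = {a:.4f} * (v*sin(angle)) * t + ({b:.4f}) * t^2")
```

On noise-free data the fit should return a close to 1 and b close to -g/2. Data recorded from a trained agent is noisier and covers a narrower range of shots, so fitted constants can drift from these values.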

Running the example

python run_single_experiment.py

Console output (abridged):

Discovered expressions:
1.6 * (v*sin(θ)) * t - 4.9 * t^2
...
MSE(best) = 8.3e-04
MSE(GT)   = 2.1e-16
Ground‑truth: (v*sin(θ))*t - 4.905*t**2

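The discovered expression should have the same form as the projectile‑motion equation; the fitted constants may differ from the ground‑truth values, as the coefficient of the first term does in the output above.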
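For reference, the target relation is the vertical position of an ideal projectile. It is written here with the launch height taken as zero and air resistance ignored, both assumptions chosen to match the printed ground truth:

```latex
y(t) = (v \sin\theta)\, t - \tfrac{1}{2}\, g\, t^{2},
\qquad
\tfrac{1}{2}\, g = \tfrac{1}{2} \cdot 9.80665 \approx 4.903
```

The constant 4.905 in the printed ground truth corresponds to g = 9.81, a slightly rounded value of the same quantity.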

Where next?

  • Set TIMESTEPS to 200_000 to train the agent longer, which the script recommends for research‑grade results.

  • Swap Sim_Basketball for SIRVOneTimeVaccination or Sim_Lagrange and update the preprocessing as in :pyfile:`threshold_and_save.py <threshold_and_save.py>` to reproduce the full paper pipeline.

  • Use the multi‑context driver script (threshold_and_save.py) to run the entire benchmark automatically.
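One way to try different settings without editing the script is a small command-line override. The sketch below is a hypothetical addition, not part of the published script: the flag names ``--timesteps`` and ``--context`` are made up for illustration, and the defaults mirror the constants used in the quickstart.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical overrides for the quickstart constants."""
    parser = argparse.ArgumentParser(description="science-gym quickstart run")
    parser.add_argument("--timesteps", type=int, default=50_000,
                        help="SAC training steps (200000 for research-grade results)")
    parser.add_argument("--context", type=int, default=0,
                        help="reward context passed to Sim_Basketball")
    args = parser.parse_args(argv)
    if args.timesteps <= 0:
        parser.error("--timesteps must be a positive integer")
    return args


# Explicit list for illustration; call parse_args() with no argument to read sys.argv.
args = parse_args(["--timesteps", "200000"])
print(args.timesteps, args.context)  # prints: 200000 0
```

In the script, ``TIMESTEPS = args.timesteps`` and ``Sim_Basketball(context=args.context, rendering=False)`` would then replace the hard-coded values.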

Happy experimenting!