science-gym quickstart
This tutorial walks you through a single end‑to‑end run that:
- trains an SAC agent on the Basketball environment (reward context 0, full supervision),
- records successful experiments (that is, shots) in a CSV file,
- discovers a closed‑form equation for the ball’s trajectory via PySR.
The ball’s trajectory is an instance of the more general notion of projectile motion.
The complete script is available for download as ``run_single_experiment.py`` and takes about 10 minutes to run on a modern laptop.
Prerequisites
Requirement | Install command |
---|---|
Python 3.9 – 3.12 | – |
Science‑Gym (core) + RL + SymPy + PySR | |
Stable‑Baselines 3 (SB3) | included in the ``rl`` extra |
Gymnasium | pulled automatically by Science‑Gym |
Note
Heavy dependencies such as PyTorch will be installed; use a virtual environment to keep your system site‑packages clean.
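Before running the script, it can help to confirm that the interpreter and packages match the table above. The following is a minimal sketch, not part of Science‑Gym: the helper names are made up for illustration, and the module list is an assumption based on the imports in the script plus the usual import names ``gymnasium`` and ``torch``.

```python
import importlib.util
import sys

# Assumed top-level module names (from the script's imports, plus gymnasium and torch).
REQUIRED_MODULES = ["sciencegym", "stable_baselines3", "gymnasium", "pysr", "torch"]


def python_version_supported(version=None):
    """Return True if the version falls in the documented 3.9 to 3.12 range."""
    major, minor = tuple(version or sys.version_info)[:2]
    return major == 3 and 9 <= minor <= 12


def missing_modules(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def in_virtualenv():
    """Return True when the interpreter runs inside a venv or virtualenv."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Python version supported:", python_version_supported())
print("Inside a virtual environment:", in_virtualenv())
print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty "Missing modules" list suggests the installation is complete; it does not verify package versions.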
The script
Save the following as ``run_single_experiment.py``:
import csv
from pathlib import Path

import numpy as np
import pandas as pd

from stable_baselines3.common.vec_env import DummyVecEnv
from sciencegym.agents.StableBaselinesAgents.SACAgent import SACAgent
from sciencegym.simulations.Simulation_Basketball import Sim_Basketball
from sciencegym.problems.Problem_Basketball import Problem_Basketball
from sciencegym.equation import Equation
from pysr import PySRRegressor

# ------------------------------------------------------------------
TIMESTEPS = 50_000       # 2e5 for research‑grade results
SUCCESS_THRESH = 80      # Basketball reward ≥ 80 marks a “good” shot
RESULTS_DIR = Path("quickstart_results")
RESULTS_DIR.mkdir(exist_ok=True)
CSV_PATH = RESULTS_DIR / "successful_states.csv"
# ------------------------------------------------------------------

# 1) Environment + Problem wrapper
sim = Sim_Basketball(context=0, rendering=False)
problem = Problem_Basketball(sim)
vec_env = DummyVecEnv([lambda: problem])

# 2) SAC agent
act_dim = int(sim.action_space.shape[0])
obs_dim = sim.observation_space.shape
agent = SACAgent(obs_dim, act_dim, policy="MlpPolicy")
model = agent.create_model(vec_env, verbose=0)
model.learn(TIMESTEPS)  # training

# 3) Evaluate & save successful episodes
successes = []
for _ in range(400):  # evaluation roll‑outs
    obs = vec_env.reset()  # an SB3 VecEnv returns only the observations
    done, R = False, 0.0
    while not done:
        action, _ = model.predict(obs, deterministic=True)
        # step() returns one entry per wrapped env; we have exactly one
        obs, rewards, dones, infos = vec_env.step(action)
        R += float(rewards[0])
        done = bool(dones[0])
    if R >= SUCCESS_THRESH:
        # the VecEnv auto-resets on done, so read the final state from info
        successes.append(infos[0]["terminal_observation"].flatten())

if not successes:
    raise RuntimeError("No successful shots recorded; adjust threshold.")

with open(CSV_PATH, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(problem.variables)
    writer.writerows(successes)

# 4) Symbolic regression (PySR)
# Note that we pre-compute some useful variables for the final equation.
df = pd.read_csv(CSV_PATH)
df["velocity_sin_angle"] = df["velocity"] * np.sin(df["angle"])
df["g"] = 9.80665
X = df[["velocity_sin_angle", "time", "g"]].values
y = df["ball_y"].values

# If your PySR version rejects these variable names, use identifiers
# made of letters, digits, and underscores instead.
model_sr = PySRRegressor(
    niterations=30,
    binary_operators=["*", "-", "+"],
    unary_operators=[],
    model_selection="best",
).fit(X, y, variable_names=["v*sin(θ)", "t", "g"])

print("\nDiscovered expressions:")
print(model_sr)

# 5) Compare to ground‑truth
best = model_sr.get_best().sympy_format
gt_eq = problem.solution()  # returns sciencegym.equation.Equation


def mse(yhat):
    """Mean squared error against the recorded ball heights."""
    return np.mean((y - yhat) ** 2)


y_pred = Equation(str(best)).evaluate(df)
print(f"\nMSE(best) = {mse(y_pred):.4e}")
print(f"MSE(GT) = {mse(gt_eq.evaluate(df)):.4e}")
print(f"Ground‑truth: {gt_eq}")
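Step 3 of the script boils down to thresholding episode returns and writing the surviving final states to a CSV file. The sketch below isolates that logic with the standard library only, so it runs without Science‑Gym installed. The column names and the episode data are assumptions made up for illustration; in the script the header comes from ``problem.variables`` and the rows from the evaluation roll‑outs.

```python
import csv
import io

SUCCESS_THRESH = 80  # same threshold as in the script

# Assumed column names; the real header comes from problem.variables.
COLUMNS = ["velocity", "angle", "time", "ball_y"]


def filter_successes(episodes, threshold=SUCCESS_THRESH):
    """Keep the final state of each episode whose return reaches the threshold."""
    return [state for total_reward, state in episodes if total_reward >= threshold]


def write_states(handle, columns, rows):
    """Write a header row followed by one row per successful final state."""
    writer = csv.writer(handle)
    writer.writerow(columns)
    writer.writerows(rows)


# Made-up (return, final state) pairs standing in for evaluation roll-outs.
episodes = [
    (95.0, [7.0, 0.8, 1.0, 0.1]),
    (12.5, [3.0, 0.2, 0.4, 0.0]),
    (80.0, [6.5, 0.9, 1.1, 0.2]),
]

buffer = io.StringIO()  # in-memory stand-in for the CSV file on disk
write_states(buffer, COLUMNS, filter_successes(episodes))
print(buffer.getvalue())
```

Only the first and third episodes reach the threshold, so the output holds the header and two data rows.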
Running the example
python run_single_experiment.py
Console output (abridged):
Discovered expressions:
1.6 * (v*sin(θ)) * t - 4.9 * t^2
...
MSE(best) = 8.3e-04
MSE(GT) = 2.1e-16
Ground‑truth: (v*sin(θ))*t - 4.905*t**2
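You should be able to recover the equation for projectile motion up to constant coefficients: the discovered expression has the same terms as the ground truth, but the fitted constants may differ.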
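To see what the target looks like, recall that the height of a projectile above its launch point is y = v·sin(θ)·t − g·t²/2. The sketch below is not PySR and does not use Science‑Gym: it generates noiseless synthetic shots (the sample values are made up) and fits the two coefficients of that fixed form by ordinary least squares, which is the easy case where the structure is already known. PySR has to find the structure as well, which is why its constants can come out differently.

```python
import math

G = 9.80665  # standard gravity, the same value the script uses


def height(v, theta, t, g=G):
    """Projectile height above the launch point: v*sin(theta)*t - g*t**2/2."""
    return v * math.sin(theta) * t - 0.5 * g * t * t


def fit_two_terms(samples):
    """Least-squares fit of y = a*(v*sin(theta)*t) + b*t**2 via 2x2 normal equations."""
    s11 = s12 = s22 = r1 = r2 = 0.0
    for v, theta, t, y in samples:
        f1 = v * math.sin(theta) * t  # first feature
        f2 = t * t                    # second feature
        s11 += f1 * f1
        s12 += f1 * f2
        s22 += f2 * f2
        r1 += f1 * y
        r2 += f2 * y
    det = s11 * s22 - s12 * s12
    a = (r1 * s22 - s12 * r2) / det
    b = (s11 * r2 - s12 * r1) / det
    return a, b


def mse(samples, a, b):
    """Mean squared error of the fitted two-term model on the samples."""
    errors = [
        (y - (a * v * math.sin(theta) * t + b * t * t)) ** 2
        for v, theta, t, y in samples
    ]
    return sum(errors) / len(errors)


# Made-up launch speeds (m/s), angles (rad), and flight times (s).
samples = [
    (v, theta, t, height(v, theta, t))
    for v in (5.0, 6.0, 7.0, 8.0)
    for theta in (0.6, 0.8, 1.0)
    for t in (0.2, 0.5, 0.9, 1.3)
]

a, b = fit_two_terms(samples)
print(f"a = {a:.6f}, b = {b:.6f}, MSE = {mse(samples, a, b):.3e}")
```

On noiseless data the fit returns a close to 1 and b close to −g/2 (about −4.903), with an error near machine precision.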
Where next?
- Replace ``TIMESTEPS`` with ``200_000`` to train longer, as recommended for research‑grade results.
- Switch ``Sim_Basketball`` → ``SIRVOneTimeVaccination`` or ``Sim_Lagrange`` and update the preprocessing as in ``threshold_and_save.py`` to reproduce the full paper pipeline.
- Use the multi‑context driver script (``threshold_and_save.py``) to run the entire benchmark automatically.
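If you change ``TIMESTEPS`` often, one option is to read it from the command line instead of editing the file. This is a hypothetical sketch using only ``argparse``: the flags ``--timesteps`` and ``--success-thresh`` are made up for illustration and are not provided by ``run_single_experiment.py`` or ``threshold_and_save.py``.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical command-line overrides for the quickstart constants."""
    parser = argparse.ArgumentParser(description="Quickstart run settings (illustrative).")
    parser.add_argument(
        "--timesteps", type=int, default=50_000,
        help="number of SAC training steps (the tutorial suggests 200000 for research-grade runs)",
    )
    parser.add_argument(
        "--success-thresh", type=float, default=80.0,
        help="minimum episode return for a shot to count as successful",
    )
    return parser.parse_args(argv)


# Passing an explicit list keeps the example independent of the real command line.
args = parse_args(["--timesteps", "200000"])
print(f"timesteps={args.timesteps}, success_thresh={args.success_thresh}")
```

In the script, the parsed values would replace the module-level constants, for example ``TIMESTEPS = args.timesteps``.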
Happy experimenting!