# vacuum

Support code for defining the discrete states and discrete actions of a vacuum cleaning robot.

## State

```python
from fastcore.test import test_eq, test_close  # test helpers used throughout

# rooms is exported by this module
test_eq(rooms, ['Living Room', 'Kitchen', 'Office', 'Hallway', 'Dining Room'])
```

## Actions

```python
# action_space is exported by this module
test_eq(action_space, ['L', 'R', 'U', 'D'])
```
```python
import gtsam
import numpy as np
from gtbook.discrete import Variables  # assumed import path within the gtbook package
from gtbook.display import pretty      # assumed import path within the gtbook package

VARIABLES = Variables()
X = VARIABLES.discrete_series("X", [1, 2, 3], rooms)  # states for times 1, 2 and 3
A = VARIABLES.discrete_series("A", [1, 2], action_space)  # actions for times 1 and 2
motion_model = gtsam.DiscreteConditional(X[2], [X[1], A[1]], action_spec)
pretty(motion_model)
```
P(X2|X1,A1):

| X1 | A1 | 0 | 1 | 2 | 3 | 4 |
|----|----|-----|-----|-----|-----|-----|
| 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0.2 | 0.8 | 0 | 0 | 0 |
| 0 | 2 | 1 | 0 | 0 | 0 | 0 |
| 0 | 3 | 0.2 | 0 | 0 | 0.8 | 0 |
| 1 | 0 | 0.8 | 0.2 | 0 | 0 | 0 |
| 1 | 1 | 0 | 1 | 0 | 0 | 0 |
| 1 | 2 | 0 | 1 | 0 | 0 | 0 |
| 1 | 3 | 0 | 0.2 | 0 | 0 | 0.8 |
| 2 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 1 | 0 | 0 | 0.2 | 0.8 | 0 |
| 2 | 2 | 0 | 0 | 1 | 0 | 0 |
| 2 | 3 | 0 | 0 | 1 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0.8 | 0.2 | 0 |
| 3 | 1 | 0 | 0 | 0 | 0.2 | 0.8 |
| 3 | 2 | 0.8 | 0 | 0 | 0.2 | 0 |
| 3 | 3 | 0 | 0 | 0 | 1 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0.8 | 0.2 |
| 4 | 1 | 0 | 0 | 0 | 0 | 1 |
| 4 | 2 | 0 | 0.8 | 0 | 0 | 0.2 |
| 4 | 3 | 0 | 0 | 0 | 0 | 1 |
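To read one row of this table: starting in the Living Room (state 0) and applying action 'R' (action 1), the robot ends up in the Kitchen (state 1) with probability 0.8 and stays put with probability 0.2:

$$P(X_2 = 1 \mid X_1 = 0, A_1 = \text{R}) = 0.8.$$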
## Sensing

```python
test_eq(sensor_spec, '1/1/8 1/1/8 2/7/1 8/1/1 1/8/1')
```
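The spec string holds five groups of three relative weights, one group per room, so it can define a three-outcome sensor model P(Z1|X1) the same way `action_spec` defined the motion model; GTSAM normalizes each group such as `1/1/8` into a probability row. A minimal sketch with placeholder outcome names (the actual sensor labels are introduced in the book chapter):

```python
sensor_values = ["level0", "level1", "level2"]  # placeholder names, not from the module
Z = VARIABLES.discrete_series("Z", [1], sensor_values)  # sensor reading at time 1
sensor_model = gtsam.DiscreteConditional(Z[1], [X[1]], sensor_spec)
pretty(sensor_model)
```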
## RL

### calculate_value_function

```python
calculate_value_function(R: np.array, T: np.array, pi: np.array, gamma=0.9)
```

Calculate the value function for a given policy.

|  | Type | Default | Details |
|---|---|---|---|
| R | array |  | reward function, as a tensor |
| T | array |  | transition probabilities, as a tensor |
| pi | array |  | policy, as a vector |
| gamma | float | 0.9 | discount factor |
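For a fixed policy $\pi$, the returned value function satisfies the Bellman equation for policy evaluation, with rewards indexed as $R(x, a, x')$ to match the tensors used below:

$$V^\pi(x) = \sum_{x'} T\bigl(x, \pi(x), x'\bigr)\,\bigl[R\bigl(x, \pi(x), x'\bigr) + \gamma\, V^\pi(x')\bigr].$$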
### calculate_value_system

```python
calculate_value_system(R: np.array, T: np.array, pi: np.array, gamma=0.9)
```

Calculate the A, b matrices of the linear system for value computation.

|  | Type | Default | Details |
|---|---|---|---|
| R | array |  | reward function, as a tensor |
| T | array |  | transition probabilities, as a tensor |
| pi | array |  | policy, as a vector |
| gamma | float | 0.9 | discount factor |
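Written out, the linear system is $(I - \gamma T^\pi)\,V^\pi = b$, where $T^\pi_{x,x'} = T(x, \pi(x), x')$ and $b_x = \sum_{x'} T(x, \pi(x), x')\,R(x, \pi(x), x')$. The returned pair is $A = I - \gamma T^\pi$ and $b$, which the tests below confirm numerically.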
```python
# From section 3.5: build the reward tensor R[x, a, y] and transition
# tensor T[x, a, y] by enumerating the motion model's conditional.
conditional = gtsam.DiscreteConditional((2, 5), [(0, 5), (1, 4)], action_spec)
R = np.empty((5, 4, 5), float)
T = np.empty((5, 4, 5), float)
for assignment, value in conditional.enumerate():
    x, a, y = assignment[0], assignment[1], assignment[2]
    R[x, a, y] = 10.0 if y == rooms.index("Living Room") else 0.0
    T[x, a, y] = value

test_eq(R[2, 1], [10, 0, 0, 0, 0])
```
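As a quick extra sanity check (not one of the module's own tests), every (state, action) slice of `T` should be a probability distribution over next states:

```python
# Each row of the transition tensor must sum to one over next states y.
test_close(T.sum(axis=-1), np.ones((5, 4)))
```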
Calculating the value function of a given policy pi:

```python
reasonable_policy = [2, 1, 0, 2, 1]
AA, b = calculate_value_system(R, T, reasonable_policy)
test_close(
    AA,
    np.array(
        [
            [0.1, 0, 0, 0, 0],
            [0, 0.1, 0, 0, 0],
            [0, 0, 0.1, 0, 0],
            [-0.72, 0, 0, 0.82, 0],
            [0, 0, 0, 0, 0.1],
        ]
    ),
)
test_close(b, np.array([10, 0, 0, 8, 0]))
```
```python
value_for_pi = calculate_value_function(R, T, reasonable_policy)
test_close(value_for_pi, np.array([100, 0, 0, 97.56097561, 0]))
```
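The same value function can be cross-checked without solving a linear system, by iterating the Bellman backup until it reaches its fixed point. This is a hypothetical verification sketch, not part of the module:

```python
# Iterative policy evaluation: V(x) <- sum_y T(x,a,y) * (R(x,a,y) + 0.9 * V(y)).
V = np.zeros(5)
for _ in range(200):  # with gamma = 0.9, 200 sweeps shrink the error well below 1e-5
    V = np.array([T[x, a] @ (R[x, a] + 0.9 * V) for x, a in enumerate(reasonable_policy)])
test_close(V, value_for_pi)
```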
```python
optimal_policy = [0, 0, 1, 2, 2]
value_for_pi = calculate_value_function(R, T, optimal_policy)
test_close(
    value_for_pi,
    np.array([100, 97.56097561, 85.66329566, 97.56097561, 85.66329566]),
)
```
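Since an optimal policy is greedy with respect to its own value function, the action values derived from `value_for_pi` should attain that same value function when maximized over actions. Another hypothetical check, not from the module:

```python
# Q(x, a) = sum_y T(x, a, y) * (R(x, a, y) + 0.9 * V(y)); at an optimum, max_a Q(x, a) = V(x).
Q = np.einsum('xay,xay->xa', T, R + 0.9 * value_for_pi)
test_close(Q.max(axis=1), value_for_pi)
```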