#OFN Key Concepts of Reward Maximization
Agent and Environment Interaction:
The AI system (agent) interacts with its environment, receiving feedback in the form of rewards for its actions.
Example in #OpenfabricAI: A trading bot in a financial market takes actions (buy, sell, hold) and receives profits or losses as rewards.
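The interaction loop can be sketched in a few lines of Python. This is a minimal illustration, not a real trading system: the `ToyMarket` class, the `run_episode` helper, the price list, and the reward rule (price change multiplied by the position held) are all hypothetical assumptions made for the example.

```python
from typing import Callable, List


class ToyMarket:
    """Hypothetical environment: the agent observes the current price."""

    def __init__(self, prices: List[float]) -> None:
        self.prices = prices
        self.t = 0          # current time step
        self.position = 0   # units held: 0 or 1

    def observe(self) -> float:
        return self.prices[self.t]

    def done(self) -> bool:
        return self.t >= len(self.prices) - 1

    def step(self, action: str) -> float:
        """Apply the action, advance one step, return the reward."""
        if action == "buy":
            self.position = 1
        elif action == "sell":
            self.position = 0
        # "hold" leaves the position unchanged.
        # Assumed reward: profit or loss on the held position.
        reward = self.position * (self.prices[self.t + 1] - self.prices[self.t])
        self.t += 1
        return reward


def run_episode(env: ToyMarket, policy: Callable[[float], str]) -> List[float]:
    """Run the agent-environment loop and collect the rewards."""
    rewards = []
    while not env.done():
        action = policy(env.observe())
        rewards.append(env.step(action))
    return rewards


# An agent that always buys, on a made-up price series.
rewards = run_episode(ToyMarket([100.0, 102.0, 101.0, 105.0]), lambda price: "buy")
```

Each pass through the loop is one interaction: the agent observes a state, picks an action, and the environment answers with a reward.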
Reward Function:
A function that maps each action taken in a given state to a numerical reward.
Example: In a recommendation engine, a reward could be assigned based on whether a user clicks on a suggested item or makes a purchase.
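A reward function for the recommendation example might look like the sketch below. The function name `recommendation_reward` and the numeric values (1.0 for a purchase, 0.1 for a click, 0.0 otherwise) are illustrative assumptions, not values from any real system.

```python
def recommendation_reward(clicked: bool, purchased: bool) -> float:
    """Map the user's response to a suggested item to a numerical reward.

    The reward values are assumptions chosen for illustration only.
    """
    if purchased:
        return 1.0   # strongest signal: the user bought the item
    if clicked:
        return 0.1   # weaker signal: the user showed interest
    return 0.0       # no engagement


r = recommendation_reward(clicked=True, purchased=False)
```

How these values are chosen matters in practice, because the agent will optimize for whatever the reward function measures.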
Cumulative Reward:
The goal is to maximize not the immediate reward but the total expected reward over time.
Formula:
$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots$$
Here, G_t is the total return starting from time step t, R_{t+k} is the reward received k steps after t, and γ (gamma, with 0 ≤ γ ≤ 1) is the discount factor controlling the importance of future rewards.
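For a finite list of rewards, the return G_t can be computed with a short backward pass. This is a minimal sketch; the function name `discounted_return` and the sample rewards are assumptions for illustration.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Compute G_t = R_{t+1} + gamma * R_{t+2} + gamma^2 * R_{t+3} + ...

    `rewards` holds R_{t+1}, R_{t+2}, ... in order.
    """
    g = 0.0
    # Work backwards so each step applies G = R + gamma * G_next.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)  # 1 + 0.5 + 0.25 = 1.75
```

With γ = 0 only the first reward counts, and with γ = 1 all rewards count equally, which is how γ controls the weight given to the future.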
Policy (π):
A strategy that maps each state to the action to take. The aim is to find a policy that maximizes cumulative reward.
Example: A chatbot's policy determines how to respond to user inputs to keep users engaged and satisfied.
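One simple way to represent a policy in code is to pick, in each state, the action with the highest estimated value. The sketch below assumes such estimates are already available; the `Q` table, its state and action names, and its numbers are hypothetical and only serve the chatbot illustration.

```python
from typing import Dict

# Hypothetical estimates of long-term reward for each (state, action) pair.
Q: Dict[str, Dict[str, float]] = {
    "greeting": {"greet_back": 0.9, "ask_topic": 0.6},
    "complaint": {"apologize": 0.8, "greet_back": 0.1},
}


def greedy_policy(q: Dict[str, Dict[str, float]], state: str) -> str:
    """Return the action with the highest estimated value in this state."""
    actions = q[state]
    return max(actions, key=actions.get)


action = greedy_policy(Q, "complaint")
```

A purely greedy policy never tries alternatives, so practical systems usually mix in some exploration. That is outside the scope of this sketch.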