Function Approximators For Solving Reinforcement Learning Problems

Choosing a function approximator is the most important decision in how you solve a reinforcement learning problem.

It determines how you represent:

  • The State-Value Function v(s)
  • The Action-Value Function q(s, a)
  • The Policy π(a|s)
  • The Transition Model p(s'|s,a)

These are the key functions we approximate when solving a reinforcement learning problem.

Tabular Representations

Appropriate use cases: small, discrete state and action spaces (gridworlds, bandits). A minimal tabular Q-learning sketch follows the lists below.

Pros:

  • Exact
  • Simple
  • Converges with theoretical guarantees

Cons:

  • Infeasible for large or continuous state spaces
  • No generalization across states
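
As promised above, here is a minimal tabular Q-learning sketch for a small discrete environment. The environment interface (`env.reset()` and `env.step(a)` returning `(next_state, reward, done)`) and the hyperparameters are illustrative assumptions, not a specific library's API.

```python
import numpy as np

def tabular_q_learning(env, n_states, n_actions,
                       episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular Q-learning: one exact table cell per (state, action) pair."""
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # Epsilon-greedy action selection
            if np.random.rand() < epsilon:
                a = np.random.randint(n_actions)
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # Update toward the bootstrapped one-step target
            target = r + gamma * (0.0 if done else np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```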

Linear Function Approximators

Used heavily in classic RL (Sutton & Barto, tile coding, linear TD/SARSA methods).

Feature types:

  • Tile coding (very popular)
  • Radial basis functions
  • Fourier basis
  • Polynomial features

Linear approximators work well for control tasks like Mountain Car, Acrobot, and Lunar Lander when paired with SARSA, TD learning, or actor–critic methods.
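
For example, a linear state-value approximator represents v̂(s) = wᵀφ(s) and updates the weights with semi-gradient TD(0). The sketch below uses a simple Fourier basis over a one-dimensional state scaled to [0, 1]; the basis order and step size are illustrative assumptions.

```python
import numpy as np

def fourier_features(s, order=3):
    """Fourier basis: phi_i(s) = cos(i * pi * s) for a 1-D state s scaled to [0, 1]."""
    return np.cos(np.pi * np.arange(order + 1) * s)

def td0_update(w, s, r, s_next, done, alpha=0.05, gamma=0.99, order=3):
    """One semi-gradient TD(0) step for a linear value function v(s) = w . phi(s)."""
    phi = fourier_features(s, order)
    v = w @ phi
    v_next = 0.0 if done else w @ fourier_features(s_next, order)
    td_error = r + gamma * v_next - v
    # For a linear approximator the gradient of v w.r.t. w is simply phi
    return w + alpha * td_error * phi
```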

Neural Networks (Deep Function Approximation)

When state spaces are large, high-dimensional, or continuous, neural networks are the standard choice.

Types:

a) Multilayer Perceptrons (MLPs)

Used in the following algorithms (a minimal Q-network sketch follows this list):

  • DQN (low-dimensional observations)
  • A2C
  • PPO
  • DDPG
  • TD3
  • SAC
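
As a concrete example, a small MLP Q-network of the kind used in DQN-style agents on low-dimensional observations might look like the following PyTorch sketch; the layer sizes and dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MLPQNetwork(nn.Module):
    """Maps a flat observation vector to one Q-value per discrete action."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)

# Greedy action selection from a batch of observations
q_net = MLPQNetwork(obs_dim=8, n_actions=4)
obs = torch.randn(1, 8)
action = q_net(obs).argmax(dim=-1)
```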

b) Convolutional Neural Networks (CNNs)

Used in:

  • Atari
  • Robotic vision
  • Autonomous driving simulation

c) Recurrent Neural Networks (RNNs: LSTM / GRU)

Used when the environment is partially observable (a POMDP).
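
A minimal sketch of a recurrent policy head: a GRU carries a hidden state across timesteps so the agent can aggregate partial observations into a summary of the history. Sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """GRU over observation sequences; the hidden state summarizes history in a POMDP."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs_seq, h=None):
        # obs_seq: (batch, time, obs_dim); h carries memory between calls
        out, h = self.gru(obs_seq, h)
        return self.head(out), h
```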

d) Transformers

Used in:

  • Decision Transformers
  • Gato
  • World-model RL (like "Ghost" / large agent models)

Model-Based Approximators

Function approximators used to learn transition or reward models:

a) Neural networks

Learn f(s, a) → s' and r(s, a)
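
A minimal sketch of such a learned dynamics model: a single MLP takes (s, a) and predicts both the next state and the reward. The architecture and sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """Learns f(s, a) -> (s', r) from transition data."""
    def __init__(self, obs_dim, act_dim, hidden=200):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.next_state = nn.Linear(hidden, obs_dim)
        self.reward = nn.Linear(hidden, 1)

    def forward(self, s, a):
        h = self.body(torch.cat([s, a], dim=-1))
        return self.next_state(h), self.reward(h)
```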

b) Probabilistic models

  • Gaussian mixtures
  • Ensembles (as in MBPO, PETS; a minimal ensemble sketch follows the list below)
  • Bayesian neural networks

Used in:

  • Dreamer / World Models
  • Model-based control (MPC + learned model)
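
Ensembles of dynamics models are a common way to capture model uncertainty in PETS/MBPO-style methods: train several independently initialized models and treat disagreement between their predictions as an uncertainty signal. The sketch below reuses the hypothetical `DynamicsModel` class from the earlier example; the ensemble size and dimensions are assumptions.

```python
import torch

# Hypothetical ensemble over the DynamicsModel sketch above
models = [DynamicsModel(obs_dim=4, act_dim=1) for _ in range(5)]

def predict_with_uncertainty(s, a):
    """Mean next-state prediction plus ensemble disagreement as a crude uncertainty estimate."""
    # s: (batch, obs_dim), a: (batch, act_dim) tensors
    preds = torch.stack([m(s, a)[0] for m in models])  # (ensemble, batch, obs_dim)
    return preds.mean(dim=0), preds.std(dim=0)
```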

Hybrid Approximators

Examples:

  • Tile coding for input → neural network for output
  • Fourier basis → critic
  • CNN encoder → MLP policy (sketched at the end of this section)

Useful when:

  • The raw observation is large or unstructured (e.g., images)
  • But the structure inside it is simple (angles, positions)
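
A minimal sketch of the CNN-encoder → MLP-policy pattern: a small convolutional encoder compresses image observations into a feature vector, and an MLP head outputs action logits. The input shape (84x84 grayscale frames) and layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CNNEncoderPolicy(nn.Module):
    """CNN encoder over 84x84 grayscale frames feeding an MLP policy head."""
    def __init__(self, n_actions):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.policy = nn.Sequential(
            nn.Linear(64 * 7 * 7, 256), nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, frames):
        # frames: (batch, 1, 84, 84) -> action logits
        return self.policy(self.encoder(frames))
```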