Recent studies show that deep reinforcement learning (DRL) agents tend to overfit to the task on which they were trained and fail to adapt to minor environment changes. To expedite learning when transferring to unseen tasks, we propose a novel approach to representing the current task using reward machines (RMs), state machine abstractions that induce subtasks based on the current task’s rewards and dynamics. Our method provides agents with symbolic representations of optimal transitions from their current abstract state and rewards them for achieving these transitions. These representations are shared across tasks, allowing agents to exploit knowledge of previously encountered symbols and transitions, thus enhancing transfer. Empirical results show that our representations improve sample efficiency and few-shot transfer in a variety of domains.
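A minimal sketch of the reward-machine abstraction described above is shown below; the class, symbols, and reward values are illustrative assumptions, not the paper's implementation.

```python
# A hedged sketch of a reward machine as a simple state machine: abstract
# states, transitions triggered by symbolic events, and rewards on those
# transitions. Names and rewards here are made up for illustration.
class RewardMachine:
    def __init__(self, initial_state, transitions):
        """transitions: {(state, symbol): (next_state, reward)}"""
        self.state = initial_state
        self.transitions = transitions

    def step(self, symbol):
        """Advance on a detected symbolic event; unknown events leave the state unchanged."""
        next_state, reward = self.transitions.get((self.state, symbol), (self.state, 0.0))
        self.state = next_state
        return next_state, reward

# toy task: pick up the key, then reach the door
rm = RewardMachine("u0", {("u0", "key"): ("u1", 0.1), ("u1", "door"): ("u_goal", 1.0)})
print(rm.step("key"), rm.step("door"))
```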
Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning
Shengyi Huang, Quentin Gallouedec, Florian Felten, and 8 more authors
In many Reinforcement Learning (RL) papers, learning curves are useful indicators to measure the effectiveness of RL algorithms. However, the complete raw data of the learning curves are rarely available. As a result, it is usually necessary to reproduce the experiments from scratch, which can be time-consuming and error-prone. We present Open RL Benchmark, a set of fully tracked RL experiments, including not only the usual data such as episodic return, but also all algorithm-specific and system metrics. Open RL Benchmark is community-driven: anyone can download, use, and contribute to the data. At the time of writing, more than 25,000 runs have been tracked, for a cumulative duration of more than 8 years. Open RL Benchmark covers a wide range of RL libraries and reference implementations. Special care is taken to ensure that each experiment is precisely reproducible by providing not only the full parameters, but also the versions of the dependencies used to generate it. In addition, Open RL Benchmark comes with a command-line interface (CLI) for easily fetching data and generating figures to present the results. In this document, we include two case studies to demonstrate the usefulness of Open RL Benchmark in practice. To the best of our knowledge, Open RL Benchmark is the first RL benchmark of its kind, and the authors hope that it will improve and facilitate the work of researchers in the field.
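Since the runs are fully tracked, they can also be fetched programmatically. The sketch below assumes the experiments are hosted on Weights & Biases; the entity/project path, filter key, and metric name are placeholders to adapt to the project of interest, not guaranteed names.

```python
# A hedged sketch of fetching a few tracked runs via the Weights & Biases API.
# "openrlbenchmark/cleanrl" and "charts/episodic_return" are assumed
# placeholders, not verified identifiers.
from itertools import islice
import wandb

api = wandb.Api()
runs = api.runs("openrlbenchmark/cleanrl",                  # placeholder entity/project
                filters={"config.env_id": "CartPole-v1"})
for run in islice(runs, 3):                                 # inspect a few matching runs
    history = run.history(keys=["charts/episodic_return"])  # placeholder metric key
    print(run.name, run.config.get("exp_name"), len(history))
```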
2022
CoRL Oral
LEADER: Learning Attention Over Driving Behaviors For Planning Under Uncertainty
Uncertainty in human behavior poses a significant challenge to autonomous driving in crowded urban environments. Partially observable Markov decision processes (POMDPs) offer a principled framework for planning under uncertainty, often leveraging Monte Carlo sampling to achieve online performance for complex tasks. However, sampling also raises safety concerns by potentially missing critical events. To address this, we propose a new algorithm, LEarning Attention over Driving bEhavioRs (LEADER), that learns to attend to critical human behaviors during planning. LEADER learns a neural network generator to provide attention over human behaviors in real-time situations. It integrates the attention into a belief-space planner, using importance sampling to bias reasoning towards critical events. To train the algorithm, we let the attention generator and the planner form a min-max game. By solving the min-max game, LEADER learns to perform risk-aware planning without human labeling.
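The importance-sampling step can be pictured as below. This is my own illustration under simplified assumptions (discrete behavior modes, a given evaluation routine), not the LEADER implementation.

```python
# A hedged sketch: sample human behavior modes from an attention-weighted
# proposal and reweight outcomes back to the belief's distribution, so risky
# modes are examined more often without biasing the value estimate.
import numpy as np

def risk_aware_estimate(belief_probs, attention, evaluate, n_samples=100, rng=None):
    """belief_probs: p(mode) under the current belief; attention: learned weights
    over modes; evaluate(mode): the planner's value estimate for that mode."""
    rng = rng or np.random.default_rng(0)
    belief_probs = np.asarray(belief_probs, dtype=float)
    proposal = belief_probs * np.asarray(attention, dtype=float)
    proposal /= proposal.sum()                            # attention-biased proposal q
    modes = rng.choice(len(belief_probs), size=n_samples, p=proposal)
    weights = belief_probs[modes] / proposal[modes]       # importance weights p / q
    values = np.array([evaluate(m) for m in modes])
    return np.mean(weights * values)
```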
NeurIPS nCSI
Enhancing Transfer of Reinforcement Learning Agents with Abstract Contextual Embeddings
Guy Azran, Mohamad H Danesh, Stefano V Albrecht, and 1 more author
In Neural Information Processing Systems, nCSI Workshop, 2022
Deep reinforcement learning (DRL) algorithms have seen great success in performing a plethora of tasks, but often have trouble adapting to changes in the environment. We address this issue by using reward machines (RMs), a graph-based abstraction of the underlying task, to represent the current setting or context. Using a graph neural network (GNN), we embed the RMs into deep latent vector representations and provide them to the agent to enhance its ability to adapt to new contexts. To the best of our knowledge, this is the first work to embed contextual abstractions and let the agent decide how to use them. Our preliminary empirical evaluation demonstrates improved sample efficiency of our approach upon context transfer on a set of grid navigation tasks.
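The sketch below illustrates the general idea of embedding an RM graph with one round of message passing and appending the pooled embedding to the agent's observation as context; the shapes, weights, and aggregation scheme are simplified assumptions rather than the paper's GNN.

```python
# A hedged sketch: one mean-aggregation message-passing layer over an RM graph,
# mean-pooled into a context vector that augments the observation.
import numpy as np

def embed_reward_machine(node_feats, adj, W_self, W_neigh):
    """node_feats: (n, d) RM-state features; adj: (n, n) 0/1 adjacency matrix."""
    deg = adj.sum(1, keepdims=True) + 1e-8
    neigh = (adj @ node_feats) / deg                      # mean over neighbors
    h = np.tanh(node_feats @ W_self + neigh @ W_neigh)    # one message-passing layer
    return h.mean(0)                                      # graph-level embedding (mean pool)

rng = np.random.default_rng(0)
n, d, k = 4, 8, 16                                        # RM states, feature dim, embedding dim
adj = np.array([[0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1], [0, 0, 0, 0]])
z = embed_reward_machine(rng.normal(size=(n, d)), adj,
                         rng.normal(size=(d, k)), rng.normal(size=(d, k)))
obs_with_context = np.concatenate([rng.normal(size=10), z])  # augmented observation
```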
2021
ICML
Re-understanding Finite-State Representations of Recurrent Policy Networks
Mohamad H Danesh, Anurag Koul, Alan Fern, and 1 more author
In International Conference on Machine Learning, 2021
We introduce an approach for understanding control policies represented as recurrent neural networks. Recent work has approached this problem by transforming such recurrent policy networks into finite-state machines (FSM) and then analyzing the equivalent minimized FSM. While this led to interesting insights, the minimization process can obscure a deeper understanding of a machine’s operation by merging states that are semantically distinct. To address this issue, we introduce an analysis approach that starts with an unminimized FSM and applies more-interpretable reductions that preserve the key decision points of the policy. We also contribute an attention tool to attain a deeper understanding of the role of observations in the decisions. Our case studies on 7 Atari games and 3 control benchmarks demonstrate that the approach can reveal insights that have not been previously noticed.
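As a rough illustration of how an unminimized FSM can be built from a recurrent policy's rollouts, the sketch below quantizes recorded hidden states into discrete abstract states and tabulates the observed transitions; the binning quantizer and data layout are assumptions for illustration, not the paper's procedure.

```python
# A hedged sketch: discretize RNN hidden states and count (state, action,
# next state) transitions to obtain an unminimized finite-state machine.
import numpy as np
from collections import defaultdict

def extract_fsm(hidden_states, actions, n_bins=4):
    """hidden_states: (T, d) RNN states along a rollout; actions: length-T ints."""
    lo = hidden_states.min(0)
    span = hidden_states.max(0) - lo + 1e-8
    keys = [tuple(np.floor(n_bins * (h - lo) / span).astype(int)) for h in hidden_states]
    ids, transitions = {}, defaultdict(int)
    for t in range(len(keys) - 1):
        s = ids.setdefault(keys[t], len(ids))             # abstract state at step t
        s_next = ids.setdefault(keys[t + 1], len(ids))
        transitions[(s, actions[t], s_next)] += 1         # observed transition count
    return ids, dict(transitions)

# toy usage with random data standing in for real hidden states and actions
rng = np.random.default_rng(0)
states, counts = extract_fsm(rng.normal(size=(50, 3)), rng.integers(0, 4, size=50))
print(len(states), "abstract states,", len(counts), "transitions")
```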
ICML UDL Oral
Out-of-Distribution Dynamics Detection: RL-Relevant Benchmarks and Results
Mohamad H Danesh, and Alan Fern
In International Conference on Machine Learning, UDL Workshop, 2021
We study the problem of out-of-distribution dynamics (OODD) detection, which involves detecting when the dynamics of a temporal process change compared to the training-distribution dynamics. This is relevant to applications in control, reinforcement learning (RL), and multivariate time series, where changes to test-time dynamics can impact the performance of learning controllers/predictors in unknown ways. This problem is particularly important in the context of deep RL, where learned controllers often overfit to the training environment. Currently, however, there is a lack of established OODD benchmarks for the types of environments commonly used in RL research. Our first contribution is to design a set of OODD benchmarks derived from common RL environments with varying types and intensities of OODD. Our second contribution is to design a strong OODD baseline approach based on recurrent implicit quantile networks (RIQNs), which monitors autoregressive prediction errors for OODD detection. Our final contribution is to evaluate the RIQN approach on the benchmarks to provide baseline results for future comparison.
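The monitoring idea can be sketched as follows: score each step by the autoregressive one-step prediction error and raise an OODD flag when a smoothed error exceeds a threshold calibrated on in-distribution data. The predictor interface and thresholding rule below are illustrative assumptions, not the RIQN implementation.

```python
# A hedged sketch of prediction-error-based OODD monitoring.
import numpy as np

def calibrate_threshold(pred_errors_train, quantile=0.99):
    """pred_errors_train: per-step prediction errors collected in the training environment."""
    return np.quantile(pred_errors_train, quantile)

def oodd_monitor(predict_next, observations, threshold, window=10):
    """predict_next: assumed model mapping an observation history to the next observation."""
    errors, flags = [], []
    for t in range(1, len(observations)):
        err = np.linalg.norm(observations[t] - predict_next(observations[:t]))
        errors.append(err)
        # smooth over a short window so single noisy steps do not trigger alarms
        flags.append(np.mean(errors[-window:]) > threshold)
    return np.array(flags)
```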
Stochastic Block-ADMM for Training Deep Networks
Saeed Khorram, Xiao Fu, Mohamad H Danesh, and 2 more authors
In this paper, we propose Stochastic Block-ADMM as an approach to training deep neural networks in batch and online settings. Our method works by splitting neural networks into an arbitrary number of blocks and utilizes auxiliary variables to connect these blocks while optimizing with stochastic gradient descent. This allows training deep networks with non-differentiable constraints where conventional backpropagation is not applicable. An application of this is supervised feature disentangling, where our proposed DeepFacto inserts a non-negative matrix factorization (NMF) layer into the network. Since backpropagation only needs to be performed within each block, our approach alleviates vanishing gradients and offers potential for parallelization. We prove the convergence of our proposed method and demonstrate its capabilities through experiments in supervised and weakly-supervised settings.
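The block-splitting idea can be pictured on a two-block toy network: an auxiliary variable links the blocks, each block is updated with stochastic gradients against its own local objective, and a dual step enforces agreement. This is a simplified sketch under assumed hyperparameters and a toy dataset, not the paper's algorithm.

```python
# A hedged sketch of a two-block split trained with alternating stochastic
# updates and a dual step, in the spirit of a block-ADMM scheme.
import torch

torch.manual_seed(0)
x = torch.randn(256, 10)                      # toy inputs
y = (x.sum(dim=1, keepdim=True) > 0).float()  # toy binary targets

block1 = torch.nn.Linear(10, 16)              # first block: x -> pre-activation
block2 = torch.nn.Sequential(torch.nn.ReLU(), torch.nn.Linear(16, 1))
z = block1(x).detach().clone().requires_grad_(True)   # auxiliary variable linking the blocks
u = torch.zeros_like(z)                       # scaled dual variable
rho = 1.0
opt1 = torch.optim.SGD(block1.parameters(), lr=1e-2)
opt2 = torch.optim.SGD(list(block2.parameters()) + [z], lr=1e-2)
bce = torch.nn.BCEWithLogitsLoss()

for step in range(200):
    # (1) update block2 and z: fit the labels while staying close to block1's output
    opt2.zero_grad()
    loss2 = bce(block2(z), y) + (rho / 2) * ((z - block1(x).detach() + u) ** 2).mean()
    loss2.backward()
    opt2.step()
    # (2) update block1: match the auxiliary variable (no end-to-end backprop needed)
    opt1.zero_grad()
    loss1 = (rho / 2) * ((block1(x) - z.detach() + u) ** 2).mean()
    loss1.backward()
    opt1.step()
    # (3) dual update
    with torch.no_grad():
        u += block1(x) - z
```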
AAAI
Reducing Neural Network Parameter Initialization Into an SMT Problem (Student Abstract)
Mohamad H Danesh
In Proceedings of the AAAI Conference on Artificial Intelligence, 2021
Training a neural network (NN) depends on multiple factors, including but not limited to the initial weights. In this paper, we focus on initializing deep NN parameters such that the network performs better than with random or zero initialization. We do this by reducing the initialization process to an SMT problem. Previous works consider certain activation functions on small NNs, whereas the NN studied here is a deep network with different activation functions. Our experiments show that the proposed approach to parameter initialization achieves better performance compared to randomly initialized networks.
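As a toy illustration of casting initialization as a satisfiability query, the sketch below encodes a tiny ReLU network's weights as real-valued SMT variables and asks Z3 for an assignment consistent with a few labeled points; the architecture, constraints, and data are assumptions for illustration, not the paper's encoding.

```python
# A hedged sketch: solve for initial weights of a 2-2-1 ReLU network with Z3.
from z3 import Reals, If, Solver, sat

w = Reals('w11 w12 w21 w22 v1 v2')            # hidden and output weights (biases omitted for brevity)
w11, w12, w21, w22, v1, v2 = w

def net(x1, x2):
    h1 = If(w11 * x1 + w12 * x2 > 0, w11 * x1 + w12 * x2, 0)   # ReLU unit 1
    h2 = If(w21 * x1 + w22 * x2 > 0, w21 * x1 + w22 * x2, 0)   # ReLU unit 2
    return v1 * h1 + v2 * h2

s = Solver()
# toy XOR-like constraints the initialized network should roughly satisfy
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
for (x1, x2), label in data:
    out = net(x1, x2)
    s.add(out > 0.5 if label == 1 else out < 0.5)

if s.check() == sat:
    m = s.model()
    init = {str(v): m[v] for v in w}          # candidate initial weights
    print(init)
```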
2018
CSICC
Tracking Moving Objects in Multi-Camera Environments using Appearance Features
Mohamad H Danesh, Mahsa Hasheminejad, and Ahmad Nickabadi
In International Computer Conference (CSICC), 2018