The application of deep reinforcement learning to combinatorial optimization.
References:
- Learning to Perform Local Rewriting for Combinatorial Optimization
- Graph networks
- Pointer networks
- Neural Combinatorial Optimization with Reinforcement Learning
Combinatorial Optimization Problem
A combinatorial optimization problem (COP) is usually described as finding, from a finite set of candidate solutions, one that minimizes (or maximizes) a given objective function.
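The "finite set of candidates" in this definition can be made concrete with a tiny sketch. Below is a toy 0/1 knapsack instance (the item values and weights are made up for illustration) solved by exhaustively scoring every candidate subset:

```python
from itertools import combinations

# A toy 0/1 knapsack instance (hypothetical data): each item is (value, weight).
items = [(60, 10), (100, 20), (120, 30)]
capacity = 50

# Exhaustive search over the finite candidate set: all 2^n item subsets.
best_value, best_subset = 0, ()
for r in range(len(items) + 1):
    for subset in combinations(range(len(items)), r):
        weight = sum(items[i][1] for i in subset)
        value = sum(items[i][0] for i in subset)
        if weight <= capacity and value > best_value:
            best_value, best_subset = value, subset

print(best_value, best_subset)  # 220 (1, 2)
```

Exhaustive search is exact but scales as 2^n, which is exactly why the smarter exact and approximate methods below exist.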
Solving combinatorial optimization problems involves two types of methods: exact and approximate approaches.
For exact approaches, there are the ‘Branch and Bound’ and ‘Dynamic Programming’ algorithms.
The Branch and Bound algorithm depends on efficient estimation of the lower and upper bounds of regions/branches of the search space. If no bounds are available, the algorithm degenerates to an exhaustive search.
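The role of the bounds can be seen in a minimal sketch. The branch-and-bound code below solves a toy 0/1 knapsack instance (hypothetical data), using the greedy fractional relaxation as the optimistic upper bound; whenever the bound cannot beat the incumbent, the whole branch is pruned:

```python
# A minimal branch-and-bound sketch for 0/1 knapsack (illustrative, not optimized).
items = [(60, 10), (100, 20), (120, 30)]   # (value, weight), hypothetical data
capacity = 50
# Sort by value density so a greedy fractional fill yields a valid upper bound.
order = sorted(range(len(items)), key=lambda i: items[i][0] / items[i][1], reverse=True)

def upper_bound(idx, value, room):
    """Optimistic bound: fill remaining room greedily, allowing a fractional item."""
    for i in order[idx:]:
        v, w = items[i]
        if w <= room:
            value, room = value + v, room - w
        else:
            return value + v * room / w    # take a fraction of the next item
    return value

best = 0
def branch(idx, value, room):
    global best
    if value > best:
        best = value
    if idx == len(order) or upper_bound(idx, value, room) <= best:
        return                              # prune: branch cannot beat the incumbent
    i = order[idx]
    if items[i][1] <= room:                 # branch 1: take item i
        branch(idx + 1, value + items[i][0], room - items[i][1])
    branch(idx + 1, value, room)            # branch 2: skip item i

branch(0, 0, capacity)
print(best)  # 220
```

With a tight bound, most of the tree is never visited; with a trivial bound (infinity), the `return` never fires and the code degenerates to the exhaustive search described above.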
Can the Branch and Bound algorithm be used for continuous problems?

For approximate approaches, there are "Approximation" and "Heuristic" algorithms.
Introduction
Hopfield was the first to propose using neural networks to solve optimization problems.
- “Neural” computation of decisions in optimization problems
- Neural Networks for Combinatorial Optimization: A Review of More Than a Decade of Research
In 2015, Vinyals et al. initiated the modern application of neural networks to combinatorial optimization by framing it as machine translation and introducing the Pointer Network (Ptr-Net). This framing highlighted their common ground in sequence-to-sequence (Seq2Seq) modeling.
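The core trick of Ptr-Net is that the attention distribution over encoder positions is itself the output distribution, so the output vocabulary is always exactly the input set. A minimal numpy sketch of that pointer attention step (shapes, weight names, and the random untrained weights are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: n input elements, hidden size d.
n, d = 5, 8
enc = rng.normal(size=(n, d))     # encoder states e_1..e_n, one per input element
dec = rng.normal(size=(d,))       # current decoder state d_t
W1 = rng.normal(size=(d, d))      # learned projections (random here, untrained)
W2 = rng.normal(size=(d, d))
v = rng.normal(size=(d,))

# Pointer attention: u_j = v^T tanh(W1 e_j + W2 d_t). The softmax over u is a
# distribution over INPUT positions, not over a fixed output vocabulary.
u = np.tanh(enc @ W1 + dec @ W2) @ v
p = np.exp(u - u.max())
p /= p.sum()

pointed = int(np.argmax(p))       # the input element the decoder "points" at
print(pointed, p.round(3))
```

Because the output indexes the input, the same trained model can decode instances of any size, which is what makes the Seq2Seq framing fit combinatorial problems.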
After the introduction of Ptr-Net, many advanced algorithms were developed that combine it with graph networks or Transformers. Combining the Pointer Network (Ptr-Net) with graph networks is, at its core, an approximation: the essence of graph networks lies in message passing, which relies on mean-field theory, so enhancing these algorithms amounts to refining that approximation, and the choice of mean field determines whether such methods succeed.
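The mean-field flavor of message passing can be seen in a single aggregation round: each node updates its state from the mean of its neighbors' states. A toy numpy sketch (the 4-node graph, one-hot features, and untrained random weights are all assumptions for illustration):

```python
import numpy as np

# Toy graph: 4 nodes, symmetric adjacency matrix A (hypothetical instance).
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
h = np.eye(4)                        # initial node features (one-hot node ids)

rng = np.random.default_rng(1)
W_self = rng.normal(size=(4, 4))     # untrained weights, illustration only
W_nbr = rng.normal(size=(4, 4))

# One message-passing round: each node aggregates the MEAN of its neighbors'
# features (the mean-field-style aggregation discussed above), then applies
# a learned update with a ReLU nonlinearity.
deg = A.sum(axis=1, keepdims=True)
mean_msg = (A @ h) / deg
h_new = np.maximum(0.0, h @ W_self + mean_msg @ W_nbr)

print(h_new.shape)  # (4, 4)
```

Stacking several such rounds lets information propagate across the graph; replacing the mean with other pooling functions is exactly the "choice of mean field" design decision.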
The above algorithm employs an end-to-end approach, which has the advantage of providing answers directly and more quickly compared to traditional algorithms. However, the drawback is that it does not guarantee the optimality of the solutions and may fail when dealing with larger problems.
Another line of work uses deep reinforcement learning to boost traditional algorithms, in two directions: improving the branch and bound method and improving the iterative search approach.
Before continuing our discussion, let us address why reinforcement learning is needed to solve combinatorial optimization problems. While supervised networks perform well, their capability is constrained by the training data: the quality of the solution cannot exceed that of the labels. Hence, reinforcement learning is employed to push solution quality further.
- The performance of the model is tied to the quality of the supervised labels.
- Getting high-quality labeled data is expensive and may be infeasible for new problem statements.
- How can one be sure the labels are the best answers? One cares more about finding a competitive solution than about replicating the results of another algorithm.
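The fact that reward, not labels, drives learning can be shown with a tiny policy-gradient sketch. Below is REINFORCE on a toy 4-city TSP (hypothetical coordinates); the exact expected gradient is used instead of sampling so the run is deterministic, but no optimal tour is ever provided as a label:

```python
import itertools
import numpy as np

# Toy 4-city TSP (hypothetical coordinates); every tour starts and ends at city 0.
cities = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])

def tour_length(perm):
    route = (0, *perm, 0)
    return sum(np.linalg.norm(cities[route[i]] - cities[route[i + 1]])
               for i in range(len(route) - 1))

tours = list(itertools.permutations([1, 2, 3]))       # all 6 candidate tours
lengths = np.array([tour_length(t) for t in tours])
rewards = -lengths                                    # reward = negative length; no labels used

# Softmax policy over tours. With baseline = E[r], the exact expected
# REINFORCE gradient w.r.t. the logits is p * (rewards - baseline).
logits = np.zeros(len(tours))
for _ in range(500):
    p = np.exp(logits - logits.max())
    p /= p.sum()
    baseline = p @ rewards                            # variance-reducing baseline
    logits += 5.0 * p * (rewards - baseline)          # ascend expected reward

best = tours[int(np.argmax(logits))]
print(best, tour_length(best))  # an optimal tour of length 4.0
```

The policy concentrates on the shortest tours purely from the reward signal, which is the point of the bullets above: no labeled optimal solutions are required, and the learned policy is not capped by any teacher's quality.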
End-to-End Approach
Using pointer networks and graph networks is a common approach to solving combinatorial optimization problems with reinforcement learning.
Two directions for improving the end-to-end approach are changing the model and using different training patterns. In my view, the network model determines the peak performance level, while the training patterns influence whether this peak can be reached and factors like time and data requirements.
However, network models sometimes incorporate training patterns, necessitating an in-depth exploration of specific algorithms. This line of work originated with Neural Combinatorial Optimization with Reinforcement Learning, was later modified by Reinforcement Learning for Solving the Vehicle Routing Problem, and has been followed by numerous improved versions.
Local Search Methods Based on DRL
Previously, local search rules were designed manually, which is a form of heuristic search. The aim now is to use reinforcement learning to generate these search rules automatically and achieve better performance.
Chen proposed a DRL-based combinatorial optimization search model in “Learning to Perform Local Rewriting for Combinatorial Optimization”. Yolcu modified local search methods for the satisfiability problem: although fewer steps are needed to find the ground state, each step takes more time. Gao built on Large Neighborhood Search, optimizing the destroy and repair operators. Lu created Learning to Improve (L2I).
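As a baseline for what these methods learn, here is the hand-designed kind of local rewriting rule they aim to replace: a first-improvement 2-opt search on a random TSP instance (instance data and stopping rule are assumptions for illustration). A DRL rewriter would instead learn which region of the tour to rewrite and how:

```python
import numpy as np

rng = np.random.default_rng(2)
cities = rng.random((10, 2))         # random 10-city instance (hypothetical data)

def tour_length(tour):
    return sum(np.linalg.norm(cities[tour[i]] - cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

# Hand-designed local rewriting: repeatedly apply any improving 2-opt move
# (reverse a segment of the tour) until no move improves the length.
tour = list(range(10))
start_len = tour_length(tour)
improved = True
while improved:
    improved = False
    for i in range(1, len(tour) - 1):
        for j in range(i + 1, len(tour)):
            cand = tour[:i] + tour[i:j + 1][::-1] + tour[j + 1:]
            if tour_length(cand) < tour_length(tour) - 1e-12:
                tour, improved = cand, True

print(start_len, "->", tour_length(tour))
```

The fixed "reverse a segment if it helps" rule is the manually designed heuristic; the DRL methods above replace the selection of which rewrite to apply with a learned policy trained on the resulting improvement.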