
Learning to Perform Local Rewriting for Combinatorial Optimization

This paper proposes the NeuRewriter method, trained with the actor-critic approach from reinforcement learning.

The idea feels similar to the operations in “A Monte Carlo Policy Gradient Method with Local Search for Binary Optimization”. Of course, NeuRewriter was proposed first, roughly four years earlier.

Reference:

  • Learning to Perform Local Rewriting for Combinatorial Optimization
  • GitHub repository

Problem Setup

Let 𝒮 be the solution space of the problem domain, and c : 𝒮 → ℝ be the cost function. The goal of optimization is to find arg min_{s ∈ 𝒮} c(s). Formally, each solution is a state, and each local region together with its associated rule is an action.

Optimization as a rewriting problem. Let 𝒰 be the rewriting ruleset. Suppose st is the current solution (or state) at iteration t. We first compute a state-dependent region set Ω(st), then pick a region ωt ∈ Ω(st) using the region-picking policy πω(ωt ∣ st). We then pick a rewriting rule ut applicable to that region ωt using the rule-picking policy πu(ut ∣ st[ωt]), where st[ωt] is a subset of state st.

Ω(st) is a problem-dependent region set. For expression simplification, Ω(st) includes all sub-trees of the expression parse trees; for job scheduling, Ω(st) covers all job nodes for scheduling; and for vehicle routing, it includes all nodes in the route.

We then apply this rewriting rule ut ∈ 𝒰 to st[ωt], and obtain the next state st+1 = f(st, ωt, ut). Given an initial solution (or state) s0, our goal is to find a sequence of rewriting steps (s0, (ω0, u0)), (s1, (ω1, u1)), …, (sT − 1, (ωT − 1, uT − 1)), sT so that the final cost c(sT) is minimized.
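To make the formulation concrete, here is a minimal Python sketch of the iterative rewriting loop; `region_policy`, `rule_policy`, `apply_rule`, and `regions` are hypothetical stand-ins for πω, πu, f, and Ω, not functions from the paper's code.

``` python
def rewrite(s0, region_policy, rule_policy, apply_rule, regions, cost, T):
    """Iteratively improve a solution by local rewriting (a sketch, not the paper's code)."""
    s, best = s0, s0
    trajectory = []                            # (state, (region, rule)) pairs, useful for training
    for t in range(T):
        omega = region_policy(s, regions(s))   # pick omega_t ~ pi_omega(. | s_t) from Omega(s_t)
        u = rule_policy(s, omega)              # pick u_t ~ pi_u(. | s_t[omega_t])
        s_next = apply_rule(s, omega, u)       # s_{t+1} = f(s_t, omega_t, u_t)
        trajectory.append((s, (omega, u)))
        if cost(s_next) < cost(best):          # keep the best solution seen so far
            best = s_next
        s = s_next
    return best, trajectory
```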

This part introduces two new functions, region-picking and rule-picking; what do they mean? The last piece of the formulation is the rewriting rule, and we need its precise meaning. Before this paper, the same rule-based rewriting idea had already been used in Halide.

Instead of searching from scratch, this work searches for a solution by iteratively applying local rewriting rules to the existing one until convergence. Thus, the rewriting formulation is suitable for problems where:

  • A feasible solution is easy to find.
  • The local structures are well behaved and can be exploited to incrementally improve the solution. This is a hard property to satisfy in spin glasses.

Three terms we need to know:

  • region-picking
  • rule-picking
  • rewriting rule

Neural Rewriter Model

Model Overview

The figure above shows the entire model framework; the pseudo-code is shown below.

ForwardPass

Score predictor. Given the state st, the score predictor computes a score Q(st, ωt) for every ωt ∈ Ω(st), which measures the benefit of rewriting st[ωt]. A high score indicates that rewriting st[ωt] could be desirable.
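How the scores are turned into a concrete region choice is not fully spelled out here; a common choice, and my assumption, is to sample from a softmax over Q(st, ω) with a temperature, roughly as below.

``` python
import torch

def sample_region(q_scores, temperature=1.0):
    """Sample a region index from a softmax over the predicted scores Q(s_t, omega).

    q_scores: 1-D tensor of length |Omega(s_t)|.
    Softmax sampling with a temperature is an assumption about how the scores
    drive region-picking; the paper's exact sampling scheme may differ.
    """
    probs = torch.softmax(q_scores / temperature, dim=0)
    return torch.multinomial(probs, num_samples=1).item()
```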

In lines 2-10, I believe this algorithm resembles the Monte Carlo method: it first establishes a probability distribution from the scores over Ω(st), then selects a region from it. However, the choice is not entirely random; an acceptance probability is set. Through this process, high-quality data for learning are generated. Thus, the region-picking function serves as a judgment function that assesses whether the situation is favorable or not. The loss function of this part is written as:

$$\begin{align} L_\omega(\theta)=\frac{1}{T} \sum_{t=0}^{T-1}\left(\sum_{t^{\prime}=t}^{T-1} \gamma^{t^{\prime}-t} r\left(s_{t^{\prime}},\left(\omega_{t^{\prime}}, u_{t^{\prime}}\right)\right)-Q\left(s_t, \omega_t ; \theta\right)\right)^2 \end{align}$$
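As a sanity check of the formula, here is a minimal PyTorch-style sketch of L_ω: compute the discounted return from each step and regress Q(st, ωt; θ) onto it. Variable names are mine, not the repository's.

``` python
import torch

def critic_loss(rewards, q_values, gamma):
    """L_omega(theta): mean squared error between discounted returns and Q(s_t, omega_t; theta).

    rewards:  list of r(s_t, (omega_t, u_t)) for t = 0..T-1
    q_values: tensor of shape [T] with the predicted scores
    """
    T = len(rewards)
    returns = torch.zeros(T)
    running = 0.0
    for t in reversed(range(T)):      # G_t = r_t + gamma * G_{t+1}
        running = rewards[t] + gamma * running
        returns[t] = running
    return ((returns - q_values) ** 2).mean()
```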

Rule selector. Given st[ωt] to be rewritten, the rule-picking policy predicts a probability distribution πu(st[ωt]) over the entire ruleset 𝒰, and selects a rule ut ∈ 𝒰 to apply accordingly.
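A minimal sketch of such a rule selector, assuming the region st[ωt] has already been encoded into an embedding and a linear head produces a categorical distribution over the |𝒰| rules; the architecture details here are my assumption, not the paper's.

``` python
import torch
import torch.nn as nn

class RuleSelector(nn.Module):
    """pi_u(u | s_t[omega_t]): a categorical distribution over the ruleset U (sketch)."""

    def __init__(self, embed_dim, num_rules):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_rules)    # embedding of s_t[omega_t] -> rule logits

    def forward(self, region_embedding):
        logits = self.head(region_embedding)            # shape [num_rules]
        probs = torch.softmax(logits, dim=-1)
        u = torch.multinomial(probs, num_samples=1)     # sample a rule u_t
        return u.item(), torch.log(probs[u]).squeeze()  # rule index and its log-probability
```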

In lines 11-16, the Advantage Actor-Critic algorithm is employed to train both the rule-picking model and the score predictor. The primary network we aim to obtain is the rule-picking model, which will assist us in addressing subsequent problems.

$$\begin{align} \Delta\left(s_t,\left(\omega_t, u_t\right)\right) &\equiv \sum_{t^{\prime}=t}^{T-1} \gamma^{t^{\prime}-t} r\left(s_{t^{\prime}},\left(\omega_{t^{\prime}}, u_{t^{\prime}}\right)\right)-Q\left(s_t, \omega_t ; \theta\right) \\ L_u(\phi)&=-\sum_{t=0}^{T-1} \Delta\left(s_t,\left(\omega_t, u_t\right)\right) \log \pi_u\left(u_t \mid s_t\left[\omega_t\right] ; \phi\right) \\ L(\theta, \phi)&=L_u(\phi)+\alpha L_\omega(\theta) \end{align}$$
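Putting the pieces together, a hedged sketch of the combined loss: the advantage Δ re-uses the discounted returns and Q from the critic, the actor loss L_u weights the log-probabilities of the chosen rules, and α balances the two terms. Function and variable names follow the earlier sketches and are assumptions, not the repository's API.

``` python
import torch

def actor_critic_loss(rewards, q_values, log_probs, gamma, alpha):
    """L(theta, phi) = L_u(phi) + alpha * L_omega(theta), matching the equations above.

    log_probs: tensor [T] of log pi_u(u_t | s_t[omega_t]; phi) for the chosen rules.
    """
    T = len(rewards)
    returns = torch.zeros(T)
    running = 0.0
    for t in reversed(range(T)):                      # discounted return G_t
        running = rewards[t] + gamma * running
        returns[t] = running
    advantage = (returns - q_values).detach()         # Delta(s_t, (omega_t, u_t)); no gradient to the actor
    actor_loss = -(advantage * log_probs).sum()       # L_u(phi)
    critic_loss = ((returns - q_values) ** 2).mean()  # L_omega(theta)
    return actor_loss + alpha * critic_loss
```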

The rewriting rule merely changes the way a solution is described; it still addresses the same underlying problem, much like a gauge transformation in the SK model.