This paper proposes the NeuRewriter method, trained with an actor-critic scheme from reinforcement learning.
The operations feel similar to those in "A Monte Carlo Policy Gradient Method with Local Search for Binary Optimization"; of course, NeuRewriter came first, by roughly four years.
Problem Setup
Let $\mathcal{S}$ be the solution space of the problem domain, and $c: \mathcal{S} \rightarrow \mathbb{R}$ be the cost function. The goal of optimization is to find $\arg \min _{s \in \mathcal{S}} c(s)$. Formally, each solution is a state, and each local region together with its associated rewriting rule is an action.
Optimization as a rewriting problem. Let $\mathcal{U}$ be the rewriting ruleset. Suppose $s_t$ is the current solution (or state) at iteration $t$. We first compute a state-dependent region set $\Omega\left(s_t\right)$, then pick a region $\omega_t \in \Omega\left(s_t\right)$ using the region-picking policy $\pi_\omega\left(\omega_t \mid s_t\right)$. We then pick a rewriting rule $u_t$ applicable to that region $\omega_t$ using the rule-picking policy $\pi_u\left(u_t \mid s_t\left[\omega_t\right]\right)$, where $s_t\left[\omega_t\right]$ is a subset of state $s_t$.
$\Omega\left(s_t\right)$ is a problem-dependent region set. For expression simplification, $\Omega\left(s_t\right)$ includes all sub-trees of the expression parse trees; for job scheduling, $\Omega\left(s_t\right)$ covers all job nodes for scheduling; and for vehicle routing, it includes all nodes in the route.
We then apply this rewriting rule $u_t \in \mathcal{U}$ to $s_t\left[\omega_t\right]$, and obtain the next state $s_{t+1}=$ $f\left(s_t, \omega_t, u_t\right)$. Given an initial solution (or state) $s_0$, our goal is to find a sequence of rewriting steps $\left(s_0,\left(\omega_0, u_0\right)\right),\left(s_1,\left(\omega_1, u_1\right)\right), \ldots,\left(s_{T-1},\left(\omega_{T-1}, u_{T-1}\right)\right), s_T$ so that the final cost $c\left(s_T\right)$ is minimized.
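For concreteness, here is a minimal Python sketch of this iterative rewriting loop. All helper names (`regions`, `pick_region`, `pick_rule`, `apply_rule`) are hypothetical placeholders I introduce for illustration, not names from the paper:

```python
def rewrite(s0, regions, pick_region, pick_rule, apply_rule, cost, T=50):
    """Iteratively improve a solution by local rewriting (sketch).

    s0          -- initial feasible solution (state)
    regions     -- returns the problem-dependent region set Omega(s_t)
    pick_region -- samples omega_t from the region-picking policy pi_omega
    pick_rule   -- samples u_t from the rule-picking policy pi_u
    apply_rule  -- transition function f(s_t, omega_t, u_t)
    cost        -- cost function c(s)
    """
    s, best = s0, s0
    for t in range(T):
        omega = pick_region(s, regions(s))   # choose a local region to rewrite
        u = pick_rule(s, omega)              # choose a rewriting rule for that region
        s = apply_rule(s, omega, u)          # s_{t+1} = f(s_t, omega_t, u_t)
        if cost(s) < cost(best):             # keep the best solution seen so far
            best = s
    return best
```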
This part introduces two new functions, region-picking and rule-picking. What do they mean? And at the end there is the rewriting rule; I need the precise meaning of rewriting. A similar idea predates this paper in [Halide](https://github.com/halide/Halide). Instead of searching from scratch, this work searches for a solution by iteratively applying local rewriting rules to the existing one until convergence. The rewriting formulation is therefore suited to problems with the following properties:

* A feasible solution is easy to find.
* There are well-behaved local structures that can be exploited to incrementally improve the solution. This is a hard property to satisfy in spin glasses.

Three terms we need to understand:

* region-picking
* rule-picking
* rewriting rule

Neural Rewriter Model
The figure above shows the entire model framework. The pseudo-code is given below.
Score predictor. Given the state $s_t$, the score predictor computes a score $Q\left(s_t, \omega_t\right)$ for every $\omega_t \in \Omega\left(s_t\right)$, which measures the benefit of rewriting $s_t\left[\omega_t\right]$. A high score indicates that rewriting $s_t\left[\omega_t\right]$ could be desirable.
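As I understand it (this is my interpretation, not a quotation of the paper), the region-picking policy $\pi_\omega$ can be obtained from these scores, for example via a softmax over $Q\left(s_t, \omega\right)$ for all candidate regions. A minimal NumPy sketch; the `temperature` parameter is my own addition:

```python
import numpy as np

def pick_region_from_scores(q_scores, temperature=1.0, rng=None):
    """Sample a region index from a softmax over the predicted scores Q(s_t, omega)."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(q_scores, dtype=float) / temperature
    logits -= logits.max()                        # for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return rng.choice(len(probs), p=probs)        # index of the chosen region omega_t
```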
In lines 2-10, I believe this algorithm resembles the Monte Carlo method: it first builds a probability distribution over $\Omega\left(s_t\right)$ from the predicted scores, then samples a region from it. However, the choice is not entirely random, since an acceptance probability is set. Through this process, high-quality data for learning are generated. Thus, the region-picking function serves as a judgment function that assesses whether rewriting a region is favorable. The loss function of this part is written as follows.
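To my recollection (this is a reconstruction from memory, not copied from the paper), it is a squared error between the predicted score and the observed discounted return, where the per-step reward is the cost reduction $r\left(s_t,\left(\omega_t, u_t\right)\right)=c\left(s_t\right)-c\left(s_{t+1}\right)$:

$$
L_\omega=\sum_{t=0}^{T-1}\left(\sum_{t^{\prime}=t}^{T-1} \gamma^{t^{\prime}-t}\, r\left(s_{t^{\prime}},\left(\omega_{t^{\prime}}, u_{t^{\prime}}\right)\right)-Q\left(s_t, \omega_t\right)\right)^2 .
$$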
Rule selector. Given $s_t\left[\omega_t\right]$ to be rewritten, the rule-picking policy predicts a probability distribution $\pi_u\left(s_t\left[\omega_t\right]\right)$ over the entire ruleset $\mathcal{U}$, and selects a rule $u_t \in \mathcal{U}$ to apply accordingly.
In lines 11-16, the Advantage Actor-Critic algorithm is employed to train both the rule-picking model and the score predictor. The primary network we aim to obtain is the rule-picking model, which will help us address the subsequent problems.
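As a rough illustration, here is a hedged PyTorch-style sketch of an advantage actor-critic update for the rule selector, with the score predictor acting as the critic; the variable names and the exact form of the advantage are my assumptions, not the paper's code:

```python
import torch
import torch.nn.functional as F

def actor_critic_losses(rule_logits, chosen_rules, q_pred, returns):
    """Compute rule-selector (actor) and score-predictor (critic) losses (sketch).

    rule_logits  -- tensor [T, |U|], unnormalized log-probs of pi_u at each step
    chosen_rules -- tensor [T], index of the rule u_t actually applied
    q_pred       -- tensor [T], predicted scores Q(s_t, omega_t)
    returns      -- tensor [T], observed discounted cumulative rewards
    """
    log_pi = F.log_softmax(rule_logits, dim=-1)
    log_pi_taken = log_pi.gather(1, chosen_rules.unsqueeze(1)).squeeze(1)
    advantage = (returns - q_pred).detach()          # advantage estimate; critic not updated here
    actor_loss = -(advantage * log_pi_taken).mean()  # policy-gradient loss for the rule selector
    critic_loss = F.mse_loss(q_pred, returns)        # squared error for the score predictor
    return actor_loss, critic_loss
```

The two losses would then be combined (e.g. summed with a weighting coefficient) and minimized jointly.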
A rewriting rule merely changes the way a solution is described; the underlying problem stays the same, similar to a gauge transformation in the SK model.