Attended the large language model seminar hosted by Prof. Yang Dongping (杨东平) at Zhejiang Lab (之江实验室), which examined language models and the development of neural networks from the perspectives of physics, neural computation, and biology.
DeepSeek-OCR: Contexts Optical Compression
RECONCILE and ReAgent
Why Do Multi-Agent LLM Systems Fail?
Multi-agent systems (MAS) are a popular paradigm for solving problems through collaboration, yet on several popular benchmarks they show no clear performance advantage. This paper investigates why MAS fail to deliver gains and groups the failures into three categories:
- specification and system design failures
- inter-agent misalignment
- task verification and termination
DeepSeek
A review of the development of the DeepSeek models: starting from data curation and model construction, the team modified the model architecture to build more efficient models under constrained hardware, and finally proposed a reinforcement-learning-based fine-tuning scheme.
Reference:
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
- DeepSeek-V3 Technical Report
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
- Brief analysis of DeepSeek R1 and its implications for Generative AI
- Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts
- Training language models to follow instructions with human feedback
Bayesian Optimization
Bayesian optimization is a global optimization method built on Bayes' theorem:
$$ P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)} $$
It is typically used to optimize black-box functions whose evaluations are computationally expensive. It searches for the optimum of the objective efficiently, and is particularly effective when the function is non-differentiable, has a complex landscape, or is costly to evaluate (e.g., hyperparameter optimization for deep learning models).
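A quick numeric illustration of the Bayes' theorem formula above, with made-up prior and likelihood values (the event names are hypothetical, chosen only for the example):

```python
# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)
# Toy example (hypothetical numbers): A = "hyperparameter config is good",
# B = "validation score above threshold".
p_a = 0.1             # prior P(A)
p_b_given_a = 0.9     # likelihood P(B|A)
p_b_given_not_a = 0.2 # likelihood P(B|not A)

# law of total probability: P(B) = P(B|A)P(A) + P(B|~A)P(~A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

p_a_given_b = p_b_given_a * p_a / p_b  # posterior P(A|B)
print(round(p_a_given_b, 3))  # → 0.333
```

Even with a strong likelihood (0.9), the low prior (0.1) keeps the posterior modest; this prior-to-posterior update is the mechanism Bayesian optimization applies to its surrogate model after each evaluation.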
A law of data separation in deep learning
This article studies how the hidden layers of a deep network process data during training. It observes that the data separate by class at a roughly constant geometric rate from layer to layer, with class structure emerging progressively, and distills this observation into a quantifiable law.
Reference:
- A law of data separation in deep learning
- GitHub repository
Active Learning Literature Survey
Introduces the basic concepts and algorithms of active learning, together with the use of ALiPy, a Python library for active learning.
The key idea behind active learning is that a machine learning algorithm can achieve greater accuracy with fewer labeled training instances if it is allowed to choose the data from which it learns.
Reference:
- Active Learning Literature Survey
- ALiPy: Active Learning in Python
- GitHub: ALiPy
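The key idea above can be sketched as a toy pool-based loop with uncertainty sampling: the learner repeatedly queries the unlabeled point it is least certain about. This is an illustrative sketch only, not the ALiPy API; `oracle` and the 1-D threshold "classifier" are hypothetical stand-ins:

```python
# Toy pool-based active learning with uncertainty sampling.
# True concept: points above 0.5 are class 1; each label costs one oracle call.
def oracle(x):  # hypothetical labeling oracle
    return int(x > 0.5)

pool = [i / 100 for i in range(101)]             # unlabeled pool
labeled = {0.0: oracle(0.0), 1.0: oracle(1.0)}   # two seed labels

def boundary(labeled):
    """Current decision boundary: midpoint between the largest labeled
    class-0 point and the smallest labeled class-1 point."""
    lo = max(x for x, y in labeled.items() if y == 0)
    hi = min(x for x, y in labeled.items() if y == 1)
    return (lo + hi) / 2

for _ in range(7):
    b = boundary(labeled)
    # uncertainty sampling: query the unlabeled point closest to the boundary
    query = min((x for x in pool if x not in labeled),
                key=lambda x: abs(x - b))
    labeled[query] = oracle(query)

print(boundary(labeled))
```

After only 9 labels the boundary lands near the true threshold 0.5, whereas random labeling would need far more queries on average; choosing *which* data to label is the whole point of active learning.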
Tensor network Monte Carlo simulations for the two-dimensional random-bond Ising model
The Tensor Network Monte Carlo (TNMC) method combines tensor networks with Monte Carlo sampling into a new simulation approach. The article has two parts: an introduction to the TNMC method, and experiments applying it to the two-dimensional random-bond Ising model.
Link:
- Tensor network Monte Carlo simulations for the two-dimensional random-bond Ising model
- Unbiased Monte Carlo for the age of tensor networks
Code:
- TNMC
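For context, the baseline that TNMC improves on is plain single-spin Metropolis Monte Carlo. The sketch below simulates the 2D random-bond Ising model with ordinary Metropolis updates; it is deliberately *not* the tensor-network sampler, and all parameter values are illustrative:

```python
import math
import random

# Plain Metropolis Monte Carlo for the 2D random-bond Ising model
# (the conventional baseline, NOT the TNMC method itself).
# Each bond coupling is -1 with probability p, else +1.
random.seed(1)
L, p, T, sweeps = 8, 0.1, 2.0, 200

spins = [[random.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
# Jx[i][j] couples (i,j)-(i,j+1); Jy[i][j] couples (i,j)-(i+1,j)
Jx = [[-1 if random.random() < p else 1 for _ in range(L)] for _ in range(L)]
Jy = [[-1 if random.random() < p else 1 for _ in range(L)] for _ in range(L)]

def local_field(i, j):
    """Sum of J*s over the four neighbours of site (i, j), periodic b.c."""
    return (Jx[i][j] * spins[i][(j + 1) % L]
            + Jx[i][(j - 1) % L] * spins[i][(j - 1) % L]
            + Jy[i][j] * spins[(i + 1) % L][j]
            + Jy[(i - 1) % L][j] * spins[(i - 1) % L][j])

for _ in range(sweeps):
    for i in range(L):
        for j in range(L):
            dE = 2 * spins[i][j] * local_field(i, j)  # energy cost of a flip
            if dE <= 0 or random.random() < math.exp(-dE / T):
                spins[i][j] *= -1

m = abs(sum(sum(row) for row in spins)) / L**2  # |magnetisation| per spin
print(m)
```

Single-spin updates like these decorrelate slowly near criticality and in frustrated disordered samples, which is the sampling problem the tensor-network construction in TNMC is designed to alleviate.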
Annealing approach to root finding
在数值分析和科学计算中,Newton-Raphson方法是一个非常重要的工具,它被广泛用于求解方程的根。然而,经典的Newton-Raphson方法在面对复杂的非线性方程和多个根的情况下,可能会出现收敛性差、振荡或发散的情况。为了解决这些问题,研究者们提出了一种基于物理学启发的新方法,该方法在保留Newton-Raphson方法优点的同时,通过引入一个新的参数β,有效提升了算法的收敛速度和稳定性。