LG - Machine Learning  CV - Computer Vision  CL - Computation and Language
1、[LG] Structure-based drug discovery with deep learning
2、[CL] GENIE: Large Scale Pre-training for Text Generation with Diffusion Model
3、[LG] Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality
4、[CL] FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference
5、[CV] Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
Summary: structure-based drug discovery with deep learning; large-scale pre-training for text generation with diffusion models; pessimism and a generalized empirical Bernstein inequality; Fusion-in-Decoder optimized for stronger performance and faster inference; advancing and evaluating text-guided image inpainting
R Özçelik, D v Tilborg, J Jiménez-Luna, F Grisoni
[Eindhoven University of Technology & Microsoft Research Cambridge]
Structure-based drug discovery with deep learning
Abstract:
Artificial intelligence (AI) in the form of deep learning bears promise for drug discovery and chemical biology, e.g., to predict protein structure and molecular bioactivity, plan organic synthesis, and design molecules de novo. While most of the deep learning efforts in drug discovery have focused on ligand-based approaches, structure-based drug discovery has the potential to tackle unsolved challenges, such as affinity prediction for unexplored protein targets, binding-mechanism elucidation, and the rationalization of related chemical kinetic properties. Advances in deep learning methodologies and the availability of accurate predictions for protein tertiary structure advocate for a renaissance in structure-based approaches for drug discovery guided by AI. This review summarizes the most prominent algorithmic concepts in structure-based deep learning for drug discovery, and forecasts opportunities, applications, and challenges ahead.
https://arxiv.org/abs/2212.13295
Z Lin, Y Gong, Y Shen, T Wu, Z Fan...
[Microsoft Research Asia & Xiamen University & Tsinghua University & Fudan University]
GENIE: Large-scale pre-training for text generation with diffusion models
Abstract:
In this paper, we propose GENIE, a large-scale language pre-training framework for text GENeration using a dIffusion modEl. GENIE is a pre-trained sequence-to-sequence text generation model that combines a Transformer with diffusion. The diffusion model accepts latent information from the encoder, which guides the denoising at the current time step. After multiple such denoising iterations, the diffusion model restores Gaussian noise to diverse output text controlled by the input text. Moreover, this architecture design also allows us to adopt large-scale pre-training for GENIE. Based on the characteristics of the diffusion model, we propose a novel pre-training method named continuous paragraph denoise. Extensive experiments on the XSum, CNN/DailyMail, and Gigaword benchmarks show that GENIE achieves performance comparable to various strong baselines; in particular, after pre-training, the generation quality of GENIE is greatly improved. We also conduct extensive experiments on the generation diversity and parameter impact of GENIE. The code for GENIE will be made publicly available.
https://arxiv.org/abs/2212.11685
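The encoder-guided denoising loop GENIE's abstract describes can be sketched in a few lines. Everything below is an illustrative assumption, not the authors' architecture: the linear averaging update stands in for the learned denoising network, and the latent dimension is arbitrary.

```python
import numpy as np

def toy_denoiser(x_t, encoder_latent, t):
    """Toy stand-in for the learned denoising network: nudges the noisy
    latent x_t toward the encoder latent (the conditioning signal derived
    from the input text). Illustrative only."""
    return x_t + 0.5 * (encoder_latent - x_t)

def generate(encoder_latent, num_steps=10, seed=0):
    """Start from pure Gaussian noise and iteratively denoise, guided at
    every step by the encoder's latent representation of the input text.
    The real model would decode the final latent into output tokens."""
    rng = np.random.default_rng(seed)
    x_t = rng.standard_normal(encoder_latent.shape)  # pure noise
    for t in reversed(range(num_steps)):
        x_t = toy_denoiser(x_t, encoder_latent, t)
    return x_t

latent = np.ones(8)        # pretend encoder output for one input text
sample = generate(latent)  # after 10 steps, close to the conditioning signal
```

Because each toy step halves the distance to the conditioning latent, ten iterations collapse the initial noise onto the encoder signal, mirroring (in spirit only) how the denoised output becomes "controlled by the input text".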
Y Jin, Z Ren, Z Yang, Z Wang
[Stanford University & University of Chicago & Yale University & Northwestern University]
Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality
Abstract:
This paper studies offline policy learning, which aims at utilizing observations collected a priori (from either fixed or adaptively evolving behavior policies) to learn an optimal individualized decision rule that achieves the best overall outcomes for a given population. Existing policy learning methods rely on a uniform overlap assumption, i.e., the propensities of exploring all actions for all individual characteristics are lower bounded in the offline dataset; put differently, the performance of the existing methods depends on the worst-case propensity in the offline dataset. As one has no control over the data collection process, this assumption can be unrealistic in many situations, especially when the behavior policies are allowed to evolve over time with diminishing propensities for certain actions. In this paper, we propose a new algorithm that optimizes lower confidence bounds (LCBs) -- instead of point estimates -- of the policy values. The LCBs are constructed using knowledge of the behavior policies for collecting the offline data. Without assuming any uniform overlap condition, we establish a data-dependent upper bound for the suboptimality of our algorithm, which only depends on (i) the overlap for the optimal policy, and (ii) the complexity of the policy class we optimize over. As an implication, for adaptively collected data, we ensure efficient policy learning as long as the propensities for optimal actions are lower bounded over time, while those for suboptimal ones are allowed to diminish arbitrarily fast. In our theoretical analysis, we develop a new self-normalized type concentration inequality for inverse-propensity-weighting estimators, generalizing the well-known empirical Bernstein's inequality to unbounded and non-i.i.d. data.
https://arxiv.org/abs/2212.09900
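The core idea of optimizing a lower confidence bound instead of a point estimate can be sketched on a toy bandit dataset. The penalty below is a generic empirical-Bernstein-style bound (a variance term plus a range term with illustrative constants), not the paper's self-normalized inequality, and the finite policy class and logging distribution are invented for the example.

```python
import numpy as np

def ipw_terms(actions, rewards, propensities, policy):
    """Per-sample inverse-propensity-weighted value terms for a
    deterministic policy (an array of chosen actions per context)."""
    match = (policy == actions).astype(float)
    return match * rewards / propensities

def lcb_value(actions, rewards, propensities, policy, delta=0.05):
    """Lower confidence bound: IPW point estimate minus an empirical
    Bernstein-style penalty. Constants are illustrative, not the paper's."""
    terms = ipw_terms(actions, rewards, propensities, policy)
    n = len(terms)
    mean, var = terms.mean(), terms.var(ddof=1)
    span = terms.max() - terms.min()
    penalty = (np.sqrt(2 * var * np.log(2 / delta) / n)
               + 7 * span * np.log(2 / delta) / (3 * (n - 1)))
    return mean - penalty

def pessimistic_policy_choice(policies, actions, rewards, propensities):
    """Optimize the LCB, not the point estimate, over a toy policy class."""
    scores = [lcb_value(actions, rewards, propensities, p) for p in policies]
    return int(np.argmax(scores))

# Toy logged data: the behavior policy rarely explores action 1,
# yet action 1 has the higher reward.
rng = np.random.default_rng(0)
n = 2000
actions = rng.choice(2, size=n, p=[0.9, 0.1])
propensities = np.where(actions == 0, 0.9, 0.1)
rewards = np.where(actions == 1, 1.0, 0.2) + 0.1 * rng.standard_normal(n)
policies = [np.zeros(n, dtype=int), np.ones(n, dtype=int)]  # always-0 vs always-1
best = pessimistic_policy_choice(policies, actions, rewards, propensities)
```

Even though the rarely-explored action inflates the variance of its IPW estimate, its LCB still dominates here, which is the sense in which only the overlap of the (near-)optimal policy matters.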
M d Jong, Y Zemlyanskiy, J Ainslie, N FitzGerald, S Sanghai...
[Google Research & University of Southern California]
FiDO: Fusion-in-Decoder optimized for stronger performance and faster inference
Abstract:
Fusion-in-Decoder (FiD) is a powerful retrieval-augmented language model that sets the state-of-the-art on many knowledge-intensive NLP tasks. However, FiD suffers from very expensive inference. We show that the majority of inference time results from memory bandwidth constraints in the decoder, and propose two simple changes to the FiD architecture to speed up inference by 7x. The faster decoder inference then allows for a much larger decoder. We denote FiD with the above modifications as FiDO, and show that it strongly improves performance over existing FiD models for a wide range of inference budgets. For example, FiDO-Large-XXL performs faster inference than FiD-Base and achieves better performance than FiD-Large.
https://arxiv.org/abs/2212.08153
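A back-of-the-envelope cost model shows why the decoder's memory bandwidth dominates FiD inference: at every generated token, cross-attention must read keys and values spanning all retrieved passages. The abstract does not name FiDO's two architectural changes; the multi-query-style reduction of KV heads below is shown only as one generic way to cut that traffic, and all sizes are made-up assumptions.

```python
def decoder_step_bytes(num_passages, passage_len, d_model, layers,
                       kv_heads_fraction=1.0, bytes_per_val=2):
    """Rough bytes of encoder key/value memory the decoder must read at
    EVERY generated token (cross-attention over all retrieved passages).
    Illustrative cost model, not the paper's measurements."""
    kv_tokens = num_passages * passage_len
    # keys + values, across all decoder layers
    return int(2 * kv_tokens * d_model * layers
               * kv_heads_fraction * bytes_per_val)

base = decoder_step_bytes(num_passages=100, passage_len=250,
                          d_model=1024, layers=24)
# Hypothetical change: keep 1 KV head instead of 16 (multi-query style).
reduced = decoder_step_bytes(100, 250, 1024, 24, kv_heads_fraction=1 / 16)
speedup_bound = base // reduced  # bandwidth reduction factor: 16
```

Since per-token KV reads, not FLOPs, set the pace, cutting them frees the budget to grow the decoder, which is how a FiDO-Large-XXL-style configuration can out-run a much smaller baseline decoder.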
S Wang, C Saharia, C Montgomery, J Pont-Tuset, S Noy, S Pellegrini, Y Onoe, S Laszlo, D J. Fleet, R Soricut...
[Google Research]
Imagen Editor and EditBench: Advancing and evaluating text-guided image inpainting
Abstract:
Text-guided image editing can have a transformative impact in supporting creative applications. A key challenge is to generate edits that are faithful to input text prompts, while consistent with input images. We present Imagen Editor, a cascaded diffusion model built by fine-tuning Imagen on text-guided image inpainting. Imagen Editor's edits are faithful to the text prompts, which is accomplished by using object detectors to propose inpainting masks during training. In addition, Imagen Editor captures fine details in the input image by conditioning the cascaded pipeline on the original high-resolution image. To improve qualitative and quantitative evaluation, we introduce EditBench, a systematic benchmark for text-guided image inpainting. EditBench evaluates inpainting edits on natural and generated images exploring objects, attributes, and scenes. Through extensive human evaluation on EditBench, we find that object-masking during training leads to across-the-board improvements in text-image alignment -- such that Imagen Editor is preferred over DALL-E 2 and Stable Diffusion -- and, as a cohort, these models are better at object-rendering than text-rendering, and handle material/color/size attributes better than count/shape attributes.
https://arxiv.org/abs/2212.06909
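The object-masking idea in the abstract, using a detector's output to decide which region the model must regenerate from the text prompt, reduces to turning a bounding box into a binary mask. The function below is a minimal sketch under that assumption; the abstract does not specify the actual masking pipeline.

```python
import numpy as np

def box_to_mask(height, width, box):
    """Turn a detector bounding box (x0, y0, x1, y1) into a binary
    inpainting mask: 1 inside the object region to be regenerated from
    the text prompt, 0 where the input image is kept. Illustrative only."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.uint8)
    mask[y0:y1, x0:x1] = 1
    return mask

# A 20x30-pixel detected object inside a 64x64 image.
mask = box_to_mask(64, 64, (10, 20, 30, 50))
```

Because the masked region now coincides with an object a detector can name, the training signal forces the model to actually render the prompted object, rather than letting it ignore the text when random masks fall on background.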