Generating 3D Molecular Conformers via Equivariant Coarse-Graining and Aggregated Attention – The Berkeley Artificial Intelligence Research Blog

Figure 1: CoarsenConf architecture.

Molecular conformer generation is a fundamental task in computational chemistry. The objective is to predict stable low-energy 3D molecular structures, known as conformers, given the 2D molecule. Accurate molecular conformations are crucial for applications that depend on precise spatial and geometric qualities, including drug discovery and protein docking.

We introduce CoarsenConf, an SE(3)-equivariant hierarchical variational autoencoder (VAE) that pools information from fine-grained atomic coordinates to a coarse-grained, subgraph-level representation for efficient autoregressive conformer generation.

Background

Coarse-graining reduces the dimensionality of the problem, allowing conditional autoregressive generation rather than generating all coordinates independently, as done in prior work. By directly conditioning on the 3D coordinates of previously generated subgraphs, our model better generalizes across chemically and spatially similar subgraphs. This mimics the underlying molecular synthesis process, in which small functional units bond together to form large drug-like molecules. Unlike prior methods, CoarsenConf generates low-energy conformers with the ability to model atomic coordinates, distances, and torsion angles directly.

The CoarsenConf architecture can be broken into the following components:

(I) The encoder $q_\phi(z \mid X, \mathcal{R})$ takes the fine-grained (FG) ground-truth conformer $X$, the RDKit approximate conformer $\mathcal{R}$, and the coarse-grained (CG) conformer $\mathcal{C}$ (derived from $X$ and a predefined CG strategy) as inputs, and outputs a variable-length equivariant CG representation via equivariant message passing and point convolutions.

(II) Equivariant MLPs are applied to learn the mean and log variance of both the posterior and prior distributions.

(III) The posterior (training) or prior (inference) is sampled and fed into the Channel Selection module, where an attention layer is used to learn the optimal pathway from the CG to the FG structure.

(IV) Given the FG latent vector and the RDKit approximation, the decoder $p_\theta(X \mid \mathcal{R}, z)$ learns to recover the low-energy FG structure through autoregressive equivariant message passing.

The entire model can be trained end-to-end by optimizing the KL divergence of the latent distributions and the reconstruction error of the generated conformers.

MCG Task Formalism

We formalize the task of Molecular Conformer Generation (MCG) as modeling the conditional distribution $p(X \mid \mathcal{R})$, where $\mathcal{R}$ is the RDKit-generated approximate conformer and $X$ is the optimal low-energy conformer(s). RDKit, a commonly used cheminformatics library, uses a cheap distance-geometry-based algorithm, followed by an inexpensive physics-based optimization, to achieve reasonable conformer approximations.
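The kind of RDKit pipeline described above can be reproduced in a few lines. This is a minimal sketch assuming RDKit's standard ETKDG distance-geometry embedding and MMFF force-field relaxation; the exact settings used to build CoarsenConf's inputs may differ, and the SMILES string is purely illustrative.

```python
# Minimal sketch: producing an approximate conformer R with RDKit via a
# distance-geometry embedding (ETKDG) followed by an inexpensive
# physics-based relaxation (MMFF). Settings are illustrative only.
from rdkit import Chem
from rdkit.Chem import AllChem

def rdkit_approximate_conformer(smiles: str) -> Chem.Mol:
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    AllChem.EmbedMolecule(mol, AllChem.ETKDGv3())  # distance-geometry embedding
    AllChem.MMFFOptimizeMolecule(mol)              # cheap MMFF force-field refinement
    return mol

mol = rdkit_approximate_conformer("CC(=O)Oc1ccccc1C(=O)O")  # aspirin (illustrative)
R = mol.GetConformer().GetPositions()  # (n_atoms, 3) coordinates: the conformer R
```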
Coarse-graining

Figure 2: Coarse-graining procedure. (I) Example of variable-length coarse-graining. Fine-grained molecules are split along the rotatable bonds that define torsion angles, then coarse-grained to reduce the dimensionality and learn a subgraph-level latent distribution. (II) Visualization of a 3D conformer. Specific atom pairs are highlighted for decoder message-passing operations.

Molecular coarse-graining simplifies a molecular representation by grouping the fine-grained (FG) atoms in the original structure into individual coarse-grained (CG) beads $\mathcal{B}$ with a rule-based mapping, as shown in Figure 2(I). Coarse-graining has been widely utilized in protein and molecular design, and, analogously, fragment- or subgraph-level generation has proven highly valuable in diverse 2D molecule design tasks. Breaking generative problems into smaller pieces is an approach that can be applied to several 3D molecule tasks, and it provides a natural dimensionality reduction that enables working with large, complex systems.

Compared to prior works that focus on fixed-length CG strategies, where each molecule is represented with a fixed resolution of $N$ CG beads, our method uses variable-length CG for its flexibility and its ability to support any choice of coarse-graining technique. This means that a single CoarsenConf model can generalize to any coarse-grained resolution, as input molecules can map to any number of CG beads. In our case, the atoms of each connected component that results from severing all rotatable bonds are coarsened into a single bead. This choice of CG procedure implicitly forces the model to learn over torsion angles, as well as atomic coordinates and inter-atomic distances. In our experiments, we use GEOM-QM9 and GEOM-DRUGS, which possess, on average, 11 atoms and 3 CG beads, and 44 atoms and 9 CG beads, respectively.

SE(3)-Equivariance

A key aspect of working with 3D structures is maintaining appropriate equivariance. Three-dimensional molecules are equivariant under rotations and translations, i.e., SE(3)-equivariance. We enforce SE(3)-equivariance in the encoder, decoder, and latent space of our probabilistic model CoarsenConf. As a result, $p(X \mid \mathcal{R})$ remains unchanged for any rototranslation of the approximate conformer $\mathcal{R}$. Furthermore, if $\mathcal{R}$ is rotated clockwise by 90°, we expect the optimal $X$ to exhibit the same rotation. For an in-depth definition and discussion of the methods for maintaining equivariance, please see the full paper.

Aggregated Attention

Figure 3: Variable-length coarse-to-fine backmapping via Aggregated Attention.

We introduce a method, which we call Aggregated Attention, to learn the optimal variable-length mapping from the latent CG representation to FG coordinates. This is a variable-length operation, as a single molecule with $n$ atoms can map to any number of $N$ CG beads (each bead is represented by a single latent vector). The latent vector of a single CG bead, $Z_B \in \mathbb{R}^{F \times 3}$, is used as the key and value of a single-head attention operation with an embedding dimension of three, matching the x, y, z coordinates. The query vector is the subset of the RDKit conformer corresponding to bead $B$, in $\mathbb{R}^{n_B \times 3}$, where $n_B$ is variable-length, as we know a priori how many FG atoms correspond to a certain CG bead. Leveraging attention, we efficiently learn the optimal blending of latent features for FG reconstruction. We call this Aggregated Attention because it aggregates 3D segments of FG information to form our latent query. Aggregated Attention is responsible for the efficient translation from the latent CG representation to viable FG coordinates (Figure 1(III)).
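At its core, this operation reduces to standard scaled dot-product attention with an embedding dimension of three. The PyTorch sketch below is our reading of the description above; the released implementation may add learned projections or other details not shown here, and the tensor names are assumptions.

```python
import torch

def aggregated_attention(query_coords: torch.Tensor, z_bead: torch.Tensor) -> torch.Tensor:
    """Single-head attention with embedding dimension 3 (x, y, z).

    query_coords: (n_B, 3) RDKit coordinates of the atoms mapped to bead B
                  (n_B varies per bead and is known a priori from the CG mapping).
    z_bead:       (F, 3) latent vector Z_B of bead B, used as both key and value.
    Returns:      (n_B, 3) blended fine-grained coordinates for bead B.
    """
    scores = query_coords @ z_bead.transpose(-1, -2) / 3 ** 0.5  # (n_B, F) similarities
    weights = torch.softmax(scores, dim=-1)                      # attend over F latent channels
    return weights @ z_bead                                      # (n_B, 3) FG reconstruction
```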
Model

CoarsenConf is a hierarchical VAE with an SE(3)-equivariant encoder and decoder. The encoder operates over SE(3)-invariant atom features $h \in \mathbb{R}^{n \times D}$ and SE(3)-equivariant atomistic coordinates $x \in \mathbb{R}^{n \times 3}$. A single encoder layer is composed of three modules: fine-grained, pooling, and coarse-grained. Full equations for each module can be found in the full paper. The encoder produces a final equivariant CG tensor $Z \in \mathbb{R}^{N \times F \times 3}$, where $N$ is the number of beads and $F$ is the user-defined latent size.

The role of the decoder is two-fold. The first is to convert the latent coarsened representation back into FG space through a process we call channel selection, which leverages Aggregated Attention. The second is to refine the fine-grained representation autoregressively to generate the final low-energy coordinates (Figure 1(IV)). We emphasize that by coarse-graining by torsion-angle connectivity, our model learns the optimal torsion angles in an unsupervised manner, as the conditional input to the decoder is not aligned. CoarsenConf ensures that each next generated subgraph is rotated properly to achieve low coordinate and distance error.

Experimental Results

Table 1: Quality of generated conformer ensembles for the GEOM-DRUGS test set ($\delta = 0.75$ Å) in terms of Coverage (%) and Average RMSD (Å). CoarsenConf (5 epochs) was restricted to using 7.3% of the data used by Torsional Diffusion (250 epochs) to exemplify a low-compute and data-constrained regime.

The average error (AR) is the key metric: it measures the average RMSD for the generated molecules of the appropriate test set. Coverage measures the percentage of molecules that can be generated within a specific error threshold ($\delta$). We introduce the mean and max metrics to better assess robust generation and avoid the sampling bias of the min metric. We emphasize that the min metric produces intangible results: unless the optimal conformer is known a priori, there is no way to know which of the $2L$ generated conformers for a single molecule is best. Table 1 shows that CoarsenConf generates the lowest average and worst-case error across the entire test set of DRUGS molecules. We further show that RDKit, with an inexpensive physics-based optimization (MMFF), achieves better coverage than most deep-learning-based methods. For formal definitions of the metrics and further discussion, please see the full paper linked below.
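As a concrete reading of these definitions, the sketch below computes Coverage and average RMSD from precomputed RMSD values, with a `reduce` argument selecting the min, mean, or max variant. It is a simplified illustration of the description above, not the paper's exact evaluation code.

```python
import numpy as np

def ensemble_metrics(rmsd_matrices, delta=0.75, reduce=np.min):
    """Coverage (%) and average RMSD over a test set, per the description above.

    rmsd_matrices: one (n_true, n_generated) RMSD matrix per molecule.
    reduce: aggregation over generated conformers (np.min, np.mean, or np.max),
            selecting between the min / mean / max metric variants.
    """
    per_molecule = np.array([reduce(m, axis=1).mean() for m in rmsd_matrices])
    coverage = 100.0 * np.mean(per_molecule <= delta)  # % of molecules within delta
    avg_rmsd = per_molecule.mean()                     # the AR metric
    return coverage, avg_rmsd
```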
For more details about CoarsenConf, read the paper on arXiv.

BibTeX

If CoarsenConf inspires your work, please consider citing it with:

@article{reidenbach2023coarsenconf,
  title={CoarsenConf: Equivariant Coarsening with Aggregated Attention for Molecular Conformer Generation},
  author={Danny Reidenbach and Aditi S. Krishnapriyan},
  journal={arXiv preprint arXi […]

Asymmetric Certified Robustness via Feature-Convex Neural Networks – The Berkeley Artificial Intelligence Research Blog

TLDR: We propose the asymmetric certified robustness problem, which requires certified robustness for only one class and reflects real-world adversarial scenarios. This focused setting allows us to introduce feature-convex classifiers, which produce closed-form and deterministic certified radii on the order of milliseconds.

Figure 1: Illustration of feature-convex classifiers and their certification for sensitive-class inputs. This architecture composes a Lipschitz-continuous feature map $\varphi$ with a learned convex function $g$. Since $g$ is convex, it is globally underapproximated by its tangent plane at $\varphi(x)$, yielding certified norm balls in the feature space. Lipschitzness of $\varphi$ then yields appropriately scaled certificates in the original input space.

Despite their widespread usage, deep learning classifiers are acutely vulnerable to adversarial examples: small, human-imperceptible image perturbations that fool machine learning models into misclassifying the modified input. This weakness severely undermines the reliability of safety-critical processes that incorporate machine learning. Many empirical defenses against adversarial perturbations have been proposed, often only to be later defeated by stronger attack strategies. We therefore focus on certifiably robust classifiers, which provide a mathematical guarantee that their prediction will remain constant within an $\ell_p$-norm ball around an input.

Conventional certified robustness methods incur a range of drawbacks, including nondeterminism, slow execution, poor scaling, and certification against only one attack norm. We argue that these issues can be addressed by refining the certified robustness problem to be more aligned with practical adversarial settings.

The Asymmetric Certified Robustness Problem

Current certifiably robust classifiers produce certificates for inputs belonging to any class. For many real-world adversarial applications, this is unnecessarily broad. Consider the illustrative case of someone composing a phishing scam email while trying to avoid spam filters. This adversary will always attempt to fool the spam filter into thinking that their spam email is benign, never conversely. In other words, the attacker is solely attempting to induce false negatives from the classifier. Similar settings include malware detection, fake news flagging, social media bot detection, medical insurance claims filtering, financial fraud detection, phishing website detection, and many more.

Figure 2: Asymmetric robustness in email filtering. Practical adversarial settings often require certified robustness for only one class.

These applications all involve a binary classification setting with one sensitive class that an adversary is attempting to avoid (e.g., the "spam email" class). This motivates the problem of asymmetric certified robustness, which aims to provide certifiably robust predictions for inputs in the sensitive class while maintaining high clean accuracy for all other inputs. We provide a more formal problem statement in the main text.

Feature-convex classifiers

We propose feature-convex neural networks to address the asymmetric robustness problem. This architecture composes a simple Lipschitz-continuous feature map ${\varphi: \mathbb{R}^d \to \mathbb{R}^q}$ with a learned Input-Convex Neural Network (ICNN) ${g: \mathbb{R}^q \to \mathbb{R}}$ (Figure 1). ICNNs enforce convexity from the input to the output logit by composing ReLU nonlinearities with nonnegative weight matrices. Since a binary ICNN decision region consists of a convex set and its complement, we add the precomposed feature map $\varphi$ to permit nonconvex decision regions.
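The sketch below illustrates one way such an architecture can be written in PyTorch: hidden-to-hidden weights are clamped nonnegative to preserve convexity, unconstrained passthrough connections from the input are allowed, and a simple concatenation-style feature map stands in for $\varphi$. This is an assumed, simplified rendering, not the paper's exact architecture or training setup.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ICNN(nn.Module):
    """Input-convex network g: convex in its input because hidden-to-hidden
    weights are constrained nonnegative and ReLU is convex and nondecreasing.
    Passthrough connections from the input may carry unconstrained weights."""

    def __init__(self, dim_in: int, hidden: int = 128, depth: int = 3):
        super().__init__()
        self.passthrough = nn.ModuleList(nn.Linear(dim_in, hidden) for _ in range(depth))
        self.hidden = nn.ModuleList(nn.Linear(hidden, hidden, bias=False) for _ in range(depth - 1))
        self.out = nn.Linear(hidden, 1, bias=False)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        z = F.relu(self.passthrough[0](u))
        for skip, lin in zip(self.passthrough[1:], self.hidden):
            # Clamp hidden-to-hidden weights nonnegative to preserve convexity.
            z = F.relu(skip(u) + F.linear(z, lin.weight.clamp(min=0)))
        return F.linear(z, self.out.weight.clamp(min=0)).squeeze(-1)  # logit g(u)

def feature_map(x: torch.Tensor) -> torch.Tensor:
    # Illustrative Lipschitz feature map (an assumption, not the paper's exact map):
    # concatenate the input with its elementwise absolute value.
    return torch.cat([x, x.abs()], dim=-1)

def feature_convex_logit(g: ICNN, x: torch.Tensor) -> torch.Tensor:
    return g(feature_map(x))  # predict the sensitive class iff the logit is positive
```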
Feature-convex classifiers enable the fast computation of sensitive-class certified radii for all $\ell_p$-norms. Using the fact that convex functions are globally underapproximated by any tangent plane, we can obtain a certified radius in the intermediate feature space. This radius is then propagated to the input space by Lipschitzness. The asymmetric setting here is critical, as this architecture only produces certificates for the positive-logit class $g(\varphi(x)) > 0$. The resulting $\ell_p$-norm certified radius formula is particularly elegant:

$$r_p(x) = \frac{g(\varphi(x))}{\mathrm{Lip}_p(\varphi) \, \| \nabla g(\varphi(x)) \|_{p,*}}.$$

The non-constant terms are easily interpretable: the radius scales proportionally to the classifier confidence and inversely to the classifier sensitivity. We evaluate these certificates across a range of datasets, achieving competitive $\ell_1$ certificates and comparable $\ell_2$ and $\ell_\infty$ certificates, despite other methods generally tailoring to a specific norm and requiring orders of magnitude more runtime.

Figure 3: Sensitive-class certified radii on the CIFAR-10 cats-vs-dogs dataset for the $\ell_1$-norm. Runtimes on the right are averaged over $\ell_1$-, $\ell_2$-, and $\ell_\infty$-radii (note the log scaling).

Our certificates hold for any $\ell_p$-norm, and they are closed-form and deterministic, requiring just one forward and one backward pass per input. They are computable on the order of milliseconds and scale well with network size. For comparison, current state-of-the-art methods such as randomized smoothing and interval bound propagation typically take several seconds to certify even small networks. Randomized smoothing methods are also inherently nondeterministic, with certificates that only hold with high probability.
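Given a Lipschitz constant for $\varphi$ (treated here as a precomputed input, which is an assumption), the radius formula above can be evaluated with a single forward and backward pass via automatic differentiation. A minimal sketch, reusing the hypothetical `ICNN` and `feature_map` from the previous block:

```python
import torch

DUAL_EXPONENT = {1: float("inf"), 2: 2.0, float("inf"): 1.0}  # p* with 1/p + 1/p* = 1

def certified_radius(g, feature_map, x: torch.Tensor, p: float = 2.0, lip_p: float = 1.0) -> float:
    """Closed-form sensitive-class certificate r_p(x) from the formula above.

    lip_p: Lipschitz constant of the feature map w.r.t. the l_p norm
           (assumed precomputed). Returns 0 when the logit is nonpositive,
           since certificates exist only for the positive-logit class.
    """
    feat = feature_map(x).detach().requires_grad_(True)
    logit = g(feat)                                 # one forward pass
    if logit.item() <= 0.0:
        return 0.0
    (grad,) = torch.autograd.grad(logit, feat)      # one backward pass
    dual_norm = grad.norm(p=DUAL_EXPONENT[p]).item()
    return logit.item() / (lip_p * dual_norm)
```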
Theoretical promise

While initial results are promising, our theoretical work suggests that there is significant untapped potential in ICNNs, even without a feature map. Despite binary ICNNs being restricted to learning convex decision regions, we prove that there exists an ICNN that achieves perfect training accuracy on the CIFAR-10 cats-vs-dogs dataset.

Fact. There exists an input-convex classifier which achieves perfect training accuracy for the CIFAR-10 cats-versus-dogs dataset.

However, our architecture achieves just $73.4\%$ training accuracy without a feature map. While training performance does not imply test set generalization, this result suggests that ICNNs are at least theoretically capable of attaining the modern machine learning paradigm of overfitting to the training dataset. We thus pose the following open problem for the field.

Open problem. Learn an input-convex classifier which achieves perfect training accuracy for the CIFAR-10 cats-versus-dogs dataset.

Conclusion

We hope that the asymmetric robustness framework will inspire novel architectures which are certifiable in this more focused setting. Our feature-convex classifier is one such architecture and provides fast, deterministic certified radii for any $\ell_p$-norm. We also pose the open problem of overfitting the CIFAR-10 cats-vs-dogs training dataset with an ICNN, which we show is theoretically possible.

This post is based on the following paper:

Asymmetric Certified Robustness via Feature-Convex Neural Networks
Samuel Pfrommer, Brendon G. Anderson, Julien Piet, Somayeh Sojoudi
37th Conference on Neural Information Processing Systems (NeurIPS 2023)

Further details are available on arXiv and GitHub. If our paper inspires your work, please consider citing it with:

@inproceedings{pfrommer2023asymmetric,
  title={Asymmetric Certified Robustness via Feature-Convex Neural Networks},
  author={Samuel Pfrommer and Brendon G. Anderson and Julien Piet and Somayeh Sojoudi},
  booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
  year={2023}
}
