Absci Invites | EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction | Absci

05.05.2022

Absci Invites | EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction

Absci hosted Hannes Staerk from MIT and the Technical University of Munich for our Absci Invites seminar series. Hannes presented his impressive work on EquiBind, an extremely fast machine-learning method for structural predictions of drug-protein complexes.

Disclaimer: Views and content presented by Hannes Staerk are his own and should not be attributed to Absci.

Presentation Transcript:

Joshua Meier:
Okay. So let’s get started. Hello everyone. And thanks for joining us today. I’m Joshua Meier, lead AI scientist here at Absci. And today I’m really excited to welcome Hannes Staerk from MIT and the Technical University of Munich.

Joshua Meier:
Hannes is an expert in bioinformatics and machine learning. And today he’s going to tell us about EquiBind, which is an extremely fast, and in my opinion, really exciting, deep learning model he’s developed to predict how drug-like molecules bind to specific protein targets.

Joshua Meier:
We encourage you to ask questions throughout the presentation. So if you have a question, please press the raised hand button at the bottom of the screen and I’ll come to you in real time on audio. A pop-up will appear asking you to unmute. So you’ll need to click that before we can hear you.

Joshua Meier:
And keep in mind that we are recording this for distribution on YouTube. So if you’d prefer to enter your question using the Q&A window with text, that’ll work as well. I’m happy to just ask the questions myself then. And with that, I’ll pass it over to you, Hannes, to hear about this great work.

Hannes Staerk:
Perfect. Thank you for the great introduction. And I want to stress again, feel free to interrupt me. So the task we’re tackling here, as Josh said, is that we want to find the structure in which a small molecule binds to a protein.

Hannes Staerk:
But we only have a 2D graph of the molecule as input and we don’t have a bounding box to which we’re docking. We are considering the blind docking scenario here, where the molecule could bind to any sort of location on the protein. And we want to predict the binding location.

Hannes Staerk:
And as we already discussed before, I probably don’t have to mention here as much in detail why this task is interesting. Probably you can tell me a lot more about that actually than I can. But yeah.

Hannes Staerk:
We know that most previous methods that are used for docking, for predicting the 3D structure in which a small molecule binds to a protein, sample many positions, many possible candidate locations of the atoms of the ligand, and then they score them with some scoring function.

Hannes Staerk:
This can be quite time intensive if we have to sample many, many of these positions. Instead, EquiBind predicts the 3D structure and directly gives you the coordinates in just a single forward pass. So we also don’t have scores for the 3D coordinates or for the conformations that we produce in the end.

Hannes Staerk:
We just have a prediction of a single 3D structure. And the way we do that, or the very high-level overview, and to also mention the task, again, a little bit is we have this 2D ligand and we have the receptor with the 3D structure.

Hannes Staerk:
Again, we have no bounding box. We’re doing blind docking and the ligand could bind anywhere. And then what we actually do in the beginning to get an initial 3D structure for the ligand is we just choose a random RDKit conformer.

Hannes Staerk:
And in the end, we will also use the bond lengths and the bond angles of this RDKit conformer, and EquiBind will only change the torsion angles of the conformer. Because we assume that the bond lengths and bond angles given by the RDKit conformer are pretty accurate anyway.

Hannes Staerk:
And how the conformation really ends up looking in the end, that mostly depends on the torsion angles. And then we want to predict the 3D coordinates in which the small ligand binds to the protein. But then how do we actually do it?

Hannes Staerk:
So here we have, as we already said, the step where we generate the initial 3D conformer of our ligand. This transition right here, this is just a sketch now. I’m moving to the cartoon world to be able to explain a little easier with my drawings here.

Hannes Staerk:
What we have here is a 3D graph of the ligand given by the atoms. We connect each atom with its neighbors depending on a radius around the atom. We have this radius parameter, which is a hyperparameter.

Hannes Staerk:
And if we were to set the radius higher, then this atom would also be connected to this atom and this and this. But now on the sketch, we just have the radius cutoff in such a way that the graph is very similar to the 2D molecular graph.

Hannes Staerk:
But for the protein, we encode its 3D structure into a graph in a similar fashion, where we only consider the alpha carbons, right? We only have the alpha carbons as nodes in our graph and we connect them based on the radius.

Hannes Staerk:
If another alpha carbon is in the radius of our first alpha carbon, then they’re connected. And this way we end up with this graph of the protein here as well. And then we have a graph neural network processing the ligand and another graph neural network with other weights processing the receptor to produce slightly changed coordinates.
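As a rough illustration of the radius graph described here, the following Python sketch connects every pair of nodes that lie within a cutoff distance of each other. The function name `radius_graph`, the cutoff values and the toy alpha-carbon coordinates are assumptions made for illustration; they are not taken from the EquiBind code.

```python
import numpy as np

def radius_graph(coords, cutoff):
    """Return directed edges (i, j) for all distinct node pairs closer than `cutoff`."""
    coords = np.asarray(coords, dtype=float)
    # Pairwise distance matrix of shape (n, n).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Keep pairs inside the radius, excluding self-loops.
    mask = (dist < cutoff) & ~np.eye(len(coords), dtype=bool)
    src, dst = np.nonzero(mask)
    return list(zip(src.tolist(), dst.tolist()))

# Toy alpha-carbon positions in angstroms (made-up values).
ca_coords = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [30.0, 0.0, 0.0]]

edges_small = radius_graph(ca_coords, cutoff=5.0)  # only direct neighbors
edges_large = radius_graph(ca_coords, cutoff=8.0)  # a larger radius adds more edges
print(edges_small)
print(edges_large)
```

Raising the cutoff adds edges, which mirrors the remark that a larger radius would connect an atom to more of its surroundings.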

Hannes Staerk:
Well, here they look very similar in my cartoon world. They look the same. But yeah, we have a little bit changed coordinates after these EGNN layers because they transform the features and the coordinates of our graphs.

Hannes Staerk:
How this EGNN works is by… Well, let me just say it is a message passing neural network where we have the distances as edge features. So we look at the distances of the edges and we have some initial node features like the atom type or whether or not the atom is in a ring or whether or not the atom is in a fixed ring.

Hannes Staerk:
Features like this and we also have features like double bond, single bond, and so on. And for the receptor, we have features like the amino acid type for the nodes and so on.

Hannes Staerk:
And then we do message passing, meaning that, for example, if we look at this node right now, then we update its feature by first taking the features here and here and putting them through some MLP and ending up with new features.

Hannes Staerk:
And we do the same for the features here and here and end up with some new features. And then we aggregate them with some permutation-invariant function and end up with our updated feature for this node.

Hannes Staerk:
And this message passing is done in this EGNN. But the EGNN, the equivariant graph neural network, also considers the coordinates. Every one of our atoms is associated with a coordinate and we change those coordinates after every single layer.

Hannes Staerk:
And the way this is done is in an equivariant way, meaning that it does not matter where our initial location is, where our ligand is in the beginning. Say our ligand was here in the beginning, and after our EGNN layer we would end up with the ligand being like this, with this node or just this edge being like this.

Hannes Staerk:
If the ligand in the beginning was over here instead, then we would still end up with the node shifted to the side like this, in exactly the same fashion. So we are independent of the initial location in terms of how we change the coordinates. And that’s why it’s called equivariant.
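The equivariance property described here can be checked numerically. The sketch below is a minimal EGNN-style layer with random, untrained weights, assuming the common formulation in which messages depend only on node features and squared distances and coordinates are updated along the difference vectors. The names `random_mlp`, `phi_e`, `phi_x`, `phi_h` and `egnn_layer` are hypothetical, and this is not the EquiBind implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
F = 4  # feature size, chosen arbitrarily

def random_mlp(in_dim, out_dim):
    """Tiny random one-layer network standing in for a trained MLP."""
    W = rng.normal(scale=0.5, size=(in_dim, out_dim))
    b = rng.normal(scale=0.1, size=out_dim)
    return lambda z: np.tanh(z @ W + b)

phi_e = random_mlp(2 * F + 1, F)  # message from (h_i, h_j, squared distance)
phi_x = random_mlp(F, 1)          # scalar weight for the coordinate update
phi_h = random_mlp(2 * F, F)      # node feature update

def egnn_layer(h, x):
    n = len(x)
    diff = x[:, None, :] - x[None, :, :]             # x_i - x_j, shape (n, n, 3)
    d2 = (diff ** 2).sum(axis=-1, keepdims=True)     # invariant edge feature
    hi = np.broadcast_to(h[:, None, :], (n, n, F))
    hj = np.broadcast_to(h[None, :, :], (n, n, F))
    m = phi_e(np.concatenate([hi, hj, d2], axis=-1))
    m = m * (1.0 - np.eye(n)[:, :, None])            # no message from a node to itself
    # Coordinates move along difference vectors, which rotate with the input.
    x_new = x + (diff * phi_x(m)).sum(axis=1) / (n - 1)
    # Sum aggregation is permutation invariant.
    h_new = phi_h(np.concatenate([h, m.sum(axis=1)], axis=-1))
    return h_new, x_new

h = rng.normal(size=(5, F))
x = rng.normal(size=(5, 3))

# A random proper rotation and a translation.
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R) < 0:
    R[:, 0] *= -1
t = np.array([10.0, -5.0, 2.0])

h1, x1 = egnn_layer(h, x)
h2, x2 = egnn_layer(h, x @ R.T + t)

print(np.allclose(h1, h2))              # features are unchanged by the move
print(np.allclose(x2, x1 @ R.T + t))    # coordinates move along with the input
```

Moving the input first and then applying the layer gives the same result as applying the layer first and then moving the output, which is the sense in which the update is independent of the initial location.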

Hannes Staerk:
All right. But then what else are we doing here? We are not only having the message passing here inside of our ligand graph and inside of our receptor graph. And this message passing, it always considers the distances. And this way we consider the 3D geometry.

Hannes Staerk:
But what we additionally do is we have message passing between the two graphs. After each EGNN layer, we pass messages from every single one of our receptor atoms to our ligand atoms with an attention mechanism, because we don’t know the number of atoms in each ligand and each receptor.

Hannes Staerk:
But this message passing, there we do not consider distances. Why not? Because we want to be independent of the initial location of the ligand. If our ligand in the beginning was here instead of up there, then the distance features would be different.

Hannes Staerk:
And of course, we always want to make the same prediction no matter where the ligand was in the beginning of our model. So this way we can still remain independent of our initial location.

Hannes Staerk:
But now we have done this first step, which is nothing special. We did some message passing here, some message passing here, some message passing in between. We ended up with updated features, with updated coordinates.

Hannes Staerk:
And now how do we actually bind or how do we now come up with our final 3D structure prediction? So what we do there is we construct key points for the receptor and key points for the ligand. Let’s first say how the key points for the receptor are constructed.

Hannes Staerk:
These key points are supposed to somehow capture the location of the binding site, like the location of the sites where the ligand interacts with the receptor. And now to construct the key points for the receptor, the first thing we do is we take the mean of all of the features in our ligand. Then we end up with a single vector. Let’s draw it over here.

Hannes Staerk:
And this vector we use as a query in an attention mechanism, where the keys and the values all come from our receptor atoms. So each single node of our receptor gives us a key and a value. But the key comes from the feature of the receptor and the value is actually the coordinate.

Hannes Staerk:
So what we more or less do with our keys, with the keys that come from the features of the receptor, we calculate an attention distribution over every single atom and this attention distribution we apply to the coordinates of the atoms.

Hannes Staerk:
And then we end up with an interpolation between all of the coordinates. And maybe our query told us to pay a lot of attention to… Let me take this color. Our query told us to pay a lot of attention to this coordinate and also a little bit to this. Almost no attention to all of the other coordinates.

Hannes Staerk:
So the coordinate that we will finally end up with is over here maybe. Because we also had some attention to the coordinates over here, but only very little. And we can do that again because we maybe don’t just produce a single query.

Hannes Staerk:
We can have multi-head attention and can have as many queries as we want. So in practice, we use something like 30 of these key points. But now let me draw four key points. So then we maybe have another key point coming from another attention head here, another key point here, and maybe the fourth key point ends up being here.

Hannes Staerk:
And then we again do the same thing for the ligand, just the other way around. And then we end up with key points. Maybe a key point is here, maybe a key point is here, and one key point is here, and the last one is here.
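The key point construction just described can be sketched as a small multi-head attention: the query is the mean of the ligand features, the keys come from the receptor features, and the values are the receptor coordinates. The projection matrices `W_q` and `W_k`, the scaling by the square root of the head size, and all sizes below are assumptions for illustration rather than details confirmed in the talk.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def receptor_keypoints(lig_feats, rec_feats, rec_coords, W_q, W_k):
    """One key point per attention head.

    lig_feats: (n, F), rec_feats: (m, F), rec_coords: (m, 3),
    W_q and W_k: (K, F, D) per-head projections (hypothetical parameters).
    """
    query = lig_feats.mean(axis=0)                    # single ligand summary vector
    q = np.einsum("f,kfd->kd", query, W_q)            # (K, D)
    k = np.einsum("mf,kfd->kmd", rec_feats, W_k)      # (K, m, D)
    scores = np.einsum("kd,kmd->km", q, k) / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)                   # attention over receptor nodes
    return attn @ rec_coords, attn                    # interpolated coordinates (K, 3)

rng = np.random.default_rng(0)
n_lig, n_rec, F, D, K = 6, 20, 8, 8, 4
lig_feats = rng.normal(size=(n_lig, F))
rec_feats = rng.normal(size=(n_rec, F))
rec_coords = rng.normal(scale=10.0, size=(n_rec, 3))
W_q = rng.normal(size=(K, F, D))
W_k = rng.normal(size=(K, F, D))

keypoints, attn = receptor_keypoints(lig_feats, rec_feats, rec_coords, W_q, W_k)
print(keypoints.shape)
```

Because each key point is a weighted average of receptor coordinates with non-negative weights that sum to one, it can only lie inside the region spanned by those coordinates. The ligand key points would be built the same way with the roles of ligand and receptor swapped.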

Hannes Staerk:
Then the final thing that we do is to… Now we have these key points for the ligand and for the receptor. And what we do now is to find the translation and rotation that moves the key points of the ligand as close as possible to the key points of the receptor in terms of RMSD. And that can be done with the Kabsch algorithm.

Hannes Staerk:
So this Kabsch algorithm, let me do it like this, it just takes these key points and the key points we have below and it spits out the rotation and translation that we need to apply to the green point cloud to have it end up as close as possible in terms of RMSD to the red point cloud.

Hannes Staerk:
And after we’ve done that, we just apply the same rotation and translation to the ligand coordinates and they end up maybe hopefully perfectly in the binding pocket. And that is the key of the mechanism or the key of the model.
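For readers who have not seen it, the Kabsch step can be written in a few lines with a singular value decomposition. This is a generic textbook sketch, not the EquiBind code; the key point arrays and the "true" pose below are made-up test data.

```python
import numpy as np

def kabsch(P, Q):
    """Rotation R and translation t minimizing the RMSD between P @ R.T + t and Q."""
    P, Q = np.asarray(P, dtype=float), np.asarray(Q, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    H = (P - p_mean).T @ (Q - q_mean)          # 3x3 cross-covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # guard against a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    return R, t

rng = np.random.default_rng(0)
lig_keypoints = rng.normal(size=(4, 3))

# Build receptor key points from a known pose so the result can be checked.
R_true, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R_true) < 0:
    R_true[:, 0] *= -1
t_true = np.array([5.0, -2.0, 1.0])
rec_keypoints = lig_keypoints @ R_true.T + t_true

R, t = kabsch(lig_keypoints, rec_keypoints)

# The same rigid transform is then applied to all ligand atom coordinates.
lig_coords = rng.normal(size=(10, 3))
docked = lig_coords @ R.T + t
print(np.allclose(R, R_true), np.allclose(t, t_true))
```

Since the transform is computed only from the two sets of key points, moving the ligand and its key points together beforehand changes the computed rotation and translation in exactly the way needed to land in the same final pose.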

Hannes Staerk:
Because this way, we are completely independent of the initial location of the ligand. Because no matter where the ligand is in the beginning, the key points will always be constructed in the same position relative to the ligand.

Hannes Staerk:
And then the rotation and translation calculated by the Kabsch algorithm will always move the ligand into the right spot. So if there are any questions about that, please go ahead. This is the key mechanism and it’s maybe also a little complicated. So we can spend some time on that.

Hannes Staerk:
But otherwise, I would also summarize it again a little bit in this figure here. In the beginning, we have our ligand with the coordinates of each node and features for each node. Similar for the receptor, we have some coordinates for each node and features, F dash, for each node.

Hannes Staerk:
We put them through this inter and intra message passing, where the intra messages, so the messages that only happen in here and only in here, they have distances. And the other ones don’t.

Hannes Staerk:
We end up with our transformed point cloud, transformed features, transformed point cloud for the receptor and transformed features. And then we construct the key points here. Y and Y dash.

Hannes Staerk:
The key points from those transformed coordinates and those key points give us the rotation and translation that we need to apply to the ligand to have it end up in its final location.

Audience Member:
And didn’t you say there’s also a torsion angle degree of freedom in the ligand as well. How is that incorporated into the key points? From your explanation, it seems like the key points would be rigidly attached to the ligand structure.

Hannes Staerk:
So let’s get into that, right? We have our transformed point cloud here. And from this transformed point cloud, we construct the key points. But this transformed point cloud, which originally came from our ligand, this actually can be very janky. Let me go maybe here.

Hannes Staerk:
It can look like this, for example. It doesn’t necessarily have to look like a realistic molecule. This is only basically used for constructing the key points here that we used to get the rotation and translation.

Hannes Staerk:
But then the step where we get the final conformer of the ligand, because we don’t just use this point cloud here that can have unrealistic bond lengths and bond angles.

Hannes Staerk:
The way we get the final conformation of the ligand, that is actually by taking these transformed coordinates and then changing the original ligand with the realistic bond lengths and bond angles by only changing its torsion angles to fit this transformed point cloud as closely as possible.

Hannes Staerk:
And that’s how we end up with the final conformation that we then actually put into the pocket with the rotation and translation. So this is to solve the problem of this point cloud not being realistic, right? Not realistic bond lengths and bond angles.

Hannes Staerk:
We do that by taking the realistic bond length and bond angles and then only changing the torsion angles to match this point cloud as closely as possible. And in our paper, we call that a fast point cloud fitting.

Joshua Meier:
So I have a follow-up based on that as well. How important is that step? So for example, if you were to take some of these key points and like add noise on top of them, do you expect the method would still work? How important is it to predict this exactly right?

Hannes Staerk:
You mean the key points or do you mean the conformation that we…

Joshua Meier:
Actually, let’s say both.

Hannes Staerk:
So I think if you were to add noise to the key points, it’s maybe rather robust to it because the key points… If we maybe go back to our drawing here, like this, say our key points were moved a little bit in each direction.

Hannes Staerk:
And then probably the rotation and translation that we need to apply to put it as close as possible to the key points of the receptor, this rotation and translation probably doesn’t even change that much. So I think this would be rather robust to some noise.

Hannes Staerk:
But for the point cloud here, the Z point cloud, which gives us the final conformer of the ligand, there I think we need to get it rather well. Otherwise, say maybe these two points, if they are like over here and over here, then the torsion angle of this bond would be very different.

Hannes Staerk:
And maybe this ring over here would be rotated like this instead. And we would end up with the wrong conformation. That would be my intuition.

Joshua Meier:
Okay. That’s really helpful. There’s another question also, someone Slacked me, which is about flexible docking. Actually, I have a question on this also. When you look at this independent SE(3)-equivariant graph matching network input for the receptor is X prime and output is Z prime, right?

Joshua Meier:
So that means you are actually changing the coordinates. In practice when you run this method, how much flexibility do you see happening? Because this is basically flexible docking. You’re allowing your small molecule and your protein to change. Is that right? Or no?

Hannes Staerk:
Sorry. This might have been a little bit confusing in the end. The changed coordinates, the Z dash, we only use to construct these key points over here, the Y dash. And in the final bound conformation, we use the original X dash.

Joshua Meier:
I see. Okay. So you don’t get any gradients basically on the rest of the coordinates, right? So you’re getting the Z prime. It’s three by M and you’re just taking a couple of the coordinates there in order to construct your key points. Is that right?

Hannes Staerk:
Well, you are getting some gradients because you’re getting the gradients of the key points that you constructed, which are interpolations of the coordinates of the receptor. And those coordinates of the receptor are the Z dash.

Joshua Meier:
Okay. That makes sense.

Hannes Staerk:
And why can’t we take the Z dash and do flexible docking? Well, let’s go back to this drawing and say we have a ligand that binds to the receptor over here. And then our point cloud, the key points that we construct, they are only able to lie within the convex hull of our receptor.

Hannes Staerk:
Because we use this attention mechanism over the coordinates. So they’re only able to lie in here. Oops. Let me take another color. The key points are only able to lie in here. So if we had a ligand that is like this that binds over here, then we have to have our EGNN transform the…. Now let me remove this color.

Hannes Staerk:
And then we need our EGNN to transform the receptor. Oops. That did not work. To transform the receptor that in the end some of its coordinates at least are up here. So we then have our convex hull like this and can actually end up with key points that are in here. And we can place our ligand over here. I’m not sure if that’s entirely clear.

Joshua Meier:
So that makes sense for this method. I’m wondering if you have any thoughts of the future of like flexible docking in this context. What do you see are the key things that need to happen in order to enable flexible docking?

Hannes Staerk:
Yeah. So to additionally have receptor flexibility, right?

Joshua Meier:
Yeah.

Hannes Staerk:
So I will take the freedom to only answer this in terms of the EquiBind model. There we could also imagine to have something like an additional EGNN, where the purpose of one is just to change the coordinates to obtain the key points and the job of the other EGNN is to come up with a Z dash dash, which are then the final coordinates of the receptor in which we put the ligand.

Hannes Staerk:
So this, for example, would be one idea to get the flexibility there. I mean, this is now not a very general statement of what needs to happen to get receptor flexibility working in machine learning models or in graph neural networks operating on 3D structures of receptors.

Hannes Staerk:
But I think one large part is maybe really just better 3D encoders for proteins. We have, for example, seen the Skillnet paper come out this year, or there’s also the Intrinsic-Extrinsic Convolution paper, to encode the 3D structures of proteins better.

Hannes Staerk:
We always see that the gains that we get from these more complicated models to encode 3D structures, and I don’t want to dismiss the work they did, aren’t maybe so relevant compared to what you can get with an EGNN.

Joshua Meier:
Okay. Cool. There’s one more question from the chat before I move on from me that says, “I’m wondering if key points can be viewed as pharmacophore, in other words, geometrically depicted charges distribution.”

Hannes Staerk:
Let me get to that point a little bit later when I explain an additional point of how we try to come up with physically plausible key points that should somehow represent binding locations.

Joshua Meier:
Okay.

Hannes Staerk:
But then let’s finally have a look at some last images for this step here, where we go from this janky point cloud that is not necessarily a realistic molecule to our final ligand conformation that we then put into the receptor pocket.

Hannes Staerk:
And here we just have a few images of the Z point cloud. And then how the ligand looks if we take realistic bond angles and bond lengths and only change the torsion angles to fit this point cloud as well as possible.

Hannes Staerk:
But then we also have an additional regularization step in our EGNN. We have this EGNN which changes the coordinates of the ligand and we actually don’t want to produce these janky ligands. And they could be even much more janky if we didn’t have this additional soft regularization step.

Hannes Staerk:
And I sort of imagine it like a batch norm, where after every EGNN layer, you still want to keep your, this time not features, but you still want to keep your coordinates in a reasonable distance to the next bond, to the next atom, and so on.

Hannes Staerk:
And what we have there is this term right here, where we have some distances in the original point cloud and we have some distances after an EGNN layer. And we say that we want these distances to be similar.

Hannes Staerk:
And which are these distances? Well, they are our one hop distances. Now we’re talking about the molecular graph, not the 3D graph. And they are the two hop distances in the molecular graph. And they are all of the distances that are in aromatic rings.

Hannes Staerk:
Again, why do we choose these one hop and two hop distances in our molecular graph and say we want them to be preserved? Well, that is because the one hop distances and the two hop distances, if they are not changed, then the bond angles and the bond length are not changed.

Hannes Staerk:
We are only allowing the model to change three hop distances and all other distances because we only want the model to change torsion angles. If we were to keep three hop distances fixed as well, so one, two, three hop, then we would not be able to change the torsion angle here.

Hannes Staerk:
If we were to rotate this atom to over here, now our oxygen is here, then this distance would’ve changed as well. And that’s why we have this term, which is sort of a loss term, right? But how do we actually preserve it?

Hannes Staerk:
Well, we take the gradient with respect to our produced point cloud and we update our produced point cloud with a few gradient descent steps to match these constraints more closely. And down here, we just have a visualization for one molecule of the distances of which edges would be preserved, or that we want to preserve. But this is a soft regularization.
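The idea of nudging the produced point cloud back towards the original one hop and two hop distances with a few gradient steps can be sketched as follows. The squared-distance form of the penalty, the step size, the number of steps and the toy four-atom chain are assumptions for illustration; the talk does not spell out these details.

```python
import numpy as np

def distance_loss(Z, pairs, target_d2):
    """Sum of squared errors between current and target squared distances."""
    diff = Z[pairs[:, 0]] - Z[pairs[:, 1]]
    return float((((diff ** 2).sum(axis=-1) - target_d2) ** 2).sum())

def refine(Z, pairs, target_d2, steps=200, lr=0.002):
    """A few plain gradient descent steps on the point cloud itself."""
    Z = Z.copy()
    for _ in range(steps):
        diff = Z[pairs[:, 0]] - Z[pairs[:, 1]]
        err = (diff ** 2).sum(axis=-1) - target_d2
        g = 4.0 * err[:, None] * diff           # gradient with respect to the first atom
        grad = np.zeros_like(Z)
        np.add.at(grad, pairs[:, 0], g)
        np.add.at(grad, pairs[:, 1], -g)        # opposite sign for the second atom
        Z -= lr * grad
    return Z

# Toy chain of four atoms 0-1-2-3 (made-up coordinates in angstroms).
X = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0], [3.5, 1.4, 0.0]])
# One hop pairs (bonds) and two hop pairs; the three hop pair (0, 3) stays free.
pairs = np.array([[0, 1], [1, 2], [2, 3], [0, 2], [1, 3]])
target_d2 = ((X[pairs[:, 0]] - X[pairs[:, 1]]) ** 2).sum(axis=-1)

rng = np.random.default_rng(0)
Z = X + rng.normal(scale=0.1, size=X.shape)     # stand-in for a slightly distorted output

before = distance_loss(Z, pairs, target_d2)
Z_refined = refine(Z, pairs, target_d2)
after = distance_loss(Z_refined, pairs, target_d2)
print(before, after)
```

Leaving the three hop pair out of the constraint set is what keeps the torsion angle free to change while bond lengths and bond angles are pulled back towards their original values.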

Hannes Staerk:
Then there is just one additional term at the very end. If we end up with our final ligand location prediction here and we have our receptor over here, then we sort of put Gaussians around each atom of the ligand and each atom of the receptor and we say we want little overlap.

Hannes Staerk:
If there’s large overlap, like over here, then we have a large, additional loss term, which we call intersection loss. And we can provide some more details if you’re interested in that at the end. Yeah. But then we had the question from G or Gee about how or what the key points might correspond to.

Hannes Staerk:
And what we do there is we have the concept of key points and pocket points. So what are pocket points now? We define these pocket points now, first of all. Let’s say we take the atoms that are on the outside of our ligand. Let me only consider these four for now.

Hannes Staerk:
And then we look at the line to the closest receptor atom, which is maybe over here for this ligand atom. And then we look at the line between them and we define the middle of this line, in this color maybe. We define the middle of this line to be a pocket point. And this way we end up with all of these pocket points.
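The pocket point construction described here amounts to taking midpoints between ligand atoms and their closest receptor atoms. In the sketch below, the optional `cutoff` argument is a hypothetical stand-in for selecting the ligand atoms that actually face the receptor; the talk only says that atoms on the outside of the ligand are used, and the coordinates are made up.

```python
import numpy as np

def pocket_points(lig_coords, rec_coords, cutoff=None):
    """Midpoints between ligand atoms and their nearest receptor atoms."""
    lig = np.asarray(lig_coords, dtype=float)
    rec = np.asarray(rec_coords, dtype=float)
    dist = np.linalg.norm(lig[:, None, :] - rec[None, :, :], axis=-1)  # (n_lig, n_rec)
    nearest = dist.argmin(axis=1)
    mid = 0.5 * (lig + rec[nearest])
    if cutoff is not None:
        # Keep only ligand atoms that have a receptor atom within `cutoff`.
        mid = mid[dist.min(axis=1) < cutoff]
    return mid

lig = [[0.0, 0.0, 0.0], [0.0, 20.0, 0.0]]
rec = [[2.0, 0.0, 0.0], [10.0, 0.0, 0.0]]

all_points = pocket_points(lig, rec)
close_points = pocket_points(lig, rec, cutoff=4.0)
print(all_points)
print(close_points)
```

With a distance filter, the number of pocket points depends on how many ligand atoms are in contact with the receptor, which is why a larger ligand gives more pocket points.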

Hannes Staerk:
But let me now draw a fifth one. So let’s now consider that we only have five pocket points in this situation. Well, maybe let’s say six, because then this would be a plausible configuration that we might end up with.

Hannes Staerk:
And these pocket points, we say we want the key points that we produce to be close to these pocket points let me say on average. Because the key points that we’re producing there, we always have a fixed number.

Hannes Staerk:
We always have four for every single ligand and every single receptor. While if we have a very large ligand, then we have way more pocket points because we have way more points of interaction between the ligand and the receptor.

Hannes Staerk:
So what we use is an optimal transport loss between the coordinates of the key points and the coordinates of the pocket points. And then maybe the lowest possible optimal transport loss here would be achieved by having a key point that ends up over here. And maybe a key point here, one here, here, and here.

Hannes Staerk:
And sort of the key point, the first that we drew, this is assigned to these two pocket points with the optimal transport loss, where we have a mass preserving loss between the pocket points and the key points, and they do not have to have the same cardinality.

Hannes Staerk:
So in the end, we can say that we have this additional optimal transport loss term to make our key points correspond to interaction points between the ligand and the receptor. Assuming that we actually capture interaction points with this pocket point construction, where we look at the outer atoms of the ligand and the closest atom of the receptor to those outer atoms of the ligand and interpolate between those two.
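To make the optimal transport idea concrete, the sketch below matches a fixed number of key points to a different number of pocket points, giving every point in each set equal mass. It uses an entropy-regularized Sinkhorn iteration as a simple stand-in for an optimal transport solver; that choice, the regularization strength `eps`, the squared-distance cost and the random toy points are all assumptions, not the method confirmed in the talk.

```python
import numpy as np

def sinkhorn_plan(cost, eps=1.0, iters=200):
    """Approximate transport plan between two uniform point masses (Sinkhorn iterations)."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)      # mass per key point
    b = np.full(m, 1.0 / m)      # mass per pocket point
    K = np.exp(-cost / eps)
    u = np.ones(n)
    for _ in range(iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def ot_loss(keypoints, pocket_pts, eps=1.0):
    """Transport cost between key points and pocket points under the approximate plan."""
    cost = ((keypoints[:, None, :] - pocket_pts[None, :, :]) ** 2).sum(axis=-1)
    plan = sinkhorn_plan(cost, eps=eps)
    return float((plan * cost).sum()), plan

rng = np.random.default_rng(0)
keypoints = rng.normal(size=(4, 3))    # fixed number of key points
pocket_pts = rng.normal(size=(6, 3))   # number depends on the complex

loss, plan = ot_loss(keypoints, pocket_pts)
print(loss)
print(plan.sum(axis=1))   # each key point carries the same total mass
```

Each row of the plan spreads one key point's mass over several pocket points, which is how a single key point can be assigned to two pocket points when the two sets have different sizes.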

Hannes Staerk:
Okay. Well, then we have everything together. These are the main components of EquiBind. Then let’s test EquiBind. And for that, we use PDBbind and we take all of the structures that were released in 2019 or earlier as our training data and all of the structures from 2020 or later we use as test data.

Hannes Staerk:
Well, not exactly. We also remove all of the complexes that have a shared ligand from the test data. And we also consider the test scenario where we have no shared ligands and no shared receptors as well.

Hannes Staerk:
And then when we test EquiBind like this, we compare it against some baselines, such as SMINA. First of all, all of these baselines use this standard paradigm, where we sample many different locations, we score them, and then we choose the best scoring locations based on the scoring function.

Hannes Staerk:
But for this, we really need a good scoring function. And these methods have different ones and different sampling schemes as well. For example, GNINA down here, this has a deep learning based scoring function with some 3D CNN.

Hannes Staerk:
But then we also use some commercial software. And so we, for example, also use QuickVina-W, which is made for wide bounding boxes. So for blind docking as well. We compare these against EquiBind.

Hannes Staerk:
And in our comparisons, I will always show curves like this. And here in these curves, on the X axis we have the RMSD of the ligand between where we put it and where it’s actually crystallized, where its actual location is in the co-crystallized structure.

Joshua Meier:
Just a quick question about that. Does that mean you’re taking the molecule in the correct position and doing Kabsch and then RMSD? So how are you computing an RMSD here?

Hannes Staerk:
We are computing ligand RMSD, if we call it that. So we don’t do a Kabsch alignment first and then calculate RMSD. We just take where we put the ligand and where the co-crystallized ligand is. And then we calculate the RMSD.

Joshua Meier:
So that means the position doesn’t matter. It doesn’t matter where you place it. It just matters that you predicted the conformation correctly. Is that right?

Hannes Staerk:
It does matter. We do not do a Kabsch alignment and then take the RMSD. So what we do is we take the co-crystallized location, which is maybe now over here in space, and we take the location where we put the ligand. So this is maybe over here. And then we take the RMSD between those two.

Joshua Meier:
And then receptor is fixed between them?

Hannes Staerk:
Yeah. The receptor is maybe around the co-crystallized ligand and maybe the predicted ligand is over here. And then maybe the receptor is somewhere. The receptor we cannot get wrong, say, because we don’t move it.
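The distinction made in this exchange, an RMSD computed in place rather than after alignment, is easy to see in code. The sketch below assumes both coordinate arrays list the atoms in the same order and ignores molecular symmetry; the coordinates are made up.

```python
import numpy as np

def ligand_rmsd(pred, true):
    """RMSD between predicted and co-crystallized ligand atoms, with no alignment step."""
    pred, true = np.asarray(pred, dtype=float), np.asarray(true, dtype=float)
    return float(np.sqrt(((pred - true) ** 2).sum(axis=-1).mean()))

true_pose = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0]])

# Same conformation, but placed 3 angstroms away from the crystal pose.
shifted_pose = true_pose + np.array([3.0, 0.0, 0.0])

print(ligand_rmsd(true_pose, true_pose))     # identical poses
print(ligand_rmsd(shifted_pose, true_pose))  # misplaced ligand is penalized
```

A perfectly predicted conformation that is put in the wrong place still gets a large RMSD, because the receptor frame is fixed and nothing is superimposed before the comparison.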

Joshua Meier:
Yeah. Okay. Perfect. That answers the question. Thank you.

Hannes Staerk:
Yeah. Thanks for the clarification. But this is what we have on the X axis. And on the Y axis, we have the fraction of predictions which have a lower RMSD than what we have on the X axis. So let’s say we have a curve going like this.

Hannes Staerk:
Then we would know here at this location, from this point of the curve, we would know that 60% of our predictions have an RMSD that is better than 10. This is how to interpret these plots. So the best curve would look something like this.
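Each point on such a curve is just the fraction of test complexes whose RMSD falls below a threshold. A minimal sketch, using made-up RMSD values chosen so that the 10 angstrom threshold gives the 60% from the example:

```python
def fraction_below(rmsds, threshold):
    """Fraction of predictions whose RMSD is lower than the threshold."""
    return sum(r < threshold for r in rmsds) / len(rmsds)

# Made-up per-complex ligand RMSDs in angstroms.
rmsds = [0.8, 1.5, 3.2, 4.9, 7.0, 9.5, 12.0, 15.0, 22.0, 40.0]

curve = {thr: fraction_below(rmsds, thr) for thr in (2, 5, 10)}
print(curve)
```

Sweeping the threshold along the X axis and plotting the resulting fractions on the Y axis gives the cumulative curve; a method whose curve rises earlier has more predictions with low RMSD.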

Hannes Staerk:
Well, of course it would look like this. But yeah, you get the point. But then before we actually look at those curves, let’s first of all look a little bit at the run times of our models. Here what we’re showing in this table is, first of all, the four baselines that we’re considering.

Hannes Staerk:
And here we have the average time that it takes a model in seconds to make a prediction for a single complex in our test set. And this is without any receptor pre-processing.

Hannes Staerk:
Because in practice, the receptor pre-processing, in many applications, you would only do it once and then you would dock many ligands to it. And then we can look at the time that EquiBind takes. And this is really orders of magnitude faster.

Hannes Staerk:
Also, of course, if we use a GPU, which you usually would do, I suppose, then you’re even faster. Then we can also look at these plots over here. And what we see is in the area above five angstrom ligand RMSD, EquiBind is already doing a lot better than the baselines.

Hannes Staerk:
But in this low region, below two angstrom RMSD, EquiBind is not doing as well as the baselines. So we see that EquiBind is very good at getting the approximate location right. But then the final exact coordinates of the ligand are not found that accurately. And that sort of makes sense, right?

Hannes Staerk:
If you have many, many tries of where you put the individual atoms in the very end, like all of these baselines do, then you are probably much better at finding the exact final locations of the atoms, instead of doing the EquiBind approach where you just make your final prediction in one shot.

Hannes Staerk:
So what we then do, since EquiBind is good at getting the approximate location, is try to use EquiBind together with one of those classical methods to fine-tune this initial prediction of EquiBind further. And this way we end up with numbers like this.

Hannes Staerk:
If we look at this table over here maybe, if we use QVINA for fine-tuning, for example, or SMINA for fine-tuning, we’re no longer as fast because we do additional sampling always. But we can actually still be quite a bit faster than the fastest baseline while having better numbers.

Hannes Staerk:
In terms of these curves, this would then look like the light blue line over here. So we can really get a good trade-off between runtime and accuracy, and you can decide which level of this you want to have.

Hannes Staerk:
But finally, I want to look at some visualizations because Patrick Walters, or Pat Walters, as he’s probably known, from Relay Therapeutics, sent us this tyrosine kinase and said this is a challenging target. We should try this challenging target.

Hannes Staerk:
He thought we wouldn’t be able to dock two drugs to it because, I suppose, this was a hard target for most docking schemes or docking tools. And he sent it to us. So this is not cherry picked, but still it’s only a single example. But yeah, let’s see what this is about.

Hannes Staerk:
We have this tyrosine kinase here, the protein, and we have two different drugs in green. So these are two different molecules. And these two different drugs, they bind well to this location and that location.

Hannes Staerk:
These are the ground truth locations that we would want our models to predict the two different ligands to be in. But if we look at the predictions that GNINA makes for this, it puts both ligands into the same pocket.

Hannes Staerk:
So one prediction is pretty good, where the ligand should actually be, but the other one is completely wrong. Because the gray one, it should be up here. And similarly, if we look at the predictions of SMINA, they are both in the other pocket. So again, one’s completely wrong.

Hannes Staerk:
Then we can look at the predictions of GLIDE and it actually swaps the two ligands around. So it puts both in the completely wrong positions. Then we can look at EquiBind S, EquiBind with SMINA fine-tuning. And there we see EquiBind is able to put the ligand in the correct approximate location.

Hannes Staerk:
And then with fine-tuning with SMINA, we almost get the perfect conformation in the end. And I don’t want to forget to mention that these examples are not in the training data of the model. And with this, I would be very happy to take any questions.

Hannes Staerk:
And I hope you are now convinced that EquiBind is capable of taking these real-world examples of a tyrosine kinase and two specific drugs and docking them in the right locations where the baselines do not. So I’m happy if you’ve got any further questions.

Joshua Meier:
Cool. First of all, thank you for the excellent talk. This was presented really well. All those graphics were very helpful for better understanding the method and the results. So thanks again. So a couple of questions.

Joshua Meier:
So first of all, this is really cool. You just brought up a point at the end about how this isn’t in the training set. I’m wondering how you assess generalization in this space. We do mostly protein interactions at Absci, designing antibody drugs.

Joshua Meier:
But curious when you’re doing protein small molecules, how do you do generalization? Are there like structural holdouts? Are you doing that on the receptor side or the small molecule side? What does that look like?

Hannes Staerk:
So on the molecule side, we could do a scaffold split, where we only consider molecules in our test set that have a very different scaffold than the ones in the training set. But we actually do not do this with EquiBind.

Hannes Staerk:
We only have this time-based split, which we hope reflects reality. And additionally, we make sure that there’s no receptor and ligand overlap. But we do not have a scaffold split, for example. We also do not have a sequence similarity split, for example, on the side of the receptor.
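
A time-based split with an overlap filter can be sketched roughly as follows in Python. This is only an illustration under stated assumptions, not the actual EquiBind data pipeline: the record fields (`receptor`, `ligand`, `released`), the cutoff date, and the choice to drop a test complex if either its receptor or its ligand appears in training are all hypothetical.

```python
from datetime import date

def time_split(complexes, cutoff):
    """Split by release date, then drop test complexes that overlap with training.

    Assumption: a test complex is discarded if its receptor OR its ligand
    already occurs in the training portion.
    """
    train = [c for c in complexes if c["released"] < cutoff]
    candidates = [c for c in complexes if c["released"] >= cutoff]
    seen_receptors = {c["receptor"] for c in train}
    seen_ligands = {c["ligand"] for c in train}
    test = [
        c for c in candidates
        if c["receptor"] not in seen_receptors and c["ligand"] not in seen_ligands
    ]
    return train, test

# Hypothetical records; identifiers and dates are invented for the example.
complexes = [
    {"id": "a", "receptor": "R1", "ligand": "L1", "released": date(2017, 5, 1)},
    {"id": "b", "receptor": "R2", "ligand": "L2", "released": date(2018, 3, 9)},
    {"id": "c", "receptor": "R1", "ligand": "L3", "released": date(2019, 6, 2)},   # receptor seen in train
    {"id": "d", "receptor": "R3", "ligand": "L2", "released": date(2020, 1, 15)},  # ligand seen in train
    {"id": "e", "receptor": "R4", "ligand": "L4", "released": date(2020, 8, 30)},
]

train, test = time_split(complexes, cutoff=date(2019, 1, 1))
print("train:", [c["id"] for c in train])
print("test: ", [c["id"] for c in test])
```

A scaffold split or a sequence similarity split would replace the two set-membership checks with a similarity criterion on the ligand or the receptor.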

Hannes Staerk:
What would you actually say would be the most convincing for you in terms of the protein and splitting the protein? Would you say it would be the most convincing if we maybe have a sequence similarity split or somehow take structurally different receptors or structurally different binding pockets?

Joshua Meier:
Yeah, I think it would be the latter. So let’s say you’re working on a drug discovery project and then you’ve got some target where you don’t have any small molecules that bind to it and you’re trying to make a small molecule drug for it.

Joshua Meier:
So that is maybe a receptor that looks different from other ones you’ve had before. Because otherwise people are using like homology modeling and things like that in order to find some initial lead small molecules, initial hits, and then they can optimize them.

Joshua Meier:
Again, I’m not an expert in small molecules, but it could be really interesting if you had that structural holdout. So you had proteins in a test set that are very different from any proteins you’ve seen before in terms of their structure, because it’s a structure-based method, and showing if you could generalize to that.

Hannes Staerk:
Yeah. Also, what I said about structurally different binding pockets maybe doesn’t even make that much sense, and we really just want structurally different proteins, because we are considering blind docking here, where we don’t have the pocket and the molecule could bind anywhere on the receptor.

Joshua Meier:
Yeah. But again, it depends what the problem is. I think there’s already a lot of value here, especially with the fine-tuning with the existing methods. If you just evaluate on the same kind of benchmark, it’s still very useful, even for that kind of homology modeling work. You can do things faster here. So it could be very impactful.

Hannes Staerk:
I hope so. Yeah. This is actually something that Regina Barzilay, who is also my future supervisor, is very interested in now: how well these 3D models generalize. My hypothesis sort of is that all of the sequence-based models will not generalize as well as the structure-based models.

Hannes Staerk:
Because if we force the model to consider the structure and maybe the distances between some atoms, then we bring it much closer to reasoning about the underlying physics, or we increase the probability of the model actually reasoning about the underlying physics. And if it actually ends up doing that, well, then we are able to generalize because the underlying physics is how things really happen.

Joshua Meier:
Yeah. That’s a really good perspective. So what is next in this line of work?

Hannes Staerk:
Next are affinity predictions, I would say. Because we often not only want the 3D structure, but it would also be nice to have an affinity value if we have something of which we do not know whether or not it binds.

Hannes Staerk:
Well, this is what we most often, I would assume, have in practice, where we have a receptor and we want to discover some drug candidates that might bind to it and inhibit it. We want this affinity score when we search through our library of a billion different molecules to find a ligand that binds to it. No other questions in the chat, I suppose.

Joshua Meier:
Well, I still have one more question here, which is you’re talking about binding affinity. What kind of data are you going to use in order to predict the affinity of the ligand to the receptor?

Hannes Staerk:
So there, in PDBbind, we also have the affinity scores for every single complex. And since these complexes only contain molecules that do dock, we would additionally need some data on molecules of which we know that they do not bind.

Joshua Meier:
Okay. Great. We have one minute left and there’s one more question from the chat from Bob, who asks, “Might adding a fourth dimension time improve the possibilities for affinity predictions because molecules can wiggle?”

Hannes Staerk:
I’m very happy about this question, but maybe one minute is not really sufficient. But the thing I will say is that what I’m very interested in in that space are diffusion models, where we already know DALL-E 2, for example, which uses diffusion to generate images.

Hannes Staerk:
And we also have models like GeoDiff, which use diffusion to generate the 3D structures of a molecule only given its 2D molecular graph. And we do basically diffusion denoising on the coordinates of the molecule.

Hannes Staerk:
And what if we now put a point cloud into the binding pocket of a protein and then do diffusion on the coordinates of the atoms that we put into the pocket? And then we denoise it to finally end up with a molecule that fits into the pocket very well.

Hannes Staerk:
And I think this is almost like modeling the physics because we can then reason about the distances and we can have… What your score model does is almost predict forces that you then have to apply to each atom.
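
The idea of a score acting like a force on each atom can be illustrated with a toy Python sketch. This is not GeoDiff or any published model: the "score model" here is an analytic stand-in (the gradient of a Gaussian centered on a known target pose), whereas a real model would have to learn the score from data. The noise levels, step size, and four-atom pose are invented for the example.

```python
import math
import random

def rmsd(a, b):
    """Root-mean-square deviation between two equally ordered coordinate lists."""
    sq = sum((x - y) ** 2 for p, q in zip(a, b) for x, y in zip(p, q))
    return math.sqrt(sq / len(a))

def toy_score(coords, target, sigma):
    """Stand-in for a learned score model (assumption).

    Returns the gradient of log N(target, sigma^2 I) at each atom, which points
    from the current position toward the target like a restoring force.
    """
    return [tuple((t - x) / sigma ** 2 for x, t in zip(atom, goal))
            for atom, goal in zip(coords, target)]

def denoise(coords, target, sigmas, steps_per_level=20, eps=0.1, seed=0):
    """Annealed Langevin-style updates from the largest noise level to the smallest."""
    rng = random.Random(seed)
    for sigma in sigmas:
        step = eps * sigma ** 2  # smaller steps at smaller noise levels
        for _ in range(steps_per_level):
            forces = toy_score(coords, target, sigma)
            # Move each atom along its force and add fresh Gaussian noise.
            coords = [
                tuple(x + step * f + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
                      for x, f in zip(atom, force))
                for atom, force in zip(coords, forces)
            ]
    return coords

# Hypothetical 4-atom pose and a random starting point cloud.
target = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.0)]
init_rng = random.Random(1)
start = [tuple(init_rng.gauss(0.0, 10.0) for _ in range(3)) for _ in target]
final = denoise(start, target, sigmas=[10.0, 3.0, 1.0, 0.3, 0.1, 0.03])

print(f"RMSD before: {rmsd(start, target):.2f}, after: {rmsd(final, target):.2f}")
```

Because the toy score already knows the target, the sketch only shows the update rule: each step nudges every atom along its predicted force while the injected noise shrinks from level to level.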

Joshua Meier:
Awesome. Well, I think we’re at time. Really, thank you again for coming to give a seminar. I think this was really enjoyable. A bunch of thanks from the chat as well. It was great learning more about this work.

Hannes Staerk:
I have to echo the very enjoyable. And thanks for the nice questions also from the chat.