Python: Port Flask models to use API graphs#5103
Conversation
Most of the type trackers in this model were easily replaceable with uses of the API graph, but the ones for tracking subclasses are problematic, as these take us out of the API graph.
This is slightly dubious, and should really be in the currently unimplemented "def" counterpart to the "use" bits we already have. However, it seems to work correctly, and in the spirit of moving things along, this seemed like the easier solution. We can always replace the implementation with the "proper" approach at a later point.
This is not strictly necessary, but it was bothering me that this simply covered _all_ nodes that were both definitions and names at the same time. Now it actually encompasses what the documentation claims it does.
At this point, we may want to reconsider whether we really want the deeply-nested module structure we had before (and which made the type trackers somewhat bearable). There's also a question of how we can make this a bit more smooth. I think we need to consider exactly how we would like the interface to this to work.
Co-authored-by: yoff <lerchedahl@gmail.com>
yoff
left a comment
There was a problem hiding this comment.
This is fantastic! I also like how the intermediate results are kept as API::Nodes.
Co-authored-by: yoff <lerchedahl@gmail.com>
I prefer this name to `CfgCallNode` as the latter will make autocomplete more difficult.
yoff
left a comment
There was a problem hiding this comment.
This certainly looks a fair bit more polished :-)
There was a problem hiding this comment.
Nice 👍
I like that we now allow dotted names as imports, since it makes things feel a bit more natural.
CallCfgNode
I'm also a fan of CallCfgNode (although the name does kinda stick out). I'm wondering if making this change would result in the same behavior?
private class FlaskAppRouteCall extends FlaskRouteSetup, DataFlow::CallCfgNode {
FlaskAppRouteCall() {
- this.getFunction() = flask::Flask::route().getAUse()
+ this = flask::Flask::route().getACall()
}Future of modeling
Now that things are so easy to model, we can reconsider how we want our modeling to look like. With type-tracking there was a need to expose every little bit we were interested in, but we don't need that anymore.
I don't think we need to do that in this PR per say. I have a few ideas, so I'm happy to merge this PR (once we've verified no performance problems), and create a draft we can discuss.
The two major things I think we can do now are
-
Remove explicit modeling of things that are only used once.
For example, changing our modeling to be the following, we can remove the
routepredicate entirely from theflask::Flaskclass-FlaskAppRouteCall() { this.getFunction() = flask::Flask::route().getAUse() } +FlaskAppRouteCall() { this.getFunction() = flask::Flask::instance().getMember("route").getAUse() }
-
Restructure class/predicate layout. Since we don't need explicit CodeQL modules for all Python modules we care about anymore, we can restructure things a bit, and expose the things we care about in a manner that (potentially) makes more sense from a modeling perspective.
For example, we could remove the
flask::prefix on everything.
python/ql/src/semmle/python/dataflow/new/internal/DataFlowPublic.qll
Outdated
Show resolved
Hide resolved
|
I was just about to merge this PR, but then remembered that we still need to do performance testing... so I converted it to a draft so we don't accidentally merge it 😄 |
…graphs" This reverts commit 2c4a477. It's probably best _not_ to do this, as any `getMember` cycle in the API graph will lead to nontermination.
This fixes some spurious results that occurred when we considered _any_ use of `request.something` to be a source, even ones we had tracked into other functions. To prevent this, using `getAnImmediateUse` better captures the fact that we want the source to be just the actual attribute access.
…ic.qll Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
I think these are excellent suggestions, but probably best left for another PR (as the commit history for this one is getting a bit complicated). I'm in favour of doing both of the things you suggest. |
|
Performance seems decent, and will be possible to improve once we've moved all of our type trackers to API graphs. This can now be merged as far as I'm concerned. |
|
Alright, let's go 🚀 |
Most of the type trackers in this model were easily replaceable with
uses of the API graph, but the ones for tracking subclasses are
problematic, as these take us out of the API graph.
This is still a WIP, but in the spirit of ship-to-learn I'm putting it up now. It's already much more succinct than the original. 🎉To handle subclasses, I will look at extending the API graph to account for these by adding edges and nodes where appropriate.To do: