Skip to content

Python: Port Flask models to use API graphs#5103

Merged
RasmusWL merged 16 commits intogithub:mainfrom
tausbn:python-port-flask-to-api-graphs
Feb 16, 2021
Merged

Python: Port Flask models to use API graphs#5103
RasmusWL merged 16 commits intogithub:mainfrom
tausbn:python-port-flask-to-api-graphs

Conversation

@tausbn
Copy link
Contributor

@tausbn tausbn commented Feb 5, 2021

Most of the type trackers in this model were easily replaceable with
uses of the API graph, but the ones for tracking subclasses are
problematic, as these take us out of the API graph.


This is still a WIP, but in the spirit of ship-to-learn I'm putting it up now. It's already much more succinct than the original. 🎉

To handle subclasses, I will look at extending the API graph to account for these by adding edges and nodes where appropriate.

To do:

  • Performance testing.

Most of the type trackers in this model were easily replaceable with
uses of the API graph, but the ones for tracking subclasses are
problematic, as these take us out of the API graph.
@tausbn tausbn added the Python label Feb 5, 2021
This is slightly dubious, and should really be in the currently
unimplemented "def" counterpart to the "use" bits we already have.

However, it seems to work correctly, and in the spirit of moving
things along, this seemed like the easier solution. We can always
replace the implementation with the "proper" approach at a later point.
This is not strictly necessary, but it was bothering me that this
simply covered _all_ nodes that were both definitions and names at the
same time. Now it actually encompasses what the documentation claims
it does.
At this point, we may want to reconsider whether we really want the
deeply-nested module structure we had before (and which made the type
trackers somewhat bearable).

There's also a question of how we can make this a bit more
smooth. I think we need to consider exactly how we would like the
interface to this to work.
Co-authored-by: yoff <lerchedahl@gmail.com>
@tausbn tausbn marked this pull request as ready for review February 5, 2021 22:29
@tausbn tausbn requested a review from a team as a code owner February 5, 2021 22:29
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fantastic! I also like how the intermediate results are kept as API::Nodes.

yoff
yoff previously approved these changes Feb 9, 2021
Copy link
Contributor

@yoff yoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This certainly looks a fair bit more polished :-)

RasmusWL
RasmusWL previously approved these changes Feb 9, 2021
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👍

I like that we now allow dotted names as imports, since it makes things feel a bit more natural.

CallCfgNode

I'm also a fan of CallCfgNode (although the name does kinda stick out). I'm wondering if making this change would result in the same behavior?

 private class FlaskAppRouteCall extends FlaskRouteSetup, DataFlow::CallCfgNode {
   FlaskAppRouteCall() {
-    this.getFunction() = flask::Flask::route().getAUse()
+    this = flask::Flask::route().getACall()
   }

Future of modeling

Now that things are so easy to model, we can reconsider how we want our modeling to look like. With type-tracking there was a need to expose every little bit we were interested in, but we don't need that anymore.

I don't think we need to do that in this PR per say. I have a few ideas, so I'm happy to merge this PR (once we've verified no performance problems), and create a draft we can discuss.

The two major things I think we can do now are

  1. Remove explicit modeling of things that are only used once.

    For example, changing our modeling to be the following, we can remove the route predicate entirely from the flask::Flask class

    -FlaskAppRouteCall() { this.getFunction() = flask::Flask::route().getAUse() }
    +FlaskAppRouteCall() { this.getFunction() = flask::Flask::instance().getMember("route").getAUse() }
  2. Restructure class/predicate layout. Since we don't need explicit CodeQL modules for all Python modules we care about anymore, we can restructure things a bit, and expose the things we care about in a manner that (potentially) makes more sense from a modeling perspective.

    For example, we could remove the flask:: prefix on everything.

@RasmusWL RasmusWL marked this pull request as draft February 9, 2021 15:11
@RasmusWL
Copy link
Member

RasmusWL commented Feb 9, 2021

I was just about to merge this PR, but then remembered that we still need to do performance testing... so I converted it to a draft so we don't accidentally merge it 😄

…graphs"

This reverts commit 2c4a477.

It's probably best _not_ to do this, as any `getMember` cycle in the
API graph will lead to nontermination.
@tausbn tausbn dismissed stale reviews from RasmusWL and yoff via 4c66071 February 11, 2021 15:10
tausbn and others added 2 commits February 15, 2021 13:51
This fixes some spurious results that occurred when we considered
_any_ use of `request.something` to be a source, even ones we had
tracked into other functions. To prevent this, using
`getAnImmediateUse` better captures the fact that we want the source
to be just the actual attribute access.
…ic.qll

Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
@tausbn
Copy link
Contributor Author

tausbn commented Feb 15, 2021

  1. Remove explicit modeling of things that are only used once.
    For example, changing our modeling to be the following, we can remove the route predicate entirely from the flask::Flask class
-FlaskAppRouteCall() { this.getFunction() = flask::Flask::route().getAUse() }
+FlaskAppRouteCall() { this.getFunction() = flask::Flask::instance().getMember("route").getAUse() }
  1. Restructure class/predicate layout. Since we don't need explicit CodeQL modules for all Python modules we care about anymore, we can restructure things a bit, and expose the things we care about in a manner that (potentially) makes more sense from a modeling perspective.
    For example, we could remove the flask:: prefix on everything.

I think these are excellent suggestions, but probably best left for another PR (as the commit history for this one is getting a bit complicated). I'm in favour of doing both of the things you suggest.

@tausbn
Copy link
Contributor Author

tausbn commented Feb 16, 2021

Performance seems decent, and will be possible to improve once we've moved all of our type trackers to API graphs. This can now be merged as far as I'm concerned.

@tausbn tausbn marked this pull request as ready for review February 16, 2021 11:15
@RasmusWL
Copy link
Member

Alright, let's go 🚀

@RasmusWL RasmusWL merged commit bf401c7 into github:main Feb 16, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants