Skip to content

Python: Dataflow, Test that pointsTo implies data flow#4174

Merged
yoff merged 9 commits intogithub:mainfrom
yoff:SharedDataflow_PointsToImpliesDataflow
Nov 10, 2020
Merged

Python: Dataflow, Test that pointsTo implies data flow#4174
yoff merged 9 commits intogithub:mainfrom
yoff:SharedDataflow_PointsToImpliesDataflow

Conversation

@yoff
Copy link
Contributor

@yoff yoff commented Sep 1, 2020

Running the test on a larger database gives some interesting results. While writing it, I ran it on the coverage test database and got some surprising hits. For instance, it looked like a parameter called cls did not flow into the function body.

Running the test on a larger database gives some interesting results.
@yoff yoff requested a review from a team as a code owner September 1, 2020 10:00
@yoff yoff added the Python label Sep 1, 2020
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean we currently are not able to track functions with dataflow?

Maybe we can add a test-case along these lines?

def foo():
    return "foo"

def bar():
    return "bar"

if cond:
    f = foo
else:
    f = bar
f()  # what can flow to `f`?

wait a minute, following the existing test setup, that is just

if cond
    f = source
else:
    f = non_source
SINK(f())

@yoff
Copy link
Contributor Author

yoff commented Sep 1, 2020

I guess the question is if we actually want a data flow edge from the function expression to the name, or if we are satisfied with the ability to look up the function expression whenever the name is used as a call. In the latter case, we can exclude function expressions from pointed.

Copy link
Contributor

@tausbn tausbn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments, but otherwise this looks good.

Before deciding what to do with the missing flow (whether to ignore it or add it), do we know what the situation is for global variables? (After all, a function definition is just a glorified assignment to a variable.)
I would be curious to see if we're also missing flow in that case.

@RasmusWL
Copy link
Member

RasmusWL commented Sep 1, 2020

I guess the question is if we actually want a data flow edge from the function expression to the name, or if we are satisfied with the ability to look up the function expression whenever the name is used as a call. In the latter case, we can exclude function expressions from pointed.

Whatever makes us handle the above example and stuff like the following correctly 😄

def foo():
    return "foo"

f = foo

def foo():
    return "refined"

SINK(f())

@yoff yoff requested a review from tausbn September 1, 2020 12:20
tausbn
tausbn previously approved these changes Sep 1, 2020
@tausbn
Copy link
Contributor

tausbn commented Sep 1, 2020

I've approved this (perhaps prematurely), but I would still like to see what happens in the global variable case (and rasmuswl's cases) before we move on to other things.

Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not to worry, I'll put in a request changes PR review, until tests cases have been added 😉

origin = pointer.pointsTo().getOrigin()
}

class PointsToConfiguration extends DataFlow::Configuration {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a short comment somewhere that describes what this "query" does?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks.

@yoff
Copy link
Contributor Author

yoff commented Nov 4, 2020

@tausbn had the nice suggestion of [adding or putting this next to] the old pointsto-tests. Thos seem to be inside many different directories, though, but perhaps it might be ok to put this test at the top-level?

@tausbn
Copy link
Contributor

tausbn commented Nov 4, 2020

I believe putting it, say, here should do the trick. This contains the bulk of the points-to-testing code.

@yoff
Copy link
Contributor Author

yoff commented Nov 5, 2020

Ah, yes, that looks great. Thanks :-)

@yoff
Copy link
Contributor Author

yoff commented Nov 5, 2020

A bunch of results. The three big groups are self, FunctionExpr, and ClassExpr. There are also two default parameter values and a convoluted import. The snippet in (the now renamed) v_pointsto_regressions.py is not relevant now that there are other examples of self, but it still seemed like a good idea to have a place to put regressions found in snapshots.

@yoff
Copy link
Contributor Author

yoff commented Nov 5, 2020

For self it seems to be the case that there is a ControlFlowNode for the parameter. There is no data flow from that one because the data flow analysis is based on the corresponding EssaVariable.

@tausbn
Copy link
Contributor

tausbn commented Nov 5, 2020

Aha, so maybe this would go away if we did away with the EssaNodes, as discussed previously? (By instead flowing to and from that control flow node.)

@RasmusWL
Copy link
Member

RasmusWL commented Nov 5, 2020

@yoff I don't quite remember all the context for this PR, but did you add the test cases from my comments yet? 😊

@yoff
Copy link
Contributor Author

yoff commented Nov 6, 2020

Aha, so maybe this would go away if we did away with the EssaNodes, as discussed previously? (By instead flowing to and from that control flow node.)

Yes, I believe so.

@yoff
Copy link
Contributor Author

yoff commented Nov 6, 2020

@yoff I don't quite remember all the context for this PR, but did you add the test cases from my comments yet? 😊

No, which is why I did not request your review yet. That is my next move, though :-)

@yoff yoff requested a review from RasmusWL November 6, 2020 12:31
@yoff
Copy link
Contributor Author

yoff commented Nov 6, 2020

def foo():
    return "foo"

f = foo

def foo():
    return "refined"

SINK(f())

As it stands, this does not compile, so I moved the redefinition into a function.

…es.py

Co-authored-by: Rasmus Wriedt Larsen <rasmuswriedtlarsen@gmail.com>
@yoff yoff requested a review from RasmusWL November 6, 2020 14:05
Copy link
Member

@RasmusWL RasmusWL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good test to have in it's current state.

I think it could be valuable to solve the missing data-flow for self, so we can run this on real snapshots to explore new cases that we are not handling.

I'm leaving the approving review without merging, so it can be your call whether you want to do this as part of this PR, or later 😊

@yoff yoff merged commit 26286e5 into github:main Nov 10, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants