
Add Hook For Import-Related Events #149466

@ericsnowcurrently


Feature or enhancement

There are a number of cases in which it is helpful to know when importing a module has reached a certain point. It would be helpful to have a runtime hook to support this. Think of it like sys.monitoring, but for imports.

While there are many possible import-related operations we might want to track, here are the ones I had in mind specifically:

  • import started
  • import ended
  • module loading
  • module loaded

Other possibilities include "finding module", "module found", and "import called".

Status Quo

Audit hooks (PEP 578) give us the first one with the "import" event (see https://docs.python.org/3/library/audit_events.html).[1] The audit event is emitted at the very beginning of importlib._bootstrap._find_and_load().[2] At that point the module hasn't been created yet, nor has sys.modules been modified. However, we reach that point only if the module isn't already in sys.modules, so only the first import statement for a given module triggers the event.[3]
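To make the status quo concrete, here is a small demo that counts "import" audit events for a module imported twice. It assumes CPython 3.8+ (audit hooks). colorsys is used only because it is a small pure-Python stdlib module, and record_import_event() is an illustrative name.

```python
import sys

import_events = []

def record_import_event(event, args):
    # The "import" event's args are (module, filename, sys.path,
    # sys.meta_path, sys.path_hooks); only the module name is kept here.
    if event == "import":
        import_events.append(args[0])

# Audit hooks cannot be removed once added; this one lives for the process.
sys.addaudithook(record_import_event)

# Make sure the module is not cached, then import it twice.
sys.modules.pop("colorsys", None)
import colorsys
import colorsys  # served from sys.modules, so no second "import" event

print(import_events.count("colorsys"))  # 1
```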

The status quo, with the "import" audit event, has some deficiencies, including:

  • no module spec available since the module hasn't been found yet
  • no indication of when import finishes, including the outcome
  • no insight into the phases/steps of the import machinery
  • no reload detection

The simplest solution right now for users is to monkey-patch importlib._bootstrap (specifically _find_and_load() and _load_unlocked()). That isn't ideal.

Users can also use a custom __import__(), though getting anything beyond "import ended" right takes a lot of tricky work. Users could similarly use regular import hooks (meta path, path-based), with the same caveat. In both cases the approach is susceptible to inadvertent (and even intentional) interference, more so than a dedicated event hook would be, and import hooks are especially vulnerable.
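A sketch of the custom __import__() approach is below; wrap_dunder_import() is an illustrative name. It records a call and an end for every import statement, including ones served straight from sys.modules, so by itself it cannot tell whether a module was actually found or loaded. It also misses importlib.import_module(), which does not go through __import__().

```python
import builtins
import contextlib

@contextlib.contextmanager
def wrap_dunder_import(log):
    """Temporarily replace builtins.__import__ with a logging wrapper."""
    orig_import = builtins.__import__

    def traced_import(name, globals=None, locals=None, fromlist=(), level=0):
        log.append(("import_called", name))
        try:
            return orig_import(name, globals, locals, fromlist, level)
        finally:
            # Recorded on every import statement, cached or not,
            # and whether or not the import raised.
            log.append(("import_ended", name))

    builtins.__import__ = traced_import
    try:
        yield log
    finally:
        builtins.__import__ = orig_import

with wrap_dunder_import([]) as calls:
    import colorsys
    import colorsys
print(calls.count(("import_ended", "colorsys")))  # 2
```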

Proposal

The solution I think would work best is to add the following new audit events:

  • "importlib.import_started" (args: module name)
  • "importlib.import_ended" (args: module name)
  • "importlib.module_loading" (args: module name, spec)
  • "importlib.module_loaded" (args: module name, spec)

(I may switch the proposal to use an ";end" suffix. See "Possible Alternate Events 3" below.)

The two "import_*" events would wrap importlib._bootstrap._find_and_load(). The two "module_*" events would wrap importlib._bootstrap._load_unlocked().

We would also keep the existing "import" audit event for backward compatibility. It would be emitted immediately before "importlib.import_started", and it would still be emitted at the start of _imp.create_dynamic().[1]

Although "import" and "importlib.import_started" would be emitted at the same point (at least currently[2]), we would keep them distinct for the following reasons:

  • the args are different
  • it matches the actual call within importlib
  • it matches the other new events
  • it is only emitted when importing starts and not when loading an extension module

Note that this proposal introduces the concept of an audit event representing the end of an operation. AFAICT there aren't any existing audit events that do so.

Also note that the event names do not match any function names. Nearly all audit events do, with the exception of "setopencodehook" (which is similar to the corresponding PyFile_SetOpenCodeHook()) and, appropriately enough for this proposal, "import".

Why "importlib.*"?


In speaking about this privately with @zooba, he suggested we introduce an "importlib" event namespace if we go with audit hooks:

I think it's best to distinguish them from the core runtime events,
and probably frame them as "when the default importers from importlib
are used to resolve the module, you'll get this event"

Possible Extra Event Args

  • "importlib.import_ended" (args: module name, module)
  • "importlib.import_ended" (args: module name, success)
  • "importlib.module_loading" (args: module name, spec, finder)
  • "importlib.module_loaded" (args: module name, module)
  • "importlib.module_loaded" (args: module name, spec, success)

The "finder" arg would fit better on a separate "importlib.module_found" event (see "Possible Extra Events" below), so I've left it off the proposed "module_loading" event. The other extra arg values could be inferred from the import state, so I've left them off the proposal.
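As a sketch of that inference, assume "importlib.module_loaded" is emitted with (module name, spec) after _load_unlocked() returns or raises. A hook can then recover both the "module" and "success" extras from sys.modules, because _load_unlocked() deletes the sys.modules entry when executing the module fails. The events are raised by hand since nothing emits them yet, and spam_ok/spam_failed are made-up module names.

```python
import sys
import types

outcomes = {}  # module name -> True if the module survived loading

def module_loaded_hook(event, args):
    if event != "importlib.module_loaded":  # hypothetical event name
        return
    name = args[0]
    # The would-be "module" arg: whatever sys.modules now holds for the name.
    module = sys.modules.get(name)
    # The would-be "success" arg: a failed load leaves no entry behind.
    outcomes[name] = module is not None

sys.addaudithook(module_loaded_hook)

# Simulate one successful and one failed load (spec passed as None).
sys.modules["spam_ok"] = types.ModuleType("spam_ok")
sys.audit("importlib.module_loaded", "spam_ok", None)
sys.audit("importlib.module_loaded", "spam_failed", None)
del sys.modules["spam_ok"]

print(outcomes)  # {'spam_ok': True, 'spam_failed': False}
```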

Possible Alternate Events 1


We could simplify down to two new events:

  • "importlib.importing" (args: module name, end)
  • "importlib.loading" (args: module name, spec, end)

These would accomplish the same thing as the proposed four events by using the "end" arg to distinguish between the start and end of the operation.
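Under this alternative, a hook has to unpack the trailing "end" arg before it knows which of the two operations' boundaries it received. A minimal sketch, with the hypothetical events raised manually via sys.audit() and the spec arg passed as None; end_arg_hook() is an illustrative name.

```python
import sys

# Hypothetical events from this alternative; nothing emits them today.
log = []

def end_arg_hook(event, args):
    if event == "importlib.importing":
        name, end = args
        log.append(("import_ended" if end else "import_started", name))
    elif event == "importlib.loading":
        name, _spec, end = args
        log.append(("module_loaded" if end else "module_loading", name))

sys.addaudithook(end_arg_hook)

sys.audit("importlib.importing", "spam", False)
sys.audit("importlib.loading", "spam", None, False)
sys.audit("importlib.loading", "spam", None, True)
sys.audit("importlib.importing", "spam", True)
```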

I didn't go with this because it's better for each distinct event to have a distinct event name (and args). At the least, one-event-name-per-event is conceptually simpler and easier for audit hooks to handle. One might argue that the "import" event is already used for two distinct events (import-started and extension-module-loading), but I'd argue we should add a distinct "imp.create_dynamic" event rather than add more multi-purpose events.

Furthermore, nearly all existing events indicate the beginning of some call or operation. It wouldn't be practical to retrofit any of them with the extra "end" arg this alternative would introduce; instead, for those we'd add distinct end events. Using a second convention (indicate-end-with-arg) just for the new events would be unhelpful.

Possible Alternate Events 2


It may make sense to match the actual function names involved:

  • "importlib._bootstrap._find_and_load"
  • "importlib._bootstrap._find_and_load;end"
  • "importlib._bootstrap._load_unlocked"
  • "importlib._bootstrap._load_unlocked;end"

or in the spirit of "setopencodehook":

  • "importlib.find_and_load"
  • "importlib.find_and_load;end"
  • "importlib.load"
  • "importlib.load;end"

In the former case, I don't see the point in tying the event to the specific functions. These audit events represent runtime operations rather than calls. On top of that, the event names would be too coupled to internal function names (and an internal module).

In the latter case, it could be confusing that no such functions exist. It's better to be clear with the names that it's an operation and not a call, like "import" does.

In either case, we'd be introducing the convention of an ";end" suffix to indicate the paired event. That isn't necessarily a bad thing. See the next section.

Possible Alternate Events 3


It may be clearer to use the same name for each event pair and distinguish the two with an explicit suffix:

  • "importlib.importing" (args: module name)
  • "importlib.importing;end" (args: module name)
  • "importlib.loading" (args: module name, spec)
  • "importlib.loading;end" (args: module name, spec)

This approach could apply to other existing events just as easily and clearly, if we want to add the ending pair in the future.

This is worth further discussion. If using such a suffix is a reasonable general approach for future use then I'll probably change the proposed event names accordingly. Otherwise I'll leave them as proposed.
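With the suffix convention, one small piece of hook code can pair the start and end of any watched operation without knowing each event's arg layout. The names below are the hypothetical ones from this section, raised by hand with sys.audit(); WATCHED and paired_hook() are illustrative, and str.removesuffix() requires Python 3.9+.

```python
import sys

END_SUFFIX = ";end"
# Hypothetical base event names from this section.
WATCHED = {"importlib.importing", "importlib.loading"}
log = []

def paired_hook(event, args):
    # "name;end" marks the end of the operation that "name" started.
    is_end = event.endswith(END_SUFFIX)
    base = event.removesuffix(END_SUFFIX)
    if base not in WATCHED:
        return
    # args[0] is the module name for every event in this section.
    log.append((base, "end" if is_end else "start", args[0]))

sys.addaudithook(paired_hook)

sys.audit("importlib.importing", "spam")
sys.audit("importlib.loading", "spam", None)
sys.audit("importlib.loading;end", "spam", None)
sys.audit("importlib.importing;end", "spam")
```

Extending this to another paired event would only mean adding its base name to WATCHED.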

Possible Extra Events


There are other import-related events that may be helpful:

  • "import_called" (args: module name)
  • "import_ended" (args: module name)
  • "importlib.finding_module" (args: module name)
  • "importlib.module_found" (args: module name, finder, spec)

The pairs correspond to wrapping __import__() (AKA PyImport_ImportModuleLevelObject()) and importlib._bootstrap._find_spec(), respectively.

The possible variations described above apply as well here. For example:

  • "__import__"
  • "__import__;end"
  • "importlib.finding"
  • "importlib.finding;end"

I've left these out of the proposal because I don't need them at the moment. They would be easy enough to add later if desired.

Some other import-related operations to consider in the future:

  • metapath importer found for module
  • path importer found for module

Rationale for Using Audit Hooks


There are other ways the runtime could provide the desired events. For example:

  • hook in sys (e.g. sys.add_import_event_hook())
  • hook in importlib (e.g. importlib.add_import_event_hook())

Adding a new event hook framework seems undesirable and unnecessary. Thus I am convinced adding audit hook events is the way to go.

My main concern with using audit hook events for this was that we could be abusing the audit hook system by stepping outside the intended purpose. The name "audit" implies a different purpose than what I need the events for.

However, PEP 578 says the objective is to "make actions taken by the Python runtime visible to auditing tools. Visibility into these actions provides opportunities for test frameworks, logging frameworks, and security tools to monitor and optionally limit actions taken by the runtime."

That statement resolves the concern for me.

A lesser concern was that most of the existing audit events represent [generally] user-initiated operations on (and user interactions with) the runtime and stdlib. That isn't strictly true, and the PEP is clearer about it, so I'm not worried there.

There's one last concern which I noted earlier but still haven't fully resolved: it isn't 100% clear that it's okay to introduce the concept of "operation ended". I see it as a meaningful and useful addition, and that it fits with the purpose described in PEP 578. However, it deserves at least some discussion.

Key Questions for Discussion

  • are there any objections to using audit hooks for this?
  • does the event name need to match a function name?
  • is it okay to introduce the concept of an audit event that's emitted when an operation ends?
  • is the ";end" suffix an acceptable convention?

CC @zooba, @brettcannon, @ncoghlan

Footnotes

  1. Another "import" audit event is emitted at the beginning of _imp.create_dynamic(). At the time of that "import" audit event, the loading phase of import has already started, so the event hook has access to the initialized (but not loaded) module, and thus its spec, via sys.modules.

  2. Prior to 2fd43a1 (gh-138310: Adds sys.audit event for import_module #138311), the audit event was emitted by the builtin __import__(), right before the call to importlib._bootstrap._find_and_load(). That change moved it to just inside the _find_and_load() call, which is functionally equivalent (as long as nothing is monkey-patched).

  3. importlib.import_module() emits the audit event for every call, not just the first one for each module.

Metadata


Assignees

No one assigned

    Labels

    3.15 (new features, bugs and security fixes), topic-importlib, type-feature (A feature request or enhancement)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
