AnsiballZ improvements

Now that we don't need to worry about python-2.4 and 2.5, we can make
some improvements to the way AnsiballZ handles modules.

* Change AnsiballZ wrapper to use import to invoke the module
  We need the module to think of itself as a script because it could be
  coded as:

      main()

  or as:

      if __name__ == '__main__':
          main()

  Or even as:

      if __name__ == '__main__':
          random_function_name()

  A script will invoke all of those.  Prior to this change, we invoked
  a second Python interpreter on the module so that it really was
  a script.  However, this means that we have to run python twice (once
  for the AnsiballZ wrapper and once for the module).  This change makes
  the module think that it is a script (because __name__ in the module ==
  '__main__') but it's actually being invoked by us importing the module
  code.

  There's three ways we've come up to do this.
  * The most elegant is to use zipimporter and tell the import mechanism
    that the module being loaded is __main__:
    * 5959f11c9d/lib/ansible/executor/module_common.py (L175)
    * zipimporter is nice because we do not have to extract the module from
      the zip file and save it to the disk when we do that.  The import
      machinery does it all for us.
    * The drawback is that modules do not have a __file__ which points
      to a real file when they do this.  Modules could be using __file__
      to for a variety of reasons, most of those probably have
      replacements (the most common one is to find a writable directory
      for temporary files.  AnsibleModule.tmpdir should be used instead)
      We can monkeypatch __file__ in fom AnsibleModule initialization
      but that's kind of gross.  There's no way I can see to do this
      from the wrapper.

  * Next, there's imp.load_module():
    * 340edf7489/lib/ansible/executor/module_common.py (L151)
    * imp has the nice property of allowing us to set __name__ to
      __main__ without changing the name of the file itself
    * We also don't have to do anything special to set __file__ for
      backwards compatibility (although the reason for that is the
      drawback):
    * Its drawback is that it requires the file to exist on disk so we
      have to explicitly extract it from the zipfile and save it to
      a temporary file

  * The last choice is to use exec to execute the module:
    * f47a4ccc76/lib/ansible/executor/module_common.py (L175)
    * The code we would have to maintain for this looks pretty clean.
      In the wrapper we create a ModuleType, set __file__ on it, read
      the module's contents in from the zip file and then exec it.
    * Drawbacks: We still have to explicitly extract the file's contents
      from the zip archive instead of letting python's import mechanism
      handle it.
    * Exec also has hidden performance issues and breaks certain
      assumptions that modules could be making about their own code:
      http://lucumr.pocoo.org/2011/2/1/exec-in-python/

  Our plan is to use imp.load_module() for now, deprecate the use of
  __file__ in modules, and switch to zipimport once the deprecation
  period for __file__ is over (without monkeypatching a fake __file__ in
  via AnsibleModule).

* Rename the name of the AnsiBallZ wrapped module
  This makes it obvious that the wrapped module isn't the module file that
  we distribute.  It's part of trying to mitigate the fact that the module
  is now named __main)).py in tracebacks.

* Shield all wrapper symbols inside of a function
  With the new import code, all symbols in the wrapper become visible in
  the module.  To mitigate the chance of collisions, move most symbols
  into a toplevel function.  The only symbols left in the global namespace
  are now _ANSIBALLZ_WRAPPER and _ansiballz_main.

revised porting guide entry

Integrate code coverage collection into AnsiballZ.

ci_coverage
ci_complete
This commit is contained in:
Toshio Kuratomi 2018-06-20 11:23:59 -07:00
commit 52449cc01a
32 changed files with 349 additions and 321 deletions

View file

@ -38,6 +38,26 @@ There is an important difference in the way that ``include_role`` (dynamic) will
Deprecated
==========
Expedited Deprecation: Use of ``__file__`` in ``AnsibleModule``
---------------------------------------------------------------
.. note:: The use of the ``__file__`` variable is deprecated in Ansible 2.7 and **will be eliminated in Ansible 2.8**. This is much quicker than our usual 4-release deprecation cycle.
We are deprecating the use of the ``__file__`` variable to refer to the file containing the currently-running code. This common Python technique for finding a filesystem path does not always work (even in vanilla Python). Sometimes a Python module can be imported from a virtual location (like inside of a zip file). When this happens, the ``__file__`` variable will reference a virtual location pointing to inside of the zip file. This can cause problems if, for instance, the code was trying to use ``__file__`` to find the directory containing the python module to write some temporary information.
Before the introduction of AnsiBallZ in Ansible 2.1, using ``__file__`` worked in ``AnsibleModule`` sometimes, but any module that used it would fail when pipelining was turned on (because the module would be piped into the python interpreter's standard input, so ``__file__`` wouldn't contain a file path). AnsiBallZ unintentionally made using ``__file__`` always work, by always creating a temporary file for ``AnsibleModule`` to reside in.
Ansible 2.8 will no longer create a temporary file for ``AnsibleModule``; instead it will read the file out of a zip file. This change should speed up module execution, but it does mean that starting with Ansible 2.8, referencing ``__file__`` will always fail in ``AnsibleModule``.
If you are the author of a third-party module which uses ``__file__`` with ``AnsibleModule``, please update your module(s) now, while the use of ``__file__`` is deprecated but still available. The most common use of ``__file__`` is to find a directory to write a temporary file. In Ansible 2.5 and above, you can use the ``tmpdir`` attribute on an ``AnsibleModule`` instance instead, as shown in this code from the :ref:`apt module <apt_module>`:
.. code-block:: diff
- tempdir = os.path.dirname(__file__)
- package = os.path.join(tempdir, to_native(deb.rsplit('/', 1)[1]))
+ package = os.path.join(module.tmpdir, to_native(deb.rsplit('/', 1)[1]))
Using a loop on a package module via squash_actions
---------------------------------------------------