The Situation

I wrote a script using Biopython that read a file containing GenBank accession numbers and downloaded the corresponding GenBank records:

    ### gentest.py

    from Bio import GenBank

    gi_list = ['AF339445', 'AF339444', 'AF339443', 'AF339442', 'AF339441']
    RECORDS = []                                        # Local list of downloaded records
    record_parser = GenBank.FeatureParser()             # GenBank file parser
    ncbi_dict = GenBank.NCBIDictionary(parser = record_parser,
                                       database = "nucleotide") # Dict for accessing NCBI

    count = 1
    for accession in gi_list:
        print "Accessing GenBank for %s... (%d/%d)" % (accession, count, len(gi_list))
        try:
            record = ncbi_dict[accession]           # Get record as SeqRecord
            RECORDS.append(record)                  # Put records in local list
        except Exception:                           # Report the failure and carry on
            print "Accessing record %s failed" % accession
        count += 1

This worked fine as a script, but when I attempted to turn it into a Windows executable with py2exe and the setup.py script:

    ### setup.py

    from distutils.core import setup
    import py2exe, sys

    setup (name = "gentest",
           version = "0.10",
           url = r'http://bioinf.scri.sari.ac.uk/lp/index.shtml',
           author = "Leighton Pritchard",
           console = ["gentest.py"])

and the command python setup.py py2exe, the resulting gentest.exe threw an error when run.

The Error

This is the error thrown on running the executable:

Traceback (most recent call last):

WindowsError: [Errno 3] The system cannot find the path specified:

The Problem

Location of Bio.config

With help from Thomas Heller on the Python-Win32 mailing list, the problem was identified. When the Bio package is imported, Bio/__init__.py imports a number of modules from the Bio.config package using the _load_registries function. The first problem occurs at line 52 (file version 1.21 from CVS):

    x = os.listdir(
        os.path.dirname(__import__("Bio.config", {}, {}, ["Bio"]).__file__))

Under normal script execution, the os.path.dirname call returns a directory on the filesystem that os.listdir can read. However, py2exe imports modules through the import hooks described in PEP 302 (via the builtin zipimport hook), so the path returned by os.path.dirname points inside the shared zip archive that py2exe creates. That path is not a real directory, so os.listdir fails and the above error is thrown.
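The failure can be reproduced without py2exe. The following is a minimal sketch in present-day Python, not the code from the original report: it builds a small zip archive containing a hypothetical package (demo_zip_pkg and the archive name are invented for this illustration), imports it through zipimport, and then calls os.listdir on the package's directory in the same way _load_registries does.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def listdir_on_zipped_package():
    """Import a package from a zip archive and try to list its directory.

    Returns a tuple (package_dir, listing_succeeded).
    """
    tmpdir = tempfile.mkdtemp()
    archive = os.path.join(tmpdir, "library.zip")
    # demo_zip_pkg is a hypothetical package standing in for Bio.config.
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo_zip_pkg/__init__.py", "")
        zf.writestr("demo_zip_pkg/alpha.py", "VALUE = 1\n")

    sys.path.insert(0, archive)
    importlib.invalidate_caches()
    try:
        package = importlib.import_module("demo_zip_pkg")
        # __file__ points inside the archive, not at a real directory.
        package_dir = os.path.dirname(package.__file__)
        try:
            os.listdir(package_dir)
            listing_succeeded = True
        except OSError:
            listing_succeeded = False
        return package_dir, listing_succeeded
    finally:
        # Undo the changes to interpreter state made for the demonstration.
        sys.path.remove(archive)
        sys.modules.pop("demo_zip_pkg", None)
```

The exact exception subclass and message vary by platform and Python version, which is why the sketch catches the general OSError.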

Module extensions

The arrangement with py2exe's shared zipfile causes problems further down the function. The _load_registries function expects that modules will have the .py extension, rather than the .pyc extension that the compiled files (all that are included in the zipfile) use.

    x = filter(lambda x: not x.startswith("_") and x.endswith(".py"), x)
    x = map(lambda x: x[:-3], x)            # chop off '.py'

Modules within Bio.config that live in the zipfile are thus not loaded.
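To see why nothing is loaded, the filter can be applied to a list of compiled filenames. This is a small sketch in present-day Python; the helper name module_names and the filenames are invented for the example and are not real Bio.config modules.

```python
def module_names(filenames, extension):
    """Return module names for files with the given extension.

    Mirrors the filter/map pair above: skip names starting with an
    underscore, keep names ending in `extension`, then strip it.
    """
    return [
        name[:-len(extension)]
        for name in filenames
        if not name.startswith("_") and name.endswith(extension)
    ]


# Hypothetical contents of a zipped package: compiled files only.
compiled_files = ["__init__.pyc", "alpha.pyc", "beta.pyc"]

# Looking for '.py' finds nothing, because every name ends in '.pyc'.
found_with_py = module_names(compiled_files, ".py")

# Looking for '.pyc' recovers the module names.
found_with_pyc = module_names(compiled_files, ".pyc")
```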

The Solution

Existing code

The code to be changed in the _load_registries function is (lines 50-55 in Bio/__init__.py, CVS version 1.21):

    # Load the registries.  Look in all the '.py' files in Bio.config
    # for Registry objects.  Save them all into the local namespace.
    x = os.listdir(
        os.path.dirname(__import__("Bio.config", {}, {}, ["Bio"]).__file__))
    x = filter(lambda x: not x.startswith("_") and x.endswith(".py"), x)
    x = map(lambda x: x[:-3], x)            # chop off '.py'

This obtains a list of module names for later import as Bio.config.module_name.

Since we cannot obtain the list of modules with this code, we need to provide an alternative way of generating the list when the modules are in the shared zipfile.

Processing the zipfile

First, we must determine whether the imported module comes from a zipfile or is a straightforward import. This is done by checking for the __loader__ attribute: if hasattr(config_imports, '__loader__'):
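The hasattr test relies on ordinary modules having no __loader__ attribute, which held for the Python version used here. In Python 3.3 and later, modules imported normally also carry __loader__, so the same test would no longer tell the two cases apart. A hedged sketch of an equivalent check for present-day Python follows; the function name is_zip_imported is invented for this illustration and is not part of Biopython.

```python
import zipimport


def is_zip_imported(module):
    """Return True if `module` was loaded by the zipimport hook.

    Checks the type of the loader rather than its mere presence, since
    modern Python sets __loader__ on normally imported modules too.
    """
    loader = getattr(module, "__loader__", None)
    return isinstance(loader, zipimport.zipimporter)
```

For a module imported from the filesystem, such as the standard os module, this returns False.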

Next, we need to obtain the list of module files for Bio.config. These are all found within the Bio/config folder, so we can filter the filenames in the shared zipfile with the list comprehension x = [zipfiles[file][0] for file in zipfiles.keys() if 'Bio\\config' in file].

The filenames in this list are absolute paths, so we can grab just the filename with another list comprehension: x = [name.split('\\')[-1] for name in x].

We have to lose the extensions from these filenames, too. These are all .pyc files, so we can use a modification of the existing code's map and lambda: x = map(lambda x: x[:-4], x). (Note: we could easily combine the last two steps, but I keep them separate for clarity.)

We now have the required list of module filenames.
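The three steps can be traced on sample data. The sketch below is written in present-day Python and uses an invented dictionary that mimics the shape described above (keys are paths inside the archive, and the first element of each value is the full path); the real _files attribute is private to the zip importer and its layout may differ between Python versions. The module names alpha and beta and the C:\dist path are hypothetical.

```python
def config_module_names(zip_entries, marker="Bio\\config"):
    """Apply the three steps from the text to a _files-like dictionary."""
    # Step 1: keep full paths of entries inside the Bio\config folder.
    full_paths = [zip_entries[key][0] for key in zip_entries if marker in key]
    # Step 2: reduce each Windows-style path to its final component.
    filenames = [path.split("\\")[-1] for path in full_paths]
    # Step 3: chop off the four-character '.pyc' extension.
    return [name[:-4] for name in filenames]


# Invented sample data; only the first tuple element is used here.
SAMPLE_ENTRIES = {
    "Bio\\__init__.pyc": ("C:\\dist\\library.zip\\Bio\\__init__.pyc",),
    "Bio\\config\\__init__.pyc": ("C:\\dist\\library.zip\\Bio\\config\\__init__.pyc",),
    "Bio\\config\\alpha.pyc": ("C:\\dist\\library.zip\\Bio\\config\\alpha.pyc",),
    "Bio\\config\\beta.pyc": ("C:\\dist\\library.zip\\Bio\\config\\beta.pyc",),
}

names = sorted(config_module_names(SAMPLE_ENTRIES))
```

Note that these steps, as written, do not drop names beginning with an underscore, so the package's own __init__ appears in the result alongside alpha and beta.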

Putting the steps together, and combining with the original code, we have:

    # Load the registries.  Look in all the '.py' files in Bio.config
    # for Registry objects.  Save them all into the local namespace.
    # Import code changed to allow for compilation with py2exe from distutils
    config_imports = __import__("Bio.config", {}, {}, ["Bio"])  # Import Bio.config
    if hasattr(config_imports, '__loader__'):                   # Is it in zipfile?
        zipfiles = config_imports.__loader__._files
        x = [zipfiles[file][0] for file in zipfiles.keys()
             if 'Bio\\config' in file]
        x = [name.split('\\')[-1] for name in x]# get filename
        x = map(lambda x: x[:-4], x)            # chop off '.pyc'
    else:                           # Not in zipfile, get files normally
        x = os.listdir(
            os.path.dirname(config_imports.__file__))
        x = filter(lambda x: not x.startswith("_") and x.endswith(".py"), x)
        x = map(lambda x: x[:-3], x)            # chop off '.py'

Compilation with the original setup.py script and python setup.py py2exe then ran smoothly, apart from a couple of missing modules which had no impact on the running of the executable.
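For comparison, the same list can be obtained without the private _files attribute. This is a sketch of an alternative in present-day Python, not the change that was made to Biopython. It assumes the archive path is known (for a zip-imported module it is available as module.__loader__.archive) and reads the entry names with the standard zipfile module. The function name list_package_modules is invented for this example.

```python
import os
import zipfile


def list_package_modules(archive_path, package):
    """List the direct submodules of `package` stored in a zip archive.

    Accepts both source (.py) and compiled (.pyc) entries and skips
    names starting with an underscore, as the filesystem branch does.
    """
    # zipfile always reports entry names with forward slashes.
    prefix = package.replace(".", "/") + "/"
    names = set()
    with zipfile.ZipFile(archive_path) as zf:
        for entry in zf.namelist():
            if not entry.startswith(prefix):
                continue
            rest = entry[len(prefix):]
            if "/" in rest:
                continue  # entry belongs to a subpackage
            stem, ext = os.path.splitext(rest)
            if ext in (".py", ".pyc") and stem and not stem.startswith("_"):
                names.add(stem)
    return sorted(names)
```

Because it reads entry names through zipfile, this version does not depend on the Windows path separator.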

Update

The changes have now (3rd Feb 04) been incorporated into the Biopython source in CVS.

ConfigImportProblems (last edited 2008-07-08 11:27:44 by localhost)