If I were designing Python's import from scratch (snarky.ca)
80 points by ingve on Dec 4, 2015 | hide | past | favorite | 32 comments


I wrote about this many years ago and I think my conclusions back then are still valid: http://lucumr.pocoo.org/2009/7/24/singletons-and-their-probl...

Turns out other languages have evolved in the meantime and come up with much better solutions. Among dynamic languages in particular, ES6 modules combined with npm's approach to package dependency management are the current benchmark. ES6 exposes bindings instead of values, which solves a ton of issues that Python will never be able to fix with its import system.


Can you elaborate on the issues that exposing bindings instead of values fixes or point me to an article that explains this? I'm not too familiar with the ES6 import system and I am curious, as I have used Python's extensively and encountered many of the issues in your article.


It gives a read-only peek at the scope of the imported module.

    import {foo} from 'bar.js';
    // you can't re-bind `foo` from here, but if
    // a function defined in 'bar.js' mutates it, 
    // the change is reflected here too.
It enables circular dependencies, for free, and makes it easy to discard unused code.

See the examples at the Rollup playground.

http://rollupjs.org
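For contrast, a sketch of why Python's from-import differs: it copies the current value into the importer's namespace, so a later rebinding inside the source module is invisible to the importer. The module here is built at runtime (hypothetical name "bar") so the snippet is self-contained:

```python
import sys
import types

# Build a stand-in module "bar" at runtime (hypothetical; mirrors bar.js above).
bar = types.ModuleType("bar")
bar.foo = 1
def _bump():
    bar.foo += 1  # rebinds the module-level name inside "bar"
bar.bump = _bump
sys.modules["bar"] = bar

from bar import foo, bump  # copies the current values, not live bindings

bump()
print(foo)      # 1 -- the local copy never sees the rebinding
print(bar.foo)  # 2 -- the module attribute did change
```

Note the asymmetry: if foo were a mutable object being mutated in place, both sides would see the change; it's only rebinding that diverges.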


I'd say Go is the benchmark in case of import system.

- consistent syntax

- no circular dependencies

- you can only import a module, only one symbol is polluting the namespace

I wonder what will be the winning solution for versioning.

npm and the 260-character path limit on Windows were driving me nuts.


Go's import system has tons of problems.

1) inconsistent syntax: import "github.com/fsouza/go-dockerclient"; docker.NewClient...

Notice how you import something with the trailing path being "go-dockerclient" but the actual package name is "docker"?

In npm, if you type x = require("foo"), you always refer to it as 'x.something'. You don't have to guess or read docs.

2) build flags. What the fuck? // +build fuckme

3) Canonical import paths. Double what the fuck? For those of you who don't know, if you have "github.com/foo/bar" but write 'package bar // import "github.com/google/bar"' then a user of your package will not be able to compile it unless they move it to the directory you gave.

Yeah, no joke.... I'm in awe at this stupidity

4) Poor project-based importing (something actually worse than virtualenv; nothing near as good as npm).

5) init side effects, e.g. look at how pprof works. Literally you do an underscore import and it mutates the behaviour of the http package. Thanks.


1) Yes, that is messed up, but that's just because that package isn't following the standard idiom

2) Build flags are very useful, on the verge of required, for some situations. For instance, code that needs separate implementations for each OS.

3) This is very useful for packages that may be on github, but served through an alternate path. If they import the github version, you are now stuck with github. If you move to self hosted git or something, their existing codebase won't know how to do an update, and will require a rewrite to use the new version.

4) I'll be the first to admit it isn't the greatest, but, it's rendered mostly moot by the fact that you build single binaries. Deal with a slightly annoying import system in exchange for effortless deployments and not having to worry about libraries and such on the servers? Heck yeah!

5) I agree on this one, it should be something more like doing pprof.Register(*ServeMux) in an init in your program.


> I'd say Go is the benchmark in case of import system.

I prefer systems that (a) allow namespace nesting within a single package and (b) don't run arbitrary code on import (in Go's case, init()). The latter is particularly pernicious because systems that have arbitrary code execution on import have to define some kind of global ordering on imports (throughout the entire program!), since the order that packages get loaded in very much affects the program semantics. This is called the "static initialization order fiasco" in C++ and Go inherits it too.


> npm and 260 char limit for path on Windows was driving me nuts.

If you can upgrade, node 5/npm 3 fixed this by flattening out the node_modules folder as much as possible.


"As much as possible" would be doing node_modules/$PACKAGE-$VERSION for all packages (like e.g. Rubygems does). But instead they're doing node_modules/$PACKAGE and still nesting conflicting dependencies, so it's entirely possible to still run into the same issue.


Note that Go does allow namespace pollution via dot imports, though they are ill-regarded by most of the community.


Sometimes you really do legitimately have a lot of static, global state. For instance, consider a program that needs to reference local, national, and/or global geography and its metadata, on a wide scale, randomly. All the countries have subdivisions, and subdivisions of subdivisions, and so on all the way down, which are all inter-referential. You can easily hit 100 MB of state that is essentially constant, and needs to be indexed 50 different ways for millions of function calls per user action that would access it.

Why not manage access to such things in a singleton class?


Singletons are fine, but it's almost always better to lazily initialize them rather than eagerly, to save on startup time. As a bonus, if you have no eager global initialization in your language, you can make import completely side-effect-free, which is a really nice simplification that I wish more languages adopted.
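A minimal sketch of that lazy pattern in Python (the data here is a stand-in, not real geographic state):

```python
import functools

@functools.cache  # Python 3.9+; memoizes the single zero-arg call
def geo_index():
    # Expensive construction runs once, on first call, not at import time.
    return {"US": ["Alabama", "Alaska"]}  # stand-in for ~100 MB of real data

# Importing this module costs nothing; the first geo_index() call pays.
assert geo_index() is geo_index()  # same object every time: a singleton
```

Callers always go through geo_index(), so the module can be imported freely with no startup cost.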


The slow startup from imports is my biggest annoyance with python.

We had a decent sized library at a previous company that pulled in modules that defined huge register maps, wrapped c++ libraries, etc.

I wrapped all imports in a lazy importer that was triggered by the first attribute access. It brought our script startup times from 3 seconds down to a fraction of a second.

Blows me away that this isn't default behavior for ALL modules.
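For what it's worth, the stdlib can approximate this today: importlib.util.LazyLoader defers executing a module's body until the first attribute access. A sketch of the recipe from the importlib docs:

```python
import importlib.util
import sys

def lazy_import(name):
    """Import `name` lazily: the module body runs on first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # registers the module but defers execution
    return module

json = lazy_import("json")   # cheap: the module body hasn't executed yet
print(json.dumps({"a": 1}))  # first attribute access triggers the real import
```

The catch, as the sibling comment notes, is that import errors and side effects now surface at first use rather than at startup.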


That behaviour feels to me like it may result in faster startup, but would also result in less predictable performance for code bases with somewhat random access such as web applications.

You could, I suppose, do some cache warming to make sure the first user request isn't slowed down, but it's one more thing to think about.


>"I wrapped all imports in a lazy importer that was triggered by the first attribute access."

Well, putting code at the top level of your file is generally the cause of such problems, I would argue. Granted, I don't know how necessary that is when it comes to "register maps" and "wrapped c++ libraries". But I'd imagine you should be encapsulating them away anyway, and that would fix the large startup time by design.


If this was the default, any change could completely upend the initialization order of your app. "Explicit is better than implicit".


As long as these data are immutable, sharing them is easy.

If you want hundreds on megs of shared mutable state, a database is a proper solution.


And make 300,000 queries over TCP, like getting the list of county names in a state or the list of place names in a county? My actual use case involves fuzzy-matching an arbitrary, user-determined subset of 18,000,000+ unsanitized data records against geographical place names so they can be assigned geometries.

I'd like the program to finish in 15 seconds or less, please.


If you're making 300K queries over TCP to a database in order to do a calculation, then I'd say you need a much better data structure and/or algorithm. Either that, or do the bulk of the calculations on the database in P/T-SQL, or pre-calculate before-hand so that your on-line queries are just lookups instead of actual calculations.


You know there is such a thing as querying a database without going over a network, right?
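For instance, SQLite runs in-process, so queries are function calls rather than TCP round-trips. A minimal sketch with Python's stdlib (table and data hypothetical):

```python
import sqlite3

# In-process, in-memory database: no network hop, no server process.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (county TEXT, name TEXT)")
conn.execute("INSERT INTO places VALUES ('Kent', 'Dover')")

rows = conn.execute(
    "SELECT name FROM places WHERE county = ?", ("Kent",)
).fetchall()
print(rows)  # [('Dover',)]
```

Whether that beats a hand-rolled in-memory index for 300K lookups is a separate question, but "database" doesn't have to mean "network".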


It's moot.

The train of the discussion, if you go and read the OP's link and inner links, is like this:

- Singletons are bad
- Why are singletons bad?
- They're not "real" OO, they're global state, they obfuscate dependencies, etc, etc, etc
- But what if I just legitimately have a ton of global state?
- Use a database! Use a filesystem!

The last point in the chain admits that the first point is mistaken. "Use a database" is just saying "use someone else's code to solve your problem". What if the database is implemented using singletons? What if it uses code that isn't OO at all? All you've accomplished is to say "OO can't solve your problem, use something external". In fact, my problem is solved just fine by using a singleton.


> essentially constant

An immutable singleton is fine. The other concern is performance, but if you don't have that constraint, there is no point.


I totally agree with nixing *-imports and attribute imports. Not only do they hide where something came from but they also tend to actively hinder refactoring by making it harder to tell how packages are being used.


Nice for REPL work though.


You can use "import longmodulename as mn"--many people do this today, and it isn't nearly as harmful as "*".


Overall, I think many would agree that "import *" and "from x import *" are more harmful than any of the alternatives. It's a laziness code-smell and will come back to bite you.


If I were designing import in JS, I'd add an option to inject a dependency into an imported module, overriding the one the language would normally load from a file.

Like this:

    // model.js:
    var db = require("db");


    // controller.js:
    var model = require("model");


    // test.js:
    var model = require("model", { db: require("db_mock") });

I could probably implement something like that as webpack plugin.
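A rough Python analog of this injection is possible today by seeding sys.modules before the import runs (module names hypothetical; this shadows any real "db" module for the whole process, so it's test-only territory):

```python
import sys
import types

# Stand-in for a mock db module (hypothetical names).
db_mock = types.ModuleType("db")
db_mock.query = lambda sql: "mocked result"

# Seeding sys.modules first means any `import db` -- here or inside
# a hypothetical model.py -- resolves to the mock, not a real module.
sys.modules["db"] = db_mock

import db  # resolves to the injected mock
print(db.query("SELECT 1"))  # mocked result
```

The key difference from the proposal above is scope: sys.modules is global, whereas require("model", {db: ...}) would inject per-importer.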


The docs for importlib are a nightmare right now.

So many things are deprecated that it is hard to work out which things you should use.

The deprecated bits should either be moved to the end of the page or to another page.


Very interesting write-up, I'm glad to see a core developer thinking deeply on this topic because Python has a lot of room for improvement.

This touches briefly on 1 of 3 major issues I have with python. I'll try my best to articulate them from smallest-to-greatest impact.

1. import semantics

It would be amazing to see the import machinery abstracted out of the sys module. Not just for the reasons mentioned in the article: ideally, it would allow developers to override/extend the implementation of import and experiment with new ways to handle it.

For instance, why can't a developer write a module to experiment with using ES6-module-style imports?

    import {attr1, attr2} from module@[version]/submodule
    import * from module2

When I wrote the pypreprocessor lib, I simply wanted to add the ability to import and conditionally-compile code using inline c-style preprocessor directives. It works by blocking the import, preprocessing the code, then importing the postprocessed version of the code.

Implementing it is ugly because there's no way to inline the preprocessor step. If I had the ability to extend and inline the preprocessor step as a custom import module, the specifics would be transparent to the user.
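For reference, the machinery such a tool has to reach for is a meta path finder on sys.meta_path, which can transform source before executing it. A minimal sketch (hypothetical module name and a toy "preprocessing" step, not pypreprocessor's actual code) that also shows why it feels clunky:

```python
import importlib.abc
import importlib.util
import sys

class PreprocessFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Intercept imports of one module and preprocess its source first."""

    def __init__(self, name, source):
        self.name, self.source = name, source

    def find_spec(self, fullname, path, target=None):
        if fullname == self.name:
            return importlib.util.spec_from_loader(fullname, self)
        return None  # let the normal finders handle everything else

    def create_module(self, spec):
        return None  # use default module creation

    def exec_module(self, module):
        # Toy "preprocessing": drop lines marked for exclusion.
        kept = "\n".join(line for line in self.source.splitlines()
                         if not line.startswith("#exclude"))
        exec(kept, module.__dict__)

source = "#exclude x = 'removed'\nVALUE = 42"
sys.meta_path.insert(0, PreprocessFinder("fake_mod", source))

import fake_mod
print(fake_mod.VALUE)  # 42
```

It works, but the transform lives in a finder/loader pair bolted onto global interpreter state rather than being declared inline where the import happens.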

With a preprocessor it would be possible to write python2/python3 code side-by-side making the transition much easier for library developers. Implementing it as a custom import add-on would allow developers to extend the platform without polluting core (ie since GvR is vehemently against adding preprocessor capability to core).

Why does python default to the mentality of 'one true way'? There is a huge ecosystem of developers willing to build/extend and experiment with the language in ways that aren't immediately apparent to the core design committee.

2. Packages violate 'Explicit is better than implicit'

Package management as a whole is a terrible, broken experience for library developers and consumers of those libraries alike. Global-only package installations, dependency hell, lack of version configuration, inability to automate dependency loading, etc.

"Vendoring is not hard thanks to relative imports and most projects don't seem to need it."

I highly disagree. Expecting users to either install dependencies globally or manually copy them to the package contents sucks. The former introduces potential environmental-level side effects such as dependency conflicts and requires virtualenv to isolate packages. The latter is prone to error, adds maintenance overhead as dependencies are updated, and makes it unclear to developers who want to add dependencies without polluting source control.

Just look at the NPM ecosystem for inspiration. Vendoring, including versioned dependency management, is done on a per-project level by default, and the benefits are clear. I can execute `npm install` from the CLI and the package manager will download and install all of a project's dependencies (and dependencies of dependencies) automagically. There's a reason NPM eclipsed every other packaging ecosystem in such a short period. It 'just works'.

Virtualenv is a hack. Instead of supporting package-level dependencies, it forces developers to limit the environment to the package-level. It introduces all of the same problems inherent to using globally installed dependencies, except at a local level.

This violates II of 'The Twelve Factor App' http://12factor.net/dependencies

Pets vs Cattle

While the current model works well with 'pets' (ie long-running processes and persistent environments), it causes a lot of problems with 'cattle' (ie transient/disposable environments), because it requires developers to re-create an identical copy of the development environment on the target for every deployment. Not only does this add a lot of provisioning overhead/complexity, it is also highly prone to error if/when the environment changes (ex: modules added, configuration changes).

DevOps is hard enough, even without the overhead of cloning environment-specific details of the language platform.

3. Religious adherence to 'the one true way'

Specifically, python is a platform implementation when it 'should' be a language.

Don't get me wrong, Python is a significant improvement over the everything-and-the-kitchen-sink approach used by platforms like .NET/Java. I understand the need for including enough core functionality to get users up and running quickly. What I don't agree with is the difficulty of extending/overriding the default functionality.

Along those lines, I'd include a few requirements:

- expose the core modules as classes (ie not just import) that can be extended/overridden

- take some time to engineer a really good package management tool

- setup.py shouldn't require hand-coding; pip should be capable of generating it

- unify the configuration at the package level

- additional package install steps can be added as pre/post hooks in the configuration

-------

I have some obvious biases that come from the JS/Node.js development community. Don't get me wrong, I love python as a language and will continue to use it for the foreseeable future. But without good package/version management I don't see it as a good alternative for building anything but trivial applications.


Good post but I call hyperbole on one of your closing remarks:

> Without good package/version management I don't see it as a good alternative for building anything but trivial applications.

Even if we agree Python has mediocre package management, it is still more of an annoyance/hindrance than a factor that realistically rules out "building anything but trivial applications".


You're right. Python is definitely useful for more than 'trivial applications'. My statement was more a matter of subjective personal preference.

The emphasis on package management isn't so much about building non-trivial applications as it is about deploying non-trivial applications.

My ideal is a system that can be built in a composable manner using the standard tools.

A deployment would include:

- cloning the source

- installing dependencies

- setting env vars

- running tests

- launching the application

It should be trivial to repeat this process to deploy to multiple targets as well as automate continuous integration testing.

With the current tool chain there's no clear path to accomplish this without relying on a higher level abstraction like Docker.


not to quibble but the author spelled "import" as "imort" in the url slug.



