Hacker News | deizel's comments

Given that you'll want to activate a virtual environment for most Python projects, and projects live in directories, I find myself constantly reaching for direnv. https://github.com/direnv/direnv/wiki/Python

    printf 'layout python\npip install --upgrade pip pip-tools setuptools wheel\npip-sync\n' > .envrc
(Note `printf` rather than plain `echo`, which won't expand `\n` in most shells.) When you cd into a given project, it activates the venv, upgrades pip and friends to non-ancient versions with support for the latest PEPs (e.g. `pyproject.toml` support on a fresh Python 3.9 environment), and verifies the latest pinned packages are present. It's just too useful not to have.

    direnv stdlib
This command (or this link https://direnv.net/man/direnv-stdlib.1.html) will print many useful functions that can be used in the `.envrc` shell script that is loaded when entering directories, ranging from many languages, to `dotenv` support, to `on_git_branch` for e.g. syncing deps when switching feature branches.
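For instance, here is a hypothetical `.envrc` combining a few of those stdlib helpers (`layout python`, `dotenv`, and `on_git_branch` are real direnv builtins; the branch name and requirements file are made up for illustration):

```shell
# .envrc — evaluated by direnv when you cd into the project directory
layout python            # create/activate a venv for this directory
dotenv .env              # export KEY=value pairs from .env, if present

# hypothetical: re-sync pinned deps only while on a particular branch
if on_git_branch feature/new-deps; then
  pip-sync requirements.txt
fi
```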

Check it out if you haven't. I've been using it for more years than I can count, and being able to cd from a PHP project to a Ruby project to a Python project with ease really helps with context switching.


Ditto, especially in combination with Python/Django, as used by Nextdoor. Ironically, they had already removed Celery from their stack a few years prior. https://engblog.nextdoor.com/nextdoor-taskworker-simple-effi...


It can be tricky to use EAV data models with traditional ORMs; this type of functionality is often slow or requires plugins, if it's implemented at all:

https://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80...


I'm wondering how well it would perform if they shoved the data into PostgreSQL as JSONB instead of EAV.


I think JSONB may not be as performant as EAV. You don't need joins or unions, but if you are dealing with dynamic fields, you need to know the fields ahead of time and create indexes for them in JSONB. For EAV you just have to index your values table.


PostgreSQL can combine multiple indexes, so you don't need to know all the fields ahead of time.

Likewise, you can get away with a full-document GIN index.

I played around with some basic reporting at work last year. Against the EAV data on my local machine, the report took ~7 seconds to run. I shoved the same data into PostgreSQL as JSONB, indexed it with just a full-document GIN index because I was lazy, and the same report took ~80ms.

Obviously this isn't 'proof'; my dataset was only 1.5m by 15m records. But with my limited knowledge, I do believe it would perform better. I don't know how much better, but I think better.
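The structural difference behind those numbers can be sketched in a few lines. This toy uses SQLite's JSON1 functions as a stand-in for PostgreSQL's JSONB (the schema and data are invented for illustration; syntax, indexing, and performance all differ from Postgres, but the query shapes are the same):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- EAV shape: one row per (entity, attribute, value)
    CREATE TABLE eav (entity_id INTEGER, attr TEXT, value TEXT);
    -- Document shape: one row per entity, attributes in a JSON column
    CREATE TABLE docs (entity_id INTEGER, doc TEXT);
""")
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)",
                 [(1, "color", "red"), (1, "size", "L"),
                  (2, "color", "blue"), (2, "size", "M")])
conn.executemany("INSERT INTO docs VALUES (?, ?)",
                 [(1, json.dumps({"color": "red", "size": "L"})),
                  (2, json.dumps({"color": "blue", "size": "M"}))])

# EAV: every attribute you filter on costs another self-join
eav_rows = conn.execute("""
    SELECT a.entity_id FROM eav a
    JOIN eav b ON b.entity_id = a.entity_id
    WHERE a.attr = 'color' AND a.value = 'red'
      AND b.attr = 'size'  AND b.value = 'L'
""").fetchall()

# Document: one table scan, one predicate per attribute
doc_rows = conn.execute("""
    SELECT entity_id FROM docs
    WHERE json_extract(doc, '$.color') = 'red'
      AND json_extract(doc, '$.size') = 'L'
""").fetchall()

print(eav_rows, doc_rows)  # both queries find entity 1
```

In Postgres, the second query's predicates can be served by a single GIN index on the whole document, which is presumably why the lazy full-document index above was enough.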


It seems so - the `docker/moby` repository[0] now redirects to `linuxkit/linuxkit`, but the `moby` cli from `linuxkit/linuxkit`[1] is being moved to the `moby/moby` repository[2].

[0]: https://github.com/docker/moby

[1]: https://github.com/linuxkit/linuxkit/tree/master/src/cmd/mob...

[2]: https://github.com/moby/moby/pull/32693


As I understand it (though I could be wrong), this repository is intended to be a parent of the old one.

Rather than having just the Docker engine, it will coordinate docker, swarmkit, infrakit, linuxkit in a single project.

These will be swappable, so for example you could a) swap swarmkit for Kubernetes, b) swap linuxkit for Debian, c) swap infrakit with Terraform.

Variants like "Docker for Mac/Windows/AWS/Azure/GCE", etc. already exist; Moby will likely house all these variations and allow the creation of custom "Docker/Other for X/Y/Z".



I believe most of your questions have a (mostly) positive answer. Admittedly the Docker project moves fast, so I've tried to provide a link or two for each:

> Are these signed with some sort of crypto and known-good keys, similar to package signing keys attached to a package repo, or are we taking Docker's word for it?

The official images are all signed with Notary[0] (released <2 years ago). The "Content Trust" feature makes use of TLS plus a number of keys[1]. It can be enabled in required environments to prevent the pulling of non-signed images (`export DOCKER_CONTENT_TRUST=1`).

> I had to go 3 levels up from the link on the Docker Hub page to get to this one because most of the images are FROM:something.

There is a helpful Chrome extension, OctoLinker[2], that makes the `FROM parent` clickable (among other non-Docker things). I'm sure there are similar extensions, but this is the one I currently use.

> a) still derived from some other upstream Dockerfile (must this file also be validated by Docker to qualify as an "official repo"?)

The `FROM` line in the Dockerfile would likely be the first thing to undergo scrutiny, for obvious reasons. Also, all the layers go through security scanning[3], some issues can be fixed[4] and others are tracked upstream (e.g. by Debian in the case of the Ruby image you linked[5]).

> you end up with an image that is very bloated due to the way Docker's caching layer functions

You can now use `docker build --squash`[6] to combine multiple image layers while still benefiting from layer caching during builds. Also, the final image won't contain files that are added in earlier layers and removed in later ones. (And with multi-stage builds recently merged, producing lean images that pull artifacts from multiple build stages will soon be easier.)
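A contrived Dockerfile makes the layer-bloat problem concrete (the file and image names are made up; `--squash` requires a daemon with experimental features enabled, as noted in a reply below):

```shell
# Without --squash, the 100MB file still occupies an intermediate
# layer even though a later RUN deletes it.
cat > Dockerfile <<'EOF'
FROM alpine
RUN dd if=/dev/zero of=/tmp/big bs=1M count=100
RUN rm /tmp/big
EOF

docker build -t demo:fat .            # image carries the dead 100MB layer
docker build --squash -t demo:slim .  # layers collapsed; deleted file gone
```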

> Is this "transparency" only the case on "official" images? As far as I can see Docker Hub just stores the pushed binary blob and not the Dockerfile required to build it

I would say that groups or solo developers who want others to use their images will usually provide documentation and links to the Dockerfiles. Of course, this isn't always possible or desired (e.g. see closed-source projects like Windows Server Core[7], or the countless people using Docker Hub as a free image host).

[0]: https://github.com/docker/notary

[1]: https://docs.docker.com/engine/security/trust/trust_key_mng/

[2]: https://octolinker.github.io/

[3]: https://docs.docker.com/docker-hub/official_repos/#how-do-i-...

[4]: https://github.com/docker-library/official-images/pulls?utf8...

[5]: https://github.com/docker-library/ruby/issues/117

[6]: https://docs.docker.com/engine/reference/commandline/build/#...

[7]: https://hub.docker.com/r/microsoft/windowsservercore/


Thanks for this post. It does indeed answer most of my follow-up questions.

I wanted to let you know that I gave `docker build --squash` a try today. This is the output I got:

    "--squash" is only supported on a Docker daemon with experimental features enabled
so that feature is not yet mainlined.

Also, I encountered another issue today when trying to do a `docker push`. It retried a cache layer multiple times before conceding with `open /dev/mapper/docker-254:0-xxx-yyy-zzz: no such file or directory` on one of the layers. I had to rebuild the image with `--no-cache` to get it to push.

Not doing anything unusual/fancy, just an ordinary docker push, which worked fine on other images before and after. No auto-cleanup scripts running in the background or anything that should cause a layer to go mysteriously missing.



Someone should tell this guy: https://www.youtube.com/watch?v=s7EgrY17Ozk


> ... anyone who wants to enlarge its available message space ...

Bear in mind that the proof-of-work algorithm actually aims to reduce chatter on the network. Roughly every ten minutes, a single Bitcoin node (the miner that found the block) broadcasts: "Hey, this is the latest state of the ledger, and you can trust it because [insert proof-of-work that was calculated out-of-band]." So, as the network's processing power scales up, the computational difficulty is scaled up with it, so that the "message space" required for consensus remains constant.
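The asymmetry described above (expensive to find, cheap to verify) can be sketched in a toy proof-of-work. This is illustrative only: real Bitcoin hashes an 80-byte block header and encodes the target as "nBits", but the principle is the same, so raising the difficulty adds computation, not network traffic:

```python
import hashlib

def mine(header: bytes, difficulty_bits: int) -> int:
    """Search for a nonce whose double-SHA256 hash falls below the target."""
    target = 2 ** (256 - difficulty_bits)
    nonce = 0
    while True:
        digest = hashlib.sha256(
            hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
        ).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
        nonce += 1

def verify(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """One cheap hash check, no matter how much work mining took."""
    digest = hashlib.sha256(
        hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    ).digest()
    return int.from_bytes(digest, "big") < 2 ** (256 - difficulty_bits)

# Finding the nonce takes ~2^16 hashes on average; verifying takes one.
nonce = mine(b"example block header", 16)
print(verify(b"example block header", nonce, 16))  # True
```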

> That seems rather limited for a growing population's needs for communications around an increasingly valuable resource base.

While the way distributed consensus is achieved (see the "Byzantine Generals' Problem") consumes most of the network's processing power, it's also what makes the resource valuable. And since this is done in parallel, it hardly consumes any of the network's bandwidth, freeing up that "message space" for the important things: sending, receiving, and propagating transactions, which are much less demanding computationally. All that's required there is some elliptic-curve cryptography (i.e. a valid public/private key pair) plus any low-powered device with the correct software and an internet connection.

