I run the infrastructure (k8s+helm on GCP) for a PoS validator of a top 50 crypto project. My client is a big whale who bonded ~$15.6M at the project's all-time high. It's about 1-2 hours of work per month, and my 10% cut of the rewards nets me anywhere from $4k-$40k/mo depending on the price. Given that crypto is in the gutter right now I haven't been selling any to USD, but it's a nice way to stack an asset with high upside potential while doing very little work.
Automox has seen double-digit growth year over year, and this year is no exception. We recently raised a $110 million Series C led by Insight Partners and appointed Dmitri Alperovitch, CrowdStrike co-founder, as Chair of our Board. We are modernizing the IT Ops market with our cloud-native approach to automating and streamlining IT workflows.
With over 100 positions to hire for this year, we have something for everyone: from building data pipelines and scaling our infrastructure in AWS to developing distributed backend services and simplifying complex UIs.
* Staff, Senior, and Mid-Level Engineers across the stack
* Senior Data Engineers
* SDETs (Python)
* Windows, Linux, or Mac Systems Engineers
* UX Researcher and Architect
* Technical Product Managers
* Senior Site Reliability Engineers
I probably shouldn't have picked that one in particular, it's just front of mind at the moment.
That one might be better maintained than others, but even when it is, they don't seem to bump the version number, which makes tracking updates difficult, and a lot of the vendors stop providing updates to old devices entirely. The other problem is that if you have a MediaTek, Broadcom, Ralink, or HiSilicon vulnerability (they all have their own kernel forks and driver forks), then every downstream device using that kernel or driver is vulnerable, and not all devices will get fixed. Even if the vendor or upstream fixes it, who actually upgrades their router firmware?
Can confirm that running Spark at scale is difficult. Not even necessarily talking about scale of data or scale of performance, but organizational scale. Getting dozens or hundreds of engineers aligned around best practices, tooling and local development for Spark is both challenging and extremely rewarding. When you have everyone buy into Spark as not just an execution environment but a programming paradigm, it really unlocks some cool potential. If anyone cares, here's the best way I've found to get Spark users riding on rails:
* Use a monorepo to "namespace" different projects/teams/whatever. Each namespace has its own build.sbt for Scala jobs and Conda/Pip requirements file for PySpark. This gives you package isolation so that different projects can bump requirements at their own pace. This is crucial in larger organizations where you might have more siloed development or more legacy applications.
* Build each project in the monorepo into a separate Docker image and tag it accordingly with some combination of the branch and namespace.
* Deploy applications onto Kubernetes by invoking the SparkOperator (https://github.com/GoogleCloudPlatform/spark-on-k8s-operator). This abstracts away a lot of the hassle of driver/executor configuration and gives you nice out-of-the-box functionality for scraping Spark metrics.
* For local development, use some type of CLI or Makefile to build/run the image locally. This is where the implementation diverges somewhat from using SparkOperator (unless you want to tell your employees that everyone needs to run Kubernetes on their local machine, which we thought would create too much friction).
* For orchestration, write a custom operator for Airflow that submits a SparkOperator resource to the Kubernetes cluster of your choosing. The operator should supervise the application state, since the SparkOperator doesn’t quite do that well enough for you. This is something I wish we had the opportunity to open source.
* Where it gets tricky is building Spark applications locally and running them remotely. Say you built a job locally and tested it on a small subset of your data. Now you want to see what happens when you run it across the full dataset, which requires more than 16 GB of memory (or whatever the developer has on their laptop). You need some way to build your image locally but schedule it remotely. This could be done via the same CLI or Makefile, but you end up with a lot of images and it gets pretty costly. I’m sure we would have figured it out eventually if we didn’t all get laid off last month :P
* BONUS: Use Iceberg or Delta (https://iceberg.apache.org/) (https://delta.io/). These are storage formats that work with distributed file storage like HDFS or S3 to partition and query data using the Spark DataFrame API. You get time travel, schema evolution and a bunch of other sweet features out of the box. They are an evolution of Hadoop-era partitioned file formats and are an absolute must for organizations dealing with lots of data & ML infrastructure.
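Since the real Airflow operator described above was never open-sourced, here is a minimal, purely illustrative sketch of the submit-and-supervise pattern it implements. All names are hypothetical; in a real deployment this would subclass airflow.models.BaseOperator and talk to the cluster via kubernetes.client.CustomObjectsApi, but here the API client is injected so the supervision logic is visible on its own:

```python
# Illustrative sketch: submit a SparkApplication custom resource to the
# spark-on-k8s-operator, then poll its state and translate it into
# task success/failure (the part the SparkOperator doesn't do for you).
import time


class SparkApplicationSubmitOperator:
    TERMINAL_STATES = {"COMPLETED", "FAILED"}

    def __init__(self, name, namespace, image, main_file, api, poll_interval=0.0):
        self.name = name
        self.namespace = namespace
        self.image = image
        self.main_file = main_file
        self.api = api                  # injected stand-in for a k8s custom-objects client
        self.poll_interval = poll_interval

    def build_manifest(self):
        # A minimal SparkApplication spec as understood by spark-on-k8s-operator.
        return {
            "apiVersion": "sparkoperator.k8s.io/v1beta2",
            "kind": "SparkApplication",
            "metadata": {"name": self.name, "namespace": self.namespace},
            "spec": {
                "type": "Python",
                "image": self.image,
                "mainApplicationFile": self.main_file,
                "driver": {"cores": 1, "memory": "1g"},
                "executor": {"cores": 1, "instances": 2, "memory": "2g"},
            },
        }

    def execute(self):
        # Create the custom resource, then supervise it until a terminal state.
        self.api.create(self.namespace, self.build_manifest())
        while True:
            state = self.api.get_state(self.namespace, self.name)
            if state in self.TERMINAL_STATES:
                if state == "FAILED":
                    raise RuntimeError(f"Spark app {self.name} failed")
                return state
            time.sleep(self.poll_interval)
```

The key design point is that the Airflow task owns the application's lifecycle: the SparkOperator only updates the CR's status field, so without this polling loop a failed Spark job would not fail the DAG.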
This post took more time than I'd planned, but it actually feels good to write it down before I forget. I hope it's useful for someone building Spark infrastructure. I'm sure others have a completely different approach, which I'd be curious to hear! As someone whose full-time job was basically just to orchestrate Spark application development, I can say for certain that products like this are needed for the ecosystem to thrive, and I would probably have given you my business had the circumstances been right. Good luck to you and your team.
Thanks for taking the time on this detailed and thoughtful feedback. We've implemented some of the points you mention (SparkOperator, Airflow connector; CLI is WIP) and have projects underway for the others, like making it easy to transition from local development to remote execution.
Sorry to hear about the layoffs. I'd like to follow-up with you to get your feedback on specific roadmap items we have in mind. Would you email us at founders@datamechanics.co to schedule a call, or at least keep in touch for when we have an interesting feature/mockup to show you? Thanks and good luck as well!
Now may be a good time to plug a project we worked on at my last gig. KeySpace uses IPFS to store PGP keys in a decentralized file system. We used a smart contract on the Ethereum blockchain to store an address-to-hash lookup. What this achieves is fully decentralized, peer-to-peer encrypted communication. We used it to facilitate trustless OTC negotiation and trading.
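The flow above boils down to two content-addressed lookups: the contract maps an Ethereum address to the IPFS hash (CID) of that account's PGP key, and IPFS maps the CID to the key bytes. A toy sketch of that flow, with dicts standing in for the contract and for IPFS so it's self-contained (the CID derivation and all names here are fake, purely for illustration):

```python
# Toy illustration of the KeySpace lookup flow. In the real system the
# registry lives in an Ethereum smart contract and the key bytes live on
# IPFS; here both are in-memory dicts.

contract_registry = {}   # stands in for the contract's address => CID mapping
ipfs_store = {}          # stands in for IPFS content-addressed storage


def publish_key(address, pgp_key_bytes):
    # "Add" the key to IPFS, then record its CID on-chain under the address.
    cid = "Qm" + format(hash(pgp_key_bytes) & 0xFFFFFFFF, "08x")  # fake CID
    ipfs_store[cid] = pgp_key_bytes
    contract_registry[address] = cid
    return cid


def lookup_key(address):
    # Resolve a counterparty's address to a CID, then fetch the key from IPFS.
    cid = contract_registry[address]
    return ipfs_store[cid]
```

Because both lookups are verifiable (the chain state and the content hash), neither party has to trust a central keyserver to get the other's key.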
Sure, we could move humans even further away from the means of production and deploy robots to do mega-scale monoculture using closed-source hardware / software, while continuing to further the monopolies held by Monsanto, John Deere, etc..
Or, we could try to shift agriculture back to a local scale, use open source hardware/software, and community-owned infrastructure to build more sustainable, polyculture food systems.
In particular, I am excited about the rooftop farming work being done in the Brooklyn Navy Yard here in NYC(1). Our Public Advocate has even discussed building-code mandated "green roof" legislation(2). CNC/Robotics & IoT are the key to unlocking urban micro-agriculture that can begin to offset some of our dependency on dirty food, and I applaud those(3) who are working on these very important problems.
IoT access management is an excellent use case for Neo4j, and for graph databases in general.
Our schema involved taking physical assets/personnel and representing them as different labels: machine, factory, production line, user, usergroup, etc. We then drew complex relationships between different user/groups in the organization and the assets they were responsible for.
At first, we used a relational database, but it soon became difficult to go more granular than simply: user belongs to usergroup, usergroup belongs to client, client has factories, factories have lines, lines have machines.
As many have pointed out here, it's not that you can't do this with non-graph databases, it just requires a more complex query layer. Neo4j allowed us to represent complex business relationships as natural language, and that really helped us as the business scaled.
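To make the "more complex query layer" point concrete, here is a toy version of the traversal with made-up node names, plus (in the comment) roughly what the single Cypher query looks like. Labels and relationship names are illustrative, not the actual schema:

```python
# "Which machines can this user touch?" as a transitive graph traversal.
# In Neo4j this whole function is roughly one Cypher query:
#   MATCH (u:User {name: $name})-[:MEMBER_OF]->(:UserGroup)-[:BELONGS_TO]->
#         (:Client)-[:HAS]->(:Factory)-[:HAS]->(:Line)-[:HAS]->(m:Machine)
#   RETURN m

edges = {
    ("user:alice", "group:maintenance"),
    ("group:maintenance", "client:acme"),
    ("client:acme", "factory:berlin"),
    ("factory:berlin", "line:1"),
    ("line:1", "machine:press-a"),
    ("line:1", "machine:press-b"),
}


def reachable(start):
    # Walk the ownership hierarchy transitively from the start node.
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for src, dst in edges:
            if src == node and dst not in seen:
                seen.add(dst)
                frontier.append(dst)
    return seen


def machines_for(user):
    return sorted(n for n in reachable(user) if n.startswith("machine:"))
```

In SQL this same question becomes a five-way join (or a recursive CTE) that has to change every time a level is added to the hierarchy, which is exactly the granularity wall described above.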
I'm willing to bet money this was a cyber-terrorist attack. Unfortunately we'll never know. If a link were established, it would be the subject of a gag order on grounds of national security. But more likely, the true root cause will never be found because the authorities didn't do a deep enough forensic analysis. It's too easy to blame something like this on mechanical failure, especially in America's aging infrastructure. They won't even think to look at the PLCs and control systems that control the gas pumps :/
Speaking of capacity, I just got this error trying to build an EKS cluster.
UnsupportedAvailabilityZoneException: Cannot create cluster because us-east-1b, the targeted availability zone, does not currently have sufficient capacity to support the cluster. Retry and choose from these availability zones: us-east-1a, us-east-1c, us-east-1d
But yeah, sure, keep telling yourself that AWS doesn't have a problem with power and compute capacity. Or maybe it's just poor product design?
The availability zones in us-east-1 have unique legacy problems related to companies that would select one AZ and deploy their entire, huge infrastructure to it. IIRC, Netflix did this, for one example.
Every company that did this was picking us-east-1a specifically, because it was the first AZ in the list. It added up, and now the us-east-1a datacenter can't add capacity fast enough to support the growth of all the companies "stuck" on it (because their existing infra is already deployed there, and they still need to grow.) Effectively, us-east-1a is "full."
Which means, of course, that companies would find out from friends or from AWS errors that us-east-1a is full, and so choose us-east-1b...
AWS fixed this a few years in by randomizing a region's AZ names with respect to each AWS root account (so my AWS account's us-east-1a is your AWS account's us-east-1{b,c,d}). So new companies were better "load balanced" onto the AZs of a region.
But, because those companies still exist and are still growing on the specific DC that was us-east-1a (and to a lesser extent us-east-1b), those DCs are still full. So, for any given AWS account, one-and-a-half of the AZs in us-east-1 will be hard to deploy anything to.
Suggestions:
• for greenfield projects, just use us-east-2.
• for projects that need low-latency links to things that are already deployed within us-east-1, run some reservation actions to see how much capacity you can grab within each us-east-1 AZ, which will let you determine which of your AZs map to the DCs previously known as us-east-1a and us-east-1b. Then, use the other ones. (Or, if you're paying for business-level support, just ask AWS staff which AZs those are for your account; they'll probably be happy to tell you.)
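As a possible complement to the reservation probing above: EC2's DescribeAvailabilityZones call returns a stable "ZoneId" (e.g. use1-az1) alongside the account-local "ZoneName", so two accounts can compare which physical zone their AZ names point at. A sketch of extracting that mapping; the live call would be boto3.client("ec2", region_name="us-east-1").describe_availability_zones(), but a canned response (with made-up ZoneIds) stands in here so the snippet runs without credentials:

```python
# Map an account's shuffled AZ names back to the physical zone IDs that
# are the same for every AWS account. Sample response shape mirrors
# EC2 DescribeAvailabilityZones; the ZoneId values are invented.

sample_response = {
    "AvailabilityZones": [
        {"ZoneName": "us-east-1a", "ZoneId": "use1-az4"},
        {"ZoneName": "us-east-1b", "ZoneId": "use1-az6"},
        {"ZoneName": "us-east-1c", "ZoneId": "use1-az1"},
    ]
}


def zone_map(response):
    # account-local AZ name -> physical zone ID shared across accounts
    return {az["ZoneName"]: az["ZoneId"] for az in response["AvailabilityZones"]}
```

If two accounts' maps show the same ZoneId under different ZoneNames, those names point at the same datacenter, so once you've identified the "full" physical zones you can avoid them by ZoneId rather than by name.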