Our use case: we ingest hundreds of millions of data points into Hadoop, then run Spark ETL jobs that partition the data on HDFS.
The next day we receive several million updates to data points from the previous day(s).
What would you recommend on a Hadoop setup? HBase? Parquet with Hoodie (Hudi) to handle the deltas? Iceberg?
Or Hive 3 with ACID tables?
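For concreteness, the daily merge semantics we need are "upsert by record key, latest timestamp wins" — the operation Hudi, Iceberg, or Hive ACID would perform at scale. A plain-Python sketch (the field names `id` and `ts` are placeholders, not our real schema):

```python
def upsert(base, deltas, key="id", ts="ts"):
    """Merge delta records into a base set: insert new keys,
    and for existing keys keep the record with the newer timestamp."""
    merged = {r[key]: r for r in base}
    for r in deltas:
        current = merged.get(r[key])
        if current is None or r[ts] >= current[ts]:
            merged[r[key]] = r
    return sorted(merged.values(), key=lambda r: r[key])

base = [{"id": 1, "ts": 1, "v": "a"}, {"id": 2, "ts": 1, "v": "b"}]
deltas = [{"id": 2, "ts": 2, "v": "b2"}, {"id": 3, "ts": 2, "v": "c"}]
print(upsert(base, deltas))
```

The question is which table format does this efficiently over billions of rows on HDFS, without us hand-rolling the merge as another Spark join job.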