[WIP] Add AlluxioClientConfig#72
Conversation
| self, | ||
| etcd_hosts: Optional[str] = None, | ||
| worker_hosts: Optional[str] = None, | ||
| options: Optional[Dict[str, Any]] = None, |
There was a problem hiding this comment.
any reason we need additional options here?
There was a problem hiding this comment.
This is for putting Alluxio KEY=VALUE pairs
I will rename it to alluxio_properties
There was a problem hiding this comment.
My abstraction for this config class is like our AlluxioProperty in java. I feel like we can just put Key value directly in the init.
There was a problem hiding this comment.
yeah, could do that, hopefully the properties needed won't increase too much in the future making this config super long
There was a problem hiding this comment.
yeah I think just keep one layer of Key value so it's easier to turn the class to dict or read config from file
There was a problem hiding this comment.
Make sense!
I move the supported alluxio_properties to this config class
If it becomes too long, we could go for the file approach
| etcd_hosts: Optional[str] = None, | ||
| worker_hosts: Optional[str] = None, | ||
| options: Optional[Dict[str, Any]] = None, | ||
| logger: Optional[logging.Logger] = None, |
There was a problem hiding this comment.
we don't really need logger in config
There was a problem hiding this comment.
Sounds good, will move it out of config
| etcd_port=2379, | ||
| worker_http_port=ALLUXIO_WORKER_HTTP_SERVER_PORT_DEFAULT_VALUE, | ||
| etcd_refresh_workers_interval=120, | ||
| config: AlluxioClientConfig, |
There was a problem hiding this comment.
I'm unsure whether it is a good practice to use a config class here. We would probably be careful as this is an interface change. I assume that the reason for this change is to avoid the long argument list for the AlluxioClient (although the users would still have to provide a long argument list for the AlluxioClientConfig).
There was a problem hiding this comment.
I checked a few other Python libraries:
- scikit-learn. Take the random forest module (https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/ensemble/_forest.py#L1171) as an example, they just keep a long argument list.
- presto-python-client.
PrestoRequest(https://github.com/prestodb/presto-python-client/blob/master/prestodb/client.py#L131) also maintains a long argument list. - pymongo.
MongoClient(https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/mongo_client.py#L141) keeps the argument list and uses**kwargsfor other arguments. - Ray.
ray.tunemodule (https://github.com/ray-project/ray/blob/master/python/ray/tune/tune.py) maintains a long list of arguments.
There was a problem hiding this comment.
To conclude,
- It may not be an issue to maintain a long argument list. But for a long argument list, it might be a good practice to use
*in the arguments (both ray and scikit-learn use that). https://stackoverflow.com/questions/57667742/what-does-represent-in-function-argument-list-in-python - After reviewing the configs, I feel some of them are client-related configs (e.g. etcd hosts, etcd ports, HTTP ports), and some are alluxio-specific configs (e.g. page size, hash nodes, etc.). Maybe it's better to differentiate those configs and construct these configs passed in as
**kwargs.pymongouses this approach (https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/mongo_client.py#L747, https://github.com/mongodb/mongo-python-driver/blob/master/pymongo/client_options.py).
There was a problem hiding this comment.
From mongo client_options example, they are also putting all options into one options dict and then retrieve key values from it which is the main goal of this PR: putting all options in one place. I don't have strong preference about using dict or class, just feel there's might be two strength if we use a class instead of dict: it's more intuitive what's in the dict and what's not in the dict. And it's easy to convert to dict using: __dict__.
There was a problem hiding this comment.
And agree, we could do better by splitting different kind of options into their own scope
There was a problem hiding this comment.
The original reason is to have long list of argument in both alluxio-py and alluxiofs
We could discuss about whether to merge the two repos into one, otherwise too easy to break one of them
Add AlluxioClientConfig to define the configuration values and types.
This help avoid repeating the same configuration across different classes.