gwgen.preproc module
Additional routines for preprocessing
Classes
CloudGHCNMap(*args, **kwargs) | A task for computing the EECRA inventory for each station
CloudInventory(*args, **kwargs) | A task for computing the EECRA inventory for each station
CloudPreproc(stations, config, ...[, data, ...]) |
Functions
default_cloud_ghcn_map_config([max_distance]) | Default config for CloudGHCNMap
default_cloud_inventory_config([xstall]) | Default config for CloudInventory

class gwgen.preproc.CloudGHCNMap(*args, **kwargs)
Bases: gwgen.preproc.CloudPreproc
A task for computing the EECRA inventory for each station
Attributes
 - dbname – str(object='') -> string
 - default_config
 - eecra_inventory – eecra_inventory parameterization instance
 - has_run – bool(x) -> bool
 - name – str(object='') -> string
 - setup_requires – list() -> new empty list
 - summary – str(object='') -> string
Methods
 - init_from_scratch()
 - run(info)
 - setup(*args, **kwargs)
 - setup_from_scratch() – Does nothing but initialize an empty data frame.
 - write2db(*args, **kwargs)
 - write2file(*args, **kwargs)
dbname = 'eecra_ghcn_map'
default_config
eecra_inventory
  eecra_inventory parameterization instance
has_run = True
name = 'eecra_ghcn_map'
setup_from_scratch()
  Does nothing but initialize an empty data frame. The real work is done in the run() method.
setup_requires = ['eecra_inventory']
summary = 'Compute the inventory of the EECRA stations'
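
The setup_requires attribute names the task that must be set up first (here 'eecra_inventory', which is the name of CloudInventory). As a rough illustration of how such a dependency list can be turned into a setup order, the sketch below sorts tasks topologically. The InventoryTask and GHCNMapTask classes and the setup_order helper are hypothetical stand-ins that only mirror the documented name and setup_requires values. They are not part of gwgen, and gwgen may resolve requirements differently.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical stand-ins: only `name` and `setup_requires` mirror the
# attributes documented above; these are not the gwgen classes.
class InventoryTask:
    name = 'eecra_inventory'
    setup_requires = []


class GHCNMapTask:
    name = 'eecra_ghcn_map'
    setup_requires = ['eecra_inventory']


def setup_order(tasks):
    """Return task names so that every task follows its requirements."""
    # Map each task name to the set of names it depends on.
    graph = {task.name: set(task.setup_requires) for task in tasks}
    return list(TopologicalSorter(graph).static_order())


order = setup_order([GHCNMapTask, InventoryTask])
print(order)  # ['eecra_inventory', 'eecra_ghcn_map']
```

With this ordering, the inventory task is always set up before the mapping task that depends on it, regardless of the order in which the tasks are listed.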

class gwgen.preproc.CloudInventory(*args, **kwargs)
Bases: gwgen.preproc.CloudPreproc
A task for computing the EECRA inventory for each station
Attributes
 - dbname – str(object='') -> string
 - default_config
 - has_run – bool(x) -> bool
 - http_xstall – str(object='') -> string
 - name – str(object='') -> string
 - setup_parallel
 - summary – str(object='') -> string
 - xstall_df – The dataframe corresponding to the XSTALL stations
Methods
 - init_from_scratch()
 - run(info)
 - setup(*args, **kwargs)
 - setup_from_db(**kwargs) – Set up the task from datatables already created (and avoid locating the stations of this task)
 - setup_from_file(**kwargs) – Set up the task from already stored files (and avoid locating the stations of this task)
 - setup_from_scratch()
 - write2db(*args, **kwargs)
 - write2file(*args, **kwargs)
dbname = 'eecra_inventory'
default_config
has_run = True
http_xstall = 'http://cdiac.ornl.gov/ftp/ndp026c/XSTALL'
name = 'eecra_inventory'
setup_from_db(**kwargs)
  Set up the task from datatables already created (and avoid locating the stations of this task)
setup_from_file(**kwargs)
  Set up the task from already stored files (and avoid locating the stations of this task)
setup_parallel
summary = 'Compute the inventory of the EECRA stations'
xstall_df
  The dataframe corresponding to the XSTALL stations

class gwgen.preproc.CloudPreproc(stations, config, project_config, global_config, data=None, requirements=None, *args, **kwargs)
Bases: gwgen.utils.TaskBase
Parameters:
 - stations (list) – The list of stations to process
 - config (dict) – The configuration of the experiment
 - project_config (dict) – The configuration of the underlying project
 - global_config (dict) – The global configuration
 - data (pandas.DataFrame) – The data to use. If None, use the setup() method
 - requirements (list of TaskBase instances) – The required instances. If None, you must call the set_requirements() method later
Other Parameters:
 - *args, **kwargs – The configuration of the task. See TaskConfig for the arguments. Note that if you provide *args, you have to provide all possible arguments
Attributes
 - task_data_dir
task_data_dir

gwgen.preproc.default_cloud_ghcn_map_config(max_distance=1000.0, *args, **kwargs)
Default config for CloudGHCNMap
Parameters:
 - max_distance (float) – The maximum distance in meters within which two stations are considered equal (default: 1000 m)
 - to_csv (bool) – If True, the data at setup will be written to a csv file
 - remove (bool) – If True and the old data file already exists, remove it before writing
 - skip_filtering (bool) – If True, skip the filtering for the correct stations in the datafile
 - plot_output (str) – An alternative path to use for the PDF file of the plot
 - nc_output (str) – An alternative path (or multiple paths, depending on the task) to use for the netCDF file of the plot data
 - project_output (str) – An alternative path to use for the psyplot project file of the plot
 - new_project (bool) – If True, a new project will be created even if a file in project_output already exists
 - project (str) – The path to a psyplot project file to use for this parameterization
 - close (bool) – Close the project at the end
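
To make the max_distance criterion concrete, here is a minimal sketch of a distance test between two stations given as (longitude, latitude) pairs in degrees. It uses the haversine great-circle formula with a mean Earth radius of 6371 km. Both are assumptions of this sketch, and the helper names haversine_m and same_station are hypothetical. The page does not state how gwgen itself computes the distance.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (assumption of this sketch)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two points given in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def same_station(a, b, max_distance=1000.0):
    """Hypothetical helper: treat two (lon, lat) pairs as the same station
    if they lie within max_distance meters of each other."""
    return haversine_m(*a, *b) <= max_distance


# Two points about 556 m apart count as the same station with the default
# threshold; two points about 2.2 km apart do not.
near = same_station((10.0, 50.0), (10.0, 50.005))
far = same_station((10.0, 50.0), (10.0, 50.02))
```

Raising max_distance makes the match more tolerant of imprecise coordinates, at the cost of possibly merging distinct nearby stations.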
 

gwgen.preproc.default_cloud_inventory_config(xstall=True, *args, **kwargs)
Default config for CloudInventory
Parameters:
 - xstall (bool or str) – If True (default), download the XSTALL file from http://cdiac.ornl.gov/ftp/ndp026c/XSTALL. This file contains some estimates of station longitude and latitude. If False or an empty string, the file is not used; otherwise, if set to a string, it is interpreted as the path to the local file
 - to_csv (bool) – If True, the data at setup will be written to a csv file
 - remove (bool) – If True and the old data file already exists, remove it before writing
 - skip_filtering (bool) – If True, skip the filtering for the correct stations in the datafile
 - plot_output (str) – An alternative path to use for the PDF file of the plot
 - nc_output (str) – An alternative path (or multiple paths, depending on the task) to use for the netCDF file of the plot data
 - project_output (str) – An alternative path to use for the psyplot project file of the plot
 - new_project (bool) – If True, a new project will be created even if a file in project_output already exists
 - project (str) – The path to a psyplot project file to use for this parameterization
 - close (bool) – Close the project at the end
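
The three cases of the xstall parameter (True, False or empty string, path string) can be summarized in a small sketch. The resolve_xstall function is a hypothetical helper written for illustration and is not part of gwgen. Only the URL is taken from the documentation above.

```python
XSTALL_URL = 'http://cdiac.ornl.gov/ftp/ndp026c/XSTALL'  # URL quoted in the docs


def resolve_xstall(xstall=True):
    """Hypothetical helper: return where the XSTALL file would come from,
    or None if the file is not used."""
    if xstall is True:
        return XSTALL_URL      # default: download from the documented URL
    if not xstall:
        return None            # False or '': the file is not used
    return str(xstall)         # any other string: path to a local file


default_source = resolve_xstall()
disabled = resolve_xstall(False)
local_source = resolve_xstall('data/XSTALL')  # example path, made up for this sketch
```

Checking for True by identity first keeps the boolean default separate from string paths, so a non-empty string is never mistaken for the "download" case.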
 