gwgen.utils module¶

Classes

TaskBase(stations, config, project_config, ...) – Abstract base class for parameterization and evaluation tasks
TaskConfig(setup_from, to_csv, to_db, ...)
TaskManager([base_task, tasks, config]) – A manager to run the tasks within a task framework
TaskMeta – Meta class for the TaskBase

Functions

append_doc(namedtuple_cls, doc)
default_config([setup_from, to_csv, to_db, ...]) – The default configuration for TaskBase instances
dir_contains(dirname, path[, exists]) – Check if a file or directory is contained in another
download_file(url[, target]) – Download a file from the internet
enhanced_config(config_cls, name)
file_len(fname) – Get the number of lines in fname
get_module_path(mod) – Convenience method to get the directory of a given python module
get_next_name(old[, fmt]) – Return the next name that numerically follows old
get_postgres_engine(database[, user, host, ...]) – Get the engine to access the given database
get_toplevel_module(mod)
go_through_dict(key, d[, setdefault]) – Split up the key by '.' and get the value from the dictionary
init_interprocess_locks(db_locks, ...)
init_locks(db_locks, file_locks)
isstring(s)
ordered_move(d, to_move, pos) – Move a key in an ordered dictionary to another position
safe_csv_append(df, path, *args, **kwargs) – Convenience method to dump a data frame to csv without removing the old one
str_ranges(s) – Convert a string of comma separated values to an iterable
unique_everseen(iterable[, key]) – List unique elements, preserving order
-
class gwgen.utils.TaskBase(stations, config, project_config, global_config, data=None, requirements=None, *args, **kwargs)[source]¶
Bases: object

Abstract base class for parameterization and evaluation tasks

Abstract base class that introduces the methods for the parameterization and evaluation framework. The name of the task is specified in the name attribute. You can implement the connection to other tasks (within the same framework) in the setup_requires attribute. The corresponding instances to the identifiers in the setup_requires attribute can later be accessed through the given attribute.

Examples

Let's define a parameterizer that does nothing but requires another parameterization task named cloud as connection:

>>> class CloudParameterizer(Parameterizer):
...     name = 'cloud'
...     def setup_from_scratch(self):
...         pass
...
>>> class DummyParameterizer(Parameterizer):
...     setup_requires = ['cloud']
...     name = 'dummy'
...     def setup_from_scratch(self):
...         pass
...
>>> cloud = CloudParameterizer()
>>> dummy = DummyParameterizer(cloud=cloud)
>>> dummy.cloud is cloud
True
Attributes

cloud_dir – str. Path to the directory where the processed parameterization data is stored
data – pandas.DataFrame. The dataframe holding the daily data
data_dir – str. Path to the directory where the source data of the project is located
datafile – str. The path to the csv file where the data is stored
dbname – The database name to use
default_config – The default configuration of this task
df_ref – The reference data frame
engine – The sqlalchemy engine to access the database
eval_dir – str. Path to the directory where the processed evaluation data is stored
fmt – dict. Formatoptions to use when making plots with this task
has_run – bool. Boolean that is True if there is a run method for this task
input_dir – str. Path to the directory where the input data is stored
input_path – The path to the project input file in the configuration
logger – The logger of this task
name – str. name of the task
nc_file – NetCDF file for the project
output_dir – str. Path to the directory where the output data is stored
output_path – The path to the project output file in the configuration
param_dir – str. Path to the directory where the processed parameterization data is stored
pdf_file – pdf file with the figures of the project
project_file – Pickle file for the project
reference_path – The path to the reference file in the configuration
sa_dir – str. Path to the directory where the processed sensitivity analysis data is stored
setup_from
setup_parallel – bool. Boolean that is True if the task can be setup in parallel
setup_requires – list of str. identifiers of required classes for this task
sql_dtypes – The data types to write the data into a postgres database
summary – str. summary of what this task does
task_data_dir – The directory where to store data
threads – threading.Thread objects that are started during the setup

Methods
create_project(ds) – To be reimplemented for each task with has_run
from_organizer(organizer, stations, *args, ...) – Create a new instance from a model_organization.ModelOrganizer
from_task(task, *args, **kwargs) – Create a new instance from another task
get_manager(*args, **kwargs) – Return a manager of this class that can be used to setup and organize tasks
get_run_kws(kwargs)
init_from_db() – Initialize the task from datatables already created
init_from_file() – Initialize the task from already stored files
init_from_scratch() – Initialize the task from the configuration settings
init_task() – Method that is called on the I/O-Processor to initialize the setup
make_run_config(sp, info) – Method to be reimplemented for each task with has_run
plot_additionals(pdf) – Method to be reimplemented to make additional plots (if necessary)
run(info, *args, **kwargs) – Run the task
set_requirements(requirements) – Set the requirements for this task
setup() – Set up the database for this task
setup_from_db(**kwargs) – Set up the task from datatables already created
setup_from_file(**kwargs) – Set up the task from already stored files
setup_from_instances(base, instances[, copy]) – Combine multiple task instances into one instance
setup_from_scratch() – Setup the data from the configuration settings
write2db(**kwargs) – Write the data from this task to the database
write2file(**kwargs) – Write the database to the datafile

Parameters:
- stations (list) – The list of stations to process
- config (dict) – The configuration of the experiment
- project_config (dict) – The configuration of the underlying project
- global_config (dict) – The global configuration
- data (pandas.DataFrame) – The data to use. If None, use the setup() method
- requirements (list of TaskBase instances) – The required instances. If None, you must call the set_requirements() method later

Other Parameters: ``*args, **kwargs`` – The configuration of the task. See the TaskConfig for arguments. Note that if you provide *args, you have to provide all possible arguments

-
cloud_dir
¶ str. Path to the directory where the processed parameterization data is stored
-
create_project
(ds)[source]¶ To be reimplemented for each task with
has_run
Parameters: ds (xarray.Dataset) – The dataset to plot
-
data
= None¶ pandas.DataFrame. The dataframe holding the daily data
-
data_dir
¶ str. Path to the directory where the source data of the project is located
-
datafile
¶ str. The path to the csv file where the data is stored by the Parameterizer.write2file() method and read by the Parameterizer.setup_from_file() method
-
dbname
= ''¶ The database name to use
-
default_config
¶ The default configuration of this task inserted with the pdf_file, nc_file and project_file attributes
-
df_ref
¶ The reference data frame
-
engine
¶ The sqlalchemy engine to access the database
-
eval_dir
¶ str. Path to the directory where the processed evaluation data is stored
-
fmt
= {}¶ dict. Formatoptions to use when making plots with this task
-
classmethod
from_organizer
(organizer, stations, *args, **kwargs)[source]¶ Create a new instance from a
model_organization.ModelOrganizer
Parameters: - organizer (model_organization.ModelOrganizer) – The organizer to use the configuration from
- stations (list) – The list of stations to process
- data (pandas.DataFrame) – The data to use. If None, use the setup() method
- requirements (list of TaskBase instances) – The required instances. If None, you must call the set_requirements() method later
Other Parameters: ``*args, **kwargs`` – The configuration of the task. See the TaskConfig for arguments. Note that if you provide *args, you have to provide all possible arguments

Returns: An instance of the calling class
Return type:
-
classmethod
from_task
(task, *args, **kwargs)[source]¶ Create a new instance from another task
Parameters: - task (TaskBase) – The organizer to use the configuration from. Note that it can also be of a different type than this class
- data (pandas.DataFrame) – The data to use. If None, use the setup() method
- requirements (list of TaskBase instances) – The required instances. If None, you must call the set_requirements() method later

Other Parameters: ``*args, **kwargs`` – The configuration of the task. See the TaskConfig for arguments. Note that if you provide *args, you have to provide all possible arguments

See also
setup_from_instances()
- To combine multiple instances of the class
Notes
Besides the skip_filtering parameter, the task_config is not inherited from task
-
classmethod
get_manager
(*args, **kwargs)[source]¶ Return a manager of this class that can be used to setup and organize tasks
-
has_run
= False¶ bool. Boolean that is True if there is a run method for this task
-
input_dir
¶ str. Path to the directory where the input data is stored
-
input_path
¶ The path to the project input file in the configuration
-
logger
¶ The logger of this task
-
make_run_config
(sp, info)[source]¶ Method to be reimplemented for each task with
has_run
to manipulate the configurationParameters: - sp (psyplot.project.Project) – The project of the data
- info (dict) – The dictionary for saving additional information of the task
-
name
= None¶ str. name of the task
-
nc_file
¶ NetCDF file for the project
-
output_dir
¶ str. Path to the directory where the output data is stored
-
output_path
¶ The path to the project output file in the configuration
-
param_dir
¶ str. Path to the directory where the processed parameterization data is stored
-
pdf_file
¶ pdf file with the figures of the project
-
plot_additionals
(pdf)[source]¶ Method to be reimplemented to make additional plots (if necessary)
Parameters: pdf (matplotlib.backends.backend_pdf.PdfPages) – The PdfPages instance which can be used to save the figure
-
project_file
¶ Pickle file for the project
-
reference_path
¶ The path to the reference file in the configuration
-
run
(info, *args, **kwargs)[source]¶ Run the task
This method uses the data that has been set up through the setup() method to process some configuration

Parameters:
- dict – The dictionary with the configuration settings for the namelist
- dict – The dictionary holding additional meta information
-
sa_dir
¶ str. Path to the directory where the processed sensitivity analysis data is stored
-
set_requirements
(requirements)[source]¶ Set the requirements for this task
Parameters: requirements (list of TaskBase instances) – The tasks as specified in the setup_requires attribute
-
setup_from
¶
-
classmethod
setup_from_instances
(base, instances, copy=False)[source]¶ Combine multiple task instances into one instance
Parameters:
-
setup_parallel
= True¶ bool. Boolean that is True if the task can be setup in parallel
-
setup_requires
= []¶ list of str. identifiers of required classes for this task
-
sql_dtypes
¶ The data types to write the data into a postgres database
-
summary
= ''¶ str. summary of what this task does
-
task_data_dir
¶ The directory where to store data
-
threads
= []¶ threading.Thread objects that are started during the setup. The setup waits for these threads to finish before continuing with another process
-
class gwgen.utils.TaskConfig(setup_from, to_csv, to_db, remove, skip_filtering, plot_output, nc_output, project_output, new_project, project, close)¶
Bases: gwgen.utils.TaskConfig
Parameters:
- setup_from ({ 'scratch' | 'file' | 'db' | None }) – The method how to set up the instance, either from:
  'scratch' – Set up the task from the raw data
  'file' – Set up the task from an existing file
  'db' – Set up the task from a database
  None – If the datafile of this task exists, use it; otherwise, if a database is provided, use that; otherwise set up from scratch
- to_csv (bool) – If True, the data at setup will be written to a csv file
- to_db (bool) – If True, the data at setup will be written into a database
- remove (bool) – If True and the old data file already exists, remove it before writing
- skip_filtering (bool) – If True, skip the filtering for the correct stations in the datafile
- plot_output (str) – An alternative path to use for the PDF file of the plot
- nc_output (str) – An alternative path (or multiple, depending on the task) to use for the netCDF file of the plot data
- project_output (str) – An alternative path to use for the psyplot project file of the plot
- new_project (bool) – If True, a new project will be created even if a file in project_output exists already
- project (str) – The path to a psyplot project file to use for this parameterization
- close (bool) – If True, close the project at the end
-
class gwgen.utils.TaskManager(base_task=<class 'gwgen.utils.TaskBase'>, tasks=None, config={})[source]¶
Bases: object

A manager to run the tasks within a task framework
Parameters:
- base_task (TaskBase) – A subclass of the TaskBase class whose tasks shall be used within this manager
- tasks (list of TaskBase instances) – The initialized tasks to use. If None, you need to call the initialize_tasks() method
- config (dict) – The configuration of this manager containing information about the multiprocessing
Attributes

base_task – A subclass of the TaskBase class whose tasks shall be used within this manager
logger – The logger of this task

Methods

get_requirements(identifier[, all_requirements]) – Return the required task classes for this task
get_task(identifier) – Return the task in this manager corresponding to identifier
get_task_cls(identifier) – Return the task class corresponding to the given identifier
initialize_tasks(stations[, task_kws]) – Initialize the setup of the tasks
run(full_info, *args)
setup(stations[, to_return]) – Setup the data for the tasks in parallel or serial
sort_by_requirement(objects) – Sort the given tasks by their logical order

-
get_requirements
(identifier, all_requirements=True)[source]¶ Return the required task classes for this task
Parameters:
Returns: A list of Parameterizer subclasses that are required for the task of the given identifier
Return type: list of Parameterizer
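sort_by_requirement is documented to put tasks into their logical order. A hedged sketch of such an ordering based on the setup_requires identifiers (a depth-first topological sort; this is an illustration, not the library's actual code):

```python
# Illustrative sketch: order task classes so that every task comes after
# the tasks named in its ``setup_requires`` attribute.
def sort_by_requirement(tasks):
    by_name = {t.name: t for t in tasks}
    ordered, seen = [], set()

    def visit(task):
        if task.name in seen:
            return
        seen.add(task.name)
        # Visit the requirements first so they end up earlier in the order
        for req in getattr(task, 'setup_requires', []):
            if req in by_name:
                visit(by_name[req])
        ordered.append(task)

    for task in tasks:
        visit(task)
    return ordered
```

With the CloudParameterizer/DummyParameterizer example above, the cloud task would be ordered before the dummy task.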
-
get_task
(identifier)[source]¶ Return the task in this manager corresponding to identifier
Parameters: identifier (str) – The name attribute of the TaskBase subclass
Returns: The requested task
Return type: TaskBase
-
get_task_cls
(identifier)[source]¶ Return the task class corresponding to the given identifier
Parameters: identifier (str) – The name attribute of the TaskBase subclass
Returns: The class of the requested task
Return type: TaskBase
-
initialize_tasks
(stations, task_kws={})[source]¶ Initialize the setup of the tasks
This classmethod uses the TaskBase framework to initialize the setup on the I/O-processor

Parameters:
-
logger
¶ The logger of this task
-
class gwgen.utils.TaskMeta[source]¶
Bases: abc.ABCMeta

Meta class for the TaskBase
-
gwgen.utils.
default_config
(setup_from=None, to_csv=False, to_db=False, remove=False, skip_filtering=False, plot_output=None, nc_output=None, project_output=None, new_project=False, project=None, close=True)[source]¶ The default configuration for TaskBase instances. See also the
TaskBase.default_config
attribute

Parameters:
- setup_from ({ 'scratch' | 'file' | 'db' | None }) – The method how to set up the instance, either from:
  'scratch' – Set up the task from the raw data
  'file' – Set up the task from an existing file
  'db' – Set up the task from a database
  None – If the datafile of this task exists, use it; otherwise, if a database is provided, use that; otherwise set up from scratch
- to_csv (bool) – If True, the data at setup will be written to a csv file
- to_db (bool) – If True, the data at setup will be written into a database
- remove (bool) – If True and the old data file already exists, remove it before writing
- skip_filtering (bool) – If True, skip the filtering for the correct stations in the datafile
- plot_output (str) – An alternative path to use for the PDF file of the plot
- nc_output (str) – An alternative path (or multiple, depending on the task) to use for the netCDF file of the plot data
- project_output (str) – An alternative path to use for the psyplot project file of the plot
- new_project (bool) – If True, a new project will be created even if a file in project_output exists already
- project (str) – The path to a psyplot project file to use for this parameterization
- close (bool) – If True, close the project at the end
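TaskConfig appears to wrap a namedtuple, so default_config essentially fills that tuple with the defaults listed above. A self-contained sketch of this behaviour (the namedtuple construction here is an assumption for illustration, not the library's actual code):

```python
from collections import namedtuple

# Sketch: ``default_config`` fills a ``TaskConfig`` tuple with the
# documented defaults; only the overridden fields differ.
TaskConfig = namedtuple('TaskConfig', [
    'setup_from', 'to_csv', 'to_db', 'remove', 'skip_filtering',
    'plot_output', 'nc_output', 'project_output', 'new_project',
    'project', 'close'])

def default_config(setup_from=None, to_csv=False, to_db=False, remove=False,
                   skip_filtering=False, plot_output=None, nc_output=None,
                   project_output=None, new_project=False, project=None,
                   close=True):
    return TaskConfig(setup_from, to_csv, to_db, remove, skip_filtering,
                      plot_output, nc_output, project_output, new_project,
                      project, close)
```

A call such as default_config(setup_from='scratch', to_csv=True) then overrides only those two fields and keeps the rest at their defaults.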
-
gwgen.utils.
dir_contains
(dirname, path, exists=True)[source]¶ Check if a file or directory is contained in another.
Parameters:

Notes
path and dirname must be either both absolute or both relative paths
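A minimal sketch of such a containment check (an assumption about the implementation, not the library's actual code; the exact role of exists is guessed here):

```python
import os

# Sketch: ``path`` is contained in ``dirname`` when its location lies
# at or below that directory. When ``exists`` is True, a non-existing
# path is treated as not contained (an assumed interpretation).
def dir_contains(dirname, path, exists=True):
    dirname = os.path.abspath(dirname)
    path = os.path.abspath(path)
    if exists and not os.path.exists(path):
        return False
    return os.path.commonpath([dirname, path]) == dirname
```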
-
gwgen.utils.
download_file
(url, target=None)[source]¶ Download a file from the internet
Parameters: - url (str) – The url of the file
- target (str or None) – The path where the downloaded file shall be saved. If None, it will be saved to a temporary directory
Returns: file_name – the downloaded filename
Return type:
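A minimal sketch of the documented behaviour using the standard library (the temporary-directory fallback mirrors the parameter description; the details are assumptions, not the library's actual code):

```python
import os
import tempfile
from urllib.request import urlretrieve

# Sketch: fetch ``url`` to ``target``; when no target is given, save
# the file under a fresh temporary directory, keeping the url's file name.
def download_file(url, target=None):
    if target is None:
        target = os.path.join(tempfile.mkdtemp(), os.path.basename(url))
    urlretrieve(url, target)
    return target
```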
-
gwgen.utils.
get_module_path
(mod)[source]¶ Convenience method to get the directory of a given python module
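A one-line sketch of what such a convenience method typically does (an assumption, not the library's actual code):

```python
import os

# Sketch: the directory that contains the module's source file.
def get_module_path(mod):
    return os.path.dirname(os.path.abspath(mod.__file__))
```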
-
gwgen.utils.
get_postgres_engine
(database, user=None, host='127.0.0.1', port=None, create=False, test=False)[source]¶ Get the engine to access the given database
This method creates an engine using sqlalchemy's create_engine function to access the given database via postgresql. If the database does not exist, it will be created
Parameters: - database (str) – The name of a psql database. If provided, the processed data will be stored
- user (str) – The username to use when logging into the database
- host (str) – the host which runs the database server
- port (int) – The port to use to log into the database
- create (bool) – If True, try to create the database (as the postgres user) if it does not exist
- test (bool) – If True, test the connection before returning the engine
Returns: The engine to access the database
Return type: sqlalchemy.engine.base.Engine
Notes
The engine is for single usage!
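The connection URL that sqlalchemy's create_engine receives can be assembled from these parameters; the helper name postgres_url below is hypothetical, shown only to illustrate the URL format:

```python
# Hypothetical helper: assemble the postgresql URL that
# ``sqlalchemy.create_engine`` would receive for the given arguments.
def postgres_url(database, user=None, host='127.0.0.1', port=None):
    auth = '{}@'.format(user) if user else ''
    loc = '{}:{}'.format(host, port) if port else host
    return 'postgresql://{}{}/{}'.format(auth, loc, database)

# engine = sqlalchemy.create_engine(postgres_url('mydb', user='me'))
```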
-
gwgen.utils.
go_through_dict
(key, d, setdefault=None)[source]¶ Split up the key by '.' and get the value from the base dictionary d

Parameters:
- key (str) – The key in the config configuration. If the key goes some levels deeper, keys may be separated by a '.' (e.g. 'namelists.weathergen'). Hence, to insert a '.', it must be escaped by a preceding '\'.
- d (dict) – The configuration dictionary containing the key
- setdefault (callable) – If not None and an item is not existent in d, it is created by calling the given function
Returns: - str – The last level of the key
- dict – The dictionary in d that contains the last level of the key
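A minimal sketch of the documented walk (it ignores the escaping rule and is an assumption, not the library's actual code):

```python
# Sketch: walk ``d`` along the dot-separated levels of ``key`` and
# return the final key together with the sub-dict that contains it.
def go_through_dict(key, d, setdefault=None):
    *levels, last = key.split('.')
    for level in levels:
        if setdefault is not None and level not in d:
            d[level] = setdefault()  # create missing intermediate levels
        d = d[level]
    return last, d
```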
-
gwgen.utils.
ordered_move
(d, to_move, pos)[source]¶ Move a key in an ordered dictionary to another position
Parameters: - d (collections.OrderedDict) – The dictionary containing the keys
- to_move (str) – The key to move
- pos (str) – The name of the key that should be followed by to_move
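A sketch of the documented behaviour, re-inserting to_move directly after pos (an illustration; the real implementation may differ):

```python
from collections import OrderedDict

# Sketch: pop ``to_move`` and put it back right after the key ``pos``.
def ordered_move(d, to_move, pos):
    value = d.pop(to_move)
    items = []
    for k in list(d):
        items.append((k, d.pop(k)))
        if k == pos:
            items.append((to_move, value))
    d.update(items)
```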
-
gwgen.utils.
safe_csv_append
(df, path, *args, **kwargs)[source]¶ Convenience method to dump a data frame to csv without removing the old one
This function dumps the given df to the file specified by path. If path already exists, we read the header of the file and sort df according to this header
Parameters: - df (pandas.DataFrame) – The data frame to store
- path (str) – The path where to store the data
- **kwargs – Any other keyword for the
pandas.DataFrame.to_csv()
method
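The core idea can be sketched with the standard library csv module (the real function operates on pandas DataFrames and forwards its arguments to DataFrame.to_csv(); this stdlib variant is only an illustration):

```python
import csv
import os

# Stdlib sketch of the idea behind ``safe_csv_append``: append rows to
# ``path`` without overwriting it, reordering the columns to match the
# header that is already stored in the file.
def safe_csv_append(rows, fieldnames, path):
    exists = os.path.exists(path)
    if exists:
        with open(path, newline='') as f:
            fieldnames = next(csv.reader(f))  # keep the stored column order
    with open(path, 'a', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if not exists:
            writer.writeheader()
        writer.writerows(rows)
```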
-
gwgen.utils.
str_ranges
(s)[source]¶ Convert a string of comma separated values to an iterable
Parameters: s (str) – A comma (',') separated string. A single value in this string represents one number; ranges can also be used via a separation by a hyphen ('-'). Hence, '2009,2012-2015' will be converted to [2009, 2012, 2013, 2014] and '2009,2012-2015-2' to [2009, 2012, 2014]
Returns: The values in s converted to a list
Return type: list
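A sketch matching the first documented example (the exclusive range end and the int-only parsing are assumptions, not the library's actual code):

```python
# Sketch: split items on ',', ranges on '-', and expand ranges via
# ``range`` (so, as with ``range``, the end value is exclusive).
def str_ranges(s):
    result = []
    for part in s.split(','):
        bounds = list(map(int, part.split('-')))
        if len(bounds) == 1:
            result.append(bounds[0])
        else:
            result.extend(range(*bounds))
    return result
```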
-
gwgen.utils.
unique_everseen
(iterable, key=None)[source]¶ List unique elements, preserving order. Remember all elements ever seen.
Function taken from https://docs.python.org/2/library/itertools.html
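The referenced itertools recipe, in its Python 3 form (the linked Python 2 page uses ifilterfalse instead of filterfalse):

```python
from itertools import filterfalse

# The classic itertools recipe: yield elements in order, skipping any
# whose (keyed) value has been seen before.
def unique_everseen(iterable, key=None):
    seen = set()
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen.add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen.add(k)
                yield element
```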