gwgen.utils module

Classes

TaskBase(stations, config, project_config, ...) Abstract base class for parameterization and evaluation tasks
TaskConfig(setup_from, to_csv, to_db, ...) The configuration for TaskBase instances
TaskManager([base_task, tasks, config]) A manager to run the tasks within a task framework
TaskMeta Meta class for the TaskBase

Functions

append_doc(namedtuple_cls, doc)
default_config([setup_from, to_csv, to_db, ...]) The default configuration for TaskBase instances.
dir_contains(dirname, path[, exists]) Check if a file or directory is contained in another.
download_file(url[, target]) Download a file from the internet
enhanced_config(config_cls, name)
file_len(fname) Get the number of lines in fname
get_module_path(mod) Convenience method to get the directory of a given python module
get_next_name(old[, fmt]) Return the next name that numerically follows old
get_postgres_engine(database[, user, host, ...]) Get the engine to access the given database
get_toplevel_module(mod)
go_through_dict(key, d[, setdefault]) Split up the key by .
init_interprocess_locks(db_locks, ...)
init_locks(db_locks, file_locks)
isstring(s)
ordered_move(d, to_move, pos) Move a key in an ordered dictionary to another position
safe_csv_append(df, path, *args, **kwargs) Convenience method to dump a data frame to csv without removing the old
str_ranges(s) Convert a string of comma separated values to an iterable
unique_everseen(iterable[, key]) List unique elements, preserving order.
class gwgen.utils.TaskBase(stations, config, project_config, global_config, data=None, requirements=None, *args, **kwargs)[source]

Bases: object

Abstract base class for parameterization and evaluation tasks

Abstract base class that introduces the methods for the parameterization and evaluation framework. The name of the task is specified in the name attribute. You can declare connections to other tasks (within the same framework) in the setup_requires attribute. The instances corresponding to the identifiers in the setup_requires attribute can later be accessed through attributes of the same name.

Examples

Let’s define a parameterizer that does nothing but requires another parameterization task named cloud via its setup_requires attribute:

>>> class CloudParameterizer(Parameterizer):
...     name = 'cloud'
...     def setup_from_scratch(self):
...         pass
...
>>> class DummyParameterizer(Parameterizer):
...     setup_requires = ['cloud']
...     name = 'dummy'
...     def setup_from_scratch(self):
...         pass
...
>>> cloud = CloudParameterizer()
>>> dummy = DummyParameterizer(cloud=cloud)
>>> dummy.cloud is cloud
True

Attributes

cloud_dir str. Path to the directory where the processed parameterization
data pandas.DataFrame. The dataframe holding the daily data
data_dir str. Path to the directory where the source data of the project
datafile str. The path to the csv file where the data is stored by the
dbname The database name to use
default_config The default configuration of this task inserted with the
df_ref The reference data frame
engine The sqlalchemy engine to access the database
eval_dir str. Path to the directory where the processed evaluation data is
fmt dict. Formatoptions to use when making plots with this task
has_run bool. Boolean that is True if there is a run method for this task
input_dir str. Path to the directory where the input data is stored
input_path The path to the project input file in the configuration
logger The logger of this task
name str. name of the task
nc_file NetCDF file for the project
output_dir str. Path to the directory where the output data is stored
output_path The path to the project output file in the configuration
param_dir str. Path to the directory where the processed parameterization
pdf_file PDF file with the figures of the project
project_file Pickle file for the project
reference_path The path to the reference file in the configuration
sa_dir str. Path to the directory where the processed sensitivity analysis
setup_from
setup_parallel bool. Boolean that is True if the task can be setup in parallel
setup_requires list of str. identifiers of required classes for this task
sql_dtypes The data types to write the data into a postgres database
summary str. summary of what this task does
task_data_dir The directory where to store data
threads threading.Thread objects that are started during the setup.

Methods

create_project(ds) To be reimplemented for each task with has_run
from_organizer(organizer, stations, *args, ...) Create a new instance from a model_organization.ModelOrganizer
from_task(task, *args, **kwargs) Create a new instance from another task
get_manager(*args, **kwargs) Return a manager of this class that can be used to setup and organize
get_run_kws(kwargs)
init_from_db() Initialize the task from datatables already created
init_from_file() Initialize the task from already stored files
init_from_scratch() Initialize the task from the configuration settings
init_task() Method that is called on the I/O-Processor to initialize the setup
make_run_config(sp, info) Method to be reimplemented for each task with has_run
plot_additionals(pdf) Method to be reimplemented to make additional plots (if necessary)
run(info, *args, **kwargs) Run the task
set_requirements(requirements) Set the requirements for this task
setup() Set up the database for this task
setup_from_db(**kwargs) Set up the task from datatables already created
setup_from_file(**kwargs) Set up the task from already stored files
setup_from_instances(base, instances[, copy]) Combine multiple task instances into one instance
setup_from_scratch() Setup the data from the configuration settings
write2db(**kwargs) Write the data from this task to the database given by the
write2file(**kwargs) Write the database to the datafile file
Parameters:
  • stations (list) – The list of stations to process
  • config (dict) – The configuration of the experiment
  • project_config (dict) – The configuration of the underlying project
  • global_config (dict) – The global configuration
  • data (pandas.DataFrame) – The data to use. If None, use the setup() method
  • requirements (list of TaskBase instances) – The required instances. If None, you must call the set_requirements() method later
Other Parameters:
  • *args, **kwargs – The configuration of the task. See the TaskConfig for arguments. Note that if you provide *args, you have to provide all possible arguments

cloud_dir

str. Path to the directory where the processed parameterization data is stored

create_project(ds)[source]

To be reimplemented for each task with has_run

Parameters:ds (xarray.Dataset) – The dataset to plot
data = None

pandas.DataFrame. The dataframe holding the daily data

data_dir

str. Path to the directory where the source data of the project is located

datafile

str. The path to the csv file where the data is stored by the Parameterizer.write2file() method and read by the Parameterizer.setup_from_file()

dbname = ''

The database name to use

default_config

The default configuration of this task inserted with the pdf_file, nc_file and project_file attributes

df_ref

The reference data frame

engine

The sqlalchemy engine to access the database

eval_dir

str. Path to the directory where the processed evaluation data is stored

fmt = {}

dict. Formatoptions to use when making plots with this task

classmethod from_organizer(organizer, stations, *args, **kwargs)[source]

Create a new instance from a model_organization.ModelOrganizer

Parameters:
  • organizer (model_organization.ModelOrganizer) – The organizer to use the configuration from
  • stations (list) – The list of stations to process
Other Parameters:
  • *args, **kwargs – The configuration of the task. See the TaskConfig for arguments. Note that if you provide *args, you have to provide all possible arguments

Returns:

An instance of the calling class

Return type:

TaskBase

classmethod from_task(task, *args, **kwargs)[source]

Create a new instance from another task

Parameters:
  • task (TaskBase) – The task to use the configuration from. Note that it can also be of a different type than this class
  • data (pandas.DataFrame) – The data to use. If None, use the setup() method
  • requirements (list of TaskBase instances) – The required instances. If None, you must call the set_requirements() method later
Other Parameters:
  • *args, **kwargs – The configuration of the task. See the TaskConfig for arguments. Note that if you provide *args, you have to provide all possible arguments

See also

setup_from_instances()
To combine multiple instances of the class

Notes

Except for the skip_filtering parameter, the task_config is not inherited from task

classmethod get_manager(*args, **kwargs)[source]

Return a manager of this class that can be used to setup and organize tasks

get_run_kws(kwargs)[source]
has_run = False

bool. Boolean that is True if there is a run method for this task

init_from_db()[source]

Initialize the task from datatables already created

init_from_file()[source]

Initialize the task from already stored files

init_from_scratch()[source]

Initialize the task from the configuration settings

init_task()[source]

Method that is called on the I/O-Processor to initialize the setup

input_dir

str. Path to the directory where the input data is stored

input_path

The path to the project input file in the configuration

logger

The logger of this task

make_run_config(sp, info)[source]

Method to be reimplemented for each task with has_run to manipulate the configuration

Parameters:
  • sp (psyplot.project.Project) – The project of the data
  • info (dict) – The dictionary for saving additional information of the task
name = None

str. name of the task

nc_file

NetCDF file for the project

output_dir

str. Path to the directory where the output data is stored

output_path

The path to the project output file in the configuration

param_dir

str. Path to the directory where the processed parameterization data is stored

pdf_file

PDF file with the figures of the project

plot_additionals(pdf)[source]

Method to be reimplemented to make additional plots (if necessary)

Parameters:pdf (matplotlib.backends.backend_pdf.PdfPages) – The PdfPages instance which can be used to save the figure
project_file

Pickle file for the project

reference_path

The path to the reference file in the configuration

run(info, *args, **kwargs)[source]

Run the task

This method uses the data that has been setup through the setup() method to process some configuration

Parameters:
  • dict – The dictionary with the configuration settings for the namelist
  • dict – The dictionary holding additional meta information
sa_dir

str. Path to the directory where the processed sensitivity analysis data is stored

set_requirements(requirements)[source]

Set the requirements for this task

Parameters:requirements (list of TaskBase instances) – The tasks as specified in the setup_requires attribute
setup()[source]

Set up the database for this task

setup_from
setup_from_db(**kwargs)[source]

Set up the task from datatables already created

setup_from_file(**kwargs)[source]

Set up the task from already stored files

classmethod setup_from_instances(base, instances, copy=False)[source]

Combine multiple task instances into one instance

Parameters:
  • base (TaskBase) – The base task to use the configuration from
  • instances (list of TaskBase) – The tasks containing the data
  • copy (bool) – If True, a copy of base is returned, otherwise base is modified inplace
setup_from_scratch()[source]

Setup the data from the configuration settings

setup_parallel = True

bool. Boolean that is True if the task can be setup in parallel

setup_requires = []

list of str. identifiers of required classes for this task

sql_dtypes

The data types to write the data into a postgres database

summary = ''

str. summary of what this task does

task_data_dir

The directory where to store data

threads = []

threading.Thread objects that are started during the setup. The setup waits for them to finish before continuing with another process

write2db(**kwargs)[source]

Write the data from this task to the database given by the engine attribute

write2file(**kwargs)[source]

Write the database to the datafile file

class gwgen.utils.TaskConfig(setup_from, to_csv, to_db, remove, skip_filtering, plot_output, nc_output, project_output, new_project, project, close)

Bases: gwgen.utils.TaskConfig

Parameters:
  • setup_from ({ 'scratch' | 'file' | 'db' | None }) –

    The method to use for setting up the instance, either from

    'scratch'
    To set up the task from the raw data
    'file'
    Set up the task from an existing file
    'db'
    Set up the task from a database
    None
    If the data file of this task exists, use it; otherwise, if a database is provided, use that; otherwise set up from scratch
  • to_csv (bool) – If True, the data at setup will be written to a csv file
  • to_db (bool) – If True, the data at setup will be written into a database
  • remove (bool) – If True and the old data file already exists, remove it before writing
  • skip_filtering (bool) – If True, skip the filtering for the correct stations in the datafile
  • plot_output (str) – An alternative path to use for the PDF file of the plot
  • nc_output (str) – An alternative path (or multiples depending on the task) to use for the netCDF file of the plot data
  • project_output (str) – An alternative path to use for the psyplot project file of the plot
  • new_project (bool) – If True, a new project will be created even if a file in project_output exists already
  • project (str) – The path to a psyplot project file to use for this parameterization
  • close (bool) – Close the project at the end
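The fallback behavior for setup_from=None can be sketched with a hypothetical helper (resolve_setup_from is not part of gwgen; it only illustrates the documented precedence):

```python
def resolve_setup_from(datafile_exists, engine):
    """Illustrate the ``setup_from=None`` fallback: prefer an existing
    data file, then a configured database engine, then scratch."""
    if datafile_exists:
        return 'file'
    if engine is not None:
        return 'db'
    return 'scratch'
```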
class gwgen.utils.TaskManager(base_task=<class 'gwgen.utils.TaskBase'>, tasks=None, config={})[source]

Bases: object

A manager to run the tasks within a task framework

Parameters:
  • base_task (TaskBase) – A subclass of the TaskBase class whose tasks shall be used within this manager.
  • tasks (list of TaskBase instances) – The initialized tasks to use. If None, you need to call the initialize_tasks() method
  • config (dict) – The configuration of this manager containing information about the multiprocessing

Attributes

base_task A subclass of the TaskBase class whose
logger The logger of this task

Methods

get_requirements(identifier[, all_requirements]) Return the required task classes for this task
get_task(identifier) Return the task in this manager corresponding to identifier
get_task_cls(identifier) Return the task class corresponding to the given identifier
initialize_tasks(stations[, task_kws]) Initialize the setup of the tasks
run(full_info, *args)
setup(stations[, to_return]) Setup the data for the tasks in parallel or serial
sort_by_requirement(objects) Sort the given tasks by their logical order
base_task = None

A subclass of the TaskBase class whose TaskBase._registry attribute shall be used

get_requirements(identifier, all_requirements=True)[source]

Return the required task classes for this task

Parameters:
  • identifier (str) – The name attribute of the Parameterizer subclass
  • all_requirements (bool) – If True, all requirements are searched recursively. Otherwise only the direct requirements are returned
Returns:

A list of Parameterizer subclasses that are required for the task of the given identifier

Return type:

list of Parameterizer

get_task(identifier)[source]

Return the task in this manager that corresponds to the given identifier

Parameters:identifier (str) – The name attribute of the TaskBase subclass
Returns:The requested task
Return type:TaskBase
get_task_cls(identifier)[source]

Return the task class corresponding to the given identifier

Parameters:identifier (str) – The name attribute of the TaskBase subclass
Returns:The class of the requested task
Return type:TaskBase
initialize_tasks(stations, task_kws={})[source]

Initialize the setup of the tasks

This classmethod uses the TaskBase framework to initialize the setup on the I/O-processor

Parameters:
  • stations (list) – The list of stations to process
  • task_kws (dict) – Keywords must be valid identifiers of the TaskBase instances; the values are dictionaries of keyword arguments for their setup() method
logger

The logger of this task

run(full_info, *args)[source]
setup(stations, to_return=None)[source]

Setup the data for the tasks in parallel or serial

Parameters:
  • stations (list of str) – The stations to process
  • to_return (list of str) – The names of the tasks to return. If None, all tasks that have a run method will be returned
static sort_by_requirement(objects)[source]

Sort the given tasks by their logical order

Parameters:objects (list of TaskBase subclasses or instances) – The objects to sort
Returns:The same as objects but sorted
Return type:list of TaskBase subclasses or instances
class gwgen.utils.TaskMeta[source]

Bases: abc.ABCMeta

Meta class for the TaskBase

gwgen.utils.append_doc(namedtuple_cls, doc)[source]
gwgen.utils.default_config(setup_from=None, to_csv=False, to_db=False, remove=False, skip_filtering=False, plot_output=None, nc_output=None, project_output=None, new_project=False, project=None, close=True)[source]

The default configuration for TaskBase instances. See also the TaskBase.default_config attribute

Parameters:
  • setup_from ({ 'scratch' | 'file' | 'db' | None }) –

    The method to use for setting up the instance, either from

    'scratch'
    To set up the task from the raw data
    'file'
    Set up the task from an existing file
    'db'
    Set up the task from a database
    None
    If the data file of this task exists, use it; otherwise, if a database is provided, use that; otherwise set up from scratch
  • to_csv (bool) – If True, the data at setup will be written to a csv file
  • to_db (bool) – If True, the data at setup will be written into a database
  • remove (bool) – If True and the old data file already exists, remove it before writing
  • skip_filtering (bool) – If True, skip the filtering for the correct stations in the datafile
  • plot_output (str) – An alternative path to use for the PDF file of the plot
  • nc_output (str) – An alternative path (or multiples depending on the task) to use for the netCDF file of the plot data
  • project_output (str) – An alternative path to use for the psyplot project file of the plot
  • new_project (bool) – If True, a new project will be created even if a file in project_output exists already
  • project (str) – The path to a psyplot project file to use for this parameterization
  • close (bool) – Close the project at the end
gwgen.utils.dir_contains(dirname, path, exists=True)[source]

Check if a file or directory is contained in another.

Parameters:
  • dirname (str) – The base directory that should contain path
  • path (str) – The name of a directory or file that should be in dirname
  • exists (bool) – If True, the path and dirname must exist

Notes

path and dirname must be either both absolute or both relative paths
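A minimal sketch of such a containment check, assuming plain os.path semantics (the actual implementation may differ):

```python
import os.path

def dir_contains(dirname, path, exists=True):
    """Check whether `path` lies inside `dirname` by comparing
    their common path prefix."""
    if exists:  # resolve symlinks and relative segments first
        dirname = os.path.realpath(dirname)
        path = os.path.realpath(path)
    return os.path.commonpath([dirname, path]) == os.path.normpath(dirname)
```

Note that os.path.commonpath raises ValueError when mixing absolute and relative paths, which matches the note above.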

gwgen.utils.download_file(url, target=None)[source]

Download a file from the internet

Parameters:
  • url (str) – The url of the file
  • target (str or None) – The path where the downloaded file shall be saved. If None, it will be saved to a temporary directory
Returns:

file_name – the downloaded filename

Return type:

str
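The behavior can be sketched with urllib from the standard library (a sketch, not the actual implementation):

```python
import os
import tempfile
from urllib.request import urlretrieve

def download_file(url, target=None):
    """Download `url` to `target`; when no target is given, save the
    file into a fresh temporary directory. Returns the file path."""
    if target is None:
        target = os.path.join(tempfile.mkdtemp(), os.path.basename(url))
    return urlretrieve(url, target)[0]
```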

gwgen.utils.enhanced_config(config_cls, name)[source]
gwgen.utils.file_len(fname)[source]

Get the number of lines in fname

gwgen.utils.get_module_path(mod)[source]

Convenience method to get the directory of a given python module

gwgen.utils.get_next_name(old, fmt='%i')[source]

Return the next name that numerically follows old
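One way to implement this (a sketch; the real function may behave differently when `old` contains no trailing number):

```python
import re

def get_next_name(old, fmt='%i'):
    """Increment a trailing integer in `old`, or append ``fmt % 1``
    if there is none."""
    match = re.search(r'\d+$', old)
    if match:
        return old[:match.start()] + fmt % (int(match.group()) + 1)
    return old + fmt % 1
```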

gwgen.utils.get_postgres_engine(database, user=None, host='127.0.0.1', port=None, create=False, test=False)[source]

Get the engine to access the given database

This method creates an engine using sqlalchemy’s create_engine function to access the given database via postgresql. If the database is not existent, it will be created

Parameters:
  • database (str) – The name of a psql database. If provided, the processed data will be stored
  • user (str) – The username to use when logging into the database
  • host (str) – the host which runs the database server
  • port (int) – The port to use to log into the database
  • create (bool) – If True, try to create the database as the postgres user if it does not exist
  • test (bool) – If True, test the connection before returning the engine
Returns:

The engine to access the database

Return type:

sqlalchemy.engine.base.Engine

Notes

The engine is for single usage!
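The connection URL that such a function would hand to sqlalchemy's create_engine can be sketched as follows (postgres_url is a hypothetical helper, not part of gwgen):

```python
def postgres_url(database, user=None, host='127.0.0.1', port=None):
    """Build a postgresql connection URL from the given parts."""
    netloc = '%s@%s' % (user, host) if user else host
    if port:
        netloc = '%s:%s' % (netloc, port)
    return 'postgresql://%s/%s' % (netloc, database)
```

The engine itself would then be obtained via sqlalchemy.create_engine(postgres_url(...)).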

gwgen.utils.get_toplevel_module(mod)[source]
gwgen.utils.go_through_dict(key, d, setdefault=None)[source]

Split up the key by . and get the value from the base dictionary d

Parameters:
  • key (str) – The key in the configuration. If the key goes several levels deep, the levels may be separated by a '.' (e.g. 'namelists.weathergen'). Hence, to insert a literal '.', it must be escaped by a preceding '\'.
  • d (dict) – The configuration dictionary containing the key
  • setdefault (callable) – If not None and an item is not existent in d, it is created by calling the given function
Returns:

  • str – The last level of the key
  • dict – The dictionary in d that contains the last level of the key
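Ignoring the escaping of '.', the traversal can be sketched as:

```python
def go_through_dict(key, d, setdefault=None):
    """Walk `d` along the '.'-separated `key` and return the last key
    level together with the dictionary that contains it. This sketch
    ignores the escaping of '.' described above."""
    *parents, last = key.split('.')
    for part in parents:
        if setdefault is not None:
            # create missing intermediate levels on the fly
            d = d.setdefault(part, setdefault())
        else:
            d = d[part]
    return last, d
```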

gwgen.utils.init_interprocess_locks(db_locks, file_locks, lock_dir)[source]
gwgen.utils.init_locks(db_locks, file_locks)[source]
gwgen.utils.isstring(s)[source]
gwgen.utils.ordered_move(d, to_move, pos)[source]

Move a key in an ordered dictionary to another position

Parameters:
  • d (collections.OrderedDict) – The dictionary containing the keys
  • to_move (str) – The key to move
  • pos (str) – The name of the key that should be followed by to_move
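One reading of this (a sketch that places to_move directly after pos; the actual implementation may differ):

```python
from collections import OrderedDict

def ordered_move(d, to_move, pos):
    """Rebuild `d` in place so that `to_move` directly follows `pos`."""
    value = d.pop(to_move)
    items = list(d.items())
    d.clear()
    for k, v in items:
        d[k] = v
        if k == pos:  # re-insert the moved key right after `pos`
            d[to_move] = value
```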
gwgen.utils.safe_csv_append(df, path, *args, **kwargs)[source]

Convenience method to dump a data frame to csv without removing the old

This function dumps the given df to the file specified by path. If path already exists, we read the header of the file and sort df according to this header

Parameters:
  • df (pandas.DataFrame) – The data frame to dump
  • path (str) – The path to the csv file to write to
gwgen.utils.str_ranges(s)[source]

Convert a string of comma separated values to an iterable

Parameters:s (str) – A comma (',') separated string. A single value in this string represents one number; ranges can also be given via a separation by '-'. Hence, '2009,2012-2015' will be converted to [2009, 2012, 2013, 2014] and '2009,2012-2015-2' to [2009, 2012, 2015]
Returns:The values in s converted to a list
Return type:list
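A sketch consistent with the first documented example (exclusive upper bound; the handling of a third '-'-separated value as a step is an assumption):

```python
def str_ranges(s):
    """Convert '2009,2012-2015' style strings to a list of ints."""
    result = []
    for part in s.split(','):
        nums = list(map(int, part.split('-')))
        if len(nums) == 1:
            result.extend(nums)          # single value
        else:
            result.extend(range(*nums))  # start-stop or start-stop-step
    return result
```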
gwgen.utils.unique_everseen(iterable, key=None)[source]

List unique elements, preserving order. Remember all elements ever seen.

Function taken from https://docs.python.org/2/library/itertools.html
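The recipe from the itertools documentation (Python 3 form) reads:

```python
from itertools import filterfalse

def unique_everseen(iterable, key=None):
    """Yield unique elements, preserving order; remember all seen."""
    seen = set()
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen.add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen.add(k)
                yield element
```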