Hydroelectric Optimization with Graph Databases (II: Station Curves)

14 min readJun 9, 2021

Using the schema described in the previous section, we can begin generating station curves, which will be used for our long-term goal of estimating the MW potential for a time period.

Station Curves

Station efficiency curves give the maximum possible MW that the station can generate for every flow value. Station curves will be used in the simulation and maximizer to optimize MW outputs. The core challenge is determining, for each flow amount, the optimal proportion of flow that should be sent through each unit in order to maximize the MW output.

Note: Unit config / station efficiency curves are calculated using flow-indexed values instead of MW-indexed values. This is because MW-indexed values produce a flow, which can change the TWL, which changes the headwater level, which changes the flow, etc. The flow-indexed values give the MW output once the tailwater level / head has converged.

Additional Schema

In order to facilitate the computation of the station curves, I added intermediate vertices. This will allow me to produce station curves using a series of queries, instead of one large query.

Unit_Config: Represents a possible combination of unit types. At MTI for example, which has units of type G or H, a possible unit config is G:0, H:3.

Types: List of unit types that are in this station
Counts: Corresponding list of counts of each unit type for this configuration
Head_Step: The spacing of the efficiency curves for this unit configuration. All configurations will have the same head step.

Unit_Config_Efficiency_Curve: The efficiency curve for this unit config at a particular gross head level. It gives the maximum MW output for this combination of units at each particular flow.

Abstract_Unit: This represents a unit in an efficiency curve. For example, each efficiency curve for the unit configuration G:0, H:3 will have 3 abstract units of type H. The abstract unit doesn’t correspond to any particular real unit. Abstract_Unit stores the flow and MW contribution of the unit to the efficiency curve that it is a part of.

Flow_Index: the flow contribution of this unit corresponding to each entry of Flow_Index in the efficiency curve.

Station_Efficiency_Curve: The efficiency curve of a unit at a particular HWL. Flow_Index/MW allow MW to be looked up by flow, and MW_Index/Flow allows flow to be looked up by MW.

Procedure

The process of generating station curves will be done using a series of queries:

create_unit_configs: Generates all possible unit configurations by unit type for each station.
create_config_heads: For each unit configuration, creates an efficiency curve for every possible head that the config can run at.
calc_unit_config_curves: For each unit configuration efficiency curve, calculates the maximum possible MW that the unit configuration can generate for each flow value and stores it.
calc_station_curves: Calculates the station curves by taking the maximum MW among the unit configuration curves for each flow.
convert_station_curves_mw: Converts the station efficiency curves to a MW indexing instead of a flow-based indexing.

Queries

This is an overview of the GSQL used for the queries. The documentation for GSQL is here.

create_unit_configs

For each station, this query creates a Unit_Config vertex for every possible configuration of unit types. A unit configuration represents a set of units within the station that are on, while the rest remain off. For example, a configuration for MTI could be H:3, G:2, meaning that 3 units of type H are on, while 2 units of type G are on.

It will be much easier to compute the maximum MW output for individual unit configs than for the entire station.

For this query, the accumulators are:

type_counts: a map of unit type (key) to number of units of that type (value) for each station.
types: a list of unit types
counts: a list of counts corresponding to the unit types in types. We need to convert type_counts to two separate lists for the keys and values so we can use the .update() method of the list accum.
curr_counts: tracks the current counts for each unit type corresponding to types as we iterate through every possible unit config.

Here, we collect the number of unit types for each substation, then propagate that to each station. The map accumulator is then converted into separate lists for the unit types and unit counts. For each unit type, we add a 0 to the curr_counts list so that it is of the same length.

Below is the code that iteratively generates the unit configurations. Using recursive enumeration would be easier than an iterative approach, but that would require using subqueries, which incurs a performance overhead. In the first query, I set the last element of curr_counts of each station to 1 instead of 0, since we don’t need to consider the unit config where no units are on.

The loop that generates the unit configurations is controlled by two variables, is_forward and index. We can generate unit configurations similar to how we count numbers. When we count numbers, we start with the last digit (ones) and increment it. If we exceed 9, we set it to 0 and increment the tens digit.

For generating unit configs, we have a list of unit types instead of digits. We start with the last unit type and increment it by one. If we’ve exceeded the total number of units of that type, we set it to 0 and move to the previous unit type. We try incrementing that, and continue moving back until we find a unit type that can be incremented. Then we move to the last digit again, and increment it by one.

Every time we change the unit type counters, we update curr_counts. Every time we increment the last unit type, we use curr_counts to create a Unit_Config vertex.

create_config_heads

For every unit configuration, this query creates a Unit_Config_Efficiency_Curve vertex for every possible head level for that configuration.

The accumulators are:

head_step: The head step for each unit configuration
min(max)_head: The minimum/maximum head for each unit type
head_upper(lower)_bound: The upper/lower bound on the possible head levels for each unit config.

In the first query, we will traverse every edge connecting unit type to an efficiency curve. This will allow us to find the minimum and maximum possible head level for each unit type. We will only consider efficiency curves that have a flow indexing, since that is what we will be using to create the efficiency curve for the unit configuration later on.

Each unit configuration can only operate at headwater levels where all of the units within it can operate. If a unit configuration has unit type counts of A:1, B:1, C:0, the unit configuration can only operate at headwater levels where both A and B can operate.

In this query, we traverse from each unit type to each unit configuration that contains it. Each unit configuration stores the maximum min_head of the contained unit types as the lower bound, and the minimum max_head of the contained unit types as the upper bound for the head level.

We can also assume that the contained unit types all have the same head spacing for its efficiency curves, so the head step of the unit config will be set equal to the head step of any of the unit types.

In the POST-ACCUM clause, we iterate through every head level for the unit configuration using head step, the lower bound, and the upper bound. For each head level, we create a Config_Head_Efficiency_Curve. We also use the Types and Counts fields of the unit config to find all of the abstract units within this unit configuration. Every Config_Head_Efficiency_Curve will have its own set of abstract units, which will store the flow/MW contribution of each unit in the curve.

calc_unit_config_curves

This query will create the efficiency curve for each head level for each unit configuration by finding the maximum possible MW output for each flow level.

Since the unit efficiency curves can be assumed to be convex, we can use a gradient ascent approach. Starting at the minimum possible flow, the query will add the flow step (2 cumecs) until the maximum flow is reached. For each additional flow step, the query will determine the unit that can yield the highest marginal MW output. This can be done by calculating a list of gradients, MW increase per flow step, for each unit, and storing them in sorted order.

Accumulators:

heap: Stores unit_gradient tuples in ascending order by mw_step (MW increase).
mw: Stores the list of maximum MW output for the corresponding flow values in flow
flow: Stores a list of flow values. It starts from the minimum possible flow. Each index adds 2 cumecs to the previous index, and ends at the maximum possible flow.
starting_flow: Flow to start the gradient ascent at (the minimum possible flow)
starting_mw: MW to start the gradient ascent at
max(min)_generation: Max/min MW generation for each unit type
min_gen_index: Index of the minimum MW generation of a unit type in the unit’s efficiency curve.
abstract_unit_ids: List of abstract units in the form type-index for each unit config.
abstract_unit_flow(mw)_steps: Maps each abstract unit id to the list of marginal mw/flow contributions to the unit config’s efficiency curve. For example, if maximizing the MW output at index 40 requires adding 2 cumecs of flow to abstract unit A-1, then A-1 will have a value of 2 at index 40 in abstract_unit_flow_steps.

The query will start by storing the minimum and maximum generations of each unit type on each of the unit type’s efficiency curves.

In the second query, for each efficiency curve, we find the first index in the efficiency curve where the MW output is higher than the minimum generation. This will give us the index for the starting flow (minimum flow). We then iterate through the efficiency curve until the MW output is higher than the maximum generation. For each MW/Flow in the efficiency curve, we will calculate the MW / flow increase from the previous entry in the efficiency curve. This will give us the gradient, which we will store in the heap accumulator of the abstract unit.

For each config curve, the minimum flow and MW output will be the sum of the minimum flow/MW of the abstract units. The heap of gradients for the config will be the combination of heap of gradients from the abstract units. We will add the starting flow and starting MW to the global accumulator of abstract unit flow steps and MW steps, since that is the size of the first step from 0 cumecs / 0 MW.

In the POST-ACCUM, we have two variables, curr_mw and curr_flow, which keeps track of the current flow and MW output. Every time we pop a new gradient from the heap, we will add the flow / MW steps to the current counters.

The query pops the heap until it is empty. Popping from the heap will return the next flow step at a particular abstract unit that yields the highest MW output increase. For each MW step, we will add the MW and flow step to the corresponding abstract unit, and 0 to the flow / MW steps for the other abstract units.

For each abstract unit, the query will iterate through the flow / MW steps. It will add the steps to determine the total MW output and flow contribution from that abstract unit at each index. In the POST-ACCUM, the MW output and flow contributions are persisted to the database.

The same is then done for the unit config curves.

calc_station_curves

calc_station_curves calculates the efficiency curve for each station by getting the maximum MW output among the possible unit configurations for each flow.

Accumulators:

heap: Stores a tuple for each unit config, MW output, and flow combination. The tuples are first sorted by flow in ascending order, and then by MW output in descending order. All tuples that are at the same flow will be next to each other, with the tuple with the highest MW output being at the top of the heap.
head_flows: Maps each head to a corresponding list of flows.
type_counts: Stores the count of each unit type for each station
flow / mw: Stores a list of flows and the corresponding MW outputs.
config_unit_flow(mw): Global map accumulator that stores the flow / mw contribution for each unit configuration at each flow level. The key is unit_config_id-unit_id-flow.
config_id: Stores the id of the unit config
unit_flows(mws): Maps each abstract unit id to the list of flow/mw contributions corresponding to the station efficiency curve’s flow/mw at the same index
hwl: HWL of the station for the efficiency curve to be calculated at
ds_hwl: HWL of the downstream station that will be used to calculate the TWL of this station.
head_step: The spacing of the headwater levels.

The first 3 queries get the head step of the unit types, and propagate that to the stations. We can assume that all of the unit types in the same station have the same head spacing.

For each station, we will calculate the efficiency curve using a HWL halfway between the minimum and maximum HWLs. We will iterate through every reasonable flow value, from 0 to 500 cumecs, and use these HWLs and the downstream HWL to calculate the TWLs at each station. This gives us a headwater level for each flow. The flow value is added to the corresponding head, so that each head has a list of flows that result in that headwater level.

The next queries will count the number of units of each type for each station, and get the head spacing for each station from the unit types.

For every efficiency curve for each unit configuration / head water level, it will take the flows associated with that headwater level and add the MW / flow / config tuple to the heap accumulator. If the flow exceeds the minimum or maximum flow in the efficiency curve, it will not be added.

The heap is then combined with the heaps of other unit configurations at the station vertex.

For each unit config efficiency curve, and for each station flow associated with the curve’s headwater level, the query stores the abstract unit’s MW and flow contribution in config_unit_flow(mw).

The query pops the heap until it is empty. Each tuple contains a flow, MW, and unit config combination. Tuples with the same flow value are together in the heap, with the first tuple having the largest MW. When a tuple is popped, the query checks if its flow is greater than the previous tuple’s. If it is, it is added to the MW / Flow list since this is the largest MW for the next flow value. If it is not, it is ignored. When a MW / flow is added to the efficiency curve, the query iterates through every abstract unit in the station. If the abstract unit is contained in the unit configuration that produced the maximum MW, the flow / MW contribution of the corresponding abstract unit to that config efficiency curve is added to the station curve’s abstract unit’s flow / mw list. If it is not contained, then a 0 is added.

In the final query, the station’s efficiency curve is stored in the database. The query iterates through every abstract unit in the station and creates an abstract unit vertex for the station curve.

convert_station_curves_mw

This query produces a MW-indexed set of values for the station efficiency curve using the flow-indexed set of values.

Accumulators:

heap: A list of efficiency tuples, containing the index, mw, and corresponding efficiency (cumecs / mw). It is sorted in ascending order of MW.
unit_mw_proportion: Maps each abstract unit to the proportion of the total MW that it contributes at each index in the efficiency curve.
unit_mw_index: Maps the mw contribution for each abstract unit
mw: List of MWs
efficiencies: List of efficiencies
original_indexes: List of indexes from the original flow indexed list of values
unit_ids: List of abstract unit ids associated with a station curve
mw_index: List of MWs once they have been converted to an even spacing
flows: List of flows

The first query stores a list of MW proportions for each abstract unit. In the POST-ACCUM, for every MW / flow pair, the MW / efficiency tuple is added to the heap. The heap sorts the tuples by MW.

In the second query, the tuples are converted into separate lists of efficiencies, MWs, and original indexes (the index that the tuple corresponds to in the flow-indexed efficiency curve).

This query iteratively adds 0.5 to the variable curr_mw, representing the current MW index, until the MW index is larger than the MW values in the flow-indexed station curve. The efficiency and MW index can be used to calculate the flow at that MW index.

The query starts with a curr_mw of 0.0, and iterates through the MW of the flow-indexed station curve until the next MW is larger than the current MW. Depending on whether curr_mw is closer to the next or current MW, either i + 1 or i is used as the index. Using this index, the original index can be determined. The original index is used to find the proportion of the original MW output that was contributed by each abstract unit. This is used to calculate the proportion of the MW index that each abstract unit should contribute, and is added to unit_mw_index. Since the abstract unit’s MW contribution is being rounded, it is possible that the total contributions from the abstract units doesn’t equal the MW index. To prevent this, the variable total_mw is used to ensure that the last abstract unit in the loop doesn’t have a MW contribution that exceeds the MW index.

These two queries store the MW-indexed values for the station curve and the abstract units.

Results

With these queries, the station curves can be computed. This is what the station curves look like. As seen in the charts, the curves are not smooth, due to noise in the data. The station curves have bumps, showing that some flows are capable of much higher efficiencies than others. Efficiency is measured in MW/cumecs.