Hydroelectric Optimization with Graph Databases (III: Simulation)
We will need a query that can simulate the Waikato River for a given set of MW inputs, so we can see how the river responds to hypothetical scenarios. This will be used to calculate the maximum MW that can be produced for a given time window.
Pre-Processing
Before running the simulation query or endpoint, there is some pre-processing that helps with the results:
Calculating Water Balance: There will be some constant errors, such as erroneous tributary measurements. To account for these, I created a query that calculates the expected flow at each minute using the simulation equations and computes the error against the actual measured flow. The error is averaged over entire days and stored as “Water Balance,” which is added to the calculated flow during the simulation. This allows the simulated flow to better match the historic flow without overfitting the historic data.
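The query itself runs against the graph; as a rough Python sketch of the same idea (the record format and names here are assumptions), the daily water balance is just the per-day average of the measured-minus-simulated flow error:

```python
from collections import defaultdict

def daily_water_balance(records):
    """Average (measured - simulated) flow error per calendar day.

    `records` is an iterable of (timestamp, measured_flow, simulated_flow)
    tuples, one per minute. The returned dict maps each date to the
    correction that gets added to the simulated flow for that day.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ts, measured, simulated in records:
        day = ts.date()
        totals[day] += measured - simulated
        counts[day] += 1
    return {day: totals[day] / counts[day] for day in totals}
```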
Next Minute Edges: I created a query that links station/unit/substation records to the record of the next minute for a given time period. These edges help the simulation run much faster, since it can traverse directly to the next minute's record without needing to use a MapAccum.
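Conceptually, the next-minute edges behave like a lookup from each record to the record one minute later. A minimal Python sketch, assuming records are keyed by (entity id, minute):

```python
from datetime import timedelta

def build_next_minute_links(record_keys):
    """Map each (entity_id, minute) record key to the key one minute later,
    so a simulation step can hop directly to the next record."""
    keys = set(record_keys)
    return {
        (entity, ts): (entity, ts + timedelta(minutes=1))
        for entity, ts in keys
        if (entity, ts + timedelta(minutes=1)) in keys
    }
```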
Simulation Query
Inputs and Accumulators
The simulate_inputs_set_hwl query takes the following parameters:
- Time: The datetime to start the simulation at
- Steps: The number of time steps to run the simulation for
- Time_step: The number of minutes in a time step. The application uses 30-minute time steps.
- P_inputs: A JSON string of unit inputs. The top level is an array with one JSON object per time step, indexed in order. Each time step's object contains a list of stations, each station object contains a list of units, and each unit object contains a MW setpoint for that time step (see the example after this list).
- Station_curve_only: Whether to use station curves to optimize the unit MW setpoints. If true, the unit MWs will be summed by station. Each station’s summed MW target will be used to look up the optimal configuration of unit MWs using the station curves, and the unit MWs will be adjusted to match that.
- Output_file: Whether the output should be sent to a .csv file instead of a JSON response.
- Hold_ara_head: Whether Aratiatia’s headwater level should be held constant. Taupo gate’s TOTF will be adjusted so that Aratiatia’s inflows and outflows are always the same.
- P_hwl_inputs: The HWLs that each station should start at. If a station is not contained in this JSON, then the station will use the HWL from the historic record.
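To make the shape of P_inputs concrete, here is a hypothetical payload for one station over two time steps. The key names and unit ids are assumptions for illustration; the real payload follows whatever schema the application sends:

```python
import json

# Hypothetical P_inputs payload: one entry per time step, each containing
# a list of stations, each containing a list of unit MW setpoints.
p_inputs = json.dumps([
    {   # time step 0
        "stations": [
            {"station": "ARA",
             "units": [{"unit": "U1", "mw": 20.0},
                       {"unit": "U2", "mw": 20.0}]},
        ]
    },
    {   # time step 1
        "stations": [
            {"station": "ARA",
             "units": [{"unit": "U1", "mw": 22.5},
                       {"unit": "U2", "mw": 0.0}]},
        ]
    },
])
```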
The simulation uses these tuples:
- station_curve_max_exceeded_log: Represents data for each time a station’s MW exceeds the station curve. This is sent back to the application.
- past_totf: Stores the time and flow of a station. Used in a HeapAccum to get the flow from the upstream station.
- station_tuple: Represents the status of a station at a certain timestep, used for the JSON response or when outputting to a csv file.
- substation_tuple: Status of a substation at a timestamp.
- unit_tuple: Status of a unit at a timestamp.
The accumulators that will be used to store the output are:
- station_tuples: Stores a list of station tuples, one for each station at each timestamp. Sorts the stations in ascending order by time.
- unit_tuples: Maps each station and time step combination to the list of unit tuples corresponding to that station at that timestamp.
- substation_tuples: Similar to unit_tuples, but for substations.
The following variables are used:
- Curr_time: Keeps track of the current time as we iterate minute-by-minute through the simulation
- Epoch: Stores the epoch version of the current time for outputting
- End_time: The time at which the simulation ends.
Accumulators are also used for calculating the total flow at TPO and during the simulation itself; the main ones are:
- Unit_mws: Stores the MW that each unit should be set to. The key is in the form <unit_id>-<time_step> (see the example after this list).
- Station_curve_mws: Stores the MW that each unit should be set to when using the station curve.
- Station_curve_max_mws: Stores the maximum MW for each station curve.
- Station_curve_max_exceeded_logs: Stores a list of logs for each time a station curve maximum is exceeded, which is returned to the application.
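As a small illustration of the Unit_mws keying convention (the unit id below is hypothetical), the accumulator behaves like a map from `<unit_id>-<time_step>` to a MW setpoint:

```python
unit_mws = {}

def set_unit_mw(unit_id, time_step, mw):
    """Store a MW setpoint under the `<unit_id>-<time_step>` key."""
    unit_mws[f"{unit_id}-{time_step}"] = mw

set_unit_mw("ARA_U1", 0, 20.0)   # hypothetical unit id, time step 0
set_unit_mw("ARA_U1", 1, 22.5)   # same unit, time step 1
```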
The query also declares variables for processing the JSON input, as well as the files that the output will be written to.
Curr_time will be set to 1 minute before the first minute of the simulation. This is because we will be using curr_time to retrieve the records that determine the starting TOTF / HWL when the simulation starts, and we want to use the records 1 minute before the first simulated minute.
Next, the station curves are stored.
The query maps each real unit to an abstract unit by mapping each unit type to a list of real unit ids of that type. Then it will iterate through these lists to assign an abstract unit id.
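A minimal Python sketch of that mapping, assuming the abstract id is just the unit type plus a counter (the id format is an assumption):

```python
def assign_abstract_units(units_by_type):
    """Map each real unit id to an abstract id of the form <type>_<n>,
    numbering the units of each type in list order."""
    abstract = {}
    for unit_type, unit_ids in units_by_type.items():
        for i, unit_id in enumerate(unit_ids, start=1):
            abstract[unit_id] = f"{unit_type}_{i}"
    return abstract

# Hypothetical example: two real units share the same abstract type.
assign_abstract_units({"type_a": ["ARA_U1", "ARA_U2"]})
# -> {"ARA_U1": "type_a_1", "ARA_U2": "type_a_2"}
```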
The query parses the JSON input by first iterating through the top level, which contains a JSON object for each time step. Within each time step's JSON, it iterates through each station's JSON, and within that, through the station's unit inputs. If the user indicated that only station curves should be used, or that this particular station should use station curves, the query sums the MW inputs of the units to get the station's total MW output. The total MW is converted to the corresponding index in the station efficiency curve, which is then used to look up the MW setpoint that each unit should be set to; these setpoints are stored in unit_mws.
If station curves are not to be used, then the inputted unit MWs are directly stored in unit_mws.
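A hedged Python sketch of the station-curve lookup, assuming the curve is stored as a list indexed by discretized total MW, with each entry holding the optimal per-unit split (the representation and step size are assumptions):

```python
def unit_setpoints_from_station_curve(total_mw, station_curve, mw_step=1.0):
    """Return the per-unit MW split for a station-level MW target.

    `station_curve` is assumed to be a list where index round(total_mw / mw_step)
    holds a dict of unit id -> optimal MW for that total output.
    """
    index = min(int(round(total_mw / mw_step)), len(station_curve) - 1)
    return station_curve[index]
```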
The head step for each unit type is propagated to the units of that type. This will help with looking up efficiency curves during the simulation.
The MW target of each unit is set to the MW that is stored for time step 0. The MW targets for the substation and station are the sums of the MW targets of the contained units.
Initialization
In the first query, the downstream travel time of each station is stored.
The downstream stations use the total flows from the upstream station for calculating their headwater level. Since there is a travel delay between stations, the downstream station must use the upstream station’s total flow in the past. At the beginning of the simulation, stations will be using the historic total flows of the upstream stations. The second query uses the downstream travel time to fetch all station records that are before the start of the simulation, but within the downstream travel time, so that they are available to be used by the downstream station. The past total flows are stored in a heap, with the earliest past total flow at the top. This is only done for stations that are not Taupo gate (the gate that controls water flow).
For Taupo gate, the past total flows use records between start_time - travel_time and end_time. Since Taupo gate's total flow isn't modified by the unit inputs, it only uses historic total flows, even during the simulation. To reduce the size of the MapAccum, only records that line up with the outputted timestamps (on the hour or half hour) are stored in tpo_flows for outputting.
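A Python sketch of pre-loading the past total flows for a downstream station, using a min-heap so the earliest flow is popped first (the record format is an assumption):

```python
import heapq
from datetime import timedelta

def preload_past_flows(historic_flows, start_time, travel_minutes):
    """Collect upstream (time, flow) records from the travel-delay window
    just before the simulation starts, as a min-heap ordered by time."""
    window_start = start_time - timedelta(minutes=travel_minutes)
    heap = [(ts, flow) for ts, flow in historic_flows
            if window_start <= ts < start_time]
    heapq.heapify(heap)   # earliest record ends up at the top
    return heap
```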
In the first query, the station records that match curr_time (1 minute before the first minute of the simulation) are retrieved. The starting record is used to determine the starting HWL and TOTF of each station. In the POST-ACCUM, if the JSON hwl_inputs contains the station, then the starting HWL is set to the supplied input instead.
Starting records are also retrieved for substations and units. The starting substation record determines the substation’s starting TWL.
Iteration
The number of iterations is the total number of minutes, steps * time_step. Each iteration, the current time is incremented by 1 minute, and the station, substation, and unit records are retrieved by traversing the next-minute edges of the previous records.
If the iteration is divisible by the time step, then we've entered a new time step and should update the unit MWs to the user's MW inputs for that time step.
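The outer loop looks roughly like the following Python skeleton; the two callbacks stand in for the per-time-step input update and the per-minute river simulation, and are assumptions for illustration:

```python
from datetime import timedelta

def run_simulation(start_time, steps, time_step, apply_mw_inputs, simulate_minute):
    """Iterate minute by minute for steps * time_step minutes, applying the
    user's MW inputs at each time-step boundary."""
    curr_time = start_time - timedelta(minutes=1)   # 1 minute before the first simulated minute
    for i in range(steps * time_step):
        curr_time += timedelta(minutes=1)
        if i % time_step == 0:
            apply_mw_inputs(i // time_step)   # new time step: update unit MW targets
        simulate_minute(curr_time)
```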
The upstream total flow (popped from the past_totf heap) and the downstream headwater level are used to calculate each station's HWL and TWL. The gross head is then stored on the units. If Aratiatia's HWL is to be held steady, then its HWL is not updated.
Using the MW input and gross head of each unit, the flow at each unit can be determined. The total flow at each substation and station is the sum of the flows at the contained units. The total flow at the station is then stored into the past_totf heap, so that it can be used later for calculating the HWL of the downstream station. If the station is Aratiatia (the first station) and the user wants to hold ARA’s head, then ARA’s upstream total flow, tributaries, spills, and water balance should be stored. This will be used to calculate the total flow at TPO so that ARA’s inflow and outflow are equal, resulting in no HWL change.
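The query derives unit flows from its stored efficiency curves; as a generic stand-in, the usual hydropower relation P = ρ·g·Q·H·η gives the same shape of calculation (using a single efficiency value here is an assumption):

```python
RHO = 1000.0   # water density, kg/m^3
G = 9.81       # gravitational acceleration, m/s^2

def unit_flow(mw, gross_head, efficiency):
    """Approximate turbine discharge Q (m^3/s) from power and gross head
    using P = rho * g * Q * H * eta; the real query uses the stored unit
    efficiency curves rather than a single efficiency number."""
    if gross_head <= 0 or efficiency <= 0:
        return 0.0
    return (mw * 1e6) / (RHO * G * gross_head * efficiency)
```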
If the iteration is divisible by the timestep, then we should store the status of each station, substation, and unit into the tuples as described earlier. If the unit’s historic status was “GN” (available), and its MW target is 0, then its status should be set to “RS” (reserve state). If the unit’s historic status was off but its MW target is greater than 0, then its status should be “GN.”
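A small Python sketch of that status adjustment (the treatment of statuses other than "GN" is an assumption):

```python
def simulated_status(historic_status, mw_target):
    """Adjust the historic unit status to match the simulated MW target."""
    if historic_status == "GN" and mw_target == 0:
        return "RS"   # available historically, but simulated as idle
    if historic_status != "GN" and mw_target > 0:
        return "GN"   # off historically, but generating in the simulation
    return historic_status
```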
After the stations have been simulated, the total flow at TPO is calculated. If ARA’s head is to be held steady, then TPO’s flow is calculated so that ARA’s inflows equal the outflows. Otherwise, the historic TPO flow is used.
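Solving ARA's water balance for the TPO flow looks roughly like the sketch below; the sign conventions (tributaries and water balance as inflow terms, spill as an outflow term) are assumptions and may differ from how the query stores these terms:

```python
def tpo_flow_to_hold_ara_head(ara_unit_flow, ara_spill, tributaries, water_balance):
    """Return the TPO flow that makes ARA's inflow equal its outflow,
    assuming inflow = TPO flow + tributaries + water balance and
    outflow = unit flow + spill."""
    return ara_unit_flow + ara_spill - tributaries - water_balance
```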
Output
If the user wants file output, the tuples are iterated over and printed to the .csv file. Otherwise, for each station tuple (station / time step), the station's data and the corresponding substation and unit tuples are returned as JSON.
Results
Using the historic MWs, here are the simulated vs actual values for August 19th, 2019. Besides ARA, the simulated and actual values align well.