Model reconstruction from temporal data for coupled oscillator networks

In a complex system, the interactions between individual agents often lead to emergent collective behavior like spontaneous synchronization, swarming, and pattern formation. The topology of the network of interactions can have a dramatic influence over those dynamics. In many studies, researchers start with a specific model for both the intrinsic dynamics of each agent and the interaction network, and attempt to learn about the dynamics that can be observed in the model. Here we consider the inverse problem: given the dynamics of a system, can one learn about the underlying network? We investigate arbitrary networks of coupled phase-oscillators whose dynamics are characterized by synchronization. We demonstrate that, given sufficient observational data on the transient evolution of each oscillator, one can use machine learning methods to reconstruct the interaction network and simultaneously identify the parameters of a model for the intrinsic dynamics of the oscillators and their coupling.

Specifically, we infer the network topology, the intrinsic frequency of each model oscillator, and the so-called coupling function that specifies the influence of one oscillator on others.
To carry out the model reconstruction, we begin with simulated time series for the oscillator phases and estimated phase velocities. We set up an optimization problem involving the mean squared error for the predicted phase velocities generated by a system of nonlinear differential equations, which can be represented as a convolutional neural network. We then use computational methods designed for neural networks to estimate the optimal parameters, thereby inferring the adjacency matrix for network connectivity, the oscillator coupling strength, the oscillator frequencies, and the Fourier coefficients of the coupling function. This approach is effective for a variety of different networks and model parameters. More specifically, we find that:
• Accurate inference of the coupling network, frequencies, and coupling functions is possible independent of the model when the system does not synchronize or when sufficient perturbations from synchronization are permitted.
• Model reconstruction is possible with a smaller amount of data than previous approaches that set up the problem as a large system of linear equations.
• Computational methods designed for neural networks, such as mini-batch gradient descent implemented in TensorFlow [34], can be used to solve the optimization problem associated with model reconstruction.
• Synchronizing networks can be reconstructed using random phase resets or sufficiently large phase perturbations to a small randomly-chosen set of oscillators.
• Synchronizing networks can also be reconstructed using perturbations to a sufficiently large fixed subset of oscillators.
The rest of this paper is organized as follows. In section 2 we introduce the Kuramoto model (a standard model for the dynamics of coupled oscillators), set up the inverse problem for reconstruction, and discuss approaches for solving this inverse problem. In section 3 we describe a series of experiments used to test the effectiveness of these reconstruction techniques over a wide range of model parameters. In section 4, we present the results of the aforementioned experiments and discuss perturbation strategies for improving the reconstruction when the system synchronizes. Finally, in section 5, we summarize our primary findings and discuss extensions to our methodology.

Models and Methods
The Kuramoto model is a standard model for the dynamics of coupled oscillators. We consider N oscillators with phases θ_k ∈ [0, 2π) for k = 1, 2, . . . , N, each with an intrinsic frequency ω_k. These oscillators are coupled through an interaction network with adjacency matrix A, with the entry A_kj ∈ {0, 1} determining whether oscillators k and j are coupled. The dynamics of θ_k are governed by the equation

dθ_k/dt = ω_k + Σ_{j=1}^N σ_kj A_kj Γ(θ_j − θ_k).   (1)

Here σ_kj represents the strength of the coupling between oscillators j and k, and Γ(θ) represents the coupling function. If the coupling function Γ(θ) is continuous and 2π-periodic, and Γ′(θ) is piecewise continuous, then it can be represented using a uniformly convergent Fourier series

Γ(θ) = a_0 + Σ_{n=1}^∞ [a_n cos(nθ) + b_n sin(nθ)],   (2)

where the coefficients satisfy |a_n|, |b_n| ≤ M/n² for some M. Therefore, these coefficients decay to 0 for large n, and this function can be approximated to arbitrary accuracy using a truncated Fourier series

Γ(θ) ≈ Σ_{n=1}^m [a_n cos(nθ) + b_n sin(nθ)]   (3)

with sufficiently large m. Note that within the model, one can set a_0 = 0 without loss of generality, using the change of variables ω_k + Σ_{j=1}^N σ_kj A_kj a_0 → ω_k. In Kuramoto's original formulation [10], the oscillators were globally coupled (A_kj = 1) with coupling strengths σ_kj = K/N, where K scales the global coupling strength. He used the coupling function Γ(θ) = sin(θ) and showed that, for K > K_C, where K_C is a critical value related to the width of the distribution of intrinsic frequencies, the oscillators begin to synchronize, achieving identical phase velocities with phases distributed around the population mean. Analogous results were obtained later for systems with periodic coupling functions possessing only odd harmonics [11, 30] and arbitrary complex networks [35].
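As a concrete illustration, the right-hand side of (1) with the truncated Fourier coupling (3) can be sketched in a few lines of Python. This is a minimal sketch under our own naming conventions (`gamma`, `kuramoto_rhs` are not from the paper), assuming uniform coupling strengths σ_kj = K/N:

```python
import numpy as np

def gamma(theta, a, b):
    """Truncated Fourier coupling function, eq. (3):
    Gamma(theta) = sum_n a_n cos(n theta) + b_n sin(n theta)."""
    n = np.arange(1, len(a) + 1)
    return np.cos(np.outer(theta, n)) @ a + np.sin(np.outer(theta, n)) @ b

def kuramoto_rhs(theta, omega, A, K, a, b):
    """Phase velocities of eq. (1) with uniform coupling sigma_kj = K/N."""
    N = len(theta)
    diffs = theta[None, :] - theta[:, None]       # diffs[k, j] = theta_j - theta_k
    coupling = gamma(diffs.ravel(), a, b).reshape(N, N)
    return omega + (K / N) * np.sum(A * coupling, axis=1)
```

Setting a = [0] and b = [1] recovers Kuramoto's original choice Γ(θ) = sin(θ).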

Simulated data generation
We investigate a method for inferring the parameters of the Kuramoto model for phase oscillators on random graphs with uniform coupling strengths σ_kj = K/N. In our experiments, we focus on undirected Erdős-Rényi graphs, where each possible edge in the network is present with probability p. It is straightforward to extend our approach to arbitrary networks with non-uniform coupling strengths, though this would likely require a larger amount of time series data. The details of our method appear in sections 2.2 and 2.3.
In order to test our method, we generate data according to the following procedure:
(i) Generate an Erdős-Rényi network for a fixed connection probability p where the nodes represent individual oscillators with natural frequencies sampled from a normal distribution with mean µ and standard deviation σ.
(ii) Use numerical integration to generate a time series for the evolution of this system for t ∈ [0, t max ], starting from initial phases drawn from a uniform distribution on [0, 2π]. For numerical integration, we use an explicit Runge-Kutta method of order 5(4) [36] or a stochastic Runge-Kutta method of order 2 [37].
(iii) Compute the phases θ_k(t_n) at times t_n = n∆t with timestep ∆t and for n = 0, 1, . . . , T where T = t_max/∆t to obtain observations that are evenly spaced in time.
(iv) Estimate the phase velocities v_k(t_n) = dθ_k/dt(t_n) using central differencing in time with Savitzky-Golay filtering [38]. We used a window length of 5 with first-degree polynomials.
(v) Repeat steps (ii-iv) N res times with different uniform random initial conditions to obtain sufficient data during the transient evolution of the system.
See tables 1 and 2 for the network parameters and numerical solution parameters used in this procedure.
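The steps above can be sketched as follows for the classic sin coupling. The parameter values here are illustrative placeholders (the actual defaults are listed in tables 1 and 2), and SciPy's `solve_ivp` default integrator is the explicit Runge-Kutta 5(4) method cited in step (ii):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.signal import savgol_filter

# Hypothetical parameter choices; the actual defaults appear in tables 1 and 2.
N, p, K, mu, sigma = 10, 0.5, 2.0, 0.0, 0.2
t_max, dt = 20.0, 0.1
rng = np.random.default_rng(0)

# Step (i): Erdos-Renyi network and normally distributed natural frequencies.
edges = np.triu(rng.random((N, N)) < p, 1)
A = (edges | edges.T).astype(float)          # symmetric, no self-loops
omega = rng.normal(mu, sigma, N)

def rhs(t, theta):
    """Kuramoto velocities with Gamma(theta) = sin(theta), sigma_kj = K/N."""
    diffs = theta[None, :] - theta[:, None]  # diffs[k, j] = theta_j - theta_k
    return omega + (K / N) * np.sum(A * np.sin(diffs), axis=1)

# Steps (ii)-(iii): integrate from uniform random initial phases and sample
# the phases on an evenly spaced time grid.
T = int(round(t_max / dt))
t_obs = np.linspace(0.0, t_max, T + 1)
sol = solve_ivp(rhs, (0.0, t_max), rng.uniform(0, 2 * np.pi, N),
                t_eval=t_obs, rtol=1e-8, atol=1e-8)  # default method is RK45
theta_obs = sol.y.T                                   # shape (T + 1, N)

# Step (iv): estimate phase velocities with a Savitzky-Golay filter
# (window length 5, polynomial degree 1, first derivative).
v_obs = savgol_filter(theta_obs, window_length=5, polyorder=1,
                      deriv=1, delta=dt, axis=0)
```

Step (v) would simply repeat the integration N_res times from fresh random initial conditions and concatenate the results.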
The aforementioned process is intended to produce simulated data mimicking that which experimentalists might collect when observing real world networks. Note however, that it may not be possible to control the initial phases in an experiment. We therefore evaluate the reconstruction methods for varying N res , as well as for cases in which small perturbations are used instead of different initial conditions (see section 4.1).

Inverse problem formulation
We will formulate a set of equations whose least squares solution can be used to estimate the natural frequencies ω k of each oscillator, the adjacency matrix A kj of the coupling network, the coupling strength K, and the coupling function Γ(θ), from observed phases θ k (t n ) and phase velocities v k (t n ) generated using the method outlined above. This work builds on prior work by Shandilya and Timme [17], where the adjacency matrix for a network of oscillators is estimated from a time series. We summarize their approach below.
Define vectors θ_j = [θ_1(t_j), . . . , θ_N(t_j)]^T and v_j = [v_1(t_j), . . . , v_N(t_j)]^T consisting of the oscillator phases and phase velocities. When the coupling strength K and coupling function Γ are known, the phase velocity for oscillator k is linear in the unknown parameters ω_k and A_kj. Therefore, one can infer both the natural frequencies and adjacency matrix by solving a linear system of N · T equations, where T is the number of time steps and each timestep j provides N equations of the form

v_k(t_j) = ω_k + Σ_{i=1}^N σ_ki A_ki Γ(θ_i(t_j) − θ_k(t_j)).   (4)

Stacking these equations into matrix form gives v_j = L(θ_j) x. Here L(θ_j) is an N × (N + N²) matrix where each row is determined from (4) for a particular oscillator, and x is an (N + N²) × 1 vector with the unknown natural frequencies ω and the entries in the adjacency matrix A. The number of unknowns in A can be reduced to N + N(N − 1)/2 if one assumes A_jj = 0 and A_kj = A_jk. Since the number of equations in the linear system is determined by the number of observed timesteps T and the number of oscillators N, one can ensure that the system is over-determined by collecting enough observations so that T > 1 + N. One can then estimate the parameters by minimizing a loss function such as the mean squared error

(1/(NT)) Σ_{j=1}^T ||v_j − v̂_j||²,

where v_j denotes the observed value and v̂_j denotes the predicted value of the velocity. As long as the system remains far from synchronization, increasing the number of observations T will provide additional linearly independent equations to aid with reconstruction. However, as networks synchronize, i.e. when the entries (v_j)_k = dθ_k/dt approach a common phase velocity, additional observations become (nearly) linearly dependent. This causes the system of equations to become highly ill-conditioned and makes numerical solutions sensitive to noise and rounding errors. As a result, it is often necessary to perturb the system away from equilibrium to ensure that a sufficient number of linearly independent observations can be collected.
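For the known-coupling-function case described above, the least-squares solution is a few lines of NumPy. This sketch is our own illustration (not the authors' code), assuming Γ(θ) = sin(θ) and σ_kj = 1; each observed timestep contributes N rows to the matrix L:

```python
import numpy as np

def reconstruct_linear(theta_obs, v_obs):
    """Least-squares estimate of (omega, A), assuming the coupling function
    Gamma(theta) = sin(theta) and coupling strengths sigma_kj = 1 are known.
    Each observed timestep contributes the N linear equations
    v_k = omega_k + sum_j A_kj sin(theta_j - theta_k)."""
    T, N = theta_obs.shape
    L = np.zeros((T * N, N + N * N))
    for t in range(T):
        diffs = np.sin(theta_obs[t][None, :] - theta_obs[t][:, None])  # [k, j]
        for k in range(N):
            row = t * N + k
            L[row, k] = 1.0                                  # coefficient of omega_k
            L[row, N + k * N : N + (k + 1) * N] = diffs[k]   # coefficients of A_kj
    x, *_ = np.linalg.lstsq(L, v_obs.ravel(), rcond=None)
    return x[:N], x[N:].reshape(N, N)
```

With noiseless data sampled far from synchrony, this recovers ω and the off-diagonal entries of A essentially exactly; near synchrony the rows become nearly dependent and the solution degrades, as discussed above.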
Although the approach of [17] outlined above is effective, it is limited by the requirement that the coupling function be known a priori. In [28], Pikovsky addresses this limitation by expressing Γ as a Fourier series so that it, too, can be estimated by solving an analogous optimization problem. However, a challenge remains. If one represents the coupling function as in (3), then the system of equations is no longer linear as terms of the form A_kj a_n and A_kj b_n appear. Pikovsky [28] circumvents this issue by defining distinct coupling functions Γ_kj(θ) = Σ_{n=1}^m [a_kjn cos(nθ) + b_kjn sin(nθ)] for each pair of oscillators and by setting A_kj = 1. In this formulation, if oscillators k and j are uncoupled, then a_kjn = b_kjn = 0 for all n. This modification preserves the linearity of the system of equations and allows for the description of a more general class of networks with distinct coupling functions for each pair of oscillators. Unfortunately, it comes at a cost: the number of unknown parameters increases dramatically from N + N² to N + 2mN². Therefore, even longer observation times are required. Additionally, as with the method of [17], if the system synchronizes, there may be numerical difficulties when inferring parameters. Finally, the large number of free parameters makes this model particularly prone to overfitting. As before, one could assume that connections in the network are symmetric, i.e., that Γ_kj(θ) = Γ_jk(θ), and that self edges are not included (Γ_jj(θ) = 0), allowing for a reduction in the number of parameters to N + mN(N − 1).
We propose an alternative approach. We use a single coupling function represented by a Fourier series as described in (3) for all coupling terms and attempt to infer the 2m Fourier coefficients a = [a_1, a_2, . . . , a_m]^T and b = [b_1, b_2, . . . , b_m]^T along with the intrinsic frequencies ω, adjacency matrix A, and global coupling strength K. This leads to a total of 2m + N + N² + 1 inferred parameters, or 2m + N + N(N − 1)/2 + 1 with symmetry constraints and no self-connections. This is therefore a modest increase of 2m + 1 parameters over the case considered by Shandilya and Timme [17] without requiring prior knowledge of the coupling function.
As mentioned previously, this leads to a set of nonlinear equations due to the appearance of terms of the form KA_kj a_n and KA_kj b_n in (4). Since K always appears in products involving a_n and b_n, these parameters are not structurally identifiable. One could set K = 1 without loss of generality. Instead, we include K as an inferred parameter and introduce penalty terms to the objective function we seek to minimize:

L = (1/(NT)) Σ_{j=1}^T ||v_j − v̂_j||² + λ_Γ Σ_{n=1}^m (a_n² + b_n²) + Σ_{k,j} [λ_A Â_kj²(1 − Â_kj)² + λ_bd (min(Â_kj, 0)² + max(Â_kj − 1, 0)²)].   (6)

The inclusion of a penalty function ensures the existence of a non-degenerate local minimum. In this objective function, the first term represents the mean squared error. The remaining terms are penalty terms with weights λ_Γ = 0.0001, λ_A = 10⁻⁶, and λ_bd = 10⁵. These hyper-parameter values were tuned initially to provide satisfactory reconstruction performance, and then kept constant in all subsequent experiments (see also table 3). Additional tuning of these parameters for specific networks could further improve the accuracy of the model reconstruction. The second term with λ_Γ introduces L² regularization, which favors smaller estimates of the parameter values a_n and b_n. This is useful for combating overfitting and promoting sparse representations, and is analogous to using a Bayesian prior centered at 0 for the Fourier coefficients [39]. The last term with λ_A and λ_bd penalizes adjacency values that are far from 0 and 1 as well as those that are negative or greater than 1 to ensure that the estimates fall within the desired range of [0, 1]. We do not penalize K and instead allow it to be as large as necessary to counteract the parameter shrinkage caused by the penalty terms.
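A NumPy version of this penalized objective might look as follows. The exact functional form of the adjacency penalties is our illustrative guess based on the description above (a term that favors values in {0, 1} and a term that penalizes values outside [0, 1]); the weights match the defaults in table 3:

```python
import numpy as np

def loss(v_obs, v_pred, a, b, A_hat,
         lam_gamma=1e-4, lam_A=1e-6, lam_bd=1e5):
    """Mean squared velocity error plus the penalties described in the text.
    The precise form of the adjacency penalties is an illustrative guess:
    the lam_A term pushes entries toward {0, 1}, while the lam_bd term
    penalizes values outside [0, 1]."""
    mse = np.mean((v_obs - v_pred) ** 2)
    ridge = lam_gamma * (np.sum(a ** 2) + np.sum(b ** 2))   # L2 on Fourier coeffs
    bimodal = lam_A * np.sum(A_hat ** 2 * (1.0 - A_hat) ** 2)
    bounds = lam_bd * np.sum(np.minimum(A_hat, 0.0) ** 2
                             + np.maximum(A_hat - 1.0, 0.0) ** 2)
    return mse + ridge + bimodal + bounds
```

Note how the huge weight λ_bd = 10⁵ makes out-of-bounds adjacency estimates prohibitively expensive, while the tiny λ_A = 10⁻⁶ only gently nudges entries toward 0 or 1.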

Inverse problem solution
Minimizing (6) poses a number of challenges. First of all, due to the nonlinearity of (4), this function may not be convex and there are no guarantees that local optimization methods will converge to a global minimum. Secondly, when the number of observations T is large, this function may be costly to compute, and minimizing the number of function evaluations is paramount.
Fortunately, the theoretical challenge of nonconvexity does not prevent our method from obtaining consistently reliable model reconstruction, as we show in section 4. The computational difficulties can be addressed by using tools designed for neural networks and implemented in TensorFlow to efficiently compute gradients and perform minimization with a variant of mini-batch gradient descent [40].
To elucidate the connection with neural networks, we comment that (3) can be viewed as a 2-d convolutional neural network with a convolution of size 1 × 1 and stride 1 applied to the T × N × N tensor consisting of phase differences θ k − θ j . The hidden layer consists of the 2m harmonics of the form cos nθ and sin nθ applied to each entry in this tensor. To avoid redundant weights, we fix the weights in the first hidden layer to be 1 and the biases to be 0. In the second hidden layer, each harmonic is assigned a fixed bias of 0 and a variable weight a n or b n corresponding to the Fourier coefficients. Figure 1 provides a schematic of this neural network. Once the coupling terms have been computed, the rest of (4) and the resulting loss function (6) are straightforward to compute using vectorized operations.
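The forward pass just described — evaluating the truncated Fourier series on the T × N × N tensor of phase differences and contracting against the adjacency estimates — can be written compactly with vectorized NumPy. This is a sketch of the computation only (the paper's actual implementation uses TensorFlow), with our own function name:

```python
import numpy as np

def predict_velocities(theta_obs, omega, A_hat, K, a, b):
    """Predicted phase velocities for all T timesteps at once.  Evaluating the
    truncated Fourier series (3) on the T x N x N tensor of phase differences
    plays the role of the 1x1 convolution described in the text."""
    n = np.arange(1, len(a) + 1)
    diffs = theta_obs[:, None, :] - theta_obs[:, :, None]  # [t, k, j] = theta_j - theta_k
    gamma = np.cos(diffs[..., None] * n) @ a + np.sin(diffs[..., None] * n) @ b
    N = theta_obs.shape[1]
    return omega + (K / N) * np.einsum('kj,tkj->tk', A_hat, gamma)
```

These predicted velocities are exactly the v̂ that enter the mean squared error term of the objective.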
We initialize the inferred parameters as follows: the adjacency matrix A has initial entries drawn from N (0.5, 1/N ), the frequencies ω k are initialized from N (0, 1/N ), the coupling strength K is drawn from N (1, 1/N ), and the Fourier coefficients a, b of the coupling function are initialized at 0.
TensorFlow maintains a computational graph for these operations which allows one to automatically compute gradients. One can therefore use gradient descent methods to compute the optimal estimates (â_n, b̂_n, ω̂_k, Â_kj, K̂) for the parameters. We used a mini-batch gradient descent method with a batch size of 100, i.e., we randomly assign the data to batches of 100 time-steps. For each batch we calculate the gradient of the loss function and update the estimated parameters by taking a small step in a direction determined from the gradient. Once this has been repeated for all batches (one epoch), we pass through the data again for a total of 200 or more epochs. We determined the number of epochs experimentally by iterating until the mean squared error no longer decreased. Typically, 300 epochs were sufficient. However, for certain sweeps examining the role of t_max, N_res, and σ, the number of epochs required for convergence was as high as 10000. By using small random batches of time steps, the gradient can be estimated quickly. This has the added benefit of introducing stochasticity into the gradients, which makes the algorithm less susceptible to getting caught in local minima. We use AdamOptimizer, which is a gradient descent method with momentum and an adaptive learning rate [41], with default parameter values for optimization in TensorFlow.

Figure 1: Coupling function as a neural network. This schematic illustrates how the coupling function Γ, given by (3), is evaluated at each phase difference θ_k − θ_j via a convolutional neural network with size 1 × 1 and stride 1. The hidden layer consists of 2m units with activation functions cos(nθ) and sin(nθ), n = 1, 2, . . . , m, with fixed inputs of weight 1. The output layer uses variable weights representing the Fourier coefficients a_n and b_n.

Parameter | Description | Value
n_epochs | iterations through the training data | 300
batch size | time-steps of data per gradient descent batch | 100
m | number of inferred Fourier coefficients | 5
λ_Γ | weight of penalty on non-sparse coupling | 0.0001
λ_A | weight of penalty on network connectivity | 10⁻⁶
λ_bd | weight of penalty on A_kj ∈ [0, 1] | 10⁵

Table 3: Default values and summary of method parameters for the results in section 4.
There is minor variability in the success of the algorithm due to the random initialization of the inferred parameters and to the batching of the observed data. In order to ensure an accurate network reconstruction, we can retrain the model with new initial values several times and choose the result which has the smallest mean squared error for the velocity predictions on the validation set, which was not considered during the training process. For any individual model network, this validation error is correlated with the accuracy of the inferred parameters. In all examples below, we attempt to reconstruct each network five times before choosing the best reconstruction.

Post-processing
The method described in section 2.3 produces continuous estimates for the parameters (â_n, b̂_n, ω̂_k, Â_kj, K̂) which could be used to predict the dynamics of the system under a variety of initial conditions. However, we are interested in evaluating whether these parameter estimates are an accurate reconstruction of the original model. Before these values can be compared to the model parameters used to generate the data, additional post-processing is needed. We therefore redefine our parameter estimates via the transformations K̂Â_kj → Â_kj and c_0 + c_1Γ̂ → Γ̂, where c_0 and c_1 are selected to align the scale and offset of the estimated coupling function with the true coupling function. These transformations are necessary to address the aforementioned identifiability issues with K and to allow Γ to have nonzero mean.
In the original model, the adjacency matrix entries are either zero or one. However, we treat the entries as continuous variables during optimization and then choose a threshold ε so that entries with Â_kj < ε are set to 0 and entries with Â_kj ≥ ε are set to 1. In practice, one could fix ε in advance or select ε so that the reconstructed model minimizes the mean squared error for the data set. However, we explore a range of threshold values using ROC curves and report the reconstruction error rates using the value of ε that yields the largest F_1 score for the adjacency matrix reconstruction (see Appendix B). These optimal values are robust, and good performance is typically obtained over a wide range of thresholds (see section 4 for details).
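The threshold selection step can be sketched as a simple scan over candidate thresholds, keeping the one with the largest F_1 score. The function names and the candidate grid below are our own illustrative choices:

```python
import numpy as np

def f1_score(A_true, A_bin):
    """F1 score for edge prediction, ignoring the diagonal (no self-edges)."""
    mask = ~np.eye(A_true.shape[0], dtype=bool)
    t = A_true[mask].astype(bool)
    p = A_bin[mask].astype(bool)
    tp = np.sum(t & p)
    fp = np.sum(~t & p)
    fn = np.sum(t & ~p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0

def best_threshold(A_true, A_hat, thresholds=np.linspace(0.05, 0.95, 19)):
    """Binarize A_hat at each candidate threshold and keep the one with the
    largest F1 score, mirroring the selection described in the text."""
    scores = [f1_score(A_true, A_hat >= eps) for eps in thresholds]
    i = int(np.argmax(scores))
    return thresholds[i], scores[i]
```

When the continuous estimates cluster tightly around 0 and 1, nearly every threshold in the interior of [0, 1] gives the same binarization, which is the robustness noted above.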

Experimental design
In order to validate the robustness of our approach, we test the reconstruction method on data simulated with a variety of networks and parameter values. We explore this high-dimensional parameter space by fixing all of the model parameters except for one and then sweeping the remaining parameter over a wide range of values. See tables 1 and 2 for a list of default values as well as the ranges considered. For each set of parameter values, we compute 30 networks with random initial phases and intrinsic frequencies.
We then attempt to reconstruct the underlying model for each network and compute various performance metrics for each reconstruction.
Coupling function - We evaluate the inferred coupling function Γ̂(θ) by comparing it to the true coupling function Γ(θ) via the normalized difference in area defined as follows:

∫₀^{2π} |Γ̂(θ) − Γ(θ)| dθ / ∫₀^{2π} |Γ(θ)| dθ.

This represents the area between the true and estimated coupling function curves, weighted by the area under the curve of the true function. This quantity serves as a measure of the error in the reconstruction. A perfect reconstruction would have a normalized difference in area of zero. In the Kuramoto case Γ(θ) = sin(θ), the initial estimate Γ̂(θ) = 0 corresponds to a value of 1. Therefore, values significantly lower than 1 indicate progress towards a correct reconstruction.

Intrinsic frequencies - We compare the inferred intrinsic frequencies ω̂_k to the true intrinsic frequencies ω_k using the mean absolute deviation defined as follows:

(1/N) Σ_{k=1}^N |ω̂_k − ω_k|.

Values near zero indicate accurate reconstructions. We considered alternative metrics such as relative deviations and the correlation between true and inferred frequencies, but these were less informative because they tend to amplify errors when the intrinsic frequencies are close to zero.

Adjacency matrix - We investigate several methods for evaluating the success of adjacency matrix reconstruction. The accuracy can be inspected visually by plotting the true and reconstructed matrices along with the absolute differences in a grid where values range from 0 (black) to 1 (white); see figure 2(a-c). Since we are primarily interested in discrete adjacency values, we interpret reconstruction as a classification problem and compute three standard evaluation metrics: the F_1 score, the classification error rate for the connections, and the area under the ROC curve. See Appendix B for precise definitions of these metrics.
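The first two metrics are straightforward to compute numerically; the sketch below uses a Riemann sum on a uniform grid over [0, 2π), in which the grid spacing cancels in the ratio. Function names are our own:

```python
import numpy as np

def normalized_area_difference(gamma_hat, gamma_true, n_grid=2000):
    """Area between the estimated and true coupling-function curves over
    [0, 2*pi), weighted by the area under |Gamma_true| (Riemann-sum version)."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_grid, endpoint=False)
    num = np.sum(np.abs(gamma_hat(theta) - gamma_true(theta)))
    den = np.sum(np.abs(gamma_true(theta)))
    return num / den      # uniform grid spacing cancels in the ratio

def mean_absolute_deviation(omega_hat, omega_true):
    """Mean absolute deviation between inferred and true intrinsic frequencies."""
    return np.mean(np.abs(np.asarray(omega_hat) - np.asarray(omega_true)))
```

As a sanity check, comparing the zero function against Γ(θ) = sin(θ) gives a normalized difference in area of exactly 1, matching the baseline value quoted above for the Kuramoto case.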

Results
Here we discuss the key results from the parameter sweeps outlined in section 3.
Our method can successfully reconstruct models with a variety of coupling functions. Figure 3a and table C1 show the results for four different coupling functions (see Appendix A for details). Three of these come from popular coupled oscillator models: the Kuramoto model, the Kuramoto-Sakaguchi model, and a phase reduction of the Hodgkin-Huxley model; the fourth is a square wave, which represents a generic discontinuous function with high-frequency Fourier harmonics. We note that for the square wave coupling function, the accuracy of the reconstruction suffers, since the reconstruction includes only five Fourier harmonics while the true coupling function contains an infinite number of harmonics with amplitudes that decay slowly due to the discontinuities. Despite these limitations, the adjacency matrix was still estimated with a high degree of accuracy. For the reconstruction example shown in figure 2, we obtain an adjacency matrix classification error rate of 0 for thresholds between 0.1 and 0.9, a normalized difference in area of 0.032 for the coupling function, and a mean absolute deviation of 0.008 for the estimated natural frequencies.
Counterintuitively, as the number of oscillators increases (figure 3b and table C2), we find that reconstruction accuracy improves. This can be explained by the observation that although increasing the number of oscillators increases the number of unknown parameters, it also provides a greater number of pairwise phase differences which can aid in reconstruction of the coupling function. The resulting modest improvements in the estimate of the coupling function can then allow for better inference of the adjacency matrix and intrinsic frequencies as well. This is illustrated by the normalized difference in area plot in figure 3b.
In figure 4 and table C3, we consider simulations where the standard deviation σ of the oscillator frequencies ranges from 0.01 to 1. The normalized difference in area of the coupling function and the mean absolute deviation of the inferred frequencies both show that we can infer the network well for σ ≤ 1. For larger standard deviations, the quality of the reconstruction suffers slightly, since the large frequencies dominate the coupling terms within the phase velocity relationship.

We also explore the impact of noise on the model reconstruction since both observation noise and dynamic noise would be present in experiments. In tables C4 and C5, we demonstrate that our model reconstruction method is robust to moderate amounts of both types of noise. As in [17], we introduce dynamic noise using stochastic differential equations with Gaussian noise. Observation noise was also Gaussian and was added after integrating the governing differential equations. For both types of noise, we used mean 0 and standard deviations between 0 and 1. It is worth noting that dynamic noise with a standard deviation less than 10⁻³ helps with the reconstruction of both the coupling function and the frequencies. This can be explained by the fact that a small amount of noise keeps the system from reaching equilibrium, effectively increasing the duration of the transients that provide useful information about the structure of the network.

Figure 5: (a) Normalized difference in area of the coupling function. When the dynamic noise standard deviation is less than 10⁻³, the normalized difference in area is smaller than for a model without noise. As the dynamic noise standard deviation increases to 10⁻¹, we observe an increase in the normalized difference in area. A significant increase is observed as the noise standard deviation increases to 1. (b) Mean absolute deviation of the frequencies. The mean absolute deviation remains small as the dynamic noise standard deviation increases to 10⁻¹.

We carry out additional parameter sweeps with varying network connectivity p (table C6), maximum simulation time t_max (table C7), and the number of simulation restarts (table C8), with all other parameters set using the default values given in tables 1, 2, and 3. In each case, we are consistently able to reconstruct the coupling function, the underlying network, and the intrinsic frequencies over a wide range of model parameters. The tables in Appendix C illustrate averaged numerical results for 30 trials of each of these parameter sweeps in terms of evaluation metrics such as normalized difference in area, mean absolute deviation, error rate, area under ROC curve, and a range of thresholds that yield F_1 scores within 90% of the largest value.

Results for perturbations of synchronous dynamics
As both [17] and [28] point out, when a network remains synchronized, i.e. when dθ_1/dt = dθ_2/dt = . . . = dθ_N/dt, one cannot infer the model parameters due to the fact that the observed phases no longer provide linearly independent equations. Furthermore, even with precise knowledge of the adjacency matrix, one could not hope to reconstruct the coupling function Γ(θ_k − θ_j) without data over a wide range of phase differences θ_k − θ_j.
As such we investigated a method for using small perturbations to introduce brief transients into the dynamics to allow for successful model reconstruction. To test this, we used nearly identical oscillators with frequency standard deviation σ = 0.0001, which causes the system to quickly converge to a synchronized state for almost all initial conditions. We then initialized θ(0) = 0 so that the system begins from perfect synchrony. Then, at times kt max /N pert for k = 1, 2, . . . N pert we added a phase perturbation to a subset of the oscillators. Each perturbation causes some of the oscillators to briefly become desynchronized. In this way, the total number of observed phases remains constant, but the fraction of those observations that occur during transient dynamics is proportional to the number of perturbations, N pert . The observed phases during these transients provide meaningful data about the structure of the network. Figure 6 demonstrates that, as expected, with too few or too small perturbations, model reconstruction is unsuccessful. However, as the number of perturbations N pert increases, the accuracy of the estimated coupling function, adjacency matrix, and the oscillator frequencies improves.
We explored two methods for selecting which oscillators to perturb: fixed subsets, in which the subset of perturbed oscillators was selected at the beginning of the experiment and these same oscillators were perturbed repeatedly, and random subsets, in which a random subset of the system was selected for each perturbation. Although a perturbation to the phase of a particular oscillator does propagate to the phases of neighboring oscillators through the coupling terms, those perturbations decay quickly and are therefore most effective for revealing the local structure of the network. As such, although perturbations to a fixed subset might be more practical in a physical experiment, one must typically perturb a larger fraction of the system to obtain comparable performance to that which is obtained with perturbations to random subsets.
We also considered two types of phase perturbations: phase resets, in which selected oscillators had their phases reset to 0, and phase shifts, in which selected oscillators had their phases modified by adding a random shift η pert ∼ N (0, σ 2 pert ). Phase resets may be more feasible from an experimental perspective, but have the drawback of preserving the mutual synchrony of the subset of oscillators that are perturbed. As such one will typically need to use random subsets in tandem with phase resets in order to be able to resolve the connections between the perturbed oscillators.
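The perturbation protocol (both variants) can be sketched as a wrapper around any single-step integrator. This is a hypothetical helper of our own design mirroring the schedule described above, with perturbations applied at times k·t_max/N_pert:

```python
import numpy as np

def perturbation_schedule(theta0, step, t_max, dt, n_pert, subset_size,
                          mode="shift", sigma_pert=10.0, rng=None):
    """Integrate with a user-supplied single-step routine step(theta, dt) and
    perturb a random subset of oscillators at times k * t_max / n_pert.
    Hypothetical helper mirroring the protocol described in the text."""
    if rng is None:
        rng = np.random.default_rng()
    N = len(theta0)
    n_steps = int(round(t_max / dt))
    pert_steps = {int(round(k * n_steps / n_pert)) for k in range(1, n_pert + 1)}
    theta = theta0.copy()
    history = [theta0.copy()]
    for s in range(1, n_steps + 1):
        theta = step(theta, dt)
        if s in pert_steps:
            idx = rng.choice(N, size=subset_size, replace=False)
            if mode == "reset":
                theta[idx] = 0.0                                        # phase reset
            else:
                theta[idx] += rng.normal(0.0, sigma_pert, subset_size)  # phase shift
        history.append(theta.copy())
    return np.array(history)
```

A fixed-subset variant would draw `idx` once before the loop; as discussed above, that choice typically requires perturbing a larger fraction of the network for comparable reconstruction quality.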
In figure 6(a-c), we used phase shifts with σ_pert = 0.01 applied to a fixed subset of size 1, i.e. a single oscillator out of 10. This gives poor reconstruction regardless of the number of perturbations due to the small size of the perturbation. On the other hand, in figure 6(d-f) we perturb a fixed subset (3 out of 10 oscillators) with random phase shifts with σ_pert = 10. In this case, the perturbations affect a sufficiently large proportion of the oscillators and performance begins to improve once there are 5 or more perturbations. The results in figure 6(g-i) illustrate that one only needs to perturb 1 out of 10 oscillators to obtain similar performance when the oscillators selected are chosen randomly. In figure 6(j-l), we show the results using phase resets to random subsets of 3 oscillators. Again, performance begins to improve dramatically once 5 or more perturbations are used. The tables summarizing the results of these perturbation strategies are provided in Appendix C.

Figure 6: Reconstruction of a model with synchronous dynamics. We simulate the dynamics for t_max = 200 units of time and introduce perturbations at times t = k t_max/N_pert for k = 1, 2, . . . , N_pert. Panels (a-i) use random normally distributed phase perturbations η_pert ∼ N(0, σ_pert²) with (a-c) σ_pert = 0.01 and (d-i) σ_pert = 10, and panels (j-l) use phase resetting. In panels (a-c) we perturb a fixed subset of 1 oscillator (out of 10) repeatedly. In panels (d-f), we perturb a fixed subset of three oscillators repeatedly. In panels (g-i), we perturb a single oscillator selected randomly for each perturbation. In panels (j-l), we reset a subset of three oscillators selected randomly with each perturbation. The first column displays the normalized difference in area for the coupling function, the second displays the error rate for the adjacency matrix, and the third displays the mean absolute deviation for the intrinsic frequencies. In panels (a-c), reconstruction of the coupling function and adjacency matrix fails regardless of the number of perturbations due to the small perturbation size. In the remaining panels, reconstruction is successful once there are a sufficient number of perturbations. Here the oscillators are nearly identical: σ = 0.0001 and with initial condition θ_k = 0 for all k.

Figure 7: Comparison of our method (left column) with the method described in reference [28] (right column), for different numbers of restarts. Panels (a) and (b) display the normalized difference in area of the coupling function. For the coupling function reconstruction, our method performs well with a small number of restarts while Pikovsky's method requires a relatively larger number of restarts. Panels (c) and (d) display the error rate of the adjacency matrix. Again, both methods perform well with 10 or more restarts. However, our method provides more accurate reconstructions when the number of restarts is five or less.

Comparison with reference [28]
The amount of transient data necessary for reconstruction is also useful in comparing our method to previous approaches. As discussed in section 2.2, Pikovsky proposes an alternative method in which a distinct coupling function is considered for the interaction of each pair of oscillators [28]. This ensures that the system of equations in the optimization is linear; however, it also significantly increases the number of unknown coefficients. Our approach instead relies on the assumption that the same coupling function Γ is used for all pairs of oscillators, which is a reasonable approximation for physical systems when the physical mechanisms governing the interactions between oscillators are the same. Figure 7 shows that when the coupling functions are identical, our method provides more accurate reconstructions of both the network topology and the coupling function with smaller amounts of transient data. As the amount of transient data increases, both methods ultimately achieve perfect network reconstruction, while the approach in [28] ultimately obtains slightly better estimates for the coupling function. Our approach would therefore be preferred when the amount of data is limited and the coupling functions are nearly identical.
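The trade-off in unknown coefficients can be made concrete with a rough count. Assuming a Fourier truncation with M harmonics per coupling function (the exact bookkeeping of the two formulations may differ slightly from this sketch):

```python
def n_unknowns_shared(n, m):
    """Shared coupling function: N intrinsic frequencies, N^2 adjacency
    entries, and 2M+1 Fourier coefficients for the single function Gamma."""
    return n + n * n + (2 * m + 1)

def n_unknowns_pairwise(n, m):
    """Per-pair coupling functions (as in [28]): N frequencies plus 2M+1
    Fourier coefficients for each of the N^2 ordered pairs; the adjacency
    information is absorbed into the per-pair functions."""
    return n + n * n * (2 * m + 1)

# For N = 10 oscillators and M = 5 harmonics:
print(n_unknowns_shared(10, 5))    # 10 + 100 + 11 = 121
print(n_unknowns_pairwise(10, 5))  # 10 + 100 * 11 = 1110
```

The order-of-magnitude gap in unknowns is what lets the shared-Γ formulation get by with less transient data, at the price of a nonlinear optimization.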

Discussion and conclusions
In this paper, we have designed a method for reconstructing models of coupled oscillator networks, including both the network connections and the intrinsic oscillator properties. We hope that future analysis will investigate the (non)convexity of our penalty function and the properties of its minimizers. These issues aside, after testing our method with many different parameters, we conclude that our algorithm can successfully infer the model. A common challenge that our method and many related ones encounter is that the procedure fails if the system synchronizes too quickly. One remedy is to have a sufficient number of observations during the desynchronized transients. As we demonstrate, one can ensure that this data is available by repeating experiments with multiple initial conditions, or by adjusting the model parameters to inhibit synchronization, for instance by introducing large frequency variability (σ) or dynamic noise. An alternative remedy is to move the system away from synchrony using perturbations, preferably ones that are physically realizable. The introduction of these perturbations provides useful transient data. By perturbing a sufficiently large subset of oscillators with a large enough change, we are able to infer the model accurately. Our hope is that this method will be adopted by experimentalists and used with experimental data to aid with the construction of interpretable models for the dynamics of networks of coupled oscillators.
Although the numerical experiments outlined here involve Erdős–Rényi networks, the method can be applied more generally to a broad class of networks. Preliminary testing suggests that similar results can be obtained for other topologies such as star, small-world, scale-free, and clique networks.
It is also straightforward to extend this approach to other models for coupled phase oscillators such as the Winfree model [42] or even more general oscillator models such as the Stuart-Landau model [43] in which both phase and amplitude variations are permitted. Indeed, any system where the unknown functions are periodic can be represented using our technique.
For models containing unknown functions that are not periodic such as the Hodgkin-Huxley model [44], a Fourier series representation is not possible. In these cases, one could represent the coupling functions using feed-forward neural networks, which are capable of representing continuous functions to arbitrary accuracy using a finite number of parameters [45]. Given a sufficiently rich data set, one could still use our approach with back-propagation to learn the structure of the neural network approximation for the coupling function.
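A minimal sketch of such a replacement, assuming a one-hidden-layer network in place of the truncated Fourier series (the class name, width, and initialization are illustrative choices, not the paper's):

```python
import numpy as np

class MLPCoupling:
    """Tiny feed-forward network standing in for an unknown (possibly
    non-periodic) scalar coupling function Gamma(x). In a full pipeline
    these weights would be trained by back-propagation alongside the
    adjacency matrix and intrinsic frequencies."""

    def __init__(self, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 1.0, (hidden, 1))
        self.b1 = np.zeros(hidden)
        self.W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), (1, hidden))
        self.b2 = np.zeros(1)

    def __call__(self, x):
        """Evaluate Gamma at a batch of scalar inputs."""
        x = np.atleast_1d(np.asarray(x, dtype=float))
        h = np.tanh(self.W1 @ x[None, :] + self.b1[:, None])
        return (self.W2 @ h + self.b2[:, None]).ravel()
```

By the universal approximation property [45], a sufficiently wide hidden layer can approximate any continuous coupling function on a compact interval.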
The coupling functions investigated are described in table A1. The first three classes of functions were selected due to their use in popular coupled oscillator models. The last was an example of a generic coupling function with higher order harmonics that was tested to verify that reconstruction of the adjacency matrix is possible even with an imperfect approximation to the coupling function.

[Table A1: names and functional forms of the coupling functions investigated; the entries include a square wave.]

The F1 score is defined as

F1 = 2 · (precision · recall) / (precision + recall).

Here, precision is the fraction of true positives among all inferred positives, while recall is the fraction of true positives among all positives. The F1 score is a value between 0 and 1; when reporting the error rate for the reconstructed adjacency matrix, we use the threshold that corresponds to the largest F1 score. We also determine the interval of thresholds that yield F1 scores within 90% of this largest value. The width of this interval is reported in the tables in Appendix C as "Interval width (> 90%)". The error rate is defined as the percentage of entries in A that are incorrectly classified. We report the error rate for the optimal threshold discussed in section 2.4.
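This threshold-selection procedure can be sketched as follows; the function name and interface are illustrative, not the paper's code.

```python
import numpy as np

def best_threshold(A_true, A_est, thresholds):
    """Scan classification thresholds applied to the inferred edge weights.
    Return the threshold maximizing the F1 score, together with the width
    of the interval of thresholds whose F1 is within 90% of that maximum."""
    y_true = (np.asarray(A_true).ravel() != 0)
    w = np.asarray(A_est).ravel()
    f1s = []
    for t in thresholds:
        y_pred = w > t
        tp = np.sum(y_pred & y_true)
        precision = tp / max(y_pred.sum(), 1)
        recall = tp / max(y_true.sum(), 1)
        # F1 = 2 * precision * recall / (precision + recall)
        f1 = 0.0 if precision + recall == 0 else \
            2 * precision * recall / (precision + recall)
        f1s.append(f1)
    f1s = np.array(f1s)
    good = thresholds[f1s >= 0.9 * f1s.max()]
    return thresholds[f1s.argmax()], good.max() - good.min()
```

For a well-separated reconstruction, a wide range of thresholds achieves the maximal F1, and the reported interval width is correspondingly large.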
The ROC curve, or receiver operating characteristic, is a parametric curve that uses the classification threshold as a parameter and the true positive rate and the false positive rate as the variables [46]. The area under the ROC curve measures the quality of a classifier independent of any particular threshold. A value of 1/2 is consistent with blind guesses, and a value of 1 indicates a perfect classifier since it implies the existence of a threshold for which the rate of true positives is 1 and the rate of false positives is 0. Our ROC curves were generated using the scikit-learn package in Python [47].
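The area under the ROC curve admits a simple rank-based computation, equivalent to sweeping the threshold and integrating TPR against FPR; the sketch below is a self-contained stand-in for what `sklearn.metrics.roc_auc_score` computes.

```python
import numpy as np

def roc_auc(y_true, scores):
    """Area under the ROC curve via the rank statistic: the probability
    that a randomly chosen true edge receives a higher score than a
    randomly chosen non-edge (ties count half)."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one edge and one non-edge")
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

A perfectly separating set of inferred weights yields an AUC of 1, while uninformative weights yield an AUC near 1/2.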

Appendix C. Tables of results
Here we include several tables which report the averaged numerical results of the experimental parameter sweeps described in section 3 and reported on in section 4. The performance metrics used are defined in Appendix B. Values given in the tables represent the mean plus or minus one standard deviation of the corresponding performance metric over the 30 trials run at that parameter value. We note that some of the distributions, such as the error rate (which is nonnegative by definition), are skewed.

Table C13: Sweep through the number of transients observed for a simulation that carries out the optimization for a linear system as proposed in [28].