mossspider.estimators.tmle.NetworkTMLE

class NetworkTMLE(network, exposure, outcome, degree_restrict=None, alpha=0.05, continuous_bound=0.0005, verbose=False)

Implementation of the Targeted Maximum Likelihood Estimator (TMLE) for network dependent data. The following procedure estimates the expected incidence under a treatment plan of interest. For stochastic treatment plans, the expected incidence is obtained through Monte Carlo integration of a subsample of possible treatment allotments that correspond to the plan of interest.
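The Monte Carlo integration mentioned above can be illustrated with a minimal, library-independent sketch. It is not mossspider's implementation: `monte_carlo_mean` and `toy_predict` are hypothetical names, and `toy_predict` stands in for a fitted outcome model.

```python
import random


def monte_carlo_mean(predict, n_units, p, samples=100, seed=None):
    """Average a prediction function over random treatment allotments.

    `predict` is a hypothetical stand-in for a fitted outcome model: it takes
    a list of 0/1 treatments (one per unit) and returns one predicted outcome
    per unit.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(samples):
        # Draw one allotment: each unit is treated with probability p.
        allotment = [1 if rng.random() < p else 0 for _ in range(n_units)]
        preds = predict(allotment)
        means.append(sum(preds) / n_units)
    # Monte Carlo estimate: mean over the sampled allotments.
    return sum(means) / samples


def toy_predict(allotment):
    # Toy model (assumption): risk is 0.2 if treated and 0.6 if untreated.
    return [0.2 if a == 1 else 0.6 for a in allotment]


estimate = monte_carlo_mean(toy_predict, n_units=500, p=0.8, samples=200, seed=1)
print(round(estimate, 2))
```

With this toy model the estimate should be close to 0.8 × 0.2 + 0.2 × 0.6 = 0.28. For a deterministic plan (p equal to 0 or 1) every allotment is identical, so a single sample is enough.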

Note

Network-TMLE makes the weak dependence assumption, such that only direct contacts’ treatment can interfere with individual i’s outcome.

Parameters
  • network (NetworkX Graph) – NetworkX undirected network without self-loops. All variables should be stored as attributes of each node. mossspider extracts the node data from the graph and creates a pandas.DataFrame object from that information. No node may have missing data, since there is currently no procedure to handle missing data.

  • exposure (str) – String indicating the exposure variable of interest.

  • outcome (str) – String indicating the outcome variable of interest.

  • degree_restrict (None, list, tuple, optional) – Restriction on the minimum and maximum degree for nodes to be included in the estimand. Must be a list or tuple of length two, where the first value is the lower bound and the second is the upper bound for degree. Both bounds are inclusive. Nodes with a degree below the first value or above the second value are treated as “background” features, so the intervention does not change their exposure.

  • alpha (float, optional) – Alpha for the confidence interval level. Default is 0.05.

  • continuous_bound (float, optional) – For continuous outcomes, TMLE needs to bound Y between 0 and 1. However, the values 0 and 1 themselves cannot be included in the bounded values. This parameter sets how far the bounded outcome is kept from 0 and 1. The default is 0.0005.

  • verbose (bool, optional) – Whether to print all intermediary model results for the estimation process. When set to True, each of the model results are printed to the console. The default is False.
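As an illustration of what continuous_bound is for, the sketch below rescales a continuous outcome to the unit interval and keeps it away from exactly 0 and 1. The min-max scaling step is the usual TMLE approach and is an assumption here, not a statement of mossspider's internals. The function names are hypothetical.

```python
def bound_continuous(y, bound=0.0005):
    """Rescale y to the unit interval, then keep values strictly inside (0, 1)."""
    lo, hi = min(y), max(y)
    scaled = [(v - lo) / (hi - lo) for v in y]
    # Clip so that exact 0 and 1 never occur.
    return [min(max(s, bound), 1 - bound) for s in scaled]


def unbound_continuous(y_star, lo, hi):
    """Map values on the unit interval back to the original scale."""
    return [s * (hi - lo) + lo for s in y_star]


y = [12.0, 15.5, 20.0, 31.0]
y_star = bound_continuous(y)
print(y_star[0], y_star[-1])
```

The smallest and largest observations end up at `bound` and `1 - bound` rather than at 0 and 1.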

Note

mossspider calculates exposure mapping variables automatically from the input network. These variables are saved as variable-name_map. For a variable ‘A’, the newly created exposure mapping variable is ‘A_map’.
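To make the idea of an exposure mapping variable concrete, here is a small sketch that computes a summary of each node's direct contacts from a plain adjacency dictionary. It assumes the map is the sum (or mean) of the contacts' values. `summary_measure` is a hypothetical helper, not a mossspider function, and returning 0.0 as the mean for a node without contacts is also an assumption.

```python
def summary_measure(neighbors, values, measure="sum"):
    """Summarize each node's direct contacts' values as a sum or a mean."""
    mapped = {}
    for node, contacts in neighbors.items():
        total = sum(values[c] for c in contacts)
        if measure == "sum":
            mapped[node] = total
        elif measure == "mean":
            # Assumption: a node with no contacts gets a mean of 0.0.
            mapped[node] = total / len(contacts) if contacts else 0.0
        else:
            raise ValueError("measure must be 'sum' or 'mean'")
    return mapped


# Undirected toy network stored as node -> list of direct contacts.
neighbors = {1: [2, 3], 2: [1], 3: [1, 4], 4: [3]}
A = {1: 1, 2: 0, 3: 1, 4: 1}

A_map = summary_measure(neighbors, A)
print(A_map)
```

Node 3, for example, has contacts 1 and 4, both treated, so its summed exposure map is 2.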

Note

For directed networks, the direction of influence goes from the target node to the source (i.e. opposite to the arrow direction). If A –> B, then B’s covariates will be part of A’s summary measures.
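The direction convention can be checked with a small sketch. It uses a plain edge list and a summed covariate. `directed_summary` is a hypothetical helper written only to illustrate the rule stated above.

```python
def directed_summary(edges, values):
    """Sum covariates along directed edges, following the convention above."""
    summary = {node: 0 for node in values}
    for source, target in edges:
        # Influence runs target -> source: the target's covariate
        # contributes to the source's summary measure.
        summary[source] += values[target]
    return summary


edges = [("A", "B"), ("C", "A")]  # A -> B and C -> A
W = {"A": 1, "B": 5, "C": 10}

W_map = directed_summary(edges, W)
print(W_map)
```

Because of the edge A –> B, the covariate of B (5) enters the summary measure of A, while B itself receives nothing from A.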

Examples

Setting up environment

>>> from mossspider import NetworkTMLE
>>> from mossspider.dgm import uniform_network, generate_observed

Generating a generic network and some data

>>> graph = generate_observed(uniform_network(n=500, degree=[1, 6]))

Estimation with NetworkTMLE (nonparametric summary measure in exposure map model)

>>> tmle = NetworkTMLE(network=graph, exposure='A', outcome='Y')
>>> tmle.exposure_model('W + W_map')
>>> tmle.exposure_map_model('A + W + W_map', distribution=None)
>>> tmle.outcome_model('A + W + A_map + W_map', print_results=False)
>>> tmle.fit(p=0.8, bound=10e5)
>>> tmle.summary()

Estimation with NetworkTMLE (parametric summary measure in exposure map model)

>>> tmle = NetworkTMLE(network=graph, exposure='A', outcome='Y')
>>> tmle.exposure_model('W + W_map')
>>> tmle.exposure_map_model('A + W + W_map', measure='sum', distribution='poisson')
>>> tmle.outcome_model('A + W + A_map + W_map', print_results=False)
>>> tmle.fit(p=0.8, bound=10e5)
>>> tmle.summary()

Estimation with NetworkTMLE and restricting inference by degree

>>> tmle = NetworkTMLE(network=graph, exposure='A', outcome='Y', degree_restrict=[0, 5])
>>> tmle.exposure_model('W + W_map')
>>> tmle.exposure_map_model('A + W + W_map', measure='sum', distribution='poisson')
>>> tmle.outcome_model('A + W + A_map + W_map', print_results=False)
>>> tmle.fit(p=0.8, bound=10e5)
>>> tmle.summary()
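The effect of degree_restrict on which nodes enter the estimand can be sketched without the library. The helper below is hypothetical and only mirrors the inclusive lower and upper bounds described for the parameter.

```python
def split_by_degree(neighbors, degree_restrict):
    """Split nodes into those inside and outside the inclusive degree range."""
    lower, upper = degree_restrict
    included, background = [], []
    for node, contacts in neighbors.items():
        degree = len(contacts)
        if lower <= degree <= upper:
            included.append(node)
        else:
            # Outside the range: treated as a background feature.
            background.append(node)
    return included, background


# Toy star-like network: node 1 has degree 6, all others have degree 1 or 2.
neighbors = {
    1: [2, 3, 4, 5, 6, 7],
    2: [1], 3: [1, 4], 4: [1, 3], 5: [1], 6: [1], 7: [1],
}
included, background = split_by_degree(neighbors, degree_restrict=[0, 5])
print(included, background)
```

With degree_restrict=[0, 5], only the hub (degree 6) is treated as background.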

Diagnostic plot for support of policy of interest in observed data

>>> import matplotlib.pyplot as plt
>>> tmle.diagnostics()
>>> plt.show()

Generating a threshold measure based on a summary measure

>>> tmle = NetworkTMLE(network=graph, exposure='A', outcome='Y')
>>> tmle.define_threshold(variable='A_sum', threshold=3)  # A_sum_t3

Generating a category measure based on a binned summary measure

>>> tmle = NetworkTMLE(network=graph, exposure='A', outcome='Y')
>>> tmle.define_category(variable='A_sum', bins=[0, 1, 2, 4, 6])  # A_sum_c

References

van der Laan MJ. (2014). Causal inference for a population of causally connected units. Journal of Causal Inference, 2(1), 13-74.

Sofrygin O & van der Laan MJ. (2017). Semi-parametric estimation and inference for the mean outcome of the single time-point intervention in a causally connected population. Journal of Causal Inference, 5(1).

Ogburn EL, Sofrygin O, Diaz I, & van der Laan MJ. (2017). Causal inference for social network data. arXiv preprint arXiv:1705.08527.

Sofrygin O, Ogburn EL, & van der Laan MJ. (2018). Single Time Point Interventions in Network-Dependent Data. In Targeted Learning in Data Science (pp. 373-396). Springer.

__init__(network, exposure, outcome, degree_restrict=None, alpha=0.05, continuous_bound=0.0005, verbose=False)

Methods

__init__(network, exposure, outcome[, ...])

define_category(variable, bins[, labels])

Defines a categorical measure by binning a summary measure. Any number of different categorical measures can be defined.

define_threshold(variable, threshold)

Defines a threshold measure from a summary measure. Any number of different thresholds can be defined.

diagnostics([figsize, color_a1, color_a0])

Returns diagnostic plot for the specified network-TMLE.

exposure_map_model(model[, measure, ...])

Exposure summary measure model for individual i.

exposure_model(model[, custom_model, ...])

Exposure model for individual i.

fit(p[, samples, bound, seed])

Estimation procedure under a specified treatment plan.

outcome_model(model[, custom_model, ...])

Estimation of the outcome model E(Y|A, A_map, W, W_map).

summary([decimal])

Prints summary results for the sample average treatment effect under the treatment plan specified in the fit procedure.

exposure_model(model, custom_model=None, custom_model_sim=None)

Exposure model for individual i. Estimates Pr(A=a|W, W_map) using a logistic regression model.

Note

This function only saves the model specification. The inverse probability of treatment weights (IPTW) are calculated later, during the fit() procedure, because they depend on the policy.

Parameters
  • model (str) – Exposure model specification, given as the independent variables of the model (e.g. 'W + W_map').

  • custom_model – User-specified model

  • custom_model_sim – User-specified model. This allows the user to specify a different IPW model to be fit for the numerator. That model is fit to the simulated data, so some constraints may be added to speed up the estimation procedure. If None and custom_model is not None, copies over the custom_model used.

exposure_map_model(model, measure=None, distribution=None, custom_model=None, custom_model_sim=None)

Exposure summary measure model for individual i. Estimates Pr(A_map=a|A=a, W, W_map) using a logistic regression model.

Note

This function only saves the model specification. IPTW are calculated later, during the fit() procedure.

There are several options for the distribution of the summary measure. One option is a non-parametric approach that estimates the probability for each individual contact (works best for uniform distributions). However, this approach may not always be possible to estimate. In that case, a parametric distributional assumption can be used instead. Normal and Poisson distributions are currently implemented.

Parameters
  • model (str) – Exposure mapping model. Ideally, it includes the treatment for individual i.

  • measure (None, str, optional) – Exposure mapping to use for the modeling statement. Options include ‘mean’ and ‘sum’. Default is None, which works with the distribution=None option.

  • distribution (None, str, optional) – Distribution to use for exposure mapping model. Options include: non-parametric (None), Normal (‘normal’), Poisson (‘poisson’).

  • custom_model (None, optional) – User-specified model

  • custom_model_sim – User-specified model. This allows the user to specify a different IPW model to be fit for the numerator. That model is fit to the simulated data, so some constraints may be added to speed up the estimation procedure. If None and custom_model is not None, copies over the custom_model used.
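As a sketch of what the Poisson option implies, the block below evaluates the Poisson probability of an observed summed exposure given a predicted mean. The predicted means are made-up numbers standing in for fitted values from a Poisson regression. Nothing here is taken from mossspider's code.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing count k under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


# Hypothetical predicted means of A_map for three individuals
# (e.g. fitted values from a Poisson regression on A, W and W_map).
predicted_mean = [0.5, 2.0, 3.5]
observed_a_map = [0, 2, 5]

probabilities = [
    poisson_pmf(k, lam) for k, lam in zip(observed_a_map, predicted_mean)
]
print([round(pr, 4) for pr in probabilities])
```

These probabilities play the role of Pr(A_map=a|A=a, W, W_map) for each individual. The Normal option would use a density in place of the probability mass function.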

outcome_model(model, custom_model=None, distribution='normal')

Estimation of the outcome model E(Y|A, A_map, W, W_map).

Note

Estimates the outcome model (g-formula) using the observed data and generates predictions under the observed distribution of the exposure.

Parameters
  • model (str) – Specified Q-model

  • custom_model – User-specified model

  • distribution (str, optional) – For non-binary outcome variables, the distribution of Y must be specified. Default is ‘normal’.

fit(p, samples=100, bound=None, seed=None)

Estimation procedure under a specified treatment plan.

This function estimates the IPTW for the treatment plan of interest, performs the targeting step, performs Monte Carlo integration with the targeted model, and calculates confidence intervals. Confidence intervals are obtained from influence curves.
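The link between alpha and the reported confidence interval can be sketched as a Wald-type interval. This assumes the standard error has already been derived from the influence curve, and that the interval is symmetric on the estimate's scale. Both are assumptions, and `wald_interval` is a hypothetical helper.

```python
from statistics import NormalDist


def wald_interval(estimate, std_error, alpha=0.05):
    """Two-sided (1 - alpha) Wald confidence interval."""
    # Critical value of the standard normal, e.g. about 1.96 for alpha=0.05.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return estimate - z * std_error, estimate + z * std_error


# Made-up estimate and standard error, for illustration only.
lower, upper = wald_interval(estimate=0.28, std_error=0.03, alpha=0.05)
print(round(lower, 3), round(upper, 3))
```

A smaller alpha gives a larger critical value and therefore a wider interval.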

Parameters
  • p (float, int, list, set) – Proportion of the population to treat. For conditional treatment plans, a container object of floats. All values must be between 0 and 1.

  • samples (int) – Number of samples to generate to calculate the numerator for the weights and for the Monte Carlo integration procedure for stochastic treatment plans. For deterministic treatment plans (p==1 or p==0), samples is set to 1 to reduce the computational burden, since deterministic treatment plans do not require the Monte Carlo integration procedure.

  • bound (None, int, float) – Bounds to truncate the calculated weights by…

  • seed (int, None) – Random seed for the Monte Carlo integration procedure
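The description of bound is truncated in this documentation, so the exact truncation rule is not stated here. Purely as an illustration of weight truncation in general, the sketch below applies one common convention, symmetric truncation to the range [1/bound, bound]. This convention is an assumption, and `truncate_weights` is a hypothetical helper.

```python
def truncate_weights(weights, bound=None):
    """Truncate weights to [1/bound, bound] (assumed symmetric convention)."""
    if bound is None:
        # No truncation requested.
        return list(weights)
    upper = max(bound, 1 / bound)
    lower = min(bound, 1 / bound)
    return [min(max(w, lower), upper) for w in weights]


weights = [0.01, 0.8, 1.5, 40.0]
truncated = truncate_weights(weights, bound=10)
print(truncated)
```

Truncation limits the influence of extreme weights at the cost of some bias. A very large bound, such as the 10e5 used in the examples, leaves most weights unchanged.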

summary(decimal=3)

Prints summary results for the sample average treatment effect under the treatment plan specified in the fit procedure.

Parameters
  • decimal (int) – Number of decimal places to display.

Return type

None

diagnostics(figsize=(6, 5), color_a1='blue', color_a0='red')

Returns diagnostic plot for the specified network-TMLE. The currently available diagnostic presents plots of the designated summary measure for \(A^s\) (stratified by \(A\)) for the observed data, and the Monte Carlo simulated data. This diagnostic can be used to visually assess whether the designated policy is poorly-supported by the data.

Note

A policy that has little overlap with the observed data is poorly supported by the observed data. Poorly supported policies may not be well estimated, so considering other stochastic policies is recommended.

Parameters
  • figsize (list, set, array, optional) – Determine the figure size (dimensions). Passes directly to plt.subplots(...figsize=figsize).

  • color_a1 (str, optional) – Color for the A=1 group in the figure. Default is blue.

  • color_a0 (str, optional) – Color for the A=0 group in the figure. Default is red.

Returns

Diagnostic plot for data support of policy.

define_threshold(variable, threshold)

Defines a threshold measure from a summary measure. Any number of different thresholds can be defined.

Parameters
  • variable (str) – Variable to generate the threshold measure for.

  • threshold (int, float) – Threshold to use as the cutpoint.
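A threshold measure can be pictured as a 0/1 indicator derived from a summary measure. The sketch below works on plain dictionaries and follows the `A_sum_t3` naming shown in the examples. The strict comparison (greater than the threshold) is an assumption, since the text does not say whether the cutpoint itself is included. `define_threshold_sketch` is a hypothetical stand-in, not the library method.

```python
def define_threshold_sketch(rows, variable, threshold):
    """Add a 0/1 indicator for `variable` exceeding `threshold` to each row."""
    new_name = f"{variable}_t{threshold}"
    for row in rows:
        # Assumption: strictly greater than the cutpoint counts as 1.
        row[new_name] = 1 if row[variable] > threshold else 0
    return new_name


rows = [{"A_sum": 1}, {"A_sum": 3}, {"A_sum": 5}]
name = define_threshold_sketch(rows, "A_sum", 3)
print(name, [row[name] for row in rows])
```

Calling the helper again with a different threshold adds another indicator column, which is how several thresholds can coexist.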

define_category(variable, bins, labels=False)

Defines a categorical measure by binning a summary measure. Any number of different categorical measures can be defined.

Parameters
  • variable (str) – Variable to generate categories for

  • bins (list, set, array) – Bin cutpoints to generate the categorical variable for. Uses pandas.cut(..., include_lowest=True) to create the binned variables.

  • labels (list, set, array) – Labels for the categories. Custom labels can be given, but keeping the default of False is generally recommended.
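Since the binning is described as relying on pandas.cut(..., include_lowest=True), the behaviour of the cutpoints can be previewed directly with pandas. This sketch calls pandas only, not mossspider, and the values of `A_sum` are made up.

```python
import pandas as pd

# Made-up summed exposures for seven individuals.
a_sum = pd.Series([0, 1, 2, 3, 4, 5, 6], name="A_sum")
bins = [0, 1, 2, 4, 6]

# labels=False returns integer bin codes instead of interval labels.
a_sum_c = pd.cut(a_sum, bins=bins, include_lowest=True, labels=False)
print(a_sum_c.tolist())
```

Intervals are closed on the right, and include_lowest=True makes the first interval also contain its left edge, so both 0 and 1 fall in the first bin. Values outside the outermost cutpoints would become missing, so the bins should cover the full range of the variable.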