Function will return clusters, given a frame of case counts by location and date, a distance matrix, a spline lookup table, and other parameters
Usage
find_clusters(
cases,
distance_matrix,
detect_date,
spline_lookup = NULL,
baseline_length = 90,
max_test_window_days = 7,
guard_band = 0,
distance_limit = 15,
baseline_adjustment = c("add_one", "add_one_global", "add_test", "none"),
adj_constant = 1,
min_clust_cases = 0,
max_clust_cases = Inf,
use_fast = TRUE,
return_interim = FALSE
)Arguments
- cases
a frame of case counts by location and date
- distance_matrix
a square distance matrix, named on both dimensions or a list of distance vectors, one for each location
- detect_date
a date that indicates the end of the test window in which we are looking for clusters
- spline_lookup
default NULL; either a spline lookup table, which is a data frame that has at least two columns: including "observed" and "spl_thresh", OR a string indicating to use one of the built in lookup tables: i.e. one of
"001", "005", "01", "05". If NULL, the default table will be 01 (i.e.spline_01dataset)- baseline_length
integer (default = 90) number of days in the baseline interval
- max_test_window_days
integer (default = 7) number of days for the test window
- guard_band
integer (default = 0) buffer days between baseline and test interval
- distance_limit
numeric (default=15) maximum distance to consider cluster size. Note that the units of the value default (miles) should be the same unit as the values in the distance matrix
- baseline_adjustment
one of four string options: "add_one" (default), "add_one_global", "add_test", or "none". All methods except for "none" will ensure that the log(obs/expected) is always defined (i.e. avoids expected =0). For the default, this will add 1 to the expected for any individual calculation if expected would otherwise be zero. "add_one_global", will add one to all baseline location case counts. For "add_test_interval", each location in the baseline is increased by the number of cases in that location during the test interval. If "none", no adjustment is made.
- adj_constant
numeric (default=1.0); this is the constant to be added if
baseline_adjustment == 'add_one'orbaseline_adjustment == 'add_one'- min_clust_cases
(default = 0); minimum number of case within a returned cluster.
- max_clust_cases
(default = Inf); maximum number of cases within a returned cluster.
- use_fast
boolean (default = TRUE) - set to TRUE to use the fast version of the compress clusters function
- return_interim
boolean (default = FALSE) - set to TRUE to return all interim objects of the
find_clusters()function
Examples
find_clusters(
cases = example_count_data,
distance_matrix = county_distance_matrix("OH")[["distance_matrix"]],
detect_date = example_count_data[, max(date)],
distance_limit = 50
)
#> $cluster_alert_table
#> Key: <target>
#> target observed spl_thresh date test_totals location base_clust_sums
#> <char> <int> <num> <Date> <int> <char> <num>
#> 1: 39003 335 0.2113664 2025-01-30 9673 39003 1263
#> 2: 39005 166 0.3304438 2025-01-30 9673 39005 489
#> 3: 39009 215 0.2769749 2025-01-30 9673 39009 560
#> 4: 39015 80 0.4995393 2025-02-04 2858 39025 862
#> 5: 39017 280 0.2330866 2025-01-30 9673 39017 1383
#> 6: 39039 67 0.5758212 2025-02-01 6898 39039 335
#> 7: 39061 287 0.2297259 2025-02-02 5553 39061 2270
#> 8: 39081 37 0.8068677 2025-02-04 2858 39081 326
#> 9: 39109 399 0.1928263 2025-02-04 2858 39021 5866
#> 10: 39141 160 0.3385027 2025-01-31 8267 39129 794
#> distance_value count count_sum detect_date baseline_total expected
#> <num> <int> <int> <Date> <num> <num>
#> 1: 0.00000 67 335 2025-02-05 63803 191.48001
#> 2: 0.00000 44 166 2025-02-05 63803 74.13597
#> 3: 0.00000 59 215 2025-02-05 63803 84.90008
#> 4: 17.32111 26 59 2025-02-05 63803 38.61254
#> 5: 0.00000 46 280 2025-02-05 63803 209.67288
#> 6: 0.00000 14 67 2025-02-05 63803 36.21820
#> 7: 0.00000 79 287 2025-02-05 63803 197.56610
#> 8: 0.00000 20 37 2025-02-05 63803 14.60289
#> 9: 24.96767 11 25 2025-02-05 63803 262.76238
#> 10: 22.46182 10 59 2025-02-05 63803 102.87914
#> log_obs_exp min_dist alert_ratio alert_gap id nr_locs max_date
#> <num> <num> <num> <num> <int> <int> <IDat>
#> 1: 0.5593471 0.00000 2.646338 0.34798069 2 1 2025-02-05
#> 2: 0.8060870 0.00000 2.439407 0.47564318 10 1 2025-02-05
#> 3: 0.9291630 0.00000 3.354683 0.65218806 15 1 2025-02-05
#> 4: 0.7284495 17.32111 1.458243 0.22891024 46 2 2025-02-05
#> 5: 0.2892410 0.00000 1.240916 0.05615438 59 1 2025-02-05
#> 6: 0.6151308 0.00000 1.068267 0.03930963 123 1 2025-02-05
#> 7: 0.3734090 0.00000 1.625454 0.14368309 141 1 2025-02-05
#> 8: 0.9296987 0.00000 1.152232 0.12283098 163 1 2025-02-05
#> 9: 0.4177113 24.96767 2.166257 0.22488502 190 5 2025-02-05
#> 10: 0.4416189 22.46182 1.304625 0.10311621 259 3 2025-02-05
#>
#> $cluster_location_counts
#> location count target
#> <char> <int> <char>
#> 1: 39009 215 39009
#> 2: 39005 166 39005
#> 3: 39003 335 39003
#> 4: 39015 21 39015
#> 5: 39025 59 39015
#> 6: 39021 25 39109
#> 7: 39037 26 39109
#> 8: 39109 36 39109
#> 9: 39113 299 39109
#> 10: 39149 13 39109
#> 11: 39061 287 39061
#> 12: 39081 37 39081
#> 13: 39129 59 39141
#> 14: 39131 22 39141
#> 15: 39141 79 39141
#> 16: 39017 280 39017
#> 17: 39039 67 39039
#>
#> attr(,"class")
#> [1] "list" "clusters"