## View/hide extracted text (may have errors)

Triadic Measures on Graphs: The Power of Wedge Sampling* C. Seshadhri† Ali Pinar‡ Tamara G. Kolda§ Abstract along with the information they reveal, motivates met- Graphs are used to model interactions in a variety of rics such as the clustering coeﬃcient and the transitiv- contexts, and there is a growing need to quickly assess ity ratio [31, 32]. The triangle structure of a graph the structure of a graph. Some of the most useful graph is commonly used in the social sciences for positing metrics, especially those measuring social cohesion, are various theses on behavior [11, 21, 7, 14]. Triangles based on triangles. Despite the importance of these tri- have also been used in graph mining applications such adic measures, associated algorithms can be extremely as spam detection and ﬁnding common topics on the expensive. We discuss the method of wedge sampling. WWW [13, 4]. The authors’ earlier work used distribu- This versatile technique allows for the fast and accu- tion of degree-wise clustering coeﬃcients as the driving rate approximation of all current variants of clustering force for a new generative model, Blocked Two-Level coeﬃcients and enables rapid uniform sampling of the Erd¨os-R´enyi [24]. The authors’ have also observed that triangles of a graph. Our methods come with provable relationships among degrees of triangle vertices can be and practical time-approximation tradeoﬀs for all com- a descriptor of the underlying graph [12]. putations. We provide extensive results that show our methods are orders of magnitude faster than the state- 1.1 Clustering coeﬃcients The information about of-the-art, while providing nearly the accuracy of full triangles is usually summarized in terms of clustering enumeration. Our results will enable more wide-scale coeﬃcients. Let G be a simple undirected graph with adoption of triadic measures for analysis of extremely n vertices and m edges. Let T denote the number of large graphs, as demonstrated on several real-world ex- triangles in the graph and W be the number of wedges amples. (a path of length 2). The most common measure is the global clustering coeﬃcient C = 3T/W, which measures 1 Introduction how often friends of friends are also friends. We show that we can achieve speed-ups of up to four orders of Graphs are used to model infrastructure networks, the magnitude with extremely small errors; see Fig. 1 and World Wide Web, computer traﬃc, molecular interac- Fig. 2. tions, ecological systems, epidemics, citations, and so- cial interactions, among others. Despite the diﬀerences in the motivating applications, some topological struc- tures have emerged to be important across all these do- mains. The primary of these is the triangle, consider a manifestation of homophily (people become friends with those similar to themselves) and transitivity (friends of friends become friends). The abundance of triangles, *This work was funded by the DARPA Graph-theoretic Re- search in Algorithms and the Phenomenology of Social Net- works (GRAPHS) program and by the DOE ASCR Complex Dis- tributed Interconnected Systems (CDIS) program, and Sandia’s Laboratory Directed Research & Development (LDRD) program. Figure 1: Speed-up over enumeration for global cluster- Sandia National Laboratories is a multi-program laboratory man- ing coeﬃcient computation with increasing numbers of aged and operated by Sandia Corporation, a wholly owned sub- sidiary of Lockheed Martin Corporation, for the U.S. Department wedge samples of Energy’s National Nuclear Security Administration under con- tract DE-AC04-94AL85000. Our approach is not limited to clustering coeﬃ- †Sandia National Laboratories, CA, scomand@sandia.gov ‡Sandia National Laboratories, CA, apinar@sandia.gov cients, however. A per-vertex clustering coeﬃcient, Cv, §Sandia National Laboratories, CA, tgkolda@sandia.gov is deﬁned as the fraction of wedges centered at v that 10 Copyright © SIAM. Unauthorized reproduction of this article is prohibited.