Skip to content


GraphWaGu:面向 Web 的 GPU 驱动大规模图布局计算与渲染

Abstract

Large scale graphs are used to encode data from a variety of application domains such as social networks, the web, biological networks, road maps, and finance. Computing enriching layouts and interactive rendering play an important role in many of these applications. However, producing an efficient and interactive visualization of large graphs remains a major challenge, particularly in the web-browser. Existing state of the art web-based visualization systems such as D3.js, Stardust, and NetV.js struggle to achieve interactive layout and visualization for large scale graphs. In this work, we leverage the latest WebGPU technology to develop GraphWaGu, the first WebGPU-based graph visualization system. WebGPU is a new graphics API that brings the full capabilities of modern GPUs to the web browser. Leveraging the computational capabilities of the GPU using this technology enables GraphWaGu to scale to larger graphs than existing technologies. GraphWaGu embodies both fast parallel rendering and layout creation using modified Frutcherman-Reingold and Barnes-Hut algorithms implemented in WebGPU compute shaders. Experimental results demonstrate that our solution achieves the best performance, scalability, and layout quality when compared to current state of the art web-based graph visualization libraries.

大规模图可用于编码来自多种应用领域的数据,例如社交网络、Web、生物网络、道路地图和金融。 在这些应用中,计算具有信息量的布局以及进行交互式渲染都非常重要。 然而,高效且交互式地可视化大图仍然是一项重大挑战,尤其是在 Web 浏览器中。 现有先进的 Web 可视化系统,例如 D3.js、Stardust 和 NetV.js,在大规模图的交互式布局和可视化方面仍然很吃力。 在本文中,作者利用最新的 WebGPU 技术开发了 GraphWaGu,这是第一个基于 WebGPU 的图可视化系统。 WebGPU 是一种新的图形 API,它把现代 GPU 的完整能力带到了 Web 浏览器中。 借助这项技术利用 GPU 的计算能力,使 GraphWaGu 能够比现有技术扩展到更大的图。 GraphWaGu 同时实现了快速并行渲染和布局生成,其中布局采用在 WebGPU compute shader 中实现的改进 Frutcherman-Reingold 与 Barnes-Hut 算法。 实验结果表明,与当前先进的 Web 图可视化库相比,作者的方法在性能、可扩展性和布局质量上都取得了最佳结果。

1. Introduction

A graph is used to represent connected entities and the relationships between them. Graphs are ubiquitous, appearing in many application domains, such as social networks, the web, the semantic web, road maps, communication networks, biology, and finance. It is common to process and analyze graphs for the purpose of extracting features such as connected components, cliques/triangles, and shortest paths. There are many production level graph mining and analytic systems to perform these kinds of tasks. Visualizing graphs where nodes and edges are mapped and projected to the 2D plane is another crucial task. There are many applications of graph visualization, including community detection, cluster analysis, and summarizing and understanding the overall structure of connected data. There are two aspects associated with visualizing a graph in 2D, layout creation (also referred to as graph drawing), where Cartesian coordinates are computed for every node of the graph, and rendering, where nodes and edges are rendered to an image based on the computed layout. In this paper, we limit the discussion of layout creation and rendering to undirected graphs.

图用于表示相互连接的实体以及它们之间的关系。 图无处不在,出现在许多应用领域中,例如社交网络、Web、语义 Web、道路地图、通信网络、生物学和金融。 人们通常会处理和分析图,以提取连通分量、团/三角形、最短路径等特征。 已有许多生产级图挖掘和图分析系统用于执行这类任务。 将节点和边映射并投影到二维平面来可视化图,是另一项关键任务。 图可视化有许多应用,包括社区检测、聚类分析,以及总结和理解连接数据的整体结构。 二维图可视化包含两个方面:布局生成,也称为图绘制,即为图中每个节点计算笛卡尔坐标;以及渲染,即根据计算出的布局把节点和边渲染成图像。 本文将布局生成和渲染的讨论限制在无向图上。

With the advent of big data and large scale applications leveraging HPC resources, we are observing graphs of increasing sizes, with millions of vertices and edges. Interactively visualizing such large scale graphs is challenging, as the massive number of nodes and edges poses a severe computational and rendering challenge. This problem is further exacerbated for web-based visualization systems. The web browser has become the preferred modality for data visualization, as it provides a standardized cross-platform environment through which applications can be deployed to users. Existing state of the art web based graph visualization libraries such as D3.js, Cytoscape.js, and Stardust struggle to achieve interactive rendering and layout creation for such large scale graphs.

随着大数据的发展,以及越来越多大规模应用利用 HPC 资源,人们正在面对规模不断增长的图,这些图包含数百万个顶点和边。 交互式地可视化这类大规模图很困难,因为海量节点和边会带来严峻的计算和渲染挑战。 对于基于 Web 的可视化系统,这一问题会进一步加剧。 Web 浏览器已经成为数据可视化的首选载体,因为它提供了标准化的跨平台环境,应用可以通过浏览器部署给用户。 现有先进的 Web 图可视化库,例如 D3.js、Cytoscape.js 和 Stardust,在这类大规模图的交互式渲染和布局生成方面仍然困难。

To deal with visualizing large graphs, existing state of the art graph mining and analytic systems perform concurrent parallel execution, leveraging either shared memory parallelism on GPUs or distributed parallelism through MPI. Existing web-based graph visualization libraries, however, have either been limited to serial execution on the CPU, as in D3.js, or have leveraged GPU capabilities for rendering only, as in Stardust.js. These limitations have caused existing web-based graph visualization systems to struggle with scaling to large graphs while maintaining interactivity. WebGPU is a new technology that enables addressing this limitation in the web browser. Compared to WebGL, the current standard for GPU rendering on the web which is used by Stardust and NetV.js to accelerate rendering, WebGPU provides a significant increase in GPU capabilities. Specifically, WebGPU provides compute shaders and storage buffers, which are essential for implementing computational algorithms on the GPU.

为了处理大图可视化,现有先进的图挖掘和图分析系统会进行并发并行执行,利用 GPU 上的共享内存并行性,或通过 MPI 利用分布式并行性。 然而,现有 Web 图可视化库要么受限于 CPU 上的串行执行,例如 D3.js;要么只把 GPU 能力用于渲染,例如 Stardust.js。 这些限制使得现有 Web 图可视化系统在扩展到大图并保持交互性方面十分困难。 WebGPU 是一项新技术,使得在 Web 浏览器中解决这一限制成为可能。 与当前 Web 上 GPU 渲染标准 WebGL 相比,WebGPU 显著增强了 GPU 能力;Stardust 和 NetV.js 正是使用 WebGL 来加速渲染。 具体来说,WebGPU 提供 compute shader 和 storage buffer,而这些能力对于在 GPU 上实现计算算法至关重要。

We have developed a web-based graph visualization library, GraphWaGu, that uses WebGPU to leverage the GPU’s computational power for layout creation and rendering for undirected graphs in 2 dimensions. By doing so, GraphWaGu is capable of visualizing large-scale networks at interactive frame rates. In particular, we make the following specific contributions: The first parallel and scalable WebGPU based graph rendering system, capable of rendering up to 100,000 nodes and 2,000,000 edges with interactive frame rates (≥ 10 FPS). The first parallel WebGPU based implementation of force directed layout computation using the Frutcherman-Reingold algorithm. An optimized parallel implementation for force directed layout computation using quadtree generation and traversal for Barnes-Hut simulation in WebGPU compute shaders (without recursion or use of pointers). Experimental studies to compare the performance and scalability of our WebGPU solutions against state of the art web-based graph visualization libraries. For 100,000 nodes and 2,000,000 edges, we maintain rendering frame rates 4× higher than the next best library (NetV.js) and layout creation times up to half that of D3.js.

作者开发了一个基于 Web 的图可视化库 GraphWaGu,它使用 WebGPU 来利用 GPU 的计算能力,在二维空间中为无向图进行布局生成和渲染。 因此,GraphWaGu 能够以交互式帧率可视化大规模网络。 具体而言,作者做出了以下具体贡献: 第一个并行且可扩展的基于 WebGPU 的图渲染系统,能够以交互式帧率(≥ 10 FPS)渲染多达 100,000 个节点和 2,000,000 条边。 第一个使用 Frutcherman-Reingold 算法、基于 WebGPU 的并行力导向布局计算实现。 一个用于力导向布局计算的优化并行实现,它在 WebGPU compute shader 中使用四叉树生成与遍历来进行 Barnes-Hut 模拟,并且不使用递归或指针。 将作者的 WebGPU 方案与先进 Web 图可视化库进行性能和可扩展性比较的实验研究。 对于 100,000 个节点和 2,000,000 条边,GraphWaGu 的渲染帧率比第二好的库 NetV.js 高 4 倍,布局生成时间最高可降至 D3.js 的一半。

In this section we cover relevant background work for both phases of graph visualization: graph layout creation and rendering. We then briefly discuss the key features of WebGPU that we leverage in GraphWaGu.

本节介绍图可视化两个阶段的相关背景工作:图布局生成与图渲染。 随后,作者简要讨论 GraphWaGu 中所利用的 WebGPU 关键特性。

2.1. Layout Creation

A graph G is comprised of a vertex (or node) set V and an edge (or link) set E, with cardinalities |V| and |E| respectively. When rendering a graph, every vertex is projected on a 2D plane, where the position of a vertex v is denoted by Pv. In this paper, we choose to restrict our discussion to only undirected graphs. We define two nodes u and v in G to be adjacent to each other if and only if eu,vE.

G 由顶点(或节点)集合 V 和边(或链接)集合 E 组成,它们的基数分别为 |V||E| 在渲染图时,每个顶点都会被投影到二维平面上,其中顶点 v 的位置记为 Pv 本文将讨论限制在无向图上。 若且唯若 eu,vE,作者定义 G 中的两个节点 uv 相邻。

Pioneering work by Eades proposed the spring embedder algorithm to generate aesthetically pleasing layouts by treating undirected graphs as mechanical systems of steel balls and springs. The spring force acting between nodes attracts or repels them from each other depending on the distance between them, bringing change to the energy of system. The nodes are allowed to move from their initial placement due to these spring forces until a global minimum energy state is attained. There have been multiple further works such as the KK method and Frutherman and Reingold force directed algorithm to generate aesthetic graph layouts. The FR algorithm introduces a spring-electric model as a modification to the SE algorithm that imitates a physics simulation where vertices map to electrical particles and edges correspond to springs following Hook’s law. In this method, attractive forces are computed between adjacent nodes that pull them together, and repulsive forces are computed between every pair of non-adjacent nodes that repel them from each other. The algorithm defines an ideal length l where these forces will cancel out, distance duv as euclidean distance between u and v, and computes attractive forces (fa) and repulsive forces (fr) as follows:

Eades 的开创性工作提出了 spring embedder 算法,它把无向图看作钢球和弹簧组成的机械系统,以生成美观的布局。 节点之间的弹簧力会根据节点之间的距离吸引或排斥它们,从而改变系统能量。 在这些弹簧力作用下,节点可以从初始位置移动,直到达到全局最小能量状态。 随后又有许多工作用于生成美观的图布局,例如 KK 方法以及 Frutherman 和 Reingold 的力导向算法。 FR 算法引入了 spring-electric 模型,作为对 SE 算法的修改:它模拟一种物理过程,其中顶点对应电粒子,边对应遵循胡克定律的弹簧。 在该方法中,相邻节点之间计算吸引力,使它们被拉近;每一对非相邻节点之间计算排斥力,使它们彼此远离。 该算法定义一个理想长度 l,在该长度处这些力会相互抵消;同时定义距离 duvuv 之间的欧氏距离,并按如下方式计算吸引力(fa)和排斥力(fr):

fa(d)=duv2lwhere uv and eu,vEfr(d)=l2duvwhere eu,vE

When the algorithm reaches a state of equilibrium or minimum energy, the layout computed tends to have edges of uniform length l and satisfying distance between separate connected components. The force directed FR algorithm is an iterative process; every iteration moves vertices by computing attractive and repulsive forces until a suitable layout is obtained. In each iteration, the cost to compute repulsive forces between every pair of nodes is O(|V|2) and the cost to compute the attractive forces of every edge is O(|E|). Due to the O(|V|2) repulsive force calculation, this method is computationally expensive for graphs with large numbers of nodes and non-interactive at scale. A number of works have been proposed to address this issue, such as a grid-variant algorithm, Fast Multipole Method, Barnes-Hut approximation, Well-Separated Pair Decomposition, Random Vertex Sampling, and combinations of these. These can reduce the repulsive force computation to O(|V|log|V|), or even O(|V|) for Random Vertex Sampling. BH is one of the most popular techniques among these techniques due to its simplicity. BH utilizes a quadtree data structure to approximate forces between nodes when they are distant from each other, allowing the computation of repulsive forces with O(|V|log|V|) average cost.

当算法达到平衡状态或最小能量状态时,计算得到的布局通常具有长度接近 l 的均匀边,并且不同连通分量之间也保持令人满意的距离。 力导向 FR 算法是一个迭代过程;每次迭代都会通过计算吸引力和排斥力移动顶点,直到获得合适布局。 在每次迭代中,计算每一对节点之间排斥力的代价为 O(|V|2),计算每条边吸引力的代价为 O(|E|) 由于排斥力计算需要 O(|V|2),当图包含大量节点时,该方法计算代价很高,并且在大规模场景下无法保持交互性。 已有许多工作试图解决这一问题,例如网格变体算法、快速多极子方法、Barnes-Hut 近似、well-separated pair decomposition、随机顶点采样,以及这些方法的组合。 这些方法可以将排斥力计算降低到 O(|V|log|V|),随机顶点采样甚至可以达到 O(|V|) 由于简单,BH 是这些技术中最流行的方法之一。 BH 利用四叉树数据结构,在节点距离较远时近似节点之间的力,从而允许以平均 O(|V|log|V|) 的代价计算排斥力。

Prior work has also explored parallelizing graph layout creation. Brinkmann et al. parallelized the ForceAtlas2 algorithm, while Grama et al. and Burtscher and Pingali dealt with parallelizing the quadtree construction and traversal steps necessary for BH. There have also been efforts to improve performance by CPU parallelization, with Tikhonova and Ma using a pre-processing step and grouping of vertices. These works successfully improved the speed and scalability of force directed layout creation, but are limited to GPU programming frameworks like CUDA and OpenCL. We propose a unique parallel implementation of layout creation in the web written completely in the WebGPU shading language (WGSL).

已有工作也探索了图布局生成的并行化。 Brinkmann 等人并行化了 ForceAtlas2 算法,而 Grama 等人以及 Burtscher 和 Pingali 则处理了 BH 所需的四叉树构建与遍历步骤的并行化。 也有工作试图通过 CPU 并行化提升性能,例如 Tikhonova 和 Ma 使用预处理步骤并对顶点分组。 这些工作成功提升了力导向布局生成的速度和可扩展性,但它们受限于 CUDA 和 OpenCL 这类 GPU 编程框架。 作者提出了一种独特的 Web 并行布局生成实现,并且完全使用 WebGPU shading language(WGSL)编写。

2.2. Graph Rendering

Several visualization frameworks provide APIs to render graphs in the web, with some of the most popular being D3.js, Cytoscape.js, and Stardust. Most of these visualization frameworks use Canvas API or SVG to render, which are poor in terms of scalability and performance. The Canvas API performs better than SVG, but the ability of both to handle large-scale data with real time interaction is limited compared to GPU-accelerated technologies. This can be observed in Figure 3. The WebGL API is a more scalable and powerful alternative to Canvas API and SVG but requires users wanting visualizations to create their own rendering pipeline with shader code. In recent years though, new WebGL-based tools provide the necessary abstractions to perform easy rendering of large-scale graphs. Examples of such libraries are Stardust, Sigma.js, and NetV.js. Although WebGL based libraries like NetV.js promise good rendering ability, their overall performance is still limited by the limitations of WebGL, and they must rely on serial graph layout libraries like D3.js for graph generation.

多个可视化框架提供了在 Web 中渲染图的 API,其中最流行的包括 D3.js、Cytoscape.js 和 Stardust。 这些可视化框架大多使用 Canvas API 或 SVG 进行渲染,而它们在可扩展性和性能方面表现较弱。 Canvas API 的性能优于 SVG,但与 GPU 加速技术相比,两者处理大规模数据并保持实时交互的能力都有限。 这一点可以从 图3 中观察到。 WebGL API 是 Canvas API 和 SVG 更具可扩展性、更强大的替代方案,但想要进行可视化的用户需要用 shader 代码创建自己的渲染管线。 不过近年来,新的基于 WebGL 的工具提供了必要抽象,使大规模图的渲染更容易实现。 这类库包括 Stardust、Sigma.js 和 NetV.js。 虽然 NetV.js 这类基于 WebGL 的库承诺具备良好渲染能力,但其整体性能仍受 WebGL 限制,并且必须依赖 D3.js 这类串行图布局库来生成图布局。

2.3. WebGPU

WebGPU is a new graphics API that brings the full capabilities of modern GPUs to the web browser. WebGPU is built from the ground up, complete with its own shading language WGSL. In contrast to WebGL, WebGPU is not a port of an existing native API, though it is designed to easily map to modern APIs like DirectX12, Vulkan, and Metal for performance, and draws inspiration from them in its design. WebGPU provides a number of advantages over WebGL for developing complex compute and rendering applications in the browser. WebGPU enables rendering applications to construct a description of the rendering or compute pipeline state ahead of time, specifying the shaders, input and output data locations, and data layouts, to build a fixed description of the pipeline. Data are fed to the pipeline through bind groups, whose layouts similarly encode ahead of time the structure of the data to be provided to the pipeline, while allowing the buffers being read or written to be changed efficiently at execution time. The pre-configured state stored in the pipeline significantly reduces state configuration overhead costs incurred during execution, while still allowing flexibility as to what data buffers are read or written when the pipeline is executed. Furthermore, WebGPU supports compute shaders and storage buffers, providing unique support for developing GPU algorithms capable of processing large data sets in the browser. Prior work on deep neural networks and scientific visualization has leveraged the capabilities of WebGPU to deploy large-scale GPU parallel compute applications in the browser.

WebGPU 是一种新的图形 API,它把现代 GPU 的完整能力带到了 Web 浏览器中。 WebGPU 是从零开始构建的,并且配备了自己的 shading language WGSL。 与 WebGL 不同,WebGPU 并不是某个现有原生 API 的移植版本;不过为了性能,它被设计为能够轻松映射到 DirectX12、Vulkan 和 Metal 等现代 API,并且在设计上也借鉴了这些 API。 在浏览器中开发复杂的计算和渲染应用时,WebGPU 相比 WebGL 具有许多优势。 WebGPU 允许渲染应用提前构造渲染或计算管线状态的描述,包括指定 shader、输入输出数据位置和数据布局,从而构建固定的管线描述。 数据通过 bind group 输入管线,其 layout 同样会提前编码将提供给管线的数据结构,同时允许在执行时高效改变被读取或写入的 buffer。 管线中保存的预配置状态显著降低了执行期间产生的状态配置开销,同时仍然允许在执行管线时灵活选择读取或写入哪些数据 buffer。 此外,WebGPU 支持 compute shader 和 storage buffer,为在浏览器中开发能够处理大规模数据集的 GPU 算法提供了独特支持。 已有关于深度神经网络和科学可视化的工作利用 WebGPU 能力,在浏览器中部署大规模 GPU 并行计算应用。

3. Implementation

We implement rendering and force-directed layout techniques in our tool, GraphWaGu, a web-based GPU-accelerated library for interacting with large-scale graphs. GraphWaGu utilizes WebGPU to create visualizations from input graphs onto mouse-interactive HTML Canvas elements. GraphWaGu also presents GPU implementations of Fruchterman-Reingold and Barnes-Hut approximate force-directed layout algorithms using WebGPU compute shaders. Together, these features establish a user-friendly prototype for computing, evaluating, and recomputing graph layouts on the web. We structure this section by first discussing the graph rendering in order to introduce how GraphWaGu handles redrawing the changing of node positions that comes with interactivity and force directed layout iteration, then propose our algorithms for computing layouts with and without BH approximation.

作者在 GraphWaGu 中实现了渲染和力导向布局技术;GraphWaGu 是一个基于 Web 的 GPU 加速库,用于与大规模图交互。 GraphWaGu 利用 WebGPU,把输入图可视化到支持鼠标交互的 HTML Canvas 元素上。 GraphWaGu 还使用 WebGPU compute shader,实现了 Fruchterman-Reingold 与 Barnes-Hut 近似力导向布局算法的 GPU 版本。 这些特性共同构成了一个用户友好的原型,可以在 Web 上计算、评估并重新计算图布局。 本节首先讨论图渲染,以介绍 GraphWaGu 如何处理交互和力导向布局迭代中节点位置变化导致的重绘;随后提出在有无 BH 近似时计算布局的算法。

3.1. Graph Rendering

The GraphWaGu approach to graph rendering is built to take advantage of the unique features of WebGPU for web-based graphics using bind groups and large storage buffers. At system initialization, we create rendering pipelines for nodes and edges, with bind groups containing edge and node storage buffers, and then start an animation frame to continuously run these pipelines’ render passes. The edge buffer consists of 2|E| uint32s corresponding to the indices of each edge’s source and target nodes, and the node buffer is filled with 2|V| float32 positions. For view changes such as panning and zooming, events are captured by an HTML canvas controller and new view parameters are written to a uniform buffer, while changes to the node and edge data buffers can be written by WebGPU API calls in the CPU or by shaders running on the GPU.

GraphWaGu 的图渲染方法利用了 WebGPU 在 Web 图形方面的独特特性,尤其是 bind group 和大 storage buffer。 在系统初始化时,作者为节点和边创建渲染管线,其中 bind group 包含边和节点的 storage buffer;随后启动 animation frame,持续运行这些管线的 render pass。 边 buffer 由 2|E| 个 uint32 构成,对应每条边的源节点和目标节点索引;节点 buffer 则填充 2|V| 个 float32 位置。 对于平移和缩放等视图变化,事件由 HTML canvas controller 捕获,新的视图参数会写入 uniform buffer;而节点和边数据 buffer 的变化可以由 CPU 上的 WebGPU API 调用写入,也可以由 GPU 上运行的 shader 写入。

For both pipelines, vertex buffers of one element are used to describe the type of primitive to be drawn; for edges a line and for nodes two triangles in the form of a square. A draw call is made to instance |V| nodes and |E| edges, and in each vertex shader, the instances access their corresponding element in the storage buffers and return its position to the rest of the pipeline. For edges, a color is simply returned in the fragment shader. For nodes, a check around the radius is made and the fragment alpha is computed as 1 minus the sigmoid of the distance of the fragment to the node center to simulate anti-aliasing. There are two main advantages to this approach. First, the position of edges does not have to be pre-computed in order to create and fill a vertex buffer. Second, the storage buffers can be used in other WebGPU pipelines, e.g. to write new node positions each iteration of GraphWaGu force directed compute shaders.

对于两个管线,作者都使用只有一个元素的 vertex buffer 来描述要绘制的 primitive 类型:边对应一条线,节点对应构成正方形的两个三角形。 一次 draw call 会实例化 |V| 个节点和 |E| 条边;在每个 vertex shader 中,实例会访问 storage buffer 中对应的元素,并把其位置返回给管线后续阶段。 对于边,fragment shader 只需返回颜色。 对于节点,shader 会检查半径附近区域,并把 fragment alpha 计算为 1 减去 fragment 到节点中心距离的 sigmoid,以模拟抗锯齿。 这种方法有两个主要优势。 第一,为了创建和填充 vertex buffer,边的位置不必预先计算。 第二,storage buffer 可以被其他 WebGPU 管线使用,例如在 GraphWaGu 力导向 compute shader 的每次迭代中写入新的节点位置。

3.2. Graph Layout Computation

We present algorithms GraphWaGu FR and GraphWaGu BH for Frutcherman-Reingold and Barnes-Hut approximate force directed layouts respectively. Iterations of the algorithms are computed and applied to nodes using WebGPU compute passes. These compute passes are run parallel to the render passes of the GraphWaGu animation frame, so that as new node positions are computed and applied, a new graph is rendered.

作者分别提出 GraphWaGu FR 和 GraphWaGu BH,用于 Frutcherman-Reingold 与 Barnes-Hut 近似力导向布局。 算法的迭代通过 WebGPU compute pass 计算,并应用到节点上。 这些 compute pass 与 GraphWaGu animation frame 的 render pass 并行运行,因此当新的节点位置被计算并应用时,新的图也会被渲染出来。

3.2.1. WebGPU Frutcherman-Reingold

Before running the GraphWaGu FR algorithm, we initialize the compute pipelines needed: adjacency matrix creation, force calculation, and force application with their respective compute shaders and necessary bind group layouts. An overview of the general algorithm used in WebGPU Frutcherman-Reingold is detailed in Algorithm 1. It follows closely to the original force directed algorithm proposed by Frutcherman and Reingold, apart from the use of an adjacency matrix and parallel computation of forces.

在运行 GraphWaGu FR 算法之前,作者会初始化所需的 compute pipeline:邻接矩阵创建、力计算、力应用,以及它们各自的 compute shader 和必要的 bind group layout。 WebGPU Frutcherman-Reingold 所使用的一般算法概述见算法 1。 除使用邻接矩阵和并行计算力之外,它与 Frutcherman 和 Reingold 提出的原始力导向算法高度相似。

Algorithm 1: GraphWaGu FR
  1. Input: G(V,E), coolingFactor
  2. for e in E do
  3. adjacencyMatrix[e.source + e.target * |V|] = 1
  4. adjacencyMatrix[e.target + e.source * |V|] = 1
  5. end for
  6. while coolingFactor ≥ ε do
  7. for i ← 0 to |V| do in parallel
  8. f = 0
  9. v = V[invocationId]
  10. for i ← 0 to |V| do
  11. if adjacencyMatrix[invocationId + i * |V|] then f ← f + fa(v,V[i])
  12. else f ← f + fr(v,V[i])
  13. end if
  14. end for
  15. F[invocationID] = f / |f| * min(coolingFactor, |f|)
  16. end for
  17. for i ← 0 to |V| do in parallel
  18. V[i].position ← V[i].position + F[i]
  19. end for
  20. coolingFactor ← coolingFactor * initialCoolingFactor
  21. end while

An adjacency matrix A for a graph G is defined as a |V| by |V| matrix where A(i,j) is 0 when (i,j) is not in E and 1 when (i,j) is. This structure allows us to check for edges in the force calculation shader in O(1) time to decide whether to compute attractive or repulsive forces for that node. If an adjacency matrix or similar data structure is not used, attractive forces must be computed separately from repulsive forces in O(|E|) time. A concern when using an adjacency matrix is that it can require a large amount of memory, as its size grows exponentially with |V|. To address this, we note that each element of the adjacency matrix requires only one bit to track whether two nodes are connected or not. We create the adjacency matrix with |V|×|V| bits and access individual entries through bitwise operations and shifts in the compute shader.

G 的邻接矩阵 A 被定义为一个 |V||V| 的矩阵,其中当 (i,j) 不在 E 中时 A(i,j) 为 0,当 (i,j)E 中时 A(i,j) 为 1。 这一结构允许作者在力计算 shader 中以 O(1) 时间检查边,从而决定对该节点计算吸引力还是排斥力。 如果不使用邻接矩阵或类似数据结构,吸引力必须与排斥力分开计算,并需要 O(|E|) 时间。 使用邻接矩阵的一个问题是它可能需要大量内存,因为它的大小会随 |V| 平方增长。 为了解决这一点,作者注意到邻接矩阵的每个元素只需要一 bit 就能记录两个节点是否相连。 因此,作者用 |V|×|V| bit 创建邻接矩阵,并在 compute shader 中通过位运算和移位访问单个条目。

Force computation is shown in lines 7-18 of Algorithm 1, where either repulsive or attractive forces are calculated for each vertex to each other vertex using the adjacency matrix, and application is shown in lines 19-21. We parallelize the FR force directed algorithm by dispatching one GPU thread per node, effectively running a parallel for loop over the node buffer and computing or applying forces for each. This method runs into issues when |V| is above 65,535, as that is the maximum supported dispatch size in WebGPU. To address this we create batches of nodes of 50,000, and run a separate compute pass for each batch. However, even with parallelizing the repulsive force computation this method struggles to scale to very high node counts as the exact force computation incurs O(|V|2) cost. This requires us to implement some method of approximating these forces, for which we use BH approximation to achieve O(|V|log|V|).

力计算如算法 1 的第 7-18 行所示,其中使用邻接矩阵为每个顶点相对于其他每个顶点计算排斥力或吸引力;力的应用如第 19-21 行所示。 作者通过为每个节点分派一个 GPU 线程来并行化 FR 力导向算法,这实际上是在节点 buffer 上运行一个并行 for 循环,并为每个节点计算或应用力。 |V| 超过 65,535 时,该方法会遇到问题,因为这是 WebGPU 支持的最大 dispatch size。 为了解决这一问题,作者将节点分成每批 50,000 个,并为每个 batch 运行单独的 compute pass。 然而,即使并行化排斥力计算,由于精确力计算仍需要 O(|V|2) 代价,该方法在极高节点数量下仍难以扩展。 因此作者需要实现某种近似力的方法,并使用 BH 近似来达到 O(|V|log|V|)

3.2.2. WebGPU Barnes-Hut

Algorithm 2: GraphWaGu BH Algorithm
  1. Input: G(V,E), coolingFactor
  2. createSortedEdgeLists()
  3. while coolingFactor ≥ ε do
  4. createQuadTree()
  5. for v ← 0 to |V| do in parallel
  6. for e(i,j) in E with i or j = v do
  7. F[v] ← F[v] + fa(V[i], V[j])
  8. end for
  9. end for
  10. computeRepulsiveForces()
  11. for i ← 0 to |V| do in parallel
  12. V[i].position ← V[i].position + F[i]
  13. F[i] ← 0
  14. end for
  15. coolingFactor ← coolingFactor * initialCoolingFactor
  16. end while

The Barnes-Hut approximation algorithm has multiple challenges to be effectively implemented in parallel systems. The Barnes-Hut algorithm involves building and traversing a quadtree data structure, two tasks typically implemented through recursion, with nodes having pointers to their children, and dynamic allocation storage in a heap to hold the tree. Recursion, pointers, and dynamic allocation at runtime are not supported in programming compute shaders in WebGPU. We present our own methods of building and traversing a quadtree in compute passes on the GPU compatible with the WebGPU API.

Barnes-Hut 近似算法要在并行系统中有效实现,会遇到多个挑战。 Barnes-Hut 算法涉及四叉树数据结构的构建和遍历,而这两个任务通常通过递归实现,节点包含指向子节点的指针,并且需要在堆上动态分配存储来保存树。 WebGPU 的 compute shader 编程并不支持递归、指针以及运行时动态分配。 作者提出了自己的四叉树构建与遍历方法,它在 GPU 的 compute pass 中运行,并兼容 WebGPU API。

Quadtree insertion in GraphWaGu BH
图1:将 4 节点图插入空四叉树。上方展示经典的基于指针和动态分配的四叉树创建过程;左侧则展示 GraphWaGu BH 算法把相同插入序列写入四叉树 buffer 的过程。

Our quadtree implementation utilizes one of the two approaches described by Samet for pointerless representations, treating the quadtree as a collection of its leaf nodes, with directional codes for indexes of its northwest, northeast, southwest, and southeast children. From the same source, we can also obtain the weak upper bound of 6|V| as an appropriate limit to the total number of items in our quadtree, including null pointers. We safely initialize a WebGPU buffer of length 6|V| to contain all items within the quadtree, where each item is a struct with empty attributes for its boundary rectangle, center of mass, mass, and indices of NW, NE, SW, and SE children in the same buffer. This buffer is bound to a compute pipeline which runs at the beginning of each GraphWaGu BH layout iteration. When creating a quadtree, a bounding box must first be computed containing all of the nodes of the input graph. In order to simplify this step and maintain robustness for our float32 node positions, GraphWaGu BH forces all node positions within the range [0,1] by clamping values when forces are applied to change node positions. An example quadtree creation is given by Figure 1 showing this method.

作者的四叉树实现使用了 Samet 描述的两种无指针表示方法之一:把四叉树视为叶节点集合,并用方向编码表示其西北、东北、西南、东南子节点在 buffer 中的索引。 同一来源还给出了 6|V| 这一弱上界,可作为四叉树中项目总数的合适限制,其中包括空指针。 作者安全地初始化一个长度为 6|V| 的 WebGPU buffer,用于容纳四叉树中的所有项目;每个项目都是一个 struct,包含边界矩形、质心、质量,以及同一 buffer 中 NW、NE、SW、SE 子节点索引等空属性。 该 buffer 会绑定到 compute pipeline,并在每次 GraphWaGu BH 布局迭代开始时运行。 创建四叉树时,首先必须计算一个包含输入图所有节点的 bounding box。 为简化该步骤并保持 float32 节点位置的鲁棒性,GraphWaGu BH 在应用力改变节点位置时,会通过 clamp 将所有节点位置限制在 [0,1] 范围内。 图1 给出了一个四叉树创建示例,展示了这一方法。

Barnes-Hut repulsive force traversal in GraphWaGu
图2:GraphWaGu BH 算法中为节点 2 计算排斥力。算法使用与图1相同的图和四叉树 buffer。根节点先被放入 stack 并检查;随后对 cell 与叶节点按距离条件近似或展开,直到 stack 处理完毕。

Once the quadtree is created, attractive and repulsive forces are computed in their own compute pipelines before being applied in the same way as GraphWaGu FR. An overview of the full GraphWaGu BH algorithm is given in Algorithm 2. Attractive forces in this algorithm now have to be computed in O(|E|) time, as an adjacency matrix is not built and repulsive and attractive forces must be computed in separate passes. To speed this up, the algorithm begins by creating sorted lists of the input graph’s edges so that we can parallelize the attractive force computation. This allows us to compute attractive forces for each node in parallel, dispatching a thread for each node which then iterates through the list of edges it is a part of and accumulates attractive forces. The same force buffer is then used for the repulsive force pass, taking in the result of the attractive forces and returning the total force results. In order to traverse the quadtree to compute Barnes-Hut approximate repulsive forces for each node, a large buffer is sent to be used as a pseudo-stack by the repulsive force compute shader described in line 10 of Algorithm 2. An example of this method is shown in Figure 2.

四叉树创建完成后,吸引力和排斥力会先在各自的 compute pipeline 中计算,然后以与 GraphWaGu FR 相同的方式应用。 完整的 GraphWaGu BH 算法概述见算法 2。 由于没有构建邻接矩阵,且排斥力和吸引力必须在不同 pass 中计算,该算法中的吸引力现在需要以 O(|E|) 时间计算。 为加速这一过程,算法首先创建输入图边的排序列表,以便并行化吸引力计算。 这使得作者可以并行计算每个节点的吸引力:为每个节点分派一个线程,该线程遍历该节点所属的边列表,并累积吸引力。 随后同一个力 buffer 会用于排斥力 pass,它接收吸引力结果并返回总力结果。 为了遍历四叉树并为每个节点计算 Barnes-Hut 近似排斥力,作者传入一个大 buffer,供算法 2 第 10 行描述的排斥力 compute shader 作为 pseudo-stack 使用。 图2 展示了这一方法的示例。

4. Evaluation

To evaluate the performance of GraphWaGu, we performed three different sets of experiments comparing our solution with state-of-the-art techniques. In particular, we report: rendering benchmark using synthetic graphs at different scales; layout generation benchmark using synthetic graphs at different scales; and layout creation and rendering of a variety of real graphs.

为了评估 GraphWaGu 的性能,作者进行了三组不同实验,将其方案与先进技术进行比较。 具体而言,作者报告了三类结果:在不同规模合成图上的渲染 benchmark;在不同规模合成图上的布局生成 benchmark;以及多种真实图上的布局生成与渲染。

4.1. Experimental Setup

We ran all experiments on two different system setups with browser Google Chrome Canary: one equipped with AMD Ryzen 4600 CPU, 8GB RAM memory and integrated AMD Radeon C7 GPU; and one equipped with AMD Ryzen 5600 CPU, 16GB RAM memory and a dedicated NVIDIA GeForce RTX 2060 GPU. All the code for the application running the node rendering and layout creation pipelines is written in React, compiled with babel transcompiler, and bundled with webpack module bundler. For both the rendering and layout computation benchmarks, we generated 11 graphs with varying vertex counts, all with completely random edge connectivity and initial node positions using the javascript Math.random function. For these artificial graphs, we maintained an edge to vertex count ratio of 20 to model real-world data. The total vertex count for each graph varies from 102 to 106. To demonstrate that our performance results are not biased by the nature of the synthetic graphs, we also performed an additional evaluation using datasets generated by a wide range of applications and hosted by the SuiteSparse Matrix Collection.

作者在两个不同系统配置上使用 Google Chrome Canary 浏览器运行所有实验:一个系统配备 AMD Ryzen 4600 CPU、8GB RAM 和集成 AMD Radeon C7 GPU;另一个系统配备 AMD Ryzen 5600 CPU、16GB RAM 和独立 NVIDIA GeForce RTX 2060 GPU。 运行节点渲染与布局生成管线的应用代码全部使用 React 编写,由 babel transcompiler 编译,并通过 webpack module bundler 打包。 对于渲染和布局计算 benchmark,作者生成了 11 个顶点数量不同的图,所有图都使用 JavaScript 的 Math.random 函数生成完全随机的边连接和初始节点位置。 对于这些人工图,作者保持边数与顶点数比例为 20,以模拟真实世界数据。 每个图的总顶点数从 102106 不等。 为了证明性能结果不会受到合成图性质的偏置影响,作者还使用 SuiteSparse Matrix Collection 中由广泛应用生成的数据集进行了额外评估。

4.2. Graph Rendering Benchmark

Rendering performance on dedicated GPU
图3:独立 GPU(NVIDIA GeForce RTX 2060)上的渲染性能。GraphWaGu 在超过 10,000 个节点后成为竞争者中表现最好的系统,并能以约 10 FPS 渲染包含 50,000 个节点和 1,000,000 条边的图。
Rendering performance on integrated GPU
图4:集成 GPU(AMD Radeon C7)上的渲染性能。GraphWaGu 在超过 1,000 个节点和 20,000 条边后明显优于其他可视化框架。

We compare the rendering performance of GraphWaGu with other popular web-based graph rendering frameworks: NetV.js, D3.js, G6.js, Cytoscape, Sigma.js, and Stardust. Among these frameworks, D3.js is the most popular due to its robust catalogue of visualization options and built-in libraries for algorithms such as computing force directed graph layouts. However, due to its dependency on the DOM tree and SVG, D3.js cannot effectively handle a large number of graphical marks, especially when running animation. The frameworks NetV.js, Sigma.js, and Stardust are built on top of WebGL, and can therefore utilize the GPU for high performance rendering; however, they do not accelerate graph layout computation. G6.js does not leverage the GPU for rendering nor computation, and has limited scalability for large scale graph rendering as a result.

作者将 GraphWaGu 的渲染性能与其他流行 Web 图渲染框架进行比较,包括 NetV.js、D3.js、G6.js、Cytoscape、Sigma.js 和 Stardust。 在这些框架中,D3.js 最流行,因为它拥有强大的可视化选项目录,并内置了诸如力导向图布局计算等算法库。 然而,由于依赖 DOM tree 和 SVG,D3.js 无法有效处理大量图形标记,尤其是在运行动画时。 NetV.js、Sigma.js 和 Stardust 构建在 WebGL 之上,因此能够利用 GPU 进行高性能渲染;不过它们不会加速图布局计算。 G6.js 既没有利用 GPU 进行渲染,也没有利用 GPU 进行计算,因此在大规模图渲染中的可扩展性有限。

In order to properly stress test the abilities of the rendering libraries chosen, we one-by-one call each to render a completely different graph of the size being tested each frame. We randomly generate the graphs for each frame as detailed in the experimental setup, and do not include graph generation time in the time it takes to complete a frame. Frames per second is calculated for each library after repeating this process for a total duration of five seconds for each graph size. The FPS is then averaged over this five seconds, and this is what we record, capped at 60 FPS since it is the maximum display frequency. We report these FPS results for all libraries on both the dedicated and integrated GPUs in Figure 3 and Figure 4 respectively.

为了充分压力测试所选渲染库的能力,作者逐一调用每个库,在每一帧渲染一个完全不同的、具有测试规模的图。 作者按照实验设置中描述的方法为每帧随机生成图,并且不把图生成时间计入完成一帧所需的时间。 对于每种图规模,该过程总共重复五秒,并据此计算每个库的每秒帧数。 随后作者对这五秒内的 FPS 取平均并记录;由于显示频率上限为 60 FPS,因此记录值被截断在 60 FPS。 作者分别在 图3图4 中报告了所有库在独立 GPU 与集成 GPU 上的 FPS 结果。

On both systems, GraphWaGu outperforms other frameworks and is the most scalable solution. The rendering performance of GraphWaGu is significantly better for the largest scale graphs even when using an integrated GPU, as shown in Figure 4. On the integrated GPU, WebGL based libraries performed similarly to optimized serial libraries up to the highest number of nodes, as there was not much GPU power for them to take advantage of. On the dedicated GPU, WebGL based libraries perform much better, with Stardust maintaining about equal FPS to GraphWaGu until passing 10,000 nodes and 200,000 edges, which is the limit of Stardust graph rendering. For small sized graphs, all frameworks perform well, hence the FPS performance reaches the nominal limit of 60 FPS. For larger graphs, GraphWaGu shows better performance compared to other frameworks. We report that we are able to maintain interactive rendering (≥ 10 FPS) until 100,000 nodes and 2,000,000 edges on the dedicated GPU system, while NetV.js is the only other library that maintains rendering, at 3 FPS.

在两个系统上,GraphWaGu 都优于其他框架,并且是可扩展性最强的方案。 即使使用集成 GPU,GraphWaGu 在最大规模图上的渲染性能也明显更好,如 图4 所示。 在集成 GPU 上,基于 WebGL 的库在最高节点数量之前与优化过的串行库表现相近,因为可利用的 GPU 算力并不多。 在独立 GPU 上,基于 WebGL 的库表现好得多,Stardust 在超过 10,000 个节点和 200,000 条边之前,与 GraphWaGu 保持大致相同的 FPS,而这也是 Stardust 图渲染的极限。 对于小规模图,所有框架表现都很好,因此 FPS 性能达到 60 FPS 的名义上限。 对于更大的图,GraphWaGu 相比其他框架表现更好。 作者报告称,在独立 GPU 系统上,GraphWaGu 能够在 100,000 个节点和 2,000,000 条边之前保持交互式渲染(≥ 10 FPS),而 NetV.js 是唯一还能继续渲染的其他库,但仅有 3 FPS。

4.3. Layout Computation Benchmarks

Layout computation performance on integrated GPU
图5:集成 GPU 上的布局计算性能。在 20,000 个节点之前,GraphWaGu 算法最快;随后集成显卡无法很好扩展力的并行计算。
Layout computation performance on dedicated GPU
图6:独立 GPU 上的布局计算性能。GraphWaGu FR 在 5,000 个节点之前平均迭代时间最佳,之后 GraphWaGu BH 成为最佳方案。

In this section, we evaluate the efficacy of GraphWaGu FR and GraphWaGu BH layout computation functionality. Because technologies like Stardust and NetV.js support only rendering while relying on other libraries to create graph layouts, we evaluate our system only against D3.js’s D3-force layout computation ability. This library utilizes a Barnes-Hut approximation of repulsive forces, so that each iteration of its algorithm is O(|V|log|V|), but it is only a serial implementation created in javascript. For each generated graph, we measured the average iteration time of the force-directed layout algorithm after computing 87 iterations. Because graph rendering and layout computation are coupled in the GraphWaGu system through the setup of WebGPU command encoding, each iteration time includes both layout computation and rendering of the current graph layout on that frame.

本节评估 GraphWaGu FR 和 GraphWaGu BH 的布局计算功能。 由于 Stardust 和 NetV.js 这类技术只支持渲染,并依赖其他库创建图布局,作者只将系统与 D3.js 的 D3-force 布局计算能力进行比较。 该库使用 Barnes-Hut 近似计算排斥力,因此算法每次迭代为 O(|V|log|V|),但它只是 JavaScript 中的串行实现。 对于每个生成的图,作者在计算 87 次迭代后测量力导向布局算法的平均迭代时间。 由于 GraphWaGu 系统中图渲染和布局计算通过 WebGPU command encoding 设置耦合在一起,每次迭代时间都包含该帧中当前图布局的布局计算和渲染。

Figure 5 shows the results for the integrated GPU. We observe that GraphWaGu FR yields the best performance for graphs with relatively fewer number of nodes, however, D3-force outperforms GraphWaGu algorithms for graphs with larger number of nodes. This is because the low computational benefit offered by the low-end dedicated GPU does not offset the additional cost of creating and running the compute pipelines for GraphWaGu BH here. Figure 6 shows the results for the dedicated GPU. For the largest graphs, GraphWaGu BH has the best iteration time, although it grows at the same rate as D3-force iteration time as both have O(|V|log|V|) repulsive force calculation and O(|E|) attractive force calculation cost. Overall, GraphWaGu BH is able to maintain interactive frame rate while computing layout and rendering until 20,000 nodes and 400,000 edges, while D3-force lasts until 2,000 nodes and 40,000 edges, and GraphWaGu FR lasts until 10,000 nodes and 200,000 edges. Also, with the use of GraphWaGu BH, we can scale our layout computation to a graph of 100,000 nodes and 2,000,000 edges while computing its convergence in less than three minutes.

图5展示了集成 GPU 上的结果。 作者观察到,对于节点数相对较少的图,GraphWaGu FR 性能最佳;然而对于节点数更大的图,D3-force 优于 GraphWaGu 算法。 这是因为低端集成 GPU 提供的计算收益不足以抵消创建和运行 GraphWaGu BH compute pipeline 的额外开销。 图6展示了独立 GPU 上的结果。 对于最大的图,GraphWaGu BH 具有最佳迭代时间;不过它与 D3-force 的迭代时间以相同速率增长,因为二者都具有 O(|V|log|V|) 的排斥力计算和 O(|E|) 的吸引力计算代价。 总体而言,GraphWaGu BH 能够在 20,000 个节点和 400,000 条边之前,在计算布局和渲染时保持交互式帧率;D3-force 只能维持到 2,000 个节点和 40,000 条边,GraphWaGu FR 则维持到 10,000 个节点和 200,000 条边。 此外,使用 GraphWaGu BH 后,作者能够把布局计算扩展到包含 100,000 个节点和 2,000,000 条边的图,并在不到三分钟内计算其收敛结果。

4.4. Layout Quality Evaluation

In order to accurately measure the performance of the different layout creation algorithms, a comparison must be made on the quality of outputted graphs. There are many aesthetic metrics available which seek to quantitatively evaluate different ideal features in a 2-dimensional graph. For the sake of this paper, we have chosen the following three metrics to verify the quality of graphs outputted by D3, GraphWaGu FR, and GraphWaGu BH.

为了准确衡量不同布局生成算法的性能,必须比较输出图的质量。 已有许多美学指标用于定量评估二维图中的不同理想特征。 在本文中,作者选择以下三种指标来验证 D3、GraphWaGu FR 和 GraphWaGu BH 输出图的质量。

Edge Uniformity is a measure of the variance of a layout’s edge lengths. We calculate this measure as the normalized standard deviation of edge length:

边均匀性(Edge Uniformity, EU)衡量布局中边长的方差。 作者将其计算为边长的归一化标准差:

EU=eE(lelμ)2|E|lμ2

Because a higher value corresponds to greater standard deviation among edge lengths, a lower value of EU is a better aesthetic measure for a layout. Stress measures the difference between the input graph theoretical distances and output realized 2-dimensional distances for each pair of vertices in a layout. A lower value of stress corresponds to a better layout as it means depicting the input graph more accurately. Neighborhood Preservation evaluates the precision of a layout in maintaining the proximity of nodes when translating from graph space to 2-dimensional space. A higher value of NP means a better layout, with a value of 1 meaning that all vertices preserve all their neighbors.

由于较高数值对应边长标准差更大,因此 EU 越低,布局美学质量越好。 Stress 衡量布局中每一对顶点的输入图理论距离与输出二维距离之间的差异。 Stress 越低,说明布局越能准确描绘输入图,因此布局越好。 邻域保持度(Neighborhood Preservation, NP)评估从图空间转换到二维空间时,布局保持节点邻近关系的精度。 NP 越高表示布局越好,数值为 1 表示所有顶点都保留了全部邻居。

表1:GraphWaGu FR 和 GraphWaGu BH 与 D3 的布局和性能对比。该表原文包含布局图像与平均迭代时间柱状图,因此以裁剪图形式保留论文视觉效果。
Layout and performance comparisons of GraphWaGu FR and GraphWaGu BH with D3
表2:表1中布局的质量指标。粗体表示每个指标的最佳结果:边均匀性(EU)、stress(S)和邻域保持度(NP)。
GraphQuality metric
D3-EUFR-EUBH-EUD3-SFR-SBH-SD3-NPFR-NPBH-NP
Pkustk018.3E-018.3E-017.7E-013.0E+073.6E+072.5E+074.4E-014.3E-014.4E-01
Pkustk028.1E-017.3E-016.9E-011.3E+089.8E+071.1E+083.3E-013.4E-013.3E-01
Finance_2561.3E+009.2E-018.8E-013.9E+083.2E+083.7E+083.4E-024.8E-021.8E-02
ba_network1.1E+001.6E+001.1E+009.1E+067.3E+067.9E+062.9E-027.4E-023.6E-02
Fe_4elt24.8E-016.3E-014.5E-014.2E+072.7E+072.5E+071.3E-011.8E-011.0E-01

As shown in Table 1, we compute the layout of five real graphs from the SuiteSparse Matrix Collection in D3.js, GraphWaGu FR, and GraphWaGu BH on the dedicated GPU. To fairly compare the quality of the layouts being output, and to ensure that they would reach convergence of energy minimization, each algorithm ran for 2000 iterations for each dataset. Each layout was then rendered using GraphWaGu for consistency, and this output is shown. In terms of performance, we observe the same behaviour experienced in our benchmarks, where GraphWaGu outperforms D3.js for small graphs with both approaches and large graphs with the BH approach. Because of this, we compute the quality metrics described above to judge quantitatively the aesthetic standard of all the layouts, and depict the results in Table 2. For edge uniformity, GraphWaGu BH is the winner for all 5 graphs. For both stress and neighborhood preservation, either GraphWaGu FR or GraphWaGu BH has the best score for all datasets, although D3 is comparable for most. Overall, these results point that GraphWaGu creates layouts of equal or better quality then D3 when using both the FR and BH algorithms.

表1 所示,作者在独立 GPU 上使用 D3.js、GraphWaGu FR 和 GraphWaGu BH 为 SuiteSparse Matrix Collection 中的五个真实图计算布局。 为了公平比较输出布局质量,并确保它们达到能量最小化收敛,每种算法在每个数据集上都运行 2000 次迭代。 为保持一致,每个布局随后都使用 GraphWaGu 渲染,表中展示了这些输出。 在性能方面,作者观察到与 benchmark 相同的行为:对于小图,GraphWaGu 的两种方法都优于 D3.js;对于大图,BH 方法优于 D3.js。 因此,作者计算上述质量指标,以定量判断所有布局的美学标准,并在 表2 中展示结果。 对于边均匀性,GraphWaGu BH 在 5 个图上全部胜出。 对于 stress 和邻域保持度,GraphWaGu FR 或 GraphWaGu BH 在所有数据集上都取得最佳分数,尽管 D3 在多数情况下也具有可比性。 总体而言,这些结果表明,在使用 FR 和 BH 算法时,GraphWaGu 可以生成质量等于或优于 D3 的布局。

5. Conclusions

We have presented GraphWaGu, the first WebGPU based graph visualization tool that enables parallel layout computation and rendering for large scale graphs in the browser. We have implemented a WebGPU based layout computation for both Frutcherman-Reingold and Barnes-Hut algorithms. Performance scaling studies have demonstrated that GraphWaGu can achieve the best performance for both small graphs and large graphs when compared with existing state-of-the-art web based visualization libraries. The layout computation and rendering performance of GraphWaGu pave the way for a new generation of web-based graph visualization tools that leverage the full power of the GPU.

作者提出了 GraphWaGu,这是第一个基于 WebGPU 的图可视化工具,使浏览器中大规模图的并行布局计算和渲染成为可能。 作者实现了基于 WebGPU 的 Frutcherman-Reingold 和 Barnes-Hut 两种算法的布局计算。 性能扩展实验表明,与现有先进 Web 可视化库相比,GraphWaGu 在小图和大图上都能取得最佳性能。 GraphWaGu 的布局计算与渲染性能,为新一代基于 Web、能够充分利用 GPU 完整能力的图可视化工具铺平了道路。