Heterophilic Graph Neural Network Based on Spatial and Frequency Domain Adaptive Embedding Mechanism

2024-03-02 01:32LanzeZhangYijunGuandJingjiePeng

Lanze Zhang,Yijun Guand Jingjie Peng

College of Information and Cyber Security,People’s Public Security University of China,Beijing,100038,China

ABSTRACT Graph Neural Networks(GNNs)play a significant role in tasks related to homophilic graphs.Traditional GNNs,based on the assumption of homophily, employ low-pass filters for neighboring nodes to achieve information aggregation and embedding.However, in heterophilic graphs, nodes from different categories often establish connections,while nodes of the same category are located further apart in the graph topology.This characteristic poses challenges to traditional GNNs,leading to issues of“distant node modeling deficiency”and“failure of the homophily assumption”.In response,this paper introduces the Spatial-Frequency domain Adaptive Heterophilic Graph Neural Networks (SFA-HGNN), which integrates adaptive embedding mechanisms for both spatial and frequency domains to address the aforementioned issues.Specifically, for the first problem, we propose the“Distant Spatial Embedding Module”,aiming to select and aggregate distant nodes through high-order random walk transition probabilities to enhance modeling capabilities.For the second issue,we design the“Proximal Frequency Domain Embedding Module”,constructing adaptive filters to separate high and low-frequency signals of nodes,and introduce frequency-domain guided attention mechanisms to fuse the relevant information,thereby reducing the noise introduced by the failure of the homophily assumption.We deploy the SFA-HGNN on six publicly available heterophilic networks,achieving state-of-the-art results in four of them.Furthermore,we elaborate on the hyperparameter selection mechanism and validate the performance of each module through experimentation,demonstrating a positive correlation between“node structural similarity”,“node attribute vector similarity”,and“node homophily”in heterophilic networks.

KEYWORDS Heterophilic graph;graph neural network;graph representation learning;failure of the homophily assumption

1 Introduction

Traditional Graph Neural Networks (GNNs) [1–4] have demonstrated impressive performance in semi-supervised learning tasks related to homophilic graphs.Most GNNs assume that nodes tend to establish connections with strongly homophilic nodes of the same class,known as the homophily assumption [5].Traditional GNNs function as low-pass filters [6,7], and based on the homophily assumption, aggregate feature information from neighboring nodes with similar attributes to create a graph representation that integrates homophilic nodes.These models show robust performance in strongly homophilic networks because the central node and its adjacent nodes often belong to the same class and exhibit significant similarity in attribute vectors,allowing for effective representation during message aggregation[8].

However, the opposite is true in heterophilic networks, where most nodes tend to connect with nodes of different classes and lower similarity in attribute vectors [5,8].As a result, highly homophilic nodes are often located in distant regions from the central node.Traditional GNNs based on the homophily assumption introduce considerable noise to node representations through message passing in heterophilic networks[9,10].This phenomenon is referred to as the“failure of the homophily assumption”[5].Additionally,traditional GNNs focus more on aggregating information from proximal neighborhoods,which leads to inadequate modeling of highly homophilic nodes that are hidden in distant regions.We refer to this issue as the “distant node modeling deficiency” [5].Consequently, models such as MLPs that ignore graph structure can outperform GNNs in some experiments[8].

To address the above issues simultaneously,this paper proposes the SFA-HGNN model.First,to tackle the “distant node modeling deficiency”, we introduce the concept of structural similarity for distant nodes during the structural encoding stage by high-order random walks originating from each node.It can help identify highly homophilic distant nodes.We establish direct connections between the central node and these distant homophilic nodes to facilitate the potential discovery of neighborhoods.Thereby we obtain the results of spatially adaptive embedding via attention mechanisms to integrate the distant node information.Second, to address the “failure of the homophily assumption”, we design an adaptive filter that amplifies differences between nodes using high-pass filtering and preserves common features using low-pass filtering [5].We use the embedding which has similarity of distant attribute vectors embedded in the structural encoding stage as guidance for the frequencydirected attention mechanism, which learns how to fuse high-frequency and low-frequency signals in the proximal neighborhood.This allows the high-pass filter to capture neighborhood differential information and the low-pass filter to capture homophily information.By separating the node’s ego-information from neighborhood information,we prevent the central node’s features from being smoothed by noise.Finally, we merge the distant spatial and proximal frequency node embedding results to accomplish node classification in heterophilic graphs.In addition to this,additional attention should be paid,different from the definition of source and target domains in graph transfer learning[11], this paper adopts the definitions related to spatial and frequency domain methods based on the design principles of graph neural networks.The spatial domain method focuses more on the connection relationships between nodes and their neighboring nodes,as well as the node features.On the other hand,the frequency domain method considers node features as spectral signals and utilizes convolution operations on the spectral signals to achieve information propagation and analysis of graph structures[1].

Specifically,the main contributions of this paper are as following:

• The SFA-HGNN model addresses the challenges of the “distant node modeling deficiency”and the“failure of the homophily assumption”commonly faced by traditional GNNs through the distant spatial embedding module and the proximal frequency embedding module.

• SFA-HGNN has deployed in six common heterophilic networks, achieving state-of-the-art results in four of them.This validates the effectiveness of the proposed model design.Besides,the paper thoroughly discusses the selection mechanism for the hyperparameter set through theory and experiments.

• The paper provides experimental evidence for the positive correlation among“node structural similarity”, “node attribute vector similarity”, and “node homophily”, and demonstrates the advantages of the constructed distant homophilic subgraph in enhancing neighborhood homophily and attribute vector similarity.The paper also proves the advantages of the frequency-directed attention mechanism in adaptive learning of high-frequency and lowfrequency signals in proximal nodes.

2 Related Work

Heterophilyrefers to the phenomenon in which nodes in a graph are more inclined to associate with characteristics of nodes from different classes,contrasting withHomophily[5,7,8].On the other hand,Heterogeneityindicates that nodes or relationship types in the graph belong to more than two classes,as opposed toHomogeneity[12–14].Existing heterophilic graph-oriented neural network design schemes can be categorized into “Non-local Neighbor Extension” and “GNN Architecture Refinement”to solve the problems of effective Neighbor information discovery and fully integrating neighbor information,respectively[5].On this basis,this paper supplements the method of adaptive spatial structure modeling to make full use of the structural roles played by nodes in heterophilic graphs to achieve remote neighbor reconstruction and supplements the information on node structural roles to improve the modeling capability of the model for remote nodes.

2.1 Heterophilic Graph Neural Network Based on Non-Local Neighbor Extension

In the context of the homophily graph message-passing framework,neighboring nodes are typically defined as those reachable from the center node within one hop[15].However,in the heterophilic graph,nodes of the same type that exhibit high structural similarity may possess significant topological distances from each other[16,17].Consequently,information from distant nodes in the heterophilic graph is challenging to aggregate through shallow models based on the homophily assumption.In summary,the Non-local Neighbor Extension,through the following approaches,expands the scope of neighborhood aggregation to non-local nodes:High-order neighbor mixing and Potential neighbor discovery.By doing so, it aggregates crucial features from non-local nodes to address the issue of“distant node modeling deficiency”[5,16,17].

High-order neighbor mixingaims to aggregate information from neighboring nodes within a topological distance of one hop to k hops from the central node, enabling heterophilic GNNs to incorporate potential representations from nodes in various neighborhood orders to obtain node embeddings [5].It defines the kth-order neighborhood asNk(v)= {u:d(v,u)=k}, whered(v,u)represents the graph structural distance between two nodes [18].The MixHop model not only considers two-hop neighborhood message propagation but also encodes other neighborhood information through linear transformations[18].The resulting representations are concatenated and combined to obtain the final node embedding, aiming to complement the homophilic information in heterophilic networks.The H2GCN model starts by theoretically demonstrating that a high level of heterophily within the first-order neighborhoods leads to an increase in homophilic information within the second-order neighborhoods [19].Moreover, it aggregates homophilic information from higher-order neighborhoods during each round of message passing.TDGNN constructs the directly connected subgraphs of each k-order neighborhood and the central node, respectively, parallelizes the message passing, aggregates the homophilic information of the remote nodes, and improves the message passing efficiency at the same time [16].The above-mentioned models are built upon the inherent graph topology, aiming to fuse information from different-order neighborhood nodes to integrate distant homophilic patterns.However, their essence lies in the unfiltered aggregation of potential neighborhoods,which poses the risk of introducing noise and excessive smoothing,making it challenging to effectively transmit valuable information from more remote nodes to the central node[7].

Potential neighbor discoveryaims to leverage the global graph structure and novel neighborhood definitions to uncover“neighbor nodes”with latent homophilic information and subsequently aggregate them [5].Potential neighbors are defined asNp(v)= {u:s(v,u)

2.2 Heterophilic Graph Neural Networks Based on GNN Architecture Refinement

GNN Architecture Refinement is a redesign of the AGGREGATE and UPDATE modules in the traditional message passing framework[15],aiming to fully aggregate the information of neighboring nodes of each order within the connectivity component to amplify the distinguishability between heterophilic node representations,which can be achieved by three methods,namely,Adaptive message aggregation,Ego-neighbor separation,andInter-layer combination[5].

Adaptive Message Aggregationaddresses the issue of “homophily assumption failure”by introducing adaptive edge weights to distinguish between heterophilic and homophilic information from the neighborhood during the message-passing process.In the frequency domain,both the FAGCN[24]and ACM[25]models are built upon the assumption of“High-pass filters amplify differences between nodes,while low-pass filters preserve common node features”.They utilize high-pass filters to model the heterophilic information between nodes and optimize the AGGREGATE module of the model.In FAGCN,an attention mechanism is constructed to adaptively learn the fusion ratio of high-frequency and low-frequency information in the neighborhood [24].On the other hand, ACM simultaneously incorporates low-pass, high-pass, and ego-information filters.It adaptively integrates common and differential information between nodes and their neighborhoods while preserving ego-information to mitigate the loss of original data [25].The above models learn the most accurate embedding representations of heterophilic graph nodes starting from learning the neighborhood difference and homophilic information.In the spatial domain, combining the topological structure of the spatial graph with node classes to assign adaptive edge weights to neighborhood information,WRGNN[26]transforms the original heterophilic graph into a multi-relational graph.This is achieved by modeling heterophilic edges to obtain link weights during the process of message passing[27].

Ego-Neighbor Separationaims to disentangle the central node from the neighborhood information during the“Aggregation”and“Update”process,adopting a non-mixing approach to avoid the impact of averaging heterophilic noise from the neighborhood[5].The H2GCN[19]avoids self-loop connectivity and adopts a non-hybrid approach so that the node representations retain distinguishability after multiple rounds of messaging; the WRGNN [26] imposes different mapping functions for the neighborhood information and the central node ego-information to carry out the messaging.

Inter-Layer Combinationaims to utilize the aggregated results of intermediate representations from each layer of the model to obtain the final node embedding [5].This enables the learning of homophilic information distributed across various neighborhood orders within the heterophilic graph.The JKNet [28] and GCNII [29] models achieve global topological node information aggregation from the perspective of cross-neighborhood information fusion.The JKNet model flexibly captures information from various neighborhood orders, enabling the comprehensive exploration of homophilic information concealed within these diverse neighborhoods.On the other hand, the GCNII model introduces initial information into each layer’s node embedding representation and supplements the weight matrix with an identity mapping.This approach not only facilitates the learning of homophilic information from various neighborhood orders but also effectively preserves the initial ego-information.Both of the above models extend the scope of AGGREGATE to more neighborhood layers, allowing for a comprehensive consideration of homophilic information from distant neighborhoods.

2.3 Heterophilic Graph Neural Network Based on Spatial Structure Modeling

The structural role refers to the structural relationship exhibited by nodes and their neighborhoods in the original graph topology [30,31].Heterophilic graph nodes and their neighborhoods often belong to different classes and exhibit distinct feature vectors.In the local topological structure,these differences manifest as variations in structural roles.Therefore,embedding the attribute information describing the structural roles into feature vectors and reconstructing distant neighborhoods based on this information can result in embeddings of distant nodes that incorporate structural attribute information[31].

Currently, graph neural network models that rely solely on message passing mechanisms can aggregate neighborhood node information based on the original graph topology [15].However, the expressive power of these models is limited by the one-dimensional Weisfeiler-Lehman test (1-WL test)[32]and cannot fully capture the structural roles that nodes play in the topology.

As shown in Fig.1, The above process approximates the node embedding of traditional graph neural networks from the perspective of the 1-WL test.This paper simplifies the 1-WL test process by obtaining the representation of the current node through the addition of labels of its first-order neighboring nodes.Additionally,the term“Node Embedding Tree”refers to a tree structure formed by expanding the first-order neighborhood of each node layer by layer, with the central node as the parent node.The sibling nodes of the node embedding tree are arranged in a counterclockwise order based on the original graph layout.Furthermore,the computation results of the two-order 1-WL test processes with S1/S2 as the center are marked in the vicinity of each node.

In conclusion, it is shown that using simple message passing within n orders alone cannot accurately represent nodes that have the same n-order Nodes Embedding Tree but possess different structural information.Entities S1 and S2 play different structural roles within their respective connected components.S2 acts as a hub node,taking on the task of connecting the other seven nodes.However, in traditional MP-GNN, nodes with similar first-order structures are assigned the same node representation during one round of message passing.This can lead to confusion about the roles these nodes play in the graph structure,making it difficult to effectively model the rich information embedded in structural roles.

Figure 1:Nodes embedding diagram

Recent studies have shown that supplementing MP-GNN with deterministic distance attributes as structural role information can effectively compensate for the shortcomings of traditional graph neural network models in describing node structural roles.DE-GNN [31] incorporates distance encodingζ(u|N)as additional attribute information for nodes to complement the description of the neighborhood structure of labeled nodes.The specific definition is as follows:

ζ(u|v)represents a certain distance metric defined between nodeuand nodev,usually taking the various order relationships between nodes as input.The specific definition is as follows:

W=AD-1is the random walk matrix,Wkis the kth order random walk matrix, and the structure mapping functiong(·)transformsluvinto different types of distance measures.For example,neighborhood structure similaritygrwdescribed by random walk transition probabilities; the node structure hierarchy distributiongspcharacterized by the shortest path length.By computing the distance attributes of each node relative to the central nodeu,the message passing process will obtain a central node’s structural role embedding vector that integrates the topological information of the neighborhood nodes.It can help redefine the way neighborhood nodes are selected,thereby achieving a potential neighborhood discovery mechanism that integrates graph structure information in the spatial domain.

2.4 Heterophilic Noise and Self-Supervised Learning

In their research,Dai et al.addressed the issue of noisy edges and limited node labels and proposed the RS-GNN model [33].The definition of noisy edges in their work is similar to the definition of Heterophilic noise resulting from the “failure of homophily assumption” in our paper.RS-GNN tackles the problem of Heterophilic noise (noisy edges) by incorporating the idea of graph selfsupervised learning.It trains the Link Predictor to assign high message-passing weights to node pairs with similar features and low message-passing weights to pairs with low feature similarity.The selfsupervised learner is then trained through the reconstruction of the adjacency matrix, enabling the reconstruction of weight coefficients and node connectivity to mitigate potential interference caused by heterophilic noise.

However,RS-GNN also has certain limitations.The model assumes that“nodes are more likely to connect with similar nodes,”which serves as the basis for training the Link Predictor based on the adjacency matrix and edge reconstruction task,leading to a stronger emphasis on connecting similar node pairs.However,this assumption does not hold completely in Heterophilic graphs,giving rise to the phenomenon of“failure of homophily assumption”.If we directly train the Link Predictor based on the adjacency matrix of a Heterophilic graph(which exhibits nodes that are more likely to connect with different types of nodes),it may have a negative impact on self-supervised learning.This is because the adjacency matrix of a heterophilic graph inherently contains more heterophilic noise compared to a normal dataset, and constructing a Pretext Task directly based on this may affect the training of the Link Predictor.In the following sections, we will address the issue of heterophilic noise from a perspective more suitable for highly heterophilic graph data.

3 Prior Knowledge

3.1 Basic Definition

Predefinition:A∈RN×Nrepresents the adjacency matrix without self-loops of an undirected graphG(V,E),whereVdenotes the set of nodes andErepresents the set of edges.The normalized graph Laplacian matrix is defined asL=In-D-1/2AD-1/2,whereinD∈RN×Nis the diagonal degree matrix,Di,i= Σj Ai,j, andInis a diagonal matrix.In summary,Lis a real symmetric matrix equipped with mutually orthogonal eigenbasis vectorseach corresponding to the eigenvalueλl∈[0,2].Thus,the symmetric normalized form of the Laplacian matrix can be expressed asL=UΛUT,whereΛ=diag([λ1,λ2,···,λn])andU=[u1,u2,···,un].

Graph Fourier Transform:From the theory of graph signal processing[33],can be used as an orthogonal basis for the graph Fourier transform,and the Fourier transform of the signalxcan be defined as=UTxand the Fourier inverse transform asx=U,which defines the convolution operation ∗Gbetween the signalxand the convolution kernelfas follow:

where ⊙denotes the element-wise multiplication between vectors.The frequency domain convolution kernelgθis typically represented as a diagonal matrix,used to simplifyUTf.

Adjacency Matrix of Each Order:In this paper,we defineAras the rth-order adjacency matrix of the central node asnodeseeds.ARrepresents the set of Rth-order adjacency matrices within a connected component,encompassing the adjacency relationships between nodes of different orders.The specific definition is as follows:

3.2 Homophily Measurement Metrics

Relevant work has shown that the relationship between node labels and graph structure can serve as a metric for graph homophily[8].In this paper,we choose edge homophily and node homophily as measures to assess the intrinsic homophilic information within the graph data as follows:

The range of values for the above metrics is [0,1].A higher value of the metric reflects a graph structure with stronger homophilic information,while a lower value indicates a dominant presence of heterophilic information.

4 Model

4.1 SFA-HGNN Model Framework

The SFA-HGNN model is structured as follows,as shown in Fig.2.

Figure 2:Model structure diagram

Data Input:The model takes the node feature vectorsH∈RN×Fand adjacency matrixA∈RN×Nof the heterophilic graphHG(V,E)as input.

Node Embedding:The node embedding process involves two main components:the Distant Spatial Embedding Module and the Proximal Frequency Embedding Module:

Distant Spatial Embedding Module:This module focuses on creating a mechanism for potential neighborhood discovery that incorporates structural information.It starts by embedding the shortest path lengths and random walk transition probabilities from the neighborhood to the central node into the attribute vectors of neighboring nodes.The high-order random walk transition probabilities originating from the central node serve as a measure of homophily information.This helps prioritize the selection of highly homophilic distant nodes and establishes direct connections(“high-speed link”)between the“new neighborhood nodes”and the central node.This process creates a distant homophilic subgraph.Graph attention mechanisms are then applied to obtain node representations of the central node within this distant subgraph, which become the spatial embedding results, capturing both the structural role and the homophily information of distant nodes.

Proximal Frequency Embedding Module:This module aims to achieve adaptive message aggregation using frequency-domain methods to integrate effective attribute information from proximal neighborhoods.Leveraging high-pass filters to amplify differences between nodes and low-pass filters to preserve shared node features, an adaptive filter is designed to select high-frequency and lowfrequency signals of the central node.A frequency-directed attention mechanism is introduced,incorporating prior information from the similarity of attribute vectors of distant nodes.This guides the fusion of high-frequency and low-frequency signals to get the frequency-domain embedding of proximal nodes in the heterophilic graph.

Node Classification:The embeddings from the two modules are concatenated and fused to produce a node embedding with both spatial and frequency-domain adaptability.After passing through fully connected layers and applying thesoftmax,the model outputs the node classification results.

4.2 Node Embedding

4.2.1 Distant Spatial Adaptive Embedding Module

While theHigh-order neighbor mixingmethod helps embed homophilic information from higherorder neighborhoods for node representation in heterophilic graphs, aggregating all high-order neighborhoods without filtering can lead to compromise information quality and increase the risk of over-smoothing.On the other hand, thePotential neighbor discoverymethod lacks the ability to learn node structures, resulting in an insufficient utilization of graph topology during the process of potential neighborhood discovery and a disconnection between information aggregation and the graph structure.Given these considerations, the key to this module’s design is the targeted selection of distant homophily information in the graph topology guided by nodes’structural information.The specific design is as follows:

The structural encoding mechanism focuses on embedding the topological attributes of all nodes within a connected component relative to the central node.Simultaneously,it uses high-order random walk transition probabilities originating from the central node to selectively identify highly homophilic distant nodes.A virtual high-speed link is established between these selected nodes and the central node,creating a direct connected subgraph.Within this subgraph,message passing based on attention mechanisms enables the fusion of both topological roles and distant homophily information resulting in spatial embeddings.

Structural Encoding:This module aims to describe the topological information of a central node’s local neighborhood in the attribute vectors.As shown in the Fig.3,S1 within the connected component exhibits tighter internal connections, while S2 plays a crucial role in connecting the two branches.Consequently,they assume distinct structural roles.However,traditional graph neural network models based solely on simple message passing mechanisms struggle to effectively differentiate between them.

If only first-order neighborhood nodes are used for message passing between S1 and S2, they would yield the same node embedding results.However,by encoding structural information into the attribute vectors,such as using the shortest path length(SE)as an example,where the color of nodes reflects the distance from the central node based on the shortest path length, it becomes possible to express the structural role of the central node based on the topological information provided by background nodes.This encoding of structural information allows for node classification and differentiation.

Figure 3:Structural encoding diagram

In this paper,the shortest path length and random walk transition probability are used to describe the node role structure.The specific definition is as follows:

The above nodeuis the central node, the nodevis the background node inu’sconnected component,andAR=(A1,A2,...,AR)is the R-order adjacency matrix set ofu’sconnected component,reflecting the correlation between nodes of different orders;gspandgrw(W)are used to obtain the shortest path distance between the nodes and the migration probability of the higher-order random walks,these are concatenated to obtainζ(v|u)the structural information of the nodevwith respect to the central nodeu.The specific definitions are as follows:

As shown above,gspcharacterizes the shortest path distance from nodevto nodeuby one-hot coding, and sets a thresholdmax_spto limit the dimension of the embedding vector;grwcalculates the higher-order random walk migration probability between nodesuandvbased on the random walk matrixWin an order-by-order manner; In this paper,εis defined as the radius of effective message transmission, which is used as a hyperparameter to predefine the farthest hop of effective message transmission in the message transmission process,and thus define the higher-order random walk intervaln∈[ε,2ε],so as to embed the higher-order random walk migration probability vector ofε+1 dimensions for nodev.

Definition and Description of Hyperparameters:The abovemax_spandnare hyperparameters,which can be dynamically adjusted according to the a priori information, and are defined asεand[ε,2ε]which are analyzed as follows:

(1) According to the definition ofε, it can be seen that the distance between the neighborhood nodes and the central node within the connectivity component, so the shortest path length between nodesuandvis less than or equal toε,and therefore max_sp=ε;

(2)According to the definition of random walk,it can be seen that the low-order random walk is restricted by the distribution of proximal nodes,more inclined to assign higher migration probability to the proximal nodes,there is a numerical instability phenomenon,and the process of the walk is not converged to take care of the distal nodes and the lack of global information description.

At the same time, whenε≤nthe random walk will traverse all the background nodes in the connected component,in order to make the distal nodes be fully described by the random walk and reduce the influence of the unstable value of the low-order random walk, so as to ensure that the information expressed is focused on the structural similarity between the distal nodes and the central nodes,so this paper will set the hyper-parameter ton∈[ε,2ε]initially.

Distal Homophilic Subgraph Samplingaims at mining homophilic information relative to the distal end of the central node, and guarantees the quality of homophilic information with the orientation of random walk migration probability.In this paper, we design the sampling mechanism of distal nodes, which is different from the aggregation of distal nodes without screening in the high-order neighbor mixing method,and we utilize a higher-order migration probability measure of homophily between nodes based on random walks originating from a central node.Using this as a basis,it ranks distant nodes and selectively prioritizes those with higher migration probabilities,indicating stronger similarity which ensures the quality of information sampled from nodes.

As shown in Fig.4, The article establishes the distal node threshold asη.It considers neighborhoods that are beyondηhops from the central nodenodeseedsto be distal nodes.Thekth-order random walk migration probabilitys koriginating fromnodeseedsis then employed to gauge its potential similarity with these distal nodes.The definition is outlined as follows:skrepresents the migration probability vector ofnodeseedswith respect to the other nodes afterkthorder random walk migration.To ensure convergence of the random walk process and comprehensive access to distal nodes,aggregation of migration probabilities is applied only to distal nodes where the distanced(u,v)≥ηand falls within the range of[ε,2ε].This aggregated migration probability results in the distal node structural similarity denoted asS.The definition is presented as follows:

The algorithm aggregates migration probabilities only for the higher-order random walk segment,specifically within the range ofεto 2ε,ensuring a comprehensive description of structural information for distal nodes with respect toS, Subsequently, sampling is conducted based on the priority of structural similarity, where the sampling process for distal nodes is denoted astopk(S), involving the selection of the topkdistal nodes ranked byS.To maintain the quality of selected distal node information,a hyperparameterSample_Accountis introduced to determine the sampling proportion of distal nodes and to limit the number ofk.

Virtual High-Speed Linkaims to redefine the selection methodNfarfor neighboring nodes based on the structural similarityS.Simultaneously,the consideration of distal homophilic subgraphs focuses on higher-order information to reduce the interference from initial-stage random walks favoring nearby nodes.Therefore,higher-order migration probabilities are employed as the metric for structural similarity,offering enhanced numerical stability.Given that most nodes in the distant and central node lack direct connections, the establishment of a “virtual direct high-speed link”between distal nodes andnodeseedsis necessary to construct a distal homophilic subgraph denoted asand it enhances information transmission efficiency.The definition is outlined as follows:

Distal Spatial Adaptive Message Passing:As mentioned earlier,using the virtual high-speed link,a shallow subgraph structure has been established between the central node and the distal nodes.Therefore, this article adopts a single-layer graph attention mechanism to complete the embedding of distal information,detailed as follows:

MLP refers to a single-layer feedforward neural network.It takes the linear transformation results of the feature vectors of nodesiandjas input, generating importance coefficients, denoted aseij.Subsequently, applyingsoftmaxyields the attention coefficients,aijbetween the nodes.This process facilitates information fusion within the distal homophilic subgraph based on an attention mechanism.To summarize,the message passing enables the spatial embedding results,denoted aswhich combine attribute information of distal nodes with their structural roles in the graph.

4.2.2 Proximal Frequency Adaptive Embedding Module

Highly heterophilic nodes often have direct neighbors in different categories.However, graph neural networks based on the homophilic assumption lead to node representations being smoothed by the heterophilic information from neighboring nodes through low-pass filtering,thereby reducing the discriminative power of node representations.To address this issue,it is crucial to design high-pass filters that capture the differences between node representations and the heterophilic information from their neighborhoods and design low-pass filters ensure that common information among neighboring nodes is adequately learned.

In this study, an adaptive filter is defined to separate the low and high-frequency components contained in node features.Leveraging the prior information of similarity between attribute vectors of distal nodes,a frequency-domain guided attention mechanism is introduced to learn how to aggregate high-frequency and low-frequency signals in the graph, achieving adaptive message aggregation.Thereby we can obtain the result of the frequency domain adaptive embedding of the proximal node.

Adaptive Filter:The adaptive filter is employed to separate the low-frequency and high-frequency components of node features.Drawing on graph signal processing theory and inspired by the relevant definitions in GCN [3], the forms of low-frequency and high-frequency convolutional kernels are defined as follows:

β∈[0,1]serves as an adaptive filter coefficient,allowing for the adjustment of the scaling degree of convolutional kernels for different frequency bands.To investigate the effects of convolutional kernels on signals in different frequency domains while avoiding the influence of numerical value,we define the second-order norm functionsfor the convolutional kernels,as illustrated in Fig.5.

Figure 5:Filter effect diagram

As shown in the above illustration, whenλ >1,whenλ <<1,Therefore,the high-frequency convolutional kernel designed in this study amplifies the high-frequency components of the graph signal more effectively while suppressing the low-frequency components compared to the GCN convolutional kernel.Similarly,the low-frequency convolutional kernel designed in this study has advantages in amplifying low-frequency signals and suppressing high-frequency signals compared to GCN.

According to Fourier transform theory,in the spatial domain,the convolution of a filter with a signal, denoted asf∗G x, is essentially equivalent to performing element-wise multiplication in the spectral domain after Fourier transforming both the filter and the signal, followed by an inverse Fourier transform to bring the result back to the spatial domain.Based on Eq.(1), the following definitions can be derived:

Therefore, by substituting the defined high-frequency and low-frequency convolutional kernels from this paper into the above equation, we can derive the forms of the adaptive filter for highfrequency and low-frequency components as follows:

According to the above equation,the low-pass filter designed in this study essentially aggregates node and neighborhood features in a specific proportion, leading to a gradual convergence of node representations.On the other hand,the high-pass filter amplifies the differences between nodes and their neighborhoods,resulting in high-frequency representations that differ from the neighborhood.Both the low-pass and high-pass filters are collectively defined as the adaptive filter in this study.They operate on node features, amplifying either the commonality or distinctiveness between nodes and their neighborhoods,thus capturing intrinsic high-frequency and low-frequency information in node representations.

Frequency-Domain Aggregation Mechanism:Considering that the heterophily between each node and its neighborhood varies,embedding solely based on fixed-frequency domain information for all nodes in the graph could distort node representations.Therefore, learning the fusion of node highfrequency and low-frequency representations is essential to achieve adaptive embedding results in the frequency domain.The frequency-domain aggregation mechanism designed in this study is presented as follows:

Defining the node features asH= {h1,h2,···,hN} ∈ RN×F, and settingandas parameters for adaptive learning of the fusion proportions of node high-frequency and low-frequency representations,thereby we obtain the frequency-domain embedding result for nodes,denoted as.By simplifying the calculation process based on Eqs.(25) and (26), the final form of the frequencydomain aggregation mechanism is as follows:

Frequency-Domain Guided Attention Mechanism:FAGCN emphasizes the necessity for prior knowledge,such as homophily information,to guide the selection of high-frequency or low-frequency signals during the message-passing process [24].This requirement presents challenges for semisupervised learning models and underscores the need for a mechanism that can adequately perceive graph homophily and employ it to guide the adaptive aggregation of high-frequency and low-frequency signals.Hence, in this study, a Frequency-Domain Guided Attention Mechanism is introduced,as described below:

To ensure thateffectively captures the actual demands of nodes for high-frequency and lowfrequency information, a frequency-domain guided informationprior, is defined in this work.This prior information is introduced to assist the attention mechanism,denoted asFAttention,in learning the graph signal’s frequency-domain preference,.The specifics are detailed as follows:

In this study,the average attribute vector similaritysimibetween nodeiand its homophilic distal nodes is employed as the basis for measuring its high-frequency preference.To ensure normalization and numerical stability,simallis defined to calculate the mean of all nodesimivalues.Thus, the frequency-domain guided informationprioris defined as follows:

In heterophilic graph,when node’ssimiis larger,its homophily with proximal neighbors decreases.Therefore, the process of embedding neighborhood information should focus on learning highfrequency information that amplifies the differences between the node and its proximal neighbors,i.e.,Thus,whensimi > simall,priori <0 and providing negative guidance for node embedding,leadingwG

ijto take smaller values,emphasizing the learning of high-frequency information from proximal neighbors.Conversely, whenpriori >0, it encouragesto take larger values,leaning towards learning the low-frequency commonality from proximal neighbors.In summary,the Frequency-Domain Guided Attention Mechanism,denoted asFAttention,is defined as follows:

As shown above,the concatenation operation||allows the attention mechanism to consider both the node itself and its neighboring nodes’feature information.The MLP facilitates the transition of the concatenated vector dimension from 2F to 1.Additionally, the introduction ofpriorenables the utilization of the prior knowledge fromsimto guide the learning of frequency-domain preferences during message propagation within proximal neighborhoods.Thetanh()activation ensures that∈[-1,1],supporting the learning of inter-node distinctiveness information during the message-passing process.

Proximal Frequency-Domain Adaptive Message Passing:Drawing inspiration from the design of FAGCN[24],this study employs the linear transformation of input features as the initial information for each layer.This separation of self-information and neighborhood information retains the original node features during the message passing process,alleviating the impact of excessive smoothing.Based on the node representation process defined by Eq.(31), the proximal frequency-domain adaptive message passing in this study is formulated as follows:

The described equation shows the weight parametersW1andW2, whereFlrepresents the dimension of the hidden layer, anddenotes the proximal frequency-domain adaptive node embedding result.From this analysis,we can infer that the time complexity for a single-layer message passing process isO((N+|E|)×Fl).

4.3 Nodes Classification

The study concatenates the proximal frequency-domain embeddingwith the distant spatialdomain embeddingto obtain a node representation that combines both spatial and frequencydomain adaptiveness.Subsequently,it undergoes a fully connected layer(FC)to transform the vector dimension to match the node classification dimension.Finally, theSoftmaxfunction is applied to output the classification result, and the model is trained using the cross-entropy loss function.The overall process is outlined as follows:

5 Experiments

5.1 Datasets

In this paper,we deploy the SFA-HGNN on six publicly available heterogeneous graph datasets,which are described as shown in Table 1.

Table 1: Statistics and properties of benchmark datasets with heterophily

TheChameleon(Cha) andSquirrel(Squ) datasets consist of subgraphs extracted from the“Wikipedia” web pages.In these datasets, nodes represent web pages related to specific topics,while edges represent the mutual connections between these pages.Node features correspond to the information nouns found on these web pages,and the web pages are classified into 5 classes based on their monthly visitation rates.

TheFilmdataset is a subgraph extracted from the movie-director-actor-author relationship network.Where each node corresponds to an actor, the edge between two nodes represents both appearing on the same Wikipedia page at the same time, and the node features represent keywords on the Wikipedia page,and all the nodes are classified into 5 classes based on the type of participant.

Cornell(Cor),Texas(Tex)andWisconsin(Wis):the above datasets are three subsets of the WebKB dataset collected at CMU,which represent links between the corresponding university web pages.In these networks,nodes represent web pages,edges are hyperlinks between them,and node features are bag-of-words representations of web pages,while all the nodes mentioned above are classified into five classes:students,programs,courses,staff,and faculty.

5.2 Baselines

In this paper,we refer to the framework outlined by Zheng et al.[5].For heterophilic graph neural networks and select a set of representative models from various classes as the baseline networks,which are classified as follows:

Firstly, the concept of“Non-local Neighbor Extension”refers to the approach of high-order neighbor mixing or potential neighbor discovery to identify same-class nodes in the heterophilic graph that may not be directly connected to the central node but share high structural similarity.This helps introduce high homophilic information to the central node, thereby improving the quality of node representations.The core idea of the distal node spatial embedding module in SFA-HGNN is similar to this concept.To assess the practical effectiveness of this module, we have selected the following baseline models for comparative analysis:

MixHop[18]andH2GCN[19]are representatives ofhigh-order neighbor mixing,which accomplish node embedding by aggregating the information of neighboring nodes within multiple hops from the central node.

Geom-GCN[20],GPNN[23], andNode2Seq[22] are representatives ofpotential neighbor discovery.They define the similarity between distant nodes and the central node based on geometric relationships in latent space, attention scores from pointer networks, or sequence orders based on attention scores.These methods prioritize the fusion and embedding of highly similar nodes.

Secondly,Graph Neural Network Architecture Refinementis to optimize the traditional GNN message aggregation and updating mechanism in order to obtain a more distinguishable node representation.Its common methods includeAdaptive message aggregation,Ego-neighbor separation,andInter-layer combination.The enhanced model’s Adaptive message aggregation mechanism aligns with the proximal embedding module in this paper.We select the following baseline models to compare and analyze the practical effectiveness of this module.

FAGCN[24] andWRGNN[26] are representatives of Adaptive Message Aggregation, in which FAGCN defines high-pass and low-pass filters from the frequency domain perspective and learns the fusion ratio of the above filtering results; WRGNN models the relational edges of the heterophilic graph transformations from the spatial domain perspective, and aggregates the nodes with more significant link weights.

JK-Net[28] is a representative ofinter-layer combination, which takes the embeddings from the successive layers as input for final information fusion, enabling adaptive fusion learning across different neighborhood orders.

Thirdly, in this paper, we select frequency-domain and spatial-domain graph neural networks based on the homophily assumption to compare the effectiveness of our proposed model in heterophilic graphs.The specific models are as follows:

GCN[3] andSGC[34] are frequency-domain graph neural networks, whileGAT[35] andGraphSage[4]are spatial-domain graph neural networks.

Fourthly,MLPis a feedforward neural network based on node attributes,aiming to contrast the actual effects of homophily assumptions and message-passing mechanisms in heterophilic graphs[36].

5.3 Experimental Settings

Basic Parameter Settings:For training SFA-HGNN, we set the number of epochs to 500 and employ an early stopping strategy with a threshold of 200 epochs.The model parameters are adjusted based on the cross-entropy loss and accuracy of the validation set, and Adam is selected as the optimizer.The hyperparameters set for the above datasets are as follows:the hidden layer dimensions are chosen from {16, 32, 64}, learning rates are selected from {0.01, 0.005, 0.001}, dropout rates for each layer are set to {0.4, 0.5, 0.6}, and weight decay values are specified as {5E-4, 1E-4, 5E-5}.Additionally,the dataset is split into training,validation,and test sets using a ratio of 60%/20%/20%.

Key Parameter Settings:For training SFA-HGNN,we adjust the number of layers in the proximal node frequency domain embedding module to define the proximal node threshold, denoted asK∈{2,3};sets the distal node threshold asη∈{2,3};specifies the effective information propagation radius asε∈{3,4,5};sets the adaptive filter coefficient asβ∈{0.3,0.4,0.5};and defines the sample account asSample_Account∈ {2%,4%,5%,10%,20%}.The detailed rationale for these hyperparameter choices can be found in Section 5.6 of this paper.

The remaining baseline models were configured according to the parameter settings used by these papers[20,23,25,26],and others in the same experimental environment.The average predictive results were demonstrated on the test sets obtained from 10 random splits of each dataset.To ensure a fair comparison, the experimental procedures and settings were consistent with the implementation approach used by Pei et al.[20]and others.For all datasets,uniform initial feature vectors and labels were employed in the experiments.

The software environments used for the experiments in this paper are Pytorch,Pytorch Geometric,and Python 3.8.The hardware environments used are GPU RTX 3090(2 GB);CPU Intel(R)Xeon(R)Gold 6330@2.00 GHz;and RAM 80 GB.

5.4 Experiment Result

The experimental results of the model on the above six datasets are shown in Table 2, and this paper demonstrates the average accuracy and standard deviation under 10 different sets of randomized data splits.

Table 2: Experiments results

Table 3: Experiments results

Table 4: Hyper parameters set

Table 5: Hyper parameters set

As shown in Tables 2 and 3,the optimal experimental results for each dataset are shown in bold,and the experimental results are analyzed as follows:

SFA-HGNN integrates the core ideas of Adaptive Message Aggregation and Potential Neighbor Discovery, and designs adaptive filter and oriented frequency domain attention mechanism to introduce frequency domain adaptivity for Heterophilic message passing;At the same time,we design structural coding and distal homophilic subgraph sampling to embed the rich structural information represented by random wandering migration probability for the nodes,and use it as a guide to mine the distal high homophilic nodes as a supplement to the embedding,so as to obtain the node embedding results with both spatial-frequency domain adaptivity.And the model mitigates the effects of noise and over-smoothing introduced by the High-order Neighbor Mixing model’s unfiltered aggregation of distal nodes;compared with the Potential Neighbor Discovery model,it fully combines the structural role information of nodes,and avoids the problem of separating the classification of nodes from the topology of the graph.

According to the experimental results,SFA-HGNN achieves SOTA results in CorWisTexFilm;compared with the GCN withHomophily Assumption,it has an average performance enhancement of 19.66%,which proves the validity of the motivation of the design of this model;

Compared with theGNN Architecture Refinement,the advantage of this model is that it complements the homophily information of the distal nodes and filters out the effective information of the proximal nodes by combining with the frequency domain adaptivity.As a result.As a result,the model has an average performance improvement of 6.41%and 8.56%compared to WRGNN and FAGCN based on spatial or frequency domain methods alone;

Comparing withPotential Neighbor Discovery, the advantage of this model is that it closely combines the process of potential neighbor discovery with structural information and supplements the node embedding with the learning results of the frequency domain adaptivity of the proximal nodes,which results in 2.77% and 16.5% improvement compared with the models based on the sequential ordering only,GPNN and Node2seq.

Compared withHigh-order Neighbor Mixing, the advantage of this model lies in the targeted acquisition of distal nodes with high homophily through structural similarity,which reduces the risk of introducing heterophilic information noise and over-smoothing,and the frequency-domain adaptive method to aggregate the proximal neighborhood’s effective information.As a result,the experimental effect is improved by 12.97% and 9.08% compared with the methods of MixHop and H2GCN that directly aggregate the nodes of each order.

Meanwhile,GPNN achieves excellent experimental results in the Chameleon and Squirrel datasets because theHnodeof the two datasets are 0.25 and 0.22, respectively, compared with the rest of the datasets,they have slightly higher homophily and relatively dense edges;

GPNN samples the initial neighborhood nodes and generates the initial node sequences through the BFS algorithm,and then carries out further learning on the node sequences,so the mechanism will tend to learn the proximal node sequences in dense graphs, and the above two datasets can provide relatively rich proximal isomorphism information and dense edges for GPNN to train the pointer network,thus achieving better experimental results in the above datasets.

To demonstrate the effectiveness of the node embedding and classification in this study, we performed t-SNE visualization analysis on the node embedding results of the Wisconsin and Texas datasets after 100 training epochs.The following figure,Fig.6,illustrates this analysis:

As shown in the Fig.6, the SFA-HGNN model achieves distinct separation between different classes of nodes with a significant inter-class distance and a compact intra-class distance after just 100 training epochs.This layout formation indicates that SFA-HGNN effectively discriminates between different classes.

Figure 6:T-SNE results

5.5 Parameter Sensitivity Analysis

5.5.1 Selection of Network Prior Information and Hyperparameter Range

This section sets the core hyperparameter value range in combination with prior information,defined as follows:

Effective information transmission radiusε:Considering the networkdiameterreflects the longest distance between two nodes within the connected components, it is hard to represent the main transmission path in the message transmission process.Therefore, this paper uses the average value of the networkRadiusand the average shortest path lengthAspas the basis for selectingε.This can reflect the actual path length of message transmission among many nodes and ensure that the sampled subgraph contains ample information from distant nodes.The specific definition is as follows:

Distal node thresholdηand proximal node thresholdK:The number of each-order neighboring nodes originating from the central node is defined asHopNum.Then,the similarity of attribute vectors between these nodes and the central node isFsimilarity, along with the count of homophilic nodesHomoNumand the proportion of homophilic nodesHomoAccount.Based on the trends of these indicators with the change in the sampling order, they are used as the basis for selecting the abovementioned hyperparameters.The relevant definitions are as follows:

Distal node sampling ratio (Sample_Account): To achieve a relatively balanced information density for both near and distal message transmission,the distal node sampling ratio is set.Ensuring a dynamic balance between the number of nodes on the near and far sides while maintaining the quality of homophilic information.The specific definition is as follows:

In summary,the relevant experiments are carried out by taking the validation set nodes defined in the previous section,and the results of the above a priori information calculations are shown below:

As shown in Fig.7 and combined with Eq.(41), this study defines the preliminary range forεasε∈{3,4,5,6},ensuring thatεis slightly larger thanAspto ensure sufficient sampling of far-end effectiveness information.

As shown in Fig.7,the fluctuations in“HomoNum”for various datasets with respect to the order are quite noticeable.For the Wis/Tex/Cor dataset,the number of homophilic nodes reaches its peak at the 2nd order,followed by a decreasing trend.The Film/Squ/Cha dataset exhibits an initial increase followed by a decrease,with the peaks mainly concentrated at the 3rd and 4th orders.Notably,Cha is relatively more stable compared to other datasets.Given that the aforementioned datasets fall into five categories,this paper sets 20%as the threshold for determining the strength of effective information at each order.

Figure 7:Dataset prior information

As shown in Fig.7, the fluctuations in “HomoAccount”for the Wis/Tex/Cor dataset are more pronounced,but strong overall homophilic information across different orders.For the Film/Squ/Cha dataset,“HomoAccount”remains relatively stable,with a general fluctuation around 20%.

The overarching principle for setting the range of“K”and“η”is as follows:Using the proximal node threshold“K”to define the proximal neighborhood range with the relatively higher“HomoAccount”and more significant homophilic node count(“HomoNum”)to mitigate noise interference from neighboring node message propagation.

Similarly, using the distal node threshold “η” to define the far-end node range with a lower“HomoAccount”and a relatively smaller “HomoNum”, thereby constructing a far-end homophilic subgraph to enhance the quality of information aggregation.In summary,this paper tentatively selectsK∈{1,2,3}andη∈{2,3,4}.

As shown in Fig.7,nodes from Wis/Tex/Cor/Squ are primarily concentrated at the 3rd order,while nodes from the Film/Squ dataset are predominantly found at the 4th order.Considering the definitions ofHopNum,“K”and“η”,this study tentatively sets the distal node sampling ratioSample_Account∈{1%,1.5%,2%,4%,5%,10%,20%,25%} to ensure a relative balance in information density between the near and far ends.

5.5.2 Parameter Experimental Analysis

This section aims to integrate the aforementioned prior information with the intrinsic characteristics of the dataset.We design four sets of hyperparameters for each dataset, adjusting the information density of proximal and distal node connections.These sets are specifically tailored to embed information for nodes at close, intermediate,middle-distance, and far distances.The specific parameter designs are shown in Tables 4 and 5.

The effective information propagation radiusε,and the threshold for distal nodesη,are employed to constrain the upper and lower bounds of distal nodes.And sample a specific proportion of nodes bySample_Account.Hence,whenηis broader andSample_Accountis higher,the model tends to incorporate information from distal nodes into the central node embedding via spatial domain methods.

Conversely,the proximal node thresholdK,serves as an upper limit for proximal nodes.A larger value ofKimplies a broader coverage of proximal node information, leading the model to favor embedding proximal node information into the central node via frequency domain methods.

Based on the aforementioned combination of parameters, the experimental results are shown in Fig.8a.Additionally, employing the aforementioned optimal parameters, sensitivity analysis is conducted on the adaptive filter coefficientβacross various datasets,as shown in Fig.8b:

Figure 8:Sensitivity analysis

As shown in Fig.8,the optimal hyperparameters for the aforementioned datasets predominantly cluster around the mid-distance range.This suggests that embedding a balanced combination of proximal neighborhood information and distal homophily information can lead to superior representation.Specifically,the Cor/Tex/Wis datasets are particularly sensitive to changes in the primary information embedding method.Overemphasizing either proximal or distal information can result in a decline in model performance.Conversely,the experimental results for the Film dataset are relatively stable,showing low sensitivity to changes in the embedding of proximal or distal information.The Cha/Squ datasets obtain their optimal results around mid-distance embedding parameters, with the Squirrel dataset showing a preference for spatial domain methods to aggregate distal node information for optimal node representation.

As shown in Fig.8, compared to other datasets, Cor/Wis/Tex exhibit higher sensitivity to variations in the adaptive filter coefficientβ.Optimal results across datasets converge aroundβ=0.3,proving the notion that moderate scaling of the convolution kernel across different frequency bands is conducive to learning the best embedding representation.Ifβis excessively big,it may diminish the learning capability for certain frequency bands,leading to information loss.

5.6 Ablation Experiment

5.6.1 Frequency Domain Embedding Module Ablation Analysis

The aim of this section is to adapt the information fusion method in the message aggregation process by modifying Eq.(37)as follows to verify the effectiveness of frequency domain adaptivity for learning proximal nodes:

As described above,a linear transformation is applied to the concatenated vectors of dimension 2F using a feedforward neural networkMLPto obtain an information fusion scalar.The weight coefficientis defined to eliminate the“frequency domain orientation informationprior”.Additionally,andare introduced, incorporating two nonlinear activation functions to fix the edge weight coefficients positively during the message-passing process, thereby canceling the frequency domain adaptivity.

For the experiments conducted on the aforementioned datasets,the optimal parameter set defined in Section 5.6.1 is employed.The data is randomly partitioned and the experiment is repeated 10 times to obtain results,presenting the average values across all datasets.Other settings remain consistent with Section 4.3.The experimental outcomes are shown in Fig.9.

Figure 9:Frequency domain module ablation results

As shown in Fig.9,the introduction of the frequency domain orientation informationpriorincorporates the similarity between the attribute vectors of the central node and its distant neighborhood.This reflection captures the central node’s preferences for proximal and distal information.Based on the magnitude of these preference indicators,the embedding process selectively learns high-frequency and low-frequency signals from the proximal neighborhood.As a result,an average of 73.55%optimal experimental outcomes is achieved across all datasets.Compared to the sole utilization of the nonlinear activation function tanh to achieve frequency domain adaptivity in,there is a notable experimental enhancement of 2.0%.

On the other hand,andmap the information fusion weights to positive values,effectively making the message-passing process equivalent to a weighted averaging of neighborhood information.This could introduce the influence of proximal noise during the heterophilic graph neighborhood aggregation process.Consequently,when compared to the experimental results obtained by introducingprior,andexhibit reduced performance by 5.2%and 4.0%,respectively.

5.6.2 Spatial Embedding Module Ablation Analysis

To validate the effectiveness of the spatial domain embedding module in capturing distal node homophily information, a comparison is conducted among three sets of neighboring nodes: firstorder neighborhood nodes,the top 5 neighborhood nodes sampled based on attention scores from the Node2Seq model,and distal nodes selected using random walk transition probabilities.The relative homophily measure,Hnode, with respect to the central node is computed for these sets.Utilizing the optimal hyperparameters designed in Section 5.6.2, a distal homophily subgraph is constructed for all nodes in each dataset.Within this subgraph,the distal nodes associated with the central node via virtual high-speed links are used to calculateHnode.

Moreover,Zhu et al.[19]theoretically established that second-order neighbor homophily information dominates in heterophilic network nodes.Therefore,this study adopts the second-order neighbor attribute vector similarity,Fsimilaritytwohop,as a baseline.Additionally,the attribute vector similarity,Fsimilarityfar, between the central node and distal nodes is calculated.By comparing these two measures,the model’s preferred nodes are validated to possess not only homophily but also attribute vector similarity.The results of the two aforementioned experiments are presented in Fig.10.

Figure 10:Spatial domain module ablation results

As depicted in Fig.10, this study utilizes a node selection mechanism guided by random walks to model node structural similarity through transition probabilities.This mechanism filters out distal homophilic nodes distributed across the heterophilic graph.This approach,compared to node selection based on attention scores, effectively leverages the inherent topological information of the graph,leading to enhanced node selection results.Consequently,compared to 1-Hop Neighbors and Rank with Attention,the average improvement is 22%and 9.6%,respectively.

Based on the earlier discussion, second-order neighbors predominantly exhibit central node homophily information.Therefore, it serves as the baseline for highlight the relationship between the feature vectors of the selected distal nodes and the central node.As shown in Fig.10b, across all datasets,the distal nodes exhibit an improvement over second-order neighbor similarity, with an average improvement of 0.102.This experiment validates that the random walk transition probabilities originating from the central node effectively capture the varying strengths of both node homophily and attribute vector similarity between nodes.

Simultaneously,adhering to the optimal parameter settings defined in Section 5.6.1,in the Wis,Cor,and Tex datasets,theSample_Accountis set to{2%,4%,5%,10%,20%}.And compute the distal nodehomophilyfarand distal neighborhoodFsimilarityfar, for variousSample_Account.This process aims to verify the relationship between higher-order random walk transition probabilities and the similarity and homophily information of sampled nodes,as shown in Fig.11.

Figure 11:Results of correlation verification

As mentioned earlier,Sample_Accountis defined as the sampling ratio based on the priority of node high-order random walk transition probability scores.As this value increases, the structural similarity between the sampled distant nodes and the central node decreases.As shown in the figure above, bothHomophilyfarandFsimilarityfardecrease asSample_Accountincreases, and their overall trends are similar.This experimental result further confirms the hypothesis of our paper:a sampling mechanism guided by higher-order random walk transition probabilities tends to prioritize the selection of distal nodes with high attribute vector similarity and a significant homophily.Additionally,there appears to be a positive correlation between node homophily,attribute vector similarity,and the score of higher-order random walk transition probabilities(i.e.,node structural similarityS).

6 Conclusion and Future Work

This paper introduces the SFA-HGNN model to address the challenges that traditional GNNs face when applied to heterophilic graphs,specifically the issues of“missing modeling of distal nodes”and“failure of homophily assumption”.To tackle the former,SFA-HGNN employs a“distal spatial embedding module” based on higher-order random walk transition probabilities to sample and aggregate information from distal nodes with high structural similarity,thereby enhancing the model’s ability to capture distal node characteristics.To address the latter, the “proximal frequency domain embedding module”is designed to adaptively learn high and low-frequency signals from proximal nodes to fuse valuable information, reducing noise interference introduced by the failure of the homophily assumption on the low-pass filters.

The paper concludes by demonstrating the excellent performance of SFA-HGNN in heterophilic network node classification tasks, explaining the theoretical mechanisms behind hyperparameter selection and the effectiveness of each module.The positive correlation among node attribute vector similarity,node homophily,and node structural similarity is validated through experiments.However,the model still has room for improvement.For instance, the structural encoding process essentially is pre-embedding node information, and its time complexity is closely related to the complexity of network nodes and edges.Future work could focus on further improving the model in response to this challenge.

Acknowledgement:We are grateful to the editors and reviewers for their helpful suggestions, which have greatly improved this paper.

Funding Statement:This work is supported by the Fundamental Research Funds for the Central Universities(Grant No.2022JKF02039).

Author Contributions:The authors confirm contribution to the paper as follows: study conception and design: Lanze Zhang, Yijun Gu; data collection: Lanze Zhang, Jingjie Peng; analysis and interpretation of results: Lanze Zhang, Jingjie Peng; draft manuscript preparation: Lanze Zhang,Jingjie Peng.All authors reviewed the results and approved the final version of the manuscript.

Availability of Data and Materials:If the SFA-HGNN model and relevant experimental data are needed, readers are advised to contact the author, and the author will provide you with help in the first place.

Conflicts of Interest:The authors declare that they have no conflicts of interest to report regarding the present study.