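Neo4j Graph Database Study Notes
1. Installation
Reference: "Neo4j 第一篇:在Windows环境中安装Neo4j" (Neo4j Part 1: Installing Neo4j on Windows)
Following that tutorial is enough. Pay attention to the environment variable (PATH) configuration and to matching the Java version to your Neo4j version.
2. Getting Started
2.1 Neo4j Browser
The Neo4j server ships with an integrated browser. Once the neo4j service is running, you can use it to manage the graph database.
On the host running the Neo4j server, open "http://localhost:7474/"
- Default host: bolt://localhost:7687
- Default user: neo4j
- Default password: neo4j
After the first successful connection to the Neo4j server, you must reset the password.
2.2 Creating Nodes and Relationships
Enter a Cypher script and click the Play button to create nodes and relationships in the graph database:
2.2.1 Creating Nodes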
// Create a single node
CREATE (p:Person {name: '张三', age: 30, city: '北京'});
// Create several nodes in one statement
CREATE (p1:Person {name: '李四', age: 25, city: '上海'}),
       (p2:Person {name: '王五', age: 35, city: '广州'}),
       (c1:Company {name: '阿里巴巴', industry: '互联网'}),
       (c2:Company {name: '腾讯', industry: '互联网'});
2.2.2 Creating Relationships
// Create two nodes and the relationship between them in one statement
CREATE (p:Person {name: '赵六', age: 28})-[:WORKS_FOR {since: 2020}]->(c:Company {name: '字节跳动'});
// Create a relationship between nodes that already exist
MATCH (p:Person {name: '张三'}), (c:Company {name: '阿里巴巴'})
CREATE (p)-[:WORKS_FOR {position: '高级工程师', since: 2019}]->(c);
// Create a friendship (stored as a date so it can be compared later)
MATCH (p1:Person {name: '张三'}), (p2:Person {name: '李四'})
CREATE (p1)-[:FRIEND_OF {since: date('2018-01-01')}]->(p2);
2.2.3 Querying Data
// Return all nodes
MATCH (n) RETURN n;
// Return nodes with a specific label
MATCH (p:Person) RETURN p;
// Return data along a relationship
MATCH (p:Person)-[r:WORKS_FOR]->(c:Company)
RETURN p.name, r.position, c.name;
// Filter with a condition
MATCH (p:Person)
WHERE p.age > 30
RETURN p.name, p.age;
2.3 Bulk-Importing Structured Data
2.3.1 Importing Nodes with LOAD CSV
Prepare a CSV file (persons.csv):
name,age,city,email
张三,30,北京,zhangsan@email.com
李四,25,上海,lisi@email.com
王五,35,广州,wangwu@email.com
赵六,28,深圳,zhaoliu@email.com
Import commands (the three methods are alternatives; pick one):
// Method 1: import from a local file
// (by default, file:/// URLs are resolved relative to the Neo4j import directory)
LOAD CSV WITH HEADERS FROM 'file:///persons.csv' AS row
CREATE (p:Person {
  name: row.name,
  age: toInteger(row.age),
  city: row.city,
  email: row.email
});
// Method 2: import from a URL
LOAD CSV WITH HEADERS FROM 'https://example.com/data/persons.csv' AS row
CREATE (p:Person {
  name: row.name,
  age: toInteger(row.age),
  city: row.city,
  email: row.email
});
// Method 3: use MERGE to avoid duplicates (safe to re-run)
LOAD CSV WITH HEADERS FROM 'file:///persons.csv' AS row
MERGE (p:Person {name: row.name})
SET p.age = toInteger(row.age),
    p.city = row.city,
    p.email = row.email;
Company data (companies.csv):
name,industry,founded,location
阿里巴巴,互联网,1999,杭州
腾讯,互联网,1998,深圳
字节跳动,互联网,2012,北京
华为,通信,1987,深圳
LOAD CSV WITH HEADERS FROM 'file:///companies.csv' AS row
CREATE (c:Company {
  name: row.name,
  industry: row.industry,
  founded: toInteger(row.founded),
  location: row.location
});
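Because LOAD CSV reads every field as a string and toInteger() returns null for text it cannot parse, it can help to check a file before importing it. The following is a minimal sketch, assuming the persons.csv layout shown above; the function name validate_persons_csv and the duplicate-name rule are illustrative choices, not part of Neo4j.

```python
import csv
import io

# Columns expected in persons.csv, as shown in the example file above
REQUIRED_COLUMNS = {"name", "age", "city", "email"}


def validate_persons_csv(text):
    """Return a list of problems found in persons.csv content (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        name = (row["name"] or "").strip()
        if not name:
            problems.append(f"line {line_no}: empty name")
        elif name in seen:
            problems.append(f"line {line_no}: duplicate name {name!r}")
        seen.add(name)
        try:
            int(row["age"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: age {row['age']!r} is not an integer")
    return problems


# Hypothetical sample with one bad age value
sample = (
    "name,age,city,email\n"
    "张三,30,北京,zhangsan@email.com\n"
    "李四,abc,上海,lisi@email.com\n"
)
print(validate_persons_csv(sample))
```

To check a real file, read it with open(path, encoding="utf-8").read() and pass the text to the function.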
2.3.2 Creating Relationships in Bulk
Prepare the relationship data (relationships.csv):
person_name,company_name,position,start_year
张三,阿里巴巴,高级工程师,2019
李四,腾讯,产品经理,2020
王五,华为,技术专家,2018
赵六,字节跳动,算法工程师,2021
Create the relationships:
LOAD CSV WITH HEADERS FROM 'file:///relationships.csv' AS row
MATCH (p:Person {name: row.person_name})
MATCH (c:Company {name: row.company_name})
CREATE (p)-[:WORKS_FOR {
  position: row.position,
  start_year: toInteger(row.start_year)
}]->(c);
Friendship data (friendships.csv):
person1,person2,relationship_type,since
张三,李四,同事,2019-01-01
李四,王五,朋友,2018-06-15
王五,赵六,校友,2020-03-10
LOAD CSV WITH HEADERS FROM 'file:///friendships.csv' AS row
MATCH (p1:Person {name: row.person1})
MATCH (p2:Person {name: row.person2})
CREATE (p1)-[:FRIEND_OF {
  type: row.relationship_type,
  since: date(row.since)
}]->(p2);
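One thing to watch with the MATCH ... CREATE pattern above: when a row names a person or company that does not exist as a node, MATCH finds nothing and the row is skipped without an error. A rough pre-check, sketched under the assumption that the node names are available as plain lists (the helper name unmatched_rows is hypothetical):

```python
import csv
import io


def unmatched_rows(relationships_csv, person_names, company_names):
    """Return (person, company) pairs that reference a node name that does not exist."""
    persons, companies = set(person_names), set(company_names)
    unmatched = []
    for row in csv.DictReader(io.StringIO(relationships_csv)):
        if row["person_name"] not in persons or row["company_name"] not in companies:
            unmatched.append((row["person_name"], row["company_name"]))
    return unmatched


relationships_csv = (
    "person_name,company_name,position,start_year\n"
    "张三,阿里巴巴,高级工程师,2019\n"
    "钱七,华为,技术专家,2018\n"  # hypothetical row: no Person named 钱七 was imported
)
print(unmatched_rows(relationships_csv, ["张三", "李四"], ["阿里巴巴", "华为"]))
```

Any pair it reports would produce no relationship during the import.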
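2.3.3 Removing Duplicate Nodes and Relationships
// Delete duplicate Person nodes, keeping one node per name.
// DETACH DELETE is required: a plain DELETE fails on nodes that still have relationships.
MATCH (p:Person)
WITH p.name AS name, collect(p) AS persons
WHERE size(persons) > 1
UNWIND tail(persons) AS duplicate
DETACH DELETE duplicate;
// Delete duplicate WORKS_FOR relationships between the same person and company
MATCH (p1:Person)-[r:WORKS_FOR]->(c:Company)
WITH p1, c, collect(r) AS rels
WHERE size(rels) > 1
UNWIND tail(rels) AS duplicate_rel
DELETE duplicate_rel;
// Delete isolated nodes (nodes without any relationship)
MATCH (n)
WHERE NOT (n)--()
DELETE n;
// Remove all data (irreversible)
MATCH (n)
DETACH DELETE n;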
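The collect() / tail() idiom used for deduplication groups items by a key, keeps the first item of each group, and marks the rest for deletion. The same logic in plain Python, as an illustrative sketch (split_duplicates is a hypothetical helper; in Cypher the order inside collect() is not guaranteed unless you sort first, so which node survives is arbitrary):

```python
from collections import defaultdict


def split_duplicates(nodes, key):
    """Group nodes by key; return (kept, duplicates) like head/tail of collect()."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node[key]].append(node)
    kept, duplicates = [], []
    for members in groups.values():
        kept.append(members[0])         # the node that survives
        duplicates.extend(members[1:])  # corresponds to tail(persons)
    return kept, duplicates


people = [
    {"id": 1, "name": "张三"},
    {"id": 2, "name": "李四"},
    {"id": 3, "name": "张三"},  # duplicate name
]
kept, duplicates = split_duplicates(people, "name")
print([n["id"] for n in kept], [n["id"] for n in duplicates])
```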
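2.3.4 Visualizing the Whole Graph
// Show all nodes that have relationships, together with those relationships
MATCH (n)-[r]->(m)
RETURN n, r, m;
// Show the network around one person up to a given depth
MATCH path = (p:Person)-[*1..2]-(connected)
WHERE p.name = '张三'
RETURN path;
// Show the employees of a company
MATCH (c:Company)<-[r:WORKS_FOR]-(p:Person)
WHERE c.name = '阿里巴巴'
RETURN c, r, p;
// Show a person's circle of friends
MATCH path = (p:Person)-[:FRIEND_OF*1..3]-(friend:Person)
WHERE p.name = '张三'
RETURN path;
2.3.5 Better Visualization with NeoVis
NeoVis.js is a Neo4j graph visualization library built on vis.js.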
HTML example (the configuration uses the NeoVis.js 2.x option names, matching the 2.0.2 script loaded below):
<!DOCTYPE html>
<html>
<head>
  <title>Neo4j Visualization</title>
  <script src="https://unpkg.com/neovis.js@2.0.2"></script>
  <style>
    #viz {
      width: 100%;
      height: 700px;
      border: 1px solid lightgray;
      font: 22pt arial;
    }
  </style>
</head>
<body>
  <div id="viz"></div>
  <script>
    // NeoVis.js 2.x: camelCase keys, connection settings nested under "neo4j"
    var config = {
      containerId: "viz",
      neo4j: {
        serverUrl: "bolt://localhost:7687",
        serverUser: "neo4j",
        serverPassword: "your_password"
      },
      labels: {
        Person: {
          label: "name",  // property shown as the node caption
          value: "age",   // property that drives node size
          group: "city"   // property that drives node color
        },
        Company: {
          label: "name",
          value: "founded",
          group: "industry"
        }
      },
      relationships: {
        WORKS_FOR: {
          label: "position",   // property shown on the edge
          value: "start_year"  // property that drives edge thickness
        }
      },
      initialCypher: "MATCH (n)-[r]-(m) RETURN n,r,m"
    };
    var viz = new NeoVis.default(config);
    viz.render();
  </script>
</body>
</html>
3. Connecting to Neo4j from Python
3.1 Installing the Python Driver
pip install neo4j
pip install pandas      # data handling
pip install matplotlib  # plotting
pip install networkx    # graph layout, used in section 3.6
3.2 Basic Connection
from neo4j import GraphDatabase
import pandas as pd


class Neo4jConnection:
    def __init__(self, uri, user, password):
        self.driver = GraphDatabase.driver(uri, auth=(user, password))

    def close(self):
        self.driver.close()

    def query(self, query, parameters=None, db=None):
        """Run a query and return a list of records, or None if the query failed."""
        assert self.driver is not None, "Driver not initialized!"
        session = None
        response = None
        try:
            session = self.driver.session(database=db) if db is not None else self.driver.session()
            response = list(session.run(query, parameters))
        except Exception as e:
            print("Query failed:", e)
        finally:
            if session is not None:
                session.close()
        return response


# Open the connection
conn = Neo4jConnection(uri="bolt://localhost:7687",
                       user="neo4j",
                       password="your_password")
3.3 Basic CRUD Operations
3.3.1 Creating Data
def create_person(conn, name, age, city):
    """Create a Person node."""
    query = """
    CREATE (p:Person {name: $name, age: $age, city: $city})
    RETURN p
    """
    parameters = {"name": name, "age": age, "city": city}
    result = conn.query(query, parameters)
    return result


def create_company(conn, name, industry, founded):
    """Create a Company node."""
    query = """
    CREATE (c:Company {name: $name, industry: $industry, founded: $founded})
    RETURN c
    """
    parameters = {"name": name, "industry": industry, "founded": founded}
    result = conn.query(query, parameters)
    return result


def create_work_relationship(conn, person_name, company_name, position, start_year):
    """Create a WORKS_FOR relationship between an existing person and company."""
    query = """
    MATCH (p:Person {name: $person_name})
    MATCH (c:Company {name: $company_name})
    CREATE (p)-[:WORKS_FOR {position: $position, start_year: $start_year}]->(c)
    RETURN p, c
    """
    parameters = {
        "person_name": person_name,
        "company_name": company_name,
        "position": position,
        "start_year": start_year
    }
    result = conn.query(query, parameters)
    return result


# Usage example
create_person(conn, "Python张", 28, "北京")
create_company(conn, "Python公司", "软件开发", 2020)
create_work_relationship(conn, "Python张", "Python公司", "Python开发工程师", 2021)
3.3.2 Querying Data
def get_all_persons(conn):
    """Return all persons as a DataFrame."""
    query = "MATCH (p:Person) RETURN p.name as name, p.age as age, p.city as city"
    results = conn.query(query)
    # Convert the records to a DataFrame (conn.query returns None on failure)
    data = []
    for record in results or []:
        data.append({
            'name': record['name'],
            'age': record['age'],
            'city': record['city']
        })
    return pd.DataFrame(data)


def get_person_company_relationships(conn):
    """Return person-company employment relationships as a DataFrame."""
    query = """
    MATCH (p:Person)-[r:WORKS_FOR]->(c:Company)
    RETURN p.name as person_name,
           r.position as position,
           r.start_year as start_year,
           c.name as company_name,
           c.industry as industry
    """
    results = conn.query(query)
    data = []
    for record in results or []:
        data.append({
            'person_name': record['person_name'],
            'position': record['position'],
            'start_year': record['start_year'],
            'company_name': record['company_name'],
            'industry': record['industry']
        })
    return pd.DataFrame(data)


def find_colleagues(conn, person_name):
    """Return the names of people who work at the same company."""
    query = """
    MATCH (p:Person {name: $person_name})-[:WORKS_FOR]->(c:Company)<-[:WORKS_FOR]-(colleague:Person)
    WHERE colleague.name <> $person_name
    RETURN colleague.name as colleague_name, c.name as company_name
    """
    parameters = {"person_name": person_name}
    results = conn.query(query, parameters)
    colleagues = [record['colleague_name'] for record in results or []]
    return colleagues


# Usage example
persons_df = get_all_persons(conn)
print(persons_df)
relationships_df = get_person_company_relationships(conn)
print(relationships_df)
colleagues = find_colleagues(conn, "张三")
print(f"Colleagues of 张三: {colleagues}")
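The query functions above copy each field into a dict by hand. Since the driver's Record objects offer a data() method that returns the record as a dict, the conversion can be written once. A small sketch, assuming query() returns None on failure as in section 3.2; FakeRecord is a hypothetical stand-in so the example runs without a database:

```python
def records_to_rows(results):
    """Convert query results to a list of plain dicts; a failed query (None) gives []."""
    if results is None:
        return []
    return [record.data() for record in results]


class FakeRecord:
    """Hypothetical stand-in for neo4j.Record, used only for this demonstration."""

    def __init__(self, **fields):
        self._fields = fields

    def data(self):
        return dict(self._fields)


rows = records_to_rows([FakeRecord(name="张三", age=30), FakeRecord(name="李四", age=25)])
print(rows)
print(records_to_rows(None))
```

With a live connection, pd.DataFrame(records_to_rows(conn.query(query))) would then produce the same table as get_all_persons.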
3.3.3 Updating Data
def update_person_age(conn, name, new_age):
    """Update a person's age."""
    query = """
    MATCH (p:Person {name: $name})
    SET p.age = $new_age
    RETURN p.name as name, p.age as age
    """
    parameters = {"name": name, "new_age": new_age}
    result = conn.query(query, parameters)
    return result


def update_work_position(conn, person_name, company_name, new_position):
    """Update the position stored on a WORKS_FOR relationship."""
    query = """
    MATCH (p:Person {name: $person_name})-[r:WORKS_FOR]->(c:Company {name: $company_name})
    SET r.position = $new_position
    RETURN p.name as person_name, r.position as position, c.name as company_name
    """
    parameters = {
        "person_name": person_name,
        "company_name": company_name,
        "new_position": new_position
    }
    result = conn.query(query, parameters)
    return result


# Usage example
update_person_age(conn, "张三", 31)
update_work_position(conn, "张三", "阿里巴巴", "资深工程师")
3.3.4 Deleting Data
def delete_person(conn, name):
    """Delete a person together with all of its relationships."""
    query = """
    MATCH (p:Person {name: $name})
    DETACH DELETE p
    """
    parameters = {"name": name}
    result = conn.query(query, parameters)
    return result


def delete_work_relationship(conn, person_name, company_name):
    """Delete the WORKS_FOR relationship between a person and a company."""
    query = """
    MATCH (p:Person {name: $person_name})-[r:WORKS_FOR]->(c:Company {name: $company_name})
    DELETE r
    """
    parameters = {"person_name": person_name, "company_name": company_name}
    result = conn.query(query, parameters)
    return result


def clear_all_data(conn):
    """Delete every node and relationship (irreversible)."""
    query = "MATCH (n) DETACH DELETE n"
    result = conn.query(query)
    return result
3.4 Batch Operations
3.4.1 Batch Import
def batch_create_persons(conn, persons_data):
    """Create many Person nodes in a single query."""
    query = """
    UNWIND $persons as person
    CREATE (p:Person {
        name: person.name,
        age: person.age,
        city: person.city,
        email: person.email
    })
    """
    parameters = {"persons": persons_data}
    result = conn.query(query, parameters)
    return result


def batch_create_relationships(conn, relationships_data):
    """Create many WORKS_FOR relationships in a single query."""
    query = """
    UNWIND $relationships as rel
    MATCH (p:Person {name: rel.person_name})
    MATCH (c:Company {name: rel.company_name})
    CREATE (p)-[:WORKS_FOR {
        position: rel.position,
        start_year: rel.start_year
    }]->(c)
    """
    parameters = {"relationships": relationships_data}
    result = conn.query(query, parameters)
    return result


# Usage example
persons_data = [
    {"name": "批量用户1", "age": 25, "city": "北京", "email": "user1@example.com"},
    {"name": "批量用户2", "age": 30, "city": "上海", "email": "user2@example.com"},
    {"name": "批量用户3", "age": 35, "city": "广州", "email": "user3@example.com"}
]
relationships_data = [
    {"person_name": "批量用户1", "company_name": "阿里巴巴", "position": "工程师", "start_year": 2020},
    {"person_name": "批量用户2", "company_name": "腾讯", "position": "产品经理", "start_year": 2019}
]
batch_create_persons(conn, persons_data)
batch_create_relationships(conn, relationships_data)
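Sending every row in one UNWIND works for small lists, but a very large list means one very large transaction. A common workaround is to split the data into fixed-size chunks and run the same query once per chunk. The sketch below assumes a connection object with the query(query, parameters) method from section 3.2; the batch size of 1000 is an arbitrary starting point, and RecordingConnection is a hypothetical fake used only so the example runs without a server.

```python
def chunked(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_create_persons_chunked(conn, persons_data, batch_size=1000):
    """Run the UNWIND import once per chunk; return the number of batches sent."""
    query = """
    UNWIND $persons AS person
    CREATE (p:Person {name: person.name, age: person.age,
                      city: person.city, email: person.email})
    """
    batches = 0
    for chunk in chunked(persons_data, batch_size):
        conn.query(query, {"persons": chunk})
        batches += 1
    return batches


class RecordingConnection:
    """Hypothetical fake connection that only records the parameters it receives."""

    def __init__(self):
        self.calls = []

    def query(self, query, parameters=None):
        self.calls.append(parameters)
        return []


fake = RecordingConnection()
people = [{"name": f"user{i}", "age": 20 + i, "city": "北京", "email": None} for i in range(5)]
print(batch_create_persons_chunked(fake, people, batch_size=2))  # 5 rows in chunks of 2
```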
3.4.2 Importing from a DataFrame
def dataframe_to_neo4j(conn, df, node_label, id_column):
    """Import the rows of a DataFrame into Neo4j as nodes."""
    # Convert the DataFrame to a list of dicts
    records = df.to_dict('records')
    # Build the Cypher query dynamically.
    # node_label and the column names are interpolated as text, so they must come from trusted code.
    properties = ', '.join([f"{col}: row.{col}" for col in df.columns])
    query = f"""
    UNWIND $records as row
    MERGE (n:{node_label} {{{id_column}: row.{id_column}}})
    SET n += {{
        {properties}
    }}
    """
    parameters = {"records": records}
    result = conn.query(query, parameters)
    return result


# Usage example
# Assume a DataFrame that holds employee information
employee_df = pd.DataFrame({
    'employee_id': [1, 2, 3],
    'name': ['员工A', '员工B', '员工C'],
    'department': ['技术部', '产品部', '市场部'],
    'salary': [50000, 60000, 55000]
})
dataframe_to_neo4j(conn, employee_df, 'Employee', 'employee_id')
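Labels and property names cannot be passed as query parameters, which is why the function above builds them into the query text. If those names ever come from user input or from arbitrary column headers, checking them first guards against malformed or injected Cypher. One possible check, as a sketch: safe_identifier is a hypothetical helper, and the ASCII-only rule is a deliberately strict assumption rather than the full set of names Cypher accepts.

```python
import re

# Strict assumption: ASCII letters, digits and underscores, not starting with a digit
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def safe_identifier(name):
    """Return name unchanged if it is a simple identifier, otherwise raise ValueError."""
    if not isinstance(name, str) or not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsafe label or property name: {name!r}")
    return name


print(safe_identifier("Employee"))
try:
    safe_identifier("Employee) DETACH DELETE (n")
except ValueError as exc:
    print("rejected:", exc)
```

Inside dataframe_to_neo4j, the label, the id column and each column name could be passed through this check before the f-string is built.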
3.5 Graph Algorithms and Analysis
3.5.1 Path Finding
def find_shortest_path(conn, start_person, end_person):
    """Find the shortest path between two people."""
    query = """
    MATCH path = shortestPath((start:Person {name: $start_person})-[*]-(end:Person {name: $end_person}))
    RETURN path,
           length(path) as path_length,
           [node in nodes(path) | node.name] as node_names,
           [rel in relationships(path) | type(rel)] as relationship_types
    """
    parameters = {"start_person": start_person, "end_person": end_person}
    result = conn.query(query, parameters)
    return result


def find_all_paths(conn, start_person, end_person, max_depth=5):
    """Find all paths between two people, up to max_depth hops."""
    # The depth bound is interpolated into the query text, so force it to an int first
    max_depth = int(max_depth)
    query = f"""
    MATCH path = (start:Person {{name: $start_person}})-[*1..{max_depth}]-(end:Person {{name: $end_person}})
    RETURN path,
           length(path) as path_length,
           [node in nodes(path) | node.name] as node_names
    ORDER BY path_length
    """
    parameters = {"start_person": start_person, "end_person": end_person}
    result = conn.query(query, parameters)
    return result


# Usage example
shortest_path = find_shortest_path(conn, "张三", "李四")
all_paths = find_all_paths(conn, "张三", "李四", 3)
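For intuition about what a shortest-path query computes on an unweighted, undirected graph, here is a plain breadth-first search over an edge list. It is an illustrative sketch only and says nothing about how Neo4j implements shortestPath internally; the function name bfs_shortest_path is hypothetical, and the edges reuse the friendships from friendships.csv.

```python
from collections import deque


def bfs_shortest_path(edges, start, goal):
    """Return one shortest path (list of node names) from start to goal, or None."""
    graph = {}
    for a, b in edges:  # treat every edge as undirected
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph[path[-1]]):  # sorted for a deterministic result
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


friend_edges = [("张三", "李四"), ("李四", "王五"), ("王五", "赵六")]
print(bfs_shortest_path(friend_edges, "张三", "赵六"))
```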
3.5.2 Centrality Analysis
def calculate_degree_centrality(conn):
    """Compute degree centrality (number of relationships per person)."""
    query = """
    MATCH (p:Person)
    OPTIONAL MATCH (p)-[r]-()
    WITH p, count(r) as degree
    RETURN p.name as person_name, degree
    ORDER BY degree DESC
    """
    result = conn.query(query)
    return result


def find_influential_people(conn, min_connections=2):
    """Find influential people (those with many connections)."""
    query = """
    MATCH (p:Person)-[r]-()
    WITH p, count(r) as connections
    WHERE connections >= $min_connections
    RETURN p.name as person_name,
           connections,
           p.city as city
    ORDER BY connections DESC
    """
    parameters = {"min_connections": min_connections}
    result = conn.query(query, parameters)
    return result


# Usage example
degree_centrality = calculate_degree_centrality(conn)
influential_people = find_influential_people(conn, 3)
3.5.3 Community Detection
def find_communities(conn):
    """Simple community detection: group people by the company they work for."""
    query = """
    MATCH (p:Person)-[:WORKS_FOR]->(c:Company)
    WITH c.name as company, collect(p.name) as employees
    WHERE size(employees) > 1
    RETURN company, employees, size(employees) as community_size
    ORDER BY community_size DESC
    """
    result = conn.query(query)
    return result


def find_friend_circles(conn, person_name):
    """Find a person's circle of friends (friends and friends of friends)."""
    query = """
    MATCH (center:Person {name: $person_name})-[:FRIEND_OF*1..2]-(friend:Person)
    WITH center, collect(DISTINCT friend.name) as friends
    RETURN center.name as center_person, friends, size(friends) as circle_size
    """
    parameters = {"person_name": person_name}
    result = conn.query(query, parameters)
    return result


# Usage example
communities = find_communities(conn)
friend_circles = find_friend_circles(conn, "张三")
3.6 Data Visualization
import matplotlib.pyplot as plt
import networkx as nx

# Configure a font that can render Chinese text (SimHei must be installed)
plt.rcParams['font.sans-serif'] = ['SimHei']  # render Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False    # render the minus sign correctly


def visualize_network(conn, output_file='network.png'):
    """Draw the person-to-person network and save it as an image."""
    # Fetch the nodes
    nodes_query = """
    MATCH (p:Person)
    RETURN p.name as name, p.age as age, p.city as city
    """
    nodes_result = conn.query(nodes_query)
    # Fetch the relationships
    edges_query = """
    MATCH (p1:Person)-[r]-(p2:Person)
    RETURN p1.name as source, p2.name as target, type(r) as relationship_type
    """
    edges_result = conn.query(edges_query)
    # Build the NetworkX graph
    G = nx.Graph()
    # Add nodes (conn.query returns None on failure)
    for record in nodes_result or []:
        G.add_node(record['name'],
                   age=record['age'],
                   city=record['city'])
    # Add edges
    for record in edges_result or []:
        G.add_edge(record['source'],
                   record['target'],
                   relationship=record['relationship_type'])
    # Draw the graph
    plt.figure(figsize=(12, 8))
    pos = nx.spring_layout(G, k=2, iterations=50)
    # Draw nodes
    nx.draw_networkx_nodes(G, pos,
                           node_color='lightblue',
                           node_size=1000,
                           alpha=0.7)
    # Draw edges
    nx.draw_networkx_edges(G, pos,
                           edge_color='gray',
                           alpha=0.5)
    # Draw labels
    nx.draw_networkx_labels(G, pos,
                            font_size=10)
    plt.title('Social network graph')
    plt.axis('off')
    plt.tight_layout()
    plt.savefig(output_file, dpi=300, bbox_inches='tight')
    plt.show()


# Usage example
visualize_network(conn, 'social_network.png')
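3.7 Performance Optimization
3.7.1 Creating Indexes
def create_indexes(conn):
    """Create indexes and constraints to speed up lookups."""
    statements = [
        # The uniqueness constraint creates its own backing index on Person.name, so a
        # separate index on the same property is not needed (and would conflict with it).
        # It fails if duplicate names already exist, so deduplicate first.
        "CREATE CONSTRAINT person_name_unique IF NOT EXISTS FOR (p:Person) REQUIRE p.name IS UNIQUE",
        "CREATE INDEX company_name_index IF NOT EXISTS FOR (c:Company) ON (c.name)",
        "CREATE INDEX person_email_index IF NOT EXISTS FOR (p:Person) ON (p.email)"
    ]
    for statement in statements:
        # conn.query() catches exceptions itself and returns None on failure
        if conn.query(statement) is None:
            print(f"Failed to create: {statement}")
        else:
            print(f"Created: {statement}")


def show_indexes(conn):
    """List all indexes."""
    query = "SHOW INDEXES"
    result = conn.query(query)
    return result


# Usage example
create_indexes(conn)
indexes = show_indexes(conn)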
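3.7.2 Query Optimization
def optimized_person_search(conn, name_pattern):
    """Search persons by name substring, returning at most 10 rows."""
    # LIMIT keeps the result small; in Neo4j 5 a TEXT index on p.name can speed up CONTAINS
    query = """
    MATCH (p:Person)
    WHERE p.name CONTAINS $name_pattern
    RETURN p.name as name, p.age as age, p.city as city
    ORDER BY p.name
    LIMIT 10
    """
    parameters = {"name_pattern": name_pattern}
    result = conn.query(query, parameters)
    return result


def batch_query_with_explain(conn, query, parameters=None):
    """Print the execution plan and the profile of a query, then return its records."""
    with conn.driver.session() as session:
        # EXPLAIN only plans the query; the plan is exposed on the result summary
        plan = session.run(f"EXPLAIN {query}", parameters).consume().plan
        print("Execution plan:")
        print(plan)
        # PROFILE actually runs the query and records per-operator statistics
        profile = session.run(f"PROFILE {query}", parameters).consume().profile
        print("Profile:")
        print(profile)
    # Run the query itself and return its records
    actual_result = conn.query(query, parameters)
    return actual_result


# Usage example
search_result = optimized_person_search(conn, "张")
# Analyze query performance
test_query = "MATCH (p:Person)-[:WORKS_FOR]->(c:Company) RETURN p.name, c.name"
result = batch_query_with_explain(conn, test_query)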
3.8 Case Study: A Recommendation System
class RecommendationSystem:
    def __init__(self, connection):
        self.conn = connection

    def recommend_friends(self, person_name, limit=5):
        """Recommend new friends based on the number of mutual friends."""
        query = """
        MATCH (person:Person {name: $person_name})-[:FRIEND_OF]-(friend)-[:FRIEND_OF]-(recommend)
        WHERE person <> recommend
          AND NOT (person)-[:FRIEND_OF]-(recommend)
        WITH recommend, count(friend) as mutual_friends
        RETURN recommend.name as recommended_person,
               mutual_friends,
               recommend.city as city
        ORDER BY mutual_friends DESC
        LIMIT $limit
        """
        parameters = {"person_name": person_name, "limit": limit}
        result = self.conn.query(query, parameters)
        return result

    def recommend_companies(self, person_name, limit=3):
        """Recommend companies where the person's friends work."""
        query = """
        MATCH (person:Person {name: $person_name})-[:FRIEND_OF]-(friend)-[:WORKS_FOR]->(company)
        WHERE NOT (person)-[:WORKS_FOR]->(company)
        WITH company, count(friend) as friend_count
        RETURN company.name as recommended_company,
               company.industry as industry,
               friend_count
        ORDER BY friend_count DESC
        LIMIT $limit
        """
        parameters = {"person_name": person_name, "limit": limit}
        result = self.conn.query(query, parameters)
        return result

    def recommend_by_skills(self, person_name, limit=5):
        """Recommend people with similar skills (assumes skill data exists)."""
        # Assumes each Person node has a `skills` list property
        query = """
        MATCH (person:Person {name: $person_name})
        MATCH (other:Person)
        WHERE person <> other
          AND NOT (person)-[:FRIEND_OF]-(other)
        WITH person, other,
             size([skill IN person.skills WHERE skill IN other.skills]) as common_skills
        WHERE common_skills > 0
        RETURN other.name as recommended_person,
               other.city as city,
               common_skills
        ORDER BY common_skills DESC
        LIMIT $limit
        """
        parameters = {"person_name": person_name, "limit": limit}
        result = self.conn.query(query, parameters)
        return result


# Using the recommendation system
recommender = RecommendationSystem(conn)
friend_recommendations = recommender.recommend_friends("张三")
company_recommendations = recommender.recommend_companies("张三")
print("Friend recommendations:")
for rec in friend_recommendations or []:
    print(f"  Recommended: {rec['recommended_person']}, mutual friends: {rec['mutual_friends']}")
print("Company recommendations:")
for rec in company_recommendations or []:
    print(f"  Recommended: {rec['recommended_company']}, industry: {rec['industry']}")
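The mutual-friend recommendation boils down to three steps: collect the person's direct friends, walk one more hop, and count how many direct friends lead to each candidate who is not already a friend. A plain-Python sketch of that counting logic, useful for checking expectations on small data (the function name and the letter-named sample people are hypothetical):

```python
from collections import Counter


def recommend_by_mutual_friends(friendships, person, limit=5):
    """Return (candidate, mutual_friend_count) pairs, most mutual friends first."""
    friends = {}
    for a, b in friendships:  # FRIEND_OF is matched in both directions
        friends.setdefault(a, set()).add(b)
        friends.setdefault(b, set()).add(a)
    direct = friends.get(person, set())
    counts = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            # Skip the person themselves and anyone who is already a friend
            if candidate != person and candidate not in direct:
                counts[candidate] += 1
    # Sort by count descending, then by name for a stable order
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:limit]


sample_friendships = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("C", "E")]
print(recommend_by_mutual_friends(sample_friendships, "A"))
```

Here D is reachable through both B and C, while E is reachable only through C, so D ranks first.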
4. Advanced Features
4.1 Temporal Data
def create_temporal_data(conn):
    """Create and query time-bounded data."""
    # Create a relationship with start and end dates
    query = """
    MATCH (p:Person {name: '张三'}), (c:Company {name: '阿里巴巴'})
    CREATE (p)-[:WORKED_AT {
        start_date: date('2019-01-01'),
        end_date: date('2021-12-31'),
        position: '工程师'
    }]->(c)
    """
    conn.query(query)
    # Find employment periods that overlap the year 2020
    # (a missing end_date means the employment is still ongoing)
    query = """
    MATCH (p:Person)-[r:WORKED_AT]->(c:Company)
    WHERE r.start_date <= date('2020-12-31')
      AND (r.end_date IS NULL OR r.end_date >= date('2020-01-01'))
    RETURN p.name, c.name, r.position, r.start_date, r.end_date
    """
    result = conn.query(query)
    return result
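The WHERE clause in the time-range query is the standard interval-overlap test: two periods intersect when each one starts before the other ends. Written as a standalone Python check (a sketch; the helper name overlaps is hypothetical, and None stands for a missing end_date):

```python
from datetime import date


def overlaps(start, end, window_start, window_end):
    """True if [start, end] intersects [window_start, window_end]; end=None means ongoing."""
    return start <= window_end and (end is None or end >= window_start)


year_2020 = (date(2020, 1, 1), date(2020, 12, 31))
# The employment created above: 2019-01-01 to 2021-12-31
print(overlaps(date(2019, 1, 1), date(2021, 12, 31), *year_2020))
# A hypothetical job that ended before 2020
print(overlaps(date(2018, 1, 1), date(2019, 12, 31), *year_2020))
```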
4.2 Geospatial Data
def create_spatial_data(conn):
    """Create and query geospatial data."""
    # Create a node with coordinates
    query = """
    CREATE (office:Office {
        name: '北京总部',
        location: point({latitude: 39.9042, longitude: 116.4074}),
        address: '北京市朝阳区'
    })
    """
    conn.query(query)
    # Spatial query: find nearby offices.
    # point.distance() returns meters for latitude/longitude points; the older
    # distance() function was removed in Neo4j 5.
    query = """
    MATCH (office:Office)
    WITH office,
         point.distance(office.location, point({latitude: 39.9000, longitude: 116.4000})) as dist
    WHERE dist < 10000 // within 10 km
    RETURN office.name, office.address, dist
    ORDER BY dist
    """
    result = conn.query(query)
    return result
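To sanity-check a distance threshold such as the 10 km used above, the great-circle distance between two latitude/longitude points can be estimated with the haversine formula. This is a rough sketch: the Earth radius of 6,371,000 m is an assumed mean value, so the result may differ slightly from what the database returns.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# The office location and the query point from the example above
dist = haversine_m(39.9042, 116.4074, 39.9000, 116.4000)
print(f"{dist:.0f} m, within 10 km: {dist < 10000}")
```

The two points are well under a kilometer apart, so the office passes the 10 km filter comfortably.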
4.3 Full-Text Search
def setup_fulltext_search(conn):
    """Create the full-text search index."""
    # Index the name, email and bio properties of Person nodes
    query = """
    CREATE FULLTEXT INDEX person_search IF NOT EXISTS
    FOR (p:Person)
    ON EACH [p.name, p.email, p.bio]
    """
    # conn.query() catches exceptions itself and returns None on failure
    if conn.query(query) is None:
        print("Failed to create the full-text index")
    else:
        print("Full-text index is ready")


def fulltext_search(conn, search_term):
    """Run a full-text search (search_term uses Lucene query syntax)."""
    query = """
    CALL db.index.fulltext.queryNodes('person_search', $search_term)
    YIELD node, score
    RETURN node.name as name, node.email as email, score
    ORDER BY score DESC
    LIMIT 10
    """
    parameters = {"search_term": search_term}
    result = conn.query(query, parameters)
    return result


# Using full-text search
setup_fulltext_search(conn)
search_results = fulltext_search(conn, "张三 OR engineer")
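Because the search term is interpreted as a Lucene query, characters such as `:`, `+` or `(` in raw user input can change the meaning of the search or cause a parse error. One way to treat input as literal text is to backslash-escape Lucene's special characters first. A minimal sketch (escape_lucene is a hypothetical helper; it does not neutralize the uppercase operator words AND, OR and NOT):

```python
import re

# Lucene query-parser special characters: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')


def escape_lucene(text):
    """Backslash-escape Lucene special characters so text is searched literally."""
    return _LUCENE_SPECIAL.sub(r"\\\1", text)


print(escape_lucene("C++ (senior)"))
print(escape_lucene("name:张三"))
```

The escaped string can then be passed as search_term to fulltext_search.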
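5. Best Practices
5.1 Data Modeling Principles
- Node design:
  - Use meaningful labels
  - Avoid deeply nested properties
  - Design indexes around your query patterns
- Relationship design:
  - Make relationship types explicit
  - Keep the number of relationship properties small
  - Consider direction
- Performance:
  - Create indexes on frequently queried properties
  - Use MERGE instead of CREATE when duplicates must be avoided
  - Prefer batch operations over one-at-a-time operations
5.2 Query Optimization Tips
# Good pattern: labels on the nodes let the planner use the index on Person(name),
# and parameters let Neo4j reuse the cached query plan
good_query = """
MATCH (p:Person {name: $name})-[:WORKS_FOR]->(c:Company)
WHERE c.industry = $industry
RETURN p, c
"""
# Pattern to avoid: without labels the planner cannot use the index on Person(name)
# and falls back to a much broader scan
bad_query = """
MATCH (p)-[:WORKS_FOR]->(c)
WHERE p.name = $name AND c.industry = $industry
RETURN p, c
"""
# Note: an inline property map ({name: $name}) and the equivalent WHERE predicate
# are planned the same way; the label is what makes the difference.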
5.3 Data Consistency
def safe_person_update(conn, old_name, new_name):
    """Rename a person safely inside a single transaction."""
    # Use a transaction so the check and the update succeed or fail together
    with conn.driver.session() as session:
        def update_person_name(tx, old_name, new_name):
            # Check whether the new name is already taken
            check_query = "MATCH (p:Person {name: $new_name}) RETURN count(p) as count"
            result = tx.run(check_query, new_name=new_name)
            if result.single()['count'] > 0:
                raise ValueError(f"Name {new_name} already exists")
            # Update the name
            update_query = """
            MATCH (p:Person {name: $old_name})
            SET p.name = $new_name
            RETURN p
            """
            result = tx.run(update_query, old_name=old_name, new_name=new_name)
            return result.single()

        try:
            # execute_write replaces the deprecated write_transaction in driver 5.x
            result = session.execute_write(update_person_name, old_name, new_name)
            print(f"Updated: {old_name} -> {new_name}")
            return result
        except Exception as e:
            print(f"Update failed: {e}")
            return None
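6. Summary
As a leading graph database, Neo4j has distinct advantages when working with highly connected data:
6.1 Strengths
- Intuitive data model: the graph structure is close to how relationships exist in the real world
- Efficient relationship queries: JOIN operations become graph traversals
- Flexible schema: properties and relationship types can be added dynamically
- Powerful query language: Cypher is highly expressive
6.2 Typical Use Cases
- Social network analysis
- Recommendation systems
- Fraud detection
- Knowledge graphs
- Network analysis
- Supply chain management
6.3 Caveats
- Data volume: very large datasets may require sharding
- ACID overhead: transaction processing can reduce concurrent throughput
- Memory: graph data usually needs a fairly large amount of memory
- Learning curve: Cypher takes time to learn
Remember to close the connection:
# Close the connection when you are done
conn.close()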
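To make sure close() runs even when a query raises an exception, the connection can be wrapped with contextlib.closing from the standard library, which calls close() when the with block exits. A small sketch: DemoConnection is a hypothetical stand-in that exposes the same close() method as Neo4jConnection, so the example runs without a database.

```python
from contextlib import closing


class DemoConnection:
    """Hypothetical stand-in with the same close() method as Neo4jConnection."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


def run_with_connection(conn, work):
    """Call work(conn) and always close the connection afterwards."""
    with closing(conn) as c:
        return work(c)


demo = DemoConnection()
print(run_with_connection(demo, lambda c: "done"), demo.closed)
```

With the real class this would read: with closing(Neo4jConnection(uri, user, password)) as conn: followed by the queries.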