Dans mon application, j'ai une requête qui effectue une recherche dans le tableau "fichiers".
La table "fichiers" est partitionnée par "f". "Créée" (voir la définition de la table et contient ~ 26 millions de lignes pour le client 19 ("f". "Cid = 19).
Le point ici est, si je fais cette requête:
SELECT "f"."id" AS "FileId"
, "f"."name" AS "FileName"
, "f"."year" AS "Fileyear"
, "f"."cid" AS "clientId"
, "f"."created" AS "FileDate"
, CASE WHEN ("vnVE0"."value" is not null AND "vnVE0"."value" != '')
THEN CAST("vnVE0"."value" AS decimal(28,2))
ELSE 0 END AS "keywordValueCol0_numeric"
FROM files "f"
OUTER APPLY
(
SELECT DISTINCT
VT.[value]
FROM dbo.value_number AS VT
WHERE
VT.id_file = F.id
AND VT.id_field = 260
) AS "vnVE0"
WHERE "grapado" IS NULL AND "masterversion" IS NULL AND ("f"."year" = 2013 OR "f"."year" = 0) AND "f"."cid" = 19
GROUP BY "f"."id", "f"."name", "f"."year", "f"."cid", "f"."created", CASE WHEN ("vnVE0"."value" is not null AND "vnVE0"."value" != '')
THEN CAST("vnVE0"."value" AS decimal(28,2))
ELSE 0 END
ORDER BY (SELECT NULL)
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
J'obtiens les résultats en 0 secondes, avec le plan d'exécution suivant: https://www.brentozar.com/pastetheplan/?id=SkV0-FDcG
Mais si j'essaye de commander par "nom" la requête devient trop lente:
SELECT "f"."id" AS "FileId"
, "f"."name" AS "FileName"
, "f"."year" AS "Fileyear"
, "f"."cid" AS "clientId"
, "f"."created" AS "FileDate"
, CASE WHEN ("vnVE0"."value" is not null AND "vnVE0"."value" != '')
THEN CAST("vnVE0"."value" AS decimal(28,2))
ELSE 0 END AS "keywordValueCol0_numeric"
FROM files "f"
OUTER APPLY
(
SELECT DISTINCT
VT.[value]
FROM dbo.value_number AS VT
WHERE
VT.id_file = F.id
AND VT.id_field = 260
) AS "vnVE0"
WHERE "grapado" IS NULL AND "masterversion" IS NULL AND ("f"."year" = 2013 OR "f"."year" = 0) AND "f"."cid" = 19
GROUP BY "f"."id", "f"."name", "f"."year", "f"."cid", "f"."created", CASE WHEN ("vnVE0"."value" is not null AND "vnVE0"."value" != '')
THEN CAST("vnVE0"."value" AS decimal(28,2))
ELSE 0 END
ORDER BY "f"."name"
OFFSET 0 ROWS FETCH NEXT 50 ROWS ONLY;
Cette requête prend 11 minutes pour me renvoyer les résultats, avec le plan d'exécution suivant: https://www.brentozar.com/pastetheplan/?id=Sk3Fbtv9M
De plus, si je modifie l'ordre par colonne, le résultat est le même.
Comme vous pouvez le voir dans le plan d'exécution, l'index "files_mv" a un coût de 61%, voici la définition de l'index:
CREATE NONCLUSTERED INDEX [files_mv] ON [dbo].[files]
(
[masterversion] ASC,
[year] ASC,
[cat_id] ASC,
[cid] ASC,
[eid] ASC,
[grapado] ASC,
[sub_id] ASC,
[tip_id] ASC
)
INCLUDE ( [id],
[name]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO
J'utilise SQL Server avec Azure. Plus précisément, Azure SQL Database avec tarification / niveau de modèle "S4 Estándar (200 DTU)".
Je reçois beaucoup de données, mais je pense que la connexion Internet n'est pas un goulot d'étranglement, car dans d'autres requêtes, je reçois également beaucoup de données et elles sont plus rapides.
De plus, je fais une insertion massive de données dans ces tables, et dans quelques jours, j'aurai jusqu'à ~ 240 millions de lignes dans la table des fichiers (pour un cid) et ~ 480 millions de lignes dans la table value_number.
information additionnelle
Fonction de partition "PF_files_partitioning":
CREATE PARTITION FUNCTION PF_files_partitioning (DATETIME2(7))
AS
RANGE LEFT FOR VALUES ( '2013-03-31 23:59:59',
'2013-06-30 23:59:59',
'2013-09-30 23:59:59',
'2013-12-31 23:59:59',
'2014-03-31 23:59:59',
'2014-06-30 23:59:59',
'2014-09-30 23:59:59',
'2014-12-31 23:59:59',
'2015-03-31 23:59:59',
'2015-06-30 23:59:59',
'2015-09-30 23:59:59',
'2015-12-31 23:59:59',
'2016-03-31 23:59:59',
'2016-06-30 23:59:59',
'2016-09-30 23:59:59',
'2016-12-31 23:59:59',
'2017-03-31 23:59:59',
'2017-06-30 23:59:59',
'2017-09-30 23:59:59',
'2017-12-31 23:59:59',
'2018-03-31 23:59:59')
Schéma de partition "PS_files_partitioning":
CREATE PARTITION SCHEME PS_files_partitioning AS PARTITION PF_files_partitioning ALL TO ([PRIMARY]);
** J'aurai environ 15 millions de lignes dans chaque partition.
table des fichiers:
CREATE TABLE [dbo].[files](
[id] [bigint] IDENTITY(1,1) NOT NULL,
[cid] [tinyint] NOT NULL,
[eid] [bigint] NOT NULL,
[cat_id] [bigint] NOT NULL,
[tip_id] [bigint] NULL,
[sub_id] [bigint] NULL,
[year] [smallint] NOT NULL,
[caducidad] [smallint] NULL,
[grapadopri] [int] NOT NULL,
[grapado] [bigint] NULL,
[name] [nvarchar](255) NOT NULL,
[extension] [tinyint] NOT NULL,
[size] [bigint] NOT NULL,
[id_doc] [bit] NOT NULL,
[observaciones] [nvarchar](255) NOT NULL,
[indexed] [bit] NOT NULL,
[signed] [bit] NOT NULL,
[created] [datetime2](7) NOT NULL,
[name_lower] [nvarchar](255) NOT NULL,
[modified] [datetime2](7) NULL,
[related] [bit] NOT NULL,
[masterversion] [bigint] NULL,
[versioned] [bit] NOT NULL,
[hwsignature] [tinyint] NOT NULL,
[blockedUserId] [smallint] NULL,
CONSTRAINT [PK_files_id] PRIMARY KEY CLUSTERED
(
[id] ASC,
[created] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created]),
CONSTRAINT [files$estructure_unique] UNIQUE NONCLUSTERED
(
[cat_id] ASC,
[tip_id] ASC,
[sub_id] ASC,
[year] ASC,
[name] ASC,
[grapado] ASC,
[created] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
table value_number:
CREATE TABLE [dbo].[value_number](
[id] [bigint] IDENTITY(1,1) NOT NULL,
[id_file] [bigint] NOT NULL DEFAULT ((0)),
[id_field] [bigint] NOT NULL DEFAULT ((0)),
[value] [nvarchar](255) NULL DEFAULT (NULL),
[id_doc] [bigint] NULL DEFAULT (NULL)
CONSTRAINT [PK_value_number_id] PRIMARY KEY CLUSTERED
(
[id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)
Table des index des fichiers
CREATE NONCLUSTERED INDEX [files_clientes] ON [dbo].[files]
(
[cid] ASC
)
INCLUDE ([id]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [files_grapado] ON [dbo].[files]
(
[grapado] ASC
)
INCLUDE ( [id],
[name]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [files_mv] ON [dbo].[files]
(
[masterversion] ASC,
[year] ASC,
[cat_id] ASC,
[cid] ASC,
[eid] ASC,
[grapado] ASC,
[sub_id] ASC,
[tip_id] ASC
)
INCLUDE ( [id],
[name]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [files_ocr] ON [dbo].[files]
(
[cid] ASC,
[grapado] ASC,
[indexed] ASC,
[masterversion] ASC,
[extension] ASC
)
INCLUDE ( [id],
[eid],
[cat_id],
[tip_id],
[sub_id],
[year],
[name]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [files_ocr2] ON [dbo].[files]
(
[cid] ASC,
[eid] ASC,
[grapado] ASC,
[indexed] ASC,
[masterversion] ASC,
[extension] ASC
)
INCLUDE ( [id],
[cat_id],
[tip_id],
[sub_id],
[year],
[name]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [files_ocr3] ON [dbo].[files]
(
[cid] ASC,
[cat_id] ASC,
[grapado] ASC,
[indexed] ASC,
[masterversion] ASC,
[extension] ASC
)
INCLUDE ( [eid],
[tip_id],
[sub_id],
[year],
[name]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [busqueda_name] ON [dbo].[files]
(
[cid] ASC,
[eid] ASC,
[grapado] ASC,
[year] ASC
)
INCLUDE ( [id],
[cat_id],
[tip_id],
[sub_id],
[grapadopri],
[name],
[size],
[id_doc],
[signed],
[created],
[modified],
[related],
[masterversion]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [busqueda2] ON [dbo].[files]
(
[cid] ASC,
[eid] ASC,
[cat_id] ASC,
[grapado] ASC,
[masterversion] ASC,
[year] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [cid] ON [dbo].[files]
(
[cid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [eid] ON [dbo].[files]
(
[eid] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [extension] ON [dbo].[files]
(
[extension] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [FK_files_archivo] ON [dbo].[files]
(
[grapado] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [FK_files_tipo] ON [dbo].[files]
(
[tip_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [grapadopri] ON [dbo].[files]
(
[grapadopri] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [index_all] ON [dbo].[files]
(
[cid] ASC,
[eid] ASC,
[grapado] ASC,
[masterversion] ASC
)
INCLUDE ( [cat_id],
[tip_id],
[sub_id],
[year],
[grapadopri],
[name],
[size],
[id_doc],
[signed],
[created],
[modified],
[related],
[versioned]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [missing_index_7_6] ON [dbo].[files]
(
[cid] ASC,
[eid] ASC,
[grapado] ASC,
[name] ASC,
[year] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [ocrCloudClients] ON [dbo].[files]
(
[grapado] ASC,
[indexed] ASC,
[extension] ASC
)
INCLUDE ( [cid],
[eid],
[cat_id],
[tip_id],
[sub_id]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [searchEntity] ON [dbo].[files]
(
[cid] ASC,
[eid] ASC,
[grapado] ASC,
[masterversion] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
CREATE NONCLUSTERED INDEX [sub_id] ON [dbo].[files]
(
[sub_id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])
Index de la table value_number
CREATE NONCLUSTERED INDEX [searchValues] ON [dbo].[value_number]
(
[id_field] ASC
)
INCLUDE ( [id_file],
[value]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
CREATE NONCLUSTERED INDEX [search] ON [dbo].[value_number]
(
[id_file] ASC,
[id_field] ASC
)
INCLUDE ( [value]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
CREATE NONCLUSTERED INDEX [id_field] ON [dbo].[value_number]
(
[id_field] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
CREATE NONCLUSTERED INDEX [FK_valueesN_documento] ON [dbo].[value_number]
(
[id_doc] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
CREATE NONCLUSTERED INDEX [FK_valueesN_archivo] ON [dbo].[value_number]
(
[id_file] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
Les statistiques sont à jour. J'ai changé les types de données pour l'année et d'autres colonnes et maintenant les performances semblent être un peu meilleures, mais le plan d'exécution est toujours le même. J'essaie de corriger l'estimation de cardinalité (changement d'index), mais je n'ai pas encore réussi. Selon la documentation Azure, je devrais avoir 130 niveaux de compatibilité dans la base de données et j'en ai déjà 100 avec ProductVersion 12.0.