Update Datagarage table counts to handle tables with more than 2 billion records

I'm running into more clients where there are more than 2B records in a table.   The problem is that Datagarage.dbo.dgTargetSourceTableRowCount stores the value of records in a table in an INT (limited to 2,147,483,647) so I have to change the column to a BIGINT.  Also, the SP [DataGarage].[dbo].[webService_CompletedTableDownload] needs to be updated to use COUNT_BIG(*) instead of COUNT(*), as well as other references to table count where they are stored as an INT vs. a BIGINT...and this only fixes 2/3 of the problem.

 

(related ask, while we are updating these tables can we update the datetime values to datetime2?  The data conversions are implicit so the impact should be minimal but then they tables will be to Microsoft's recommended data type and support finer granularity when it's needed)

reference:  https://docs.microsoft.com/en-us/sql/t-sql/functions/cast-and-convert-transact-sql?view=sql-server-ver15#implicit-conversions

 

The bigger issue is that the count of records on a table is unqualified (SELECT COUNT(*)) so it has to do a full table scan of all the records on the table...which sucks for performance.  On a high performing SQL Server I cancelled the count of 3B records after 9 minutes of trying to run....I don't have a good suggestion on how to fix that one...

  • Jake Cohen
  • Feb 17 2020
  • Shipped
  • Apr 2, 2020

    Admin response

    Jake - thanks for raising this idea. As we deal with larger datasets, we need to be able to effectively handle record counts in collect. I'll add this into the DSP backlog and align to a future DSP release.

  • Attach files
  • Daniel Downey commented
    14 Dec, 2020 05:37pm

    Tyler,

    This SQL should be made dynamic and will handle Billions of rows in less than 10 seconds.

    SELECT INFORMATION_SCHEMA.TABLES.TABLE_CATALOG AS Target,

    INFORMATION_SCHEMA.TABLES.TABLE_NAME AS [Table], sys.sysindexes.rowcnt AS RecordCount

    FROM sys.sysobjects WITH (NOLOCK) INNER JOIN sys.sysindexes WITH (NOLOCK) ON sys.sysobjects.id = sys.sysindexes.id INNER JOIN

    INFORMATION_SCHEMA.TABLES WITH (NOLOCK) ON sys.sysobjects.name = INFORMATION_SCHEMA.TABLES.TABLE_NAME

    WHERE (sys.sysobjects.xtype = 'U') AND (sys.sysindexes.indid IN (0, 1)) AND INFORMATION_SCHEMA.TABLES.TABLE_NAME = 'LFA1'

  • Larry Rich commented
    22 Jul, 2020 05:11pm

    Additional DSW/Transform tables with RecordCount field that should be made BIGINT.
    Additionally the LLBLGen entities the Collect/Transform dlls need to be updated as well.

    ttWaveProcessAreaObjectTarget

    ttWaveProcessAreaObjectTargetSource

    dswMilestoneObject

    dswSourceTable

    ttReportRun

    ttWaveProcessAreaObjectTargetDataServicesReport

    ttWaveProcessAreaObjectTargetDataServicesReportRemediation

    ttWaveProcessAreaObjectTargetDataServicesRule

    ttWaveProcessAreaObjectTargetDataServicesRuleAudit

    ttWaveProcessAreaObjectTargetExport

    ttWaveProcessAreaObjectTargetReport

    ttWaveProcessAreaObjectTargetReportRemediation

    ttWaveProcessAreaObjectTargetRule

    ttWaveProcessAreaObjectTargetRuleAudit

    ttWaveProcessAreaObjectTargetSourceReport

    ttWaveProcessAreaObjectTargetSourceRule

    ttWaveProcessAreaObjectTargetSourceRuleAudit

    and their matching store procedures
    aspTargetyyyyMetricsUpdPrm

  • Larry Rich commented
    30 Jun, 2020 10:59pm

    For the TxDot project these tables needed RecordCount/TableCount changed to BIGINT:

    DataGarage db tables:

    dgTargetSourceTable

    dgTargetSourceTableMetric

    dgTargetSourceTableRowCount

    dgTargetSourceTableRule

    DSW db tables:

    ttWaveProcessAreaObjectTarget

    ttWaveProcessAreaObjectTargetSource

    These stored procs needed COUNT(*) changed to COUNT_BIG(*)

    but a better solution would be to use a version of the attached stored proc (boaGetTableRowCount) to get the table row count instead of using COUNT_BIG(*).

    DataGarage db stored procs:

    webService_CompletedBuildIndex

    webService_CompletedDBMotoDownload

    webService_CompletedTableDownload

    webService_FailedBuildIndex

    webService_FailedDBMotoDownload

    webService_FailedTableDownload

    DSW db stored procs:

    aspTargetMetricsUpdPrm

    aspTargetMetricsUpdPrm

  • Jake Cohen commented
    2 Apr, 2020 11:58am

    Datagarage.dbo.webTargetSourceTableSetupHor has a CONVERT in the SELECT statement that needs to be updated to the datatype BIGINT:

    CONVERT (BIGINT, ISNULL(dbo.dgTargetSourceTable.RecordCount, 0))

  • Jake Cohen commented
    2 Apr, 2020 11:57am

    I like how sp_spaceused 'tablename' works to pull record counts, maybe it's similar.

  • Guest commented
    1 Apr, 2020 02:51pm

    The row count should be retrieved from the sysindexes table, as follows:

    SELECT TOP (100) PERCENT INFORMATION_SCHEMA.TABLES.TABLE_CATALOG, INFORMATION_SCHEMA.TABLES.TABLE_NAME, sys.sysindexes.rowcnt AS RecordCount

    FROM sys.sysobjects INNER JOIN

    sys.sysindexes ON sys.sysobjects.id = sys.sysindexes.id INNER JOIN

    INFORMATION_SCHEMA.TABLES ON sys.sysobjects.name = INFORMATION_SCHEMA.TABLES.TABLE_NAME

    WHERE (sys.sysobjects.xtype = 'U') AND (sys.sysindexes.indid IN (0, 1))