Uploaded image for project: 'Cost Management'
  1. Cost Management
  2. COST-3636

GCP: No new data for nise populator

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Undefined Undefined
    • 2023Q1
    • None
    • None
    • None
    • False
    • None
    • False

      As of March 13th of this month the nise-populator job is no longer collecting data. I have investigated the issue and we use two dictionaries to decide when new manifests should be created in the GCP downloader

      key: partion_date
      value: export_time

      We compare the latest export time we have in our database for a partition date to the latest export time in BigQuery. If a new export time is available we create a new manifest and try to download that partition date.

      However, the current query we run for the nise populator dataset is not returning anything:

       

      SELECT DATE(_PARTITIONTIME), DATETIME(max(export_time)) FROM {project}.{dataset}.{table_id} WHERE DATE(_PARTITIONTIME) BETWEEN '2023-03-12' AND '2023-03-22' GROUP BY DATE(_PARTITIONTIME);

      So I ran a modified version of this query to see what partition dates are available for this dataset in bigquery

      SELECT DATE(_PARTITIONTIME), DATETIME(max(export_time)) FROM {project_id}.{dataset}.{table_id} GROUP BY DATE(_PARTITIONTIME);
      Row f0_ f1_  
      1 2023-03-01 2023-03-23T15:22:11

      Currently when we recreate the nise populator  dataset all of the data is being partitioned under the first of the month. We have custom SQL in nise to modify the partition time here:
      https://github.com/project-koku/nise/pull/399/files#diff-ab58cd5a717def68ef894fd263ac3e1e4032786eb1ebe03208d8400eb74f25a3R306

              # Our downloader downloads by the paritiontime, however the default partitiontime is the date
              # the data is uploaded to bigquery. Therefore, everything goes into one single day. The load
              # job config does not let you upload to the _PARTITIONTIME because it is a prebuild column in
              # bigquery. However, we do have permission to update it.
              partition_date_sql = f"""
              UPDATE `{table_id}` SET _PARTITIONTIME=CAST(DATE_TRUNC(DATE(usage_start_time), MONTH) AS timestamp) WHERE 1=1;
              """
              bigquery_client.query(partition_date_sql)

      I ran this SQL:
      ```
      SELECT DISTINCT DATE_TRUNC(DATE(usage_start_time), MONTH) AS timestamp FROM {project_id}.{dataset}.{table_id};
      ```

      Row timestamp  
      1 2023-03-01

      I think this SQL needs updating to allow the partition date to be in our daily range for the downloader. 

            myersco Cody Myers
            myersco Cody Myers
            Pedro Lopez Perez Pedro Lopez Perez
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: