-
Bug
-
Resolution: Done
-
Undefined
-
None
-
None
-
None
-
False
-
None
-
False
-
-
As of March 13th of this month the nise-populator job is no longer collecting data. I have investigated the issue and we use two dictionaries to decide when new manifests should be created in the GCP downloader
key: partion_date
value: export_time
We compare the latest export time we have in our database for a partition date to the latest export time in BigQuery. If a new export time is available we create a new manifest and try to download that partition date.
However, the current query we run for the nise populator dataset is not returning anything:
SELECT DATE(_PARTITIONTIME), DATETIME(max(export_time)) FROM {project}.{dataset}.{table_id} WHERE DATE(_PARTITIONTIME) BETWEEN '2023-03-12' AND '2023-03-22' GROUP BY DATE(_PARTITIONTIME);
So I ran a modified version of this query to see what partition dates are available for this dataset in bigquery
SELECT DATE(_PARTITIONTIME), DATETIME(max(export_time)) FROM {project_id}.{dataset}.{table_id} GROUP BY DATE(_PARTITIONTIME);
Row | f0_ | f1_ | |
---|---|---|---|
1 | 2023-03-01 | 2023-03-23T15:22:11 |
Currently when we recreate the nise populator dataset all of the data is being partitioned under the first of the month. We have custom SQL in nise to modify the partition time here:
https://github.com/project-koku/nise/pull/399/files#diff-ab58cd5a717def68ef894fd263ac3e1e4032786eb1ebe03208d8400eb74f25a3R306
# Our downloader downloads by the paritiontime, however the default partitiontime is the date # the data is uploaded to bigquery. Therefore, everything goes into one single day. The load # job config does not let you upload to the _PARTITIONTIME because it is a prebuild column in # bigquery. However, we do have permission to update it. partition_date_sql = f""" UPDATE `{table_id}` SET _PARTITIONTIME=CAST(DATE_TRUNC(DATE(usage_start_time), MONTH) AS timestamp) WHERE 1=1; """ bigquery_client.query(partition_date_sql)
I ran this SQL:
```
SELECT DISTINCT DATE_TRUNC(DATE(usage_start_time), MONTH) AS timestamp FROM {project_id}.{dataset}.{table_id};
```
Row | timestamp | |
---|---|---|
1 | 2023-03-01 |
I think this SQL needs updating to allow the partition date to be in our daily range for the downloader.