Loading...

XML

Word

Printable

Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: 2023Q1
Affects Version/s: None
Component/s: None
Labels:
None

Blocked:
False
Blocked Reason:
None
Ready:
False
Intelligence Requested:
Market:

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

As of March 13th of this month the nise-populator job is no longer collecting data. I have investigated the issue and we use two dictionaries to decide when new manifests should be created in the GCP downloader

key: partion_date
value: export_time

We compare the latest export time we have in our database for a partition date to the latest export time in BigQuery. If a new export time is available we create a new manifest and try to download that partition date.

However, the current query we run for the nise populator dataset is not returning anything:

SELECT DATE(_PARTITIONTIME), DATETIME(max(export_time)) FROM {project}.{dataset}.{table_id} WHERE DATE(_PARTITIONTIME) BETWEEN '2023-03-12' AND '2023-03-22' GROUP BY DATE(_PARTITIONTIME);

So I ran a modified version of this query to see what partition dates are available for this dataset in bigquery

SELECT DATE(_PARTITIONTIME), DATETIME(max(export_time)) FROM {project_id}.{dataset}.{table_id} GROUP BY DATE(_PARTITIONTIME);

Row	f0_	f1_
1	2023-03-01	2023-03-23T15:22:11

Currently when we recreate the nise populator dataset all of the data is being partitioned under the first of the month. We have custom SQL in nise to modify the partition time here:
https://github.com/project-koku/nise/pull/399/files#diff-ab58cd5a717def68ef894fd263ac3e1e4032786eb1ebe03208d8400eb74f25a3R306

        # Our downloader downloads by the paritiontime, however the default partitiontime is the date
        # the data is uploaded to bigquery. Therefore, everything goes into one single day. The load
        # job config does not let you upload to the _PARTITIONTIME because it is a prebuild column in
        # bigquery. However, we do have permission to update it.
        partition_date_sql = f"""
        UPDATE `{table_id}` SET _PARTITIONTIME=CAST(DATE_TRUNC(DATE(usage_start_time), MONTH) AS timestamp) WHERE 1=1;
        """
        bigquery_client.query(partition_date_sql)

I ran this SQL:
```
SELECT DISTINCT DATE_TRUNC(DATE(usage_start_time), MONTH) AS timestamp FROM {project_id}.{dataset}.{table_id};
```

Row	timestamp
1	2023-03-01

I think this SQL needs updating to allow the partition date to be in our daily range for the downloader.

Assignee:: Cody Myers

Reporter:: Cody Myers

QA Contact:: Pedro Lopez Perez

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Created:: 2023/03/22 2:01 PM

Updated:: 2023/11/22 7:47 PM

Resolved:: 2023/04/03 11:19 AM

Details

Description

Attachments

Activity

People

Dates