Why won’t custom transforms load in AWS Glue?
I have a simple Glue ETL job that extracts a few tables from an RDS instance and sends the data to an external data lake. That works fine, but I need to add a Custom Transform to modify some of the data. I'm following this AWS guide. I've copied and pasted the contents of the example JSON and Python files into files named customFilterState.json and customFilterState.py, and uploaded these to the S3 bucket that contains my job assets (the scripts and whatnot). I placed the two files in a folder called transforms at the base level, per the documentation. However, the Custom Transform does not show up in the Visual ETL editor.
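One thing worth checking, as a hedged sketch: Glue Studio discovers custom visual transforms in the account's Glue assets bucket for that region (aws-glue-assets-&lt;account-id&gt;-&lt;region&gt;), not in an arbitrary job bucket. The account ID and region below are placeholders; substitute your own.

```shell
# Hypothetical account/region values — replace with yours.
# Glue Studio scans the "transforms" prefix of the Glue assets bucket:
aws s3 cp customFilterState.json s3://aws-glue-assets-123456789012-us-east-1/transforms/
aws s3 cp customFilterState.py   s3://aws-glue-assets-123456789012-us-east-1/transforms/
```

If the files were uploaded to a different bucket, moving them here (and refreshing the editor) is the first thing to try.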
What are the character limitations of a SchemaName in the AWS Schema Registry?
What are the limitations of the SchemaName within AWS Glue Schema Registry? In particular, I would like to know the allowed characters and the maximum length.
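A hedged sketch: going by the Glue CreateSchema API reference, SchemaName appears to accept 1 to 255 characters drawn from letters, digits, and the symbols - _ $ # . — verify this against the current API documentation before relying on it. The validator below simply encodes that assumption.

```python
import re

# Assumed constraint from the Glue CreateSchema API reference:
# 1-255 characters matching [a-zA-Z0-9-_$#.]+ (verify against current docs).
SCHEMA_NAME_RE = re.compile(r"^[A-Za-z0-9$#._-]{1,255}$")

def is_valid_schema_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed SchemaName constraint."""
    return bool(SCHEMA_NAME_RE.fullmatch(name))
```

For example, is_valid_schema_name("orders-v2_schema") passes, while a name containing a space or exceeding 255 characters fails.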
Glue job keeps running while throwing “ErrorMessage: Partition already exists.” error
My PySpark script joins several tables and writes the result with the code below:
Job bookmark not working when joining 2 tables in AWS Glue
Our current script was generated using the visual ETL editor in AWS Glue. It works, but incremental loading (with job bookmarks) does not: on every run, all the data is uploaded to S3 again. What could the possible issues be here?
AWS Glue ETL Filter Transformation not returning expected results
Using the Glue ETL UI (version 4.0), I want to use the Filter transformation to remove any blank records based on a selected field. The field is of type string but holds datetime values in the format 2023-10-21T00:35:01.000+0000. I have also tried filtering on other string columns, with no success.
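The Filter transform applies a row-level predicate, so one way to sanity-check the intended logic outside the UI is to express the same "blank vs. valid timestamp string" check in plain Python. The pattern below is a sketch matching the sample format from the question, not Glue's own implementation.

```python
import re

# Matches timestamps shaped like "2023-10-21T00:35:01.000+0000".
TS_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}[+-]\d{4}$")

def is_nonblank_timestamp(value):
    """Treat None, empty, and whitespace-only strings as blank;
    keep only rows whose value matches the expected timestamp shape."""
    return bool(value) and bool(value.strip()) and bool(TS_PATTERN.match(value.strip()))

rows = [
    {"ts": "2023-10-21T00:35:01.000+0000"},
    {"ts": ""},
    {"ts": None},
]
kept = [r for r in rows if is_nonblank_timestamp(r["ts"])]
```

If rows that should pass this predicate are still dropped in Glue, the problem is more likely the UI filter condition (e.g. comparing against the empty string vs. null) than the data itself.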
Datetime in AWS Glue job is not taking the local time-zone; it is using UTC?
I am running an AWS Glue job in the ap-south-1 region. I used Python's datetime function to load one of my audit columns (created_on). When I load datetime.now(), it takes the UTC time-zone instead of the local time-zone. How do I solve this, and why is Glue using the UTC time-zone even though it is hosted in ap-south-1?
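Glue workers run with a UTC system clock regardless of the region the job executes in, so a naive datetime.now() reflects UTC. A minimal sketch of the usual fix is to make the target time-zone explicit instead of relying on the host clock (zoneinfo is standard library from Python 3.9):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # stdlib in Python 3.9+

# Take "now" as an explicit UTC-aware timestamp, then convert it to the
# time-zone you actually want for the audit column, rather than trusting
# the worker's local clock setting.
utc_now = datetime.now(timezone.utc)
ist_now = utc_now.astimezone(ZoneInfo("Asia/Kolkata"))  # ap-south-1 local time
created_on = ist_now.strftime("%Y-%m-%d %H:%M:%S")
```

Storing UTC and converting only at display time is often the safer design, but the conversion above gives the local ap-south-1 value if that is what the audit column requires.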
AWS Glue Crawler: How to put “None” as the Quote Symbol in my Classifier?
I made a custom classifier to change the data types in Athena when the Glue crawler runs. The CSV file does not have anything around the column names: no quotes, double quotes, tabs, nothing. It is separated by a pipe delimiter, but there are no characters surrounding the column names themselves. I want to set nothing as the quote symbol in the Glue classifier, but there is no option for that.
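One hedged workaround sketch: when creating the classifier through the API (e.g. boto3's create_classifier) rather than the console, the CsvClassifier's QuoteSymbol field is optional, so it can simply be left out. Whether omitting it fully disables quote handling is worth verifying against the current documentation; another common workaround is choosing a quote symbol that cannot appear in the data. The classifier name below is hypothetical.

```python
# Hypothetical classifier definition for a pipe-delimited, unquoted CSV.
csv_classifier = {
    "Name": "pipe-no-quote",        # hypothetical name — choose your own
    "Delimiter": "|",               # the file is pipe-delimited
    "ContainsHeader": "PRESENT",    # first row holds the column names
    # "QuoteSymbol" is deliberately omitted: the data has no quoting
    # characters, and the API does not require this field.
}

# client = boto3.client("glue")
# client.create_classifier(CsvClassifier=csv_classifier)
```

The boto3 call is left commented out since it requires AWS credentials; the point is the shape of the CsvClassifier payload with QuoteSymbol absent.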