main.py
This Python script handles data scraping, processing, and database operations. It is organized into functions that collect data from web sources, process it, and store it in a database, along with auxiliary functionality such as watermark configuration and image processing.
Here's a detailed breakdown of the code:
1. Imports and Dependencies
import os
from pydoc import ispackage
import time
import util
from log import my_log
import yaml
import json
import pymssql
import decimal
import uuid
import math
import datetime
import random
from util.image import imageThread
from urllib.parse import urlparse
from lxml import etree
import traceback
import _cffi_backend as backend
import operator
from functools import reduce
from benedict import benedict
import re
import keyboard
import pytz
from dateutil import relativedelta
from dateutil.parser import parse
- Standard Libraries: The script imports several standard libraries (os, time, math, datetime, re, etc.) for general-purpose tasks like file handling, time management, regular expressions, and mathematical operations.
- Third-Party Libraries:
  - yaml: For parsing YAML configuration files.
  - pymssql: For connecting to and querying a Microsoft SQL Server database.
  - lxml.etree: For parsing HTML/XML documents.
  - benedict: A library for enhanced dictionary functionality, including keypath access.
  - keyboard: For capturing keyboard events.
  - pytz and dateutil: For timezone handling and date parsing/manipulation.
- Custom Modules:
  - util: A local package of utility helpers; the script imports imageThread from util.image.
  - log: my_log() likely sets up logging.
  - imageThread: Handles image processing, including watermarking.
2. Global Variables
yaml_data = yaml.load(open("config.yaml"), Loader=yaml.FullLoader)
instance = yaml_data['instance']
User_Agent = yaml_data['User_Agent']['value']
host = yaml_data['mssql']['host']
user = yaml_data['mssql']['user']
pwd = yaml_data['mssql']['pwd']
database = yaml_data['mssql']['database']
port = yaml_data['mssql']['port']
ipage = yaml_data['ipage']
daysdiff = yaml_data['daysdiff']
skipcounter = yaml_data['skipcounter']
cdnImg = yaml_data['cdnImg']['value']
- YAML Configuration: The script loads its parameters from a config.yaml file: database connection details (host, user, pwd, database, port), a user-agent string for web requests, the image CDN URL, and scraping controls such as ipage, daysdiff, and skipcounter.
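Note that yaml.load(open("config.yaml"), ...) leaves the file handle open. A minimal, more defensive variant (an illustration, not the author's code) would use safe_load and a context manager:

import yaml

# Sketch: same settings, loaded defensively. safe_load avoids arbitrary
# object construction, and the with-block closes the file handle.
with open("config.yaml", encoding="utf-8") as f:
    yaml_data = yaml.safe_load(f)

host = yaml_data['mssql']['host']            # keys as read by the script
User_Agent = yaml_data['User_Agent']['value']
cdnImg = yaml_data['cdnImg']['value']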
3. Database Connection Setup
conn = pymssql.connect(host=host, user=user, password=str(pwd), database=database, port=str(port))
cursor = conn.cursor(as_dict=True)
- Database Connection: Establishes a connection to a Microsoft SQL Server database using the pymssql library. Because the cursor is created with as_dict=True, query results come back as dictionaries keyed by column name rather than tuples.
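For example (table and columns borrowed from queries later in the script):

# With as_dict=True, each fetched row is a dict keyed by column name.
cursor.execute('SELECT ID, MerchantID FROM dbo.Merchants')
for row in cursor.fetchall():
    print(row['ID'], row['MerchantID'])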
4. Global Variables and Flags
gcount = 0
rcount = 90
quit = False
- Counters: gcount is a running counter incremented on every call to _getgcount(), and rcount controls how often the derived request code advances (once every 90 calls).
- Flag: quit is a global flag used to gracefully terminate long-running processes; it is set by key_capture_thread().
5. Key Capture Thread
def key_capture_thread():
    global quit
    a = keyboard.read_key()
    if a == "esc":
        print("ESC Stop, finishing the current slug")
        quit = True
- Keyboard Listener: This function listens for the "ESC" key to be pressed. If detected, it sets the quit flag to True, which can be used to gracefully stop long-running processes.
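The name suggests it runs on a background thread while the scraper loops. A plausible usage sketch (the threading call is an assumption; it is not shown in the excerpt):

import threading

# Presumed usage: watch for ESC in the background while the main loop runs.
threading.Thread(target=key_capture_thread, daemon=True).start()

while not quit:
    # ... scrape the next slug; the loop exits once ESC sets quit = True
    pass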
6. General Utility Functions
_getgcount()
def _getgcount():
    global gcount
    gcount += 1
    rqcode = math.floor(gcount/rcount) % 250
    return format(rqcode+1, '#04x')[2:]
- Counter Function: Increments the global gcount, derives rqcode = floor(gcount / rcount) % 250, and returns rqcode + 1 formatted as a two-digit hexadecimal string ('01' through 'fa').
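A worked example of the arithmetic:

import math

gcount, rcount = 181, 90                      # state after 181 calls
rqcode = math.floor(gcount / rcount) % 250    # 181 // 90 = 2
print(format(rqcode + 1, '#04x')[2:])         # '0x03' minus the prefix -> '03'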
ele_to_str(ele)
def ele_to_str(ele):
    return etree.tostring(ele, pretty_print=True, encoding="utf-8", method="html").decode("utf-8")
- Element to String: Converts an lxml element object to a pretty-printed string representation.
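For example (illustrative markup):

from lxml import etree

ele = etree.fromstring('<div><p>hello</p></div>')
print(ele_to_str(ele))   # pretty-printed HTML for the element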
_getJSONval() and _getJSON()
def _getJSONval(json, key, type="any"):
    val = None
    try:
        val = reduce(operator.getitem, key.split('.'), json)
    except Exception as e:
        if(type=="array"):
            val = []
    return val

def _getJSON(json, key, type="any"):
    val = None
    try:
        val = json[key]
        if(val):
            if(type == "int"):
                val = int(val)
            elif(type == "decimal"):
                val = float(str(val).replace(",",""))
    except Exception as e:
        if(type=="array"):
            val = []
        else:
            val = None
    return val
- JSON Handling: These functions safely extract values from JSON-like dictionaries. _getJSONval() walks a dotted key path (e.g. "a.b.c") via reduce(operator.getitem, ...), while _getJSON() reads a single key and optionally casts it to int or float (stripping thousands separators for "decimal"). Both return None on failure, or [] when type="array".
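For example (made-up data):

data = {"listing": {"price": "1,250,000"}, "agent": {"name": "Jane"}}

print(_getJSONval(data, "listing.price"))             # '1,250,000'
print(_getJSONval(data, "agent.phone", "array"))      # [] -- key missing
print(_getJSON(data["listing"], "price", "decimal"))  # 1250000.0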
_cleanImageDir()
def _cleanImageDir():
    import shutil
    import os
    cursor.execute('SELECT ID, MerchantID FROM dbo.Merchants WHERE ID NOT IN (SELECT ID FROM dbo.Merchants_copy1)')
    results = cursor.fetchall()
    for imgdir in results:
        path = "F:/image/" + imgdir['MerchantID']
        if os.path.exists(path):
            print(path)
            shutil.rmtree(path)
- Directory Cleanup: This function deletes image directories for merchants that are no longer present in a backup table (Merchants_copy1).
_getrow() and _getcol()
def _getrow(table="", filter=""):
    global conn, cursor
    cursor.execute(f'SELECT ID FROM dbo.{table} WHERE {filter}')
    results = cursor.fetchall()
    return len(results)

def _getcol(table="", col="", filter=""):
    global conn, cursor
    cursor.execute(f'SELECT {col} FROM dbo.{table} WHERE {filter}')
    results = cursor.fetchall()
    return results
- Database Query Helpers: Simple helpers that fetch a row count or a column from a table. Note that the table name and filter are interpolated directly into the SQL string, so they are only safe with trusted, internally generated arguments.
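For untrusted values, a parameterized variant would be safer. A sketch (not the author's code; pymssql can only parameterize values, so the table and column names must still come from trusted strings):

def _getrow_safe(table, column, value):
    # table/column are assumed to be hard-coded, trusted strings;
    # only the value is parameterized by pymssql.
    cursor.execute(f'SELECT COUNT(*) AS n FROM dbo.{table} WHERE {column} = %s',
                   (value,))
    return cursor.fetchone()['n']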
7. Data Collection and Processing Functions
_getcat()
def _getcat():
    global conn, cursor
    cursor.execute('SELECT * FROM dbo.PropType')
    results = cursor.fetchall()
    propType = benedict()
    for row in results:
        propType[f"{row['code']}"] = row['name']
    a = 34
    print(propType[f'{a}'])
- Property Type Data Fetching: This function fetches property type data from the database, stores it in a benedict dictionary keyed by code, and prints a specific entry (code 34).
_getlist_city(), _getlist_state(), _getlist_extra()
These functions scrape and process lists of cities, states, or other entities from a business directory website, then store the results in corresponding database tables (City, State, Jobs2).
- _getlist_city(): Scrapes city data and inserts it into the City table.
- _getlist_state(): Scrapes state data and inserts it into the State table.
- _getlist_extra(): Scrapes extra data for states not found among the cities and inserts it into the Jobs2 table.
_setlist() and _getlistByType()
- _setlist(): Prepares URLs for scraping property listings, either for sale or for rent, and inserts them into the Jobs_Listing table.
- _getlistByType(): Scrapes property listings by type (sale or rent) and processes them, handling pagination and skipping listings that are too old or already in the database; a speculative sketch of this loop follows below.
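The loop itself is not shown in the excerpt, but the configuration values (ipage, daysdiff, skipcounter) and the quit flag suggest a shape roughly like this (a hypothetical reconstruction; fetch_listing_page() and listing_too_old() are made-up stand-ins):

def fetch_listing_page(ltype, page):
    return []          # stub: would scrape one page of listing slugs

def listing_too_old(slug, days):
    return False       # stub: would compare the listing date against daysdiff

def _getlistByType_sketch(ltype):
    page, skipped = 1, 0
    while not quit and page <= ipage:             # ipage: page cap from config.yaml
        for slug in fetch_listing_page(ltype, page):
            if listing_too_old(slug, daysdiff):
                skipped += 1
                if skipped >= skipcounter:        # give up after repeated skips
                    return
                continue
            _getdata(ltype, slug)                 # process one property page
        page += 1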
_getlist(), _getlist_rent()
These functions are responsible for scraping property listings, storing the scraped URLs in the database, and processing them further if needed.
8. Data Insertion and Processing
getTbl()
def getTbl():
    global conn, cursor
    cursor.execute("""SELECT COLUMN_NAME, DATA_TYPE
                      FROM INFORMATION_SCHEMA.COLUMNS
                      WHERE TABLE_NAME = 'ProppyPropertyMarket'
                      AND COLUMN_NAME <> 'ID';""")
    results = cursor.fetchall()
    tbl = {}
    for row in results:
        att = {}
        att['type'] = row['DATA_TYPE']
        att['value'] = None
        tbl[row['COLUMN_NAME']] = att
    print(json.dumps(tbl))
    return
- Database Schema Fetching: This function retrieves the schema of the ProppyPropertyMarket table (excluding the ID column) and prints it as a JSON object.
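The resulting dict maps each column name to its data type and a placeholder value, which is a natural input for building an INSERT statement dynamically. A sketch of that idea (an assumption about how the dict gets used; the excerpt only prints it):

# Sample shape of tbl, with made-up columns and values filled in:
tbl = {
    'Price':   {'type': 'decimal',  'value': 123000.0},
    'Address': {'type': 'nvarchar', 'value': '1 Main St'},
}
cols = ', '.join(tbl)
placeholders = ', '.join(['%s'] * len(tbl))
sql = f'INSERT INTO dbo.ProppyPropertyMarket ({cols}) VALUES ({placeholders})'
cursor.execute(sql, tuple(att['value'] for att in tbl.values()))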
_getdata()
def _getdata(ltype, slug):
    global conn, cursor, quit, idate, watermark_setting, cdnImg, daysdiff
    utc = pytz.UTC
    watermark_setting = get_watermark_config()
    # ...
    return 1
- Core Data Processing: This is the main function responsible for scraping and processing detailed property data from a URL (slug). It performs several tasks:
- Fetches the property details from the URL.
- Processes and extracts data, including the advertiser type, agent information, property details, and images.
- Inserts the processed data into the corresponding database tables.
- Handles watermarking of images and other image-related processing.
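The body is elided above; from the globals it declares and the tasks listed, its skeleton is roughly (a speculative outline, not the author's code):

def _getdata_outline(ltype, slug):
    # Speculative outline; the step comments are inferred, not quoted.
    watermark_setting = get_watermark_config()   # shown in the excerpt
    # 1. fetch the property page for `slug` (using the configured User_Agent)
    # 2. parse it with lxml.etree and extract advertiser, agent, and property fields
    # 3. download each photo, watermark it per watermark_setting, and push to cdnImg
    # 4. insert the assembled record into the corresponding MSSQL tables
    return 1                                     # shown in the excerpt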
date_diff()
def date_diff():
    utc = pytz.UTC
    startWorkDate = "2022-10-12T14:40:49.875Z"
    d1 = parse(startWorkDate)
    d1 = d1 + datetime.timedelta(hours=8)
    print(d1.strftime('%Y-%m-%d %H:%M:%S'))
    # ...
- Date Difference Calculation: A scratch helper for date arithmetic: it parses an ISO-8601 timestamp, shifts it by eight hours (presumably to UTC+8), prints the result, and, per its name, evidently computes a difference against another date in the elided remainder.
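The dateutil import at the top of the script suggests relativedelta is used for the difference. A worked example with illustrative dates:

import datetime
from dateutil import relativedelta
from dateutil.parser import parse

d1 = parse("2022-10-12T14:40:49.875Z") + datetime.timedelta(hours=8)
d2 = parse("2023-01-20T09:00:00Z")                # made-up second date
diff = relativedelta.relativedelta(d2, d1)
print(diff.months, diff.days)                     # -> 3 7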
get_watermark_config()
def get_watermark_config():
    watermark_setting = WaterMark()
    yaml_data = yaml.load(open("config.yaml"), Loader=yaml.FullLoader)
    watermark_setting.enabled = yaml_data['Watermark']['enabled']
    # ...
    return watermark_setting
- Watermark Configuration: This function loads watermark settings from the YAML configuration file and returns them as a WaterMark object.
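The WaterMark class itself is not shown in the excerpt. A plausible shape, inferred from the attribute set here (only enabled is confirmed; the other fields are guesses):

class WaterMark:
    def __init__(self):
        self.enabled = False    # read from yaml_data['Watermark']['enabled']
        self.text = None        # assumed: watermark text or overlay source
        self.position = None    # assumed: placement on the image
        self.opacity = None     # assumed: blend strength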
9. Main Script Execution
if __name__ == '__main__':
    my_log()
    _getlistByType("rent")
    conn.close()
- Script Entry Point: When the script is run, it initializes logging, scrapes rental listings by calling _getlistByType("rent"), and finally closes the database connection.
Summary
This Python script is a sophisticated data scraper and processor designed to fetch real estate or property-related information from various online sources, process it, and store it in a Microsoft SQL Server database. The script handles various tasks such as:
- Scraping and processing city, state, and property data.
- Handling database connections and queries.
- Performing data extraction and transformation.
- Processing images, including watermarking.
- Handling pagination and skip logic while walking listing pages.
- Logging and error handling.
This script is well-suited for an environment where large-scale data collection and processing are needed, particularly in real estate or similar industries where such information is critical.