Convert a SQL LIKE expression to a regex with C++ or Qt

Updated: 2023-10-16

I have a query like this:

SELECT a, b, c
FROM T
WHERE ( a LIKE 'f%' OR b LIKE 'f%' OR c LIKE 'f%' ) 

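Of course, the original query is more complex and uses more elaborate filters.

I need to determine which of the fields a, b or c matched. Since I am using Qt, I could compare each result value against a regular expression.

But how do I convert the expression used in LIKE into a regular expression?

A partial solution is proposed in this post. Is there perhaps a more complete example?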

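PS (answers to questions from the comments):

  • I work with different DBMSs (PostgreSQL, SQL Server, MySQL), and not all of them have built-in regular expression support
  • I want to find the match like this, for example: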

    const QString valueA( GetValueFromQuery() );
    if( valueA.contains( QRegExp( _CONVERTED_EXPRESSION_ ) ) ) {}

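  • The linked post describes a hand-crafted solution. I would like to find other, proven solutions.

Although I would still prefer to solve this by adding the result of each LIKE part to the result set, the following may suit your needs.

This code may still contain bugs and is not optimized: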

#include <algorithm>
#include <string>
#include <map>
#include <iostream>
#include <sstream>
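// Replaces every occurrence of old_value in str with new_value, in place.
std::string& replace_all(std::string& str, const std::string& old_value, const std::string& new_value)
{
    if (old_value.empty())
        return str; // nothing to search for; avoids an endless loop
    std::string::size_type pos = 0;
    while ((pos = str.find(old_value, pos)) != std::string::npos)
    {
        str.replace(pos, old_value.size(), new_value);
        pos += new_value.size(); // continue searching after the inserted text
    }
    return str;
}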
std::map<std::string, std::string> extractCharacterRanges(std::string& str)
{
    std::map<std::string, std::string> ranges;
    int rangeID = 0;
    std::string::size_type startPos = 0;
    std::string::size_type endPos = 0;
    while ((startPos = str.find("[", startPos)) != std::string::npos && (endPos = str.find("]", startPos + 1)) != std::string::npos)
    {
        std::stringstream ss;
        ss << "[[" << rangeID << "]]";
        std::string chars = str.substr(startPos + 1, endPos - startPos - 1);
        str.replace(startPos, chars.size() + 2, ss.str());
        rangeID++;
        startPos += ss.str().size();
        replace_all(chars, "[", "\\[");
        replace_all(chars, "]", "\\]");
        ranges[ss.str()] = "[" + chars + "]";
    }
    int open = 0;
    std::string::size_type searchPos = 0;
    startPos = 0; endPos = 0;
    do
    {
        startPos = str.find("[", searchPos);
        endPos = str.find("]", searchPos);
        if (startPos == std::string::npos && endPos == std::string::npos)
            break;
        if (startPos < endPos || endPos == std::string::npos)
        {
            open++;
            searchPos = startPos + 1;
        }
        else
        {
            if (open <= 0)
            {
                str.replace(endPos, 1, "\\]");
                searchPos = endPos + 2;
            }
            else
            {
                open--;
                searchPos = endPos + 1;
            }
        }
    } while (searchPos < str.size());
    return ranges;
}

std::string sqllike_to_regex(std::string sqllike)
{
    // Escape the backslash first so the escapes added below are not doubled.
    replace_all(sqllike, "\\", "\\\\");
    replace_all(sqllike, ".", "\\.");
    replace_all(sqllike, "^", "\\^");
    replace_all(sqllike, "$", "\\$");
    replace_all(sqllike, "+", "\\+");
    replace_all(sqllike, "?", "\\?");
    replace_all(sqllike, "(", "\\(");
    replace_all(sqllike, ")", "\\)");
    replace_all(sqllike, "{", "\\{");
    replace_all(sqllike, "}", "\\}");
    replace_all(sqllike, "|", "\\|");
    replace_all(sqllike, "*", "\\*");
    std::map<std::string, std::string> ranges = extractCharacterRanges(sqllike); //Escapes [ and ] where necessary
    replace_all(sqllike, "%", ".*");
    replace_all(sqllike, "_", ".");
    for (auto& range : ranges)
    {
        replace_all(sqllike, range.first, range.second);
    }
    return "^" + sqllike + "$";
}

int main() {
    std::cout << sqllike_to_regex("f%") << std::endl;//^f.*$
    std::cout << sqllike_to_regex("[A-Z]%") << std::endl;//^[A-Z].*$
    std::cout << sqllike_to_regex("[[A-Z][asd]]") << std::endl;//^[\[A-Z][asd]\]$
    std::cout << sqllike_to_regex("a]a") << std::endl;//^a\]a$
    std::cout << sqllike_to_regex("[%] [[] ] % [_] _") << std::endl;//^[%] [\[] \] .* [_] .$
    return 0;
}
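
If bracket classes such as [A-Z] are not needed, the conversion can be done in a single pass, and the result can be used to find out which field matched. The following is only a minimal sketch under that assumption, not the code from the answer above: `like_to_regex_simple` and `matching_fields` are hypothetical names, `std::regex` stands in for `QRegExp`, and `[` and `]` are treated as literal characters. The `case_insensitive` switch is there because LIKE can be case-insensitive depending on the DBMS and collation.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: converts a LIKE pattern in one pass.
// Only % and _ are wildcards; every other character is matched literally.
// Limitations: '.' in ECMAScript regexes does not match line terminators,
// and on a std::string '_' matches one byte, not one multi-byte character.
std::string like_to_regex_simple(const std::string& like)
{
    static const std::string meta = R"re(\^$.|?*+()[]{})re";
    std::string out;
    for (char ch : like)
    {
        if (ch == '%')
            out += ".*";
        else if (ch == '_')
            out += '.';
        else
        {
            if (meta.find(ch) != std::string::npos)
                out += '\\'; // escape regex metacharacters
            out += ch;
        }
    }
    return out;
}

// Hypothetical helper: returns the names of the fields (name, value pairs)
// whose whole value matches the LIKE pattern.
std::vector<std::string> matching_fields(
    const std::vector<std::pair<std::string, std::string>>& fields,
    const std::string& like,
    bool case_insensitive = false)
{
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (case_insensitive)
        flags |= std::regex::icase;
    const std::regex re(like_to_regex_simple(like), flags);

    std::vector<std::string> matched;
    for (const auto& field : fields)
    {
        // regex_match requires the entire value to match, so no ^ or $ is needed.
        if (std::regex_match(field.second, re))
            matched.push_back(field.first);
    }
    return matched;
}
```

For a row with a = "foo", b = "bar", c = "f.x", calling `matching_fields(row, "f%")` would return the names a and c.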

You can repeat the LIKE conditions to get match flags, or wrap them in a derived table:

SELECT a, b, c, 
   case when a LIKE 'f%' then 1 else 0 end as a_matched,
   case when b LIKE 'f%' then 1 else 0 end as b_matched,
   case when c LIKE 'f%' then 1 else 0 end as c_matched
FROM T
WHERE ( a LIKE 'f%' OR b LIKE 'f%' OR c LIKE 'f%' ) 

SELECT *
FROM
 (
   SELECT a, b, c, 
      case when a LIKE 'f%' then 1 else 0 end as a_matched,
      case when b LIKE 'f%' then 1 else 0 end as b_matched,
      case when c LIKE 'f%' then 1 else 0 end as c_matched
   FROM T
 ) dt
WHERE ( a_matched = 1 OR b_matched = 1 OR c_matched = 1 )

Both should be optimized in the same way, but of course you should check the actual plan.
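
When the list of columns varies, the first form of the query can be generated rather than written by hand. The sketch below is only an illustration: `build_flagged_query` is a hypothetical helper, the table and column names are assumed to be trusted identifiers, and the pattern is left as a positional `?` placeholder to be bound through the database API instead of being concatenated into the SQL text.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: builds a SELECT that returns each column plus a
// <column>_matched flag. The columns vector must not be empty.
// The LIKE pattern is not embedded; each '?' must be bound by the caller.
std::string build_flagged_query(const std::string& table,
                                const std::vector<std::string>& columns)
{
    std::string select_list;
    std::string flags;
    std::string where;
    for (std::size_t i = 0; i < columns.size(); ++i)
    {
        const std::string& col = columns[i];
        if (i > 0)
        {
            select_list += ", ";
            where += " OR ";
        }
        select_list += col;
        flags += ", CASE WHEN " + col + " LIKE ? THEN 1 ELSE 0 END AS " + col + "_matched";
        where += col + " LIKE ?";
    }
    return "SELECT " + select_list + flags + " FROM " + table + " WHERE (" + where + ")";
}
```

With N columns the statement contains 2 * N placeholders, so the same pattern has to be bound that many times, in order.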