Google Engines¶
Google API¶
SearXNG’s implementation of the Google API is mainly done in get_google_info.
For a detailed description of the RESTful API see: Query Parameter Definitions. The linked API documentation can sometimes be helpful during reverse engineering. However, we cannot use it with the freely accessible WEB services; not all parameters can be applied and some engines are more special than others (e.g. Google News).
Google WEB¶
This is the implementation of the Google WEB engine. Some of these implementations (mainly get_google_info) are shared by other engines:
- searx.engines.google.fetch_traits(engine_traits: EngineTraits, add_domains: bool = True)[source]¶
Fetch languages from Google.
- searx.engines.google.get_google_info(params, eng_traits)[source]¶
Composing various (language) properties for the google engines (Google API).
This function is called by the various google engines (Google WEB, Google Images, Google News and Google Videos).
- Parameters:
params (dict) – Request parameters of the engine. At least a searxng_locale key should be in the dictionary.
eng_traits – Engine’s traits fetched from Google preferences (searx.enginelib.traits.EngineTraits)
- Return type:
dict
- Returns:
Py-Dictionary with the key/value pairs:
- language:
The language code that is used by Google (e.g. lang_en or lang_zh-TW)
- country:
The country code that is used by Google (e.g. US or TW)
- locale:
An instance of babel.core.Locale built from the searxng_locale value.
- subdomain:
Google subdomain (google_domains) that fits the country code.
- params:
Py-Dictionary with additional request arguments (can be passed to urllib.parse.urlencode()).
hl parameter: specifies the interface language of the user interface.
lr parameter: restricts search results to documents written in a particular language.
cr parameter: restricts search results to documents originating in a particular country.
ie parameter: sets the character encoding scheme that should be used to interpret the query string (‘utf8’).
oe parameter: sets the character encoding scheme that should be used to decode the XML result (‘utf8’).
- headers:
Py-Dictionary with additional HTTP headers (can be passed to the request’s headers)
Accept: '*/*'
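As a rough illustration of how the returned dictionary might be consumed, the sketch below uses a hand-written stand-in for the return value and turns its params entry into a query string with urllib.parse.urlencode(). The concrete values, the /search path, the q argument and the helper name build_search_url are assumptions made for this example; they are not taken from the SearXNG source.

```python
from urllib.parse import urlencode

# Hypothetical stand-in for the dictionary returned by get_google_info().
# The keys follow the description above; the values are made up.
google_info = {
    "language": "lang_en",
    "country": "US",
    "subdomain": "www.google.com",
    "params": {
        "hl": "en-US",
        "lr": "lang_en",
        "cr": "countryUS",
        "ie": "utf8",
        "oe": "utf8",
    },
    "headers": {"Accept": "*/*"},
}


def build_search_url(query: str, info: dict) -> str:
    """Combine subdomain and request arguments into a URL (illustrative only)."""
    # 'q' and the '/search' path are assumptions of this sketch.
    args = {"q": query, **info["params"]}
    return "https://" + info["subdomain"] + "/search?" + urlencode(args)


url = build_search_url("searxng", google_info)
```

The headers entry would be merged into the outgoing request’s headers in the same spirit.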
- searx.engines.google.UI_ASYNC = 'use_ac:true,_fmt:prog'¶
Format of the response from UI’s async request.
Google Autocomplete¶
- searx.autocomplete.google_complete(query, sxng_locale)[source]¶
Autocomplete from Google. Supports Google’s languages and subdomains (searx.engines.google.get_google_info) by using the async REST API:
https://{subdomain}/complete/search?{args}
Google Images¶
This is the implementation of the Google Images engine using the internal Google API used by the Google Go Android app.
This internal API offer results in
Google Videos¶
This is the implementation of the Google Videos engine.
Content-Security-Policy (CSP)
This engine needs to allow images from data URLs (prefixed with the data: scheme):
Header set Content-Security-Policy "img-src 'self' data: ;"
Google News¶
This is the implementation of the Google News engine.
Google News has a different region handling compared to Google WEB.
- the ceid argument has to be set (ceid_list)
- the hl argument has to be set correctly (and different to Google WEB)
- the gl argument is mandatory
If one of these arguments is not set correctly, the request is redirected to the CONSENT dialog:
https://consent.google.com/m?continue=
The Google News API ignores some parameters from the common Google API:
- num: the number of search results is ignored; there is no paging, and all results for a query term are in the first response.
- safe: is ignored; Google News results are always SafeSearch.
- searx.engines.google_news.ceid_list = ['AE:ar', 'AR:es-419', 'AT:de', 'AU:en', 'BD:bn', 'BE:fr', 'BE:nl', 'BG:bg', 'BR:pt-419', 'BW:en', 'CA:en', 'CA:fr', 'CH:de', 'CH:fr', 'CL:es-419', 'CN:zh-Hans', 'CO:es-419', 'CU:es-419', 'CZ:cs', 'DE:de', 'EG:ar', 'ES:es', 'ET:en', 'FR:fr', 'GB:en', 'GH:en', 'GR:el', 'HK:zh-Hant', 'HU:hu', 'ID:en', 'ID:id', 'IE:en', 'IL:en', 'IL:he', 'IN:bn', 'IN:en', 'IN:hi', 'IN:ml', 'IN:mr', 'IN:ta', 'IN:te', 'IT:it', 'JP:ja', 'KE:en', 'KR:ko', 'LB:ar', 'LT:lt', 'LV:en', 'LV:lv', 'MA:fr', 'MX:es-419', 'MY:en', 'NA:en', 'NG:en', 'NL:nl', 'NO:no', 'NZ:en', 'PE:es-419', 'PH:en', 'PK:en', 'PL:pl', 'PT:pt-150', 'RO:ro', 'RS:sr', 'RU:ru', 'SA:ar', 'SE:sv', 'SG:en', 'SI:sl', 'SK:sk', 'SN:fr', 'TH:th', 'TR:tr', 'TW:zh-Hant', 'TZ:en', 'UA:ru', 'UA:uk', 'UG:en', 'US:en', 'US:es-419', 'VE:es-419', 'VN:vi', 'ZA:en', 'ZW:en']¶
List of region/language combinations supported by Google News. Values of the ceid argument of the Google News REST API.
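Each ceid value has the shape REGION:LANGUAGE, so it can be split on the first colon. The following minimal sketch groups a few entries copied from ceid_list by region; the helper name split_ceid is hypothetical and not part of the SearXNG code.

```python
from typing import Dict, List, Tuple


def split_ceid(ceid: str) -> Tuple[str, str]:
    """Split a ceid value such as 'US:es-419' into (region, language)."""
    region, lang = ceid.split(":", 1)
    return region, lang


# A few entries copied from ceid_list above.
sample = ["US:en", "US:es-419", "CH:de", "CH:fr"]

# Collect the languages offered for each region.
by_region: Dict[str, List[str]] = {}
for item in sample:
    region, lang = split_ceid(item)
    by_region.setdefault(region, []).append(lang)
```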
Google Scholar¶
This is the implementation of the Google Scholar engine.
Compared to other Google services, the Scholar engine has a simple GET REST API and there is no async API. Even though the API is slightly vintage, we can make use of the Google API to assemble the arguments of the GET request.
- searx.engines.google_scholar.detect_google_captcha(dom)[source]¶
In case of a CAPTCHA, Google Scholar opens its own “not a robot” dialog and is not redirected to sorry.google.com.
- searx.engines.google_scholar.parse_gs_a(text: Optional[str])[source]¶
Parse the text written in green.
Possible formats:
- “{authors} - {journal}, {year} - {publisher}”
- “{authors} - {year} - {publisher}”
- “{authors} - {publisher}”
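The three formats can be told apart by splitting on " - " and inspecting the middle part. The sketch below is one possible way to do that and is not the SearXNG implementation; the function name parse_green_line and the returned keys are assumptions. It assumes plain ASCII hyphens as separators and will misparse a journal name that itself contains " - ".

```python
import re
from typing import Optional


def parse_green_line(text: Optional[str]) -> dict:
    """Split '{authors} - [{journal}, ]{year} - {publisher}' style text."""
    result = {"authors": None, "journal": None, "year": None, "publisher": None}
    if not text:
        return result

    parts = [part.strip() for part in text.split(" - ")]
    result["authors"] = parts[0]
    if len(parts) >= 2:
        result["publisher"] = parts[-1]
    if len(parts) == 3:
        # Middle part is either '{journal}, {year}' or just '{year}'.
        match = re.fullmatch(r"(?:(.*),\s*)?(\d{4})", parts[1])
        if match:
            result["journal"] = match.group(1)
            result["year"] = int(match.group(2))
    return result
```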
- searx.engines.google_scholar.time_range_args(params)[source]¶
Returns a dictionary with time range arguments based on params['time_range'].
Google Scholar supports a detailed search by year. Searching by last month or last week (as offered by SearXNG) is uncommon for scientific publications and is not supported by Google Scholar.
To limit the result list when the user selects a range, all the SearXNG ranges (day, week, month, year) are mapped to year. If no range is set, an empty dictionary of arguments is returned. Example: when the user selects a time range (current year minus one in 2022):
{ 'as_ylo' : 2021 }
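The mapping described above can be sketched as follows. This is a minimal illustration, not the actual time_range_args code: the function name year_range_args and the optional today parameter (added so the result is reproducible) are assumptions.

```python
from datetime import date
from typing import Optional


def year_range_args(time_range: Optional[str], today: Optional[date] = None) -> dict:
    """Map any SearXNG time range to a 'since last year' argument."""
    if time_range not in ("day", "week", "month", "year"):
        # No (or unknown) range selected: no extra arguments.
        return {}
    today = today or date.today()
    # Every supported range collapses to: current year minus one.
    return {"as_ylo": today.year - 1}


args_2022 = year_range_args("month", today=date(2022, 6, 1))
no_args = year_range_args(None)
```

Passing the date explicitly keeps the example independent of the day it is run.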