Lately, we have been struggling to avoid being blacklisted by the search indexes we query.
See https://forum.arn-fai.net/t/topic/12162
It has just happened again.
I have updated SearXNG and adjusted the list and configuration of search engines to maximize the chances of getting relevant results.
Below is the list of engines configured by default:
Bing is disabled because, after a while, it starts returning Chinese sites or adult sites. Google is disabled because it returns an empty list. Same for Qwant (even in web-lite mode).
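The engines left enabled by default can be read off the `engines:` section of the configuration further down. Here is a minimal Python sketch of that reading. The `(name, disabled)` pairs are copied by hand from that section, and the helper `enabled_by_default` is a hypothetical name of mine, not part of SearXNG. It assumes that an entry without a `disabled` key counts as enabled, which is how I understand SearXNG's default.

```python
# (name, disabled) pairs copied from the `engines:` section of settings.yml.
# None means the entry has no active `disabled` key (assumed: enabled).
ENGINES = [
    ("brave", False),
    ("brave.images", False),
    ("brave.videos", None),
    ("wikipedia", True),
    ("bing", True),
    ("bing images", True),
    ("duckduckgo", False),
    ("duckduckgo images", True),
    ("google", True),
    ("mullvadleta", True),
    ("google images", False),
    ("invidious", None),
    ("openstreetmap", True),
    ("qwant", True),
    ("qwant images", False),
    ("sepiasearch", None),
    ("startpage", False),
    ("unsplash", None),
    ("yahoo", True),
    ("mojeek", False),
    ("mojeek images", False),
]


def enabled_by_default(engines):
    """Return the names of engines whose `disabled` flag is absent or false."""
    return [name for name, disabled in engines if not disabled]


if __name__ == "__main__":
    for name in enabled_by_default(ENGINES):
        print(name)
```

Users can still switch individual engines on or off in their own preferences; this only reflects the instance defaults.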
The update is done with a simple:
yunohost app upgrade searxng
The configuration in /var/www/searxng/settings.yml is as follows:
general:
  # Debug mode, only for development
  debug: false
  # displayed name
  instance_name: "sans-nuage.fr"
  # For example: https://example.com/privacy
  privacypolicy_url: "https://arn-fai.net/fr/cgs"
  # use true to use your own donation page written in searx/info/en/donate.md
  # use false to disable the donation link
  donation_url: "https://arn-fai.net/fr/asso/nous-soutenir"
  # mailto:■■■■■■■■■■■■■■■■■■■
  contact_url: "https://arn-fai.net/fr/contact"
  # record stats
  enable_metrics: true
brand:
  new_issue_url: https://github.com/searxng/searxng/issues/new
  docs_url: https://docs.searxng.org/
  public_instances: https://searx.space
  wiki_url: https://github.com/searxng/searxng/wiki
  issue_url: https://github.com/searxng/searxng/issues
  # custom:
  #   maintainer: "Alsace Réseau Neutre"
  #   # Custom entries in the footer: [title]: [link]
  #   links:
  #     "Mentions légales": https://arn-fai.net/mentions
search:
  # Filter results. 0: None, 1: Moderate, 2: Strict
  safe_search: 1
  # Existing autocomplete backends: "dbpedia", "duckduckgo", "google", "yandex",
  # "seznam", "startpage", "swisscows", "qwant", "wikipedia" - leave blank to turn it off
  # by default.
  autocomplete: ""
  # minimum characters to type before autocompleter starts
  autocomplete_min: 4
  # Default search language - leave blank to detect from browser information or
  # use codes from 'languages.py'
  default_lang: "auto"
  # Available languages
  # languages:
  #   - all
  #   - en
  #   - en-US
  #   - de
  #   - it-IT
  #   - fr
  #   - fr-BE
  # ban time in seconds after engine errors
  ban_time_on_fail: 5
  # max ban time in seconds after engine errors
  max_ban_time_on_fail: 120
  suspended_times:
    # Engine suspension time after error (in seconds; set to 0 to disable)
    # For error "Access denied" and "HTTP error [402, 403]"
    SearxEngineAccessDenied: 86400
    # For error "CAPTCHA"
    SearxEngineCaptcha: 86400
    # For error "Too many request" and "HTTP error 429"
    SearxEngineTooManyRequests: 3600
    # Cloudflare CAPTCHA
    cf_SearxEngineCaptcha: 1296000
    cf_SearxEngineAccessDenied: 86400
    # ReCAPTCHA
    recaptcha_SearxEngineCaptcha: 604800
  # remove format to deny access, use lower case.
  # formats: [html, csv, json, rss]
  formats:
    - html
server:
  # If you change port, bind_address or base_url don't forget to rebuild
  # instance's environment (make buildenv)
  port: 8888
  bind_address: "127.0.0.1"
  base_url: https://fouiner.sans-nuage.fr/  # Possible values: false or "https://example.org/location".
  limiter: true  # rate limit the number of request on the instance, block some bots
  # If your instance owns a /etc/searxng/settings.yml file, then set the following
  # values there.
  secret_key: "********************************"
  # Proxying image results through searx
  image_proxy: true
  public_instance: true
  # 1.0 and 1.1 are supported
  http_protocol_version: "1.0"
  # POST queries are more secure as they don't show up in history but may cause
  # problems when using Firefox containers
  method: "POST"
  default_http_headers:
    X-Content-Type-Options: nosniff
    X-XSS-Protection: 1; mode=block
    X-Download-Options: noopen
    X-Robots-Tag: noindex, nofollow
    Referrer-Policy: no-referrer
valkey:
  # https://redis-py.readthedocs.io/en/stable/connections.html#redis.client.Redis.from_url
  url: "redis://@localhost:6379/1"
ui:
  # Custom static path - leave it blank if you didn't change
  static_path: ""
  static_use_hash: false
  # Custom templates path - leave it blank if you didn't change
  templates_path: ""
  # query_in_title: When true, the result page's title contains the query.
  # It decreases privacy, since the browser can record the page titles.
  query_in_title: false
  # infinite_scroll: When true, automatically loads the next page when scrolling to bottom of the current page.
  infinite_scroll: false
  # ui theme
  default_theme: simple
  # center the results ?
  center_alignment: false
  # URL prefix of the internet archive, don't forget the trailing slash (if needed).
  # cache_url: "https://webcache.googleusercontent.com/search?q=cache:"
  # Default interface locale - leave blank to detect from browser information or
  # use codes from the 'locales' config section
  default_locale: ""
  # Open result links in a new tab by default
  # results_on_new_tab: false
  theme_args:
    # style of simple theme: auto, light, dark
    simple_style: auto
  url_formatting: host
# Lock arbitrary settings on the preferences page. To find the ID of the user
# setting you want to lock, check the ID of the form on the page "preferences".
#
# preferences:
#   lock:
#     - language
#     - autocomplete
#     - method
#     - query_in_title
# searx supports result proxification using an external service:
# https://github.com/asciimoo/morty uncomment below section if you have running
# morty proxy the key is base64 encoded (keep the !!binary notation)
# Note: since commit af77ec3, morty accepts a base64 encoded key.
#
# result_proxy:
#   url: http://127.0.0.1:3000/
#   # the key is a base64 encoded string, the YAML !!binary prefix is optional
#   key: !!binary "your_morty_proxy_key"
#   # [true|false] enable the "proxy" button next to each result
#   proxify_results: true
# communication with search engines
#
outgoing:
  # default timeout in seconds, can be overridden by engine
  request_timeout: 3.0
  # the maximum timeout in seconds
  # max_request_timeout: 10.0
  # suffix of searx_useragent, could contain information like an email address
  # to the administrator
  useragent_suffix: ""
  # The maximum number of concurrent connections that may be established.
  pool_connections: 100
  # Allow the connection pool to maintain keep-alive connections below this
  # point.
  pool_maxsize: 20
  # See https://www.python-httpx.org/http2/
  enable_http2: true
  # uncomment below section if you want to use a custom server certificate
  # see https://www.python-httpx.org/advanced/#changing-the-verification-defaults
  # and https://www.python-httpx.org/compatibility/#ssl-configuration
  # verify: ~/.mitmproxy/mitmproxy-ca-cert.cer
  #
  # uncomment below section if you want to use a proxy, see:
  #   https://2.python-requests.org/en/latest/user/advanced/#proxies
  # SOCKS proxies are also supported, see:
  #   https://2.python-requests.org/en/latest/user/advanced/#socks
  #
  # proxies:
  #   all://:
  #     - http://proxy1:8080
  #     - http://proxy2:8080
  #
  # using_tor_proxy: true
  #
  # Extra seconds to add in order to account for the time taken by the proxy
  #
  # extra_proxy_timeout: 10.0
  #
  # uncomment below section only if you have more than one network interface
  # which can be the source of outgoing search requests
  #
  # source_ips:
  #   - 1.1.1.1
  #   - 1.1.1.2
  #   - fe80::/126
# External plugin configuration, for more details see
#   https://docs.searxng.org/dev/plugins.html
#
# plugins:
#   - plugin1
#   - plugin2
#   - ...
# Comment or un-comment plugin to activate / deactivate by default.
#
# enabled_plugins:
#   # these plugins are enabled if nothing is configured ..
#   - 'Hash plugin'
#   - 'Search on category select'
#   - 'Self Information'
#   - 'Tracker URL remover'
#   - 'Ahmia blacklist'  # activation depends on outgoing.using_tor_proxy
#   # these plugins are disabled if nothing is configured ..
#   - 'Hostname replace'  # see hostname_replace configuration below
#   - 'Open Access DOI rewrite'
#   - 'Vim-like hotkeys'
#   - 'Tor check plugin'
#   # Read the docs before activate: auto-detection of the language could be
#   # detrimental to users expectations / users can activate the plugin in the
#   # preferences if they want.
#   - 'Autodetect search language'
plugins:
  searx.plugins.calculator.SXNGPlugin:
    active: true
  searx.plugins.infinite_scroll.SXNGPlugin:
    active: false
  searx.plugins.hash_plugin.SXNGPlugin:
    active: true
  searx.plugins.self_info.SXNGPlugin:
    active: true
  searx.plugins.unit_converter.SXNGPlugin:
    active: true
  searx.plugins.ahmia_filter.SXNGPlugin:
    active: true
  searx.plugins.hostnames.SXNGPlugin:
    active: true
  searx.plugins.time_zone.SXNGPlugin:
    active: true
  searx.plugins.oa_doi_rewrite.SXNGPlugin:
    active: false
  searx.plugins.tor_check.SXNGPlugin:
    active: false
  searx.plugins.tracker_url_remover.SXNGPlugin:
    active: true
hostnames:
  replace:
    '(.*\.)?youtube\.com$': 'yewtu.be'
    '(.*\.)?youtu\.be$': 'yewtu.be'
    '(.*\.)?youtube-noocookie\.com$': 'yewtu.be'
    # '(.*\.)?reddit\.com$': 'teddit.example.com'
    # '(.*\.)?redd\.it$': 'teddit.example.com'
    # '(www\.)?twitter\.com$': 'nitter.d420.de'
  # remove:
  #   - '(.*\.)?facebook.com$'
# Configuration of the "Hostname replace" plugin:
#
# hostname_replace:
#   '(.*\.)?youtube\.com$': 'invidious.example.com'
#   '(.*\.)?youtu\.be$': 'invidious.example.com'
#   '(.*\.)?youtube-noocookie\.com$': 'yotter.example.com'
#   '(.*\.)?reddit\.com$': 'teddit.example.com'
#   '(.*\.)?redd\.it$': 'teddit.example.com'
#   '(www\.)?twitter\.com$': 'nitter.example.com'
#   # to remove matching host names from result list, set value to false
#   'spam\.example\.com': false
checker:
  # disable checker when in debug mode
  off_when_debug: true
  # use "scheduling: false" to disable scheduling
  # scheduling: interval or int
  # to activate the scheduler:
  # * uncomment "scheduling" section
  # * add "cache2 = name=searxngcache,items=2000,blocks=2000,blocksize=4096,bitmap=1"
  #   to your uwsgi.ini
  # scheduling:
  #   start_after: [300, 1800]  # delay to start the first run of the checker
  #   every: [86400, 90000]  # how often the checker runs
  # additional tests: only for the YAML anchors (see the engines section)
  #
  additional_tests:
    rosebud: &test_rosebud
      matrix:
        query: rosebud
        lang: en
      result_container:
        - not_empty
        - ['one_title_contains', 'citizen kane']
      test:
        - unique_results
    android: &test_android
      matrix:
        query: ['android']
        lang: ['en', 'de', 'fr', 'zh-CN']
      result_container:
        - not_empty
        - ['one_title_contains', 'google']
      test:
        - unique_results
  # tests: only for the YAML anchors (see the engines section)
  tests:
    infobox: &tests_infobox
      infobox:
        matrix:
          query: ["linux", "new york", "bbc"]
        result_container:
          - has_infobox
categories_as_tabs:
  general:
  images:
  videos:
engines:
  - name: brave
    engine: brave
    shortcut: br
    time_range_support: true
    paging: true
    categories: [general, web]
    brave_category: search
    brave_spellcheck: false
    disabled: false
  - name: brave.images
    engine: brave
    network: brave
    shortcut: brimg
    categories: [images, web]
    brave_category: images
    disabled: false
  - name: brave.videos
    engine: brave
    network: brave
    shortcut: brvid
    categories: [videos, web]
    brave_category: videos
  - name: wikipedia
    engine: wikipedia
    shortcut: wp
    base_url: 'https://{language}.wikipedia.org/'
    disabled: true
  - name: bing
    engine: bing
    shortcut: bi
    disabled: true
  - name: bing images
    engine: bing_images
    shortcut: bii
    disabled: true
  - name: duckduckgo
    engine: duckduckgo
    shortcut: ddg
    disabled: false
  - name: duckduckgo images
    engine: duckduckgo_extra
    categories: [images, web]
    ddg_category: images
    shortcut: ddi
    disabled: true
  - name: google
    engine: google
    shortcut: go
    # additional_tests:
    #   android: *test_android
    disabled: true
  - name: mullvadleta
    engine: mullvad_leta
    disabled: true
    leta_engine: google
    categories: [general, web]
    shortcut: ml
  - name: google images
    engine: google_images
    shortcut: goi
    # additional_tests:
    #   android: *test_android
    #   dali:
    #     matrix:
    #       query: ['Dali Christ']
    #       lang: ['en', 'de', 'fr', 'zh-CN']
    #     result_container:
    #       - ['one_title_contains', 'Salvador']
    disabled: false
  - name: invidious
    engine: invidious
    base_url:
      - https://inv.nadeko.net/
    shortcut: iv
    timeout: 5.0
    # disabled : True
  - name: openstreetmap
    engine: openstreetmap
    shortcut: osm
    disabled: true
  - name: qwant
    qwant_categ: web-lite
    engine: qwant
    shortcut: qw
    categories: general
    disabled: true
    additional_tests:
      rosebud: *test_rosebud
  - name: qwant images
    qwant_categ: images
    engine: qwant
    shortcut: qwi
    categories: [images, web]
    network: qwant
    # - name : qwant images
    #   engine : qwant
    #   shortcut : qwi
    #   categories : images
    disabled: false
  - name: sepiasearch
    engine: sepiasearch
    shortcut: sep
  - name: startpage
    engine: startpage
    shortcut: sp
    startpage_categ: web
    additional_tests:
      rosebud: *test_rosebud
    disabled: false
  - name: unsplash
    engine: unsplash
    # disabled: True
    shortcut: us
  - name: yahoo
    engine: yahoo
    shortcut: yh
    disabled: true
  - name: mojeek
    shortcut: mjk
    engine: mojeek
    categories: [general, web]
    disabled: false
  - name: mojeek images
    shortcut: mjkimg
    engine: mojeek
    categories: [images, web]
    search_type: images
    paging: false
    disabled: false
  # Doku engine lets you access any Doku wiki instance:
  # a public one or a private/corporate one.
  # - name: ubuntuwiki
  #   engine: doku
  #   shortcut: uw
  #   base_url: 'https://doc.ubuntu-fr.org'
  # Be careful when enabling this engine if you are
  # running a public instance. Do not expose any sensitive
  # information. You can restrict access by configuring a list
  # of access tokens under tokens.
  # - name: git grep
  #   engine: command
  #   command: ['git', 'grep', '{{QUERY}}']
  #   shortcut: gg
  #   tokens: []
  #   disabled: true
  #   delimiter:
  #     chars: ':'
  #     keys: ['filepath', 'code']
  # Be careful when enabling this engine if you are
  # running a public instance. Do not expose any sensitive
  # information. You can restrict access by configuring a list
  # of access tokens under tokens.
  # - name: locate
  #   engine: command
  #   command: ['locate', '{{QUERY}}']
  #   shortcut: loc
  #   tokens: []
  #   disabled: true
  #   delimiter:
  #     chars: ' '
  #     keys: ['line']
  # Be careful when enabling this engine if you are
  # running a public instance. Do not expose any sensitive
  # information. You can restrict access by configuring a list
  # of access tokens under tokens.
  # - name: find
  #   engine: command
  #   command: ['find', '.', '-name', '{{QUERY}}']
  #   query_type: path
  #   shortcut: fnd
  #   tokens: []
  #   disabled: true
  #   delimiter:
  #     chars: ' '
  #     keys: ['line']
  # Be careful when enabling this engine if you are
  # running a public instance. Do not expose any sensitive
  # information. You can restrict access by configuring a list
  # of access tokens under tokens.
  # - name: pattern search in files
  #   engine: command
  #   command: ['fgrep', '{{QUERY}}']
  #   shortcut: fgr
  #   tokens: []
  #   disabled: true
  #   delimiter:
  #     chars: ' '
  #     keys: ['line']
  # Be careful when enabling this engine if you are
  # running a public instance. Do not expose any sensitive
  # information. You can restrict access by configuring a list
  # of access tokens under tokens.
  # - name: regex search in files
  #   engine: command
  #   command: ['grep', '{{QUERY}}']
  #   shortcut: gr
  #   tokens: []
  #   disabled: true
  #   delimiter:
  #     chars: ' '
  #     keys: ['line']
doi_resolvers:
  oadoi.org: 'https://oadoi.org/'
  doi.org: 'https://doi.org/'
  doai.io: 'https://dissem.in/'
  sci-hub.se: 'https://sci-hub.se/'
  sci-hub.st: 'https://sci-hub.st/'
  sci-hub.ru: 'https://sci-hub.ru/'
default_doi_resolver: 'oadoi.org'
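
The `hostnames: replace:` rules in the configuration above are regular expressions matched against the hostname of each result. Here is a rough Python sketch to check what they catch. It only approximates the behaviour with `re.search` and is not the plugin's actual code; `rewrite_host` and the sample URLs are hypothetical. Note that, as far as I know, YouTube's embed domain is spelled `youtube-nocookie.com` (one "o"), so the third pattern as written does not seem to match it.

```python
import re
from urllib.parse import urlparse

# Patterns copied verbatim from the `hostnames: replace:` section above.
REPLACE = {
    r'(.*\.)?youtube\.com$': 'yewtu.be',
    r'(.*\.)?youtu\.be$': 'yewtu.be',
    r'(.*\.)?youtube-noocookie\.com$': 'yewtu.be',
}


def rewrite_host(url, rules=REPLACE):
    """Return url with its host swapped for the first matching rule's target.

    Approximation only: the real plugin may apply the patterns differently.
    """
    parts = urlparse(url)
    host = parts.hostname or ""
    for pattern, target in rules.items():
        if re.search(pattern, host):
            return parts._replace(netloc=target).geturl()
    return url  # no rule matched: leave the URL untouched


if __name__ == "__main__":
    for sample in (
        "https://www.youtube.com/watch?v=abc",
        "https://youtu.be/abc",
        "https://www.youtube-nocookie.com/embed/abc",
    ):
        print(sample, "->", rewrite_host(sample))
```

If rewriting embed links matters, it may be worth checking the third pattern against a real result before relying on it.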