10/12/2025 at 00:05 - The search engine no longer returns results

Lately, we have been struggling to avoid being blacklisted by the search indexes we query.
See https://forum.arn-fai.net/t/topic/12162

It has happened again.

I upgraded searxng and adjusted the list and configuration of search engines to try to maximize the chances of getting relevant results.

Below is the list of engines enabled by default:

  • brave
  • duckduckgo
  • startpage
  • mojeek

Bing is disabled because, after a while, it starts returning Chinese sites or adult sites. Google is disabled because it returns an empty list. The same goes for Qwant (even in web-lite mode).
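To check each of these engines on its own, one option is to query the instance with the engine's `!shortcut` prefix. The following is a minimal sketch, not something taken from the instance: it assumes the `base_url` and the shortcuts (`br`, `ddg`, `sp`, `mjk`) from the settings.yml reproduced in this post, and that the `/search?q=` endpoint accepts the bang syntax. The function name `engine_test_urls` is hypothetical.

```python
from urllib.parse import urlencode

# Assumptions: base_url and shortcuts are copied from the settings.yml in this post.
BASE_URL = "https://fouiner.sans-nuage.fr/"
SHORTCUTS = {"brave": "br", "duckduckgo": "ddg", "startpage": "sp", "mojeek": "mjk"}


def engine_test_urls(query, base_url=BASE_URL, shortcuts=SHORTCUTS):
    """Map each engine name to a search URL restricted to that engine via its !bang."""
    root = base_url.rstrip("/")
    return {
        name: f"{root}/search?" + urlencode({"q": f"!{bang} {query}"})
        for name, bang in shortcuts.items()
    }


# Print one URL per engine; open them in a browser to see which engines still answer.
for name, url in engine_test_urls("rosebud").items():
    print(name, url)
```

Opening the URLs in a browser rather than fetching them from a script is deliberate here: the instance runs with `limiter: true`, so scripted requests may be rejected.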

Summary

The upgrade is done with a single command:

yunohost app upgrade searxng
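Since the settings file was edited by hand, it may be worth keeping a copy before upgrading. This post does not say whether the YunoHost package preserves local edits, so the following is only a cautious sketch. It assumes it is run as root on the server; `backup_name` and `upgrade_with_backup` are hypothetical helper names.

```python
import shutil
import subprocess
from datetime import datetime
from pathlib import Path

# Path taken from this post; adjust if the app is installed elsewhere.
SETTINGS = Path("/var/www/searxng/settings.yml")


def backup_name(path, now):
    """Build a timestamped sibling path, e.g. settings.yml.20251210-000500.bak."""
    return path.with_name(f"{path.name}.{now:%Y%m%d-%H%M%S}.bak")


def upgrade_with_backup(settings=SETTINGS):
    """Copy the settings file aside, then run the upgrade command from this post."""
    copy = backup_name(settings, datetime.now())
    shutil.copy2(settings, copy)  # copy2 also copies permission bits and timestamps
    subprocess.run(["yunohost", "app", "upgrade", "searxng"], check=True)
    return copy
```

Calling `upgrade_with_backup()` returns the path of the copy, which can then be diffed against the post-upgrade file.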

The configuration in /var/www/searxng/settings.yml is as follows:

general:
  # Debug mode, only for development
  debug: false
  # displayed name
  instance_name: "sans-nuage.fr"
  # For example: https://example.com/privacy
  privacypolicy_url: "https://arn-fai.net/fr/cgs"
  # use true to use your own donation page written in searx/info/en/donate.md
  # use false to disable the donation link
  donation_url: "https://arn-fai.net/fr/asso/nous-soutenir"
  # mailto:■■■■■■■■■■■■■■■■■■■
  contact_url: "https://arn-fai.net/fr/contact"
  # record stats
  enable_metrics: true

brand:
  new_issue_url: https://github.com/searxng/searxng/issues/new
  docs_url: https://docs.searxng.org/
  public_instances: https://searx.space
  wiki_url: https://github.com/searxng/searxng/wiki
  issue_url: https://github.com/searxng/searxng/issues
#  custom:
#    maintainer: "Alsace Réseau Neutre"
    # Custom entries in the footer: [title]: [link]
#    links:
#      "Mentions légales": https://arn-fai.net/mentions

search:
  # Filter results. 0: None, 1: Moderate, 2: Strict
  safe_search: 1
  # Existing autocomplete backends: "dbpedia", "duckduckgo", "google", "yandex",
  # "seznam", "startpage", "swisscows", "qwant", "wikipedia" - leave blank to turn it off
  # by default.
  autocomplete: ""
  # minimum characters to type before autocompleter starts
  autocomplete_min: 4
  # Default search language - leave blank to detect from browser information or
  # use codes from 'languages.py'
  default_lang: "auto"
  # Available languages
  # languages:
  #   - all
  #   - en
  #   - en-US
  #   - de
  #   - it-IT
  #   - fr
  #   - fr-BE
  # ban time in seconds after engine errors
  ban_time_on_fail: 5
  # max ban time in seconds after engine errors
  max_ban_time_on_fail: 120
  suspended_times:
    # Engine suspension time after error (in seconds; set to 0 to disable)
    # For error "Access denied" and "HTTP error [402, 403]"
    SearxEngineAccessDenied: 86400
    # For error "CAPTCHA"
    SearxEngineCaptcha: 86400
    # For error "Too many request" and "HTTP error 429"
    SearxEngineTooManyRequests: 3600
    # Cloudflare CAPTCHA
    cf_SearxEngineCaptcha: 1296000
    cf_SearxEngineAccessDenied: 86400
    # ReCAPTCHA
    recaptcha_SearxEngineCaptcha: 604800

  # remove format to deny access, use lower case.
  # formats: [html, csv, json, rss]
  formats:
    - html

server:
  # If you change port, bind_address or base_url don't forget to rebuild
  # instance's environment (make buildenv)
  port: 8888
  bind_address: "127.0.0.1"
  base_url: https://fouiner.sans-nuage.fr/  # Possible values: false or "https://example.org/location".
  limiter: true  # rate limit the number of requests on the instance, block some bots

  # If your instance owns a /etc/searxng/settings.yml file, then set the following
  # values there.

  secret_key: "********************************"
  # Proxying image results through searx
  image_proxy: true
  public_instance: true
  # 1.0 and 1.1 are supported
  http_protocol_version: "1.0"
  # POST queries are more secure as they don't show up in history but may cause
  # problems when using Firefox containers
  method: "POST"
  default_http_headers:
    X-Content-Type-Options: nosniff
    X-XSS-Protection: 1; mode=block
    X-Download-Options: noopen
    X-Robots-Tag: noindex, nofollow
    Referrer-Policy: no-referrer

valkey:
  # https://redis-py.readthedocs.io/en/stable/connections.html#redis.client.Redis.from_url
  url: "redis://@localhost:6379/1"

ui:
  # Custom static path - leave it blank if you didn't change
  static_path: ""
  static_use_hash: false
  # Custom templates path - leave it blank if you didn't change
  templates_path: ""
  # query_in_title: When true, the result page's title contains the query.
  # This decreases privacy, since the browser can record page titles.
  query_in_title: false
  # infinite_scroll: When true, automatically loads the next page when scrolling to bottom of the current page.
  infinite_scroll: false
  # ui theme
  default_theme: simple
  # center the results ?
  center_alignment: false
  # URL prefix of the internet archive, don't forget the trailing slash (if needed).
  # cache_url: "https://webcache.googleusercontent.com/search?q=cache:"
  # Default interface locale - leave blank to detect from browser information or
  # use codes from the 'locales' config section
  default_locale: ""
  # Open result links in a new tab by default
  # results_on_new_tab: false
  theme_args:
    # style of simple theme: auto, light, dark
    simple_style: auto
  url_formatting: host

# Lock arbitrary settings on the preferences page.  To find the ID of the user
# setting you want to lock, check the ID of the form on the page "preferences".
#
# preferences:
#   lock:
#     - language
#     - autocomplete
#     - method
#     - query_in_title

# searx supports result proxification using an external service:
# https://github.com/asciimoo/morty uncomment below section if you have running
# morty proxy the key is base64 encoded (keep the !!binary notation)
# Note: since commit af77ec3, morty accepts a base64 encoded key.
#
# result_proxy:
#   url: http://127.0.0.1:3000/
#   # the key is a base64 encoded string, the YAML !!binary prefix is optional
#   key: !!binary "your_morty_proxy_key"
#   # [true|false] enable the "proxy" button next to each result
#   proxify_results: true

# communication with search engines
#
outgoing:
  # default timeout in seconds, can be overridden per engine
  request_timeout: 3.0
  # the maximum timeout in seconds
  # max_request_timeout: 10.0
  # suffix of searx_useragent, could contain information like an email address
  # to the administrator
  useragent_suffix: ""
  # The maximum number of concurrent connections that may be established.
  pool_connections: 100
  # Allow the connection pool to maintain keep-alive connections below this
  # point.
  pool_maxsize: 20
  # See https://www.python-httpx.org/http2/
  enable_http2: true
  # uncomment below section if you want to use a custom server certificate
  # see https://www.python-httpx.org/advanced/#changing-the-verification-defaults
  # and https://www.python-httpx.org/compatibility/#ssl-configuration
  #  verify: ~/.mitmproxy/mitmproxy-ca-cert.cer
  #
  # uncomment below section if you want to use a proxy, see:
  #   https://2.python-requests.org/en/latest/user/advanced/#proxies
  # SOCKS proxies are also supported, see:
  #   https://2.python-requests.org/en/latest/user/advanced/#socks
  #
  #  proxies:
  #    all://:
  #      - http://proxy1:8080
  #      - http://proxy2:8080
  #
  #  using_tor_proxy: true
  #
  # Extra seconds to add in order to account for the time taken by the proxy
  #
  #  extra_proxy_timeout: 10.0
  #
  # uncomment below section only if you have more than one network interface
  # which can be the source of outgoing search requests
  #
  #  source_ips:
  #    - 1.1.1.1
  #    - 1.1.1.2
  #    - fe80::/126

# External plugin configuration, for more details see
#   https://docs.searxng.org/dev/plugins.html
#
# plugins:
#   - plugin1
#   - plugin2
#   - ...

# Comment or un-comment plugin to activate / deactivate by default.
#
# enabled_plugins:
#   # these plugins are enabled if nothing is configured ..
#   - 'Hash plugin'
#   - 'Search on category select'
#   - 'Self Information'
#   - 'Tracker URL remover'
#   - 'Ahmia blacklist'  # activation depends on outgoing.using_tor_proxy
#   # these plugins are disabled if nothing is configured ..
#   - 'Hostname replace'  # see hostname_replace configuration below
#   - 'Open Access DOI rewrite'
#   - 'Vim-like hotkeys'
#   - 'Tor check plugin'
#   # Read the docs before activate: auto-detection of the language could be
#   # detrimental to users expectations / users can activate the plugin in the
#   # preferences if they want.
#   - 'Autodetect search language'

plugins:
  searx.plugins.calculator.SXNGPlugin:
    active: true

  searx.plugins.infinite_scroll.SXNGPlugin:
    active: false

  searx.plugins.hash_plugin.SXNGPlugin:
    active: true

  searx.plugins.self_info.SXNGPlugin:
    active: true

  searx.plugins.unit_converter.SXNGPlugin:
    active: true

  searx.plugins.ahmia_filter.SXNGPlugin:
    active: true

  searx.plugins.hostnames.SXNGPlugin:
    active: true

  searx.plugins.time_zone.SXNGPlugin:
    active: true

  searx.plugins.oa_doi_rewrite.SXNGPlugin:
    active: false

  searx.plugins.tor_check.SXNGPlugin:
    active: false

  searx.plugins.tracker_url_remover.SXNGPlugin:
    active: true

hostnames:
  replace:
    '(.*\.)?youtube\.com$':           'yewtu.be'
    '(.*\.)?youtu\.be$':              'yewtu.be'
    '(.*\.)?youtube-nocookie\.com$':  'yewtu.be'
#   '(.*\.)?reddit\.com$':            'teddit.example.com'
#   '(.*\.)?redd\.it$':               'teddit.example.com'
#  '(www\.)?twitter\.com$':          'nitter.d420.de'
#  remove:
#    - '(.*\.)?facebook.com$'

# Configuration of the "Hostname replace" plugin:
#
# hostname_replace:
#   '(.*\.)?youtube\.com$': 'invidious.example.com'
#   '(.*\.)?youtu\.be$': 'invidious.example.com'
#   '(.*\.)?youtube-nocookie\.com$': 'yotter.example.com'
#   '(.*\.)?reddit\.com$': 'teddit.example.com'
#   '(.*\.)?redd\.it$': 'teddit.example.com'
#   '(www\.)?twitter\.com$': 'nitter.example.com'
#   # to remove matching host names from result list, set value to false
#   'spam\.example\.com': false

checker:
  # disable checker when in debug mode
  off_when_debug: true

  # use "scheduling: false" to disable scheduling
  # scheduling: interval or int

  # to activate the scheduler:
  # * uncomment "scheduling" section
  # * add "cache2 = name=searxngcache,items=2000,blocks=2000,blocksize=4096,bitmap=1"
  #   to your uwsgi.ini

  # scheduling:
  #   start_after: [300, 1800]  # delay to start the first run of the checker
  #   every: [86400, 90000]     # how often the checker runs

  # additional tests: only for the YAML anchors (see the engines section)
  #
  additional_tests:
    rosebud: &test_rosebud
      matrix:
        query: rosebud
        lang: en
      result_container:
        - not_empty
        - ['one_title_contains', 'citizen kane']
      test:
        - unique_results

    android: &test_android
      matrix:
        query: ['android']
        lang: ['en', 'de', 'fr', 'zh-CN']
      result_container:
        - not_empty
        - ['one_title_contains', 'google']
      test:
        - unique_results

  # tests: only for the YAML anchors (see the engines section)
  tests:
    infobox: &tests_infobox
      infobox:
        matrix:
          query: ["linux", "new york", "bbc"]
        result_container:
          - has_infobox

categories_as_tabs:
  general:
  images:
  videos:

engines:

  - name: brave
    engine: brave
    shortcut: br
    time_range_support: true
    paging: true
    categories: [general, web]
    brave_category: search
    brave_spellcheck: false
    disabled: false

  - name: brave.images
    engine: brave
    network: brave
    shortcut: brimg
    categories: [images, web]
    brave_category: images
    disabled: false

  - name: brave.videos
    engine: brave
    network: brave
    shortcut: brvid
    categories: [videos, web]
    brave_category: videos

  - name : wikipedia
    engine : wikipedia
    shortcut : wp
    base_url : 'https://{language}.wikipedia.org/'
    disabled: true

  - name : bing
    engine : bing
    shortcut : bi
    disabled : true

  - name : bing images
    engine : bing_images
    shortcut : bii
    disabled : true

  - name : duckduckgo
    engine : duckduckgo
    shortcut : ddg
    disabled : false

  - name: duckduckgo images
    engine: duckduckgo_extra
    categories: [images, web]
    ddg_category: images
    shortcut: ddi
    disabled: true

  - name : google
    engine : google
    shortcut : go
    # additional_tests:
    #   android: *test_android
    disabled : true

  - name: mullvadleta
    engine: mullvad_leta
    disabled: true
    leta_engine: google
    categories: [general, web]
    shortcut: ml

  - name : google images
    engine : google_images
    shortcut : goi
    # additional_tests:
    #   android: *test_android
    #   dali:
    #     matrix:
    #       query: ['Dali Christ']
    #       lang: ['en', 'de', 'fr', 'zh-CN']
    #     result_container:
    #       - ['one_title_contains', 'Salvador']
    disabled : false

  - name : invidious
    engine : invidious
    base_url :
      - https://inv.nadeko.net/
    shortcut: iv
    timeout : 5.0
    #    disabled : True

  - name : openstreetmap
    engine : openstreetmap
    shortcut : osm
    disabled: true

  - name : qwant
    qwant_categ: web-lite
    engine : qwant
    shortcut : qw
    categories : general
    disabled : true
    additional_tests:
      rosebud: *test_rosebud


  - name: qwant images
    qwant_categ: images
    engine: qwant
    shortcut: qwi
    categories: [images, web]
    network: qwant
    disabled: false
#  - name : qwant images
#    engine : qwant
#    shortcut : qwi
#    categories : images

  - name: sepiasearch
    engine: sepiasearch
    shortcut: sep

  - name : startpage
    engine : startpage
    shortcut : sp
    startpage_categ: web
    additional_tests:
      rosebud: *test_rosebud
    disabled: false

  - name : unsplash
    engine : unsplash
    #    disabled: True
    shortcut : us

  - name : yahoo
    engine : yahoo
    shortcut : yh
    disabled : true

  - name: mojeek
    shortcut: mjk
    engine: mojeek
    categories: [general, web]
    disabled: false

  - name: mojeek images
    shortcut: mjkimg
    engine: mojeek
    categories: [images, web]
    search_type: images
    paging: false
    disabled: false

# Doku engine lets you access any Doku wiki instance:
# A public one or a private/corporate one.
#  - name: ubuntuwiki
#    engine: doku
#    shortcut: uw
#    base_url: 'https://doc.ubuntu-fr.org'

# Be careful when enabling this engine if you are
# running a public instance. Do not expose any sensitive
# information. You can restrict access by configuring a list
# of access tokens under tokens.
#  - name: git grep
#    engine: command
#    command: ['git', 'grep', '{{QUERY}}']
#    shortcut: gg
#    tokens: []
#    disabled: true
#    delimiter:
#        chars: ':'
#        keys: ['filepath', 'code']

# Be careful when enabling this engine if you are
# running a public instance. Do not expose any sensitive
# information. You can restrict access by configuring a list
# of access tokens under tokens.
#  - name: locate
#    engine: command
#    command: ['locate', '{{QUERY}}']
#    shortcut: loc
#    tokens: []
#    disabled: true
#    delimiter:
#        chars: ' '
#        keys: ['line']

# Be careful when enabling this engine if you are
# running a public instance. Do not expose any sensitive
# information. You can restrict access by configuring a list
# of access tokens under tokens.
#  - name: find
#    engine: command
#    command: ['find', '.', '-name', '{{QUERY}}']
#    query_type: path
#    shortcut: fnd
#    tokens: []
#    disabled: true
#    delimiter:
#        chars: ' '
#        keys: ['line']

# Be careful when enabling this engine if you are
# running a public instance. Do not expose any sensitive
# information. You can restrict access by configuring a list
# of access tokens under tokens.
#  - name: pattern search in files
#    engine: command
#    command: ['fgrep', '{{QUERY}}']
#    shortcut: fgr
#    tokens: []
#    disabled: true
#    delimiter:
#        chars: ' '
#        keys: ['line']

# Be careful when enabling this engine if you are
# running a public instance. Do not expose any sensitive
# information. You can restrict access by configuring a list
# of access tokens under tokens.
#  - name: regex search in files
#    engine: command
#    command: ['grep', '{{QUERY}}']
#    shortcut: gr
#    tokens: []
#    disabled: true
#    delimiter:
#        chars: ' '
#        keys: ['line']

doi_resolvers:
  oadoi.org: 'https://oadoi.org/'
  doi.org: 'https://doi.org/'
  doai.io: 'https://dissem.in/'
  sci-hub.se: 'https://sci-hub.se/'
  sci-hub.st: 'https://sci-hub.st/'
  sci-hub.ru: 'https://sci-hub.ru/'

default_doi_resolver: 'oadoi.org'