Doesn't argparse read Unicode from the command line?

Submitted by 吃可爱长大的小学妹 on 2019-12-03 08:20:12


Running Python 2.7

When executing:

$ python get_emails -a "åäö"

I get:

usage: get_emails [-h] [-a AREA] [-t {rfc2822,plain}]
get_emails: error: argument -a/--area: invalid unicode value: '\xc3\xa5\xc3\xa4\xc3\xb6'

This is my parser:

import argparse

def _argparse():
    desc = """
           Simple CLI-client for...
           """
    argparser = argparse.ArgumentParser(description=desc)
    subparsers = argparser.add_subparsers(dest='command')

    # create the parser for the "get_emails" command
    parser_get_emails = subparsers.add_parser('get_emails', help=u'Get email list')
    parser_get_emails.add_argument('-a', '--area', type=unicode, help='Limit to area')
    parser_get_emails.add_argument('-t', '--out_type', choices=['rfc2822', 'plain'],
                                   default='rfc2822', help='Type of output')

    args = argparser.parse_args()
    return args

Does this mean I can't use any Unicode characters with Python's argparse module?


You can try

type=lambda s: unicode(s, 'utf8')

instead of

type=unicode

Without an encoding argument, unicode() decodes with the default encoding, which is ASCII in Python 2, so any byte above 0x7f is rejected.
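
To see why the ASCII default fails, here is a minimal sketch that decodes the exact bytes from the error message with both codecs. The variable names are illustrative, and it assumes the terminal sent the argument as UTF-8, which is what the escaped bytes in the error suggest. The same lines behave identically in Python 2 and Python 3.

```python
# The bytes the shell passed for the three-letter argument, copied from
# the argparse error message in the question.
raw = b'\xc3\xa5\xc3\xa4\xc3\xb6'

# Decoding as ASCII fails, because every byte here is above 0x7f.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding as UTF-8 recovers the three characters that were typed.
text = raw.decode('utf-8')
```

Here ascii_ok ends up False and text holds three characters, which is why passing the encoding explicitly makes the argument parse.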


The command-line arguments are encoded using sys.getfilesystemencoding():

import sys

def commandline_arg(bytestring):
    unicode_string = bytestring.decode(sys.getfilesystemencoding())
    return unicode_string

# ...
parser_get_emails.add_argument('-a', '--area', type=commandline_arg)
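
Putting the pieces together, the following is a rough sketch of a parser whose type function also runs unchanged on Python 3. The optional encoding parameter, the 'utf-8' fallback, and the build_parser helper are assumptions added for illustration; they are not part of the original answer.

```python
import argparse
import sys


def commandline_arg(value, encoding=None):
    """Return value as text, decoding it only if it arrived as bytes."""
    if isinstance(value, bytes):
        # Python 2 hands argparse byte strings, so decode them explicitly.
        # Falling back to 'utf-8' is an assumption for platforms where
        # sys.getfilesystemencoding() returns None.
        codec = encoding or sys.getfilesystemencoding() or 'utf-8'
        return value.decode(codec)
    # Python 3 has already decoded the argument, so pass it through.
    return value


def build_parser():
    """Build a cut-down version of the parser from the question."""
    parser = argparse.ArgumentParser(description='Simple CLI client (sketch)')
    subparsers = parser.add_subparsers(dest='command')
    get_emails = subparsers.add_parser('get_emails', help='Get email list')
    get_emails.add_argument('-a', '--area', type=commandline_arg,
                            help='Limit to area')
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch can be
# run anywhere; the escapes spell the same three letters as in the question.
args = build_parser().parse_args(['get_emails', '-a', u'\xe5\xe4\xf6'])
```

In a real script you would call parse_args() with no arguments so that it reads sys.argv.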

Note: You don't need this in Python 3, where the arguments are already Unicode strings. Python 3 decodes them the way os.fsdecode() does, because command-line arguments are sometimes undecodable. See PEP 383 -- Non-decodable Bytes in System Character Interfaces.
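
The PEP 383 mechanism can be sketched with the surrogateescape error handler, which lets Python 3 carry undecodable bytes through a str and back again. This is Python 3 only; the byte string is made up for illustration and UTF-8 is an assumed encoding, not necessarily what your system uses.

```python
# A byte string that is not valid UTF-8 (0xff never appears in UTF-8).
raw = b'caf\xff'

# Strict decoding refuses the input.
try:
    raw.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# surrogateescape maps each undecodable byte to a lone surrogate code point
# in the range U+DC80..U+DCFF instead of raising.
text = raw.decode('utf-8', 'surrogateescape')

# Encoding with the same handler restores the original bytes exactly.
round_tripped = text.encode('utf-8', 'surrogateescape')
```

Because the round trip is lossless, a program can still pass such an argument on to the operating system, for example as a file name, even though it cannot be printed as ordinary text.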