The CIA uses Python for hacking as well as utility scripts. Python 2.7 and 3.4 seem to be the favorite versions [version]. This post gathers some interesting practical insights that you can apply.
I wanted to codify some nice rules for an organization, so I decided to refresh my knowledge and look beyond the typical software sources. The information here comes mainly from WikiLeaks.
I, of course, managed to insert an excalidraw drawing in the post. Hope you enjoy it!
Table of contents
Style guide
Pypi
IDEs
Installing pyenv
Installing Python
Testing
Software specs document template
Template for cli scripts
Random
Distribution
Some Python insights from someone working at the CIA
Style guide
Though style guides abound, the CIA has its own twist on things, based on the Google Python Style Guide. It was written for people coming from a C background.
Written for people working at a specific location, it justifies its rules by legibility (the ease with which code can be read and understood by others).
As with all style guides, it points out that these are not hard rules: when faced with a decision, exercise common sense.
Here are some interesting parts [style].
Exceptions
Exceptions must be derived from Exception. raise MyException('Error message') is preferred to the two-argument form (raise MyException, 'Error message') or string-based exceptions (raise 'Error message').
The body of try/except blocks is to be kept short.
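To make the two exception rules concrete, here is a minimal sketch. ConfigError and parse_port are hypothetical names chosen for illustration, not something taken from the CIA guide.

```python
class ConfigError(Exception):
    """Raised when a configuration value is missing or malformed."""


def parse_port(raw_value):
    """Converts a raw string to a TCP port number, or raises ConfigError."""
    try:
        # Keep the try body down to the single call that can actually fail.
        port = int(raw_value)
    except ValueError as exc:
        raise ConfigError('Invalid port: {!r}'.format(raw_value)) from exc
    if not 0 < port < 65536:
        raise ConfigError('Port out of range: {}'.format(port))
    return port
```

The range check sits outside the try block on purpose, so the except clause cannot accidentally swallow an unrelated error.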
Globals
Globals should be avoided in favor of class variables.
If needed, they should be declared at module level and accessed through functions.
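One way to read the second rule, sketched with hypothetical names (_handlers, register_handler, get_handler): the state lives at module level, and callers only touch it through functions.

```python
_handlers = {}  # Module-level state; the leading underscore marks it private.


def register_handler(name, handler):
    """Adds a handler to the module-level registry."""
    _handlers[name] = handler


def get_handler(name):
    """Returns the handler registered under name, or None if there is none."""
    return _handlers.get(name)
```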
Nested classes and functions
They are ok to use
List comprehensions
They are OK for simple cases; switch to a loop when things get more complicated.
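Where exactly "simple" ends is a judgment call. The sketch below is my own illustration of the split, not an example from the guide.

```python
values = [1, 2, 3, 4, 5, 6]

# Simple case: one for clause and one condition read well as a comprehension.
even_squares = [v * v for v in values if v % 2 == 0]

# Nested loops plus a compound condition are easier to follow as a loop.
pairs = []
for x in range(3):
    for y in range(3):
        if x != y and (x + y) % 2 == 1:
            pairs.append((x, y))
```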
Generators
Use as needed, writing "Yields:" rather than "Returns:" in docstrings.
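A small sketch of what such a docstring might look like, in the Google docstring layout the guide builds on. read_chunks is a hypothetical helper.

```python
def read_chunks(data, chunk_size):
    """Splits a bytes object into fixed-size pieces.

    Args:
        data: The bytes to split.
        chunk_size: Maximum length of each piece.

    Yields:
        Consecutive slices of data, each at most chunk_size bytes long.
    """
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
```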
Default iterator methods are encouraged
#Yes
for key in adict: ...
if key not in adict: ...
if obj in alist: ...
for line in afile: ...
for k, v in dict.iteritems(): ...
#No
for key in adict.keys(): ...
if not adict.has_key(key): ...
for line in afile.readlines(): ...
Lambda functions
OK for one-liners no longer than 60 to 70 characters. Prefer functions from the operator module where they exist, e.g. operator.mul for multiplication.
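As a rough illustration (the data is made up), operator.mul replaces a lambda x, y: x * y, while a short lambda remains fine as a sort key.

```python
import functools
import operator

numbers = [2, 3, 7]

# A named function from the operator module instead of a multiplication lambda.
product = functools.reduce(operator.mul, numbers, 1)

# A short lambda is still fine for a one-line sort key.
records = [('a', 3), ('b', 1), ('c', 2)]
by_count = sorted(records, key=lambda record: record[1])
```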
Truth evaluation
The implicit false is to be avoided. Use is or is not when comparing to None.
Do not write if x when you mean if x is not None.
When comparing a boolean variable to False, don’t use ==; use if not x.
For sequences, use len to check for emptiness rather than relying on the fact that empty sequences evaluate to False.
#This one for readability
#Yes
if foo is not None:
#No
if not foo is None:
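The None rule matters most when a falsy value such as 0 is legitimate input. A minimal sketch, with a hypothetical describe function:

```python
def describe(timeout=None):
    """Distinguishes 'no timeout given' from a timeout of zero."""
    # 'if timeout:' would wrongly treat 0 the same as None here.
    if timeout is not None:
        return 'timeout={}'.format(timeout)
    return 'no timeout'
```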
Comparison
Startswith is preferred to slicing
if foo.startswith('bar'): # Yes
if foo[:3] == 'bar': # No
Object type comparison
if isinstance(obj, int): # Yes
if type(obj) is type(1): # No
Lexical scoping
An inner function can read variables from its enclosing scopes, and can modify them if global or nonlocal is used. Which scope is inner and which is outer is determined by where the functions are defined in the source code.
def get_adder(summand1):
    """Returns a function that adds numbers to a given number."""
    def adder(summand2):
        return summand1 + summand2
    return adder

print(get_adder(1)(2))
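The adder only reads the outer variable. Modifying one needs the nonlocal keyword (Python 3 only), as in this sketch with a hypothetical make_counter:

```python
def make_counter():
    """Returns a function that counts how many times it has been called."""
    count = 0

    def increment():
        nonlocal count  # Without this, the assignment would create a new local.
        count += 1
        return count

    return increment
```

Each call to make_counter gets its own count, so two counters do not interfere with each other.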
Decorators
To be used cautiously, writing good docs and tests for them. Dependencies are to be avoided inside decorators.
A decorator that is called with valid parameters should (as much as possible) be guaranteed to succeed in all cases.
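Here is a minimal sketch of a decorator in that spirit. log_calls is a hypothetical example that touches no files, sockets or databases, so applying it at import time has nothing that can fail.

```python
import functools


def log_calls(func):
    """Decorator that records each call's arguments on the wrapper."""
    @functools.wraps(func)  # Preserve the wrapped function's name and docstring.
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)

    wrapper.calls = []
    return wrapper


@log_calls
def add(a, b):
    """Returns the sum of a and b."""
    return a + b
```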
Threading
Do not rely on the atomicity of built-in types. Use Queue to pass data between threads; otherwise, use the threading module's primitives and locks.
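A minimal sketch of the Queue approach, using the Python 3 module name queue (it was Queue in Python 2). The worker function and the None sentinel are my own choices for illustration.

```python
import queue
import threading


def worker(tasks, results):
    """Squares numbers from tasks until it receives the None sentinel."""
    while True:
        item = tasks.get()
        if item is None:
            break
        results.put(item * item)


tasks = queue.Queue()
results = queue.Queue()
thread = threading.Thread(target=worker, args=(tasks, results))
thread.start()

for number in range(5):
    tasks.put(number)
tasks.put(None)  # Sentinel telling the worker to stop.
thread.join()

squares = sorted(results.get() for _ in range(5))
```

No lock appears anywhere: both queues do their own locking internally, which is the point of the rule.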
Strings
Avoid using the + and += operators to accumulate a string within a loop. Since strings are immutable, this creates unnecessary temporary objects and results in quadratic rather than linear running time. Instead, add each substring to a list and ''.join the list after the loop terminates (or write each substring to an io.StringIO buffer, or io.BytesIO when working with bytes).
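Both alternatives, sketched with hypothetical function names. For text in Python 3 the in-memory buffer to use is io.StringIO.

```python
import io


def build_list_join(items):
    """Collects the pieces in a list, then joins them once."""
    parts = []
    for item in items:
        parts.append('<li>{}</li>'.format(item))
    return ''.join(parts)


def build_list_buffer(items):
    """Writes the pieces to an in-memory text buffer."""
    buffer = io.StringIO()
    for item in items:
        buffer.write('<li>{}</li>'.format(item))
    return buffer.getvalue()
```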
Imports
#Yes
import os
import sys
#No
import os, sys
Misc.
Avoid fancy features like import hacks, internal modifications or metaclasses. They make the code shorter but harder to read later, compared with code that is longer but straightforward.
Lines should be max 100 chars, except for sensible cases like URLs and imports
Long texts should appear at the top of files except tests
Use () instead of \ for long lines
if (width == 0 and height == 0 and
    color == 'red' and emphasis == 'strong'):

x = ('This will build a very long long '
     'long long long long long long string')
Avoid () when not needed, e.g. if (x):
Spacing
Indent using 4 spaces
Two blank lines between top-level definitions, be they function or class definitions.
One blank line between method definitions and between the class line and the first method. Use single blank lines as you judge appropriate within functions or methods.
Generally only one statement per line.
Access control: if access is more complex, or the cost of accessing the variable is significant, use function calls.
Naming: module_name, package_name, ClassName, method_name, ExceptionName, function_name, GLOBAL_CONSTANT_NAME, global_var_name, instance_var_name, function_parameter_name, local_var_name.
Use an if __name__ == "__main__" guard to prevent code execution on import.
Comments
The final place to have comments is in tricky parts of the code.
Complicated operations get a few lines of comments before the operations commence.
Non-obvious ones get comments at the end of the line.
To improve legibility, these comments should be at least 2 spaces away from the code.
On the other hand, never describe the code. Assume the person reading the code knows Python (though not what you're trying to do) better than you do.
TODOs
Use TODO comments for code that is temporary, a short-term solution, or good-enough but not perfect.
TODOs should include the string TODO in all caps.
If it is for a future task, it may belong in a ticket instead.
Pypi
pip2tgz is used to download packages as tarballs [pypi].
A Samba share (Samba enables file sharing across different operating systems over a network) is used to share packages.
A local PyPI server is also used, configured in ~/.pip/pip.conf:
[global]
index-url = http://10.2.3.96:8080/simple
trusted-host = 10.2.3.96
They also have a way to drop packages at \fs-01\share\Python\packages so that they appear on the PyPI index within 5 minutes [pypi2].
Packages can then be installed with:
pip install --index-url=http://10.3.2.212/simple/ foopackage
IDEs
PyCharm is used, with an explanation on debugging [pycharm].
The auto-complete feature is also well-appreciated [pycharm2].
Installing pyenv
The pyenv install instructions are pretty generic and use the GitHub version [pyenv].
Installing Python
Python is installed from local sources, using pyenv [pyenv].
$ cat > 2.7.9 << EOF
#require_gcc
install_package "Python-2.7.9" "http://10.3.2.212/python/Python-2.7.9.tgz" ldflags_dirs standard verify_py27 ensurepip
EOF
$ pyenv install ./2.7.9
Downloading Python-2.7.9.tgz...
-> http://10.3.2.212/python/Python-2.7.9.tgz
Installing Python-2.7.9...
Installed Python-2.7.9 to /home/User #71475/.pyenv/versions/2.7.9
$ pyenv rehash
Testing
This may have changed since, but here’s the setup they were using [testing].
The Python installation is replicated on the remote server each time tests are executed. Installed packages are to be included and zipped.
This seems to predate CI/CD systems.
Software specs document template
Software design docs typically follow this pattern [software specs]:
## Goals
- Run on Linux without the Collide overhead
- Simplify the user experience
## Background and strategic fit
Why are you doing this? How does this relate to your overall product strategy?
## Assumptions
List the assumptions you have such as user, technical or other business assumptions. (e.g. users will primarily access this feature from a tablet).
## Requirements
| # | Title | User Story | Importance | Notes |
| 14.1.2.19 | | The tool shall have the ability to modify all configuration variables (unless explicitly marked otherwise) at runtime and persist those changes | Must Have | Additional considerations or noteworthy references (links, issues) |
| | Derived | input directory; output directory; working directory; intervals (beacon, jitter, uninstall); max allowable beacon failures; Internet connectivity URL; chunk size; Target ID; blacklist/whitelist executables | | |
## User interaction and design
Using argparse to format and parse acceptable command syntax
Use flags to tokenize values
SEARCH command will have the following capabilities:
-- swalkdir - recursively search for a string in a directory path
-- sdirlist - search filenames for a string in a directory path
-- slike - filename pattern match
-- scontains - file name or file content pattern match
-- sfreetext - file content pattern match
-- sliteral - any valid WSS search command
## Questions
Below is a list of questions to be addressed as a result of this requirements document:
| Question | Outcome |
| (e.g. How we make users more aware of this feature?) | Communicate the decision reached |
## Not Doing
List the features discussed which are out of scope or might be revisited in a later release.
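The "User interaction and design" section of the template mentions argparse and flags. Here is a rough sketch of how a few of the listed search modes might be wired up. The flag spellings (--swalkdir and so on), the positional path argument and the choice to make the modes mutually exclusive are all my assumptions; the spec does not say.

```python
import argparse


def build_parser():
    """Builds a parser for a hypothetical 'search' command."""
    parser = argparse.ArgumentParser(prog='search')
    parser.add_argument('path', help='directory path to search')
    # Assumption: exactly one search mode may be given per invocation.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument('--swalkdir', metavar='STRING',
                      help='recursively search for a string in the path')
    mode.add_argument('--sdirlist', metavar='STRING',
                      help='search filenames for a string in the path')
    mode.add_argument('--slike', metavar='PATTERN',
                      help='filename pattern match')
    return parser


args = build_parser().parse_args(['/tmp', '--slike', '*.log'])
```

argparse generates the usage and help text from these declarations, which covers the "format and parse acceptable command syntax" part of the spec for free.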
Template for cli scripts
They provide a template for CLI scripts that follows this pattern. Pretty cool if you ask me, as these are common CLI operations [cli].
import sys


class Application:
    """
    This class defines the functionality for the script. It is instantiated in
    the global processing handler for __main__
    """
    def __init__(self):
        """
        Setups the member variable for the application object.
        """
        pass

    def logger(self):
        """
        Setups the python logging for application. By default it logs to both
        the console window and to a log file in the current directory.
        """
        pass

    def platform(self):
        """
        Determines the platform the script is running on.
        """
        pass

    def version(self):
        """
        Formats the version number of the script as a string.
        """
        pass

    def environ(self, name, default=None):
        """
        Helper method for looking up environment variables. Writes a warning to
        the application log if the environment variable cannot be found.
        """
        pass

    def shellspawn(self, binPath, binArgs=None):
        """
        Runs a shell command and does not wait for the command to complete.
        """
        pass

    def shellexec(self, binPath, binArgs=None):
        """
        Runs a shell command and waits for the command to complete before returning.
        """
        pass

    def copyExistingFile(self, srcFile, dstFile):
        """
        Copies a file from the source to the destination. Logs warnings or errors
        if it cannot find the file as expected.
        """
        pass

    def unittest(self):
        """
        This is the default action for the application. It should generally be
        a self-test.
        """
        pass

    def usage(self):
        """
        Prints the help text for the application.
        """
        pass

    def main(self, argv):
        """
        Entry point for the application. Command line processing should be done
        here.
        """
        pass


if __name__ == "__main__":
    application = Application()
    sys.exit(application.main(sys.argv))
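The template leaves every method body empty. As a guess at how two of them could be filled in, here is a sketch written as plain functions for brevity. The bodies are mine, not the CIA's, the snake_case parameter names differ from the template's, and shellexec here runs the program directly instead of going through a shell.

```python
import logging
import os
import subprocess
import sys

log = logging.getLogger('application')


def environ(name, default=None):
    """Looks up an environment variable, warning if it is not set."""
    value = os.environ.get(name)
    if value is None:
        log.warning('Environment variable %s is not set', name)
        return default
    return value


def shellexec(bin_path, bin_args=None):
    """Runs a command, waits for it to finish and returns its exit code."""
    command = [bin_path] + list(bin_args or [])
    return subprocess.call(command)
```

For example, shellexec(sys.executable, ['--version']) would run the current interpreter and return 0 on success.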
Random
Cython is used to obfuscate code [bobby].
Comments are oftentimes in this format [hive]:
# # # Sends the trigger... # #
with enlarged versions looking like this
#
#
# Runs the installScript ... Note that the hive trigger should now timeout
# since the install script should remove
# all currently running hive processes including
# our currently triggered implant and replace the
# existing hive with the new hive implant...
#
#
Distribution
.pyz is a known distribution format for tools [gyrfalcon].
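.pyz is the conventional extension for a Python zip application: a zip archive with a __main__.py at its root that the interpreter can run directly. I don't know how the CIA built theirs, but since Python 3.5 the standard library's zipapp module can produce one. A minimal sketch, with a made-up hello_tool directory:

```python
import pathlib
import subprocess
import sys
import tempfile
import zipapp

with tempfile.TemporaryDirectory() as tmp:
    source_dir = pathlib.Path(tmp) / 'hello_tool'
    source_dir.mkdir()
    # __main__.py is what the interpreter runs when handed the archive.
    (source_dir / '__main__.py').write_text('print("hello from a pyz")\n')

    archive = pathlib.Path(tmp) / 'hello_tool.pyz'
    zipapp.create_archive(source_dir, target=archive)

    # Run the archive with the current interpreter and capture what it prints.
    output = subprocess.check_output([sys.executable, str(archive)], text=True)
```

The same archive can be built from the command line with python -m zipapp hello_tool.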
Parting words
Analyzing repos like Hive shows that the CIA's naming conventions follow camelCase, but the code is well-documented.
As with most hacking scripts, the code and the solutions used give off a hacked-together vibe rather than your typical Python engineering experience.
Some Python insights from someone working at the CIA
Someone on Reddit apparently works for the CIA. They say that there are no globally enforced rules. They also said that since you sometimes don’t have the internet or a phone, you have to go fetch the results, print them and come back. You also cannot install anything from the internet as there is … no internet.
They use Vim or VS Code, depending on individual preference. There is a pre-approved list of software and versions. Very few libraries are approved, so they end up writing a lot of stuff that is already available. Code is heavily vetted. Since new software takes time to be approved, they are always behind.
Otherwise, skill levels vary, some people still google everything, and it’s not that different from the private sector, except for maintenance windows at really inconvenient times.
They like Vim because wherever there is Linux, Vi or Vim is sure to be around, so they can start coding right away.