I recently had a use case at work where I wanted to check that file paths given in a Python script actually existed. These paths were in various GitHub repositories, so all I had to do was pull out the paths and check if they exist on GitHub.
There were a few catches though.
First, I couldn’t simply get any string out of each Python script - they needed to be strings specficied by a specific function parameter, and match a regex (e.g., start with ‘abc’).
Second, the script paths lack the GitHub repository root name. This name was part of the function name - so I needed to get access to the function that the path was specified within, and then parse the function name to get the repository name.
The obvious solution I thought was the ast library.
ast library
I started by using ast
. The ast.NodeVisitor
class seemed like it would do the trick.
An example script (“my_script.py”):
def hello(path, stuff=None):
return path
if __name__ == "__main__":
print(hello(path="hello/world.py", stuff="hello mars"))
import ast
class CollectStrings(ast.NodeVisitor):
def visit_Module(self, node):
self.out = set()
self.generic_visit(node)
return list(filter(lambda w: w.startswith("hello") and w.endswith(".py"), self.out))
def visit_Str(self, node):
self.out.add(node.s)
file = "my_script.py"
with open(file, "r") as f:
body = ast.parse(f.read())
coll = CollectStrings()
coll.visit(body)
## ['hello/world.py']
That worked great at fetching paths - only because all the paths I was looking for started with the same text and all have the same file extension.
HOWEVER - I also needed the function name that the path
argument was called from. I tried to make this work with ast.NodeVisitor
but couldn’t get it to work.
I eventually got frustrated enough and figured there must be some libraries that build on top of ast
that make it easier to work with ast’s in Python.
redbaron
Enter redbaron. I found this library pretty quickly upon searching for a library building on top of ast
.
Another example script (“their_script.py”):
def hello(path, stuff=None):
return path
def goodbye(path, stuff=None):
return path
def world():
path_str = hello(path="src/world.py", stuff="hello mars")
other_path_str = goodbye(path="src/world.py", stuff="hello saturn")
return path_str, other_path_str
if __name__ == "__main__":
print(world())
import re
from redbaron import RedBaron
file = "their_script.py"
with open(file, "r") as src:
red = RedBaron(src.read())
red
## 0 def hello(path, stuff=None):
## return path
##
##
##
## 1 def goodbye(path, stuff=None):
## return path
##
##
##
## 2 def world():
## path_str = hello(path="src/world.py", stuff="hello mars")
## other_path_str = goodbye(path="src/world.py", stuff="hello saturn")
##
## return path_str, other_path_str
##
##
##
## 3 if __name__ == "__main__":
## print(world())
##
Even just the resulting object you get from parsing something is useful:
And with .help()
you get a very detailed map of the structure of the thing you’re trying to navigate (only printing first 20 lines):
red.help()
## 0 -----------------------------------------------------
## DefNode()
## # identifiers: def, def_, defnode, funcdef, funcdef_
## # default test value: name
## async=False
## name='hello'
## return_annotation ->
## None
## decorators ->
## arguments ->
## * DefArgumentNode()
## # identifiers: def_argument, def_argument_, defargument, defargumentnode
## target ->
## NameNode() ...
## annotation ->
## None
## value ->
## None
## * DefArgumentNode()
## # identifiers: def_argument, def_argument_, defargument, defargumentnode
...
Looking at the result from red.help()
I can then use .find_all()
to find certain nodes in the ast.
nodes = red.find_all("AtomtrailersNode")
nodes = list(filter(lambda w: "hello" in w.dumps(), nodes))
nodes
## [hello(path="src/world.py", stuff="hello mars"), goodbye(path="src/world.py", stuff="hello saturn")]
Then I can write some okay code to extract out the function name, and ugly code to get the string supplied to the path
parameter. Then f-string those together to get the path I’m after.
paths = []
for node in nodes:
fxn_name = node.name.value
command = re.search("src/.*\\.py", node.dumps()).group()
paths.append(f"{fxn_name}/{command}")
for path in paths:
print(path)
## hello/src/world.py
## goodbye/src/world.py
Not super proud of this but gets the job done for my use case - and when you’re not making open source for others, you don’t need to worry about other use cases :)
I’ll definitely try to learn how to properly extract stuff using redbaron
- but it got me to answer much faster than the ast
library.