Notes on Software Development Tools

Nbdev

Here is our main process using nbdev

Create a new project

Create a new GitHub project with Poetry for dependency management (or using pip)
Create the .gitignore file
Install Nbdev

poetry install
poetry shell
poetry add nbdev --group dev
nbdev_install_quarto
poetry add jupyterlab-quarto --group dev

Initiate Nbdev project

nbdev_new

The PyPI library name can’t have _, while the repo path can’t have -.

nbdev_new assumes that your package name is the same as your repo name (with - replaced by _). Use the --lib_name option if that isn’t the case.

So lib_name can be nbdev-cards while lib_path is nbdev_cards. If you encounter a No module named 'pkg_resources' error, run poetry add setuptools (or pip install setuptools).
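The default naming rule can be sketched in a couple of lines. The helper name repo_to_lib_path is hypothetical and not part of nbdev; it only mirrors the substitution described above.

```python
def repo_to_lib_path(repo_name: str) -> str:
    """Mirror the default rule: the module directory is the repo name with '-' replaced by '_'."""
    return repo_name.replace("-", "_")


print(repo_to_lib_path("nbdev-cards"))  # nbdev_cards
```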

Run poetry install. This is in lieu of pip install -e . and needs to be done only once.

Develop the project

nbdev_export converts notebooks to modules.
nbdev_docs updates the docs, including the README.
nbdev_preview starts a local server to display the documentation.
nbdev_readme renders the README.
nbdev_test runs the tests defined in the notebooks.
nbdev_pypi publishes to PyPI.

nbdev_prepare runs nbdev_export, nbdev_clean, nbdev_test, and nbdev_readme together, which is useful before we push changes.

Then

git status
git commit -am "update"
git push

Consider using a pre-commit hook to make notebook version control cleaner. If the hook fails and modifies files, add the modifications to the commit and it will then pass: run git add -u and commit again.
Consider using the various helpers in the fastcore.test module for tests.

What goes to where (doc vs. module)

To add a cell to the module file, use #| export.
To use a cell for development only, without exposing it to the module or the docs, use #| hide. This is useful, e.g., for inline tests that don’t need to be in the docs.
To document a class: for a class cell with #| export, its signature automatically appears in the docs, but the methods inside do NOT.
To document methods, define them in their own cells and then call show_doc(method_name). For methods that belong to a class, use @patch. For static methods, use:

@staticmethod
@patch_to(Class)

If a cell has neither #| export nor #| hide, it will still appear in the doc in its literal form.

Misc tips

From time to time, restart, and run all cells. Restart the Jupyter kernel after making changes to dependent modules.
For debugging, add a new cell containing %debug. Note that the pop-up input box appears at the top of the VS Code window.
To suspend a process and send it to the background, press ^Z and then run bg.
Assert not level error when running nbdev_export: make sure all the import statements in all notebooks under the “nbs” folder are valid. If necessary, move the notebooks that are not working out of the folder to check.
If nbdev_docs raises an error and things get cluttered, try deleting the _docs and _proc directories and re-running.
We still encounter an nbdev_export SyntaxError from time to time, so we made a check_syntax gist to solve it.
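One possible way to locate such syntax errors is to parse every code cell with Python's ast module. This is a minimal sketch under stated assumptions, not the gist mentioned above: the function names are hypothetical, IPython magics and shell escapes are replaced with pass, and cells that start with a %% cell magic are skipped because their body may not be Python.

```python
import ast
import json
from pathlib import Path


def cell_syntax_errors(nb: dict) -> list:
    """Return (cell_index, message) for every code cell that fails to parse."""
    errors = []
    for idx, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        if isinstance(src, list):  # source may be a list of lines or one string
            src = "".join(src)
        if src.lstrip().startswith("%%"):  # cell magic: body may not be Python
            continue
        lines = []
        for ln in src.splitlines():
            stripped = ln.lstrip()
            if stripped.startswith(("%", "!")):
                # Keep the indentation so enclosing blocks stay valid.
                ln = ln[: len(ln) - len(stripped)] + "pass"
            lines.append(ln)
        try:
            ast.parse("\n".join(lines))
        except SyntaxError as exc:
            errors.append((idx, f"line {exc.lineno}: {exc.msg}"))
    return errors


def check_notebooks(folder: str = "nbs") -> None:
    """Print syntax errors for every notebook found under `folder`."""
    for path in sorted(Path(folder).rglob("*.ipynb")):
        if ".ipynb_checkpoints" in path.parts:
            continue
        nb = json.loads(path.read_text(encoding="utf-8"))
        for idx, msg in cell_syntax_errors(nb):
            print(f"{path}: cell {idx}: {msg}")
```

Calling check_notebooks() from the project root prints one line per failing cell, which narrows down which notebook to open.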

FastHTML

Display code

from fasthtml.common import *

def example():
    return Div(
        H1("FastHTML APP"),
        P("Let's do this"),
        cls="go",
    )

ft_code = example()
print(ft_code)
print(to_xml(ft_code))
ft_code.__repr__()
ft_code.__html__()
print(ft_code.__repr__())
print(ft_code.__html__())

Conversion between the fasttag representation and the raw html representation

Markdown

from fasthtml.common import *

hdrs = (MarkdownJS(), )
app = FastHTML(hdrs=hdrs)

content = """
Here are some _markdown_ elements.

- This is a list item
- This is another list item
- And this is a third list item

**Fenced code blocks work here.**
"""

# @rt('/')
@app.route("/")
def get(req):

    code_content = """
    Here are some code _markdown_ elements.

    - This is a list item
    - This is another list item
    - And this is a third list item

    **Fenced code blocks work here.**
    """

    normal_content = """
Here are some normal _markdown_ elements.

- This is a list item
- This is another list item
- And this is a third list item

**Fenced code blocks work here.**
    """
    return Titled("Markdown rendering example", Div(content,cls="marked"), Div(normal_content, cls="marked"), Div(code_content, cls="marked"))

serve()

Note the difference between normal_content and code_content! The Markdown parser interprets the leading spaces on each line of code_content as an indented code block.
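One way to keep the string indented with the function body and still get normal Markdown is to strip the common indentation with textwrap.dedent before handing the text to the page. The sketch below shows only the string handling; the function name get_markdown is hypothetical, and passing the result to Div(..., cls="marked") is assumed to behave like normal_content.

```python
import textwrap


def get_markdown() -> str:
    # Indented to match the function body, like code_content.
    content = """
    Here are some _markdown_ elements.

    - This is a list item
    - This is another list item
    """
    # dedent removes the whitespace common to all lines, so the text is flush left.
    return textwrap.dedent(content).strip()


print(get_markdown())
```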

<script type="module">
import { marked } from "https://cdn.jsdelivr.net/npm/marked/lib/marked.esm.js";
import { proc_htmx } from "https://cdn.jsdelivr.net/gh/answerdotai/fasthtml-js/fasthtml.js";
proc_htmx('.marked', e => e.innerHTML = marked.parse(e.textContent));</script>

Test in notebook

from fasthtml.common import *
# Setting up the Starlette test client
from starlette.testclient import TestClient

hdrs = (MarkdownJS(), )
app = FastHTML(hdrs=hdrs)

@app.route("/")
def get(req):
    content = """
Here are some _markdown_ elements.

- This is a list item
- This is another list item
- And this is a third list item

**Fenced code blocks work here.**
    """
    return Titled("Markdown rendering example", Div(content,cls="marked"))

client = TestClient(app)
print(client.get("/").text)
display(HTML(client.get("/").text))  # note that js-rendered markdown will not be shown in notebook output


# Loading tailwind and daisyui
headers = (Script(src="https://cdn.tailwindcss.com"),
           Link(rel="stylesheet", href="https://cdn.jsdelivr.net/npm/daisyui@4.11.1/dist/full.min.css"))

# Displaying a single message
d = Div(
    Div("Chat header here", cls="chat-header"),
    Div("My message goes here", cls="chat-bubble chat-bubble-primary"),
    cls="chat chat-start"
)

print(to_xml(d))

show(Html(*headers, d))

Convert HTML to FastTags

Jupyter Notebook

Absolute File Path

The __file__ attribute is not available in Jupyter notebooks. It’s a built-in attribute in Python scripts that contains the path of the script that is currently being executed.

In a Jupyter notebook, you can use the os and IPython libraries to get the notebook’s path:

import os
from IPython.core.getipython import get_ipython

# Get the current notebook's path
notebook_path = os.path.join(os.getcwd(), get_ipython().starting_dir)

# Specify the relative path to the directory
relative_path = "data/test_data"

# Construct the absolute path
file_dir = os.path.join(notebook_path, relative_path)

# assert the file_dir exists
assert os.path.isdir(file_dir)

# list 5 files in the file_dir
os.listdir(file_dir)[:5]
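A pathlib variant of the same idea, as a sketch. It assumes the kernel's working directory is the notebook's directory, which is common in Jupyter but can differ depending on how the kernel was started. The function name resolve_in_notebook_dir is hypothetical.

```python
from pathlib import Path
from typing import Optional


def resolve_in_notebook_dir(relative_path: str, base: Optional[Path] = None) -> Path:
    """Resolve relative_path against base (default: the current working directory)."""
    base = Path.cwd() if base is None else Path(base)
    return (base / relative_path).resolve()


file_dir = resolve_in_notebook_dir("data/test_data")
print(file_dir, file_dir.is_dir())
```

Passing base explicitly makes the helper usable from scripts and tests as well.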

VSCode mode

settings.json

"notebook.lineNumbers": "on",
"notebook.output.wordWrap": true

Conda

conda: error: argument COMMAND: invalid choice: ‘activate’

Run conda init first, then restart the terminal and run conda activate xxx again. However, this initializes conda every time you start a shell, because it places the related commands in your shell init scripts such as bash_profile. If other environment managers like Poetry or venv are also in use, we might not want conda initialized all the time. We can instead activate conda explicitly:

conda env list
source /opt/homebrew/Caskroom/miniconda/base/bin/activate /opt/homebrew/Caskroom/miniconda/base/envs/venv_name

instead of

conda activate venv_name

Use the actual paths for activate and the virtual environment.

Alternatively, we can leave the conda initialization scripts on, but use:

conda config --set auto_activate_base false

This ensures that conda does not automatically activate its base environment.

Streamlit

Streamlit re-runs the entire script each time an input changes. To avoid unnecessarily re-running one-time statements, we can use a caching decorator such as @st.cache_data.

For example, to cache a logger obtained from logging_config:

import streamlit as st

from config.logging_config import get_logger

@st.cache_data()
def cached_get_logger(name):
    return get_logger(name)


logger = cached_get_logger(__name__)

Poetry

Configure lint

E501 stands for “line too long”. By default, PEP 8 recommends that lines should not exceed 79 characters.

[tool.autopep8]
ignore = [ "E501" ]

If using black instead of autopep8, the line-length check has to be disabled differently (not recommended):

[tool.black]
line-length = 999  # very large number

When using black together with isort, we use

[tool.black]
line-length = 88

[tool.isort]
profile = "black"

to ensure that both isort and Black are using compatible rules, particularly regarding line length and import style.

Brew

Purpose	Code	Note
Update brew	`brew update`
Upgrade brew installed packages	`brew upgrade`
Check installed Python versions	`brew list | grep python`
Install a particular Python version	`brew install python@3.11`
Upgrade a particular Python version	`brew upgrade python@3.11`

Common logger setup

config.logging.py

import logging.config
from pathlib import Path

import yaml


def configure_logging(log_config_path, log_directory):
    """Load the YAML logging config and point the file handler at log_directory."""
    with open(log_config_path, 'r') as f:
        config = yaml.safe_load(f)
    # Prefix the log file name so the file lands in the logs directory.
    config['handlers']['file']['filename'] = str(log_directory / config['handlers']['file']['filename'])
    logging.config.dictConfig(config)

project_root = Path(__file__).resolve().parent.parent
logs_directory = project_root / 'logs'
logs_directory.mkdir(exist_ok=True)

logging_config_path = project_root / 'config' / 'logging.yaml'
configure_logging(logging_config_path, logs_directory)

logger = logging.getLogger("package_name")

logger.info("Logging started")

config/logging.yaml

version: 1

formatters:
  simple:
    format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'

handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: simple
  file:
    class: logging.FileHandler
    level: INFO
    formatter: simple
    filename: package_name.log

loggers:
  root:
    level: INFO
    handlers: [console, file]
  package_name:
    level: INFO
    handlers: [console, file]
    propagate: false  # root uses the same handlers; avoid writing each record twice

root:
  level: INFO
  handlers: [console, file]

We can then use it by:

import logging
logger = logging.getLogger(__name__)

This will place the package_name.log log file under the project_root/logs directory.
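The same setup can be tried without a YAML file or a project layout. The sketch below passes an equivalent dict straight to logging.config.dictConfig and writes to a temporary directory, a stand-in for project_root/logs. It attaches handlers only to the root logger, which is one way to make sure each record is written once; that choice is an assumption for the example, not the configuration above.

```python
import logging
import logging.config
import tempfile
from pathlib import Path

log_dir = Path(tempfile.mkdtemp())  # stand-in for project_root / "logs"

config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"},
    },
    "handlers": {
        "console": {"class": "logging.StreamHandler", "level": "INFO", "formatter": "simple"},
        "file": {
            "class": "logging.FileHandler",
            "level": "INFO",
            "formatter": "simple",
            "filename": str(log_dir / "package_name.log"),
        },
    },
    # Handlers live on the root logger only; named loggers propagate up to it.
    "root": {"level": "INFO", "handlers": ["console", "file"]},
}
logging.config.dictConfig(config)

# A module inside the package would call logging.getLogger(__name__);
# the dotted name below imitates that.
logger = logging.getLogger("package_name.module")
logger.info("Logging started")

log_text = (log_dir / "package_name.log").read_text()
print(log_text)
```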

clear ANSI codes

Most modern terminal emulators support ANSI escape codes and can interpret them to display colored and formatted text. If the logs are saved in a text file, we can examine them in the terminal with cat or less, which will display the formats and colors.

If we use a text editor or some other program that doesn’t interpret these codes, we can strip the codes from the output. For example:

sed -r "s/\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mGK]//g" output.log > clean_output.log
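The same cleanup can be done in Python when sed is not convenient. This sketch reuses the pattern from the sed command; the names ANSI_RE and strip_ansi are hypothetical, and sequences outside that pattern are left untouched.

```python
import re

# Same pattern as the sed command: ESC [ <up to two numeric params> then m, G, or K.
ANSI_RE = re.compile(r"\x1B\[([0-9]{1,2}(;[0-9]{1,2})?)?[mGK]")


def strip_ansi(text: str) -> str:
    """Remove the ANSI escape sequences matched by ANSI_RE."""
    return ANSI_RE.sub("", text)


print(strip_ansi("\x1b[31mERROR\x1b[0m disk full"))  # ERROR disk full
```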

TMUX

Function	Command	Note
create a session	`tmux new -s Session1`
list sessions	`tmux ls`
attach to a session	`tmux attach -t Session1`
kill a session	`tmux kill-session -t Session1`
leader key within tmux	`ctrl a`
split window into panes v/h	`ctrl a |` or `ctrl a -`
resize pane	`ctrl a` then `h/j/k/l`
maximize pane	`ctrl a m`
create a window in a session	`ctrl a c`
navigate windows	`ctrl a 0/1/p/w`
list windows	`ctrl a w`
rename a window	`ctrl a ,`
list sessions	`ctrl a s`
navigate sessions	`k` or `j`
exit a session	`tmux detach`
install plugin	`ctrl a I`
reload configuration	`ctrl a r`
copy mode	`ctrl a [` then use `k/j/ctrl+u/d/b/f/shift+k/j` to select and `v/y` to copy, or mouse scrolling, use `ctrl c` to exit copy mode
detach from a session	`ctrl a` then `d`
avoid nested tmux sessions on remote server	`ssh -t user@remote 'tmux attach-session -t 0'` or `ssh -t user@remote 'tmux attach'`

The table above uses ctrl a and other custom settings from the great TMUX Dev Flow configuration by Josean Martinez.

Diagramming

Excalidraw

To run and use Excalidraw locally:

git clone https://github.com/excalidraw/excalidraw.git
yarn
yarn start

Excalidraw doc

Mermaid

Mermaid Live Editor

Quarto

Commonly used commands after setup

quarto preview  # to preview the website
quarto render  # to render the website (recommended before push)

Use quarto preview or shift+cmd+k to preview side-by-side.

When publishing from gh-pages instead of the _docs directory, use quarto publish gh-pages in the "main" branch.

Use VS Code Notebook Editor for publishing with Quarto

Cross references

Cross-references can also be created in floats or with divs

Book Crossrefs

Code content

There are several ways to publish content that includes code. For a simple blog post, we can embed code (and their corresponding results) directly into its qmd file. Alternatively, we can create a separate Jupyter notebook and embed parts of it and its results into the blog’s qmd file.

For more comprehensive projects, we might consider other formats offered by Quarto, such as Manuscripts and Books. For instance, a Quarto manuscript has an article view that shows only the results of code execution (like graph visualizations) in the article body, and a notebook view that reveals all the underlying notebook code. This includes the notebook version of the article itself, along with additional notebooks as needed. It is essentially a qmd file that creates the index page with embedded cells from companion Jupyter notebooks. We can also download these notebooks directly. In comparison, Quarto books support multiple chapters and cross-referencing in their HTML format. They also provide a normal view with a companion source code repository.

Manuscripts and Books are really just different Quarto project types with their own customized behavior, similar to other types like websites and blogs. The key file that specifies the behavior is the _quarto.yml configuration file.

using `qmd` or `ipynb`?

Both file formats allow us to embed code. The following guidelines can be helpful:

If the subject is primarily a non-coding topic and we are simply embedding code to supplement the presentation, such as creating graph visualizations, then it is good to go with the qmd and embed code in it.
If the subject is primarily a coding topic, it is easier to start with ipynb natively.

For qmd files, it is easier to go with VS Code or text editors like NeoVim. For ipynb files, Jupyter Lab/Notebook is the native approach, but VS Code also offers a surprisingly good Notebook Editor.

Here are examples of authoring a manuscript in qmd with VSCode and in ipynb in Jupyter Lab.

Tex errors

If compilation fails, quarto publish gh-pages will not publish to the website!

If quarto render reports TeX-related errors, check the index.tex and index.log files.

! LaTeX Error: Something's wrong--perhaps a missing \item.

# References {.unnumbered}

::: {#refs}
:::

The references block above could be the cause. To locate the list environments in the generated TeX:

❯ grep -n -A 1 '\\begin{itemize}\|\\begin{enumerate}\|\\begin{description}' index.tex

Draft setting bugs

The latest Quarto 1.5 seems to have bugs with setting posts as drafts. After each render and preview, the draft posts reappear in the list. One way to get rid of them is to comment out or add back the drafts: attribute of the front matter in the list qmd file or the site’s _quarto.yml file.

But this could reset the custom domain setting and may require adding the custom domain again.

Quarto slides

Note: we can use the following settings to produce both a speaker version and an audience version of the slides:

format: 
  # html: default
  revealjs:
    multiplex: true
    self-contained: true
    slide-number: true
    # chalkboard: 
    #   buttons: false
    preview-links: auto

This produces a local index-speaker.html used for presenting, and an index.html on the server for download.

Note: the chalkboard plugin cannot be combined with self-contained output. Enabling both gives:

ERROR: Reveal plugin 'RevealChalkboard' is not compatible with self-contained output

Quarto subscription

Blogging with Quarto and Jupyter: The Complete Guide

Adding Subscriptions to a Quarto Site

AWS concepts

Resources: S3 buckets, KMS keys, EC2 instances
Principals: users, services, or accounts
Policies: permissions
Roles: lists of permissions
User: a persona associated with roles

Resource-based policies: specify who (the principal) has permission to access the resources.
Trust policies (relationships): specify which principals are allowed to assume the role.
User/role-based policies: attach directly to users or roles (no Principal is needed here).
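The three policy kinds can be compared side by side as policy documents. The sketch below writes them as Python dicts for illustration only; the account ID, role name, and bucket name are placeholders, not real resources.

```python
import json

# Resource-based policy (e.g. attached to an S3 bucket): names the Principal allowed in.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::123456789012:role/example-role"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}

# Trust policy (attached to a role): names who is allowed to assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "ec2.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Identity-based policy (attached to a user or role): no Principal element,
# because the principal is whoever the policy is attached to.
identity_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}

for name, doc in [("bucket", bucket_policy), ("trust", trust_policy), ("identity", identity_policy)]:
    print(name, json.dumps(doc, indent=2))
```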

FastHTML

Getting started

Source repo

How to run the demo app under examples?

pip install .  # install from source
uvicorn examples.app:app --reload

examples

Tutorial app

Installation

brew install railway  # install the Railway CLI
pip install -U python-fasthtml  # install FastHTML
railway login  # login to `railway`

Move to the project directory (this is important, otherwise the railway up -c command will fail to recognize what type of project it is)

cd simple
railway init -n project_name
railway up -c  # this takes several minutes, wait for finish
railway domain
fh_railway_link
railway volume add -m /app/data

Rancher for Docker

Rancher desktop

If we got:

RuntimeError: Docker is not running. Please start Docker and try again

Try setting the DOCKER_HOST environment variable (in the specific virtual environment if applicable).

export DOCKER_HOST=unix:///Users/username/.rd/docker.sock

Check and confirm

os.environ['DOCKER_HOST'] = 'unix:///Users/username/.rd/docker.sock'

We need to rebuild the specific containers.
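Before rebuilding, it can help to confirm that DOCKER_HOST actually points at a socket file. This is a small standard-library sketch; the function name docker_socket_status is hypothetical, and it only inspects the path without talking to the Docker daemon.

```python
import os
import stat
from urllib.parse import urlparse


def docker_socket_status(docker_host: str) -> str:
    """Describe whether a unix:// DOCKER_HOST points at an existing socket file."""
    parsed = urlparse(docker_host)
    if parsed.scheme != "unix":
        return f"not a unix socket URL: {docker_host!r}"
    path = parsed.path
    if not os.path.exists(path):
        return f"missing: {path}"
    if not stat.S_ISSOCK(os.stat(path).st_mode):
        return f"exists but is not a socket: {path}"
    return f"socket found: {path}"


print(docker_socket_status(os.environ.get("DOCKER_HOST", "")))
```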

Common commands:

docker context create rancher-desktop
docker context use rancher-desktop
docker context ls
docker context inspect rancher-desktop
curl --unix-socket /Users/username/.rd/docker.sock http://v1.24/version

A simple script to test Docker

import docker
import os

print("DOCKER_HOST:", os.getenv('DOCKER_HOST'))
print("Initializing Docker client...")

try:
  client = docker.from_env()
  if client.ping():
    print("Docker client initialized and server ping successful")
except Exception as e:
  print(f"Docker client initialization or ping failed: {e}")
  raise e

If we initiated a docker client

docker_client = docker.from_env()

and need to get the container, we should use

docker_client.containers.get(container_name)

avoid using

docker.DockerClient().containers.get(container_name)

The latter might work for Docker Desktop but could cause issues with Rancher Desktop.

(An alternative to the above might be to expose Rancher’s Docker-compatible API over the default Unix socket: docker context create rancher-desktop --docker "host=unix:///var/run/docker.sock")

Misc

Use vars() to display local variables or an object's attributes.
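A short illustration of both uses. The class Point and the function describe are made up for the example.

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def describe():
    a, b = 1, 2
    return vars()  # with no argument, vars() acts like locals()


print(vars(Point(3, 4)))  # {'x': 3, 'y': 4}
print(describe())         # {'a': 1, 'b': 2}
```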

try:
    del obj
    print("Deleted obj")
except NameError:
    pass

if 'obj' in locals():
    del obj

assert 'obj' not in locals(), "obj not deleted"

import os
from IPython.core.getipython import get_ipython

def get_absolute_path_in_notebook(relative_path: str) -> str:
    # Get the current notebook's path
    notebook_path = os.path.join(os.getcwd(), get_ipython().starting_dir)

    # Construct the absolute path
    absolute_path = os.path.join(notebook_path, relative_path)

    return absolute_path

file_path_abs = get_absolute_path_in_notebook(file_path_relative)

assert os.path.isfile(file_path_abs), f"File not found: {file_path_abs}"