Encoding: Base64 vs. ASCII or UTF-8
Text Encoding (ASCII and UTF-8)
Text encoding schemes like ASCII and UTF-8 define how characters (text) are represented as binary data (bits):
- ASCII: Uses 7 bits per character; in practice each character is stored in one 8-bit byte.
- Example: ‘A’ -> 01000001 (65 in decimal)
- UTF-8: Uses 1 to 4 bytes per character to represent all Unicode characters.
- Example: ‘A’ -> 01000001 (1 byte), ‘€’ -> 11100010 10000010 10101100 (3 bytes)
Examples:
text = "Hello, world!"
with open('hello.txt', 'w', encoding='utf-8') as f:
    f.write(text)
with open('hello.txt', 'r', encoding='utf-8') as f:
    content = f.read()
print(content)  # Output: Hello, world!
When you transmit text over the internet (e.g., via HTTP), the text is sent as binary data. The text is encoded in a specific encoding (like UTF-8) before transmission. The recipient decodes the binary data back into text using the same encoding.
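The encode/decode round trip described above can be sketched in a few lines. This is only an illustration: the helper `to_bits` and the sample message are made up for the example, and Latin-1 stands in for "some other encoding" on the receiving side.

```python
def to_bits(data: bytes) -> str:
    """Render bytes as space-separated 8-bit groups."""
    return " ".join(f"{byte:08b}" for byte in data)

# The same bit patterns as in the bullet list above
ascii_bytes = "A".encode("ascii")
euro_bytes = "€".encode("utf-8")
print(to_bits(ascii_bytes))  # 01000001
print(to_bits(euro_bytes))   # 11100010 10000010 10101100

# Sender encodes text to bytes; receiver must decode with the same encoding
message = "Price: 5 €"
wire_bytes = message.encode("utf-8")
received_ok = wire_bytes.decode("utf-8")       # same encoding: text is recovered
received_wrong = wire_bytes.decode("latin-1")  # different encoding: garbled text
print(received_ok == message)
print(repr(received_wrong))
```

Decoding with a mismatched encoding does not necessarily raise an error; it can silently produce the wrong characters, which is why both sides need to agree on the encoding.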
What is Base64 Encoding?
- Base64: A method of encoding binary data as text, using an alphabet of 64 printable ASCII characters. It is used to encode binary data (e.g., images, files) into text that can be safely transmitted over text-based protocols such as HTTP and SMTP.
- Character Set: The 64 characters used in Base64 are:
- A-Z (uppercase letters)
- a-z (lowercase letters)
- 0-9 (digits)
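- + and / (two additional symbols)
- The = character is not part of the 64-character alphabet; it is used only as padding at the end of the output.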
Purpose of Base64: Base64 encoding converts binary data into a string of ASCII characters. This is useful for embedding binary data in text-based formats like JSON, XML, or HTTP requests and responses.
Base64 encoding itself does not inherently use ASCII or UTF-8; instead, it produces a string of characters that fall within the ASCII character set. Let’s break this down:
Base64 and Character Encodings (ASCII/UTF-8)
- ASCII: The output of Base64 encoding is a string that uses only ASCII characters. This means that any Base64-encoded string is also valid ASCII text.
- UTF-8: UTF-8 is a superset of ASCII. Any ASCII string is also a valid UTF-8 string. Therefore, Base64-encoded strings can be safely represented as UTF-8.
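A quick way to check both claims is to encode every possible byte value and inspect the result. This is a small sketch using only the standard library; the names `BASE64_CHARS` and `raw` are chosen for the example.

```python
import base64
import string

# The 64-character alphabet plus the '=' padding character
BASE64_CHARS = set(string.ascii_letters + string.digits + "+/=")

raw = bytes(range(256))          # every possible byte value once
encoded = base64.b64encode(raw)  # b64encode returns bytes, not str

as_ascii = encoded.decode("ascii")
as_utf8 = encoded.decode("utf-8")

print(as_ascii == as_utf8)             # True: the two decodings agree
print(set(as_ascii) <= BASE64_CHARS)   # True: only alphabet characters appear
```

Because every character is ASCII, the string also occupies exactly one byte per character whether it is stored as ASCII or as UTF-8.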
How Base64 Works
- Encoding Process:
- Binary data is grouped into 24-bit chunks (3 bytes).
- Each 24-bit chunk is split into four 6-bit groups.
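- Each 6-bit group is mapped to a corresponding character in the Base64 alphabet.
- If the final chunk has fewer than 3 bytes, its bits are padded with zeros to fill the last 6-bit group, and = characters are appended so the output length is a multiple of 4.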
- Output:
- The output is a string of ASCII characters that represents the binary data.
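The steps above can be sketched as a small hand-written encoder and compared against the standard library. This is an illustrative sketch, not how `base64.b64encode` is implemented internally; `manual_b64encode` and `ALPHABET` are names made up for the example. It also handles a final chunk shorter than 3 bytes by zero-filling it and marking the missing bytes with `=`.

```python
import base64
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, +, /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

def manual_b64encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)
        # Zero-fill a short final chunk so it forms a full 24-bit number
        n = int.from_bytes(chunk + b"\x00" * missing, "big")
        # Split the 24 bits into four 6-bit groups, most significant first
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Characters that came purely from the zero fill become '=' padding
        if missing:
            chars[-missing:] = "=" * missing
        out.extend(chars)
    return "".join(out)

print(manual_b64encode(b"Hello"))  # SGVsbG8=
print(manual_b64encode(b"Hello") == base64.b64encode(b"Hello").decode("ascii"))  # True
```

In real code the standard library function is the better choice; the manual version is only useful for seeing where each output character comes from.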
Example of Base64 Encoding
- Binary Data: Let’s say we have binary data representing the text “Hello”.
- ‘H’ -> 01001000
- ‘e’ -> 01100101
- ‘l’ -> 01101100
- ‘l’ -> 01101100
- ‘o’ -> 01101111
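- Grouping: Concatenate the 40 bits and split them into 6-bit groups. The last group has only 4 bits (1111), so it is padded with two zero bits.
010010 000110 010101 101100 011011 000110 111100
- Mapping to Base64 Characters:
010010 -> S
000110 -> G
010101 -> V
101100 -> s
011011 -> b
000110 -> G
111100 -> 8
- Result: The seven characters are padded with = to a multiple of four, so the Base64-encoded string is “SGVsbG8=”.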
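The walkthrough can be reproduced by doing the bit manipulation on strings of 0s and 1s, which keeps each intermediate step visible. This is a teaching sketch only; the variable names are made up for the example, and working on bit strings is slow compared with the standard library.

```python
import string

# Standard Base64 alphabet: A-Z, a-z, 0-9, +, /
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

data = "Hello".encode("ascii")

# 5 bytes -> 40 bits
bits = "".join(f"{byte:08b}" for byte in data)
# Pad with zero bits so the length is a multiple of 6 (40 -> 42)
bits += "0" * (-len(bits) % 6)

groups = [bits[i:i + 6] for i in range(0, len(bits), 6)]
chars = [ALPHABET[int(group, 2)] for group in groups]

# Pad the output with '=' to a multiple of 4 characters
encoded = "".join(chars)
encoded += "=" * (-len(encoded) % 4)

for group, char in zip(groups, chars):
    print(group, "->", char)
print(encoded)  # SGVsbG8=
```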
Transmission of Base64
Base64 Encoding: Convert binary data to a Base64 string.
- Example: A binary image file is converted to a Base64 string.
import base64
# Binary data (example: part of a JPEG file)
binary_data = b'\xff\xd8\xff\xe0\x00\x10JFIF...'
# Encode binary data as Base64
base64_encoded = base64.b64encode(binary_data)
base64_string = base64_encoded.decode('ascii')
print(base64_string)  # Output: '/9j/4AAQSkZJRgABAQEASABIAAD/...'
HTTP Transmission: When transmitting over HTTP, the Base64 string is included in the HTTP request or response body.
- Example: JSON payload in an HTTP request
{
"image_data": "/9j/4AAQSkZJRgABAQEASABIAAD/..."
}
Conversion to Binary: Before the data leaves your computer, it is converted to binary form.
- Text data (including Base64 strings) is encoded as bytes.
import requests
json_payload = {
    "image_data": base64_string
}
response = requests.post('http://example.com/upload', json=json_payload)
print(response.status_code)
Binary Transmission: The HTTP client serializes the text data (the JSON body containing the Base64 string) to bytes, typically as UTF-8. These bytes are then sent over the network.
Reception and Decoding
- Binary Data Reception: The receiver gets the binary data transmitted over the network.
- Text Decoding: The binary data is decoded back to text (the original Base64 string).
- Base64 Decoding: The Base64 string is decoded back to the original binary data.
import base64
# Simulate receiving the Base64 string from an HTTP response
received_base64_string = response.json()['image_data']
# Decode Base64 string back to binary data
received_binary_data = base64.b64decode(received_base64_string)
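The whole path, from binary data to request body and back, can be simulated without a network using only the standard library. This is a sketch under stated assumptions: the function names `build_request_body` and `read_request_body` are hypothetical, the sample bytes are a short placeholder rather than a real image, and the JSON body is assumed to be UTF-8.

```python
import base64
import json

def build_request_body(binary_data: bytes) -> bytes:
    """Sender side: binary -> Base64 text -> JSON text -> bytes on the wire."""
    base64_string = base64.b64encode(binary_data).decode("ascii")
    json_text = json.dumps({"image_data": base64_string})
    return json_text.encode("utf-8")

def read_request_body(body: bytes) -> bytes:
    """Receiver side: bytes -> JSON text -> Base64 text -> original binary."""
    json_text = body.decode("utf-8")
    payload = json.loads(json_text)
    # validate=True rejects characters outside the Base64 alphabet
    return base64.b64decode(payload["image_data"], validate=True)

original = b"\xff\xd8\xff\xe0\x00\x10JFIF"  # placeholder bytes, not a full JPEG
body = build_request_body(original)
recovered = read_request_body(body)

print(body)
print(recovered == original)  # True
```

Passing `validate=True` is a design choice for the receiving side: without it, `b64decode` silently discards characters that are not in the alphabet, which can hide corrupted input.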
Practical Example in Python
Encoding Binary Data to Base64
import base64
# Original text
text = "Hello"
# Convert text to bytes using UTF-8
utf8_bytes = text.encode('utf-8')
# Encode bytes to Base64
base64_encoded = base64.b64encode(utf8_bytes)
base64_string = base64_encoded.decode('ascii')  # Base64 string using ASCII characters
print(base64_string)  # Output: SGVsbG8=
Decoding Base64 to Binary Data
# Decode Base64 string to bytes
decoded_bytes = base64.b64decode(base64_string)
# Convert bytes back to text using UTF-8
decoded_text = decoded_bytes.decode('utf-8')
print(decoded_text)  # Output: Hello
Summary
- Base64: Encodes binary data into a string drawn from an alphabet of 64 ASCII characters.
- Character Set: The Base64 alphabet consists of ASCII characters.
- UTF-8 Compatibility: Since ASCII is a subset of UTF-8, Base64-encoded strings are also valid UTF-8 strings.
- Encoding and Decoding: Base64 is used to convert binary data into a text format for safe transmission and can be decoded back to binary data.
In practice, when you Base64 encode data in Python or another language, the resulting string can be safely handled as ASCII or UTF-8 text, ensuring compatibility across various text-based protocols and systems.
WebSockets
What is WebSocket?
WebSocket is a communication protocol that provides full-duplex communication channels over a single TCP connection. Unlike HTTP, which is a request-response protocol, WebSocket allows for persistent connections where both the client and server can send and receive messages at any time. This makes WebSocket ideal for real-time applications such as chat applications, live updates, and online gaming.
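One place where the Base64 encoding from earlier appears in WebSocket itself is the opening handshake defined in RFC 6455: the client sends a Base64-encoded random key in an HTTP upgrade request, and the server proves it understood the request by returning the Base64-encoded SHA-1 hash of that key joined with a fixed GUID. The sketch below computes that value with the standard library; the function name `websocket_accept` is made up for the example, and the sample key is the one used in the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# Sample key from RFC 6455
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

Frameworks perform this step automatically, so application code rarely needs it; it is shown here only to make the transition from HTTP to a persistent connection concrete.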
Why Use WebSocket?
- Real-Time Communication: WebSocket enables real-time data exchange between the client and server.
- Efficiency: It reduces the overhead of establishing multiple HTTP connections, making it more efficient for applications that require frequent updates.
- Bidirectional Communication: Both the client and server can initiate communication, allowing for more interactive applications.
Explanation of a WebSocket code example
Basic WebSocket code with fasthtml.
Imports and Initialization
from asyncio import sleep
from fasthtml.common import *
app = FastHTML(ws_hdr=True)
rt = app.route
- Imports: Imports necessary modules, including sleep from asyncio and common components from fasthtml.
- App Initialization: Initializes a FastHTML application with WebSocket support (ws_hdr=True).
- Route Initialization: Assigns app.route to rt so it can be used as a route decorator.
Helper Function and Constants
def mk_inp(): return Input(id='msg')
nid = 'notifications'
- mk_inp: A helper function that creates an input element with the ID msg.
- nid: A constant for the notifications div ID.
Main Route
@rt('/')
async def get():
    cts = Div(
        Div(id=nid),
        Form(mk_inp(), id='form', ws_send=True),
        hx_ext='ws', ws_connect='/ws')
    return Titled('Websocket Test', cts)
- Route Definition: Defines an asynchronous route for the root URL (/).
- Content Setup: Creates a Div containing:
- Another Div with the ID nid for notifications.
- A form with an input field created by mk_inp(), which sends data via WebSocket (ws_send=True).
- The outer Div itself carries the WebSocket extension attribute (hx_ext='ws') and the connection URL (ws_connect='/ws').
- Return: Returns a titled page with the content.
WebSocket Connection Handlers
async def on_connect(send): await send(Div('Hello, you have connected', id=nid))
async def on_disconnect(): print('Disconnected!')
- on_connect: Sends a message to the client when a WebSocket connection is established.
- on_disconnect: Prints a message when the WebSocket connection is closed.
WebSocket Route
@app.ws('/ws', conn=on_connect, disconn=on_disconnect)
async def ws(msg: str, send):
    await send(Div('Hello ' + msg, id=nid))
    await sleep(2)
    return Div('Goodbye ' + msg, id=nid), mk_inp()
- WebSocket Route: Defines a WebSocket route at /ws.
- Connection Handlers: Specifies on_connect and on_disconnect handlers.
- WebSocket Handler: An asynchronous function that:
- Receives a message (msg) from the client.
- Sends a greeting message back to the client.
- Waits for 2 seconds.
- Returns a goodbye message and a new input field.
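The order of events in the handler can be tried out without FastHTML by passing in a stand-in `send` function. This is a standard-library sketch, not FastHTML's implementation: plain strings replace the Div elements, a short sleep replaces the 2-second pause, and appending the return value to the delivered list is an assumption about how the framework forwards the handler's result to the client.

```python
import asyncio

async def ws_handler(msg: str, send) -> str:
    # Push an intermediate message immediately
    await send("Hello " + msg)
    # Stand-in for the 2-second pause in the original handler
    await asyncio.sleep(0.01)
    # The return value is the final message for this exchange
    return "Goodbye " + msg

async def simulate(msg: str) -> list:
    delivered = []

    async def send(item):
        # Hypothetical stand-in for the framework's send callable
        delivered.append(item)

    final = await ws_handler(msg, send)
    delivered.append(final)  # assumed: the framework sends the return value
    return delivered

messages = asyncio.run(simulate("Ada"))
print(messages)  # ['Hello Ada', 'Goodbye Ada']
```

The point of the pattern is that one incoming message can produce several outgoing ones: anything passed to `send` goes out right away, while the returned value goes out once the handler finishes.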
Serve the Application
serve()
- Starts the FastHTML application.
Summary
- WebSocket: A protocol for real-time, bidirectional communication.
- Usage: Ideal for applications requiring frequent updates and real-time interaction.
- Code Functionality: Sets up a simple WebSocket application that greets the user upon connection, echoes messages, and handles disconnection.
For more details on WebSocket:
- Starlette WebSocket documentation.
- htmx websockets extension.